Product Association
The product association module implements functionality for generating product association rules, a powerful technique in retail analytics and market basket analysis.
Product association rules are used to uncover relationships between different products that customers tend to purchase together. These rules provide valuable insights into consumer behavior and purchasing patterns, which can be leveraged by retail businesses in various ways:
- Cross-selling and upselling: By identifying products frequently bought together, retailers can make targeted product recommendations to increase sales and average order value.
- Store layout optimization: Understanding product associations helps in strategic product placement within stores, potentially increasing impulse purchases and overall sales.
- Inventory management: Knowing which products are often bought together aids in maintaining appropriate stock levels and predicting demand.
- Marketing and promotions: Association rules can guide the creation of effective bundle offers and promotional campaigns.
- Customer segmentation: Patterns in product associations can reveal distinct customer segments with specific preferences.
- New product development: Insights from association rules can inform decisions about new product lines or features.
The module uses metrics such as support, confidence, and uplift to quantify the strength and significance of product associations:
- Support: The proportion of transactions in which the items appear together.
- Confidence: The likelihood of buying the second product given that the first was purchased.
- Uplift: How much more likely the second product is to be bought when the first is bought, relative to its overall purchase rate. Values above 1 indicate a positive association.
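The three metrics can be worked out by hand on a handful of baskets. The sketch below uses made-up toy data and the standard market-basket definitions: support is the share of transactions containing both items, confidence is the co-occurrence count divided by the occurrences of the first item, and uplift is confidence divided by the second item's overall share of transactions. Treating these as the module's formulas is an assumption, although they do agree with the output tables shown later on this page. The names `baskets` and `pair_metrics` are illustrative only and are not part of pyretailscience.

```python
from collections import Counter
from itertools import combinations

# Toy baskets: each set is one transaction (hypothetical data, not from the dataset).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"jam"},
    {"butter", "jam"},
]
n = len(baskets)

# Number of baskets containing each item, and each unordered pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def pair_metrics(item_1, item_2):
    """Return (support, confidence, uplift) for the rule item_1 -> item_2."""
    cooccurrences = pair_counts[tuple(sorted((item_1, item_2)))]
    support = cooccurrences / n
    confidence = cooccurrences / item_counts[item_1]
    uplift = confidence / (item_counts[item_2] / n)
    return support, confidence, uplift


support, confidence, uplift = pair_metrics("bread", "butter")
print(f"support={support:.2f} confidence={confidence:.2f} uplift={uplift:.2f}")
# prints: support=0.40 confidence=0.67 uplift=1.11
```

An uplift of about 1.11 means butter is bought slightly more often alongside bread than its overall purchase rate would suggest.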
Setup
We'll start by loading some simulated data.
import pandas as pd
df = pd.read_parquet("../../data/transactions.parquet")
df.head()
| | transaction_id | transaction_date | transaction_time | customer_id | product_id | product_name | category_0_name | category_0_id | category_1_name | category_1_id | brand_name | brand_id | unit_quantity | unit_cost | unit_spend | store_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 16050 | 2023-01-12 | 17:44:29 | 1 | 15 | Spawn Figure | Toys | 1 | Action Figures | 1 | McFarlane Toys | 3 | 2 | 36.10 | 55.98 | 6 |
1 | 16050 | 2023-01-12 | 17:44:29 | 1 | 1317 | Gone Girl | Books | 8 | Mystery & Thrillers | 53 | Alfred A. Knopf | 264 | 1 | 6.98 | 10.49 | 6 |
2 | 20090 | 2023-02-05 | 09:31:42 | 1 | 509 | Ryzen 3 3300X | Electronics | 3 | Computer Components | 21 | AMD | 102 | 3 | 200.61 | 360.00 | 4 |
3 | 20090 | 2023-02-05 | 09:31:42 | 1 | 735 | Linden Wood Paneled Mirror | Home | 5 | Home Decor | 30 | Pottery Barn | 147 | 1 | 379.83 | 599.00 | 4 |
4 | 20090 | 2023-02-05 | 09:31:42 | 1 | 1107 | Pro-V Daily Moisture Renewal Conditioner | Beauty | 7 | Hair Care | 45 | Pantene | 222 | 1 | 3.32 | 4.99 | 4 |
print(f"Number of unique customers: {df['customer_id'].nunique()}")
print(f"Number of unique transactions: {df['transaction_id'].nunique()}")
Number of unique customers: 4250
Number of unique transactions: 25490
Here is a simple example that generates the product association rules.
from pyretailscience.analysis.product_association import ProductAssociation
pa = ProductAssociation(
    df,
    value_col="product_name",
    group_col="transaction_id",
)
pa.df.head()
| | product_name_1 | product_name_2 | occurrences_1 | occurrences_2 | cooccurrences | support | confidence | uplift |
---|---|---|---|---|---|---|---|---|
0 | 100 Animals Book | 100% Organic Cold-Pressed Rose Hip Seed Oil | 78 | 78 | 1 | 0.000039 | 0.012821 | 4.189678 |
1 | 100 Animals Book | 20K Sousaphone | 78 | 81 | 3 | 0.000118 | 0.038462 | 12.103514 |
2 | 100 Animals Book | 360 Sport 2.0 Boxer Briefs | 78 | 79 | 1 | 0.000039 | 0.012821 | 4.136644 |
3 | 100 Animals Book | 4-Series 4K UHD | 78 | 82 | 1 | 0.000039 | 0.012821 | 3.985303 |
4 | 100 Animals Book | 700S Eterna Trumpet | 78 | 71 | 1 | 0.000039 | 0.012821 | 4.602745 |
You can also limit the returned items to those that include a specific item.
pa_specific_item = ProductAssociation(
    df,
    value_col="product_name",
    group_col="transaction_id",
    target_item="4-Series 4K UHD",
)
pa_specific_item.df.head()
| | product_name_1 | product_name_2 | occurrences_1 | occurrences_2 | cooccurrences | support | confidence | uplift |
---|---|---|---|---|---|---|---|---|
0 | 4-Series 4K UHD | 100 Animals Book | 82 | 78 | 1 | 0.000039 | 0.012195 | 3.985303 |
1 | 4-Series 4K UHD | 122HD45 Gas Hedge Trimmer | 82 | 92 | 1 | 0.000039 | 0.012195 | 3.378844 |
2 | 4-Series 4K UHD | 2-in-1 Touch & Learn Tablet | 82 | 81 | 2 | 0.000078 | 0.024390 | 7.675399 |
3 | 4-Series 4K UHD | 20K Sousaphone | 82 | 81 | 1 | 0.000039 | 0.012195 | 3.837699 |
4 | 4-Series 4K UHD | 3 Minute Miracle Deep Conditioner | 82 | 70 | 1 | 0.000039 | 0.012195 | 4.440767 |
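Comparing the `100 Animals Book` / `4-Series 4K UHD` pair in the two outputs above shows that confidence depends on which item comes first, while support and uplift do not. The sketch below reproduces those figures from the count columns and the transaction total printed earlier. The formulas are the standard definitions and are assumed, not confirmed, to be what the module uses; the helper names are hypothetical.

```python
# Counts copied from the output tables on this page.
n_transactions = 25490  # unique transactions in the dataset
occurrences = {"100 Animals Book": 78, "4-Series 4K UHD": 82}
cooccurrences = 1  # transactions containing both products


def confidence(first):
    """Share of transactions with `first` that also contain the other item."""
    return cooccurrences / occurrences[first]


def uplift(first, second):
    """Confidence of first -> second relative to the base rate of `second`."""
    return confidence(first) / (occurrences[second] / n_transactions)


support = cooccurrences / n_transactions

book_first = confidence("100 Animals Book")  # rule: book -> TV
tv_first = confidence("4-Series 4K UHD")     # rule: TV -> book

print(round(support, 6))     # 3.9e-05, shown as 0.000039 in the tables
print(round(book_first, 6))  # 0.012821
print(round(tv_first, 6))    # 0.012195
print(round(uplift("100 Animals Book", "4-Series 4K UHD"), 6))  # 3.985303
print(round(uplift("4-Series 4K UHD", "100 Animals Book"), 6))  # 3.985303
```

Because the TV appears in slightly more transactions than the book (82 versus 78), the same single co-occurrence yields a lower confidence when the TV is the first item.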
You can filter the returned results by:
- Minimum occurrences of an item
- Minimum cooccurrences of a pair of items
- Minimum support of a pair of items
- Minimum confidence of a pair of items
- Minimum uplift of a pair of items
The example below sets a minimum uplift of 5.
pa_min_uplift = ProductAssociation(
    df,
    value_col="product_name",
    group_col="transaction_id",
    min_uplift=5,
)
pa_min_uplift.df.head()
| | product_name_1 | product_name_2 | occurrences_1 | occurrences_2 | cooccurrences | support | confidence | uplift |
---|---|---|---|---|---|---|---|---|
0 | 100 Animals Book | 20K Sousaphone | 78 | 81 | 3 | 0.000118 | 0.038462 | 12.103514 |
1 | 100 Animals Book | Activia Probiotic Yogurt | 78 | 57 | 2 | 0.000078 | 0.025641 | 11.466487 |
2 | 100 Animals Book | Aether AG 70 Pack | 78 | 72 | 2 | 0.000078 | 0.025641 | 9.077635 |
3 | 100 Animals Book | All Natural Plain Yogurt | 78 | 62 | 1 | 0.000039 | 0.012821 | 5.270885 |
4 | 100 Animals Book | American Ultra Jazz Bass | 78 | 59 | 2 | 0.000078 | 0.025641 | 11.077792 |
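To see how several minimum thresholds might combine, the sketch below applies them as plain pandas filters to the first five rows of the unfiltered output shown earlier. It assumes each threshold is an inclusive lower bound and that all of them must hold at once; whether the module treats its thresholds exactly this way is not confirmed here. The `thresholds` dictionary is a hypothetical helper, not a pyretailscience argument.

```python
import pandas as pd

# First five rows of the unfiltered output shown earlier on this page.
rules = pd.DataFrame(
    {
        "product_name_1": ["100 Animals Book"] * 5,
        "product_name_2": [
            "100% Organic Cold-Pressed Rose Hip Seed Oil",
            "20K Sousaphone",
            "360 Sport 2.0 Boxer Briefs",
            "4-Series 4K UHD",
            "700S Eterna Trumpet",
        ],
        "cooccurrences": [1, 3, 1, 1, 1],
        "uplift": [4.189678, 12.103514, 4.136644, 3.985303, 4.602745],
    }
)

# Hypothetical thresholds: column name -> minimum value (assumed inclusive).
thresholds = {"cooccurrences": 2, "uplift": 5}

# Start with every row kept, then require each threshold in turn.
mask = pd.Series(True, index=rules.index)
for column, minimum in thresholds.items():
    mask &= rules[column] >= minimum

filtered = rules[mask]
print(filtered[["product_name_2", "cooccurrences", "uplift"]])
```

Only the `20K Sousaphone` row passes both thresholds. Requiring a minimum number of cooccurrences alongside a minimum uplift is one way to avoid high-uplift pairs that rest on a single shared transaction.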