Product Association

The product association module implements functionality for generating product association rules, a powerful technique in retail analytics and market basket analysis.

Product association rules are used to uncover relationships between different products that customers tend to purchase together. These rules provide valuable insights into consumer behavior and purchasing patterns, which can be leveraged by retail businesses in various ways:

Cross-selling and upselling: By identifying products frequently bought together, retailers can make targeted product recommendations to increase sales and average order value.
Store layout optimization: Understanding product associations helps in strategic product placement within stores, potentially increasing impulse purchases and overall sales.
Inventory management: Knowing which products are often bought together aids in maintaining appropriate stock levels and predicting demand.
Marketing and promotions: Association rules can guide the creation of effective bundle offers and promotional campaigns.
Customer segmentation: Patterns in product associations can reveal distinct customer segments with specific preferences.
New product development: Insights from association rules can inform decisions about new product lines or features.

The module uses metrics such as support, confidence, and uplift to quantify the strength and significance of product associations:

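Support: The proportion of transactions in which both items appear together.
Confidence: The likelihood of buying one product given the purchase of another, i.e. the share of transactions containing the first product that also contain the second.
Uplift: The ratio of the observed co-occurrence rate to the rate expected if the two products were bought independently. Values above 1 indicate products that are bought together more often than chance would suggest.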
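To make the three metrics concrete, here is a minimal sketch that computes them by hand with pandas on a tiny, made-up set of transactions. The data and the helper name `pair_metrics` are hypothetical and exist only for illustration; this is not the module's internal implementation.

```python
import pandas as pd

# Hypothetical toy data: one row per item per transaction.
toy = pd.DataFrame(
    {
        "transaction_id": [1, 1, 2, 2, 3, 4],
        "product_name": ["bread", "butter", "bread", "butter", "bread", "milk"],
    }
)


def pair_metrics(df: pd.DataFrame, item_1: str, item_2: str) -> dict:
    """Compute support, confidence and uplift for one ordered pair of items."""
    n_transactions = df["transaction_id"].nunique()

    # Set of transaction ids containing each product.
    baskets = df.groupby("product_name")["transaction_id"].apply(set)

    both = baskets[item_1] & baskets[item_2]

    support = len(both) / n_transactions
    confidence = len(both) / len(baskets[item_1])
    # Confidence divided by the overall purchase rate of the second item.
    uplift = confidence / (len(baskets[item_2]) / n_transactions)

    return {"support": support, "confidence": confidence, "uplift": uplift}


metrics = pair_metrics(toy, "bread", "butter")
print(metrics)
```

In this toy data, bread appears in three of four transactions and butter in two, always alongside bread, so the uplift comes out above 1.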

Setup

We'll start by loading some simulated data.

In [ ]:

import pandas as pd

df = pd.read_parquet("../../data/transactions.parquet")
df.head()

Out[ ]:

	transaction_id	transaction_date	transaction_time	customer_id	product_id	product_name	category_0_name	category_0_id	category_1_name	category_1_id	brand_name	brand_id	unit_quantity	unit_cost	unit_spend	store_id
0	16050	2023-01-12	17:44:29	1	15	Spawn Figure	Toys	1	Action Figures	1	McFarlane Toys	3	2	36.10	55.98	6
1	16050	2023-01-12	17:44:29	1	1317	Gone Girl	Books	8	Mystery & Thrillers	53	Alfred A. Knopf	264	1	6.98	10.49	6
2	20090	2023-02-05	09:31:42	1	509	Ryzen 3 3300X	Electronics	3	Computer Components	21	AMD	102	3	200.61	360.00	4
3	20090	2023-02-05	09:31:42	1	735	Linden Wood Paneled Mirror	Home	5	Home Decor	30	Pottery Barn	147	1	379.83	599.00	4
4	20090	2023-02-05	09:31:42	1	1107	Pro-V Daily Moisture Renewal Conditioner	Beauty	7	Hair Care	45	Pantene	222	1	3.32	4.99	4

In [ ]:

print(f"Number of unique customers: {df['customer_id'].nunique()}")
print(f"Number of unique transactions: {df['transaction_id'].nunique()}")

Number of unique customers: 4250
Number of unique transactions: 25490

Here is a simple example that generates the product association rules.

In [ ]:

from pyretailscience.analysis.product_association import ProductAssociation

pa = ProductAssociation(
    df,
    value_col="product_name",
    group_col="transaction_id",
)
pa.df.head()

Out[ ]:

	product_name_1	product_name_2	occurrences_1	occurrences_2	cooccurrences	support	confidence	uplift
0	100 Animals Book	100% Organic Cold-Pressed Rose Hip Seed Oil	78	78	1	0.000039	0.012821	4.189678
1	100 Animals Book	20K Sousaphone	78	81	3	0.000118	0.038462	12.103514
2	100 Animals Book	360 Sport 2.0 Boxer Briefs	78	79	1	0.000039	0.012821	4.136644
3	100 Animals Book	4-Series 4K UHD	78	82	1	0.000039	0.012821	3.985303
4	100 Animals Book	700S Eterna Trumpet	78	71	1	0.000039	0.012821	4.602745
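As a sanity check, the first row of this output can be reproduced from the counts alone. The sketch below uses only figures printed above (25490 unique transactions, 78 occurrences of each product, 1 co-occurrence) and assumes the definitions of support, confidence, and uplift given earlier; it does not call the library.

```python
# Figures taken from the outputs above.
n_transactions = 25490
occurrences_1 = 78  # 100 Animals Book
occurrences_2 = 78  # 100% Organic Cold-Pressed Rose Hip Seed Oil
cooccurrences = 1

# Share of all transactions that contain both products.
support = cooccurrences / n_transactions
# Share of transactions with product 1 that also contain product 2.
confidence = cooccurrences / occurrences_1
# Confidence relative to the overall purchase rate of product 2.
uplift = confidence / (occurrences_2 / n_transactions)

print(f"{support:.6f} {confidence:.6f} {uplift:.6f}")
# prints: 0.000039 0.012821 4.189678
```

The values agree with the support, confidence, and uplift columns of row 0.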

You can also limit the returned items to those that include a specific item.

In [ ]:

pa_specific_item = ProductAssociation(
    df,
    value_col="product_name",
    group_col="transaction_id",
    target_item="4-Series 4K UHD",
)
pa_specific_item.df.head()

Out[ ]:

	product_name_1	product_name_2	occurrences_1	occurrences_2	cooccurrences	support	confidence	uplift
0	4-Series 4K UHD	100 Animals Book	82	78	1	0.000039	0.012195	3.985303
1	4-Series 4K UHD	122HD45 Gas Hedge Trimmer	82	92	1	0.000039	0.012195	3.378844
2	4-Series 4K UHD	2-in-1 Touch & Learn Tablet	82	81	2	0.000078	0.024390	7.675399
3	4-Series 4K UHD	20K Sousaphone	82	81	1	0.000039	0.012195	3.837699
4	4-Series 4K UHD	3 Minute Miracle Deep Conditioner	82	70	1	0.000039	0.012195	4.440767

You can filter the returned results by:

Minimum occurrences of an item
Minimum cooccurrences of a pair of items
Minimum support of a pair of items
Minimum confidence of a pair of items
Minimum uplift of a pair of items

In [ ]:

pa_min_uplift = ProductAssociation(
    df,
    value_col="product_name",
    group_col="transaction_id",
    min_uplift=5,
)
pa_min_uplift.df.head()

Out[ ]:

	product_name_1	product_name_2	occurrences_1	occurrences_2	cooccurrences	support	confidence	uplift
0	100 Animals Book	20K Sousaphone	78	81	3	0.000118	0.038462	12.103514
1	100 Animals Book	Activia Probiotic Yogurt	78	57	2	0.000078	0.025641	11.466487
2	100 Animals Book	Aether AG 70 Pack	78	72	2	0.000078	0.025641	9.077635
3	100 Animals Book	All Natural Plain Yogurt	78	62	1	0.000039	0.012821	5.270885
4	100 Animals Book	American Ultra Jazz Bass	78	59	2	0.000078	0.025641	11.077792
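Because the result is an ordinary pandas DataFrame, thresholds can also be combined after the fact with boolean indexing. The sketch below is an illustration only: it rebuilds the first five rows of the unfiltered output shown earlier as a stand-in DataFrame (named `rules` here, a hypothetical name) rather than calling the library, and applies a minimum uplift together with a minimum number of cooccurrences.

```python
import pandas as pd

# Stand-in for the results DataFrame, copied from the first unfiltered output.
rules = pd.DataFrame(
    {
        "product_name_1": ["100 Animals Book"] * 5,
        "product_name_2": [
            "100% Organic Cold-Pressed Rose Hip Seed Oil",
            "20K Sousaphone",
            "360 Sport 2.0 Boxer Briefs",
            "4-Series 4K UHD",
            "700S Eterna Trumpet",
        ],
        "cooccurrences": [1, 3, 1, 1, 1],
        "uplift": [4.189678, 12.103514, 4.136644, 3.985303, 4.602745],
    }
)

# Keep only pairs that clear both thresholds.
strong = rules[(rules["uplift"] >= 5) & (rules["cooccurrences"] >= 2)]
print(strong)
```

Requiring a minimum number of cooccurrences alongside uplift helps avoid pairs whose high uplift rests on a single shared transaction.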