Product Associations
Product Association Rules Generation.
This module implements functionality for generating product association rules, a powerful technique in retail analytics and market basket analysis.
Product association rules are used to uncover relationships between different products that customers tend to purchase together. These rules provide valuable insights into consumer behavior and purchasing patterns, which can be leveraged by retail businesses in various ways:
- Cross-selling and upselling: By identifying products frequently bought together, retailers can make targeted product recommendations to increase sales and average order value.
- Store layout optimization: Understanding product associations helps in strategic product placement within stores, potentially increasing impulse purchases and overall sales.
- Inventory management: Knowing which products are often bought together aids in maintaining appropriate stock levels and predicting demand.
- Marketing and promotions: Association rules can guide the creation of effective bundle offers and promotional campaigns.
- Customer segmentation: Patterns in product associations can reveal distinct customer segments with specific preferences.
- New product development: Insights from association rules can inform decisions about new product lines or features.
The module uses metrics such as support, confidence, and uplift to quantify the strength and significance of product associations:
- Support: The frequency of items appearing together in transactions.
- Confidence: The likelihood of buying one product given the purchase of another.
- Uplift: How much more likely one product is to be purchased when another is bought, relative to its baseline purchase rate.
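The three metrics can be written out as formulas. This is a sketch assuming the standard market-basket definitions; the symbols are introduced here for illustration and do not come from the module: $N$ is the number of groups (transactions or customers), $n(A)$ is the number of groups containing product $A$, and $n(A, B)$ is the number containing both $A$ and $B$.

$$
\text{support}(A, B) = \frac{n(A, B)}{N}, \qquad
\text{confidence}(A \rightarrow B) = \frac{n(A, B)}{n(A)}, \qquad
\text{uplift}(A, B) = \frac{\text{support}(A, B)}{\dfrac{n(A)}{N} \cdot \dfrac{n(B)}{N}}
$$

As a made-up illustration, with $N = 100$, $n(A) = 20$, $n(B) = 25$ and $n(A, B) = 10$, support is $10 / 100 = 0.10$, confidence is $10 / 20 = 0.50$, and uplift is $0.10 / (0.20 \times 0.25) = 2$. An uplift above 1 suggests the two products appear together more often than independence would predict.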
By leveraging these association rules, retailers can make data-driven decisions to enhance customer experience, optimize operations, and drive business growth.
ProductAssociation
A class for generating and analyzing product association rules.
This class calculates association rules between products based on transaction data, helping to identify patterns in customer purchasing behavior.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The input DataFrame containing transaction data. | required |
value_col | str | The name of the column in the input DataFrame that contains the product identifiers. | required |
group_col | str | The name of the column that identifies unique transactions or customers. Defaults to option column.customer_id. | get_option('column.customer_id') |
target_item | str or None | A specific product to focus the association analysis on. If None, associations for all products are calculated. | None |
Attributes:
Name | Type | Description |
---|---|---|
df | DataFrame | A DataFrame containing the calculated association rules and their metrics. |
Example
    import pandas as pd
    from pyretailscience.analysis.product_association import ProductAssociation

    transaction_df = pd.DataFrame({
        'customer_id': [1, 1, 2, 2, 3],
        'product_id': ['A', 'B', 'B', 'C', 'A'],
    })
    pa = ProductAssociation(df=transaction_df, value_col='product_id', group_col='customer_id')
    print(pa.df)  # View the calculated association rules
Note
The resulting DataFrame (pa.df) contains the following columns:
- product_1, product_2: The pair of products for which the association is calculated.
- occurrences_1, occurrences_2: The number of transactions containing each product.
- cooccurrences: The number of transactions containing both products.
- support: The proportion of transactions containing both products.
- confidence: The probability of buying product_2 given that product_1 was bought.
- uplift: The ratio of the observed support to the expected support if the products were independent.
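To make the column definitions concrete, the sketch below recomputes them by hand for the example transactions using plain pandas. It is an illustration of the definitions listed above, not the library's implementation; the variable names (`items`, `pairs`, `rules`) and the self-join approach are assumptions made for this example.

```python
import pandas as pd

transaction_df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "product_id": ["A", "B", "B", "C", "A"],
})

# Keep one row per (group, product) so repeat purchases are not double counted.
items = transaction_df[["customer_id", "product_id"]].drop_duplicates()
n_groups = items["customer_id"].nunique()

# Number of groups containing each product.
occurrences = items.groupby("product_id")["customer_id"].nunique()

# Self-join on the group column to enumerate ordered product pairs,
# then drop pairs of a product with itself.
pairs = items.merge(items, on="customer_id", suffixes=("_1", "_2"))
pairs = pairs[pairs["product_id_1"] != pairs["product_id_2"]]

rules = (
    pairs.groupby(["product_id_1", "product_id_2"])["customer_id"]
    .nunique()
    .rename("cooccurrences")
    .reset_index()
    .rename(columns={"product_id_1": "product_1", "product_id_2": "product_2"})
)
rules["occurrences_1"] = rules["product_1"].map(occurrences)
rules["occurrences_2"] = rules["product_2"].map(occurrences)

# Metrics as described in the column list above.
rules["support"] = rules["cooccurrences"] / n_groups
rules["confidence"] = rules["cooccurrences"] / rules["occurrences_1"]
rules["uplift"] = rules["support"] / (
    (rules["occurrences_1"] / n_groups) * (rules["occurrences_2"] / n_groups)
)

print(rules)
```

With three groups in total, the pair (A, B) co-occurs once, so its support is 1/3, its confidence is 1/2, and its uplift is (1/3) / ((2/3) × (2/3)) = 0.75.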
Source code in pyretailscience/analysis/product_association.py
df: pd.DataFrame
property
Returns the executed DataFrame.
__init__(df, value_col, group_col=get_option('column.customer_id'), target_item=None, min_occurrences=1, min_cooccurrences=1, min_support=0.0, min_confidence=0.0, min_uplift=0.0)
Initialize the ProductAssociation object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | pd.DataFrame or ibis.Table | The input DataFrame or ibis Table containing transaction data. | required |
value_col | str | The name of the column in the input DataFrame that contains the product identifiers. | required |
group_col | str | The name of the column that identifies unique transactions or customers. Defaults to option column.customer_id. | get_option('column.customer_id') |
target_item | str or None | A specific product to focus the association analysis on. If None, associations for all products are calculated. | None |
min_occurrences | int | The minimum number of occurrences required for each product in the association analysis. Must be at least 1. | 1 |
min_cooccurrences | int | The minimum number of co-occurrences required for the product pairs in the association analysis. Must be at least 1. | 1 |
min_support | float | The minimum support value required for the association rules. Must be between 0 and 1. | 0.0 |
min_confidence | float | The minimum confidence value required for the association rules. Must be between 0 and 1. | 0.0 |
min_uplift | float | The minimum uplift value required for the association rules. Must be greater than or equal to 0. | 0.0 |
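The effect of the minimum thresholds can be pictured as row filters over a rules table. The sketch below is a rough illustration under stated assumptions: `filter_rules` is a hypothetical helper (not part of the library), the rules table is hand-written sample data using the documented column names, and the thresholds are assumed to be inclusive. The actual class may apply the filters differently or at an earlier stage.

```python
import pandas as pd

# Hand-written sample rules table using the documented column names.
rules = pd.DataFrame({
    "product_1": ["A", "B", "B", "C"],
    "product_2": ["B", "A", "C", "B"],
    "occurrences_1": [2, 2, 2, 1],
    "occurrences_2": [2, 2, 1, 2],
    "cooccurrences": [1, 1, 1, 1],
    "support": [1 / 3, 1 / 3, 1 / 3, 1 / 3],
    "confidence": [0.5, 0.5, 0.5, 1.0],
    "uplift": [0.75, 0.75, 1.5, 1.5],
})


def filter_rules(
    rules: pd.DataFrame,
    min_occurrences: int = 1,
    min_cooccurrences: int = 1,
    min_support: float = 0.0,
    min_confidence: float = 0.0,
    min_uplift: float = 0.0,
) -> pd.DataFrame:
    """Hypothetical helper: keep rows meeting every minimum (assumed inclusive)."""
    mask = (
        (rules["occurrences_1"] >= min_occurrences)
        & (rules["occurrences_2"] >= min_occurrences)
        & (rules["cooccurrences"] >= min_cooccurrences)
        & (rules["support"] >= min_support)
        & (rules["confidence"] >= min_confidence)
        & (rules["uplift"] >= min_uplift)
    )
    return rules[mask].reset_index(drop=True)


# Keep only pairs that co-occur more often than independence would predict.
print(filter_rules(rules, min_uplift=1.0))
```

With the default values every row passes, which matches the documented defaults of 1, 1, 0.0, 0.0 and 0.0 imposing no restriction on valid rules.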
Raises:
Type | Description |
---|---|
ValueError | If the number of combinations is not 2 or 3, or if any of the minimum values are invalid. |
ValueError | If the minimum support, confidence, or uplift values are outside the valid range. |
ValueError | If the minimum occurrences or cooccurrences are less than 1. |
ValueError | If the input DataFrame does not contain the required columns or if they have null values. |
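The documented ranges and column requirements could be checked along the following lines. This is a minimal sketch, not the library's code: `validate_thresholds` and `validate_columns` are hypothetical names, and the error messages are invented for illustration.

```python
import pandas as pd


def validate_thresholds(
    min_occurrences: int = 1,
    min_cooccurrences: int = 1,
    min_support: float = 0.0,
    min_confidence: float = 0.0,
    min_uplift: float = 0.0,
) -> None:
    """Hypothetical check of the documented parameter ranges."""
    if min_occurrences < 1:
        raise ValueError("min_occurrences must be at least 1.")
    if min_cooccurrences < 1:
        raise ValueError("min_cooccurrences must be at least 1.")
    if not 0.0 <= min_support <= 1.0:
        raise ValueError("min_support must be between 0 and 1.")
    if not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be between 0 and 1.")
    if min_uplift < 0.0:
        raise ValueError("min_uplift must be greater than or equal to 0.")


def validate_columns(df: pd.DataFrame, required_cols: list[str]) -> None:
    """Hypothetical check that required columns exist and contain no nulls."""
    missing = [col for col in required_cols if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    if df[required_cols].isnull().any().any():
        raise ValueError("Required columns must not contain null values.")


# The documented defaults and a clean DataFrame pass without raising.
validate_thresholds()
validate_columns(
    pd.DataFrame({"customer_id": [1, 2], "product_id": ["A", "B"]}),
    ["customer_id", "product_id"],
)
```

Running checks like these before any computation means an invalid argument surfaces as a clear ValueError rather than as an empty or misleading rules table.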