HML Segmentation
This module provides the HMLSegmentation class for categorizing customers into spend-based segments. HMLSegmentation extends ThresholdSegmentation and classifies customers into Heavy, Medium, Light, and optionally Zero spenders based on the Pareto principle (80/20 rule). It is commonly used in retail to analyze customer spending behavior and optimize marketing strategies.
HMLSegmentation
Bases: ThresholdSegmentation
Segments customers into Heavy, Medium, Light and Zero spenders based on the total spend.
Source code in pyretailscience/segmentation/hml.py
__init__(df, value_col=None, agg_func='sum', zero_value_customers='separate_segment')
Segments customers into Heavy, Medium, Light and Zero spenders based on the total spend.
HMLSegmentation is a subclass of ThresholdSegmentation and is based on an industry-standard definition. The thresholds for Heavy (top 20%), Medium (next 30%) and Light (bottom 50%) are chosen based on the Pareto distribution, commonly known as the 80/20 rule. It is typically used in retail to segment customers based on their spend, transaction volume or quantities purchased.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | A dataframe with the transaction data. The dataframe must contain a customer_id column. | required |
value_col | str | The column to use for the segmentation. Defaults to get_option("column.unit_spend"). | None |
agg_func | str | The aggregation function to use when grouping by customer_id. Defaults to "sum". | 'sum' |
zero_value_customers | Literal['separate_segment', 'exclude', 'include_with_light'] | How to handle customers with zero spend. Defaults to "separate_segment". | 'separate_segment' |
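
The 20/30/50 split and the three zero_value_customers modes can be illustrated with a small plain-Python sketch. This is not the library's implementation: the function name hml_segments, the percent-rank convention (share of customers with strictly lower spend) and the handling of values that fall exactly on a boundary are assumptions made for illustration only.

```python
from bisect import bisect_left
from collections import defaultdict


def hml_segments(transactions, zero_value_customers="separate_segment"):
    """Hypothetical helper: map customer_id -> Heavy / Medium / Light / Zero.

    transactions is an iterable of (customer_id, spend) pairs.
    """
    # Sum spend per customer, mirroring agg_func="sum".
    totals = defaultdict(float)
    for customer_id, spend in transactions:
        totals[customer_id] += spend

    segments = {}
    ranked = {}
    for customer_id, total in totals.items():
        if total == 0 and zero_value_customers == "separate_segment":
            segments[customer_id] = "Zero"
        elif total == 0 and zero_value_customers == "exclude":
            continue
        else:
            # Non-zero customers, plus zero spenders for "include_with_light".
            ranked[customer_id] = total

    # Assumed convention: percent rank = customers with strictly lower spend,
    # divided by (n - 1), so the lowest spender gets 0.0 and the highest 1.0.
    ordered = sorted(ranked.values())
    denominator = max(len(ordered) - 1, 1)
    for customer_id, total in ranked.items():
        pct_rank = bisect_left(ordered, total) / denominator
        if pct_rank <= 0.5:
            segments[customer_id] = "Light"   # bottom 50%
        elif pct_rank <= 0.8:
            segments[customer_id] = "Medium"  # next 30%
        else:
            segments[customer_id] = "Heavy"   # top 20%
    return segments


# Ten customers spending 1..10 and one customer with zero spend.
transactions = [(f"c{i}", float(i)) for i in range(1, 11)] + [("c0", 0.0)]
segments = hml_segments(transactions)
```

With the default "separate_segment" mode, the zero-spend customer is labelled Zero and the remaining ten customers split into five Light, three Medium and two Heavy. Passing "exclude" drops the zero-spend customer from the result, and "include_with_light" ranks it alongside everyone else, where it lands in Light.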