NLR (New-Lapsed-Repeating) Segmentation

New-Lapsed-Repeating (NLR) Segmentation for Customer Lifecycle Classification.

Business Context

Understanding customer lifecycle stages is fundamental to retail strategy. Customers naturally move between periods of activity and inactivity, and identifying these transitions enables targeted interventions at each stage.

The Business Problem

Retailers need to understand which customers are growing the base (new), which are loyal (repeating), and which have stopped purchasing (lapsed). Without this classification, marketing budgets are misallocated — spending acquisition dollars on existing customers or ignoring at-risk customers who are about to churn.

Segment Definitions

Given two time periods (P1 and P2), customers are classified based on where they have a positive aggregated value:

New: Positive value in P2 only — these customers were acquired in the later period
Repeating: Positive value in both P1 and P2 — these customers are retained
Lapsed: Positive value in P1 only — these customers have stopped purchasing

A customer must have a positive aggregated value (> 0) in a period to be considered as having "bought" in that period. Zero or negative values do not count. Customers with no positive value in either period are excluded from the results entirely, as they do not meet the criteria for any segment. When using non-spend aggregation functions like count or nunique, the threshold still applies but measures transaction presence rather than spend.

Real-World Applications

New Customers

Measure acquisition effectiveness period-over-period
Design onboarding journeys to convert first-time buyers to repeat customers
Track new customer quality (spend levels, category breadth)

Repeating Customers

Core loyal base driving consistent revenue
Cross-sell and upsell opportunities
Loyalty program engagement and reward optimization

Lapsed Customers

Win-back campaigns with targeted incentives
Churn root cause analysis (price sensitivity, assortment gaps)
Customer lifetime value recalculation and reactivation ROI modeling

`NLRSegmentation`

Segments customers into New, Repeating, and Lapsed based on presence across two periods.

NLRSegmentation compares customer purchasing activity across two defined time periods (P1 and P2) to classify each customer's lifecycle stage. A customer is considered active in a period only if their aggregated value is strictly positive (> 0). This enables retailers to measure acquisition, retention, and churn rates in a single view, and to size the revenue impact of each lifecycle segment.

The segmentation is commonly used in period-over-period reporting (e.g., year-over-year, quarter-over-quarter) to answer questions such as how many customers were retained, how many were lost, and how many are newly acquired. When combined with spend data, it reveals whether revenue growth is driven by new customer acquisition or by increasing spend from repeating customers.

Attributes:

Name	Type	Description
`table`	`Table`	The underlying ibis Table expression containing the segmentation results. Can be used for further ibis operations before materializing to a DataFrame.
`df`	`DataFrame`	The materialized segmentation results as a pandas DataFrame, indexed by customer_id (and group_col if specified). Cached after first access.

Example

import pandas as pd from pyretailscience.segmentation.nlr import NLRSegmentation df = pd.DataFrame({ ... "customer_id": [1, 2, 3, 1, 3, 4], ... "unit_spend": [50.0, 100.0, 80.0, 75.0, 60.0, 150.0], ... "year": [2023, 2023, 2023, 2024, 2024, 2024], ... }) seg = NLRSegmentation(df=df, period_col="year", p1_value=2023, p2_value=2024) seg.df[["segment_name"]] segment_name customer_id 1 Repeating 2 Lapsed 3 Repeating 4 New

Source code in pyretailscience/segmentation/nlr.py

class NLRSegmentation:
    """Segments customers into New, Repeating, and Lapsed based on presence across two periods.

    NLRSegmentation compares customer purchasing activity across two defined time periods (P1 and P2) to classify
    each customer's lifecycle stage. A customer is considered active in a period only if their aggregated value is
    strictly positive (> 0). This enables retailers to measure acquisition, retention, and churn rates in a single
    view, and to size the revenue impact of each lifecycle segment.

    The segmentation is commonly used in period-over-period reporting (e.g., year-over-year, quarter-over-quarter)
    to answer questions such as how many customers were retained, how many were lost, and how many are newly acquired.
    When combined with spend data, it reveals whether revenue growth is driven by new customer acquisition or by
    increasing spend from repeating customers.

    Attributes:
        table (ibis.Table): The underlying ibis Table expression containing the segmentation results.
            Can be used for further ibis operations before materializing to a DataFrame.
        df (pd.DataFrame): The materialized segmentation results as a pandas DataFrame, indexed by
            customer_id (and group_col if specified). Cached after first access.

    Example:
        >>> import pandas as pd
        >>> from pyretailscience.segmentation.nlr import NLRSegmentation
        >>> df = pd.DataFrame({
        ...     "customer_id": [1, 2, 3, 1, 3, 4],
        ...     "unit_spend": [50.0, 100.0, 80.0, 75.0, 60.0, 150.0],
        ...     "year": [2023, 2023, 2023, 2024, 2024, 2024],
        ... })
        >>> seg = NLRSegmentation(df=df, period_col="year", p1_value=2023, p2_value=2024)
        >>> seg.df[["segment_name"]]
                     segment_name
        customer_id
        1               Repeating
        2                  Lapsed
        3               Repeating
        4                     New
    """

    def __init__(
        self,
        df: pd.DataFrame | ibis.Table,
        period_col: str,
        p1_value: str | float | ibis.Scalar,
        p2_value: str | float | ibis.Scalar,
        value_col: str | None = None,
        agg_func: str = "sum",
        group_col: str | list[str] | None = None,
    ) -> None:
        """Segments customers into New, Repeating, and Lapsed based on positive aggregated value across two periods.

        A customer is considered to have "bought" in a period only if their aggregated value_col
        is strictly positive (> 0). Customers are then classified as:
        - New: positive value in P2 only
        - Repeating: positive value in both P1 and P2
        - Lapsed: positive value in P1 only

        Args:
            df (pd.DataFrame | ibis.Table): Transaction data. Must contain customer_id, period_col,
                and value_col columns.
            period_col (str): Column containing period identifiers.
            p1_value (str | float | ibis.Scalar): Value in period_col identifying period 1.
            p2_value (str | float | ibis.Scalar): Value in period_col identifying period 2.
            value_col (str | None): Column to aggregate for determining customer activity.
                Defaults to ColumnHelper().unit_spend.
            agg_func (str): Aggregation function to use when grouping by customer_id.
                Defaults to "sum".
            group_col (str | list[str] | None): Column(s) to group by when calculating segments.
                When specified, segments are calculated within each group independently.
                Defaults to None.

        Raises:
            ValueError: If required columns are missing from the DataFrame, or if agg_func is not
                a supported aggregation function.
        """
        self._df: pd.DataFrame | None = None

        cols = ColumnHelper()
        value_col = cols.unit_spend if value_col is None else value_col

        self._group_col: list[str] | None = [group_col] if isinstance(group_col, str) else group_col

        if isinstance(df, pd.DataFrame):
            df = ibis.memtable(df)

        # Validate required columns
        required_cols = [cols.customer_id, value_col, period_col]
        if self._group_col is not None:
            required_cols.extend(self._group_col)

        missing_cols = set(required_cols) - set(df.columns)
        if len(missing_cols) > 0:
            msg = f"The following columns are required but missing: {missing_cols}"
            raise ValueError(msg)

        p1_expr = p1_value if isinstance(p1_value, ibis.Expr) else ibis.literal(p1_value)
        p2_expr = p2_value if isinstance(p2_value, ibis.Expr) else ibis.literal(p2_value)
        if p1_expr.equals(p2_expr):
            msg = f"p1_value and p2_value must be different, got '{p1_value}' for both"
            raise ValueError(msg)

        if agg_func not in _VALID_AGG_FUNCS:
            msg = f"agg_func must be one of {sorted(_VALID_AGG_FUNCS)}, got '{agg_func}'"
            raise ValueError(msg)

        # Filter to only P1 and P2 rows
        df = df.filter((df[period_col] == p1_value) | (df[period_col] == p2_value))

        # Determine which periods each customer has positive spend in
        group_cols = [cols.customer_id]
        if self._group_col is not None:
            group_cols.extend(self._group_col)

        p1_col = f"{value_col}_p1"
        p2_col = f"{value_col}_p2"

        agg_method = getattr(df[value_col], agg_func)
        p1_agg = agg_method(where=df[period_col] == p1_value)
        p2_agg = agg_method(where=df[period_col] == p2_value)
        customer = df.group_by(*group_cols).aggregate(
            **{
                p1_col: p1_agg.fill_null(0),
                p2_col: p2_agg.fill_null(0),
            },
        )

        # Exclude customers with no positive value in either period
        customer = customer.filter((customer[p1_col] > 0) | (customer[p2_col] > 0))

        # Classify: both periods -> Repeating, P1 only -> Lapsed, P2 only -> New
        # Use of ifelse ensures compatibility with some ibis backends that do not support boolean expressions in cases
        # statements
        in_p1 = (customer[p1_col] > 0).ifelse(1, 0)
        in_p2 = (customer[p2_col] > 0).ifelse(1, 0)
        segment_expr = ibis.cases(
            ((in_p1 == 1) & (in_p2 == 1), SEGMENT_REPEATING),
            (in_p1 == 1, SEGMENT_LAPSED),
            (in_p2 == 1, SEGMENT_NEW),
        )

        self.table: ibis.Table = customer.mutate(segment_name=segment_expr).select(
            *group_cols,
            "segment_name",
            p1_col,
            p2_col,
        )

    @property
    def df(self) -> pd.DataFrame:
        """Returns the DataFrame with segment names, indexed by customer_id (and group_col if specified)."""
        if self._df is None:
            cols = ColumnHelper()
            index_cols = [cols.customer_id]
            if self._group_col is not None:
                index_cols.extend(self._group_col)
            self._df = self.table.execute().set_index(index_cols)
        return self._df

`df: pd.DataFrame` `property`

Returns the DataFrame with segment names, indexed by customer_id (and group_col if specified).

`init(df, period_col, p1_value, p2_value, value_col=None, agg_func='sum', group_col=None)`

Segments customers into New, Repeating, and Lapsed based on positive aggregated value across two periods.

A customer is considered to have "bought" in a period only if their aggregated value_col is strictly positive (> 0). Customers are then classified as: - New: positive value in P2 only - Repeating: positive value in both P1 and P2 - Lapsed: positive value in P1 only

Parameters:

Name	Type	Description	Default
`df`	`DataFrame \| Table`	Transaction data. Must contain customer_id, period_col, and value_col columns.	required
`period_col`	`str`	Column containing period identifiers.	required
`p1_value`	`str \| float \| Scalar`	Value in period_col identifying period 1.	required
`p2_value`	`str \| float \| Scalar`	Value in period_col identifying period 2.	required
`value_col`	`str \| None`	Column to aggregate for determining customer activity. Defaults to ColumnHelper().unit_spend.	`None`
`agg_func`	`str`	Aggregation function to use when grouping by customer_id. Defaults to "sum".	`'sum'`
`group_col`	`str \| list[str] \| None`	Column(s) to group by when calculating segments. When specified, segments are calculated within each group independently. Defaults to None.	`None`

Raises:

Type	Description
`ValueError`	If required columns are missing from the DataFrame, or if agg_func is not a supported aggregation function.