Skip to content

Cross Shop Analysis

Cross-Shopping Analysis Module for Category and Brand Interaction Insights.

Business Context

Cross-shopping analysis reveals how customers navigate between different product categories, brands, or store locations. This intelligence drives critical retail decisions from store layout and category adjacencies to promotional bundling and targeted marketing campaigns.

Real-World Problem

Retailers often make incorrect assumptions about customer behavior. They might place baby products far from beer, not realizing these categories have high cross-shopping rates (the famous "diapers and beer" phenomenon). Cross-shop analysis replaces assumptions with data-driven insights about actual customer purchase patterns.

Business Applications

  1. Store Layout Optimization
  2. Place frequently cross-shopped categories near each other
  3. Create logical customer journey paths through the store
  4. Reduce friction in the shopping experience

  5. Promotional Strategy

  6. Bundle products from highly cross-shopped categories
  7. Time promotions to capture cross-category purchases
  8. Design "buy from A, get discount on B" offers

  9. Category Management

  10. Understand category interdependencies
  11. Identify opportunity categories for existing shoppers
  12. Spot categories at risk when others decline

  13. Multi-Channel Strategy

  14. Analyze cross-shopping between online and physical stores
  15. Optimize channel-specific assortments
  16. Design omnichannel customer journeys

  17. Competitive Analysis

  18. Understand customer loyalty across competing brands
  19. Identify vulnerable competitor segments
  20. Design conquest strategies for shared customers

Business Value

  • Increased Basket Size: Strategic placement increases impulse purchases
  • Customer Retention: Better store experience reduces defection
  • Marketing Efficiency: Target promotions to actual behavior patterns
  • Strategic Insights: Understand true category relationships
  • Competitive Advantage: Leverage unique customer behavior insights

Visualization Output

The module generates Venn diagrams showing: - Exclusive shoppers for each category/brand - Overlap segments with cross-shopping behavior - Relative size of each segment by customer count or spend - Percentage breakdowns for strategic planning

CrossShop

Analyzes customer cross-shopping behavior between categories, brands, or locations.

The CrossShop class reveals hidden relationships in customer purchasing patterns, enabling retailers to optimize everything from store layouts to marketing campaigns. By understanding which products customers buy together across shopping trips, retailers can make smarter decisions about product placement, promotions, and assortment.

Business Problem Solved

Many retail decisions assume customer behavior that may not reflect reality. For example: - Are organic shoppers also buying conventional products? - Do online grocery shoppers still visit physical stores? - Which private label categories attract national brand buyers?

This analysis provides definitive answers with actionable percentages.

Source code in pyretailscience/analysis/cross_shop.py
class CrossShop:
    """Analyzes customer cross-shopping behavior between categories, brands, or locations.

    The CrossShop class reveals hidden relationships in customer purchasing patterns,
    enabling retailers to optimize everything from store layouts to marketing campaigns.
    By understanding which products customers buy together across shopping trips, retailers
    can make smarter decisions about product placement, promotions, and assortment.

    ## Business Problem Solved

    Many retail decisions assume customer behavior that may not reflect reality. For example:
    - Are organic shoppers also buying conventional products?
    - Do online grocery shoppers still visit physical stores?
    - Which private label categories attract national brand buyers?

    This analysis provides definitive answers with actionable percentages.
    """

    @staticmethod
    def _generate_default_labels(count: int) -> list[str]:
        """Generate default alphabetical labels for groups.

        Args:
            count (int): Number of labels to generate.

        Returns:
            list[str]: List of alphabetical labels (A, B, C, etc.).
        """
        return [chr(65 + i) for i in range(count)]

    def __init__(
        self,
        df: pd.DataFrame | ibis.Table,
        group_1_col: str,
        group_1_val: str,
        group_2_val: str,
        group_2_col: str | None = None,
        group_3_col: str | None = None,
        group_3_val: str | None = None,
        labels: list[str] | None = None,
        group_col: str | None = None,
        value_col: str | None = None,
        agg_func: str = "sum",
    ) -> None:
        """Initialize cross-shopping analysis between retail categories, brands, or locations.

        Args:
            df (pd.DataFrame | ibis.Table): Transaction data with customer purchases.
            group_1_col (str): Column identifying first segment (e.g., "category", "brand", "channel").
            group_1_val (str): Value to analyze for first segment (e.g., "organic", "Brand_A", "online").
            group_2_val (str): Value to analyze for second segment.
            group_2_col (str, optional): Column identifying second segment. Defaults to group_1_col.
            group_3_col (str, optional): Column for three-way analysis. Defaults to group_1_col when group_3_val provided.
            group_3_val (str, optional): Value for third segment. Defaults to None.
            labels (list[str], optional): Custom labels for diagram (e.g., ["Organic", "Local"]).
                Defaults to alphabetical labels [A, B, C].
            group_col (str, optional): Grouping column (e.g., customer_id, store_id, segment_name). Defaults to customer_id from options.
            value_col (str, optional): Metric to analyze (sales, units, visits).
                Defaults to spend column from options.
            agg_func (str, optional): How to combine customer values ("sum", "mean", "count").
                Defaults to "sum" for total opportunity sizing.

        Returns:
            None

        Raises:
            ValueError: If required columns missing or label count doesn't match groups.

        Business Examples:
            >>> # Analyze organic vs conventional shoppers
            >>> cross_shop = CrossShop(
            ...     df=transactions,
            ...     group_1_col="product_type",
            ...     group_1_val="organic",
            ...     group_2_val="conventional",
            ...     labels=["Organic", "Conventional"]
            ... )
            ...
            >>> # Three-way analysis
            >>> cross_shop = CrossShop(
            ...     df=transactions,
            ...     group_1_col="channel",
            ...     group_1_val="online",
            ...     group_2_val="store",
            ...     group_3_val="mobile",
            ...     labels=["Online", "Store", "Mobile"]
            ... )
            ...
            >>> # Custom customer column
            >>> cross_shop = CrossShop(
            ...     df=transactions,
            ...     group_1_col="brand",
            ...     group_1_val="Nike",
            ...     group_2_val="Adidas",
            ...     group_col="user_id"
            ... )
        """
        # Apply smart defaults for simplified interface
        group_col = group_col or get_option("column.customer_id")
        value_col = value_col or get_option("column.unit_spend")

        # Default group_2_col and group_3_col to group_1_col when columns are not provided
        if group_2_col is None:
            group_2_col = group_1_col
        if group_3_val is not None and group_3_col is None:
            group_3_col = group_1_col

        required_cols = [group_col, value_col]
        missing_cols = set(required_cols) - set(df.columns)
        if len(missing_cols) > 0:
            msg = f"The following columns are required but missing: {missing_cols}"
            raise ValueError(msg)

        self.group_count = 2 if group_3_col is None else 3

        if (labels is not None) and (len(labels) != self.group_count):
            raise ValueError("The number of labels must be equal to the number of group indexes given")

        self.labels = labels if labels is not None else self._generate_default_labels(self.group_count)

        self.cross_shop_df = self._calc_cross_shop(
            df=df,
            group_1_col=group_1_col,
            group_1_val=group_1_val,
            group_2_col=group_2_col,
            group_2_val=group_2_val,
            group_3_col=group_3_col,
            group_3_val=group_3_val,
            group_col=group_col,
            value_col=value_col,
            agg_func=agg_func,
            labels=labels,
        )
        self.cross_shop_table_df = self._calc_cross_shop_table(
            df=self.cross_shop_df,
            value_col=value_col,
        )

    @staticmethod
    def _calc_cross_shop(
        df: pd.DataFrame | ibis.Table,
        group_1_col: str,
        group_1_val: str,
        group_2_col: str,
        group_2_val: str,
        group_3_col: str | None = None,
        group_3_val: str | None = None,
        group_col: str | None = None,
        value_col: str | None = None,
        agg_func: str = "sum",
        labels: list[str] | None = None,
    ) -> pd.DataFrame:
        """Calculate the cross-shop dataframe that will be used to plot the diagram.

        Args:
            df (pd.DataFrame | ibis.Table):  The input DataFrame or ibis Table containing transactional data.
            group_1_col (str): Column name for the first group.
            group_1_val (str): Value to filter for the first group.
            group_2_col (str): Column name for the second group.
            group_2_val (str): Value to filter for the second group.
            group_3_col (str, optional): Column name for the third group. Defaults to None.
            group_3_val (str, optional): Value to filter for the third group. Defaults to None.
            group_col (str, optional): Grouping column (e.g., customer_id, store_id, segment_name). Defaults to customer_id from options.
            value_col (str, optional): The column to aggregate. Defaults to option column.unit_spend.
            agg_func (str, optional): The aggregation function. Defaults to "sum".
            labels (list[str], optional): The labels for the groups. Defaults to None.

        Returns:
            pd.DataFrame: The cross-shop dataframe.

        Raises:
            ValueError: If group_3_col or group_3_val is populated, then the other must be as well.
        """
        if isinstance(df, pd.DataFrame):
            df: ibis.Table = ibis.memtable(df)
        if (group_3_col is None) != (group_3_val is None):
            raise ValueError("If group_3_col or group_3_val is populated, then the other must be as well")

        # Apply defaults for group_col and value_col
        group_col = group_col or get_option("column.customer_id")
        value_col = value_col or get_option("column.unit_spend")

        # Using a temporary value column to avoid duplicate column errors during selection. This happens when `value_col` has the same name as `group_col`, causing conflicts in `.select()`.
        temp_value_col = "temp_value_col"
        df = df.mutate(**{temp_value_col: df[value_col]})

        group_1 = (df[group_1_col] == group_1_val).cast("int32").name("group_1")
        group_2 = (df[group_2_col] == group_2_val).cast("int32").name("group_2")
        group_3 = (df[group_3_col] == group_3_val).cast("int32").name("group_3") if group_3_col else None

        group_cols = ["group_1", "group_2"]
        select_cols = [df[group_col], group_1, group_2]
        if group_3 is not None:
            group_cols.append("group_3")
            select_cols.append(group_3)

        cs_df = df.select([*select_cols, df[temp_value_col]]).order_by(group_col)
        cs_df = (
            cs_df.group_by(group_col)
            .aggregate(
                **{col: cs_df[col].max().name(col) for col in group_cols},
                **{temp_value_col: getattr(cs_df[temp_value_col], agg_func)().name(temp_value_col)},
            )
            .order_by(group_col)
        ).execute()

        cs_df["groups"] = cs_df[group_cols].apply(lambda x: tuple(x), axis=1)

        # Use default alphabetical labels if none provided
        if labels is None:
            labels = CrossShop._generate_default_labels(len(group_cols))

        group_label_series = cs_df[group_cols].apply(
            lambda x: [labels[i] for i, grp_val in enumerate(x) if grp_val == 1],
            axis=1,
        )
        cs_df["group_labels"] = group_label_series.map(lambda x: "No Groups" if len(x) == 0 else ", ".join(x))

        column_order = [group_col, *group_cols, "groups", "group_labels", temp_value_col]
        cs_df = cs_df[column_order]
        cs_df.set_index(group_col, inplace=True)
        return cs_df.rename(columns={temp_value_col: value_col})

    @staticmethod
    def _calc_cross_shop_table(
        df: pd.DataFrame,
        value_col: str = get_option("column.unit_spend"),
    ) -> pd.DataFrame:
        """Calculate the aggregated cross-shop table that will be used to plot the diagram.

        Args:
            df (pd.DataFrame): The cross-shop dataframe.
            value_col (str, optional): The column to aggregate. Defaults to option column.unit_spend.

        Returns:
            pd.DataFrame: The cross-shop table.
        """
        df = df.groupby(["groups", "group_labels"], dropna=False)[value_col].sum().reset_index().copy()
        df["percent"] = df[value_col] / df[value_col].sum()
        return df

    def plot(
        self,
        title: str | None = None,
        source_text: str | None = None,
        vary_size: bool = False,
        figsize: tuple[int, int] | None = None,
        ax: Axes | None = None,
        subset_label_formatter: Callable | None = None,
        **kwargs: dict[str, any],
    ) -> SubplotBase:
        """Generate Venn diagram showing customer segment overlaps.

        Args:
            title (str, optional): Chart title (e.g., "Cross-Shopping: Organic vs Conventional").
            source_text (str, optional): Data source attribution. Defaults to None.
            vary_size (bool, optional): Scale circles by segment value for visual impact.
                True = larger segments appear bigger. Defaults to False.
            figsize (tuple[int, int], optional): Plot dimensions. Defaults to None.
            ax (Axes, optional): Existing axes for subplot integration. Defaults to None.
            subset_label_formatter (callable, optional): Custom formatting for percentages.
                Default shows one decimal place (e.g., "34.5%").
            **kwargs (dict[str, any]): Additional diagram customization options.

        Returns:
            SubplotBase: Matplotlib axes containing the cross-shop visualization.
        """
        return venn.plot(
            df=self.cross_shop_table_df,
            labels=self.labels,
            title=title,
            source_text=source_text,
            vary_size=vary_size,
            figsize=figsize,
            ax=ax,
            subset_label_formatter=subset_label_formatter if subset_label_formatter else lambda x: f"{x:.1%}",
            **kwargs,
        )

__init__(df, group_1_col, group_1_val, group_2_val, group_2_col=None, group_3_col=None, group_3_val=None, labels=None, group_col=None, value_col=None, agg_func='sum')

Initialize cross-shopping analysis between retail categories, brands, or locations.

Parameters:

Name Type Description Default
df DataFrame | Table

Transaction data with customer purchases.

required
group_1_col str

Column identifying first segment (e.g., "category", "brand", "channel").

required
group_1_val str

Value to analyze for first segment (e.g., "organic", "Brand_A", "online").

required
group_2_val str

Value to analyze for second segment.

required
group_2_col str

Column identifying second segment. Defaults to group_1_col.

None
group_3_col str

Column for three-way analysis. Defaults to group_1_col when group_3_val provided.

None
group_3_val str

Value for third segment. Defaults to None.

None
labels list[str]

Custom labels for diagram (e.g., ["Organic", "Local"]). Defaults to alphabetical labels [A, B, C].

None
group_col str

Grouping column (e.g., customer_id, store_id, segment_name). Defaults to customer_id from options.

None
value_col str

Metric to analyze (sales, units, visits). Defaults to spend column from options.

None
agg_func str

How to combine customer values ("sum", "mean", "count"). Defaults to "sum" for total opportunity sizing.

'sum'

Returns:

Type Description
None

None

Raises:

Type Description
ValueError

If required columns missing or label count doesn't match groups.

Business Examples

Analyze organic vs conventional shoppers

cross_shop = CrossShop( ... df=transactions, ... group_1_col="product_type", ... group_1_val="organic", ... group_2_val="conventional", ... labels=["Organic", "Conventional"] ... ) ...

Three-way analysis

cross_shop = CrossShop( ... df=transactions, ... group_1_col="channel", ... group_1_val="online", ... group_2_val="store", ... group_3_val="mobile", ... labels=["Online", "Store", "Mobile"] ... ) ...

Custom customer column

cross_shop = CrossShop( ... df=transactions, ... group_1_col="brand", ... group_1_val="Nike", ... group_2_val="Adidas", ... group_col="user_id" ... )

Source code in pyretailscience/analysis/cross_shop.py
def __init__(
    self,
    df: pd.DataFrame | ibis.Table,
    group_1_col: str,
    group_1_val: str,
    group_2_val: str,
    group_2_col: str | None = None,
    group_3_col: str | None = None,
    group_3_val: str | None = None,
    labels: list[str] | None = None,
    group_col: str | None = None,
    value_col: str | None = None,
    agg_func: str = "sum",
) -> None:
    """Initialize cross-shopping analysis between retail categories, brands, or locations.

    Args:
        df (pd.DataFrame | ibis.Table): Transaction data with customer purchases.
        group_1_col (str): Column identifying first segment (e.g., "category", "brand", "channel").
        group_1_val (str): Value to analyze for first segment (e.g., "organic", "Brand_A", "online").
        group_2_val (str): Value to analyze for second segment.
        group_2_col (str, optional): Column identifying second segment. Defaults to group_1_col.
        group_3_col (str, optional): Column for three-way analysis. Defaults to group_1_col when group_3_val provided.
        group_3_val (str, optional): Value for third segment. Defaults to None.
        labels (list[str], optional): Custom labels for diagram (e.g., ["Organic", "Local"]).
            Defaults to alphabetical labels [A, B, C].
        group_col (str, optional): Grouping column (e.g., customer_id, store_id, segment_name). Defaults to customer_id from options.
        value_col (str, optional): Metric to analyze (sales, units, visits).
            Defaults to spend column from options.
        agg_func (str, optional): How to combine customer values ("sum", "mean", "count").
            Defaults to "sum" for total opportunity sizing.

    Returns:
        None

    Raises:
        ValueError: If required columns missing or label count doesn't match groups.

    Business Examples:
        >>> # Analyze organic vs conventional shoppers
        >>> cross_shop = CrossShop(
        ...     df=transactions,
        ...     group_1_col="product_type",
        ...     group_1_val="organic",
        ...     group_2_val="conventional",
        ...     labels=["Organic", "Conventional"]
        ... )
        ...
        >>> # Three-way analysis
        >>> cross_shop = CrossShop(
        ...     df=transactions,
        ...     group_1_col="channel",
        ...     group_1_val="online",
        ...     group_2_val="store",
        ...     group_3_val="mobile",
        ...     labels=["Online", "Store", "Mobile"]
        ... )
        ...
        >>> # Custom customer column
        >>> cross_shop = CrossShop(
        ...     df=transactions,
        ...     group_1_col="brand",
        ...     group_1_val="Nike",
        ...     group_2_val="Adidas",
        ...     group_col="user_id"
        ... )
    """
    # Apply smart defaults for simplified interface
    group_col = group_col or get_option("column.customer_id")
    value_col = value_col or get_option("column.unit_spend")

    # Default group_2_col and group_3_col to group_1_col when columns are not provided
    if group_2_col is None:
        group_2_col = group_1_col
    if group_3_val is not None and group_3_col is None:
        group_3_col = group_1_col

    required_cols = [group_col, value_col]
    missing_cols = set(required_cols) - set(df.columns)
    if len(missing_cols) > 0:
        msg = f"The following columns are required but missing: {missing_cols}"
        raise ValueError(msg)

    self.group_count = 2 if group_3_col is None else 3

    if (labels is not None) and (len(labels) != self.group_count):
        raise ValueError("The number of labels must be equal to the number of group indexes given")

    self.labels = labels if labels is not None else self._generate_default_labels(self.group_count)

    self.cross_shop_df = self._calc_cross_shop(
        df=df,
        group_1_col=group_1_col,
        group_1_val=group_1_val,
        group_2_col=group_2_col,
        group_2_val=group_2_val,
        group_3_col=group_3_col,
        group_3_val=group_3_val,
        group_col=group_col,
        value_col=value_col,
        agg_func=agg_func,
        labels=labels,
    )
    self.cross_shop_table_df = self._calc_cross_shop_table(
        df=self.cross_shop_df,
        value_col=value_col,
    )

plot(title=None, source_text=None, vary_size=False, figsize=None, ax=None, subset_label_formatter=None, **kwargs)

Generate Venn diagram showing customer segment overlaps.

Parameters:

Name Type Description Default
title str

Chart title (e.g., "Cross-Shopping: Organic vs Conventional").

None
source_text str

Data source attribution. Defaults to None.

None
vary_size bool

Scale circles by segment value for visual impact. True = larger segments appear bigger. Defaults to False.

False
figsize tuple[int, int]

Plot dimensions. Defaults to None.

None
ax Axes

Existing axes for subplot integration. Defaults to None.

None
subset_label_formatter callable

Custom formatting for percentages. Default shows one decimal place (e.g., "34.5%").

None
**kwargs dict[str, any]

Additional diagram customization options.

{}

Returns:

Name Type Description
SubplotBase SubplotBase

Matplotlib axes containing the cross-shop visualization.

Source code in pyretailscience/analysis/cross_shop.py
def plot(
    self,
    title: str | None = None,
    source_text: str | None = None,
    vary_size: bool = False,
    figsize: tuple[int, int] | None = None,
    ax: Axes | None = None,
    subset_label_formatter: Callable | None = None,
    **kwargs: dict[str, any],
) -> SubplotBase:
    """Generate Venn diagram showing customer segment overlaps.

    Args:
        title (str, optional): Chart title (e.g., "Cross-Shopping: Organic vs Conventional").
        source_text (str, optional): Data source attribution. Defaults to None.
        vary_size (bool, optional): Scale circles by segment value for visual impact.
            True = larger segments appear bigger. Defaults to False.
        figsize (tuple[int, int], optional): Plot dimensions. Defaults to None.
        ax (Axes, optional): Existing axes for subplot integration. Defaults to None.
        subset_label_formatter (callable, optional): Custom formatting for percentages.
            Default shows one decimal place (e.g., "34.5%").
        **kwargs (dict[str, any]): Additional diagram customization options.

    Returns:
        SubplotBase: Matplotlib axes containing the cross-shop visualization.
    """
    return venn.plot(
        df=self.cross_shop_table_df,
        labels=self.labels,
        title=title,
        source_text=source_text,
        vary_size=vary_size,
        figsize=figsize,
        ax=ax,
        subset_label_formatter=subset_label_formatter if subset_label_formatter else lambda x: f"{x:.1%}",
        **kwargs,
    )