Broken Timeline Plot

This module provides functionality for creating broken timeline plots from pandas DataFrames.

A broken timeline plot visualizes data availability across categories over time, showing periods where data is available as horizontal bars, with gaps indicating missing data periods.

Features

Multiple Categories: Support for displaying multiple categories with different colors
Customizable Periods: Aggregate data by different time periods (daily, weekly)
Threshold Filtering: Filter out values below a specified threshold
Date Formatting: Uses matplotlib's ConciseDateFormatter for clean date axis labels

Use Cases

Data Quality Assessment: Visualize data availability gaps across categories/segments over time
Product Availability Analysis: Identify periods with stock outs by store/category
Seasonality Analysis: Assess to look for period of low sales that may indicate seasonality or other trends

`plot(df, category_col, value_col, title=None, x_label=None, y_label=None, ax=None, source_text=None, period='D', agg_func='sum', threshold_value=None, bar_height=0.8, figsize=None, **kwargs)`

Creates a broken timeline plot showing data availability across categories over time.

Shows periods where data is available as horizontal bars, with gaps indicating missing data periods.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	The input DataFrame containing the data to be plotted.	required
`category_col`	`str`	The column containing categories to display on y-axis.	required
`value_col`	`str`	The column containing values to determine data availability.	required
`title`	`str`	The title of the plot. Defaults to None.	`None`
`x_label`	`str`	The label for the x-axis. Defaults to None.	`None`
`y_label`	`str`	The label for the y-axis. Defaults to None.	`None`
`ax`	`Axes`	The Matplotlib Axes object to plot on. Defaults to None.	`None`
`source_text`	`str`	Text to be displayed as a source at the bottom of the plot. Defaults to None.	`None`
`period`	`str`	Period for aggregating data using pandas to_period ("D", "W"). Defaults to "D".	`'D'`
`agg_func`	`str`	The aggregation function to apply to the value_col when grouping by period. Defaults to "sum".	`'sum'`
`threshold_value`	`float`	Values below this threshold are considered gaps. Defaults to None.	`None`
`bar_height`	`float`	Height of timeline bars as fraction of available space. Defaults to 0.8.	`0.8`
`figsize`	`tuple[int, int] \| None`	tuple[int, int] \| None = None,	`None`
`**kwargs`	`dict[str, Any]`	Additional keyword arguments for matplotlib broken_barh function.	`{}`

Returns:

Name	Type	Description
`SubplotBase`	`SubplotBase`	The Matplotlib Axes object with the generated plot.

Raises:

Type	Description
`ValueError`	If DataFrame is empty, required columns are missing, or invalid period specified.
`KeyError`	If specified columns don't exist in the DataFrame.

Source code in pyretailscience/plots/broken_timeline.py

def plot(
    df: pd.DataFrame,
    category_col: str,
    value_col: str,
    title: str | None = None,
    x_label: str | None = None,
    y_label: str | None = None,
    ax: Axes | None = None,
    source_text: str | None = None,
    period: str = "D",
    agg_func: str = "sum",
    threshold_value: float | None = None,
    bar_height: float = 0.8,
    figsize: tuple[int, int] | None = None,
    **kwargs: dict[str, Any],
) -> SubplotBase:
    """Creates a broken timeline plot showing data availability across categories over time.

    Shows periods where data is available as horizontal bars, with gaps indicating missing data periods.

    Args:
        df (pd.DataFrame): The input DataFrame containing the data to be plotted.
        category_col (str): The column containing categories to display on y-axis.
        value_col (str): The column containing values to determine data availability.
        title (str, optional): The title of the plot. Defaults to None.
        x_label (str, optional): The label for the x-axis. Defaults to None.
        y_label (str, optional): The label for the y-axis. Defaults to None.
        ax (Axes, optional): The Matplotlib Axes object to plot on. Defaults to None.
        source_text (str, optional): Text to be displayed as a source at the bottom of the plot. Defaults to None.
        period (str, optional): Period for aggregating data using pandas to_period ("D", "W").
            Defaults to "D".
        agg_func (str, optional): The aggregation function to apply to the value_col when grouping by period.
            Defaults to "sum".
        threshold_value (float, optional): Values below this threshold are considered gaps. Defaults to None.
        bar_height (float, optional): Height of timeline bars as fraction of available space. Defaults to 0.8.
        figsize: tuple[int, int] | None = None,
        **kwargs (dict[str, Any]): Additional keyword arguments for matplotlib broken_barh function.

    Returns:
        SubplotBase: The Matplotlib Axes object with the generated plot.

    Raises:
        ValueError: If DataFrame is empty, required columns are missing, or invalid period specified.
        KeyError: If specified columns don't exist in the DataFrame.
    """
    date_col = get_option("column.transaction_date")

    # Convert period to uppercase to handle lowercase inputs
    period = period.upper()

    # Validate required columns exist
    _validate_inputs(df, category_col, value_col, date_col, period)

    # Create a copy of the data and ensure date column is datetime
    df_copy = df.copy()
    df_copy[date_col] = pd.to_datetime(df_copy[date_col])

    # Apply threshold filter if specified
    if threshold_value is not None:
        df_copy = df_copy[df_copy[value_col] >= threshold_value]

    df_copy["period"] = df_copy[date_col].dt.to_period(period)
    df_copy = df_copy.groupby([category_col, "period"]).agg({value_col: agg_func}).reset_index()
    df_copy[date_col] = df_copy["period"].dt.start_time

    # Sort by date once for all categories
    df_copy = df_copy.sort_values(date_col)

    # Get unique categories and create y-axis mapping
    categories = sorted(df_copy[category_col].unique())
    category_to_y = {cat: i for i, cat in enumerate(categories)}

    if ax is None:
        _, ax = plt.subplots(figsize=figsize)

    # Use module-level period configuration
    gap_threshold = PERIOD_CONFIG[period]
    bar_color = COLORS["green"][500]

    # Process each category
    for category in categories:
        dates = df_copy[df_copy[category_col] == category][date_col].values

        # Convert to matplotlib date numbers and find segments
        dates_num = mdates.date2num(dates)
        gaps = np.diff(dates_num) > gap_threshold
        date_segments = np.split(dates_num, np.where(gaps)[0] + 1)

        # Calculate appropriate width based on period type
        base_width = PERIOD_CONFIG[period]
        segments = []
        for seg in date_segments:
            if len(seg) > 0:
                # Width should be number of periods * typical period duration
                width = len(seg) * base_width
                segments.append((seg[0], width))
        bar_offset = bar_height / 2
        ax.broken_barh(
            segments,
            (category_to_y[category] - bar_offset, bar_height),
            facecolors=bar_color,
            **kwargs,
        )

    # Configure y-axis
    ax.set_yticks(range(len(categories)))
    ax.set_yticklabels(categories)
    ax.invert_yaxis()

    # Configure x-axis for dates
    ax.xaxis_date()
    ax.xaxis.set_major_locator(mdates.AutoDateLocator())
    ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(ax.xaxis.get_major_locator()))

    # Apply standard graph styles
    ax = gu.standard_graph_styles(
        ax=ax,
        title=title,
        x_label=x_label,
        y_label=y_label,
    )

    # Add source text if provided
    if source_text:
        gu.add_source_text(ax=ax, source_text=source_text)

    return gu.standard_tick_styles(ax=ax)