Line Plot
This module provides flexible functionality for creating line plots from pandas DataFrames.
It focuses on visualizing sequences that are ordered but not necessarily categorical, such as "days since an event" (e.g., -2, -1, 0, 1, 2) or "months since a competitor store opened." The module can plot datetime values on the x-axis, but it is not intended for them: the plots.time_line module has additional features that make working with datetimes easier, such as resampling the data to alternate time frames.
Core Features
- Plotting Sequences or Indexes: Plot one or more value columns (value_col) with support for sequences like -2, -1, 0, 1, 2 (e.g., months since an event), using either the index or a specified x-axis column (x_col).
- Custom X-Axis or Index: Use any column as the x-axis (x_col) or plot based on the index if no x-axis column is specified.
- Multiple Lines: Create separate lines for each unique value in group_col (e.g., categories or product types).
- Comprehensive Customization: Easily customize plot titles, axis labels, and legends, with the option to move the legend outside the plot.
- Pre-Aggregated Data: The data must be pre-aggregated before plotting, as no aggregation occurs within the module.
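Because no aggregation happens inside the module, transaction-level data has to be summarised before it is plotted. The following is a minimal sketch of that step using plain pandas; the transactions table and its column names (days_since_event, category, revenue) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical transaction-level data; column names are illustrative only.
transactions = pd.DataFrame({
    "days_since_event": [-1, -1, 0, 0, 0, 1],
    "category": ["Home", "Home", "Home", "Clothing", "Clothing", "Clothing"],
    "revenue": [10.0, 5.0, 20.0, 7.0, 3.0, 12.0],
})

# Aggregate to one row per (x value, group) before plotting.
aggregated = (
    transactions
    .groupby(["days_since_event", "category"], as_index=False)["revenue"]
    .sum()
)

# The aggregated frame can then be handed to the plotting function, e.g.:
# from pyretailscience.plots import line
# line.plot(df=aggregated, x_col="days_since_event", value_col="revenue", group_col="category")
```

The plotting call is left as a comment so the sketch runs with pandas alone.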
Use Cases
- Daily Trends: Plot trends such as daily revenue or user activity, for example, tracking revenue since the start of the year.
- Event Impact: Visualize how metrics (e.g., revenue, sales, or traffic) change before and after an important event, such as a competitor store opening or a product launch.
- Category Comparison: Compare metrics across multiple categories over time, for example, tracking total revenue for the top categories before and after an event like the introduction of a new competitor.
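For the category comparison use case, the data usually needs to be narrowed to the top categories before plotting. One possible way to do that with pandas is sketched below; the data and column names are made up for illustration, and the choice of "top two by total revenue" is an assumption.

```python
import pandas as pd

# Hypothetical pre-aggregated data; names are illustrative only.
df = pd.DataFrame({
    "months_since_event": [-1, 0, 1] * 3,
    "category": ["Electronics"] * 3 + ["Clothing"] * 3 + ["Home"] * 3,
    "revenue": [100, 120, 140, 80, 85, 90, 60, 65, 70],
})

# Rank categories by total revenue and keep the top two.
top_categories = (
    df.groupby("category")["revenue"].sum().nlargest(2).index.tolist()
)

# Keep only the rows belonging to those categories.
top_df = df[df["category"].isin(top_categories)]
```

The filtered frame keeps one row per month and category, so it can be plotted with group_col set to the category column.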
Limitations and Handling of Temporal Data
- Limited Handling of Temporal Data: This module can plot simple time-based sequences, such as "days since an event," but it cannot manipulate or directly handle datetime or date-like columns. It is not optimized for actual datetime values. If a datetime column is passed or more complex temporal plotting is needed, consider using the plots.time_line module, which is specifically designed for working with temporal data and performing time-based manipulation.
- Pre-Aggregated Data Required: The module does not perform any data aggregation, so all data must be pre-aggregated before being passed in for plotting.
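When the source data carries real dates but the plot should show a relative sequence, the dates can be converted to integer offsets first. This is a sketch of that conversion with pandas only; the column names and the event date are hypothetical.

```python
import pandas as pd

# Hypothetical daily data with a real datetime column.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2024-03-01", "2024-03-02", "2024-03-03", "2024-03-04"]),
    "revenue": [100, 110, 90, 95],
})
event_date = pd.Timestamp("2024-03-03")  # illustrative event date

# Replace the datetime with an integer offset relative to the event.
daily["days_since_event"] = (daily["date"] - event_date).dt.days

# Keep only the sequence column and the metric for plotting.
sequence_df = daily[["days_since_event", "revenue"]]
```

The resulting days_since_event column is a plain integer sequence, which is the kind of x-axis this module is described as targeting.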
plot(df, value_col=None, x_label=None, y_label=None, title=None, x_col=None, group_col=None, ax=None, source_text=None, legend_title=None, move_legend_outside=False, fill_na_value=None, highlight=None, **kwargs)
Plots the value_col over the specified x_col or index, creating a separate line for each unique value in group_col.
This function supports both pandas DataFrames and Series as input. When a Series is provided,
the Series values are plotted against its index, and value_col must be None.
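The input rule described above (value_col must be None for a Series and is required for a DataFrame) can be sketched as a small check. The helper name check_value_col and its error messages are hypothetical; this mirrors the documented rule and is not the library's own code.

```python
import pandas as pd


def check_value_col(df, value_col=None):
    """Hypothetical helper mirroring the documented input rule."""
    if isinstance(df, pd.Series):
        # A Series already is the values, so no column may be named.
        if value_col is not None:
            raise ValueError("value_col must be None when df is a Series")
    elif value_col is None:
        # A DataFrame needs to know which column(s) to plot.
        raise ValueError("value_col is required when df is a DataFrame")


series = pd.Series([3, 5, 4], index=[-1, 0, 1], name="revenue")
check_value_col(series)  # passes: values would be plotted against the index

frame = series.reset_index(name="revenue")
check_value_col(frame, value_col="revenue")  # passes: column named explicitly
```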
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame \| Series | The dataframe or series to plot. When a Series is provided, it represents the values to plot against its index. | required |
| value_col | str \| list[str] | The column(s) to plot. Must be None when df is a Series. Required when df is a DataFrame. | None |
| x_label | str | The x-axis label. | None |
| y_label | str | The y-axis label. | None |
| title | str | The title of the plot. | None |
| x_col | str | The column to be used as the x-axis. If None, the index is used. | None |
| group_col | str | The column used to define different lines. | None |
| legend_title | str | The title of the legend. | None |
| ax | Axes | Matplotlib axes object to plot on. | None |
| source_text | str | The source text to add to the plot. | None |
| move_legend_outside | bool | Move the legend outside the plot. | False |
| fill_na_value | float | Value to fill NaNs with after pivoting. | None |
| highlight | str \| list[str] | Line(s) to highlight; the other lines are muted. In the examples below, these are values of group_col, or column names when value_col is a list. | None |
| **kwargs | dict[str, any] | Additional keyword arguments for Pandas' plotting function. | {} |
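To make the group_col and fill_na_value descriptions concrete, here is a sketch of the kind of reshaping they imply: one column per group, with gaps filled after pivoting. It uses plain pandas and reflects an assumption about the behaviour, not the library's actual implementation; the data is illustrative.

```python
import pandas as pd

# Illustrative data: Clothing has no row for month 2.
df = pd.DataFrame({
    "month": [1, 2, 3, 1, 3],
    "category": ["Home", "Home", "Home", "Clothing", "Clothing"],
    "revenue": [60, 65, 70, 80, 90],
})

# One column per unique group value, indexed by the x-axis column.
wide = df.pivot(index="month", columns="category", values="revenue")

# The missing (month 2, Clothing) combination becomes NaN after the pivot,
# so filling it is comparable to passing fill_na_value=0.
filled = wide.fillna(0)
```

Leaving the NaN in place would typically show as a break in the Clothing line, while filling it with 0 draws the line down to zero for that month, so the right choice depends on what a missing row means in the data.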
Returns:
| Name | Type | Description |
|---|---|---|
| SubplotBase | SubplotBase | The matplotlib axes object. |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
| ValueError | If df is a Series and |
| ValueError | If df is a DataFrame and |
| ValueError | If df is a Series and |
| ValueError | If df is a Series and |
| ValueError | If |
| ValueError | If |
Examples:
Highlighting specific product categories:
>>> import pandas as pd
>>> from pyretailscience.plots import line
>>> df = pd.DataFrame({
... "month": [1, 2, 3, 1, 2, 3, 1, 2, 3],
... "category": ["Electronics", "Electronics", "Electronics",
... "Clothing", "Clothing", "Clothing",
... "Home", "Home", "Home"],
... "revenue": [100, 120, 140, 80, 85, 90, 60, 65, 70]
... })
>>> line.plot(
... df=df,
... x_col="month",
... value_col="revenue",
... group_col="category",
... highlight=["Electronics", "Clothing"], # Home will be muted
... title="Revenue by Category (Electronics & Clothing Highlighted)"
... )
Highlighting specific value columns:
>>> # Use a separate name so the category DataFrame `df` stays available below
>>> daily_df = pd.DataFrame({
...     "day": range(1, 6),
...     "revenue": [100, 110, 120, 130, 140],
...     "units_sold": [50, 55, 60, 65, 70],
...     "avg_order_value": [2.0, 2.0, 2.0, 2.0, 2.0],
...     "profit_margin": [0.2, 0.22, 0.24, 0.26, 0.28]
... })
>>> line.plot(
...     df=daily_df,
...     x_col="day",
...     value_col=["revenue", "units_sold", "avg_order_value", "profit_margin"],
...     highlight=["revenue", "profit_margin"],  # Other metrics muted
...     title="Daily Metrics (Revenue & Profit Margin Highlighted)"
... )
Single highlighted line:
>>> line.plot(
... df=df,
... x_col="month",
... value_col="revenue",
... group_col="category",
... highlight="Electronics", # str is acceptable for single highlight
... title="Revenue by Category (Electronics Highlighted)"
... )
Source code in pyretailscience/plots/line.py