Index Plot
This module provides functionality for creating index plots in retail analytics.
Index plots are useful for comparing the performance of different categories or segments against a baseline or average, typically set at 100. The module supports customization of the plot's appearance, sorting of data, and filtering by specific groups, offering valuable insights into retail operations.
Features
- Index Plot Creation: Visualize how categories or segments perform relative to a baseline value, typically set at 100. Useful for comparing performance across products, regions, or customer segments.
- Flexible Sorting: Sort data by either group or value to highlight specific trends in the data.
- Data Filtering: Filter data based on specified groups to focus on specific categories or exclude unwanted data.
- Highlighting Range: Highlight a specific range of index values (e.g., 80 to 120) to draw attention to how groups perform relative to that band.
- Series Support: Optionally include a series_col for plotting multiple series (e.g., time periods) within the same plot.
- Graph Customization: Adjust titles, axis labels, legend titles, and styling to match the specific context of the analysis.
Use Cases
- Retail Performance Comparison: Compare product or regional performance to the company average or baseline using an index plot.
- Customer Segment Analysis: Evaluate customer segment behavior against overall performance, helping identify high-performing segments.
- Operational Insights: Identify areas of concern or opportunity by comparing store, region, or product performance against the baseline.
- Visualizing Retail Strategy: Support decision-making by visualizing which categories or products overperform or underperform relative to a baseline.
Limitations and Handling of Data
- Data Grouping and Aggregation: Supports aggregation functions such as sum, average, etc., for calculating the index.
- Sorting: Sorting can be applied by group or value, allowing analysts to focus on specific trends. If series_col is provided, sorting by group is applied.
- Group Filtering: Users can exclude or include specific groups for focused analysis, with error handling to ensure conflicting options are not used simultaneously.
Functionality Details
- plot(): Generates the index plot, which can be customized with multiple options such as sorting, filtering, and styling.
- get_indexes(): Helper function for calculating the index of the value column for a given subset of the dataframe based on filters and aggregation.
filter_by_groups(df, group_col, exclude_groups=None, include_only_groups=None)
Filter dataframe by groups.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The dataframe to filter. | required |
group_col | str | The column name for grouping. | required |
exclude_groups | list[any] | Groups to exclude. Defaults to None. | None |
include_only_groups | list[any] | Groups to include. Defaults to None. | None |
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame: The filtered dataframe. |
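The include/exclude behaviour described above can be sketched in plain pandas. This is a minimal illustration, not the library's source: `filter_groups_sketch`, the sample data, and the choice to raise the conflict error inside this helper (the documentation only states that plot() raises it) are all assumptions.

```python
import pandas as pd


def filter_groups_sketch(df, group_col, exclude_groups=None, include_only_groups=None):
    """Hypothetical stand-in that mirrors the documented parameters."""
    # Assumption: the conflict check lives here; the docs only say plot() raises it.
    if exclude_groups is not None and include_only_groups is not None:
        raise ValueError("exclude_groups and include_only_groups cannot be used together.")
    if include_only_groups is not None:
        # Keep only rows whose group is in the allow-list.
        return df[df[group_col].isin(include_only_groups)]
    if exclude_groups is not None:
        # Drop rows whose group is in the deny-list.
        return df[~df[group_col].isin(exclude_groups)]
    return df


# Made-up index values per region, for illustration only.
index_df = pd.DataFrame(
    {"region": ["North", "South", "East", "West"], "index": [112.0, 95.0, 78.0, 130.0]}
)
kept = filter_groups_sketch(index_df, "region", exclude_groups=["East"])
only = filter_groups_sketch(index_df, "region", include_only_groups=["North", "West"])
```

Here `kept` holds every region except East, while `only` holds just North and West.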
Source code in pyretailscience/plots/index.py
filter_by_value_thresholds(df, filter_above=None, filter_below=None)
Filter dataframe by index value thresholds.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The dataframe to filter. | required |
filter_above | float | Only keep indices above this value. Defaults to None. | None |
filter_below | float | Only keep indices below this value. Defaults to None. | None |
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame: The filtered dataframe. |
Raises:
Type | Description |
---|---|
ValueError | If filtering results in an empty dataset. |
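A rough pandas sketch of threshold filtering follows. It assumes the index values sit in a column named "index" and that the comparisons are strict (the documentation says "above" and "below" without specifying inclusivity); `filter_thresholds_sketch` and its `index_value_col` parameter are hypothetical names, not part of the library.

```python
import pandas as pd


def filter_thresholds_sketch(df, filter_above=None, filter_below=None, index_value_col="index"):
    """Hypothetical helper; index_value_col is an assumed column name."""
    result = df
    if filter_above is not None:
        # Assumption: strict comparison, so a value equal to the threshold is dropped.
        result = result[result[index_value_col] > filter_above]
    if filter_below is not None:
        result = result[result[index_value_col] < filter_below]
    if result.empty:
        # Mirrors the documented ValueError for an empty result.
        raise ValueError("Filtering resulted in an empty dataset.")
    return result


# Made-up index values per region, for illustration only.
index_df = pd.DataFrame(
    {"region": ["North", "South", "East", "West"], "index": [112.0, 95.0, 78.0, 130.0]}
)
outperformers = filter_thresholds_sketch(index_df, filter_above=100)
mid_band = filter_thresholds_sketch(index_df, filter_above=90, filter_below=120)
```

Combining both thresholds keeps only the rows that fall strictly between them, which is how `mid_band` ends up with North and South.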
Source code in pyretailscience/plots/index.py
filter_top_bottom_n(df, top_n=None, bottom_n=None)
Filter dataframe to include only top N and/or bottom N rows by index value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The dataframe to filter. | required |
top_n | int | Number of top items to include. Defaults to None. | None |
bottom_n | int | Number of bottom items to include. Defaults to None. | None |
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame: The filtered dataframe. |
Raises:
Type | Description |
---|---|
ValueError | If top_n or bottom_n exceed the available groups. |
ValueError | If the sum of top_n and bottom_n exceeds the total number of groups. |
ValueError | If filtering results in an empty dataset. |
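The top/bottom selection and its documented error cases might look roughly like this in pandas. The helper name, the "index" column name, and the assumption that each row represents one group are illustrative choices rather than details taken from the library.

```python
import pandas as pd


def top_bottom_sketch(df, top_n=None, bottom_n=None, index_value_col="index"):
    """Hypothetical helper; assumes one row per group and an 'index' value column."""
    n_groups = len(df)
    if (top_n is not None and top_n > n_groups) or (bottom_n is not None and bottom_n > n_groups):
        raise ValueError("top_n or bottom_n exceeds the number of available groups.")
    if top_n is not None and bottom_n is not None and top_n + bottom_n > n_groups:
        raise ValueError("The sum of top_n and bottom_n exceeds the total number of groups.")
    if top_n is None and bottom_n is None:
        return df
    # Rank from highest to lowest index value, then take slices from each end.
    ranked = df.sort_values(index_value_col, ascending=False)
    parts = []
    if top_n is not None:
        parts.append(ranked.head(top_n))
    if bottom_n is not None:
        parts.append(ranked.tail(bottom_n))
    result = pd.concat(parts)
    if result.empty:
        raise ValueError("Filtering resulted in an empty dataset.")
    return result


# Made-up index values per region, for illustration only.
index_df = pd.DataFrame(
    {"region": ["North", "South", "East", "West"], "index": [112.0, 95.0, 78.0, 130.0]}
)
extremes = top_bottom_sketch(index_df, top_n=1, bottom_n=1)
```

Checking that top_n plus bottom_n does not exceed the group count guarantees the two slices never overlap, so no row is returned twice.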
Source code in pyretailscience/plots/index.py
get_indexes(df, value_to_index, index_col, value_col, group_col, index_subgroup_col=None, agg_func='sum', offset=0)
Calculates the index of the value_col using Ibis for efficient computation at scale.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame \| Table | The dataframe or Ibis table to calculate the index on. Can be a pandas dataframe or an Ibis table. | required |
value_to_index | str | The baseline category or value to index against (e.g., "A"). | required |
index_col | str | The column to calculate the index on (e.g., "category"). | required |
value_col | str | The column to calculate the index on (e.g., "sales"). | required |
group_col | str | The column to group the data by (e.g., "region"). | required |
index_subgroup_col | str | The column to subgroup the index by (e.g., "store_type"). Defaults to None. | None |
agg_func | str | The aggregation function to apply to the | 'sum' |
offset | int | The offset value to subtract from the index. This allows for adjustments to the index values. Defaults to 0. | 0 |
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame: The calculated index values with grouping columns. |
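This page does not spell out the index formula. One common definition, assumed here, compares each group's share of the value within the baseline subset to its share across all rows, scaled so that 100 means "in line with the overall mix". The sketch below uses plain pandas rather than Ibis, omits index_subgroup_col, and uses a hypothetical function name and made-up data.

```python
import pandas as pd


def index_sketch(df, value_to_index, index_col, value_col, group_col, agg_func="sum", offset=0):
    """Hypothetical pandas-only illustration of an index calculation (assumed formula)."""
    # Aggregate the value per group across all rows.
    overall = df.groupby(group_col)[value_col].agg(agg_func)
    # Aggregate the value per group within the baseline subset only.
    subset = df[df[index_col] == value_to_index].groupby(group_col)[value_col].agg(agg_func)
    # Convert both to shares so groups are compared on a common scale.
    overall_share = overall / overall.sum()
    subset_share = subset / subset.sum()
    # 100 means the group's share in the subset matches its overall share.
    index = subset_share / overall_share * 100 - offset
    return index.rename("index").reset_index()


# Made-up sales data, for illustration only.
sales = pd.DataFrame(
    {
        "category": ["A", "A", "B", "B"],
        "region": ["North", "South", "North", "South"],
        "sales": [60, 40, 40, 60],
    }
)
result = index_sketch(sales, "A", "category", "sales", "region")
```

With this data, North accounts for 60% of category A sales but 50% of all sales, so its index comes out at about 120, while South comes out at about 80. A group that is absent from the baseline subset would produce a missing index value under this sketch.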
Source code in pyretailscience/plots/index.py
plot(df, value_col, group_col, index_col, value_to_index, agg_func='sum', series_col=None, title=None, x_label='Index', y_label=None, legend_title=None, highlight_range='default', sort_by='group', sort_order='ascending', ax=None, source_text=None, exclude_groups=None, include_only_groups=None, drop_na=False, top_n=None, bottom_n=None, filter_above=None, filter_below=None, **kwargs)
Creates an index plot.
Index plots are visual tools used in retail analytics to compare different categories or segments against a baseline or average value, typically set at 100. Index plots allow analysts to:
- Quickly identify which categories over- or underperform relative to the average
- Compare performance across diverse categories on a standardized scale
- Highlight areas of opportunity or concern in retail operations
- Easily communicate relative performance to stakeholders without revealing sensitive absolute numbers
In retail contexts, index plots are valuable for:
- Comparing sales performance across product categories
- Analyzing customer segment behavior against the overall average
- Evaluating store or regional performance relative to company-wide metrics
- Identifying high-potential areas for growth or investment
By normalizing data to an index, these plots facilitate meaningful comparisons and help focus attention on significant deviations from expected performance, supporting more informed decision-making in retail strategy and operations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The dataframe to plot. | required |
value_col | str | The column to plot. | required |
group_col | str | The column to group the data by. | required |
index_col | str | The column to calculate the index on (e.g., "category"). | required |
value_to_index | str | The baseline category or value to index against (e.g., "A"). | required |
agg_func | str | The aggregation function to apply to the value_col. Defaults to "sum". | 'sum' |
series_col | str | The column to use as the series. Defaults to None. | None |
title | str | The title of the plot. Defaults to None. When None the title is set to | None |
x_label | str | The x-axis label. Defaults to "Index". | 'Index' |
y_label | str | The y-axis label. Defaults to None. When None the y-axis label is set to the title case of | None |
legend_title | str | The title of the legend. Defaults to None. When None the legend title is set to the title case of | None |
highlight_range | Literal['default'] \| tuple[float, float] \| None | The range to highlight. Defaults to "default". When "default" the range is set to (80, 120). When None no range is highlighted. | 'default' |
sort_by | Literal['group', 'value'] \| None | The column to sort by. Defaults to "group". When None the data is not sorted. When "group" the data is sorted by group_col. When "value" the data is sorted by the value_col. When series_col is not None this option is ignored. | 'group' |
sort_order | Literal['ascending', 'descending'] | The order to sort the data. Defaults to "ascending". | 'ascending' |
ax | Axes | The matplotlib axes object to plot on. Defaults to None. | None |
source_text | str | The source text to add to the plot. Defaults to None. | None |
exclude_groups | list[any] | The groups to exclude from the plot. Defaults to None. | None |
include_only_groups | list[any] | The groups to include in the plot. Defaults to None. When None all groups are included. When not None only the groups in the list are included. Cannot be used with exclude_groups. | None |
drop_na | bool | Whether to drop NA index values. Defaults to False. | False |
top_n | int | Display only the top N indexes by value. Only applicable when series_col is None. Defaults to None. | None |
bottom_n | int | Display only the bottom N indexes by value. Only applicable when series_col is None. Defaults to None. | None |
filter_above | float | Only display indexes above this value. Only applicable when series_col is None. Defaults to None. | None |
filter_below | float | Only display indexes below this value. Only applicable when series_col is None. Defaults to None. | None |
**kwargs | dict[str, any] | Additional keyword arguments to pass to the Pandas plot function. | {} |
Returns:
Name | Type | Description |
---|---|---|
SubplotBase | SubplotBase | The matplotlib axes object. |
Raises:
Type | Description |
---|---|
ValueError | If sort_by is not "group", "value", or None. |
ValueError | If sort_order is neither "ascending" nor "descending". |
ValueError | If exclude_groups and include_only_groups are used together. |
ValueError | If both top_n and bottom_n are provided but their sum exceeds the total number of groups. |
ValueError | If top_n or bottom_n exceed the number of available groups. |
ValueError | If top_n, bottom_n, filter_above, or filter_below are used when series_col is provided. |
ValueError | If filtering results in an empty dataset. |
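Several of the rules above concern how option values are interpreted before anything is drawn. The sketch below restates the documented highlight_range defaults and a subset of the documented ValueError conditions in plain Python. Both function names are hypothetical, and the order of the checks is an assumption.

```python
def resolve_highlight_range(highlight_range="default"):
    """Hypothetical helper: map the documented highlight_range options to a concrete value."""
    if highlight_range == "default":
        # Documented default band around the baseline of 100.
        return (80, 120)
    # None means no highlight; an explicit (low, high) tuple is used as given.
    return highlight_range


def validate_plot_options_sketch(
    sort_by="group",
    sort_order="ascending",
    series_col=None,
    exclude_groups=None,
    include_only_groups=None,
    top_n=None,
    bottom_n=None,
    filter_above=None,
    filter_below=None,
):
    """Hypothetical restatement of some documented ValueError conditions."""
    if sort_by not in ("group", "value", None):
        raise ValueError('sort_by must be "group", "value", or None.')
    if sort_order not in ("ascending", "descending"):
        raise ValueError('sort_order must be "ascending" or "descending".')
    if exclude_groups is not None and include_only_groups is not None:
        raise ValueError("exclude_groups and include_only_groups cannot be used together.")
    value_filters = (top_n, bottom_n, filter_above, filter_below)
    if series_col is not None and any(v is not None for v in value_filters):
        raise ValueError("Value-based filters cannot be used when series_col is provided.")


default_band = resolve_highlight_range()
custom_band = resolve_highlight_range((90, 110))
no_band = resolve_highlight_range(None)
validate_plot_options_sketch(sort_by="value", sort_order="descending")
```

Resolving the "default" sentinel in one place keeps the rest of the plotting logic working with either a tuple or None.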
Source code in pyretailscience/plots/index.py