Scatter Plot
This module provides functionality for creating scatter plots from pandas DataFrames.
It is designed to visualize relationships between variables, highlight distributions, and compare different categories using scatter points.
Core Features
- Flexible X-Axis Handling: Uses the index or a specified x-axis column (x_col) for plotting.
- Multiple Scatter Groups: Supports plotting multiple columns (value_col) or groups (group_col).
- Point Labels: Supports adding text labels to individual scatter points, with automatic positioning to avoid overlaps.
- Dynamic Color Mapping: Automatically selects a colormap based on the number of groups.
- Legend Customization: Supports custom legend titles and the option to move the legend outside the plot.
- Source Text: Provides an option to add source attribution to the plot.
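To make the x-axis and grouping behavior concrete, here is a rough conceptual sketch in plain pandas and matplotlib of drawing one scatter series per group, over either a column or the index. It is not this module's implementation: the helper `grouped_scatter` and the sample columns (`avg_price`, `units_sold`, `category`) are hypothetical, and the colormap selection, legend options, and point labels listed above are left out.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical, already aggregated data: one row per product.
df = pd.DataFrame(
    {
        "avg_price": [2.5, 3.0, 4.2, 1.8, 2.2, 3.9],
        "units_sold": [120, 95, 60, 150, 130, 70],
        "category": ["Snacks", "Snacks", "Snacks", "Drinks", "Drinks", "Drinks"],
    }
)


def grouped_scatter(df, value_col, x_col=None, group_col=None, ax=None):
    """Draw one scatter series per group; fall back to the index when x_col is None."""
    if ax is None:
        _, ax = plt.subplots()
    # Without a grouping column, treat the whole frame as a single series.
    groups = df.groupby(group_col) if group_col is not None else [(value_col, df)]
    for name, part in groups:
        x = part.index if x_col is None else part[x_col]
        ax.scatter(x, part[value_col], label=str(name))
    ax.legend()
    return ax


ax = grouped_scatter(df, value_col="units_sold", x_col="avg_price", group_col="category")
```

Each group becomes its own scatter series, which is why a legend entry appears per category.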
Use Cases
- Category-Based Scatter Plots: Compare different categories using scatter points.
- Trend Analysis: Identify patterns and outliers in datasets.
- Multi-Value Scatter Plots: Show multiple data series in a single scatter chart.
- Labeled Scatter Plots: Identify specific data points with text labels (e.g., product names, store IDs).
Label Support
- Single Series Labeling: When using a single value_col, labels can be added via the label_col parameter.
- Group-Based Labeling: When using group_col, each point gets labeled from the original DataFrame.
- Automatic Label Positioning: Uses the textalloc library to prevent label overlaps and optimize readability.
- Clean Label Display: Labels are positioned without connecting lines to maintain a clean appearance.
- Customizable Label Styling: Control label appearance through the label_kwargs parameter.
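As a minimal sketch of the labeling options, the example below passes `label_col` and `label_kwargs` to `plot` for a single value column. The DataFrame, its column names, the wrapper `plot_labeled_products`, and the specific option values are all hypothetical; only the parameter names and the import path come from this page. The import sits inside the function, an assumption made here so the data setup can run even where the package is not installed.

```python
import pandas as pd

# Hypothetical pre-aggregated data: one row per product.
products = pd.DataFrame(
    {
        "product_name": ["Cola", "Crisps", "Granola", "Water"],
        "avg_price": [1.5, 2.0, 4.5, 0.9],
        "units_sold": [900, 650, 210, 1200],
    }
)

# Options forwarded to textalloc.allocate(); the values are illustrative only.
label_kwargs = {"textsize": 9, "min_distance": 0.02, "max_distance": 0.2}


def plot_labeled_products(df):
    """Scatter units sold against price, labeling each point with its product name."""
    # Deferred import (an assumption of this sketch, not a requirement).
    from pyretailscience.plots import scatter

    return scatter.plot(
        df,
        value_col="units_sold",  # a single column, since labels are not supported for lists
        x_col="avg_price",
        label_col="product_name",
        label_kwargs=label_kwargs,
        title="Units Sold vs. Average Price",
    )
```

Calling `plot_labeled_products(products)` should return the matplotlib axes with one label per point, assuming the package and textalloc are available.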
Limitations and Warnings
- Pre-Aggregated Data Required: The module does not perform data aggregation; data should be pre-aggregated before being passed to the function.
- Label Limitations: Point labels are not supported when value_col is a list (raises ValueError).
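Because aggregation is left to the caller, raw records usually need to be summarized first. The following is one possible way to do that with pandas, assuming hypothetical transaction-level data with `store_id`, `region`, `revenue`, and `units` columns.

```python
import pandas as pd

# Hypothetical transaction-level data: several rows per store.
transactions = pd.DataFrame(
    {
        "store_id": ["S1", "S1", "S2", "S2", "S2", "S3"],
        "region": ["North", "North", "South", "South", "South", "North"],
        "revenue": [100.0, 150.0, 80.0, 120.0, 100.0, 300.0],
        "units": [10, 15, 8, 12, 10, 25],
    }
)

# Collapse to one row per store so each scatter point represents a store.
store_summary = transactions.groupby(["store_id", "region"], as_index=False).agg(
    total_revenue=("revenue", "sum"),
    total_units=("units", "sum"),
)
```

The resulting `store_summary` has one row per store and could then be passed as `df`, for example with `x_col="total_units"`, `value_col="total_revenue"`, and `group_col="region"`.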
plot(df, value_col, x_label=None, y_label=None, title=None, x_col=None, group_col=None, ax=None, source_text=None, legend_title=None, move_legend_outside=False, label_col=None, label_kwargs=None, **kwargs)
Plots a scatter chart for the given value_col over x_col or index, with optional grouping by group_col.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame or Series | The dataframe or series to plot. | required |
| value_col | str or list of str | The column(s) to plot. | required |
| x_label | str | The x-axis label. | None |
| y_label | str | The y-axis label. | None |
| title | str | The title of the plot. | None |
| x_col | str | The column to be used as the x-axis. If None, the index is used. | None |
| group_col | str | The column used to define different scatter groups. | None |
| legend_title | str | The title of the legend. | None |
| ax | Axes | Matplotlib axes object to plot on. | None |
| source_text | str | The source text to add to the plot. | None |
| move_legend_outside | bool | Move the legend outside the plot. | False |
| label_col | str | Column name containing text labels for each point. Not supported when value_col is a list. Defaults to None. | None |
| label_kwargs | dict | Keyword arguments passed to textalloc.allocate(). Common options: textsize, nbr_candidates, min_distance, max_distance, draw_lines. By default, draw_lines=False to avoid lines connecting labels to points. Defaults to None. | None |
| **kwargs | dict[str, any] | Additional keyword arguments for matplotlib scatter function. | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| SubplotBase | SubplotBase | The matplotlib axes object. |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
| ValueError | If |
| KeyError | If |
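A hedged usage sketch for a grouped plot follows. The data, column names, and the wrapper `plot_store_performance` are hypothetical; the keyword arguments match the signature documented above, and `alpha` is assumed to be forwarded to matplotlib's scatter through `**kwargs`. As a convenience of this sketch, the import is deferred into the function so the data setup runs on its own.

```python
import pandas as pd

# Hypothetical pre-aggregated data: one row per store.
store_summary = pd.DataFrame(
    {
        "store_id": ["S1", "S2", "S3", "S4"],
        "region": ["North", "South", "North", "South"],
        "total_units": [25, 30, 25, 40],
        "total_revenue": [250.0, 300.0, 300.0, 380.0],
    }
)


def plot_store_performance(df, ax=None):
    """Plot revenue against units sold, one scatter group per region."""
    # Deferred import (an assumption of this sketch, not a requirement).
    from pyretailscience.plots import scatter

    return scatter.plot(
        df,
        value_col="total_revenue",
        x_col="total_units",
        group_col="region",
        x_label="Units Sold",
        y_label="Revenue",
        title="Store Performance by Region",
        legend_title="Region",
        move_legend_outside=True,
        source_text="Source: hypothetical store data",
        ax=ax,
        alpha=0.8,  # extra keyword argument intended for matplotlib's scatter
    )
```

Passing an existing axes through `ax` allows the chart to be placed in a larger figure; with `ax=None` the function is expected to create its own.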
Source code in pyretailscience/plots/scatter.py