Histogram Plot
This module creates histograms from pandas DataFrames or Series.
It visualizes the distribution of one or more value columns and can optionally split the data by a categorical column, producing one histogram per category so that distributions can be compared across groups.
Core Features
- Single or Multiple Histograms: Plot one or more value columns (value_col) as histograms. For example, visualize the distribution of a single metric or compare multiple metrics simultaneously.
- Grouped Histograms: Create separate histograms for each unique value in group_col (e.g., product categories or regions), allowing for easy comparison of distributions across groups.
- Range Clipping and Filling: Use range_lower and range_upper to limit the values being plotted by clipping them or filling values outside the range with NaN. This is particularly useful when visualizing specific data ranges.
- Comprehensive Customization: Customize plot titles, axis labels, and legends, with the option to move the legend outside the plot.
Use Cases
- Distribution Analysis: Visualize the distribution of key metrics like revenue, sales, or user activity using single or multiple histograms.
- Group Comparisons: Compare distributions across different groups, such as product categories, geographic regions, or customer segments. For instance, plot histograms to show how sales vary across different product categories.
- Trends and Ranges: Use range_lower and range_upper to visualize data within specific ranges, filtering out outliers or focusing on key metrics for analysis.
Limitations and Handling of Data
- Pre-Aggregated Data Required: This module does not perform any data aggregation, so all data must be pre-aggregated before being passed in for plotting.
- Grouped Histograms: If group_col is provided, the data will be pivoted so that each unique value in group_col becomes a separate histogram. Otherwise, a single histogram is plotted.
- Series Support: The module can also handle pandas Series, though group_col cannot be provided when plotting a Series.
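The pivoting described above can be pictured with plain pandas. This is a minimal sketch of the idea, not the module's actual code: the column names category and sales are made up for illustration, and the use of DataFrame.pivot is an assumption about how such a reshape might be done.

```python
import pandas as pd

# Hypothetical long-format data: one row per observation.
df = pd.DataFrame(
    {
        "category": ["A", "B", "A", "B", "C"],
        "sales": [10.0, 20.0, 15.0, 25.0, 30.0],
    }
)

# Reshape so each unique category becomes its own column.
# Rows that belong to another category hold NaN in that column.
wide = df.pivot(columns="category", values="sales")

print(wide)
```

Each column of the wide frame can then be drawn as its own histogram, because missing values are ignored when the bins are counted.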
Additional Features
- Range Clipping or Filling: You can control how the data is visualized by specifying bounds. If data points fall outside the defined range, you can either clip them to the boundary values or fill them with NaN for exclusion.
- Legend Customization: For multiple histograms, you can add legends, including the option to move the legend outside the plot for clarity.
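The difference between clipping and filling with NaN is easiest to see on a small Series. The sketch below uses only standard pandas calls (Series.clip, Series.between, Series.where) and assumes inclusive bounds; the module's internal handling may differ in detail, and the sample values are invented.

```python
import pandas as pd

values = pd.Series([-5.0, 2.0, 7.0, 12.0, 40.0], name="revenue")
range_lower, range_upper = 0.0, 20.0

# "clip" behaviour: out-of-range values are pulled to the nearest bound,
# so they still count and pile up in the edge bins.
clipped = values.clip(lower=range_lower, upper=range_upper)

# "fillna" behaviour: out-of-range values become NaN,
# so they are excluded from the histogram entirely.
filled = values.where(values.between(range_lower, range_upper))

print(clipped.tolist())
print(filled.tolist())
```

Clipping keeps the total count unchanged but can inflate the first and last bins, while filling with NaN keeps the bin shapes honest at the cost of dropping observations.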
plot(df, value_col=None, group_col=None, title=None, x_label=None, y_label=None, legend_title=None, ax=None, source_text=None, move_legend_outside=False, range_lower=None, range_upper=None, range_method='clip', use_hatch=False, **kwargs)
Plots a histogram of value_col, optionally split by group_col.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame or Series | The dataframe (or series) to plot. | required |
value_col | str or list of str | The column(s) to plot. Can be a list of columns for multiple histograms. | None |
group_col | str | The column used to define different histograms. | None |
title | str | The title of the plot. | None |
x_label | str | The x-axis label. | None |
y_label | str | The y-axis label. | None |
legend_title | str | The title of the legend. | None |
ax | Axes | Matplotlib axes object to plot on. | None |
source_text | str | The source text to add to the plot. | None |
move_legend_outside | bool | Move the legend outside the plot. | False |
range_lower | float | Lower bound for clipping or filling NA values. | None |
range_upper | float | Upper bound for clipping or filling NA values. | None |
range_method | str | Whether to "clip" values outside the range or "fillna". Defaults to "clip". | 'clip' |
use_hatch | bool | Whether to use hatching for the bars. | False |
**kwargs | dict[str, Any] | Additional keyword arguments for Pandas' | {} |
Returns:
Name | Type | Description |
---|---|---|
SubplotBase | SubplotBase | The matplotlib axes object. |
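The ax parameter and the returned axes object follow the usual matplotlib pattern of passing an Axes in and getting it back for further styling. Because this page does not show the module's import path, the sketch below illustrates that pattern with plain pandas plotting rather than with plot itself. The data, labels, and the bbox_to_anchor legend placement are assumptions chosen for illustration, not the module's implementation.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with two value columns.
df = pd.DataFrame(
    {
        "sales": [10.0, 12.0, 15.0, 20.0, 22.0, 30.0],
        "returns": [1.0, 2.0, 2.0, 3.0, 5.0, 8.0],
    }
)

fig, ax = plt.subplots()

# Draw both columns as overlaid histograms on the supplied axes.
# pandas returns the same Axes object it was given.
returned_ax = df[["sales", "returns"]].plot(kind="hist", ax=ax, alpha=0.5, bins=5)

# Further customization happens on the returned axes.
returned_ax.set_title("Sales and returns")
returned_ax.set_xlabel("Value")

# One common way to place a legend outside the plotting area.
returned_ax.legend(title="Metric", bbox_to_anchor=(1.05, 1), loc="upper left")
```

Returning the axes keeps the function composable: the caller can place several histograms in one figure with plt.subplots and adjust each one after it is drawn.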