Scatter Plot Gallery
A scatter plot draws each observation as a point, which makes it useful for visualizing relationships between variables, showing how values are distributed, and comparing categories. The pyretailscience scatter plot supports grouping points by category, plotting several value columns at once, and labeling individual points.
Scatter plots excel at:
- Relationship Analysis: Identify patterns and correlations between variables
- Category Comparison: Compare different groups using distinct colors
- Multi-Value Visualization: Show multiple data series in a single chart
- Point Identification: Label specific data points for detailed analysis
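A scatter plot shows a relationship visually. The Pearson correlation coefficient summarizes how close that relationship is to a straight line, from -1 (perfectly decreasing) to +1 (perfectly increasing). The plain-Python sketch below computes it for the price and units_sold values used in the grouping example in this gallery. The helper name pearson_r is hypothetical and is not part of pyretailscience.

```python
import math


def pearson_r(xs, ys):
    """Hypothetical helper: Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-deviations from the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Same price / units_sold values as the sales_data example in this gallery
price = [25, 45, 35, 65, 55, 75, 85, 95, 105, 115, 30, 50, 40, 70, 60, 80, 90, 100, 110, 120]
units_sold = [120, 85, 95, 60, 70, 45, 40, 35, 25, 20, 110, 80, 90, 55, 65, 42, 38, 32, 22, 18]

r = pearson_r(price, units_sold)
print(f"r = {r:.2f}")
```

For this data r is strongly negative: units sold fall steadily as price rises. Pearson r only measures linear association, so it is a complement to the plot, not a replacement for looking at it.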
import matplotlib.pyplot as plt
import pandas as pd
from pyretailscience.plots import scatter
Multiple Scatter Groups
Pass a column name to the group_col parameter to create a separate scatter series for each category in that column. Each group gets a distinct color and its own legend entry.
# Create sample sales data with different product categories
# fmt: off
sales_data = pd.DataFrame({
    "product_id": range(1, 21),
    "price": [25, 45, 35, 65, 55, 75, 85, 95, 105, 115, 30, 50, 40, 70, 60, 80, 90, 100, 110, 120],
    "units_sold": [120, 85, 95, 60, 70, 45, 40, 35, 25, 20, 110, 80, 90, 55, 65, 42, 38, 32, 22, 18],
    "category": ["Electronics", "Apparel", "Home", "Electronics", "Apparel", "Home", "Electronics", "Apparel", "Home", "Electronics",
                 "Apparel", "Home", "Electronics", "Apparel", "Home", "Electronics", "Apparel", "Home", "Electronics", "Apparel"],
})
# fmt: on
scatter.plot(
    sales_data,
    x_col="price",
    value_col="units_sold",
    group_col="category",
    title="Sales Performance by Product Category",
)
plt.show()
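Conceptually, group_col splits the rows into one series per distinct category value and gives each series its own color. The library-free sketch below illustrates that idea on the same sample values. The helpers split_by_group and assign_colors, and the three-color palette, are hypothetical and are not the pyretailscience implementation.

```python
from collections import defaultdict
from itertools import cycle


def split_by_group(xs, ys, groups):
    """Hypothetical helper: map each category to its own (x_values, y_values) series.

    Categories keep the order in which they first appear in the data.
    """
    series = defaultdict(lambda: ([], []))
    for x, y, group in zip(xs, ys, groups):
        series[group][0].append(x)
        series[group][1].append(y)
    return dict(series)


def assign_colors(categories, palette=("tab:blue", "tab:orange", "tab:green")):
    """Hypothetical helper: cycle through an assumed palette, one color per category."""
    return dict(zip(categories, cycle(palette)))


price = [25, 45, 35, 65, 55, 75, 85, 95, 105, 115, 30, 50, 40, 70, 60, 80, 90, 100, 110, 120]
units_sold = [120, 85, 95, 60, 70, 45, 40, 35, 25, 20, 110, 80, 90, 55, 65, 42, 38, 32, 22, 18]
# Same repeating Electronics / Apparel / Home pattern as sales_data["category"]
category = ["Electronics", "Apparel", "Home"] * 6 + ["Electronics", "Apparel"]

series = split_by_group(price, units_sold, category)
colors = assign_colors(series)
for name, (xs, ys) in series.items():
    print(f"{name}: {len(xs)} points, color {colors[name]}")
```

Because the sketch cycles through its palette, colors repeat once there are more categories than colors, so a small number of groups keeps the legend easy to read.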
Multiple Value Columns
To plot several metrics on the same axes, pass a list of column names to value_col; each column becomes its own series. Note: a list value_col cannot be combined with group_col.
# Create sample store performance data
store_data = pd.DataFrame(
    {
        "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
        "revenue": [125000, 135000, 142000, 138000, 155000, 148000],
        "profit": [15000, 18000, 21000, 19000, 24000, 22000],
        "customers": [1200, 1350, 1420, 1380, 1580, 1460],
    },
)
scatter.plot(
    store_data,
    x_col="month",
    value_col=["revenue", "profit"],
    title="Store Performance Metrics",
)
plt.show()
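With a list value_col, each column becomes its own series and every series shares the same x values. The sketch below mimics that reshaping on plain dictionaries and rejects the list-plus-group_col combination noted above. The helper series_from_columns is hypothetical, and raising ValueError is this sketch's choice, not necessarily what pyretailscience does.

```python
def series_from_columns(data, x_col, value_col, group_col=None):
    """Hypothetical helper: build one (x_values, y_values) series per value column."""
    # Mirrors the documented restriction: a list value_col cannot be grouped
    if group_col is not None and not isinstance(value_col, str):
        raise ValueError("group_col cannot be combined with a list of value columns")
    value_cols = [value_col] if isinstance(value_col, str) else list(value_col)
    xs = data[x_col]
    return {col: (xs, data[col]) for col in value_cols}


store_data = {
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "revenue": [125000, 135000, 142000, 138000, 155000, 148000],
    "profit": [15000, 18000, 21000, 19000, 24000, 22000],
}

series = series_from_columns(store_data, "month", ["revenue", "profit"])
for name, (months, values) in series.items():
    print(name, min(values), max(values))
```

Because both series share one y-axis, profit (15,000 to 24,000) sits far below revenue (125,000 to 155,000) in this example, so metrics on very different scales may be easier to compare in separate charts.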
Point Labels
Add a text label to each point with label_col. This works only with a single value_col, not when value_col is a list or when points are split into multiple groups.
# Create sample product performance data
product_data = pd.DataFrame(
    {
        "product_name": ["Laptop", "Mouse", "Keyboard", "Monitor", "Headphones", "Tablet"],
        "price": [899, 25, 75, 299, 149, 399],
        "satisfaction": [4.5, 4.2, 4.0, 4.3, 4.1, 4.4],
    },
)
scatter.plot(
    product_data,
    x_col="price",
    value_col="satisfaction",
    label_col="product_name",
    title="Product Price vs Customer Satisfaction",
)
plt.show()
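Point labeling amounts to pairing each label with the (x, y) position of its point; in plain matplotlib this is commonly done by calling Axes.annotate once per point. The standard-library sketch below only builds those (label, x, y) triples, ordered by x, and rejects a list value_col in line with the restriction above. label_positions is a hypothetical helper, not part of pyretailscience.

```python
def label_positions(data, x_col, value_col, label_col):
    """Hypothetical helper: return (label, x, y) triples ordered by x."""
    # Labels need exactly one y value per point, so a list value_col is rejected
    if not isinstance(value_col, str):
        raise ValueError("label_col requires a single value_col")
    triples = zip(data[label_col], data[x_col], data[value_col])
    return sorted(triples, key=lambda t: t[1])


product_data = {
    "product_name": ["Laptop", "Mouse", "Keyboard", "Monitor", "Headphones", "Tablet"],
    "price": [899, 25, 75, 299, 149, 399],
    "satisfaction": [4.5, 4.2, 4.0, 4.3, 4.1, 4.4],
}

positions = label_positions(product_data, "price", "satisfaction", "product_name")
for label, x, y in positions:
    print(f"{label:<10} price={x:>4} satisfaction={y}")
```

Sorting by x only makes the printed list easier to scan from the cheapest to the most expensive product.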