Scatter stats

fleur.scatterstats.ScatterStats

Statistical correlation and plotting class for numerical variables.

Attributes:

n_obs (int): Total number of observations.
correlation (float): Value of the correlation coefficient (Pearson, Kendall, or Spearman, depending on effect_size).
alpha (float): Significance level, i.e. the probability of rejecting a true null hypothesis.
dof (int): Degrees of freedom of the t-test.
pvalue (float): P-value of the t-test.
intercept (float): The intercept (estimate of beta2) in the model.
slope (float): The slope (estimate of beta1) in the model.
stderr_slope (float): Standard error of the slope.
ci_lower (float): Lower bound of the confidence interval.
ci_upper (float): Upper bound of the confidence interval.
ax (Axes): The main matplotlib axes.
fig (Figure): The matplotlib figure.

__init__(x, y, data=None, alternative='two-sided', effect_size='pearson', ci=95)

Initialize a ScatterStats() instance.

Parameters:

x (str | SeriesT | Iterable, required): Column name in data, or a Series or array-like.
y (str | SeriesT | Iterable, required): Column name in data, or a Series or array-like.
data (Frame | None, default None): An optional dataframe, used to look up x and y when they are given as column names.
alternative (str, default 'two-sided'): Defines the alternative hypothesis. Must be one of 'two-sided', 'less', or 'greater'.
effect_size (str, default 'pearson'): The correlation measure to use. Must be one of 'pearson', 'kendall', or 'spearman'.
ci (int | float, default 95): Confidence level, in percent, for the label and the regression plot. A value of 95 corresponds to a 95% confidence level.

plot(*, bins=None, hist=True, scatter=True, line=True, area=True, scatter_kws=None, line_kws=None, area_kws=None, hist_kws=None, subplot_mosaic_kwargs=None, show_stats=True)

Draw a scatter plot of two variables with a linear regression line, annotated with the main statistical results.

Parameters:

bins (int | list[int] | None, default None): Number of bins for the marginal distributions, given either as a single integer or as a list of two integers (the first for the top distribution, the second for the other).
hist (bool, default True): Whether to include histograms of the marginal distributions.
scatter (bool, default True): Whether to include the scatter plot.
line (bool, default True): Whether to include the regression line.
area (bool, default True): Whether to include the shaded area of the confidence interval.
scatter_kws (dict | None, default None): Additional keyword arguments passed to matplotlib's scatter() function.
line_kws (dict | None, default None): Additional keyword arguments passed to matplotlib's plot() function.
area_kws (dict | None, default None): Additional keyword arguments passed to matplotlib's fill_between() function.
hist_kws (dict | None, default None): Additional keyword arguments passed to matplotlib's hist() function.
subplot_mosaic_kwargs (dict | None, default None): Additional keyword arguments passed to plt.subplot_mosaic().
show_stats (bool, default True): Whether to display the statistics on the plot.
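
The bins argument accepts either one integer or a pair of integers. One way such an argument could be normalized into a (top, other) pair is sketched below. The helper normalize_bins is hypothetical and is not part of fleur; treating None as "let the plotting backend choose" is likewise an assumption.

```python
def normalize_bins(bins):
    """Hypothetical helper: turn the bins argument into a (top, other) pair."""
    if bins is None:
        # Assumed meaning: leave the bin count to the plotting backend.
        return None, None
    if isinstance(bins, int):
        # A single integer applies to both marginal histograms.
        return bins, bins
    top, other = bins  # raises ValueError unless there are exactly two items
    return top, other


print(normalize_bins(20))
print(normalize_bins([10, 30]))
print(normalize_bins(None))
```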


Examples

  • Minimalist example
from fleur import ScatterStats
from fleur import data

df = data.load_iris()

ScatterStats(x=df["sepal_length"], y=df["sepal_width"]).plot()