Between stats
State
Type | No. of groups | Test | Effect size | Function used | Implemented |
---|---|---|---|---|---|
Parametric | 2 | Student/Welch | Cohen's d / Hedges' g | Test: scipy.stats.ttest_ind | ❌ |
Non-parametric | 2 | Mann-Whitney U | r (rank-biserial correlation) | Test: scipy.stats.mannwhitneyu | ❌ |
Robust | 2 | Yuen | Algina-Keselman-Penfield | Test: scipy.stats.ttest_ind | ❌ |
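The effect sizes listed above are not scipy functions. For reference, here is a minimal sketch of the usual formulas for Cohen's d and Hedges' g for two independent groups (standard textbook definitions, not fleur's implementation):

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def hedges_g(a, b):
    """Hedges' g: Cohen's d rescaled by a small-sample bias correction."""
    n1, n2 = len(a), len(b)
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return cohens_d(a, b) * correction
```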
Reference
fleur.betweenstats.BetweenStats
Statistical comparison and plotting class for between-group analysis.
This class provides functionality to visualize and statistically compare numerical data across two or more categorical groups. It supports t-tests for two groups and one-way ANOVA for three or more groups. Visualization options include violin plots, box plots, and swarm plots.
Attributes:
Name | Type | Description |
---|---|---|
statistic | float | The computed test statistic (t or F). |
pvalue | float | The p-value of the statistical test. |
main_stat | str | The formatted test statistic string for display. |
expression | str | Full LaTeX-style annotation string. |
is_ANOVA | bool | True if test is ANOVA, False if t-test. |
is_paired | bool | Whether a paired test was used. |
dof | int | Degrees of freedom for t-tests. |
dof_between | int | Between-group degrees of freedom (for ANOVA). |
dof_within | int | Within-group degrees of freedom (for ANOVA). |
n_cat | int | Number of unique categories in the group column. |
n_obs | int | Total number of observations. |
ax | Axes | The matplotlib axes used for plotting. |
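A minimal sketch of reading these attributes back, assuming they are populated once plot() has been called (plot() is described below as the step that fits the class to the data):

```python
from fleur import BetweenStats
from fleur import datasets

df = datasets.load_iris()
bs = BetweenStats(df["sepal_length"], df["species"])
bs.plot()  # assumed to fit the statistics (see plot() below)
print(bs.statistic, bs.pvalue)  # test statistic (t or F) and p-value
print(bs.n_cat, bs.n_obs)       # number of groups and total observations
```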
__init__(x, y, data=None, paired=False, **kwargs)
Initialize a BetweenStats() instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Union[str, SeriesT, Iterable] | Colname of data, or a Series/iterable of values. | required |
y | Union[str, SeriesT, Iterable] | Colname of data, or a Series/iterable of values. | required |
data | Optional[Frame] | An optional dataframe. | None |
paired | bool | If True, perform paired t-test (only for 2 groups). | False |
kwargs | | Additional arguments passed to the scipy test function. | {} |
plot(*, orientation='vertical', colors=None, show_stats=True, violin=True, box=True, scatter=True, violin_kws=None, box_kws=None, scatter_kws=None, ax=None)
Fit the BetweenStats class to the data and render a statistical
comparison plot. It detects how many groups you have and applies the
appropriate test for that number of groups. All arguments must be passed as keyword arguments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
orientation | str | 'vertical' or 'horizontal' orientation of plots. | 'vertical' |
colors | Optional[list] | List of colors for each group. | None |
show_stats | bool | If True, display statistics on the plot. | True |
violin | bool | Whether to include violin plot. | True |
box | bool | Whether to include box plot. | True |
scatter | bool | Whether to include scatter plot of raw data. | True |
violin_kws | Union[dict, None] | Keyword args for violinplot customization. | None |
box_kws | Union[dict, None] | Keyword args for boxplot customization. | None |
scatter_kws | Union[dict, None] | Keyword args for scatter plot customization. | None |
ax | Axes | Existing Axes to plot on. If None, uses current Axes. | None |
Returns:
Type | Description |
---|---|
Figure | A matplotlib Figure. |
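Since plot() returns a matplotlib Figure, it can be handled like any other Figure, for example to save it to disk (a sketch assuming standard matplotlib behavior):

```python
from fleur import BetweenStats
from fleur import datasets

df = datasets.load_iris()
fig = BetweenStats(df["sepal_length"], df["species"]).plot()
fig.savefig("between_stats.png", dpi=300, bbox_inches="tight")  # standard Figure method
```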
summary()
Print a text summary of the statistical test performed.
Displays the type of test conducted (t-test or ANOVA), number of groups, and the formatted test statistic with p-value and sample size.
Examples
- Minimalist example

```python
# mkdocs: render
from fleur import BetweenStats
from fleur import datasets

df = datasets.load_iris()
BetweenStats(df["sepal_length"], df["species"]).plot()
```

- Change colors

```python
# mkdocs: render
from fleur import BetweenStats
from fleur import datasets

df = datasets.load_iris()
BetweenStats(df["sepal_length"], df["species"]).plot(
    colors=["#005f73", "#ee9b00", "#9b2226"]
)
```

- Change orientation

```python
# mkdocs: render
from fleur import BetweenStats
from fleur import datasets

df = datasets.load_iris()
BetweenStats(df["sepal_length"], df["species"]).plot(
    orientation="horizontal"
)
```

- Remove elements

```python
# mkdocs: render
from fleur import BetweenStats
from fleur import datasets

df = datasets.load_iris()
BetweenStats(df["sepal_length"], df["species"]).plot(
    box=False,
    scatter=False,
)
```

- Hide statistics

```python
# mkdocs: render
from fleur import BetweenStats
from fleur import datasets

df = datasets.load_iris()
BetweenStats(df["sepal_length"], df["species"]).plot(show_stats=False)
```

- Print summary statistics

```python
# mkdocs: render
from fleur import BetweenStats
from fleur import datasets

df = datasets.load_iris()
BetweenStats(df["sepal_length"], df["species"]).summary()
```

```
Between stats comparison
Test: One-way ANOVA with 3 groups
F(2, 147) = 119.26, p = 0.0000, n_obs = 150
```
Statistical details
When trying to compare groups, you should first answer the following questions:
- Number of groups: the two cases are when there are 2 groups and when there are 3 or more groups.
- Independence of samples: do the groups we're comparing contain the same individuals or different ones?
- Paired groups: comparing the same people before and after giving them a drug
- Independent groups: comparing a placebo and a treatment group
- Data distribution:
- Normal distribution: we use parametric tests (they rely on a known statistical distribution)
- Equality of variance: in parametric tests, we need to know if the variance in each group is the same or not
- Non-normal distribution: we use non-parametric tests (they don't assume any particular distribution)
- Sample size: too small a sample (n < 30) can be an issue because we lack statistical power
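Put together, these questions form a small decision rule. The function below is purely illustrative (its name and signature are made up for this sketch and are not part of fleur); it only returns the name of the recommended test:

```python
def recommend_test(n_groups, paired, normal, equal_var=True):
    """Toy mapping from the questions above to a test name (illustrative only)."""
    if n_groups == 2:
        if paired:
            return "paired t-test" if normal else "Wilcoxon signed-rank test"
        if normal:
            return "Student's t-test" if equal_var else "Welch's t-test"
        return "Mann-Whitney U test"
    # 3 or more groups
    if paired:
        return "repeated measures ANOVA" if normal else "Friedman test"
    if normal:
        return "one-way ANOVA" if equal_var else "Welch's ANOVA"
    return "Kruskal-Wallis test"
```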
Comparing 2 groups
Independent samples
There are 2 cases here, depending on whether we assume the data distribution is normal or not. Often, not assuming normality is more realistic, but it also reduces the power of the test (the probability of detecting a given effect if that effect actually exists).
Here we assume the data distribution is normal.
- Equal variance: if the groups have equal variances: independent t-test.
- Unequal variance: if the groups have unequal variances: Welch's t-test.
In both cases, we use the scipy.stats.ttest_ind() function.
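For example, the equal_var argument of scipy.stats.ttest_ind() switches between the two variants (a minimal sketch with made-up data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=5.0, scale=1.0, size=40)
group_b = rng.normal(loc=5.5, scale=1.5, size=35)

student = stats.ttest_ind(group_a, group_b, equal_var=True)   # Student's t-test (equal variances)
welch = stats.ttest_ind(group_a, group_b, equal_var=False)    # Welch's t-test (unequal variances)
print(student.statistic, student.pvalue)
print(welch.statistic, welch.pvalue)
```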
Here we don't assume anything about the distribution and we need to use the Mann-Whitney U test.
Note that the Mann-Whitney U test compares distributions, not means. This makes sense: if we don't assume normality (e.g. with skewed distributions), comparing means is not the best way to compare the groups, which is what we ultimately want to do.
In this case, we use the scipy.stats.mannwhitneyu() function.
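A minimal sketch, again with made-up (skewed) data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.exponential(scale=2.0, size=40)  # skewed, non-normal data
group_b = rng.exponential(scale=3.0, size=35)

res = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(res.statistic, res.pvalue)
```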
Dependent (paired) samples
Here we assume the data distribution is normal and we need to use a paired t-test.
Here we don't assume anything about the distribution and we need to use the Wilcoxon signed-rank test.
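Both paired tests are available in scipy as scipy.stats.ttest_rel() and scipy.stats.wilcoxon(). A minimal sketch, assuming before and after are measurements of the same subjects:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
before = rng.normal(loc=120, scale=10, size=30)
after = before - rng.normal(loc=5, scale=8, size=30)  # same subjects measured twice

paired_t = stats.ttest_rel(before, after)   # assumes normality of the differences
wilcoxon = stats.wilcoxon(before, after)    # no normality assumption
print(paired_t.pvalue, wilcoxon.pvalue)
```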
Comparing 3 or more groups
Independent samples
Again, there are parametric and non-parametric approaches depending on the assumption of normality. When normality is assumed, these tests compare group means; otherwise, they compare distributions more generally.
- Equal variance: if the groups have equal variances and normal distributions, use one-way ANOVA.
- Unequal variance: if the groups have unequal variances, use Welch’s ANOVA.
If normality is not assumed, use the Kruskal-Wallis test, which compares the overall distributions across groups.
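One-way ANOVA and Kruskal-Wallis are available as scipy.stats.f_oneway() and scipy.stats.kruskal(); Welch's ANOVA is not in scipy and requires a third-party package such as pingouin. A minimal sketch with three made-up groups:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1 = rng.normal(5.0, 1.0, size=30)
g2 = rng.normal(5.5, 1.0, size=30)
g3 = rng.normal(6.0, 1.0, size=30)

anova = stats.f_oneway(g1, g2, g3)    # parametric: assumes normality and equal variances
kruskal = stats.kruskal(g1, g2, g3)   # non-parametric: compares distributions
print(anova.statistic, anova.pvalue)
print(kruskal.statistic, kruskal.pvalue)
```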
Dependent (repeated measures) samples
Assuming normality, use repeated measures ANOVA to compare means across related groups.
If normality is not assumed, use the Friedman test, which compares distributions across related groups without assuming normality.
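The Friedman test is available as scipy.stats.friedmanchisquare(); repeated measures ANOVA itself is not in scipy (statsmodels' AnovaRM is one common option). A minimal sketch, assuming each array holds the same subjects measured under three conditions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
cond1 = rng.normal(10, 2, size=25)
cond2 = cond1 + rng.normal(1, 1, size=25)  # same subjects, condition 2
cond3 = cond1 + rng.normal(2, 1, size=25)  # same subjects, condition 3

res = stats.friedmanchisquare(cond1, cond2, cond3)  # non-parametric, related groups
print(res.statistic, res.pvalue)
```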