Between stats

fleur.betweenstats.BetweenStats

Statistical comparison and plotting class for between-group analysis.

This class provides functionality to visualize and statistically compare numerical data across two or more categorical groups. It supports t-tests for two groups and one-way ANOVA for three or more groups. Visualization options include violin plots, box plots, and swarm plots.

Attributes:

Name Type Description
statistic float

The computed test statistic (t or F).

pvalue float

The p-value of the statistical test.

main_stat str

The formatted test statistic string for display.

expression str

Full LaTeX-style annotation string.

is_ANOVA bool

True if test is ANOVA, False if t-test.

is_paired bool

Whether a paired test was used.

dof int

Degrees of freedom for t-tests.

dof_between int

Between-group degrees of freedom for ANOVA.

dof_within int

Within-group degrees of freedom for ANOVA.

n_cat int

Number of unique categories in the group column.

n_obs int

Total number of observations.

means list

The mean of each group.

test_output list

The output of the statistical test.

ax Axes

The matplotlib axes used for plotting.

__init__(x, y, data=None, paired=False, approach='parametric', **kwargs)

Initialize a BetweenStats() instance.

Parameters:

Name Type Description Default
x str | SeriesT | Iterable

Column name in data, or a Series or array-like.

required
y str | SeriesT | Iterable

Column name in data, or a Series or array-like.

required
data Frame | None

An optional dataframe, used when x and y are column names.

None
paired bool

Whether comparing the same observations or not.

False
approach str

A string specifying the type of statistical approach: "parametric" (default), "nonparametric", "robust", or "bayes".

'parametric'
kwargs Any

Additional arguments passed to the underlying scipy test function: one of scipy.stats.ttest_rel(), scipy.stats.ttest_ind(), scipy.stats.f_oneway(), or scipy.stats.wilcoxon().

{}
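The kwargs are forwarded to one of the SciPy functions listed above, and which one applies depends on the number of groups and on paired. The sketch below shows one plausible way to make that choice for the parametric approach. The helper name pick_scipy_test is hypothetical and is not part of fleur; the paired case with three or more groups is left unimplemented because this page does not say which function handles it.

```python
def pick_scipy_test(n_groups: int, paired: bool = False) -> str:
    """Return the name of the SciPy function a parametric comparison could call.

    Hypothetical helper for illustration only; it is not fleur's own code.
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_groups == 2:
        # Two groups: paired or independent t-test.
        return "scipy.stats.ttest_rel" if paired else "scipy.stats.ttest_ind"
    if paired:
        # Assumption: repeated measures with 3+ groups is out of scope here.
        raise NotImplementedError("paired comparison of 3+ groups not sketched")
    # Three or more independent groups: one-way ANOVA.
    return "scipy.stats.f_oneway"


print(pick_scipy_test(2))               # independent t-test
print(pick_scipy_test(2, paired=True))  # paired t-test
print(pick_scipy_test(3))               # one-way ANOVA
```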

plot(*, orientation='vertical', colors=None, show_stats=True, show_means=True, jitter_amount=0.25, violin=True, box=True, scatter=True, violin_kws=None, box_kws=None, scatter_kws=None, mean_kws=dict(fontsize=7, color='black', bbox=dict(boxstyle='round', facecolor='#fefae0', alpha=0.7), zorder=50), mean_line_kws=dict(ls='--', lw=0.6, color='black'), ax=None)

Fit the BetweenStats instance to the data and render a statistical comparison plot. The number of groups is detected automatically and the matching test is applied. All arguments must be passed as keyword arguments.

Parameters:

Name Type Description Default
orientation str

'vertical' or 'horizontal' orientation of plots.

'vertical'
colors list | None

List of colors for each group.

None
show_stats bool

If True, adds statistics on the plot.

True
show_means bool

If True, adds mean labels on the plot.

True
jitter_amount float

Controls the horizontal spread of dots to prevent overlap; 0 aligns them, higher values increase spacing.

0.25
violin bool

Whether to include violin plot.

True
box bool

Whether to include box plot.

True
scatter bool

Whether to include scatter plot of raw data.

True
violin_kws dict | None

Keyword args for violinplot customization.

None
box_kws dict | None

Keyword args for boxplot customization.

None
scatter_kws dict | None

Keyword args for scatter plot customization.

None
mean_kws dict | None

Keyword args for mean labels customization.

dict(fontsize=7, color='black', bbox=dict(boxstyle='round', facecolor='#fefae0', alpha=0.7), zorder=50)
mean_line_kws dict | None

Keyword arguments for the line connecting the mean point and the mean label.

dict(ls='--', lw=0.6, color='black')
ax Axes | None

Existing Axes to plot on. If None, uses current Axes.

None

Returns:

Type Description
Figure

A matplotlib Figure.


Examples

# mkdocs: render
from fleur import BetweenStats
from fleur import data

df = data.load_iris()

BetweenStats(df["sepal_length"], df["species"]).plot()
# mkdocs: render
from fleur import BetweenStats
from fleur import data

df = data.load_iris()

BetweenStats(df["sepal_length"], df["species"]).plot(
   colors=["#005f73", "#ee9b00", "#9b2226"]
)
# mkdocs: render
from fleur import BetweenStats
from fleur import data

df = data.load_iris()

BetweenStats(df["sepal_length"], df["species"]).plot(
   orientation="horizontal"
)
# mkdocs: render
from fleur import BetweenStats
from fleur import data

df = data.load_iris()

BetweenStats(df["sepal_length"], df["species"]).plot(
  show_stats=False
)
# mkdocs: render
from fleur import BetweenStats
from fleur import data

df = data.load_iris()

BetweenStats(df["sepal_length"], df["species"]).plot(
  box=False,
  scatter=False,
  violin=True, # default
)



Statistical details

✅ means it's already implemented in fleur.

❌ means it's not implemented in fleur yet.

Comparing 2 groups

Independent samples

The choice of test depends on whether we assume the data are normally distributed. Often, not assuming normality is more realistic, but it also reduces the power of the test (the probability of detecting an effect when that effect actually exists).

Here we assume the data are normally distributed (the parametric approach).

  • Equal variance: if the groups have equal variances, use an independent (Student's) t-test.
  • Unequal variance: if the groups have unequal variances, use Welch's t-test.
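To make the difference between the two variants concrete, here is a minimal standard-library sketch of both statistics and their degrees of freedom. The function names student_t and welch_t are illustrative, not fleur's API. In practice one would normally call scipy.stats.ttest_ind, where equal_var=False selects Welch's test and also returns a p-value.

```python
import math
import statistics


def student_t(a, b):
    """Pooled-variance t statistic and degrees of freedom (equal variances)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    diff = statistics.fmean(a) - statistics.fmean(b)
    return diff / math.sqrt(pooled * (1 / na + 1 / nb)), na + nb - 2


def welch_t(a, b):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    sa, sb = statistics.variance(a) / na, statistics.variance(b) / nb
    diff = statistics.fmean(a) - statistics.fmean(b)
    dof = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return diff / math.sqrt(sa + sb), dof


a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]
print(student_t(a, b))
print(welch_t(a, b))
```

With equal group sizes, as here, the two t statistics coincide and only the degrees of freedom differ; Welch's correction lowers them when the variances are unequal.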

Here we make no assumption about the distribution (the nonparametric approach) and use the Mann-Whitney U test.

Note that the Mann-Whitney U test compares distributions, not means. This makes sense: when normality is not assumed (for example, with skewed distributions), the mean is not the best summary for comparing groups, and comparing groups is the ultimate goal.
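The rank-based nature of the test is easier to see in code. This is a minimal sketch of the U statistic only (no p-value), using average ranks for ties; the function name mann_whitney_u is illustrative. SciPy provides scipy.stats.mannwhitneyu for real use.

```python
def mann_whitney_u(a, b):
    """Return (U_a, U_b) for two independent samples."""
    pooled = list(a) + list(b)
    ordered = sorted(pooled)
    # Average rank: first 1-based position plus half the number of extra ties.
    ranks = [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in pooled]
    rank_sum_a = sum(ranks[: len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


# Every value in the first sample is below every value in the second,
# so U for the first sample is 0 and U for the second is len(a) * len(b).
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

Only the ordering of the observations enters the statistic, which is why the test says nothing directly about means.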

Here we take the robust approach and use Yuen's t-test, which compares trimmed means and is therefore less sensitive to outliers and heavy tails.
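Yuen's test is built on trimmed means. The sketch below shows only that building block, not the full test; trimmed_mean and the 20% trimming proportion are assumptions chosen for illustration. Recent SciPy versions expose a trim argument on scipy.stats.ttest_ind for the complete test.

```python
import statistics


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of the values."""
    ordered = sorted(values)
    cut = int(proportion * len(ordered))  # number of values dropped per tail
    return statistics.fmean(ordered[cut: len(ordered) - cut])


sample = [2, 3, 3, 4, 4, 5, 5, 6, 6, 60]  # one extreme value
print(statistics.fmean(sample))  # plain mean, pulled up by the outlier
print(trimmed_mean(sample))      # trimmed mean, close to the bulk of the data
```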

Dependent (paired) samples

Here we assume the paired differences are normally distributed (the parametric approach) and use a paired t-test.
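A paired t-test is a one-sample t-test on the per-observation differences. Here is a minimal sketch of the statistic and its degrees of freedom; paired_t is an illustrative name, and scipy.stats.ttest_rel is the function one would normally call to also get a p-value.

```python
import math
import statistics


def paired_t(before, after):
    """t statistic and degrees of freedom for paired observations."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [x_after - x_before for x_before, x_after in zip(before, after)]
    n = len(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / std_err, n - 1


before = [10, 12, 9, 11]
after = [12, 15, 10, 14]
print(paired_t(before, after))
```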

Here we make no assumption about the distribution (the nonparametric approach) and use the Wilcoxon signed-rank test.
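The signed-rank statistic can be sketched in a few lines. This version drops zero differences and reports the smaller of the positive and negative rank sums, which is a common convention for the two-sided test; wilcoxon_signed_rank is an illustrative name, and scipy.stats.wilcoxon is what the kwargs above refer to.

```python
def wilcoxon_signed_rank(x, y):
    """Smaller of the positive and negative rank sums of the paired differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    abs_diffs = [abs(d) for d in diffs]
    ordered = sorted(abs_diffs)
    # Average ranks of the absolute differences (ties share a rank).
    ranks = [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in abs_diffs]
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)


x = [12, 15, 10, 14, 9]
y = [10, 12, 11, 11, 9]
print(wilcoxon_signed_rank(x, y))
```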

Here we take the robust approach and use Yuen's t-test for dependent samples, which compares trimmed means.

Comparing 3 or more groups

Independent samples

Again, there are parametric and non-parametric approaches depending on the assumption of normality. When normality is assumed, these tests compare group means; otherwise, they compare distributions more generally.

  • Equal variance: if the groups have equal variances and normal distributions, use one-way ANOVA.
  • Unequal variance: if the groups have unequal variances, use Welch’s ANOVA.
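For the equal-variance case, the F statistic and the two degrees of freedom (dof_between and dof_within in the attribute list) can be sketched as follows. The function name one_way_anova is illustrative; scipy.stats.f_oneway computes the same statistic together with a p-value. Welch's ANOVA is not covered by this sketch.

```python
import statistics


def one_way_anova(*groups):
    """Return (F, dof_between, dof_within) for independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.fmean([v for g in groups for v in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (v - statistics.fmean(g)) ** 2 for g in groups for v in g
    )
    dof_between, dof_within = k - 1, n_total - k
    f = (ss_between / dof_between) / (ss_within / dof_within)
    return f, dof_between, dof_within


print(one_way_anova([1, 2, 3], [2, 3, 4], [5, 6, 7]))
```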

Use the Kruskal-Wallis test, which does not assume normality and compares the overall distributions across groups.
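As a rough illustration, the Kruskal-Wallis H statistic replaces the observations with their ranks in the pooled sample. The sketch below omits the tie correction that full implementations such as scipy.stats.kruskal apply, and the function name kruskal_wallis_h is illustrative.

```python
def kruskal_wallis_h(*groups):
    """H statistic from pooled ranks (no tie correction)."""
    pooled = [v for g in groups for v in g]
    ordered = sorted(pooled)
    # Average rank of each distinct value in the pooled sample.
    rank_of = {
        v: ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in set(pooled)
    }
    n = len(pooled)
    total = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```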

TODO

Dependent (repeated measures) samples

Assuming normality, use repeated measures ANOVA to compare means across related groups.
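A minimal sketch of the one-way repeated measures F statistic follows, assuming each argument holds one condition and the same position in every list belongs to the same subject. The function name repeated_measures_anova is illustrative; the sketch does not compute a p-value or check sphericity.

```python
import statistics


def repeated_measures_anova(*conditions):
    """Return (F, dof_conditions, dof_error) for related groups."""
    k, n = len(conditions), len(conditions[0])
    if any(len(c) != n for c in conditions):
        raise ValueError("every condition needs one value per subject")
    values = [v for c in conditions for v in c]
    grand = statistics.fmean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_cond = n * sum((statistics.fmean(c) - grand) ** 2 for c in conditions)
    # Between-subject variation is removed from the error term.
    ss_subj = k * sum(
        (statistics.fmean(row) - grand) ** 2 for row in zip(*conditions)
    )
    ss_error = ss_total - ss_cond - ss_subj
    dof_cond, dof_error = k - 1, (k - 1) * (n - 1)
    return (ss_cond / dof_cond) / (ss_error / dof_error), dof_cond, dof_error


print(repeated_measures_anova([1, 2, 3], [2, 4, 4], [4, 5, 7]))
```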

If normality is not assumed, use the Friedman test, which compares distributions across related groups without assuming normality.
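The Friedman statistic ranks the conditions within each subject and then compares the rank sums. The sketch below assumes one list per condition with subjects aligned by position, and omits the tie correction; friedman_statistic is an illustrative name, and scipy.stats.friedmanchisquare is the usual SciPy entry point.

```python
def friedman_statistic(*conditions):
    """Friedman chi-square statistic from within-subject ranks (no tie correction)."""
    k, n = len(conditions), len(conditions[0])
    if any(len(c) != n for c in conditions):
        raise ValueError("every condition needs one value per subject")
    rank_sums = [0.0] * k
    for row in zip(*conditions):  # one row per subject
        ordered = sorted(row)
        for j, v in enumerate(row):
            # Average rank of this condition within the subject's row.
            rank_sums[j] += ordered.index(v) + 1 + (ordered.count(v) - 1) / 2
    return 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)


print(friedman_statistic([1, 2, 1, 3], [2, 3, 3, 4], [3, 4, 2, 5]))
```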

TODO