
Between stats

State

| Type | No. of groups | Test | Effect size | Function used | Implemented |
|---|---|---|---|---|---|
| Parametric | 2 | Student/Welch | Cohen's d/Hedges' g | Test: scipy.stats.ttest_ind | |
| Non-parametric | 2 | Mann-Whitney U | r (rank-biserial correlation) | Test: scipy.stats.mannwhitneyu | |
| Robust | 2 | Yuen | Algina-Keselman-Penfield | Test: scipy.stats.ttest_ind | |
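
As a rough illustration of the effect sizes and the robust test listed in the table, the sketch below computes Cohen's d and Hedges' g by hand with NumPy and runs Yuen's trimmed test through the trim argument of scipy.stats.ttest_ind. The helper names cohens_d and hedges_g, the 20% trimming level, and the synthetic data are assumptions made for this example; they are not taken from fleur itself.

```python
import numpy as np
from scipy import stats


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation (hypothetical helper)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    pooled_var = (
        (n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)
    ) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def hedges_g(a, b):
    """Cohen's d with the usual approximate small-sample correction."""
    df = len(a) + len(b) - 2
    return cohens_d(a, b) * (1 - 3 / (4 * df - 1))


# Synthetic data, for illustration only
rng = np.random.default_rng(1)
group_a = rng.normal(loc=5.0, scale=1.0, size=25)
group_b = rng.normal(loc=6.0, scale=1.0, size=25)

d = cohens_d(group_a, group_b)
g = hedges_g(group_a, group_b)

# Yuen's test: trim 20% of the observations from each tail before comparing
yuen = stats.ttest_ind(group_a, group_b, equal_var=False, trim=0.2)

print(f"Cohen's d = {d:.3f}, Hedges' g = {g:.3f}")
print(f"Yuen: t = {yuen.statistic:.3f}, p = {yuen.pvalue:.4f}")
```

Hedges' g is always slightly closer to zero than Cohen's d, and the difference shrinks as the sample size grows.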

Reference

fleur.betweenstats.BetweenStats

Statistical comparison and plotting class for between-group analysis.

This class provides functionality to visualize and statistically compare numerical data across two or more categorical groups. It supports t-tests for two groups and one-way ANOVA for three or more groups. Visualization options include violin plots, box plots, and swarm plots.

Attributes:

| Name | Type | Description |
|---|---|---|
| statistic | float | The computed test statistic (t or F). |
| pvalue | float | The p-value of the statistical test. |
| main_stat | str | The formatted test statistic string for display. |
| expression | str | Full LaTeX-style annotation string. |
| is_ANOVA | bool | True if test is ANOVA, False if t-test. |
| is_paired | bool | Whether a paired test was used. |
| dof | int | Degrees of freedom for t-tests. |
| dof_between | int | Between-group degrees of freedom (for ANOVA). |
| dof_within | int | Within-group degrees of freedom (for ANOVA). |
| n_cat | int | Number of unique categories in the group column. |
| n_obs | int | Total number of observations. |
| ax | Axes | The matplotlib axes used for plotting. |

__init__(x, y, data=None, paired=False, **kwargs)

Initialize a BetweenStats() instance.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| x | Union[str, SeriesT, Iterable] | Column name in data, or a Series or array-like. | required |
| y | Union[str, SeriesT, Iterable] | Column name in data, or a Series or array-like. | required |
| data | Optional[Frame] | An optional dataframe. | None |
| paired | bool | If True, perform a paired t-test (only for 2 groups). | False |
| kwargs | | Additional arguments passed to the scipy test function: one of scipy.stats.ttest_rel(), scipy.stats.ttest_ind(), or scipy.stats.f_oneway(). | {} |

plot(*, orientation='vertical', colors=None, show_stats=True, violin=True, box=True, scatter=True, violin_kws=None, box_kws=None, scatter_kws=None, ax=None)

Fit the statistical test to the data and render a statistical comparison plot. The method detects how many groups there are and applies the appropriate test for that number. All arguments must be passed as keyword arguments.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| orientation | str | 'vertical' or 'horizontal' orientation of plots. | 'vertical' |
| colors | Optional[list] | List of colors for each group. | None |
| show_stats | bool | If True, display statistics on the plot. | True |
| violin | bool | Whether to include violin plot. | True |
| box | bool | Whether to include box plot. | True |
| scatter | bool | Whether to include scatter plot of raw data. | True |
| violin_kws | Union[dict, None] | Keyword args for violinplot customization. | None |
| box_kws | Union[dict, None] | Keyword args for boxplot customization. | None |
| scatter_kws | Union[dict, None] | Keyword args for scatter plot customization. | None |
| ax | Optional[Axes] | Existing Axes to plot on. If None, uses current Axes. | None |

Returns:

| Type | Description |
|---|---|
| Figure | A matplotlib Figure. |

summary()

Print a text summary of the statistical test performed.

Displays the type of test conducted (t-test or ANOVA), number of groups, and the formatted test statistic with p-value and sample size.


Examples

  • Minimalist example
# mkdocs: render
from fleur import BetweenStats
from fleur import datasets

df = datasets.load_iris()

BetweenStats(df["sepal_length"], df["species"]).plot()


  • Change colors
# mkdocs: render
from fleur import BetweenStats
from fleur import datasets

df = datasets.load_iris()

BetweenStats(df["sepal_length"], df["species"]).plot(
  colors=["#005f73", "#ee9b00", "#9b2226"]
)


  • Change orientation
# mkdocs: render
from fleur import BetweenStats
from fleur import datasets

df = datasets.load_iris()

BetweenStats(df["sepal_length"], df["species"]).plot(
  orientation="horizontal"
)


  • Remove elements
# mkdocs: render
from fleur import BetweenStats
from fleur import datasets

df = datasets.load_iris()

BetweenStats(df["sepal_length"], df["species"]).plot(
  box=False,
  scatter=False,
)


  • Hide statistics
# mkdocs: render
from fleur import BetweenStats
from fleur import datasets

df = datasets.load_iris()

BetweenStats(df["sepal_length"], df["species"]).plot(show_stats=False)


  • Print summary statistics
# mkdocs: render
from fleur import BetweenStats
from fleur import datasets

df = datasets.load_iris()

BetweenStats(df["sepal_length"], df["species"]).summary()
Between stats comparison

Test: One-way ANOVA with 3 groups
F(2, 147) = 119.26, p = 0.0000, n_obs = 150



Statistical details

When trying to compare groups, you should first answer the following questions:

  • Number of groups: there are two cases, 2 groups or 3 or more groups.
  • Independence of samples: do the groups being compared contain the same individuals?
    • Paired groups: comparing the same people before and after giving them a drug
    • Independent groups: comparing a placebo group and a treatment group
  • Data distribution:
    • Normal distribution: we use parametric tests (which rely on a known statistical distribution)
      • Equality of variance: in parametric tests, we need to know whether the variance is the same in each group
    • Non-normal distribution: we use non-parametric tests (which don't assume any particular distribution)
  • Sample size: a sample that is too small (n < 30 is a common rule of thumb) can be an issue because the test lacks statistical power
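
The distribution and variance questions above can be explored with standard SciPy checks. The sketch below is one possible approach, not something fleur is documented to do: it uses scipy.stats.shapiro for normality and scipy.stats.levene for equality of variance on synthetic data generated only for this example.

```python
import numpy as np
from scipy import stats

# Two hypothetical independent groups (synthetic data, for illustration only)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=5.0, scale=1.0, size=40)
group_b = rng.normal(loc=5.5, scale=1.0, size=40)

# Shapiro-Wilk: the null hypothesis is that the sample is normally distributed
shapiro_a = stats.shapiro(group_a)
shapiro_b = stats.shapiro(group_b)

# Levene: the null hypothesis is that the groups have equal variances
levene = stats.levene(group_a, group_b)

print(f"Shapiro A p = {shapiro_a.pvalue:.3f}, Shapiro B p = {shapiro_b.pvalue:.3f}")
print(f"Levene p = {levene.pvalue:.3f}")
```

A small p-value in either check is evidence against the corresponding assumption. A large p-value does not prove that the assumption holds, especially with small samples.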

Comparing 2 groups

Independent samples

There are 2 cases here, depending on whether we assume the data distribution is normal. Often, not assuming normality is more realistic, but it can also reduce the power of the test (the probability of detecting a given effect if that effect actually exists).

Parametric: here we assume the data distribution is normal.

  • Equal variance: if the groups have equal variances: independent t-test.
  • Unequal variance: if the groups have unequal variances: Welch's t-test.

In both cases, we use the scipy.stats.ttest_ind() function.
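
A minimal sketch of the two variants, assuming synthetic data with deliberately unequal variances and sample sizes: the equal_var argument of scipy.stats.ttest_ind switches between Student's t-test and Welch's t-test.

```python
import numpy as np
from scipy import stats

# Synthetic groups with different spreads and sizes (illustration only)
rng = np.random.default_rng(42)
group_a = rng.normal(loc=10.0, scale=1.0, size=20)
group_b = rng.normal(loc=11.0, scale=3.0, size=40)

# equal_var=True: Student's independent t-test (pooled variance)
student = stats.ttest_ind(group_a, group_b, equal_var=True)

# equal_var=False: Welch's t-test (separate variance per group)
welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"Student: t = {student.statistic:.3f}, p = {student.pvalue:.4f}")
print(f"Welch:   t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
```

When the group variances and sizes differ, the two tests generally give different statistics and p-values, which is why the variance question matters.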

Non-parametric: here we don't assume anything about the distribution, and we use the Mann-Whitney U test.

Note that the Mann-Whitney U test compares distributions, not means. This makes sense: when normality is not assumed (for example, with skewed distributions), the mean is not the best summary of a group, so comparing means is not the best way to compare groups.

In this case, we use the scipy.stats.mannwhitneyu() function.
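
As an illustration, the sketch below runs scipy.stats.mannwhitneyu on skewed synthetic data and derives a rank-biserial correlation from the U statistic. The formula r = 2U / (n1 * n2) - 1 is one common sign convention and is an assumption here; whether fleur uses it is not stated in this document.

```python
import numpy as np
from scipy import stats

# Skewed synthetic data (exponential), for illustration only
rng = np.random.default_rng(3)
group_a = rng.exponential(scale=1.0, size=30)
group_b = rng.exponential(scale=2.0, size=35)

result = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

# Rank-biserial correlation from U (assumed convention: positive when
# values in group_a tend to be larger than values in group_b)
n_a, n_b = len(group_a), len(group_b)
rank_biserial = 2 * result.statistic / (n_a * n_b) - 1

print(f"U = {result.statistic:.1f}, p = {result.pvalue:.4f}, r = {rank_biserial:.3f}")
```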

Dependent (paired) samples

Parametric: here we assume the data distribution is normal (more precisely, that the paired differences are normally distributed), and we use a paired t-test.
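
A minimal sketch of a paired t-test with scipy.stats.ttest_rel, using synthetic before/after measurements invented for this example. The test is equivalent to a one-sample t-test on the paired differences.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements on the same 25 people (synthetic data)
rng = np.random.default_rng(5)
before = rng.normal(loc=120.0, scale=10.0, size=25)
after = before - rng.normal(loc=5.0, scale=4.0, size=25)

# Each value in `after` is paired with the value at the same index in `before`
paired = stats.ttest_rel(after, before)

print(f"t = {paired.statistic:.3f}, p = {paired.pvalue:.4f}")
```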

Non-parametric: here we don't assume anything about the distribution, and we use the Wilcoxon signed-rank test.
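
For paired data without a normality assumption, SciPy provides scipy.stats.wilcoxon. The sketch below uses synthetic paired data and only shows how the test can be run; the documentation above does not say that fleur calls this function.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements on the same 25 people (synthetic data)
rng = np.random.default_rng(8)
before = rng.exponential(scale=10.0, size=25)
after = before + rng.normal(loc=1.0, scale=2.0, size=25)

# The test is computed on the paired differences after - before
signed_rank = stats.wilcoxon(after, before)

print(f"W = {signed_rank.statistic:.1f}, p = {signed_rank.pvalue:.4f}")
```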

Comparing 3 or more groups

Independent samples

Again, there are parametric and non-parametric approaches depending on the assumption of normality. When normality is assumed, these tests compare group means; otherwise, they compare distributions more generally.

  • Equal variance: if the groups have equal variances and normal distributions, use one-way ANOVA.
  • Unequal variance: if the groups have unequal variances, use Welch’s ANOVA.
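
For the equal-variance case, a one-way ANOVA can be sketched with scipy.stats.f_oneway. The data below is synthetic, and the variable names dof_between and dof_within are chosen to mirror the attributes documented above; how fleur computes them internally is not shown here.

```python
import numpy as np
from scipy import stats

# Three synthetic groups of 30 observations each (illustration only)
rng = np.random.default_rng(7)
groups = [
    rng.normal(loc=5.0, scale=0.5, size=30),
    rng.normal(loc=5.9, scale=0.5, size=30),
    rng.normal(loc=6.6, scale=0.5, size=30),
]

result = stats.f_oneway(*groups)

# Degrees of freedom for a one-way ANOVA with k groups and N observations
n_obs = sum(len(g) for g in groups)
dof_between = len(groups) - 1      # k - 1
dof_within = n_obs - len(groups)   # N - k

print(f"F({dof_between}, {dof_within}) = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```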

Non-parametric: use the Kruskal-Wallis test, which does not assume normality and compares the overall distributions across groups.
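
A minimal sketch of the Kruskal-Wallis test with scipy.stats.kruskal on skewed synthetic data. This illustrates the test itself and is not a description of fleur's implementation.

```python
import numpy as np
from scipy import stats

# Three skewed synthetic groups (illustration only)
rng = np.random.default_rng(11)
group_a = rng.exponential(scale=1.0, size=30)
group_b = rng.exponential(scale=1.5, size=30)
group_c = rng.exponential(scale=3.0, size=30)

kw = stats.kruskal(group_a, group_b, group_c)

print(f"H = {kw.statistic:.3f}, p = {kw.pvalue:.4f}")
```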

Dependent (repeated measures) samples

Assuming normality, use repeated measures ANOVA to compare means across related groups.

If normality is not assumed, use the Friedman test, which compares distributions across related groups.
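
The Friedman test is available in SciPy as scipy.stats.friedmanchisquare. The sketch below assumes 20 hypothetical subjects each measured under three conditions, with synthetic data; each array holds one condition, and values at the same index belong to the same subject.

```python
import numpy as np
from scipy import stats

# 20 hypothetical subjects measured under 3 conditions (synthetic data)
rng = np.random.default_rng(13)
baseline = rng.exponential(scale=5.0, size=20)
condition_1 = baseline + rng.normal(loc=0.5, scale=1.0, size=20)
condition_2 = baseline + rng.normal(loc=1.5, scale=1.0, size=20)

friedman = stats.friedmanchisquare(baseline, condition_1, condition_2)

print(f"chi2 = {friedman.statistic:.3f}, p = {friedman.pvalue:.4f}")
```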