Between stats
fleur.betweenstats.BetweenStats
Statistical comparison and plotting class for between-group analysis.
This class provides functionality to visualize and statistically compare numerical data across two or more categorical groups. It supports t-tests for two groups and one-way ANOVA for three or more groups. Visualization options include violin plots, box plots, and swarm plots.
Attributes:
Name | Type | Description |
---|---|---|
statistic | float | The computed test statistic (t or F). |
pvalue | float | The p-value of the statistical test. |
main_stat | str | The formatted test statistic string for display. |
expression | str | Full LaTeX-style annotation string. |
is_ANOVA | bool | True if test is ANOVA, False if t-test. |
is_paired | bool | Whether a paired test was used. |
dof | int | Degrees of freedom for t-tests. |
dof_between | int | Between-group degrees of freedom for ANOVA. |
dof_within | int | Within-group degrees of freedom for ANOVA. |
n_cat | int | Number of unique categories in the group column. |
n_obs | int | Total number of observations. |
means | list | A list with means. |
test_output | list | The output of the statistical test. |
ax | Axes | The matplotlib axes used for plotting. |
__init__(x, y, data=None, paired=False, approach='parametric', **kwargs)
Initialize a BetweenStats() instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | str \| SeriesT \| Iterable | Colname of | required |
y | str \| SeriesT \| Iterable | Colname of | required |
data | Frame \| None | An optional dataframe used if | None |
paired | bool | Whether comparing the same observations or not. | False |
approach | str | A character specifying the type of statistical approach: "parametric" (default), "nonparametric", "robust", "bayes". | 'parametric' |
kwargs | Any | Additional arguments passed to the scipy test function. Either | {} |
plot(*, orientation='vertical', colors=None, show_stats=True, show_means=True, jitter_amount=0.25, violin=True, box=True, scatter=True, violin_kws=None, box_kws=None, scatter_kws=None, mean_kws=dict(fontsize=7, color='black', bbox=dict(boxstyle='round', facecolor='#fefae0', alpha=0.7), zorder=50), mean_line_kws=dict(ls='--', lw=0.6, color='black'), ax=None)
Fit the BetweenStats class to the data and render a statistical comparison plot. It detects how many groups you have and applies the appropriate test for that number. All arguments must be passed as keyword arguments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
orientation | str | 'vertical' or 'horizontal' orientation of plots. | 'vertical' |
colors | list \| None | List of colors for each group. | None |
show_stats | bool | If True, adds statistics on the plot. | True |
show_means | bool | If True, adds mean labels on the plot. | True |
jitter_amount | float | Controls the horizontal spread of dots to prevent overlap; 0 aligns them, higher values increase spacing. | 0.25 |
violin | bool | Whether to include violin plot. | True |
box | bool | Whether to include box plot. | True |
scatter | bool | Whether to include scatter plot of raw data. | True |
violin_kws | dict \| None | Keyword args for violinplot customization. | None |
box_kws | dict \| None | Keyword args for boxplot customization. | None |
scatter_kws | dict \| None | Keyword args for scatter plot customization. | None |
mean_kws | dict \| None | Keyword args for mean labels customization. | dict(fontsize=7, color='black', bbox=dict(boxstyle='round', facecolor='#fefae0', alpha=0.7), zorder=50) |
mean_line_kws | dict \| None | Keyword arguments for the line connecting the mean point and the mean label. | dict(ls='--', lw=0.6, color='black') |
ax | Axes \| None | Existing Axes to plot on. If None, uses current Axes. | None |
Returns:
Type | Description |
---|---|
Figure | A matplotlib Figure. |
Examples
Statistical details
✅ means it's already implemented in fleur.
❌ means it's not implemented in fleur yet.
Comparing 2 groups
Independent samples
There are two cases here: whether or not we assume the data are normally distributed. Often, not assuming normality is more realistic, but it also reduces the power of the test (the probability of detecting an effect that actually exists).
Parametric: here we assume the data are normally distributed.
- Equal variance: if the groups have equal variances, use the independent (Student's) t-test.
- Unequal variance: if the groups have unequal variances, use Welch's t-test.
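The difference between the two variants above can be made concrete with a small hand computation. This is a minimal sketch using only the standard library, not fleur's implementation; the function names and sample values are made up for illustration. (In SciPy, the same choice is exposed through the `equal_var` argument of `scipy.stats.ttest_ind`.)

```python
import math
import statistics


def student_t(a, b):
    """Student's t statistic and degrees of freedom (pooled variance)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    qa = statistics.variance(a) / na  # squared standard error of mean(a)
    qb = statistics.variance(b) / nb  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(qa + qb)
    dof = (qa + qb) ** 2 / (qa ** 2 / (na - 1) + qb ** 2 / (nb - 1))
    return t, dof


# Illustrative data only.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [6.2, 6.8, 7.1, 6.5, 7.4]

print(student_t(group_a, group_b))
print(welch_t(group_a, group_b))
```

With equal group sizes the two statistics coincide and only the degrees of freedom differ; with unequal sizes and unequal variances they diverge, which is when Welch's version matters.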
Non-parametric: here we make no assumption about the distribution and use the Mann-Whitney U test.
Note that the Mann-Whitney U test compares distributions, not means. This makes sense: when normality is not assumed (e.g. with skewed distributions), comparing means is not the best way to compare groups, which is the ultimate goal.
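To see why this test is about ordering rather than means, the U statistic can be computed by counting pairs. The sketch below is illustrative only (the helper name and data are hypothetical, and no p-value is computed).

```python
def mann_whitney_u(a, b):
    """U statistic for `a`: number of (x, y) pairs with x > y; ties count 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


low = [1, 2, 3]
high = [4, 5, 6]
high_with_outlier = [4, 5, 600]  # very different mean, same ordering

print(mann_whitney_u(low, high), mann_whitney_u(high, low))  # 0.0 9.0
print(mann_whitney_u(low, high_with_outlier))                # 0.0
```

Replacing 6 with 600 changes the group mean drastically but leaves U untouched, because only the relative order of the observations enters the statistic.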
Robust: here we make no assumption about the distribution and use Yuen's t-test, which compares trimmed means.
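As a rough sketch of what Yuen's test computes for independent samples, the code below uses trimmed means and winsorized variances. The 20% trimming level, the function names and the data are assumptions for illustration; this is not fleur's code. (SciPy exposes a comparable option through the `trim` argument of `scipy.stats.ttest_ind`.)

```python
import math


def trimmed_stats(x, trim=0.2):
    """Return (trimmed mean, squared standard error, effective size h)."""
    xs = sorted(x)
    n = len(xs)
    g = int(trim * n)              # observations cut from each tail
    h = n - 2 * g                  # observations kept
    kept = xs[g:n - g]
    tmean = sum(kept) / h
    # Winsorize: replace each trimmed value by the nearest kept value.
    wins = [xs[g]] * g + kept + [xs[n - g - 1]] * g
    wmean = sum(wins) / n
    wvar = sum((v - wmean) ** 2 for v in wins) / (n - 1)
    d = (n - 1) * wvar / (h * (h - 1))
    return tmean, d, h


def yuen_t(a, b, trim=0.2):
    """Yuen's t statistic and degrees of freedom for independent samples."""
    ta, da, ha = trimmed_stats(a, trim)
    tb, db, hb = trimmed_stats(b, trim)
    t = (ta - tb) / math.sqrt(da + db)
    dof = (da + db) ** 2 / (da ** 2 / (ha - 1) + db ** 2 / (hb - 1))
    return t, dof


sample_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # one extreme outlier
sample_b = [2, 4, 5, 6, 7, 8, 9, 10, 11, 12]

print(trimmed_stats(sample_a)[0])  # trimmed mean ignores the outlier
print(yuen_t(sample_a, sample_b))
```

With `trim=0` nothing is trimmed and the statistic reduces to Welch's t, which is a convenient sanity check.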
Dependent (paired) samples
Parametric: here we assume the data are normally distributed and use a paired t-test.
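A paired t-test is a one-sample t-test on the per-subject differences. The following minimal sketch (hypothetical function name and data, standard library only) shows the computation of the statistic and its degrees of freedom.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic and degrees of freedom from matched measurements."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of mean diff
    return statistics.mean(diffs) / se, n - 1


# Illustrative measurements on the same five subjects.
before = [10, 12, 9, 11, 13]
after = [12, 13, 11, 12, 16]

t_stat, dof = paired_t(before, after)
print(round(t_stat, 3), dof)
```

Because each subject serves as its own control, between-subject variability drops out of the standard error, which is why pairing must be declared rather than inferred.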
Non-parametric: here we make no assumption about the distribution and use the Wilcoxon signed-rank test.
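The signed-rank idea can be sketched as follows: rank the absolute paired differences, then sum the ranks separately for positive and negative differences. The helper names and data are illustrative assumptions; zero differences are simply dropped here, and no p-value is computed.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]


def wilcoxon_signed_rank(x, y):
    """Return (W+, W-): rank sums of positive and negative differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


after = [12, 13, 11, 12, 16, 9]
before = [10, 12, 9, 11, 13, 10]

print(wilcoxon_signed_rank(after, before))  # (19.0, 2.0)
```

The two sums always add up to n(n + 1)/2 for n non-zero differences; a strong imbalance between them suggests a systematic shift between the paired measurements.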
Robust: here we make no assumption about the distribution and use Yuen's t-test for dependent samples.
Comparing 3 or more groups
Independent samples
Again, there are parametric and non-parametric approaches depending on the assumption of normality. When normality is assumed, these tests compare group means; otherwise, they compare distributions more generally.
- Equal variance: if the groups have equal variances and normal distributions, use one-way ANOVA.
- Unequal variance: if the groups have unequal variances, use Welch’s ANOVA.
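For the equal-variance case, the F statistic and the two degrees of freedom can be computed by hand as below. This is a sketch under the classic one-way ANOVA assumptions, not fleur's code; the function name and data are hypothetical, and Welch's ANOVA (which uses a different, weighted formula) is not covered.

```python
import statistics


def one_way_anova(*groups):
    """Return (F, dof_between, dof_within) for a classic one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [statistics.mean(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    dof_between, dof_within = k - 1, n - k
    f = (ss_between / dof_between) / (ss_within / dof_within)
    return f, dof_between, dof_within


f_stat, dof_b, dof_w = one_way_anova([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(round(f_stat, 3), dof_b, dof_w)
```

The two degrees of freedom returned here play the same role as the `dof_between` and `dof_within` attributes described in the attribute table.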
Use the Kruskal-Wallis test, which does not assume normality and compares the overall distributions across groups.
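The Kruskal-Wallis H statistic replaces the observations with their ranks in the pooled sample. The sketch below is illustrative (hypothetical names and data) and omits the tie-correction factor that full implementations apply.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]


def kruskal_wallis_h(*groups):
    """H statistic (no tie correction) from the rank sums of each group."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


h_stat = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h_stat, 3))
```

Under the null hypothesis, H is approximately chi-squared distributed with k - 1 degrees of freedom for k groups, which is how a p-value would be obtained.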
TODO
Dependent (repeated measures) samples
Assuming normality, use repeated measures ANOVA to compare means across related groups.
If normality is not assumed, use the Friedman test, which compares distributions across related groups.
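The Friedman statistic ranks the conditions within each subject and then compares the rank sums across conditions. A minimal sketch, with hypothetical names and data and without tie correction:

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]


def friedman_q(rows):
    """Friedman Q: each row holds one subject's values under k conditions."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)


# Three subjects, three conditions; every subject shows the same ordering.
measurements = [
    [1, 2, 3],
    [2, 3, 5],
    [1, 4, 6],
]
q_stat = friedman_q(measurements)
print(round(q_stat, 3))
```

Ranking within each row removes differences in overall level between subjects, which is the repeated-measures counterpart of what Kruskal-Wallis does for independent groups.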
TODO