# Bonferroni correction

In statistics, the Bonferroni correction is one of several methods used to counteract the problem of multiple comparisons.

## Background

The Bonferroni correction is named after Italian mathematician Carlo Emilio Bonferroni for its use of Bonferroni inequalities,[1] but modern usage is often credited to Olive Jean Dunn, who described the procedure's application to confidence intervals.[2][3]

Statistical hypothesis testing is based on rejecting the null hypothesis when the likelihood of the observed data under the null hypothesis is low. If multiple comparisons are made or multiple hypotheses are tested, the chance of observing a rare event increases, and therefore the likelihood of incorrectly rejecting a null hypothesis (i.e., making a Type I error) increases.[4]

The Bonferroni correction[5] compensates for that increase by testing each individual hypothesis at a significance level of ${\displaystyle \alpha /m}$, where ${\displaystyle \alpha }$ is the desired overall alpha level and ${\displaystyle m}$ is the number of hypotheses.[6] For example, if a trial is testing ${\displaystyle m=20}$ hypotheses with a desired overall ${\displaystyle \alpha =0.05}$, then the Bonferroni correction would test each individual hypothesis at ${\displaystyle \alpha /m=0.05/20=0.0025}$.[7]
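The arithmetic in this example can be sketched in a few lines of Python. The function names are illustrative, and the family-wise error figures rest on two added assumptions that the text does not make: the tests are independent and every null hypothesis is true.

```python
def per_test_level(alpha: float, m: int) -> float:
    """Bonferroni per-test significance level alpha / m."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return alpha / m


def fwer_if_independent(level: float, m: int) -> float:
    """P(at least one false rejection) for m true nulls tested at `level`.

    Assumes independent tests; this is an illustration, not a general bound.
    """
    return 1.0 - (1.0 - level) ** m


alpha, m = 0.05, 20
corrected = per_test_level(alpha, m)                 # 0.05 / 20
uncorrected_fwer = fwer_if_independent(alpha, m)     # each test at 0.05
corrected_fwer = fwer_if_independent(corrected, m)   # each test at alpha / m

print(f"per-test level:          {corrected:.4f}")
print(f"FWER without correction: {uncorrected_fwer:.3f}")
print(f"FWER with correction:    {corrected_fwer:.3f}")
```

Under these assumptions the uncorrected chance of at least one false rejection is roughly 0.64, while the corrected one stays just below 0.05.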

## Definition

Let ${\displaystyle H_{1},\ldots ,H_{m}}$ be a family of hypotheses and ${\displaystyle p_{1},\ldots ,p_{m}}$ their corresponding p-values. Let ${\displaystyle m}$ be the total number of null hypotheses and ${\displaystyle m_{0}}$ the number of true null hypotheses. The familywise error rate (FWER) is the probability of rejecting at least one true ${\displaystyle H_{i}}$, that is, of making at least one Type I error. The Bonferroni correction rejects the null hypothesis for each ${\displaystyle p_{i}\leq {\frac {\alpha }{m}}}$, thereby controlling the FWER at ${\displaystyle \leq \alpha }$. Proof of this control follows from Boole's inequality, indexing the hypotheses so that the true nulls are ${\displaystyle H_{1},\ldots ,H_{m_{0}}}$:

${\displaystyle {\text{FWER}}=P\left\{\bigcup _{i=1}^{m_{0}}\left(p_{i}\leq {\frac {\alpha }{m}}\right)\right\}\leq \sum _{i=1}^{m_{0}}\left\{P\left(p_{i}\leq {\frac {\alpha }{m}}\right)\right\}\leq m_{0}{\frac {\alpha }{m}}\leq m{\frac {\alpha }{m}}=\alpha .}$

This control does not require any assumptions about dependence among the p-values or about how many of the null hypotheses are true.[8]
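The rejection rule in this definition can be sketched as follows. The function names and the example p-values are made up for illustration. The "adjusted p-value" form, min(1, m·p), is a common equivalent way of reporting the same decision and is not taken from the text above.

```python
from typing import List, Sequence


def bonferroni_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Reject H_i exactly when p_i <= alpha / m."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]


def bonferroni_adjust(p_values: Sequence[float]) -> List[float]:
    """Adjusted p-values min(1, m * p_i), to be compared against alpha directly."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical p-values for a family of m = 4 tests; threshold is 0.05 / 4 = 0.0125.
p_values = [0.001, 0.012, 0.03, 0.2]
decisions = bonferroni_reject(p_values, alpha=0.05)
adjusted = bonferroni_adjust(p_values)
print(decisions)
print(adjusted)
```

Here the first two hypotheses are rejected and the last two are not. Comparing the adjusted values against 0.05 gives the same decisions.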

## Extensions

### Generalization

Rather than testing each hypothesis at the ${\displaystyle \alpha /m}$ level, the hypotheses may be tested at any other combination of levels that add up to ${\displaystyle \alpha }$, provided that the level of each test is determined before looking at the data.[9] For example, for two hypothesis tests, an overall ${\displaystyle \alpha }$ of .05 could be maintained by conducting one test at .04 and the other at .01.
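A minimal sketch of this unequal allocation is shown below. The function name, the validation rules, and the example p-values are assumptions for illustration. The per-test levels are treated as fixed in advance, as the text requires.

```python
import math
from typing import List, Sequence


def allocated_bonferroni_reject(
    p_values: Sequence[float],
    levels: Sequence[float],
    alpha: float = 0.05,
) -> List[bool]:
    """Test each hypothesis at its own pre-specified level.

    The levels must be non-negative and must not add up to more than alpha.
    """
    if len(p_values) != len(levels):
        raise ValueError("p_values and levels must have the same length")
    if any(level < 0 for level in levels):
        raise ValueError("levels must be non-negative")
    if math.fsum(levels) > alpha + 1e-12:  # small tolerance for float rounding
        raise ValueError("levels must sum to at most alpha")
    return [p <= level for p, level in zip(p_values, levels)]


# Hypothetical p-values for two tests, using the .04 / .01 split from the text.
p_values = [0.03, 0.02]
unequal = allocated_bonferroni_reject(p_values, levels=[0.04, 0.01])
equal = allocated_bonferroni_reject(p_values, levels=[0.025, 0.025])
print(unequal)  # first test gets most of the alpha budget
print(equal)    # standard Bonferroni split
```

With these made-up p-values the two allocations reject different hypotheses, which is why the split has to be chosen before the data are seen.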

### Confidence intervals

The Bonferroni correction can be used to adjust confidence intervals. If one establishes ${\displaystyle m}$ confidence intervals, and wishes to have an overall confidence level of ${\displaystyle 1-\alpha }$, each individual confidence interval can be adjusted to the level of ${\displaystyle 1-{\frac {\alpha }{m}}}$.[10]
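As an illustration, the sketch below builds two-sided intervals at the adjusted level. It assumes normally distributed estimates with known standard errors. That assumption, the function names, and the sample numbers are all added here and do not come from the text. A two-sided interval at level 1 − α/m uses the 1 − α/(2m) quantile of the standard normal distribution.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple


def bonferroni_z_critical(m: int, alpha: float = 0.05) -> float:
    """Two-sided normal critical value for one of m simultaneous intervals."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return NormalDist().inv_cdf(1.0 - alpha / (2 * m))


def bonferroni_z_intervals(
    estimates: Sequence[float],
    std_errors: Sequence[float],
    alpha: float = 0.05,
) -> List[Tuple[float, float]]:
    """Intervals estimate +/- z * se, each at confidence level 1 - alpha / m."""
    if len(estimates) != len(std_errors):
        raise ValueError("estimates and std_errors must have the same length")
    z = bonferroni_z_critical(len(estimates), alpha)
    return [(est - z * se, est + z * se) for est, se in zip(estimates, std_errors)]


# Hypothetical estimates and standard errors for m = 2 parameters.
intervals = bonferroni_z_intervals([1.0, 2.0], [0.5, 0.5], alpha=0.05)
for low, high in intervals:
    print(f"({low:.3f}, {high:.3f})")
```

The adjusted intervals are wider than unadjusted 95% intervals, since the critical value grows with the number of intervals.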

## Alternatives

There are alternative ways to control the familywise error rate. For example, the Holm–Bonferroni method and the Šidák correction are universally more powerful procedures than the Bonferroni correction, meaning that they are always at least as powerful. Unlike the Bonferroni procedure, these methods do not control the expected number of Type I errors per family (the per-family Type I error rate).[11]
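For comparison, the two alternatives can be sketched next to the Bonferroni rule. The definitions below follow the usual textbook forms rather than anything in this article. Holm's method is a step-down procedure that compares the k-th smallest p-value with α/(m − k + 1). The Šidák correction tests each hypothesis at 1 − (1 − α)^(1/m), and its FWER guarantee is usually stated under independence. The function names and p-values are illustrative.

```python
from typing import List, Sequence


def holm_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down: stop at the first ordered p-value that fails its threshold."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):      # rank = 0 for the smallest p-value
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                           # all larger p-values are retained
    return reject


def sidak_level(alpha: float, m: int) -> float:
    """Per-test level of the Sidak correction."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# Hypothetical p-values for a family of m = 4 tests.
p_values = [0.01, 0.04, 0.02, 0.005]
alpha = 0.05
m = len(p_values)

bonferroni = [p <= alpha / m for p in p_values]
holm = holm_reject(p_values, alpha)
sidak = [p <= sidak_level(alpha, m) for p in p_values]

print("Bonferroni:", bonferroni)
print("Holm:      ", holm)
print("Sidak:     ", sidak)
```

In this example Holm's method rejects every hypothesis that Bonferroni rejects and two more. The Šidák level is only slightly larger than α/m, so its decisions match Bonferroni's here.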

## Criticism

With respect to FWER control, the Bonferroni correction can be conservative if there are a large number of tests and/or the test statistics are positively correlated.
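The effect of positive correlation can be illustrated with a small simulation of an extreme case, chosen here for simplicity: all m tests share one p-value (perfect positive dependence), compared with m independent tests. All null hypotheses are assumed true, so p-values are drawn uniformly on [0, 1]. The function name, seed, and trial count are arbitrary choices.

```python
import random


def simulated_fwer(m: int, alpha: float, n_trials: int, shared: bool, seed: int = 0) -> float:
    """Monte Carlo estimate of the FWER of Bonferroni when all nulls are true.

    shared=True  -> every test reuses a single uniform p-value per trial
    shared=False -> each test draws its own independent uniform p-value
    """
    rng = random.Random(seed)
    threshold = alpha / m
    false_alarms = 0
    for _ in range(n_trials):
        if shared:
            p_values = [rng.random()] * m
        else:
            p_values = [rng.random() for _ in range(m)]
        if min(p_values) <= threshold:   # at least one false rejection
            false_alarms += 1
    return false_alarms / n_trials


independent = simulated_fwer(m=10, alpha=0.05, n_trials=20000, shared=False)
identical = simulated_fwer(m=10, alpha=0.05, n_trials=20000, shared=True)
print(f"independent tests: estimated FWER = {independent:.4f}")
print(f"identical tests:   estimated FWER = {identical:.4f}")
```

With independent tests the estimated FWER should land close to the nominal 0.05. With identical tests it should be near α/m = 0.005, far below the level the procedure is allowed to spend.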

The correction comes at the cost of increasing the probability of producing false negatives, i.e., reducing statistical power.

There is no definitive consensus on how to define a family in all cases, and adjusted test results may vary depending on the number of tests included in the family of hypotheses.

These criticisms apply to FWER control in general and are not specific to the Bonferroni correction.

## References

1. Bonferroni, C. E. (1936). "Teoria statistica delle classi e calcolo delle probabilità". Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze.
2. Dunn, Olive Jean (1959). "Estimation of the Medians for Dependent Variables". Annals of Mathematical Statistics. 30 (1): 192–197. doi:10.1214/aoms/1177706374. JSTOR 2237135.
3. Dunn, Olive Jean (1961). "Multiple Comparisons Among Means". Journal of the American Statistical Association. 56 (293): 52–64. doi:10.1080/01621459.1961.10482090.
4. Mittelhammer, Ron C.; Judge, George G.; Miller, Douglas J. (2000). Econometric Foundations. Cambridge University Press. pp. 73–74. ISBN 0-521-62394-4.
5. Ijsmi, Editor (2016-11-14). "Post-hoc and multiple comparison test – An overview with SAS and R Statistical Package". International Journal of Statistics and Medical Informatics. 1 (1): 1–9.
6. Field, Andy (2009). Discovering Statistics Using SPSS. London: Sage. pp. 372–373. ISBN 978-1-84787-906-6.
7. Goldman, Megan. "Why is multiple testing a problem?". Course notes, University of California, Berkeley.
8. Goeman, Jelle J.; Solari, Aldo (2014). "Multiple Hypothesis Testing in Genomics". Statistics in Medicine. 33 (11). doi:10.1002/sim.6082.
9. Neuwald, AF; Green, P (1994). "Detecting patterns in protein sequences". J. Mol. Biol. 239 (5): 698–712. doi:10.1006/jmbi.1994.1407. PMID 8014990.
10. Dunn, Olive Jean (1961). "Multiple Comparisons Among Means". Journal of the American Statistical Association. 56 (293): 52–64. doi:10.1080/01621459.1961.10482090.
11. Frane, Andrew (2015). "Are per-family Type I error rates relevant in social and behavioral science?". Journal of Modern Applied Statistical Methods. 14 (1): 12–23.