Normality test: Difference between revisions

Latest revision as of 01:35, 20 January 2024 (Saturday)

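In statistics, normality tests are used to determine whether a data set is well modeled by a normal distribution and to compute how likely it is that the random variable underlying the data set is normally distributed.

More precisely, the tests are a form of model selection and can be interpreted in several ways, depending on one's interpretation of probability:

In descriptive statistics terms, one measures the goodness of fit of a normal model to the data. If the fit is poor, the data are not well modeled by a normal distribution in that respect, and no judgment is made about any underlying variable.
In frequentist statistical hypothesis testing, the data are tested against the null hypothesis that they are normally distributed.
In Bayesian statistics, one does not "test normality" per se. Rather, one computes the likelihood that the data come from a normal distribution with given parameters μ, σ (for all μ, σ) and compares it with the likelihood that the data come from other candidate distributions. The simplest approach uses a Bayes factor (which gives the relative likelihood of seeing the data under different models); a more refined approach places a prior distribution on the possible models and parameters and computes a posterior distribution from the computed likelihoods.

A normality test is used to determine whether sample data have been drawn from a normally distributed population (within some tolerance). A number of statistical tests, such as Student's t-test and the one-way and two-way analysis of variance (ANOVA), require a normally distributed sample population.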

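Graphical methods

An informal approach to testing normality is to compare a histogram of the sample data with a normal probability curve. The empirical distribution of the data (the histogram) should be bell-shaped and resemble the normal distribution. This can be difficult to judge when the sample is small. In that case, one can regress the data against the quantiles of a normal distribution with the same mean and variance as the sample. A poor fit to the regression line suggests a departure from normality (see Anderson–Darling coefficient and Minitab).

A graphical tool for assessing normality is the normal probability plot, a quantile-quantile plot (Q–Q plot) of the standardized data against the standard normal distribution. Here the correlation between the sample data and the normal quantiles (a measure of goodness of fit) measures how well the data are modeled by a normal distribution. For normal data, the points in the Q–Q plot should fall approximately on a straight line, indicating high positive correlation. These plots are easy to interpret and have the added benefit that outliers are easily identified.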
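
The correlation described above can be computed without drawing the plot. The following is a minimal sketch using only the Python standard library. The function name `qq_correlation` and the plotting positions (i − 0.5)/n are assumptions chosen for illustration; other plotting-position conventions exist and give slightly different values.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def qq_correlation(data):
    """Correlation between the sorted standardized data and standard normal quantiles."""
    n = len(data)
    m, s = mean(data), stdev(data)
    # Standardize and sort the sample (the vertical axis of the Q-Q plot).
    z = sorted((x - m) / s for x in data)
    # Theoretical quantiles at plotting positions (i - 0.5) / n (an assumed convention).
    q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    zm, qm = mean(z), mean(q)
    cov = sum((a - zm) * (b - qm) for a, b in zip(z, q))
    var_z = sum((a - zm) ** 2 for a in z)
    var_q = sum((b - qm) ** 2 for b in q)
    return cov / sqrt(var_z * var_q)


# Illustrative data: exact normal quantiles, and a right-skewed transform of them.
base = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
normal_like = [10 + 2 * b for b in base]
skewed = [2.718281828459045 ** b for b in base]

print(qq_correlation(normal_like))  # very close to 1
print(qq_correlation(skewed))       # noticeably below 1
```

A value near 1 corresponds to points lying close to a straight line; how far below 1 counts as a meaningful departure depends on the sample size and is not settled by this sketch.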

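Back-of-the-envelope test

A simple back-of-the-envelope test takes the sample maximum and minimum, computes their z-scores, or more properly their t-statistics (the number of sample standard deviations by which a sample lies from the sample mean), and compares them with the 68–95–99.7 rule: if there is a 3σ event (properly, a 3s event) with substantially fewer than 300 samples, or a 4s event with substantially fewer than 15,000 samples, then a normal distribution will understate the maximum magnitude of deviations in the sample data.

This test is useful when one faces kurtosis risk, that is, when large deviations matter, and it has the advantage of being easy to compute and to communicate: non-statisticians can readily grasp that "6σ events are very rare in normal distributions".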
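
As a rough illustration of this check, the sketch below computes the largest deviation in units of the sample standard deviation and, for comparison, the average number of normal draws per event of at least that size. The function names and the sample values are hypothetical; the sample counts come from the two-sided normal tail probability and agree in order of magnitude with the 300 and 15,000 figures quoted above.

```python
from statistics import NormalDist, mean, stdev


def largest_deviation(data):
    """Largest distance of the sample min or max from the mean, in units of s."""
    m, s = mean(data), stdev(data)
    return max(abs(min(data) - m), abs(max(data) - m)) / s


def samples_per_event(k):
    """Average number of normal draws per deviation of at least k sigma (two-sided)."""
    return 1.0 / (2.0 * NormalDist().cdf(-k))


data = [1.0, 2.0, 3.0, 4.0, 5.0, 30.0]  # made-up sample with one large value
k = largest_deviation(data)
print(f"largest deviation: {k:.2f} s in a sample of {len(data)}")
print(f"a normal model expects one such event per ~{samples_per_event(k):.0f} draws")
print(f"3 sigma: ~{samples_per_event(3):.0f} draws, 4 sigma: ~{samples_per_event(4):.0f} draws")
```

If the sample is much smaller than the number of draws a normal model would need to produce its largest deviation, the normal model is likely understating the tails. Note that in very small samples the deviation measured in units of s is bounded, so the check is more informative for larger samples.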

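Frequentist tests

Tests of univariate normality include the following:

D'Agostino's K-squared test,
Jarque–Bera test,
Anderson–Darling test,
Cramér–von Mises criterion,
Kolmogorov–Smirnov test (valid only when the mean and variance of the normal distribution are assumed known under the null hypothesis),
Lilliefors test (based on the Kolmogorov–Smirnov test, adjusted for the case where the mean and variance are estimated from the data),
Shapiro–Wilk test, and
Pearson's chi-squared test.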
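
To make the Kolmogorov–Smirnov entry in the list above concrete, here is a minimal sketch of its test statistic: the largest gap between the empirical distribution function and a fully specified normal distribution function. The function name `ks_statistic` is an assumption for illustration, and the sketch computes only the statistic, not a p-value. As the list notes, `mu` and `sigma` must be known in advance; plugging in estimates from the same data calls for the Lilliefors adjustment instead.

```python
from statistics import NormalDist


def ks_statistic(data, mu, sigma):
    """Largest distance between the empirical CDF and the N(mu, sigma^2) CDF."""
    dist = NormalDist(mu, sigma)
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


print(ks_statistic([-1.0, 0.0, 1.0], mu=0.0, sigma=1.0))
```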

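A 2011 study comparing the Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, and Anderson–Darling tests concluded that Shapiro–Wilk has the best power for a given significance level, followed by Anderson–Darling.^[1]

Some published works recommend the Jarque–Bera test,^[2]^[3] but the test has weaknesses. In particular, it has low power for distributions with short tails, especially bimodal distributions.^[4] Some authors have declined to include its results in their studies because of its poor overall performance.^[5]

Historically, the third and fourth standardized moments (skewness and kurtosis) were among the earliest tests for normality. The Lin–Mudholkar test specifically targets asymmetric alternatives.^[6] The Jarque–Bera test is itself derived from skewness and kurtosis estimates. Mardia's multivariate skewness and kurtosis tests generalize the moment tests to the multivariate case.^[7] Other early test statistics include the ratio of the mean absolute deviation to the standard deviation and the ratio of the range to the standard deviation.^[8]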
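
The way the Jarque–Bera statistic is built from skewness and kurtosis can be sketched as follows. This is a minimal standard-library illustration, not a reference implementation: the function name is an assumption, the moments use the divisor n, and the p-value relies on the asymptotic chi-squared distribution with 2 degrees of freedom, which can be inaccurate for small samples.

```python
from math import exp


def jarque_bera(data):
    """Return (JB statistic, asymptotic p-value) from sample skewness and kurtosis."""
    n = len(data)
    m = sum(data) / n
    # Central moments with divisor n.
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of chi-squared with 2 degrees of freedom is exp(-x / 2).
    p_value = exp(-jb / 2.0)
    return jb, p_value


jb, p = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(jb, p)
```

Because the statistic depends only on the third and fourth moments, distributions whose skewness and kurtosis happen to be close to the normal values can pass unnoticed, which is consistent with the power weaknesses mentioned above.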

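More recent tests of normality include the energy test^[9] (Székely and Rizzo) and tests based on the empirical characteristic function (ECF) (for example Epps and Pulley,^[10] Henze–Zirkler,^[11] and the BHEP tests^[12]). The energy and ECF tests are powerful tests that apply to univariate or multivariate normality and are statistically consistent against general alternatives.

The normal distribution has the highest differential entropy of any distribution with a given standard deviation. A number of normality tests are based on this property, the first attributable to Vasicek.^[13]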
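
The maximum-entropy property can be checked numerically for a few familiar families. The sketch below evaluates the closed-form differential entropies (in nats) of a normal, a Laplace, and a uniform distribution that share the same standard deviation. The choice of comparison families is an assumption made for illustration, and this is not Vasicek's test itself, which estimates entropy from the sample.

```python
from math import e, log, pi, sqrt


def entropy_normal(sigma):
    """Differential entropy of N(mu, sigma^2): 0.5 * ln(2 * pi * e * sigma^2)."""
    return 0.5 * log(2 * pi * e * sigma ** 2)


def entropy_laplace(sigma):
    """Laplace with scale b has variance 2 * b^2 and entropy 1 + ln(2 * b)."""
    b = sigma / sqrt(2)
    return 1 + log(2 * b)


def entropy_uniform(sigma):
    """Uniform on an interval of width w has variance w^2 / 12 and entropy ln(w)."""
    w = sigma * sqrt(12)
    return log(w)


for name, h in [("normal", entropy_normal(1.0)),
                ("laplace", entropy_laplace(1.0)),
                ("uniform", entropy_uniform(1.0))]:
    print(f"{name}: {h:.4f}")
```

With the standard deviation fixed at 1, the normal value is the largest of the three, as the property requires.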

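Bayesian tests

Kullback–Leibler divergences between the whole posterior distributions of the slope and variance do not indicate non-normality. However, the ratio of expectations of these posteriors and the expectation of the ratios give results similar to the Shapiro–Wilk statistic, except for very small samples when non-informative priors are used.^[14]

Spiegelhalter suggests using a Bayes factor to compare normality with a different class of distributional alternatives.^[15] This approach has been extended by Farrell and Rogers-Stewart.^[16]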

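Applications

One application of normality tests is to the residuals from a linear regression model.^[19] If the residuals are not normally distributed, they should not be used in Z tests or in any other tests derived from the normal distribution, such as t tests, F tests, and chi-squared tests. Non-normal residuals may also indicate that the dependent variable or at least one explanatory variable has the wrong functional form, or that important variables are missing. Correcting one or more of these systematic errors may produce residuals that are normally distributed; in other words, non-normality of residuals is often a model deficiency rather than a data problem.^[20]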
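
The link between a wrong functional form and non-normal residuals can be seen in a small made-up example. The sketch below fits a straight line by ordinary least squares to data that are actually quadratic and then inspects the skewness of the residuals. The data and the function names are hypothetical, and skewness is used here only as a quick indicator, not as a formal test.

```python
from statistics import mean


def ols_residuals(x, y):
    """Residuals of the least-squares line y = intercept + slope * x."""
    xm, ym = mean(x), mean(y)
    slope = (sum((a - xm) * (b - ym) for a, b in zip(x, y))
             / sum((a - xm) ** 2 for a in x))
    intercept = ym - slope * xm
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def skewness(values):
    """Sample skewness using central moments with divisor n."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


x = [float(i) for i in range(10)]
y = [a ** 2 for a in x]           # true relationship is quadratic, not linear
residuals = ols_residuals(x, y)
print(residuals)
print(skewness(residuals))        # clearly positive: the residuals are right-skewed
```

Refitting with a quadratic term would remove the systematic pattern, which illustrates why non-normal residuals are often a symptom of the model rather than of the data.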

References

[1] Razali, Nornadiah; Wah, Yap Bee (2011). "Power comparisons of Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors and Anderson–Darling tests". Journal of Statistical Modeling and Analytics. 2 (1): 21–33.

[2] Judge, George G.; Griffiths, W. E.; Hill, R. Carter; Lütkepohl, Helmut; Lee, T. (1988). Introduction to the Theory and Practice of Econometrics (Second ed.). Wiley. pp. 890–892. ISBN 978-0-471-08277-4.

[3] Gujarati, Damodar N. (2002). Basic Econometrics (Fourth ed.). McGraw Hill. pp. 147–148. ISBN 978-0-07-123017-9.

[4] Thadewald, Thorsten; Büning, Herbert (1 January 2007). "Jarque–Bera Test and its Competitors for Testing Normality – A Power Comparison". Journal of Applied Statistics. 34 (1): 87–105. CiteSeerX 10.1.1.507.1186. doi:10.1080/02664760600994539. S2CID 13866566.

[5] Sürücü, Barış (1 September 2008). "A power comparison and simulation study of goodness-of-fit tests". Computers & Mathematics with Applications. 56 (6): 1617–1625. doi:10.1016/j.camwa.2008.03.010.

[6] Lin, C. C.; Mudholkar, G. S. (1980). "A simple test for normality against asymmetric alternatives". Biometrika. 67 (2): 455–461. doi:10.1093/biomet/67.2.455.

[7] Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika 57, 519–530.

[8] Filliben, J. J. (February 1975). "The Probability Plot Correlation Coefficient Test for Normality". Technometrics. 17 (1): 111–117. doi:10.2307/1268008. JSTOR 1268008.

[9] Székely, G. J. and Rizzo, M. L. (2005) A new test for multivariate normality, Journal of Multivariate Analysis 93, 58–80.

[10] Epps, T. W., and Pulley, L. B. (1983). A test for normality based on the empirical characteristic function. Biometrika 70, 723–726.

[11] Henze, N., and Zirkler, B. (1990). A class of invariant and consistent tests for multivariate normality. Communications in Statistics – Theory and Methods 19, 3595–3617.

[12] Henze, N., and Wagner, T. (1997). A new approach to the BHEP tests for multivariate normality. Journal of Multivariate Analysis 62, 1–23.

[13] Vasicek, Oldrich (1976). "A Test for Normality Based on Sample Entropy". Journal of the Royal Statistical Society. Series B (Methodological). 38 (1): 54–59. JSTOR 2984828.

[14] Young K. D. S. (1993), "Bayesian diagnostics for checking assumptions of normality". Journal of Statistical Computation and Simulation, 47 (3–4),167–180

[15] Spiegelhalter, D.J. (1980). An omnibus test for normality for small samples. Biometrika, 67, 493–496. doi:10.1093/biomet/67.2.493

[16] Farrell, P.J., Rogers-Stewart, K. (2006) "Comprehensive study of tests for normality and symmetry: extending the Spiegelhalter test". Journal of Statistical Computation and Simulation, 76(9), 803 – 816. doi:10.1080/10629360500109023

[17] Spiegelhalter, D.J. (1980). An omnibus test for normality for small samples. Biometrika, 67, 493–496. doi:10.1093/biomet/67.2.493

[18] Farrell, P.J., Rogers-Stewart, K. (2006) "Comprehensive study of tests for normality and symmetry: extending the Spiegelhalter test". Journal of Statistical Computation and Simulation, 76(9), 803 – 816. doi:10.1080/10629360500109023

[19] Portney, L.G. & Watkins, M.P. (2000). Foundations of clinical research: applications to practice. New Jersey: Prentice Hall Health. pp. 516–517. ISBN 0838526950.

[20] Pek, Jolynn; Wong, Octavia; Wong, Augustine C. M. (2018-11-06). "How to Address Non-normality: A Taxonomy of Approaches, Reviewed, and Illustrated". Frontiers in Psychology. 9: 2104. doi:10.3389/fpsyg.2018.02104. ISSN 1664-1078. PMC 6232275. PMID 30459683.

@@ Line 33: / Line 33: @@
 * [[Pearson's chi-squared test|Pearson's chi-squared test]].
-A 2011 study concluded that the Shapiro–Wilk test has the best [[Power of a test|power]] for a given significance level, followed by the Anderson–Darling test, when comparing the Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, and Anderson–Darling tests.<ref>{{cite journal |last1=Razali |first1=Nornadiah |last2=Wah |first2=Yap Bee |title=Power comparisons of Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors and Anderson–Darling tests |journal=Journal of Statistical Modeling and Analytics |year=2011 |volume=2 |issue=1 |pages=21–33 |url= <!--|accessdate=5 June 2012--> |archiveurl= |archivedate=2015-06-30 }}</ref>
+A 2011 study concluded that the Shapiro–Wilk test has the best [[Power of a test|power]] for a given significance level, followed by the Anderson–Darling test, when comparing the Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, and Anderson–Darling tests.<ref>{{cite journal |last1=Razali |first1=Nornadiah |last2=Wah |first2=Yap Bee |title=Power comparisons of Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors and Anderson–Darling tests |journal=Journal of Statistical Modeling and Analytics |year=2011 |volume=2 |issue=1 |pages=21–33 |url= <!--|accessdate=5 June 2012--> }}</ref>
 Some published works recommend the Jarque–Bera test,<ref>{{cite book |last1=Judge |first1=George G. |last2=Griffiths |first2=W. E. |last3=Hill |first3=R. Carter |last4=Lütkepohl |first4=Helmut |authorlink4=Helmut Lütkepohl |last5=Lee |first5=T. |year=1988 |title=Introduction to the Theory and Practice of Econometrics |edition=Second |pages=890–892 |publisher=Wiley |isbn=978-0-471-08277-4 |url=https://books.google.com/books?id=Iyy7AAAAIAAJ&pg=PA890 }}</ref><ref>{{cite book |last=Gujarati |first=Damodar N. |year=2002 |title=Basic Econometrics |edition=Fourth |pages=147–148 |publisher=McGraw Hill |isbn=978-0-07-123017-9 }}</ref> but the test has weaknesses. In particular, it has low power for distributions with short tails, especially bimodal distributions.<ref>{{cite journal|last=Thadewald|first=Thorsten|author2=Büning, Herbert|title=Jarque–Bera Test and its Competitors for Testing Normality – A Power Comparison|journal=Journal of Applied Statistics|date=1 January 2007|volume=34|issue=1|pages=87–105 |doi=10.1080/02664760600994539 |citeseerx=10.1.1.507.1186|s2cid=13866566 }}</ref> Some authors have declined to include its results in their studies because of its poor overall performance.<ref>{{cite journal |last=Sürücü |first=Barış |title=A power comparison and simulation study of goodness-of-fit tests |journal=Computers & Mathematics with Applications |date=1 September 2008 |volume=56 |issue=6 |pages=1617–1625 |doi=10.1016/j.camwa.2008.03.010 |doi-access=free }}</ref>