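Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example, there is a [[causality|causal relationship]], because [[extreme weather]] causes people to use more electricity for heating or cooling. In general, however, the presence of a correlation is not sufficient to infer the presence of a causal relationship (i.e., [[correlation does not imply causation]]).
Formally, random variables are ''dependent'' if they do not satisfy the mathematical property of [[independence (probability theory)|probabilistic independence]]. In informal usage, ''correlation'' is synonymous with ''dependence''. In its technical sense, however, correlation refers to any of several specific types of mathematical operations between [[Conditional expectation|the tested variables and their respective expected values]]. Essentially, correlation is a measure of how two or more variables are related to one another. There are several [[correlation coefficient]]s, often denoted [math]\rho[/math] or [math]r[/math], that measure the degree of correlation. The most common of these is the ''[[Pearson product-moment correlation coefficient|Pearson correlation coefficient]]'', which is sensitive only to a linear relationship between two variables (which may be present even when one variable is a nonlinear function of the other). Other correlation coefficients, such as ''[[Spearman's rank correlation coefficient|Spearman's rank correlation]]'', have been developed to be more [[robust statistics|robust]] than Pearson's, that is, more sensitive to nonlinear relationships.<ref>Croxton, Frederick Emory; Cowden, Dudley Johnstone; Klein, Sidney (1968) ''Applied General Statistics'', Pitman. {{ISBN|9780273403159}} (page 625)</ref><ref>Dietrich, Cornelius Frank (1991) ''Uncertainty, Calibration and Probability: The Statistics of Scientific and Industrial Measurement'' 2nd Edition, A. Higler. {{ISBN|9780750300605}} (Page 331)</ref><ref>Aitken, Alexander Craig (1957) ''Statistical Mathematics'' 8th Edition. Oliver & Boyd. {{ISBN|9780050013007}} (Page 95)</ref> [[Mutual information]] can also be used to measure the dependence between two variables.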
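The point that the Pearson coefficient responds only to the linear part of a relationship can be illustrated with a minimal sketch. The helper name <code>pearson_r</code> and the sample data are assumptions made for this illustration; the function simply evaluates the usual sample formula (sum of products of deviations from the means, divided by the product of their root sums of squares).

```python
import numpy as np

def pearson_r(x, y):
    """Sample Pearson correlation computed from deviations about the means."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its sample mean
    dy = y - y.mean()  # deviations of y from its sample mean
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

# Illustrative data: an exactly linear and a strictly increasing nonlinear relation.
x = np.linspace(0.0, 4.0, 50)
r_linear = pearson_r(x, 3.0 * x + 1.0)  # exactly linear relationship
r_exp = pearson_r(x, np.exp(x))         # monotone but nonlinear relationship

print(r_linear, r_exp)
```

For the exactly linear case the coefficient equals 1 up to rounding. For the exponential case it stays below 1, even though one variable is a deterministic function of the other.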
* {{cite book |author=Cohen, J. |author2=Cohen P. |author3=West, S.G. |author4=Aiken, L.S.|author4-link= Leona S. Aiken |name-list-style=amp |year=2002 |title=Applied multiple regression/correlation analysis for the behavioral sciences |edition=3rd |publisher=Psychology Press |isbn= 978-0-8058-2223-6 }}
* {{springer|title=Correlation (in statistics)|id=p/c026560}}
* {{cite book|last1=Oestreicher|first1=J. & D. R.|title=Plague of Equals: A science thriller of international disease, politics and drug discovery|date=February 26, 2015|publisher=Omega Cat Press|location=California|isbn=978-0963175540|pages=408}}
==External links==
{{Wiktionary|correlation|dependence}}
{{Commons category|Correlation}}
{{Wikiversity|Correlation}}
* [http://mathworld.wolfram.com/CorrelationCoefficient.html MathWorld page on the (cross-)correlation coefficient/s of a sample]
* [http://peaks.informatik.uni-erlangen.de/cgi-bin/usignificance.cgi Compute significance between two correlations], for the comparison of two correlation values.
* {{cite web|url=http://www.mathworks.com/matlabcentral/fileexchange/20846|title=A MATLAB Toolbox for computing Weighted Correlation Coefficients|archive-url=https://web.archive.org/web/20210424091029/https://www.mathworks.com/matlabcentral/fileexchange/20846-weighted-correlation-matrix|archive-date=24 April 2021}}
* [https://www.scribd.com/doc/299546673/Proof-that-the-Sample-Bivariate-Correlation-has-limits-plus-or-minus-1 Proof that the Sample Bivariate Correlation has limits plus or minus 1]
* [http://nagysandor.eu/AsimovTeka/correlation_en/index.html Interactive Flash simulation on the correlation of two normally distributed variables] by Juha Puranen.
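If <math>x</math> and <math>y</math> are results of measurements that contain measurement error, the realistic limits on the correlation coefficient are not −1 to +1 but a smaller range.<ref>{{cite journal|last=Francis|first=DP|author2=Coats AJ|author3=Gibson D|title=How high can a correlation coefficient be?|journal=Int J Cardiol|year=1999|volume=69|pages=185–199|doi=10.1016/S0167-5273(99)00028-5|issue=2|pmid=10549842}}</ref> For a linear model with a single independent variable, the [[Coefficient of determination|coefficient of determination (R squared)]] is the square of <math>r_{xy}</math>, the Pearson product-moment coefficient.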
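The identity between the coefficient of determination and the squared Pearson coefficient can be checked numerically. The following is a minimal sketch on synthetic data; the seed, sample size and slope are arbitrary assumptions, and the fit uses ordinary least squares with an intercept.

```python
import numpy as np

# Synthetic data (assumed for illustration): y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

# Ordinary least-squares fit of y on the single regressor x (with intercept).
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
r_squared = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)

# Sample Pearson correlation between x and y.
r_xy = np.corrcoef(x, y)[0, 1]

print(r_squared, r_xy**2)
```

The two printed values agree up to floating-point rounding. The identity relies on the model having exactly one regressor and an intercept.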
==Example==
Consider the [[joint probability distribution]] of {{mvar|X}} and {{mvar|Y}} given in the table below.
{{Main|Spearman's rank correlation coefficient|Kendall tau rank correlation coefficient}}
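Rank correlation coefficients, such as [[Spearman's rank correlation coefficient]] and [[Kendall's tau|Kendall's rank correlation coefficient (τ)]], measure the extent to which, as one variable increases, the other variable tends to increase, without requiring that increase to be represented by a linear relationship. If, as one variable increases, the other ''decreases'', the rank correlation coefficients will be negative. It is common to regard these rank correlation coefficients as alternatives to Pearson's coefficient, used either to reduce the amount of calculation or to make the coefficient less sensitive to non-normality in distributions. However, this view has little mathematical basis, as rank correlation coefficients measure a different type of relationship than the [[Pearson product-moment correlation coefficient]], and are best seen as measures of a different type of association, rather than as an alternative measure of the population correlation coefficient.<ref name="Yule and Kendall">Yule, G.U and Kendall, M.G. (1950), "An Introduction to the Theory of Statistics", 14th Edition (5th Impression 1968). Charles Griffin & Co. pp 258–270</ref><ref name="Kendall Rank Correlation Methods">Kendall, M. G. (1955) "Rank Correlation Methods", Charles Griffin & Co.</ref>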
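As a rough illustration of how a rank correlation differs from Pearson's coefficient, the sketch below computes Spearman's coefficient as the Pearson correlation of the ranks. The helper names <code>ranks</code> and <code>spearman_rho</code> are hypothetical, and the ranking assumes there are no tied values (ties would need averaged ranks).

```python
import numpy as np

def ranks(values):
    """Ranks 1..n of the entries; assumes no ties (illustrative simplification)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    result = np.empty(len(values), dtype=float)
    result[order] = np.arange(1, len(values) + 1)
    return result

def spearman_rho(x, y):
    """Spearman's coefficient as the Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])

x = np.linspace(0.0, 4.0, 50)
rho_increasing = spearman_rho(x, np.exp(x))   # strictly increasing, nonlinear
rho_decreasing = spearman_rho(x, np.exp(-x))  # strictly decreasing, nonlinear
pearson_increasing = float(np.corrcoef(x, np.exp(x))[0, 1])

print(rho_increasing, rho_decreasing, pearson_increasing)
```

For any strictly increasing relationship the rank coefficient is 1, and for any strictly decreasing one it is −1. The Pearson coefficient for the same increasing data is below 1 because the relationship is not linear.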
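The information given by a correlation coefficient is not enough to define the dependence structure between random variables.<ref name="wilmottM.com">{{cite journal|author=Mahdavi Damghani B.|title=The Non-Misleading Value of Inferred Correlation: An Introduction to the Cointelation Model|journal=Wilmott Magazine|volume=2013|issue=67|pages=50–61|year=2013|doi=10.1002/wilm.10252 }}</ref> The correlation coefficient completely defines the dependence structure only in very particular cases, for example when the distribution is a [[multivariate normal distribution]]. In the case of [[elliptical distribution]]s it characterizes the (hyper-)ellipses of equal density; however, it does not completely characterize the dependence structure (for example, the degrees of freedom of a [[multivariate t-distribution]] determine the level of tail dependence).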
[[Distance correlation]]<ref>{{cite journal | last1 = Székely | first1 = G. J. | last2 = Rizzo | first2 = M. L. | last3 = Bakirov | first3 = N. K. | year = 2007 | title = Measuring and testing independence by correlation of distances | journal = [[Annals of Statistics]] | volume = 35 | issue = 6| pages = 2769–2794 | doi = 10.1214/009053607000000505 | arxiv = 0803.4101 | s2cid = 5661488 }}</ref><ref>{{cite journal | last1 = Székely | first1 = G. J. | last2 = Rizzo | first2 = M. L. | year = 2009 | title = Brownian distance covariance | journal = Annals of Applied Statistics | volume = 3 | issue = 4| pages = 1233–1303 | doi = 10.1214/09-AOAS312 | pmid = 20574547 | pmc = 2889501 | arxiv = 1010.0297 }}</ref> was introduced to address a deficiency of Pearson's correlation, namely that it can be zero for dependent random variables; zero distance correlation implies independence.
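A minimal sketch of the sample distance correlation may help make the contrast concrete. It assumes the commonly used construction from double-centred pairwise distance matrices; the function names and the test data are illustrative assumptions, not taken from the cited papers' software.

```python
import numpy as np

def _centered_distances(v):
    """Pairwise Euclidean distance matrix, double-centred (row, column, grand mean)."""
    v = np.asarray(v, dtype=float)
    v = v.reshape(len(v), -1)  # treat each row as one observation
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation of two samples with the same number of observations."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()    # squared sample distance covariance
    dvar2_x = (a * a).mean()  # squared sample distance variance of x
    dvar2_y = (b * b).mean()  # squared sample distance variance of y
    if dvar2_x * dvar2_y == 0.0:
        return 0.0            # at least one sample is constant
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2_x * dvar2_y)))

# y is a deterministic function of x, yet the linear correlation vanishes by symmetry.
x = np.linspace(-1.0, 1.0, 201)
y = x**2
pearson = float(np.corrcoef(x, y)[0, 1])
dcor = distance_correlation(x, y)

print(pearson, dcor)
```

In this example the Pearson coefficient is zero up to rounding, while the distance correlation is clearly positive, reflecting the dependence of <code>y</code> on <code>x</code>. The pairwise distance matrices make this sketch quadratic in memory, so it is suited only to small samples.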
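The Randomized Dependence Coefficient<ref>Lopez-Paz D. and Hennig P. and Schölkopf B. (2013). "The Randomized Dependence Coefficient", "[[Conference on Neural Information Processing Systems]]" [ Reprint]</ref> is a computationally efficient, [[Copula (probability theory)|copula]]-based measure of dependence between multivariate random variables. RDC is invariant with respect to non-linear scalings of random variables, is capable of discovering a wide range of functional association patterns, and takes the value zero at independence.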
In 2002, Higham<ref>{{cite journal|title=Computing the nearest correlation matrix—a problem from finance|journal=IMA Journal of Numerical Analysis|date=2002|first=Nicholas J.|last=Higham|volume=22|issue=3|pages=329–343|doi=10.1093/imanum/22.3.329|citeseerx=10.1.1.661.2180}}</ref> formalized the notion of nearness using the [[Frobenius norm]] and provided a method for computing the nearest correlation matrix using [[Dykstra's projection algorithm]], an implementation of which is available as an online Web API.<ref>{{Cite web|url=https://portfoliooptimizer.io/|title=Portfolio Optimizer |website=portfoliooptimizer.io|access-date=2021-01-30}}</ref>
This sparked interest in the subject, with new theoretical results (e.g., computing the nearest correlation matrix with factor structure<ref>{{cite journal|title=Computing a Nearest Correlation Matrix with Factor Structure.|journal= SIAM J. Matrix Anal. Appl.|date=2010|first1=Rudiger|last1=Borsdorf|first2=Nicholas J.|last2=Higham|first3=Marcos|last3=Raydan|volume=31|issue=5|pages=2603–2622|doi=10.1137/090776718|url= http://eprints.maths.manchester.ac.uk/1523/1/SML002603.pdf}}</ref>) and numerical results (e.g., using [[Newton's method]] to compute the nearest correlation matrix<ref>{{cite journal|title=A quadratically convergent Newton method for computing the nearest correlation matrix.|journal= SIAM J. Matrix Anal. Appl.|date=2006|first1=HOUDUO|last1=Qi|first2=DEFENG|last2=Sun|volume=28|issue=2|pages=360–385|doi=10.1137/050624509}}</ref>) obtained in the subsequent years.
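The alternating-projections idea described above can be sketched as follows. This is an unweighted, simplified illustration: the function names, iteration limit, tolerance and example matrix are assumptions, and it is not a reproduction of Higham's published code. Each iteration projects onto the positive semidefinite cone in the Frobenius norm, applies Dykstra's correction to that projection, and then restores the unit diagonal.

```python
import numpy as np

def project_psd(m):
    """Frobenius-norm projection onto the positive semidefinite cone."""
    w, v = np.linalg.eigh((m + m.T) / 2.0)
    return (v * np.clip(w, 0.0, None)) @ v.T  # drop the negative eigenvalues

def nearest_correlation(a, max_iter=1000, tol=1e-10):
    """Alternating projections with Dykstra's correction (illustrative sketch)."""
    y = np.asarray(a, dtype=float).copy()
    correction = np.zeros_like(y)    # Dykstra correction for the PSD projection
    for _ in range(max_iter):
        r = y - correction
        x = project_psd(r)
        correction = x - r
        y_new = x.copy()
        np.fill_diagonal(y_new, 1.0)  # projection onto unit-diagonal matrices
        scale = max(1.0, np.linalg.norm(y_new, "fro"))
        change = np.linalg.norm(y_new - y, "fro")
        gap = np.linalg.norm(y_new - x, "fro")
        y = y_new
        if change <= tol * scale and gap <= tol * scale:
            break
    return y

# A symmetric, unit-diagonal matrix that is not positive semidefinite.
a = np.array([[1.0, 0.9, -0.9],
              [0.9, 1.0, 0.9],
              [-0.9, 0.9, 1.0]])
c = nearest_correlation(a)

print(np.round(c, 4))
print(np.linalg.eigvalsh(c))
```

The returned matrix has an exactly unit diagonal and is positive semidefinite only up to the stopping tolerance. Without the correction term the iteration would still reach a valid correlation matrix, but in general not the nearest one.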
==Uncorrelatedness and independence of stochastic processes==
Similarly, for two stochastic processes [math]\left\{ X_t \right\}_{t\in\mathcal{T}}[/math] and [math]\left\{ Y_t \right\}_{t\in\mathcal{T}}[/math]: if they are independent, then they are uncorrelated.<ref name=KunIlPark>{{cite book | author=Park, Kun Il| title=Fundamentals of Probability and Stochastic Processes with Applications to Communications| publisher=Springer | year=2018 | isbn=978-3-319-68074-3}}</ref>{{rp|p. 151}} The converse of this statement might not be true: even if two variables are uncorrelated, they might not be independent of each other.
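As a worked illustration of why the converse can fail (a standard example assumed here, not taken from the cited source), let <math>X</math> be uniformly distributed on <math>[-1, 1]</math> and let <math>Y = X^2</math>. Then <math>Y</math> is completely determined by <math>X</math>, so the two are dependent, yet

<math display="block">\operatorname{cov}(X,Y) = \operatorname{E}[X^3] - \operatorname{E}[X]\,\operatorname{E}[X^2] = 0 - 0 \cdot \tfrac{1}{3} = 0,</math>

so <math>X</math> and <math>Y</math> are uncorrelated. The odd moments vanish because the distribution of <math>X</math> is symmetric about zero.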
==Common misconceptions==
===Correlation and causality===
{{Main|Correlation does not imply causation}} {{See also|Normally distributed and uncorrelated does not imply independent}}
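The conventional dictum that "[[correlation does not imply causation]]" means that correlation cannot be used by itself to infer a causal relationship between the variables.<ref>{{cite journal | last=Aldrich | first=John | journal=Statistical Science | volume=10 | issue=4 | year=1995 | pages=364–376 | title=Correlations Genuine and Spurious in Pearson and Yule | jstor=2246135 | doi=10.1214/ss/1177009870| doi-access=free }}</ref> This dictum should not be taken to mean that correlations cannot indicate the potential existence of causal relations. However, the causes underlying the correlation, if any, may be indirect and unknown, and high correlations also overlap with [[identity (mathematics)|identity]] relations ([[tautology (logic)|tautologies]]), where no causal process exists. Consequently, a correlation between two variables is not a sufficient condition to establish a causal relationship (in either direction).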