Testwiki:Reference desk/Archives/Mathematics/2023 July 7




Welcome to the Wikipedia Mathematics Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


July 7

Regression to the mean

Regression to the mean is a pretty old concept, but I couldn't find something I thought would be mentioned in the article. Am I missing something? The article gives an estimate that the regression coefficient for height is 2/3. Assuming the next generation has the same distribution as their parents and the distribution for each child is the same, I work out the standard deviation of the heights of the children in each family as sqrt(1 - (2/3)^2), or about 3/4 of the overall standard deviation. Hope I've got that right! Does this come under something else perhaps? NadVolum (talk) 22:39, 7 July 2023 (UTC)

Let's assume, to keep this tractable, that the heights of any two parents are iid random variables with the same distribution as the population. Also, for simplicity, introduce the concept of z-height, which is a linear transformation of height $h$ to z-height $z=(h-\mu)/\sigma$, so that the z-height can be treated as having the standard normal distribution. Also, let's simply define the mid-parent height as the arithmetic mean $(z_1+z_2)/2$ of the two heights $z_1$ and $z_2$ of the parents of a child. Let $c$ be the linear regression coefficient of child height (the dependent variable) with respect to mid-parent height, so we can write the child height as $\zeta=c(z_1+z_2)/2+r$, in which the random variable $r$ is the residual. By definition, the expected value of $r$ is 0; moreover, it may be assumed to be independent of the mid-parent height. Now $\operatorname{Var}(\zeta)=c^2(\operatorname{Var}(z_1)+\operatorname{Var}(z_2))/4+\operatorname{Var}(r)$. We have defined z-height such that $\operatorname{Var}(z_1)=\operatorname{Var}(z_2)=1$. If the height-reproducing process is stationary (the z-height $\zeta$ of offspring also has the standard normal distribution), also $\operatorname{Var}(\zeta)=1$. This then implies that $\operatorname{Var}(r)=1-c^2/2$. Specifically, when $c=2/3$, this comes out as $7/9$, not as $1-(2/3)^2=5/9$.  --Lambiam 12:03, 8 July 2023 (UTC)
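The variance argument above can be checked numerically. The following is only a rough sketch under the same assumptions as the post (two independent standard-normal parents, a residual independent of the mid-parent height); the function names `residual_variance` and `simulate_child_variance` are made up for illustration, and the residual is additionally assumed to be normal so that it can be sampled.

```python
import math
import random
import statistics


def residual_variance(c: float) -> float:
    """Residual variance needed so that the child z-height has variance 1,
    assuming two iid parents with unit variance: Var(r) = 1 - c^2/2."""
    return 1.0 - c ** 2 / 2.0


def simulate_child_variance(c: float, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of Var(child) for child = c*(z1 + z2)/2 + r."""
    rng = random.Random(seed)
    sd_r = math.sqrt(residual_variance(c))
    children = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)   # z-height of one parent
        z2 = rng.gauss(0.0, 1.0)   # z-height of the other parent
        r = rng.gauss(0.0, sd_r)   # residual (assumed normal here)
        children.append(c * (z1 + z2) / 2.0 + r)
    return statistics.pvariance(children)


if __name__ == "__main__":
    c = 2 / 3
    print("Var(r):", residual_variance(c))            # equals 7/9
    print("sd(r): ", math.sqrt(residual_variance(c)))
    print("simulated Var(child):", simulate_child_variance(c))
```

If the assumptions hold, the simulated child variance should come out close to 1, which is the stationarity condition used in the derivation.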
Yes, I'd assumed in effect a single parent with the original distribution rather than two parents drawn at random from it. The two-parent case would make the standard deviation of height for the children nearly 90% of that of the population as a whole, which I think is a bit surprising. But don't you think this is interesting enough that it is surprising people don't seem to have made the calculation, never mind shown how it works out in a case with two parents like this? And for height I must admit it does seem to me that the pairing is fairly random, rather than tall people marrying tall ones and short marrying short! In other cases, like batters hitting in a second season compared to the first, there would not be two parents, but there may be other factors and they may be interesting. NadVolum (talk) 12:41, 8 July 2023 (UTC)
It is interesting, but see WP:NOR. Trying to orient implication in the direction of cause → effect, it may be better to interpret the relation as $\operatorname{Var}(z)=\operatorname{Var}(r)/(1-c^2/2)$ giving the steady-state population variance, given the regression coefficient and residual variance. BTW, many studies have found that preferences for similar height in mating selection are reflected in a correlation between the heights of actual couples.[1]  --Lambiam 14:02, 8 July 2023 (UTC)
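The steady-state reading can be illustrated by iterating the variance from one generation to the next. This is a minimal sketch that assumes random mating (both parents drawn independently from the current generation) and a fixed residual variance. The helper names are hypothetical. The recursion is v_next = c^2 * v / 2 + Var(r), and it only converges when c^2/2 < 1.

```python
def steady_state_variance(c: float, var_r: float) -> float:
    """Fixed point of v = c^2 * v / 2 + var_r, i.e. var_r / (1 - c^2/2)."""
    if c ** 2 / 2.0 >= 1.0:
        raise ValueError("no steady state: need c^2/2 < 1")
    return var_r / (1.0 - c ** 2 / 2.0)


def iterate_variance(c: float, var_r: float, v0: float, generations: int) -> float:
    """Population variance after the given number of generations,
    starting from variance v0 and assuming iid parents each generation."""
    v = v0
    for _ in range(generations):
        v = c ** 2 * v / 2.0 + var_r
    return v


if __name__ == "__main__":
    c, var_r = 2 / 3, 7 / 9
    print("steady state:", steady_state_variance(c, var_r))
    for g in (0, 1, 2, 5, 10):
        print(g, iterate_variance(c, var_r, v0=4.0, generations=g))
```

With c = 2/3 and Var(r) = 7/9, the iteration approaches a variance of 1 regardless of the starting value, since the distance from the fixed point shrinks by a factor of c^2/2 = 2/9 each generation.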