Say there is some "true" relationship between $y$ and $x$ such that $y = ax + b + \epsilon$, where $a$ and $b$ are constants and $\epsilon$ is normal noise. When I randomly generate data from this R code: `x <- 1:100; y <- a*x + b + rnorm(length(x))` and then fit a model `y ~ x`, I obviously get reasonably good estimates for $a$ and $b$.

However, if I switch the roles of the variables, as in `(x ~ y)`, and then rewrite the result so that $y$ is a function of $x$, the resulting slope is always steeper (more negative or more positive) than the one estimated by the `y ~ x` regression. I am trying to understand exactly why that is, and I would appreciate it if anyone could give me some intuition about what is going on there.
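For concreteness, here is a minimal R sketch that reproduces this effect (the values a = 2 and b = 3 are arbitrary assumptions, not fixed in the question):

```r
# Minimal sketch with assumed values a = 2, b = 3
set.seed(1)
a <- 2; b <- 3
x <- 1:100
y <- a * x + b + rnorm(length(x))

coef(lm(y ~ x))["x"]       # close to a = 2
fit_xy <- lm(x ~ y)
# Rewriting x = c + d*y as y = -c/d + (1/d)*x gives the implied slope 1/d:
1 / coef(fit_xy)["y"]      # slightly steeper than the y ~ x slope
```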
Greg Aponte
Answers:
Given $n$ data points $(x_i, y_i)$, $i = 1, 2, \ldots, n$, in the plane, let us draw the straight line $y = ax + b$. If we predict $ax_i + b$ as the value $\hat{y}_i$ of $y_i$, then the error is $(y_i - \hat{y}_i) = (y_i - ax_i - b)$, the squared error is $(y_i - ax_i - b)^2$, and the total squared error is $S = \sum_{i=1}^n (y_i - ax_i - b)^2$. We ask: which choice of $a$ and $b$ minimizes $S$?

Since $(y_i - ax_i - b)$ is the vertical distance of $(x_i, y_i)$ from the line, we are asking for the line such that the sum of the squares of the vertical distances of the points from the line is as small as possible. Now $S$ is a quadratic function of $a$ and $b$, and it attains its minimum value when $a$ and $b$ are such that
$$\frac{\partial S}{\partial a} = -2\sum_{i=1}^n (y_i - ax_i - b)\,x_i = 0, \qquad
\frac{\partial S}{\partial b} = -2\sum_{i=1}^n (y_i - ax_i - b) = 0.$$
From the second equation we get $b = \mu_y - a\mu_x$, where $\mu_x = \frac{1}{n}\sum_i x_i$ and $\mu_y = \frac{1}{n}\sum_i y_i$ are the sample means, and substituting into the first equation gives
$$a = \frac{\frac{1}{n}\sum_i x_i y_i - \mu_x \mu_y}{\frac{1}{n}\sum_i x_i^2 - \mu_x^2}.$$

If we interchange the roles of $x$ and $y$, draw a line $x = \hat{a}y + \hat{b}$, and ask for the values of $\hat{a}$ and $\hat{b}$ that minimize $T = \sum_{i=1}^n (x_i - \hat{a}y_i - \hat{b})^2$, that is, the line for which the sum of the squares of the horizontal distances of the points from the line is as small as possible, then by the same argument we get $\hat{b} = \mu_x - \hat{a}\mu_y$ and
$$\hat{a} = \frac{\frac{1}{n}\sum_i x_i y_i - \mu_x \mu_y}{\frac{1}{n}\sum_i y_i^2 - \mu_y^2}.$$

Note that both lines pass through the point $(\mu_x, \mu_y)$, but the slopes, $a$ for the first line and $\hat{a}^{-1}$ for the second (once it is rewritten with $y$ as a function of $x$), are different in general; they coincide only when all the points lie exactly on one straight line.
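As a quick numerical check (a sketch on simulated data in the spirit of the question, with assumed values a = 2 and b = 3), the closed-form slopes above agree with what `lm()` returns, and the fitted line passes through $(\mu_x, \mu_y)$:

```r
set.seed(1)
x <- 1:100; y <- 2 * x + 3 + rnorm(length(x))
mu_x <- mean(x); mu_y <- mean(y)

a_yx <- (mean(x * y) - mu_x * mu_y) / (mean(x^2) - mu_x^2)  # slope of y = a*x + b
a_xy <- (mean(x * y) - mu_x * mu_y) / (mean(y^2) - mu_y^2)  # slope a-hat of x = a-hat*y + b-hat

c(a_yx, unname(coef(lm(y ~ x))["x"]))   # matches lm(y ~ x)
c(a_xy, unname(coef(lm(x ~ y))["y"]))   # matches lm(x ~ y)

b_yx <- mu_y - a_yx * mu_x
all.equal(a_yx * mu_x + b_yx, mu_y)     # TRUE: the y ~ x line passes through (mu_x, mu_y)
```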
Just to illustrate Dilip's answer: in the following pictures, the first line is the one fitted by `y ~ x`, which minimizes the sum of the squares of the lengths of the vertical red segments; the second is the one fitted by `x ~ y`, which minimizes the sum of the squares of the lengths of the horizontal red segments.

Edit (least rectangles regression)

If there is no natural way to choose a "response" and a "covariate", but rather the two variables are interdependent, you may wish to preserve a symmetrical role for `y` and `x`; in this case you can use "least rectangles regression."

Here is an illustration with the same data points: for each point, a "rectangle" is computed as the product of the lengths of the two red segments, and the sum of the rectangle areas is minimized. I don't know much about the properties of this regression and I have not found much with Google.
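For reference (not stated in the answer above), the least rectangles line has a simple closed form: its slope is $\operatorname{sign}(r)\, s_y / s_x$, the geometric mean of the `y ~ x` slope and the slope of the `x ~ y` line rewritten as $y$ on $x$, and it passes through the point of means. A minimal R sketch, assuming the same kind of simulated data as above:

```r
set.seed(1)
x <- 1:100; y <- 2 * x + 3 + rnorm(length(x))

slope_yx   <- coef(lm(y ~ x))["x"]             # minimizes vertical distances
slope_xy   <- 1 / coef(lm(x ~ y))["y"]         # x ~ y line, rewritten as y on x
slope_rect <- sign(cor(x, y)) * sd(y) / sd(x)  # least rectangles slope

# The least rectangles slope is the geometric mean of the other two:
c(slope_rect, sqrt(slope_yx * slope_xy))
```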
Just a brief note on why you see the slope smaller for one regression. Both slopes depend on three numbers: the standard deviations of $x$ and $y$ ($s_x$ and $s_y$), and the correlation between $x$ and $y$ ($r$). The regression with $y$ as response has slope $r\frac{s_y}{s_x}$ and the regression with $x$ as response has slope $r\frac{s_x}{s_y}$, hence the ratio of the first slope to the reciprocal of the second is equal to $r^2 \leq 1$.
So the greater the proportion of variance explained, the closer the slopes obtained from each case. Note that the proportion of variance explained is symmetric and equal to the squared correlation in simple linear regression.
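A quick numerical illustration of these formulas (a sketch on hypothetical simulated data):

```r
set.seed(1)
x <- rnorm(100); y <- 0.5 * x + rnorm(100)

b_yx <- cor(x, y) * sd(y) / sd(x)   # slope of y ~ x
b_xy <- cor(x, y) * sd(x) / sd(y)   # slope of x ~ y

c(b_yx, unname(coef(lm(y ~ x))["x"]))   # agree
c(b_xy, unname(coef(lm(x ~ y))["y"]))   # agree
b_yx * b_xy   # ratio of first slope to the reciprocal of the second: cor(x, y)^2 <= 1
```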
A simple way to look at this is to note that, if for the true model $y = \alpha + \beta x + \epsilon$, you run two regressions:

- $y \sim x$, with slope estimate $b_{y \sim x}$
- $x \sim y$, with slope estimate $b_{x \sim y}$

then we have, using $b_{y \sim x} = \frac{\operatorname{cov}(x,y)}{\operatorname{var}(x)} = \frac{\operatorname{cov}(x,y)}{\operatorname{var}(y)}\,\frac{\operatorname{var}(y)}{\operatorname{var}(x)}$:

$$b_{y \sim x} = b_{x \sim y}\,\frac{\operatorname{var}(y)}{\operatorname{var}(x)}.$$

So whether you get a steeper slope or not just depends on the ratio $\frac{\operatorname{var}(y)}{\operatorname{var}(x)}$. Based on the assumed true model, this ratio is equal to

$$\frac{\operatorname{var}(y)}{\operatorname{var}(x)} = \frac{\beta^2 \operatorname{var}(x) + \operatorname{var}(\epsilon)}{\operatorname{var}(x)}.$$

Link with other answers

You can connect this result with the answers from others, who said that when $R^2 = 1$ the slopes should be reciprocals of each other. Indeed, $R^2 = 1 \Rightarrow \operatorname{var}(\epsilon) = 0$, and also $b_{y \sim x} = \beta$ (no estimation error). Hence:

$$b_{x \sim y} = b_{y \sim x}\,\frac{\operatorname{var}(x)}{\operatorname{var}(y)} = \beta\,\frac{\operatorname{var}(x)}{\beta^2 \operatorname{var}(x) + 0} = \frac{1}{\beta}.$$

So $b_{x \sim y} = 1/\beta$.
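A minimal R check of the identity above, on hypothetical simulated data:

```r
set.seed(1)
x <- rnorm(200); y <- 1 + 2 * x + rnorm(200)

b_yx <- cov(x, y) / var(x)   # slope of y ~ x
b_xy <- cov(x, y) / var(y)   # slope of x ~ y

c(b_yx, b_xy * var(y) / var(x))   # identical: b_yx = b_xy * var(y)/var(x)
```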
It becomes interesting when there is also noise on your inputs (which we could argue is always the case: no command or observation is ever perfect).

I have built some simulations to observe the phenomenon, based on a simple linear relationship $x = y$, with Gaussian noise on both $x$ and $y$; I generated the observations with the Python code in the gist linked below.

See the different results (odr here is orthogonal distance regression, which, like the least rectangles regression above, treats $x$ and $y$ symmetrically):

All the code is here:

https://gist.github.com/jclevesque/5273ad9077d9ea93994f6d96c20b0ddd
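The author's exact code is in the gist above; as a rough R sketch of the same effect (hypothetical noise levels, not the author's settings), noise on the predictor pulls the fitted slope below the true value of 1, while the inverted `x ~ y` slope overshoots it:

```r
set.seed(1)
true <- rnorm(1000)                  # latent values with true relationship x = y
x <- true + rnorm(1000, sd = 0.5)    # noisy observation of x
y <- true + rnorm(1000, sd = 0.5)    # noisy observation of y

coef(lm(y ~ x))["x"]       # < 1: noise on x attenuates the fitted slope
1 / coef(lm(x ~ y))["y"]   # > 1: the inverted x ~ y slope overshoots
```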
Regression line is not (always) the same as true relationship
You may have some 'true' causal relationship like

$$y = a x + b + \epsilon,$$

but the fitted regression lines `y ~ x` or `x ~ y` do not mean the same thing as that causal relationship (even when in practice the expression for one of the regression lines may coincide with the expression for the causal 'true' relationship).

More precise relationship between slopes
For two switched simple linear regressions,

$$\hat{y} = b_{y \sim x}\, x + c_{y \sim x} \qquad \text{and} \qquad \hat{x} = b_{x \sim y}\, y + c_{x \sim y},$$

you can relate the slopes as follows:

$$b_{y \sim x} \cdot b_{x \sim y} = r^2 \leq 1.$$

So the slopes are not each other's inverse (unless $r^2 = 1$).
Intuition
The reason is that the two regression lines estimate two different conditional expectations, $E(Y \mid X)$ and $E(X \mid Y)$, rather than the single causal relationship. You can imagine that the conditional probability relates to the strength of the relationship. Regression lines reflect this, and the slopes of the lines may both be shallow when the strength of the relationship is small, or both be steep when the strength of the relationship is strong. The slopes are not simply each other's inverse.
Example
If two variables $X$ and $Y$ relate to each other by some (causal) linear relationship

$$Y = \text{a little bit of } X + \text{a lot of error},$$

then you can imagine that it would not be good to entirely reverse that relationship when you wish to express $X$ based on a given value of $Y$.

Instead of

$$X = \text{a lot of } Y + \text{a little bit of error},$$

it would be better to also use

$$X = \text{a little bit of } Y + \text{a lot of error}.$$
See the following example distributions with their respective regression lines. The distributions are multivariate normal with $\Sigma_{11} = \Sigma_{22} = 1$ and $\Sigma_{12} = \Sigma_{21} = \rho$.

The conditional expected values (what you would get in a linear regression) are

$$E(Y \mid X) = \rho X \qquad \text{and} \qquad E(X \mid Y) = \rho Y,$$

and in this case, with $X, Y$ following a multivariate normal distribution, the conditional distributions are

$$Y \mid X \sim N(\rho X,\, 1 - \rho^2) \qquad \text{and} \qquad X \mid Y \sim N(\rho Y,\, 1 - \rho^2).$$

So you can see the variable $Y$ as being a part $\rho X$ and a part noise with variance $1 - \rho^2$. The same is true the other way around.
The larger the correlation coefficient $\rho$, the closer the two lines will be. But the lower the correlation, the less strong the relationship and the less steep the lines will be (this is true for both the `Y ~ X` and `X ~ Y` lines, each with respect to its own predictor axis).
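A hedged R sketch of this bivariate-normal picture (hypothetical $\rho = 0.5$): both fitted slopes come out close to $\rho$, each with respect to its own predictor.

```r
set.seed(1)
rho <- 0.5
x <- rnorm(1e5)
y <- rho * x + sqrt(1 - rho^2) * rnorm(1e5)  # Y = rho*X + noise with variance 1 - rho^2

coef(lm(y ~ x))["x"]   # about rho: estimates E(Y | X) = rho * X
coef(lm(x ~ y))["y"]   # also about rho: estimates E(X | Y) = rho * Y
```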
The short answer
The goal of a simple linear regression is to come up with the best predictions of the `y` variable, given values of the `x` variable. This is a different goal than trying to come up with the best prediction of the `x` variable, given values of the `y` variable.

Simple linear regression of `y ~ x` gives you the 'best' possible model for predicting `y` given `x`. Hence, if you fit a model for `x ~ y` and algebraically inverted it, that model could at its very best do only as well as the model for `y ~ x`. But inverting a model fit for `x ~ y` will usually do worse at predicting `y` given `x`, compared to the 'optimal' `y ~ x` model, because the "inverted `x ~ y` model" was created to fulfill a different objective.

Illustration
Imagine you have the following dataset:
When you run an OLS regression of `y ~ x`, you come up with the following model: `y = 0.167 + 1.5*x`. This optimizes predictions of `y` by making the following predictions, which have associated errors. The OLS regression's predictions are optimal in the sense that the sum of the values in the rightmost column (i.e. the sum of squares) is as small as can be.
When you run an OLS regression of `x ~ y`, you come up with a different model. This optimizes predictions of `x` by making the following predictions, with associated errors. Again, this is optimal in the sense that the sum of the values in the rightmost column is as small as possible (equal to `0.071`).
Now, imagine you tried to just invert the first model, `y = 0.167 + 1.5*x`, using algebra, giving you the model `x = -0.11 + 0.67*y`. This would give you the following predictions and associated errors:
The sum of the values in the rightmost column is `0.074`, which is larger than the corresponding sum from the model you get from regressing x on y, i.e. the `x ~ y` model. In other words, the "inverted `y ~ x` model" does a worse job at predicting `x` than the OLS model of `x ~ y`.
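The same comparison can be run on any data set; here is a sketch with hypothetical simulated data (not the numbers used in this answer), showing that the algebraically inverted `y ~ x` model never beats the direct `x ~ y` fit at predicting `x`:

```r
set.seed(1)
x <- rnorm(20); y <- 0.2 + 1.5 * x + rnorm(20, sd = 0.3)

fit_yx <- lm(y ~ x)   # optimized to predict y from x
fit_xy <- lm(x ~ y)   # optimized to predict x from y

# Invert the y ~ x model algebraically: x = (y - intercept) / slope
x_from_inverted <- (y - coef(fit_yx)[1]) / coef(fit_yx)[2]
x_from_direct   <- fitted(fit_xy)

sum((x - x_from_inverted)^2)   # larger (or equal)
sum((x - x_from_direct)^2)     # smallest achievable with a linear function of y
```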