Relationship between $R^2$ and the correlation coefficient

40

Let's say I have two one-dimensional arrays, $a_1$ and $a_2$. Each contains 100 data points. $a_1$ is the actual data, and $a_2$ is the model prediction. In this case, the $R^2$ value would be:

$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} \quad (1).$

Meanwhile, this would be equal to the squared value of the correlation coefficient,

$R^2 = (\text{Correlation Coefficient})^2 \quad (2).$

Now, if I swap the two: $a_2$ is the actual data, and $a_1$ is the model prediction. From equation $(2)$, because the correlation coefficient does not care which comes first, the $R^2$ value would be the same. However, from equation $(1)$, with $SS_{tot}=\sum_i(y_i - \bar y )^2$, the $R^2$ value will change, because $SS_{tot}$ changes if we switch $y$ from $a_1$ to $a_2$; meanwhile, $SS_{res}=\sum_i(y_i-f_i)^2$ does not change.

My question is: how can these contradict each other?

Edit:

I was wondering: will the relationship in Eq. (2) still hold if this is not a simple linear regression, i.e., if the relationship between the IV and the DV is not linear (it could be exponential or logarithmic)?
Will this relationship still hold if the sum of the prediction errors does not equal zero?

correlation r-squared Shawn Wang

I found this presentation very helpful and non-technical: google.com/…

ihadanny

19

It is true that $SS_{tot}$ will change, but you forgot that the regression sum of squares will change as well. So let's consider the simple regression model and denote the squared correlation coefficient by $r_{xy}^2=\dfrac{S_{xy}^2}{S_{xx}S_{yy}}$, where I use the subscript $xy$ to emphasize that $x$ is the independent variable and $y$ is the dependent variable. Obviously, $r_{xy}^2$ is unchanged if you swap $x$ with $y$. We can easily show that $SSR_{xy}=S_{yy}(R_{xy}^2)$, where $SSR_{xy}$ is the regression sum of squares and $S_{yy}$ is the total sum of squares when $x$ is the independent and $y$ the dependent variable. Therefore:

$R_{xy}^2=\dfrac{SSR_{xy}}{S_{yy}}=\dfrac{S_{yy}-SSE_{xy}}{S_{yy}},$

where $SSE_{xy}$ is the corresponding residual sum of squares when $x$ is the independent and $y$ the dependent variable. Note that in this case we have $SSR_{xy}=b^2_{xy}S_{xx}$, that is, $SSE_{xy}=S_{yy}-b^2_{xy}S_{xx}$, with $b_{xy}=\dfrac{S_{xy}}{S_{xx}}$. (See, e.g., Eqs. (34)-(41) here.) Therefore:

$R_{xy}^2=\dfrac{\dfrac{S^2_{xy}}{S^2_{xx}} \cdot S_{xx}}{S_{yy}}=\dfrac{S^2_{xy}}{S_{xx} \cdot S_{yy}}.$

Clearly, the equation above is symmetric with respect to $x$ and $y$. In other words:

$R_{xy}^2=R_{yx}^2.$

To summarize, when you swap $x$ with $y$ in the simple regression model, both the numerator and the denominator of $R_{xy}^2=\dfrac{SSR_{xy}}{S_{yy}}$ change in such a way that $R_{xy}^2=R_{yx}^2.$
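The symmetry can also be checked numerically. The following is only a minimal sketch on simulated data; the variable names (`Sxx`, `Syy`, `Sxy`, `R2_xy`, `R2_yx`) and the data-generating line are assumptions made for illustration, not part of the answer above.

```r
# Hypothetical data: any x and y with some linear association will do.
set.seed(42)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

# Centered sums of squares and cross-products.
Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))

r2_formula <- Sxy^2 / (Sxx * Syy)       # symmetric in x and y by construction
R2_xy <- summary(lm(y ~ x))$r.squared   # x independent, y dependent
R2_yx <- summary(lm(x ~ y))$r.squared   # y independent, x dependent

# All four numbers should agree up to floating-point error.
print(c(formula = r2_formula, R2_xy = R2_xy, R2_yx = R2_yx, cor2 = cor(x, y)^2))
```

Both regressions are ordinary least-squares fits with an intercept, which is the setting the derivation assumes.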

Stat

Thank you very much! I noticed that this might be where I went wrong: $R^2 = r^2$ is valid only if 1) the model prediction is a straight line and 2) the mean of the model prediction equals the mean of the sample points. If the relationship between the DV and the IV is not a straight line, or if the sum of the prediction errors is nonzero, the relationship will not hold. Could you please let me know whether this is correct?

Shawn Wang

1

I was thinking this because you used $R^2=SS_{reg}/SS_{tot}$, whereas I was using the equation I posted in the OP. These two equations are equivalent only when the sum of the prediction errors is zero. Hence, in my OP, $SS_{res}=\sum_i(y_i-f_i)^2$ does not change while $SS_{tot}$ changes, and therefore $R^2$ changes.
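One way to see when the two forms of $R^2$ agree is to expand the total sum of squares. This is a standard identity, written here with the thread's symbols ($y_i$ observed, $f_i$ predicted, $\bar y$ the mean of the observed values) and with $SS_{reg}=\sum_i (f_i-\bar y)^2$:

$SS_{tot}=\sum_i (y_i-\bar y)^2=\sum_i (y_i-f_i)^2+\sum_i (f_i-\bar y)^2+2\sum_i (y_i-f_i)(f_i-\bar y)=SS_{res}+SS_{reg}+2\sum_i (y_i-f_i)(f_i-\bar y).$

So $1-SS_{res}/SS_{tot}=SS_{reg}/SS_{tot}$ exactly when the cross term $\sum_i (y_i-f_i)(f_i-\bar y)$ is zero. For a least-squares fit with an intercept this holds, because the residuals sum to zero and are orthogonal to the fitted values; for other kinds of predictions it need not hold.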

Shawn Wang

Do you have a reference on how to work this out in the general case of p-variate Gaussians?

jmb

26

One way of interpreting the coefficient of determination $R^{2}$ is to look at it as the squared Pearson correlation coefficient between the observed values $y_{i}$ and the fitted values $\hat{y}_{i}$.

The complete proof of how to derive the coefficient of determination $R^{2}$ from the squared Pearson correlation coefficient between the observed values $y_{i}$ and the fitted values $\hat{y}_{i}$ can be found under the following link:

http://economictheoryblog.wordpress.com/2014/11/05/proof/

In my eyes it should be pretty easy to understand; just follow the individual steps. I guess looking at it is essential to understand how the relationship between the two key figures actually works.

Andreas Dibiasi

6

In the case of simple linear regression with only one predictor, $R^2 = r^2 = Corr(x,y)^2$. But in multiple linear regression with more than one predictor, the concept of correlation between the predictors and the response does not extend automatically. The formula becomes:

$R^2 = Corr(y_{estimated},y_{observed})^2$

that is, the square of the correlation between the response and the fitted linear model.
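As a rough illustration of the multiple-regression case, the sketch below fits a model with two predictors on simulated data and compares the reported $R^2$ with the squared correlation between fitted and observed values. The data, coefficients, and variable names are all made up for the example.

```r
# Hypothetical data with two predictors.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.8 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)                  # OLS with an intercept

r2_reported <- summary(fit)$r.squared   # coefficient of determination
r2_from_cor <- cor(fitted(fit), y)^2    # Corr(y_estimated, y_observed)^2

print(c(reported = r2_reported, from_cor = r2_from_cor))

# Squared correlations of y with each single predictor, for comparison;
# neither one is the model's R^2 in general.
print(c(x1 = cor(x1, y)^2, x2 = cor(x2, y)^2))
```

The first two numbers should coincide up to floating-point error, while the single-predictor squared correlations can be no larger than the model's $R^2$.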

aman

5

@Stat has provided a detailed answer. In my short answer I'll briefly show, in a somewhat different way, what the similarity and the difference between $r$ and $r^2$ are.

$r$ is the standardized regression coefficient beta of $Y$ on $X$ or of $X$ on $Y$, and as such it is a measure of the (mutual) effect size. This is most clearly seen when the variables are dichotomous. Then an $r$ of, for example, $.30$ means that 30% of cases will change their value to the opposite in one variable when the other variable changes its value to the opposite.

$r^2$ , on the other hand, is the expression of the proportion of co-variability in the total variability: $r^2 = (\frac {cov}{\sigma_x \sigma_y})^2 = \frac {|cov|} {\sigma_x^2} \frac {|cov|} {\sigma_y^2}$ . Note that this is a product of two proportions or, more precisely, two ratios (a ratio can be >1). If we loosely take any proportion or ratio to be a quasi-probability or propensity, then $r^2$ expresses a "joint probability (propensity)". Another, equally valid, expression for the joint product of two proportions (or ratios) would be their geometric mean, $\sqrt{prop \cdot prop}$ , which is just $|r|$ .

(The two ratios are multiplicative, not additive, to stress the idea that they collaborate and cannot compensate for each other in their teamwork. They have to be multiplicative because the magnitude of $cov$ depends on both magnitudes $\sigma_x^2$ and $\sigma_y^2$ and, accordingly, $cov$ has to be divided twice at once in order to convert itself into a proper "proportion of the shared variance". But $cov$ , the "cross-variance", shares the same measurement units with both $\sigma_x^2$ and $\sigma_y^2$ , the "self-variances", and not with $\sigma_x \sigma_y$ , the "hybrid variance"; that is why $r^2$ , not $r$ , is more adequate as the "proportion of shared variance".)
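As a small worked example of the product-of-ratios reading, take the made-up values $cov = 2$, $\sigma_x^2 = 4$ and $\sigma_y^2 = 16$ (so $\sigma_x = 2$ and $\sigma_y = 4$):

$r = \frac{cov}{\sigma_x \sigma_y} = \frac{2}{2 \cdot 4} = 0.25, \qquad r^2 = 0.0625,$

$\frac{|cov|}{\sigma_x^2} \cdot \frac{|cov|}{\sigma_y^2} = \frac{2}{4} \cdot \frac{2}{16} = 0.5 \cdot 0.125 = 0.0625, \qquad \sqrt{0.5 \cdot 0.125} = 0.25.$

The product of the two ratios reproduces $r^2$, and their geometric mean reproduces $|r|$.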

So you see that the meanings of $r$ and $r^2$ as measures of the quantity of association are different (both meanings are valid), but these coefficients in no way contradict each other. And both are the same whether you predict $Y \sim X$ or $X \sim Y$ .

ttnphns

Thank you so much! I am starting to wonder whether I am using the wrong definition, and whether two definitions of $R^2$ co-exist that are not equivalent to each other. Could you please help me with this question: if I am thinking about more general cases where the model is not a simple linear regression (it could be exponential), is my equation in the OP still correct for calculating $R^2$ ? Is this a different quantity, also called $R^2$ , but different from the "coefficient of determination"?

Shawn Wang

The coefficient of determination, or R-square, is a wider concept than r^2, which applies only to simple linear regression. Please read Wikipedia: en.wikipedia.org/wiki/Coefficient_of_determination.

ttnphns

Thanks again! That I do understand. My question is: for more complex regressions, can I still square the r value to get the coefficient of determination?

Shawn Wang

1

For a "complex regression", you get R-square, but you don't get r.

ttnphns

1

I think you might be mistaken. If $R^2=r^2$ , I assume you have a bivariate model: one DV, one IV. I don't think $R^2$ will change if you swap these, nor if you replace the IV with the predictions of the DV that are based on the IV. Here's code for a demonstration in R:

set.seed(1)                         # make the random draws reproducible
x <- rnorm(1000); y <- rnorm(1000)  # store random data
summary(lm(y ~ x))                  # fit a linear regression model (a)
summary(lm(x ~ y))                  # swap variables and fit the opposite model (b)
z <- lm(y ~ x)$fitted.values        # predictions of y from model (a)
summary(lm(y ~ z))                  # substitute predictions for IV in model (a)

If you aren't working with a bivariate model, your choice of DV will affect $R^2$ ...unless your variables are all identically correlated, I suppose, but this isn't much of an exception. If all the variables have identical strengths of correlation and also share the same portions of the DV's variance (e.g. [or maybe "i.e."], if some of the variables are completely identical), you could just reduce this to a bivariate model without losing any information. Whether you do or don't, $R^2$ still wouldn't change.

In all other cases I can think of with more than two variables, $R^2\ne r^2$ where $R^2$ is the coefficient of determination and $r$ is a bivariate correlation coefficient of any kind (not necessarily Pearson's; e.g., possibly also a Spearman's $\rho$ ).

Nick Stauner

1

I recently did Theil linear regression, then calculated $R^2=-0.1468$ and $SSR>SST$ . I have seen Excel produce negative $R^2$ values as well, and at first I laughed at it; then slowly came understanding, and it ceased to be funny. So is the general definition of $R^2$ correct? What gives?
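One way a negative value can arise is sketched below: when the predictions do not come from a least-squares fit with an intercept, the Eq. (1)-style definition $1 - SS_{res}/SS_{tot}$ is no longer bounded below by zero, while the squared correlation always stays in $[0, 1]$. The data and the constant offset of 5 are invented purely for illustration and are not meant to reproduce the Theil regression above.

```r
# Hypothetical observed values and deliberately biased predictions.
set.seed(7)
y <- rnorm(50, mean = 10)   # "observed" data
f <- y + 5                  # predictions: perfectly correlated with y, but offset

ss_res <- sum((y - f)^2)          # residual sum of squares
ss_tot <- sum((y - mean(y))^2)    # total sum of squares

r2_eq1 <- 1 - ss_res / ss_tot     # Eq. (1)-style definition
r2_cor <- cor(y, f)^2             # squared Pearson correlation

print(c(eq1 = r2_eq1, cor2 = r2_cor))
```

Here the residual sum of squares exceeds the total sum of squares, so the first quantity is negative even though the squared correlation is 1.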

Carl
