Why is RSS distributed chi square times n-p?

I would like to understand why, under the OLS model, the RSS (residual sum of squares) is distributed $\chi^2\cdot (n-p)$ ($p$ being the number of parameters in the model, $n$ the number of observations).

I apologize for asking such a basic question, but I can't seem to find the answer online (or in my more application-oriented textbooks).

Tags: regression, distributions, least-squares

Tal Galili

Note that the answers demonstrate that the assertion is not quite correct: the distribution of RSS is $\sigma^2$ (not $n-p$) times a $\chi^2(n-p)$ distribution, where $\sigma^2$ is the true variance of the errors.

whuber
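As a quick sanity check of the corrected statement ($\mathrm{RSS}/\sigma^2 \sim \chi^2(n-p)$, so RSS itself is $\sigma^2$ times a $\chi^2(n-p)$ variable), here is a minimal Monte Carlo sketch. It assumes NumPy is available; the helper name `simulate_scaled_rss`, the design matrix, the true coefficients and the seed are all hypothetical choices made for illustration. It only compares the first two moments with those of $\chi^2(n-p)$ (mean $n-p$, variance $2(n-p)$), which is evidence, not a proof.

```python
import numpy as np

def simulate_scaled_rss(n, p, sigma, reps=20000, seed=0):
    """Return `reps` draws of RSS / sigma^2 from OLS fits with N(0, sigma^2) errors.

    Illustrative sketch: the design X, the true beta and the seed are arbitrary.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))                  # fixed design, full column rank w.p. 1
    beta = rng.normal(size=p)                    # arbitrary true coefficients
    # One column of Y per replication, all sharing the same design X.
    Y = (X @ beta)[:, None] + sigma * rng.normal(size=(n, reps))
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)    # OLS fit of every column at once
    R = Y - X @ B                                # residual vectors
    return (R ** 2).sum(axis=0) / sigma ** 2

scaled = simulate_scaled_rss(n=20, p=4, sigma=2.0)
# chi^2(n - p) has mean n - p and variance 2(n - p): here about 16 and 32
print(scaled.mean(), scaled.var())
```

Changing `sigma` rescales RSS itself but should leave the distribution of `scaled` unchanged, which is exactly the $\sigma^2$ factor pointed out in the comment.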

Answers:

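I consider the following linear model: $y = X \beta + \epsilon$, where $X$ is the $n \times p$ design matrix, assumed to have full column rank, and $\epsilon \sim N(0, \sigma^2 I_n)$.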

The vector of residuals is

$\hat{\epsilon} = y - X \hat{\beta} = (I - X (X'X)^{-1} X') y = Q y = Q (X \beta + \epsilon) = Q \epsilon$

where $Q = I - X (X'X)^{-1} X'$; the last equality holds because $QX = X - X(X'X)^{-1}X'X = 0$.
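A minimal numerical check of the chain $\hat\epsilon = Qy = Q\epsilon$, assuming NumPy; the sizes, seed and coefficients are arbitrary illustrative values. It uses `np.linalg.solve` instead of forming $(X'X)^{-1}$ explicitly, which is the usual numerically safer choice.

```python
import numpy as np

rng = np.random.default_rng(1)             # arbitrary seed
n, p, sigma = 12, 3, 0.5                   # illustrative sizes
X = rng.normal(size=(n, p))                # design, full column rank w.p. 1
beta = np.array([1.0, -2.0, 0.5])          # arbitrary true coefficients
eps = sigma * rng.normal(size=n)
y = X @ beta + eps

XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)   # (X'X)^{-1} X' without forming the inverse
Q = np.eye(n) - X @ XtX_inv_Xt
beta_hat = XtX_inv_Xt @ y                    # OLS estimate
resid = y - X @ beta_hat

print(np.allclose(resid, Q @ y))     # residuals are Q y
print(np.allclose(Q @ X, 0))         # Q X = 0, so Q X beta drops out
print(np.allclose(resid, Q @ eps))   # hence residuals equal Q eps
```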

Observe that $\textrm{tr}(Q) = n - \textrm{tr}(X(X'X)^{-1}X') = n - \textrm{tr}((X'X)^{-1}X'X) = n - p$ (the trace is invariant under cyclic permutation) and that $Q'=Q=Q^2$. The eigenvalues of $Q$ are therefore $0$ and $1$ (some details below). Hence there exists a unitary matrix $V$ such that (matrices are diagonalizable by unitary matrices if and only if they are normal)

$V'QV = \Delta = \textrm{diag}(\underbrace{1, \ldots, 1}_{n-p \textrm{ times}}, \underbrace{0, \ldots, 0}_{p \textrm{ times}})$
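The claims $Q' = Q = Q^2$, $\textrm{tr}(Q) = n-p$ and $V'QV = \Delta$ can be checked numerically. A sketch assuming NumPy, with an arbitrary random design; because $Q$ is real symmetric, `np.linalg.eigh` returns a real orthogonal $V$ (the eigenvectors are reordered here so that the eigenvalue-$1$ directions come first, as in $\Delta$).

```python
import numpy as np

rng = np.random.default_rng(2)             # arbitrary seed
n, p = 10, 3                               # illustrative sizes
X = rng.normal(size=(n, p))
Q = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(Q, Q.T), np.allclose(Q, Q @ Q))   # Q' = Q = Q^2
print(np.trace(Q))                                  # ≈ n - p = 7

# eigh is meant for real symmetric matrices: real eigenvalues in ascending
# order and an orthogonal matrix of eigenvectors.
w, V = np.linalg.eigh(Q)
order = np.argsort(-w)                     # eigenvalue-1 directions first
w, V = w[order], V[:, order]
Delta = np.diag(np.r_[np.ones(n - p), np.zeros(p)])
print(np.round(w, 10))                     # only 0s and 1s, up to rounding
print(np.allclose(V.T @ Q @ V, Delta))
```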

Now let $K = V' \hat{\epsilon}$.

Given $\hat{\epsilon} \sim N(0, \sigma^2 Q)$, we have $K \sim N(0, \sigma^2 V'QV) = N(0, \sigma^2 \Delta)$, and hence $K_{n-p+1}=\ldots=K_n=0$ (almost surely), while $K_1, \ldots, K_{n-p}$ are independent $N(0, \sigma^2)$ variables (jointly normal with diagonal covariance). Thus

$\frac{\|K\|^2}{\sigma^2} = \frac{\|K^{\star}\|^2}{\sigma^2} \sim \chi^2_{n-p}$

with $K^{\star} = (K_1, \ldots, K_{n-p})'$.

Moreover, since $V$ is a unitary matrix, we also have

$\|\hat{\epsilon}\|^2 = \|K\|^2=\|K^{\star}\|^2$
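The rotation step can be illustrated the same way: with $V$ ordered as in $\Delta$, the last $p$ coordinates of $K = V'\hat\epsilon$ vanish up to rounding, and the squared norm is unchanged. A sketch assuming NumPy, with arbitrary illustrative sizes and seed.

```python
import numpy as np

rng = np.random.default_rng(3)             # arbitrary seed
n, p, sigma = 10, 3, 1.5                   # illustrative sizes
X = rng.normal(size=(n, p))
Q = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
w, V = np.linalg.eigh(Q)
V = V[:, np.argsort(-w)]                   # eigenvalue-1 eigenvectors first

eps = sigma * rng.normal(size=n)
resid = Q @ eps                            # hat{epsilon} = Q epsilon
K = V.T @ resid
K_star = K[: n - p]

print(np.abs(K[n - p:]).max())                 # last p coordinates are numerically zero
print(resid @ resid, K @ K, K_star @ K_star)   # the three squared norms agree
```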

Thus, since $\textrm{RSS} = \|\hat{\epsilon}\|^2$,

$\frac{\textrm{RSS}}{\sigma^2} \sim \chi^2_{n-p}$

Finally, since a $\chi^2_{n-p}$ variable has mean $n-p$, note that this result implies that

$E\left(\frac{\textrm{RSS}}{n-p}\right) = \sigma^2$
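The practical consequence is that $\mathrm{RSS}/(n-p)$ is an unbiased estimator of $\sigma^2$, whereas dividing by $n$ underestimates it by the factor $(n-p)/n$. A hedged simulation sketch, assuming NumPy; the helper `variance_estimates` and its default arguments are hypothetical. Since $\hat\epsilon = Q\epsilon$, the simulation only needs error vectors, not $y$.

```python
import numpy as np

def variance_estimates(n=8, p=3, sigma=1.0, reps=50000, seed=4):
    """Average RSS/(n-p) and RSS/n over simulated error vectors (illustrative)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    H = X @ np.linalg.solve(X.T @ X, X.T)
    E = sigma * rng.normal(size=(n, reps))   # each column is one error vector
    R = E - H @ E                            # residuals Q eps; X beta drops out
    rss = (R ** 2).sum(axis=0)
    return rss.mean() / (n - p), rss.mean() / n

unbiased, naive = variance_estimates()
print(unbiased)   # close to sigma^2 = 1
print(naive)      # close to sigma^2 (n - p) / n = 0.625
```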

Since $Q^2 - Q = 0$, the minimal polynomial of $Q$ divides the polynomial $z^2 - z$. Thus the eigenvalues of $Q$ are among $0$ and $1$. Since $\textrm{tr}(Q) = n-p$ is also the sum of the eigenvalues multiplied by their multiplicities, and the multiplicities add up to $n$, we necessarily have that $1$ is an eigenvalue with multiplicity $n-p$ and zero is an eigenvalue with multiplicity $p$.

ocram

(+1) Good answer. One can restrict attention to orthogonal, instead of unitary, $V$ since $Q$ is real and symmetric. Also, what is $\mathrm{SCR}$? I do not see it defined. By slightly rejiggering the argument, one can also avoid the use of a degenerate normal, in case that causes some consternation to those not familiar with it.

cardinal

@Cardinal. Good point. SCR ('Somme des Carrés Résiduels' in French) should have been RSS.

ocram

Thank you for the detailed answer Ocram! Some steps will require me to look more, but I have an outline to think about now - thanks!

Tal Galili

@Glen_b: Oh, I made an edit a couple of days ago to change SCR to SRR. I didn't remember that SCR is mentioned in my comment. Sorry for the confusion.

ocram

@Glen_b: It was supposed to mean RSS :-S Edited again. Thx

ocram

IMHO, the matrix notation $Y=X\beta+\epsilon$ complicates things. Pure vector space language is cleaner. The model can be written $\boxed{Y=\mu + \sigma G}$ where $G$ has the standard normal distribution on $\mathbb{R}^n$ and $\mu$ is assumed to belong to a vector subspace $W \subset \mathbb{R}^n$ (in the regression setting, $W$ is the column space of $X$, so $\dim(W) = \operatorname{rank}(X)$).

Now the language of elementary geometry comes into play. The least-squares estimator $\hat\mu$ of $\mu$ is nothing but $P_WY$: the orthogonal projection of the observable $Y$ on the space $W$ to which $\mu$ is assumed to belong. The vector of residuals is $P^\perp_WY$: projection on the orthogonal complement $W^\perp$ of $W$ in $\mathbb{R}^n$. The dimension of $W^\perp$ is $\dim(W^\perp)=n-\dim(W)$.

Finally,

$P^\perp_WY = P^\perp_W(\mu + \sigma G) = 0 + \sigma P^\perp_WG,$

and $P^\perp_WG$ has the standard normal distribution on $W^\perp$, hence its squared norm, which equals $\mathrm{RSS}/\sigma^2$, has the $\chi^2$ distribution with $\dim(W^\perp)$ degrees of freedom.

This proof uses only one theorem, actually a definition-theorem:

Definition and theorem. A random vector in $\mathbb{R}^n$ has the standard normal distribution on a vector space $U \subset \mathbb{R}^n$ if it takes its values in $U$ and its coordinates in one ($\iff$ in all) orthonormal basis of $U$ are independent one-dimensional standard normal random variables.

(from this definition-theorem, Cochran's theorem is so obvious that it is not worth stating)
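A numerical sketch of this projection argument, assuming NumPy: $W$ is taken (as a hypothetical example) to be the column space of a random $9\times 3$ matrix, and a complete QR decomposition supplies orthonormal bases of $W$ and $W^\perp$. The coordinates of $G$ in the basis of $W^\perp$ are then independent standard normals, as in the definition-theorem, so their sum of squares should have mean $\dim(W^\perp)$ and variance $2\dim(W^\perp)$.

```python
import numpy as np

rng = np.random.default_rng(5)             # arbitrary seed
n, k, sigma = 9, 3, 0.7                    # illustrative sizes
X = rng.normal(size=(n, k))                # W = column space of X (dim k w.p. 1)
mu = X @ rng.normal(size=k)                # some mu in W

# Complete QR: the first k columns of O span W, the remaining n - k span W-perp.
O, _ = np.linalg.qr(X, mode="complete")
B_perp = O[:, k:]                          # orthonormal basis of W-perp
P_perp = B_perp @ B_perp.T                 # orthogonal projector onto W-perp

G = rng.normal(size=n)
Y = mu + sigma * G
resid = P_perp @ Y
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(resid, Y - X @ beta_hat))     # same as the OLS residuals
print(np.allclose(resid, sigma * P_perp @ G))   # P_perp mu = 0

# ||P_perp G||^2 = ||B_perp' G||^2: sum of squares of n - k coordinates.
Gs = rng.normal(size=(n, 40000))
sq = ((B_perp.T @ Gs) ** 2).sum(axis=0)
print(sq.mean(), sq.var())                      # close to 6 and 12
```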

Stéphane Laurent

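Once we've established that $\hat\epsilon=(I-H)\epsilon$, where $H$ is the hat matrix, i.e. the orthogonal projection onto the column space of $X$ (equal to $X(X^TX)^{-1}X^T$ when $X$ has full column rank), we can apply the following lemma: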

Lemma: If $A_{n\times n}$ is a symmetric and idempotent real matrix, then there exists a matrix $U$ with orthonormal columns such that $A=UU^T$ . The matrix $U$ is $n\times r$ , where $r$ equals the rank of $A$ .

Proof: The spectral theorem for symmetric matrices asserts $A=UDU^T$ where $D_{n\times n}$ is a diagonal matrix of the eigenvalues $\lambda_1,\ldots,\lambda_n$ of $A$ and $U_{n\times n}$ is an orthogonal matrix whose columns are the corresponding eigenvectors $u_1,\ldots,u_n$. Since $A$ is idempotent, each eigenvalue is either zero or one (reason: $Au=\lambda u$ with $u \neq 0$ implies $\lambda u =Au=A(Au)=A\lambda u=\lambda^2u$, so $\lambda=\lambda^2$). Delete from $U$ the columns corresponding to zero eigenvalues, leaving an $n\times r$ matrix, and delete the matching rows and columns of $D$, which leaves the $r\times r$ identity; hence $A=UU^T$. To determine $r$, note that the columns remaining in $U$ each satisfy $Au_i=u_i$, hence they form a basis for the range of $A$; so $\operatorname{rank}(A)=r$.
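The construction in the proof translates directly into code. A sketch assuming NumPy; `idempotent_factor` and its `tol` threshold are hypothetical names chosen here, and the tolerance is needed because computed eigenvalues are only approximately $0$ or $1$.

```python
import numpy as np

def idempotent_factor(A, tol=1e-8):
    """Return U with orthonormal columns such that A = U U^T, for a real
    symmetric idempotent matrix A (follows the proof of the lemma)."""
    w, vecs = np.linalg.eigh(A)            # spectral theorem: A = vecs diag(w) vecs^T
    if not np.all((np.abs(w) < tol) | (np.abs(w - 1) < tol)):
        raise ValueError("eigenvalues are not all 0 or 1; A is not idempotent")
    return vecs[:, np.abs(w - 1) < tol]    # drop the zero-eigenvalue columns

rng = np.random.default_rng(6)             # arbitrary seed
n, p = 8, 3                                # illustrative sizes
X = rng.normal(size=(n, p))
H = X @ np.linalg.solve(X.T @ X, X.T)
U = idempotent_factor(np.eye(n) - H)
print(U.shape)                             # (8, 5): r = rank(I - H)
print(np.allclose(U @ U.T, np.eye(n) - H), np.allclose(U.T @ U, np.eye(U.shape[1])))
```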

Applying the lemma, write $I-H=UU^T$ where $U_{n\times r}$ has orthonormal columns and $r=\operatorname{rank}(I-H)$. Then $\hat\epsilon:=(I-H)\epsilon=U(U^T\epsilon)$. Observe that $N:=U^T\epsilon$ is an $r$-dimensional random vector having a multivariate normal distribution with mean zero and covariance matrix

$\operatorname{Var}(N)=E(U^T\epsilon)(U^T\epsilon)^T=U^TE(\epsilon\epsilon^T)U=\sigma^2 (U^TU)=\sigma^2I_{r\times r}$

and that

$\operatorname{RSS}:=\hat\epsilon^T\hat\epsilon=(UN)^T(UN)=N^T(U^TU)N=N^TN.$

Conclude that $\operatorname{RSS}/\sigma^2$ is the sum of squares of $r$ IID standard normal variables (namely $N_1/\sigma,\ldots,N_r/\sigma$) and therefore has a chi-square($r$) distribution.
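A quick check of the two facts used here, $\operatorname{RSS} = N^TN$ and $\operatorname{Var}(N) = \sigma^2 I_r$, assuming NumPy; sizes, seed and $\sigma$ are arbitrary illustrative values, and the covariance is only estimated from simulated draws, so it matches $I_r$ approximately.

```python
import numpy as np

rng = np.random.default_rng(7)             # arbitrary seed
n, p, sigma = 8, 3, 1.3                    # illustrative sizes
X = rng.normal(size=(n, p))
H = X @ np.linalg.solve(X.T @ X, X.T)
w, vecs = np.linalg.eigh(np.eye(n) - H)
U = vecs[:, w > 0.5]                       # eigenvalue-1 eigenvectors: I - H = U U^T
r = U.shape[1]

E = sigma * rng.normal(size=(n, 100000))   # columns: independent error vectors
N = U.T @ E                                # each column is one draw of N
RSS = ((E - H @ E) ** 2).sum(axis=0)       # residual sum of squares per draw

print(np.allclose(RSS, (N ** 2).sum(axis=0)))   # RSS = N^T N for every draw
print(np.round(np.cov(N) / sigma**2, 2))        # approximately I_r
```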

To finish, we find $r=n-\operatorname{rank}(X)$: Consider for $v\in{\mathbb R}^n$ the decomposition $v=Hv + (I-H)v$. Symmetry and idempotency of $H$ imply $(Hv)^T(I-H)v'=v^T(H-H^2)v'=0$ for all $v,v'$, whence ${\mathbb R}^n$ is the orthogonal direct sum of the subspaces $\operatorname{range}(H)$ and $\operatorname{range}(I-H)$, and so (using $\operatorname{range}(H)=\operatorname{range}(X)$)

$n=\operatorname{dim}\operatorname{range}(H) + \operatorname{dim}\operatorname{range}(I-H) =\operatorname{rank}(H) + \operatorname{rank}(I-H) = \operatorname{rank}(X)+r.$
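The count $r = n - \operatorname{rank}(X)$ also covers a rank-deficient design, provided $H$ is taken to be the orthogonal projector onto the column space of $X$ (which equals $X(X^TX)^{-1}X^T$ only when $X$ has full column rank). A sketch assuming NumPy, with a hypothetical design whose fourth column is the sum of the first two; the explicit tolerance `1e-10` used to decide numerical rank is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(8)             # arbitrary seed
n = 10
A = rng.integers(-5, 6, size=(n, 3)).astype(float)
X = np.column_stack([A, A[:, 0] + A[:, 1]])   # 4 columns, but rank 3 (collinear)

# Orthogonal projector H onto range(X), built from the SVD with an explicit
# tolerance, because X'X is singular here and (X'X)^{-1} does not exist.
U, s, _ = np.linalg.svd(X, full_matrices=False)
k = int((s > 1e-10 * s[0]).sum())
H = U[:, :k] @ U[:, :k].T
I_H = np.eye(n) - H

def rank(M):
    """Numerical rank with an explicit (illustrative) absolute tolerance."""
    return np.linalg.matrix_rank(M, tol=1e-10)

print(rank(X), rank(H), rank(I_H))         # rank(X) + rank(I - H) should equal n
```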

grand_chat