D'après An Introduction to Statistical Learning de James et al., L'estimation de validation croisée avec oubli (LOOCV) est définie par
where .
Without proof, equation (5.2) states that for a least-squares or polynomial regression (whether this applies to regression on just one variable is unknown to me),
where " is the th fitted value from the original least squares fit (no idea what this means, by the way, does it mean from using all of the points in the data set?) and is the leverage" which is defined by
How does one prove this?
My attempt: one could start by noticing that
$$\hat{y}_i = \hat{\beta}_0 + \sum_{j=1}^{k}\hat{\beta}_j x_{ij} + \text{(possibly polynomial terms of degree} \geq 2\text{)},$$
but apart from this (and, if I recall, that formula for $h_i$ is only true for simple linear regression...), I'm not sure how to proceed from here.
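As a sanity check (not a proof), one can at least verify (5.2) numerically against a brute-force LOOCV loop. Below is a minimal sketch; the simulated data, variable names, and use of plain numpy are choices made only for illustration, nothing from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design with intercept
y = X @ rng.normal(size=p + 1) + rng.normal(size=n)         # simulated response

# Full least-squares fit, fitted values, and leverages
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
H = X @ np.linalg.solve(X.T @ X, X.T)                        # hat matrix
h = np.diag(H)                                               # leverages h_i

# LOOCV by brute force: refit n times, each time leaving out one observation
mse = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ y[keep])
    mse[i] = (y[i] - X[i] @ b_i) ** 2

cv_brute = mse.mean()
cv_shortcut = np.mean(((y - y_hat) / (1 - h)) ** 2)          # formula (5.2)
print(cv_brute, cv_shortcut)                                 # agree to machine precision
```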
regression
self-study
cross-validation
least-squares
Clarinetist
Answers:
I'll show the result for any multiple linear regression, whether the regressors are polynomials of $X_t$ or not. In fact, this shows a little more than what you asked, because it shows that each LOOCV residual is identical to the corresponding leverage-weighted residual from the full regression, not just that you can obtain the LOOCV error as in (5.2) (there could be other ways in which the averages agree even if the individual terms differ).
Let me take the liberty to use slightly adapted notation.
We first show that
$$\hat{\beta} - \hat{\beta}_{(t)} = \left(\frac{\hat{u}_t}{1-h_t}\right)(X'X)^{-1}X_t', \tag{A}$$
where $\hat{\beta}$ is the estimate using all data and $\hat{\beta}_{(t)}$ the estimate obtained when leaving out observation $t$. Here $X_t$ is the $t$th row of the design matrix $X$, written as a row vector so that $\hat{y}_t = X_t\hat{\beta}$, $\hat{u}_t = y_t - X_t\hat{\beta}$ is the full-sample residual, and $h_t = X_t(X'X)^{-1}X_t'$ is the leverage of observation $t$.
The proof uses the following matrix algebraic result.
Let $A$ be a nonsingular matrix, $b$ a vector and $\lambda$ a scalar. If
$$\lambda \neq -\frac{1}{b'A^{-1}b},$$
then
$$\left(A + \lambda bb'\right)^{-1} = A^{-1} - \left(\frac{\lambda}{1 + \lambda b'A^{-1}b}\right)A^{-1}bb'A^{-1}. \tag{B}$$
The proof of (B) follows immediately from verifying
$$\left\{A^{-1} - \left(\frac{\lambda}{1 + \lambda b'A^{-1}b}\right)A^{-1}bb'A^{-1}\right\}\left(A + \lambda bb'\right) = I.$$
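Identity (B) is a rank-one update formula of the Sherman–Morrison type, and it is easy to check numerically before relying on it in the derivation. Here is a small sketch; the particular matrix, vector, and scalar are arbitrary choices, constructed only so that $A$ and $A + \lambda bb'$ are nonsingular:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4
A = rng.normal(size=(k, k)) + k * np.eye(k)     # nonsingular in practice
b = rng.normal(size=(k, 1))
lam = 0.7                                       # satisfies lam != -1 / (b' A^{-1} b)

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + lam * b @ b.T)
rhs = Ainv - (lam / (1 + lam * (b.T @ Ainv @ b).item())) * Ainv @ b @ b.T @ Ainv
print(np.allclose(lhs, rhs))                    # True
```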
The following result is helpful to prove (A):
$$\left(X_{(t)}'X_{(t)}\right)^{-1}X_t' = \left(\frac{1}{1-h_t}\right)(X'X)^{-1}X_t', \tag{C}$$
where $X_{(t)}$ denotes the design matrix with row $t$ deleted.
Proof of (C): By (B), and using $\sum_{t=1}^{T}X_t'X_t = X'X$, we have
$$\left(X_{(t)}'X_{(t)}\right)^{-1} = \left(X'X - X_t'X_t\right)^{-1} = (X'X)^{-1} + \frac{(X'X)^{-1}X_t'X_t(X'X)^{-1}}{1 - X_t(X'X)^{-1}X_t'}.$$
Postmultiplying by $X_t'$ and writing $h_t = X_t(X'X)^{-1}X_t'$,
$$\left(X_{(t)}'X_{(t)}\right)^{-1}X_t' = (X'X)^{-1}X_t' + \frac{h_t}{1 - h_t}(X'X)^{-1}X_t' = \left(\frac{1-h_t+h_t}{1-h_t}\right)(X'X)^{-1}X_t' = \left(\frac{1}{1-h_t}\right)(X'X)^{-1}X_t'.$$
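Purely as an illustration (this is not part of the proof), (C) can also be confirmed numerically by deleting one row of a simulated design matrix and comparing both sides; the data below are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 25, 4
X = rng.normal(size=(T, k))
t = 7                                           # observation to leave out
Xt = X[t:t + 1]                                 # the row X_t, kept 2-D (1 x k)
X_del = np.delete(X, t, axis=0)                 # X_{(t)}

h_t = (Xt @ np.linalg.solve(X.T @ X, Xt.T)).item()
lhs = np.linalg.solve(X_del.T @ X_del, Xt.T)          # (X_{(t)}'X_{(t)})^{-1} X_t'
rhs = np.linalg.solve(X.T @ X, Xt.T) / (1 - h_t)      # (1/(1-h_t)) (X'X)^{-1} X_t'
print(np.allclose(lhs, rhs))                          # True
```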
The proof of (A) now follows from (C): As
$$X'X\hat{\beta} = X'y,$$
we have
$$\left(X_{(t)}'X_{(t)} + X_t'X_t\right)\hat{\beta} = X_{(t)}'y_{(t)} + X_t'y_t.$$
Premultiplying by $\left(X_{(t)}'X_{(t)}\right)^{-1}$ and using $\left(X_{(t)}'X_{(t)}\right)^{-1}X_{(t)}'y_{(t)} = \hat{\beta}_{(t)}$ and $y_t = X_t\hat{\beta} + \hat{u}_t$ gives
$$\left\{I_k + \left(X_{(t)}'X_{(t)}\right)^{-1}X_t'X_t\right\}\hat{\beta} = \hat{\beta}_{(t)} + \left(X_{(t)}'X_{(t)}\right)^{-1}X_t'\left(X_t\hat{\beta} + \hat{u}_t\right).$$
Cancelling the $X_t'X_t\hat{\beta}$ terms on both sides and applying (C),
$$\hat{\beta} = \hat{\beta}_{(t)} + \left(X_{(t)}'X_{(t)}\right)^{-1}X_t'\hat{u}_t = \hat{\beta}_{(t)} + \frac{\hat{u}_t}{1-h_t}(X'X)^{-1}X_t',$$
which is (A).
Now, since $h_t = X_t(X'X)^{-1}X_t'$, multiply through in (A) by $X_t$, add $y_t$ on both sides and rearrange to get, with $\hat{u}_{(t)} = y_t - X_t\hat{\beta}_{(t)}$ the residual resulting from using $\hat{\beta}_{(t)}$,
$$\hat{u}_{(t)} = \hat{u}_t + \left(\frac{\hat{u}_t}{1-h_t}\right)h_t = \frac{\hat{u}_t}{1-h_t}.$$
Squaring these leave-one-out residuals and averaging over $t$ gives exactly the right-hand side of (5.2).
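For completeness, here is one last numerical sketch (simulated data and variable names invented only for illustration) that checks both (A) and the residual identity above against brute-force refitting:

```python
import numpy as np

rng = np.random.default_rng(3)
T, k = 25, 4
X = rng.normal(size=(T, k))
y = X @ rng.normal(size=k) + rng.normal(size=T)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
u = y - X @ beta                                # full-sample residuals u_t
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)     # leverages h_t = X_t (X'X)^{-1} X_t'

for t in range(T):
    X_del, y_del = np.delete(X, t, axis=0), np.delete(y, t)
    beta_t = np.linalg.solve(X_del.T @ X_del, X_del.T @ y_del)
    # identity (A): beta - beta_(t) = (u_t / (1 - h_t)) (X'X)^{-1} X_t'
    assert np.allclose(beta - beta_t, u[t] / (1 - h[t]) * XtX_inv @ X[t])
    # LOOCV residual equals the leverage-weighted full-sample residual
    assert np.allclose(y[t] - X[t] @ beta_t, u[t] / (1 - h[t]))
```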