Let $Y_1, Y_2, Y_3$ and $Y_4$ be four random variables such that
$$E(Y_1) = \theta_1 - \theta_3;\quad E(Y_2) = \theta_1 + \theta_2 - \theta_3;\quad E(Y_3) = \theta_1 - \theta_3;\quad E(Y_4) = \theta_1 - \theta_2 - \theta_3,$$
where $\theta_1, \theta_2, \theta_3$ are unknown parameters. Suppose also that $\operatorname{Var}(Y_i) = \sigma^2$, $i = 1, 2, 3, 4$. Then which of the following is true?

A. $\theta_1, \theta_2, \theta_3$ are estimable.

B. $\theta_1 + \theta_3$ is estimable.

C. $\theta_1 - \theta_3$ is estimable and $\frac{1}{2}(Y_1 + Y_3)$ is the best linear unbiased estimate of $\theta_1 - \theta_3$.

D. $\theta_2$ is estimable.
The given answer is C, which seems strange to me (because I got D).
Why did I get D? Because $E(Y_2 - Y_4) = 2\theta_2$, so $\frac{1}{2}(Y_2 - Y_4)$ is an unbiased linear estimator of $\theta_2$.

Why can I not see that C could be an answer? OK, I do see that $\frac{Y_1 + Y_2 + Y_3 + Y_4}{4}$ is an unbiased estimator of $\theta_1 - \theta_3$, and its variance is smaller than that of $\frac{1}{2}(Y_1 + Y_3)$.
Please tell me where I am going wrong.
Also posted here: /math/2568894/a-problem-on-estimability-of-parameters
Answers:
This answer emphasizes verifying estimability; the minimum-variance property is only a secondary consideration here.
To begin with, summarize the information in the matrix form of a linear model as follows:
$$Y := \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 1 & -1 \\ 1 & 0 & -1 \\ 1 & -1 & -1 \end{bmatrix}\begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \end{bmatrix} := X\beta + \varepsilon. \tag{1}$$
If the model matrix $X$ has full column rank, the original parameter $\beta$ admits a unique least-squares estimate $\hat\beta = (X'X)^{-1}X'Y$. Consequently, any parameter $\phi$ defined as a linear functional $\phi(\beta)$ of $\beta$ is estimable, in the sense that it can be unambiguously estimated from the data via the least-squares estimate as $\hat\phi = p'\hat\beta$.
The subtlety arises when $X$ is not of full rank. For a thorough discussion, we first fix some notation and terminology below (I follow the convention of the Coordinate-Free Approach to Linear Models, Section 4.8; some of the terms sound unnecessarily technical). In addition, the discussion applies to the general linear model $Y = X\beta + \varepsilon$ with $X \in \mathbb{R}^{n \times k}$ and $\beta \in \mathbb{R}^k$.
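For concreteness, here is a minimal sketch, assuming numpy is available, in which the array `X` is simply the design matrix of model (1) written out numerically; it confirms that $X$ is rank-deficient, so the closed-form $(X'X)^{-1}X'Y$ above cannot be applied directly.

```python
import numpy as np

# Design matrix of model (1): rows are the coefficient vectors of E(Y_i).
X = np.array([[1,  0, -1],
              [1,  1, -1],
              [1,  0, -1],
              [1, -1, -1]])

print(np.linalg.matrix_rank(X))        # 2, i.e. rank(X) < k = 3
print(np.linalg.matrix_rank(X.T @ X))  # also 2, so X'X is singular
```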
As mentioned above, when $\operatorname{rank}(X) < k$, not every parametric functional $\phi(\beta)$ is estimable. But wait, what is the definition of the term estimable, technically? It seems hard to give a clear definition without bothering with a little linear algebra. One definition, which I think is the most intuitive, is as follows (from the same reference mentioned above):

Definition 1. A parametric functional $\phi(\beta)$ is estimable if it is uniquely determined by $X\beta$, that is, $\phi(\beta_1) = \phi(\beta_2)$ whenever $\beta_1, \beta_2 \in \mathbb{R}^k$ satisfy $X\beta_1 = X\beta_2$.
Interpretation. The definition above says that the mapping from the regression manifold $M$ (the set of mean vectors $\{X\beta\}$) to the parameter space of $\phi$ must be one-to-one, which is guaranteed when $\operatorname{rank}(X) = k$ (i.e., when $X$ itself is one-to-one). When $\operatorname{rank}(X) < k$, we know there exist $\beta_1 \neq \beta_2$ such that $X\beta_1 = X\beta_2$. The definition of estimability above thus rules out those structurally deficient parametric functionals that take different values even at the same point of $M$, which would make no sense. In contrast, an estimable parametric functional $\phi(\cdot)$ does allow the case $\phi(\beta_1) = \phi(\beta_2)$ with $\beta_1 \neq \beta_2$, as long as the condition $X\beta_1 = X\beta_2$ is fulfilled.
There are other equivalent conditions to check the estimability of a parametric functional given in the same reference, Proposition 8.4.
After such a verbose background introduction, let's come back to your question.
A. $\beta$ itself is non-estimable, for the reason that $\operatorname{rank}(X) < 3$, which entails $X\beta_1 = X\beta_2$ with $\beta_1 \neq \beta_2$. Although the above definition is given for scalar functionals, it is easily generalized to vector-valued functionals.
B. $\phi_1(\beta) = \theta_1 + \theta_3 = (1, 0, 1)'\beta$ is non-estimable. To wit, consider $\beta_1 = (0, 1, 0)'$ and $\beta_2 = (1, 1, 1)'$, which gives $X\beta_1 = X\beta_2$ but $\phi_1(\beta_1) = 0 + 0 = 0 \neq \phi_1(\beta_2) = 1 + 1 = 2$.
C. $\phi_2(\beta) = \theta_1 - \theta_3 = (1, 0, -1)'\beta$ is estimable, because $X\beta_1 = X\beta_2$ trivially implies $\theta_1^{(1)} - \theta_3^{(1)} = \theta_1^{(2)} - \theta_3^{(2)}$ (the first coordinate of $X\beta$ is exactly $\theta_1 - \theta_3$), i.e., $\phi_2(\beta_1) = \phi_2(\beta_2)$.
D. $\phi_3(\beta) = \theta_2 = (0, 1, 0)'\beta$ is also estimable. The derivation from $X\beta_1 = X\beta_2$ to $\phi_3(\beta_1) = \phi_3(\beta_2)$ is also trivial (take the difference of the second and fourth coordinates of $X\beta$, which is $2\theta_2$).
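These case-by-case checks can also be automated. A standard equivalent condition is that $p'\beta$ is estimable exactly when $p$ lies in the row space of $X$, i.e. appending $p'$ as an extra row does not increase the rank. Below is a small numpy sketch applying this test to the functionals appearing in options A-D; the helper name `is_estimable` is my own, not from the reference.

```python
import numpy as np

X = np.array([[1,  0, -1],
              [1,  1, -1],
              [1,  0, -1],
              [1, -1, -1]])

def is_estimable(X, p):
    """p'beta is estimable iff p lies in the row space of X,
    i.e. appending p' as a row does not increase the rank."""
    return np.linalg.matrix_rank(np.vstack([X, p])) == np.linalg.matrix_rank(X)

candidates = {
    "theta1":              [1, 0,  0],   # part of option A
    "theta2 (D)":          [0, 1,  0],
    "theta3":              [0, 0,  1],   # part of option A
    "theta1+theta3 (B)":   [1, 0,  1],
    "theta1-theta3 (C)":   [1, 0, -1],
}
for name, p in candidates.items():
    print(name, is_estimable(X, np.array(p)))
# Only theta2 and theta1 - theta3 come out estimable.
```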
After estimability is verified, there is a theorem (Proposition 8.16, same reference) that establishes the Gauss-Markov property of estimable functionals: the best linear unbiased estimate of an estimable $\phi(\beta) = p'\beta$ is $p'\hat\beta$, where $\hat\beta$ is any solution of the normal equations $X'X\hat\beta = X'Y$. Based on that theorem, the second part of option C is incorrect: the best linear unbiased estimate of $\theta_1 - \theta_3$ is $\bar{Y} = (Y_1 + Y_2 + Y_3 + Y_4)/4$, not $\frac{1}{2}(Y_1 + Y_3)$.
The proof goes as follows. For model (1), the normal equations $X'X\hat\beta = X'Y$ read
$$\begin{bmatrix} 4 & 0 & -4 \\ 0 & 2 & 0 \\ -4 & 0 & 4 \end{bmatrix}\begin{bmatrix} \hat\theta_1 \\ \hat\theta_2 \\ \hat\theta_3 \end{bmatrix} = \begin{bmatrix} Y_1 + Y_2 + Y_3 + Y_4 \\ Y_2 - Y_4 \\ -(Y_1 + Y_2 + Y_3 + Y_4) \end{bmatrix},$$
so every solution satisfies $\hat\theta_1 - \hat\theta_3 = (Y_1 + Y_2 + Y_3 + Y_4)/4 = \bar{Y}$ and $\hat\theta_2 = (Y_2 - Y_4)/2$. Hence the Gauss-Markov estimate of $\theta_1 - \theta_3$ is $\bar{Y}$, and that of $\theta_2$ is $(Y_2 - Y_4)/2$.
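To illustrate this computation numerically, here is a brief numpy sketch using the pseudoinverse to obtain one least-squares solution; multiplying the coefficient vector $p$ into `pinv(X)` recovers the Gauss-Markov weights $(1/4, 1/4, 1/4, 1/4)$ for $\theta_1 - \theta_3$ and $(0, 1/2, 0, -1/2)$ for $\theta_2$.

```python
import numpy as np

X = np.array([[1,  0, -1],
              [1,  1, -1],
              [1,  0, -1],
              [1, -1, -1]], dtype=float)

# One least-squares solution is beta_hat = pinv(X) @ Y; for an estimable p'beta,
# the Gauss-Markov estimate p' @ beta_hat does not depend on which solution is used.
Xp = np.linalg.pinv(X)

p_C = np.array([1, 0, -1])   # theta1 - theta3
p_D = np.array([0, 1,  0])   # theta2

print(np.round(p_C @ Xp, 6))  # [0.25 0.25 0.25 0.25] -> (Y1+Y2+Y3+Y4)/4
print(np.round(p_D @ Xp, 6))  # approximately [0, 0.5, 0, -0.5] -> (Y2-Y4)/2
```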
Therefore, option D is the only correct answer.
Addendum: The connection of estimability and identifiability
When I was at school, a professor briefly mentioned that the estimability of a parametric functional $\phi$ corresponds to model identifiability. I took this claim for granted then. However, the equivalence needs to be spelled out more explicitly.
According to A.C. Davison's monograph Statistical Models, p. 144, identifiability is defined roughly as follows:

Definition 2. A parametric model is identifiable if distinct parameter values give rise to distinct distributions of the data.
For the linear model $(1)$, regardless of the sphericity condition $\operatorname{Var}(\varepsilon) = \sigma^2 I$, it can be reformulated as
$$E[Y] = X\beta, \quad \beta \in \mathbb{R}^k. \tag{2}$$
This is such a simple model that we only specify the first-moment structure of the response vector $Y$. When $\operatorname{rank}(X) = k$, model $(2)$ is identifiable, since $\beta_1 \neq \beta_2$ implies $X\beta_1 \neq X\beta_2$ (the word "distribution" in the original definition naturally reduces to "mean" under model $(2)$).
Now suppose that $\operatorname{rank}(X) < k$ and we are given a parametric functional $\phi(\beta) = p'\beta$: how do we reconcile Definition 1 and Definition 2?
Well, by manipulating notation and words, we can show (the "proof" is rather trivial) that the estimability of $\phi(\beta)$ is equivalent to identifiability of model $(2)$ when it is parametrized by $\phi = \phi(\beta) = p'\beta$ (the design matrix is likely to change accordingly, as in $(3)$ below). To prove it, suppose $\phi(\beta)$ is estimable, so that $X\beta_1 = X\beta_2$ implies $p'\beta_1 = p'\beta_2$; by definition this is $\phi_1 = \phi_2$, hence model $(3)$ is identifiable when indexed by $\phi$. Conversely, suppose model $(3)$ is identifiable, so that $X\beta_1 = X\beta_2$ implies $\phi_1 = \phi_2$, which is exactly $\phi(\beta_1) = \phi(\beta_2)$.
Intuitively, when $X$ is rank-deficient, the model in terms of $\beta$ is parameter-redundant (it has too many parameters), hence a non-redundant lower-dimensional reparametrization (which could consist of a collection of linear functionals) is possible. When is such a new representation possible? The key is estimability.
To illustrate the above statements, let's reconsider your example. We have verified that the parametric functionals $\phi_2(\beta) = \theta_1 - \theta_3$ and $\phi_3(\beta) = \theta_2$ are estimable. Therefore, we can rewrite model $(1)$ in terms of the reparametrized parameter $(\phi_2, \phi_3)'$ as follows:
$$E[Y] = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 0 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} \phi_2 \\ \phi_3 \end{bmatrix} := \tilde{X}\gamma. \tag{3}$$
Clearly, since $\tilde{X}$ has full column rank, the model with the new parameter $\gamma$ is identifiable.
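A quick numpy sketch (with the matrices written out above) confirms that $\tilde{X}$ has full column rank and spans the same column space as $X$, so the reparametrized model describes exactly the same set of mean vectors.

```python
import numpy as np

X  = np.array([[1,  0, -1],
               [1,  1, -1],
               [1,  0, -1],
               [1, -1, -1]])
Xt = np.array([[1,  0],
               [1,  1],
               [1,  0],
               [1, -1]])       # X tilde from the reparametrized model (3)

print(np.linalg.matrix_rank(Xt))                  # 2: full column rank
print(np.linalg.matrix_rank(np.hstack([X, Xt])))  # still 2: same column space as X
```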
Apply the definitions.
I will provide details to demonstrate how you can use elementary techniques: you don't need to know any special theorems about estimation, nor will it be necessary to assume anything about the (marginal) distributions of the Yi. We will need to supply one missing assumption about the moments of their joint distribution.
Definitions
All linear estimators are of the form
$$t_\lambda(Y) = \sum_{i=1}^4 \lambda_i Y_i$$
for some constants $\lambda = (\lambda_1, \lambda_2, \lambda_3, \lambda_4)$.
An estimator of $\theta_1 - \theta_3$ is unbiased if and only if its expectation is $\theta_1 - \theta_3$. By linearity of expectation,
$$\begin{aligned}
\theta_1 - \theta_3 = E[t_\lambda(Y)] &= \sum_{i=1}^4 \lambda_i E[Y_i] \\
&= \lambda_1(\theta_1 - \theta_3) + \lambda_2(\theta_1 + \theta_2 - \theta_3) + \lambda_3(\theta_1 - \theta_3) + \lambda_4(\theta_1 - \theta_2 - \theta_3) \\
&= (\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4)(\theta_1 - \theta_3) + (\lambda_2 - \lambda_4)\theta_2.
\end{aligned}$$

Comparing coefficients of the unknown quantities $\theta_i$ reveals
$$\lambda_2 - \lambda_4 = 0 \quad\text{and}\quad \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = 1. \tag{1}$$
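If you want to double-check this coefficient comparison symbolically, here is a short sketch assuming sympy is available; it expands $E[t_\lambda(Y)]$ and reads off the coefficients of $\theta_1$, $\theta_2$ and $\theta_3$.

```python
import sympy as sp

l1, l2, l3, l4 = sp.symbols("lambda1:5")
t1, t2, t3 = sp.symbols("theta1:4")

means = [t1 - t3, t1 + t2 - t3, t1 - t3, t1 - t2 - t3]   # E[Y_i]
expectation = sp.expand(sum(l * m for l, m in zip((l1, l2, l3, l4), means)))

print(expectation.coeff(t1))  # lambda1 + lambda2 + lambda3 + lambda4  (must equal 1)
print(expectation.coeff(t2))  # lambda2 - lambda4                     (must equal 0)
print(expectation.coeff(t3))  # the negative of the coefficient of theta1, automatically -1
```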
In the context of linear unbiased estimation, "best" always means with least variance. The variance of $t_\lambda$ is
$$\operatorname{Var}(t_\lambda) = \sum_{i=1}^4 \lambda_i^2 \operatorname{Var}(Y_i) + \sum_{i \neq j} \lambda_i \lambda_j \operatorname{Cov}(Y_i, Y_j).$$
The only way to make progress is to add an assumption about the covariances: most likely, the question intended to stipulate they are all zero. (This does not imply the Yi are independent. Furthermore, the problem can be solved by making any assumption that stipulates those covariances up to a common multiplicative constant. The solution depends on the covariance structure.)
Since $\operatorname{Var}(Y_i) = \sigma^2$, we obtain
$$\operatorname{Var}(t_\lambda) = \sigma^2\left(\lambda_1^2 + \lambda_2^2 + \lambda_3^2 + \lambda_4^2\right). \tag{2}$$
The problem therefore is to minimize (2) subject to constraints (1).
Solution
The constraints (1) permit us to express all the $\lambda_i$ in terms of just two linear combinations of them. Let $u = \lambda_1 - \lambda_3$ and $v = \lambda_1 + \lambda_3$ (which are linearly independent). These determine $\lambda_1$ and $\lambda_3$, while the constraints determine $\lambda_2$ and $\lambda_4$ (namely $\lambda_2 = \lambda_4 = (1 - v)/2$). All we have to do is minimize (2), which can be written
$$\sigma^2\left(\lambda_1^2 + \lambda_2^2 + \lambda_3^2 + \lambda_4^2\right) = \frac{\sigma^2}{4}\left(2u^2 + (2v - 1)^2 + 1\right).$$
No constraints apply to $(u, v)$. Assume $\sigma^2 \neq 0$ (so that the variables are not just constants). Since $u^2$ and $(2v - 1)^2$ are smallest only when $u = 2v - 1 = 0$, it is now obvious that the unique solution is
$$\lambda = (\lambda_1, \lambda_2, \lambda_3, \lambda_4) = (1/4,\ 1/4,\ 1/4,\ 1/4).$$
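As a numerical cross-check, note that under constraints (1) the problem is precisely to find the minimum-norm $\lambda$ satisfying a pair of linear equations, which the pseudoinverse delivers directly; here is a brief numpy sketch.

```python
import numpy as np

# Unbiasedness constraints (1) written as A @ lam = b:
#   lambda2 - lambda4 = 0  and  lambda1 + lambda2 + lambda3 + lambda4 = 1.
A = np.array([[0, 1, 0, -1],
              [1, 1, 1,  1]], dtype=float)
b = np.array([0.0, 1.0])

# Minimizing sum(lambda_i^2) subject to A @ lam = b is the minimum-norm
# solution of this underdetermined system, given by the pseudoinverse.
lam = np.linalg.pinv(A) @ b
print(np.round(lam, 6))   # [0.25 0.25 0.25 0.25]
```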
Option (C) is false because it does not give the best unbiased linear estimator. Option (D), although it doesn't give full information, nevertheless is correct, because
$$\theta_2 = E\left[t_{(0,\,1/2,\,0,\,-1/2)}(Y)\right]$$
is the expectation of a linear estimator.
It is easy to see that neither (A) nor (B) can be correct, because the space of expectations of linear estimators is generated by $\{\theta_2, \theta_1 - \theta_3\}$ and none of $\theta_1$, $\theta_3$, or $\theta_1 + \theta_3$ is in that space.
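This span claim can be verified with a small rank computation (a numpy sketch; the helper name `in_span` is mine): the coefficient vectors of $\theta_1$, $\theta_3$ and $\theta_1 + \theta_3$ in $(\theta_1, \theta_2, \theta_3)$ do not lie in the span of those of $\theta_2$ and $\theta_1 - \theta_3$.

```python
import numpy as np

# The expectations of linear estimators are spanned by theta2 and theta1 - theta3.
basis = np.array([[0, 1,  0],    # theta2
                  [1, 0, -1]])   # theta1 - theta3

def in_span(basis, p):
    return np.linalg.matrix_rank(np.vstack([basis, p])) == np.linalg.matrix_rank(basis)

for name, p in [("theta1", [1, 0, 0]),
                ("theta3", [0, 0, 1]),
                ("theta1+theta3", [1, 0, 1])]:
    print(name, in_span(basis, np.array(p)))   # all False
```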
Consequently (D) is the unique correct answer.