What is the intuitive difference between a random variable converging in probability and a random variable converging in distribution?
I have read many definitions and mathematical equations, but they don't really help. (Please bear in mind that I am an undergraduate student studying econometrics.)
How can a random variable converge to a single number, but also converge to a distribution?
Answers:
Say you have $N$ balls in a box. You can pick them out one by one. After you have picked $k$ balls, I ask you: what is the average weight of the balls in the box? Your best answer would be $\bar{x}_k = \frac{1}{k}\sum_{i=1}^{k} x_i$. Do you realise that $\bar{x}_k$ is itself a random quantity? It depends on which $k$ balls you happened to pick first.
Now, if you keep drawing balls, at some point there will be no balls left in the box and you will get $\bar{x}_N \equiv \mu$.
So what we have is the random sequence $\bar{x}_1, \dots, \bar{x}_k, \dots, \bar{x}_N, \bar{x}_N, \bar{x}_N, \dots$, which converges to the constant $\bar{x}_N = \mu$. So the key to understanding your problem with convergence in probability is realising that we are talking about a sequence of random variables, constructed in a particular way.
Next, take uniform random numbers $e_1, e_2, \dots$, where $e_i \in [0, 1]$. Look at the random sequence $\xi_1, \xi_2, \dots$, where $\xi_k = \frac{1}{\sqrt{k/12}} \sum_{i=1}^{k} \left(e_i - \frac{1}{2}\right)$. Each $\xi_k$ is a random quantity, because all of its terms are random. We cannot predict what $\xi_k$ will be. However, it turns out that we can claim that the probability distributions of $\xi_k$ will look more and more like the standard normal $N(0, 1)$. That is how distributions converge.
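To make the balls example concrete, here is a minimal simulation sketch (not part of the original answer; the weights and seed are made up): the running mean $\bar{x}_k$ wanders at first, but once every ball has been drawn it is pinned to $\mu$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
weights = rng.uniform(0.5, 1.5, size=N)   # hypothetical ball weights
mu = weights.mean()                        # the (fixed) mean weight of all N balls

draws = rng.permutation(weights)           # draw the balls one by one, without replacement
running_mean = np.cumsum(draws) / np.arange(1, N + 1)   # x̄_k after k draws

print(f"mu = {mu:.4f}")
for k in (10, 100, 1000):
    print(f"x̄_{k} = {running_mean[k - 1]:.4f}")
# By k = N every ball has been drawn, so x̄_N equals mu exactly.
```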
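If it helps to see this numerically, here is a rough Python sketch (my own illustration, with an arbitrary number of replications): the empirical CDF of $\xi_k$ gets closer to the standard normal CDF $\Phi$ as $k$ grows.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def phi(x):
    # standard normal CDF
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def sample_xi(k, n_rep=200_000):
    # n_rep independent realisations of ξ_k = (1/sqrt(k/12)) Σ (e_i - 1/2)
    e = rng.uniform(0.0, 1.0, size=(n_rep, k))
    return (e - 0.5).sum(axis=1) / math.sqrt(k / 12.0)

for k in (1, 5, 30):
    xi = sample_xi(k)
    gaps = [abs(np.mean(xi <= x) - phi(x)) for x in (-1.0, 0.5, 2.0)]
    print(f"k={k:2d}  largest CDF gap at a few test points: {max(gaps):.4f}")
# The gaps shrink with k: the distribution of ξ_k approaches N(0, 1).
```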
It is not clear how much intuition a reader of this question might have about convergence of anything, let alone of random variables, so I will write as if the answer is "very little". Something that might help: rather than asking "how can a random variable converge", ask how a sequence of random variables can converge. In other words, it is not just a single variable, but an (infinitely long!) list of variables, and the ones later in the list are getting closer and closer to... something. Maybe a single number, maybe an entire distribution. To develop an intuition, we need to work out what "closer and closer" means. The reason there are so many modes of convergence for random variables is that there are several kinds of "closeness" one could measure.
Let's first recap convergence of sequences of real numbers. In $\mathbb{R}$ we can use the Euclidean distance $|x - y|$ to measure how close $x$ is to $y$. Consider $x_n = \frac{n+1}{n} = 1 + \frac{1}{n}$. Then the sequence $x_1, x_2, x_3, \dots$ starts $2, \frac{3}{2}, \frac{4}{3}, \frac{5}{4}, \frac{6}{5}, \dots$ and I claim that $x_n$ converges to $1$. Clearly $x_n$ is getting closer to $1$, but it's also true that $x_n$ is getting closer to $0.9$. For instance, from the third term onwards, the terms in the sequence are a distance of $0.5$ or less from $0.9$. What matters is that they are getting arbitrarily close to $1$, but not to $0.9$. No terms in the sequence ever come within $0.05$ of $0.9$, let alone stay that close for subsequent terms. In contrast, $x_{20} = 1.05$ is within $0.05$ of $1$, and all subsequent terms are within $0.05$ of $1$.
I could be stricter and demand that the terms get and stay within $0.001$ of $1$, and in this example I find this is true for the terms from $N = 1000$ onwards. Moreover, I could choose any fixed threshold of closeness $\epsilon$, no matter how strict (except for $\epsilon = 0$, i.e. the term actually being $1$), and eventually the condition $|x_n - x| < \epsilon$ will be satisfied for all terms beyond a certain term (symbolically: for $n > N$, where the value of $N$ depends on how strict an $\epsilon$ I chose). For more sophisticated examples, note that I'm not necessarily interested in the first time the condition is met - the next term might not obey the condition, and that's fine, so long as I can find a term further along the sequence for which the condition is met and stays met for all later terms. The sequence $x_n = 1 + \frac{\sin(n)}{n}$, which also converges to $1$, illustrates this.
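For readers who like to see the $\epsilon$/$N$ game spelled out, here is a tiny Python sketch (an illustration of mine, not part of the original answer) that searches for such an $N$ for the sequence $x_n = 1 + \frac{1}{n}$.

```python
def first_index_within(eps, max_n=10_000_000):
    # For this monotone sequence, |x_n - 1| = 1/n < eps as soon as n > 1/eps,
    # and once it holds it keeps holding for all later terms.
    for n in range(1, max_n + 1):
        if abs((1 + 1 / n) - 1) < eps:
            return n
    return None

for eps in (0.5, 0.05, 0.001):
    print(f"eps={eps}: |x_n - 1| < eps for all n >= {first_index_within(eps)}")
```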
Now consider $X \sim U(0,1)$ and the sequence of random variables $X_n = \left(1 + \frac{1}{n}\right)X$. This is a sequence of RVs with $X_1 = 2X$, $X_2 = \frac{3}{2}X$, $X_3 = \frac{4}{3}X$ and so on. In what senses can we say this is getting closer to $X$ itself?
Since $X_n$ and $X$ are distributions, not just single numbers, the condition $|X_n - X| < \epsilon$ is now an event: even for a fixed $n$ and $\epsilon$ this might or might not occur. Considering the probability of it being met gives rise to convergence in probability. For $X_n \xrightarrow{p} X$ we want the complementary probability $P(|X_n - X| \geq \epsilon)$ - intuitively, the probability that $X_n$ is somewhat different (by at least $\epsilon$) to $X$ - to become arbitrarily small, for sufficiently large $n$. For a fixed $\epsilon$ this gives rise to a whole sequence of probabilities, $P(|X_1 - X| \geq \epsilon)$, $P(|X_2 - X| \geq \epsilon)$, $P(|X_3 - X| \geq \epsilon)$, $\dots$ and if this sequence of probabilities converges to zero (as happens in our example) then we say $X_n$ converges in probability to $X$. Note that probability limits are often constants: for instance in regressions in econometrics, we see $\operatorname{plim}(\hat{\beta}) = \beta$ as we increase the sample size $n$. But here $\operatorname{plim}(X_n) = X \sim U(0,1)$. Effectively, convergence in probability means that it's unlikely that $X_n$ and $X$ will differ by much on a particular realisation - and I can make the probability of $X_n$ and $X$ being further than $\epsilon$ apart as small as I like, so long as I pick a sufficiently large $n$.
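As a quick illustration (my own sketch, with arbitrary simulation sizes), we can estimate $P(|X_n - X| \geq \epsilon)$ for this example and watch it drop to zero.

```python
import numpy as np

rng = np.random.default_rng(2)
eps = 0.01
x = rng.uniform(size=1_000_000)           # realisations of X ~ U(0, 1)

for n in (1, 10, 100, 1000):
    x_n = (1 + 1 / n) * x                 # the SAME underlying draw, scaled
    prob = np.mean(np.abs(x_n - x) >= eps)
    print(f"n={n:4d}  P(|X_n - X| >= {eps}) ≈ {prob:.3f}")
# Here |X_n - X| = X/n, so the probability equals P(X >= n*eps),
# which is exactly 0 once n >= 1/eps.
```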
A different sense in which $X_n$ becomes closer to $X$ is that their distributions look more and more alike. I can measure this by comparing their CDFs. In particular, pick some $x$ at which $F_X(x) = P(X \leq x)$ is continuous (in our example $X \sim U(0,1)$, so its CDF is continuous everywhere and any $x$ will do) and evaluate the CDFs of the sequence of $X_n$s there. This produces another sequence of probabilities, $P(X_1 \leq x)$, $P(X_2 \leq x)$, $P(X_3 \leq x)$, $\dots$ and this sequence converges to $P(X \leq x)$. The CDFs evaluated at $x$ for each of the $X_n$ become arbitrarily close to the CDF of $X$ evaluated at $x$. If this result holds true regardless of which $x$ we picked, then $X_n$ converges to $X$ in distribution. It turns out this happens here, and we should not be surprised, since convergence in probability to $X$ implies convergence in distribution to $X$. Note that it can't be the case that $X_n$ converges in probability to a particular non-degenerate distribution but converges in distribution to a constant. (Which was possibly the point of confusion in the original question?)
For a different example, let $Y_n \sim U\!\left(1, \frac{n+1}{n}\right)$. We now have a sequence of RVs, $Y_1 \sim U(1,2)$, $Y_2 \sim U\!\left(1, \frac{3}{2}\right)$, $Y_3 \sim U\!\left(1, \frac{4}{3}\right)$, $\dots$ and it is clear that the probability distribution is degenerating to a spike at $y = 1$. Now consider the degenerate distribution $Y = 1$, by which I mean $P(Y = 1) = 1$. It is easy to see that for any $\epsilon > 0$, the sequence $P(|Y_n - Y| \geq \epsilon)$ converges to zero, so that $Y_n$ converges to $Y$ in probability. As a consequence, $Y_n$ must also converge to $Y$ in distribution, which we can confirm by considering the CDFs. Since the CDF $F_Y(y)$ of $Y$ is discontinuous at $y = 1$ we need not consider the CDFs evaluated at that value, but for the CDFs evaluated at any other $y$ we can see that the sequence $P(Y_1 \leq y)$, $P(Y_2 \leq y)$, $P(Y_3 \leq y)$, $\dots$ converges to $P(Y \leq y)$, which is zero for $y < 1$ and one for $y > 1$. This time, because the sequence of RVs converged in probability to a constant, it converged in distribution to a constant also.
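If you want to check the CDF comparison numerically, here is a small sketch (again my own illustration) evaluating the CDF of $Y_n \sim U\!\left(1, \frac{n+1}{n}\right)$ and of the constant $Y = 1$ at a few fixed points, avoiding the discontinuity at $y = 1$.

```python
import numpy as np

def cdf_yn(y, n):
    # CDF of a uniform distribution on [1, (n+1)/n]
    lo, hi = 1.0, (n + 1) / n
    return np.clip((y - lo) / (hi - lo), 0.0, 1.0)

def cdf_y(y):
    # CDF of the degenerate variable Y = 1
    return 0.0 if y < 1 else 1.0

for y in (0.9, 1.001, 1.5):
    vals = [cdf_yn(y, n) for n in (1, 10, 1000)]
    print(f"y={y}: F_Y1, F_Y10, F_Y1000 = {np.round(vals, 3)},  F_Y(y) = {cdf_y(y)}")
# At each y != 1 the CDFs of Y_n approach the degenerate CDF of Y.
```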
In my mind, the existing answers all convey useful points, but they do not make clear an important distinction between the two modes of convergence.
Let $X_n$, $n = 1, 2, \dots$, and $Y$ be random variables. For intuition, imagine $X_n$ are assigned their values by some random experiment that changes a little bit for each $n$, giving an infinite sequence of random variables, and suppose $Y$ gets its value assigned by some other random experiment.
If $X_n \xrightarrow{p} Y$, we have, by definition, that the probability of $Y$ and $X_n$ differing from each other by some arbitrarily small amount approaches zero as $n \to \infty$, no matter how small that amount is. Loosely speaking, far out in the sequence of $X_n$, we are confident $X_n$ and $Y$ will take values very close to each other.
On the other hand, if we only have convergence in distribution and not convergence in probability, then we know that for large $n$, $P(X_n \leq x)$ is almost the same as $P(Y \leq x)$, for almost any $x$. Note that this does not say anything about how close the values of $X_n$ and $Y$ are to each other. For example, if $Y \sim N(0, 10^{10})$, and thus $X_n$ is also distributed pretty much like this for large $n$, then it seems intuitively likely that the values of $X_n$ and $Y$ will differ by quite a lot in any given observation. After all, if there is no restriction on them other than convergence in distribution, they may very well for all practical reasons be independent $N(0, 10^{10})$ variables.
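A quick simulation sketch (my own, taking the independence mentioned above literally) shows how far apart two such variables typically are in a single realisation, even though their distributions are essentially identical.

```python
import numpy as np

rng = np.random.default_rng(3)
sd = np.sqrt(1e10)
x_n = rng.normal(0.0, sd, size=100_000)   # stand-in for X_n at large n
y   = rng.normal(0.0, sd, size=100_000)   # Y, drawn independently

# Same distribution, yet the realised values are typically ~10^5 apart.
print("typical size of |X_n - Y|:", np.median(np.abs(x_n - y)))
```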
(In some cases it may not even make sense to compare $X_n$ and $Y$; maybe they're not even defined on the same probability space. This is a more technical note, though.)
If you're learning econometrics, you're probably wondering about this in the context of a regression model. The estimator converges to a degenerate distribution, to a constant. But something else does have a non-degenerate limiting distribution.
$\hat{\beta}_n$ converges in probability to $\beta$ if the necessary assumptions are met. This means that by choosing a large enough sample size $N$, the estimator will be as close as we want to the true parameter, with the probability of it being farther away as small as we want. If you think of plotting the histogram of $\hat{\beta}_n$ for various $n$, it will eventually be just a spike centered on $\beta$.
In what sense does $\hat{\beta}_n$ converge in distribution? It also converges to a constant, not to a normally distributed random variable. If you compute the variance of $\hat{\beta}_n$, you see that it shrinks with $n$. So eventually it will go to zero for large enough $n$, which is why the estimator goes to a constant. What does converge to a normally distributed random variable is
$\sqrt{n}(\hat{\beta}_n - \beta)$. If you take the variance of that, you'll see that it does not shrink (nor grow) with $n$. In very large samples, this will be approximately $N(0, \sigma^2)$ under standard assumptions. We can then use this approximation to approximate the distribution of $\hat{\beta}_n$ in that large sample.
But you are right that the limiting distribution of $\hat{\beta}_n$ is also a constant.
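As a toy illustration of this answer's point (a sketch under a deliberately simple model, $y_i = \beta x_i + \varepsilon_i$ with no intercept and $\sigma = 1$, not a general recipe), we can simulate the OLS slope and see that the spread of $\hat{\beta}_n$ shrinks while the spread of $\sqrt{n}(\hat{\beta}_n - \beta)$ stays put.

```python
import numpy as np

rng = np.random.default_rng(4)
beta = 2.0

def beta_hat(n, n_rep=2_000):
    # n_rep independent samples of size n from y = beta * x + noise
    x = rng.normal(size=(n_rep, n))
    y = beta * x + rng.normal(size=(n_rep, n))
    return (x * y).sum(axis=1) / (x ** 2).sum(axis=1)   # OLS slope, no intercept

for n in (10, 100, 1000):
    bh = beta_hat(n)
    print(f"n={n:5d}  sd(β̂_n) = {bh.std():.4f}   "
          f"sd(sqrt(n)(β̂_n - β)) = {np.sqrt(n) * (bh - beta).std():.4f}")
# sd(β̂_n) shrinks roughly like 1/sqrt(n); the scaled quantity keeps a stable spread (≈ σ = 1).
```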
Let me try to give a very short answer, using some very simple examples.
Convergence in distribution
Let $X_n \sim N\!\left(\frac{1}{n}, 1\right)$ for all $n$. Then $X_n$ converges to $X \sim N(0,1)$ in distribution. However, the randomness in the realization of $X_n$ does not change over time: if we have to predict the value of $X_n$, the typical size of our prediction error does not shrink as $n$ grows.
Convergence in probability
Now, consider the random variable $Y_n$ that takes value $0$ with probability $1 - \frac{1}{n}$ and $1$ otherwise. As $n$ goes to infinity, we are more and more sure that $Y_n$ will equal $0$. Hence, we say $Y_n$ converges in probability to $0$. Note that this also implies $Y_n$ converges in distribution to $0$.
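Both toy examples are easy to simulate; the following sketch (my own illustration, with arbitrary sample sizes) shows the spread of $X_n$ staying constant while the probability that $Y_n$ differs from $0$ vanishes.

```python
import numpy as np

rng = np.random.default_rng(5)
for n in (2, 10, 1000):
    x_n = rng.normal(1 / n, 1, size=100_000)                      # X_n ~ N(1/n, 1)
    y_n = (rng.uniform(size=100_000) > 1 - 1 / n).astype(float)   # Y_n = 1 with prob 1/n
    print(f"n={n:5d}  mean(X_n)={x_n.mean():+.3f}  sd(X_n)={x_n.std():.3f}  "
          f"P(Y_n != 0) ≈ {y_n.mean():.4f}")
# The spread of X_n never shrinks (its sd stays ≈ 1, only its distribution settles on N(0,1));
# the chance that Y_n differs from 0 goes to zero, which is convergence in probability.
```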