Intuitive explanation of convergence in distribution and convergence in probability

26

What is the intuitive difference between a random variable converging in probability and a random variable converging in distribution?

I have read numerous definitions and mathematical equations, but that does not really help. (Please keep in mind that I am an undergraduate student studying econometrics.)

How can a random variable converge to a single number, but also converge to a distribution?

nicefella
1
"How can a random variable converge to a single number but also converge to a distribution?" - I think you would benefit from clarifying whether your confusion is that RVs in general can converge either to single numbers or to an entire distribution (less of a mystery once you realise that the "single number" is essentially a special type of distribution), or whether your confusion is how a single RV could converge to a constant according to one mode of convergence, but to a distribution according to another mode of convergence?
Silverfish
1
Like @CloseToC, I wonder if you have come across regressions where on the one hand you have been told that $\hat\beta$ is "asymptotically normal", but on the other hand you have been told that it converges to the true $\beta$.
Silverfish
@Silverfish, I haven't really!
nicefella

Answers:

25

How can a random number converge to a constant?

Let's say you have $N$ balls in the box. You can pick them one by one. After you have picked $k$ balls, I ask you: what is the mean weight of the balls in the box? Your best answer would be $\bar x_k = \frac{1}{k}\sum_{i=1}^{k} x_i$. Do you realize that $\bar x_k$ is itself a random value? It depends on which $k$ balls you picked first.

Now, if you keep pulling the balls, at some point there will be no balls left in the box, and you will get $\bar x_N = \mu$.

So, what we have is the random sequence $\bar x_1, \dots, \bar x_k, \dots, \bar x_N, \bar x_N, \bar x_N, \dots$ which converges to the constant $\bar x_N = \mu$. So, the key to understanding your issue with convergence in probability is realizing that we are talking about a sequence of random variables, constructed in a certain way.
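The ball-drawing story can be simulated directly. The sketch below is only an illustration under assumed values: the weights in `box` are made up, and `running_means` is a hypothetical helper name, not something from the answer. It draws all the balls in a random order and tracks the running mean $\bar x_k$.

```python
import random

def running_means(weights, rng):
    """Draw all balls in a random order and return the means xbar_1, ..., xbar_N."""
    order = list(weights)
    rng.shuffle(order)            # random drawing order, without replacement
    means, total = [], 0.0
    for k, w in enumerate(order, start=1):
        total += w
        means.append(total / k)   # xbar_k depends on which k balls came first
    return means

rng = random.Random(0)
box = [rng.uniform(90.0, 110.0) for _ in range(50)]   # made-up ball weights
mu = sum(box) / len(box)                                # mean weight of the whole box

for trial in range(3):
    means = running_means(box, rng)
    # Early means vary from trial to trial; the last one is always mu.
    print(f"xbar_5={means[4]:.2f}  xbar_25={means[24]:.2f}  "
          f"xbar_50={means[-1]:.2f}  mu={mu:.2f}")
```

Each trial gives a different path for the early means, but every path ends at the same value once the box is empty, which is the sense in which this random sequence settles on a constant.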


Next, let's get uniform random numbers $e_1, e_2, \dots$, where $e_i \in [0, 1]$. Let's look at the random sequence $\xi_1, \xi_2, \dots$, where $\xi_k = \frac{1}{\sqrt{k/12}} \sum_{i=1}^{k} \left(e_i - \frac{1}{2}\right)$. Each $\xi_k$ is a random value, because all of its terms are random values. We cannot predict what $\xi_k$ is going to be. However, it turns out that we can claim that the probability distributions of $\xi_k$ will look more and more like the standard normal $\mathcal{N}(0, 1)$. That is how distributions converge.
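A rough way to see this numerically is to estimate the CDF of $\xi_k$ at one point and compare it with the standard normal CDF. This is a sketch, not part of the original answer: the evaluation point $x = 1$, the number of draws and the helper names `xi` and `empirical_cdf_at` are all arbitrary choices.

```python
import random
from statistics import NormalDist

def xi(k, rng):
    """One draw of xi_k = (1 / sqrt(k/12)) * sum_{i=1}^k (e_i - 1/2), e_i ~ U(0, 1)."""
    s = sum(rng.random() - 0.5 for _ in range(k))
    return s / (k / 12) ** 0.5

def empirical_cdf_at(x, k, rng, draws=20_000):
    """Fraction of simulated xi_k values that are <= x."""
    return sum(xi(k, rng) <= x for _ in range(draws)) / draws

rng = random.Random(0)
target = NormalDist(0, 1).cdf(1.0)    # standard normal CDF at x = 1
for k in (1, 2, 30):
    print(k, round(empirical_cdf_at(1.0, k, rng), 3), round(target, 3))
```

For $k = 1$ the variable is just a rescaled uniform, so its CDF at $1$ is visibly off from the normal value; for larger $k$ the two numbers should agree up to simulation noise.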

Aksakal
1
What is the sequence of random variables in your first example after having reached $N$? How is the limit evaluated?
ekvall
It is only an intuition. Imagine an infinite box, so that your estimator $\bar x$ converges to the population mean $\mu$.
Aksakal
21

It isn't clear how much intuition a reader of this question might have about convergence of anything, let alone of random variables, so I will write as if the answer is "very little". Something that might help: rather than thinking "how can a random variable converge", ask how a sequence of random variables can converge. In other words, it's not just a single variable, but an (infinitely long!) list of variables, and the ones later in the list are getting closer and closer to ... something. Perhaps a single number, perhaps an entire distribution. To develop an intuition, we need to work out what "closer and closer" means. The reason there are so many modes of convergence for random variables is that there are several types of "closeness" to consider.

Let's first recap convergence of sequences of real numbers. In $\mathbb{R}$ we can use the Euclidean distance $|x - y|$ to measure how close $x$ is to $y$. Consider $x_n = \frac{n+1}{n} = 1 + \frac{1}{n}$. Then the sequence $x_1, x_2, x_3, \dots$ starts $2, \frac{3}{2}, \frac{4}{3}, \frac{5}{4}, \frac{6}{5}, \dots$ and I claim that $x_n$ converges to $1$. Clearly $x_n$ is getting closer to $1$, but it's also true that $x_n$ is getting closer to $0.9$. For instance, from the third term onwards, the terms in the sequence are a distance of $0.5$ or less from $0.9$. What matters is that they are getting arbitrarily close to $1$, but not to $0.9$. No terms in the sequence ever come within $0.05$ of $0.9$, let alone stay that close for subsequent terms. In contrast $x_{20} = 1.05$ so is $0.05$ from $1$, and all subsequent terms are within $0.05$ of $1$, as shown below.

[Figure: convergence of $(n+1)/n$ to $1$]

I could be stricter and demand terms get and stay within $0.001$ of $1$, and in this example I find this is true for the terms $N = 1000$ and onwards. Moreover I could choose any fixed threshold of closeness $\epsilon$, no matter how strict (except for $\epsilon = 0$, i.e. the term actually being $1$), and eventually the condition $|x_n - x| < \epsilon$, where $x$ is the limit, will be satisfied for all terms beyond a certain term (symbolically: for $n > N$, where the value of $N$ depends on how strict an $\epsilon$ I chose). For more sophisticated examples, note that I'm not necessarily interested in the first time that the condition is met - the next term might not obey the condition, and that's fine, so long as I can find a term further along the sequence for which the condition is met and stays met for all later terms. I illustrate this for $x_n = 1 + \frac{\sin(n)}{n}$, which also converges to $1$, with $\epsilon = 0.05$ shaded again.

[Figure: convergence of $1 + \sin(n)/n$ to $1$]
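The difference between "the condition is met for the first time" and "the condition stays met" can be checked by a direct scan. This is a minimal sketch: `first_hit`, `last_violation`, `plain`, `wobbly` and the horizon of 10,000 terms are illustrative choices, and a finite scan is only conclusive here because both sequences deviate from $1$ by at most $1/n$.

```python
import math

def first_hit(seq, limit, eps, horizon):
    """Smallest n <= horizon with |seq(n) - limit| < eps (None if there is none)."""
    for n in range(1, horizon + 1):
        if abs(seq(n) - limit) < eps:
            return n
    return None

def last_violation(seq, limit, eps, horizon):
    """Largest n <= horizon with |seq(n) - limit| >= eps (0 if there is none)."""
    last = 0
    for n in range(1, horizon + 1):
        if abs(seq(n) - limit) >= eps:
            last = n
    return last

plain = lambda n: 1 + 1 / n               # x_n = (n + 1) / n
wobbly = lambda n: 1 + math.sin(n) / n    # x_n = 1 + sin(n) / n

for name, seq in (("1 + 1/n", plain), ("1 + sin(n)/n", wobbly)):
    print(name, first_hit(seq, 1, 0.05, 10_000), last_violation(seq, 1, 0.05, 10_000))
```

With $\epsilon = 0.05$, the oscillating sequence first satisfies the strict condition at $n = 3$ but keeps breaking it until $n = 17$, so the $N$ in the definition has to come from the last violation, not the first success. For $1 + 1/n$ the last violation is $n = 20$, where the distance is exactly $0.05$.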

Now consider $X \sim U(0, 1)$ and the sequence of random variables $X_n = \left(1 + \frac{1}{n}\right) X$. This is a sequence of RVs with $X_1 = 2X$, $X_2 = \frac{3}{2} X$, $X_3 = \frac{4}{3} X$ and so on. In what senses can we say this is getting closer to $X$ itself?

Since $X_n$ and $X$ are random variables, not just single numbers, the condition $|X_n - X| < \epsilon$ is now an event: even for a fixed $n$ and $\epsilon$ this might or might not occur. Considering the probability of it being met gives rise to convergence in probability. For $X_n \overset{p}{\to} X$ we want the complementary probability $P(|X_n - X| \ge \epsilon)$ - intuitively, the probability that $X_n$ is somewhat different (by at least $\epsilon$) to $X$ - to become arbitrarily small, for sufficiently large $n$. For a fixed $\epsilon$ this gives rise to a whole sequence of probabilities, $P(|X_1 - X| \ge \epsilon)$, $P(|X_2 - X| \ge \epsilon)$, $P(|X_3 - X| \ge \epsilon)$, $\dots$ and if this sequence of probabilities converges to zero (as happens in our example) then we say $X_n$ converges in probability to $X$. Note that probability limits are often constants: for instance in regressions in econometrics, we see $\operatorname{plim}(\hat\beta) = \beta$ as we increase the sample size $n$. But here $\operatorname{plim}(X_n) = X \sim U(0, 1)$. Effectively, convergence in probability means that it's unlikely that $X_n$ and $X$ will differ by much on a particular realisation - and I can make the probability of $X_n$ and $X$ being further than $\epsilon$ apart as small as I like, so long as I pick a sufficiently large $n$.
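For this particular example the sequence of probabilities can be written down, because $|X_n - X| = X/n$. The sketch below compares a Monte Carlo estimate with that closed form; the function names, the choice $\epsilon = 0.05$ and the number of draws are assumptions made for illustration.

```python
import random

def prob_far(n, eps, rng, draws=100_000):
    """Monte Carlo estimate of P(|X_n - X| >= eps), X_n = (1 + 1/n) X, X ~ U(0, 1)."""
    far = 0
    for _ in range(draws):
        x = rng.random()
        x_n = (1 + 1 / n) * x        # built from the same draw of X
        if abs(x_n - x) >= eps:
            far += 1
    return far / draws

def prob_far_exact(n, eps):
    """|X_n - X| = X / n, so P(X / n >= eps) = P(X >= n * eps) = max(0, 1 - n * eps)."""
    return max(0.0, 1 - n * eps)

rng = random.Random(0)
for n in (1, 5, 10, 20, 50):
    print(n, round(prob_far(n, 0.05, rng), 3), round(prob_far_exact(n, 0.05), 3))
```

The probabilities fall from $0.95$ at $n = 1$ to exactly zero once $n\epsilon \ge 1$, which is the sequence of probabilities tending to zero that the definition asks for.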

A different sense in which $X_n$ becomes closer to $X$ is that their distributions look more and more alike. I can measure this by comparing their CDFs. In particular, pick some $x$ at which $F_X(x) = P(X \le x)$ is continuous (in our example $X \sim U(0, 1)$ so its CDF is continuous everywhere and any $x$ will do) and evaluate the CDFs of the sequence of $X_n$s there. This produces another sequence of probabilities, $P(X_1 \le x)$, $P(X_2 \le x)$, $P(X_3 \le x)$, $\dots$ and this sequence converges to $P(X \le x)$. The CDFs evaluated at $x$ for each of the $X_n$ become arbitrarily close to the CDF of $X$ evaluated at $x$. If this result holds true regardless of which $x$ we picked, then $X_n$ converges to $X$ in distribution. It turns out this happens here, and we should not be surprised since convergence in probability to $X$ implies convergence in distribution to $X$. Note that it can't be the case that $X_n$ converges in probability to a particular non-degenerate distribution, but converges in distribution to a constant. (Which was possibly the point of confusion in the original question? But note a clarification later.)
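The CDF comparison can also be done exactly for this example, since $P(X_n \le x) = P\left(X \le \frac{n}{n+1} x\right)$. The following sketch evaluates both CDFs on a grid; the grid and the helper names are illustrative assumptions.

```python
def cdf_uniform01(x):
    """CDF of X ~ U(0, 1)."""
    return min(1.0, max(0.0, x))

def cdf_x_n(x, n):
    """CDF of X_n = (1 + 1/n) X: P(X_n <= x) = P(X <= x * n / (n + 1))."""
    return cdf_uniform01(x * n / (n + 1))

grid = [i / 100 for i in range(-50, 251)]    # evaluation points from -0.5 to 2.5
for n in (1, 2, 3, 10, 100, 1000):
    gap = max(abs(cdf_x_n(x, n) - cdf_uniform01(x)) for x in grid)
    print(n, round(cdf_x_n(0.5, n), 4), round(gap, 4))
```

The second column is the sequence $P(X_n \le 0.5)$, which approaches $P(X \le 0.5) = 0.5$, and the third column shows that the largest gap over the grid shrinks as well.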

For a different example, let $Y_n \sim U\left(1, \frac{n+1}{n}\right)$. We now have a sequence of RVs, $Y_1 \sim U(1, 2)$, $Y_2 \sim U\left(1, \frac{3}{2}\right)$, $Y_3 \sim U\left(1, \frac{4}{3}\right)$, $\dots$ and it is clear that the probability distribution is degenerating to a spike at $y = 1$. Now consider the degenerate distribution $Y = 1$, by which I mean $P(Y = 1) = 1$. It is easy to see that for any $\epsilon > 0$, the sequence $P(|Y_n - Y| \ge \epsilon)$ converges to zero so that $Y_n$ converges to $Y$ in probability. As a consequence, $Y_n$ must also converge to $Y$ in distribution, which we can confirm by considering the CDFs. Since the CDF $F_Y(y)$ of $Y$ is discontinuous at $y = 1$ we need not consider the CDFs evaluated at that value, but for the CDFs evaluated at any other $y$ we can see that the sequence $P(Y_1 \le y)$, $P(Y_2 \le y)$, $P(Y_3 \le y)$, $\dots$ converges to $P(Y \le y)$ which is zero for $y < 1$ and one for $y > 1$. This time, because the sequence of RVs converged in probability to a constant, it converged in distribution to a constant also.
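To see why the point $y = 1$ is excluded, it helps to evaluate the CDFs just below, at, and just above it. This is a small sketch with hypothetical helper names; the evaluation points $0.9$, $1.0$ and $1.1$ are arbitrary.

```python
def cdf_y_n(y, n):
    """CDF of Y_n ~ U(1, 1 + 1/n): rises linearly from 0 at y = 1 to 1 at y = 1 + 1/n."""
    return min(1.0, max(0.0, (y - 1) * n))

def cdf_y(y):
    """CDF of the constant Y = 1: a single jump from 0 to 1 at y = 1."""
    return 1.0 if y >= 1 else 0.0

for y in (0.9, 1.0, 1.1):
    values = [round(cdf_y_n(y, n), 3) for n in (1, 5, 10, 100)]
    print(y, values, "limit CDF:", cdf_y(y))
```

At $y = 0.9$ and $y = 1.1$ the values settle on the limiting CDF, but at $y = 1$ every $F_{Y_n}(1)$ equals $0$ while $F_Y(1) = 1$. That mismatch sits exactly at the discontinuity, which is why the definition only asks for convergence at continuity points of the limiting CDF.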

Some final clarifications:

  • Although convergence in probability implies convergence in distribution, the converse is false in general. Just because two variables have the same distribution, doesn't mean they have to be likely to be close to each other. For a trivial example, take $X \sim \text{Bernoulli}(0.5)$ and $Y = 1 - X$. Then $X$ and $Y$ both have exactly the same distribution (a 50% chance each of being zero or one) and the sequence $X_n = X$, i.e. the sequence going $X, X, X, X, \dots$, trivially converges in distribution to $Y$ (the CDF at any position in the sequence is the same as the CDF of $Y$). But $Y$ and $X$ are always one apart, so $P(|X_n - Y| \ge 0.5) = 1$, which does not tend to zero, so $X_n$ does not converge to $Y$ in probability. However, if there is convergence in distribution to a constant, then that implies convergence in probability to that constant (intuitively, further in the sequence it will become unlikely to be far from that constant).
  • As my examples make clear, convergence in probability can be to a constant but doesn't have to be; convergence in distribution might also be to a constant. It isn't possible to converge in probability to a constant but converge in distribution to a particular non-degenerate distribution, or vice versa.
  • Is it possible you've seen an example where, for instance, you were told a sequence $X_n$ converged to another sequence $Y_n$? You may not have realised it was a sequence, but the give-away would be if it was a distribution that also depended on $n$. It might be that both sequences converge to a constant (i.e. degenerate distribution). Your question suggests you're wondering how a particular sequence of RVs could converge both to a constant and to a distribution; I wonder if this is the scenario you're describing.
  • My current explanation is not very "intuitive" - I was intending to make the intuition graphical, but haven't had time to add the graphs for the RVs yet.
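The Bernoulli counterexample from the first clarification is easy to check by simulation. This is only a sketch; the seed and the sample size are arbitrary.

```python
import random

rng = random.Random(0)
xs = [1 if rng.random() < 0.5 else 0 for _ in range(10_000)]   # draws of X ~ Bernoulli(0.5)
ys = [1 - x for x in xs]                                        # Y = 1 - X on the same draws

print("P(X = 1) ~", sum(xs) / len(xs))
print("P(Y = 1) ~", sum(ys) / len(ys))
# Same distribution, yet the two variables never take the same value:
far = sum(abs(x - y) >= 0.5 for x, y in zip(xs, ys)) / len(xs)
print("P(|X - Y| >= 0.5) ~", far)
```

Both empirical frequencies should be near one half, while the share of draws with $|X - Y| \ge 0.5$ is exactly one, so matching distributions say nothing about the values being close.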
Silverfish
16

In my mind, the existing answers all convey useful points, but they do not make an important distinction clear between the two modes of convergence.

Let $X_n$, $n = 1, 2, \dots$, and $Y$ be random variables. For intuition, imagine $X_n$ are assigned their values by some random experiment that changes a little bit for each $n$, giving an infinite sequence of random variables, and suppose $Y$ gets its value assigned by some other random experiment.

If $X_n \overset{p}{\to} Y$, we have, by definition, that the probability of $Y$ and $X_n$ differing from each other by more than some arbitrarily small amount approaches zero as $n \to \infty$, for as small an amount as you like. Loosely speaking, far out in the sequence of $X_n$, we are confident $X_n$ and $Y$ will take values very close to each other.

On the other hand, if we only have convergence in distribution and not convergence in probability, then we know that for large $n$, $P(X_n \le x)$ is almost the same as $P(Y \le x)$, for almost any $x$. Note that this does not say anything about how close the values of $X_n$ and $Y$ are to each other. For example, if $Y \sim N(0, 10^{10})$, and thus $X_n$ is also distributed pretty much like this for large $n$, then it seems intuitively likely that the values of $X_n$ and $Y$ will differ by quite a lot in any given observation. After all, if there is no restriction on them other than convergence in distribution, they may very well for all practical purposes be independent $N(0, 10^{10})$ variables.
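A quick simulation makes the last point concrete. The sketch assumes the extreme case mentioned above, in which $X_n$ and $Y$ are independent with exactly the same $N(0, 10^{10})$ distribution; the threshold of $1$ and the helper name are arbitrary choices.

```python
import random

def frac_far_apart(sigma, eps, rng, draws=20_000):
    """Share of draws in which independent X_n, Y ~ N(0, sigma^2) differ by at least eps."""
    far = 0
    for _ in range(draws):
        x_n = rng.gauss(0.0, sigma)
        y = rng.gauss(0.0, sigma)     # independent of x_n, same distribution
        if abs(x_n - y) >= eps:
            far += 1
    return far / draws

rng = random.Random(0)
sigma = 10 ** 5                       # standard deviation, so the variance is 10^10
print(frac_far_apart(sigma, 1.0, rng))
```

The distributions match perfectly, yet essentially every pair of realised values is more than one unit apart, so there is no convergence in probability.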

(In some cases it may not even make sense to compare $X_n$ and $Y$, maybe they're not even defined on the same probability space. This is a more technical note, though.)

ekvall
1
(+1) You don't even need the $X_n$ to vary - I was going to add some detail on this to my answer but decided against it on length grounds. But I think it is a point worth making.
Silverfish
12

What I don't understand is how can a random variable converge to a single number but also converge to a distribution?

If you're learning econometrics, you're probably wondering about this in the context of a regression model. The estimator converges to a degenerate distribution, that is, to a constant. But something else does have a non-degenerate limiting distribution.

$\hat\beta_n$ converges in probability to $\beta$ if the necessary assumptions are met. This means that by choosing a large enough sample size $n$, the estimator will be as close as we want to the true parameter, with the probability of it being farther away as small as we want. If you think of plotting the histogram of $\hat\beta_n$ for various $n$, it will eventually be just a spike centered on $\beta$.

In what sense does $\hat\beta_n$ converge in distribution? It also converges to a constant, not to a normally distributed random variable. If you compute the variance of $\hat\beta_n$ you see that it shrinks with $n$, so it goes to zero as $n$ grows, which is why the estimator goes to a constant. What does converge to a normally distributed random variable is $\sqrt{n}(\hat\beta_n - \beta)$. If you take the variance of that you'll see that it does not shrink (nor grow) with $n$. In very large samples, this will be approximately $N(0, \sigma^2)$ under standard assumptions. We can then use this approximation to approximate the distribution of $\hat\beta_n$ in that large sample.

But you are right that the limiting distribution of $\hat\beta_n$ is also a constant.
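The two behaviours can be seen side by side in a toy simulation. To keep it short, the sketch below uses a sample mean as a stand-in for a regression coefficient, which is an assumption made here and not the setting of the answer; `BETA`, `SIGMA` and the helper names are made up.

```python
import random
from statistics import pvariance

BETA, SIGMA = 2.0, 3.0    # assumed "true" parameter and noise scale (illustrative)

def beta_hat(n, rng):
    """Sample mean of n noisy observations of BETA, standing in for an estimator."""
    return sum(BETA + rng.gauss(0.0, SIGMA) for _ in range(n)) / n

def variances(n, rng, reps=1000):
    """Variance of beta_hat_n and of sqrt(n) * (beta_hat_n - BETA) across repeated samples."""
    estimates = [beta_hat(n, rng) for _ in range(reps)]
    scaled = [n ** 0.5 * (b - BETA) for b in estimates]
    return pvariance(estimates), pvariance(scaled)

rng = random.Random(0)
for n in (10, 100, 1000):
    raw, scaled = variances(n, rng)
    print(n, round(raw, 4), round(scaled, 2))
```

The variance of the estimator itself should shrink roughly like $\sigma^2 / n$, while the variance of the rescaled quantity should hover around $\sigma^2 = 9$ for every $n$.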

CloseToC
1
Look upon this as "looking at $\hat\beta_n$ with a magnifying glass", with magnification increasing with $n$ at the rate $\sqrt{n}$.
kjetil b halvorsen
7

Let me try to give a very short answer, using some very simple examples.

Convergence in distribution

Let $X_n \sim N\left(\frac{1}{n}, 1\right)$, for all $n$, then $X_n$ converges to $X \sim N(0, 1)$ in distribution. However, the randomness in the realization of $X_n$ does not change over time. If we have to predict the value of $X_n$, the expectation of our error does not change over time.

Convergence in probability

Now, consider the random variable $Y_n$ that takes value $0$ with probability $1 - \frac{1}{n}$ and $1$ otherwise. As $n$ goes to infinity, we are more and more sure that $Y_n$ will equal $0$. Hence, we say $Y_n$ converges in probability to $0$. Note that this also implies $Y_n$ converges in distribution to $0$.
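Both examples can be put into numbers without any simulation. The sketch below uses `statistics.NormalDist` from the Python standard library; the grid on $[-5, 5]$ and the helper names are illustrative choices.

```python
from statistics import NormalDist

def max_cdf_gap(n):
    """Largest gap between the CDFs of N(1/n, 1) and N(0, 1) over a grid on [-5, 5]."""
    shifted, standard = NormalDist(1 / n, 1), NormalDist(0, 1)
    grid = [i / 100 for i in range(-500, 501)]
    return max(abs(shifted.cdf(x) - standard.cdf(x)) for x in grid)

def prob_y_nonzero(n):
    """Y_n is 0 with probability 1 - 1/n, so P(|Y_n - 0| >= eps) = 1/n for 0 < eps <= 1."""
    return 1 / n

for n in (1, 10, 100, 1000):
    print(n, round(max_cdf_gap(n), 4), round(prob_y_nonzero(n), 4))
```

Both columns go to zero, but they measure different things: the first says the distribution of $X_n$ approaches $N(0, 1)$ even though each $X_n$ keeps a standard deviation of $1$, while the second says the value of $Y_n$ itself is increasingly likely to be exactly $0$.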

Sven