Which has the heavier tail, lognormal or gamma?

41

(This is based on a question that just reached me by email; I have added context from an earlier brief conversation with the same person.)

Last year I was told that the gamma distribution was heavier-tailed than the lognormal, and I have since been told that this is not the case.

  • Which one is heavier-tailed?

  • What resources can I use to explore the relationship?

Glen_b -Reinstate Monica
3
To the person who just downvoted: it would be helpful to know what the perceived problem with the question is.
Glen_b -Reinstate Monica
1
It wasn't me; I voted long ago. However, I suspect it concerned the usefulness of looking at the long tail versus kurtosis in the context of t-test assumptions in the presence of outliers, which has absolutely nothing to do with what you asked. The downvote is, in my humble opinion, problematic.
Carl

Answers:

41

The (right) tail of a distribution describes its behavior at large values. The correct object to study is not its density, which in many practical cases does not exist, but rather its distribution function F. More specifically, because F must rise asymptotically to 1 for large arguments x (by the law of total probability), we are interested in how rapidly it approaches that asymptote: we need to study the behavior of its survival function 1 − F(x) as x → ∞.

More precisely, a distribution F of a random variable X is "heavier" than another distribution G provided that eventually F has more probability at large values than G. This can be formalized: there must exist a finite number x₀ such that for all x > x₀,

$$\mathrm{Pr}_F(X > x) = 1 - F(x) > 1 - G(x) = \mathrm{Pr}_G(X > x).$$

Figure

The red curve in this figure is the survival function of a Poisson(3) distribution. The blue curve is that of a Gamma(3) distribution, which has the same variance. Eventually the blue curve always exceeds the red curve, showing that this Gamma distribution has a heavier tail than this Poisson distribution. These distributions cannot easily be compared using densities, because the Poisson distribution has no density.
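
(A quick numerical check of this figure, as a sketch in R. whuber does not give the Gamma's scale; shape 3 with scale 1, which matches the Poisson(3) mean and variance of 3, is an assumption on my part.)

```r
# Survival functions 1 - F(x) of a Poisson(3) and a Gamma(shape = 3, scale = 1),
# which share the same mean (3) and variance (3).
x <- c(5, 10, 15, 20, 25)
surv_pois  <- ppois(x, lambda = 3, lower.tail = FALSE)
surv_gamma <- pgamma(x, shape = 3, scale = 1, lower.tail = FALSE)
signif(cbind(x, Poisson = surv_pois, Gamma = surv_gamma, ratio = surv_gamma / surv_pois), 3)
# The ratio keeps growing: eventually the Gamma survival function stays above the Poisson one.
```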

It is true that when the densities f and g exist and f(x) > g(x) for all x > x₀, then F is heavier-tailed than G. The converse, however, is false, and this is a compelling reason to base the definition of tail heaviness on survival functions rather than on densities, even though the analysis of tails can often be carried out more easily with densities.

Counterexamples can be constructed by taking a discrete distribution H of positive unbounded support that nevertheless is no heavier-tailed than G (discretizing G will do the trick). Turn this into a continuous distribution H′ by replacing the probability mass of H at each of its support points k, written h(k), by (say) a scaled Beta(2,2) distribution with support on a suitable interval [k − ε(k), k + ε(k)] and weighted by h(k). Given a small positive number δ, choose ε(k) sufficiently small to ensure that the peak density of this scaled Beta distribution exceeds f(k)/δ. By construction, the mixture δH′ + (1 − δ)G is a continuous distribution G′ whose tail looks like that of G (it is uniformly a tiny bit lower by an amount δ) but has spikes in its density at the support of H, and all those spikes have points where they exceed the density of f. Thus G′ is lighter-tailed than F, but no matter how far out in the tail we go there will be points where its density exceeds that of F.

Figure

The red curve is the PDF of a Gamma distribution G, the gold curve is the PDF of a lognormal distribution F, and the blue curve (with spikes) is the PDF of a mixture G′ constructed as in the counterexample. (Notice the logarithmic density axis.) The survival function of G′ is close to that of a Gamma distribution (with rapidly decaying wiggles): it will eventually grow less than that of F, even though its PDF will always spike above that of F no matter how far out into the tails we look.
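
(A minimal numeric sketch of the bookkeeping in this construction. The concrete choices here are mine, purely for illustration: G = Gamma(shape 3), F = Lognormal(meanlog 1, sdlog 1), δ = 0.01, and bumps centred at the integers. The point is only that a valid half-width ε(k) exists at every support point, however far out.)

```r
# For each integer k, pick a bump half-width eps(k) so that the spike of the
# mixture delta*H' + (1 - delta)*G at k exceeds the lognormal density f(k).
# A Beta(2,2) bump of half-width eps carrying mass m has peak density 1.5 * m / (2 * eps).
delta <- 0.01
k <- 1:60
h <- dgamma(k, shape = 3); h <- h / sum(h)   # discretization of G = Gamma(3) at the integers
f <- dlnorm(k, meanlog = 1, sdlog = 1)       # lognormal density of F at the same points

eps   <- 1.5 * delta * h / (4 * f)           # chosen so each spike equals 2 * f(k)
spike <- 1.5 * delta * h / (2 * eps)         # resulting peak of each spike in the mixture
all(eps > 0) && all(spike > f)               # TRUE: the construction works at every k
```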


Discussion

Incidentally, we can perform this analysis directly on the survival functions of lognormal and Gamma distributions, expanding them around x = ∞ to find their asymptotic behavior, and conclude that all lognormals have heavier tails than all Gammas. But, because these distributions have "nice" densities, the analysis is more easily carried out by showing that for sufficiently large x, a lognormal density exceeds a Gamma density. Let us not, however, confuse this analytical convenience with the meaning of a heavy tail.
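
(Instead of the expansion at x = ∞, here is a blunt numerical illustration in R with parameters of my own choosing, Lognormal(0, 1) versus Gamma(shape 2, rate 1): however the comparison starts out, the lognormal survival function eventually dwarfs the Gamma one.)

```r
# Survival functions far out in the tail, for one arbitrary parameter choice.
x <- c(10, 20, 50, 100)
S_lnorm <- plnorm(x, meanlog = 0, sdlog = 1, lower.tail = FALSE)
S_gamma <- pgamma(x, shape = 2, rate = 1, lower.tail = FALSE)
signif(cbind(x, lognormal = S_lnorm, gamma = S_gamma, ratio = S_lnorm / S_gamma), 3)
# The ratio grows without bound, as the asymptotic analysis predicts.
```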

Similarly, although higher moments and their variants (such as skewness and kurtosis) say a little about the tails, they do not provide sufficient information. As a simple example, we may truncate any lognormal distribution at such a large value that any given number of its moments will scarcely change--but in so doing we will have removed its tail entirely, making it lighter-tailed than any distribution with unbounded support (such as a Gamma).

A fair objection to these mathematical contortions would be to point out that behavior so far out in the tail has no practical application, because nobody would ever believe that any distributional model will be valid at such extreme (perhaps physically unattainable) values. That shows, however, that in applications we ought to take some care to identify which portion of the tail is of concern and analyze it accordingly. (Flood recurrence times, for instance, can be understood in this sense: 10-year floods, 100-year floods, and 1000-year floods characterize particular sections of the tail of the flood distribution.) The same principles apply, though: the fundamental object of analysis here is the distribution function and not its density.

whuber
6
+1 excellent discussion of why it should be based on the survivor function. I've recommended to the original source of the question that they should have a look at your response.
Glen_b -Reinstate Monica
1
(+1) for good probabilistic discussion of how to interpret survival function.
This definition of heavy tails is fine, as one definition. But it has serious problems. In particular, there are bounded distributions that arguably have heavy tails, such as a .9999*U(-1,1) + .0001*U(-1000,1000) distribution. By the "definition" given, the N(0,1) distribution has heavier tails than the .9999*U(-1,1) + .0001*U(-1000,1000) distribution. This is obviously silly. Let's face it: there are infinitely many ways to measure the tailedness of a distribution.
Peter Westfall
1
@Peter The "silliness" arises because you seem to have gotten the ideas backwards. Neither of your examples has a "heavy" tail in any sense, because they are bounded. Both survival functions eventually are exactly zero and therefore both tails are equally light.
whuber
1
@PeterWestfall You have compared tails having bounded support with those having infinite support, as if that were meaningful. Many contexts exist in which that would be unnecessary, silly even. In those contexts in which one would compare them, a quantile difference ratio may be appropriate. There are not many contexts beyond those, and if you can think of one, do tell.
Carl
30

The gamma and the lognormal are both right skew, constant-coefficient-of-variation distributions on (0, ∞), and they're often the basis of "competing" models for particular kinds of phenomena.

There are various ways to define the heaviness of a tail, but in this case I think all the usual ones show that the lognormal is heavier. (What the first person might have been talking about is what goes on not in the far tail, but a little to the right of the mode: say, around the 75th percentile on the first plot below, which for the lognormal is just below 5 and for the gamma just above 5.)

However, let's just explore the question in a very simple way to begin.

Below are gamma and lognormal densities with mean 4 and variance 4 (top plot - gamma is dark green, lognormal is blue), and then the log of the density (bottom), so you can compare the trends in the tails:

Figure

It's hard to see much detail in the top plot, because all the action is to the right of 10. But it's quite clear in the second plot, where the gamma is heading down much more rapidly than the lognormal.
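
(Here is one way to draw plots like these in R. It is a sketch, not the code used for the figure; the plotting range is my choice, and the moment matching works out to shape 4, scale 1 for the gamma and meanlog = log(4) − log(1.25)/2, sdlog² = log(1.25) for the lognormal.)

```r
# Gamma and lognormal densities with mean 4 and variance 4.
# Gamma:     shape * scale = 4 and shape * scale^2 = 4  =>  scale = 1, shape = 4
# Lognormal: exp(sdlog^2) - 1 = var/mean^2 = 1/4        =>  sdlog^2 = log(1.25)
#            exp(meanlog + sdlog^2/2) = 4               =>  meanlog = log(4) - log(1.25)/2
sdlog   <- sqrt(log(1.25))
meanlog <- log(4) - log(1.25) / 2
x <- seq(0.01, 30, length.out = 500)

op <- par(mfrow = c(2, 1))
plot(x, dgamma(x, shape = 4, scale = 1), type = "l", col = "darkgreen",
     ylab = "density", ylim = c(0, 0.3))
lines(x, dlnorm(x, meanlog, sdlog), col = "blue")
plot(x, dgamma(x, shape = 4, scale = 1, log = TRUE), type = "l", col = "darkgreen",
     ylab = "log density", ylim = c(-25, 0))
lines(x, dlnorm(x, meanlog, sdlog, log = TRUE), col = "blue")
par(op)
```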

Another way to explore the relationship is to look at the density of the logs, as in the answer here; we see that the density of the logs for the lognormal is symmetric (it's normal!), and that for the gamma is left-skew, with a light tail on the right.

We can do it algebraically, by looking at the ratio of densities as x → ∞ (or the log of that ratio). Let g be a gamma density and f a lognormal density:

$$\log(g(x)/f(x)) = \log(g(x)) - \log(f(x))$$

$$= \log\!\left(\frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, x^{\alpha-1} e^{-x/\beta}\right) - \log\!\left(\frac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-\frac{(\log(x)-\mu)^2}{2\sigma^2}}\right)$$

$$= k_1 + (\alpha-1)\log(x) - x/\beta - \left(k_2 - \log(x) - \frac{(\log(x)-\mu)^2}{2\sigma^2}\right)$$

$$= \left[\,c + \alpha\log(x) + \frac{(\log(x)-\mu)^2}{2\sigma^2}\,\right] - x/\beta$$

The term in the [ ] is a quadratic in log(x), while the remaining term, −x/β, is decreasing linearly in x. No matter what the parameter values are, that −x/β will eventually go down faster than the quadratic in log(x) increases. In the limit as x → ∞, the log of the ratio of densities decreases toward −∞, which means the gamma pdf is eventually much smaller than the lognormal pdf, and it keeps decreasing, relatively. If you take the ratio the other way (with the lognormal on top), it eventually must increase beyond any bound.

That is, any given lognormal is eventually heavier tailed than any gamma.
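
(To see that limit numerically: an R sketch using the same mean-4, variance-4 parameter pair as in the plots above; any other choice gives the same qualitative picture.)

```r
# log(g(x)/f(x)) for the gamma (shape 4, scale 1) and the matched lognormal:
# it keeps falling, so the gamma density is eventually negligible next to the lognormal's.
x <- c(10, 20, 50, 100, 200)
log_ratio <- dgamma(x, shape = 4, scale = 1, log = TRUE) -
  dlnorm(x, meanlog = log(4) - log(1.25) / 2, sdlog = sqrt(log(1.25)), log = TRUE)
round(cbind(x, log_ratio), 1)
```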


Other definitions of heaviness:

Some people are interested in skewness or kurtosis to measure the heaviness of the right tail. At a given coefficient of variation, the lognormal is both more skew and has higher kurtosis than the gamma.**

For example, with skewness, the gamma has a skewness of 2·CV while the lognormal's is 3·CV + CV³.
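
(Plugging a few CV values into those two formulas, just to make the comparison concrete:)

```r
# Skewness at a matched coefficient of variation: the lognormal exceeds the gamma at every CV.
CV <- c(0.25, 0.5, 1, 2)
cbind(CV, gamma = 2 * CV, lognormal = 3 * CV + CV^3)
```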

There are some technical definitions of various measures of how heavy the tails are here. You might like to try some of those with these two distributions. The lognormal is an interesting special case in the first definition - all its moments exist, but its MGF doesn't converge above 0, while the MGF for the Gamma does converge in a neighborhood around zero.
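
(Spelling out that last remark, as standard facts rather than anything specific to this answer: for a Gamma with shape α and scale β the MGF exists on an open interval containing 0, while for the lognormal every moment is finite but the MGF integral diverges for every t > 0.)

```latex
% Gamma (shape \alpha, scale \beta): the MGF converges on an interval containing 0.
M_{\mathrm{Gamma}}(t) = (1-\beta t)^{-\alpha}, \qquad t < 1/\beta .
% Lognormal (\mu, \sigma): the exponent t e^{y} - (y-\mu)^2/(2\sigma^2) \to \infty
% as y = \log x \to \infty, so the MGF diverges for every t > 0, yet all moments exist:
\mathbb{E}\!\left[e^{tX}\right]
  = \int_0^\infty \frac{e^{tx}}{x\sigma\sqrt{2\pi}}
    \exp\!\left(-\frac{(\log x-\mu)^2}{2\sigma^2}\right) dx = \infty, \qquad
\mathbb{E}\!\left[X^{k}\right] = e^{k\mu + k^{2}\sigma^{2}/2} < \infty .
```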

--

** As Nick Cox mentions below, the usual transformation to approximate normality for the gamma, the Wilson-Hilferty transformation, is weaker than the log: it's a cube root transformation. At small values of the shape parameter, the fourth root has been mentioned instead; see the discussion in this answer. But in either case it's a weaker transformation than the log to achieve near-normality.

The comparison of skewness (or kurtosis) doesn't suggest any necessary relationship in the extreme tail - it instead tells us something about average behavior; but it may for that reason work better if the original point was not being made about the extreme tail.


Resources: It's easy to use programs like R or Minitab or Matlab or Excel or whatever you like to draw densities and log-densities and logs of ratios of densities ... and so on, to see how things go in particular cases. That's what I'd suggest to start with.

Glen_b -Reinstate Monica
4
Indeed it does suggest that, but there's no necessary relationship between peakedness, heavy-tailedness and kurtosis; there are counterexamples to such expectations, so we must beware. The second plot confirms the suspicion though.
Glen_b -Reinstate Monica
5
Here's a one-liner. It's a definition that a log transformation is needed to make a lognormal normal; it is a good approximation that a cube root makes a gamma normal (Wilson-Hilferty are two words for the wise); the distribution needing the stronger transformation is "further" from the normal or Gaussian.
Nick Cox
2
@Glen_b I am just adding a little decoration to a very nice-looking cake of yours.
Nick Cox
2
@Nick Cox I don't disagree with the statements about transformations. The mathematically illegitimate part is the conclusion you attempt to draw: from the fact that a logarithm makes the lognormal normal and a cube root makes a gamma approximately normal, you cannot draw any conclusion about the tails of either one.
whuber
2
Thanks; your point is clearer to me, but I stick by my "rule of thumb" wording, and invoke experience too. Clearly, I don't have a theorem.
Nick Cox
7

Although kurtosis is related to the heaviness of tails, it contributes more to the notion of fat-tailed distributions, and relatively less to tail heaviness itself, as the following example shows. Herein, I now regurgitate what I have learned in the posts above and below, which are really excellent comments. First, the area of a right tail is the area from x to ∞ under a density function f(t), a.k.a. the survival function 1 − F(x). For the lognormal distribution (LND), with density e^(−(log(x) − μ)²/(2σ²)) / (√(2π) σ x) for x ≥ 0, and the gamma distribution (GD), with density β^α x^(α−1) e^(−βx) / Γ(α) for x ≥ 0, let us compare their respective survival functions, (1/2) erfc((log(x) − μ)/(√2 σ)) and Q(α, βx) = Γ(α, βx)/Γ(α), graphically. To do this, I arbitrarily set their respective variances, (e^(σ²) − 1) e^(2μ+σ²) and α/β², as well as their respective excess kurtoses, 3e^(2σ²) + 2e^(3σ²) + e^(4σ²) − 6 and 6/α, equal by choosing μ = 0, σ = 0.8 and solving for α ≈ 0.19128, β ≈ 0.335421. The first plot below shows 1 − F(x) for the LND in blue and the GD in orange.

Figure

This brings us to our first caution: if this plot were all we were to examine, we might conclude that the tail of the GD is heavier than that of the LND. That this is not the case is shown by extending the x-axis of the plot, as in the second, longer graph of 1 − F(x) for the LND and GD.

Figure

This plot shows that (1) even with equal kurtoses, the right tail areas of the LND and GD can differ, and (2) graphic interpretation alone has its dangers, as it can only display results for fixed parameter values over a limited range. Thus, there is a need to find a general expression for the limiting survival-function ratio lim_{x→∞} S(LND, x)/S(GD, x). I was unable to do this with infinite series expansions. However, I was able to do it by using the intermediary of terminal or asymptotic functions, which are not unique, and where for right-hand tails lim_{x→∞} F(x)/G(x) = 1 is sufficient for F(x) and G(x) to be mutually asymptotic. With appropriate care taken in finding these functions, this has the potential to identify a subset of functions simpler than the survival functions themselves that can be shared, or held in common, by more than one density function; for example, two different density functions may share a limiting exponential tail. In the prior version of this post, this is what I was referring to as the "added complexity of comparing survival functions."

Note that lim_{u→∞} erfc(u) / (e^(−u²)/(√π u)) = 1 and lim_{u→∞} Γ(α, u) / (e^(−u) u^(α−1)) = 1. (Incidentally, it is not necessarily the case that erfc(u) < e^(−u²)/(√π u) and Γ(α, u) < e^(−u) u^(α−1); that is, it is not necessary to choose an upper bound, just an asymptotic function.) Here we write (1/2) erfc((log(x) − μ)/(√2 σ)) < σ e^(−(log(x) − μ)²/(2σ²)) / (√(2π) (log(x) − μ)) and Γ(α, βx)/Γ(α) < e^(−βx) (βx)^(α−1) / Γ(α), where the ratio of the right-hand terms has the same limit as x → ∞ as the ratio of the left-hand terms. Simplifying the limiting ratio of the right-hand terms yields lim_{x→∞} σ Γ(α) (βx)^(1−α) e^(βx − (μ − log(x))²/(2σ²)) / (√(2π) (log(x) − μ)) = ∞, meaning that for x sufficiently large the LND tail area is as large as we like compared to the GD tail area, irrespective of the parameter values. That brings up another problem: we do not always have solutions that hold for all parameter values, so using graphic illustrations alone can be misleading. For example, the gamma distribution's right tail area is greater than the exponential distribution's tail area when α < 1, less than exponential when α > 1, and the GD is exactly an exponential distribution when α = 1.
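
(The parameter matching and the crossover can be checked numerically; an R sketch, using the rate parametrization for the gamma as in the density above.)

```r
# Match the lognormal (mu = 0, sigma = 0.8) and the gamma in variance and excess kurtosis:
#   excess kurtosis: 6/alpha      = 3*exp(2*s2) + 2*exp(3*s2) + exp(4*s2) - 6
#   variance:        alpha/beta^2 = (exp(s2) - 1) * exp(2*mu + s2)
mu <- 0; sigma <- 0.8; s2 <- sigma^2
alpha <- 6 / (3 * exp(2 * s2) + 2 * exp(3 * s2) + exp(4 * s2) - 6)
beta  <- sqrt(alpha / ((exp(s2) - 1) * exp(2 * mu + s2)))
c(alpha = alpha, beta = beta)        # approximately 0.19128 and 0.335421

# Survival functions: close together near x = 10, but much further out the
# lognormal (LND) tail dominates the gamma (GD) tail by many orders of magnitude.
x <- c(10, 50, 200, 1000)
S_lnd <- plnorm(x, meanlog = mu, sdlog = sigma, lower.tail = FALSE)
S_gd  <- pgamma(x, shape = alpha, rate = beta, lower.tail = FALSE)
signif(cbind(x, LND = S_lnd, GD = S_gd, ratio = S_lnd / S_gd), 3)
```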

What, then, is the use of taking the logarithm of the ratio of survival functions, since we obviously do not need to take logarithms to find a limiting ratio? Many distribution functions contain exponential terms that look simpler when the logarithm is taken, and if the ratio goes to infinity in the limit as x increases, then its logarithm will do so as well. In our case, that would allow us to inspect lim_{x→∞} ( log(σ Γ(α) (βx)^(1−α) / (√(2π) (log(x) − μ))) + βx − (μ − log(x))²/(2σ²) ) = ∞, which some people would find simpler to look at. Lastly, if the ratio of survival functions goes to zero, then the logarithm of that ratio will go to −∞, and in all cases, after finding the limit of the logarithm of a ratio, we have to take the antilogarithm of that value to understand its relationship to the limiting value of the ordinary ratio of survival functions.

Carl
2
In this case (and quite often in cases of interest) higher kurtosis corresponds to heavier tail, but as a general proposition this is not the case - counterexamples are easy to construct.
Glen_b -Reinstate Monica
1
1. I don't know of any general way short of directly comparing the tails. 2. What is it that's more complicated? whuber's answer shows us why there's a problem with looking at anything but the survivor function (for the right tail); he discusses why you can't compare pdfs in detail but similar points carry over to kurtosis. Further, comparing S(x) = 1 − F(x) is often much less complicated than comparing kurtosis as well. (In the left tail you'd compare F(x) directly but that wasn't an issue for this question.)
Glen_b -Reinstate Monica
2
I also note that you say "This has something to do with a moments theorem that says that if (all of?) the moments of two distributions are equal, then the distributions are identical." -- even if all moments of two distributions are equal, the distributions are not necessarily identical. Counterexamples are discussed in answers to several questions here on CV. You need more than just all moments equal -- you need the MGF to exist in a neighborhood of 0.
Glen_b -Reinstate Monica
1
@PeterWestfall Semi-infinite support is often assumed, for example, as 0 ≤ t < ∞ for drug concentrations in blood plasma. In that case, tail-heaviness would determine whether the mean residence time of drug in the body measures anything (e.g., exponential distribution) or not (e.g., some Pareto distributions).
Carl
1
@PeterWestfall I do get your point, similar to nma.berkeley.edu/ark:/28722/bk000471p7j. It is incumbent to recall that every distribution implies different measures for different things. For example, the average extreme value is MVUE for location of a uniform distribution, not the mean, and not the median. Between those extreme values, the tails are heavy, but outside of them, the tails are zip. What that has to do with a higher moment like kurtosis, when the first moment is not MVUE I would not venture to guess. Something, maybe, but what?
Carl