Logistic regression: Bernoulli vs. binomial response variables


I want to perform a logistic regression with the following binomial response and with $X_1$ and $X_2$ as predictor variables.

[image: the data in binomial format]

I can present the same data as Bernoulli responses in the following format.

[image: the same data in Bernoulli format]

The logistic regression results for these two data sets are mostly the same. The deviance residuals and the AIC are different. (The difference between the null deviance and the residual deviance is the same in both cases: 0.228.)

Below are the regression outputs from R. The data sets are called binom.data and bern.data.

Here is the binomial output.

Call:
glm(formula = cbind(Successes, Trials - Successes) ~ X1 + X2, 
    family = binomial, data = binom.data)

Deviance Residuals: 
[1]  0  0  0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.9649    21.6072  -0.137    0.891
X1Yes        -0.1897     2.5290  -0.075    0.940
X2            0.3596     1.9094   0.188    0.851

(Dispersion parameter for binomial family taken to be 1)

Null deviance:  2.2846e-01  on 2  degrees of freedom
Residual deviance: -4.9328e-32  on 0  degrees of freedom
AIC: 11.473

Number of Fisher Scoring iterations: 4

Here is the Bernoulli output.

Call:
glm(formula = Success ~ X1 + X2, family = binomial, data = bern.data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6651  -1.3537   0.7585   0.9281   1.0108  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.9649    21.6072  -0.137    0.891
X1Yes        -0.1897     2.5290  -0.075    0.940
X2            0.3596     1.9094   0.188    0.851

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 15.276  on 11  degrees of freedom
Residual deviance: 15.048  on  9  degrees of freedom
AIC: 21.048

Number of Fisher Scoring iterations: 4

My questions:

1) I can see that the point estimates and standard errors from the two approaches are equivalent in this particular case. Is this equivalence true in general?

2) How can the answer to question #1 be justified mathematically?

3) Why are the deviance residuals and the AIC different?

A Scientist

Answers:


1) Yes. You can aggregate/disaggregate (?) binomial data from individuals with the same covariates. This comes from the fact that the sufficient statistic for a binomial model is the total number of events for each covariate vector; the Bernoulli is just a special case of the binomial. Intuitively, each Bernoulli trial making up a binomial outcome is independent, so there should be no difference between counting them as a single binomial outcome or as separate individual trials.
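The aggregation described in 1) can be sketched in a few lines. The rows below are made-up data (the asker's tables are not shown), with covariates `X1`, `X2` and a 0/1 success indicator:

```python
from collections import defaultdict

# Hypothetical Bernoulli-format rows: (X1, X2, success) -- one row per trial.
bern_rows = [
    ("Yes", 1, 1), ("Yes", 1, 0), ("Yes", 1, 1),
    ("No",  2, 1), ("No",  2, 1),
    ("No",  3, 0), ("No",  3, 1), ("No",  3, 0),
]

# Aggregate to binomial format: one row per unique covariate vector,
# carrying (successes, trials).
agg = defaultdict(lambda: [0, 0])
for x1, x2, z in bern_rows:
    agg[(x1, x2)][0] += z   # successes
    agg[(x1, x2)][1] += 1   # trials

binom_rows = [(x1, x2, s, n) for (x1, x2), (s, n) in sorted(agg.items())]
print(binom_rows)
# [('No', 2, 2, 2), ('No', 3, 1, 3), ('Yes', 1, 2, 3)]
```

Per the argument above, fitting a logistic regression to `bern_rows` directly, or to `binom_rows` as (successes, trials − successes) counts, gives the same coefficient estimates and standard errors.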

2) Suppose we have $n$ unique covariate vectors $x_1, x_2, \ldots, x_n$, each with a binomial outcome on $N_i$ trials, i.e. $Y_i \sim \mathrm{Bin}(N_i, p_i)$. You have specified a logistic regression model, so that
$$\mathrm{logit}(p_i) = \sum_{k=1}^{K} \beta_k x_{ik}$$
although we will see later that this is not important.

The log-likelihood for this model is
$$\ell(\beta; Y) = \sum_{i=1}^{n} \left[ \log\binom{N_i}{Y_i} + Y_i \log(p_i) + (N_i - Y_i)\log(1 - p_i) \right]$$
and we maximise this with respect to $\beta$ (the $p_i$ being functions of $\beta$) to obtain our parameter estimates.

Now suppose we break each binomial outcome into its individual Bernoulli trials, as you have done with your data. That is, for $i = 1, \ldots, n$, split the $N_i$ trials so that
$$Z_{i1}, \ldots, Z_{iY_i} = 1 \qquad \text{and} \qquad Z_{i(Y_i+1)}, \ldots, Z_{iN_i} = 0,$$
i.e. the first $Y_i$ observations are successes and the remaining $(N_i - Y_i)$ are failures. Then $Z_{ij} \sim \mathrm{Bernoulli}(p_i)$, with the same model for $p_i$ as before. The log-likelihood for this model is
$$\ell(\beta; Z) = \sum_{i=1}^{n} \sum_{j=1}^{N_i} \left[ Z_{ij}\log(p_i) + (1 - Z_{ij})\log(1 - p_i) \right]$$
and, by the way the $Z_{ij}$ were defined, this simplifies to
$$\ell(\beta; Z) = \sum_{i=1}^{n} \left[ Y_i \log(p_i) + (N_i - Y_i)\log(1 - p_i) \right].$$
Comparing this with the binomial log-likelihood, the only difference is the constant term $\sum_{i=1}^{n} \log\binom{N_i}{Y_i}$, which does not depend on $\beta$. Maximising with respect to $\beta$ therefore yields the same parameter estimates and standard errors. (Note that nothing here used the logistic form of the model, which is why the exact specification was not important.)

3) The deviance residuals for the binomial data are based on
$$D_i = 2\left[ Y_i \log\!\left(\frac{Y_i/N_i}{\hat p_i}\right) + (N_i - Y_i)\log\!\left(\frac{1 - Y_i/N_i}{1 - \hat p_i}\right) \right]$$
where $\hat p_i$ are the fitted probabilities from your model. Since your model is saturated, $\hat p_i = Y_i/N_i$, so $D_i = 0$ for every $i$, which is why all the deviance residuals in your binomial output are zero.

For the Bernoulli data, the corresponding quantities are
$$D_{ij} = 2\left[ Z_{ij}\log\!\left(\frac{Z_{ij}}{\hat p_i}\right) + (1 - Z_{ij})\log\!\left(\frac{1 - Z_{ij}}{1 - \hat p_i}\right) \right].$$
Apart from the fact that you will now have $\sum_{i=1}^{n} N_i$ deviance residuals (instead of $n$ as with the binomial data), these will each be either
$$D_{ij} = -2\log(\hat p_i)$$
or
$$D_{ij} = -2\log(1 - \hat p_i)$$
depending on whether $Z_{ij} = 1$ or $0$, and they are obviously not the same as the above. Even if you sum these over $j$ to get a sum of deviance residuals for each $i$, you don't get the same:
$$D_i = \sum_{j=1}^{N_i} D_{ij} = -2\left[ Y_i \log(\hat p_i) + (N_i - Y_i)\log(1 - \hat p_i) \right]$$
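The key identity in 2) — that the two log-likelihoods differ only by the constant $\sum_i \log\binom{N_i}{Y_i}$ — can be checked numerically; the counts and probabilities below are arbitrary made-up values:

```python
import math

def log_binom_coef(n, y):
    # log of the binomial coefficient C(n, y), via lgamma to avoid overflow
    return math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)

# Hypothetical aggregated data: (N_i, Y_i) per covariate pattern, with
# arbitrary probabilities p_i -- the identity holds for any p in (0, 1).
data = [(5, 3), (4, 1), (6, 6)]
p = [0.55, 0.30, 0.90]

# Binomial log-likelihood, including the combinatorial constant
ll_binom = sum(log_binom_coef(n, y) + y * math.log(pi) + (n - y) * math.log(1 - pi)
               for (n, y), pi in zip(data, p))

# Bernoulli log-likelihood: Y_i ones and (N_i - Y_i) zeros per pattern
ll_bern = sum(y * math.log(pi) + (n - y) * math.log(1 - pi)
              for (n, y), pi in zip(data, p))

const = sum(log_binom_coef(n, y) for n, y in data)
print(abs(ll_binom - ll_bern - const) < 1e-12)  # True: they differ only by the constant
```

Since the difference is constant in $\beta$, both likelihoods are maximised at the same $\hat\beta$.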

The fact that the AIC is different (but the change in deviance is not) comes back to the constant term that was the difference between the log-likelihoods of the two models. When calculating the change in deviance, this is cancelled out because it is the same in all models based on the same data. The AIC is defined as

$$\mathrm{AIC} = 2K - 2\ell$$
and that combinatorial term is the difference between the $\ell$s:

$$\mathrm{AIC}_{\mathrm{Bernoulli}} - \mathrm{AIC}_{\mathrm{Binomial}} = 2\sum_{i=1}^{n} \log\binom{N_i}{Y_i} = 9.575$$
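The AIC gap can be computed directly from the counts. The $(N_i, Y_i)$ below are made up; the asker's 9.575 comes from his own counts, which are not shown here:

```python
import math

def log_binom_coef(n, y):
    # log of the binomial coefficient C(n, y)
    return math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)

# Hypothetical (N_i, Y_i) counts per covariate pattern
data = [(5, 3), (4, 1), (6, 6)]

# AIC(Bernoulli) - AIC(binomial) = 2 * sum_i log C(N_i, Y_i)
aic_gap = 2 * sum(log_binom_coef(n, y) for n, y in data)
print(round(aic_gap, 3))  # 7.378 for these counts
```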
Mark
Thanks for your very detailed reply, Mark! Sorry for the delay in my response - I was on vacation. 3) Given that the 2 models give different results for deviance residuals and AIC, which one is correct or better? a) As I understand, observations with a deviance residual in excess of two may indicate lack of fit, so the absolute values of the deviance residuals matter. b) Since AIC is used to compare the fit between different models, perhaps there is no "correct" AIC. I would just compare the AICs of 2 binomial models or 2 Bernoulli models.
A Scientist
a) For the binary data, the $D_{ij}$ will be $> 2$ if either ($Z_{ij} = 1$ and $\hat p_i < e^{-1} = 0.368$) or ($Z_{ij} = 0$ and $\hat p_i > 1 - e^{-1} = 0.632$). So even if your model fits the binomial data perfectly for the $i$th covariate vector (i.e. $Y_i/N_i = \hat p_i < 0.368$, say), then the $Y_i$ of the $Z_{ij}$s that you've arbitrarily allocated as being 1 will have $D_{ij} > 2$. For this reason, I think the deviance residuals make more sense with the binomial data. Furthermore, the deviance itself for binary data does not have its usual properties...
Mark
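The $e^{-1}$ threshold above follows from $D_{ij} = -2\log(\hat p_i) > 2 \iff \hat p_i < e^{-1}$; a quick numeric check over a grid of probabilities:

```python
import math

# For Z_ij = 1 the squared deviance contribution is -2*log(p_hat);
# it exceeds 2 exactly when p_hat < exp(-1) ~ 0.368 (symmetrically,
# for Z_ij = 0 it exceeds 2 when p_hat > 1 - exp(-1) ~ 0.632).
ok = all((-2 * math.log(p) > 2) == (p < math.exp(-1))
         for p in (i / 1000 for i in range(1, 1000)))
print(ok)  # True
```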
b) Yes, comparing AICs between models only makes sense when the data used to fit each model are exactly the same. So compare Bernoulli with Bernoulli or binomial with binomial.
Mark
Thanks, Mark! Your thoughtful and detailed replies are much appreciated!
A Scientist

I just want to make a comment on the last paragraph: "The fact that the AIC is different (but the change in deviance is not) comes back to the constant term that was the difference between the log-likelihoods of the two models. When calculating the change in deviance, this is cancelled out because it is the same in all models based on the same data." Unfortunately, this is not correct for the change in deviance. The deviance does not include the constant term Ex (the extra constant term in the log-likelihood for the binomial data). Therefore, the change in deviance has nothing to do with the constant term Ex. The deviance compares a given model to the full model. The fact that the deviances from Bernoulli/binary and binomial modelling are different, but the change in deviance is not, is due to the difference in the full-model log-likelihood values. These values are cancelled out in calculating the deviance changes. Therefore, Bernoulli and binomial logistic regression models yield identical deviance changes provided the predicted probabilities p_ij and p_i are the same. In fact, that is true for the probit and other link functions.

Let lBm and lBf denote the log-likelihood values from fitting model m and full model f to Bernoulli data. The deviance is then

    DB = 2(lBf - lBm) = -2(lBm - lBf).

Although lBf is zero for the binary data, we have not simplified DB and keep it as is. The deviance from the binomial modelling with the same covariates is

    Db = 2(lbf + Ex - (lbm + Ex)) = 2(lbf - lbm) = -2(lbm - lbf)

where lbf + Ex and lbm + Ex are the log-likelihood values from the full model and model m fitted to the binomial data. The extra constant term (Ex) disappears from the right-hand side of Db. Now look at the change in deviances from Model 1 to Model 2. From Bernoulli modelling, we have a change in deviance of

    DBC = DB2 - DB1 = 2(lBf - lBm2) - 2(lBf - lBm1) = 2(lBm1 - lBm2).

Similarly, the change in deviance from the binomial fitting is

    DbC = Db2 - Db1 = 2(lbf - lbm2) - 2(lbf - lbm1) = 2(lbm1 - lbm2).

It immediately follows that the deviance changes are free of the log-likelihood contributions from the full models, lBf and lbf. Therefore, we will get the same change in deviance, DBC = DbC, if lBm1 = lbm1 and lBm2 = lbm2. We know that is the case here, and that is why we are getting the same deviance changes from Bernoulli and binomial modelling. The difference between lbf and lBf leads to the different deviances.
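This point — different full-model log-likelihoods give different deviances but identical deviance changes — can be illustrated numerically. Everything below (counts, fitted probabilities for two models) is made up for illustration; the Bernoulli full model is saturated with log-likelihood zero, while the binomial full model fits p_i = Y_i/N_i:

```python
import math

def log_binom_coef(n, y):
    return math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)

def ll_binom(data, p):
    # binomial log-likelihood, including the combinatorial constant (Ex above)
    return sum(log_binom_coef(n, y) + y * math.log(pi) + (n - y) * math.log(1 - pi)
               for (n, y), pi in zip(data, p))

def ll_bern(data, p):
    # Bernoulli log-likelihood for the same data written as 0/1 trials
    return sum(y * math.log(pi) + (n - y) * math.log(1 - pi)
               for (n, y), pi in zip(data, p))

# Hypothetical counts and fitted probabilities from two different models
data = [(5, 3), (4, 1), (6, 5)]
p_m1 = [0.50, 0.40, 0.80]
p_m2 = [0.58, 0.28, 0.84]

p_full = [y / n for n, y in data]  # binomial full model: p_i = Y_i / N_i

def dev_binom(p):
    return 2 * (ll_binom(data, p_full) - ll_binom(data, p))

def dev_bern(p):
    # Bernoulli full (saturated) model has log-likelihood 0, so D = -2 * ll
    return -2 * ll_bern(data, p)

# The deviances themselves differ between the two representations...
print(abs(dev_binom(p_m1) - dev_bern(p_m1)) > 1e-6)   # True
# ...but the change in deviance between the two models is identical.
print(abs((dev_binom(p_m1) - dev_binom(p_m2))
          - (dev_bern(p_m1) - dev_bern(p_m2))) < 1e-9)  # True
```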

Saei
Would it be possible for you to edit the formatting of your answer? Unfortunately, in this form it is not very readable. I would encourage you to break the text into paragraphs and add TeX formatting to the formulas. It is also not always clear what the abbreviations you use mean.
Tim
Many thanks, Tim. I am not familiar with TeX formatting. I originally typed it in Word, but I was unable to copy and paste. I have separated the equations from the text.
Saei
I'm not sure if you misread that paragraph: I said "the AIC is different (but the change in deviance is not)", and the remainder of the paragraph explains why the AIC is different between the two models. I didn't claim that the change in deviance depended on the constant term. In fact, I said "When calculating the change in deviance, this [the constant term] is cancelled out because it is the same in all models based on the same data"
Mark
The problem is that there is only one "constant term" in the text and it is the combinatorial term (binomial coefficient). When you say "this" is cancelled out, it implies that the constant term is included in the deviance. The difference between the deviances from the Bernoulli and binomial models is the contribution of the log-likelihood value lbf from the full model. The lbf does not vary across different binomial models on the same data, and it is cancelled out when calculating the change in deviance.
Saei
Ah ok I see what you mean. I have edited my answer accordingly, leaving in the reference to the change in deviance because the asker specifically mentioned it. The change in deviance is the same because the deviance doesn't depend on the constant term.
Mark