In frequentist statistics, there is a close connection between confidence intervals and tests. Using inference about μ in the N(μ,σ²) distribution as an example, the 1−α confidence interval x̄ ± t_{α/2}(n−1)·s/√n contains exactly those values of μ that are not rejected by the t-test at significance level α.
Frequentist confidence intervals are in this sense inverted tests. (Incidentally, this means that we can interpret the p-value as the smallest value of α for which the null value of the parameter would not be contained in the 1−α confidence interval. I find that this can be a useful way of explaining what p-values really are to people who know a little statistics.)
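As a quick numerical illustration of this duality (a minimal sketch using scipy; the sample and parameter values are made up for the example), the null mean lies inside the 1−α t-interval exactly when the two-sided t-test gives p ≥ α:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.3, scale=1.0, size=20)  # made-up sample
mu0, alpha = 0.0, 0.05

t, p = stats.ttest_1samp(x, popmean=mu0)     # two-sided t-test of mu = mu0
lo, hi = stats.t.interval(1 - alpha, df=len(x) - 1,
                          loc=x.mean(), scale=stats.sem(x))

# mu0 is inside the 1 - alpha interval iff the test does not reject at level alpha
assert (lo <= mu0 <= hi) == (p >= alpha)
```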
Reading about the decision-theoretic foundations of Bayesian credible regions, I started to wonder whether there is a similar connection/equivalence between credible regions and Bayesian tests.
- Is there a general connection?
- If there is no general connection, are there examples where a connection exists?
- If there is no general connection, how can we see that this is the case?
Answers:
I managed to come up with an example where a connection exists. It seems to depend heavily on my choice of loss function and on the use of composite hypotheses, though.
I start with a general example, which is then followed by a simple special case involving the normal distribution.
A general example
For an unknown parameter θ, let Θ be the parameter space and consider the hypothesis θ ∈ Θ0 versus the alternative θ ∈ Θ1 = Θ∖Θ0.
Let φ be a test function, using the notation in Xi'an's The Bayesian Choice (which is a bit backwards compared to what I at least am used to), so that we reject Θ0 if φ = 0 and accept Θ0 if φ = 1. Consider the loss function

L(θ,φ) =
  0,  if φ = I_{Θ0}(θ),
  a0, if θ ∈ Θ0 and φ = 0,
  a1, if θ ∈ Θ1 and φ = 1.
The Bayes test is then

φ(x) = 1 if P(θ ∈ Θ0 | x) > a1/(a0 + a1), and φ(x) = 0 otherwise.
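The threshold a1/(a0 + a1) drops out of comparing the posterior expected losses of the two decisions; writing p = P(θ ∈ Θ0 | x), a one-line check:

```latex
% accept (phi = 1) is wrong only when theta is in Theta_1;
% reject (phi = 0) is wrong only when theta is in Theta_0:
E[L(\theta, 1) \mid x] = a_1 (1 - p), \qquad
E[L(\theta, 0) \mid x] = a_0\, p,
\quad\text{so accepting is optimal iff } a_1 (1 - p) < a_0\, p
\iff p > \frac{a_1}{a_0 + a_1}.
```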
Take a0 = α ≤ 0.5 and a1 = 1−α. The null hypothesis Θ0 is then accepted if P(θ ∈ Θ0 | x) ≥ 1−α.
Now, a credible region Θc is a region such that P(Θc | x) ≥ 1−α. Thus, by definition, if Θ0 is such that P(θ ∈ Θ0 | x) ≥ 1−α, then Θc can be a credible region only if P(Θ0 ∩ Θc | x) > 0.
That is, we accept the null hypothesis if and only if every 1−α credible region contains a non-null subset of Θ0.
A simpler special case
To better illustrate what kind of test we have in the example above, consider the following special case.
Let x ∼ N(θ,1) with θ ∼ N(0,1). Set Θ = ℝ, Θ0 = (−∞,0] and Θ1 = (0,∞), so that we wish to test whether θ ≤ 0.
Standard calculations give P(θ ≤ 0 | x) = Φ(−x/√2), where Φ(⋅) is the standard normal cumulative distribution function.
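For readers who want the intermediate step, this is the usual normal-normal conjugate update:

```latex
% With x ~ N(theta, 1) and the prior theta ~ N(0, 1):
\theta \mid x \sim N\!\left(\frac{x}{2}, \frac{1}{2}\right),
\qquad
P(\theta \le 0 \mid x)
  = \Phi\!\left(\frac{0 - x/2}{\sqrt{1/2}}\right)
  = \Phi\!\left(-\frac{x}{\sqrt{2}}\right).
```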
Let z_{1−α} be such that Φ(z_{1−α}) = 1−α. Θ0 is then accepted when −x/√2 ≥ z_{1−α}.
This is equivalent to accepting when x ≤ −√2·z_{1−α}. For α = 0.05, Θ0 is thus rejected when x > −2.33.
If we instead use the prior θ ∼ N(ν,1), Θ0 is rejected when x > −2.33 − ν.
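A minimal numerical check of this special case (a sketch assuming scipy; accept_null is an illustrative helper name, not from the original discussion):

```python
import numpy as np
from scipy.stats import norm

alpha = 0.05
z = norm.ppf(1 - alpha)  # z_{1-alpha} ≈ 1.645

def accept_null(x, nu=0.0):
    """Accept Theta_0 = (-inf, 0] iff P(theta <= 0 | x) >= 1 - alpha.

    With x ~ N(theta, 1) and the prior theta ~ N(nu, 1), the posterior is
    N((x + nu) / 2, 1 / 2), so P(theta <= 0 | x) = Phi(-(x + nu) / sqrt(2)).
    """
    return norm.cdf(-(x + nu) / np.sqrt(2)) >= 1 - alpha

print(-np.sqrt(2) * z)                        # rejection threshold ≈ -2.326
print(accept_null(-2.4), accept_null(-2.3))   # True, False
```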
Comments
The above loss function, where we think that falsely accepting the null hypothesis is worse than falsely rejecting it, may at first glance seem like a slightly artificial one. It can however be of considerable use in situations where "false negatives" can be costly, for instance when screening for dangerous contagious diseases or terrorists.
The condition that all credible regions must contain a part of Θ0 is actually a bit stronger than what I was hoping for: in the frequentist case the correspondence is between a single test and a single 1−α confidence interval, not between a single test and all 1−α intervals.
Michael and Fraijo suggested that simply checking whether the parameter value of interest is contained in some credible region is the Bayesian equivalent of inverting confidence intervals. I was a bit skeptical about this at first, since it wasn't obvious to me that this procedure really results in a Bayesian test (in the usual sense).
As it turns out, it does - at least if you're willing to accept a certain type of loss function. Many thanks to Zen, who provided references to two papers that establish a connection between HPD regions and hypothesis testing:

- Pereira, C. A. B. & Stern, J. M. (1999). Evidence and credibility: full Bayesian significance test for precise hypotheses. Entropy, 1, 99-110.
- Madruga, M. R., Esteves, L. G. & Wechsler, S. (2001). On the Bayesianity of Pereira-Stern tests. Test, 10, 291-299.
I'll try to summarize them here, for future reference. In analogy with the example in the original question, I'll treat the special case where the hypotheses are

Θ0 = {θ0} and Θ1 = Θ∖Θ0.

Pereira & Stern proposed a way of testing hypotheses of this type without having to put prior probabilities on Θ0 and Θ1. Letting π(θ|x) denote the posterior density of θ, define the set

T(x) = {θ : π(θ|x) > π(θ0|x)}.

This means that T(x) is an HPD region, with credibility P(θ ∈ T(x) | x).
The Pereira-Stern test rejects Θ0 when P(θ ∉ T(x) | x) is "small" (< 0.05, say). For a unimodal posterior, this means that θ0 is far out in the tails of the posterior, making this criterion somewhat similar to using p-values. In other words, Θ0 is rejected at the 5 % level if and only if it is not contained in the 95 % HPD region.
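For concreteness, here is a sketch (mine, not from the papers) of how P(θ ∉ T(x) | x) can be approximated by Monte Carlo whenever the posterior density can be evaluated, using the normal posterior from the earlier example:

```python
import numpy as np
from scipy.stats import norm

theta0, x = 0.0, 1.5
post = norm(loc=x / 2, scale=np.sqrt(1 / 2))  # posterior theta | x ~ N(x/2, 1/2)

draws = post.rvs(size=100_000, random_state=0)
# T(x) = {theta : pi(theta|x) > pi(theta0|x)}; the posterior probability of
# its complement is the Pereira-Stern evidence in favour of Theta_0.
ev = np.mean(post.pdf(draws) <= post.pdf(theta0))
reject = ev < 0.05  # reject Theta_0 when the evidence is small
print(ev, reject)
```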
Let the test function φ be 1 if Θ0 is accepted and 0 if Θ0 is rejected. Madruga et al. proposed the loss function

L(θ,φ,x) =
  a·I(θ ∉ T(x)),      if φ = 0,
  b + c·I(θ ∈ T(x)),  if φ = 1,

with a, b, c > 0.
Minimization of the expected loss leads to the Pereira-Stern test, where Θ0 is rejected if P(θ ∉ T(x) | x) < (b+c)/(a+c).
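Given the loss above, the threshold follows from the same expected-loss comparison as in the first answer; writing ev = P(θ ∉ T(x) | x):

```latex
E[L \mid x, \varphi = 0] = a \cdot \mathrm{ev},
\qquad
E[L \mid x, \varphi = 1] = b + c\,(1 - \mathrm{ev}),
% rejecting (phi = 0) has smaller expected loss iff
a \cdot \mathrm{ev} < b + c\,(1 - \mathrm{ev})
\iff \mathrm{ev} < \frac{b + c}{a + c}.
```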
So far, all is well. The Pereira-Stern test is equivalent to checking whether θ0 is in an HPD region, and there is a loss function that generates this test, meaning that it is founded in decision theory.
The controversial part, though, is that the loss function depends on x. While such loss functions have appeared in the literature a few times, they don't seem to be generally accepted as being very reasonable.
For further reading on this topic, see a list of papers that cite the Madruga et al. article.
Update October 2012:
I wasn't completely satisfied with the above loss function, as its dependence on x makes the decision-making more subjective than I would like. I spent some more time thinking about this problem and ended up writing a short note about it, posted on arXiv earlier today.
Let qα(θ|x) denote the posterior quantile function of θ, such that P(θ ≤ qα(θ|x) | x) = α. Instead of HPD sets we consider the central (equal-tailed) interval (q_{α/2}(θ|x), q_{1−α/2}(θ|x)). Testing Θ0 using this interval can be justified in the decision-theoretic framework without a loss function that depends on x.
The trick is to reformulate the problem of testing the point-null hypothesis Θ0 = {θ0} as a three-decision problem with directional conclusions. Θ0 is then tested against both Θ−1 = {θ : θ < θ0} and Θ1 = {θ : θ > θ0}.
Let the test function φ = i if we accept Θi (note that this notation is the opposite of that used above!). It turns out that, under a suitably weighted 0−1 loss function, the resulting Bayes test accepts Θ0 if and only if θ0 is contained in the central 1−α interval.
This seems like a quite reasonable loss function to me. I discuss this loss, the Madruga-Esteves-Wechsler loss, and testing using credible sets further in the manuscript on arXiv.
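A sketch of the resulting procedure (an illustrative helper of my own, assuming a scipy-style frozen posterior with a ppf method):

```python
def test_point_null(post, theta0, alpha=0.05):
    """Three-decision test of Theta_0 = {theta0} via the central interval.

    Returns phi = 0 (accept Theta_0), phi = -1 (conclude theta < theta0)
    or phi = 1 (conclude theta > theta0).
    """
    lower, upper = post.ppf(alpha / 2), post.ppf(1 - alpha / 2)
    if theta0 < lower:
        return 1    # theta0 below the interval: evidence that theta > theta0
    if theta0 > upper:
        return -1   # theta0 above the interval: evidence that theta < theta0
    return 0        # theta0 inside the central interval: accept Theta_0
```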
I coincidentally read your arXiv paper prior to coming to this question and already wrote a blog entry on it (scheduled to appear on October 8). To sum up, I find your construction of theoretical interest, but I also think it is too contrived to be recommended, especially as it does not seem to solve the point-null hypothesis Bayesian testing problem, which traditionally requires putting some prior mass on the point-null parameter value.
To wit, the solution you propose above (in the October update), and as Theorem 2 in your arXiv paper, is not a valid test procedure in that φ takes three values, rather than the two values that correspond to accept/reject. Similarly, the loss function you use in Theorem 3 (not reproduced here) amounts to testing a one-sided hypothesis, H0: θ ≤ θ0, rather than a point-null hypothesis H0: θ = θ0.
My major issue, however, is that it seems to me that both Theorem 3 and Theorem 4 in your arXiv paper are not valid when H0 is a point-null hypothesis, i.e. when Θ0 = {θ0} with no prior mass.
You can use a credible interval (or HPD region) for Bayesian hypothesis testing. I don't think it is common, though; to be fair, I neither see nor use much formal Bayesian hypothesis testing in practice. Bayes factors are occasionally used (and somewhat lauded in Robert's "Bayesian Core") in hypothesis testing setups.
A credible region is just a region where the integral of the posterior density over the region equals a specified probability, e.g. 0.95. One way to form a Bayesian hypothesis test is to check whether or not the null-hypothesized value(s) of the parameter(s) fall in the credible region. In this way we can have a similar 1-1 correspondence between hypothesis tests and credible regions, just as frequentists have between hypothesis tests and confidence intervals. But this is not the only way to do hypothesis testing.
Let me describe how I understood this from reading Tim's answer.
It is based on a table view, with the hypotheses (values of the estimated parameter) in the columns and the observations in the rows.
In the first table, the column probabilities sum to 1, i.e. they are conditional probabilities whose condition, the event of falling in that column, is supplied in the bottom row, called the 'prior'. In the last table, the rows similarly sum to 1, and in the middle you have the joint probabilities, i.e. the conditional probabilities found in the first and last tables multiplied by the probability of the condition, the priors.
The tables basically perform the Bayesian transform: in the first table you give the p.d.f. of the observations (rows) in every column and set the prior for that hypothesis (yes, a hypothesis column is a p.d.f. of the observations under that hypothesis). You do that for every column, and the table is transformed first into the joint-probability table and then into the probabilities of your hypotheses conditioned on the observations.
As I understood from Tim's answer (correct me if I am wrong), the critical interval approach looks at the first table. That is, once the experiment is complete, we know the row of the table (either heads or tails in my example, though you may run more complex experiments, like 100 coin flips, and get a table with 2^100 rows). The frequentist scans through its columns, each of which, as I have said, is a distribution of the possible outcomes under the condition that the hypothesis holds true (e.g. the coin is fair in my example), and rejects those hypotheses (columns) that give a very low probability to the observed row.
The Bayesian first adjusts the probabilities, converting columns into rows, then looks at table 3 and finds the row of the observed outcome. Since that row is also a p.d.f., he goes through the experiment-outcome row and picks the highest-probability hypotheses until his 95% credibility pocket is full. The remaining hypotheses are rejected.
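A toy version of these three tables (my own illustrative example, with two hypotheses about a coin and made-up numbers):

```python
import numpy as np

# Table 1: columns are hypotheses (fair coin, heads-biased coin), rows are
# observations; each column is P(observation | hypothesis).
likelihood = np.array([[0.5, 0.9],   # P(heads | H)
                       [0.5, 0.1]])  # P(tails | H)
prior = np.array([0.5, 0.5])         # bottom row: prior over the hypotheses

# Table 2: joint probabilities P(observation, hypothesis).
joint = likelihood * prior

# Table 3: P(hypothesis | observation); each row sums to 1.
posterior = joint / joint.sum(axis=1, keepdims=True)
print(posterior)
```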
How do you like it? I am still in the process of learning, and the graphic seems helpful to me. I believe that I am on the right track, since a reputable user gives the same picture when analyzing the difference between the two approaches. I have proposed a graphical view of the mechanics of hypothesis selection.
I encourage everybody to read Keith's last answer, but my picture of the hypothesis-test mechanics immediately shows that the frequentist does not look at the other hypotheses when verifying the current one, whereas the presence of highly credible hypotheses strongly affects the acceptance/rejection of the other hypotheses in the Bayesian analysis: if you have a single hypothesis that occurs 95% of the time under the observed data, you throw away all other hypotheses immediately, regardless of how well the data fit them. Let us set aside statistical power analysis, which contrasts two hypotheses based on the overlap of their confidence intervals.
But I seem to have spotted a similarity between the two approaches: they seem to be connected through the property

P(A|B) > P(A) ⟺ P(B|A) > P(B).

Basically, if there is a dependence between A and B, then it will show up as correlation in both the frequentist and the Bayesian tables. So, doing one hypothesis test correlates with the other; they must, in a sense, give the same results. Studying the roots of this correlation will likely give you the connection between the two. In my question there, I actually ask why there is a difference instead of perfect correlation.