Relationships between correlation and causation

19

From the Wikipedia page titled Correlation does not imply causation:

For any two correlated events, A and B, the different possible relationships include:

A causes B (direct causation);
B causes A (reverse causation);
A and B are consequences of a common cause, but do not cause each other;
A and B both cause C, which is (explicitly or implicitly) conditioned on;
A causes B and B causes A (bidirectional or cyclic causation);
A causes C which causes B (indirect causation);
There is no connection between A and B; the correlation is a coincidence.

What does the fourth point mean? A and B both cause C, which is (explicitly or implicitly) conditioned on. If A and B both cause C, why do A and B have to be correlated?

correlation causality mat

8

Xkcd

Todd Wilcox

2

Despite the dictum, I would expect there to be a strong correlation between correlation and causation ...

Mehrdad

2

tylervigen.com/spurious-correlations

Ant

See also the discussion at Does no correlation imply no causation?

ctwardy

18

"Conditioning" is a term from probability theory: https://en.wikipedia.org/wiki/Conditional_probability

Conditioning on C means that we only look at the cases where C is true. "Implicitly" means that we may not make this restriction explicit, and sometimes we are not even aware that we are making it.

The point means that, when A and B both cause C, observing a correlation between A and B in the cases where C is true does not mean that there is a real relationship between A and B. It is simply the conditioning on C (perhaps unintentional) that creates an artificial correlation.

Let's take an example.

In a country, there are exactly two kinds of disease, perfectly independent. Call A: "the person has the first disease", B: "the person has the second disease". Suppose that $P(A)=0.1$ and $P(B)=0.1$.

Now, any person who has one of these diseases goes to see the doctor, and only then. Call C: "the person goes to see the doctor". We have $C=A \text{ or } B$.

Now let's compute some probabilities:

$P(C)=0.19$
$P(A|C)=P(B|C)=\frac{0.1}{0.19}\approx 0.53$
$P(A \text{ and } B|C)=\frac{0.01}{0.19}\approx 0.053$
$P(A|C)P(B|C)\approx 0.28$
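These four numbers can be checked by enumerating the four combinations of A and B exactly. The following is a minimal sketch using only the Python standard library; the variable and function names are illustrative, not taken from the answer.

```python
from fractions import Fraction
from itertools import product

p_a = Fraction(1, 10)  # P(A)
p_b = Fraction(1, 10)  # P(B)

# Joint distribution of (A, B), assuming the two diseases are independent.
joint = {
    (a, b): (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
    for a, b in product([True, False], repeat=2)
}

def prob(event):
    """Total probability of the outcomes (a, b) for which event(a, b) holds."""
    return sum(p for (a, b), p in joint.items() if event(a, b))

p_c = prob(lambda a, b: a or b)                   # C = A or B
p_a_given_c = prob(lambda a, b: a) / p_c          # A implies C, so P(A and C) = P(A)
p_b_given_c = prob(lambda a, b: b) / p_c          # same argument for B
p_ab_given_c = prob(lambda a, b: a and b) / p_c   # (A and B) also implies C

print("P(C)          =", p_c, "=", float(p_c))
print("P(A|C)        =", p_a_given_c, "~", round(float(p_a_given_c), 3))
print("P(A and B|C)  =", p_ab_given_c, "~", round(float(p_ab_given_c), 3))
print("P(A|C)P(B|C)  =", p_a_given_c * p_b_given_c,
      "~", round(float(p_a_given_c * p_b_given_c), 3))
```

Working with `Fraction` keeps the results exact, so the gap between $P(A \text{ and } B|C)$ and $P(A|C)P(B|C)$ cannot be blamed on rounding.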

Clearly, conditioned on C, $A$ and $B$ are very far from independent, since $P(A \text{ and } B|C)$ differs greatly from $P(A|C)P(B|C)$. In fact, conditioned on C, $\text{not } A$ seems to "cause" $B$.

If you use the list of people recorded by their doctor(s) as a data source for an analysis, then there appears to be a strong (negative) correlation between diseases $A$ and $B$. You may not be aware that your data source is in fact a conditioning. This is also called "selection bias".
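The selection effect can also be seen in a simulation. The sketch below is only an illustration under the assumptions of the example (two independent diseases, each with probability 0.1, and a doctor's list containing exactly the people with at least one disease). The sample size, the seed and the helper names are arbitrary choices.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def correlation(people):
    """Correlation between the 0/1 indicators of disease A and disease B."""
    return pearson([int(a) for a, _ in people], [int(b) for _, b in people])

rng = random.Random(0)   # fixed seed so the run is reproducible
n_people = 100_000

# Each person gets disease A and disease B independently with probability 0.1.
population = [(rng.random() < 0.1, rng.random() < 0.1) for _ in range(n_people)]

# Only people with at least one disease appear on the doctor's list (event C).
doctor_list = [(a, b) for a, b in population if a or b]

corr_population = correlation(population)
corr_doctor_list = correlation(doctor_list)

print("correlation in the whole population:", round(corr_population, 3))
print("correlation on the doctor's list:   ", round(corr_doctor_list, 3))
```

In the whole population the correlation should come out close to zero, while on the doctor's list it is strongly negative. Under these assumptions the exact conditional correlation is $-0.9$ (a covariance of $-81/361$ divided by a variance of $90/361$).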

Benoit Sanchez

13

The fourth point is an example of Berkson's paradox, also known as conditioning on a collider, also known as the explaining-away phenomenon.

$Attractive \rightarrow Accept \leftarrow Charming$

Here $Attractive$ and $Charming$ are two traits of a man who proposes a date to a woman, and $Accept=1$ means that she agreed to date him. Now suppose I tell you about a man whom the woman agreed to date, and I tell you that he is (in the woman's opinion) not attractive at all. Well, we know that the woman agreed to date him anyway, so we would reasonably infer that he must be quite charming indeed. Conversely, if we learn about a man whose date proposal was accepted and who is not charming, we would reasonably infer that he must be quite attractive.

Do you see what's happened here? By conditioning on $Accept=1$, we've induced a negative correlation between $Attractive$ and $Charming$, even though these two traits are (by assumption) marginally independent. From the perspective of the woman, the attractive men she dates tend to be less charming, and the charming men she dates tend to be less attractive. But this is because, by thinking only of the men she has dated, she is implicitly conditioning on $Accept$. If she instead considered all the men who have proposed dates, regardless of whether she accepted the proposal, she would see that there is no statistical association between the two traits.
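A small simulation makes the induced correlation visible. This is only a sketch under assumptions that are not in the answer itself: each trait is modelled as an independent standard normal score, and the (hypothetical) acceptance rule is that the proposal is accepted when the two scores sum to more than 1. Any rule under which both traits raise the chance of acceptance should show the same qualitative effect.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(1)  # fixed seed so the run is reproducible

# Assumption: Attractive and Charming are independent standard normal scores.
men = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50_000)]

# Hypothetical acceptance rule: Accept = 1 when the two scores sum to more than 1.
accepted = [(attractive, charming) for attractive, charming in men
            if attractive + charming > 1.0]

corr_all = pearson(*zip(*men))            # all men who proposed a date
corr_accepted = pearson(*zip(*accepted))  # only the men with Accept = 1

print("all proposals:  ", round(corr_all, 3))
print("accepted only:  ", round(corr_accepted, 3))
```

Across all proposals the correlation should be close to zero, while among the accepted men it is clearly negative, which matches the explaining-away argument above.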

Jake Westfall

5

Simpson's paradox and Berkson's paradox can each give examples of "A and B both cause C, which is (explicitly or implicitly) conditioned on"

As an example, suppose I have $1000$ stamps in my collection, of which $100$ are rare ($10\%$) and $200$ are pretty ($20\%$). If there is no intrinsic relationship between rarity and prettiness, it might turn out that $20$ of my stamps are both pretty and rare.

If I now display my $280$ interesting stamps, i.e. those which are rare or pretty or both, there will be an apparent negative correlation between rarity and prettiness ($20\%$ of displayed rare stamps are pretty while $100\%$ of displayed common stamps are pretty) due entirely to conditioning on being interesting.
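The arithmetic of the stamps example can be retraced in a few lines. This is just a sketch of the counts given above; the variable names are illustrative.

```python
total = 1000
rare = 100             # 10% of the collection
pretty = 200           # 20% of the collection
rare_and_pretty = 20   # what independence would give: 10% of 200

rare_only = rare - rare_and_pretty        # rare but not pretty
pretty_only = pretty - rare_and_pretty    # pretty but common
displayed = rare_only + pretty_only + rare_and_pretty  # rare or pretty or both

# Whole collection: share of pretty stamps among rare and among common stamps.
pretty_among_rare_all = rare_and_pretty / rare
pretty_among_common_all = pretty_only / (total - rare)

# Display only: every displayed common stamp is there because it is pretty.
displayed_common = displayed - rare
pretty_among_rare_displayed = rare_and_pretty / rare
pretty_among_common_displayed = pretty_only / displayed_common

print("displayed stamps:", displayed)
print("whole collection:", pretty_among_rare_all, pretty_among_common_all)
print("display only:    ", pretty_among_rare_displayed, pretty_among_common_displayed)
```

In the whole collection the share of pretty stamps is the same among rare and common stamps, so there is no association. In the display it is $20\%$ against $100\%$, a difference created purely by the selection.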

Henry

This is an example of Berkson's paradox, not Simpson's paradox (see my answer).

Jake Westfall

@JakeWestfall You are probably right - I knew I had written the stamps example somewhere before but had forgotten where, and it turns out to be the Wikipedia page for Berkson's paradox

Henry

4

The paragraph starts with "For any two correlated events, A and B,...", so my guess is that correlation is assumed at the beginning. In other words, they need not be correlated to simultaneously cause C, but if they were correlated and they did both cause C, it does not imply that there exists a causal relationship between them.

Roux
