Information geometry clarification

This question concerns Amari's paper "Differential geometry of curved exponential families: curvatures and information loss".

The passage reads as follows.

Let $S^n=\{p_{\theta}\}$ be an $n$-dimensional manifold of probability distributions with a coordinate system $\theta=(\theta_1,\dots,\theta_n)$, where $p_{\theta}(x)>0$ is assumed ...

Each point $\theta$ of $S^n$ can be regarded as carrying a function $\log p_{\theta}(x)$ of $x$ ...

Let $T_{\theta}$ be the tangent space of $S^n$ at $\theta$, which is, roughly speaking, identified with a linearized version of a small neighborhood of $\theta$ in $S^n$. Let $e_i(\theta)$, $i=1,\dots,n$, be the natural basis of $T_{\theta}$ associated with the coordinate system ...

Since each point $\theta$ of $S^n$ carries a function $\log p_{\theta}(x)$ of $x$, it is natural to regard $e_i(\theta)$ at $\theta$ as representing the function

$e_i(\theta)=\frac{\partial}{\partial\theta_i}\log p_{\theta}(x).$
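The following minimal numerical sketch makes "$e_i(\theta)$ represents a function of $x$" concrete. It assumes a finite sample space $\mathcal{X}=\{x_0,\dots,x_n\}$ and a hypothetical parametrization $p_\theta(x_0)\propto 1$, $p_\theta(x_k)\propto e^{\theta_k}$. The helper names `probs` and `score_basis` are illustrative and do not come from the paper. Under these assumptions, each $e_i(\theta)$ is a vector indexed by $x$, and the $n$ vectors are linearly independent, which is what lets them serve as a basis.

```python
import numpy as np

def probs(theta):
    """p_theta(x_k), k = 0..n, for the hypothetical family
    p(x_0) ∝ 1, p(x_k) ∝ exp(theta_k)."""
    z = np.concatenate(([0.0], theta))   # theta_0 fixed to 0
    w = np.exp(z - z.max())              # numerically stabilised exponentials
    return w / w.sum()

def score_basis(theta, h=1e-6):
    """Row i is e_i(theta) = d/dtheta_i log p_theta(x), viewed as a vector
    over x, computed by central finite differences."""
    n = len(theta)
    E = np.empty((n, n + 1))
    for i in range(n):
        step = h * np.eye(n)[i]
        E[i] = (np.log(probs(theta + step)) - np.log(probs(theta - step))) / (2 * h)
    return E

theta = np.array([0.3, -1.2, 0.7])
p = probs(theta)
E = score_basis(theta)

# Closed form for this family: e_i(theta)(x_k) = [k == i] - p_theta(x_i)
E_exact = np.eye(len(theta) + 1)[1:] - p[1:, None]

print(np.allclose(E, E_exact, atol=1e-6))  # True
print(np.linalg.matrix_rank(E))            # 3, i.e. n independent functions of x
```

For this family, the rank is $n$ because the $n\times n$ block $I - p\,\mathbf{1}^\top$ has determinant $p_\theta(x_0)>0$.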

I don't understand the last statement, which appears in Section 2 of the paper mentioned above. How is the basis of the tangent space given by the equation above? I would appreciate help from anyone in this community who knows this material. Thanks.

Update 1:

I agree (with @aginensky) that if the $\frac{\partial}{\partial\theta_i}\log p_{\theta}$ are linearly independent, then the $\frac{\partial}{\partial\theta_i}p_{\theta}$ are also linearly independent. However, it is not clear to me how they are members of the tangent space in the first place. So how can the $\frac{\partial}{\partial\theta_i}\log p_{\theta}$ be regarded as a basis of the tangent space? Any help is appreciated.
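The linear-independence step mentioned above can be written out directly. This short derivation assumes only $p_\theta(x)>0$ for every $x$, as in the paper's setup, and holds for arbitrary real coefficients $c_1,\dots,c_n$:

```latex
% Assumes p_\theta(x) > 0 for every x in \mathcal{X}; c_1, ..., c_n are arbitrary reals.
\begin{aligned}
\frac{\partial}{\partial\theta_i}\log p_\theta(x)
  &= \frac{1}{p_\theta(x)}\,\frac{\partial}{\partial\theta_i}p_\theta(x), \\
\sum_{i=1}^{n} c_i\,\frac{\partial}{\partial\theta_i}\log p_\theta(x)
  &= \frac{1}{p_\theta(x)}\sum_{i=1}^{n} c_i\,\frac{\partial}{\partial\theta_i}p_\theta(x)
  \quad\text{for all } x\in\mathcal{X}.
\end{aligned}
```

The left side vanishes for every $x$ exactly when the right-hand sum does, so the two families are linearly independent together. The map $X\mapsto X/p_\theta$ is linear and invertible, with inverse $A\mapsto p_\theta A$. It sends $\{X \mid \sum_x X(x)=0\}$ onto $\{A \mid \sum_x A(x)p_\theta(x)=0\}$. This is one way to see why the functions $\frac{\partial}{\partial\theta_i}\log p_\theta$ can stand in for the tangent vectors $\frac{\partial}{\partial\theta_i}p_\theta$.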

Update 2:

@aginensky: In his book, Amari says the following:

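Let $S^n=\mathcal{P}(\mathcal{X})$, the set of strictly positive probability distributions on the finite set $\mathcal{X}=\{x_0,\dots,x_n\}$. $\mathcal{P}(\mathcal{X})$ can be regarded as a subset of $\mathbb{R}^{\mathcal{X}}=\{X\big|X:\mathcal{X}\to \mathbb{R}\}$; specifically, $\mathcal{P}(\mathcal{X})$ is an open subset of the affine subspace $\{X\big |\sum_x X(x)=1\}$.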

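The tangent space $T_p(S^n)$ of $S^n$ at $p$ is then identified with the linear subspace $\mathcal{A}_0=\{X\big |\sum_x X(x)=0\}$. The natural basis vector $\frac{\partial}{\partial\theta_i}$ of a coordinate system $\theta=(\theta_1,\dots,\theta_n)$ is represented as $(\frac{\partial}{\partial\theta_i})_{\theta}=\frac{\partial}{\partial\theta_i}p_{\theta}$.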

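Next, consider the map $p\mapsto \log p$, which sends $S^n$ onto $\log S^n:=\{\log p\big |p\in S^n\}\subset\mathbb{R}^{\mathcal{X}}$. A tangent vector $X\in T_p(S^n)$ is carried by the differential of $p\mapsto \log p$ to a vector denoted $X^{(e)}$. For the natural basis, this gives $(\frac{\partial}{\partial\theta_i})_{\theta}^{(e)}=\frac{\partial}{\partial\theta_i}\log p_{\theta}$. In general, $X^{(e)}(x)=X(x)/p(x)$, so that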

$T_p^{(e)}(S^n)=\{X^{(e)}\big |X\in T_p(S^n)\}=\{A\in \mathbb{R}^{\mathcal{X}}\big |\sum_x A(x)p(x)=0\}.$
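Both representations in the book excerpt can be checked numerically for a finite family. This sketch assumes the same kind of hypothetical parametrization, $p_\theta(x_k)\propto e^{\theta_k}$ with $\theta_0:=0$, and uses closed-form derivatives valid for that family only. The helper `probs` is illustrative and does not come from Amari's text. The code checks three things:

- $\frac{\partial}{\partial\theta_i}p_\theta$ lies in $\mathcal{A}_0$.
- Dividing by $p$ gives $\frac{\partial}{\partial\theta_i}\log p_\theta$.
- The result satisfies $\sum_x A(x)p(x)=0$.

```python
import numpy as np

def probs(theta):
    """Hypothetical family on {x_0, ..., x_n}: p(x_0) ∝ 1, p(x_k) ∝ exp(theta_k)."""
    z = np.concatenate(([0.0], theta))
    w = np.exp(z - z.max())
    return w / w.sum()

theta = np.array([0.3, -1.2, 0.7])
n = len(theta)
p = probs(theta)
delta = np.eye(n + 1)[1:]             # delta[i-1, k] = [k == i]

# Vectors in T_p(S^n) (what Amari and Nagaoka call the m-representation):
# d/dtheta_i p_theta(x_k) = p(x_k) * ([k == i] - p(x_i)), one row per i
M = p[None, :] * (delta - p[1:, None])

# e-representation: X^(e)(x) = X(x) / p(x), applied to each row
E = M / p[None, :]

# Closed form of d/dtheta_i log p_theta(x_k) = [k == i] - p(x_i), for comparison
E_direct = delta - p[1:, None]

print(M.sum(axis=1))                  # entries ~0: each row lies in A_0
print(np.allclose(E, E_direct))       # True
print(E @ p)                          # entries ~0: sum_x A(x) p(x) = 0 for each row
```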

Thus $X\mapsto X^{(e)}$ sends $\frac{\partial}{\partial\theta_i}$ to $(\frac{\partial}{\partial\theta_i})^{(e)}$, mapping $T_p$ to $T_p^{(e)}$, so $(\frac{\partial}{\partial\theta_i})^{(e)}\in T_p^{(e)}$.

Is the paper, then, implicitly working with $(\log S^n,T_p^{(e)})$ in place of $(S^n,T_p)$?

Tags: mathematical-statistics, statistical-learning, geometry, information-geometry. Asked by Ashok.

$e_i(\theta)=\frac{\partial}{\partial\theta_i}\log p_{\theta}(x)$

$\theta_{i}$

$\frac{\partial}{\partial\theta_i}$

$p_{\theta}$

I tried to edit my comment for clarity but was not allowed to. Let me know if you would like more details.

meh

$\frac{\partial}{\partial\theta_i}\log p_{\theta}(x)=\frac{1}{p_{\theta}(x)}\frac{\partial}{\partial\theta_i}p_{\theta}(x)$

$\{d\theta_i\}$

$\{\frac{\partial}{\partial\theta_i}\}$

$d\theta$

$p_{\theta}$

Answers: