Recherche d'union dirigée

Considérons un graphe orienté $G$ sur lequel on peut ajouter dynamiquement des bords et faire des requêtes spécifiques.

Exemple: forêt à ensembles disjoints

Considérez l'ensemble de requêtes suivant:

arrow(u, v)
equiv(u, v)
find(u)

le premier ajoute une flèche au graphe, le second décide si , le dernier trouve un représentant canonique de la classe d'équivalence de , soit un tel que implique $u→v$ $u↔^*v$ $↔^*$ $r(u)$ $u↔^*v$ $r(v)=r(u)$ .

Il existe un algorithme bien connu utilisant la structure de données forestières à ensembles disjoints mettant en œuvre ces requêtes en complexité amortie quasi constante, à savoir . Notez que dans ce cas est implémenté en utilisant . $O(α(n))$ equivfind

Variante plus complexe

Maintenant, je suis intéressé par un problème plus complexe où les directions importent:

arrow(u, v)
confl(u, v)
find(u)

la première ajoute une flèche , la seconde décide s'il y a un nœud accessible depuis et , c'est-à-dire . Le dernier devrait retourner un objet tel que implique où devrait être facilement calculable. (Pour, disons, calculer $u→v$ $w$ $u$ $v$ $u→^*←^*v$ $r(u)$ $u→^*←^*v$ $r(u) \bullet r(v)$ $\bullet$ confl). Le but est de trouver une bonne structure de données telle que ces opérations soient rapides.

Cycles

The graph can contain cycles.

I don't know if there is a way to efficiently and incrementally compute the strongly connected components, in order to only consider DAGs for the main problem.

Of course I would appreciate a solution for DAGs, too. It would correspond to an incremental computation of the least common ancestor.

Naive approach

The disjoint-set forest data structure is not helpful here, since it disregards the direction of the edges. Note that $r(u)$ cannot be a single node, in the case the graph is not confluent.

One can define $r(u)=\{v ∣ u→^*v\}$ and to define $\bullet$ as $S_1\bullet S_2$ when $S_1 ∩ S_2≠∅$ . But how to compute this incrementally?

Probably that computing such a big set is not useful, a smaller set should be more interesting, as in the usual union-find algorithm.

algorithms graphs search-algorithms union-find jmad
la source

(Edit: completely rewrote my answer now that my understanding of the problem is (I hope) clearer.)

It sounds like this problem can be reduced to incrementally constructing and improving an approximation of the transitive closure of the graph, as the graph is built and searched.

$r(u)$ is, abstractly, the set of all nodes which are reachable from both $u$ and $v$ for every $v \ne u$ in the graph. (Of course, not all $u, v$ pairs will necessarily have a node that can be reached from both of them.) Unlike the case in union-find, this set can't be represented as a canonical representative node in the graph, because there may be nodes that are reachable from both $u$ and $v$ , and from both $u$ and $w$ , that are nonetheless not reachable from both $v$ and $w$ .

Say you maintain, for every $u$ , a set of nodes reachable from $u$ (I'll call this $R(u)$ ). These sets would of necessity be an extra data structure for each node, or at least, a set of extra "shortcut" edges in the graph. If you don't care about retaining the specified structure of the graph, there would be no need to distinguish between these edges and the specified edges.

I don't have any ideas offhand for a data structure that captures this that is more efficient than the general case (e.g. a bit vector or hash table,) but you can update these sets incrementally:

Each time you add an edge from $u$ to some other node $v$ , you set $R(u) = R(u) \cup R(v)$ .

Implement confl by first trying $R(u) \cap R(v)$ ; if that's non-empty, return true. But if it is empty, do two parallel breadth-first searches from $R(u)$ and $R(v)$ until you either exhaust both reachable sets or find a node in common. While you do this, also update $R(u)$ and $R(v)$ (and the $R$ 's of all the intermediate nodes you find) to include the reachable nodes that you found. ~~and if you do find a node in common, set R(u) = R(v) = R(u) \cup R(v)~~.

find(u) just returns $R(u)$ . The problem is that confl is not implemented purely in terms of find. I don't see how it could possibly be unless the algorithm was non-incremental (i.e. pre-compute all the $R$ sets of all nodes with the transitive closure of the graph.) However, the incremental approach should still give you fairly good amortized cost, although I have no idea if it approaches $O(\alpha(n))$ offhand. (It probably doesn't. A false answer from confl requires that you start up two BFS'es even when your $R$ sets are saturated; this also seems unavoidable unless the algorithm is made non-incremental.)

This sounds a lot like it might be a special case of La Poutré and van Leeuwen's methods for maintaining transitive closure of a graph.

I realize this doesn't fully answer the question, but perhaps it serves to clarify it, and someone who has more experience with graph algorithms can give a better data structure for encoding the $R$ sets.

Chris Pressey
la source

Thank you for your answer, I hope I made my question more clear now: I don't care about connected components (but the strongly CCs might be helpful to the final solution); I don't have

r (u)

$r(u)$ yet and this

r (u)

$r(u)$ cannot be a single node in a DAG.

jmad

OK, that's a bit clearer. It would seem that

r (u)

$r(u)$ is, abstractly, the set of of all nodes that are reachable from both

u

$u$ and

v

$v$ for every

v \neq u

$v \ne u$ in the graph. This set could be a set of "shortcut" edges off of

u

$u$ , I think, and then this begins to look like computing the transitive closure of reachability in the graph. I still don't immediately see why this couldn't be done incrementally (compress paths as you find them) although it would likely require more storage/work (label/update all "shortcut" edges) than union-find. Does this make sense?

Chris Pressey

Assuming transitive closure is a fair way to characterize

r (u)

$r(u)$ , this sounds like it would be closely related: en.wikipedia.org/wiki/…

Chris Pressey

I don't think confl(u,v) should merge

R (u)

$R(u)$ and

R (v)

$R(v)$ . It could modify them, but that would be already done by the call to find, like in the disjoint-set forest method.

jmad

You're right that it shouldn't merge them; I'll edit the answer. But calls to find really can't compute anything useful, because there is no unique object to "find" except

r (u)

$r(u)$ , which

R (u)

$R(u)$ approximates. (How would find know what to look for, to make updates? It is only given

u

$u$ but the information in

R (u)

$R(u)$ potentially applies to every other node in the graph.)

Chris Pressey

Recherche d'union dirigée

Exemple: forêt à ensembles disjoints

Variante plus complexe

Cycles

Naive approach

Réponses: