I'm currently participating in developing a knowledge graph that uses ConceptNet and a few others as its data sources. It uses the same architecture as ConceptNet namely it is stored as a hypergraph (edges connect an arbitrarily number of nodes).
My main focus now is in figuring out how to find, so called, "contextual nodes" (ConceptNet terminology) given a set of (user-supplied) nodes that are related to a specific context. It should work similar to how we humans rank associations between parts of knowledge;
For example the n inputs X: "lock", "cell", "police", "guard", "prisoner" should associate strongly with "jail".
My key questions around designing this algorithm are:
- How should I rank the edges between the nodes?; by their distance to origin or by their connectiveness to origin?
How should I combine these ranks?; that is, if
A
-0.5->
B-0.5->
C,does that infer
A
-1.0->
C (additively minimum distance)or
A
-0.25->
C (multiplicatively maximum (strongest) weight)? Provided that all distances or weights respectively are normalized to lie between 0 and 1.
The additive variant is natural to use in the classical shortest path algorithm when edges are ranked on smallest distance. This solution however becomes unintuitive mathematically if we want to combine strengths because adding distances worsens their total rank. We could of course average the distances, but then how do we promote two nodes connected with several edges each of distance d over two nodes connected with fewer edges each with the same distance d? If we on the other hand combine serial edges multiplicatively and parallel edges additive we get a mathematically more natural expression for paths with higher connectiveness.
My current solution is to use a set of traversers T that "lazily" traverses the graph from all the graph nodes associated with the elements in X and after some user specified amount of time stop the traversal, and rank the union of all the traversed nodes according to some "good" ranking measure and then present N best of them to the user.
Further, I have a loose idea of extending my Dijkstra implementation by also storing "some combination" of all the edge weights from a previous node to the current node beside the "best" weight (smallest distance) in the weight/distance map. Have anybody hade a similar idea?
Aucun commentaire:
Enregistrer un commentaire