GraphChi is a vertex-centric graph processing platform from the GraphLab group. Unlike GraphLab and PowerGraph, GraphChi is designed to process graphs from disk on a single machine. Its sole index is over the in-edges: the edges are partitioned into shards by the target vertex (O) and then ordered within each shard by the source vertex (S). Processing uses a Parallel Sliding Window (PSW) over the different shards. The PSW algorithm requires P^2 (partial) passes over the shards for each iteration (e.g., BFS pass) over the graph.
Given our existing database indices on SPO, OSP, and POS, it seems that we could do this in O(2 * P) IOs rather than O(P^2). We would use a scan on the OSP index to gather the in-edges and a scan on the SPO index to gather the out-edges. For a given block of the OSP index read into memory, we would generate the 1-hop neighborhoods using what amounts to a join on OSP.O = SPO.S. Once the 1-hop subgraphs for some set of O vertices are materialized, we can apply the update rule. Since these are clustered index scans, we could partition them into parallel iterators, each of which reads one "partition" of the OSP index, i.e., break the OSP index into chunks of 100k vertices. Those parallel scans might make it easier to keep a GPU busy with the join on OSP.O = SPO.S and the actual application of the update rule.
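The join described above can be sketched in a few lines. This is an illustration only, not the database's actual scan API: the `osp_scan` and `spo_scan` helpers below are hypothetical stand-ins for clustered index scans (each yields (s, p, o) triples in that index's key order), and the 1-hop neighborhoods then fall out by grouping on OSP.O = SPO.S.

```python
from itertools import groupby

def osp_scan(edges):
    # Stand-in for a clustered OSP index scan: key order is (O, S, P).
    return sorted(edges, key=lambda e: (e[2], e[0], e[1]))

def spo_scan(edges):
    # Stand-in for a clustered SPO index scan: key order is (S, P, O).
    return sorted(edges, key=lambda e: (e[0], e[1], e[2]))

def one_hop_subgraphs(edges):
    """Yield (vertex, in_edges, out_edges) by joining OSP.O = SPO.S.
    Both inputs arrive in key order, so groupby() is sufficient."""
    in_by_v = {v: list(g) for v, g in groupby(osp_scan(edges), key=lambda e: e[2])}
    out_by_v = {v: list(g) for v, g in groupby(spo_scan(edges), key=lambda e: e[0])}
    for v in sorted(set(in_by_v) | set(out_by_v)):
        yield v, in_by_v.get(v, []), out_by_v.get(v, [])
```

In the real system the two scans would be merged incrementally per OSP block rather than materialized in dictionaries, but the join condition is the same.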
We could do this nearly as easily on a cluster. The bigdata cluster generates key-range shards dynamically. The modified GraphChi approach could be distributed to each of the nodes and run for each OSP index shard. A full pass would otherwise require a complete read of the SPO index shards; however, we would only need to read the key-range from the SPO index corresponding to the key-range of the O values on each OSP index shard. Thus this is still one parallel scan of the OSP index and one parallel scan of the SPO index per BFS iteration. We already have the infrastructure to set up that distributed process and perform those remote SPO index key-range reads.
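The key-range restriction mentioned above is simple to state: the S key-range we must read from the SPO index is exactly the span of O values present in the local OSP shard. A minimal sketch, with the shard represented as a plain list of (s, p, o) tuples (the shard API here is assumed, not bigdata's actual interface):

```python
def spo_keyrange_for_osp_shard(osp_shard_edges):
    """Given the (s, p, o) edges of one OSP key-range shard, return the
    [lo, hi] range of target ids O. This is the S key-range that must be
    read (possibly remotely) from the SPO index shards; everything outside
    it can be skipped on this node."""
    targets = [o for (_, _, o) in osp_shard_edges]
    return min(targets), max(targets)
```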
The main thing that would be different on the cluster is that we cannot rely on the linear list abstraction (indexOf(key), keyAt(index)) for a shard view unless we have done a compacting merge on that view. This is because the shard views are composed of several backing files (recent log-structured writes plus zero or more batch-built B+Tree file segments that are in key order on the disk). When we do a compacting merge, all of that gets turned into a single index file in key order on the disk. When the file gets too large (200MB), the shard is split.
The implication is that on a single-node database instance we have fast indexOf(key) operations. That method can be used to index directly into the link values and the vertex values, which are written to disk using sequential IO. On the cluster, we would have to use a more structured file for the link and vertex values.
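The single-node case can be sketched as follows. The idea is that indexOf(key) on a compacted shard returns a dense ordinal, which addresses a fixed-width record in a sequentially written value file. The value layout (one double per vertex) and the `ordinals` dictionary standing in for the B+Tree's indexOf(key) are illustrative assumptions:

```python
import struct

VALUE_FMT = "<d"                       # assumed layout: one double per vertex
VALUE_SIZE = struct.calcsize(VALUE_FMT)

def write_vertex_values(buf, values):
    # Sequential IO: values are appended in vertex key order.
    for v in values:
        buf.extend(struct.pack(VALUE_FMT, v))

def read_vertex_value(buf, ordinals, key):
    i = ordinals[key]                  # indexOf(key) on the compacted shard view
    off = i * VALUE_SIZE               # valid only after a compacting merge,
    return struct.unpack_from(VALUE_FMT, buf, off)[0]  # when ordinals are dense
```

On the cluster, where a shard view spans several backing files and ordinals are not dense, this offset arithmetic breaks down, which is why a more structured value file would be needed there.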
GraphChi also has an option to schedule which vertices will be computed. Presumably that winds up as state on the disk, perhaps a flag stored alongside the per-vertex values that are updated on each iteration.
GraphChi provides an ordering guarantee. It appears that this is simply that A will be updated before B if A is ordered LT B in the natural order of the vertices. When A and B are in different OSP shards, this is trivially guaranteed. When A and B are in the same OSP shard, we need to serialize their update ordering if there is a dependency (A => B) where B is also in that shard. So, for all O[i] in an OSP shard, we can evaluate the O[i] in parallel unless an out-edge (found in the corresponding SPO shard) has a vertex in the source OSP shard as its target (i.e., a 1-step cycle within the OSP shard).
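The conflict test above reduces to: a shard can be evaluated fully in parallel iff no out-edge from a shard vertex targets another vertex of the same shard. A small sketch of that check (the edge representation is hypothetical; out-edges would really come from the corresponding SPO shard scan):

```python
def intra_shard_conflicts(shard_vertices, out_edges_by_vertex):
    """Return the set of intra-shard dependencies (A, B), where A's
    out-edge targets B in the same OSP shard. An empty result means
    all vertices of the shard may be updated in parallel; otherwise
    the conflicting pairs must have their updates serialized."""
    shard = set(shard_vertices)
    conflicts = set()
    for v in shard_vertices:
        for tgt in out_edges_by_vertex.get(v, ()):
            if tgt in shard:
                conflicts.add((v, tgt))
    return conflicts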
See https://sourceforge.net/apps/trac/bigdata/ticket/630 (Modify SIDs indexing for better locality).