Details

    • Type: New Feature
    • Status: In Progress
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Bigdata RDF Database
    • Labels: None

      Description

      Implement the parallel decomposition of the RDFS+ closure [1,2,3].

      The implementation will be a mapped bigdata job, similar to how we handle high-throughput RDF data load. For the current architecture, the job will be distributed across the data services having shards for the POS index of the target knowledge base instance. For the HA architecture, it will be possible to task any machine since shard processing will be decoupled from shard location. Note that the RDFS rules as executed will read only the POS index for the "data" (aka assertions) and only the POS index for the ontology. This makes the decomposition embarrassingly parallel against the POS index.
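
      To make the shape of the job concrete, the sketch below outlines the top-level driver: the ontology has already been prepared, one task is submitted per POS shard, and the driver waits for the workers. Everything other than the JDK classes (PosShard, ShardWorkerFactory, RdfsClosureJobSketch) is a hypothetical placeholder invented for this sketch; none of these names are existing bigdata APIs.

      import java.nio.file.Path;
      import java.util.ArrayList;
      import java.util.List;
      import java.util.concurrent.Callable;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Future;

      /**
       * Illustrative outline of the mapped closure job. The nested types are
       * placeholders standing in for bigdata's shard and task abstractions.
       */
      public class RdfsClosureJobSketch {

          /** Placeholder for one shard of the POS index of the target KB. */
          interface PosShard { }

          /** Placeholder factory for the per-shard worker task (steps W1-W3). */
          interface ShardWorkerFactory {
              Callable<Void> newWorker(PosShard shard, Path ontologyFile);
          }

          public void run(List<PosShard> posShards, Path ontologyFile,
                          ShardWorkerFactory workers, ExecutorService pool) throws Exception {

              // Step 1 (prepare the ontology) is assumed to have run already:
              // the closed ontology has been written to ontologyFile.

              // Step 2: assign shards to workers, one task per POS shard.
              List<Future<Void>> outstanding = new ArrayList<>();
              for (PosShard shard : posShards) {
                  outstanding.add(pool.submit(workers.newWorker(shard, ontologyFile)));
              }

              // Step 3: wait for the workers to complete. Once every worker has
              // flushed its asynchronous index writes, the closure is materialized.
              for (Future<Void> f : outstanding) {
                  f.get();
              }
          }
      }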

      The basic steps of the job are:

      1. Prepare the ontology.

      1a. Read in the ontology and axioms from the KB. The axioms are available directly from the AbstractTripleStore object when it is materialized. The ontology needs to be acquired by a series of queries, which could be a program of rules executing in parallel against the KB, each of which does a constrained access path scan.

      1b. Compute the closure over the ontology of those rules for which all inferences will draw only on the ontology (for example, rdfs5 and rdfs11). Once we have this "closure" we will need only the POS index for the ontology. Computing the closure will require both the SPO and POS indices. (A small sketch of this fixed-point computation is given after these steps.)

      1c. Write the ontology to a file available to all nodes. The location of that file can be specified as part of the job state.

      2. Assign shards to workers.

      3. Wait for the workers to complete.
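
      To make step 1b concrete, the following self-contained sketch (plain Java over in-memory sets using records, not bigdata code) closes a small ontology under rdfs5 and rdfs11, i.e. the transitivity of rdfs:subPropertyOf and rdfs:subClassOf, by iterating until no new triples are produced. In the actual job this computation would read the SPO and POS indices of the KB rather than a Java Set.

      import java.util.HashSet;
      import java.util.Set;

      /**
       * Sketch of step 1b: close the extracted ontology under the rules whose
       * inferences draw only on the ontology (rdfs5 and rdfs11 here), iterating
       * until a fixed point is reached.
       */
      public class OntologyClosureSketch {

          static final String SUB_CLASS_OF = "rdfs:subClassOf";
          static final String SUB_PROPERTY_OF = "rdfs:subPropertyOf";

          record Triple(String s, String p, String o) { }

          static Set<Triple> close(Set<Triple> ontology) {
              Set<Triple> closure = new HashSet<>(ontology);
              boolean changed = true;
              while (changed) {                       // iterate to fixed point
                  changed = false;
                  Set<Triple> inferred = new HashSet<>();
                  for (Triple t1 : closure) {
                      for (Triple t2 : closure) {
                          // rdfs11: (c subClassOf d), (d subClassOf e) => (c subClassOf e)
                          if (t1.p().equals(SUB_CLASS_OF) && t2.p().equals(SUB_CLASS_OF)
                                  && t1.o().equals(t2.s())) {
                              inferred.add(new Triple(t1.s(), SUB_CLASS_OF, t2.o()));
                          }
                          // rdfs5: (p subPropertyOf q), (q subPropertyOf r) => (p subPropertyOf r)
                          if (t1.p().equals(SUB_PROPERTY_OF) && t2.p().equals(SUB_PROPERTY_OF)
                                  && t1.o().equals(t2.s())) {
                              inferred.add(new Triple(t1.s(), SUB_PROPERTY_OF, t2.o()));
                          }
                      }
                  }
                  changed = closure.addAll(inferred);
              }
              return closure;
          }

          public static void main(String[] args) {
              Set<Triple> ontology = new HashSet<>();
              ontology.add(new Triple("ex:A", SUB_CLASS_OF, "ex:B"));
              ontology.add(new Triple("ex:B", SUB_CLASS_OF, "ex:C"));
              // Expect the entailed triple (ex:A rdfs:subClassOf ex:C) in the output.
              close(ontology).forEach(System.out::println);
          }
      }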

      The worker steps are:

      W1. For each POS shard on the data service (DS):

      W2a. Compute the closure of the POS shard against the ontology, bringing a POSOnly temporary triple store instance (the focus store) to fixed point. This is basically an automated rewrite of the RDFS closure program against the POS shard, the ontology from step (1), and the focus store, which will contain the materialized entailments. (A sketch of the per-shard worker loop is given after these steps.)

      W2b. Use the asynchronous index write API to redistribute the materialized entailments from the focus store to the SPO, POS, and OSP shards of the KB instance.

      W3. Close the asynchronous write buffers, causing all writes to be flushed to the KB indices. When all workers complete this step, the RDFS closure will be completely materialized.
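
      The sketch below illustrates the worker steps for a single shard, again as plain self-contained Java rather than bigdata code: the shard and ontology are in-memory sets, only two of the RDFS rules (rdfs7 and rdfs9) are shown for brevity, and the asynchronous write buffers for the SPO, POS, and OSP indices are stood in for by Consumer callbacks.

      import java.util.HashSet;
      import java.util.Set;
      import java.util.function.Consumer;

      /** Sketch of worker steps W1-W3 for a single POS shard. */
      public class ShardWorkerSketch {

          record Triple(String s, String p, String o) { }

          static final String TYPE = "rdf:type";
          static final String SUB_CLASS_OF = "rdfs:subClassOf";
          static final String SUB_PROPERTY_OF = "rdfs:subPropertyOf";

          /**
           * W2a: compute the closure of the shard against the (already closed)
           * ontology, bringing the focus store to fixed point. The full program
           * would apply all of the RDFS rules, not just rdfs7 and rdfs9.
           */
          static Set<Triple> computeFocusStore(Set<Triple> shard, Set<Triple> ontology) {
              Set<Triple> known = new HashSet<>(shard);      // shard data + entailments
              Set<Triple> focus = new HashSet<>();           // newly materialized entailments
              boolean changed = true;
              while (changed) {                              // iterate to fixed point
                  changed = false;
                  Set<Triple> inferred = new HashSet<>();
                  for (Triple data : known) {
                      for (Triple ont : ontology) {
                          // rdfs9: (x type c), (c subClassOf d) => (x type d)
                          if (data.p().equals(TYPE) && ont.p().equals(SUB_CLASS_OF)
                                  && data.o().equals(ont.s())) {
                              inferred.add(new Triple(data.s(), TYPE, ont.o()));
                          }
                          // rdfs7: (x p y), (p subPropertyOf q) => (x q y)
                          if (ont.p().equals(SUB_PROPERTY_OF) && data.p().equals(ont.s())) {
                              inferred.add(new Triple(data.s(), ont.o(), data.o()));
                          }
                      }
                  }
                  for (Triple t : inferred) {
                      if (known.add(t)) {
                          focus.add(t);
                          changed = true;
                      }
                  }
              }
              return focus;
          }

          /**
           * W2b: redistribute the materialized entailments to the SPO, POS, and
           * OSP indices. The Consumers stand in for the asynchronous write buffers.
           */
          static void redistribute(Set<Triple> focus, Consumer<String> spo,
                                   Consumer<String> pos, Consumer<String> osp) {
              for (Triple t : focus) {
                  spo.accept(t.s() + " " + t.p() + " " + t.o());
                  pos.accept(t.p() + " " + t.o() + " " + t.s());
                  osp.accept(t.o() + " " + t.s() + " " + t.p());
              }
              // W3: in the real job the asynchronous write buffers would be closed
              // here, flushing all writes to the KB indices.
          }

          public static void main(String[] args) {
              Set<Triple> ontology = Set.of(new Triple("ex:Dog", SUB_CLASS_OF, "ex:Animal"));
              Set<Triple> shard = Set.of(new Triple("ex:rex", TYPE, "ex:Dog"));
              Set<Triple> focus = computeFocusStore(shard, ontology);
              // Expect the entailment (ex:rex rdf:type ex:Animal) on each index order.
              redistribute(focus, s -> System.out.println("SPO " + s),
                      p -> System.out.println("POS " + p),
                      o -> System.out.println("OSP " + o));
          }
      }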

      Related issues:

      - Explore optimizations which allow us to avoid iterating over rule(s) to achieve fixed point.

      - A POSOnly mode should be implemented for the AbstractTripleStore. This can be used for the ontology once it reaches fixed point.
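
      As a rough illustration of the POSOnly idea only: once the ontology is at fixed point, readers need nothing more than (p, o, s) ordered access. The interface below is hypothetical and does not exist in bigdata; it is just a sketch of the shape such a mode might take.

      import java.util.stream.Stream;

      /**
       * Hypothetical POS-only view. A null predicate or object leaves that
       * position unconstrained, so the view exposes only the POS access path.
       */
      public interface PosOnlyViewSketch {

          record Triple(String s, String p, String o) { }

          /** Scan the POS access path, optionally constrained by predicate and object. */
          Stream<Triple> scan(String predicateOrNull, String objectOrNull);
      }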

      [1] http://www.bigdata.com/blog/2009/10/parallel-materialization-of-rdfs.html

      [2] Jesse Weaver and James A. Hendler. Parallel Materialization of the Finite RDFS Closure for Hundreds of Millions of Triples. In Proceedings of the 8th International Semantic Web Conference, pp. 682--697, 2009.

      [3] Jacopo Urbani, Spyros Kotoulas, Eyal Oren, and Frank van Harmelen. Scalable Distributed Reasoning using MapReduce. In Proceedings of the 8th International Semantic Web Conference, 2009.

        Activity

        There are no comments yet on this issue.

          People

          • Assignee: mikepersonick
          • Reporter: bryanthompson
