Status: In Progress
Affects Version/s: BIGDATA_RELEASE_1_1_0
Fix Version/s: None
Component/s: Query Engine
We should reconcile the older rule engine code while undertaking this task. The rule engine supports conjunctive query with filters, but it is expressed solely at the IV (Internal Value) layer and does not allow full SPARQL constructs (subquery, hash joins, EXISTS or NOT EXISTS, etc.). However, it does support programs: a program is either a single rule or a set of rules and sub-programs evaluated either in order or in parallel. The IStep interface is the common abstraction, with IProgram and IRule as children. Steps can also be marked with a transitive closure operator, in which case they will run to a fixed point. The fixed point is identified by a mutation count of zero.
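The IStep/IProgram/IRule design and the mutation-count fixed-point test described above can be sketched as follows. This is a minimal illustration with toy types, not the real bigdata interfaces (which carry bindings, relations, and constraints):

```java
import java.util.*;

public class FixedPointSketch {

    /** Common abstraction: applying a step returns its mutation count. */
    interface IStep {
        long apply(Set<String> store);
    }

    /** A single rule: if the antecedent fact is present, derive the consequent. */
    static class Rule implements IStep {
        final String antecedent, consequent;
        Rule(String antecedent, String consequent) {
            this.antecedent = antecedent;
            this.consequent = consequent;
        }
        public long apply(Set<String> store) {
            return store.contains(antecedent) && store.add(consequent) ? 1 : 0;
        }
    }

    /** An ordered program, optionally run to fixed point. */
    static class Program implements IStep {
        final List<IStep> steps;
        final boolean fixedPoint; // the "transitive closure operator"
        Program(boolean fixedPoint, IStep... steps) {
            this.fixedPoint = fixedPoint;
            this.steps = Arrays.asList(steps);
        }
        public long apply(Set<String> store) {
            long total = 0, round;
            do {
                round = 0;
                for (IStep s : steps) round += s.apply(store);
                total += round;
            } while (fixedPoint && round > 0); // stop at a mutation count of zero
            return total;
        }
    }

    /** Runs the chain a -> b -> c to fixed point; returns the final store size. */
    static int demo() {
        Set<String> store = new HashSet<>(List.of("a"));
        new Program(true, new Rule("a", "b"), new Rule("b", "c")).apply(store);
        return store.size();
    }
}
```

A parallel program would differ only in how the steps inside one round are scheduled; the fixed-point test is the same either way.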
The rule engine logic still depends on the older pipeline join code and does not run through the newer query engine code base. Thus it is not expressed in terms of BOPs (bigdata operators). Refactoring the rule engine to run as BOPs on the query engine will allow us to get rid of quite a bit of code that exists only to support inference.
There are some special considerations for the rule engine:
1. Fixed point requires the ability to advance the read-behind point in such a manner as to always read from a fully consistent view while writing on either a temporary graph or the database. The reader and writer must be independent. Writes must not become visible until a particular "step" is done, and then they must become visible before the next step is evaluated.
2. The writer uses mutable index views. None of the BOPs currently use mutable views so we need to develop some unit tests around mutation support.
3. Truth maintenance uses the rule engine. That integration needs to be examined. It is mostly localized in InferenceEngine#computeClosure() and within the TMUtility (rule rewrite for truth maintenance) and TruthMaintenance (logic for truth maintenance under assertion and retraction, including maintenance and use of proof chains).
4. We currently lack an interchange syntax for rules. Ideally they can be expressed in SPARQL (QUERY + UPDATE) and organized within a declarative framework which allows us to express the notions of composition of rules into programs, parallel versus serial evaluation of rules, and whether or not a rule or program must be run to a fixed point.
5. We have not yet implemented the parallel decomposition of RDFS+ closure against the POS index shards. This should be very straight forward, but may require an extension to the BOP model to support the fixed point of the ontology, the replication of the ontology (after fixed point) to each DS node in the cluster, and access to that ontology from within the context of the inference rules.
6. The need to access a temporary, replicated, and possibly maintained closure of the ontology on each cluster node is related to a more general requirement for temporary graphs and temporary solution sets whose life cycle is not limited to a single query or update operation.
7. The vectoring for rules, database-at-once-closure, and truth maintenance should be examined when doing performance tuning against rules running on the query engine.
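The visibility contract in item 1 (independent reader and writer, writes atomically published at a step boundary) can be illustrated with a toy two-view pattern. This is not bigdata code; the real implementation advances the read-behind point on the indices, but the semantics are the same:

```java
import java.util.*;

public class StepVisibilitySketch {
    private Set<String> committedView = new HashSet<>(); // what readers see
    private final Set<String> pending = new HashSet<>(); // the writer's buffer

    /** Readers always see the view as of the last completed step. */
    public boolean isVisible(String fact) { return committedView.contains(fact); }

    /** Writes during a step land in the pending buffer only. */
    public void write(String fact) { pending.add(fact); }

    /** Advance the read view at a step boundary. */
    public void completeStep() {
        Set<String> next = new HashSet<>(committedView);
        next.addAll(pending);
        pending.clear();
        committedView = next; // single reference swap; no partial step is ever visible
    }
}
```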
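One possible shape for the rule interchange of item 4: each rule is plain SPARQL UPDATE text, and a declarative wrapper records composition (serial versus parallel) and whether the program runs to a fixed point. RuleSpec and ProgramSpec are hypothetical names, not an existing bigdata API; the rule bodies are the standard RDFS entailments rdfs9 and rdfs11 (PREFIX declarations omitted for brevity):

```java
import java.util.*;

public class RuleInterchangeSketch {

    static class RuleSpec {
        final String name, sparqlUpdate;
        RuleSpec(String name, String sparqlUpdate) {
            this.name = name;
            this.sparqlUpdate = sparqlUpdate;
        }
    }

    static class ProgramSpec {
        final boolean parallel;   // run the rules in parallel or in order?
        final boolean fixedPoint; // iterate the program to fixed point?
        final List<RuleSpec> rules;
        ProgramSpec(boolean parallel, boolean fixedPoint, RuleSpec... rules) {
            this.parallel = parallel;
            this.fixedPoint = fixedPoint;
            this.rules = Arrays.asList(rules);
        }
    }

    /** A two-rule RDFS fragment, evaluated serially and run to fixed point. */
    static ProgramSpec rdfsSubset() {
        RuleSpec rdfs9 = new RuleSpec("rdfs9",
            "INSERT { ?x a ?c2 } WHERE { ?c1 rdfs:subClassOf ?c2 . ?x a ?c1 }");
        RuleSpec rdfs11 = new RuleSpec("rdfs11",
            "INSERT { ?c1 rdfs:subClassOf ?c3 } "
          + "WHERE { ?c1 rdfs:subClassOf ?c2 . ?c2 rdfs:subClassOf ?c3 }");
        return new ProgramSpec(false, true, rdfs9, rdfs11);
    }
}
```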
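For item 5, once the ontology has been computed to fixed point and replicated, a data-parallel rule pass can run independently against each shard. The sketch below uses a plain thread pool and stand-in types (a shard is a set of "instance:Class" facts, the ontology a Class-to-superclass map); it is one rdfs9-style pass, not the bigdata shard infrastructure:

```java
import java.util.*;
import java.util.concurrent.*;

public class ShardParallelSketch {

    /**
     * One subclass-entailment pass, run in parallel over the shards against a
     * shared read-only ontology. Returns the total mutation count.
     */
    static long applyToShards(List<Set<String>> shards,
                              Map<String, String> ontology) throws Exception {
        ExecutorService pool =
            Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (Set<String> shard : shards) {
                // Each task mutates only its own shard, so no locking is needed.
                futures.add(pool.submit(() -> {
                    long mutations = 0;
                    for (String fact : new ArrayList<>(shard)) {
                        String[] parts = fact.split(":");
                        String superClass = ontology.get(parts[1]);
                        if (superClass != null
                                && shard.add(parts[0] + ":" + superClass))
                            mutations++;
                    }
                    return mutations;
                }));
            }
            long total = 0;
            for (Future<Long> f : futures) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Iterating such passes until the total mutation count is zero gives the shard-parallel fixed point.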
Related tickets include:
 https://sourceforge.net/apps/trac/bigdata/ticket/448 (SPARQL 1.1 Update)
 https://sourceforge.net/apps/trac/bigdata/ticket/255 (Retro fit the rule engine to attach constraints dynamically to predicates)
 https://sourceforge.net/apps/trac/bigdata/ticket/180 (Migrate rule engine to query engine)
 https://sourceforge.net/apps/trac/bigdata/ticket/19 (Parallel decomposition of RDFS+ closure)