Affects Version/s: BIGDATA_RELEASE_1_2_0
Fix Version/s: None
Component/s: Query Plan Generator
Refresh the star-join operator code, ideally moving it into its own join operator with its own test suite. The star join will be a clear winner on a cluster for an SPO join (the most common case), but it should NOT be chosen for a POS join without also checking the cardinality of the access paths (APs), since a POS access path can have a very large cardinality (e.g., rdf:type foo ?x).
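A minimal sketch of why the star join favors SPO. It assumes that an in-memory `TreeSet` of triples stands in for the SPO index, and that the candidate subjects have already been produced by some more selective access path. `StarJoinSketch`, its `Triple` record and `starJoin` are hypothetical illustrations, not the existing bigdata operator. For each subject, a single prefix scan on the key `[s]` answers every access path in the star, whereas a conventional plan would probe the index once per triple pattern.

```java
import java.util.*;

/** Hypothetical star-join sketch; a TreeSet stands in for the SPO B+Tree. */
public class StarJoinSketch {

    /** A triple ordered by subject, then predicate, then object (SPO key order). */
    record Triple(String s, String p, String o) implements Comparable<Triple> {
        public int compareTo(Triple t) {
            int c = s.compareTo(t.s);
            if (c != 0) return c;
            c = p.compareTo(t.p);
            return c != 0 ? c : o.compareTo(t.o);
        }
    }

    /**
     * Evaluates the star ?s p1 ?v1 . ?s p2 ?v2 ... for each candidate subject.
     * predToVar maps each constant predicate in the star to its object variable.
     */
    static List<Map<String, String>> starJoin(NavigableSet<Triple> spo,
                                               Collection<String> candidates,
                                               Map<String, String> predToVar) {
        List<Map<String, String>> out = new ArrayList<>();
        for (String s : candidates) {
            // One range scan on the key prefix [s] visits every outbound triple of s,
            // so all APs in the star are answered by a single index probe.
            Map<String, List<String>> objsByPred = new HashMap<>();
            for (Triple t : spo.tailSet(new Triple(s, "", ""), true)) {
                if (!t.s().equals(s)) break; // left the [s] prefix
                if (predToVar.containsKey(t.p())) {
                    objsByPred.computeIfAbsent(t.p(), k -> new ArrayList<>()).add(t.o());
                }
            }
            // The subject joins only if every predicate in the star matched.
            if (objsByPred.size() != predToVar.size()) continue;
            // Emit the cross product of the object bindings (multi-valued properties).
            List<Map<String, String>> partial = new ArrayList<>();
            Map<String, String> seed = new HashMap<>();
            seed.put("s", s);
            partial.add(seed);
            for (Map.Entry<String, String> e : predToVar.entrySet()) {
                List<Map<String, String>> next = new ArrayList<>();
                for (Map<String, String> b : partial) {
                    for (String o : objsByPred.get(e.getKey())) {
                        Map<String, String> nb = new HashMap<>(b);
                        nb.put(e.getValue(), o);
                        next.add(nb);
                    }
                }
                partial = next;
            }
            out.addAll(partial);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<Triple> spo = new TreeSet<>(List.of(
            new Triple(":alice", ":name", "\"Alice\""),
            new Triple(":alice", ":age", "\"30\""),
            new Triple(":bob", ":name", "\"Bob\""),
            new Triple(":alice", ":knows", ":bob")));
        Map<String, String> star = new LinkedHashMap<>();
        star.put(":name", "n");
        star.put(":age", "a");
        // :bob has no :age, so only :alice survives the star join.
        System.out.println(starJoin(spo, List.of(":alice", ":bob"), star));
    }
}
```

The sketch deliberately covers only the SPO case. A POS-anchored star (for example, one whose candidate subjects come from rdf:type foo ?x) could feed a very large candidate list into the loop, which is why the cardinality of the APs should be checked before this operator is chosen.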
I am unconvinced that we will be able to observe a performance delta on a single machine due to the good spatial/temporal locality of star joins on a Journal, but it should be easy to observe on a cluster.
This is also closely related to several other issues dealing with the identification of star join patterns and chains hung off those star joins where the intermediate variables are not being projected out of the query. See [1,2].
This is also closely related to the triple groups approach and to the maintenance of tables per characteristic set (the SPO index is essentially a schema-flexible table holding all outbound triples for a given subject, and is thus a perfectly valid means of encoding a characteristic set, with one such index per characteristic set).
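A minimal sketch of how an SPO scan yields characteristic sets, assuming the common definition of a subject's characteristic set as the set of distinct predicates on its outbound triples. `CharacteristicSets`, `group` and the `Triple` record are hypothetical names. Because SPO order keeps each subject's triples contiguous, a single pass is enough to bucket subjects by characteristic set, i.e. to find the rows of the per-characteristic-set tables described above.

```java
import java.util.*;

/** Hypothetical sketch: bucket subjects by characteristic set during an SPO scan. */
public class CharacteristicSets {

    record Triple(String s, String p, String o) {}

    /**
     * Groups subjects by the set of distinct predicates on their outbound triples.
     * The input must be in SPO order so that each subject's triples are contiguous.
     */
    static Map<Set<String>, List<String>> group(List<Triple> spoOrdered) {
        Map<Set<String>, List<String>> bySet = new LinkedHashMap<>();
        String curSubject = null;
        Set<String> preds = new TreeSet<>();
        for (Triple t : spoOrdered) {
            if (curSubject != null && !t.s().equals(curSubject)) {
                // Subject boundary: file the finished subject under its predicate set.
                bySet.computeIfAbsent(preds, k -> new ArrayList<>()).add(curSubject);
                preds = new TreeSet<>(); // fresh set; the stored key is never mutated
            }
            curSubject = t.s();
            preds.add(t.p());
        }
        if (curSubject != null) {
            bySet.computeIfAbsent(preds, k -> new ArrayList<>()).add(curSubject);
        }
        return bySet;
    }

    public static void main(String[] args) {
        List<Triple> spo = List.of(
            new Triple(":alice", ":age", "30"),
            new Triple(":alice", ":name", "Alice"),
            new Triple(":bob", ":age", "25"),
            new Triple(":bob", ":name", "Bob"),
            new Triple(":carol", ":name", "Carol"));
        // prints {[:age, :name]=[:alice, :bob], [:name]=[:carol]}
        System.out.println(group(spo));
    }
}
```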
Also see .
[1] http://sourceforge.net/apps/trac/bigdata/ticket/253 (BSBM Q5 performance optimization (hash join of at once subqueries))
[2] https://sourceforge.net/apps/trac/bigdata/ticket/551 (N-Way joins)
[3] https://sourceforge.net/apps/trac/bigdata/ticket/584 (Maintain DESCRIBE cache)