  1. Blazegraph (by SYSTAP)
  2. BLZG-1253

Use GROUP BY to defer STAR JOIN expansion



    • Type: New Feature
    • Status: Open
    • Resolution: Unresolved
    • Affects Version/s: BLAZEGRAPH_RELEASE_1_5_1
    • Fix Version/s: None
    • Component/s: B+Tree
    • Labels:


      Several groups have now proposed approaches that use GROUP BY to avoid multiplying out the solutions of a STAR JOIN, instead propagating the nested triple groups (or, more accurately, variable binding groups). This has numerous potential advantages.
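
As a rough illustration of the idea (this is not Blazegraph code; the toy triples, the function names, and the dictionary representation are all assumptions made for this sketch), the following Python contrasts a star join that is fully multiplied out with a deferred form that keeps one nested group of values per subject and predicate.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("s1", "name", "Alice"),
    ("s1", "email", "a@example.org"),
    ("s1", "email", "alice@example.org"),
    ("s1", "phone", "111"),
    ("s1", "phone", "222"),
    ("s1", "phone", "333"),
    ("s2", "name", "Bob"),
    ("s2", "email", "b@example.org"),
    ("s2", "phone", "444"),
]


def grouped_star_join(triples, predicates):
    """Return {subject: {predicate: [objects]}} for subjects that match
    every predicate of the star, without multiplying out the values."""
    by_subject = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        if p in predicates:
            by_subject[s][p].append(o)
    return {
        s: {p: groups[p] for p in predicates}
        for s, groups in by_subject.items()
        if all(p in groups for p in predicates)
    }


def expand(grouped, predicates):
    """Multiply the nested groups out into flat solutions (the deferred step)."""
    for s, groups in grouped.items():
        for combo in product(*(groups[p] for p in predicates)):
            yield (s,) + combo


predicates = ["name", "email", "phone"]
grouped = grouped_star_join(TRIPLES, predicates)  # one nested group per subject
flat = list(expand(grouped, predicates))          # one row per value combination
```

In this toy data the grouped form holds two nested solutions, while the expanded form holds one row for every combination of email and phone values per subject. Expansion only needs to happen if a later operator actually requires flat solutions.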

      Related tickets include:
      - BLZG-760 JoinGroup optimizations

      Relevant publications:
      - Hyeongsik Kim, Padmashree Ravindra, and Kemafor Anyanwu "A Semantics-Oriented Storage Model for Big Heterogeneous RDF Data" at ISWC 2014 (poster, but this is Kim's PhD Thesis).
      - See http://www4.ncsu.edu/~hkim22/research.html for several more related articles, workshops, etc. Anything around "nesting" is probably related work.
      - Pham, M. D., Passing, L., Erling, O., Boncz, P.A. Deriving an Emergent Relational Schema from RDF Data. Proceedings of the International World Wide Web Conference 2015 (WWW 2015).
      - Pham, M. D., Boncz, P.A. MonetDB/RDF: Discovering and Exploiting the Emergent Schema of RDF Data. ERCIM News, 96, pp. 41-42.
      - See also Minh-Duc Pham http://www.dama.upc.edu/seminars/3rd-graph-ta/3rd-graph-ta-program as probably directly related to these ideas. "We motivate and describe techniques that allow to detect an 'emergent' relational schema from RDF data. We show that on a wide variety of datasets, the found structure explains well over 90% of the RDF triples. Further, we also describe technical solutions to the semantic challenge to give short names that humans find logical to these emergent tables, columns and relationships between tables."
      - G. Aluç, M. T. Özsu, K. Daudjee. "Workload matters: Why RDF databases need a new design". Proceedings of the VLDB Endowment, 2014.
      - Andreas Harth and I discussed the concept of deferred star joins several years ago. I am not sure if he has any publications that relate to this topic.

      My feeling is that standard triple (or quad) indices already provide an efficient storage model for schema-flexible tables that allow multiple values in a cell, and that these concepts can all be realized against a standard triple store.
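
One way to picture this claim is a prefix scan over a sorted SPO index: all triples for a subject are contiguous, so a single range scan yields a "row" whose "cells" may hold several values. The sketch below uses a sorted Python list as a stand-in for the index; the data and the `row` helper are hypothetical and say nothing about how Blazegraph's B+Tree indices are actually accessed.

```python
from bisect import bisect_left

# Hypothetical SPO index: triples kept in (subject, predicate, object) order.
SPO = sorted([
    ("s1", "name", "Alice"),
    ("s1", "email", "a@example.org"),
    ("s1", "email", "alice@example.org"),
    ("s2", "name", "Bob"),
])


def row(spo, subject):
    """Read one schema-flexible 'row' with a single prefix scan.

    Returns {predicate: [objects]}; a cell holds every value found for
    that predicate, and predicates the subject lacks are simply absent.
    """
    # (subject,) sorts before every (subject, p, o), so this is the first
    # index entry for the subject, if there is one.
    start = bisect_left(spo, (subject,))
    cells = {}
    for s, p, o in spo[start:]:
        if s != subject:
            break  # left the contiguous range for this subject
        cells.setdefault(p, []).append(o)
    return cells


alice = row(SPO, "s1")
missing = row(SPO, "s3")
```

Here `alice` has a two-valued `email` cell and a single-valued `name` cell, and a subject with no triples comes back as an empty row rather than an error.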

      Implementing this will require some changes in how we serialize/store/interchange binding sets.
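
To make the serialization point concrete, here is a minimal sketch of what a nested binding set could look like on the wire, assuming a JSON encoding in which each variable maps to a list of values. This format and the function names are invented for illustration; they are not Blazegraph's binding set encoding.

```python
import json
from itertools import product


def encode_nested(solution):
    """Serialize one nested solution: each variable maps to a list of values."""
    return json.dumps(solution, sort_keys=True)


def decode_and_expand(payload):
    """Decode a nested solution and multiply it out into flat binding sets."""
    nested = json.loads(payload)
    variables = sorted(nested)
    return [
        dict(zip(variables, combo))
        for combo in product(*(nested[v] for v in variables))
    ]


# Hypothetical nested solution for one subject of a star join.
nested_solution = {
    "s": ["s1"],
    "email": ["a@example.org", "alice@example.org"],
    "phone": ["111", "222", "333"],
}

payload = encode_nested(nested_solution)
flat_solutions = decode_and_expand(payload)

nested_value_count = sum(len(values) for values in nested_solution.values())
flat_binding_count = sum(len(bindings) for bindings in flat_solutions)
```

The nested payload carries each value once, whereas the flat form repeats values across every combination, which is why the interchange format would need to represent groups directly for the deferral to pay off.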




            michaelschmidt
            bryanthompson