Blazegraph (by SYSTAP) / BLZG-1757

Small variation in query introduces heavy Memory Pressure


    Details

    • Type: Bug
    • Status: Closed - Won't Fix
    • Priority: Medium
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

Using a local system with a ~2 GB heap on a dataset with 4M triples.

The following query works fine, finishing in ~100 ms:

      base <http://deleteme.com/>
      prefix syapse: </graph/syapse#>
      prefix sci: </bdm/api/kbobject/sci:sci:>
      SELECT *
      WHERE
      {
            ?p rdfs:range </vocabulary/nlm/rxnorm#> .
            ?p syapse:hasLiteralProperty ?q .
            FILTER ( ?p != sci:rxnormCd)
            graph </graph/abox> {
                ?s ?q ?term .
            }
            OPTIONAL {
                graph </graph/abox> {
                    ?s ?p ?old .
                }
            }
      } LIMIT 100
      

      Changing the query to:

      base <http://deleteme.com/>
      prefix syapse: </graph/syapse#>
      prefix sci: </bdm/api/kbobject/sci:sci:>
      SELECT *
      WHERE
      {
            ?p rdfs:range </vocabulary/nlm/rxnorm#> .
            ?p syapse:hasLiteralProperty ?q .
            FILTER ( ?p != sci:rxnormCd)
            graph </graph/abox> {
                OPTIONAL {
                   ?s ?p ?old .
                }
                ?s ?q ?term .
            }
      } LIMIT 100
      

creates heavy memory pressure and makes the GC fail with:

      java.lang.OutOfMemoryError: GC overhead limit exceeded
      	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
      	at com.bigdata.rdf.sail.webapp.BigdataServlet.submitApiTask(BigdataServlet.java:281)
      

I assume this could be executed in analytic mode to use direct memory, but the core problem here is that a small, semantics-preserving change to the query can drive the system to consume a large amount of memory.
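For reference, analytic mode can be requested on a per-query basis through Blazegraph's query-hint mechanism. A minimal sketch, assuming the standard hint namespace and applied to the problematic query's prologue (the elided patterns are the ones shown above):

```sparql
# Sketch: ask Blazegraph to run this one query with the analytic
# (native/direct-memory) query engine via a query hint.
PREFIX hint: <http://www.bigdata.com/queryHints#>
SELECT *
WHERE {
  hint:Query hint:analytic "true" .
  # ... original patterns from the query above ...
}
```

This only sidesteps the heap pressure by moving the working set off-heap; it does not address the underlying sensitivity to the OPTIONAL placement.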

Please find attached a data generator that can produce ~40M triples to reproduce the issue; it can be run like:

      $ python generator.py 'http://deleteme.com' 1000000
      

        Attachments

          Activity

            People

            Assignee:
            michaelschmidt
            Reporter:
            Edgar R (edgarr)
            Votes:
            0
            Watchers:
            4

              Dates

              Created:
              Updated:
              Resolved: