Blazegraph (by SYSTAP) / BLZG-9146

Timeout issues and High Query Execution times


    Details

    • Type: Bug
    • Status: Open
    • Priority: Medium
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Bigdata RDF Database
    • Labels: None

      Description

      We are facing multiple timeouts and very high query execution times (more than 60 s) on our Blazegraph instance. While these queries run, CPU utilization reaches 100%, with very high load on all 16 CPUs.
      We run Blazegraph on a large, high-end EC2 instance with 192 GB of RAM and a 16-CPU Intel Xeon.

      The current `wdq` namespace properties are:

      com.bigdata.namespace.wdq.spo.com.bigdata.btree.BTree.branchingFactor | 1024
      com.bigdata.relation.container | wdq
      com.bigdata.namespace.wdq.spo.OSP.com.bigdata.btree.BTree.branchingFactor | 64
      com.bigdata.rwstore.RWStore.smallSlotType | 1024
      com.bigdata.journal.AbstractJournal.bufferMode | DiskRW
      com.bigdata.journal.AbstractJournal.file | /mnt/wikidata/wikidata.jnl
      com.bigdata.namespace.wdq.spo.SPO.com.bigdata.btree.BTree.branchingFactor | 600
      com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass | org.wikidata.query.rdf.blazegraph.WikibaseVocabulary$V003
      com.bigdata.rdf.store.AbstractTripleStore.textIndex | false
      com.bigdata.rdf.store.AbstractTripleStore.geoSpatialDatatypeConfig.0 | {"config": {"uri":"http://www.opengis.net/ont/geosparql#wktLiteral","literalSerializer":"org.wikidata.query.rdf.blazegraph.inline.literal.WKTSerializer","fields":[{"valueType":"DOUBLE","multiplier":"1000000000","serviceMapping":"LONGITUDE"},{"valueType":"DOUBLE","multiplier":"1000000000","serviceMapping":"LATITUDE"},{"valueType":"LONG","multiplier":"1","minValue":"0","serviceMapping":"COORD_SYSTEM"}]}}
      com.bigdata.journal.AbstractJournal.initialExtent | 209715200
      com.bigdata.rdf.store.AbstractTripleStore.geoSpatialIncludeBuiltinDatatypes | false
      com.bigdata.btree.BTree.branchingFactor | 128
      com.bigdata.namespace.wdq.lex.com.bigdata.btree.BTree.branchingFactor | 400
      com.bigdata.rdf.store.AbstractTripleStore.extensionFactoryClass | org.wikidata.query.rdf.blazegraph.WikibaseExtensionFactory
      com.bigdata.rdf.store.AbstractTripleStore.axiomsClass | com.bigdata.rdf.axioms.NoAxioms
      com.bigdata.service.AbstractTransactionService.minReleaseAge | 1
      com.bigdata.rdf.sail.bufferCapacity | 100000
      com.bigdata.rdf.sail.truthMaintenance | false
      com.bigdata.namespace.wdq.lex.ID2TERM.com.bigdata.btree.BTree.branchingFactor | 800
      com.bigdata.journal.AbstractJournal.maximumExtent | 209715200
      com.bigdata.rdf.store.AbstractTripleStore.geoSpatialDefaultDatatype | http://www.opengis.net/ont/geosparql#wktLiteral
      com.bigdata.rdf.sail.namespace | wdq
      com.bigdata.relation.class | com.bigdata.rdf.store.LocalTripleStore
      com.bigdata.rdf.store.AbstractTripleStore.quads | false
      com.bigdata.journal.AbstractJournal.writeCacheBufferCount | 1000
      com.bigdata.relation.namespace | wdq
      com.bigdata.rdf.store.AbstractTripleStore.inlineURIFactory | org.wikidata.query.rdf.blazegraph.WikibaseInlineUriFactory
      com.bigdata.btree.writeRetentionQueue.capacity | 4000
      com.bigdata.journal.AbstractJournal.historicalIndexCacheTimeout | 5
      com.bigdata.journal.AbstractJournal.historicalIndexCacheCapacity | 20
      com.bigdata.rdf.store.AbstractTripleStore.geoSpatial | true
      com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers | false
      com.bigdata.namespace.wdq.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor | 128
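
      Several keys above override the global `com.bigdata.btree.BTree.branchingFactor` (128) for a whole relation (`...wdq.spo...`, `...wdq.lex...`) or for a single index (`...wdq.spo.OSP...`, `...wdq.lex.ID2TERM...`). The sketch below resolves the branching factor an index would get from this listing, assuming the usual most-specific-key-wins order (index override, then relation override, then global default). The helper names are hypothetical and only read the `key | value` text; they do not query Blazegraph.

```typescript
// Hypothetical helper: parse "key | value" lines (as listed above) into a Map.
function parseNamespaceProperties(text: string): Map<string, string> {
  const props = new Map<string, string>();
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    const sep = line.indexOf(" | ");
    if (sep < 0) continue; // skip blank or malformed lines
    props.set(line.slice(0, sep).trim(), line.slice(sep + 3).trim());
  }
  return props;
}

// Resolve the branching factor for one index.
// Assumption: index-level override > relation-level override > global default.
function effectiveBranchingFactor(
  props: Map<string, string>,
  namespace: string,
  relation: string,
  index: string,
): number | undefined {
  const suffix = "com.bigdata.btree.BTree.branchingFactor";
  const candidates = [
    `com.bigdata.namespace.${namespace}.${relation}.${index}.${suffix}`,
    `com.bigdata.namespace.${namespace}.${relation}.${suffix}`,
    suffix,
  ];
  for (const key of candidates) {
    const value = props.get(key);
    if (value !== undefined) return Number(value);
  }
  return undefined;
}

const props = parseNamespaceProperties(`
com.bigdata.btree.BTree.branchingFactor | 128
com.bigdata.namespace.wdq.spo.com.bigdata.btree.BTree.branchingFactor | 1024
com.bigdata.namespace.wdq.spo.OSP.com.bigdata.btree.BTree.branchingFactor | 64
`);
console.log(effectiveBranchingFactor(props, "wdq", "spo", "OSP")); // 64 (index override)
console.log(effectiveBranchingFactor(props, "wdq", "spo", "POS")); // 1024 (relation override)
```

      Under that assumption, POS (which has no key of its own in the listing) would inherit the `spo` value of 1024.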
      

      The Blazegraph performance table is attached here: https://gist.github.com/loretoparisi/372b84e791887b6a734478e2b217646e

      Examples of queries that hang:

      SELECT DISTINCT ?music_track ?music_trackLabel ?artist ?artistLabel ?album ?albumLabel ?publication_date WHERE {
            SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
            
            { ?music_track wdt:P31 wd:Q134556. }
            UNION
            { ?music_track wdt:P31 wd:Q7366. }
            UNION
            { ?music_track wdt:P31 wd:Q2188189. }
            UNION
            { ?music_track wdt:P31 wd:Q207628. }
            UNION
            { ?music_track wdt:P31 wd:Q7302866. }
            UNION
            { ?music_track wdt:P31 wd:Q9748. }
            UNION
            { ?music_track wdt:P2207 ?_spotifyTrackID_. }
            
            ?music_track rdfs:label ?music_trackLabel.
      
            ?music_track wdt:P175 ?artist. 
            ?artist rdfs:label ?artistLabel.
      
            OPTIONAL { ?music_track wdt:P361 ?album.}
            OPTIONAL { ?music_track wdt:P577 ?publication_date. }
      
            FILTER regex(?music_trackLabel, "^I Don't Need It Anymore (Interlude)$", "i")
            FILTER (regex(?artistLabel, "christina aguilera", "i"))
          }
          LIMIT 10
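
      In the first query, `(` and `)` are regex metacharacters, so `"^I Don't Need It Anymore (Interlude)$"` treats `(Interlude)` as a group: it matches a label ending in `Interlude` without parentheses, and never matches a label that contains the literal `(Interlude)`. A minimal sketch with hypothetical helper names that escapes a literal title before embedding it in a `regex(...)` filter, then escapes the result as a double-quoted SPARQL string literal (so each regex backslash is doubled):

```typescript
// Escape characters that have special meaning in SPARQL (XPath) regular expressions.
function escapeRegex(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Escape a value for use inside a double-quoted SPARQL string literal.
function sparqlStringLiteral(value: string): string {
  const escaped = value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
  return `"${escaped}"`;
}

// Build a case-insensitive, anchored FILTER that matches the title literally.
function exactTitleFilter(variable: string, title: string): string {
  const pattern = sparqlStringLiteral(`^${escapeRegex(title)}$`);
  return `FILTER regex(?${variable}, ${pattern}, "i")`;
}

console.log(exactTitleFilter("music_trackLabel", "I Don't Need It Anymore (Interlude)"));
// FILTER regex(?music_trackLabel, "^I Don't Need It Anymore \\(Interlude\\)$", "i")
```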
      

      and

      SELECT DISTINCT ?artist ?artistLabel ?birth_nameLabel  ?date ?website ?instagramID ?facebookID ?twitterID ?musicBrainzArtistID WHERE {
            SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
              
              { ?artist wdt:P106 wd:Q2252262. }
              UNION
              { ?artist wdt:P106 wd:Q177220. }
              UNION
              { ?artist wdt:P106 wd:Q639669. }
              UNION
              { ?artist wdt:P106 wd:Q2643890. }
              UNION
              { ?artist wdt:P106 wd:Q488205. }
              UNION
              { ?artist wdt:P434 ?musicBrainzArtistID. }       
      
      
              FILTER(REGEX(?artistLabel, "^Christina Aguilera$", "i"))
              ?artist rdfs:label ?artistLabel.
              
              OPTIONAL { ?artist wdt:P571 ?date. }
              OPTIONAL { ?artist wdt:P1477 ?birth_nameLabel. }
              OPTIONAL { ?artist wdt:P856 ?website. }
              OPTIONAL { ?artist wdt:P2003 ?instagramID. }
              OPTIONAL { ?artist wdt:P2013 ?facebookID. }
              OPTIONAL { ?artist wdt:P2002 ?twitterID. }
      
      
          }
          LIMIT 10
      

      The same queries run fine on `query.wikidata.org`, for example the first query (with the title filter `^Candyman$`):

      https://query.wikidata.org/sparql?format=json&query=%0A%20%20%20%20SELECT%20DISTINCT%20%3Fmusic_track%20%3Fmusic_trackLabel%20%3Fartist%20%3FartistLabel%20%3Falbum%20%3FalbumLabel%20%3Fpublication_date%20WHERE%20%7B%0A%20%20%20%20%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%20%20%20%20%20%20%0A%20%20%20%20%20%20%7B%20%3Fmusic_track%20wdt%3AP31%20wd%3AQ134556.%20%7D%0A%20%20%20%20%20%20UNION%0A%20%20%20%20%20%20%7B%20%3Fmusic_track%20wdt%3AP31%20wd%3AQ7366.%20%7D%0A%20%20%20%20%20%20UNION%0A%20%20%20%20%20%20%7B%20%3Fmusic_track%20wdt%3AP31%20wd%3AQ2188189.%20%7D%0A%20%20%20%20%20%20UNION%0A%20%20%20%20%20%20%7B%20%3Fmusic_track%20wdt%3AP31%20wd%3AQ207628.%20%7D%0A%20%20%20%20%20%20UNION%0A%20%20%20%20%20%20%7B%20%3Fmusic_track%20wdt%3AP31%20wd%3AQ7302866.%20%7D%0A%20%20%20%20%20%20UNION%0A%20%20%20%20%20%20%7B%20%3Fmusic_track%20wdt%3AP31%20wd%3AQ9748.%20%7D%0A%20%20%20%20%20%20UNION%0A%20%20%20%20%20%20%7B%20%3Fmusic_track%20wdt%3AP2207%20%3F_spotifyTrackID_.%20%7D%0A%20%20%20%20%20%20%0A%20%20%20%20%20%20%3Fmusic_track%20rdfs%3Alabel%20%3Fmusic_trackLabel.%0A%0A%20%20%20%20%20%20%3Fmusic_track%20wdt%3AP175%20%3Fartist.%20%0A%20%20%20%20%20%20%3Fartist%20rdfs%3Alabel%20%3FartistLabel.%0A%0A%20%20%20%20%20%20OPTIONAL%20%7B%20%3Fmusic_track%20wdt%3AP361%20%3Falbum.%7D%0A%20%20%20%20%20%20OPTIONAL%20%7B%20%3Fmusic_track%20wdt%3AP577%20%3Fpublication_date.%20%7D%0A%0A%20%20%20%20%20%20FILTER%20regex%28%3Fmusic_trackLabel%2C%20%22%5ECandyman%24%22%2C%20%22i%22%29%0A%20%20%20%20%20%20FILTER%20%28regex%28%3FartistLabel%2C%20%22christina%20aguilera%22%2C%20%22i%22%29%29%0A%20%20%20%20%7D%0A%20%20%20%20LIMIT%2010%0A%20%20%20%20
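
      The link above is simply the query text percent-encoded into a `query` parameter, preceded by `format=json`. A sketch of building such a URL from a query string (`sparqlGetUrl` is a hypothetical name); `encodeURIComponent` produces the same `%20`/`%3F`/`%7B` style encoding seen in the link:

```typescript
// Build a GET URL in the same shape as the query.wikidata.org link above.
function sparqlGetUrl(endpoint: string, query: string, format = "json"): string {
  return `${endpoint}?format=${encodeURIComponent(format)}&query=${encodeURIComponent(query)}`;
}

const url = sparqlGetUrl(
  "https://query.wikidata.org/sparql",
  "SELECT ?item WHERE { ?item wdt:P31 wd:Q7366. } LIMIT 1",
);
console.log(url);
// https://query.wikidata.org/sparql?format=json&query=SELECT%20%3Fitem%20WHERE%20%7B%20%3Fitem%20wdt%3AP31%20wd%3AQ7366.%20%7D%20LIMIT%201
```

      Long queries can exceed URL length limits of proxies or servers; the SPARQL protocol also allows sending the same `query` parameter in a form-encoded POST body.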
      

      Using *EXPLAIN*, I can see an internal `timeout` value of *600000* (presumably milliseconds, i.e. 10 minutes). The full explain log is here: https://gist.github.com/loretoparisi/526d421cff84bf248a84cdc52930d1c9
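
      To reproduce an explain log against the local instance, the query can be sent with an `explain` parameter. A sketch assuming a default local deployment at the hypothetical endpoint `http://localhost:9999/bigdata/namespace/wdq/sparql`, and assuming the Blazegraph REST API accepts `explain=details` (as the workbench's "details" option does); verify both against your deployment and version:

```typescript
// Hypothetical local endpoint; adjust host, port and namespace to your deployment.
const ENDPOINT = "http://localhost:9999/bigdata/namespace/wdq/sparql";

// Build a URL requesting the EXPLAIN view of a query.
// Assumption: the server recognizes explain=details.
function explainUrl(endpoint: string, query: string): string {
  const url = new URL(endpoint);
  url.searchParams.set("query", query);
  url.searchParams.set("explain", "details");
  return url.toString();
}

console.log(explainUrl(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 1"));
```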

      In some cases these queries take a significant amount of time but finish without a timeout (in under 60 seconds).
      The REST clients run in Node.js using the native `http` module, and the connection/socket timeouts are set up correctly.
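
      A client-side socket timeout only closes the connection; it does not necessarily cancel the query still running on the server, which may keep consuming CPU. The sketch below pairs a client-side abort with a server-side deadline. It uses the global `fetch` (Node.js 18+) rather than the `http` module, a hypothetical local endpoint, and assumes Blazegraph honours the `X-BIGDATA-MAX-QUERY-MILLIS` request header as a per-query limit in milliseconds; check the REST API documentation for your version.

```typescript
// Hypothetical local endpoint; adjust host, port and namespace as needed.
const ENDPOINT = "http://localhost:9999/bigdata/namespace/wdq/sparql";

// Assumption: Blazegraph reads this header as a per-query deadline (ms).
const MAX_QUERY_MILLIS_HEADER = "X-BIGDATA-MAX-QUERY-MILLIS";

// Headers for a form-encoded SPARQL POST whose server deadline matches the client timeout.
function queryHeaders(timeoutMs: number): Record<string, string> {
  return {
    "Content-Type": "application/x-www-form-urlencoded",
    Accept: "application/sparql-results+json",
    [MAX_QUERY_MILLIS_HEADER]: String(timeoutMs),
  };
}

// POST the query and abort the HTTP request if it runs longer than timeoutMs.
async function runQuery(query: string, timeoutMs = 60_000): Promise<unknown> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(ENDPOINT, {
      method: "POST",
      headers: queryHeaders(timeoutMs),
      body: new URLSearchParams({ query }).toString(),
      signal: controller.signal,
    });
    if (!response.ok) {
      throw new Error(`HTTP ${response.status}: ${await response.text()}`);
    }
    return await response.json();
  } finally {
    clearTimeout(timer); // always clear the timer, on success or failure
  }
}
```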


    People

    • Assignee: Brad Bebee (beebs)
    • Reporter: Loreto Parisi (loretoparisi)
    • Votes: 0
    • Watchers: 1
