Details

    • Type: Bug
    • Status: Closed - Won't Fix
    • Resolution: Cannot Reproduce
    • Affects Version/s: BIGDATA_RELEASE_1_2_2
    • Fix Version/s: None
    • Component/s: Bigdata SAIL
    • Labels:
      None

      Description

      This ticket concerns the compile time constant:
      com.bigdata.relation.accesspath.BlockingBuffer.NSPIN

      The suggested resolution is to:
      a) replace this with two compile time constants, one for each of the two usages
      and
      EITHER
      b) substantial increase the value for the usage in _hasNext(long) (e.g. to 100000)
      OR
      c) add a configuration option for that usage
      OR
      d) both b) and c)

      ====

      The commentary is based on test results, and I will start with a description of the tests, then of the test hardware.

      The test dataset and queries

      The test dataset is approximately: 57,000 quads of test data conforming to Syapse proprietary ontologies, over 24 named graphs.

      The test queries are 11 moderately easy SPARQL queries involving only a few joins each.

      The test queries are generated by some python code, and the test harness asks the 11 queries three times over, in series, using the SPARQL end point.

      The test harness also allows the possibility of starting multiple parallel clients, and we consider
      a) a single client asking the 11 * 3 queries
      or
      b) six parallel clients asking a total of 6 * 11 * 3 queries: notice the queries are asked in the same order by each parallel client which maximizes both potential contention and potential cache hits

      The test hardware

      The primary hardware is my development machine which is a MacBook Pro with SSD and a quad core processor (with hyperthreading, for 8 threads).

      The secondary hardware is an AWS instance

      Linux/2.6.32-355-ec2 amd64
      Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz Family 6 Model 44 Stepping 2, GenuineIntel #CPU=2
      Sun Microsystems Inc. 1.6.0_27
      freeMemory=509153376
      

      The issue

      Prior to any changes at r7390 we observed the following conundrum.

      For a single client the 33 queries take a wall-time of 13.3s, but the client CPU time is 1.6s and the server CPU time is 1.8s leaving 10 seconds missing
      - even assuming synchronized single threading (on a quad core box )

      Doing 6 parallel clients performance is somewhat better. The wall-time is 14.5s , the client CPU time is 14.3s and the server CPU time is 10.4s seconds. Still only about 20% load (8 threads x 14.5), but a lot better than in the one client case.

      Using yourkit, with wall time measurements and tracing, drew attention to the _hasNext() method above, and the

      With the adjustments described above, the system goes a lot faster.

      Single client: wall-time 2.4s client CPU 1.6s server CPU 1.9s

      Six parallel clients: wall-time 5.1s client CPU 14.5s server CPU 6.1s

        Activity

        Hide
        bryanthompson bryanthompson added a comment -

        I would look at this under BSBM. While there is an impact, I was only able to find worse parameter settings for BSBM, not better ones. By the time we have concurrent query mixes, I think that there is nearly always more work to be done.

        Another way to keep down context switches is to move to increase vectoring and use asynchronous IO (Java 7 AIO). This would require:


        - An AIO thread to schedule disk IOs and dispatch results. This could go into the memoizer pattern.


        - Currently, most joins are nested index joins and we do index reads in the JOIN operators and do not attempt to pre-queue the disk pages that we need to access. Thus, we are doing contexts switches inside of the JOINs. As the vector size increases, the number of relevant tuples within a key-range span increases and it becomes more valuable to pre-materialize the pages for the key-range and then scan those tuples.


        - Moving intermediate solutions off of the Java object heap, e.g., onto NIO buffers. However, Java 7 might not have as much of a GC overhead problem, so moving the data off of the heap might not be as critical.


        - Using a larger vector size.


        - Removing the BlockingBuffer from any context in which we always process the buffer entirely (as a consumer) or fill it entirely (as a producer). The old query logic used asynchronous iterator patterns. The newer QueryEngine does not rely on this. It uses chunked execution instead. The logic that dispatches chunks to operators will automatically combine chunks. For contexts in which we no longer need BlockingBuffer, a linked or array backed queue or deque could replace it.

        The advantage of increased vectorization is that there are fewer context switches. Column-wise storage (on the disk and for the intermediate solutions) would also provide increased CPU vectorization and increased data density. Thus, what I have outlined above is basically a road map for moving towards a column-wise vectored evaluation. We would not see as much of a performance gain from the current indexing approach because it does too much work to decode index tuples and does the IO inside of the JOINs, which would force context switches.

        Bryan

        Show
        bryanthompson bryanthompson added a comment - I would look at this under BSBM. While there is an impact, I was only able to find worse parameter settings for BSBM, not better ones. By the time we have concurrent query mixes, I think that there is nearly always more work to be done. Another way to keep down context switches is to move to increase vectoring and use asynchronous IO (Java 7 AIO). This would require: - An AIO thread to schedule disk IOs and dispatch results. This could go into the memoizer pattern. - Currently, most joins are nested index joins and we do index reads in the JOIN operators and do not attempt to pre-queue the disk pages that we need to access. Thus, we are doing contexts switches inside of the JOINs. As the vector size increases, the number of relevant tuples within a key-range span increases and it becomes more valuable to pre-materialize the pages for the key-range and then scan those tuples. - Moving intermediate solutions off of the Java object heap, e.g., onto NIO buffers. However, Java 7 might not have as much of a GC overhead problem, so moving the data off of the heap might not be as critical. - Using a larger vector size. - Removing the BlockingBuffer from any context in which we always process the buffer entirely (as a consumer) or fill it entirely (as a producer). The old query logic used asynchronous iterator patterns. The newer QueryEngine does not rely on this. It uses chunked execution instead. The logic that dispatches chunks to operators will automatically combine chunks. For contexts in which we no longer need BlockingBuffer, a linked or array backed queue or deque could replace it. The advantage of increased vectorization is that there are fewer context switches. Column-wise storage (on the disk and for the intermediate solutions) would also provide increased CPU vectorization and increased data density. Thus, what I have outlined above is basically a road map for moving towards a column-wise vectored evaluation. We would not see as much of a performance gain from the current indexing approach because it does too much work to decode index tuples and does the IO inside of the JOINs, which would force context switches. Bryan
        Hide
        jeremycarroll jeremycarroll added a comment -

        I have run some new performance tests based on the syapse application.
        These show consistently better behavior with the default value of NSPIN than the modified value.

        These new tests are based on bigdata 1.3.1, using Linux images on an AWS m3.medium instance type

        At some point, rerunning the tests above may be worthwhile
        - to see whether some code change since this ticket was filed has addressed some underlying problem

        Show
        jeremycarroll jeremycarroll added a comment - I have run some new performance tests based on the syapse application. These show consistently better behavior with the default value of NSPIN than the modified value. These new tests are based on bigdata 1.3.1, using Linux images on an AWS m3.medium instance type At some point, rerunning the tests above may be worthwhile - to see whether some code change since this ticket was filed has addressed some underlying problem
        Hide
        jeremycarroll jeremycarroll added a comment -

        Further testing using a c3.xlarge for a slightly harder work load gave different results.

        This machine is pretty powerful for the workload, and the test queries did not take the CPU to 100%, even with the high NSPIN value. i.e. the high NSPIN value meant that we were spinning cycles that otherwise would be wasted.

        Normal NSPIN value
        - test name first field, last field is the time in seconds.

        retrieval:None:0:0.303774166107
        update:None:0:2.61060380936
        simple_syql:25898:0:1.7988945961
        syql3:9459:0:2.63849382401
        harder_syql:9459:0:71.7355894089
        browse_first_click:27457:0:4.93990316391
        browse_rdf_type_facet:None:0:0.603485202789
        smaller_browse:166:0:0.23685798645
        browse_one_gene:265:0:3.09802641869
        browse_two_genes:171:0:2.42231502533
        

        High NSPIN value

        retrieval:None:0:0.330000782013
        update:None:0:2.28383278847
        simple_syql:25898:0:1.93887019157
        syql3:9459:0:2.93343296051
        harder_syql:9459:0:18.4483426094
        browse_first_click:27457:0:3.43643183708
        browse_rdf_type_facet:None:0:0.674526405334
        smaller_browse:166:0:0.195732593536
        browse_one_gene:265:0:1.73067779541
        browse_two_genes:171:0:2.3030310154
        

        Each test corresponds to one or more SPARQL queries
        - we see that the majority of the queries seem to do a little better with the normal NSPIN value, whereas one query does appallingly badly. In fact we could put this down to an optimization issue because the test harder_syql and the test syql3 both correspond to a single complex SPARQL query which is in fact identical in semantics.

        Show
        jeremycarroll jeremycarroll added a comment - Further testing using a c3.xlarge for a slightly harder work load gave different results. This machine is pretty powerful for the workload, and the test queries did not take the CPU to 100%, even with the high NSPIN value. i.e. the high NSPIN value meant that we were spinning cycles that otherwise would be wasted. Normal NSPIN value - test name first field, last field is the time in seconds. retrieval:None:0:0.303774166107 update:None:0:2.61060380936 simple_syql:25898:0:1.7988945961 syql3:9459:0:2.63849382401 harder_syql:9459:0:71.7355894089 browse_first_click:27457:0:4.93990316391 browse_rdf_type_facet:None:0:0.603485202789 smaller_browse:166:0:0.23685798645 browse_one_gene:265:0:3.09802641869 browse_two_genes:171:0:2.42231502533 High NSPIN value retrieval:None:0:0.330000782013 update:None:0:2.28383278847 simple_syql:25898:0:1.93887019157 syql3:9459:0:2.93343296051 harder_syql:9459:0:18.4483426094 browse_first_click:27457:0:3.43643183708 browse_rdf_type_facet:None:0:0.674526405334 smaller_browse:166:0:0.195732593536 browse_one_gene:265:0:1.73067779541 browse_two_genes:171:0:2.3030310154 Each test corresponds to one or more SPARQL queries - we see that the majority of the queries seem to do a little better with the normal NSPIN value, whereas one query does appallingly badly. In fact we could put this down to an optimization issue because the test harder_syql and the test syql3 both correspond to a single complex SPARQL query which is in fact identical in semantics.
        Hide
        jeremycarroll jeremycarroll added a comment -

        This was the problem query (harder_syql)

        base <https://test-finland-jjc.syapse.com/>
        prefix owl: <http://www.w3.org/2002/07/owl#>
        prefix based: <https://test-finland-jjc.syapse.com/bdm/api/appindividual/based:>
        prefix xsd: <http://www.w3.org/2001/XMLSchema#>
        prefix demod: <https://test-finland-jjc.syapse.com/bdm/api/appindividual/demod:>
        prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        prefix demo: <https://test-finland-jjc.syapse.com/bdm/api/kbobject/demo:>
        prefix dc: <http://purl.org/dc/elements/1.1/>
        prefix sys: <https://test-finland-jjc.syapse.com/bdm/api/kbobject/sys:>
        prefix s: <https://test-finland-jjc.syapse.com/bdm/api/>
        prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        prefix bds: <http://www.bigdata.com/rdf/search#>
        prefix sysd: <https://test-finland-jjc.syapse.com/bdm/api/appindividual/sysd:>
        prefix base: <https://test-finland-jjc.syapse.com/bdm/api/kbobject/base:>
        prefix skos: <http://www.w3.org/2004/02/skos/core#>
        prefix syapse: <https://test-finland-jjc.syapse.com/graph/syapse#>
        SELECT *
        FROM <https://test-finland-jjc.syapse.com/graph/syapse>
        FROM <https://test-finland-jjc.syapse.com/graph/ontology/demo>
        FROM <https://test-finland-jjc.syapse.com/graph/cat-co/vocabulary>
        FROM <https://test-finland-jjc.syapse.com/graph/ontology/base>
        FROM <https://test-finland-jjc.syapse.com/graph/vocabulary>
        FROM <https://test-finland-jjc.syapse.com/graph/ontology/sys>
        FROM <https://test-finland-jjc.syapse.com/graph/cat-co/abox>
        FROM <https://test-finland-jjc.syapse.com/graph/django/cat-co>
        FROM NAMED <https://test-finland-jjc.syapse.com/graph/syapse>
        FROM NAMED <https://test-finland-jjc.syapse.com/graph/ontology/demo>
        FROM NAMED <https://test-finland-jjc.syapse.com/graph/cat-co/vocabulary>
        FROM NAMED <https://test-finland-jjc.syapse.com/graph/ontology/base>
        FROM NAMED <https://test-finland-jjc.syapse.com/graph/vocabulary>
        FROM NAMED <https://test-finland-jjc.syapse.com/graph/ontology/sys>
        FROM NAMED <https://test-finland-jjc.syapse.com/graph/cat-co/abox>
        FROM NAMED <https://test-finland-jjc.syapse.com/graph/django/cat-co>
        
        WITH {
        SELECT DISTINCT $SIZE_LIMIT_EXCEEDED $j__1 ?KbPatientOmicsRecord_A
        WHERE {
        
          INCLUDE %__MainQuery
          OPTIONAL {
            ?KbPatientOmicsRecord_A sys:label $j__1
          }
          { SELECT (COUNT(*) as $CHECK_LIMIT) {
            INCLUDE %__MainQuery
          }}
          BIND ( ( $CHECK_LIMIT = 174925 ) AS $SIZE_LIMIT_EXCEEDED )
        }} AS %__FullQuery
        WITH {
        SELECT *
        WHERE {
        
          { SELECT DISTINCT ?KbPatientOmicsRecord_A {
            { ?KbPatientOmicsRecord_A sys:owner <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> }
            UNION
            { ?KbPatientOmicsRecord_A sys:assignedProject / syapse:isPrivate false .
            }
            UNION
            { ?KbPatientOmicsRecord_A sys:assignedProject / syapse:member <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> .
            }
        } }
          ?KbPatientOmicsRecord_A rdf:type / syapse:subClassOf demo:KbPatientOmicsRecord .
          ?KbPatientOmicsRecord_A demo:hasKbPatient ?KbPatient_B .
          { SELECT DISTINCT ?KbPatient_B {
            { ?KbPatient_B sys:owner <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> }
            UNION
            { ?KbPatient_B sys:assignedProject / syapse:isPrivate false .
            }
            UNION
            { ?KbPatient_B sys:assignedProject / syapse:member <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> .
            }
        } }
          ?KbPatient_B rdf:type / syapse:subClassOf demo:KbPatient .
          { SELECT DISTINCT ?KbPatientRecord_C {
            { ?KbPatientRecord_C sys:owner <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> }
            UNION
            { ?KbPatientRecord_C sys:assignedProject / syapse:isPrivate false .
            }
            UNION
            { ?KbPatientRecord_C sys:assignedProject / syapse:member <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> .
            }
        } }
          ?KbPatientRecord_C rdf:type / syapse:subClassOf demo:KbPatientCancerDiagnosis .
          ?KbPatientRecord_C demo:diagnosis / skos:broader * /  skos:prefLabel | skos:altLabel 'Kidney renal papillary cell carcinoma'@x-term  .
          ?KbPatientRecord_C rdf:type / syapse:subClassOf demo:KbPatientRecord .
          ?KbPatientRecord_C syapse:part? $j__2 .
          $j__2 $j__3 ?KbPatient_B
          FILTER ( ?KbPatientRecord_C != ?KbPatient_B )
        }
        LIMIT 174925} AS %__MainQuery
        
        WHERE {
          hint:Query hint:queryId '9e2fc6d4-e2c9-11e3-bb2c-06f835ccde4d' .
        
        { SELECT (COUNT(*) AS $S__COUNT)
          WHERE {
            INCLUDE %__FullQuery
          }
         }
         INCLUDE %__FullQuery
        
        }
        ORDER BY ?j__1 DESC(?j__1) ?KbPatientOmicsRecord_A
        LIMIT 10
        
        Show
        jeremycarroll jeremycarroll added a comment - This was the problem query (harder_syql) base <https://test-finland-jjc.syapse.com/> prefix owl: <http://www.w3.org/2002/07/owl#> prefix based: <https://test-finland-jjc.syapse.com/bdm/api/appindividual/based:> prefix xsd: <http://www.w3.org/2001/XMLSchema#> prefix demod: <https://test-finland-jjc.syapse.com/bdm/api/appindividual/demod:> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix demo: <https://test-finland-jjc.syapse.com/bdm/api/kbobject/demo:> prefix dc: <http://purl.org/dc/elements/1.1/> prefix sys: <https://test-finland-jjc.syapse.com/bdm/api/kbobject/sys:> prefix s: <https://test-finland-jjc.syapse.com/bdm/api/> prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> prefix bds: <http://www.bigdata.com/rdf/search#> prefix sysd: <https://test-finland-jjc.syapse.com/bdm/api/appindividual/sysd:> prefix base: <https://test-finland-jjc.syapse.com/bdm/api/kbobject/base:> prefix skos: <http://www.w3.org/2004/02/skos/core#> prefix syapse: <https://test-finland-jjc.syapse.com/graph/syapse#> SELECT * FROM <https://test-finland-jjc.syapse.com/graph/syapse> FROM <https://test-finland-jjc.syapse.com/graph/ontology/demo> FROM <https://test-finland-jjc.syapse.com/graph/cat-co/vocabulary> FROM <https://test-finland-jjc.syapse.com/graph/ontology/base> FROM <https://test-finland-jjc.syapse.com/graph/vocabulary> FROM <https://test-finland-jjc.syapse.com/graph/ontology/sys> FROM <https://test-finland-jjc.syapse.com/graph/cat-co/abox> FROM <https://test-finland-jjc.syapse.com/graph/django/cat-co> FROM NAMED <https://test-finland-jjc.syapse.com/graph/syapse> FROM NAMED <https://test-finland-jjc.syapse.com/graph/ontology/demo> FROM NAMED <https://test-finland-jjc.syapse.com/graph/cat-co/vocabulary> FROM NAMED <https://test-finland-jjc.syapse.com/graph/ontology/base> FROM NAMED <https://test-finland-jjc.syapse.com/graph/vocabulary> FROM NAMED <https://test-finland-jjc.syapse.com/graph/ontology/sys> FROM NAMED <https://test-finland-jjc.syapse.com/graph/cat-co/abox> FROM NAMED <https://test-finland-jjc.syapse.com/graph/django/cat-co> WITH { SELECT DISTINCT $SIZE_LIMIT_EXCEEDED $j__1 ?KbPatientOmicsRecord_A WHERE { INCLUDE %__MainQuery OPTIONAL { ?KbPatientOmicsRecord_A sys:label $j__1 } { SELECT (COUNT(*) as $CHECK_LIMIT) { INCLUDE %__MainQuery }} BIND ( ( $CHECK_LIMIT = 174925 ) AS $SIZE_LIMIT_EXCEEDED ) }} AS %__FullQuery WITH { SELECT * WHERE { { SELECT DISTINCT ?KbPatientOmicsRecord_A { { ?KbPatientOmicsRecord_A sys:owner <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> } UNION { ?KbPatientOmicsRecord_A sys:assignedProject / syapse:isPrivate false . } UNION { ?KbPatientOmicsRecord_A sys:assignedProject / syapse:member <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> . } } } ?KbPatientOmicsRecord_A rdf:type / syapse:subClassOf demo:KbPatientOmicsRecord . ?KbPatientOmicsRecord_A demo:hasKbPatient ?KbPatient_B . { SELECT DISTINCT ?KbPatient_B { { ?KbPatient_B sys:owner <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> } UNION { ?KbPatient_B sys:assignedProject / syapse:isPrivate false . } UNION { ?KbPatient_B sys:assignedProject / syapse:member <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> . } } } ?KbPatient_B rdf:type / syapse:subClassOf demo:KbPatient . { SELECT DISTINCT ?KbPatientRecord_C { { ?KbPatientRecord_C sys:owner <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> } UNION { ?KbPatientRecord_C sys:assignedProject / syapse:isPrivate false . } UNION { ?KbPatientRecord_C sys:assignedProject / syapse:member <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> . } } } ?KbPatientRecord_C rdf:type / syapse:subClassOf demo:KbPatientCancerDiagnosis . ?KbPatientRecord_C demo:diagnosis / skos:broader * / skos:prefLabel | skos:altLabel 'Kidney renal papillary cell carcinoma'@x-term . ?KbPatientRecord_C rdf:type / syapse:subClassOf demo:KbPatientRecord . ?KbPatientRecord_C syapse:part? $j__2 . $j__2 $j__3 ?KbPatient_B FILTER ( ?KbPatientRecord_C != ?KbPatient_B ) } LIMIT 174925} AS %__MainQuery WHERE { hint:Query hint:queryId '9e2fc6d4-e2c9-11e3-bb2c-06f835ccde4d' . { SELECT (COUNT(*) AS $S__COUNT) WHERE { INCLUDE %__FullQuery } } INCLUDE %__FullQuery } ORDER BY ?j__1 DESC(?j__1) ?KbPatientOmicsRecord_A LIMIT 10
        Hide
        jeremycarroll jeremycarroll added a comment -

        The performance issue seems to really hinge on the property paths

        demo:diagnosis / skos:broader * /  skos:prefLabel | skos:altLabel
        

        and

        syapse:part? 
        

        Rewriting the query to eliminate these as:

        base <https://test-finland-jjc.syapse.com/>
        prefix owl: <http://www.w3.org/2002/07/owl#>
        prefix based: <https://test-finland-jjc.syapse.com/bdm/api/appindividual/based:>
        prefix xsd: <http://www.w3.org/2001/XMLSchema#>
        prefix demod: <https://test-finland-jjc.syapse.com/bdm/api/appindividual/demod:>
        prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        prefix demo: <https://test-finland-jjc.syapse.com/bdm/api/kbobject/demo:>
        prefix dc: <http://purl.org/dc/elements/1.1/>
        prefix sys: <https://test-finland-jjc.syapse.com/bdm/api/kbobject/sys:>
        prefix s: <https://test-finland-jjc.syapse.com/bdm/api/>
        prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        prefix bds: <http://www.bigdata.com/rdf/search#>
        prefix sysd: <https://test-finland-jjc.syapse.com/bdm/api/appindividual/sysd:>
        prefix base: <https://test-finland-jjc.syapse.com/bdm/api/kbobject/base:>
        prefix skos: <http://www.w3.org/2004/02/skos/core#>
        prefix syapse: <https://test-finland-jjc.syapse.com/graph/syapse#>
        SELECT *
        FROM <https://test-finland-jjc.syapse.com/graph/syapse>
        FROM <https://test-finland-jjc.syapse.com/graph/ontology/demo>
        FROM <https://test-finland-jjc.syapse.com/graph/cat-co/vocabulary>
        FROM <https://test-finland-jjc.syapse.com/graph/ontology/base>
        FROM <https://test-finland-jjc.syapse.com/graph/vocabulary>
        FROM <https://test-finland-jjc.syapse.com/graph/ontology/sys>
        FROM <https://test-finland-jjc.syapse.com/graph/cat-co/abox>
        FROM <https://test-finland-jjc.syapse.com/graph/django/cat-co>
        FROM NAMED <https://test-finland-jjc.syapse.com/graph/syapse>
        FROM NAMED <https://test-finland-jjc.syapse.com/graph/ontology/demo>
        FROM NAMED <https://test-finland-jjc.syapse.com/graph/cat-co/vocabulary>
        FROM NAMED <https://test-finland-jjc.syapse.com/graph/ontology/base>
        FROM NAMED <https://test-finland-jjc.syapse.com/graph/vocabulary>
        FROM NAMED <https://test-finland-jjc.syapse.com/graph/ontology/sys>
        FROM NAMED <https://test-finland-jjc.syapse.com/graph/cat-co/abox>
        FROM NAMED <https://test-finland-jjc.syapse.com/graph/django/cat-co>
        
        WITH {
        SELECT DISTINCT $SIZE_LIMIT_EXCEEDED $j__1 ?KbPatientOmicsRecord_A
        WHERE {
        
          INCLUDE %__MainQuery
          OPTIONAL {
            ?KbPatientOmicsRecord_A sys:label $j__1
          }
          { SELECT (COUNT(*) as $CHECK_LIMIT) {
            INCLUDE %__MainQuery
          }}
          BIND ( ( $CHECK_LIMIT = 174925 ) AS $SIZE_LIMIT_EXCEEDED )
        }} AS %__FullQuery
        WITH {
        SELECT *
        WHERE {
        
          ?KbPatientOmicsRecord_A rdf:type / syapse:subClassOf demo:KbPatientOmicsRecord .
          ?KbPatientOmicsRecord_A demo:hasKbPatient ?KbPatient_B .
          ?KbPatient_B rdf:type / syapse:subClassOf demo:KbPatient .
          ?KbPatientRecord_C rdf:type / syapse:subClassOf demo:KbPatientCancerDiagnosis .
          ?KbPatientRecord_C demo:diagnosis /  skos:prefLabel  'Kidney renal papillary cell carcinoma'@x-term  .
          ?KbPatientRecord_C rdf:type / syapse:subClassOf demo:KbPatientRecord .
          ?KbPatientRecord_C syapse:partlet $j__2 .
          $j__2 $j__3 ?KbPatient_B .
            { SELECT DISTINCT ?KbPatientOmicsRecord_A {
            { ?KbPatientOmicsRecord_A sys:owner <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> }
            UNION
            { ?KbPatientOmicsRecord_A sys:assignedProject / syapse:isPrivate false .
            }
            UNION
            { ?KbPatientOmicsRecord_A sys:assignedProject / syapse:member <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> .
            }
        } }
          { SELECT DISTINCT ?KbPatient_B {
            { ?KbPatient_B sys:owner <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> }
            UNION
            { ?KbPatient_B sys:assignedProject / syapse:isPrivate false .
            }
            UNION
            { ?KbPatient_B sys:assignedProject / syapse:member <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> .
            }
        } }
          { SELECT DISTINCT ?KbPatientRecord_C {
            { ?KbPatientRecord_C sys:owner <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> }
            UNION
            { ?KbPatientRecord_C sys:assignedProject / syapse:isPrivate false .
            }
            UNION
            { ?KbPatientRecord_C sys:assignedProject / syapse:member <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> .
            }
        } }
          FILTER ( ?KbPatientRecord_C != ?KbPatient_B )
        }
        LIMIT 174925} AS %__MainQuery
        
        WHERE {
        
        { SELECT (COUNT(*) AS $S__COUNT)
          WHERE {
            INCLUDE %__FullQuery
          }
         }
         INCLUDE %__FullQuery
        
        }
        ORDER BY ?j__1 DESC(?j__1) ?KbPatientOmicsRecord_A
        LIMIT 10
        

        and it takes less than 2 seconds instead of over 70

        Show
        jeremycarroll jeremycarroll added a comment - The performance issue seems to really hinge on the property paths demo:diagnosis / skos:broader * / skos:prefLabel | skos:altLabel and syapse:part? Rewriting the query to eliminate these as: base <https://test-finland-jjc.syapse.com/> prefix owl: <http://www.w3.org/2002/07/owl#> prefix based: <https://test-finland-jjc.syapse.com/bdm/api/appindividual/based:> prefix xsd: <http://www.w3.org/2001/XMLSchema#> prefix demod: <https://test-finland-jjc.syapse.com/bdm/api/appindividual/demod:> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix demo: <https://test-finland-jjc.syapse.com/bdm/api/kbobject/demo:> prefix dc: <http://purl.org/dc/elements/1.1/> prefix sys: <https://test-finland-jjc.syapse.com/bdm/api/kbobject/sys:> prefix s: <https://test-finland-jjc.syapse.com/bdm/api/> prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> prefix bds: <http://www.bigdata.com/rdf/search#> prefix sysd: <https://test-finland-jjc.syapse.com/bdm/api/appindividual/sysd:> prefix base: <https://test-finland-jjc.syapse.com/bdm/api/kbobject/base:> prefix skos: <http://www.w3.org/2004/02/skos/core#> prefix syapse: <https://test-finland-jjc.syapse.com/graph/syapse#> SELECT * FROM <https://test-finland-jjc.syapse.com/graph/syapse> FROM <https://test-finland-jjc.syapse.com/graph/ontology/demo> FROM <https://test-finland-jjc.syapse.com/graph/cat-co/vocabulary> FROM <https://test-finland-jjc.syapse.com/graph/ontology/base> FROM <https://test-finland-jjc.syapse.com/graph/vocabulary> FROM <https://test-finland-jjc.syapse.com/graph/ontology/sys> FROM <https://test-finland-jjc.syapse.com/graph/cat-co/abox> FROM <https://test-finland-jjc.syapse.com/graph/django/cat-co> FROM NAMED <https://test-finland-jjc.syapse.com/graph/syapse> FROM NAMED <https://test-finland-jjc.syapse.com/graph/ontology/demo> FROM NAMED <https://test-finland-jjc.syapse.com/graph/cat-co/vocabulary> FROM NAMED <https://test-finland-jjc.syapse.com/graph/ontology/base> FROM NAMED <https://test-finland-jjc.syapse.com/graph/vocabulary> FROM NAMED <https://test-finland-jjc.syapse.com/graph/ontology/sys> FROM NAMED <https://test-finland-jjc.syapse.com/graph/cat-co/abox> FROM NAMED <https://test-finland-jjc.syapse.com/graph/django/cat-co> WITH { SELECT DISTINCT $SIZE_LIMIT_EXCEEDED $j__1 ?KbPatientOmicsRecord_A WHERE { INCLUDE %__MainQuery OPTIONAL { ?KbPatientOmicsRecord_A sys:label $j__1 } { SELECT (COUNT(*) as $CHECK_LIMIT) { INCLUDE %__MainQuery }} BIND ( ( $CHECK_LIMIT = 174925 ) AS $SIZE_LIMIT_EXCEEDED ) }} AS %__FullQuery WITH { SELECT * WHERE { ?KbPatientOmicsRecord_A rdf:type / syapse:subClassOf demo:KbPatientOmicsRecord . ?KbPatientOmicsRecord_A demo:hasKbPatient ?KbPatient_B . ?KbPatient_B rdf:type / syapse:subClassOf demo:KbPatient . ?KbPatientRecord_C rdf:type / syapse:subClassOf demo:KbPatientCancerDiagnosis . ?KbPatientRecord_C demo:diagnosis / skos:prefLabel 'Kidney renal papillary cell carcinoma'@x-term . ?KbPatientRecord_C rdf:type / syapse:subClassOf demo:KbPatientRecord . ?KbPatientRecord_C syapse:partlet $j__2 . $j__2 $j__3 ?KbPatient_B . { SELECT DISTINCT ?KbPatientOmicsRecord_A { { ?KbPatientOmicsRecord_A sys:owner <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> } UNION { ?KbPatientOmicsRecord_A sys:assignedProject / syapse:isPrivate false . } UNION { ?KbPatientOmicsRecord_A sys:assignedProject / syapse:member <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> . } } } { SELECT DISTINCT ?KbPatient_B { { ?KbPatient_B sys:owner <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> } UNION { ?KbPatient_B sys:assignedProject / syapse:isPrivate false . } UNION { ?KbPatient_B sys:assignedProject / syapse:member <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> . } } } { SELECT DISTINCT ?KbPatientRecord_C { { ?KbPatientRecord_C sys:owner <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> } UNION { ?KbPatientRecord_C sys:assignedProject / syapse:isPrivate false . } UNION { ?KbPatientRecord_C sys:assignedProject / syapse:member <https://test-finland-jjc.syapse.com/bdm/api/syuser/5> . } } } FILTER ( ?KbPatientRecord_C != ?KbPatient_B ) } LIMIT 174925} AS %__MainQuery WHERE { { SELECT (COUNT(*) AS $S__COUNT) WHERE { INCLUDE %__FullQuery } } INCLUDE %__FullQuery } ORDER BY ?j__1 DESC(?j__1) ?KbPatientOmicsRecord_A LIMIT 10 and it takes less than 2 seconds instead of over 70

          People

          • Assignee:
            martyncutcher martyncutcher
            Reporter:
            jeremycarroll jeremycarroll
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: