  1. Blazegraph (by SYSTAP)
  2. BLZG-1947

DISTINCT over predicates and VALUES clause do not go along very well

    Details

    • Type: Bug
    • Status: Done
    • Priority: Medium
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: BLAZEGRAPH_2_2_0
    • Component/s: None
    • Labels:
      None

      Description

      Hello,
      I'm running Blazegraph 2.1.1 with DBpedia and it usually works great, but today I stumbled upon some strange behavior. When I pose the following query to the endpoint:

      select distinct ?p
      where
      {
        ?s ?p ?o.
        values ?s {
          <http://dbpedia.org/resource/Michael_Collins_(astronaut)>
          <http://dbpedia.org/resource/Neil_Armstrong>
        }
      }
      limit 5
      

      I get the following results:

      <?xml version='1.0' encoding='UTF-8'?>
      <sparql xmlns='http://www.w3.org/2005/sparql-results#'>
              <head>
                      <variable name='p'/>
              </head>
              <results>
                      <result>
                              <binding name='p'>
                                      <literal xml:lang='en'></literal>
                              </binding>
                      </result>
                      <result>
                              <binding name='p'>
                                      <literal xml:lang='en'>_</literal>
                              </binding>
                      </result>
                      <result>
                              <binding name='p'>
                                      <literal xml:lang='en'>-</literal>
                              </binding>
                      </result>
                      <result>
                              <binding name='p'>
                                      <literal xml:lang='en'>-gauge</literal>
                              </binding>
                      </result>
                      <result>
                              <binding name='p'>
                                      <literal xml:lang='en'>-ism</literal>
                              </binding>
                      </result>
              </results>
      </sparql>
      

      By no means do I expect ?p to be a literal. The query behaves as expected on the official DBpedia SPARQL endpoint. Also, if I remove distinct or one of the bindings from the values clause, I get the expected results. If I add filter(isIRI(?p)), the problem disappears, but if I add filter(!isIRI(?p)), I get an empty result set (which is correct, but inconsistent with the original result).
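
      For reference, the isIRI workaround described above would look roughly like the following. This is only a sketch: it is the original query with one added filter line, and where exactly the filter sits inside the group is an assumption, since the report does not show the modified query.

      ```sparql
      select distinct ?p
      where
      {
        ?s ?p ?o.
        values ?s {
          <http://dbpedia.org/resource/Michael_Collins_(astronaut)>
          <http://dbpedia.org/resource/Neil_Armstrong>
        }
        # Added line: keep only solutions where ?p is bound to an IRI.
        filter(isIRI(?p))
      }
      limit 5
      ```

      Since a predicate is always an IRI in RDF, this filter should not change the result of a correct evaluation, which is why the differing output points at the query engine rather than the data.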

        Activity

        beebs Brad Bebee added a comment -

        Adding Michael Schmidt.

        beebs Brad Bebee added a comment -

        Adding Alexandre Riazanov.

        michaelschmidt michaelschmidt added a comment -

        This is probably related to an invalid use of the distinct term scan op (an internal optimization that we use for simple DISTINCT queries) in conjunction with VALUES injection. Will try to reproduce that on a small toy data set.

        michaelschmidt michaelschmidt added a comment -

        Was able to reproduce. There is a test case in branch https://github.com/SYSTAP/bigdata/tree/blzg-1947, which will be used to further investigate the issue.

        jpotoniec Jedrzej Potoniec added a comment -

        I guess this is another version of the same problem:

        select distinct ?l
        where
        {
          ?s <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?l.
          values ?s {
            <http://dbpedia.org/resource/Warsaw>
            <http://dbpedia.org/resource/Berlin>
          }
        }
        

        With distinct it apparently gives me graph URIs, e.g.

        <binding name='l'>
                <uri>file:/media/dbpedia/archives/gz/geo-coordinates_en.ttl.gz</uri>
        </binding>
        

        Without distinct, it works all right.

        michaelschmidt michaelschmidt added a comment -

        Yes, very likely it is – added a test case with the same query structure to the branch.

        michaelschmidt michaelschmidt added a comment -

        Proposed fix implemented; the PR is in CI now: https://github.com/SYSTAP/bigdata/pull/417

        michaelschmidt michaelschmidt added a comment -

        Merged down the fixes in https://github.com/SYSTAP/bigdata/pull/417. This fix resolves both issues.


          People

          • Assignee:
            michaelschmidt michaelschmidt
            Reporter:
            jpotoniec Jedrzej Potoniec
          • Votes:
            0
            Watchers:
            3
