  1. Blazegraph (by SYSTAP)
  2. BLZG-5736

Strings with quotes are not properly encoded in federated queries

    Details

      Description

      When running a federated query via SERVICE, strings containing quotes are not properly encoded, so invalid SPARQL is sent to the remote endpoint. For example, this is one of the queries produced (with prefixes removed):

      SELECT  ?o
      WHERE {
      
        <http://www.wikidata.org> <http://schema.org/dateModified> ?o .
        
      }
      VALUES ( ?o) {
      ( "sto fatto o sta cosa ? no tipo 'e nato elemento"@nap )
      ( "dette elementet er et konkret objekt/eksemplar (instans) av denne klassen, kategorien eller objektgruppen"@nb )
      ( "dit item is n eksemplaor/instansie van t tweede item (Veurbeeld: "Mark Rutte" is nen "politieker")"@nds-nl )
      ( "???? ??? ??????? ?? ?????? ?????? ? ???? ??? ?????? ????? ????? ??????"@ne )
      ( "dit item is een exemplaar (instantie) van deze groep elementen"@nl )
      ( "natura de, expression de o exemplar de"@oc )
      ( "stanowi przyk?ad (jest elementem) danej kategorii/klasy"@pl )
      ( "este item ? uma inst?ncia deste outro item"@pt )
      ( "este item ? uma inst?ncia deste outro item"@pt-br )
      ( "????"@rif )
      ( "acest element este un exemplar din clasa definit? de acel element"@ro )
      ( "?????? ??????? ???????????? ????? ?????????? ?????? (????????? / ??????? ??????) ??????, ????????? ??? ?????? ????????"@ru )
      ( "sta cosa ? n'esimplari cuncretu di sta classi, di sta catigur?a, o di stu cuncettu"@scn )
      ( "je konkretna izvedba objekta v razredu, kategoriji ali skupini objektov"@sl )
      ( "??? ?????? ?? ????????? ??????? (????????) ?????, ?????????? ??? ????? ????????"@sr )
      ( "??? ?????? ?? ????????? ??????? (????????) ?????, ?????????? ??? ????? ????????"@sr-ec )
      ( "ova stavka je konkretan objekat (instanca) klase, kategorije ili grupe objekata"@sr-el )
      ( "?r ett konkret objekt (instans) av denna klass, kategori eller objektgrupp"@sv )
      ( "??? ??????? ? ?????????? ????????? ??'???? (??????? ???????) ?????? ?????, ????????? ?? ????? ??'?????"@uk )
      ( "kho?n m?c n?y l? m?t th?c th? c?a kho?n m?c kia"@vi )
      }
      

      Note the third value (the @nds-nl literal): the double quotes inside it should be escaped, but they are not.

      The bug appears to be in AST2SPARQLUtil.toExternal(Literal), which does not encode the literal's label at all. It can be fixed either by using the triple-quote syntax (https://www.w3.org/TR/sparql11-query/#QSynLiterals), which would still require checking the text for three consecutive quotes, or by escaping quotes and any other characters not allowed in SPARQL strings. According to https://www.w3.org/TR/sparql11-query/#rSTRING_LITERAL2, the characters #x22 (double quote), #x5C (backslash), #xA (line feed) and #xD (carriage return) must be escaped.
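
A minimal sketch of the escaping option, for illustration only. The class and method names (LiteralEscaper, escapeLabel, toLangLiteral) are hypothetical and are not part of Blazegraph or Sesame; the sketch only assumes that the four characters listed for STRING_LITERAL2 need a backslash escape and leaves every other character untouched.

```java
// Hypothetical helper, not Blazegraph or Sesame API: escapes a literal label
// so it can be written inside a double-quoted SPARQL string (STRING_LITERAL2).
final class LiteralEscaper {

    private LiteralEscaper() {
    }

    /** Escapes backslash, double quote, line feed and carriage return. */
    static String escapeLabel(final String label) {
        final StringBuilder sb = new StringBuilder(label.length() + 8);
        for (int i = 0; i < label.length(); i++) {
            final char c = label.charAt(i);
            switch (c) {
                case '\\':
                    sb.append("\\\\"); // #x5C becomes \\
                    break;
                case '"':
                    sb.append("\\\""); // #x22 becomes \"
                    break;
                case '\n':
                    sb.append("\\n"); // #xA becomes \n
                    break;
                case '\r':
                    sb.append("\\r"); // #xD becomes \r
                    break;
                default:
                    sb.append(c); // everything else is copied unchanged
            }
        }
        return sb.toString();
    }

    /** Builds a language-tagged literal such as "..."@nds-nl. */
    static String toLangLiteral(final String label, final String lang) {
        return "\"" + escapeLabel(label) + "\"@" + lang;
    }
}
```

With this sketch, a label containing `"Mark Rutte"` would be emitted with each embedded quote written as `\"`, which keeps the VALUES row a single well-formed string literal.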

        Activity

        stasmalyshev stasmalyshev added a comment -

        Created pull request: https://github.com/blazegraph/database/pull/48

        Unfortunately, I could not find where SPARQL generation is tested, so I didn't add tests. If anybody points me to it, I will add them.

        stasmalyshev stasmalyshev added a comment -

        cc michaelschmidt

        michaelschmidt michaelschmidt added a comment -

        Hi Stas, sure, your fix makes sense. However, I realized that this issue was fixed in master by Igor quite a while ago. His proposal was:

            public String toExternal(final Literal lit) {
        
                final String label = lit.getLabel();
                
                final String languageCode = lit.getLanguage();
                
                final URI datatypeURI = lit.getDatatype();
        
                final String datatypeStr = datatypeURI == null ? null
                        : toExternal(datatypeURI);
        
                final StringBuilder sb = new StringBuilder((label.length() + 2)
                        + (languageCode != null ? (languageCode.length() + 1) : 0)
                        + (datatypeURI != null ? datatypeStr.length() + 2 : 0));
        
                sb.append('"');
                sb.append(SPARQLUtil.encodeString(label));
                sb.append('"');
        
                if (languageCode != null) {
                    sb.append('@');
                    sb.append(languageCode);
                } else if (datatypeURI != null && !XMLSchema.STRING.equals(datatypeURI)) {
                    sb.append("^^");
                    sb.append(datatypeStr);
                }
        
                return sb.toString();
        
            }
        

        I would prefer using the Sesame function here, so I suggest going with that one.

        Brad Bebee, I guess this was part of the RDF 1.1 activities; it is not in 2.1.4 but is in the 2.2 RC branch. I would suggest cherry-picking this commit:

        commit 2ec9cae48cecc21cd93535d5ff88ce55d3c75395
        Author: Igor Kim <igor.kim@ms2w.com> 2016-07-01 04:00:21
        Committer: Igor Kim <igor.kim@ms2w.com> 2016-07-01 04:00:21
        Parent: a6fc4a4675143e9d1144aafa38fc50352eca7f73 (AST2SPARQLUtil should not add STRING datatype for literals)
        Branches: BLZG-2082, BLZG-2085, BLZG-2089, BLZG-4323, BLZG-4476, master, origin/2.1.4_2041_merge, origin/2.1.4_2041_merge_bbt, origin/2.1.4_reverse_merge, origin/2.2.0_RC, origin/2.2.0_RC2, origin/2092, origin/2_2_0_RC_BLZG-2078, origin/BLAZEGRAPH_RELEASE_CANDIDATE_2_2_0, origin/BLZG-1679, origin/BLZG-1915, origin/BLZG-2023, origin/BLZG-2030, origin/BLZG-2042, origin/BLZG-2057, origin/BLZG-2063, origin/BLZG-2075 and 19 more branches
        
        AST2SPARQLUtil.toExternal changed to escape literals;
        TestRemoteSparqlBuilderFactory.test_service_009 updated with literal
        requiring escaping
        

        Could you please have a look?

        beebs Brad Bebee added a comment -

        Cherry-picked into 2_1_5_RC branch.


          People

          • Assignee:
            beebs Brad Bebee
            Reporter:
            stasmalyshev stasmalyshev
          • Votes:
            0
            Watchers:
            3
