Blazegraph (by SYSTAP) / BLZG-692

nxparser fails with uppercase language tag

    Details

      Description

      I am hitting an NPE inside nxparser for the tbl 6-degrees of freedom crawl at [1]. I see this in the log:

      Aug 22, 2012 10:25:59 AM org.semanticweb.yars.nx.Literal getData
      WARNING: Something wrong with the literal-backing string. The parsing regex pattern didn't match. Check the string for correct N3 syntax. The malicious string is: "Up to 11.9"@FR
      

      And then this trace.

      Caused by: java.lang.NullPointerException
      	at org.semanticweb.yars.nx.util.NxUtil.unescape(NxUtil.java:178)
      	at org.semanticweb.yars.nx.util.NxUtil.unescape(NxUtil.java:164)
      	at org.semanticweb.yars.nx.Literal.toString(Literal.java:235)
      	at com.bigdata.rdf.rio.nquads.NQuadsParser.parse(NQuadsParser.java:297)
      	at com.bigdata.rdf.rio.nquads.NQuadsParser.parse(NQuadsParser.java:178)
      

      AndreasHarth wrote:

      The issue is an uppercase language string.
      
      If you change PATTERN in Literal.java (add A-Z to the regex):
      private static final Pattern PATTERN = Pattern
      		.compile("(?:\"(.*)\")(?:@([a-zA-Z]+(?:-[a-zA-Z0-9]+)*)|\\^\\^(<\\S+>))?");
      
      it'll parse fine.
      
      We're looking into the issue to decide where to put in a fix (probably
      do a toLowerCase() for language tags).
      

      I have added a unit test for bigdata which verifies the problem.

      You can work around the problem by modifying the nxparser source code as indicated above. The bug is against nxparser 1.2.2. There is also a bug report against nxparser for this; see http://code.google.com/p/nxparser/issues/detail?id=9
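
The regex change quoted above can be checked in isolation. The following is a minimal sketch, not nxparser code: the class name, the helper `languageTag`, and the `LOWER_ONLY` pattern are assumptions made for illustration. `LOWER_ONLY` is a guess at the pre-fix pattern, reconstructed by removing `A-Z` from the quoted one. The use of `matches()` is also an assumption about how the pattern is applied. The helper also lowercases the tag, along the lines of the `toLowerCase()` idea mentioned in the quoted message.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LanguageTagPatternDemo {

    // ASSUMPTION: reconstructed lowercase-only variant, not copied from nxparser 1.2.2.
    static final Pattern LOWER_ONLY = Pattern.compile(
            "(?:\"(.*)\")(?:@([a-z]+(?:-[a-z0-9]+)*)|\\^\\^(<\\S+>))?");

    // Pattern as quoted in the ticket (A-Z added to the language tag classes).
    static final Pattern MIXED_CASE = Pattern.compile(
            "(?:\"(.*)\")(?:@([a-zA-Z]+(?:-[a-zA-Z0-9]+)*)|\\^\\^(<\\S+>))?");

    // Hypothetical helper: returns the language tag normalized to lowercase,
    // or null if the literal does not match or carries no language tag.
    static String languageTag(Pattern pattern, String literal) {
        Matcher m = pattern.matcher(literal);
        if (!m.matches()) {
            return null;
        }
        String lang = m.group(2);
        return lang == null ? null : lang.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String literal = "\"Up to 11.9\"@FR";

        // The lowercase-only pattern rejects the literal from the log message.
        System.out.println(LOWER_ONLY.matcher(literal).matches()); // false

        // The mixed-case pattern accepts it; the tag is then normalized.
        System.out.println(languageTag(MIXED_CASE, literal)); // fr
    }
}
```

Normalizing with `Locale.ROOT` keeps the result independent of the JVM's default locale, which matters for tags such as `TR` under a Turkish locale.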

        Activity

        bryanthompson added a comment -

        Added unit test to demonstrate a problem with the handling of uppercase language tags for literals in nxparser.

        @see https://sourceforge.net/apps/trac/bigdata/ticket/590 (nxparser fails with uppercase language tag)

        Committed revision r6479.

        bryanthompson added a comment -

        Introduced a bug in the previous commit where I had refactored the literal handling for nquads. This fixes the bug and also adds a unit test for the handling of escape codes in literals for nquads.

        Committed revision r6480

        bryanthompson added a comment -

        Changed nxparser dependency to 1.2.3 to close out this ticket.

        Committed revision r7136.


          People

          • Assignee:
            bryanthompson
            Reporter:
            bryanthompson