Details

    • Type: Sub-task
    • Status: Done
    • Resolution: Done
    • Affects Version/s: BIGDATA_RELEASE_1_3_4
    • Fix Version/s: BLAZEGRAPH_2_2_0
    • Component/s: NanoSparqlServer
    • Labels:
      None

      Description

      When posting/loading an N-Triples file to the NanoSparqlServer, the content type application/n-triples is not recognized, even though it is the official media type (see http://www.w3.org/TR/n-triples/#n-triples-mediatype).

      The response is:

      <<<<<<<<<
      HTTP/1.1 400 Bad Request
      Server: Apache-Coyote/1.1
      Content-Type: text/plain;charset=ISO-8859-1
      Transfer-Encoding: chunked
      Date: Tue, 11 Nov 2014 14:20:55 GMT
      Connection: close
      
      Content-Type not recognized as RDF: application/n-triples
      >>>>>>>>>>
      

      It works fine with text/plain as a workaround, though.
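      A minimal client-side sketch of this workaround, assuming Java 11's java.net.http client. The endpoint URL is an assumption (adjust host, port and path to your deployment), and NTriplesUpload/buildUpload are illustrative names, not Blazegraph API. The sketch only builds the POST request; sending it is shown in a comment.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class NTriplesUpload {

    // Assumed endpoint; not taken from this issue.
    static final URI ENDPOINT = URI.create("http://localhost:9999/blazegraph/sparql");

    /**
     * Builds a POST that uploads N-Triples data. With the workaround enabled the
     * request is labeled text/plain, the only MIME type the affected server maps
     * to N-Triples; otherwise it uses the official application/n-triples.
     */
    static HttpRequest buildUpload(String ntriples, boolean useWorkaround) {
        String contentType = useWorkaround ? "text/plain" : "application/n-triples";
        return HttpRequest.newBuilder(ENDPOINT)
                .header("Content-Type", contentType)
                // The older N-Triples format is 7-bit US-ASCII.
                .POST(HttpRequest.BodyPublishers.ofString(ntriples, StandardCharsets.US_ASCII))
                .build();
    }

    public static void main(String[] args) {
        String data = "<http://example.org/s> <http://example.org/p> \"o\" .\n";
        HttpRequest req = buildUpload(data, true);
        System.out.println(req.method() + " " + req.uri() + " "
                + req.headers().firstValue("Content-Type").orElse("?"));
        // To actually send it:
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```

      With useWorkaround set to false, the same request carries application/n-triples, which the affected server versions reject with the 400 response shown above.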

      See BLZG-1147 (openrdf 2.8)

        Activity

        bryanthompson bryanthompson added a comment -

        The parsers are generally found based on their associations with RDFFormat objects. Those are declared by openrdf (when they are bundled with that platform) and otherwise by bigdata. In general, the incantation looks like this (from InsertServlet):

                final RDFFormat format = RDFFormat
                        .forMIMEType(new MiniMime(contentType).getMimeType());
        
                if (format == null) {
        
                    buildResponse(resp, HTTP_BADREQUEST, MIME_TEXT_PLAIN,
                            "Content-Type not recognized as RDF: " + contentType);
        
                    return;
        
                }
        
                if (log.isInfoEnabled())
                    log.info("RDFFormat=" + format);
                
                final RDFParserFactory rdfParserFactory = RDFParserRegistry
                        .getInstance().get(format);
        
                if (rdfParserFactory == null) {
        
                    buildResponse(resp, HTTP_INTERNALERROR, MIME_TEXT_PLAIN,
                            "Parser factory not found: Content-Type=" + contentType
                                    + ", format=" + format);
        
                    return;
        
                }
        
        

        Based on this, I would expect that the problem is in RDFFormat.NTRIPLES. At least in 2.6.10 it was declared as follows:

        	public static final RDFFormat NTRIPLES = new RDFFormat("N-Triples", "text/plain",
        			Charset.forName("US-ASCII"), "nt", false, false);
        

        Looking at grepcode

        http://grepcode.com/file/repo1.maven.org/maven2/org.openrdf.sesame/sesame-rio-api/2.7.0-beta1/org/openrdf/rio/RDFFormat.java

        I see that this declaration is still in effect:

        	
        The N-Triples file format.
        The file extension .nt is recommended for N-Triples documents. The media type is text/plain and encoding is in 7-bit US-ASCII.
        See also:
        http://www.w3.org/TR/rdf-testcases/#ntriples

        	public static final RDFFormat NTRIPLES = new RDFFormat("N-Triples", "text/plain",
        			Charset.forName("US-ASCII"), "nt", false, false);
        

        Thus I would say that this is clearly an openrdf bug.
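        The lookup failure can be reproduced with a self-contained toy model of the two-step check in InsertServlet. Format, the registry contents and baseMimeType() are simplified, hypothetical stand-ins for RDFFormat, RDFParserRegistry and MiniMime, not their real APIs. The registry mirrors the 2.7 declaration quoted above, where N-Triples is associated only with text/plain.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class FormatLookupSketch {

    /** Toy stand-in for RDFFormat: a format name plus its declared MIME types. */
    static final class Format {
        final String name;
        final List<String> mimeTypes;

        Format(String name, List<String> mimeTypes) {
            this.name = name;
            this.mimeTypes = mimeTypes;
        }
    }

    // Mirrors the openrdf 2.7 declaration: N-Triples is mapped to "text/plain" only.
    static final List<Format> REGISTERED =
            List.of(new Format("N-Triples", List.of("text/plain")));

    // Format names that have a parser factory (toy stand-in for RDFParserRegistry).
    static final Set<String> PARSERS = Set.of("N-Triples");

    /** Strips parameters such as "; charset=US-ASCII" and normalizes case. */
    static String baseMimeType(String contentType) {
        int semi = contentType.indexOf(';');
        String base = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns the first registered format that declares the MIME type, or null. */
    static Format forMimeType(String mimeType) {
        for (Format f : REGISTERED) {
            if (f.mimeTypes.contains(mimeType)) {
                return f;
            }
        }
        return null;
    }

    /** 400 and 500 mirror the servlet's error branches; 200 just means "go on to parse". */
    static int statusFor(String contentType) {
        Format format = forMimeType(baseMimeType(contentType));
        if (format == null) {
            return 400; // "Content-Type not recognized as RDF"
        }
        if (!PARSERS.contains(format.name)) {
            return 500; // "Parser factory not found"
        }
        return 200;
    }

    public static void main(String[] args) {
        for (String ct : List.of("text/plain; charset=US-ASCII", "application/n-triples")) {
            System.out.println(ct + " -> " + statusFor(ct));
        }
    }
}
```

        Running main prints 200 for text/plain (with or without a charset parameter) and 400 for application/n-triples, matching the reported behavior: the MIME type never reaches the parser registry.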

        bryanthompson bryanthompson added a comment -

        Peter,

        Would you mind filing this with openrdf and cross-linking the tickets?

        Thanks,
        Bryan

        peterhaase peterhaase added a comment -

        It's actually fixed in Sesame 2.8.0. From the feature list:
        - Updates of the N-Quads, TriG and N-Triples parsers to the latest W3C specs.
        See also
        http://grepcode.com/file/repo1.maven.org/maven2/org.openrdf.sesame/sesame-rio-api/2.8.0-beta2/org/openrdf/rio/RDFFormat.java?av=f

        I suggest we wait until we upgrade to Sesame 2.8.
        The workaround with text/plain as the media type is sufficient.

        no_reply Tom Johnson added a comment -

        Is there a plan to upgrade and fix this issue? I'm running into it with `application/n-quads`. While it's easy enough to work around, it's a minor pain.

        bryanthompson bryanthompson added a comment -

        Yes. However, the openrdf upgrade has some potentially non-trivial aspects.

        Probably the easiest way to work around this on the server is to modify the ServiceProviderHook class. It registers MIME Types and parsers that are not supplied by openrdf. This is done in a static initialization block.

            static {

                /*
                 * Note: These MUST be declared before the forceLoad() call or they will
                 * be NULL when that method runs.
                 */

                TURTLE_RDR = new RDFFormat("Turtle-RDR",
                        Arrays.asList("application/x-turtle-RDR"),
                        Charset.forName("UTF-8"), Arrays.asList("ttlx"), true, false);

                NTRIPLES_RDR = new RDFFormat("N-Triples-RDR",
                        "application/x-n-triples-RDR", Charset.forName("US-ASCII"),
                        "ntx", false, false);

                JSON_RDR = new RDFFormat("SPARQL/JSON", Arrays.asList(
                        "application/sparql-results+json", "application/json"),
                        Charset.forName("UTF-8"), Arrays.asList("srj", "json"),
                        RDFFormat.NO_NAMESPACES, RDFFormat.SUPPORTS_CONTEXTS);

                forceLoad();

            }
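        The ordering warning in that comment follows from how Java runs a static initializer: statements execute in textual order, so a method called from the block sees any static field that has not been assigned yet at its default value (null for references). A minimal, self-contained illustration; StaticInitOrder, EARLY, LATE and this forceLoad() are hypothetical and only mimic the shape of ServiceProviderHook.

```java
public class StaticInitOrder {

    // Blank finals assigned in the static block, like TURTLE_RDR and friends.
    static final String EARLY;
    static final String LATE;

    // What forceLoad() observed (no initializers, so they are not reset later).
    static String seenEarly;
    static String seenLate;

    static {
        EARLY = "declared before forceLoad()";
        forceLoad();                       // runs while LATE is still unassigned
        LATE = "declared after forceLoad()";
    }

    /** Hypothetical stand-in for ServiceProviderHook.forceLoad(): records what it can see. */
    static void forceLoad() {
        seenEarly = EARLY;
        seenLate = LATE;                   // reads the default value: null
    }

    public static void main(String[] args) {
        System.out.println("forceLoad saw EARLY = " + seenEarly);
        System.out.println("forceLoad saw LATE  = " + seenLate);
    }
}
```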
        

        Currently only Blazegraph-specific extensions are registered here, but a MIME type declaration for standard N-Triples could be registered here as well. The openrdf 2.7 declaration listed above could be modified as follows:

                NTRIPLES = new RDFFormat("N-Triples", Arrays.asList(
                        "application/n-triples", "text/plain"),
                        Charset.forName("UTF-8"), Arrays.asList("nt"),
                        // Optional standard format URI argument, left commented out:
                        // ValueFactoryImpl.getInstance().createURI(
                        //         "http://www.w3.org/ns/formats/N-Triples"),
                        RDFFormat.NO_NAMESPACES,
                        // N-Triples has no named graphs, so it does not support contexts.
                        RDFFormat.NO_CONTEXTS);
        

        and also add the following declaration:

            /**
             * The N-Triples file format. The file extension .nt is recommended for
             * N-Triples documents. The media type is application/n-triples and encoding
             * is in UTF-8. See also: N-Triples
             */
            public static final RDFFormat NTRIPLES;
        

        Finally, modify the forceLoad() method to also invoke

        		RDFFormat.register(NTRIPLES);
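        The intended effect of registering the new declaration can be sketched with a toy registry. Format, register() and forMimeType() here are hypothetical stand-ins, not openrdf's RDFFormat API. After the patch, two N-Triples formats claim text/plain; which one the real openrdf registry returns is not established here, so the toy registry simply consults the newest registration first (an assumption).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RegisterSketch {

    /** Toy stand-in for RDFFormat: name, MIME types and default charset. */
    static final class Format {
        final String name;
        final List<String> mimeTypes;
        final Charset charset;

        Format(String name, List<String> mimeTypes, Charset charset) {
            this.name = name;
            this.mimeTypes = mimeTypes;
            this.charset = charset;
        }
    }

    static final List<Format> REGISTRY = new ArrayList<>();

    /** Assumption for this sketch: newer registrations are consulted first. */
    static void register(Format f) {
        REGISTRY.add(0, f);
    }

    static Format forMimeType(String mimeType) {
        for (Format f : REGISTRY) {
            if (f.mimeTypes.contains(mimeType)) {
                return f;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // openrdf 2.7 declaration: US-ASCII, text/plain only.
        register(new Format("N-Triples", List.of("text/plain"), StandardCharsets.US_ASCII));
        System.out.println(forMimeType("application/n-triples") == null); // true

        // Proposed declaration, as registered from forceLoad().
        register(new Format("N-Triples",
                List.of("application/n-triples", "text/plain"), StandardCharsets.UTF_8));
        System.out.println(forMimeType("application/n-triples").charset); // UTF-8
        System.out.println(forMimeType("text/plain").charset);            // UTF-8 (newest wins here)
    }
}
```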
        

        Please note that the new declaration specifies UTF-8 while the original one specifies US-ASCII. I suspect that the N-Triples parser also needs to be modified in order to handle UTF-8 correctly; this is why this issue depends on BLZG-1147. Finally, there will be UTF-8 N-Triples documents that cannot be processed under the older ASCII specification precisely because they rely on Unicode support.
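        The charset difference can be made concrete. The older N-Triples definition referenced by the openrdf 2.7 Javadoc is 7-bit US-ASCII, so non-ASCII characters in literals must be written as \uXXXX or \UXXXXXXXX escapes, whereas RDF 1.1 N-Triples is UTF-8 and may contain such characters directly. A minimal sketch; escapeAscii is a hypothetical helper, not part of openrdf, and it only handles literal lexical forms.

```java
import java.nio.charset.StandardCharsets;

public class NTriplesEscapes {

    /**
     * Escapes a literal's lexical form for the older, 7-bit US-ASCII N-Triples:
     * code points above U+007E become backslash-u (4 hex digits) or
     * backslash-U (8 hex digits) escapes. Among ASCII characters only quote,
     * backslash, newline, carriage return and tab are escaped in this sketch.
     */
    static String escapeAscii(String lexical) {
        StringBuilder out = new StringBuilder();
        lexical.codePoints().forEach(cp -> {
            switch (cp) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (cp <= 0x7E) {
                        out.appendCodePoint(cp);
                    } else if (cp <= 0xFFFF) {
                        out.append(String.format("\\u%04X", cp));
                    } else {
                        out.append(String.format("\\U%08X", cp));
                    }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute; the Java escape keeps this source file ASCII.
        String lexical = "caf\u00E9";
        System.out.println(escapeAscii(lexical));
        System.out.println(escapeAscii(new String(Character.toChars(0x1F600))));

        // Reading UTF-8 bytes as US-ASCII does not round-trip: prints false.
        byte[] utf8 = lexical.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.US_ASCII).equals(lexical));
    }
}
```

        A UTF-8 document can therefore be converted into the older form, but a parser that decodes input as US-ASCII will mangle UTF-8 input rather than reject it cleanly.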

        You are welcome to try this patch, and please provide feedback if you do. I suspect that the patch above will not work without also rolling forward to the openrdf 2.8 N-Triples parser. Perhaps that can be achieved as an isolated and backward compatible modification to the openrdf 2.7 version of the N-Triples parser?

        Thanks,
        Bryan

        igorkim igorkim added a comment -

        The original issue is indeed fixed in branch BLZG-1147. PR: https://github.com/SYSTAP/bigdata/pull/360

        bryanthompson bryanthompson added a comment -

        Alexandre, please review this PR. Assign back to igorkim if there are any issues.


          People

          • Assignee:
            alexr Alexandre Riazanov
            Reporter:
            peterhaase peterhaase
          • Votes:
            0
            Watchers:
            3
