Details

      Description

      Implement the capability to query an external Solr service.

      As part of these activities, we might consider implementing an "isRunLast" paradigm, so that a service is evaluated after the patterns that bind its inputs and variables (not only constants) can be used as input. Currently, ServiceOptionsBase only offers an "isRunFirst()" interface (which is used in ASTJoinOrderByTypeOptimizer). We need to add an isRunLast() interface and its implementation.
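To make the proposal concrete, here is a minimal sketch of how an isRunLast() hint could sit next to isRunFirst() when ordering services. It is not the actual ServiceOptionsBase or ASTJoinOrderByTypeOptimizer code: the names ServiceOptionsSketch, SketchService, and ServiceOrderingSketch are hypothetical, and the three-bucket ordering is an assumption about how such a hint might be consumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for the scheduling hints a service could expose. */
interface ServiceOptionsSketch {
    boolean isRunFirst();
    boolean isRunLast();
}

/** A named service with fixed scheduling hints (illustration only). */
final class SketchService implements ServiceOptionsSketch {
    final String name;
    private final boolean runFirst;
    private final boolean runLast;

    SketchService(String name, boolean runFirst, boolean runLast) {
        if (runFirst && runLast) {
            throw new IllegalArgumentException("a service cannot run both first and last");
        }
        this.name = name;
        this.runFirst = runFirst;
        this.runLast = runLast;
    }

    @Override public boolean isRunFirst() { return runFirst; }
    @Override public boolean isRunLast() { return runLast; }
}

final class ServiceOrderingSketch {
    /** 0 = run first, 1 = no preference, 2 = run last. */
    static int rank(ServiceOptionsSketch s) {
        if (s.isRunFirst()) return 0;
        if (s.isRunLast()) return 2;
        return 1;
    }

    /** Returns a reordered copy; List.sort is stable, so ties keep their original order. */
    static List<SketchService> order(List<SketchService> services) {
        List<SketchService> out = new ArrayList<>(services);
        out.sort(Comparator.comparingInt(ServiceOrderingSketch::rank));
        return out;
    }

    public static void main(String[] args) {
        List<SketchService> ordered = order(Arrays.asList(
                new SketchService("fulltext", false, true),
                new SketchService("plain", false, false),
                new SketchService("search", true, false)));
        for (SketchService s : ordered) {
            System.out.println(s.name);
        }
    }
}
```

A run-last service ends up behind everything else in the group, so whatever binds its input variables has already been evaluated by the time the service is called.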

        Activity

        mschmidt00 Michael Schmidt added a comment -

        Merged branch Solr. The following describes what has been done:

        • FTS.java is a good entry point; it defines and documents the magic vocabulary. Note that the class also contains Options, which allow setting system-wide defaults for, e.g., the index to be queried or the timeout. These options can be specified in GraphStore.properties.
        • A new type of service call, MockIVReturningServiceCall, was introduced. It differs from the ExternalService in that its interface is not Sesame-based but generic, in the sense that the service simply returns mocked IVs. This might be quite useful for implementing other services in the future, e.g. against Web service interfaces. ServiceCallJoin.java was slightly extended to deal with this new service type.
        • AST2BOpUtility was slightly changed to deal with MockIVReturningServiceCalls, namely to resolve MockIVs.
        • The FulltextFeature was implemented in a couple of classes:
          • ASTFulltextSearchOptimizer: lifts the magic vocabulary from FTS.java into a SERVICE keyword. This is essentially the same as ASTSearchOptimizer. Therefore, I pulled the logic out of ASTSearchOptimizer (BDS feature) into a dedicated new class, ASTSearchOptimizerBase; both ASTSearchOptimizer and ASTFulltextSearchOptimizer now extend ASTSearchOptimizerBase (initializing the latter with the concrete magic predicates and the namespace).
          • IFulltextSearchHit + FulltextSearchHit: represent a fulltext search result
          • FulltextSearchHiterator: iterator over FulltextSearchHits
          • FulltextSearchServiceFactory: the main implementation of the service logic, interpreting the magic vocabulary and wrapping the service call into an iterator (very much in the style of BDS, but slightly more complex, because we do not require all incoming variables to be bound to constants but support operations on top of arbitrary binding sets, which leads to multiple service calls)
          • IFulltextSearch: an interface for fulltext search configurations
          • SolrFulltextSearchImpl: the implementation of the service call against Solr and its mapping into FulltextSearchHits
        • Testing facilities:
          • TestASTFulltextSearchOptimizer: tests the rewriting of the magic predicates into a SERVICE keyword
          • TestFulltextSearch: lots of tests checking various SPARQL queries against a Solr server set up at localhost (still to be hooked in)
          • In /src/build/solr: a small Maven project for starting and initializing the Solr server (still to be hooked in)
        • Incorporated feedback from code review:
          • Improved documentation
          • Reuse of the existing HttpClient
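The "multiple service calls" point is perhaps easiest to see in code. Below is a rough sketch, under assumptions, of an iterator that issues one search call per incoming binding set and chains the hits together. It is not the actual FulltextSearchServiceFactory or FulltextSearchHiterator: PerBindingSetHiterator, the Map-based binding sets, and the search function are hypothetical simplifications, and hits are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Lazily issues one search call per incoming binding set and chains the hits. */
final class PerBindingSetHiterator implements Iterator<String> {
    private final Iterator<Map<String, String>> bindingSets;
    private final String queryVar;
    private final Function<String, List<String>> search;
    private Iterator<String> current = Collections.emptyIterator();

    PerBindingSetHiterator(Iterator<Map<String, String>> bindingSets,
                           String queryVar,
                           Function<String, List<String>> search) {
        this.bindingSets = bindingSets;
        this.queryVar = queryVar;
        this.search = search;
    }

    @Override
    public boolean hasNext() {
        // Advance to the next binding set only once the current hits are used up.
        while (!current.hasNext() && bindingSets.hasNext()) {
            String query = bindingSets.next().get(queryVar);
            if (query != null) { // binding sets without the query variable are skipped
                current = search.apply(query).iterator();
            }
        }
        return current.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}

final class PerBindingSetDemo {
    public static void main(String[] args) {
        // Fake search backend standing in for a remote fulltext index.
        Function<String, List<String>> fakeSearch =
                q -> Arrays.asList(q + "-hit1", q + "-hit2");

        List<Map<String, String>> incoming = new ArrayList<>();
        incoming.add(Collections.singletonMap("query", "alice"));
        incoming.add(Collections.singletonMap("query", "bob"));

        Iterator<String> hits =
                new PerBindingSetHiterator(incoming.iterator(), "query", fakeSearch);
        while (hits.hasNext()) {
            System.out.println(hits.next());
        }
    }
}
```

Because the iterator is lazy, a search call is made only when the hits of the previous binding set are exhausted, which keeps the number of remote calls proportional to the number of binding sets actually consumed.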
        bryanthompson bryanthompson added a comment -

        We should compare index write and search performance with the internal full text index, provide some guidance on when to use each one, and examine whether we can use the Solr index for the kind of low-latency query slicing that we have been doing with the internal index. To support this:

        @mikepersonick: Work with @michaelschmidt to define sample queries against some data set, including some that slice the full text search results in order to manage the output cardinality of the query.

        @michaelschmidt: Compare performance of the two approaches and provide some recommendations.

        This can be moved to a new ticket if that is preferable.
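For reference, "slicing" here can be read as taking a rank window over the scored hits so that only a bounded number of results flows into the rest of the query. The sketch below shows that idea in memory only; ScoredHit and HitSlicingSketch are hypothetical names, and a real comparison would presumably push the window down to the index (Solr, for instance, accepts `start` and `rows` paging parameters) rather than sort on the client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical search hit: a document id plus a relevance score. */
final class ScoredHit {
    final String id;
    final double score;

    ScoredHit(String id, double score) {
        this.id = id;
        this.score = score;
    }
}

final class HitSlicingSketch {
    /**
     * Returns at most {@code limit} hits, starting at rank {@code offset},
     * with hits ranked by descending score. The input list is not modified.
     */
    static List<ScoredHit> slice(List<ScoredHit> hits, int offset, int limit) {
        if (offset < 0 || limit < 0) {
            throw new IllegalArgumentException("offset and limit must be non-negative");
        }
        List<ScoredHit> ranked = new ArrayList<>(hits);
        ranked.sort(Comparator.comparingDouble((ScoredHit h) -> h.score).reversed());
        int from = Math.min(offset, ranked.size());
        // Widen to long before adding so a huge limit cannot overflow.
        int to = (int) Math.min((long) from + limit, ranked.size());
        return new ArrayList<>(ranked.subList(from, to));
    }

    public static void main(String[] args) {
        List<ScoredHit> hits = Arrays.asList(
                new ScoredHit("a", 0.2), new ScoredHit("b", 0.9),
                new ScoredHit("c", 0.5), new ScoredHit("d", 0.7));
        for (ScoredHit h : slice(hits, 1, 2)) {
            System.out.println(h.id + " " + h.score);
        }
    }
}
```

The window bounds the output cardinality of the search step regardless of how many documents match, which is the property the sample queries are meant to exercise.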


          People

          • Assignee:
            michaelschmidt michaelschmidt
            Reporter:
            michaelschmidt michaelschmidt
          • Votes:
            0
            Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved: