Details

    • Type: Task
    • Status: Done
    • Resolution: Done
    • Affects Version/s: BIGDATA_RELEASE_1_2_1
    • Fix Version/s: None
    • Component/s: Other

      Description

      This ticket is for the ability to manage the bigdata namespaces in the workbench.

      ----

      This is a feature request for a JSP page that can be bundled with the WAR and also deployed standalone to a servlet container for the cluster. The JSP page will walk people through the configuration of a KB instance. For the standalone deployment, it might also

      The actual configuration can be done programmatically using the REST Multi-Tenancy API. The JSP page will, in effect, walk people through a decision tree to help them configure the database more easily.

      Configuring a new KB instance is much easier to script than editing properties in an existing KB instance, or even than providing a description of the implications of those properties. Therefore, the JSP page will focus on the configuration of new KB instances. Note that both the standalone and clustered deployments allow multiple KB instances (multi-tenancy). The JSP page can be used to set up new KBs.
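
The decision tree described above can be sketched as a small function that turns a few choices into a properties payload for a new KB. This is only an illustration: the function names, the `/namespace` path, the `text/plain` content type, and the base URL are assumptions rather than details confirmed by this ticket, and the property keys are the ones listed in the first comment below. The request is built but not sent.

```python
from urllib.request import Request

STORE = "com.bigdata.rdf.store.AbstractTripleStore"


def kb_properties(namespace, mode="triples", truth_maintenance=False,
                  axioms=True, text_index=False):
    """Return Java-properties text for a new KB (hypothetical helper)."""
    if mode not in ("triples", "sids", "quads"):
        raise ValueError(f"unknown KB mode: {mode}")
    if mode == "quads" and truth_maintenance:
        # Per the ticket, incremental truth maintenance is supported
        # for triples and sids mode KBs.
        raise ValueError("truth maintenance requires triples or sids mode")
    props = {
        "com.bigdata.rdf.sail.namespace": namespace,
        f"{STORE}.quads": mode == "quads",
        f"{STORE}.statementIdentifiers": mode == "sids",
        f"{STORE}.textIndex": text_index,
        "com.bigdata.rdf.sail.truthMaintenance": truth_maintenance,
        # Justification chains are only needed for incremental TM.
        f"{STORE}.justify": truth_maintenance,
        f"{STORE}.axiomsClass": ("com.bigdata.rdf.axioms.OwlAxioms" if axioms
                                 else "com.bigdata.rdf.axioms.NoAxioms"),
    }
    lines = []
    for key, value in props.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def create_kb_request(base_url, properties_text):
    """Build (but do not send) a POST request; the endpoint is an assumption."""
    return Request(
        base_url.rstrip("/") + "/namespace",
        data=properties_text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
```

A caller would pass the returned request to `urllib.request.urlopen` once the endpoint and content type have been checked against the Multi-Tenancy API documentation for the deployed version.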

      We also need to offer simple patterns to allow bulk loading into a KB followed by incremental truth maintenance against that KB. My preference here is to add some options to the SPARQL UPDATE "LOAD" operation that allow loading data without truth maintenance into a KB for which truth maintenance is enabled. (The load needs to either tunnel to the AbstractTripleStore or explicitly and temporarily disable TM at the Sail layer.) We would also need to offer a SPARQL UPDATE operation to force database-at-once closure once some sequence of files has been loaded. The approach then needs to be documented on the wiki.

        Activity

        bryanthompson bryanthompson added a comment -

        Here are some properties that we commonly configure:

        Journal level:

        # The backing file.
        com.bigdata.journal.AbstractJournal.file=bigdata.jnl
        
        # The persistence engine.  See com.bigdata.journal.BufferMode.  Some options are:
        # 'DiskWORM' for the WORM (infinite history)
        # 'DiskRW'   for the RWStore (recycling, scalable storage).
        # 'MemStore' for a transient Journal on the native heap (up to 4TB).
        
        com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
        
        # By default, the RWStore uses session protection and only recycles during a commit in which there are no
        # active readers.  By setting the minReleaseAge to a non-zero value, deferred deletes are journaled and
        # storage is recycled incrementally once the history retention period has expired for a commit point and
        # there are no longer any active readers on that commit point (or on earlier commit points).
        com.bigdata.service.AbstractTransactionService.minReleaseAge=1
        
        # Default write retention queue for all B+Tree indices.
        com.bigdata.btree.writeRetentionQueue.capacity=4000
        
        # Default branching factor for all B+Tree indices.
        com.bigdata.btree.BTree.branchingFactor=128
        
        

        KB mode: triples, sids, quads

        # The namespace of the KB instance (set when you create a given KB and defaults to "kb").
        com.bigdata.rdf.sail.namespace=kb
        
        # Defaults 
        com.bigdata.rdf.store.AbstractTripleStore.quads=false
        com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=true|false
        
        
          # Integrated full text index.
          #
          # See https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=Main_Page
          # Enable or disable the integrated full text index.
          com.bigdata.rdf.store.AbstractTripleStore.textIndex=true|false
          
          # Enable or disable full text indexing of datatype literals.
          com.bigdata.rdf.store.AbstractTripleStore.textIndex.datatypeLiterals=true|false
          
          # The full text index implementation. The default is the bigdata full text index.  This may be used
          # to specify a 3rd party integration.
          com.bigdata.rdf.store.AbstractTripleStore.textIndexClass=com.bigdata.rdf.lexicon.BigdataValueCentricFullTextIndex
          
          

          #
          # A pre-declared vocabulary lets you optimize common URIs so they will be represented in 2-3 bytes
          # within the statement indices. The default vocabulary class declares several common vocabularies.
          # You can create your own vocabulary class to inline common URIs for your applications.
          #
          com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=com.bigdata.rdf.vocab.RDFSVocabulary
          

        Inference related properties:

        # Incremental truth maintenance is supported for triples or triples plus statement identifiers
        # mode KBs. When enabled, the entailments are eagerly materialized as statements are asserted
        # or retracted.  It is also possible to do database-at-once closure.  In this mode of operation,
        # you load files (or assert or retract statements) and then demand the database at once closure
        # to recompute the entailments.  This approach can be more efficient when you are making large
        # changes to the KB.  Database at once closure can be explicitly managed using the DataLoader
        # or custom extensions to SPARQL UPDATE.  You *can* combine incremental truth maintenance with
        # database-at-once closure or you can just rely on database-at-once closure or just truth
        # maintenance.
        
        com.bigdata.rdf.sail.truthMaintenance=true|false
        
        # Turn off the maintenance of justification chains.  This impacts only the load performance,
        # but it is a big impact and only required if you will be doing incremental truth maintenance.
        
        com.bigdata.rdf.store.AbstractTripleStore.justify=true|false
        
        # This disables inference.
        com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
        
        # This is the RDFS Plus entailment regime (default).
        com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.OwlAxioms
        
        # May be used to turn off query-time expansion of entailments such as (x rdf:type rdfs:Resource). 
        # This property is interpreted by the BigdataSail.  When this property is disabled, entailments
        # such as (x rdf:type rdfs:Resource) will NOT be available as they are neither
        # eagerly materialized nor generated at query time.  (Most entailments are computed eagerly
        # during materialization.)
        
        com.bigdata.rdf.sail.queryTimeExpander=false
        
        
          # Full read-write TX support. The best performance is obtained using the UNISOLATED connection
          # and indices that do not support full read-write transactions (isolatableIndices=false). This
          # configuration is also required if truth maintenance is enabled. If you want to use full
          # read-write transactions (not available on a cluster) then you can enable this option and the
          # indices will be provisioned with the necessary metadata to support the full read-write tx
          # commit protocol.
          com.bigdata.rdf.sail.isolatableIndices=false
          
          # B+Tree options.
          #
          # Note: B+Tree overrides MUST be specified on a per-namespace basis. Replace ".kb" with the
          # namespace of the specific KB instance whose branching factors you intend to override.
          # Bump up the branching factor for the lexicon indices on the default kb.
          com.bigdata.namespace.kb.lex.com.bigdata.btree.BTree.branchingFactor=400
          
          # Bump up the branching factor for the statement indices on the default kb.
          com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=1024
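
Two of the rules stated in the comments above lend themselves to a small script: the per-namespace form of the B+Tree override keys, and the constraints that truth maintenance places on other options. The sketch below is a hypothetical helper, not part of bigdata; the function names are made up for illustration, and the checks encode only what the comments above say.

```python
def btree_override(namespace, relation, branching_factor):
    """Format a per-namespace branching factor override.

    `relation` is "lex" (lexicon indices) or "spo" (statement indices),
    following the two example keys in the listing above.
    """
    if relation not in ("lex", "spo"):
        raise ValueError(f"unknown relation: {relation}")
    key = (f"com.bigdata.namespace.{namespace}.{relation}"
           ".com.bigdata.btree.BTree.branchingFactor")
    return f"{key}={branching_factor}"


def check_config(props):
    """Return a list of problems found in a dict of property strings."""
    store = "com.bigdata.rdf.store.AbstractTripleStore"
    problems = []
    if props.get("com.bigdata.rdf.sail.truthMaintenance") != "true":
        return problems
    if props.get(f"{store}.quads") == "true":
        problems.append("truth maintenance is for triples or sids mode KBs")
    if props.get("com.bigdata.rdf.sail.isolatableIndices") == "true":
        problems.append("truth maintenance requires isolatableIndices=false")
    if props.get(f"{store}.justify") == "false":
        problems.append("incremental truth maintenance needs justify=true")
    return problems
```

For example, `btree_override("kb", "lex", 400)` reproduces the lexicon override line shown above for the default namespace.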
          
          
        bryanthompson bryanthompson added a comment -

        Refactored the logic behind the -pages option of DumpJournal to work for both the HTree and the BTree, to collect a histogram of the allocation sizes, and to recommend a target branching factor (BTree only). This was done in preparation for the JSP configuration page so we can provide a feature in which we analyze the RWStore and report on target branching factors.

        This feature will have to enumerate the indices in the KB namespace and then invoke dumpPages() on each such index.
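
To make the idea of a histogram plus a recommended branching factor concrete, here is one plausible heuristic. It is not taken from DumpJournal: the power-of-two bucketing, the 8192-byte target page size, the clamping bounds, and both function names are assumptions chosen for illustration.

```python
from collections import Counter


def allocation_histogram(page_sizes):
    """Count pages per power-of-two size bucket (bucketing is an assumption)."""
    buckets = Counter()
    for size in page_sizes:
        if size <= 0:
            raise ValueError("page sizes must be positive")
        # Smallest power of two that is >= size.
        buckets[1 << (size - 1).bit_length()] += 1
    return dict(sorted(buckets.items()))


def recommend_branching_factor(current_bf, page_sizes,
                               target_page_bytes=8192,
                               min_bf=3, max_bf=4096):
    """Scale the branching factor so the average page nears the target size."""
    if not page_sizes:
        raise ValueError("no pages to analyze")
    average = sum(page_sizes) / len(page_sizes)
    scaled = int(current_bf * target_page_bytes / average)
    # Clamp to assumed bounds; the real limits may differ.
    return max(min_bf, min(max_bf, scaled))
```

With an index at branching factor 128 whose pages average 2048 bytes, this sketch would suggest quadrupling the branching factor, on the assumption that page size grows roughly linearly with the number of entries per page.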

        @see https://sourceforge.net/apps/trac/bigdata/ticket/585 (GIST)
        @see https://sourceforge.net/apps/trac/bigdata/ticket/587 (JSP Configuration page)

        Committed revision r6487.

        bryanthompson bryanthompson added a comment -

        We also need to specify in the default journal configuration:

        # Setup for the RWStore recycler rather than session protection.
        com.bigdata.service.AbstractTransactionService.minReleaseAge=1
        

          People

          • Assignee:
            tobycraig tobycraig
            Reporter:
            bryanthompson bryanthompson
          • Votes:
            0
            Watchers:
            2
