  1. Blazegraph (by SYSTAP)
  2. BLZG-197 BlazeGraph release 2.1 (Scale-out GA)
  3. BLZG-253

Refactor the async write API to buffer per target DS, not per target shard


    Details

      Description

      The RDF bulk data load uses the async write API, which currently buffers data separately for each target shard. As the number of shards increases, so does the memory demand on the client. The async write API should be modified to buffer per target node (data service, DS) rather than per target shard.
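To illustrate why the buffering key matters, here is a minimal sketch in plain Java. It is not the Blazegraph async write API: the class `BufferKeyingSketch`, the method `bufferWrites`, and the round-robin shard-to-node placement in `nodeForShard` are all hypothetical and exist only to show how the number of client-side buffers scales under each scheme.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BufferKeyingSketch {

    /** Hypothetical placement: shard i lives on data service (i % nodeCount). */
    static String nodeForShard(int shardId, int nodeCount) {
        return "DS-" + (shardId % nodeCount);
    }

    /**
     * Routes one pending write per shard into client-side buffers.
     * The buffers are keyed either by target shard or by target node.
     */
    static Map<String, List<String>> bufferWrites(int shardCount, int nodeCount, boolean perNode) {
        Map<String, List<String>> buffers = new HashMap<>();
        for (int shard = 0; shard < shardCount; shard++) {
            String key = perNode ? nodeForShard(shard, nodeCount) : "shard-" + shard;
            // One buffer is allocated per distinct key; writes sharing a key share a buffer.
            buffers.computeIfAbsent(key, k -> new ArrayList<>()).add("write-for-shard-" + shard);
        }
        return buffers;
    }

    public static void main(String[] args) {
        // Per-shard keying: the buffer count grows with the number of shards.
        System.out.println(bufferWrites(1000, 8, false).size()); // prints 1000
        // Per-node keying: the buffer count is bounded by the number of nodes.
        System.out.println(bufferWrites(1000, 8, true).size());  // prints 8
    }
}
```

With per-node keying, the client holds one buffer per data service regardless of how many shards each service hosts, so the final routing of each write to its shard would have to happen on the receiving node.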

      As a workaround, the nominal size of a shard can be increased from its default configuration value of ~ 200M. This will reduce the number of shards in the system and therefore reduce the memory demand on the client. However, it increases the effort when performing merges and splits for large index partitions (aka shards).

      In the configuration file, change:

      static private partitionSizeMultiplier = 1;

      to

      static private partitionSizeMultiplier = 2;

      to double the effective maximum shard size. However, note that this can exacerbate the journal over-extension issue [1] (which has been mitigated by improving the index build performance).
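As a rough worked example of the workaround, the sketch below estimates how the shard count changes with the multiplier. It rests on assumptions not stated in this ticket: that "~200M" means about 200 million bytes, that every shard fills to its nominal size, and that the effective size is simply the nominal size times `partitionSizeMultiplier`. The class and method names are hypothetical.

```java
public class ShardCountEstimate {

    /**
     * Estimates the number of shards needed to hold totalBytes when each
     * shard holds at most nominalShardBytes * multiplier (ceiling division).
     */
    static long estimatedShards(long totalBytes, long nominalShardBytes, int multiplier) {
        long effectiveShardBytes = nominalShardBytes * multiplier;
        return (totalBytes + effectiveShardBytes - 1) / effectiveShardBytes;
    }

    public static void main(String[] args) {
        long total = 1_000_000_000_000L; // assumed 1 TB of index data
        long nominal = 200_000_000L;     // assumed reading of "~200M"
        System.out.println(estimatedShards(total, nominal, 1)); // prints 5000
        System.out.println(estimatedShards(total, nominal, 2)); // prints 2500
    }
}
```

Under these assumptions, doubling the multiplier halves the number of shards and therefore the number of per-shard buffers on the client, at the cost of larger merges and splits.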

      This issue is related to [2] and [3].

      [1] https://sourceforge.net/apps/trac/bigdata/ticket/20
      [2] https://sourceforge.net/apps/trac/bigdata/ticket/40
      [3] https://sourceforge.net/apps/trac/bigdata/ticket/35

            People

            Assignee:
            Unassigned
            Reporter:
            bryanthompson
