There appears to be a bug where the value of initialExtent on the journal when it is (re-)opened is being set to its current file extent rather than the value of the "com.bigdata.journal.AbstractJournal.initialExtent" property.
If you look at AbstractBufferStrategyBLZG-511 you will see the code which imposes the policy for the file growth.
The default for the [initialExtent] property is 10M, although we generally specify 100M or 200M for that property in configuration files. However, if you look at WORMStrategyBLZG-166 you will see that FileMetadata.extent (the current size of the file) is being passed in where the configured value for the initialExtent should be passed in (probably fileMetadata.getProperty(Options.INITIAL_EXTENT,Options.DEFAULT_INITIAL_EXTENT)). The upshot is that the initial file growth is slower than intended (32M per extension), while after a journal re-open the file growth doubles the file extent as of the time when the file was reopened. I do not see anything in the code that would extend the file before it was out of room for the next allocation, so it is not as if the allocated space remains unused.
This "doubling" was probably introduced when the WORMStrategy was written to replace the old DiskOnlyStrategy. It should be easy enough to fix: we just need to pass the "initialExtent" rather than the current file extent to the WORMStrategy, and verify that nothing else depends on the changed semantics of that constructor argument.
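To make the effect concrete, here is a minimal sketch of the kind of growth rule described above. It is not the actual AbstractBufferStrategy code. The class name, the method name, and the exact rule (extend by the larger of the bytes needed, the initialExtent argument, and a 32M minimum extension) are assumptions chosen for illustration, as are the 200M configured value and the 1000M extent at re-open.

```java
// Hypothetical sketch of a journal file growth policy; NOT the real bigdata code.
final class ExtentGrowthSketch {

    static final long MB = 1024L * 1024L;

    /**
     * Assumed rule: extend by at least the bytes needed, but never by less
     * than the larger of the initialExtent argument and the minimum extension.
     */
    static long nextExtent(long currentExtent, long needed,
                           long initialExtent, long minimumExtension) {
        long increment = Math.max(needed, Math.max(initialExtent, minimumExtension));
        return currentExtent + increment;
    }

    public static void main(String[] args) {
        long configuredInitialExtent = 200 * MB; // value from the configuration file
        long minimumExtension = 32 * MB;         // assumed minimum growth per extension
        long extentAtReopen = 1000 * MB;         // file size when the journal was re-opened
        long needed = 1;                         // bytes that did not fit

        // Intended: the configured initialExtent is passed to the strategy.
        long intended = nextExtent(extentAtReopen, needed,
                configuredInitialExtent, minimumExtension);

        // Reported behavior: the current file extent is passed in its place.
        long observed = nextExtent(extentAtReopen, needed,
                extentAtReopen, minimumExtension);

        System.out.println("intended: " + intended / MB + "M"); // intended: 1200M
        System.out.println("observed: " + observed / MB + "M"); // observed: 2000M
    }
}
```

Under these assumptions, a file re-opened at 1000M grows to 1200M when the configured value is used, but doubles to 2000M when the current extent is passed as the initialExtent.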
I expect that the WORM also evidences growth in the data written, especially for the lexicon, as the size of the kb instance grows. Moving the large literals and URIs out of the ID2TERM index should fix that as new revisions will not involve making persistent copies of the literals/URIs involved. In fact, Mike has suggested that we do this for everything which is not inlined, effectively getting rid of ID2TERM and making all of the keys in the TERM2ID index based on a prefix (URI, Literal, BNode), a hash code, and a counter to break ties in the hash code. All URIs and Literals would be raw records allocated on the Journal. See .
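As a rough illustration of the proposed key layout (prefix, hash code, counter), the sketch below builds such a key. Everything specific in it is an assumption: the one-byte prefix codes, the use of String.hashCode() as the hash, the two-byte counter, and the class and method names. The real design may differ on every one of these points.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a TERM2ID key: [prefix byte][4-byte hash][2-byte counter].
final class TermKeySketch {

    // Assumed prefix codes for the three kinds of term.
    static final byte URI = 1;
    static final byte LITERAL = 2;
    static final byte BNODE = 3;

    /**
     * Builds a 7-byte key. The counter distinguishes terms of the same kind
     * whose hash codes collide.
     */
    static byte[] key(byte prefix, String term, short counter) {
        return ByteBuffer.allocate(1 + 4 + 2)
                .put(prefix)
                .putInt(term.hashCode()) // stand-in for whatever hash function is chosen
                .putShort(counter)
                .array();
    }

    public static void main(String[] args) {
        // "Aa" and "BB" have the same String.hashCode(), so they need distinct counters.
        byte[] k0 = key(LITERAL, "Aa", (short) 0);
        byte[] k1 = key(LITERAL, "BB", (short) 1);
        System.out.println(k0.length + " bytes, same hash: "
                + ("Aa".hashCode() == "BB".hashCode()));
    }
}
```

With this layout the term text itself never appears in the key, which is consistent with storing the URIs and Literals as raw records on the Journal.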