The RWStore uses allocation contexts for group commit (one per task). There has also been some discussion about introducing an allocation context for unisolated writes, to provide a mechanism for denying alloc()/free() requests after a call to AbstractJournal.abort(). Such requests can appear after RWStore.reset() due to a data race between the invocation of RWStore.reset() and the interruption of the threads associated with an update. For example, we have 3-6 threads writing on the statement indices during update. If any one of those threads is not interrupted until after RWStore.reset(), then an alloc()/free() request for the unisolated connection could come through after the RWStore.reset(). See BLZG-1313.
The AllocationContext deferred free list is an Array<Long> (one boxed Long per freed address).
The WikiData large load, with a 20GB journal, had ~50M in-use slots (46M 64-byte slots).
So, in theory, at 8 bytes per freed address, deleting a namespace would put ~50M * 8 bytes ≈ 400MB on the JVM heap for the AllocationContext in the worst case.
I needed a 10G JVM to load this without being GC bound.
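To make the worst-case arithmetic above concrete, here is a minimal sketch (the slot count is taken from the WikiData load above; note that 8 bytes is only the raw long value, so boxed Long objects on the heap would cost more):

```java
public class DeferredFreeHeapEstimate {
    public static void main(String[] args) {
        long freedAddresses = 50_000_000L; // ~50M in-use slots freed on namespace delete
        long bytesPerAddress = 8L;         // raw long only; a boxed Long costs more
        long worstCaseBytes = freedAddresses * bytesPerAddress;
        // 400,000,000 bytes ~= 381 MiB (~400 MB decimal)
        System.out.println(worstCaseBytes / (1024 * 1024) + " MiB");
    }
}
```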
- Use non-heap memory
- Convert to bits per allocator
At scale, the conversion to bits would win in this case. A simple test with random longs showed, unsurprisingly, no benefit from compression.
We had around 10K allocators, so the memory overhead of a "free bits" bit map would be around 10K * 1KB ≈ 10MB.
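A minimal sketch of the per-allocator "free bits" idea, using java.util.BitSet. The class name, the 8192-slots-per-allocator sizing (8192 bits = 1KB per allocator, matching the 10K * 1KB ≈ 10MB estimate above), and the method names are all hypothetical, not the RWStore API:

```java
import java.util.BitSet;

// Hypothetical sketch: track freed slots as bits per allocator
// instead of accumulating boxed Long addresses in a list.
public class AllocatorFreeBits {
    // Assumed sizing: 8192 slots per allocator -> 1KB bitmap each.
    static final int SLOTS_PER_ALLOCATOR = 8192;

    private final BitSet freed = new BitSet(SLOTS_PER_ALLOCATOR);

    void markFreed(int slotIndex) { freed.set(slotIndex); }

    boolean isFreed(int slotIndex) { return freed.get(slotIndex); }

    public static void main(String[] args) {
        AllocatorFreeBits a = new AllocatorFreeBits();
        a.markFreed(42);
        System.out.println(a.isFreed(42)); // true
        System.out.println(a.isFreed(41)); // false
    }
}
```

With 10K such allocators the bitmaps cost ~10MB regardless of how many slots are freed, whereas the Array<Long> approach grows with the number of freed addresses.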
After a few minutes' thought, I can see how this could be put together reasonably straightforwardly.
ESTIMATE: 3 days (MC).