The front-coded scheme is not providing enough compression: it repeats the full leading key every 8 tuples. That is going to be pretty fat on the TERM2ID index for data such as chembl.
It would be more efficient to find the prefix shared across ALL tuples in a leaf, factor it out once for the entire leaf, and THEN apply the delta coding to what remains.
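A rough sketch of the idea, in Python rather than the actual leaf coder. The names (encode_leaf, decode_leaf, bucket_size) and the example URIs are hypothetical and only illustrate the layout: the leaf-wide prefix is stored once, and the full key repeated at the head of each 8-tuple bucket shrinks to just the part after that prefix. It assumes the keys are already sorted in unsigned byte order, as they are within a leaf.

```python
def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b have in common."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def encode_leaf(keys, bucket_size=8):
    """Factor out the leaf-wide prefix, then front-code the remainders.

    keys must be sorted. Returns (leaf_prefix, entries), where each entry
    is (shared_len, suffix) relative to the previous remainder. The first
    entry of every bucket is stored whole (shared_len == 0), but without
    the leaf-wide prefix.
    """
    if not keys:
        return b"", []
    # For sorted keys, the prefix common to ALL keys is the prefix
    # common to the first and the last key.
    leaf_prefix = keys[0][:shared_prefix_len(keys[0], keys[-1])]
    p = len(leaf_prefix)
    entries = []
    prev = b""
    for i, key in enumerate(keys):
        rest = key[p:]
        if i % bucket_size == 0:
            entries.append((0, rest))          # bucket head, no delta
        else:
            k = shared_prefix_len(prev, rest)
            entries.append((k, rest[k:]))      # delta vs. previous key
        prev = rest
    return leaf_prefix, entries


def decode_leaf(leaf_prefix, entries):
    """Inverse of encode_leaf: rebuild the full keys in order."""
    keys = []
    prev = b""
    for shared, suffix in entries:
        rest = prev[:shared] + suffix
        keys.append(leaf_prefix + rest)
        prev = rest
    return keys


if __name__ == "__main__":
    # Hypothetical URIs with a long common namespace, as in chembl-like data.
    keys = sorted(b"http://example.org/molecule/m%04d" % i for i in range(20))
    prefix, entries = encode_leaf(keys)
    assert decode_leaf(prefix, entries) == keys
    print(prefix, entries[:3])
```

With plain front coding, each bucket head would carry the whole URI again. Here the bucket heads only carry the bytes that follow the leaf-wide prefix.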
Also see:
 https://sourceforge.net/apps/trac/bigdata/ticket/514 (PartlyInlineURIIV support)