Type: New Feature
Status: Closed - Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: Query Plan Generator
The goal is to accelerate value expression evaluation by presenting the CPU / GPU with clear opportunities for data-parallel evaluation. This effort is motivated by the high overhead of recursive, tuple-at-a-time evaluation of value expressions. For example, I have observed large performance costs associated with value expressions, ranging from 10% (for ((2*?price)+1)) to 100% (for casting xsd:string to xsd:float in BSBM BI Q4). The approach is inspired by the vectored architecture developed in the MonetDB/X100 query engine.
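To make the contrast concrete, here is a minimal Java sketch of ((2*?price)+1) evaluated both ways. It assumes that ?price has already been decoded to a double for every solution in the chunk, and all class and method names (Expr, evalTupleAtATime, mulConst, evalVectored, ...) are hypothetical rather than bigdata APIs. The tuple-at-a-time path walks an operator tree once per solution; the vectored path, in the spirit of the MonetDB/X100 primitives, runs each operator as one tight loop over the whole chunk.

```java
import java.util.Arrays;

/** Sketch: tuple-at-a-time vs vectored evaluation of ((2*?price)+1). Names are hypothetical. */
public class VectoredEvalSketch {

    // --- Tuple-at-a-time: an operator tree evaluated recursively per solution.
    interface Expr { double eval(double price); }

    static final class Var implements Expr {
        public double eval(double price) { return price; }
    }

    static final class Const implements Expr {
        private final double v;
        Const(double v) { this.v = v; }
        public double eval(double price) { return v; }
    }

    static final class Mul implements Expr {
        private final Expr l, r;
        Mul(Expr l, Expr r) { this.l = l; this.r = r; }
        public double eval(double price) { return l.eval(price) * r.eval(price); }
    }

    static final class Add implements Expr {
        private final Expr l, r;
        Add(Expr l, Expr r) { this.l = l; this.r = r; }
        public double eval(double price) { return l.eval(price) + r.eval(price); }
    }

    /** Walks the whole tree once per solution: several virtual calls per row. */
    static double[] evalTupleAtATime(Expr e, double[] prices) {
        double[] out = new double[prices.length];
        for (int i = 0; i < prices.length; i++) out[i] = e.eval(prices[i]);
        return out;
    }

    // --- Vectored: each primitive is one tight loop over the whole chunk.
    static void mulConst(double[] in, double c, double[] out) {
        for (int i = 0; i < in.length; i++) out[i] = in[i] * c;
    }

    static void addConst(double[] in, double c, double[] out) {
        for (int i = 0; i < in.length; i++) out[i] = in[i] + c;
    }

    /** One call per operator per chunk, not per solution. */
    static double[] evalVectored(double[] prices) {
        double[] tmp = new double[prices.length];
        mulConst(prices, 2.0, tmp); // 2 * ?price for every row
        addConst(tmp, 1.0, tmp);    // ... + 1, computed in place
        return tmp;
    }

    public static void main(String[] args) {
        double[] prices = {1.5, 10.0, 4.0};
        Expr e = new Add(new Mul(new Const(2), new Var()), new Const(1));
        System.out.println(Arrays.toString(evalTupleAtATime(e, prices))); // prints [4.0, 21.0, 9.0]
        System.out.println(Arrays.toString(evalVectored(prices)));        // prints [4.0, 21.0, 9.0]
    }
}
```

Both paths produce the same results. The difference is that the vectored primitives contain no per-value dispatch, so the JIT has a better chance to unroll them and, in some cases, auto-vectorize them.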
The bigdata query engine is already vectored in the sense that it processes solutions a chunk at a time. However, value expression evaluation still proceeds through exactly the kind of nested, per-solution operator calls that inhibit pipelined execution in the CPU. Several complications come into play when dealing with RDF data and Java. For RDF, unless the query explicitly constrains the datatype, it is impossible to have the strong type information that is available to a relational engine: each Object position that becomes bound on a solution can hold a value of any datatype.

I think that we could deal with this by partitioning the internal value representations (IVs) of these data. IVs are distinguished in their first 8 bits by RDF Value type (URI, blank node, or Literal), by their natural datatype (byte, short, int, long, etc.), and by some other relevant characteristics. Thus, I believe that we could efficiently (and cache-consciously) partition a vector of IVs by datatype and then bind datatype-specific operators to the different slices of the vector. We could also manage "Value" caches through hash maps (resolution), by replacing a TermId or BlobIV with a fully inline IV (transformation), or by dropping the cached value (which then requires re-materialization if the Value is needed beyond that point in the plan, but might be a reasonable choice for BlobIVs).
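A minimal sketch of the partitioning step, under loudly hypothetical assumptions: an IV is modelled as a flags byte plus a long payload held in parallel arrays, and the flag values FLAG_INT and FLAG_DOUBLE stand for inline xsd:int and xsd:double literals. This is not the actual bigdata IV encoding or API. A stable counting sort on the flags byte (one histogram pass, one prefix sum, one scatter) groups rows by type; each slice is then gathered into a contiguous array and handed to a type-specific loop for (2*x)+1. Rows of any other type are left to a generic slow path, marked here with NaN.

```java
import java.util.Arrays;

/**
 * Sketch only: partition a chunk of IVs by their first byte, then run a
 * datatype-specific primitive over each slice. The IV layout used here
 * (flags byte + long payload, and the FLAG_* values) is hypothetical.
 */
public class PartitionByTypeSketch {

    static final int FLAG_INT = 0x45;    // hypothetical: inline xsd:int literal
    static final int FLAG_DOUBLE = 0x4A; // hypothetical: inline xsd:double literal

    /**
     * Stable counting sort on the flags byte. Returns a permutation of row
     * indices; rows with flags value f occupy perm[start[f] .. start[f + 1]).
     * start must have length 257.
     */
    static int[] partition(byte[] flags, int[] start) {
        int[] count = new int[256];
        for (byte b : flags) count[b & 0xFF]++;                            // histogram
        for (int f = 0; f < 256; f++) start[f + 1] = start[f] + count[f]; // prefix sum
        int[] next = Arrays.copyOf(start, 256);
        int[] perm = new int[flags.length];
        for (int i = 0; i < flags.length; i++) perm[next[flags[i] & 0xFF]++] = i; // scatter
        return perm;
    }

    /** Evaluates (2*x)+1 per row; results are returned in the original row order. */
    static double[] eval(byte[] flags, long[] payload) {
        int n = flags.length;
        int[] start = new int[257]; // start[0] == 0 by default
        int[] perm = partition(flags, start);

        // Gather payloads so that each type's slice is contiguous in memory.
        long[] sorted = new long[n];
        for (int k = 0; k < n; k++) sorted[k] = payload[perm[k]];

        double[] out = new double[n];
        // int slice: a tight loop with no per-row type test.
        for (int k = start[FLAG_INT]; k < start[FLAG_INT + 1]; k++) {
            out[perm[k]] = 2.0 * (int) sorted[k] + 1.0;
        }
        // double slice: the payload holds the raw IEEE 754 bits.
        for (int k = start[FLAG_DOUBLE]; k < start[FLAG_DOUBLE + 1]; k++) {
            out[perm[k]] = 2.0 * Double.longBitsToDouble(sorted[k]) + 1.0;
        }
        // Any other type would go to a generic (slow) path; NaN marks those rows here.
        for (int f = 0; f < 256; f++) {
            if (f == FLAG_INT || f == FLAG_DOUBLE) continue;
            for (int k = start[f]; k < start[f + 1]; k++) out[perm[k]] = Double.NaN;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] flags = {(byte) FLAG_INT, (byte) FLAG_DOUBLE, (byte) FLAG_INT, 0x00};
        long[] payload = {3, Double.doubleToLongBits(2.5), 10, 42};
        System.out.println(Arrays.toString(eval(flags, payload))); // prints [7.0, 6.0, 21.0, NaN]
    }
}
```

A counting sort fits here because the partition key is a single byte, so 256 buckets suffice and no comparisons are needed. At larger vector sizes, a multi-pass (radix-style) partition might behave better with respect to the cache, but that is untested here.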
Given the effort involved and the smaller size of the datatype-specific slices of each vector, my inclination would be to increase the vector size flowing through the query engine and then code the datatype-specific operators for OpenGL to run on a GPU coprocessor, rather than attempting to vectorize them on the CPU. Aside from the vastly greater memory bandwidth and parallelism, the other reason for pursuing this approach is that I have great confidence in our ability to manage memory and data parallelism on a GPU, but much less confidence that we could manage cache effects on a CPU under Java, where the JVM also competes for the CPU cache.