In patterns like the one from Ticket
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?ps ?p ?o
WHERE {
  ?ps ?p ?o.
  { SELECT ?ps WHERE { ?ps a <http://example.com/data/Person>. } }
}
a hash index is built over the binding sets produced by the outer triple pattern (?ps, ?p, ?o) for later reuse, and the variable ?ps is then projected and fed into the subquery. The projection step needs an additional JVMDistinctBindingSetsOp to avoid duplicates (cf. ticket BLZG-913). However, this DISTINCT comes for free: the hash index key is (always?) made up of exactly those variables that are projected into the subquery, so the distinct set of projected bindings has already been computed by the time the hash index is set up.
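To illustrate why the DISTINCT comes for free, here is a minimal sketch in Python. It is not Blazegraph code: the helper names `build_hash_index` and `distinct_projection` are hypothetical, binding sets are modeled as plain dicts, and every join variable is assumed to be bound in every solution.

```python
from collections import defaultdict


def build_hash_index(solutions, join_vars):
    """Group binding sets (dicts) by their values for the join variables.

    Assumption: every join variable is bound in every solution.
    """
    index = defaultdict(list)
    for solution in solutions:
        key = tuple(solution[v] for v in join_vars)
        index[key].append(solution)
    return index


def distinct_projection(index, join_vars):
    """Read the distinct projected bindings directly off the index keys.

    No separate DISTINCT pass is needed: each key occurs exactly once.
    """
    return [dict(zip(join_vars, key)) for key in index]


# Binding sets as they might come out of the outer pattern (?ps, ?p, ?o).
outer = [
    {"ps": "ex:alice", "p": "rdf:type", "o": "ex:Person"},
    {"ps": "ex:alice", "p": "ex:name", "o": "Alice"},
    {"ps": "ex:bob", "p": "ex:name", "o": "Bob"},
]

index = build_hash_index(outer, ["ps"])
projected = distinct_projection(index, ["ps"])
```

Here `projected` holds one binding per distinct ?ps value even though ex:alice appears twice in `outer`, which is what the extra distinct operator would otherwise have to compute.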
Note that the same pattern might also apply to queries in which, for instance, the SELECT subquery is replaced by an OPTIONAL join group. Currently no projection is applied at all in such cases, but it should be possible to use the same pattern (i.e., a distinct projection).
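The OPTIONAL case could look roughly like the sketch below, again in Python rather than actual Blazegraph operators. The function `left_join_with_distinct_seeds` and the callback `eval_group` are hypothetical names. The sketch assumes that the OPTIONAL group depends on the outer solution only through the join variables (e.g. it contains no FILTER that refers to other outer variables).

```python
def left_join_with_distinct_seeds(outer, join_vars, eval_group):
    """Left-join outer solutions with an optional group.

    The group is evaluated once per distinct join key instead of once
    per outer solution. `eval_group(seed)` is a hypothetical callback
    that returns the group's solutions for the given seed binding.
    """
    # Hash index over the outer solutions, keyed on the join variables.
    index = {}
    for solution in outer:
        key = tuple(solution[v] for v in join_vars)
        index.setdefault(key, []).append(solution)

    results = []
    for key, members in index.items():
        seed = dict(zip(join_vars, key))
        extensions = eval_group(seed)  # one evaluation per distinct key
        for member in members:
            if extensions:
                results.extend({**member, **ext} for ext in extensions)
            else:
                # OPTIONAL semantics: keep the unmatched outer solution.
                results.append(member)
    return results


# Toy stand-in for OPTIONAL { ?ps ex:name ?name }.
names = {"ex:alice": ["Alice", "Al"]}
calls = []


def name_group(seed):
    calls.append(seed["ps"])
    return [{"ps": seed["ps"], "name": n} for n in names.get(seed["ps"], [])]


outer = [
    {"ps": "ex:alice", "o": 1},
    {"ps": "ex:alice", "o": 2},
    {"ps": "ex:bob", "o": 3},
]

joined = left_join_with_distinct_seeds(outer, ["ps"], name_group)
```

In this toy run the group is evaluated twice (once per distinct ?ps) rather than three times, and the solution for ex:bob survives without a ?name binding. Results come out grouped by key rather than in input order, which is acceptable for an unordered solution sequence.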
Think about the benefits of such an optimization and a possible generalization. This might be considered in the scope of a general strategy/framework to drop variables that are no longer required.