Status: Closed - Won't Fix
Resolution: Cannot Reproduce
Affects Version/s: BIGDATA_RELEASE_1_2_2
Fix Version/s: None
Component/s: Bigdata SAIL
This ticket concerns the compile-time constant:
The suggested resolution is to:
a) replace this with two compile-time constants, one for each of the two usages
b) substantially increase the value for the usage in _hasNext(long) (e.g. to 100000)
c) add a configuration option for that usage
d) both b) and c)
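As a sketch of suggestions a) through c), the following illustrates one way the shared constant could be split in two, with the _hasNext(long) usage given a much larger default and made configurable via a system property. All identifiers and values here are hypothetical, chosen for illustration; they are not Bigdata's actual names.

```java
// Hypothetical sketch (names and values are illustrative, not Bigdata's
// actual identifiers): the single shared compile-time constant is split
// into two, and the _hasNext(long) usage becomes configurable.
public class ChunkTimeoutConfig {

    /** Constant retained for the first usage (illustrative value). */
    static final long FIRST_USAGE_TIMEOUT = 1000;

    /** Hypothetical configuration property for the _hasNext(long) usage. */
    static final String HASNEXT_TIMEOUT_PROPERTY = "com.example.hasNextTimeout";

    /** Substantially larger default for _hasNext(long), per suggestion b). */
    static final long HASNEXT_TIMEOUT_DEFAULT = 100000;

    /** Configurable value for the _hasNext(long) usage, per suggestion c). */
    static final long HASNEXT_TIMEOUT =
            Long.getLong(HASNEXT_TIMEOUT_PROPERTY, HASNEXT_TIMEOUT_DEFAULT);
}
```

With this shape, option d) falls out naturally: the default is the larger value from b), and deployments can override it without recompiling.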
The commentary is based on test results, and I will start with a description of the tests, then of the test hardware.
The test dataset and queries
The test dataset is approximately 57,000 quads of test data conforming to Syapse proprietary ontologies, spread over 24 named graphs.
The test queries are 11 moderately easy SPARQL queries involving only a few joins each.
The test queries are generated by Python code, and the test harness issues the 11 queries three times over, in series, against the SPARQL endpoint.
The test harness can also start multiple parallel clients, and we consider:
a) a single client asking the 11 * 3 queries
b) six parallel clients asking a total of 6 * 11 * 3 queries; note that the queries are asked in the same order by each parallel client, which maximizes both potential contention and potential cache hits
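The harness shape described above can be sketched as follows. This is an illustrative reconstruction, not Syapse's actual harness (which is Python): each client thread runs the same query list in identical order, repeated three times, and the query-issuing step is injected so that in practice it would POST to the SPARQL endpoint.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Illustrative sketch of the test harness: nClients threads each issue the
// same queries, in the same order, 'repetitions' times over. Identical
// ordering maximizes both potential contention and potential cache hits.
public class ParallelQueryHarness {

    static void run(int nClients, int repetitions, List<String> queries,
                    Consumer<String> issueQuery) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(nClients);
        CountDownLatch done = new CountDownLatch(nClients);
        for (int c = 0; c < nClients; c++) {
            pool.execute(() -> {
                // Each client asks its queries in series, in identical order.
                for (int r = 0; r < repetitions; r++) {
                    for (String q : queries) {
                        issueQuery.accept(q); // e.g. POST to the SPARQL endpoint
                    }
                }
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
    }
}
```

Case a) above is run(1, 3, queries, ...) and case b) is run(6, 3, queries, ...).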
The test hardware
The primary hardware is my development machine, a MacBook Pro with an SSD and a quad-core processor (with hyperthreading, for 8 hardware threads).
The secondary hardware is an AWS instance.
Prior to any changes, at r7390, we observed the following conundrum.
For a single client the 33 queries take a wall time of 13.3s, but the client CPU time is 1.6s and the server CPU time is 1.8s, leaving roughly 10 seconds unaccounted for
- even assuming synchronized single threading (on a quad-core box)
With 6 parallel clients, performance is somewhat better. The wall time is 14.5s, the client CPU time is 14.3s and the server CPU time is 10.4s. That is still only about 20% load (24.7s of CPU against 8 threads x 14.5s of wall time), but a lot better than in the one-client case.
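The load figures above follow from a simple ratio of total CPU time to available CPU time; a minimal helper makes the arithmetic explicit (the method name is ours, for illustration only):

```java
// Back-of-the-envelope check of the load figures quoted above:
// total CPU time divided by available CPU time (threads x wall time).
public class LoadCheck {
    static double load(double clientCpuSec, double serverCpuSec,
                       int hwThreads, double wallTimeSec) {
        return (clientCpuSec + serverCpuSec) / (hwThreads * wallTimeSec);
    }
}
```

For the six-client case this gives (14.3 + 10.4) / (8 x 14.5), roughly 21%; the single-client case works out to about 3%, which is the conundrum.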
Profiling with YourKit, using wall-time measurements and tracing, drew attention to the _hasNext() method above.
With the adjustments described above, the system runs a lot faster.
Single client: wall time 2.4s, client CPU 1.6s, server CPU 1.9s
Six parallel clients: wall time 5.1s, client CPU 14.5s, server CPU 6.1s