Details

    • Type: Task
    • Status: Done
    • Priority: Medium
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: BLAZEGRAPH_2_2_0
    • Component/s: None
    • Labels: None

      Description

      Run our usual baseline benchmarks once 2.2.0 is ready to go.


      Note: Please conduct a parameterized test on our benchmarks with maxParallel = 1, 5 (today's default), 10, 15, and 20. It may be useful to test on a subset of the benchmarks to identify a suitable maxParallel value.
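      For what it's worth, a minimal sketch of such a sweep, driven through the Sesame API that Blazegraph exposes. The hint namespace is the standard Blazegraph one; the fully qualified hint name is assumed from the PipelineOp.Annotations.MAX_PARALLEL annotation, and MaxParallelSweep/benchmarkQuery are hypothetical harness names, not existing code:

      {code:java}
      import org.openrdf.query.QueryLanguage;
      import org.openrdf.query.TupleQueryResult;
      import org.openrdf.repository.RepositoryConnection;

      // Hypothetical driver for the maxParallel sweep; the benchmark query
      // string and the connection come from the surrounding harness.
      public class MaxParallelSweep {

          // Values from the note above; 5 is today's default.
          static final int[] MAX_PARALLEL_VALUES = { 1, 5, 10, 15, 20 };

          public static void run(final RepositoryConnection cxn, final String benchmarkQuery)
                  throws Exception {
              for (int maxParallel : MAX_PARALLEL_VALUES) {
                  // Inject the hint as a "magic triple" at the start of the WHERE clause.
                  final String q = "PREFIX hint: <http://www.bigdata.com/queryHints#>\n"
                          + benchmarkQuery.replaceFirst("\\{",
                              "{ hint:Query hint:com.bigdata.bop.PipelineOp.maxParallel \""
                                      + maxParallel + "\" . ");
                  final long start = System.nanoTime();
                  final TupleQueryResult res =
                          cxn.prepareTupleQuery(QueryLanguage.SPARQL, q).evaluate();
                  while (res.hasNext()) res.next(); // drain all solutions
                  res.close();
                  System.out.printf("maxParallel=%d: %.1f ms%n",
                          maxParallel, (System.nanoTime() - start) / 1e6);
              }
          }
      }
      {code}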

      The only benchmark that uses SPARQL UPDATE is BSBM EXPLORE + UPDATE.

      Benchmarks that issue a single query at a time could potentially benefit from increased intra-query parallelism, either in joins or in materialization (for high-cardinality queries with a high solution production rate).

      Note: Please verify that the maxParallel hint is respected in SPARQL UPDATE. If it is not, see BLZG-1962 (Expose maxParallelDefault for global override).
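      A minimal sketch of such a check, assuming the same query-hint namespace as above; whether the hint actually reaches the UPDATE's pipeline operators is exactly what needs to be verified (e.g. via the explain/operator statistics views):

      {code:java}
      import org.openrdf.query.QueryLanguage;
      import org.openrdf.repository.RepositoryConnection;

      // Sketch: a SPARQL UPDATE carrying the maxParallel hint as a magic
      // triple in its WHERE clause. If the hint is respected, the UPDATE's
      // join operators should run with at most 10-way parallelism.
      public class MaxParallelUpdateCheck {

          static final String UPDATE =
              "PREFIX hint: <http://www.bigdata.com/queryHints#>\n"
            + "DELETE { ?s ?p ?o }\n"
            + "INSERT { ?s ?p ?o }\n"
            + "WHERE {\n"
            + "  hint:Query hint:com.bigdata.bop.PipelineOp.maxParallel \"10\" .\n"
            + "  ?s ?p ?o .\n"
            + "}";

          public static void run(final RepositoryConnection cxn) throws Exception {
              cxn.prepareUpdate(QueryLanguage.SPARQL, UPDATE).execute();
          }
      }
      {code}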

      Note: Using the native heap for intermediate solutions (BLZG-1963) or for final solutions (as a ring buffer on the native heap) will reduce heap pressure and could easily change the "good" default for maxParallel, as could the specifics of the hardware platform. The "good" value for chunkSize could also change.

      Also note that, for at least some queries, single-query throughput appears to be relatively robust to changes in maxParallel. Perhaps what we should look for are conditions under which we clearly should increase parallelism in materialization or in joins.

      maxParallel could also be adaptive with respect to total heap pressure and the total number of concurrent operators running on the database. A heavy query (one whose joins or materialization could benefit) could be allowed to spend more of the global "maxParallel" budget, but there would be a global budget (effectively the maximum number of threads). We can also look for operators that are bottlenecks and then dynamically allocate more resources to queries for those operators; bottlenecks can be observed by looking at the operators' input queues, since a producer will block if the consumer's queue is full.
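      As a rough illustration of the global-budget idea (hypothetical code, not the Blazegraph implementation): a counting semaphore can model the shared budget, with heavy operators allowed a higher per-operator cap while the database-wide thread count stays bounded:

      {code:java}
      import java.util.concurrent.Semaphore;

      // Hypothetical sketch of an adaptive, database-wide parallelism budget.
      // Operators draw worker permits from one shared pool; a heavy operator
      // (a join or materialization step that can use the parallelism) gets a
      // higher per-operator cap, but the global total never exceeds the budget.
      public class ParallelismBudget {

          private final Semaphore budget; // effectively the maximum number of threads

          public ParallelismBudget(final int maxThreadsTotal) {
              this.budget = new Semaphore(maxThreadsTotal);
          }

          /**
           * Try to grant one more worker to an operator. A bottleneck operator
           * (detected, e.g., because producers block on its full input queue)
           * would be called with a higher cap than a light one.
           */
          public boolean tryGrow(final int currentWorkers, final int perOperatorCap) {
              return currentWorkers < perOperatorCap && budget.tryAcquire();
          }

          /** Return a permit when a worker finishes. */
          public void release() {
              budget.release();
          }
      }
      {code}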

        Activity

        michaelschmidt added a comment - edited

        Started baseline benchmarks for LUBM, govtrack, BSBM, and SP2Bench on benchmark server on 2.2.0 RC branch (626160fa8e9bcb1627c70a31a9d8e366835e7a75).

        What's left to be done:

        • Trigger geospatial baseline benchmark (will do as soon as I confirm Bryan's machine is free for benchmarks)
        • Agree on baseline configuration, hook in WatDiv into runAll.sh script, run WatDiv, and create running spreadsheet for WatDiv
        • maxParallel tests as described above

        Here are the links to the result sheets (still empty at this point):

        • SP2Bench: https://docs.google.com/spreadsheets/d/1xtOf9C-SycmynC8tZg6aDkjVWU0dcsj-l_da4o0gjtw/edit#gid=2047012484
        • LUBM: https://docs.google.com/spreadsheets/d/12rbe77GOqnRmi4yFjWE1D1hXnd2jB-WPUV2uVJZEmwE/edit#gid=537848077
        • Govtrack: https://docs.google.com/spreadsheets/d/1MFF3kQmzQv7LzBFjbNX7bhc1eLgCVPQH0vQA6SxeHuQ/edit#gid=508166633
        • BSBM: https://docs.google.com/spreadsheets/d/1i-JnEy_W5Pt4AWg87oxg564GYkz3zaxxmIS9H4OKssE/edit#gid=383235545
        • GEO: https://docs.google.com/spreadsheets/d/1ULC9oGZ1npc8md0CdphaPNBRHM6--eAtQrGDyclWeHs/edit#gid=226755243
        michaelschmidt added a comment - edited

        The first round of benchmarks is through; here are the results:

        • SP2Bench: https://docs.google.com/spreadsheets/d/1xtOf9C-SycmynC8tZg6aDkjVWU0dcsj-l_da4o0gjtw/edit#gid=2047012484 -> pretty much stable
        • LUBM: https://docs.google.com/spreadsheets/d/12rbe77GOqnRmi4yFjWE1D1hXnd2jB-WPUV2uVJZEmwE/edit#gid=537848077 -> an interesting speedup for Q10-Q13 (four short-running queries), resulting in a 3.86% overall speedup; I need to check what the cause could be for those queries
        • BSBM: https://docs.google.com/spreadsheets/d/1i-JnEy_W5Pt4AWg87oxg564GYkz3zaxxmIS9H4OKssE/edit#gid=383235545 -> pretty much stable except for the UPDATE scenario, where we observe 5-10% speedups; this is likely due to the native solution set streams we now use for SPARQL UPDATE

        Govtrack failed because the -defaultGraph parameter was not specified in the data loader; the loader appears to be stricter now (I will re-benchmark it, along with all the others, on the latest 2.2 branch). GEO is not yet done; I will report on that separately.
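        For reference, a sketch of a loader invocation that passes the missing flag; the property file, data directory, and graph URI are placeholders, and the exact option set of com.bigdata.rdf.store.DataLoader should be checked against its usage message:

        {code:java}
        import com.bigdata.rdf.store.DataLoader;

        // Sketch: bulk-load with an explicit default graph so the stricter
        // loader accepts the input. All paths and the URI are placeholders.
        public class LoadGovtrack {
            public static void main(final String[] args) throws Exception {
                DataLoader.main(new String[] {
                    "-defaultGraph", "http://example.org/govtrack", // placeholder URI
                    "/path/to/RWStore.properties",                  // placeholder property file
                    "/path/to/govtrack/"                            // placeholder data dir
                });
            }
        }
        {code}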
        michaelschmidt added a comment

        Restarted baseline benchmarks on latest 2.2.0_RC branch, rev. 971ba0349dfabd3be2cc76d81c98690d50fb439d.

        michaelschmidt added a comment - edited

        Here are the latest results for 971ba0349dfabd3be2cc76d81c98690d50fb439d; they confirm the previous results:

        • BSBM: https://docs.google.com/spreadsheets/d/1i-JnEy_W5Pt4AWg87oxg564GYkz3zaxxmIS9H4OKssE/edit#gid=383235545 -> slight performance improvements (~3%) for the single-threaded EXPLORE versions, stable runtime for EXPLORE MT, and significant improvements (~5-10%) for the multi-threaded EXPLORE+UPDATE (due to the native heap solution set changes for UPDATE). Loading time is ~10% slower, which was observed previously and is explained by the slower Sesame parsers.
        • Govtrack: https://docs.google.com/spreadsheets/d/1MFF3kQmzQv7LzBFjbNX7bhc1eLgCVPQH0vQA6SxeHuQ/edit#gid=111157345 -> stable, both in terms of runtime and loading time
        • LUBM: https://docs.google.com/spreadsheets/d/12rbe77GOqnRmi4yFjWE1D1hXnd2jB-WPUV2uVJZEmwE/edit#gid=1025538420 -> 10% speedup due to Q10-Q13, and loading is 10% faster (I don't have a good explanation for either yet; will look into it)
        • SP2B: https://docs.google.com/spreadsheets/d/1xtOf9C-SycmynC8tZg6aDkjVWU0dcsj-l_da4o0gjtw/edit#gid=2047012484 -> stable
        Here are the results for GEO: https://docs.google.com/spreadsheets/d/1ULC9oGZ1npc8md0CdphaPNBRHM6--eAtQrGDyclWeHs/edit#gid=226755243
        -> I observed a ~20% regression; however, my guess is that this was due to wrong -Xmx settings (insufficient memory available for the long-running queries). I re-triggered the benchmarks with the proper setting and should have results by tomorrow evening.

        michaelschmidt added a comment - edited

        I've now got proper results for the geo benchmark: https://docs.google.com/spreadsheets/d/1ULC9oGZ1npc8md0CdphaPNBRHM6--eAtQrGDyclWeHs/edit#gid=1019906563

        It looks like there's still a ~10% regression. I personally would not make that a blocker for the release, but that decision is up to you, bryanthompson & Brad Bebee. I've created a follow-up ticket, BLZG-2025, to look into the results. I will definitely do a quick sanity check soon, but we should decide how to proceed with 2.2.0 in case there are no obvious setup issues and this is indeed a regression.

        michaelschmidt added a comment

        Having rerun the GeoSpatial benchmarks using the 2.1.0 snapshot release (the same one that was 10% faster in our previous baseline run), I was not able to reproduce the performance we had observed back then; see https://docs.google.com/spreadsheets/d/1ULC9oGZ1npc8md0CdphaPNBRHM6--eAtQrGDyclWeHs/edit#gid=1610715386.

        So the most reasonable explanation is that the configuration differed back then. One possibility is that the baseline experiments were run in analytics mode; unfortunately, I can't verify that from the output files, and there's no note in the document either. I've just restarted the old experiments in analytics mode to see whether that makes a difference.
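        For reference, analytics mode can be pinned per query via a query hint, which also leaves a record of the mode in the query text itself (a minimal sketch, assuming the documented hint:analytic hint name; the class and method names are hypothetical):

        {code:java}
        import org.openrdf.query.QueryLanguage;
        import org.openrdf.query.TupleQueryResult;
        import org.openrdf.repository.RepositoryConnection;

        // Sketch: request the analytic (native heap) query mode per query, so
        // the mode used in a benchmark run is visible in the query itself
        // rather than hidden in external configuration.
        public class AnalyticModeQuery {

            static final String QUERY =
                "PREFIX hint: <http://www.bigdata.com/queryHints#>\n"
              + "SELECT (COUNT(*) AS ?cnt)\n"
              + "WHERE {\n"
              + "  hint:Query hint:analytic \"true\" .\n"
              + "  ?s ?p ?o .\n"
              + "}";

            public static void run(final RepositoryConnection cxn) throws Exception {
                final TupleQueryResult res =
                        cxn.prepareTupleQuery(QueryLanguage.SPARQL, QUERY).evaluate();
                try {
                    while (res.hasNext()) res.next(); // drain all solutions
                } finally {
                    res.close();
                }
            }
        }
        {code}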

        In any case, there seems to be no regression in the GeoSpatial benchmark; the difference appears to be caused by a change in the baseline setting. Given that the previous baseline was faster, I'm trying to understand what changed. But this is definitely not a blocker for the upcoming release.

        Brad Bebee bryanthompson

        michaelschmidt added a comment

        Closing this one as resolved. In summary, all benchmark results except geospatial are as good as or better than the previous ones, and the geospatial regression is due to a configuration change (see BLZG-2025 for the ongoing discussion).


          People

          • Assignee: michaelschmidt
          • Reporter: michaelschmidt
          • Votes: 0
          • Watchers: 4

            Dates

            • Created:
            • Updated:
            • Resolved: