Details

      Description

      Implement transparent proxying of the REST API for HA. There are several benefits to this approach.

      1. Clients do not need to know which service is the leader. A write can be directed to any service and will be automatically proxied to the quorum leader.

      2. Clients do not need to know when a service is resynchronizing, rebuilding, or otherwise not 100% online. Read requests can be transparently proxied to any service that is joined with the met quorum.

      The main challenges are to minimize the overhead associated with proxying requests and to ensure that the proxy behavior conforms to the HTTP protocol.
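
      A minimal sketch of the intended routing decision is shown below. The class and helper methods (leaderServiceURL(), anyJoinedServiceURL(), isJoinedWithMetQuorum()) are hypothetical placeholders for illustration, not an existing bigdata API; the write test (PUT/POST/DELETE) follows the note in the code fragment further below.

              import javax.servlet.http.HttpServletRequest;

              /*
               * Hedged sketch only: the class and helper methods are hypothetical
               * placeholders, not an existing bigdata API.
               */
              abstract class ProxyRoutingSketch {

                  /** URL of the quorum leader (placeholder). */
                  abstract String leaderServiceURL();

                  /** URL of some service joined with the met quorum (placeholder). */
                  abstract String anyJoinedServiceURL();

                  /** Whether this service is joined with the met quorum (placeholder). */
                  abstract boolean isJoinedWithMetQuorum();

                  /** Returns the URL to proxy the request to, or null to serve it locally. */
                  String chooseProxyTarget(final HttpServletRequest req) {
                      final String m = req.getMethod();
                      final boolean isWrite = "PUT".equals(m) || "POST".equals(m) || "DELETE".equals(m);
                      if (isWrite) {
                          return leaderServiceURL(); // writes go to the quorum leader
                      }
                      if (!isJoinedWithMetQuorum()) {
                          return anyJoinedServiceURL(); // cannot answer locally; proxy the read
                      }
                      return null; // serve the read locally
                  }
              }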

      HTTP proxying is simple in theory but has a lot of wrinkles in practice, and it needs to be done intelligently to support large numbers of concurrent connections. We have done some research into possible libraries that could support this feature. The most likely candidate is LittleProxy [1,2], which is Apache 2.0 licensed and uses Netty for non-blocking NIO.

              /*
               * Note: This is a summary of what we need to look at to proxy a
               * request.
               * 
               * Note: We need to proxy PUT, POST, DELETE requests to the quorum
               * leader.
               * 
               * Note: If an NSS service is NOT joined with a met quorum, but there is
               * a met quorum, then we should just proxy the request to the met
               * quorum. This includes both reads and writes.
               */
              
              // The request method determines whether we must proxy to the
              // quorum leader (PUT, POST, DELETE) or may serve/proxy a read.
              final String method = req.getMethod();

              // All request headers must be copied onto the proxied request.
              final Enumeration<String> names = req.getHeaderNames();
              while (names.hasMoreElements()) {
                  final String name = names.nextElement();
                  final Enumeration<String> values = req.getHeaders(name);
                  // ... copy each (name, value) pair onto the proxied request.
              }

              // The request entity (if any) must be streamed to the target.
              req.getInputStream();

              // Note: The response likewise has a status code plus headers and
              // an entity, all of which must be relayed back to the client.
              
      

      [1] http://www.littleshoot.org/littleproxy
      [2] http://www.littleshoot.org/littleproxy/dependencies.html

        Activity

        bryanthompson added a comment -

        Identified a problem with the GangliaLBSPolicy: bigdata and bigdata-ganglia use the canonical (fully qualified) hostname while ganglia uses the local name of the host, so the host metrics are not being obtained by the GangliaLBSPolicy. While it is possible to override the hostname for ganglia starting with 3.2.x, doing so is quite a pain and could involve full restarts of gmond on all machines in the cluster. I have not yet resolved this issue, but I have added the ability to force the bigdata-ganglia implementation to use a hostname specified in an environment variable.

        Added the ability to override the hostname for bigdata-ganglia using the com.bigdata.hostname environment variable per [1].
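
        A minimal sketch of how such an override might be consulted, assuming a simple lookup order; the fallback chain shown here is illustrative only, not necessarily the actual bigdata-ganglia code.

            import java.net.InetAddress;
            import java.net.UnknownHostException;

            // Illustrative lookup order only; the actual bigdata-ganglia code may differ.
            class HostnameOverrideSketch {
                static String resolveHostname() throws UnknownHostException {
                    // Prefer an explicit override (system property or environment variable).
                    String hostname = System.getProperty("com.bigdata.hostname");
                    if (hostname == null)
                        hostname = System.getenv("com.bigdata.hostname");
                    if (hostname == null) {
                        // Fall back to the canonical (fully qualified) hostname.
                        hostname = InetAddress.getLocalHost().getCanonicalHostName();
                    }
                    return hostname;
                }
            }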

        Updated the pom.xml and build.properties files for the bigdata-ganglia-1.0.3 release.

        Published that release to our maven repo.

        [1] http://trac.bigdata.com/ticket/886 (Provide workaround for bad reverse DNS setups)

        Committed revision r8216

        bryanthompson added a comment -

        This commit includes fixes for BLZG-994 (content-negotiation) and #624 (HA Load Balancer). I have run through the NSS, AST evaluation, and QUADS mode test suites and everything is green. The TestAll_LBS test suite is also green (HA).


        - CONNEG was broken in previous releases and would return the available Content-Type corresponding to the least desired MIME type as specified by the Accept header. See BLZG-994. Changes to ConnegScore, ConnegUtil, TestConneg.


        - RemoteRepository: A bug was identified where the openrdf binary RDF interchange type could not be used because a non-null charset would cause a Reader to be allocated rather than an InputStream within the BackgroundGraphResult. Historically, due to BLZG-994, this interchange type was not preferred and hence this code path was not tested. The fix was to use the default charset for the format associated with the Content-Type of the response unless overridden by an explicit charset in the encoding (see the sketch at the end of this comment).


        - Added a new LBS policy (CountersLBSPolicy) based on the /bigdata/counters servlet. This policy is more chatty than the GangliaLBSPolicy, but it can be used in environments that do not support multicast and can be secured using standard techniques for httpd. The GangliaLBSPolicy was heavily refactored to create an abstract base class that is now shared by both the CountersLBSPolicy and the GangliaLBSPolicy. Added documentation to web.xml and the HALoadBalancer page of the wiki. See BLZG-721.


        - Release a new bigdata-ganglia.jar (v1.0.4). This release permits the Comparator to be null, which is useful since we want to order the hosts based on our IHostScoringRule rather than a simple ganglia metric comparison.


        - AbstractStatisticsCollection: Added @Override tags and a FIXME on getCounters().


        - CounterSet: made attributes private and final; suppressed some unchecked conversion / raw type warnings; added @Override annotations.


        - ICounterSetSelector: expanded the interface slightly to allow optional filtering for HistoryInstruments (this was previously implicit and mandatory). This was necessary in order to support XML rendering of /bigdata/counters.


        - CounterSetFormat: Added to support CONNEG for the different kinds of counter set interchange (text/plain, text/html, application/xml). This was in support of the new CountersLBSPolicy.


        - IOStatCollector, VMStatCollector: Fixed some bugs in the OSX platform metrics collectors, mostly around data races.


        - BigdataSailRemoteRepositoryConnection: added a link to BLZG-988 (Set timeout on remote query). I have not worked on this ticket yet, but these comments mark the integration points. The other integration point is BigdataRDFContext.newQuery(), which is also linked to the ticket in this commit.


        - CountersServlet: modified to support CONNEG.


        - ConnegOptions: added toString() and cleaned up.


        - jetty.xml: refactored per guidance from Webtide.


        - web.xml: comments on the CountersLBSPolicy.

        Committed revision r8294.
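
        A hedged sketch of the charset selection described in the RemoteRepository item above. The class and method names are illustrative only (this is not the actual RemoteRepository / BackgroundGraphResult code); it shows just the intended rule: an explicit charset in the Content-Type wins, otherwise the format's default charset is used, and a null result indicates a binary format that should be consumed as an InputStream rather than a Reader.

            import java.nio.charset.Charset;

            import org.openrdf.rio.RDFFormat;

            // Illustrative helper only; not the actual RemoteRepository code.
            class ResponseCharsetSketch {
                /**
                 * Returns the charset to decode the response with, or null if the
                 * format is binary and the body should be read as an InputStream.
                 */
                static Charset charsetFor(final String contentType, final RDFFormat format) {
                    if (contentType != null) {
                        // An explicit charset parameter on the Content-Type wins.
                        for (String param : contentType.split(";")) {
                            param = param.trim();
                            if (param.toLowerCase().startsWith("charset=")) {
                                return Charset.forName(param.substring("charset=".length()));
                            }
                        }
                    }
                    // Otherwise use the default charset declared by the format
                    // (null for binary formats such as the openrdf binary RDF format).
                    return format.getCharset();
                }
            }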

        bryanthompson added a comment -

        Modified the HA load balancer:


        - Added additional normalization logic.
        - Added counters for tracking the #of local forwards, the #of proxies, and the #of errors.
        - Increased the information rendered on the /status page for the HA load balancer.

        Note: I have not yet fixed up the unit tests for the normalized scoring policy. I expect 2 failures for those tests.

        Committed revision r8310.

        bryanthompson added a comment -

        The HTTP-based performance counters version is working.

        Some observations:


        - The client can observe HTTP 502 (Bad Gateway), 503 (Service Unavailable), and 504 (Gateway Timeout) errors when a very heavy workload is placed on the system. This is somewhat ungraceful, and the root cause appears to be a fixed upper capacity on the jetty thread pool. I will continue to look at this. The client SHOULD retry on a 504, but this needs to be explicitly configured in the client. The other errors SHOULD be masked into something that causes the client to retry (such as a 503). However, there is a danger that there is a real problem with the HTTP connection between the servers, rather than just an intermittent failure caused by blocking on a thread pool. (A minimal client-side retry sketch appears after these observations.)


        - NGINX (LBS is not available under their open source version) or HAProxy is probably required to get linear scaling of query throughput. In my tests, one machine can deliver up to approximately 43k QMpH. Three machines using the CountersLBSPolicy can deliver up to 75k QMpH. This shows that the workload is being distributed, but does not show linear scaling. However, the same servers can deliver linear scaling if the workload is evenly distributed. The solution is to use a round-robin over the services and then prefer to serve each request locally, proxying only for (a) updates and (b) severe load imbalances.
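
        A minimal client-side retry sketch for the 502/503/504 case, assuming plain java.net.HttpURLConnection and a hypothetical fixed retry budget; this is illustrative only, not the actual client code.

            import java.io.InputStream;
            import java.net.HttpURLConnection;
            import java.net.URL;

            // Illustrative only: retry GETs on 502/503/504 with a small backoff.
            class RetryingGetSketch {
                static InputStream getWithRetry(final URL url) throws Exception {
                    final int maxAttempts = 3; // hypothetical retry budget
                    for (int attempt = 1; ; attempt++) {
                        final HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                        final int code = conn.getResponseCode();
                        if (code == 502 || code == 503 || code == 504) {
                            conn.disconnect();
                            if (attempt >= maxAttempts)
                                throw new RuntimeException("Gave up after HTTP " + code);
                            Thread.sleep(250L * attempt); // simple linear backoff
                            continue;
                        }
                        return conn.getInputStream();
                    }
                }
            }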

        Next steps:


        - Continue to benchmark variations on the LBS deployment.
        - Examine the impact of using NGINX (LBS is not available under their open source version) or HAProxy (preferred).
        - Examine how to minimize the errors observed by a client when a service is down and a round-robin (HAProxy) is being applied over the services. If there is a basic health check protocol, then the external round-robin can avoid the down service, notice when the service is online again, and then begin to deliver requests to that service once more.
        - Examine an acceleration factor (a linear interpretation of load leads to slow correction of load imbalances; a non-linear factor might help to correct those imbalances more quickly).
        - The NetQuery class in the BSBM benchmark should (a) be smart about 502, 503, and 504 exceptions, which is necessary for robust behavior of the benchmark against an HA configuration; and (b) use an HTTP client implementation with better throughput and concurrency (Apache or jetty), which is an optimization that would not be in keeping with the benchmark rules but might help us to tune the performance of the platform further.
        - A sticky tenant LBS policy could be developed by examining the /namespace/NAMESPACE component of the Request-URI (see the hedged sketch after this list). However, some provision would need to be made among the servers to coordinate the mapping from a tenant namespace onto a service (or services, depending on the amount of activity for the tenant).
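
        A hedged sketch of how a sticky tenant policy might extract the namespace and map it onto a service. Everything here (the class, the example URI, the hash-based mapping) is a hypothetical illustration of the idea, not a proposed bigdata API; a real policy would also need the inter-server coordination mentioned above.

            import java.util.List;

            // Hypothetical illustration of a sticky tenant mapping; not a bigdata API.
            class StickyTenantSketch {
                /**
                 * Extracts the tenant namespace from a Request-URI such as
                 * "/bigdata/namespace/kb/sparql", or returns null if none is present.
                 */
                static String namespaceOf(final String requestURI) {
                    final String marker = "/namespace/";
                    final int i = requestURI.indexOf(marker);
                    if (i < 0)
                        return null;
                    final int start = i + marker.length();
                    final int end = requestURI.indexOf('/', start);
                    return end < 0 ? requestURI.substring(start) : requestURI.substring(start, end);
                }

                /**
                 * Maps a namespace onto one of the joined services (simple hash here;
                 * coordination among the servers would be needed in practice).
                 */
                static String serviceFor(final String namespace, final List<String> joinedServiceURLs) {
                    final int idx = Math.floorMod(namespace.hashCode(), joinedServiceURLs.size());
                    return joinedServiceURLs.get(idx);
                }
            }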

        bryanthompson added a comment -

        Feature is implemented.


          People

          • Assignee:
            bryanthompson
          • Reporter:
            bryanthompson
          • Votes: 0
          • Watchers: 3
