Blazegraph (by SYSTAP) / BLZG-851

Setup AWS image for HA replication cluster

    Details

      Description

      Develop a recipe for building an AWS image for an HA replication cluster, the actual AWS images, and a recipe for deploying and configuring those images as a highly available replication cluster.

        Activity

        bryanthompson added a comment -

        I had some more thoughts as they relate to EC2 deployments and SSD. SSDs can deliver vastly more than the 4,000 IOPS that you can obtain for EBS with provisioned IOPS. Therefore, we probably want to set up on hi1.4xlarge instances with an EBS volume for durable storage of the service directory, transaction logs, and snapshots. The data directory would be on the SSD disk. This would allow very large database instances with high IOPS using the instance SSDs and durable storage using EBS. A nice combination.

        I suggest that the deployment concept for the HA replication cluster might look like: Use an instance type with local SSD. For example, either of the following might work, but the Storage Optimized is likely to be the better choice for bigdata:

        Family              Type         Arch    vCPU  ECU  Memory (GiB)  Instance Storage (GB)  EBS-Optimized  Network
        Memory optimized    cr1.8xlarge  64-bit  32    88   244           2 x 120 SSD            -              10 Gigabit
        Storage optimized   hi1.4xlarge  64-bit  16    35   60.5          2 x 1,024 SSD          -              10 Gigabit   <== best choice
        

        Then put the snapshots, transaction logs, and the process log files on EBS so that they are durable. We use sequential reads and writes for the snapshots and transaction logs, so EBS should be OK, and provisioned IOPS could be used if necessary to ensure good sequential write performance.
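        As a sketch of what provisioning such a durable volume might look like with the AWS CLI (the size, IOPS value, and availability zone below are illustrative placeholders, not settings from this ticket):

```shell
# Hypothetical sketch: create a Provisioned IOPS (io1) EBS volume for
# the durable directories (snapshots, transaction logs, service dir).
# Size and IOPS are placeholder values chosen for illustration only.
aws ec2 create-volume \
    --availability-zone us-east-1a \
    --volume-type io1 \
    --size 200 \
    --iops 2000
```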

        Since the SSDs are instance disks, the data on the SSDs will not survive a restart. Therefore, if you restart an instance node, you would create a new journal from the most recent snapshot. On startup, the HAJournalServer would automatically roll forward the transaction logs since that snapshot. This could all be automated on the startup of an SSD-backed instance node.
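        The split between instance storage and EBS described above might be provisioned roughly as follows (device names and mount points are assumptions; the actual devices depend on the instance type and AMI):

```shell
# Instance-store SSD: holds the journal (data directory). Its contents
# do not survive a stop/start, so nothing durable lives here.
mkfs.ext4 /dev/xvdb
mkdir -p /bigdata/data
mount /dev/xvdb /bigdata/data

# EBS volume: durable storage for the service directory, snapshots,
# transaction logs (HALogs), and process log files.
mkfs.ext4 /dev/xvdf
mkdir -p /bigdata/durable
mount /dev/xvdf /bigdata/durable
mkdir -p /bigdata/durable/snapshot /bigdata/durable/HALog
```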

        bryanthompson added a comment -

        We should be looking at the new generation of EC2 storage optimized instances:

        Storage Optimized - Current Generation

        Instance Type  vCPU  ECU  Memory (GiB)  Instance Storage (GB)  Linux/UNIX Usage
        i2.xlarge      4     14   30.5          1 x 800 SSD            $0.853 per Hour
        i2.2xlarge     8     27   61            2 x 800 SSD            $1.705 per Hour
        i2.4xlarge     16    53   122           4 x 800 SSD            $3.410 per Hour
        i2.8xlarge     32    104  244           8 x 800 SSD            $6.820 per Hour
        hs1.8xlarge    16    35   117           24 x 2,048             $4.600 per Hour
        
        bryanthompson added a comment -
        - The services are not joining, so the quorum never meets.  They
          are all added as members, getting into the pipeline, and a
          leader election is being performed.  But progress stops there.
          I suspect that this is a DNS issue.  /etc/hosts?
          /etc/resolv.conf?  Maybe use IPs rather than hostnames?
        
        - Override the log files using a template.  For debugging, you can
          turn on additional logging.  E.g.:
        
        #log4j.logger.com.bigdata.ha=INFO
        
        - Chase down the SystemProperty vs. Property issue.  Why is
          SystemProperty not working on EC2?  It works on Ubuntu on my
          local cluster.  It works on my laptop.
        
        - HARestore needs to run on machine instance restart.  E.g., from
          /etc/init.d/rc.local.  However, it needs to run before we start
          bigdataHA.  Another possibility is to have it run in bigdataHA as a
          precursor to launching the ServiceStarter (which starts jini and
          bigdata).
        
          The pattern should be: check the DATA_DIR.  If the journal is
          not there, then check the snapshotDir and haLogDir.  If they
          exist, then use the following command to re-generate the
          bigdata journal on the SSD partition.
        
          HARestore -o journalFile snapshotDir haLogDir
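          Sketched as boot-time shell logic (the directory paths and
          journal file name are assumptions for illustration; only the
          "HARestore -o journalFile snapshotDir haLogDir" form comes
          from the note above):

```shell
# Hypothetical restore-on-boot check. Paths below are placeholders;
# only the HARestore argument order follows the ticket.
DATA_DIR=/bigdata/data
SNAPSHOT_DIR=/bigdata/durable/snapshot
HALOG_DIR=/bigdata/durable/HALog
JOURNAL="$DATA_DIR/bigdata-ha.jnl"

# If the journal is missing from the SSD but durable state exists on
# EBS, rebuild the journal from the last snapshot; HAJournalServer
# then rolls forward the transaction logs on startup.
if [ ! -f "$JOURNAL" ] && [ -d "$SNAPSHOT_DIR" ] && [ -d "$HALOG_DIR" ]; then
    HARestore -o "$JOURNAL" "$SNAPSHOT_DIR" "$HALOG_DIR"
fi
```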
        
        - The firewall needs to be open only for the specific hosts, not
          everyone on Amazon.
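          One way to restrict the firewall is with EC2 security group
          rules scoped to the other cluster members (the security group
          ID, port, and peer address below are placeholders; bigdata's
          actual service ports depend on the configuration):

```shell
# Hypothetical sketch: open a service port only to a specific cluster
# member's address rather than 0.0.0.0/0. All values are placeholders.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 2181 \
    --cidr 10.0.1.12/32
```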
        
        - bigdata should not be running as root.  Install a bigdata user
          and group, and sudo to the bigdata user.  Probably in the
          init.d/bigdataHA script or in the script it shells out to in
          order to start the ServiceStarter.
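          A minimal sketch of that privilege drop, assuming a system
          "bigdata" account and an install under /var/lib/bigdata (both
          are assumptions, as is the startHAServices path):

```shell
# Hypothetical: create a dedicated unprivileged account and hand the
# install directory to it. Names and paths are placeholders.
groupadd --system bigdata
useradd --system --gid bigdata --home /var/lib/bigdata bigdata
chown -R bigdata:bigdata /var/lib/bigdata

# Inside the init script, drop privileges before launching the
# ServiceStarter rather than running it as root.
su -s /bin/sh bigdata -c "/var/lib/bigdata/bin/startHAServices"
```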
        
        - Ganglia needs to be set up.  This may require unicast.  At
          this moment, this is only an issue for the HA Load balancer
          testing (EMC).
        
        danielmekonnen added a comment -

        startHAServices must run from the FED_DIR. Following start) (line 68) in /etc/init.d/bigdataHA, I've added the line cd /var/lib/bigdata, which then allows resource files to be discovered correctly.

        The jetty web root appears to be taken from wherever /etc/init.d/bigdataHA is launched (even if that is /etc/init.d!). Setting resourceBase as follows directs jetty to the appropriate root:

        <Property name="jetty.resourceBase" default="/var/lib/bigdata/var/jetty" />
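        Putting the working-directory fix into context, the start)
        branch of the init script might look roughly like this (the
        surrounding script structure is an assumption; only the
        cd /var/lib/bigdata line comes from the comment above):

```shell
#!/bin/sh
# Hypothetical sketch of the relevant part of /etc/init.d/bigdataHA.
case "$1" in
  start)
    # Run from FED_DIR so that resource files (and jetty's
    # resourceBase) are discovered correctly.
    cd /var/lib/bigdata || exit 1
    ./bin/startHAServices &
    ;;
esac
```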

        bryanthompson added a comment -

        Daniel, please update this ticket.


          People

          • Assignee: beebs Brad Bebee
          • Reporter: bryanthompson
          • Votes: 0
          • Watchers: 3