Archive for February, 2010

Converting instance-store instances to EBS instances (AWS EC2)

February 27th, 2010

A month or two ago, Amazon Web Services (AWS) announced that EC2 instances can now boot from an Elastic Block Store (EBS) volume.  This seems like a small change, but it has in fact opened up a world of possibilities in the Elastic Compute Cloud (EC2).

EBS provides “block level storage volumes for use with Amazon EC2 instances.”  Keep in mind that prior to this, instances were limited to booting off of S3-backed Amazon Machine Images (AMIs), whose root filesystem lives on non-persistent instance storage.  This meant two things:

  1. instance-store instances cannot be “stopped,” only rebooted or terminated
  2. terminating an instance-store instance (or losing its host) discarded every change made since boot; a replacement instance started again from the AMI defaults

Since EBS originally debuted as a high-performance block device that could be attached to a given instance, booting off of EBS AMIs has also proven to be faster than the traditional instance-store boot.

Another benefit is that EC2 instances booted off of EBS volumes can be stopped, which effectively equates to shutting down a machine in the real world.  You still pay for the EBS volume's storage while the instance is stopped (instance-hour charges stop accruing), and all of the changes made to the instance will persist after you start it again.

The last major benefit of booting off the EBS volumes is that AWS has made it easy for you to create a new EBS AMI from a running EBS-backed instance.  In the Console, right-clicking on an EBS instance will yield a new option, “Create Image (EBS AMI).”  This will shut down your instance and then generate a new EBS AMI from the contents of the instance's disk.  In my experience this command fails roughly 40% of the time, which can be a little frustrating.  I’ve found that if you put an instance into ‘stopped’ state before creating the EBS AMI, the process has a higher chance of success, but it will still take anywhere from 20 to 45 minutes.

The rest of this article will focus on converting an instance-store AMI into an EBS AMI.

In order to perform this conversion, you will need an instance-store AMI of the base OS you’d like to run (for the purposes of this article I used alestic’s Debian 5.0 base image), and access to EC2 via the CLI as well as the web console (it’s all doable from the CLI, but some of the tasks are a LOT easier and quicker through the console).  The steps I did in the console are suffixed with [console], and the commands run from the CLI are prefixed with #.

1) Booting an instance-store AMI – I executed the following to get a list of the images that fit my criteria (32bit, Debian, base install):

# ./ec2-describe-images --region eu-west-1 --all | grep -i lenny-base | grep i386

IMAGE   ami-b13a6bf4    alestic-32-us-west-1/debian-5.0-lenny-base-20090804.manifest.xml
IMAGE   ami-b33a6bf6    alestic-32-us-west-1/debian-5.0-lenny-base-20091011.manifest.xml

Note:  ec2-describe-images outputs far more data than this; the output above has been trimmed for brevity.

Once you have the AMI (newer is better, generally speaking), boot an instance with this AMI:

# ./ec2-run-instances --region eu-west-1 -k $keypair ami-b33a6bf6

Note: $keypair in this case is the name of the keypair used to SSH into the server

2) Customizing the EBS volume – After the instance is up and running, find out which availability zone it is in.  If the region is eu-west-1, the availability zone will be either eu-west-1a or eu-west-1b.  Then create a 10 GB EBS volume in the same zone [console]; a volume can only be attached to an instance in its own availability zone.

Why 10 GB?  10 GB is the maximum size for an S3-backed AMI, which makes a 10 GB volume the largest any instance-store root filesystem will need.  EBS AMIs can live on larger volumes (all the way up to 1 TB in size), and you can easily move to one once you have an EBS-backed AMI.

After the EBS volume has been created, attach it to the running instance [console].  Remember the device name you chose for the volume (/dev/sdf for example).
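The zone check is easy to get wrong when juggling several regions, so here is a minimal Python sketch of it.  It assumes that an availability zone name is simply the region name followed by a single lowercase letter (as in eu-west-1a); the function names are made up for illustration and are not part of any AWS tool.

```python
import re

# Assumption: a zone name is "<region><letter>", e.g. "eu-west-1a".
_ZONE_PATTERN = re.compile(r"([a-z]+(?:-[a-z]+)+-\d+)([a-z])")


def region_from_zone(zone: str) -> str:
    """Return the region part of an availability zone name."""
    match = _ZONE_PATTERN.fullmatch(zone)
    if match is None:
        raise ValueError(f"unexpected availability zone name: {zone!r}")
    return match.group(1)


def can_attach(instance_zone: str, volume_zone: str) -> bool:
    """A volume can only be attached to an instance in the same zone."""
    return instance_zone == volume_zone


if __name__ == "__main__":
    print(region_from_zone("eu-west-1a"))        # region of the instance
    print(can_attach("eu-west-1a", "eu-west-1b"))  # different zones
```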

In a root shell on the instance:

# mkfs.ext3 /dev/sdf

# mkdir /mnt/target && mount /dev/sdf /mnt/target

# rsync -avHx / /mnt/target

# rsync -avHx /dev /mnt/target

# sync;sync;sync;sync && umount /mnt/target

The above commands did the following:

  • formatted the entire volume /dev/sdf as an ext3 filesystem
  • created directory /mnt/target and mounted /dev/sdf at /mnt/target
  • rsync’d the root instance-store filesystem to the EBS volume
  • synchronized the /dev directory from the instance-store filesystem
  • flushed all pending write ops and unmounted the EBS volume
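If the device name or mount point differs from the example, the same sequence can be generated rather than retyped.  The following Python sketch only builds and prints the commands listed above; it does not run them.  The helper names are hypothetical, and a single sync is used where the original ran it four times.

```python
import shlex


def copy_root_commands(device="/dev/sdf", target="/mnt/target"):
    """Return the walkthrough's commands as argv lists, in order."""
    return [
        ["mkfs.ext3", device],               # format the whole volume as ext3
        ["mkdir", target],                   # create the mount point
        ["mount", device, target],           # mount the EBS volume
        ["rsync", "-avHx", "/", target],     # copy the root filesystem only (-x)
        ["rsync", "-avHx", "/dev", target],  # copy /dev, which -x skipped
        ["sync"],                            # flush pending writes
        ["umount", target],                  # detach the filesystem cleanly
    ]


def render(commands):
    """Turn argv lists into shell-safe command lines."""
    return [" ".join(shlex.quote(arg) for arg in argv) for argv in commands]


if __name__ == "__main__":
    for line in render(copy_root_commands()):
        print("# " + line)
```

Printing the commands first makes it easy to review them before pasting them into a root shell, which matters here because mkfs.ext3 destroys whatever is on the device.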

3) Creating the AMI – At this point, you should have a 10 GB EBS volume that shows as available [console].  Simply right-click on the volume and create a snapshot of it [console].  Once the snapshot has completed, choose a kernel from the list of those available on EC2, which you can get with the following command:

# ./ec2-describe-images -o amazon | grep -i xenu

Store the AKI for the kernel you want to use in the environment variable AKI:

# export AKI=aki-xxxxxxxx

Up to this point, we have booted an instance-store AMI, created an EBS volume, synchronized the instance-store filesystem with the EBS volume, and created a snapshot of the EBS volume.  The only thing we need to do now is associate an AKI with the snapshot, and register the end result as an AMI in the EC2 repository.

# ./ec2-register --region eu-west-1 -s $SNAP --name "$NAME" --description "$DESC" --architecture $ARCH \
    --kernel $AKI --root-device-name /dev/sda1

Where $SNAP is the ID of the snapshot, $NAME is the name of your AMI, $DESC is a description of the AMI, $ARCH is either i386 (for 32-bit) or x86_64 (for 64-bit), and $AKI is the kernel ID exported above.  The command will return an AMI ID, which will be yours to boot from once the image finishes registering!
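For scripting, the argument list can be assembled and validated before anything is sent to EC2.  This is a minimal Python sketch under a few assumptions: the function name is hypothetical, the IDs in the usage example are placeholders, and the optional --kernel argument is included on the assumption that you want to attach the AKI chosen earlier.  It only builds the command; it does not execute it.

```python
def register_command(snapshot_id, name, description, architecture,
                     region="eu-west-1", root_device="/dev/sda1",
                     kernel_id=None):
    """Build the ec2-register argv for an EBS-backed AMI."""
    if architecture not in ("i386", "x86_64"):
        raise ValueError("architecture must be 'i386' or 'x86_64'")
    argv = [
        "ec2-register",
        "--region", region,
        "-s", snapshot_id,
        "--name", name,
        "--description", description,
        "--architecture", architecture,
        "--root-device-name", root_device,
    ]
    if kernel_id is not None:
        # Assumption: attach the kernel (AKI) picked in the previous step.
        argv += ["--kernel", kernel_id]
    return argv


if __name__ == "__main__":
    # Placeholder IDs, not real resources.
    print(register_command("snap-xxxxxxxx", "lenny-ebs", "Debian 5.0 on EBS",
                           "i386", kernel_id="aki-xxxxxxxx"))
```

Passing the list to subprocess.run avoids the shell quoting problems that a description containing spaces would otherwise cause.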

To track the progress of the AMI creation, you can do the following:

# watch -n 30 './ec2-describe-images ami-xxxxxxxx'

This will execute the ec2-describe-images command for your new AMI every 30 seconds.  You can stop the command once you see that the AMI is in available state.
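In a script, the same wait can be expressed as a polling loop that stops by itself.  The Python sketch below is generic: the check callable is something you would supply yourself (for example, a function that runs ec2-describe-images and looks for the available state), and both the function name and the 90-attempt cap are arbitrary choices made for illustration.

```python
import time


def wait_until(check, interval=30.0, max_attempts=90, sleep=time.sleep):
    """Call check() every `interval` seconds until it returns True.

    Returns the number of attempts used, or raises TimeoutError once
    max_attempts checks have all failed.
    """
    for attempt in range(1, max_attempts + 1):
        if check():
            return attempt
        if attempt < max_attempts:
            sleep(interval)  # no need to sleep after the final attempt
    raise TimeoutError(f"condition not met after {max_attempts} attempts")


if __name__ == "__main__":
    # Simulated AMI states instead of a real EC2 call.
    states = iter(["pending", "pending", "available"])
    attempts = wait_until(lambda: next(states) == "available", interval=0)
    print(f"available after {attempts} checks")
```

The sleep function is a parameter so the loop can be exercised without actually waiting.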

Now that you have an EBS-backed AMI, any further customizations you make to this image can be preserved forever by simply right-clicking on the instance [console], and clicking “Create Image (EBS AMI).”

Enjoy!

DNSSEC on the Root Zone: Are You Ready?

February 18th, 2010

According to ICANN, the US Department of Commerce, and VeriSign, DNSSEC will be implemented on the root zone by no later than May 2010 (source: http://www.root-dnssec.org/).  For those of you who don’t know what this means, here’s a deconstruction of the sentence above:

  • DNS stands for the Domain Name System.  DNS is a hierarchical naming system for the Internet (or a private network) that translates human-meaningful domain names into machine-meaningful numerical identifiers (e.g., www.winnersdontlose.com -> 74.52.192.250).
  • DNSSEC stands for the Domain Name System Security Extensions.  It is effectively a suite of IETF-created specifications for securing certain types of data that are provided by DNS.
  • A root zone is the top-level DNS zone in the above-mentioned hierarchy.  Generally speaking, “the root zone” refers to the root of the global DNS used on the public Internet.  IANA and ICANN manage this zone.  Strictly speaking, this zone is where authority for the gTLDs (.com, .net, etc) and ccTLDs (.uk, .us, etc) is configured and handed out to the world.  The root zone is hosted on 13 clusters of root servers (a.root-servers.net through m.root-servers.net), which are the only servers in the world that are allowed to be authoritative for it.

So now, re-read what I typed.

-

According to ICANN, the US Department of Commerce, and VeriSign,
DNSSEC will be implemented on the root zone by no later than May
2010 (source: http://www.root-dnssec.org/).

-

So what does this really mean to you?  Given that the majority of us don’t need to know much more than theory about the root zone, would ICANN, the US Department of Commerce, and VeriSign really risk implementing something that could break a lot of installations today?  One would think that the majority of their decisions could be implemented in a transparent fashion, unbeknownst to most of us.  To better understand this decision, let’s look at it from two different perspectives:

  1. What benefits does DNSSEC on the root zone really provide?  Why is it necessary?
  2. What is the impact to the rest of the Internet by implementing DNSSEC on the root zone?

-

1) DNS, like a whole lot of other protocols that were created decades ago, was designed for a different Internet (lower bandwidth, higher error rate, trustworthy users).  The Internet as we know it today (including the misanthropes) has forced the creation and deprecation of many protocol specifications.  DNS suffers mostly from the trust issue.  For example, a DNS resolver currently sends a query for a resource record (RR) out to the Internet, and then accepts the first response it receives, without question.  A malicious server could therefore provide an incorrect response, forcing the resolver to use this incorrect address until its cache entry expired.  If the resolver belongs to a major consumer ISP (like Comcast or RoadRunner), a major service interruption affecting tens of thousands of users could easily happen.
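To make the trust problem concrete, here is a toy Python model of a cache that behaves the way the paragraph above describes: the first answer for a name wins, and later answers are ignored until the entry's TTL runs out.  The class is purely illustrative and is not how any particular resolver is implemented; 203.0.113.66 is a documentation-range address standing in for the attacker's server.

```python
class NaiveResolverCache:
    """Toy cache that trusts the first answer it sees for each name."""

    def __init__(self):
        self._entries = {}  # name -> (address, expires_at)

    def accept(self, name, address, ttl, now):
        """Store a response unless an unexpired answer is already cached."""
        cached = self._entries.get(name)
        if cached is not None and cached[1] > now:
            return False  # a first answer already won; ignore this one
        self._entries[name] = (address, now + ttl)
        return True

    def lookup(self, name, now):
        """Return the cached address, or None if missing or expired."""
        cached = self._entries.get(name)
        if cached is None or cached[1] <= now:
            return None
        return cached[0]


if __name__ == "__main__":
    cache = NaiveResolverCache()
    name = "www.winnersdontlose.com"
    cache.accept(name, "203.0.113.66", ttl=300, now=0)   # forged, arrives first
    cache.accept(name, "74.52.192.250", ttl=300, now=1)  # genuine, ignored
    print(cache.lookup(name, now=120))                   # still the forged address
```

Nothing in this model lets the cache tell the forged answer from the genuine one, which is the gap DNSSEC signatures are meant to close.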

-

2) From a technology standpoint, signing the root zone has a minimal effect on end users and networks.  Since DNSSEC uses a PGP-esque keypair setup, the RRs that nameservers send and receive all day long will be a little larger.  This will probably result in slightly larger network bills for companies managing large-ish installations, with the rest of us noticing little to no change in our operating costs.

So what IS the problem?

Part of the problem is that RFC 1035 (http://www.ietf.org/rfc/rfc1035.txt) set an upper bound of 512 bytes on the size of a DNS message sent over UDP.  Because of this, many firewalls and applications built this limit into their DNS implementations.  This was not an issue until DNSSEC.  Since DNSSEC responses carry a signature (RRSIG) as well as the RR itself, many responses can be expected to exceed 512 bytes.

In addition to the RFC limit, another issue is that large DNS packets have never been handled well on the Internet.  Large UDP packets are often fragmented by routers along the way to the destination, and the destination either won’t or can’t deal with fragmented UDP packets, and drops them.  DNS clients will usually then retry the query with a smaller buffer size, and may eventually fall back to TCP.
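The “buffer size” a client retries with is advertised inside the query itself.  The post does not go into the mechanism, so treat the following as background: with the EDNS0 extension, a query carries an OPT pseudo-record whose class field holds the largest UDP payload the client will accept, and whose DO flag asks for DNSSEC records.  This Python sketch builds such a query by hand; the function names and the 4096-byte default are choices made for illustration, and nothing is sent on the network.

```python
import struct


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label in {name!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, qtype=1, udp_payload_size=4096, dnssec_ok=True,
                query_id=0x1234):
    """Build a DNS query with an EDNS0 OPT record in the additional section."""
    # Header: ID, flags (recursion desired), 1 question, 0 answers,
    # 0 authority records, 1 additional record (the OPT record).
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    # OPT record: root name, TYPE 41, CLASS = advertised UDP payload size,
    # then extended RCODE, EDNS version, flags (DO = 0x8000), RDLENGTH 0.
    flags = 0x8000 if dnssec_ok else 0
    opt = b"\x00" + struct.pack("!HHBBHH", 41, udp_payload_size, 0, 0, flags, 0)
    return header + question + opt


if __name__ == "__main__":
    query = build_query("example.com")
    print(len(query), query.hex())
```

A client that retries with udp_payload_size=512 is effectively telling the server to stay within the old RFC 1035 limit, at which point a signed answer that does not fit comes back truncated and the client has to fall back to TCP.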

-

In a nutshell, most of the problems that come about as a result of DNSSEC being implemented on the root zone are caused by larger DNS packets.  The two things you want to make sure of are:

  1. Your server/network will allow in DNS packets larger than 512 bytes.
  2. Your network/firewall will accept IP fragments, and your server knows what to do with them.

The DNS Operations Analysis and Research Center (DNS-OARC) has created a tool to help ease this process.  See https://www.dns-oarc.net/oarc/services/replysizetest for more details.

Creating custom TCP monitors in Nagios

February 16th, 2010

Recently I had to configure Nagios to monitor a non-standard port (9999/TCP).  This is a service of ours that apparently decides to die at random times.  In order to find out more about the issue (this service is not live yet), I configured an instance of Nagios to monitor this port.

In order to do this, I used the pre-existing check_tcp with a specific run-time parameter (namely, -p 9999).

First, define a command for this service check:

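define command{
     command_name check_tcp_9999
     command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p 9999 -4
}

Note that we set the -H (host) parameter to the address of whichever host uses this command, hard-coded the -p (port) parameter to 9999, and added -4 to force an IPv4 connection.  (With the Nagios plugins, a lowercase -h prints the help text instead of setting the host.)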
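For a feel of what such a check boils down to, here is a minimal Python sketch of a TCP connect test that reports its result using the Nagios plugin exit-code convention (0 for OK, 2 for CRITICAL).  It is not the real check_tcp plugin, which has many more options; the function name and the 10-second timeout are assumptions made for this example.

```python
import socket
import sys

OK, CRITICAL = 0, 2  # Nagios plugin exit codes used in this sketch


def check_tcp(host, port, timeout=10.0):
    """Try to open a TCP connection; return (exit_code, status_line)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - connected to {host}:{port}"
    except OSError as exc:  # refused, unreachable, timed out, ...
        return CRITICAL, f"TCP CRITICAL - {host}:{port}: {exc}"


if __name__ == "__main__":
    # Usage: python check_tcp_sketch.py HOST PORT
    code, message = check_tcp(sys.argv[1], int(sys.argv[2]))
    print(message)
    sys.exit(code)
```

Nagios reads the exit code to decide the service state and shows the printed line as the status text, so a script like this could stand in for the plugin on a host where it is not installed.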

Once the command has been defined, create a new service for the host in question.  If you haven’t defined the host yet, below is an example to follow:

define host{
     use generic-host   ; template name, available by default
     host_name svr-001  ; unique name of the host being defined
     alias server 001   ; description of the host
     address 10.0.0.254 ; IP of the host
}

To get the service check we defined above running against this host, use the following stanza.

define service{
     use generic-service                          ; template name, available by default
     host_name svr-001                            ; the host against which to run this check
     service_description tcp check of 9999/tcp    ; self-explanatory (no quotes: they are illegal in object names by default)
     check_command check_tcp_9999                 ; name of the command we defined earlier
     check_interval 5                             ; check this service every 5 minutes
     check_period 24x7                            ; time period in which to monitor
     retry_interval 3                             ; if a check fails, re-try in 3 minutes
     max_check_attempts 3                         ; the max number of times a failed service will be checked
     notification_interval 60                     ; reminder of alerts sent every 60 minutes
     notification_period 24x7                     ; time period in which alerts are sent
     notification_options w,c,u                   ; the conditions upon which to send alerts
     contact_groups contacts                      ; contact group to alert in the event of failure
}
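The three timing values interact in a way that is easy to misread, so here is the arithmetic as a small Python sketch.  It assumes the default interval_length of 60 seconds (so the values are minutes), treats the first failed check as attempt 1, and ignores check execution time and scheduling jitter; the function names are made up for illustration.

```python
def minutes_until_hard_state(retry_interval, max_check_attempts):
    """Minutes between the first failed check and the alert-triggering one.

    The first failure counts as attempt 1; each remaining attempt is
    scheduled retry_interval minutes after the previous one.
    """
    return (max_check_attempts - 1) * retry_interval


def worst_case_minutes_to_alert(check_interval, retry_interval,
                                max_check_attempts):
    """Worst case: the service dies right after a successful check."""
    return check_interval + minutes_until_hard_state(retry_interval,
                                                     max_check_attempts)


if __name__ == "__main__":
    # Values from the service definition above.
    print(minutes_until_hard_state(3, 3))        # 6
    print(worst_case_minutes_to_alert(5, 3, 3))  # 11
```

With the values in the stanza, a dead service produces a notification at most about 11 minutes after it stops answering, followed by a reminder every 60 minutes.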

Verify that the additions you made to the config file(s) were valid by executing:

# nagios3 -v /etc/nagios3/nagios.cfg

If the pre-flight check goes well, restart Nagios so that it loads the new configuration:

# /etc/init.d/nagios3 restart

That’s it!