What is Cloud Computing?

February 16th, 2012 by Tony

These days, everyone is hot to be “in the cloud.” But what exactly does this mean? Is your business “in the cloud” because you use Amazon’s S3 service for an off-site backup? In my opinion, being “in the cloud” means taking on a new approach to operations management. Specifically, there are three areas where the cloud can be of particular value:

  1. Scalability (the most obvious)
  2. Cost management (OPEX)
  3. Architectural enhancements

 

Scalability

Endless scalability is one of the key selling points of the cloud, and rightfully so.  Infrastructure as a service (IaaS) providers like Amazon EC2, NephoScale, and GoGrid all aim to provide one thing:  virtual machines (“instances”) that bill by the hour, of varying CPU/RAM sizes.  The ability to spin up new machines on-demand can provide companies with a lot of operational flexibility, which can result in a lot of clever and elegant uses for the cloud.

Platform as a service (PaaS) providers like Microsoft’s Azure allow a layer of abstraction between your code and server, and in their own way provide the same “bottomless well” of capacity. There are rumors of MSFT taking on the challenges of running a public IaaS platform, but to date nothing has been announced.

In either IaaS or PaaS, horizontal scalability is effectively free.  That is to say, any scale factor that you can “throw more machines” at will feel right at home with IaaS, or with some development effort, PaaS.  While there is some wiggle room with vertical scalability, the reality is that if any of your key scale factors are 100% dependent on vertical scale, you need to re-think the way your platform works.

 

Cost Management (OPEX)

Putting all the industry buzz aside, there’s a very real reason for companies to be interested in the cloud: cost management. Any company that has an online presence knows that servers aren’t being used at 100% capacity 100% of the time. The reality is that things like architecture, security, and company politics cause us to have machines running different services, all with unused capacity. Add in things like high availability (HA) and disaster recovery (DR), and the amount of idle computing resources can be doubled or tripled. The cloud paradigm allows operations professionals to more effectively manage their operating expenditures (OPEX), since machines (or instances) that are sitting idle during off-peak hours can simply be shut down until the capacity is needed. Being “in the cloud” also allows you to keep a barebones platform running in another location for DR, whose capacity can be rapidly increased (within minutes) in the event of a site failure at your primary location. As if that weren’t enough, leveraging cloud environments for QA (heavily recommended if you’re running production out of the cloud) allows for much more flexibility than the traditional model of “cloning prod.”
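
To put rough numbers on the idle-capacity argument, here is a minimal sketch comparing a fleet sized for peak and left running around the clock against one that is scaled down off-peak. The hourly rate, the fleet sizes, and the 8-hour peak window are all made-up figures for illustration, not pricing from any provider.

```python
# All figures below are hypothetical, chosen only to illustrate the arithmetic.
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30


def monthly_cost(instances_by_hour, hourly_rate):
    """Cost of one month, given how many instances run in each hour of a day."""
    assert len(instances_by_hour) == HOURS_PER_DAY
    return sum(instances_by_hour) * hourly_rate * DAYS_PER_MONTH


hourly_rate = 0.10      # assumed price per instance-hour, in dollars
peak, off_peak = 10, 3  # assumed fleet sizes

# Fixed fleet: sized for peak load and running 24 hours a day.
fixed = [peak] * HOURS_PER_DAY
# Elastic fleet: peak capacity for 8 hours, a smaller fleet for the other 16.
elastic = [peak] * 8 + [off_peak] * 16

fixed_cost = monthly_cost(fixed, hourly_rate)
elastic_cost = monthly_cost(elastic, hourly_rate)

print(f"fixed:   ${fixed_cost:.2f}/month")
print(f"elastic: ${elastic_cost:.2f}/month")
print(f"savings: {100 * (1 - elastic_cost / fixed_cost):.0f}%")
```

The real savings depend entirely on how spiky your load is; the point is only that hourly billing turns idle hours into money you don't spend.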

 

Architectural Enhancements

The cloud movement represents a lot of things for business practices, operations management, etc.  However, I think the most interesting application of the “cloud paradigm” is in platform architecture.  Most importantly, you are no longer bound by the number of physical machines you have at any one time.  Let’s take a basic example.

Assume you have a platform that collects some sort of data that your customers regularly look to you to report on. The nature of these reports is such that a month’s worth of data will take roughly 24 hours to process (reports are emailed to the requester so they’re not an inline operation). Shorter reports (weekly or bi-weekly) take a couple of hours. Based on historical trending, you see that approximately 25% of the report requests are for a month’s worth of data, and the remaining 75% are requests that finish within 2-3 hours. Most importantly, your company has established Service Level Agreements (SLA) with your customers that govern the average run-time of these reports.

Based on the above assumptions, you would need a number of high-power machines, since at worst you need the month-long job to finish in 24 hours. Since the monthly reports represent the bulk of your computation (even if they aren’t the bulk of the requests), you need hardware that’s capable of keeping up with the most computationally expensive operation, not the least. Even worse, you need to have a number of these machines, since you want to minimize cases where customer A has started a monthly report, and subsequently blocks customer B’s request for a weekly report for more than a day. To take it a step further, these machines are also idle some percentage of the time, causing very expensive assets to go under-utilized.

In the world of the cloud, you can very easily address a situation like this, and probably even provide a better end user experience.  Assume your reporting workflow changed from “1) someone requests, 2) one of the reporting servers takes the request, and 3) crunches until done” to “1) someone requests, 2) a new instance is created to generate this report, and 3) that instance crunches that report until it’s done, after which the instance shuts down.”  In this case,  you have raised the throughput of your reporting application to be almost limitless, and you’ve done so while lowering your OPEX (which goes hand in hand with minimizing under-utilized resources).  The new approach also allows you to create specific instance sizes for specific types of reports, so that you could “slow down” or “speed up” a report’s processing by starting either a bigger or smaller instance than is necessary.

The above is a very simple illustration of how one could leverage the unlimited capacity that cloud providers love to talk about.  By making a very small tweak to the reporting workflow, we were able to 1) decrease monthly operating costs and 2) add more control over how quickly the reporting jobs were carried out.
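
As a rough sketch of the “one instance per report” idea, the snippet below picks the cheapest instance size that still finishes a report inside a deadline. The instance names, hourly rates, and speedup factors are hypothetical; only the 24-hour and 2-3 hour report times come from the example above, and billing is rounded up to whole hours since IaaS providers bill by the hour.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    hourly_rate: float  # dollars per hour (assumed)
    speedup: float      # throughput relative to the baseline machine (assumed)


# Hypothetical catalogue; real providers publish their own sizes and prices.
CATALOGUE = [
    InstanceType("small", 0.10, 1.0),
    InstanceType("large", 0.40, 4.0),
]


def plan_report(baseline_hours, deadline_hours, catalogue=CATALOGUE):
    """Return (instance_type, run_hours, cost) for the cheapest instance
    that finishes within the deadline, or None if none is fast enough."""
    options = []
    for itype in catalogue:
        run_hours = baseline_hours / itype.speedup
        if run_hours <= deadline_hours:
            billed_hours = math.ceil(run_hours)  # partial hours are billed in full
            options.append((billed_hours * itype.hourly_rate, itype, run_hours))
    if not options:
        return None
    cost, itype, run_hours = min(options, key=lambda option: option[0])
    return itype, run_hours, cost


# A monthly report (24 baseline hours) that must be done within 12 hours,
# and a weekly report (3 baseline hours) with a 3-hour deadline.
for label, baseline, deadline in [("monthly", 24, 12), ("weekly", 3, 3)]:
    itype, run_hours, cost = plan_report(baseline, deadline)
    print(f"{label}: {itype.name} for {run_hours:g}h, ${cost:.2f}")
```

Tightening or loosening the deadline is the “speed up” or “slow down” knob: the same report simply lands on a bigger or smaller instance.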

As the world becomes more familiar with “the cloud,” I think we will start seeing an influx of really clever and elegant uses that leverage the endless capacity of the cloud.  I, for one, can’t wait to see what you guys come up with :)


Presenting a unified namespace for PHP, JSP, and Ruby

June 29th, 2010 by Tony

Ever since I was asked by a former manager to “build” a mosso.com type system, I have been intrigued by exactly how this would be done.  My original line of thinking was that it was more of a L7 network implementation, where through the magic of deep packet inspection packets were routed to clusters of machines configured to run PHP, Rails, etc.  While this is a feasible approach, I now believe this system would not be nearly as scalable as Mosso has proven to be.

Another approach that could potentially work is a massively distributed context. Imagine if there was a machine that had directories like /code/php/, /code/ruby/, and /code/jsp/ that you could reference from http://machine_ip/php, http://machine_ip/ruby, and http://machine_ip/jsp/ respectively. This would reduce the network’s job to keeping track of things like state, for end users accessing applications that require it. With a little bit of work on the individual stacks, even this could potentially be eliminated.

In this theoretical approach, two things would be required:

  1. a reverse proxy-type setup, bound to port 80/TCP on the IP being accessed
  2. individual instances of apache, nginx, etc, through which individual language support is provided
    • while this particular setup would be more complex in setup, troubleshooting, and upgrading, having this piece allows more flexibility (for example, if rails on apache is undesired, you could run rails on nginx instead with little to no change in the system; likewise, you could also easily change out the Ruby version, etc)
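
The routing half of this setup can be sketched in a few lines. This is only an illustration of the prefix-to-backend mapping the reverse proxy would need, not a proxy implementation; the loopback addresses and the 8280/8380 ports are assumptions made up for the example.

```python
# Hypothetical mapping of public path prefixes to per-language backends.
BACKENDS = {
    "/php": "http://127.0.0.1:8180",
    "/ruby": "http://127.0.0.1:8280",
    "/jsp": "http://127.0.0.1:8380",
}


def route(path, backends=BACKENDS):
    """Return the backend URL a request path should be forwarded to,
    or None if the path is outside every known namespace."""
    for prefix, backend in backends.items():
        # Match "/php" and "/php/...", but not "/phpinfo".
        if path == prefix or path.startswith(prefix + "/"):
            remainder = path[len(prefix):] or "/"
            return backend + remainder
    return None


print(route("/php/code.php"))
print(route("/ruby"))
print(route("/phpinfo"))
```

Swapping the webserver behind a language (rails on nginx instead of apache, say) then only means changing one entry in the mapping.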

Potential problems with this approach (since no design is perfect):

  1. If nginx+rails (or any other app stack+webserver) outperforms the reverse proxy, then we’re introducing an unnecessary bottleneck.  In a massive distributed context, this isn’t a big deal.
  2. Many more webservers and environments to maintain, much more complex.  Since this is meant to be baked in as a machine image (think AMI on AWS’s EC2), this complexity can mostly be hidden from users of the server.
  3. Multiple authentication and access control layers could lead to users having to authenticate multiple times, etc.  I don’t have a real good response to this one yet.  Careful coding and well-thought-out implementations could take care of this, but that isn’t something anyone could realistically rely on.

Reverse Proxy

A  traditional proxy server implementation sits between the end user and the public Internet.  This server will accept all of the end user’s requests, and make them on the public Internet on the end user’s behalf.  In order to use a proxy server, users would have to modify their browser settings such that all requests are sent to the proxy instead of the websites directly.

The reverse proxy does the opposite of this.  Reverse proxies sit next to webservers, and accept requests from end users on their behalf.  Note that no end user configuration update is required for this to work (all of the work is done on the side of the webserver).  It is this functionality that we will need in order to transparently present namespaces for PHP, JSP,  and Ruby.  Since what we want is in effect a webserver itself, we can use Apache’s mod_proxy implementation for most of the functionality.

Mod_proxy, great as it is, doesn’t provide all of the functionality that we will need.  For example, if I took a pre-existing PHP application and stuck it behind a reverse proxy, any PHP-generated links will most likely not work, since the paths (and port) will be relative to the PHP install, and not the reverse proxy itself.  This might lead to HTTP requests being issued to http://machine_ip:8180/code.php instead of http://machine_ip/code.php.  Since we would necessarily restrict public access on any service ports but 80/TCP, this is sub-optimal.

In order to solve this problem, there’s a module for Apache called mod_proxy_html (http://apache.webthing.com/mod_proxy_html/). This is a 3rd party module and not an official part of the Apache distribution, but provides exactly the functionality that we need. With mod_proxy_html, we will be able to re-write URLs in such a way that the proxy and app servers are indistinguishable from the perspective of the end user.
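
To illustrate the kind of rewriting involved, here is a naive sketch that maps backend URLs found in HTML attributes onto the public namespace. It is a string-level illustration of the idea only, not how mod_proxy_html works internally; the 8280 and 8380 ports and the /php/, /ruby/, /jsp/ targets are assumptions based on the layout described earlier.

```python
import re

# Hypothetical mapping of backend base URLs to public path prefixes.
URL_MAP = {
    "http://machine_ip:8180/": "/php/",
    "http://machine_ip:8280/": "/ruby/",
    "http://machine_ip:8380/": "/jsp/",
}

# Matches href="...", src="..." and action="..." attributes.
ATTR_RE = re.compile(r'(?P<attr>\b(?:href|src|action)=")(?P<url>[^"]*)"')


def rewrite_links(html, url_map=URL_MAP):
    """Rewrite attribute URLs that point at a backend so they point at the proxy."""
    def fix(match):
        url = match.group("url")
        for backend, public in url_map.items():
            if url.startswith(backend):
                url = public + url[len(backend):]
                break
        return f'{match.group("attr")}{url}"'

    return ATTR_RE.sub(fix, html)


page = '<a href="http://machine_ip:8180/code.php">code</a>'
print(rewrite_links(page))
```

A real deployment would leave this to the module, which also has to cope with relative links, redirects, and URLs embedded in scripts and stylesheets.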

Over the course of the next month or so I will be trying to build a system like I described above.  I believe such a system, along with the power of cloud computing, has the ability to reshape a lot of how web applications are spec’d.  No longer being limited to any one middleware stack is a very real need, which has sadly gone unanswered.  Hopefully this will be a stab in the right direction.

The best $5 you’ll spend with the iPad

April 26th, 2010 by Tony

I, like many others, decided I wanted an iPad on opening weekend. Like many others, I have been trying to figure out exactly in which way this device was “magical.”

On the other hand, it seems as if there are a LOT of people for whom this device is a “game changer.”  I’ve read countless posts from bloggers, tech pundits, and other technology enthusiasts about how the iPad is changing their life (more specifically, how they live their life).  I’m sad to say that this hasn’t quite been my experience.  In an effort to further integrate the iPad into my life, I’ve decided to structure my foray by carving out $5 or $10 lists of applications that I find to be “game changing.”

I’d like to take a moment in this first post to put down some initial thoughts about my usage pattern.

  • I’m a Systems Engineer by trade, which means I probably won’t be able to work on the iPad (need lots of windows open, ssh/pgp keychains, xmpp client, etc).
  • I consume the majority of my video content digitally.  I do have Comcast, and occasionally will splurge on a PPV, but generally speaking everything I watch is sourced off of a 3TB array.
  • The website I go to the most (easily by an order of magnitude) is reddit.com.  In a nutshell, Reddit is a community link sharing site.  As it pertains to my usage patterns, think “open link in new [tab|window].” I also frequent other agg-sites, like popurls, etc.
  • I’d like to read on the iPad, but don’t see myself paying the ridiculous prices in the Bookstore.

Total Budget:  $5

With the first budget of $5, I’m hoping to get 2 or 3 apps. I’ve certainly noticed the trend of iPad apps being significantly more expensive than their iPhone counterparts, but I’m hoping there are some apps out there that have remained reasonable in price. The areas I have chosen to focus on (not surprisingly) are:

  1. Reading – using the iPad as an ebook reader.
  2. Media – playing content off of my AFP share
  3. Browsing – opening multiple browser instances, easy switching

Reading: I love to read, but hate the bulk of carrying around the plethora of books that my attention span needs. Using the iPad as a reader, to me, should be one of the primary functions. The gorgeous display should be able to provide a new level of richness and detail that the Kindles and the Nooks of the world can’t compete with. While the Book reader that comes shipped is great, the content for this reader is generally very expensive, often costing as much as the paperback version. This makes keeping the variety I’d need a rather costly proposition (~$100/month by my math). A cheaper alternative can be had in one of two ways:

  1. Converting my own files to ePub, and syncing.  This works well enough, but is a relatively time-consuming process.  Plus, the converter apps are a little clunky, and the best one behaves rather unpredictably (but works well at its best).  Syncing is pretty painless, except for the fact that I had to use my main desktop.
  2. A document viewer capable of handling 100MB+ files.  Ideally, there would be a range of supported file formats (pdf, chm, lit, doc, xls, ppt, txt, xml, etc).  If I had to choose one, though, PDF would be the most important.  Large file support is also a hard requirement, as I’ve used many reader apps on the iPhone which worked well enough, but would choke on semi-largish files.

Of the two approaches, it seems like a document viewer would be the best. The ePub converter apps are too unpredictable, and convert poorly enough that the reading experience is seriously degraded. Finding a good reader, ideally with a few ways of importing files, is probably going to be the best choice.

After doing some research, the best reader app for the iPad seems to be GoodReader, with a cost of $0.99.  GoodReader excels at handling large files, and lets me easily scroll through image packed 150MB+ PDF files.  The best part, though, is the extreme flexibility that end users get in loading their content.  Files can be retrieved directly from the Internet (HTTP, dropbox, etc), and there are additional plugins (available at an extra cost) that allow access to Google docs, IMAP, and the like.  I used GoodReader to access my dropbox, and am very happy with the setup.  For the most part, I would say that this setup takes care of my reading requirements until something new comes up.

Total Budget: $4.01

Media: Since I consume the majority of my media digitally, being able to easily access my downloaded content is key. Luckily, conversion isn’t nearly the pain that it was the last time I did it at home (circa 1999). It seems the best app on the market for this is HandBrake. It works exactly as touted, and is pretty easy to use. However, converting content takes a bit of time (I usually batch up a bunch of movies or shows, and start before going to bed). After conversion, files still have to be added to iTunes and synced to the iPad. This works, and since I have the 64GB model I can copy a good many movies before I’m out of space. As well as this process works, I think it’s far from ideal. Conversion takes hours for any meaningful amount of content, and I am still tied to my desktop as the “source of truth” for content. Having to store the content on my iPad is also restrictive; I have about 1.6TB of content (which would probably take 25 years to encode), and being restricted to roughly 4% of my content (64GB out of 1.6TB) is very prohibitive.

After doing some research, it seems that an app called Air Video might be the answer.  This app purports to stream any content on a Mac or Windows machine, “converting it live” for delivery to the iPad or iPhone (once you buy it you get both versions).  This app costs $2.99, which I happily ponied up for.

Total Budget: $1.02

After giving Air Video a spin, it turns out that this is the best $2.99 I’ve spent anywhere. It works exactly as touted, converting and streaming data to the iPad. The only bad thing I have to say about all of this is that the Windows version of the Air Video server seems to be a lot crappier than the Mac version. It’s clear that the Mac version was built first, with the Windows version as an afterthought. It would also be nice if they had a Linux server build. There seem to be quite a few people asking about it in the forums, so hopefully it will be coming soon. Air Video also has a remote streaming mode, where you can remotely access the Air Video server through a forwarded port (or UPnP). I haven’t had to use this yet but will check it out from work tomorrow.

Since I’d been on a roll, I went ahead and (perhaps foolishly) bought another $0.99 app. There’s an app that I’d been hearing about called Desktop for iPad that sounds really interesting.  It basically splits the iPad display into two equal panes, which can be horizontal or vertical.  In each of these panes, you can run what is basically a “widget” (think weather, iPad hardware stats, notes, calculator, etc).  This is definitely a crap shoot, as there were a number of mixed reviews about this app.  This wasn’t unexpected, being kind of a ‘prototype’ app.  Since a widget board is something I’ve always felt that the iPad should have, I went ahead and took the plunge.

Total Budget: $0.03

Desktop for iPad, by Aqua Eagle, is kind of a strange app.  It does exactly as the website claims, but I couldn’t help but feel a little let down by the execution.  In all ways relevant the two panes are completely separate from each other, offering no level of interactivity (for example, it would seem logical that browser A could open a link into browser B).  I’ve emailed the developers about this, and have received no response.  I’m a little disappointed in this, but Safari also does this pretty well (open a link in a new window).

Of the 3 apps that I got, I would say that GoodReader and Air Video are universally applicable, and should be purchased immediately.  If the end goal is to integrate the iPad into your daily routine, then having these two apps will help get you on the way.  Desktop for iPad is interesting, and might be worth your $0.99, but I don’t feel as if it’s ready for public use just yet, and is missing some key UI features (resizing window panes, for example).

Home-made bagels!

March 18th, 2010 by Tony

The following bagel recipe is the product of ~60 bagels, or 10 batches.  I took what seemed to be a fairly generic recipe, incorporated elements from other recipes, and experimented to produce it.  I think these are the best bagels I’ve ever had (but I am biased), and hope anyone crazy enough to try a recipe on this blog enjoys them.

Bagel Recipe (6 bagels)

1.5 cups warm water

1 pack of active dry yeast

2 tbsp sugar

2 tsp salt

3 cups flour

  1. Mix the sugar, water, and yeast together in a large bowl.  The water should be warm to the touch but not hot.  Let stand for 5-10 minutes, until the mixture is foamy.  The warm water and sugar are for kickstarting the yeast.  Water that is too hot will kill the yeast, resulting in a dough that will not rise.
  2. Add the salt and mix well.  Add flour .5-1 cup at a time, mixing well before adding more.  This mixture should start resembling dough after 2 cups, at which point it’s easier to use hands than a spatula.  The final product should be a hefty feeling dough, with smooth elastic sides.  It may be necessary to add more flour or water to achieve this result.  If so, make sure to add in small amounts.
  3. Place the dough in a lightly oiled bowl and cover with a cloth.  Let rise for an hour or until the dough has roughly doubled in size.
  4. Place the dough onto a floured work surface and knead until the dough is smooth.  Cut the dough into 6 equal portions.  Roll each portion into a round ball, and place onto a resting surface (make sure to remember the order in which the balls were placed).  The easiest way to roll these portions into round balls is to place a palm completely over a piece of dough, bring all the fingers in slightly so about 1/4″ of each side is crumpled, and start moving the hand around in a circular fashion.  It takes a bit of practice, but done right this will save tons of time and frustration.  Let the balls rise for 30-45 minutes.
  5. Starting with the first ball, knead until smooth and form a bagel.  There are a number of ways to do this, but the easiest is to make a flat disk and poke a hole in the center.  Stretch the hole out carefully until it looks like a bagel.  An alternative is to wrap this dough around a hot dog.  Except for the hot dogs, the recipe doesn’t change.
  6. Once all bagels have been formed, start boiling 10-12 cups of water with 1 tbsp of sugar.  This is to give the bagels time to rise.
  7. Boil each bagel (again starting with the first) for ~45 seconds on each side, and place onto a cooling rack.  Add toppings if desired (sesame/poppy seeds, sauteed onions, etc).  A sprinkle of kosher salt on top is also really good.
  8. Once all bagels have completely cooled (15-20 minutes), place them into a 400F oven.  For a better crust, fill a loaf pan with water and place onto the bottom rack before turning on the oven.  This will provide a very humid environment to bake the bagels, resulting in a crust with a little more bite.
  9. After 25 minutes, take the loaf pan out, and check the tops of the bagels.  If they don’t look golden, wait another minute and check again.   Place the bagels on a cooling rack and enjoy!

Converting instance-store instances to EBS instances (AWS EC2)

February 27th, 2010 by Tony

A month or two ago, Amazon Web Services (AWS) announced that their EC2 instances will now be bootable from an elastic block store (EBS) volume. This seems like a small change, but in fact has opened up a world of possibilities in the Elastic Compute Cloud (EC2).

EBS provides “block level storage volumes for use with Amazon EC2 instances.”  Keep in mind that prior to this, instances were limited to booting off of S3-backed Amazon Machine Images (AMI), which were not persistent images.  This meant 2 things:

  1. instance-store AMIs cannot be “stopped,” only rebooted or terminated
  2. rebooting an instance-store AMI reverted the instance back to the AMI defaults

Since EBS originally debuted as a high-performance block device attachable to a given instance, booting off of EBS AMIs has proven to be faster than the traditional instance-store boot as well.

Another benefit is that EC2 instances booted off of EBS volumes can be stopped, which effectively equates to shutting down a machine in the real world.  You are still responsible for the charges incurred while this instance is reserved, but all of the changes made to said instance will persist after you start it again.

The last major benefit of booting off the EBS volumes is that AWS has made it easy for you to create a new EBS AMI from a running EBS AMI.  In the Console, right clicking on an EBS instance will yield a new option, “Create Image (EBS AMI).”  This will basically shut down your instance, and proceed to generate a new EBS AMI from the contents on the disk of your instance.  This command seems to have a failure rate of ~40%, which can be a little frustrating.  I’ve found that if you put an instance into ‘stopped’ state before creating the EBS AMI, the process has a higher chance of success, but will still take anywhere from 20-45 minutes.

The rest of this article will focus on converting an instance-store AMI into an EBS AMI.

In order to perform this conversion, you will need to have an instance-store AMI that is the base OS you’d like to run (for the purposes of this article I used alestic’s Debian 5.0 base image), and access to EC2 via CLI as well as the portal (it’s all do-able from the CLI, but some of the tasks are a LOT easier and quicker through the web console).  The stuff I did in the console will be suffixed with [console],  and the stuff from CLI will be prefixed with #.

1) Booting an instance-store AMI – I executed the following to get a list of the images that fit my criteria (32bit, Debian, base install):

# ./ec2-describe-images --region eu-west-1 --all | grep -i lenny-base | grep i386

IMAGE   ami-b13a6bf4    alestic-32-us-west-1/debian-5.0-lenny-base-20090804.manifest.xml
IMAGE   ami-b33a6bf6    alestic-32-us-west-1/debian-5.0-lenny-base-20091011.manifest.xml

Note: ec2-describe-images outputs way more data than this; the above is trimmed for brevity.

Once you have the AMI (newer is better, generally speaking), boot an instance with this AMI:

# ./ec2-run --region eu-west-1 -k $keypair ami-b33a6bf6

Note: $keypair in this case is the name of the keypair used to SSH into the server

2) Customizing the EBS volume – After the instance is up and running, look to see which availability zone the instance is in. If the region is eu-west-1, the availability zone is going to be either eu-west-1a or eu-west-1b. Once you know which zone your instance is in, create a 10gb EBS volume in the same zone [console].

Why 10gb?  10gb is the maximum size for an S3-backed AMI, which makes a 10gb volume the largest any instance-store AMI will be.  Obviously EBS AMIs can exist on larger volumes (all the way up to 1tb in size), and you can easily do so once you have an EBS-backed AMI.

After the EBS volume has been created, attach it to the running instance [console]. Make a note of the device name you chose for the volume (/dev/sdf, for example).

In a root shell on the instance:

# mkfs.ext3 /dev/sdf

# mkdir /mnt/target && mount /dev/sdf /mnt/target

# rsync -avHx / /mnt/target

# rsync -avHx /dev /mnt/target

# sync;sync;sync;sync && umount /mnt/target

The above commands did the following:

  • formatted the entire volume /dev/sdf as an ext3 filesystem
  • created directory /mnt/target and mounted /dev/sdf at /mnt/target
  • rsync’d the root instance-store filesystem to the EBS volume
  • synchronized the /dev directory from the instance-store filesystem
  • flushed all pending write ops, and unmounted the EBS volume

3) Creating the AMI – At this point, you should have a 10gb EBS volume that shows available [console].  Simply right-click on the volume and create a snapshot for the volume [console].  Once the snapshot has completed, select from the list of available kernels on ec2 with the following command:

# ./ec2-describe-images -o amazon | grep -i xenu

Store the AKI for the kernel you want to use in the environment variable AKI:

# export AKI=aki-xxxxxxxx

Up to this point, we have booted an instance-store AMI, created an EBS volume, synchronized the instance-store filesystem with the EBS volume, and created a snapshot of the EBS volume.  The only thing we need to do now is associate an AKI with the snapshot, and register the end result as an AMI in the EC2 repository.

# ./ec2-register --region eu-west-1 -s $SNAP --name $NAME --description "$DESC" --architecture $ARCH \
    --kernel $AKI --root-device-name /dev/sda1

Where $SNAP is the ID of the snapshot, $NAME is the name of your AMI, $DESC is a description of the AMI, $ARCH is either i386 (for 32-bit) or x86_64 (for 64-bit), and $AKI is the kernel ID stored earlier. The command will return an AMI ID, which will be yours to boot from once it finishes!

To track the progress of the AMI creation, you can do the following:

# watch -n 30 './ec2-describe-images ami-xxxxxxxx'

This will execute the ec2-describe-images command for your new AMI every 30 seconds.  You can stop the command once you see that the AMI is in available state.

Now that you have an EBS-backed AMI, any further customizations you make to this image can be preserved forever by simply right-clicking on the instance [console], and clicking “Create Image (EBS AMI).”

Enjoy!

DNSSEC on the Root Zone: Are You Ready?

February 18th, 2010 by Tony

According to ICANN, the US Department of Commerce, and VeriSign, DNSSEC will be implemented on the root zone by no later than May 2010 (source: http://www.root-dnssec.org/). For those of you who don’t know what this means, here’s a deconstruction of the sentence above:

  • DNS stands for the Domain Name System.  DNS is a hierarchical naming system for the Internet (or private network) that translates human-meaningful domain names into machine-meaningful numerical identifiers (eg, www.winnersdontlose.com -> 74.52.192.250).
  • DNSSEC stands for the Domain Name System Security Extensions.  It is effectively a suite of IETF-created specifications for securing certain types of data provided by DNS.
  • A root zone is the top-level DNS zone in the above mentioned hierarchy.  Generally speaking, “the root zone” is the largest global DNS system on the Internet.  IANA and ICANN manage this zone.  Strictly speaking, this zone is where the authorities for gTLDs (.com, .net, etc) and ccTLDs (.uk, .us, etc) are configured and handed out to the world.  The root zone is hosted on 13 clusters of root servers (a. to m.root-servers.net), which are the only servers in the world that are allowed to be authoritative for it.

So now, re-read what I typed.

-

According to ICANN, the US Department of Commerce, and VeriSign,
DNSSEC will be implemented on the root zone by no later than May
2010 (source: http://www.root-dnssec.org/).

-

So what does this really mean to you? Given that the majority of us don’t need to know much more than theory about the root zone, would ICANN, the US Dept of Commerce, and VeriSign really risk implementing something that could break a lot of installations today? One would think that the majority of their decisions could be implemented in a transparent fashion, unbeknownst to most of us. To better understand this decision, let’s look at it from two different perspectives:

  1. What benefits does DNSSEC on the root zone really provide?  Why is it necessary?
  2. What is the impact to the rest of the Internet by implementing DNSSEC on the root zone?

-

1) DNS, like a whole lot of other protocols that were created decades ago, was designed for a different Internet (lower bandwidth, higher error rate, trustworthy users). The Internet as we know it today (including the misanthropes) has forced the creation and deprecation of many protocol specifications. DNS suffers mostly from the trust issue. For example, a DNS resolver currently sends a query for a resource record (RR) out to the Internet, and then will accept the first response it receives, without question. Obviously a malicious server could provide an incorrect response, forcing the original resolver to use this incorrect address until its cache expired. If this DNS server belongs to a major consumer ISP (like Comcast or RoadRunner), a major service interruption affecting tens of thousands of users could easily happen.

-

2) From a technology standpoint, signing the root zone has a minimal effect on end users and networks. Since DNSSEC uses a PGP-esque keypair setup, RRs that are sent and received all day long by nameservers will be a little larger. This will probably result in slightly larger network bills for companies managing large-ish installations, with the rest of us noticing little to no change in our operating costs.

So what IS the problem?

Part of the problem is that RFC 1035 (http://www.ietf.org/rfc/rfc1035.txt) set an upper bound of 512 bytes on the size of a DNS response sent over UDP.  Because of this, many firewalls and applications hard-coded this limit into their DNS implementations.  This was not an issue until DNSSEC.  Since DNSSEC responses carry a signature (RRSIG) as well as the RR itself, many responses can be expected to exceed 512 bytes in size.

In addition to the RFC, another issue is that larger DNS packets have never been well respected on the Internet.  Most of the time, large UDP packets will get fragmented by routers along the way to the destination, and the destination either doesn’t or can’t deal with fragmented UDP packets, and drops them.  DNS clients will usually then retry the query with a smaller buffer size, and may even eventually fall back to TCP.
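To make the "buffer size" above concrete: a client advertises how large a UDP response it can accept in an EDNS0 OPT pseudo-record (RFC 6891) appended to the query, and the same record carries the "DNSSEC OK" (DO) bit.  Below is a minimal sketch in Python that only builds such a query in memory; it does not send anything.  The function names, the 4096-byte default, and the fixed query ID are my own illustrative choices, not taken from any particular resolver.

```python
import struct


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


def build_query(name, qtype=1, udp_payload=4096, dnssec_ok=True, query_id=0x1234):
    """Build a DNS query with an EDNS0 OPT record (illustrative helper).

    udp_payload is the buffer size the client advertises; a client that
    retries "with a smaller buffer size" is lowering this number.
    """
    # Header: ID, flags (recursion desired), QDCOUNT=1, ANCOUNT=0,
    # NSCOUNT=0, ARCOUNT=1 (the OPT record lives in the additional section).
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    # Question: QNAME, QTYPE, QCLASS=IN (1).
    question = encode_name(name) + struct.pack("!HH", qtype, 1)
    # OPT pseudo-RR: root name, TYPE=41, CLASS=advertised UDP payload size,
    # TTL=extended RCODE/version/flags (DO bit is 0x8000), RDLENGTH=0.
    ttl = 0x8000 if dnssec_ok else 0
    opt = b"\x00" + struct.pack("!HHIH", 41, udp_payload, ttl, 0)
    return header + question + opt
```

Calling build_query("example.com", udp_payload=1232) would produce the same query with a smaller advertised buffer, which is roughly what a client does before giving up on UDP and retrying over TCP.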

-

In a nutshell, most of the problems that come about as a result of DNSSEC being implemented on the root zone are caused by larger DNS packets.  The two things you want to make sure of are:

  1. Your server/network will allow in DNS packets larger than 512 bytes.
  2. Your network/firewall will accept IP fragments, and your server knows what to do with them.

The DNS Operations Analysis and Research Center (DNS-OARC) has created a tool to help ease this process.  See https://www.dns-oarc.net/oarc/services/replysizetest for more details.
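As a rough offline complement to the two checks above, the sketch below classifies a DNS response by size: does it exceed the classic 512-byte limit, and would it need IP fragmentation on a given path?  It assumes IPv4 with no header options (20 bytes), an 8-byte UDP header, and a 1500-byte MTU on every hop; the function name is hypothetical and real paths may have smaller MTUs.

```python
CLASSIC_UDP_LIMIT = 512  # RFC 1035 limit for DNS over UDP
IPV4_HEADER = 20         # bytes, assuming no IP options
UDP_HEADER = 8           # bytes


def classify_response(size_bytes, mtu=1500):
    """Report which of the two problem cases a DNS response of this size hits."""
    max_unfragmented = mtu - IPV4_HEADER - UDP_HEADER
    return {
        "exceeds_512": size_bytes > CLASSIC_UDP_LIMIT,
        "needs_fragmentation": size_bytes > max_unfragmented,
    }
```

Under these assumptions, anything between 513 and 1472 bytes only trips the first check, while larger responses trip both.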

Creating custom TCP monitors in Nagios

February 16th, 2010 by Tony No comments »

Recently I had to configure Nagios to monitor a non-standard port (9999/TCP).  This is a service of ours that apparently decides to die at random times.  In order to find out more about the issue (this service is not live yet), I configured an instance of Nagios to monitor this port.

In order to do this, I used the pre-existing check_tcp with a specific run-time parameter (namely, -p 9999).

First, define a command for this service check:

define command{
     command_name check_tcp_9999
     command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p 9999 -4
}

Note that we explicitly set the -H (host) and -p (port) parameters to the address of the host using this command and to port 9999, respectively.  The -4 flag forces the check to connect over IPv4.
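For anyone curious what a check like this boils down to, here is a minimal sketch in Python of a TCP connect test that follows the Nagios plugin exit-code convention (0 = OK, 2 = CRITICAL).  It is not the real check_tcp plugin, which does considerably more (response-time thresholds, send/expect strings, and so on); the function name and the 10-second default timeout are my own assumptions.

```python
import socket

# Nagios plugin exit codes (the full set is 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
OK, CRITICAL = 0, 2


def check_tcp(host, port, timeout=10.0):
    """Try to open a TCP connection and return (exit_code, status_line)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - connected to {host}:{port}"
    except OSError as exc:  # refused, unreachable, timed out, ...
        return CRITICAL, f"TCP CRITICAL - {host}:{port}: {exc}"
```

A wrapper script would print the status line and exit with the returned code, which is all Nagios needs to decide the state of the service.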

Once the command has been defined, create a new service for the host in question.  If you haven’t set one up yet, below is an example to follow:

define host{
     use generic-host   ; template name, available by default
     host_name svr-001  ; unique name of the host being defined
     alias server 001   ; description of the host
     address 10.0.0.254 ; IP of the host
}

To get the service check we defined above running against this host, reference the following stanza.

define service{
     use generic-service                          ; template name, available by default
     host_name svr-001                            ; the host against which to run this check
     service_description tcp check of 9999/tcp    ; name of the service (no quotes; they are illegal in object names by default)
     check_command check_tcp_9999                 ; name of the command we defined earlier
     check_interval 5                             ; check this service every 5 minutes
     check_period 24x7                            ; time period in which to monitor
     retry_interval 3                             ; if a check fails, re-try in 3 minutes
     max_check_attempts 3                         ; the max number of times a failed service will be checked
     notification_interval 60                     ; reminder of alerts sent every 60 minutes
     notification_period 24x7                     ; time period in which alerts are sent
     notification_options w,c,u                   ; the conditions upon which to send alerts
     contact_groups contacts                      ; contact group to alert in the event of failure
}
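The three timing directives above interact, so it helps to work out how long an outage can last before the first alert goes out.  The sketch below assumes the default interval_length of 60 seconds (so the values are minutes), that checks run exactly on schedule, and that check execution time is negligible; the function name is hypothetical.

```python
def minutes_to_hard_state(check_interval, retry_interval, max_check_attempts):
    """Return (best, worst) minutes between the service dying and the hard state.

    The first failed check counts as attempt 1; the remaining attempts are
    spaced retry_interval apart.  In the worst case the service dies right
    after a successful check, adding one full check_interval of delay.
    """
    retries = (max_check_attempts - 1) * retry_interval
    return retries, check_interval + retries
```

With the values from the stanza above (5, 3, 3), this gives a window of 6 to 11 minutes before the service enters a hard state and the first notification is sent.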

Verify that the additions you made to the config file(s) were valid by executing:

# nagios3 -v /etc/nagios3/nagios.cfg

If the pre-flight check goes well, update your nagios installation by executing:

# /etc/init.d/nagios3 restart

That’s it!

EC2 filesystem performance

January 26th, 2010 by Tony 2 comments »

I’ve been posting lately about a tool named bonnie++, which will run a suite of tests against your Linux filesystem to determine metrics in 3 important areas:  data read/write speed, max random seeks, and max metadata operations.  Last time I posted about profiling one of Linode.com’s “Linode 360” instances.  In this article I will profile an m1.small instance on Amazon Web Services’ (AWS) Elastic Compute Cloud (EC2) service.

EC2 was the first legitimate cloud offering to reach the market, and in many contexts AWS is the most developed, most robust cloud provider.  However, many companies are quickly ramping up their offerings (GoGrid, Voxel, Flexiscale, etc.), if only in one or two datacenters (Voxel is the leader of the group, with locations in NY, Singapore, and Amsterdam).

The m1.small instance comes with the following specifications:

  • 1.7gb RAM
  • 1 EC2 Compute Unit
  • 160gb instance storage
  • “moderate” I/O performance

While these specs are mostly useful in comparison to other EC2 instance sizes, the performance of this particular size will provide a useful benchmark for baselining EC2 performance.  Since this is the smallest instance, I’m assuming “moderate I/O performance” means that it’s as bad as it gets.

From the earlier post, we use the following command to invoke bonnie++:

# bonnie++ -u 0 -r 1700 -s 34000 -n 256 -b -d /

The above command failed to run, claiming that the filesystem was out of space.  Checking the filesystem, I see that I only have /dev/sda1, which is 15gb and mounted at /.  Since real-world testing involves what the customer actually gets and not what the marketing literature says, I adjusted the -s parameter to 3400 (2x RAM), which should still outpace/outpage the 1.7gb of memory in my instance.
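A quick pre-flight check would have caught the out-of-space failure before bonnie++ ran.  The sketch below uses Python's standard shutil.disk_usage to compare the intended -s value against the free space on the target filesystem; the helper names are mine, and note that bonnie++ also creates files for the metadata tests, so in practice you want some headroom beyond this bare comparison.

```python
import shutil


def free_mb(path="/"):
    """Free space, in megabytes, on the filesystem that contains path."""
    return shutil.disk_usage(path).free // (1024 * 1024)


def size_fits(size_mb, path="/"):
    """True if a test file set of size_mb megabytes fits in the free space at path."""
    return size_mb <= free_mb(path)
```

For example, size_fits(34000, "/") would have returned False on the 15gb root filesystem described above.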

Invoking bonnie++ with the new parameter yields the following result:

Server ip-10-226-125-238 was able to:

  • sustain ~52MB/s at 6% CPU for sequential block writes
  • sustain ~64MB/s at 1% CPU for sequential block reads
  • max out at 939.8 random seeks per second
  • sequentially create 127 files per second
  • randomly create 174 files per second
  • sequentially read metadata from 158,584 files per second at 39% CPU
  • randomly read metadata from 203,851 files per second at 41% CPU
  • sequentially delete 121 files per second
  • randomly delete 158 files per second

As the results from running bonnie++ in various providers pile up, I will compile them into a spreadsheet which will (hopefully) ultimately shed some light on the performance boundaries of various VPS and cloud providers.
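Until that spreadsheet exists, a few lines of Python are enough to put the numbers side by side.  The sketch below uses only the throughput and seek figures reported in these posts (the EC2 results above, plus the Linode 360 and physical-server results from the earlier Linode article) and expresses each system relative to a chosen baseline.  The dictionary keys and function name are my own.

```python
# Figures as reported in the bonnie++ posts: MB/s for block I/O, seeks per second.
RESULTS = {
    "EC2 m1.small": {"write_mbs": 52, "read_mbs": 64, "seeks": 939.8},
    "Linode 360": {"write_mbs": 85, "read_mbs": 180, "seeks": 167.9},
    "physical (80GB PATA)": {"write_mbs": 41, "read_mbs": 50, "seeks": 71},
}


def relative_to(baseline, results=RESULTS):
    """Express every system's metrics as a multiple of the baseline system's."""
    base = results[baseline]
    return {
        name: {metric: round(value / base[metric], 2) for metric, value in metrics.items()}
        for name, metrics in results.items()
    }
```

Keep in mind that the EC2 run used a much smaller -s value (2x RAM rather than 20x), so these ratios are a rough guide rather than a like-for-like comparison.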

Linode filesystem performance

January 19th, 2010 by Tony No comments »

In a previous article, I wrote about using bonnie++ (http://www.coker.com.au/bonnie++/) to benchmark the performance of your hard drives and filesystem.  We went through a sample command and explained all the parameters (most notably, -s).

In this post I’d like to display and summarize the results from running bonnie++ on two systems:  a Linode (as the title suggests) and a physical server as the control.

Linode is a VPS hosting company with availability in the US and Europe.  They strive for the best service possible, and are a very well regarded provider within the hosting community.  The Linode I chose for the purposes of this test is their “Linode 360” which comes with 360MB RAM, 16GB of storage, 200GB of transit, and a price tag of $19.95 per month.  This is their entry-level setup, and their instances can scale up to 2.8GB RAM, 128GB storage, 1.6TB transit (for $159.95/month).

The command we’re running to invoke bonnie++ is:

# bonnie++ -u 0 -r 360 -s 7200 -n 256  -b -d /

Note that the values for -r and -s differ from those in my last post, to match the RAM in this machine.

Results:

On the far left of the results, in bold, you will see the machine’s hostname.  In my case, this is the Linode-assigned hostname of li149-50.  The next column is labeled “Size: Chunk Size,” which is the file size we specified for the IO performance measurements (the -s flag).

As stated in my last post, bonnie++ offers three specific metrics:

  1. Data read and write speed.
  2. Maximum seeks per second.
  3. Maximum file metadata operations per second.

Applying these three metrics to our results, the following correlations  can be established:

  • Sequential Input and Sequential Output apply to 1
  • Random Seeks apply to 2
  • Sequential Create and Random Create apply to 3

Now, let’s summarize with real numbers.

Server li149-50 was able to:

  • sustain ~85MB/s at 13% CPU for sequential block writes.
  • sustain ~180MB/s at 0% CPU for sequential block reads.
  • max out at 167.9 random seeks per second.
  • sequentially create 2,236 files per second.
  • randomly create 2,001 files per second.
  • sequentially read metadata from 395,067 files per second at 99% CPU.
  • randomly read metadata from 503,021 files per second at 99% CPU.
  • sequentially delete 1,014 files per second.
  • randomly delete 610 files per second.

Running the same bonnie++ command on a physical server with an 80GB PATA drive, with -r and -s set to 1024 and 20480, respectively, yields the following:

# bonnie++ -u 0 -r 1024 -s 20480 -n 256  -b -d /

Results:

Summarizing in a similar fashion to the above, we get the following:

Server tp.eliminated.org was able to:

  • sustain ~41MB/s at 29% CPU for sequential block writes.
  • sustain ~50MB/s at 10% CPU for sequential block reads.
  • max out at 71 random seeks per second.
  • sequentially create 447 files per second.
  • randomly create 2,001 files per second.
  • sequentially read metadata from 288,111 files per second.
  • randomly read metadata from 503,021 files per second.
  • sequentially delete 225 files per second.
  • randomly delete 610 files per second.

Note that the metrics regarding file metadata operations depend on the number of files on which we test (the -n flag).  Since the value supplied to -n was the same on both servers (256), the results we get back are very similar (if not the same).

I will continue to benchmark various VPS and Cloud providers in order to ascertain IO performance from as many providers as possible.

Benchmarking hard drives and filesystems with bonnie++

January 18th, 2010 by Tony No comments »

Bonnie++ (http://www.coker.com.au/bonnie++/) is “a benchmark suite that is aimed at performing a number of simple tests of hard drive and file system performance.”   Bonnie++ outputs an 80-column text report (designed to work well with braille displays), as well as CSV values that can be converted to text or HTML (with bon_csv2txt and bon_csv2html, respectively).

Bonnie++ provides performance metrics on the following 3 things:

  1. Data read and write speed:  This should be fairly obvious.  This is how fast you can read and write to the drive(s) in question.
  2. Maximum number of seeks per second:   If the blocks that your computer needs are not stored sequentially (right next to each other), then the heads in the HD will have to do a seek (physically move to the right track) before the data can be read.
  3. Maximum number of file metadata operations per second:  Metadata operations include things like file creation, deletion, or gathering any other metadata about a file (permissions, size, etc).

In order to run all of the tests included, I ran the following command:

# bonnie++ -u 0 -r 1024 -s 20480 -n 256  -b -d /

Let’s break down the parameters passed to bonnie++:

  • -u 0:  The -u flag is used to indicate the UID that the test should run as.  UID 0, as we all know, belongs to the root user.
  • -r 1024:  The -r flag is used to indicate how much RAM (in megabytes) is in the machine.  As you can see I have 1gb (1024mb).
  • -s 20480:  The -s flag is used to indicate the size of the file(s) (in megabytes) to be used for IO performance testing.  More on this value later.
  • -n 256:  Specifies the number of files for the file creation test.  This is in multiples of 1024 files, so 256 = 262,144 (256*1024) files.
  • -b: The -b flag is used to disable any system-level buffering.  Basically this means an fsync() is called after every write operation.
  • -d /:  The -d flag is used to indicate the path on which to perform the tests.
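If you plan to run the same test on several machines, it can help to assemble the command line from named values rather than retyping flags.  The sketch below is a small Python helper that builds the argument list using exactly the flags described above and works out the file count implied by -n; the function and parameter names are my own, and nothing here actually executes bonnie++.

```python
def bonnie_command(ram_mb, size_mb, n_files_k=256, uid=0, path="/", no_buffering=True):
    """Return the bonnie++ argument list for the flags described above."""
    args = ["bonnie++", "-u", str(uid), "-r", str(ram_mb),
            "-s", str(size_mb), "-n", str(n_files_k)]
    if no_buffering:
        args.append("-b")  # fsync() after every write
    args += ["-d", path]
    return args


def files_created(n_files_k):
    """The -n value is given in multiples of 1024 files."""
    return n_files_k * 1024
```

Joining the result with spaces, " ".join(bonnie_command(1024, 20480)), reproduces the command used in this post, and the list can be handed to subprocess.run as-is if you do want to execute it.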

Of all the runtime parameters, -s is probably the most important to define properly.  This flag (in megabytes) defines the size of the files used for IO performance benchmarking.  If the supplied size is greater than 1gb, then multiple 1gb files will be used until the value is reached.  In order to get proper results, you will want to specify a number much larger than the amount of RAM you have.  To be sure to bypass any caching, you want at least 2-3x the amount of RAM; if possible, use a much higher multiple, like 20x.
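The sizing rule is simple arithmetic, sketched below in Python.  Given the RAM in the machine and a multiple, it returns the value to pass to -s and the number of 1gb files that value implies.  The function name and the choice to reject multiples below 2 are my own reading of the guidance above, not something bonnie++ exposes.

```python
import math


def plan_size(ram_mb, multiple=20, minimum_multiple=2):
    """Return (size_mb, n_files) for a bonnie++ run on a machine with ram_mb of RAM.

    size_mb is the value for -s; n_files is how many files of up to 1gb
    (1024mb) are needed to reach it.
    """
    if multiple < minimum_multiple:
        raise ValueError("use at least 2x RAM so caching cannot absorb the test")
    size_mb = ram_mb * multiple
    n_files = math.ceil(size_mb / 1024)
    return size_mb, n_files
```

For the machine in this post, plan_size(1024) gives a -s value of 20480, spread across twenty 1gb files.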