Scaling websites that deliver static content has sadly become an old trick. Anyone who has ever heard the letters CDN used next to each other can put together an intelligent solution for scaling static content delivery. However, what happens when a web application that requires "sticky sessions" needs to be horizontally scaled? Even Akamai (the only CDN with a dynamic acceleration product) necessarily forwards all of the dynamic requests back to origin. While they do lessen the load on origin servers, the very nature of dynamic applications is such that all the processing needs to be handled by -your- servers.
Despite the fact that this entry will focus explicitly on how to do this with .NET applications, the approach to this problem is basically the same, no matter which middleware you use.
In essence, when a user signs into an application using sticky sessions, the user is assigned a cookie that identifies the session. The server also needs to know about this, since there are server-side controls, such as session timeouts, etc. In .NET the "session" is known as "state" and the server-side controls are governed by config files scoped by the server (machine.config) or the web app itself (web.config).
Microsoft provides a few different ways of managing state. At the highest level, the distinction is drawn between "InProc" (local server memory) or not. The InProc model is a monolithic one, essentially designed for a one server + failover scenario. While this is fine for redundancy, this will not help if your user load overwhelms that single server.
If you do not use the InProc model, Microsoft has provided an ASP.NET state server service, which is a service that will store and retrieve state data from any ASP.NET web application. They also provide a handful of registry keys with which to tweak the service. I'm not a big fan of this particular mechanism, since it's very much a "black box" approach. I've found that utilizing the state server is fine most of the time, but when it doesn't work you are more or less reduced to the traditional Windows troubleshooting method (rebooting).
An alternative for storing state data out of the memory is SQL Server. You can take any SQL Server instance and create a special state database (the schema for which is publicly available), and then point your web applications to the SQL instance for storing and retrieving state data. This approach is easier to troubleshoot, since you have the full array of SQL tools available, but it is much harder to setup, and is also slower than the InProc or state server models. When I've used this in a production environment I found that it worked a little more reliably than the state server, but there was a very noticeable hit on performance. (Note: the performance impact can probably be minimized with intelligent coding but that sadly was not an option for us at the time.)
So far, I've written about InProc, state server, and SQL server as methods of storing and retrieving state data. Another way of scaling your web front-end while ensuring that your end user sessions are not lost is the following:
Say I have a web application, available at http://app.company.com. I'd like to scale this hostname such that thousands of simultaneous users will not bring my server down, but I'm also sensitive enough to end user experience such that the unreliabe or slower performance of the state server and SQL Server is not acceptable. Consider the following flow:
1) User loads http://app.company.com.
2) The webserver for http://app.company.com does an HTTP 302 on the end user agent, and sends them to either http://app1.company.com or http://app2.company.com.
3) All further interactions with the end user happens over the hostname http://app1.company.com or http://app2.company.com.
In this scenario, we assume that machines app1 and app2 are fully contained instances of the application you are trying to scale. These machines should be configured for InProc. SInce we rewrote the end user's URL in their browser, we can therefore be assured that all following interactions is with app1 or app2 directly. The downside to this approach is the introduction of a very piece of code: the redirector. This code will basically have to round robin through a list of hostnames, and ideally perform some manner of a healthcheck so end users are never sent to a machine that isn't working. Also, while this approach does not scale forever, it does introduce extra layers of scalability which can be used (load balancing to a pair of redirector servers, etc, ad nauseum).
While some of the methodologies discussed here are specific to the .NET framework, most MW tiers have the ability to maintain state in something other than local memory. The redirector approach is completely agnostic and should be applicable to most any scenario, but as always, none of the stuff I discuss here is an endorsement of any sort, these are more a synopsis of my experiences in having worked with this stuff for a relatively high volume environment with (what I consider to be) unreasonable demands ;)