Ever since I was asked by a former manager to “build” a mosso.com-type system, I have been intrigued by exactly how this would be done. My original line of thinking was that it was more of an L7 network implementation, where through the magic of deep packet inspection, packets were routed to clusters of machines configured to run PHP, Rails, etc. While this is a feasible approach, I now believe such a system would not be nearly as scalable as Mosso has proven to be.
Another approach that could potentially work is to run every stack on each machine, in a massively distributed context. Imagine a machine with directories like /code/php/, /code/ruby/, and /code/jsp/ that you could reference from http://machine_ip/php, http://machine_ip/ruby, and http://machine_ip/jsp respectively. This would reduce what the network has to do to keeping track of things like state, for end users accessing applications that require it. With a little bit of work on the individual stacks, even this could potentially be eliminated.
In this theoretical approach, two things would be required:
- a reverse proxy-type setup, bound to port 80/TCP on the IP being accessed
- individual instances of apache, nginx, etc, through which individual language support is provided. While this piece makes the system more complex to set up, troubleshoot, and upgrade, having it allows more flexibility (for example, if rails on apache is undesired, you could run rails on nginx instead with little to no change in the system; likewise, you could also easily change out the Ruby version, etc)
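To make the first requirement a bit more concrete, here is a minimal sketch in Python of the routing decision the reverse proxy would have to make. The BACKENDS table and the resolve_backend name are hypothetical, and the ports are assumptions for illustration only (8180 matches the PHP example port used later in this post; the other two are made up).

```python
from typing import Optional, Tuple

# Hypothetical mapping of public URL prefixes to private backend servers.
# The ports are illustrative assumptions, not values from a real deployment.
BACKENDS = {
    "/php": "http://127.0.0.1:8180",
    "/ruby": "http://127.0.0.1:8280",
    "/jsp": "http://127.0.0.1:8380",
}


def resolve_backend(path: str) -> Optional[Tuple[str, str]]:
    """Return (backend_base_url, remaining_path) for a request path, or None."""
    for prefix, backend in BACKENDS.items():
        # Match "/php" and "/php/..." but not "/phpmyadmin".
        if path == prefix or path.startswith(prefix + "/"):
            remainder = path[len(prefix):] or "/"
            return backend, remainder
    return None
```

Swapping rails on apache for rails on nginx would then only mean changing the address behind "/ruby"; nothing on the public side of the proxy has to know.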
Potential problems with this approach (since no design is perfect):
- If nginx+rails (or any other app stack+webserver) outperforms the reverse proxy, then we’re introducing an unnecessary bottleneck. In a massively distributed context, this isn’t a big deal.
- Many more webservers and environments to maintain, which makes the whole thing much more complex. Since this is meant to be baked in as a machine image (think AMI on AWS’s EC2), this complexity can mostly be hidden from users of the server.
- Multiple authentication and access control layers, which could lead to users having to authenticate more than once, etc. I don’t have a really good response to this one yet. Careful coding and well-thought-out implementations could take care of this, but that isn’t something anyone could realistically rely on.
Reverse Proxy
A traditional proxy server implementation sits between the end user and the public Internet. This server will accept all of the end user’s requests, and make them on the public Internet on the end user’s behalf. In order to use a proxy server, users would have to modify their browser settings such that all requests are sent to the proxy instead of the websites directly.
The reverse proxy does the opposite of this. Reverse proxies sit next to webservers, and accept requests from end users on their behalf. Note that no end user configuration update is required for this to work (all of the work is done on the side of the webserver). It is this functionality that we will need in order to transparently present namespaces for PHP, JSP, and Ruby. Since what we want is in effect a webserver itself, we can use Apache’s mod_proxy implementation for most of the functionality.
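To illustrate the idea (this is not how mod_proxy is implemented, just a sketch of the concept), here is a tiny reverse proxy using only Python’s standard library. It accepts GET requests from end users and fetches the same path from a backend on their behalf. The names make_proxy_handler and make_proxy_server are hypothetical, and I’m assuming a single backend and GET-only traffic to keep it short.

```python
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Talk to the backend directly, ignoring any proxy settings in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def make_proxy_handler(backend_base: str):
    """Build a handler class that forwards GET requests to backend_base."""

    class ProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                # Ask the backend for the same path the end user asked us for.
                with _opener.open(backend_base + self.path, timeout=10) as upstream:
                    status = upstream.status
                    ctype = upstream.headers.get("Content-Type", "application/octet-stream")
                    body = upstream.read()
            except urllib.error.HTTPError as err:
                # The backend answered with an error status; pass it through.
                status = err.code
                ctype = err.headers.get("Content-Type", "text/plain")
                body = err.read()
            except OSError:
                # The backend is unreachable or timed out.
                status, ctype, body = 502, "text/plain", b"Bad Gateway"
            self.send_response(status)
            self.send_header("Content-Type", ctype)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return ProxyHandler


def make_proxy_server(backend_base: str, host: str = "127.0.0.1", port: int = 8080):
    """Create (but do not start) a proxy server in front of backend_base."""
    return ThreadingHTTPServer((host, port), make_proxy_handler(backend_base))
```

Calling make_proxy_server("http://127.0.0.1:8180").serve_forever() would put the proxy in front of a backend on port 8180. Note that the end user only ever talks to the proxy’s address, which is the whole point. Redirects from the backend are followed by the proxy itself in this sketch, which a real setup would probably not want.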
mod_proxy, great as it is, doesn’t provide all of the functionality that we will need. For example, if I took a pre-existing PHP application and stuck it behind a reverse proxy, any PHP-generated links would most likely not work, since the paths (and port) would be relative to the PHP install, and not the reverse proxy itself. This might lead to HTTP requests being issued to http://machine_ip:8180/code.php instead of http://machine_ip/code.php. Since we would necessarily restrict public access to every service port other than 80/TCP, this is sub-optimal.
In order to solve this problem, there’s a module for Apache called mod_proxy_html (http://apache.webthing.com/mod_proxy_html/). This is a third-party module and not an official part of the Apache distribution, but it provides exactly the functionality that we need. With mod_proxy_html, we will be able to rewrite URLs in such a way that the proxy and app servers are indistinguishable from the perspective of the end user.
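The kind of rewriting involved can be sketched in a few lines of Python. This is only an illustration of the idea, not how mod_proxy_html works internally (as far as I understand it, the module actually parses the HTML, while this sketch just uses a regular expression). The rewrite_links name is hypothetical, and I’m assuming only absolute URLs inside href, src, and action attributes need fixing.

```python
import re


def rewrite_links(html: str, backend_base: str, public_base: str) -> str:
    """Point href/src/action URLs that start with backend_base at public_base."""
    pattern = re.compile(
        r"(\b(?:href|src|action)\s*=\s*[\"'])"  # attribute name, "=", opening quote
        + re.escape(backend_base)                # the backend's own address
        + r"(?=[/?#\"'])",                       # must end at a URL boundary
        re.IGNORECASE,
    )
    # Keep the attribute prefix and swap only the base URL.
    return pattern.sub(lambda m: m.group(1) + public_base, html)


page = '<a href="http://machine_ip:8180/code.php">code</a>'
print(rewrite_links(page, "http://machine_ip:8180", "http://machine_ip"))
# prints: <a href="http://machine_ip/code.php">code</a>
```

The proxy would run something like this over each HTML response before handing it back to the end user, so links never leak the private port.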
Over the course of the next month or so I will be trying to build a system like the one I described above. I believe such a system, along with the power of cloud computing, has the ability to reshape a lot of how web applications are spec’d. No longer being limited to any one middleware stack is a very real need, which has sadly gone unanswered. Hopefully this will be a stab in the right direction.