Our team has been working on a rather large and complex web application for the past couple of years. Now that it is nearing completion, we face the task of hosting, deploying and supporting the finished product. On the one hand, we want to retain control over the code base for each customer separately, i.e. be able to run individual installations that coexist but use different source code versions. On the other hand, we want to offer the platform to everyone under a common URL namespace and be sure that every user is routed to their own virtual host.
Main Challenges
- Host a large number of virtual hosts (e.g. more than 1000), each with its own code base, source code revision, database and settings;
- Introduce individual customer identification and authorization;
- Secure the virtual hosts serving the actual application and isolate them from the outside world through a firewall;
- Use the above information to route each customer to the correct virtual host, representing their individual installation;
- Bring everything transparently under the same URL namespace, advertising a single URL to everyone for product access.
The Approach
The basic idea is to authenticate each web user before giving them access to the application, and then to use this information to route the user’s requests to the correct virtual host, which contains their individual code base and settings. To achieve this, we decided to introduce a gateway server that handles the authentication and routing.
Our main URL, say https://application.finite-soft.com, points to the external IP address of the web gateway in the picture above. Customers are given individual SSL certificates, signed by our own trusted Certificate Authority (CA). We use the Organizational Unit (OU) field of the SSL certificate to identify the customer’s organization and route the web user to their own installation (virtual host).
Required Apache Modules
I have researched different configuration options and tutorials. The only working configuration I have found so far uses a combination of the following Apache modules: mod_rewrite, mod_proxy and mod_proxy_http. The SSL directives used below additionally require mod_ssl. On a Debian-style installation, you can enable all of these by running the following as root and then restarting Apache:
a2enmod ssl
a2enmod rewrite
a2enmod proxy
a2enmod proxy_http
The Configuration Itself
Here is the whole Web Gateway configuration:
<VirtualHost *:443>
    ServerName application.finite-soft.com
    DocumentRoot /var/www/application/main

    SSLEngine on
    SSLCertificateKeyFile /etc/ssl/finite-soft.com/STAR_finite-soft_com.key
    SSLCertificateChainFile /etc/ssl/finite-soft.com/STAR_finite-soft_com.ca-bundle
    SSLCertificateFile /etc/ssl/finite-soft.com/STAR_finite-soft_com.crt

    SSLCACertificateFile /etc/ssl/finite-soft.com/FSS.CA.chain.pem
    SSLOptions +StdEnvVars
    SSLVerifyClient require
    SSLVerifyDepth 10

    # Enable ProxyPass environment variables
    ProxyPassInterpolateEnv On

    # Enable Proxy engine for SSL requests
    SSLProxyEngine on

    # Proxy requests based on user's certificate
    # (rewrite rules are not inherited by virtual hosts, so the engine is enabled here)
    RewriteEngine On
    RewriteMap lowercase int:tolower
    RewriteCond "%{SSL:SSL_CLIENT_VERIFY}" "SUCCESS"
    RewriteCond "${lowercase:%{SSL:SSL_CLIENT_S_DN_OU}}" "^(.+)$"
    RewriteRule "^/(.*)" "https://%1.finite-soft.com/$1" [P,E=VHOST:%1]
    ProxyPassReverse "/" "https://${VHOST}.finite-soft.com/" interpolate

    <Directory /var/www/application/main>
        AllowOverride None
    </Directory>
</VirtualHost>
The first part is straightforward. It sets up the server name for the gateway server and its DocumentRoot. You can use this folder to serve some generic information for requests that are not proxied to a backend, for example when the certificate carries no organizational unit. All other requests are forwarded to the customer’s virtual host.
ServerName application.finite-soft.com
DocumentRoot /var/www/application/main
The next part tells Apache to turn the SSL engine on and where to find the certificate files for this domain. In our case, we use a wildcard certificate for *.finite-soft.com.
SSLEngine on
SSLCertificateKeyFile /etc/ssl/finite-soft.com/STAR_finite-soft_com.key
SSLCertificateChainFile /etc/ssl/finite-soft.com/STAR_finite-soft_com.ca-bundle
SSLCertificateFile /etc/ssl/finite-soft.com/STAR_finite-soft_com.crt
After that we proceed with the user authentication requirements. SSLCACertificateFile names the file with the CA certificates against which Apache verifies the SSL certificates presented by users.
The next line instructs Apache to export the SSL-specific information as standard environment variables. This makes it available to an application in the DocumentRoot, should you need it. The %{SSL:...} lookups in the rewrite rules further down do not depend on this option, so you can omit the line for performance reasons if your application does not read these variables.
The last two directives tell Apache to require a valid certificate from the web user. SSLVerifyDepth limits the number of intermediate CA certificates permitted in the client’s certificate chain.
SSLCACertificateFile /etc/ssl/finite-soft.com/FSS.CA.chain.pem
SSLOptions +StdEnvVars
SSLVerifyClient require
SSLVerifyDepth 10
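With SSLVerifyClient require, a visitor who presents no certificate is rejected during the TLS handshake and never sees the pages in the DocumentRoot. If the gateway should serve a generic page to such visitors instead, one possible variant is to make the certificate optional and leave the decision to the SSL_CLIENT_VERIFY check in the rewrite rules. This is a sketch under that assumption, not part of the original setup:

```apache
# Sketch only: ask for a client certificate but do not insist on one.
SSLCACertificateFile /etc/ssl/finite-soft.com/FSS.CA.chain.pem
SSLVerifyClient optional
SSLVerifyDepth 10
```

When no certificate is presented, SSL_CLIENT_VERIFY is NONE rather than SUCCESS, so the request is not proxied and falls through to the DocumentRoot.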
To keep the configuration flexible, so that it does not have to change every time we add a new virtual host to the scheme, we need to use environment variables in the URLs to which we route our users. Therefore, we need the following line:
ProxyPassInterpolateEnv On
This tells Apache to interpolate environment variables, written as ${VARNAME}, in ProxyPass and ProxyPassReverse directives that carry the interpolate keyword.
Our next line in the configuration allows the backend Virtual Hosts to be accessed via HTTPS:
SSLProxyEngine on
This is not required. If you have no specific reason to run the connections between the gateway server and your backend virtual hosts over SSL, you can turn it off for performance reasons. In that case the backend URLs in the RewriteRule and ProxyPassReverse lines below must use http:// instead of https://.
Next in line is our actual proxy configuration. I chose mod_rewrite, as it offers the most flexibility and has built-in string functions, which I use to modify some of the data carried by the end customers’ SSL certificates.
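If you do keep SSL between the gateway and the backends, you may also want the gateway to verify the backend certificates. The following is a minimal sketch of how that could look. The CA bundle path is hypothetical, and it assumes that the backend certificates match the host names the gateway connects to, for example through the wildcard certificate:

```apache
SSLProxyEngine on
# Hypothetical path: the CA bundle that signed the backend certificates.
SSLProxyCACertificateFile /etc/ssl/finite-soft.com/backend-ca.pem
# Refuse backends whose certificate cannot be verified.
SSLProxyVerify require
# Require the certificate to match the backend host name and to be unexpired.
SSLProxyCheckPeerName on
SSLProxyCheckPeerExpire on
```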
RewriteEngine On
RewriteMap lowercase int:tolower
RewriteCond "%{SSL:SSL_CLIENT_VERIFY}" "SUCCESS"
RewriteCond "${lowercase:%{SSL:SSL_CLIENT_S_DN_OU}}" "^(.+)$"
RewriteRule "^/(.*)" "https://%1.finite-soft.com/$1" [P,E=VHOST:%1]
The first line switches on the rewrite engine, which has to be done in every virtual host that uses rewrite rules. The RewriteMap line registers the internal function tolower and makes it available under the name lowercase to subsequent configuration lines.
The first RewriteCond checks whether the SSL client verification has succeeded, i.e. the web user has identified themselves with a valid SSL certificate signed by the above-mentioned CA. If so, the second RewriteCond takes the user’s organizational unit (SSL_CLIENT_S_DN_OU), converts it to lower case and captures it in the back-reference %1.
The actual RewriteRule then does two things: first, it forwards all requests to another subdomain, which has the customer’s organizational unit as its prefix, and second, it stores this prefix in the environment variable VHOST. The former is achieved by proxying all requests to the newly constructed URL, hence the P flag in the RewriteRule. The latter is required for our last piece of the configuration puzzle – the reverse proxy.
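Because the captured OU ends up in a host name, it may be worth restricting it to characters that are valid there. The variant below is a sketch of such a hardening, not the original configuration. It assumes that customer prefixes consist only of lower-case letters, digits and hyphens, and it answers verified certificates with an unusable OU with 403 instead of the DocumentRoot:

```apache
RewriteEngine On
RewriteMap lowercase int:tolower

RewriteCond "%{SSL:SSL_CLIENT_VERIFY}" "SUCCESS"
# Assumption: customer prefixes contain only letters, digits and hyphens.
RewriteCond "${lowercase:%{SSL:SSL_CLIENT_S_DN_OU}}" "^([a-z0-9-]+)$"
RewriteRule "^/(.*)" "https://%1.finite-soft.com/$1" [P,E=VHOST:%1]

# Reached only if the rule above did not match: the certificate is valid,
# but its OU is empty or contains unexpected characters.
RewriteCond "%{SSL:SSL_CLIENT_VERIFY}" "SUCCESS"
RewriteRule "^" "-" [F]
```

The P flag ends rule processing for proxied requests, so the second rule only applies to requests that were not routed to a backend.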
ProxyPassReverse "/" "https://${VHOST}.finite-soft.com/" interpolate
This line tells Apache to adjust the Location, Content-Location and URI headers of redirect responses sent by the application on the backend virtual host, replacing the backend URL with the URL of the gateway server. This way users are always redirected to the gateway server and are never sent directly to the backend host. To make this directive work, you need both the interpolate keyword and the ProxyPassInterpolateEnv directive we discussed earlier.
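ProxyPassReverse only covers redirect headers. If the backend application sets cookies with an explicit Domain attribute that names the backend host, those can be rewritten in a similar way. This is a sketch that assumes your application actually sets such cookies; if it does not, the line is unnecessary:

```apache
# Sketch: map cookie domains set by the backend to the gateway's public name.
ProxyPassReverseCookieDomain "${VHOST}.finite-soft.com" "application.finite-soft.com" interpolate
```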
Advantages of this Solution
So, let’s look at what we’ve got. The main advantages of the described solution include:
- Scalability – the backend scales horizontally, as you can add more backend hosts at any time without changing the gateway configuration. All you need to do is point the DNS records of the respective virtual host to the new machine and voilà – you have added more computing power;
- High level of customization – you can put many virtual hosts on the same hardware machine or use different machines. You can have dedicated database clusters on yet another type of infrastructure or choose to store your databases with the backend host – virtually infinite possibilities;
- End-user transparency – the users of your web application never get a clue of what is happening in the background. Everyone uses the externally visible URL to access your application, which never changes throughout their experience. In the background, however, everyone works with their individually configured virtual host;
- Security – the backend virtual hosts are never accessed directly from the Internet. You can define firewall rules to allow access only from the gateway server or a set of gateway servers, should you choose to introduce backup gateways;
- Backend servers require no WAN IPs – you can host thousands of virtual hosts and let them communicate with the gateway through a private network. Only the gateway server requires a public IP address on the Internet;
- No public names for the backend servers – DNS entries for the actual application virtual hosts are required only locally. You can have a local DNS that covers this part of your network and use it when you need to add a new physical machine or relocate virtual hosts.
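To illustrate the security point, here is a minimal sketch of what one backend virtual host could look like when it accepts requests only from the gateway. The host name customer1, the document root and the address 10.0.0.10 are hypothetical, and reusing the wildcard certificate on the backend is an assumption:

```apache
# Sketch of one backend virtual host; names, paths and addresses are hypothetical.
<VirtualHost *:443>
    ServerName customer1.finite-soft.com
    DocumentRoot /var/www/customers/customer1

    SSLEngine on
    SSLCertificateFile /etc/ssl/finite-soft.com/STAR_finite-soft_com.crt
    SSLCertificateKeyFile /etc/ssl/finite-soft.com/STAR_finite-soft_com.key

    <Directory /var/www/customers/customer1>
        AllowOverride None
        # Only the gateway (assumed private address 10.0.0.10) may connect.
        Require ip 10.0.0.10
    </Directory>
</VirtualHost>
```

A rule like this complements the firewall rather than replacing it: the firewall keeps outside traffic away from the machine, while Require ip rejects anything other than the gateway at the web server level.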
Further Improvements
Looking at the deployment illustration above, it is easy to see that the gateway server is a single point of failure. If this piece of the puzzle fails, none of your backend hosts will be accessible, although they are up and running just fine. To remedy this, you can introduce one or more backup gateways, monitor each gateway’s health and reroute the traffic if one of them dies. This scheme can later be extended to balance the load between the gateways, if needed.