As a young engineer I used to build web applications that were hosted on a single server, and this is probably how most of us get started. During my career I have worked for different companies and I have witnessed applications in different scalability evolution stages. Before we go deeper into scalability, I would like to present some of these evolution stages to better explain how you go from a single server sitting under your desk to thousands of servers spread all over the world. I will keep it at a very high level here, as I will go into more detail in later chapters. Discussing evolution stages will also allow me to introduce different concepts and gradually move toward more complex topics. Keep in mind that many of the scalability evolution stages presented here can only work if you plan for them from the beginning. In most cases, a real-world system would not evolve exactly in this way, as it would likely need to be rewritten a couple of times. Most of the time, a system is designed and born in a particular evolution stage and remains in it for its lifetime, or manages to move up one or two steps on the ladder before reaching its architectural limits.
Single-Server Configuration
Let’s begin with a single-server setup, as it is the simplest configuration possible and how many small projects get started. In this scenario, I assume that your entire application runs on a single machine. Figure 1-1 shows how all the traffic for every user request is handled by the same server. Usually, the Domain Name System (DNS) service is provided by the hosting company for a fee and does not run on your own server. In this scenario, users connect to the DNS to obtain the Internet Protocol (IP) address of the server where your website is hosted. Once the IP address is obtained, they send Hypertext Transfer Protocol (HTTP) requests directly to your web server. Since your setup consists of only one machine, it needs to perform all the duties necessary to make your application run. It may run a database management system (such as MySQL or Postgres), as well as serve images and dynamic content from within your application.
Figure 1-1 shows the distribution of traffic in a single-server configuration. Clients first connect to the DNS server to resolve the IP address of your domain, and then they begin requesting multiple resources from your web server. Any web pages, images, Cascading Style Sheet (CSS) files, and videos have to be generated or served by your server, and all of the traffic and processing have to be handled by your single machine. I use different weights of arrows on the diagram to indicate the proportion of traffic coming to each component. An application like this would be typical of a simple company website with a product catalog, a blog, a forum, or a self-service web application. Small websites may not even need a dedicated server and can often be hosted on a virtual private server (VPS) or on shared hosting.
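To make that request flow concrete, here is a minimal Python sketch of the two steps every client performs, using only the standard library; example.com stands in for your own domain.

```python
# A minimal sketch of what every client does in a single-server setup:
# resolve the domain name to an IP address, then send HTTP requests to
# that one machine. "example.com" is a placeholder for your own domain.
import socket
import urllib.request

host = "example.com"

# Step 1: ask DNS for the IP address of the web server.
ip_address = socket.gethostbyname(host)
print(f"{host} resolves to {ip_address}")

# Step 2: send an HTTP request; in a single-server configuration, pages,
# images, CSS, and database queries are all handled by this one host.
with urllib.request.urlopen(f"http://{host}/") as response:
    print(response.status, len(response.read()), "bytes")
```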
For sites with low traffic, a single-server configuration may be enough to handle the requests made by clients. There are many reasons, though, why this configuration is not going to take you far scalability-wise:
▶ Your user base grows, thereby increasing traffic. Each user creates additional load on the server, and serving each of them consumes more resources, including memory, CPU time, and disk input/output (I/O).
▶ Your database grows as you continue to add more data. As this happens, your database queries begin to slow down due to the extra CPU, memory, and I/O requirements.
▶ You extend your system by adding new functionality, which makes user interactions require more system resources.
▶ You experience any combination of these factors.
Making the Server Stronger: Scaling Vertically
Vertical scalability means making your existing servers more powerful rather than adding more of them. There are a number of ways to scale vertically:
▶ Adding more I/O capacity by adding more hard drives in Redundant Array of Independent Disks (RAID) arrays. I/O throughput and disk saturation are the main bottlenecks in database servers. Adding more drives and setting up a RAID array can help to distribute reads and writes across more devices. In recent years, RAID 10 has become especially popular, as it gives both redundancy and increased throughput. From an application perspective, a RAID array looks like a single volume, but underneath it is a collection of drives sharing the reads and writes.
▶ Improving I/O access times by switching to solid-state drives (SSDs). Solid-state drives are becoming more and more popular as the technology matures and prices continue to fall. Random reads and writes using SSDs are between 10 and 100 times faster than on spinning disks, depending on the benchmark methodology. By replacing disks you can decrease I/O wait times in your application. Unfortunately, sequential reads and writes are not much faster, so you will not see such a massive performance increase in real-world applications. In fact, most open-source databases (like MySQL) optimize data structures and algorithms to favor sequential disk operations rather than depending on random-access I/O. Some data stores, such as Cassandra, go even further, using solely sequential I/O for all writes and most reads, making SSDs even less attractive.
▶ Reducing I/O operations by increasing RAM. (Even 128GB RAM is affordable nowadays if you are hosting your application on your own dedicated hardware.) Adding more memory means more space for the file system cache and more working memory for the applications. Memory size is especially important for efficiency of database servers.
▶ Improving network throughput by upgrading network interfaces or installing additional ones. If your server is streaming a lot of video/media content, you may need to upgrade your network provider’s connection or even upgrade your network adapters to allow greater throughput.
▶ Switching to servers with more processors or more virtual cores. Servers with 12 and even 24 threads (virtual cores) are affordable enough to be a reasonable scaling option. The more CPUs and virtual cores, the more processes can execute at the same time. Your system becomes faster, not only because processes do not have to share the CPU, but also because the operating system has to perform fewer context switches to execute multiple processes on the same core.
Figure 1-2 shows the approximate relationship of price per capacity unit to the total capacity needed. It shows that you can scale up relatively cheaply at first, but beyond a certain point, adding more capacity becomes extremely expensive. For example, getting 128GB of RAM (as of this writing) could cost you $3,000, but doubling that to 256GB could cost you $18,000, six times the 128GB price for only twice the capacity.
Isolation of Services
Vertical scalability is not the only option at this early stage of evolution. Another simple solution is moving different parts of the system to separate physical servers by installing each type of service on a separate physical machine. In this context, a service is an application like a web server (for example, Apache) or a database engine (for example, MySQL). This gives your web server and your database a separate, dedicated machine. In the same manner, you can deploy other services like File Transfer Protocol (FTP), DNS, cache, and others, each on a dedicated physical machine.
A cache is a server or service focused on reducing the latency and the resources needed to generate a result by serving previously generated content. Caching is a very important technique for scalability.
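As a minimal illustration of the idea, the following Python sketch serves previously generated pages from memory and only pays the generation cost on a cache miss; render_product_page is a hypothetical stand-in for whatever expensive work your application does.

```python
# A minimal cache-aside sketch: serve previously generated content when
# possible, and only pay the cost of generating it on a cache miss.
import time

cache = {}  # in production this would be a dedicated cache service

def render_product_page(product_id):
    time.sleep(1)  # stand-in for template rendering and database queries
    return f"<html>product {product_id}</html>"

def get_product_page(product_id):
    if product_id not in cache:          # cache miss: generate and store
        cache[product_id] = render_product_page(product_id)
    return cache[product_id]             # cache hit: no work at all

get_product_page(42)   # slow: generates the page
get_product_page(42)   # fast: served from the cache
```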
Content Delivery Network: Scalability for Static Content
As applications grow and get more customers, it becomes beneficial to offload some of the traffic to a third-party content delivery network (CDN) service.
A content delivery network is a hosted service that takes care of global distribution of static files like images, JavaScript, CSS, and videos. It works as an HTTP proxy. Clients that need to download images, JavaScript, CSS, or videos connect to one of the servers owned by the CDN provider instead of your servers. If the CDN server does not have the requested content yet, it asks your server for it and caches it from then on. Once the file is cached by the CDN, subsequent clients are served without contacting your servers at all.
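The following toy Python sketch mimics that pull-through behavior; origin.example.com is a placeholder for your own web server, and a real CDN would additionally handle cache expiration, HTTP headers, and many geographically distributed edge locations.

```python
# A toy pull-through cache illustrating the core CDN behavior: on the
# first request for a file, the "edge" fetches it from the origin server
# and keeps a copy; later requests never touch the origin at all.
import urllib.request

ORIGIN = "http://origin.example.com"   # your server (placeholder URL)
edge_cache = {}

def serve_from_edge(path):
    if path not in edge_cache:
        # cache miss: fetch the static file from the origin once
        with urllib.request.urlopen(ORIGIN + path) as response:
            edge_cache[path] = response.read()
    # cache hit (or freshly filled): the origin is not contacted again
    return edge_cache[path]
```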
Distributing the Traffic: Horizontal Scalability
All of the evolution stages discussed so far were rather simple modifications to the single-server configuration. Horizontal scalability, on the other hand, is much harder to achieve and in most cases it has to be considered before the application is built. In some rare cases, it can be “added” later on by modifying the architecture of the application, but it usually requires significant development effort. I will describe different horizontal scalability techniques throughout this book, but for now, let’s think of it as running each component on multiple servers and being able to add more servers whenever necessary. Systems that are truly horizontally scalable do not need strong servers; quite the opposite, they usually run on lots and lots of cheap “commodity” servers rather than a few powerful machines.
Scalability for a Global Audience
The largest websites reach the final evolution stage, which is scalability for a global audience. Once you serve millions of users spread across the globe, you will require more than a single data center. A single data center can host plenty of servers, but clients located on other continents will receive a degraded user experience. Serving those users well requires a few additional building blocks, described next.
GeoDNS is a DNS service that allows domain names to be resolved to IP addresses based on the location of the customer. Regular DNS servers receive a domain name, like yahoo.com, and resolve it to an IP address, like 206.190.36.45. GeoDNS behaves the same way from the client’s perspective, but it can return a different IP address depending on the client’s location, directing each user to the nearest data center.
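A toy sketch of the idea in Python, with made-up regions and documentation-range IP addresses; a real GeoDNS service infers location from the client’s or resolver’s network address rather than an explicit region argument.

```python
# A toy illustration of GeoDNS resolution: same domain name, different
# answer depending on the client's region. Regions and IPs are made up.
REGIONAL_SERVERS = {
    "europe":        "198.51.100.10",  # EU data center
    "north-america": "203.0.113.20",   # US data center
}
DEFAULT_SERVER = "203.0.113.20"

def geo_resolve(domain, client_region):
    # A regular DNS server would ignore client_region and always return
    # the same address; GeoDNS picks the closest data center instead.
    return REGIONAL_SERVERS.get(client_region, DEFAULT_SERVER)

geo_resolve("example.com", "europe")         # '198.51.100.10'
geo_resolve("example.com", "north-america")  # '203.0.113.20'
```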
An edge cache is an HTTP cache server located near the customer, allowing HTTP traffic to be partially cached and served close to the customer. Requests from the customer’s browser go to the edge-cache server. The server can then decide to serve the page from the cache, or it can decide to assemble the missing pieces of the page by sending background requests to your web servers. It can also decide that the page is uncacheable and delegate fully to your web servers. Edge-cache servers can serve entire pages or cache fragments of HTTP responses.
A load balancer is a software or hardware component that distributes traffic coming to a single IP address over multiple servers, which are hidden behind the load balancer. Load balancers are used to share the load evenly among multiple servers and to allow dynamic addition and removal of machines. Since clients can see only the load balancer, web servers can be added at any time without service disruption.
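As a minimal sketch of the idea, the following Python class distributes requests round-robin over a pool that can grow at runtime; real load balancers also perform health checks, connection draining, and more sophisticated balancing algorithms. The addresses are made up.

```python
# A minimal round-robin load balancer sketch: clients see a single entry
# point, while requests are spread evenly across a pool of web servers
# that can grow at any time without clients noticing.
class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.next_index = 0

    def add_server(self, address):
        # machines can be added at any time without service disruption
        self.servers.append(address)

    def pick_server(self):
        # each call hands the next request to the next server in the pool
        server = self.servers[self.next_index % len(self.servers)]
        self.next_index += 1
        return server

balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
for _ in range(4):
    print(balancer.pick_server())  # 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.1
```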
The second layer of our stack is the web application layer. It consists of web application servers (4) responsible for generating the actual HTML of our web application and handling clients’ HTTP requests. These machines would often use a lightweight web framework (PHP, Java, Ruby, Groovy, etc.) with a minimal amount of business logic, since the main responsibility of these servers is to render the user interface. All the web application layer is supposed to do is handle the user interactions and translate them to internal web service calls. The simpler and “dumber” the web application layer, the better. By pushing most of your business logic down to web services, you allow more reuse and reduce the number of changes needed, since the presentation layer is the one that changes most often.
The third layer of our stack consists of web services (7). It is a critical layer, as it contains most of our application logic. We keep front-end servers simple and free of business logic since we want to decouple the presentation layer from the business logic. By creating web services, we also make it easier to create functional partitions. We can create web services specializing in certain functionality and scale them independently. For example, in an e-commerce web application, you could have a product catalog service and a user profile service, each providing very different types of functionality and each having very different scalability needs.
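A minimal sketch of what that functional partitioning looks like from the front end’s point of view; both service URLs are hypothetical, and each service behind them can be deployed and scaled on its own.

```python
# A sketch of functional partitioning: the product catalog and the user
# profiles live behind separate web services, each scaled independently.
import json
import urllib.request

CATALOG_SERVICE = "http://catalog.internal.example.com"   # placeholder
PROFILE_SERVICE = "http://profiles.internal.example.com"  # placeholder

def call_service(base_url, path):
    # front-end code only knows service endpoints, never the databases
    with urllib.request.urlopen(base_url + path) as response:
        return json.loads(response.read())

product = call_service(CATALOG_SERVICE, "/products/42")
profile = call_service(PROFILE_SERVICE, "/users/7")
```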
Both front-end servers (4) and web services (7) should be stateless. To keep them that way, web applications often deploy additional components, such as object caches (5) and message queues (6).
Object cache servers are used by both front-end application servers and web services to reduce the load put on the data stores and speed up responses by storing partially precomputed results.
The final layer of our stack is the data persistence layer (8) and (9). This is usually the most difficult layer to scale horizontally, so we’ll spend a lot of time discussing different scaling strategies and horizontal scalability options in that layer. This is also an area of rapid development of new technologies labeled as big data and NoSQL. The data layer has become increasingly exciting in the past ten years; the days of a single monolithic SQL database are gone. As Martin Fowler says, it is an era of polyglot persistence, where multiple data stores are used by the same company to leverage their unique benefits and to allow better scalability.
Overview of the Application Architecture
So far, we’ve looked at the infrastructure and scalability evolution stages. Let’s now take a high-level look at the application itself.
The application architecture should not revolve around a framework or any particular technology. Architecture is not about Java, PHP, PostgreSQL, or even database schema. Architecture should revolve around the business model. There are some great books written on domain-driven design and software architecture [1–3] that can help you get familiar with best practices of software design. To follow these best practices, we put business logic at the center of our architecture. It is the business requirements that drive every other decision. Without the right model and the right business logic, our databases, message queues, and web frameworks are useless.
Moreover, it is irrelevant whether the application is a social networking website, a pharmaceutical service, or a gambling app; it will always have some business needs and a domain model. By putting that model at the center of our architecture, we make sure that the other components surrounding it serve the business, not the other way around. By placing technology first, we may get a great Rails application, but it may not be a great pharmaceutical application.
A domain model is created to represent the core functionality of the application in the words of businesspeople, not technical people. The domain model explains key terms, actors, and operations, without caring about technical implementation. The domain model of an automated teller machine (ATM) would mention things like cash, account, debit, credit, authentication, security policies, etc. At the same time, the domain model would be oblivious to the hardware and software implementation of the problem. The domain model is a tool for creating our mental picture of the business problems that our application is supposed to solve.
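To make that concrete, here is what one small fragment of the ATM domain model might look like in Python; every name comes from the business vocabulary, and nothing in it knows about hardware, HTTP, or databases. The class and amounts are illustrative assumptions.

```python
# A sketch of a domain model in code: account, debit, and credit come
# straight from the business vocabulary, with no technical concerns.
class InsufficientFunds(Exception):
    pass

class Account:
    def __init__(self, account_number, balance=0):
        self.account_number = account_number
        self.balance = balance

    def credit(self, amount):
        self.balance += amount

    def debit(self, amount):
        # a business rule, not a technical one: no overdrafts allowed
        if amount > self.balance:
            raise InsufficientFunds(self.account_number)
        self.balance -= amount

account = Account("12345678", balance=100)
account.debit(40)   # withdraw cash
account.credit(10)  # deposit
```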
The front end should have a single responsibility: being the user interface. The user can be interacting with the application via web pages, mobile applications, or web service calls. No matter what the actual delivery mechanism is, the front-end application should be the layer translating between the public interface and internal service calls. The front end should be considered as “skin,” or a plugin of the application, something used to present the functionality of the system to customers. It should not be considered the heart or the center of the system.
Front-end code will be closely coupled to templates and the web framework of our choice (for example, Spring, Rails, Symfony). It will be constrained by the user interface and user experience requirements, and by the web technologies used. Front-end applications will have to be developed in a way that allows communication over HTTP, including AJAX and web sessions. By hiding that within the front-end layer, we can keep our services layer simpler and focused solely on the business logic, not on presentation and web-specific technologies.
Templating, web flows, and AJAX are all specific problems. Keeping them separated from your main business logic allows for fast and independent changes. Having the front end developed as a separate application within our system gives us another advantage: we can use a different technology stack to develop it. It is not unreasonable to use one technology to develop web services and a different one to develop the front-end application. As an example, you could develop the front end using Groovy, PHP, or Ruby, and web services could be developed in pure Java.
Web services are where most of the processing has to happen, and also the place where most of the business logic should live.
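As a closing sketch, here is what a deliberately “dumb” front-end handler might look like in Python: it translates a user interaction into an internal web service call and deals only with presentation. The order service URL, endpoint, and JSON shape are assumptions made for illustration.

```python
# A sketch of a thin front-end handler: translate an incoming HTTP form
# submission into an internal web service call, then render the result.
# All business logic (pricing, validation) stays on the service side.
import json
import urllib.request

ORDER_SERVICE = "http://orders.internal.example.com"  # placeholder URL

def handle_place_order_request(form_data):
    # translate the user interaction into an internal service call...
    payload = json.dumps(form_data).encode()
    request = urllib.request.Request(
        ORDER_SERVICE + "/orders", data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        order = json.loads(response.read())
    # ...and deal only with presentation here
    return f"<html>Order {order['id']} confirmed</html>"
```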