Scaling AWS Infrastructure: Auto Scaling & Other Services

Scaling AWS infrastructure is essential to accommodate increasing environment demands. It helps maintain high service reliability and optimizes AWS resource utilization. Even better, the process isn’t as complex as most people think.

AWS has excellent service capabilities to enable high scalability among infrastructure components and cloud-hosted applications. We will explore Amazon tools and features that are perfect for scaling the existing AWS infrastructure.

First step: Analyzing the AWS infrastructure

Before scaling the infrastructure horizontally or vertically, we first need to analyze it. This helps us identify bottlenecks or performance issues that can limit the effectiveness of scaling efforts. We can also identify underutilized resources for optimization and evaluate the infrastructure’s scalability requirements.

Monitoring the AWS infrastructure is essential for gathering key insights and metrics. It helps optimize performance, identify potential issues, and ensure the infrastructure’s security and compliance. This information enables data-driven decisions to enhance resource utilization and reduce costs.

Amazon CloudWatch is the native AWS service for monitoring infrastructure, and its alarms can drive scaling actions. We can also use third-party tools such as Splunk, New Relic, and Datadog to monitor AWS infrastructure. Each of these tools has different capabilities and suits different types of AWS infrastructure.

Identify bottlenecks

Infrastructure bottlenecks are constraints that slow down or disrupt a service. Common causes and symptoms include:

Network congestion due to insufficient bandwidth
Loss of user engagement and data
Delays in infrastructure resource deployments
Physical failure of the servers, routers, databases, apps, etc.
Outdated hardware or software components

Bottlenecks can occur at any point in the infrastructure, resulting in slower processing time and reduced productivity. Addressing infrastructure bottlenecks is crucial to reducing system downtime.

Tools like AWS Trusted Advisor, New Relic, Datadog, or CloudWatch help identify the source of bottlenecks so that we can apply fixes such as hardware upgrades, software optimization, or additional capacity provisioning.

Leverage AWS CloudWatch

Amazon CloudWatch is a monitoring and logging service for collecting and analyzing data on infrastructure resource utilization, software performance, and network traffic. Use the methods below to leverage CloudWatch when scaling AWS infrastructure.

Monitoring logs: Use CloudWatch’s real-time monitoring and alerting capabilities to collect logs and resource utilization metrics. Refer to them to identify potential issues that might impact resource availability and performance.
Key metrics: CloudWatch dashboards provide a centralized view of AWS resources and applications. Define key performance indicators (KPIs) and other metrics that are relevant to the service. Decide on the thresholds at which resources should scale and create scaling policies based on the defined metrics.
CloudWatch alarms: Use the AWS Management Console or the CloudWatch API to create alarms that notify us or trigger actions whenever metrics cross predefined thresholds.
Infrastructure scaling policies: Define scaling policies that specify the actions to take when an alarm is triggered. To configure the policies, we can use AWS Auto Scaling or Amazon EC2 Auto Scaling.
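
To make the alarm-plus-policy pairing concrete, here is a minimal Terraform sketch. It assumes an existing Auto Scaling group whose name is passed in through a hypothetical asg_name variable; the resource names, the 80% threshold, and the cooldown are illustrative values, not recommendations.

```hcl
# Hypothetical input: the name of an Auto Scaling group that already exists.
variable "asg_name" {
  type = string
}

# Scaling policy: add one instance each time the alarm below fires.
resource "aws_autoscaling_policy" "scale_out" {
  name                   = "scale-out-on-high-cpu"
  autoscaling_group_name = var.asg_name
  policy_type            = "SimpleScaling"
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = 1
  cooldown               = 300 # seconds to wait before another scaling activity
}

# Alarm: average CPU across the group above 80% for two 2-minute periods.
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "asg-high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 120
  statistic           = "Average"
  threshold           = 80

  dimensions = {
    AutoScalingGroupName = var.asg_name
  }

  # Trigger the scaling policy when the alarm enters the ALARM state.
  alarm_actions = [aws_autoscaling_policy.scale_out.arn]
}
```

A matching scale-in policy and low-CPU alarm would normally accompany this pair so that capacity is released when load drops.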

Scaling AWS web servers

Scaling AWS web servers ensures proper handling of web traffic with improved application performance and availability. It distributes the workload to handle an increasing number of incoming requests without affecting the availability of websites and web apps.

AWS vertical scaling (scaling up/down)

Vertical scaling in AWS involves increasing or decreasing the capacity of a single resource, such as upgrading an instance to a more powerful one or reducing its capacity.

Scaling up: Increasing the power of an individual resource. For example:
- Moving from a t3.medium EC2 instance to an m5.large instance with more memory and sustained (non-burstable) CPU performance
- Adding more storage to an EBS volume
- Using a more powerful RDS database instance
Scaling down: Decreasing the power of a resource when demand decreases – for example:
- Downgrading an EC2 instance from m5.large to t3.small
- Moving an RDS database to a smaller instance class (allocated storage on RDS can be increased but not reduced in place)
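
As a rough illustration of vertical scaling in Terraform, the sketch below keeps the instance type in a variable so that resizing becomes a one-line change. The variable names and the AMI input are hypothetical placeholders.

```hcl
# Hypothetical variable: change the value (for example from "t3.medium"
# to "m5.large") and re-apply to resize the instance.
variable "web_instance_type" {
  type    = string
  default = "t3.medium"
}

# Placeholder input: supply an AMI ID that is valid in your region.
variable "web_ami_id" {
  type = string
}

resource "aws_instance" "web" {
  ami           = var.web_ami_id
  instance_type = var.web_instance_type

  tags = {
    Name = "web-server"
  }
}
```

For EBS-backed instances, changing the instance type means the instance is stopped and started again, so plan for a short interruption.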

AWS horizontal scaling (scaling out/in)

Horizontal scaling of AWS infrastructure involves adding or removing multiple resources to distribute the load across a larger system. It’s typically implemented by adding or removing instances or servers.

Scaling out: Adding more instances to handle the increased load – for example:
- Adding more EC2 instances to an Auto Scaling Group
- Adding more reader instances to an Amazon Aurora cluster
- Increasing the number of pods or nodes in a Kubernetes cluster
Scaling in: Removing instances or nodes when demand decreases – for example:
- Reducing the number of EC2 instances in an Auto Scaling Group
- Removing nodes from a distributed system like Amazon ElastiCache
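
Horizontal scaling of EC2 capacity is usually expressed as an Auto Scaling group. The following is a minimal sketch, assuming hypothetical ami_id and subnet_ids inputs; the sizes are illustrative.

```hcl
# Placeholder inputs: an AMI for the web servers and the subnets to launch into.
variable "ami_id" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

# Launch template: describes what each instance in the group looks like.
resource "aws_launch_template" "web" {
  name_prefix   = "web-"
  image_id      = var.ami_id
  instance_type = "t3.medium"
}

# Auto Scaling group: scaling out or in means moving the desired capacity
# between min_size and max_size.
resource "aws_autoscaling_group" "web" {
  name                = "web-asg"
  min_size            = 2
  max_size            = 6
  desired_capacity    = 2
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }
}
```

Spreading subnet_ids across several Availability Zones lets the group keep serving traffic if one zone has problems.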

Horizontal vs. vertical scaling in AWS

Let’s see how these broad ways of scaling web servers compare:

	Vertical scaling	Horizontal scaling
Definition	Increasing/decreasing the size or capacity of a single resource	Adding/removing multiple resources to distribute the workload
Key actions	Upgrade or downgrade instance type or size	Add or remove instances or servers
Scalability limit	Limited by the maximum capacity of the resource	Theoretically unlimited with proper design
Downtime	May require downtime for upgrades	Can be achieved without downtime
Complexity	Simple to implement and manage	More complex; requires distributed systems
Application suitability	Best for monolithic or single-node applications	Ideal for distributed or stateless applications
Fault tolerance	Lower, as it depends on a single resource	Higher, due to distributed architecture
AWS example	Upgrading an EC2 instance type or RDS class	Auto Scaling EC2 instances or adding Aurora replicas

To summarize, vertical scaling enhances the capacity of a single resource to handle increased demand, whereas horizontal scaling expands capacity by adding multiple resources, enabling distributed load management.

Choose the type of scaling based on your AWS infrastructure setup and application requirements. Vertical scaling can be a quick solution for resource-intensive applications with occasional or predictable traffic peaks. However, for handling long-term growth, high availability, and unpredictable traffic patterns, horizontal scaling provides a more robust and scalable approach.

What is AWS Auto Scaling?

AWS Auto Scaling is a service that automatically increases or decreases the number of compute resources, such as EC2 instances, based on the current demand for your application.

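For example, if the web servers’ average memory usage rises above 90%, Amazon EC2 Auto Scaling can dynamically add a new server instance and remove it again once usage falls back below the threshold. Note that EC2 does not report memory usage to CloudWatch by default, so a memory-based policy needs a custom metric, for example one published by the CloudWatch agent.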

The primary goal of Auto Scaling is to ensure that the application performs well and remains cost-efficient by scaling resources up or down as needed.

It is also possible to schedule scaling actions for web servers at specific times. This gives us the flexibility to scale AWS infrastructure components in or out ahead of known busy or quiet periods. Because a scheduled action only sets new capacity values, a second scheduled action is typically used to return the group to its normal size.
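
Scheduled scaling can be sketched in Terraform as a pair of scheduled actions, one to scale out and one to scale back in. The group name variable, the times, and the capacities below are assumptions chosen for illustration.

```hcl
# Hypothetical input: the name of an existing Auto Scaling group.
variable "asg_name" {
  type = string
}

# Scale out at 08:00 UTC, Monday to Friday.
resource "aws_autoscaling_schedule" "business_hours_start" {
  scheduled_action_name  = "scale-out-business-hours"
  autoscaling_group_name = var.asg_name
  min_size               = 2
  max_size               = 10
  desired_capacity       = 6
  recurrence             = "0 8 * * 1-5" # cron format
  time_zone              = "Etc/UTC"
}

# Scale back in at 18:00 UTC, Monday to Friday.
resource "aws_autoscaling_schedule" "business_hours_end" {
  scheduled_action_name  = "scale-in-after-hours"
  autoscaling_group_name = var.asg_name
  min_size               = 1
  max_size               = 10
  desired_capacity       = 2
  recurrence             = "0 18 * * 1-5"
  time_zone              = "Etc/UTC"
}
```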

Read more about deploying the AWS auto-scaling group with Terraform.

Key features of AWS Auto Scaling

Here are the key features of AWS Auto Scaling:

Automatic scaling: Automatically adjusts the capacity of your AWS resources (e.g., EC2 instances, DynamoDB tables, Aurora databases) based on predefined conditions or actual demand.
Dynamic scaling: Adjusts resources in real-time based on changing demand patterns. For example, it can add or remove EC2 instances based on CPU utilization or application metrics.
Predictive scaling: Uses machine learning to predict future traffic patterns and adjusts capacity proactively to handle expected increases or decreases in demand.
Resource scaling: Ensures that you’re not over-provisioning or under-provisioning resources, helping to optimize costs while maintaining performance.
Scalability across multiple services: Supports various AWS resources, such as:
- Amazon EC2 instances
- Spot Fleet instances
- Amazon ECS (Elastic Container Service) tasks
- DynamoDB tables and indexes
- Aurora replicas

Health monitoring: Automatically replaces unhealthy instances or resources to maintain application availability.
Scaling policies: Offers several scaling policies:
- Target tracking scaling: Automatically adjusts capacity to maintain a target utilization metric (e.g., CPU usage).
- Step scaling: Scales resources in steps based on alarms triggered by CloudWatch metrics.
- Scheduled scaling: Increases or decreases capacity at specific times.
Cost-optimization: Helps reduce costs by dynamically allocating just the right amount of resources, avoiding both underutilization and overprovisioning.
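
Of the policy types listed above, target tracking is often the simplest to configure. A minimal Terraform sketch follows, assuming an existing Auto Scaling group passed in through a hypothetical asg_name variable and an illustrative 50% CPU target.

```hcl
# Hypothetical input: the name of an existing Auto Scaling group.
variable "asg_name" {
  type = string
}

# Target tracking: the service adds or removes instances to keep the
# group's average CPU utilization close to the target value.
resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "keep-cpu-near-50"
  autoscaling_group_name = var.asg_name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }

    target_value = 50.0
  }
}
```

With this policy type, the CloudWatch alarms that drive scaling are created and managed by the service, so none need to be declared alongside it.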

Auto Scaling use cases

Generally speaking, AWS Auto Scaling makes it easier to build highly available and scalable applications while minimizing operational overhead and costs. It can also be useful in the following scenarios:

Handling unpredictable traffic spikes, such as during flash sales or marketing campaigns.
Scaling applications to meet seasonal demand changes (e.g., holiday shopping).
Maintaining consistent performance for applications with cyclical workloads (e.g., daily or weekly traffic patterns).
Dynamically scaling compute resources out to handle large-scale data processing tasks, such as rendering videos or running analytics pipelines, and scaling back in once the jobs are completed.
Automatically scaling development or testing instances up during working hours and down during off-hours to minimize costs while maintaining developer availability.

AWS Auto Scaling is more agile, cost-efficient, and automated compared to traditional scaling methods, making it ideal for dynamic and modern application environments. Traditional scaling methods, while still useful in some on-premises scenarios, lack the automation and flexibility needed to handle rapidly changing workloads.

Scaling with Application Load Balancer (ALB)

Application Load Balancer (ALB) is a load balancer type within Elastic Load Balancing that distributes incoming application traffic across multiple targets, such as Amazon EC2 instances, IP addresses, or Lambda functions. Here are the main AWS compute services commonly placed behind an ALB:

EC2 instances
EKS (Elastic Kubernetes Service)
ECS (Elastic Container Service)

ALB is suited for handling HTTP or HTTPS traffic. It takes only a few minutes to set up an ALB in front of web servers and balance the traffic load between EC2 instances.
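
A basic ALB setup consists of the load balancer, a target group, and a listener. The sketch below assumes hypothetical inputs for the VPC, public subnets, and security groups, and uses plain HTTP to stay short; a production setup would normally add an HTTPS listener with a certificate.

```hcl
# Placeholder inputs describing the existing network.
variable "vpc_id" {
  type = string
}

variable "public_subnet_ids" {
  type = list(string)
}

variable "alb_security_group_ids" {
  type = list(string)
}

resource "aws_lb" "web" {
  name               = "web-alb"
  load_balancer_type = "application"
  subnets            = var.public_subnet_ids
  security_groups    = var.alb_security_group_ids
}

# Target group: the set of instances that receive traffic, with a health check.
resource "aws_lb_target_group" "web" {
  name     = "web-targets"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    path    = "/"
    matcher = "200"
  }
}

# Listener: forwards incoming HTTP requests to the target group.
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.web.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web.arn
  }
}
```

An Auto Scaling group can register its instances with this target group through its target_group_arns argument, so that new instances start receiving traffic once they pass the health check.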

Scaling AWS databases

Amazon Relational Database Service (RDS) is a managed service that simplifies database setup, scaling, and management in the cloud. It supports popular relational database engines, including MySQL, PostgreSQL, MariaDB, Oracle, and SQL Server, and offers several scalability features.

Scaling AWS with Amazon RDS Multi-AZ

Amazon RDS Multi-AZ enhances the availability of the Amazon RDS database instances, making them ideal for handling production workloads. Below are some important reasons for using RDS Multi-AZ to scale AWS infrastructure.

Automatic failover: If the primary instance fails, RDS automatically fails over to the standby, typically within one to two minutes, with no manual intervention. Because replication to the standby is synchronous, committed data is not lost.
Protect database performance: Automated backups are taken from the standby instance, so I/O activity on the primary is not suspended during backups.
Enhanced durability: Multi-AZ synchronous replication keeps an up-to-date copy of the data on the standby instance alongside the primary.
Increased availability: It allows us to deploy a standby database instance in another Availability Zone (AZ) and achieve excellent fault tolerance during instance failure.

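The Multi-AZ feature of Amazon RDS places a standby database instance in another Availability Zone to ensure high availability during hardware failures. In a Multi-AZ DB instance deployment the standby does not serve read traffic, so Multi-AZ improves resilience rather than throughput; pair it with read replicas when reads need to scale. Enabling it through the RDS console is straightforward.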
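
In Terraform, Multi-AZ is a single argument on the database instance. The sketch below is illustrative: the identifier, engine, instance class, username, and the db_password variable are all assumptions.

```hcl
# Hypothetical input: supply the password from a secret store or a
# sensitive variable rather than hard-coding it.
variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "primary" {
  identifier        = "app-db"
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 50 # GiB
  username          = "appadmin"
  password          = var.db_password

  # Provision a synchronous standby in another Availability Zone.
  multi_az = true

  # Keep automated backups for 7 days.
  backup_retention_period = 7

  # Convenient for experiments; consider keeping a final snapshot in production.
  skip_final_snapshot = true
}
```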

Learn how to create an AWS RDS Instance using Terraform.

Scaling AWS with RDS Read replicas

Amazon RDS Read Replicas are read-only copies of the primary database instance. They improve read performance by letting us elastically scale out beyond the capacity of a single database instance for read-heavy workloads.

Changes on the primary are replicated to the replicas asynchronously, so a replica can lag slightly behind the primary. Web app traffic that only needs to read from the database can be routed directly to the Read Replicas, reducing the primary database instance’s workload.

Read replicas are available in Amazon Aurora and Amazon RDS for MariaDB, MySQL, Oracle, PostgreSQL, and SQL Server.
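
A read replica can be declared in Terraform by pointing a new database instance at its source. This is a minimal sketch for a same-region replica; source_db_identifier is a hypothetical variable holding the identifier of an existing RDS instance, which is assumed to have automated backups enabled.

```hcl
# Hypothetical input: the identifier of the existing primary instance.
variable "source_db_identifier" {
  type = string
}

resource "aws_db_instance" "read_replica" {
  identifier          = "app-db-replica-1"
  replicate_source_db = var.source_db_identifier
  instance_class      = "db.t3.medium"
  skip_final_snapshot = true
}
```

The application then sends read-only queries to the replica endpoint while writes continue to go to the primary.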

Scaling AWS with Aurora

Amazon Aurora offers high availability and performance at a global scale with full MySQL and PostgreSQL compatibility. This relational database combines the capabilities of traditional enterprise databases with the simplicity and cost-effectiveness of open-source databases. Amazon Aurora is well suited for:

Modernizing the operations of enterprise applications like ERP, CRM, etc.
Supporting reliable and multi-tenant SaaS applications with DB flexibility.
Developing and deploying distributed applications at scale across different regions.
Instantaneous serverless scaling to reduce operational expenses.

Compared to the standard RDS engines, Amazon Aurora has built-in DR (disaster recovery) and HA (high availability) capabilities. AWS also provides tooling to help migrate from commercial database engines such as SQL Server or Oracle. Aurora can serve small to medium workloads and scale with them as they grow.
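
An Aurora cluster with a configurable number of instances can be sketched in Terraform as follows. The identifiers, engine choice, instance class, and both variables are assumptions made for illustration.

```hcl
# Hypothetical inputs.
variable "db_password" {
  type      = string
  sensitive = true
}

variable "aurora_instance_count" {
  type    = number
  default = 2 # one writer plus one reader
}

resource "aws_rds_cluster" "app" {
  cluster_identifier  = "app-aurora"
  engine              = "aurora-postgresql"
  master_username     = "appadmin"
  master_password     = var.db_password
  skip_final_snapshot = true
}

# Instances attached to the cluster; the first one created becomes the
# writer and the rest act as readers.
resource "aws_rds_cluster_instance" "app" {
  count              = var.aurora_instance_count
  identifier         = "app-aurora-${count.index}"
  cluster_identifier = aws_rds_cluster.app.id
  engine             = aws_rds_cluster.app.engine
  instance_class     = "db.r6g.large"
}
```

Raising aurora_instance_count and re-applying adds reader instances, which is the horizontal read scaling described in this section.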

Scalable event-driven architecture

Event-driven architecture (EDA) is a design pattern in which decoupled components of a system communicate with each other by producing and consuming events. In AWS, this architecture leverages managed services to build scalable, reliable, and loosely coupled systems. It is also one of the strategies used in scaling AWS infrastructure in the following contexts:

It can handle asynchronous communication between AWS services that are distributed across multiple servers and regions.
Its main components are events generated by various sources, such as users, system components, and external systems, along with the services that route and consume them.
It promotes loose service coupling and reduces dependencies across AWS resources without affecting the rest of the system.
It supports high fault tolerance because AWS resources communicate asynchronously, so a failure in one component does not immediately block the others.

Several AWS services, such as Lambda and Simple Queue Service (SQS), enable events to trigger the execution of specific code.

SQS to implement loose coupling

SQS is a fully managed message queuing service that enables us to decouple and scale various microservices, serverless applications, and distributed systems within the AWS infrastructure. We can use SQS to decouple the system components so that they can work and scale independently.
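
As a small illustration of decoupling with SQS, the Terraform sketch below creates a work queue plus a dead-letter queue for messages that repeatedly fail processing. The queue names and the retry count are hypothetical.

```hcl
# Dead-letter queue: holds messages that could not be processed.
resource "aws_sqs_queue" "orders_dlq" {
  name = "orders-dlq"
}

# Main queue: producers write here and consumers read at their own pace.
resource "aws_sqs_queue" "orders" {
  name                       = "orders"
  visibility_timeout_seconds = 60

  # After 5 failed receives, move the message to the dead-letter queue.
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.orders_dlq.arn
    maxReceiveCount     = 5
  })
}
```

Because producers and consumers only share the queue, each side can be scaled, deployed, or restarted without the other noticing more than a change in queue depth.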

Leverage serverless architecture using Lambda

AWS Lambda is Amazon’s serverless computing service, enabling us to run code without provisioning or managing servers. Lambda automatically scales our functions in response to incoming traffic, so capacity planning is largely reduced to setting concurrency limits.

How to scale AWS Lambda

Scaling AWS Lambda happens by default. When multiple requests arrive at the same time, Lambda automatically creates more instances of your function behind the scenes to handle the load, up to the account’s concurrency limit. The scaling experience is seamless, as Lambda adjusts the number of instances based on the number of incoming events.

Lambda functions are very useful for real-time data processing and event-driven applications.
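
To connect the two services discussed above, an event source mapping lets Lambda poll an SQS queue and scale the number of concurrent invocations with the backlog. This is a sketch under assumptions: queue_arn and function_name are hypothetical inputs referring to resources that already exist, and the function’s execution role is assumed to allow reading from the queue.

```hcl
# Hypothetical inputs for an existing queue and function.
variable "queue_arn" {
  type = string
}

variable "function_name" {
  type = string
}

resource "aws_lambda_event_source_mapping" "from_queue" {
  event_source_arn = var.queue_arn
  function_name    = var.function_name
  batch_size       = 10 # messages handed to each invocation

  # Cap how many concurrent invocations this queue can drive, which
  # protects downstream systems such as databases.
  scaling_config {
    maximum_concurrency = 20
  }
}
```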

Benefits of scaling AWS infrastructure

Some of the benefits of scaling your AWS infrastructure include:

Improved performance and availability: Distributing workloads across multiple AWS instances or servers ensures all applications remain available.
Cost optimization: Scaling allows us to use AWS resources more efficiently, avoiding overprovisioning of cloud resources. Learn more about AWS Cost Optimization.
Increased flexibility: AWS provides enough capabilities to scale up or scale down the infrastructure to match the demand.
Reduced downtime: Scaling ensures that all infrastructure components are up and running even during unexpected spikes in traffic, reducing the risk of service downtime and outages.
Automatic scaling: AWS offers auto-scaling capabilities so that the infrastructure scales automatically based on predefined policies.
Geographic scalability: AWS allows infrastructure scaling across different regions globally. Hence, we can deploy resources in multiple regions to reduce latency.

Does your organization have extra compliance concerns? Spacelift has you covered with the possibility of self-hosting it in AWS. You can also read about Spacelift integration with AWS, with the new Cloud Integrations section and update to support account-level AWS integrations.

Example: Scaling AWS EC2 instances with Terraform

Terraform IaC enables us to create, change, and scale AWS infrastructure by defining resources with a loop using for_each or count.

Here’s a Terraform code example with which we can create multiple EC2 instances using count.

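resource "aws_instance" "my_web_server" {
  # Number of identical instances to create
  count         = 3
  # Example AMI ID; AMI IDs are region-specific, so use one valid in your region
  ami           = "ami-0c94855ba95c71c99"
  instance_type = "t2.micro"

  tags = {
    # count.index starts at 0, so the names run from 1 to 3
    Name = "My web server ${count.index + 1}"
  }
}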

In the example above, the number of EC2 instances to be created can be declaratively mentioned using the count attribute. Creating an input variable to adjust this number helps update the number of instances dynamically.
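
The input-variable approach mentioned above could look like the following variant of the same resource. The variable name web_server_count is a hypothetical choice.

```hcl
# Hypothetical variable controlling how many instances exist.
variable "web_server_count" {
  type    = number
  default = 3
}

resource "aws_instance" "my_web_server" {
  count         = var.web_server_count
  ami           = "ami-0c94855ba95c71c99" # example AMI ID; replace with one valid in your region
  instance_type = "t2.micro"

  tags = {
    Name = "My web server ${count.index + 1}"
  }
}
```

Running terraform apply -var="web_server_count=5" would then add two instances, and lowering the value removes the instances with the highest indexes.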

Similarly, we can use the for_each construct to create multiple resources with similar configurations dynamically.

The example below implements an input variable “bucket_names” along with a for_each attribute to create multiple S3 buckets.

variable "bucket_names" {
  type = set(string)
  # S3 bucket names must be globally unique, so replace these placeholders
  default = [
    "example-bucket-1",
    "example-bucket-2",
    "example-bucket-3"
  ]
}

resource "aws_s3_bucket" "example_buckets" {
  # One bucket is created per element of the set
  for_each = var.bucket_names

  bucket = each.value

  tags = {
    Name        = "${each.value} Bucket"
    Environment = "Production"
  }
}

Learn more about best practices when managing Terraform at scale.

Scaling Kubernetes clusters

The kubectl CLI, Kubernetes dashboard, and Kubernetes API are some of the most commonly used tools for scaling Kubernetes clusters to manage varying demands.

Scaling a Kubernetes cluster manually with these tools requires monitoring to identify changes in demand and act accordingly. The Horizontal Pod Autoscaler (HPA) automates part of this work by continuously adjusting the number of pod replicas based on observed metrics such as CPU utilization. However, HPA only scales pods, not nodes: when the cluster runs out of capacity, new pods remain pending until more nodes are added.
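
For readers who manage Kubernetes objects with Terraform, pod-level autoscaling can be sketched with the Kubernetes provider as shown below. It assumes a configured Kubernetes provider, a Deployment named web in the default namespace, and a metrics source such as metrics-server running in the cluster; all names and numbers are illustrative.

```hcl
resource "kubernetes_horizontal_pod_autoscaler_v2" "web" {
  metadata {
    name      = "web"
    namespace = "default"
  }

  spec {
    min_replicas = 2
    max_replicas = 10

    # The workload whose replica count is adjusted (hypothetical Deployment).
    scale_target_ref {
      api_version = "apps/v1"
      kind        = "Deployment"
      name        = "web"
    }

    # Aim for roughly 70% average CPU utilization across the pods.
    metric {
      type = "Resource"

      resource {
        name = "cpu"

        target {
          type                = "Utilization"
          average_utilization = 70
        }
      }
    }
  }
}
```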

Node autoscalers worth considering include Cluster Autoscaler and Karpenter.

Cluster Autoscaler

The Kubernetes Cluster Autoscaler is a component, typically run as a deployment inside the cluster, that adjusts the number of nodes to match demand.

It watches for pods that cannot be scheduled because of insufficient resources and provisions additional nodes so they can run, then removes underutilized nodes when demand subsides. In this way, it adjusts cloud resources automatically for optimum cost benefits.
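
On Amazon EKS, the Cluster Autoscaler works within the bounds of a node group. The Terraform sketch below shows a managed node group with such bounds; the cluster name, IAM role ARN, and subnets are hypothetical inputs referring to resources that already exist.

```hcl
# Hypothetical inputs for an existing EKS cluster.
variable "cluster_name" {
  type = string
}

variable "node_role_arn" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

resource "aws_eks_node_group" "workers" {
  cluster_name    = var.cluster_name
  node_group_name = "workers"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids
  instance_types  = ["t3.large"]

  # The autoscaler moves the node count between min_size and max_size.
  scaling_config {
    min_size     = 2
    max_size     = 10
    desired_size = 2
  }

  # Let the autoscaler own desired_size so Terraform does not reset it
  # on the next apply.
  lifecycle {
    ignore_changes = [scaling_config[0].desired_size]
  }
}
```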

Karpenter

Karpenter is an open-source node autoscaler for Kubernetes, originally developed by AWS. It is designed to work natively with Kubernetes, and provider implementations exist for AWS and some other clouds, making it easy for organizations to adopt for running their workloads on Kubernetes.

Its automatic node-scaling capabilities enable organizations to run their workloads cost-effectively, on demand. Karpenter uses the Kubernetes API to manage nodes and workloads, making it easy to deploy and use.

Best practices for scaling AWS infrastructure

Leverage Auto Scaling: AWS Auto Scaling allows dynamic adjustments of resources like EC2 instances and ECS tasks based on demand. Use target tracking, step scaling, and predictive scaling policies to handle traffic variations while maintaining performance and cost efficiency.
Design for stateless applications: Stateless applications store state externally in services like DynamoDB, S3, or ElastiCache. Combined with Elastic Load Balancers (ELB), this design enables seamless horizontal scaling by eliminating dependencies between application instances.
Monitor and optimize resource utilization: Amazon CloudWatch provides near-real-time insights into resource usage metrics such as CPU utilization, and memory utilization when the CloudWatch agent is installed. Tools like AWS Compute Optimizer and Cost Explorer help identify underutilized resources, ensuring cost-effective scaling.
Adopt purpose-built databases and caching: Use workload-specific databases (e.g., DynamoDB for NoSQL or RDS for relational data) to optimize performance. Integrate Amazon ElastiCache for caching frequently accessed data to reduce database load and latency.
Enable fault tolerance and high availability: Multi-AZ deployments for RDS, together with services like DynamoDB that replicate data across Availability Zones by default, ensure high availability. Health checks in Auto Scaling groups and load balancers automatically detect and replace failed instances, maintaining application uptime.
Adopt infrastructure as code (IaC): Tools like AWS CloudFormation, OpenTofu and Terraform allow you to define and manage infrastructure configurations as code. This ensures consistent, scalable, and reproducible deployments and simplifies updates and scaling adjustments.

Using Spacelift to build a scalable infrastructure

Spacelift is an infrastructure orchestration platform that increases your infrastructure deployment speed without sacrificing control.

With Spacelift, you can provision, configure, and govern with one or more automated workflows that orchestrate Terraform, OpenTofu, Terragrunt, Pulumi, CloudFormation, Ansible, and Kubernetes.

You don’t need to define all the prerequisite steps for installing and configuring the infrastructure tool you are using, nor the deployment and security steps, as they are all available in the default workflow.

Spacelift offers a unique set of infrastructure orchestration capabilities, such as:

Policies (based on Open Policy Agent): You can control how many approvals you need for runs, the kind of resources you can create, and the kind of parameters these resources can have, and you can also control the behavior when a pull request is open or merged.
Multi-IaC workflows: Combine Terraform with Kubernetes, Ansible, and other IaC tools such as OpenTofu, Pulumi, and CloudFormation, create dependencies among them, and share outputs.
Build self-service infrastructure: You can use Blueprints to build self-service infrastructure; simply complete a form to provision infrastructure based on Terraform and other supported tools.
Integrations with any third-party tools: You can integrate with your favorite third-party tools and even build policies for them. For example, you can integrate security tools in your workflows using Custom Inputs.
Drift detection and remediation

Spacelift enables you to create private workers inside your infrastructure, which helps you execute Spacelift-related workflows on your end. The documentation provides more information on configuring private workers.

If you want to learn more about what you can do with Spacelift, check out this article, create a free account today, or book a demo with one of our engineers.

Key points

Scaling AWS infrastructure is essential to ensure that web applications, servers, and databases can handle increased traffic and workload demands. By analyzing the infrastructure and leveraging AWS services like Aurora, CloudWatch, and Auto Scaling groups, we can effectively scale our web servers and databases.

Additionally, event-driven architecture and serverless technologies such as Lambda and SQS can help us implement loose coupling and improve scalability. We can leverage Terraform’s for_each/count loops to dynamically create or destroy multiple resources. In this post, we also discussed how Karpenter and the Kubernetes Cluster Autoscaler automate node scaling for Kubernetes clusters running on AWS.
