Building Highly Available AWS Infrastructure: ALB, Auto Scaling, and EC2 - Part 1
Master the fundamentals of Application Load Balancers, Auto Scaling Groups, and EC2 instances. Learn how to configure health checks, achieve high availability, and avoid those dreaded 500 errors in your production environment.
You've deployed your web application to AWS. Everything works great... until it doesn't. A sudden traffic spike hits, your single EC2 instance can't handle the load, and your users are greeted with timeout errors and 500s. Sound familiar?
Welcome to the world of high availability architecture, where we need more than just a single server hoping for the best. In this three-part series, we'll explore how to build truly resilient infrastructure on AWS, starting with the foundational trio: Application Load Balancers (ALBs), Auto Scaling Groups, and EC2 instances.
This is Part 1, where we'll cover the fundamentals and get you from zero to a production-ready, highly available setup.
🎯 The Big Picture: What Are We Building?
Before diving into the details, let's understand what each component does and why you need all three:
- Application Load Balancer (ALB): Your traffic cop, distributing incoming requests across multiple servers
- Auto Scaling Group (ASG): Your capacity manager, automatically adding or removing servers based on demand
- EC2 Instances: Your actual compute resources running your application
Together, these three components create a self-healing, self-scaling infrastructure that can handle traffic spikes, server failures, and everything in between.
🚦 How Does an Application Load Balancer Work?
An Application Load Balancer sits between your users and your application servers. When a request comes in, the ALB:
- Receives the request on port 80 (HTTP) or 443 (HTTPS)
- Checks which targets are healthy in its target group
- Routes the request to one of the healthy targets using its routing algorithm (typically round-robin)
- Returns the response back to the user
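The routing loop above can be sketched in a few lines of Python. This is a toy model for illustration only (the instance IDs and health map are invented); a real ALB also weighs per-target connection counts, stickiness, and listener rules.

```python
# Toy model of the ALB request flow described above.
from itertools import cycle

targets = {"i-aaa": True, "i-bbb": True, "i-ccc": False}  # instance -> healthy?

def healthy_targets(targets):
    """Step 2: filter to targets currently passing health checks."""
    return [t for t, ok in targets.items() if ok]

def route(requests, targets):
    """Step 3: distribute requests round-robin across healthy targets."""
    pool = cycle(healthy_targets(targets))
    return [next(pool) for _ in requests]

print(route(range(4), targets))  # ['i-aaa', 'i-bbb', 'i-aaa', 'i-bbb']
```

Note how the unhealthy target `i-ccc` never appears in the rotation: that is the entire point of pairing the routing algorithm with health checks.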
Key ALB Concepts
Listeners: These define what ports and protocols your ALB accepts traffic on. Common setup:
- Listener on port 80 → Redirects to HTTPS
- Listener on port 443 → Forwards to your target group
SSL Termination: One of the ALB's most powerful features is SSL/TLS termination. This means the ALB handles the HTTPS encryption/decryption, not your application servers.
Here's what happens:
- Client makes HTTPS request (encrypted) to ALB on port 443
- ALB decrypts the request using the SSL certificate
- ALB forwards the request to your targets over HTTP (unencrypted) on port 8080
- Your application responds with plain HTTP
- ALB encrypts the response and sends it back to the client as HTTPS
Benefits of SSL termination at the ALB:
- Simplified certificate management: Install and renew certificates in one place (the ALB), not on every instance
- Reduced instance CPU load: SSL encryption/decryption is computationally expensive; offload it to the ALB
- Easier instance management: Your application code doesn't need to handle SSL certificates
- Automatic certificate renewal: Use AWS Certificate Manager (ACM) for free SSL certificates with auto-renewal
End-to-End Encryption: If you need encryption all the way to your instances (for compliance or security requirements), you can configure:
- HTTPS listener on the ALB (port 443)
- HTTPS target group (your instances need SSL certificates too)
- ALB re-encrypts traffic before forwarding to instances
This is less common but necessary for highly regulated industries where data must be encrypted in transit at all times.
ALB HTTP Headers: The ALB automatically adds useful headers to requests forwarded to your targets:
- `X-Forwarded-For`: The original client IP address (since requests come from the ALB's IP)
- `X-Forwarded-Proto`: The original protocol used by the client (usually `https`)
- `X-Forwarded-Port`: The original port used by the client (typically `443`)
These headers let your application know details about the original request, which is especially useful for logging, security checks, and generating correct redirect URLs.
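Here is a hedged sketch of how an application might use these headers. The plain dict stands in for whatever your framework exposes (e.g. Flask's `request.headers`); `original_request_info` is a hypothetical helper name, and the logic is just string handling, nothing ALB-specific.

```python
# Recover original-request details from the X-Forwarded-* headers
# that the ALB adds before forwarding to your targets.

def original_request_info(headers):
    """Extract client IP, protocol, and port as seen by the client."""
    # X-Forwarded-For can be a comma-separated chain when other proxies
    # sit in front of the ALB; the first entry is the original client.
    xff = headers.get("X-Forwarded-For", "")
    client_ip = xff.split(",")[0].strip() if xff else None
    return {
        "client_ip": client_ip,
        "proto": headers.get("X-Forwarded-Proto", "http"),
        "port": headers.get("X-Forwarded-Port", "80"),
    }

info = original_request_info({
    "X-Forwarded-For": "203.0.113.7",
    "X-Forwarded-Proto": "https",
    "X-Forwarded-Port": "443",
})
print(info["client_ip"])  # 203.0.113.7
```

This is exactly what you need for accurate access logs and for generating redirect URLs that say `https://` even though your app only ever sees plain HTTP.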
Target Groups: A logical grouping of targets (EC2 instances, containers, Lambda functions) that receive traffic from the ALB.
Health Checks: The ALB regularly pings your targets to ensure they're responding correctly. If a target fails health checks, the ALB stops sending it traffic.
# Example ALB Configuration Concept
ALB:
  Listeners:
    - Port: 80
      Protocol: HTTP
      DefaultAction: Redirect to HTTPS
    - Port: 443
      Protocol: HTTPS
      SSLCertificate: arn:aws:acm:us-east-1:123456789:certificate/abc-123
      DefaultAction: Forward to TargetGroup

TargetGroup:
  Protocol: HTTP  # Note: HTTP, not HTTPS! (SSL terminated at ALB)
  Port: 5000      # Common port for .NET apps, Node.js, etc.
  HealthCheck:
    Path: /health
    Interval: 30 seconds
    Timeout: 5 seconds
    HealthyThreshold: 2
    UnhealthyThreshold: 3
🔄 How Does an Auto Scaling Group Work?
An Auto Scaling Group (ASG) is like having an operations team that never sleeps. It continuously monitors your application's health and performance, automatically:
- Launching new instances when demand increases
- Terminating instances when demand decreases
- Replacing unhealthy instances with healthy ones
- Maintaining your desired capacity even during infrastructure failures
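The behaviors above boil down to one reconciliation loop. The sketch below is a conceptual model, not the real ASG algorithm (which also honors cooldowns, AZ balance, and termination policies); the function and instance names are invented for illustration.

```python
# Conceptual ASG reconciliation: converge toward `desired` healthy instances.

def reconcile(instances, desired, is_healthy):
    """Return (action, instance) pairs needed to reach `desired` healthy."""
    # Replace unhealthy instances
    actions = [("terminate", i) for i in instances if not is_healthy(i)]
    healthy = [i for i in instances if is_healthy(i)]
    # Launch replacements / scale out to cover any shortfall
    actions += [("launch", None)] * max(0, desired - len(healthy))
    # Scale in if we have more healthy instances than desired
    actions += [("terminate", i) for i in healthy[desired:]]
    return actions

# i-2 fails its health check: it is terminated and a replacement launched
actions = reconcile(["i-1", "i-2", "i-3"], desired=3,
                    is_healthy=lambda i: i != "i-2")
print(actions)  # [('terminate', 'i-2'), ('launch', None)]
```

The key insight: the ASG never "fixes" a broken instance. It terminates and replaces, which is why your instances must be disposable (all state in databases, caches, or object storage).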
ASG Core Components
Launch Template: Defines what kind of EC2 instances to create (AMI, instance type, security groups, user data script, etc.)
Scaling Policies: Rules that determine when to scale:
- Target Tracking: "Keep CPU at 70%"
- Step Scaling: "Add 2 instances when CPU > 80%, add 4 when CPU > 90%"
- Scheduled Scaling: "Scale up at 9 AM, scale down at 6 PM"
Capacity Settings:
- Minimum: Never go below this (e.g., 2 for high availability)
- Desired: Current target capacity
- Maximum: Never exceed this (cost protection)
# Conceptual Auto Scaling Configuration
AutoScalingGroup:
  MinSize: 2
  DesiredCapacity: 2
  MaxSize: 10
  LaunchTemplate:
    ImageId: ami-12345678
    InstanceType: t3.medium
    SecurityGroups: [web-server-sg]
  ScalingPolicies:
    - Type: TargetTracking
      Metric: CPUUtilization
      TargetValue: 70
💻 How Does an EC2 Instance Fit In?
EC2 instances are your workhorsesβthe actual virtual machines running your application code. In our high availability setup, each EC2 instance:
- Runs your application (web server, API, etc.)
- Responds to health checks from both the ALB and the ASG
- Reports metrics to CloudWatch (CPU, memory, network, custom metrics)
- Can be automatically replaced if it becomes unhealthy
The Instance Lifecycle
Launch → Running → Healthy → Serving Traffic → (Eventually) Terminated
                      ↓
                 Unhealthy → Removed from ALB → Terminated by ASG → New instance launched
🔗 Connecting It All Together
Here's where the magic happens. Let's walk through how to connect an ALB to an Auto Scaling Group with EC2 instances.
Step 1: Create Your Target Group
The target group is where your ASG will register instances:
# AWS CLI example
aws elbv2 create-target-group \
--name my-web-app-targets \
--protocol HTTP \
--port 8080 \
--vpc-id vpc-12345678 \
--health-check-enabled \
--health-check-path /health \
--health-check-interval-seconds 30 \
--health-check-timeout-seconds 5 \
--healthy-threshold-count 2 \
--unhealthy-threshold-count 3
Step 2: Create Your ALB
aws elbv2 create-load-balancer \
--name my-web-app-alb \
--subnets subnet-12345 subnet-67890 \
--security-groups sg-12345678 \
--scheme internet-facing \
--type application
Step 3: Create a Listener
aws elbv2 create-listener \
--load-balancer-arn arn:aws:elasticloadbalancing:... \
--protocol HTTPS \
--port 443 \
--certificates CertificateArn=arn:aws:acm:... \
--default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:...
Step 4: Create Your Auto Scaling Group
The critical pieceβattach the target group to your ASG:
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name my-web-app-asg \
--launch-template LaunchTemplateName=my-web-app-template \
--min-size 2 \
--max-size 10 \
--desired-capacity 2 \
--target-group-arns arn:aws:elasticloadbalancing:.../targetgroup/my-web-app-targets/... \
--health-check-type ELB \
--health-check-grace-period 300 \
--vpc-zone-identifier "subnet-12345,subnet-67890"
Key point: --target-group-arns connects your ASG to the ALB target group. Any instance launched by the ASG will automatically register with this target group.
⚠️ Instance Protection from Scale-In
When using ECS with EC2 (covered in Part 2), you'll need instance protection:
# CRITICAL: Required for ECS Capacity Providers with managed termination
aws autoscaling update-auto-scaling-group \
--auto-scaling-group-name my-asg \
--new-instances-protected-from-scale-in
Why? ECS Capacity Providers need to control instance termination to ensure:
- Tasks are gracefully stopped before instance termination
- No tasks are abruptly killed during scale-in
- Proper draining of connections happens first
Without this, the ASG might terminate instances with running containers, causing service disruption!
🏥 The Health Check Trinity
This is where things get interesting (and where many people get confused). You actually have THREE different health check systems working together:
1. ALB Target Health Checks
Purpose: Determine if a target should receive traffic from the ALB
Configuration:
- Path: `/health` (your application's health endpoint)
- Interval: 30 seconds
- Timeout: 5 seconds
- Healthy threshold: 2 consecutive successes
- Unhealthy threshold: 3 consecutive failures
Behavior: If an instance fails ALB health checks, it's marked "unhealthy" in the target group and stops receiving traffic, but it's NOT terminated.
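The consecutive-threshold logic is worth internalizing, so here is a simplified single-target model of it (the defaults mirror the article's example configuration; the class name and structure are my own, not an AWS API).

```python
# Simplified model of the ALB's consecutive-threshold health tracking.

class TargetHealth:
    def __init__(self, healthy_threshold=2, unhealthy_threshold=3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.state = "healthy"
        self.streak = 0  # consecutive results contradicting current state

    def record(self, passed):
        """Feed in one health check result; return the (possibly new) state."""
        contradicts = passed if self.state == "unhealthy" else not passed
        self.streak = self.streak + 1 if contradicts else 0
        if self.state == "healthy" and self.streak >= self.unhealthy_threshold:
            self.state, self.streak = "unhealthy", 0
        elif self.state == "unhealthy" and self.streak >= self.healthy_threshold:
            self.state, self.streak = "healthy", 0
        return self.state

t = TargetHealth()
results = [t.record(ok) for ok in [False, False, False, True, True]]
print(results)  # three failures flip it to unhealthy; two successes restore it
```

Notice the asymmetry: one bad response does nothing, which is what protects healthy instances from being yanked out of rotation during a brief hiccup.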
2. Auto Scaling Group Health Checks
Purpose: Determine if an instance should be terminated and replaced
Types:
- EC2 Health Check: Is the instance running? (basic AWS infrastructure check)
- ELB Health Check: Is the instance passing ALB health checks?
Configuration:
--health-check-type ELB # Use ALB health status
--health-check-grace-period 300 # Wait 5 minutes after launch before checking
Behavior: If an instance fails ASG health checks, it's TERMINATED and a new one is launched.
3. EC2 Status Checks
Purpose: Monitor the underlying infrastructure and instance health
Types:
- System Status Check: AWS infrastructure (host, network)
- Instance Status Check: Your instance's operating system and configuration
Behavior: If system checks fail, AWS may automatically recover your instance. If instance checks fail, you may need to intervene or let ASG replace it.
🔄 The Complete Health Check Flow
Let's trace what happens when an instance becomes unhealthy:
1. Application starts failing → Returns 500 errors
   ↓
2. ALB health check fails → Instance marked unhealthy in target group
   → ALB stops sending new requests to this instance
   ↓
3. ASG detects unhealthy target (if using ELB health check type)
   ↓
4. ASG waits for grace period to expire
   ↓
5. ASG terminates the unhealthy instance
   ↓
6. ASG launches a new instance to maintain desired capacity
   ↓
7. New instance starts, passes health checks, begins receiving traffic
🚀 High Availability Setup: Best Practices
Want to build a truly resilient system? Follow these principles:
1. Multi-AZ Deployment
Why: AWS Availability Zones are isolated data centers. If one goes down, your app stays up.
--vpc-zone-identifier "subnet-in-us-east-1a,subnet-in-us-east-1b,subnet-in-us-east-1c"
Minimum: 2 AZs. Recommended: 3 AZs.
2. Minimum Instance Count
Never use a minimum of 1. With minimum capacity of 2:
- You can handle an entire AZ failure
- You can perform rolling deployments without downtime
- You have redundancy during instance replacement
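The multi-AZ and minimum-count advice comes down to simple arithmetic: with instances spread evenly, losing one AZ takes out 1/N of your capacity. A quick worked example (the helper function is just for illustration):

```python
# How much capacity survives a single-AZ failure, assuming even spread?

def surviving_fraction(num_azs):
    """Fraction of instances left after one of `num_azs` zones fails."""
    return (num_azs - 1) / num_azs

print(f"{surviving_fraction(2):.0%}")  # 50% survives an AZ failure
print(f"{surviving_fraction(3):.0%}")  # 67% survives
```

This is also why a minimum of 2 matters: with min=1 in a single AZ, an AZ failure means 0% of your capacity survives, and no scaling policy can save you.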
3. Load Balancer in Multiple Subnets
Your ALB should also span multiple AZs:
--subnets subnet-in-us-east-1a subnet-in-us-east-1b subnet-in-us-east-1c
4. Proper Health Check Configuration
Health check endpoint must:
- Check critical dependencies (database, cache, external services)
- Return quickly (< 1 second)
- Return 200 OK when healthy, 5xx when unhealthy
# Example health check endpoint in your app
@app.route('/health')
def health_check():
    try:
        # Check database connection
        db.execute('SELECT 1')
        # Check cache
        cache.ping()
        # Check critical external service
        if not can_reach_payment_api():
            return {'status': 'unhealthy'}, 503
        return {'status': 'healthy'}, 200
    except Exception as e:
        return {'status': 'unhealthy', 'error': str(e)}, 503
⚠️ IMPORTANT - Health Check Best Practice:
Be cautious with database checks in health endpoints! ALB checks every 10-30 seconds. If every health check queries the database, you could overwhelm it:
# Better approach: Cached health checks
import time

last_db_check = {'time': 0, 'status': 'unknown'}
DB_CHECK_INTERVAL = 60  # Only check DB every 60 seconds

@app.route('/health')
def health_check():
    current_time = time.time()
    # Return cached status if we checked recently
    if current_time - last_db_check['time'] < DB_CHECK_INTERVAL:
        if last_db_check['status'] == 'healthy':
            return {'status': 'healthy', 'cached': True}, 200
        return {'status': 'unhealthy', 'cached': True}, 503
    # Actually check database
    try:
        db.execute('SELECT 1')
        last_db_check['time'] = current_time
        last_db_check['status'] = 'healthy'
        return {'status': 'healthy'}, 200
    except Exception as e:
        last_db_check['time'] = current_time
        last_db_check['status'] = 'unhealthy'
        return {'status': 'unhealthy', 'error': str(e)}, 503
Why this matters: With 10 instances and 10-second health check intervals, you'd hit the database 60 times per minute just for health checks! Caching reduces this to once per minute while still detecting failures quickly.
5. Connection Draining / Deregistration Delay
When an instance is deregistered, don't immediately kill active connections:
--deregistration-delay-timeout-seconds 30
This gives in-flight requests 30 seconds to complete before the instance is fully removed.
📊 Monitoring: Know Before Your Users Do
You can't fix what you can't see. Set up comprehensive monitoring:
Key Metrics to Watch
ALB Metrics (CloudWatch):
- `TargetResponseTime`: How fast are your instances responding?
- `HTTPCode_Target_4XX_Count`: Client errors
- `HTTPCode_Target_5XX_Count`: Server errors (🚨 alert on this!)
- `HealthyHostCount`: How many targets are healthy?
- `UnHealthyHostCount`: How many targets are failing?
ASG Metrics:
- `GroupDesiredCapacity`: What the ASG is trying to maintain
- `GroupInServiceInstances`: How many instances are actually running
- `GroupTotalInstances`: Total instances (including pending/terminating)
EC2 Metrics:
- `CPUUtilization`: Are you scaling appropriately?
- `NetworkIn`/`NetworkOut`: Traffic patterns
- `StatusCheckFailed`: Infrastructure problems
Setting Up Alerts
# CloudWatch Alarm: Alert on 5XX errors
aws cloudwatch put-metric-alarm \
--alarm-name high-5xx-errors \
--alarm-description "Alert when 5XX errors exceed threshold" \
--metric-name HTTPCode_Target_5XX_Count \
--namespace AWS/ApplicationELB \
--statistic Sum \
--period 60 \
--evaluation-periods 2 \
--threshold 10 \
--comparison-operator GreaterThanThreshold \
--dimensions Name=LoadBalancer,Value=app/my-web-app-alb/...
🛡️ Avoiding Unexpected Outages
Common Failure Scenarios and Solutions
Scenario 1: All instances fail health checks simultaneously
- Cause: Bad deployment, database failure, misconfigured health check
- Solution:
  - Set up staged deployments
  - Use ASG instance refresh with `min_healthy_percentage`
  - Make health checks graceful during startup (grace period)
Scenario 2: Scaling can't keep up with traffic spike
- Cause: Scaling policies too conservative, instances too slow to start
- Solution:
- Use predictive scaling or scheduled scaling for known traffic patterns
- Optimize your AMI and user data script for faster startup
- Consider using warm pools (keep pre-initialized instances ready)
Scenario 3: Instance keeps restarting in a loop
- Cause: Application fails to start, health check runs before app is ready
- Solution:
  - Increase `health-check-grace-period` (gives instances time to fully start)
  - Fix application startup issues
  - Check CloudWatch Logs and EC2 user data logs
🚫 Avoiding Those Ugly 500 Errors
500 errors are the bane of every operations team. Here's how to minimize them:
1. Implement Graceful Shutdown
When an instance receives a termination signal, it should:
- Stop accepting new requests
- Complete in-flight requests
- Shut down cleanly
# Python example with signal handling
import signal
import sys

def graceful_shutdown(signum, frame):
    print("Received termination signal, shutting down gracefully...")
    # Stop accepting new requests
    server.stop_accepting_connections()
    # Wait for active requests to complete
    server.wait_for_active_requests(timeout=30)
    # Exit
    sys.exit(0)

signal.signal(signal.SIGTERM, graceful_shutdown)
2. Use Appropriate Health Check Thresholds
- Too aggressive: Healthy instances get marked unhealthy during brief hiccups
- Too lenient: Unhealthy instances keep serving traffic and causing errors
Sweet spot:
- Interval: 10-30 seconds
- Unhealthy threshold: 2-3 consecutive failures
- Healthy threshold: 2 consecutive successes
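To make the trade-off concrete, here are the worked numbers behind that sweet spot: the worst-case time an unhealthy instance keeps receiving traffic is simply the check interval times the unhealthy threshold (the helper function below is just for illustration).

```python
# Worst-case detection time before the ALB stops routing to a bad target.

def worst_case_detection_seconds(interval, unhealthy_threshold):
    # The ALB needs `unhealthy_threshold` consecutive failed checks,
    # each spaced `interval` seconds apart.
    return interval * unhealthy_threshold

print(worst_case_detection_seconds(10, 2))  # 20 (aggressive settings)
print(worst_case_detection_seconds(30, 3))  # 90 (lenient settings)
```

So within the recommended ranges, users can see errors for anywhere from 20 to 90 seconds before the ALB cuts the instance off; pick where you land based on how twitchy your instances are under load.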
3. Implement Circuit Breakers
If a dependency is down, don't keep trying to call it:
import requests
from pybreaker import CircuitBreaker

# If the external service fails 5 times, stop calling it for 60 seconds
breaker = CircuitBreaker(fail_max=5, reset_timeout=60)

@breaker
def call_external_service():
    return requests.get('https://api.example.com/data')
4. Set Proper Timeouts
Don't let slow requests pile up and overwhelm your instances:
# Application timeout configuration
REQUEST_TIMEOUT = 30 # seconds
UPSTREAM_TIMEOUT = 5 # seconds for external calls
# ALB idle timeout (default 60s, adjust based on your needs)
aws elbv2 modify-load-balancer-attributes \
--load-balancer-arn arn:aws:elasticloadbalancing:... \
--attributes Key=idle_timeout.timeout_seconds,Value=60
🎬 Putting It All Together: A Real-World Example
Let's say you're building a web API that needs to handle variable traffic throughout the day:
# 1. Create launch template
aws ec2 create-launch-template \
  --launch-template-name web-api-template \
  --launch-template-data '{
    "ImageId": "ami-12345678",
    "InstanceType": "t3.medium",
    "SecurityGroupIds": ["sg-12345678"],
    "UserData": "<base64-encoded-startup-script>",
    "TagSpecifications": [{
      "ResourceType": "instance",
      "Tags": [{"Key": "Name", "Value": "web-api-instance"}]
    }]
  }'
# 2. Create target group
aws elbv2 create-target-group \
--name web-api-targets \
--protocol HTTP \
--port 8080 \
--vpc-id vpc-12345678 \
--health-check-path /health \
--health-check-interval-seconds 30 \
--healthy-threshold-count 2 \
--unhealthy-threshold-count 2 \
--deregistration-delay-timeout-seconds 30
# 3. Create ALB
aws elbv2 create-load-balancer \
--name web-api-alb \
--subnets subnet-1a subnet-1b subnet-1c \
--security-groups sg-alb-12345678
# 4. Create listener
aws elbv2 create-listener \
--load-balancer-arn <alb-arn> \
--protocol HTTPS \
--port 443 \
--certificates CertificateArn=<acm-cert-arn> \
--default-actions Type=forward,TargetGroupArn=<target-group-arn>
# 5. Create Auto Scaling Group
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name web-api-asg \
--launch-template LaunchTemplateName=web-api-template \
--min-size 2 \
--max-size 10 \
--desired-capacity 2 \
--target-group-arns <target-group-arn> \
--health-check-type ELB \
--health-check-grace-period 300 \
--vpc-zone-identifier "subnet-1a,subnet-1b,subnet-1c"
# 6. Create scaling policy
aws autoscaling put-scaling-policy \
--auto-scaling-group-name web-api-asg \
--policy-name target-tracking-cpu \
--policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 70.0
  }'
🎯 What's Next?
You now have a solid foundation in building highly available infrastructure with ALB, Auto Scaling Groups, and EC2 instances. But we're just getting started!
In Part 2, we'll level up by introducing Amazon ECS with EC2βcontainerizing your applications and discovering the new challenges (and opportunities) that come with managing both container scaling AND instance scaling.
In Part 3, we'll explore AWS Fargateβthe serverless compute engine that eliminates the need to manage EC2 instances entirely, letting you focus purely on your application containers.
In Part 4, we'll tackle graceful failure handlingβbecause even with perfect infrastructure scaling, dependencies like databases can still fail. Learn how to handle failures gracefully and provide great user experiences during outages.
📝 Quick Reference
Health Check Configuration Checklist
- ✅ ALB health check path: `/health`
- ✅ Health check interval: 10-30 seconds
- ✅ Unhealthy threshold: 2-3 failures
- ✅ Healthy threshold: 2 successes
- ✅ ASG health check type: ELB
- ✅ ASG grace period: 300+ seconds
- ✅ Deregistration delay: 30 seconds
High Availability Checklist
- ✅ Minimum 2 instances
- ✅ Deployed across 3 AZs
- ✅ ALB in multiple subnets
- ✅ Target tracking scaling policy
- ✅ CloudWatch alarms for 5XX errors
- ✅ CloudWatch alarms for unhealthy hosts
- ✅ Graceful shutdown handling
- ✅ Connection draining enabled
Building reliable infrastructure is a journey, not a destination. Start with these foundations, monitor continuously, and iterate based on what you learn. Your future self (and your users) will thank you!
Stay tuned for Part 2, where we'll dive into the world of containers with Amazon ECS!