The Lambda@Edge Log Retention Nightmare: A Tale of CloudFormation Frustration

When Lambda@Edge log groups refuse to exist during deployment. A journey through AWS documentation, failed attempts, and the realization that some problems don't have elegant solutions.

🎭 The Setup

You've just built an amazing Lambda@Edge function. It's deployed globally, handling requests at the edge with lightning speed. You're feeling proud - until you get the AWS bill and realize those edge logs are costing you a fortune because they never expire (and you're logging way more than you should).

Here's the real nightmare: You're not dealing with one log group in one region. You're hunting down dozens of log groups scattered across every AWS region where your edge function executes, each with the same us-east-1 prefix but stored in different regional CloudWatch instances. And none of them have retention policies.

"No problem," you think. "I'll just set a log retention policy in my CloudFormation template."

from aws_cdk import custom_resources as cr

cr.AwsCustomResource(
    self, "EdgeLogRetention",
    on_update=cr.AwsSdkCall(
        service="CloudWatchLogs",
        action="putRetentionPolicy",
        parameters={
            "logGroupName": "/aws/lambda/us-east-1.my-function",
            "retentionInDays": 7,
        },
        physical_resource_id=cr.PhysicalResourceId.of("EdgeLogRetention"),
    ),
    policy=cr.AwsCustomResourcePolicy.from_sdk_calls(
        resources=cr.AwsCustomResourcePolicy.ANY_RESOURCE
    ),
)

Famous last words.

🚨 The Problem: "The specified log group does not exist"

Your deployment starts, and then you see it:

EdgeLogRetentioneucentral180637808 - FAILED
Received response status [FAILED] from custom resource. 
Message returned: The specified log group does not exist.

Wait, what? How can the log group not exist? I'm deploying a Lambda function!

🔍 The Investigation: Understanding Lambda@Edge Logging

After hours of digging through AWS documentation and Stack Overflow posts, you discover the truth:

Lambda@Edge creates TWO types of log groups:

  1. Normal log group (/aws/lambda/your-function) - Created during deployment in us-east-1, but remains empty
  2. Edge log groups (/aws/lambda/us-east-1.your-function) - Created on-demand when function executes at edge locations

Here's what trips everyone up: The normal log group will never show any logs or metrics in the AWS console because your Lambda@Edge function never actually runs in us-east-1 - it only runs at edge locations!

Think of the us-east-1 function as a template that gets synced to edge locations. AWS uses this template to deploy copies of your function to edge regions worldwide, but the template itself never handles real traffic.

You'll only see activity in the normal log group if you run local tests or invoke the function directly in us-east-1. But for real edge traffic, it's a ghost town.

The edge log groups are the problem children. They're not created during deployment. Not when you create the function. Only when the function is actually invoked at an edge location.

This means:

  • Your CloudFormation template runs
  • Lambda@Edge function gets deployed to edge locations
  • Custom resource tries to set log retention
  • FAILS because log groups don't exist yet
  • Log groups get created later when someone actually uses your function
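
You can watch this happen yourself. Here's a minimal boto3 sketch - the function name and region below are placeholders - that looks for the edge log group right after deployment and comes back empty:

import boto3

# Hypothetical names - substitute your own function and an edge region you care about
FUNCTION_NAME = "my-edge-function"
EDGE_REGION = "eu-central-1"

logs = boto3.client("logs", region_name=EDGE_REGION)
response = logs.describe_log_groups(
    logGroupNamePrefix=f"/aws/lambda/us-east-1.{FUNCTION_NAME}"
)

# Right after deployment this prints [] - the edge log group only appears
# once real traffic hits an edge location served from this region
print(response["logGroups"])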

🤯 The Mind-Bending Architecture

Here's where it gets even more confusing:

# Log group naming pattern
/aws/lambda/us-east-1.{function-name}

# Examples
/aws/lambda/us-east-1.my-edge-function

# BUT THEY'RE STORED IN THEIR RESPECTIVE REGIONS!
arn:aws:logs:eu-central-1:123456789:log-group:/aws/lambda/us-east-1.my-edge-function
arn:aws:logs:ap-southeast-1:123456789:log-group:/aws/lambda/us-east-1.my-edge-function
arn:aws:logs:us-west-2:123456789:log-group:/aws/lambda/us-east-1.my-edge-function

Yes, you read that right. A log group named /aws/lambda/us-east-1.my-function is physically located in eu-central-1.

Here's the key insight: Lambda@Edge functions must be created in us-east-1, and every log group carries the us-east-1 prefix, but when the function executes at an edge location, the logs land in the AWS region serving that edge location. The naming convention always says us-east-1, no matter where the logs are actually stored.

💡 The Failed Attempts

So now that we know the pattern, let's try again - this time pointing the custom resources at the edge regions where the log groups actually live.

Result: "The specified log group does not exist"

Other Creative Attempts We Considered

We brainstormed numerous other approaches:

  • Pre-creating log groups with custom resources (risky - could break Lambda@Edge logging)
  • Adding CloudFormation delays and wait conditions (doesn't help - only user traffic creates logs)
  • Using SSM parameters to track log group creation state (clever, but still timing-dependent)
  • Trying custom resources in all regions simultaneously (impressive, but still fails everywhere)
  • Using nested stacks to create deployment delays (just makes the failure more complex)

The Pattern: Every creative approach fails because they all assume the log groups exist during deployment. But Lambda@Edge only creates logs when real users actually invoke the function at edge locations.

🌍 The Geographic Timing Challenge

Here's another layer of complexity: When do edge log groups actually appear?

If you're a US-based company:

  • 🇺🇸 US regions: Log groups appear within hours (your primary user base)
  • 🇬🇧 European regions: Might take days/weeks before European users trigger the function
  • 🇦🇺 Asia-Pacific regions: Could be months before you get Australian traffic
  • 🌍 Other regions: Some edge locations might never create logs until you have truly global users

If you're a UK-based company:

  • 🇬🇧 European regions: Log groups appear quickly (your primary user base)
  • 🇺🇸 US regions: Might take days before American users discover your service
  • 🇦🇺 Asia-Pacific regions: Could be months before Asia-Pacific traffic appears

The key insight: Your edge log groups appear based on where your actual users are located, not where your company is based. This makes timing unpredictable and requires ongoing monitoring.

But there's another factor: Lambda@Edge functions only deploy to regions that are enabled in your AWS account.

The complete formula: Edge log groups appear only when BOTH conditions are met:

  1. ✅ Region is enabled in your AWS account
  2. ✅ Users actually invoke the function in that region

This makes the geographic timing even more unpredictable - you might have users in a region but never see logs there because that region isn't enabled!
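
If you want to check the first condition up front, you can compare AWS's full region list against what is enabled for your account. A rough boto3 sketch (describe_regions without arguments returns only the regions enabled for the account):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Without AllRegions, describe_regions returns only regions enabled for this account
enabled = {r["RegionName"] for r in ec2.describe_regions()["Regions"]}

# AllRegions=True also includes opt-in regions that are still disabled
all_regions = {r["RegionName"] for r in ec2.describe_regions(AllRegions=True)["Regions"]}

print(f"Enabled regions: {sorted(enabled)}")
print(f"Disabled opt-in regions (edge logs will never appear here): {sorted(all_regions - enabled)}")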

🎭 The Realization

You finally understand: You're trying to set retention on something that doesn't exist yet.

It's like trying to put a roof on a house before the foundation is poured. Lambda@Edge log groups are created when the function is first invoked at that edge location, which could be minutes, hours, or days after deployment.

🤔 The Solutions We Considered

Solution 1: EventBridge + Lambda (The "Proper" Way)

EventPattern:
  source: ["aws.logs"]
  detail-type: ["AWS API Call via CloudTrail"]
  detail:
    eventSource: ["logs.amazonaws.com"]
    eventName: ["CreateLogGroup"]

Pros: Real-time, automatic, handles all edge regions
Cons: Requires CloudTrail, another Lambda to maintain, complex setup
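
For completeness, here's roughly what the Lambda behind that rule could look like - a sketch we did not deploy, which assumes the rule (and this function) are replicated into every edge region, which is a big part of the "complex setup" con:

import boto3

RETENTION_DAYS = 7  # assumed target retention

def lambda_handler(event, context):
    """Apply retention when CloudTrail reports a CreateLogGroup call."""
    detail = event.get("detail", {})
    log_group_name = detail.get("requestParameters", {}).get("logGroupName", "")

    # Only touch Lambda@Edge log groups (us-east-1.<function> naming convention)
    if "/aws/lambda/us-east-1." not in log_group_name:
        return {"skipped": log_group_name}

    # The rule fires in the region where the log group was created,
    # so a client in that same region is enough
    logs = boto3.client("logs", region_name=detail.get("awsRegion"))
    logs.put_retention_policy(logGroupName=log_group_name, retentionInDays=RETENTION_DAYS)
    return {"updated": log_group_name}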

Solution 2: Periodic Lambda Function (The "Good Enough" Way)

def lambda_handler(event, context):
    # Scan for edge log groups every week (or month)
    # Apply retention if needed
    ...

Pros: Simple, no CloudTrail dependency
Cons: Not real-time, runs even when not needed

Solution 3: Manual Script Execution (The "Flawed" Way)

# create a custom script to set retention on edge log groups
# run it manually after deployment
python edge_log_retention.py

# 🚫 This doesn't work for Edge logs!
# 🚫 You never know when edge log groups will be created

Pros: Gives you control
Cons: Fundamentally flawed - Edge log groups appear unpredictably based on user traffic, not on your schedule

Solution 4: Boto3 Script (The "Practical" Way)

Here's a Python script that hunts down Lambda@Edge log groups across all regions and sets retention policies:

#!/usr/bin/env python3
import boto3
from botocore.exceptions import ClientError
import os

profile_name = os.getenv('AWS_PROFILE') # 👈 useful for local testing
session = boto3.Session(region_name='us-east-1', profile_name=profile_name)
ec2 = session.client('ec2')

def set_edge_log_retention(retention_days=7, dry_run=True):
    """
    Find Lambda@Edge log groups across all regions and set retention policies.
    
    Args:
        retention_days (int): Number of days to retain logs
        dry_run (bool): If True, only show what would be changed
    """
    # Get all AWS regions
    regions = [region['RegionName'] for region in ec2.describe_regions()['Regions']]
    
    edge_log_groups = []
    total_changed = 0
    
    print(f"๐Ÿ” Hunting for Lambda@Edge log groups across {len(regions)} regions...")
    print(f"๐ŸŽฏ Target retention: {retention_days} days")
    print(f"๐Ÿงช Dry run: {dry_run}")
    print("=" * 60)
    
    for region in regions:
        try:
            logs = session.client('logs', region_name=region)
            
            # Find log groups with us-east-1 prefix (indicating Edge functions)
            paginator = logs.get_paginator('describe_log_groups')
            for page in paginator.paginate():
                for log_group in page.get('logGroups', []):
                    log_group_name = log_group['logGroupName']
                    
                    # Check if it's a Lambda@Edge log group
                    if '/aws/lambda/us-east-1.' in log_group_name:
                        current_retention = log_group.get('retentionInDays')
                        
                        edge_log_groups.append({
                            'region': region,
                            'name': log_group_name,
                            'current_retention': current_retention,
                            'stored_bytes': log_group.get('storedBytes', 0)
                        })
                        
                        # Set retention if needed
                        if current_retention != retention_days:
                            if dry_run:
                                print(f"๐Ÿ“ {region}: Would set {log_group_name} to {retention_days} days (current: {current_retention})")
                            else:
                                try:
                                    logs.put_retention_policy(
                                        logGroupName=log_group_name,
                                        retentionInDays=retention_days
                                    )
                                    print(f"โœ… {region}: Set {log_group_name} to {retention_days} days")
                                    total_changed += 1
                                except ClientError as e:
                                    print(f"โŒ {region}: Failed to set {log_group_name} - {e}")
                        else:
                            print(f"โœ“ {region}: {log_group_name} already has {retention_days} days retention")
                            
        except ClientError as e:
            # Skip regions where CloudWatch Logs isn't available
            continue
    
    print("=" * 60)
    print(f"๐Ÿ“Š Summary:")
    print(f"   Found {len(edge_log_groups)} Lambda@Edge log groups")
    print(f"   Total storage: {sum(g['stored_bytes'] for g in edge_log_groups) / (1024**3):.2f} GB")
    if not dry_run:
        print(f"   Changed {total_changed} log groups")
    
    return edge_log_groups

if __name__ == "__main__":
    # Dry run first to see what would be changed
    edge_logs = set_edge_log_retention(retention_days=7, dry_run=True)
    
    # Uncomment the line below to actually make changes
    # set_edge_log_retention(retention_days=7, dry_run=False)

Usage:

# Install requirements
pip install boto3

# Dry run (safe - shows what would be changed)
python edge_log_retention.py

# Actually apply changes (uncomment the last line in the script)
python edge_log_retention.py

Pros:

  • Finds all edge logs automatically across all regions
  • Safe dry-run mode
  • Skips regions it can't access instead of failing
  • Shows storage usage and summary

Cons:

  • Requires manual execution
  • Needs IAM permissions for all regions
  • Not automated during deployment

Solution 5: (The Winner) EventBridge Scheduled Automation 🎉

Instead of manual execution, create an EventBridge scheduled rule during deployment:

from aws_cdk import Duration, aws_lambda as _lambda
from aws_cdk import aws_events as events, aws_events_targets as targets

# First, create the Lambda function
edge_log_retention_lambda = _lambda.Function(
    self, "EdgeLogRetentionFunction",
    runtime=_lambda.Runtime.PYTHON_3_12,
    handler="app.lambda_handler",
    code=_lambda.Code.from_asset("lambda/edge_log_retention"),
    timeout=Duration.minutes(15)   
)

# Then create the EventBridge rule
events.Rule(
    self, "EdgeLogRetentionScheduler",
    schedule=events.Schedule.cron(minute="0", hour="2", week_day="MON"),  # Weekly on Monday 2AM UTC
    targets=[targets.LambdaFunction(edge_log_retention_lambda, 
        event=events.RuleTargetInput.from_object({
            "detail": {
                "days": 7,
                "dry_run": False
            }
        })
    )]
)
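
One thing the snippet above glosses over: the retention function needs permission to list regions and manage log groups everywhere. A minimal sketch of the grant (resources left broad purely for illustration - scope them down for production):

from aws_cdk import aws_iam as iam

# Let the retention function enumerate regions and set retention on any log group.
# Broad resources are used here only for illustration.
edge_log_retention_lambda.add_to_role_policy(
    iam.PolicyStatement(
        actions=[
            "ec2:DescribeRegions",
            "logs:DescribeLogGroups",
            "logs:PutRetentionPolicy",
        ],
        resources=["*"],
    )
)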

The Lambda function:

# app.py
from edge_log_retention import set_edge_log_retention
from aws_lambda_powertools import Logger
import json

logger = Logger(service="LambdaEdgeLogRetentionManager")

def lambda_handler(event, context):
    """
    Lambda handler for EventBridge scheduled log retention management.
    """
    # Extract parameters from the EventBridge detail section
    detail = event.get('detail', {})

    try:
        logger.info(f"Event: {event}")

        days = int(detail.get('days', 7))
        dry_run = str(detail.get('dry_run', True)).lower() == 'true'

        set_edge_log_retention(retention_days=days, dry_run=dry_run)
    except Exception as e:
        logger.error(f"Error: {e}, event: {event}, detail: {detail}")

    return {
        'statusCode': 200,
        'body': json.dumps('Log retention management completed successfully')
    }
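
To sanity-check the handler locally before trusting the schedule, you can invoke it with a fake EventBridge payload and keep dry run on (assumes local AWS credentials and the script's dependencies are installed):

# local_test.py - quick manual check; nothing is modified while dry_run is True
from app import lambda_handler

fake_event = {"detail": {"days": 7, "dry_run": True}}
print(lambda_handler(fake_event, None))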

This approach actually works because:

  • ✅ Log groups exist by the time the script runs (for regions that have been used)
  • ✅ Handles all regions automatically when they appear
  • ✅ Safe dry-run mode for testing
  • ✅ Automatically scheduled - no manual intervention needed
  • ✅ Created during deployment - part of your infrastructure as code
  • ✅ Runs weekly/monthly - ensures timely configuration as new edge locations appear
  • ✅ Provides detailed reporting and feedback
  • ✅ Handles the gradual rollout of edge locations as users discover your service

🤷‍♂️ The Frustration

Here's what frustrates us most:

  1. Buried Documentation: AWS does document that Edge log groups are created on-demand, but it's not prominently highlighted as a deployment limitation
  2. Misleading Error Messages: "Log group does not exist" doesn't explain WHY
  3. Counterintuitive Architecture: Why do log groups carry the us-east-1 prefix but get stored in other regions?
  4. No Built-in Solution: AWS should provide a way to set default retention for Edge logs during deployment
  5. The Learning Curve: You have to understand the timing, geographic, and regional complexity before you can even start solving the problem

📣 The Call for Help

We're throwing this out to the community:

Has anyone solved this elegantly? Are we missing something obvious? Is there a magical CloudFormation feature we don't know about?

What We Tried:

  • ✅ Custom resources with proper IAM permissions
  • ✅ Correct service names (CloudWatchLogs not Logs)
  • ✅ Right region targeting (logs are stored in edge regions, not us-east-1)
  • ✅ Multiple deployment approaches

What We Need:

  • 🤔 A way to set retention policies on edge log groups during deployment
  • 🤔 Or a clean post-deployment solution that doesn't require manual intervention
  • 🤔 Or confirmation that this is impossible and we should stop trying

🎯 The Current State

For now, our Lambda@Edge functions deploy successfully with a warning:

Edge log retention configuration disabled - log groups are created on-demand.
Desired retention: 7 days. See TODO for implementation approach.

We'll monitor the costs and implement one of the solutions if it becomes a real problem. But for now, we're choosing shipping features over perfect logging architecture.

🔮 The Future

Hopefully, AWS will:

  1. Document this limitation clearly
  2. Provide a built-in solution for edge log retention
  3. Fix the confusing architecture (or at least explain it better)

Until then, we'll keep an eye on our logging costs and hope someone in the community has a better solution.


Have you solved this? Drop your solutions in the comments! Let's save the next developer from this nightmare.
