Why Isolated Stacks Beat Nested Stacks in CloudFormation

CloudFormation nested stacks seem like the obvious way to organize large infrastructure. You break things into logical units, reference them from a parent stack, and deploy everything together. It looks like good engineering. Clean hierarchy, DRY templates, one deploy command.

Then something goes wrong. A nested stack fails mid-update. The parent stack rolls back, which rolls back every sibling stack, which rolls back resources that had nothing to do with the failure. You're staring at a wall of UPDATE_ROLLBACK_COMPLETE_CLEANUP_IN_PROGRESS events trying to figure out which of the 14 nested stacks actually broke.

That's the moment you start questioning nested stacks.

The appeal of nested stacks

The pitch is compelling:

Reusability — define a VPC template once, reference it from multiple parent stacks
Organization — one parent stack represents your entire environment
Single deploy — one aws cloudformation deploy command updates everything
Output passing — nested stack outputs feed into sibling stack parameters naturally

This works for small environments. Two or three nested stacks, a VPC and an app layer, manageable. The problems start when you have five, ten, fifteen nested stacks and real traffic depending on them.

Where nested stacks break down

Coupled blast radius

The biggest issue. When you update a parent stack, CloudFormation evaluates every nested stack, even if only one changed. If any nested stack update fails, the entire tree rolls back. Your database stack, your networking stack, your monitoring stack — all rolling back because someone typo'd an IAM policy in the app stack.

This isn't a theoretical risk. It's a Tuesday afternoon.

Opaque error messages

When a nested stack fails, the parent stack shows:

Embedded stack arn:aws:cloudformation:us-east-1:123456789012:stack/parent-AppStack-ABC123/... was not successfully updated.

Helpful. You now have to click into the nested stack, find the actual failed resource, read its status reason, and mentally map that back to which template parameter or resource property caused it. For deeply nested stacks (yes, people nest inside nests), this becomes archaeology.
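The click-through can be scripted. The following is a minimal sketch, not a finished tool: it assumes `cfn` behaves like `boto3.client("cloudformation")`, follows failed nested stack events down the tree, and returns only the leaf resources that failed. The function name and the "cancelled" filter are illustrative choices. Note that the API returns the full event history, so a real version would probably also filter by timestamp.

```python
NESTED_STACK_TYPE = "AWS::CloudFormation::Stack"


def find_root_failures(cfn, stack_name, _seen=None):
    """Return (stack name, logical ID, reason) for each failed leaf resource.

    `cfn` is assumed to behave like boto3.client("cloudformation").
    """
    seen = set() if _seen is None else _seen
    if stack_name in seen:  # several events can point at the same child stack
        return []
    seen.add(stack_name)

    failures = []
    kwargs = {"StackName": stack_name}
    while True:
        page = cfn.describe_stack_events(**kwargs)
        for event in page["StackEvents"]:
            if not event["ResourceStatus"].endswith("_FAILED"):
                continue
            reason = event.get("ResourceStatusReason", "")
            child_id = event.get("PhysicalResourceId")
            if event["ResourceType"] == NESTED_STACK_TYPE:
                # A stack also emits events about itself; only descend when
                # the event points at a different (nested) stack.
                if child_id and child_id != event["StackId"]:
                    failures.extend(find_root_failures(cfn, child_id, seen))
            elif "cancelled" not in reason.lower():
                # Heuristic: cancelled siblings are collateral, not the cause.
                failures.append(
                    (event["StackName"], event["LogicalResourceId"], reason)
                )
        token = page.get("NextToken")
        if not token:
            return failures
        kwargs["NextToken"] = token
```

With real credentials you would call it as `find_root_failures(boto3.client("cloudformation"), "parent")`, where "parent" stands in for your own stack name.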

Painful rollbacks and stuck states

Nested stack rollbacks are slow. CloudFormation works through them in dependency order, and the parent can't finish until every nested stack it touched has finished rolling back. A rollback that would take 5 minutes for a single stack can take 30 minutes across a tree of nested stacks.

Worse, if a rollback itself fails (a resource can't be deleted because it's in use, for example), the entire parent stack enters UPDATE_ROLLBACK_FAILED. Now you need continue-update-rollback with resource skipping, which is the CloudFormation equivalent of percussive maintenance.
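For reference, here is a hedged sketch of that recovery step, again assuming `cfn` behaves like `boto3.client("cloudformation")`. The wrapper name is hypothetical. Skipping a resource tells CloudFormation to treat it as rolled back whether or not it really is, so the stack can end up out of sync with reality. Treat this as a last resort.

```python
def continue_rollback(cfn, stack_name, resources_to_skip=()):
    """Resume a stuck rollback; return True if a request was sent.

    `cfn` is assumed to behave like boto3.client("cloudformation").
    """
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    if stack["StackStatus"] != "UPDATE_ROLLBACK_FAILED":
        return False  # nothing to continue

    kwargs = {"StackName": stack_name}
    if resources_to_skip:
        # Resources inside a nested stack are written as
        # "NestedStackName.LogicalResourceId".
        kwargs["ResourcesToSkip"] = list(resources_to_skip)
    cfn.continue_update_rollback(**kwargs)
    return True
```

Try it without `resources_to_skip` first. Often the blocking condition has cleared and a plain retry is enough.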

Drift detection is almost useless

CloudFormation drift detection on a parent stack doesn't automatically detect drift in nested stacks. You have to run drift detection on each nested stack individually. If you have 12 nested stacks, that's 12 separate drift detection operations. Most teams just don't bother, which means drift accumulates silently.
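If you do want coverage, the per-stack operations can be looped. This sketch assumes `cfn` behaves like `boto3.client("cloudformation")`. It walks the tree and starts one detection run per stack. The function names are illustrative, and polling for results (via `describe_stack_drift_detection_status`) is left out to keep it short.

```python
NESTED_STACK_TYPE = "AWS::CloudFormation::Stack"


def nested_stack_ids(cfn, stack_name):
    """Return the IDs of all nested stacks below stack_name, depth-first."""
    ids = []
    kwargs = {"StackName": stack_name}
    while True:
        page = cfn.list_stack_resources(**kwargs)
        for res in page["StackResourceSummaries"]:
            child_id = res.get("PhysicalResourceId")
            if res["ResourceType"] == NESTED_STACK_TYPE and child_id:
                ids.append(child_id)
                ids.extend(nested_stack_ids(cfn, child_id))
        token = page.get("NextToken")
        if not token:
            return ids
        kwargs["NextToken"] = token


def start_drift_detection(cfn, root_stack):
    """Start drift detection on the root and every nested stack.

    Returns {stack name or ID: drift detection ID}.
    """
    stacks = [root_stack, *nested_stack_ids(cfn, root_stack)]
    return {
        stack: cfn.detect_stack_drift(StackName=stack)["StackDriftDetectionId"]
        for stack in stacks
    }
```

You still have to collect and read one result per stack, which is the overhead this section is complaining about.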

Delete ordering nightmares

Deleting a parent stack deletes all nested stacks. Sounds convenient until you realize CloudFormation doesn't always get the ordering right for cross-stack dependencies. If a security group in the networking nested stack is referenced by an EC2 instance in the app nested stack, CloudFormation might try to delete the security group first and fail.

The alternative: isolated stacks with parameter sharing

The pattern is simple. Each logical unit of infrastructure is its own independent CloudFormation stack. They share data through SSM parameters or CloudFormation exports.

How it works

Your VPC stack creates subnets, NAT gateways, and route tables. It writes the subnet IDs, VPC ID, and security group IDs to SSM Parameter Store:

Resources:
  VpcIdParam:
    Type: AWS::SSM::Parameter
    Properties:
      Name: /infra/vpc/id
      Type: String
      Value: !Ref VPC

  PrivateSubnetsParam:
    Type: AWS::SSM::Parameter
    Properties:
      Name: /infra/vpc/private-subnets
      Type: StringList
      Value: !Join [",", [!Ref SubnetA, !Ref SubnetB, !Ref SubnetC]]

Your app stack reads those parameters:

Parameters:
  VpcId:
    Type: AWS::SSM::Parameter::Value<String>
    Default: /infra/vpc/id

  PrivateSubnets:
    Type: AWS::SSM::Parameter::Value<List<String>>
    Default: /infra/vpc/private-subnets

Each stack deploys independently. The VPC stack and the app stack have no CloudFormation-level relationship. They're connected only by the SSM parameters they read and write.
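One consequence of that loose connection is that nothing stops you from deploying the app stack before the VPC stack has published its values. A small preflight check can catch this with a clearer message than a failed stack create. This is a sketch under assumptions: `ssm` is expected to behave like `boto3.client("ssm")`, and the function name is made up for illustration.

```python
def missing_parameters(ssm, names):
    """Return the parameter names that do not exist in Parameter Store.

    `ssm` is assumed to behave like boto3.client("ssm").
    """
    names = list(names)
    missing = []
    # GetParameters accepts at most 10 names per call, so query in batches.
    for start in range(0, len(names), 10):
        response = ssm.get_parameters(Names=names[start:start + 10])
        missing.extend(response.get("InvalidParameters", []))
    return missing
```

Before deploying the app stack you might call `missing_parameters(ssm, ["/infra/vpc/id", "/infra/vpc/private-subnets"])` and stop if the list is not empty.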

Why SSM parameters over CloudFormation exports

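CloudFormation exports (!ImportValue) work but have a major limitation: you can't change or delete an export while another stack imports it. This creates tight coupling that makes updates painful. SSM parameters have no such restriction: you can update a parameter value without CloudFormation blocking the change. The trade-off is that a consuming stack resolves the parameter when it is created or updated, so it only sees the new value on its next deploy.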

What you gain with isolated stacks

Independent blast radius

The app stack fails? The VPC stack doesn't care. The database stack doesn't care. The monitoring stack doesn't care. You fix the app stack, redeploy it, and everything else stays exactly where it was. No cascading rollbacks, no waiting for unrelated stacks to stabilize.

Faster, simpler rollbacks

A rollback affects one stack. It takes seconds or minutes instead of the 20-30 minute cascading rollbacks of nested stacks. And if a rollback gets stuck, you're dealing with one stack's resources, not an entire tree.

Real drift detection

Drift detection on an isolated stack works exactly as designed. You see which resources drifted, what changed, and can decide whether to fix it or update the template. No nested stack indirection to deal with.

Independent lifecycles

Your VPC changes maybe once a quarter. Your app stack changes daily. With nested stacks, every app deploy re-evaluates the VPC template. With isolated stacks, the VPC stack sits untouched until you actually need to change it. This isn't just cleaner — it's safer. You're not touching networking infrastructure on every application deploy.

Easier to reason about

Each stack is a self-contained unit. You can read the template, understand what it creates, and deploy it without thinking about the rest of the infrastructure. New team members can understand one stack at a time instead of tracing through a parent-child hierarchy.

The shared VPC pattern

The canonical example of isolated stacks is the shared VPC pattern. You have one VPC stack that owns all the networking: subnets, NAT gateways, route tables, VPC endpoints. It publishes everything to SSM.

Then you have N application stacks that read subnet IDs and security group IDs from SSM. Each application stack is completely independent. You can add, update, or delete application stacks without touching the VPC stack. You can update the VPC stack (add a subnet, change a route) without redeploying any application stacks.

This pattern scales to dozens of stacks. VPC, database, cache, app, monitoring, CI/CD, DNS — each its own stack, each deployable independently, each with its own rollback boundary.
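With dozens of stacks, the dependency graph that a parent template used to encode now lives only in the parameter names. If you want to bootstrap a fresh environment, or tear one down, you need that order from somewhere. One way to recover it, sketched here with a hypothetical manifest format (the `reads`/`writes` keys and the `/infra/db/endpoint` name are assumptions, not anything CloudFormation defines), is a topological sort over who writes and who reads each parameter.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def deploy_order(stacks):
    """Order stacks so every parameter is written before it is read.

    `stacks` maps a stack name to {"writes": [...], "reads": [...]},
    a hypothetical manifest you would maintain alongside the templates.
    """
    writer = {}
    for name, spec in stacks.items():
        for param in spec.get("writes", ()):
            writer[param] = name

    graph = {name: set() for name in stacks}
    for name, spec in stacks.items():
        for param in spec.get("reads", ()):
            if param not in writer:
                raise KeyError(f"{name} reads {param}, which no stack writes")
            if writer[param] != name:
                graph[name].add(writer[param])  # name depends on the writer

    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


manifest = {
    "app": {"reads": ["/infra/vpc/id", "/infra/db/endpoint"]},
    "database": {"reads": ["/infra/vpc/private-subnets"],
                 "writes": ["/infra/db/endpoint"]},
    "vpc": {"writes": ["/infra/vpc/id", "/infra/vpc/private-subnets"]},
}
order = deploy_order(manifest)
```

Reversing the result gives a safe delete order. Day to day you still deploy one stack at a time; the ordering only matters when standing up or tearing down a whole environment.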

When nesting still makes sense

Nested stacks aren't always wrong. They make sense when:

Resources are truly inseparable — an ALB and its target groups and listener rules are one logical unit. Splitting them into separate stacks adds complexity without benefit.
You need atomic deploys — if two resources must update together or not at all, a single stack (nested or not) guarantees that.
Template reuse across accounts — nested stacks can be useful for packaging reusable infrastructure modules that get deployed into multiple AWS accounts.

But for the common case — organizing a production environment with multiple services, databases, and networking — isolated stacks are the better default.

Making the switch

If you're already deep in nested stacks, you don't have to rewrite everything at once. Start with the next new stack. Make it isolated. Have your existing nested stacks publish the values it needs as SSM parameters (one AWS::SSM::Parameter resource per value, as in the VPC example), and have the new stack read them. Over time, extract nested stacks into independent ones as you touch them.

The transition is gradual, and each step makes your infrastructure easier to operate.

The appeal of nested stacks is organizational. The appeal of isolated stacks is operational. When it's 2 AM and something is broken, you want the operational advantage.

Published by Yaw Labs.