Nested stacks look clean on paper. In practice, isolated stacks are simpler, safer, and easier to operate.
CloudFormation nested stacks seem like the obvious way to organize large infrastructure. You break things into logical units, reference them from a parent stack, and deploy everything together. It looks like good engineering. Clean hierarchy, DRY templates, one deploy command.
Then something goes wrong. A nested stack fails mid-update. The parent stack rolls back, which rolls back every sibling stack, which rolls back resources that had nothing to do with the failure. You're staring at a wall of UPDATE_ROLLBACK_COMPLETE_CLEANUP_IN_PROGRESS events trying to figure out which of the 14 nested stacks actually broke.
That's the moment you start questioning nested stacks.
The pitch is compelling:
One aws cloudformation deploy command updates everything.
This works for small environments. Two or three nested stacks, a VPC and an app layer, manageable. The problems start when you have five, ten, fifteen nested stacks and real traffic depending on them.
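For reference, the kind of parent template that pitch describes might look roughly like this. It's a minimal sketch, not a template from any real environment: the bucket URL, the template file names, and the VpcId and PrivateSubnets outputs are all assumptions made for illustration.

```yaml
# parent.yaml: hypothetical parent stack that owns two nested stacks.
# The S3 bucket and template paths below are placeholders.
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  NetworkStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://example-bucket.s3.amazonaws.com/templates/network.yaml
  AppStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://example-bucket.s3.amazonaws.com/templates/app.yaml
      Parameters:
        # Assumes network.yaml declares outputs named VpcId and PrivateSubnets.
        VpcId: !GetAtt NetworkStack.Outputs.VpcId
        PrivateSubnets: !GetAtt NetworkStack.Outputs.PrivateSubnets
```

The GetAtt references are what tie the children together: AppStack can't be created or updated except through the parent, and that shared lifecycle is where the trouble described next comes from.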
The biggest issue. When you update a parent stack, CloudFormation evaluates every nested stack, even if only one changed. If any nested stack update fails, the entire tree rolls back. Your database stack, your networking stack, your monitoring stack — all rolling back because someone typo'd an IAM policy in the app stack.
This isn't a theoretical risk. It's a Tuesday afternoon.
When a nested stack fails, the parent stack shows:
Embedded stack arn:aws:cloudformation:us-east-1:123456789:stack/parent-AppStack-ABC123/... was not successfully updated.
Helpful. You now have to click into the nested stack, find the actual failed resource, read its status reason, and mentally map that back to which template parameter or resource property caused it. For deeply nested stacks (yes, people nest inside nests), this becomes archaeology.
Nested stack rollbacks are slow. CloudFormation processes them sequentially, waiting for each nested stack to finish rolling back before moving to the next. A rollback that would take 5 minutes for a single stack takes 30 minutes across a tree of nested stacks.
Worse, if a rollback itself fails (a resource can't be deleted because it's in use, for example), the entire parent stack enters UPDATE_ROLLBACK_FAILED. Now you need continue-update-rollback with resource skipping, which is the CloudFormation equivalent of percussive maintenance.
CloudFormation drift detection on a parent stack doesn't automatically detect drift in nested stacks. You have to run drift detection on each nested stack individually. If you have 12 nested stacks, that's 12 separate drift detection operations. Most teams just don't bother, which means drift accumulates silently.
Deleting a parent stack deletes all nested stacks. Sounds convenient until you realize CloudFormation doesn't always get the ordering right for cross-stack dependencies. A security group in the networking nested stack that's referenced by an EC2 instance in the app nested stack — CloudFormation might try to delete the security group first and fail.
The pattern is simple. Each logical unit of infrastructure is its own independent CloudFormation stack. They share data through SSM parameters or CloudFormation exports.
Your VPC stack creates subnets, NAT gateways, and route tables. It writes the subnet IDs, VPC ID, and security group IDs to SSM Parameter Store:
Resources:
  VpcIdParam:
    Type: AWS::SSM::Parameter
    Properties:
      Name: /infra/vpc/id
      Type: String
      Value: !Ref VPC
  PrivateSubnetsParam:
    Type: AWS::SSM::Parameter
    Properties:
      Name: /infra/vpc/private-subnets
      Type: StringList
      Value: !Join [",", [!Ref SubnetA, !Ref SubnetB, !Ref SubnetC]]
Your app stack reads those parameters:
Parameters:
  VpcId:
    Type: AWS::SSM::Parameter::Value<String>
    Default: /infra/vpc/id
  PrivateSubnets:
    Type: AWS::SSM::Parameter::Value<List<String>>
    Default: /infra/vpc/private-subnets
Each stack deploys independently. The VPC stack and the app stack have no CloudFormation-level relationship. They're connected only by the SSM parameters they read and write.
CloudFormation exports (!ImportValue) work but have a major limitation: you can't update or delete an export that's referenced by another stack. This creates tight coupling that makes updates painful. SSM parameters have no such restriction: you can update a parameter value without caring who reads it. A consuming stack resolves the parameter when it is created or updated, so it picks up the new value on its next deploy.
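To make the export coupling concrete, here is a rough sketch of the same VPC ID shared through an export instead of SSM. The export name infra-vpc-id and the AppSecurityGroup resource are hypothetical, and the two templates are shown as separate YAML documents only for brevity.

```yaml
# vpc-stack.yaml (producer): publishes the VPC ID as a named export.
Outputs:
  VpcId:
    Value: !Ref VPC
    Export:
      Name: infra-vpc-id   # hypothetical export name
---
# app-stack.yaml (consumer): imports the exported value by name.
Resources:
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Hypothetical app security group
      VpcId: !ImportValue infra-vpc-id
```

As long as app-stack.yaml imports infra-vpc-id, CloudFormation refuses to change or remove that export and refuses to delete the VPC stack. Every consumer has to drop its import first.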
The app stack fails? The VPC stack doesn't care. The database stack doesn't care. The monitoring stack doesn't care. You fix the app stack, redeploy it, and everything else stays exactly where it was. No cascading rollbacks, no waiting for unrelated stacks to stabilize.
A rollback affects one stack. It takes seconds or minutes instead of the 20-30 minute cascading rollbacks of nested stacks. And if a rollback gets stuck, you're dealing with one stack's resources, not an entire tree.
Drift detection on an isolated stack works exactly as designed. You see which resources drifted, what changed, and can decide whether to fix it or update the template. No nested stack indirection to deal with.
Your VPC changes maybe once a quarter. Your app stack changes daily. With nested stacks, every app deploy re-evaluates the VPC template. With isolated stacks, the VPC stack sits untouched until you actually need to change it. This isn't just cleaner — it's safer. You're not touching networking infrastructure on every application deploy.
Each stack is a self-contained unit. You can read the template, understand what it creates, and deploy it without thinking about the rest of the infrastructure. New team members can understand one stack at a time instead of tracing through a parent-child hierarchy.
The canonical example of isolated stacks is the shared VPC pattern. You have one VPC stack that owns all the networking: subnets, NAT gateways, route tables, VPC endpoints. It publishes everything to SSM.
Then you have N application stacks that read subnet IDs and security group IDs from SSM. Each application stack is completely independent. You can add, update, or delete application stacks without touching the VPC stack. You can update the VPC stack (add a subnet, change a route) without redeploying any application stacks.
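One of those application stacks might look something like the sketch below. It assumes the /infra/vpc/id and /infra/vpc/private-subnets parameters already exist. The security group and internal load balancer are hypothetical resources chosen only to show the SSM values being consumed.

```yaml
# app-stack.yaml: hypothetical application stack that only reads from SSM.
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  VpcId:
    Type: AWS::SSM::Parameter::Value<String>
    Default: /infra/vpc/id
  PrivateSubnets:
    Type: AWS::SSM::Parameter::Value<List<String>>
    Default: /infra/vpc/private-subnets
Resources:
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Hypothetical app security group
      VpcId: !Ref VpcId            # resolved from SSM at deploy time
  AppLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Scheme: internal
      Subnets: !Ref PrivateSubnets # list resolved from the StringList parameter
      SecurityGroups:
        - !GetAtt AppSecurityGroup.GroupId
```

Nothing in this template names the VPC stack. Deleting or redeploying it touches only the resources declared here.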
This pattern scales to dozens of stacks. VPC, database, cache, app, monitoring, CI/CD, DNS — each its own stack, each deployable independently, each with its own rollback boundary.
Nested stacks aren't always wrong. They make sense when:
But for the common case — organizing a production environment with multiple services, databases, and networking — isolated stacks are the better default.
If you're already deep in nested stacks, you don't have to rewrite everything at once. Start with the next new stack. Make it isolated. Have it read from SSM parameters that your existing nested stacks already write. Over time, extract nested stacks into independent ones as you touch them.
The transition is gradual, and each step makes your infrastructure easier to operate.
The appeal of nested stacks is organizational. The appeal of isolated stacks is operational. When it's 2 AM and something is broken, you want the operational advantage.
Published by Yaw Labs.