AWS “Pending” States Are a Blueprint for Async Deployments
I keep noticing the same shape across AWS provisioning APIs:
- the API returns quickly
- the real work continues in the control plane
- and you learn the outcome by watching an explicit status machine
We treat this as “just how AWS works”, but it’s also a pretty solid blueprint for how deployments should behave.
In this post, I’m going to reverse-engineer that shape using two AWS examples most of us have touched:
- ACM certificates that sit in PENDING_VALIDATION
- RDS instances that are creating until they become available
If you’ve ever stared at a console page refreshing it like it owes you certainty, you already understand the problem this is trying to solve.
What I mean by “async” (in plain words)
When an AWS API is “async”, the request/response is not the same thing as “the work is done”.
- You call an API that starts something.
- AWS responds quickly (often with an ARN/ID).
- The resource enters some intermediate state.
- You poll (or use a waiter) to find out when it’s finished.
That “intermediate state” is the key detail.
It’s AWS being honest: real infrastructure changes are multi-step, distributed, and sometimes blocked on things outside AWS.
Example 1: ACM and the most human state: PENDING_VALIDATION
This one hit me because it feels less like “AWS is computing” and more like “AWS is waiting”.
You request a certificate from AWS Certificate Manager (ACM) and you get back a certificate ARN.
At that point, you feel like you created a certificate.
But then it sits in:
PENDING_VALIDATION
If you’re doing DNS validation (which many of us do), AWS gives you a CNAME record to create. Until that record exists and propagates, the system can’t move forward.
Here’s what I find interesting:
- The system is not “broken”.
- The system is not “slow”.
- The system is waiting on an external dependency.
If I translate that into deployment language, this feels like:
- waiting for a human approval
- waiting for a change freeze to end
- waiting for a metric window to complete
- waiting for a dependency to be ready
The empathy bit
When I’m new to a system, PENDING_VALIDATION can be emotionally uncomfortable.
Not because it’s wrong, but because it’s unclear.
The difference between calm waiting and anxious waiting is:
- do I know what it’s waiting for?
- do I know what I can do next?
- do I know how long it might take?
ACM is actually pretty good UX here because it tells you the exact DNS record it expects.
That is a lesson I want to steal for “async deployments”: if a workflow is blocked, it should be explicit about the reason and the next safe action.
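To make the “exact DNS record” point concrete, here is a minimal sketch that pulls the records ACM is still waiting on out of a response. It assumes the response has the shape returned by boto3’s acm.describe_certificate; the helper name pending_validation_records and the sample values are made up for illustration.

```python
def pending_validation_records(describe_response):
    """Return the DNS records ACM is still waiting on.

    Assumes `describe_response` has the shape returned by
    boto3's acm.describe_certificate(CertificateArn=...).
    """
    cert = describe_response["Certificate"]
    records = []
    for option in cert.get("DomainValidationOptions", []):
        if option.get("ValidationStatus") != "PENDING_VALIDATION":
            continue  # this domain is already validated (or has failed)
        record = option.get("ResourceRecord")
        if record is None:
            # The record may not be populated yet; ask again shortly.
            continue
        records.append(
            (option["DomainName"], record["Type"], record["Name"], record["Value"])
        )
    return records


# Hand-written sample in the describe_certificate shape (values are made up).
sample = {
    "Certificate": {
        "Status": "PENDING_VALIDATION",
        "DomainValidationOptions": [
            {
                "DomainName": "example.com",
                "ValidationStatus": "PENDING_VALIDATION",
                "ResourceRecord": {
                    "Name": "_abc123.example.com.",
                    "Type": "CNAME",
                    "Value": "_def456.acm-validations.aws.",
                },
            }
        ],
    }
}

for domain, rtype, name, value in pending_validation_records(sample):
    print(f"{domain}: waiting for {rtype} {name} -> {value}")
```

The useful part is the output: it turns “pending” into “waiting for this specific record”, which is the narration I want a blocked deployment to give me.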
Example 2: RDS and the classic “it’s creating…” loop
Creating an RDS instance is the other kind of async.
This time, it’s not waiting on a human. It’s waiting on the control plane to do real work.
You call CreateDBInstance, and then the instance status is creating for a while, and eventually (if things go well) it becomes available.
If you’ve ever written automation around RDS, you’ve probably faced the question:
How do I wait for the DB to be ready without doing something silly?
What I used to do (and what I now see as a smell)
The naive solution is “poll aggressively”.
But aggressive polling is:
- noisy
- flaky
- sometimes rate-limit prone
- and often unclear about terminal states
The nicer way: waiters
AWS SDKs often give you waiters, which are basically AWS saying:
“Yes, this operation is async. Here’s a safe, standardized way to wait.”
Here’s a small example in Python using boto3:
import boto3

rds = boto3.client("rds", region_name="ap-south-1")

# Returns quickly; the instance starts out in the "creating" status.
rds.create_db_instance(
    DBInstanceIdentifier="demo-db",
    Engine="postgres",
    DBInstanceClass="db.t4g.micro",
    AllocatedStorage=20,
    MasterUsername="postgres",
    MasterUserPassword="REDACTED",
)

# Blocks until the instance is "available"; raises WaiterError on failure or timeout.
waiter = rds.get_waiter("db_instance_available")
waiter.wait(DBInstanceIdentifier="demo-db")
print("DB is available")
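The waiter is doing something that looks a lot like a workflow primitive:
- poll
- pause between attempts (boto3 waiters use a fixed delay and a capped number of attempts)
- stop when a terminal state happens
- raise if it fails or runs out of attempts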
That’s not just convenience. It’s an opinionated interface for async reality.
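If I strip a waiter down to its skeleton, it looks roughly like the sketch below. This is not boto3’s implementation, just a hand-rolled loop with the same ingredients: a fixed delay, a cap on attempts, and explicit success and failure states. The names wait_for_state and WaitFailed are hypothetical, and the failure set is illustrative rather than the full list a real RDS waiter checks.

```python
import time


class WaitFailed(Exception):
    """Raised on a failure state or when attempts run out."""


def wait_for_state(get_state, success, failure, delay=30.0, max_attempts=60,
                   sleep=time.sleep):
    """Poll get_state() until it returns a success or failure state."""
    for attempt in range(1, max_attempts + 1):
        state = get_state()
        if state in success:
            return state
        if state in failure:
            # Terminal failure: more polling will not help.
            raise WaitFailed(f"reached failure state {state!r}")
        if attempt < max_attempts:
            sleep(delay)
    raise WaitFailed(f"still not done after {max_attempts} attempts")


# Usage with a fake status source instead of a real DescribeDBInstances call.
states = iter(["creating", "creating", "backing-up", "available"])
result = wait_for_state(
    lambda: next(states),
    success={"available"},
    failure={"failed"},
    delay=0,
)
print(result)
```

The part that is easy to forget when writing this by hand is the failure set: without it, a resource that has already failed gets polled until the timeout.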
A deployment API shape worth copying
Here’s the pattern I want to steal from AWS:
Make deployments behave like a good AWS API.
Not “run this script and hope”, but:
- Start: StartDeployment() returns quickly with a deploymentId
- Track: GetDeploymentStatus(deploymentId) is the source of truth
- Wait: WaitForDeployment(deploymentId) uses a sane policy (like waiters)
- Decide: the status tells you what’s happening and what you can safely do next
If you’ve used modern deployment systems, you’ve probably seen pieces of this:
- canary rollouts that pause
- staged promotions
- long-running “in progress” deploys after CI has already finished
My point is: a lot of “async deployment” ideas are already familiar because AWS made them familiar.
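Here is a rough sketch of the Start / Track / Wait shape as code. Everything in it is hypothetical: DeploymentService is an in-memory stand-in for a control plane, client_token is my assumption for making retries safe, and the wait policy (capped exponential backoff with full jitter plus a deadline) is one reasonable choice, not the only one.

```python
import itertools
import random
import time
from dataclasses import dataclass

TERMINAL_STATES = {"COMPLETE", "FAILED"}


@dataclass
class DeploymentStatus:
    deployment_id: str
    state: str = "IN_PROGRESS"
    reason: str = ""


class DeploymentService:
    """Hypothetical in-memory stand-in for a deployment control plane."""

    def __init__(self):
        self._statuses = {}
        self._ids_by_token = {}
        self._counter = itertools.count(1)

    def start_deployment(self, version, client_token):
        # Idempotent start: retrying with the same token returns the same handle.
        if client_token not in self._ids_by_token:
            deployment_id = f"d-{next(self._counter):04d}"
            self._ids_by_token[client_token] = deployment_id
            self._statuses[deployment_id] = DeploymentStatus(
                deployment_id, reason=f"rolling out {version}"
            )
        return self._ids_by_token[client_token]

    def get_deployment_status(self, deployment_id):
        return self._statuses[deployment_id]

    def set_state(self, deployment_id, state, reason=""):
        # Stands in for the control plane making progress in the background.
        status = self._statuses[deployment_id]
        status.state, status.reason = state, reason


def wait_for_deployment(service, deployment_id, timeout=600.0, base=1.0,
                        cap=30.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until a terminal state, with capped exponential backoff and jitter."""
    deadline = clock() + timeout
    attempt = 0
    while True:
        status = service.get_deployment_status(deployment_id)
        if status.state in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"{deployment_id} still {status.state} after {timeout}s")
        # Full jitter: sleep a random amount up to the capped exponential delay.
        sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
        attempt += 1


service = DeploymentService()
dep_id = service.start_deployment("v42", client_token="release-v42")
retry_id = service.start_deployment("v42", client_token="release-v42")  # safe retry

# Fake "sleep" that lets the pretend control plane finish instead of blocking.
final = wait_for_deployment(
    service,
    dep_id,
    sleep=lambda _seconds: service.set_state(dep_id, "COMPLETE", "all targets healthy"),
)
print(final.deployment_id, final.state, final.reason)
```

Injecting sleep and clock is only there so the example runs instantly; in real tooling the defaults would do the actual waiting.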
What I’m taking from ACM + RDS (and trying to apply everywhere)
I’m collecting these as personal rules of thumb.
1) Waiting is a first-class state (not a failure)
ACM doesn’t pretend issuance is instant.
It gives you a meaningful state: PENDING_VALIDATION.
If a deploy is waiting, that should be visible and named.
2) A handle matters more than a log
AWS gives you a stable identity (ARN / identifier) and you ask for status.
That’s much better than:
- “go read CI logs and guess what happened”
3) Polling is an API design problem
RDS waiters are AWS admitting that polling will happen, and helping you do it sanely.
If you build internal tooling, giving people a standard wait mechanism is not a nice-to-have. It’s part of the contract.
A checklist I want in every async deployment system
This is the part I’m personally going to keep coming back to.
- [Operation handle] Every deploy returns a stable deploymentId you can refer to later.
- [Explicit states] Model IN_PROGRESS, WAITING_ON_EXTERNAL, PAUSED, FAILED, COMPLETE.
- [Narrated waiting] If it’s waiting, it should say what for and how to unblock it (ACM energy).
- [Waiters / polling policy] Provide a first-class wait that has:
  - bounded timeouts
  - backoff + jitter
  - terminal-state detection
- [Idempotent retries] Retrying shouldn’t create duplicates or restart dangerous steps.
- [Source of truth] Status comes from a durable store / Describe*-style API, not from logs.
- [Terminal state taxonomy] Differentiate:
  - retryable transient failures
  - blocked states
  - permanent failures
- [Next safe actions] Every state should suggest what a human can do without making things worse.
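A few of these checklist items (explicit states, narrated waiting, the failure taxonomy, next safe actions) fit into one small status record. The sketch below is one possible shape, not a standard: the Status fields, is_terminal, and summarize are names I made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    IN_PROGRESS = "IN_PROGRESS"
    WAITING_ON_EXTERNAL = "WAITING_ON_EXTERNAL"
    PAUSED = "PAUSED"
    FAILED = "FAILED"
    COMPLETE = "COMPLETE"


@dataclass(frozen=True)
class Status:
    state: State
    waiting_for: str = ""         # narration: what is blocking progress, if anything
    retryable: bool = False       # only meaningful when state is FAILED
    next_safe_actions: tuple = ()


def is_terminal(status):
    """Blocked and paused states are not terminal; a waiter should keep going."""
    return status.state in (State.FAILED, State.COMPLETE)


def summarize(status):
    """Render a one-line, human-readable status."""
    line = status.state.value
    if status.state is State.FAILED:
        line += " (retryable)" if status.retryable else " (permanent)"
    if status.waiting_for:
        line += f": waiting for {status.waiting_for}"
    if status.next_safe_actions:
        line += "; you can: " + ", ".join(status.next_safe_actions)
    return line


blocked = Status(
    State.WAITING_ON_EXTERNAL,
    waiting_for="DNS record for example.com",
    next_safe_actions=("create the record", "cancel"),
)
failed = Status(State.FAILED, retryable=True, next_safe_actions=("retry",))

print(summarize(blocked))
print(summarize(failed))
```

Keeping “blocked” separate from “failed” is the piece I care about most here: a blocked deploy needs someone to act, while a retryable failure can often be handled by the tooling itself.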
If I had to compress it:
A good async deployment system doesn’t remove waiting.
It removes uncertainty.
Closing
I used to think of “async deployments” as something you only get once you have a big internal platform.
But AWS makes the idea feel much more practical.
We already trust async workflows every time we provision infrastructure:
- we accept PENDING_VALIDATION because the system tells us what it needs
- we accept creating because there’s a reliable way to wait
So if you’re building deployment tooling (even if it’s “just scripts” today), my suggestion is simple:
Copy the parts AWS gets right—handles, explicit states, waiters, narration, safe retries.
Waiting isn’t the problem.
Waiting without a clear status model is.