AWS “Pending” States Are a Blueprint for Async Deployments

I keep noticing the same shape across AWS provisioning APIs:

the API returns quickly
the real work continues in the control plane
and you learn the outcome by watching an explicit status machine

We treat this as “just how AWS works”, but it’s also a pretty solid blueprint for how deployments should behave.

In this post, I’m going to reverse-engineer that shape using two AWS examples most of us have touched:

ACM certificates that sit in PENDING_VALIDATION
RDS instances that are creating until they become available

If you’ve ever stared at a console page refreshing it like it owes you certainty, you already understand the problem this is trying to solve.

What I mean by “async” (in plain words)

When an AWS API is “async”, the request/response is not the same thing as “the work is done”.

You call an API that starts something.
AWS responds quickly (often with an ARN/ID).
The resource enters some intermediate state.
You poll (or use a waiter) to find out when it’s finished.

That “intermediate state” is the key detail.

It’s AWS being honest: real infrastructure changes are multi-step, distributed, and sometimes blocked on things outside AWS.

Example 1: ACM and the most human state: `PENDING_VALIDATION`

This one hit me because it feels less like “AWS is computing” and more like “AWS is waiting”.

You request a certificate from AWS Certificate Manager (ACM) and you get back a certificate ARN.

At that point, you feel like you created a certificate.

But then it sits in:

PENDING_VALIDATION

If you’re doing DNS validation (which many of us do), AWS gives you a CNAME record to create. Until that record exists and propagates, the system can’t move forward.

Here’s what I find interesting:

The system is not “broken”.
The system is not “slow”.
The system is waiting on an external dependency.

If I translate that into deployment language, this feels like:

waiting for a human approval
waiting for a change freeze to end
waiting for a metric window to complete
waiting for a dependency to be ready

The empathy bit

When I’m new to a system, PENDING_VALIDATION can be emotionally uncomfortable.

Not because it’s wrong—because it’s unclear.

The difference between calm waiting and anxious waiting is:

do I know what it’s waiting for?
do I know what I can do next?
do I know how long it might take?

ACM is actually pretty good UX here because it tells you the exact DNS record it expects.

That is a lesson I want to steal for “async deployments”: if a workflow is blocked, it should be explicit about the reason and the next safe action.
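To make the "it tells you what it's waiting for" point concrete, here is a minimal sketch that pulls the expected DNS record out of a DescribeCertificate response. The helper name `pending_dns_records` and the sample values are mine, made up for illustration; the field names follow the response shape boto3 returns from `acm.describe_certificate(CertificateArn=...)`.

```python
def pending_dns_records(describe_response):
    """Return the DNS records ACM is still waiting on, one dict per domain."""
    cert = describe_response["Certificate"]
    records = []
    for option in cert.get("DomainValidationOptions", []):
        if option.get("ValidationStatus") != "PENDING_VALIDATION":
            continue
        record = option.get("ResourceRecord")
        if record is None:
            # Right after the request, the record may not be populated yet.
            continue
        records.append({
            "domain": option["DomainName"],
            "name": record["Name"],
            "type": record["Type"],
            "value": record["Value"],
        })
    return records


# Hand-written sample in the shape of a DescribeCertificate response.
# All values are placeholders, not real ACM output.
sample = {
    "Certificate": {
        "CertificateArn": "arn:aws:acm:ap-south-1:123456789012:certificate/example",
        "Status": "PENDING_VALIDATION",
        "DomainValidationOptions": [
            {
                "DomainName": "example.com",
                "ValidationStatus": "PENDING_VALIDATION",
                "ValidationMethod": "DNS",
                "ResourceRecord": {
                    "Name": "_abc123.example.com.",
                    "Type": "CNAME",
                    "Value": "_def456.acm-validations.aws.",
                },
            }
        ],
    }
}

for r in pending_dns_records(sample):
    print(f"{r['domain']}: create {r['type']} {r['name']} -> {r['value']}")
```

The useful part is the output shape: a blocked state plus the exact thing that would unblock it.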

Example 2: RDS and the classic “it’s creating…” loop

Creating an RDS instance is the other kind of async.

This time, it’s not waiting on a human. It’s waiting on the control plane to do real work.

You call CreateDBInstance. The instance status stays at creating for a while and eventually (if things go well) becomes available.

If you’ve ever written automation around RDS, you’ve probably faced the question:

How do I wait for the DB to be ready without doing something silly?

What I used to do (and what I now see as a smell)

The naive solution is “poll aggressively”.

But aggressive polling is:

noisy
flaky
sometimes prone to rate limiting
and often unclear about terminal states

The nicer way: waiters

AWS SDKs often give you waiters, which are basically AWS saying:

“Yes, this operation is async. Here’s a safe, standardized way to wait.”

Here’s a small example in Python using boto3:

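import boto3
from botocore.exceptions import WaiterError

rds = boto3.client("rds", region_name="ap-south-1")

# Returns as soon as the request is accepted; the instance starts out as "creating".
rds.create_db_instance(
    DBInstanceIdentifier="demo-db",
    Engine="postgres",
    DBInstanceClass="db.t4g.micro",
    AllocatedStorage=20,
    MasterUsername="postgres",
    MasterUserPassword="REDACTED",  # placeholder; load the real value from a secret store
)

# Blocks until the instance reports "available". Raises WaiterError if the
# instance lands in a failure state or the waiter runs out of attempts.
waiter = rds.get_waiter("db_instance_available")
try:
    waiter.wait(DBInstanceIdentifier="demo-db")
except WaiterError as err:
    raise SystemExit(f"demo-db did not become available: {err}")

print("DB is available")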

The waiter is doing something that looks a lot like a workflow primitive:

poll
wait between attempts (boto3 uses a fixed delay; some newer AWS SDKs back off)
stop when a terminal state happens
raise if it fails

That’s not just convenience. It’s an opinionated interface for async reality.
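The waiter's patience is configurable. boto3 waiters accept a `WaiterConfig` with `Delay` (seconds between polls) and `MaxAttempts`, so the total wait is roughly their product. The helper below is a small sketch of turning a wall-clock budget into that shape; `waiter_config` is a name I made up, not part of boto3.

```python
import math


def waiter_config(timeout_seconds, delay_seconds=30):
    """Translate a wall-clock budget into boto3's WaiterConfig shape."""
    if timeout_seconds <= 0 or delay_seconds <= 0:
        raise ValueError("timeout_seconds and delay_seconds must be positive")
    return {
        "Delay": delay_seconds,
        # Round up so the budget is covered rather than undershot.
        "MaxAttempts": max(1, math.ceil(timeout_seconds / delay_seconds)),
    }


config = waiter_config(timeout_seconds=20 * 60, delay_seconds=30)
print(config)  # {'Delay': 30, 'MaxAttempts': 40}

# With a real client (not executed here):
#   rds.get_waiter("db_instance_available").wait(
#       DBInstanceIdentifier="demo-db",
#       WaiterConfig=config,
#   )
```

Choosing the budget on purpose, instead of inheriting whatever the default happens to be, is the "bounded timeout" half of a sane wait policy.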

A deployment API shape worth copying

Here’s the pattern I want to steal from AWS:

Make deployments behave like a good AWS API.

Not “run this script and hope”, but:

Start: StartDeployment() returns quickly with a deploymentId
Track: GetDeploymentStatus(deploymentId) is the source of truth
Wait: WaitForDeployment(deploymentId) uses a sane policy (like waiters)
Decide: the status tells you what’s happening and what you can safely do next
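Here is a rough in-memory sketch of that shape, just to show how small the contract is. Everything in it is hypothetical: `DeploymentService`, the snake_case method names, the `mark` helper that stands in for background work, and the ID format are all assumptions of mine, not an existing tool.

```python
import itertools
import time
from dataclasses import dataclass

TERMINAL = {"COMPLETE", "FAILED"}


@dataclass
class Deployment:
    deployment_id: str
    status: str = "IN_PROGRESS"
    detail: str = "rollout started"


class DeploymentService:
    """Hypothetical in-memory stand-in for a deployment control plane."""

    def __init__(self):
        self._deployments = {}
        self._ids = itertools.count(1)

    def start_deployment(self, service_name):
        # Start: record the request and hand back a handle immediately.
        deployment_id = f"d-{next(self._ids):04d}"
        self._deployments[deployment_id] = Deployment(
            deployment_id, detail=f"rolling out {service_name}"
        )
        return deployment_id

    def get_deployment_status(self, deployment_id):
        # Track: the stored record, not a log, is the source of truth.
        return self._deployments[deployment_id]

    def mark(self, deployment_id, status, detail):
        # Stand-in for the background work that moves a deploy forward.
        deployment = self._deployments[deployment_id]
        deployment.status, deployment.detail = status, detail

    def wait_for_deployment(self, deployment_id, delay=0.01, max_attempts=50):
        # Wait: poll at a fixed delay until a terminal state or the budget runs out.
        for _ in range(max_attempts):
            deployment = self.get_deployment_status(deployment_id)
            if deployment.status in TERMINAL:
                return deployment
            time.sleep(delay)
        raise TimeoutError(f"{deployment_id} not terminal after {max_attempts} polls")


svc = DeploymentService()
dep_id = svc.start_deployment("checkout")
print(dep_id, svc.get_deployment_status(dep_id).status)  # d-0001 IN_PROGRESS

svc.mark(dep_id, "COMPLETE", "all instances healthy")
print(svc.wait_for_deployment(dep_id).status)  # COMPLETE
```

A real version would keep the records in a durable store and do the work in the background, but the caller-facing surface would not need to be much bigger than this.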

If you’ve used modern deployment systems, you’ve probably seen pieces of this:

canary rollouts that pause
staged promotions
long-running “in progress” deploys after CI has already finished

My point is: a lot of “async deployment” ideas are already familiar because AWS made them familiar.

What I’m taking from ACM + RDS (and trying to apply everywhere)

I’m collecting these as personal rules of thumb.

1) Waiting is a first-class state (not a failure)

ACM doesn’t pretend issuance is instant.

It gives you a meaningful state: PENDING_VALIDATION.

If a deploy is waiting, that should be visible and named.

2) A handle matters more than a log

AWS gives you a stable identity (ARN / identifier) and you ask for status.

That’s much better than:

“go read CI logs and guess what happened”

3) Polling is an API design problem

RDS waiters are AWS admitting that polling will happen—and helping you do it sanely.

If you build internal tooling, giving people a standard wait mechanism is not a nice-to-have. It’s part of the contract.
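As a sketch of what such a standard wait mechanism could look like for internal tooling, here is a generic helper with a bounded timeout, capped exponential backoff with jitter, and terminal-state detection. The function name, parameters, and defaults are my assumptions, not something AWS ships.

```python
import random
import time


def wait_until_terminal(
    get_status,
    terminal_states,
    timeout=600.0,
    base_delay=1.0,
    max_delay=30.0,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Poll get_status() until it returns a terminal state or the timeout passes."""
    deadline = clock() + timeout
    attempt = 0
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        # Exponential backoff, capped at max_delay, with full jitter so that
        # many callers waiting at once do not poll in lockstep.
        ceiling = min(max_delay, base_delay * 2 ** min(attempt, 16))
        delay = random.uniform(0, ceiling)
        if clock() + delay > deadline:
            raise TimeoutError(f"still {status!r} after {timeout}s")
        sleep(delay)
        attempt += 1


# Simulated status sequence; sleep is stubbed out so the example runs instantly.
statuses = iter(["creating", "creating", "available"])
result = wait_until_terminal(
    lambda: next(statuses),
    terminal_states={"available", "failed"},
    sleep=lambda seconds: None,
)
print(result)  # available
```

The caller still has to check whether the terminal state it got back is the good one; the helper only promises that the wait ends.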

A checklist I want in every async deployment system

This is the part I’m personally going to keep coming back to.

[Operation handle] Every deploy returns a stable deploymentId you can refer to later.
[Explicit states] Model IN_PROGRESS, WAITING_ON_EXTERNAL, PAUSED, FAILED, COMPLETE.
[Narrated waiting] If it’s waiting, it should say what for and how to unblock it (ACM energy).
[Waiters / polling policy] Provide a first-class wait that has:
- bounded timeouts
- backoff + jitter
- terminal-state detection
[Idempotent retries] Retrying shouldn’t create duplicates or restart dangerous steps.
[Source of truth] Status comes from a durable store / Describe*-style API, not from logs.
[Terminal state taxonomy] Differentiate:
- retryable transient failures
- blocked states
- permanent failures
[Next safe actions] Every state should suggest what a human can do without making things worse.
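A few of these items (explicit states, narrated waiting, the state taxonomy, next safe actions) fit into one small status payload. The sketch below uses the state names from the checklist; the groupings, the action strings, and the `describe` helper are illustrative assumptions of mine.

```python
from enum import Enum


class State(Enum):
    IN_PROGRESS = "IN_PROGRESS"
    WAITING_ON_EXTERNAL = "WAITING_ON_EXTERNAL"
    PAUSED = "PAUSED"
    FAILED = "FAILED"
    COMPLETE = "COMPLETE"


TERMINAL = {State.FAILED, State.COMPLETE}
BLOCKED = {State.WAITING_ON_EXTERNAL, State.PAUSED}

# Hypothetical actions; a real system would define its own.
NEXT_SAFE_ACTIONS = {
    State.IN_PROGRESS: ["wait", "cancel"],
    State.WAITING_ON_EXTERNAL: ["resolve the named dependency", "cancel"],
    State.PAUSED: ["resume", "roll back"],
    State.FAILED: ["inspect the reason", "retry if retryable", "roll back"],
    State.COMPLETE: [],
}


def describe(state, reason="", retryable=False):
    """Build a status payload that says what is happening and what to do next."""
    if state in TERMINAL:
        kind = "terminal"
    elif state in BLOCKED:
        kind = "blocked"
    else:
        kind = "running"
    return {
        "state": state.value,
        "kind": kind,
        "reason": reason,
        # Only a failure can meaningfully be retried.
        "retryable": retryable and state is State.FAILED,
        "next_safe_actions": NEXT_SAFE_ACTIONS[state],
    }


status = describe(
    State.WAITING_ON_EXTERNAL,
    reason="DNS validation record not found yet",
)
print(status["kind"], "-", status["reason"])
print("next:", status["next_safe_actions"])
```

The point is that "blocked" and "failed" are different answers, and each one comes with a reason and a short list of moves that will not make things worse.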

If I had to compress it:

A good async deployment system doesn’t remove waiting.

It removes uncertainty.

Closing

I used to think of “async deployments” as something you only get once you have a big internal platform.

But AWS makes the idea feel much more practical.

We already trust async workflows every time we provision infrastructure:

we accept PENDING_VALIDATION because the system tells us what it needs
we accept creating because there’s a reliable way to wait

So if you’re building deployment tooling (even if it’s “just scripts” today), my suggestion is simple:

Copy the parts AWS gets right—handles, explicit states, waiters, narration, safe retries.

Waiting isn’t the problem.

Waiting without a clear status model is.
