In plain words

Picture a function that charges a card, emails a receipt, then updates a ledger. The server crashes right after the charge. Restart naive code and it runs from the top: it charges the card a second time. Durable execution makes that function resume after the charge instead, never repeating the step that already happened.

It manages this by recording every step to durable storage as the step completes. After a crash, the runtime replays that record: finished steps return their saved result instead of running again, and execution picks up at the first step that had not completed. Temporal calls it "the ultimate autosave."

How it works

The engine keeps a durable log of what the workflow has done. Vendors name it differently (Temporal calls it the event history, Azure the orchestration history, Restate a journal, DBOS a set of checkpoints), but the idea is the same. Each step you run, and the result it returned, is appended to that log before control comes back to your code.
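To make "durable" concrete, here is a minimal sketch of such a log kept as newline-delimited JSON in a local file, using only Node's built-in `fs` module. The file layout, the entry shape (`seq`, `step`, `result`), and the function names `appendEntry` and `readLog` are assumptions for illustration; production engines keep history in a database or a managed service.

```javascript
// Hypothetical append-only log stored as JSON lines on local disk.
const fs = require("fs");
const os = require("os");
const path = require("path");

// Appends one entry and forces it to disk before returning.
function appendEntry(logPath, entry) {
  const fd = fs.openSync(logPath, "a"); // "a": append-only mode
  try {
    fs.writeSync(fd, JSON.stringify(entry) + "\n");
    fs.fsyncSync(fd); // durable before the caller treats the step as done
  } finally {
    fs.closeSync(fd);
  }
}

// Rebuilds the history from disk, e.g. after a process restart.
function readLog(logPath) {
  if (!fs.existsSync(logPath)) return [];
  return fs
    .readFileSync(logPath, "utf8")
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

const logPath = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "wf-")), "order-42.log");
appendEntry(logPath, { seq: 0, step: "charge", result: "ch_1" });
appendEntry(logPath, { seq: 1, step: "email", result: "sent" });
console.log(readLog(logPath).map((e) => e.step)); // [ 'charge', 'email' ]
```

The `fsyncSync` call carries the guarantee: if the entry were only buffered in memory, a crash could lose a result the workflow had already acted on.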

On recovery the function is re-executed from the start, with a twist: the engine intercepts each step, and if the log already holds a result for it, that result is returned and the step's own code never runs. This is the event sourcing pattern applied to code execution. The durable record is an append-only series of events rather than a snapshot of current state, and because the charge's result is already in that record, replay hands it back instead of issuing the charge a second time.
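The replay rule fits in a few dozen lines of plain JavaScript. This is a single-process simulation under stated assumptions, not any vendor's implementation: the names `Journal`, `runWorkflow`, and `ctx.step` are hypothetical, the journal lives in memory rather than durable storage, and a thrown `Crash` error stands in for the process dying. Steps are matched to journal entries by call order, which is also why the sketch rejects a replay whose step names diverge from the log.

```javascript
// In-memory stand-in for a durable, append-only log (hypothetical).
class Journal {
  constructor() {
    this.entries = []; // each entry: { name, result }
  }
  append(entry) {
    // A real engine persists this write before returning.
    this.entries.push(entry);
  }
}

// Thrown to simulate the process dying mid-workflow.
class Crash extends Error {}

// Runs `workflow` from the top, intercepting every ctx.step call.
function runWorkflow(journal, workflow) {
  let position = 0; // index of the next journal entry to match
  const ctx = {
    step(name, fn) {
      if (position < journal.entries.length) {
        // Replay: this step already completed, so return the saved
        // result without calling fn.
        const recorded = journal.entries[position++];
        if (recorded.name !== name) {
          throw new Error(`replay mismatch: log has ${recorded.name}, code called ${name}`);
        }
        return recorded.result;
      }
      // First execution of this step: run it, then record the result.
      const result = fn();
      journal.append({ name, result });
      position++;
      return result;
    },
  };
  return workflow(ctx);
}

// Counters stand in for real-world side effects.
const sideEffects = { charges: 0, emails: 0 };
let crashAfterCharge = true; // crash on the first attempt only

function checkout(ctx) {
  const chargeId = ctx.step("charge", () => {
    sideEffects.charges += 1;
    return `ch_${sideEffects.charges}`;
  });
  if (crashAfterCharge) {
    crashAfterCharge = false;
    throw new Crash("server died right after the charge");
  }
  ctx.step("email", () => {
    sideEffects.emails += 1;
    return "sent";
  });
  return ctx.step("ledger", () => `ledger:${chargeId}`);
}

const journal = new Journal();
try {
  runWorkflow(journal, checkout); // first attempt crashes after "charge"
} catch (err) {
  if (!(err instanceof Crash)) throw err;
}
const result = runWorkflow(journal, checkout); // recovery: run from the top again
console.log(result, sideEffects); // ledger:ch_1 { charges: 1, emails: 1 }
```

On the second run `charge` is served from the journal, so the card is charged once even though the function body executed from the top twice.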

Because the function replays, its code has to be deterministic. As Azure's docs put it, "an orchestrator function replays multiple times, and it must produce the same result each time." Anything non-deterministic (random values, the current time, direct network or database calls) has to go through the engine, as a recorded step or an engine-provided API, so replay sees the same value every time. Workflows still read as ordinary procedural code, as in this Azure Durable Functions orchestration in JavaScript:
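Routing non-deterministic values through the engine can be sketched the same way: the first execution records the value, and every replay reads it back. The helper names `makeContext`, `record`, `random`, and `now` are hypothetical; real SDKs expose their own equivalents (Azure's JavaScript SDK, for example, provides `context.df.currentUtcDateTime` in place of the system clock).

```javascript
// Journals each non-deterministic value in call order (hypothetical API).
function makeContext(journal) {
  let position = 0;
  return {
    record(produce) {
      if (position < journal.length) {
        return journal[position++]; // replay: reuse the recorded value
      }
      const value = produce(); // first run: generate and record it
      journal.push(value);
      position++;
      return value;
    },
    random() {
      return this.record(() => Math.random());
    },
    now() {
      return this.record(() => Date.now());
    },
  };
}

function pickWinner(ctx, entrants) {
  // Calling Math.random() or Date.now() directly here would give a
  // different answer on each replay; going through ctx does not.
  const roll = ctx.random();
  const decidedAt = ctx.now();
  return { winner: entrants[Math.floor(roll * entrants.length)], decidedAt };
}

const journal = [];
const entrants = ["ana", "bo", "cy"];
const first = pickWinner(makeContext(journal), entrants);
const replayed = pickWinner(makeContext(journal), entrants); // simulated replay
console.log(first.winner === replayed.winner && first.decidedAt === replayed.decidedAt); // true
```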

const df = require("durable-functions");

df.app.orchestration("helloSequence", function* (context) {
    const output = [];
    output.push(yield context.df.callActivity("sayHello", "Tokyo"));
    output.push(yield context.df.callActivity("sayHello", "Seattle"));
    output.push(yield context.df.callActivity("sayHello", "London"));

    // Return ["Hello Tokyo!", "Hello Seattle!", "Hello London!"].
    return output;
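});

// The activity does the actual work; the orchestrator calls it by name.
df.app.activity("sayHello", {
    handler: (city) => `Hello ${city}!`,
});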

Each yield is an automatic checkpoint: the engine records the activity's result before the generator resumes, so a replay can hand it straight back. The engine also handles the slow parts for you: failed steps can be retried under a retry policy, and a workflow can pause for seconds, days, or weeks on a durable timer while its resources are freed, then wake up exactly where it stopped.
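A durable timer can be sketched by journaling the absolute wake-up time instead of the duration, so a replay after a restart computes the remaining wait rather than starting the full wait again. `durableSleep`, the object-based journal, and the injected `clock` function are assumptions for illustration; a real engine would also unload the workflow and schedule its own wake-up.

```javascript
// Hypothetical durable timer: the deadline is recorded once, then reused.
function durableSleep(journal, timerId, durationMs, clock) {
  if (!(timerId in journal)) {
    // First execution: persist the absolute deadline, not the duration.
    journal[timerId] = clock() + durationMs;
  }
  // A real engine would unload the workflow here and wake it at the
  // recorded deadline; this sketch just reports how long is left.
  return Math.max(0, journal[timerId] - clock());
}

let fakeNow = 1_000; // injected clock in ms, so the example is deterministic
const clock = () => fakeNow;
const journal = {};
const DAY = 24 * 60 * 60 * 1000;

console.log(durableSleep(journal, "reminder", 3 * DAY, clock) === 3 * DAY); // true
fakeNow += 2 * DAY; // the process restarts two days later and replays
console.log(durableSleep(journal, "reminder", 3 * DAY, clock) === DAY); // true
fakeNow += 2 * DAY;
console.log(durableSleep(journal, "reminder", 3 * DAY, clock)); // 0
```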

Why it matters

Durable execution removes a whole category of code you would otherwise hand-write: checkpointing, retry bookkeeping, idempotency keys, and recovery logic. The engine guarantees the workflow runs to completion across crashes, restarts, and deploys, so long-running processes like payment flows, user onboarding, and order fulfillment become reliable by default.

The pattern is spreading into AI and edge computing. Long-running agent loops need the same survive-a-crash-and-continue guarantee, which is why the Model Context Protocol added an experimental Tasks primitive that wraps requests in durable execution. It is also moving onto edge runtimes, where Cloudflare now offers per-tenant durable execution with no infrastructure to manage.

One caveat worth holding onto: the guarantees are strong but not magic. Engines advertise "exactly-once" or "effectively once" completion, yet a step can run, fail to record its result, and run again on recovery. Steps that touch the outside world should still be idempotent.
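One common way to make a step safe to repeat is an idempotency key: the step sends a key derived from stable workflow data, and the external service returns the original result for a key it has already seen. This sketch simulates that with a hypothetical `FakePaymentProvider`; the key format `${workflowId}:charge` is an assumption, though real payment APIs such as Stripe's accept a comparable `Idempotency-Key` header.

```javascript
// Hypothetical payment service that deduplicates on an idempotency key.
class FakePaymentProvider {
  constructor() {
    this.chargesByKey = new Map(); // idempotency key -> charge
    this.totalCharged = 0;
  }
  charge(idempotencyKey, amountCents) {
    const existing = this.chargesByKey.get(idempotencyKey);
    if (existing) return existing; // duplicate request: return the first result
    const charge = { id: `ch_${this.chargesByKey.size + 1}`, amountCents };
    this.chargesByKey.set(idempotencyKey, charge);
    this.totalCharged += amountCents;
    return charge;
  }
}

// The key comes from stable workflow data, so every attempt of this step
// sends the same key (never derive it from Math.random() or the clock).
function chargeStep(provider, workflowId, amountCents) {
  return provider.charge(`${workflowId}:charge`, amountCents);
}

const provider = new FakePaymentProvider();
const first = chargeStep(provider, "order-42", 1999);
// Simulate: the step ran, the engine crashed before recording the
// result, and recovery runs the step again.
const retry = chargeStep(provider, "order-42", 1999);
console.log(first.id === retry.id, provider.totalCharged); // true 1999
```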

Origin

There is no single spec for durable execution. The term spread through a cluster of platforms that implement it (Temporal, Azure Durable Functions, Restate, DBOS, Inngest, and Cloudflare Workflows) rather than from one standard, and Temporal is most often credited with popularizing it. Underneath the different names, they share the event-sourcing pattern: store the log of what happened, then rebuild state by replaying it.

Common confusions

| Confused with | How it differs from durable execution |
|---|---|
| Message queues | A queue hands off a message; managing state, ordering, and long waits across steps is your application's job. A durable execution engine manages workflow state and long-running waits for you. |
| Workflow engines (BPMN / DSL) | Traditional engines define flows in declarative YAML or a visual designer with limited control flow. Durable execution uses ordinary procedural code with built-in retries. Temporal notes that "relatively few" systems called workflow engines actually provide it. |
| Simple retries | Retrying a whole function re-runs its side effects. Durable execution replays completed steps from the log instead of re-running them, and retries apply per step, automatically. |
| Event sourcing | Not a competitor. Event sourcing is the underlying persistence pattern; durable execution applies it to your code's execution state rather than to domain data. |
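The "Simple retries" row can be checked with a small simulation. `withRetries` and `makeWorld` are hypothetical helpers, the ledger is rigged to fail exactly once, and the sketch covers only the retry half of the story: in a real engine the per-step retry is combined with the log, so a completed step is also skipped after a crash.

```javascript
// Retries fn up to maxAttempts times (hypothetical helper, no backoff).
function withRetries(fn, maxAttempts) {
  for (let attempt = 1; ; attempt++) {
    try {
      return fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      // A real engine would wait with backoff and persist the attempt count.
    }
  }
}

// Builds fresh side-effect counters; the ledger fails on its first call.
function makeWorld() {
  const world = { charges: 0, ledgerCalls: 0 };
  world.charge = () => {
    world.charges++;
    return "charged";
  };
  world.ledger = () => {
    world.ledgerCalls++;
    if (world.ledgerCalls === 1) throw new Error("ledger timeout");
    return "recorded";
  };
  return world;
}

// Simple retries: the whole function is retried, so the charge repeats.
const naive = makeWorld();
withRetries(() => {
  naive.charge();
  return naive.ledger();
}, 3);

// Per-step retries: only the failing step is retried.
const perStep = makeWorld();
withRetries(perStep.charge, 3);
withRetries(perStep.ledger, 3);

console.log(naive.charges, perStep.charges); // 2 1
```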