Add WaitForConditionAsync polling primitive (DOTNET-8665) #2376

Draft
GarrettBeatty wants to merge 1 commit into feature/durablefunction from gcbeatty/durable-waitforcondition

@GarrettBeatty GarrettBeatty commented May 14, 2026

#2216

What

Adds service-mediated polling to Amazon.Lambda.DurableExecution. WaitForConditionAsync repeatedly evaluates a check function with a configurable wait strategy between attempts; each iteration runs in its own Lambda invocation (suspended via STEP + RETRY checkpoints carrying NextAttemptDelaySeconds), so polling does not consume compute time while waiting between checks.

Public API:

| Type | Purpose |
| --- | --- |
| IDurableContext.WaitForConditionAsync<TState>(...) | Poll until the user-supplied predicate signals done or maxAttempts is exhausted. Per-iteration state is serialized via the ILambdaSerializer registered on ILambdaContext.Serializer (same pattern as StepAsync / RunInChildContextAsync). |
| IConditionCheckContext | Passed to the check function: Logger, AttemptNumber. |
| WaitForConditionConfig<TState> | Required InitialState + WaitStrategy. |
| IWaitStrategy<TState> | Decide(state, attempt) returning WaitDecision. |
| WaitDecision | Readonly record struct: ShouldContinue, Delay, factories Stop() / ContinueAfter(TimeSpan). |
| WaitStrategy factories | Exponential, Linear, Fixed, FromDelegate, each accepting an optional Func<TState, bool> isDone predicate. |
| WaitForConditionException | Thrown when maxAttempts is exhausted; carries AttemptsExhausted and LastState (preserved across both live execution and replay). |
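
To make the shape of the types listed above concrete, here is a minimal, self-contained sketch. It is not the SDK's source: `WaitDecision` and `IWaitStrategy<TState>` are stand-ins written from the descriptions above, `FixedDelaySketch<TState>` is a hypothetical name, and `InvalidOperationException` stands in for `WaitForConditionException`. How the real factories behave when no `isDone` predicate is supplied is not stated in this PR; the sketch simply keeps polling until the attempt limit.

```csharp
#nullable enable
using System;

// Stand-in for the WaitDecision described above (assumed shape, not SDK code).
public readonly record struct WaitDecision(bool ShouldContinue, TimeSpan Delay)
{
    // Polling is finished; the current state is the result.
    public static WaitDecision Stop() => new(false, TimeSpan.Zero);

    // Poll again once the given delay has elapsed.
    public static WaitDecision ContinueAfter(TimeSpan delay) => new(true, delay);
}

// Stand-in for the strategy contract: Decide(state, attempt) returns a WaitDecision.
public interface IWaitStrategy<TState>
{
    WaitDecision Decide(TState state, int attempt);
}

// Hypothetical fixed-delay strategy with an optional isDone predicate.
public sealed class FixedDelaySketch<TState> : IWaitStrategy<TState>
{
    private readonly TimeSpan _delay;
    private readonly int _maxAttempts;
    private readonly Func<TState, bool>? _isDone;

    public FixedDelaySketch(TimeSpan delay, int maxAttempts, Func<TState, bool>? isDone = null)
    {
        _delay = delay;
        _maxAttempts = maxAttempts;
        _isDone = isDone;
    }

    public WaitDecision Decide(TState state, int attempt)
    {
        // Done takes priority over the attempt limit.
        if (_isDone is not null && _isDone(state))
            return WaitDecision.Stop();

        // Assumption: exhaustion is signalled by throwing from Decide.
        // InvalidOperationException stands in for WaitForConditionException.
        if (attempt >= _maxAttempts)
            throw new InvalidOperationException($"Condition not met after {attempt} attempts.");

        return WaitDecision.ContinueAfter(_delay);
    }
}
```

Because `WaitDecision` is a record struct, two decisions with the same `ShouldContinue` and `Delay` compare equal, which keeps strategy unit tests short.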

How

Internal/WaitForConditionOperation<TState> runs as a STEP with SubType = "WaitForCondition":

  • Each iteration calls the user check function with the current state. The strategy's Decide(state, attempt) returns either Stop() (success: return state) or ContinueAfter(delay) (suspend until next attempt). Continuing emits Action=RETRY with the new state in the payload and NextAttemptDelaySeconds set so the durable execution service schedules the next invocation without consuming compute time.
  • Strategies signal max-attempts exhausted by throwing WaitForConditionException directly from Decide(); the operation enriches with LastState before checkpointing FAIL. LastState survives FAIL replay: serialized into the FAIL payload at write time and deserialized in BuildFailureException with a warning-logged fallback for legacy / corrupt data.
  • Serialization is delegated to the registered ILambdaSerializer via stream-based Serialize<T> / Deserialize<T> calls — no AOT trim attributes on the public API. Mirrors StepOperation / ChildContextOperation.
  • ExponentialBackoff helper is extracted for sharing between this operation and ExponentialRetryStrategy. Math is byte-for-byte identical to the existing implementation.
  • Reuses OperationSubTypes.WaitForCondition from Wave 0 (doc updates #2372).
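
The control flow in the bullets above can be sketched as an in-process loop. This is only an illustration under stated assumptions: `PollingSketch.Run` is a hypothetical helper, attempts are assumed to be 1-based, and the delay is assumed to round up to whole seconds. In the real operation each ContinueAfter ends the current invocation with a RETRY checkpoint and the service schedules the next one; here the delays are just recorded in a list.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the WaitDecision type (assumed shape, not SDK code).
public readonly record struct WaitDecision(bool ShouldContinue, TimeSpan Delay)
{
    public static WaitDecision Stop() => new(false, TimeSpan.Zero);
    public static WaitDecision ContinueAfter(TimeSpan delay) => new(true, delay);
}

public static class PollingSketch
{
    // Runs check -> decide until the strategy stops. Returns the final state
    // and the delays that would have been sent as NextAttemptDelaySeconds.
    public static (TState State, List<int> DelaysSeconds) Run<TState>(
        TState initialState,
        Func<TState, int, TState> check,
        Func<TState, int, WaitDecision> decide)
    {
        var delays = new List<int>();
        var state = initialState;

        for (var attempt = 1; ; attempt++)
        {
            // 1. Evaluate the user check with the current state.
            state = check(state, attempt);

            // 2. Ask the strategy whether to stop or continue.
            //    A strategy may also throw here to signal exhaustion.
            var decision = decide(state, attempt);
            if (!decision.ShouldContinue)
                return (state, delays);

            // 3. Real operation: checkpoint RETRY with the new state and suspend.
            //    Sketch: record the delay and loop immediately.
            delays.Add((int)Math.Ceiling(decision.Delay.TotalSeconds));
        }
    }
}
```

For example, a check that increments an integer state and a strategy that stops once the state reaches 3 (otherwise continuing after 5 seconds) finishes on the third attempt with two recorded 5-second delays.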

Defaults: 60 attempts, 5s initial delay, 300s max delay, 1.5x rate, Full jitter — distinct from RetryStrategy.Default and matching Python/JS/Java reference SDKs.
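
To show what these defaults imply for the delay schedule, here is a sketch of capped exponential backoff with full jitter in its commonly described form. The PR does not show the shared ExponentialBackoff helper, so the formula, the 1-based attempt indexing, and the names `BackoffSketch`, `Ceiling`, and `WithFullJitter` are all assumptions rather than the SDK's implementation.

```csharp
using System;

public static class BackoffSketch
{
    // Un-jittered upper bound for a 1-based attempt:
    // min(max, initial * rate^(attempt - 1)).
    public static TimeSpan Ceiling(int attempt, TimeSpan initial, TimeSpan max, double rate)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));

        var seconds = initial.TotalSeconds * Math.Pow(rate, attempt - 1);

        // Math.Min also clamps the case where Math.Pow overflows to infinity.
        return TimeSpan.FromSeconds(Math.Min(seconds, max.TotalSeconds));
    }

    // Full jitter: a uniformly random delay in [0, ceiling).
    public static TimeSpan WithFullJitter(
        int attempt, TimeSpan initial, TimeSpan max, double rate, Random rng)
    {
        var ceiling = Ceiling(attempt, initial, max, rate);
        return TimeSpan.FromSeconds(rng.NextDouble() * ceiling.TotalSeconds);
    }
}
```

Under these assumptions, with a 5s initial delay, 300s max, and 1.5x rate, the un-jittered bound is 5s for attempt 1, 7.5s for attempt 2, and 11.25s for attempt 3, and it reaches the 300s cap at attempt 12. Full jitter then picks a value between zero and that bound, so the actual delay for a given attempt varies between runs.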

Cross-SDK note: Python returns success when max attempts are exhausted, whereas .NET, Java, and JS throw. Workflows ported from Python should be reviewed for this new failure mode. The difference is documented in the design doc.

Stacked on top of #2372.

Fixes DOTNET-8665.

Testing

41 new unit tests covering each wait strategy, the isDone predicate paths, max-attempts exhaustion, user-check exceptions, replay determinism, exponential-backoff bounds, and corrupt-payload fallback logging:

  • Each WaitStrategy factory: Exponential, Linear, Fixed, FromDelegate — both with and without isDone predicate.
  • WaitForConditionException thrown after max-attempts; AttemptsExhausted + LastState populated.
  • Replay determinism: SUCCEEDED returns cached state, FAILED reconstructs WaitForConditionException with LastState, STARTED re-suspends.
  • LastState survives FAIL replay; corrupt FAIL payload falls back gracefully with a warning.
  • ExponentialBackoff shared math: byte-for-byte identical to ExponentialRetryStrategy (regression test).
  • Exceptions thrown by the user check function are surfaced with the original exception preserved.

5 new integration tests build successfully (require AWS credentials to run).

Build clean: 0 warnings, 0 errors on net8.0 and net10.0.

Out of scope (follow-up PRs)

  • MapAsync / continued parallel-suite work.
  • DurableLogger replay-suppression (currently NullLogger).
  • Annotations source-generator integration / [DurableExecution] attribute.
  • DurableTestRunner / Amazon.Lambda.DurableExecution.Testing package.
  • dotnet new lambda.DurableFunction blueprint.


@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-wave0 branch from 464c591 to d308c3b Compare May 14, 2026 21:49
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-waitforcondition branch 2 times, most recently from 7f91202 to 3fa06ce Compare May 14, 2026 22:19
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-wave0 branch from d308c3b to be4c3ad Compare May 18, 2026 15:23
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-waitforcondition branch from 3fa06ce to 67f0c0c Compare May 18, 2026 15:50
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-wave0 branch 3 times, most recently from ad4d208 to 3acbed5 Compare May 20, 2026 17:46
Base automatically changed from gcbeatty/durable-wave0 to gcbeatty/durable-child-context May 20, 2026 17:46
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-child-context branch 2 times, most recently from 4d97473 to 8a6c41c Compare May 21, 2026 18:56
Base automatically changed from gcbeatty/durable-child-context to feature/durablefunction May 23, 2026 15:58
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-waitforcondition branch from 67f0c0c to feec401 Compare May 28, 2026 16:06
Adds service-mediated polling to the .NET Durable Execution SDK.
WaitForConditionAsync repeatedly evaluates a check function with a
configurable wait strategy between attempts; each iteration is its own
Lambda invocation (suspended via STEP+RETRY checkpoints carrying
NextAttemptDelaySeconds), so polling does not consume compute time.

Public surface:
- IDurableContext.WaitForConditionAsync<TState> (single overload; the
  per-iteration state checkpoint is serialized via the ILambdaSerializer
  registered on ILambdaContext.Serializer, configured via
  LambdaBootstrapBuilder.Create(handler, serializer))
- IConditionCheckContext (Logger + AttemptNumber)
- WaitForConditionConfig<TState> (required InitialState + WaitStrategy)
- IWaitStrategy<TState> with Decide(state, attempt) returning
  WaitDecision
- WaitDecision (readonly record struct, ShouldContinue + Delay,
  Stop() / ContinueAfter(TimeSpan) factories)
- WaitStrategy factories: Exponential / Linear / Fixed / FromDelegate,
  each accepting an optional Func<TState, bool> isDone predicate
- WaitForConditionException with AttemptsExhausted and LastState
  (preserved across both live execution and replay)

Internal:
- WaitForConditionOperation<TState> wire format = STEP + SubType
  "WaitForCondition". Each polling iteration emits Action=RETRY with
  the new state in payload and NextAttemptDelaySeconds for the
  service to schedule the next invocation.
- Serialization is delegated to the registered ILambdaSerializer via
  Stream-based Serialize<T>/Deserialize<T> calls; no AOT trim attributes
  on the public API. Mirrors StepOperation/ChildContextOperation.
- Strategies signal max-attempts exhausted by throwing
  WaitForConditionException directly from Decide(); the operation
  enriches with LastState before checkpointing FAIL.
- LastState survives FAIL replay: serialized into FAIL payload at
  write time, deserialized in BuildFailureException with
  warning-logged fallback for legacy/corrupt data.
- ExponentialBackoff helper extracted for sharing with
  ExponentialRetryStrategy. Math is byte-for-byte identical.
- Reuses OperationSubTypes.WaitForCondition from Wave 0.

Defaults: 60 attempts / 5s initial / 300s max / 1.5x rate / Full jitter -
distinct from RetryStrategy.Default and matching Python/JS/Java reference
SDKs. (Note: Python returns success on max-attempts; .NET/Java/JS throw
- documented in design doc.)

Adds 41 unit tests + 5 integration tests covering each wait strategy,
isDone predicate paths, max-attempts exhaustion, user-check exceptions,
replay determinism, exponential backoff bounds, and corrupt-payload
fallback logging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-waitforcondition branch from feec401 to 299ac83 Compare May 28, 2026 18:47