UIAP Action Runtime

Field         Value
Status        Draft
Version       0.1
Date          2026-03-27
Dependencies  [UIAP-CORE], [UIAP-CAP]
Editors       Patrick

UIAP Action Runtime v0.1 defines how a UIAP counterpart accepts a requested Action, resolves the target, checks actionability, selects the execution mode, verifies success, and reports structured results.

The Runtime is not limited to the Web, but this document includes a normative Web binding via [email protected].

The key words MUST, MUST NOT, SHOULD, MAY in this document are to be interpreted as described in RFC 2119 and BCP 14, when and only when they appear in ALL CAPS.

The Executor receives action.request and carries out execution.

The Controller sends action.request, action.cancel, and Confirmation responses.

An optional presentation component displays Ghost Cursor, Highlights, Focus Rings, or Narration.


  1. An Action MUST have a well-defined lifecycle.
  2. Non-idempotent Actions MUST NOT be blindly retried.
  3. Target resolution MUST be deterministic and verifiable.
  4. Success MUST be verified through observable signals.
  5. User Activation and Trusted Event boundaries MUST be explicitly modeled.
  6. Presentation is optional and MUST NOT falsify execution outcomes.

An Action is in exactly one of these states:

  • RECEIVED
  • RESOLVING_TARGET
  • CHECKING_PRECONDITIONS
  • AWAITING_CONFIRMATION
  • EXECUTING
  • VERIFYING
  • WAITING_FOR_USER
  • RECOVERING
  • SUCCEEDED
  • FAILED
  • CANCELLED
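The state list can be modeled as a small state machine. The following is a minimal sketch, not part of the specification: the class name ActionLifecycle and the transition table ASSUMED_TRANSITIONS are hypothetical, and this section lists the states without defining which transitions are legal. Treating SUCCEEDED, FAILED, and CANCELLED as terminal is an inference from the three action.result statuses.

```typescript
// The eleven lifecycle states listed above.
type ActionState =
  | "RECEIVED"
  | "RESOLVING_TARGET"
  | "CHECKING_PRECONDITIONS"
  | "AWAITING_CONFIRMATION"
  | "EXECUTING"
  | "VERIFYING"
  | "WAITING_FOR_USER"
  | "RECOVERING"
  | "SUCCEEDED"
  | "FAILED"
  | "CANCELLED";

// ASSUMPTION: illustrative transition table; the spec text does not define it.
const ASSUMED_TRANSITIONS: Record<ActionState, readonly ActionState[]> = {
  RECEIVED: ["RESOLVING_TARGET", "CHECKING_PRECONDITIONS", "FAILED", "CANCELLED"],
  RESOLVING_TARGET: ["CHECKING_PRECONDITIONS", "FAILED", "CANCELLED"],
  CHECKING_PRECONDITIONS: ["AWAITING_CONFIRMATION", "EXECUTING", "RECOVERING", "FAILED", "CANCELLED"],
  AWAITING_CONFIRMATION: ["EXECUTING", "FAILED", "CANCELLED"],
  EXECUTING: ["VERIFYING", "WAITING_FOR_USER", "RECOVERING", "FAILED", "CANCELLED"],
  VERIFYING: ["SUCCEEDED", "RECOVERING", "FAILED", "CANCELLED"],
  WAITING_FOR_USER: ["EXECUTING", "FAILED", "CANCELLED"],
  RECOVERING: ["RESOLVING_TARGET", "CHECKING_PRECONDITIONS", "EXECUTING", "VERIFYING", "FAILED", "CANCELLED"],
  SUCCEEDED: [],
  FAILED: [],
  CANCELLED: [],
};

// Tracks exactly one current state and rejects transitions not in the table.
class ActionLifecycle {
  private current: ActionState = "RECEIVED";

  currentState(): ActionState {
    return this.current;
  }

  // Terminal states have no outgoing transitions in the assumed table.
  isTerminal(): boolean {
    return ASSUMED_TRANSITIONS[this.current].length === 0;
  }

  transition(next: ActionState): void {
    if (ASSUMED_TRANSITIONS[this.current].indexOf(next) === -1) {
      throw new Error(`illegal transition ${this.current} -> ${next}`);
    }
    this.current = next;
  }
}
```

Keeping the table as data makes it easy to adjust once the permitted transitions are specified.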

interface ActionRequestPayload {
  actionId: ActionId;
  target?: ActionTarget;
  args?: Record<string, unknown>;
  preferredExecutionModes?: ExecutionMode[];
  verification?: VerificationSpec;
  presentation?: PresentationHints;
  timeoutMs?: number;
  idempotencyKey?: string;
  metadata?: Record<string, unknown>;
}

interface ActionTarget {
  ref?: TargetRef;
  expectedRole?: UIRole;
  expectedName?: string;
  expectedScopeId?: ScopeId;
  expectedDocumentId?: DocumentId;
  allowAmbiguous?: false; // default false
}

interface PresentationHints {
  ghostCursor?: boolean;
  highlight?: "none" | "outline" | "spotlight";
  narration?: string;
  pace?: "instant" | "humanized";
}

interface ActionAcceptedPayload {
  actionHandle: string;
  actionId: ActionId;
  status: "accepted";
}

  • A valid action.request MUST be answered with action.accepted or error.
  • actionHandle MUST be unique within the session.

interface ActionProgressPayload {
  actionHandle: string;
  stage:
    | "resolving_target"
    | "checking_preconditions"
    | "awaiting_confirmation"
    | "executing"
    | "verifying"
    | "waiting_for_user"
    | "recovering";
  chosenExecutionMode?: ExecutionMode;
  resolvedTarget?: ResolvedTarget;
  note?: string;
  detail?: Record<string, unknown>;
}

interface ActionConfirmationRequestPayload {
  actionHandle: string;
  actionId: ActionId;
  risk: RiskDescriptor;
  preview?: {
    summary?: string;
    target?: ResolvedTarget;
    args?: Record<string, unknown>;
  };
}

interface ActionConfirmationGrantPayload {
  actionHandle: string;
}

interface ActionConfirmationDenyPayload {
  actionHandle: string;
  reason?: string;
}

  • When risk.level="confirm" or policy requires it, the Executor MUST send action.confirmation.request and pause.
  • Without action.confirmation.grant, the Action MUST NOT proceed.
  • A denial MUST result in action.result.status="cancelled".
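The three confirmation rules above can be read as a small gate in front of execution. The sketch below is illustrative only: RiskDescriptorSketch stands in for the RiskDescriptor type (only its level field is used here), and the names requiresConfirmation, confirmationGate, and GateOutcome are hypothetical. Pairing the cancelled status with the confirmation_denied error code is an assumption; the rules above only require the cancelled status.

```typescript
// Stand-in for RiskDescriptor; only `level` is referenced by the rules above.
interface RiskDescriptorSketch {
  level: string;
}

// The Controller's answer, if one has arrived yet.
type ConfirmationResponse =
  | { type: "grant" }
  | { type: "deny"; reason?: string };

type GateOutcome =
  | { next: "proceed" }
  | { next: "pause" } // send action.confirmation.request and wait
  | { next: "result"; status: "cancelled"; errorCode: "confirmation_denied"; reason: string | undefined };

// Confirmation is needed when risk.level is "confirm" or policy demands it.
function requiresConfirmation(risk: RiskDescriptorSketch, policyRequiresIt: boolean): boolean {
  return risk.level === "confirm" || policyRequiresIt;
}

// Decides what the Executor may do next given the (possibly missing) response.
function confirmationGate(required: boolean, response?: ConfirmationResponse): GateOutcome {
  if (!required) return { next: "proceed" };
  if (response === undefined) return { next: "pause" }; // no grant: MUST NOT proceed
  if (response.type === "grant") return { next: "proceed" };
  // ASSUMPTION: a denial is reported with error code "confirmation_denied".
  return { next: "result", status: "cancelled", errorCode: "confirmation_denied", reason: response.reason };
}
```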

interface ActionCancelPayload {
  actionHandle: string;
  reason?: string;
}

interface ActionCancelledPayload {
  actionHandle: string;
  status: "cancelled";
  reason?: string;
}

interface ActionResultPayload {
  actionHandle: string;
  actionId: ActionId;
  status: "succeeded" | "failed" | "cancelled";
  chosenExecutionMode?: ExecutionMode;
  resolvedTarget?: ResolvedTarget;
  verification: VerificationOutcome;
  sideEffectState?: "none" | "applied" | "unknown";
  stateRevision?: RevisionId;
  returnValue?: Record<string, unknown>;
  error?: RuntimeErrorDescriptor;
  metadata?: Record<string, unknown>;
}

interface ResolvedTarget {
  by: "stableId" | "instanceId" | "semantic" | "annotation" | "runtimeHint";
  instanceId: ElementInstanceId;
  stableId?: StableId;
  documentId: DocumentId;
  scopeId?: ScopeId;
  role: UIRole;
  name?: string;
  bbox?: DOMRectLike;
}

interface VerificationSpec {
  policy?: "capability-default" | "any" | "all" | "none";
  signals?: SuccessSignal[];
  timeoutMs?: number;
  requireRevisionAdvance?: boolean;
}

interface VerificationOutcome {
  passed: boolean;
  policy: "capability-default" | "any" | "all" | "none";
  observed: SuccessSignal[];
  missing?: SuccessSignal[];
  timeoutMs?: number;
}

type RuntimeErrorCode =
  | "action_unsupported"
  | "target_required"
  | "target_not_found"
  | "target_ambiguous"
  | "stale_target"
  | "target_not_interactable"
  | "confirmation_denied"
  | "user_activation_required"
  | "cross_origin_unavailable"
  | "closed_shadow_unavailable"
  | "execution_mode_unavailable"
  | "verification_failed"
  | "unsafe_retry_refused"
  | "cancelled"
  | "internal_runtime_error";

interface RuntimeErrorDescriptor {
  code: RuntimeErrorCode;
  message: string;
  retryable?: boolean;
  detail?: Record<string, unknown>;
}

The Runtime SHOULD use the following mode fallback by default:

  1. appAction
  2. semanticUi
  3. externalDriver
  4. inputSynthesis
  5. visionAssist
  • appAction is the most stable from a domain perspective.
  • semanticUi leverages existing Web semantics.
  • externalDriver can provide real browser-level control.
  • inputSynthesis in the page context is often only partially trustworthy.
  • visionAssist is expensive and more fragile.

The specific order MAY be adjusted per app or policy.
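Mode selection under this fallback order can be sketched as a simple first-match search. This is a minimal illustration, assuming the Runtime already knows which modes are currently available. The function name chooseExecutionMode is hypothetical. How preferredExecutionModes from the request combines with the default order is not defined in this section, so the merge shown here (requested modes first, then the fallback order) is an assumption.

```typescript
type ExecutionMode =
  | "appAction"
  | "semanticUi"
  | "externalDriver"
  | "inputSynthesis"
  | "visionAssist";

// Default fallback order from the list above.
const DEFAULT_MODE_ORDER: readonly ExecutionMode[] = [
  "appAction",
  "semanticUi",
  "externalDriver",
  "inputSynthesis",
  "visionAssist",
];

// ASSUMPTION: requested modes are tried first in the order given, then the
// fallback order. `fallbackOrder` may be replaced per app or policy.
function chooseExecutionMode(
  available: readonly ExecutionMode[],
  preferred: readonly ExecutionMode[] = [],
  fallbackOrder: readonly ExecutionMode[] = DEFAULT_MODE_ORDER
): ExecutionMode | undefined {
  const order = preferred.concat(fallbackOrder);
  for (const mode of order) {
    if (available.indexOf(mode) !== -1) return mode;
  }
  // No usable mode: the caller would report "execution_mode_unavailable".
  return undefined;
}
```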


An Executor MUST resolve targets in this order:

  1. TargetRef.by = "stableId"
  2. TargetRef.by = "semantic" within expected scopes/documents
  3. App-specific annotations such as meaning or defaultAction
  4. Local runtime hints (css, xpath) only as a last technical fallback

When multiple candidates are found, the Executor MUST compute a score based on:

  • Scope match
  • Role match
  • Name match
  • StableId match
  • Visual proximity to the current focus
  • Declared default action

If no clear winner emerges, target_ambiguous MUST be returned.
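The scoring step can be sketched as a weighted sum over the listed criteria, followed by a margin check between the two best candidates. This section names the criteria but not their weights or what counts as a "clear winner", so ASSUMED_WEIGHTS and ASSUMED_MIN_MARGIN below are placeholders, and CandidateSignals and pickTarget are hypothetical names.

```typescript
// One candidate with the match signals named in the list above.
interface CandidateSignals {
  instanceId: string;
  scopeMatch: boolean;
  roleMatch: boolean;
  nameMatch: boolean;
  stableIdMatch: boolean;
  focusProximity: number; // 0 (far from current focus) to 1 (at focus)
  isDeclaredDefaultAction: boolean;
}

// ASSUMPTION: illustrative weights; the spec does not prescribe values.
const ASSUMED_WEIGHTS = { stableId: 5, name: 3, scope: 2, role: 2, proximity: 1, defaultAction: 1 };
// ASSUMPTION: minimum lead the best candidate needs over the runner-up.
const ASSUMED_MIN_MARGIN = 1;

function scoreCandidate(c: CandidateSignals): number {
  return (
    (c.stableIdMatch ? ASSUMED_WEIGHTS.stableId : 0) +
    (c.nameMatch ? ASSUMED_WEIGHTS.name : 0) +
    (c.scopeMatch ? ASSUMED_WEIGHTS.scope : 0) +
    (c.roleMatch ? ASSUMED_WEIGHTS.role : 0) +
    c.focusProximity * ASSUMED_WEIGHTS.proximity +
    (c.isDeclaredDefaultAction ? ASSUMED_WEIGHTS.defaultAction : 0)
  );
}

type PickResult =
  | { ok: true; instanceId: string; score: number }
  | { ok: false; code: "target_not_found" | "target_ambiguous" };

// Returns the single best candidate, or an error code when none or no clear winner.
function pickTarget(candidates: readonly CandidateSignals[]): PickResult {
  if (candidates.length === 0) return { ok: false, code: "target_not_found" };
  const ranked = candidates
    .map((c) => ({ instanceId: c.instanceId, score: scoreCandidate(c) }))
    .sort((a, b) => b.score - a.score);
  const best = ranked[0];
  const runnerUp = ranked[1];
  if (runnerUp !== undefined && best.score - runnerUp.score < ASSUMED_MIN_MARGIN) {
    return { ok: false, code: "target_ambiguous" };
  }
  return { ok: true, instanceId: best.instanceId, score: best.score };
}
```

Requiring a margin rather than a plain maximum keeps two near-identical buttons from being resolved by an arbitrary tie-break.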

If an already-resolved target is no longer attached or no longer consistent before execution, the Runtime MUST re-resolve once. If that also fails, stale_target or target_not_found MUST be returned.
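The "re-resolve once" rule can be sketched as a guard that runs immediately before execution. The helper name ensureFreshTarget and its callback parameters are hypothetical. Which of the two error codes applies in which case is not spelled out above; the mapping used here (nothing found gives target_not_found, found but still inconsistent gives stale_target) is an assumption.

```typescript
type FreshTargetResult<T> =
  | { ok: true; target: T; reResolved: boolean }
  | { ok: false; code: "stale_target" | "target_not_found" };

// Validates a previously resolved target and re-resolves at most once.
function ensureFreshTarget<T>(
  current: T,
  isStillValid: (target: T) => boolean, // e.g. attached and consistent
  reResolve: () => T | undefined
): FreshTargetResult<T> {
  if (isStillValid(current)) {
    return { ok: true, target: current, reResolved: false };
  }
  const next = reResolve(); // exactly one further attempt, never a loop
  if (next === undefined) {
    return { ok: false, code: "target_not_found" };
  }
  if (!isStillValid(next)) {
    return { ok: false, code: "stale_target" };
  }
  return { ok: true, target: next, reResolved: true };
}
```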


Playwright does not treat robust interaction as “just click on it” but checks states such as visibility, stability, event receivability, and enabled status before performing actions. UIAP explicitly adopts this mental model because browser UIs would otherwise start acting like offended prima donnas on every re-render. scrollIntoView() is the standard mechanism to bring targets into view. ([Playwright][4])

Before execution, the Executor MUST check at minimum:

  • Session is active
  • Action is supported
  • Policy permits the Action
  • Target is present
  • Target does not belong to an opaque region

9.2 For Pointer-like Actions (ui.activate, ui.toggle, ui.choose)

  • attached = true
  • visible = true
  • enabled != false
  • blocked != true
  • stable != false
  • obscured != true or a safe recovery exists
  • Target is in the viewport or can be scrolled into view

9.3 For Text Input (ui.enterText, ui.clearText, ui.setValue)

  • editable = true or role is textually editable
  • readonly != true
  • enabled != false
  • Target can be focused or is already focused

For Navigation (nav.navigate)

  • Target route or link must be present or directly executable

If a check fails, either recovery MUST be attempted or an error MUST be returned.
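The pointer-like checks can be sketched as a function that returns the names of the failed conditions, so the caller can decide between recovery and an error. The interface PointerTargetState and the function failedPointerChecks are hypothetical. Note the tri-state reading of the rules: "enabled != false" passes when the value is unknown, whereas "attached = true" does not.

```typescript
// Snapshot of the target's state; optional fields may be unknown.
interface PointerTargetState {
  attached: boolean;
  visible: boolean;
  enabled?: boolean;
  blocked?: boolean;
  stable?: boolean;
  obscured?: boolean;
  inViewport: boolean;
  canScrollIntoView: boolean;
  hasSafeRecoveryForObscured?: boolean; // hypothetical flag for "a safe recovery exists"
}

// Returns the names of all failed checks; an empty array means actionable.
function failedPointerChecks(s: PointerTargetState): string[] {
  const failed: string[] = [];
  if (s.attached !== true) failed.push("attached");
  if (s.visible !== true) failed.push("visible");
  if (s.enabled === false) failed.push("enabled"); // unknown passes
  if (s.blocked === true) failed.push("blocked"); // unknown passes
  if (s.stable === false) failed.push("stable"); // unknown passes
  if (s.obscured === true && s.hasSafeRecoveryForObscured !== true) failed.push("obscured");
  if (!s.inViewport && !s.canScrollIntoView) failed.push("viewport");
  return failed;
}
```

A non-empty result would typically map to target_not_interactable unless a recovery step such as scrolling resolves it.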


appAction: Direct domain-level execution via an App Registry.

  • SHOULD be preferred when available.
  • MUST respect the domain-level semantics of the Action.
  • MAY be executed without a concrete DOM target if the Action permits it.

semanticUi: Semantic execution in the Web context.

  • SHOULD prefer platform methods such as focus(), click(), scrollIntoView(), or control-specific setters.
  • SHOULD only produce synthetic event sequences when no semantically richer method exists.
  • MUST NOT rely on raw dispatchEvent() tricks as the default approach.

HTMLElement.click() simulates a click and fires the click event, provided the element is not disabled. Events created via dispatchEvent(), however, are not isTrusted, and features requiring activation demand trusted triggering input events according to the platform. Therefore, the Runtime MUST treat user_activation_required as a first-class case instead of pretending that JavaScript alone can conjure a genuine user interaction. ([MDN Web Docs][5])


externalDriver: Execution via external browser control.

  • MAY be used when an out-of-process driver is available.
  • MUST adhere to the same Action lifecycle and Verification rules.
  • WebDriver BiDi or equivalent mechanisms are permitted implementations.

inputSynthesis: Low-level input synthesis.

  • MAY be used when semanticUi is insufficient.
  • An in-page Executor MUST treat this mode as not user-activation-safe.
  • This mode SHOULD NOT be used for security-critical or browser-gated flows.

visionAssist: Visual fallback.

  • MAY only be used when other modes are unavailable or unsuccessful.
  • MUST document the chosen target point and subsequent verification.

Reads the state, text, value, or description of a target.

  • returnValue SHOULD carry the read state or content.

Focuses the target.

  • focus.target == instanceId

Presentation-only action.

  • Need not change the Web state.
  • verification.policy="none" is permissible.

Activates a target.

  • Prefer appAction
  • otherwise semanticUi with scrollIntoView() and activation
  • other modes only when necessary
  • capability-default or requested Success Signals

Writes text into an editable field.

  • Focus the target
  • Optionally clear the existing value
  • Set the value
  • Trigger app-compatible input semantics
  • Verify the resulting value
  • value.equals or element.state.textValue

Sets textual content to empty.

Selects an option in combobox, listbox, select, or similar controls.

Changes a binary or tri-state condition.

Opens or closes expandable targets.

Opens or closes modal or temporary surfaces.

Brings a target into the visible area.

Scrolls an element or the viewport.

Completes a form-like interaction.

Changes the route or URL.

Executes a declared Domain Action.


When verification is not set, the Runtime MUST use:

  1. success signals from the ActionDescriptor, if present
  2. otherwise success signals of the target, if present
  3. otherwise a profile-typical minimal verification

For the Web, minimal verification SHOULD look like this:

  • For ui.focus: Focus is on the target
  • For ui.enterText: Target value matches the expected value
  • For ui.activate: At least one plausible state change, e.g., route change, dialog transition, toast, state delta
  • For nav.navigate: Route or URL has changed
  • For ui.highlight: No state verification required

The verification policies are:

  • none: No formal verification
  • any: At least one signal must occur
  • all: All signals must occur
  • capability-default: App- or target-side defaults

When requireRevisionAdvance=true, at least one new PageGraph revision MUST be observed before success is declared.
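Policy evaluation can be sketched as a pure function over expected and observed signals. For brevity, signals are reduced to string keys here; the real SuccessSignal type is structured and matching it is out of scope for this sketch. The function name evaluateVerification is hypothetical, and it assumes that capability-default has already been resolved to a concrete policy and signal list before evaluation.

```typescript
// Concrete policies; "capability-default" is assumed to be resolved earlier.
type ConcretePolicy = "any" | "all" | "none";

interface EvaluationOptions {
  requireRevisionAdvance?: boolean;
  revisionAdvanced?: boolean; // whether a new PageGraph revision was observed
}

interface EvaluationResult {
  passed: boolean;
  missing: string[];
}

// Applies the policy to the signals, then the optional revision requirement.
function evaluateVerification(
  policy: ConcretePolicy,
  expected: readonly string[],
  observed: readonly string[],
  options: EvaluationOptions = {}
): EvaluationResult {
  const missing = expected.filter((signal) => observed.indexOf(signal) === -1);
  const matched = expected.length - missing.length;

  let signalsOk: boolean;
  if (policy === "none") {
    signalsOk = true; // no formal verification
  } else if (policy === "any") {
    signalsOk = matched >= 1; // at least one signal must occur
  } else {
    signalsOk = missing.length === 0; // all signals must occur
  }

  // Success is only declared once a new revision was seen, if required.
  const revisionOk = options.requireRevisionAdvance !== true || options.revisionAdvanced === true;
  return { passed: signalsOk && revisionOk, missing };
}
```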


The Runtime MAY attempt recovery but MUST remain cautious.

Permitted recovery:

  • Re-resolve on stale target
  • scrollIntoView on offscreen target
  • Short re-waits during transient busy/loading states
  • Re-verification after a documented UI update

Forbidden recovery:

  • Re-triggering non-idempotent Actions without reliable knowledge of the side-effect status
  • Skipping required confirmations
  • Accessing opaque cross-origin or closed shadow regions

On failure, sideEffectState MUST be set to one of:

  • none
  • applied
  • unknown

For non-idempotent Actions, unknown MUST be treated as a hard warning signal.
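The interaction between idempotency and sideEffectState can be sketched as a retry decision. The function name decideRetry is hypothetical. Only the refusal for non-idempotent Actions with unknown side-effect status follows directly from the text above; allowing retries for idempotent Actions and refusing them when the side effect is already applied are assumptions of this sketch.

```typescript
type SideEffectState = "none" | "applied" | "unknown";
type RetryDecision = "retry_allowed" | "unsafe_retry_refused";

// Decides whether a failed Action may be re-triggered.
function decideRetry(isIdempotent: boolean, sideEffectState: SideEffectState): RetryDecision {
  // ASSUMPTION: idempotent Actions are safe to repeat regardless of state.
  if (isIdempotent) return "retry_allowed";
  // Non-idempotent: only retry when it is known that nothing was applied.
  // "unknown" is the hard warning signal; "applied" would duplicate the effect.
  return sideEffectState === "none" ? "retry_allowed" : "unsafe_retry_refused";
}
```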


The Web intentionally restricts APIs that could cause poor user experiences or abuse; some features only work with active or previously completed User Activation, and the triggering input events must be trusted. Therefore, UIAP MUST be able to cleanly pause an Action and report waiting_for_user or user_activation_required, rather than trying to circumvent browser security boundaries with script trickery. ([MDN Web Docs][2])

When an Action fails due to this, the Executor MUST:

  1. Send action.progress with stage="waiting_for_user",
  2. Publish a comprehensible note text,
  3. Resume after actual user interaction, or fail cleanly.
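The three steps can be sketched as two small helpers: one that builds the progress payload for the pause, and one that settles the pause once it is known whether the user actually interacted. The names pauseForUserActivation and settleUserActivation are hypothetical, and marking the resulting error as retryable is an assumption.

```typescript
// Subset of ActionProgressPayload used while waiting for the user.
interface WaitingForUserProgress {
  actionHandle: string;
  stage: "waiting_for_user";
  note: string;
}

type ActivationOutcome =
  | { next: "EXECUTING" }
  | { next: "FAILED"; error: { code: "user_activation_required"; message: string; retryable: boolean } };

// Steps 1 and 2: build the progress payload with a human-readable note.
function pauseForUserActivation(actionHandle: string, note: string): WaitingForUserProgress {
  return { actionHandle, stage: "waiting_for_user", note };
}

// Step 3: resume after a genuine interaction, otherwise fail cleanly.
function settleUserActivation(userInteracted: boolean): ActivationOutcome {
  if (userInteracted) return { next: "EXECUTING" };
  return {
    next: "FAILED",
    error: {
      code: "user_activation_required",
      message: "Action needs a genuine user interaction that did not occur.",
      retryable: true, // ASSUMPTION: the Controller may ask the user again
    },
  };
}
```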

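  • target_not_found: No matching target found.
  • target_ambiguous: Multiple plausible targets, no clear winner.
  • target_not_interactable: Target exists but is not actionable.
  • user_activation_required: Execution requires genuine user interaction.
  • cross_origin_unavailable: Target resides in an opaque cross-origin region.
  • closed_shadow_unavailable: Target resides in a closed Shadow Root without a bridge.
  • verification_failed: Execution ran but expected signals did not occur.
  • unsafe_retry_refused: Retry would be impermissible for domain or security reasons.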


A conforming Runtime Executor MUST:

  • Accept action.request,
  • Support target resolution from stableId and semantic targets,
  • Perform precondition checks,
  • Support at least appAction or semanticUi,
  • Support Confirmation Flows,
  • Verify Success Signals,
  • Send action.result with verification and optionally error.

17. Example: ui.activate on a Submit Button

{
  "uiap": "0.1",
  "kind": "request",
  "type": "action.request",
  "id": "msg_77",
  "sessionId": "sess_123",
  "ts": "2026-03-26T14:03:00.000Z",
  "source": { "role": "agent", "id": "onboarding-agent" },
  "payload": {
    "actionId": "ui.activate",
    "target": {
      "ref": { "by": "stableId", "value": "video.submit" },
      "expectedRole": "button",
      "expectedName": "Video erstellen"
    },
    "verification": {
      "policy": "all",
      "signals": [
        { "kind": "route.changed", "pattern": "/videos/:id" },
        { "kind": "toast.contains", "text": "erstellt" }
      ],
      "timeoutMs": 8000,
      "requireRevisionAdvance": true
    },
    "presentation": {
      "ghostCursor": true,
      "highlight": "spotlight",
      "pace": "humanized"
    },
    "timeoutMs": 12000
  }
}

{
  "uiap": "0.1",
  "kind": "response",
  "type": "action.accepted",
  "id": "msg_78",
  "correlationId": "msg_77",
  "sessionId": "sess_123",
  "ts": "2026-03-26T14:03:00.015Z",
  "source": { "role": "bridge", "id": "web-runtime" },
  "payload": {
    "actionHandle": "act_991",
    "actionId": "ui.activate",
    "status": "accepted"
  }
}

{
  "uiap": "0.1",
  "kind": "event",
  "type": "action.progress",
  "id": "msg_79",
  "sessionId": "sess_123",
  "ts": "2026-03-26T14:03:00.020Z",
  "source": { "role": "bridge", "id": "web-runtime" },
  "payload": {
    "actionHandle": "act_991",
    "stage": "resolving_target",
    "resolvedTarget": {
      "by": "stableId",
      "instanceId": "el_submit",
      "stableId": "video.submit",
      "documentId": "doc_root",
      "scopeId": "scope_form",
      "role": "button",
      "name": "Video erstellen",
      "bbox": { "x": 120, "y": 420, "width": 180, "height": 40 }
    }
  }
}

{
  "uiap": "0.1",
  "kind": "event",
  "type": "action.result",
  "id": "msg_80",
  "sessionId": "sess_123",
  "ts": "2026-03-26T14:03:01.220Z",
  "source": { "role": "bridge", "id": "web-runtime" },
  "payload": {
    "actionHandle": "act_991",
    "actionId": "ui.activate",
    "status": "succeeded",
    "chosenExecutionMode": "semanticUi",
    "resolvedTarget": {
      "by": "stableId",
      "instanceId": "el_submit",
      "stableId": "video.submit",
      "documentId": "doc_root",
      "scopeId": "scope_form",
      "role": "button",
      "name": "Video erstellen"
    },
    "verification": {
      "passed": true,
      "policy": "all",
      "observed": [
        { "kind": "route.changed", "pattern": "/videos/:id" },
        { "kind": "toast.contains", "text": "erstellt" }
      ],
      "timeoutMs": 8000
    },
    "sideEffectState": "applied",
    "stateRevision": "rev_20"
  }
}

  • [UIAP-CORE] UIAP Core v0.1
  • [UIAP-CAP] UIAP Capability Model v0.1
  • [RFC2119] Key words for use in RFCs to Indicate Requirement Levels, BCP 14

Security considerations:

  • Action handlers MUST validate input parameters before executing domain operations.
  • sideEffectState MUST be reported correctly to make unintended state changes traceable.
  • Non-idempotent Actions SHOULD NOT perform automatic retries.
  • The Confirmation flow MUST be tamper-proof; a granted response MUST NOT originate from an unauthorized source.

Version  Date        Changes
0.1      2026-03-27  Initial draft