UIAP Agent Integration Guide
UIAP Agent Integration Guide v0.1
| Field | Value |
|---|---|
| Status | Informative (non-normative) |
| Version | 0.1 |
| Date | 2026-03-27 |
| Dependencies | [UIAP-CORE], [UIAP-CAP], [UIAP-WEB], [UIAP-ACTION], [UIAP-POLICY], [UIAP-WORKFLOW] |
| Editors | Patrick |
Companion document. Informative, not normative.
UIAP describes Transport, Capability Model, Web Profile, Runtime, Policy, and Workflows, but deliberately leaves open how an agent plans internally. This is exactly the point where everyone builds their own bridge. This guide describes a pragmatic integration layer for an LLM-based agent that takes UIAP artifacts seriously without making the model the source of truth.
Classification
The basic idea is simple:
- The complete UIAP state lives outside the LLM in a local State Store.
- The LLM sees only a redacted, compressed, and step-relevant view of this state.
- Capability, Policy, and Workflow are not dumped into the prompt as loose documentation, but translated into dedicated runtime components.
- Every write operation or non-trivial action passes through Policy Preflight, Action Runtime, and Observation/Verification.
- Success is never “believed” but derived from `action.result`, `web.state.delta`, and `web.signal`.
When Authoring Bundles are available, the agent integration should ideally consume the compiled bundle, not loose manifests. The bundle is the runtime-proximate source for effective Actions, Policies, and Workflows.
Note on the current draft state: The Policy Extension appears in the material sometimes as `uicp.policy` and elsewhere as `uiap.policy`. The examples in this guide use `uiap.policy.*`. A real implementation should alias both namespaces until this is cleaned up.
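One way to alias the two namespaces is to normalize message types at the transport boundary. The following is only a sketch: the helper name `normalizePolicyNamespace` and the choice of `uiap.policy` as the canonical prefix are assumptions made for illustration, not part of the draft.

```typescript
// Hypothetical helper: maps the legacy "uicp.policy" prefix onto "uiap.policy".
const LEGACY_POLICY_PREFIX = "uicp.policy";
const CANONICAL_POLICY_PREFIX = "uiap.policy";

function normalizePolicyNamespace(messageType: string): string {
  // Match only on a namespace boundary, so "uicp.policyx" stays untouched.
  if (
    messageType === LEGACY_POLICY_PREFIX ||
    messageType.startsWith(LEGACY_POLICY_PREFIX + ".")
  ) {
    return CANONICAL_POLICY_PREFIX + messageType.slice(LEGACY_POLICY_PREFIX.length);
  }
  return messageType;
}
```

Applied to every inbound message type before dispatch, this lets the rest of the host layer handle a single namespace.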
0. Recommended Integration Architecture
A sensible agent host layer separates six responsibilities:
- Session/Transport: Handshake, Capability Fetch, Observe Subscription, Action Requests.
- State Store: Holds the latest complete `PageGraph`, applies Deltas, buffers Signals and Revisions.
- Policy Gateway: Evaluates Policy before actions and additionally produces a redacted model view of Snapshot, Signals, and Return Values.
- Workflow Index: Matches available Workflows against goal, route, roles, grants, and mode.
- Tool Registry: Translates `ActionDescriptor[]` into LLM Tools or into a smaller, context-dependent tool selection.
- Context Builder: Builds a compact planner view from raw UIAP state for the LLM.
A minimal runtime model looks like this:

    interface AgentRuntime {
      state: StateStore;
      policy: PolicyGateway;
      workflows: WorkflowIndex;
      tools: ToolRegistry;
      context: ContextBuilder;
      planner: Planner;
    }

    interface PlannerInput {
      goal: string;
      policyHints: PolicyHints;
      context: PlanningContext;
      workflows: WorkflowRecipe[];
      tools: LLMTool[];
    }

The important thing is the separation between raw state and prompt state. The LLM must never be the sole store for route, scope, revision, or last results. Models forget, confuse, and cheerfully fabricate. Charming, but unhelpful.
1. PageGraph -> LLM Context
1.1 Do Not Dump the Entire Graph Into the Prompt
The PageGraph is already a semantically reduced web state, not a full DOM image. Still, it can grow large. The recommended strategy is therefore not “everything into the model” but a two-tier model:
- State Store holds the complete, last known graph.
- Context Builder produces a smaller planner view from it per turn.
The Planner typically does not need the full raw structure. For planning, these fields are primarily important:
| Primarily for Planning | Primarily for Execution / Recovery |
|---|---|
| `route.routeId`, `pathname`, `title` | `documentId` |
| `scopeId`, `stableId`, `role`, `name` | `bbox`, `zIndexHint` |
| States like `visible`, `enabled`, `focused`, `editable`, `required`, `invalid`, `open`, `busy`, `loading` | `targetHints.runtime.css`, `targetHints.runtime.xpath` |
| `affordances`, `supportedActions` | `semantics.attached`, `inViewport`, `obscured`, `stable` |
| `risk`, `success`, current signals, focus | `shadowHostId`, `framePath` |
| Annotated meaning like `meaning` or `defaultAction` | Low-level viewport/scroll details |
Rule of thumb:
- Planning works with stable semantics.
- Execution works with target resolution, actionability, and runtime hints.
- CSS/XPath almost never belong in the LLM planning context. They are technical fallbacks, not identity.
1.2 A Useful Planner View
For the LLM, an object like this usually suffices:

    type PlanningElement = {
      stableId?: string;
      scopeId?: string;
      role: string;
      name?: string;
      meaning?: string;
      defaultAction?: string;
      state: Record<string, unknown>;
      supportedActions: string[];
      risk?: {
        level: "safe" | "confirm" | "blocked";
        tags?: string[];
      };
      success?: Array<Record<string, unknown>>;
      confidence?: "high" | "medium" | "low";
    };

    interface PlanningContext {
      revision: string;
      route?: {
        routeId?: string;
        pathname?: string;
        title?: string;
      };
      activeScopes: Array<{
        scopeId: string;
        kind: string;
        stableId?: string;
        name?: string;
        parentScopeId?: string;
      }>;
      focus?: {
        stableId?: string;
        role?: string;
        name?: string;
      };
      candidateElements: PlanningElement[];
      recentSignals: Array<{
        kind: string;
        level?: string;
        text?: string;
        scopeId?: string;
      }>;
    }

1.3 Filter by Relevance Instead of Blindly Truncating
A workable heuristic for `candidateElements` prioritizes:
- Elements in open dialogs, drawers, popovers
- Elements in the focused scope
- Elements with `stableId`
- Elements with `defaultAction` or a domain action in `supportedActions`
- `invalid`, `required`, `busy`, `open`, or `selected` states
- Elements with `risk.level = confirm|blocked`
- Visible feedback elements like Toast, Alert, Status
And it deprioritizes:
- Purely decorative items
- Repeated list entries without current relevance
- Large amounts of text without control relevance
- Offscreen or obscured targets, as long as they are not needed for the current step
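The two lists above could be turned into a scoring function along the following lines. This is a sketch under assumptions: the weights, the `ScoringHints` shape, the feedback role names, and the rule that anything outside `ui.*` and `nav.*` counts as a domain action are all illustrative choices, not taken from the specification.

```typescript
// Illustrative element shape; field names follow the planner view in this guide.
interface ScorableElement {
  stableId?: string;
  scopeId?: string;
  role: string;
  defaultAction?: string;
  state: Record<string, unknown>;
  supportedActions: string[];
  risk?: { level: "safe" | "confirm" | "blocked" };
}

// Hypothetical hints computed by the host layer before scoring.
interface ScoringHints {
  overlayScopeIds: Set<string>; // open dialogs, drawers, popovers
  focusedScopeId?: string;
}

const FEEDBACK_ROLES = new Set(["alert", "status", "toast"]);
const NOTABLE_STATES = ["invalid", "required", "busy", "open", "selected"];

// Assumed weights; tune them per application.
function scoreElement(el: ScorableElement, hints: ScoringHints): number {
  let score = 0;
  if (el.scopeId && hints.overlayScopeIds.has(el.scopeId)) score += 5;
  if (el.scopeId && el.scopeId === hints.focusedScopeId) score += 4;
  if (el.stableId) score += 3;
  // Assumption: domain actions are anything outside the ui.* and nav.* primitives.
  const hasDomainAction = el.supportedActions.some(
    (id) => !id.startsWith("ui.") && !id.startsWith("nav.")
  );
  if (el.defaultAction || hasDomainAction) score += 3;
  if (NOTABLE_STATES.some((key) => el.state[key] === true)) score += 2;
  if (el.risk && el.risk.level !== "safe") score += 2;
  if (FEEDBACK_ROLES.has(el.role)) score += 2;
  // Deprioritize hidden or obscured targets.
  if (el.state["visible"] === false || el.state["obscured"] === true) score -= 4;
  return score;
}
```

Sorting by this score before truncating keeps a risky submit button in an open dialog ahead of decorative or hidden content, even under a small element budget.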
A simple builder function can look like this:

    function buildPlanningContext(graph: PageGraph, maxElements = 30): PlanningContext {
      const activeScopeIds = pickActiveScopes(graph);

      const candidateElements = graph.elements
        .filter((el) => isRelevantForPlanning(el, activeScopeIds))
        .map((el) => ({
          stableId: el.stableId,
          scopeId: el.scopeId,
          role: el.role,
          name: el.name,
          meaning: el.targetHints?.annotations?.meaning,
          defaultAction: el.targetHints?.annotations?.defaultAction,
          state: pickState(el.state, [
            "visible", "enabled", "focused", "editable", "required", "invalid",
            "selected", "expanded", "open", "busy", "loading"
          ]),
          supportedActions: el.supportedActions,
          risk: el.risk,
          success: el.success,
          confidence: deriveConfidence(el)
        }))
        .sort((a, b) => scorePlanningElement(b) - scorePlanningElement(a))
        .slice(0, maxElements);

      return {
        revision: graph.revision,
        route: graph.route
          ? { routeId: graph.route.routeId, pathname: graph.route.pathname, title: graph.route.title }
          : undefined,
        activeScopes: pickScopes(graph, activeScopeIds).map((scope) => ({
          scopeId: scope.scopeId,
          kind: scope.kind,
          stableId: scope.stableId,
          name: scope.name,
          parentScopeId: scope.parentScopeId
        })),
        focus: resolveFocus(graph),
        candidateElements,
        recentSignals: (graph.signals ?? []).slice(-8).map((sig) => ({
          kind: sig.kind,
          level: sig.level,
          text: sig.text,
          scopeId: sig.scopeId
        }))
      };
    }

1.4 Separate Planning Context and Execution Context
The model should not immediately see all execution details. For the next step, the semantic form is almost always sufficient:

    {
      "route": { "routeId": "videos.new", "title": "New Video" },
      "activeScopes": [
        { "scopeId": "scope_form", "kind": "form", "stableId": "video.create.form", "name": "Create Video" }
      ],
      "focus": { "stableId": "video.title", "role": "textbox", "name": "Title" },
      "candidateElements": [
        {
          "stableId": "video.title",
          "role": "textbox",
          "name": "Title",
          "state": { "enabled": true, "required": true },
          "supportedActions": ["ui.focus", "ui.enterText", "ui.clearText"]
        },
        {
          "stableId": "video.submit",
          "role": "button",
          "name": "Create Video",
          "state": { "enabled": true },
          "supportedActions": ["ui.activate", "video.create"],
          "risk": { "level": "confirm", "tags": ["external_effect"] }
        }
      ]
    }

When the Planner actually selects an action, the host layer can enrich the target with additional information for execution:

    interface ExecutionTarget {
      stableId?: string;
      documentId: string;
      scopeId?: string;
      role: string;
      name?: string;
      bbox?: { x: number; y: number; width: number; height: number };
      runtimeHints?: { css?: string; xpath?: string };
      actionability?: {
        attached?: boolean;
        inViewport?: boolean;
        obscured?: boolean;
        stable?: boolean;
      };
    }

1.5 Redaction First, Compression Second
If the snapshot contains sensitive values, the host layer should redact first and then build the planner context. Otherwise the masked password field may stay out of the prompt while its raw value still sits in `textValue` or `semanticValue` of an earlier intermediate representation. People call this kind of thing a “minor oversight.” Auditors do not.
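The ordering can be made explicit in code. The sketch below assumes a hypothetical `sensitive` flag set by the Policy Gateway and a simplified element shape; only `textValue` and `semanticValue` are names taken from the text above.

```typescript
// Assumed shape: `sensitive` is a hypothetical flag set by the Policy Gateway.
interface RawElement {
  stableId?: string;
  role: string;
  name?: string;
  textValue?: string;
  semanticValue?: unknown;
  sensitive?: boolean;
}

const PLACEHOLDER = "[REDACTED]";

// Step 1: redact. Returns a copy, so the State Store keeps the raw value.
function redactElement(el: RawElement): RawElement {
  if (!el.sensitive) return { ...el };
  return {
    ...el,
    textValue: el.textValue === undefined ? undefined : PLACEHOLDER,
    semanticValue: el.semanticValue === undefined ? undefined : PLACEHOLDER,
  };
}

// Step 2: compress. This only ever runs on redacted copies.
function toPlannerView(
  elements: RawElement[]
): Array<{ stableId?: string; role: string; name?: string; value?: string }> {
  return elements.map(redactElement).map((el) => ({
    stableId: el.stableId,
    role: el.role,
    name: el.name,
    value: el.textValue,
  }));
}
```

Because compression consumes only the output of `redactElement`, no intermediate representation on the path to the prompt ever holds the raw value.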
2. Capability Document -> Tool Schema
2.1 Yes, the Capability Document Can Be Directly Translated Into Tools
The Capability Document is practically a template for tool definitions already. An `ActionDescriptor` contains:
- `id`
- `kind`
- `targetKinds`
- `requiredAffordances`
- `executionModes`
- `args`
- `idempotency`
- `risk`
- `success`
Pragmatically, this means:
- Domain Actions are almost always exposed as individual tools.
- Primitive Actions can either be exposed directly or routed through a generic `run_uiap_action` tool.
- The tools currently visible in the prompt are ideally the intersection of the global Capability Document and local `supportedActions` on currently relevant elements/scopes.
2.2 Two Workable Variants
Variant A: One Tool per Action
Good for smaller apps or when you want to explicitly give the model many domain actions.
Variant B: Generic Runtime Tool + Action Shortlist
Good for large catalogs. The model then gets a small list of allowed actions in its context and calls a single tool that builds action.request.
In practice, B is often more stable because the model does not have to juggle 200 function names like a sleep-deprived circus director.
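Variant B might look roughly like this. The tool name follows the `run_uiap_action` example mentioned earlier; the schema layout, the `GenericRuntimeTool` type, and the builder function are assumptions for illustration.

```typescript
// Sketch of Variant B: one generic tool whose actionId is limited to a shortlist.
interface GenericRuntimeTool {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    additionalProperties: false;
    properties: {
      actionId: { type: "string"; enum: string[] };
      target: Record<string, unknown>;
      args: { type: "object" };
    };
    required: string[];
  };
}

function buildGenericRuntimeTool(shortlist: string[]): GenericRuntimeTool {
  // Deduplicate while preserving order so the enum stays stable across turns.
  const actionIds = Array.from(new Set(shortlist));
  return {
    name: "run_uiap_action",
    description: "Run one of the currently allowed UIAP actions: " + actionIds.join(", "),
    inputSchema: {
      type: "object",
      additionalProperties: false,
      properties: {
        actionId: { type: "string", enum: actionIds },
        target: {
          type: "object",
          additionalProperties: false,
          properties: { stableId: { type: "string" }, scopeId: { type: "string" } },
        },
        args: { type: "object" },
      },
      required: ["actionId"],
    },
  };
}
```

The `enum` constraint keeps the model from inventing action ids. Per-action argument validation then has to happen in the host layer, since the generic `args` object is deliberately loose.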
2.3 Mapping to a Tool Schema

    interface LLMTool {
      name: string;
      description: string;
      inputSchema: Record<string, unknown>;
      meta: {
        uiapActionId: string;
        risk: ActionDescriptor["risk"];
        idempotency?: ActionDescriptor["idempotency"];
        executionModes: ActionDescriptor["executionModes"];
        success?: ActionDescriptor["success"];
      };
    }

    function compileActionTool(action: ActionDescriptor): LLMTool {
      const needsTarget = action.targetKinds.some((kind) => kind !== "none");

      const properties: Record<string, unknown> = {};
      const required: string[] = [];

      if (needsTarget) {
        properties.target = {
          type: "object",
          additionalProperties: false,
          properties: {
            stableId: { type: "string" },
            scopeId: { type: "string" },
            role: { type: "string" },
            name: { type: "string" }
          }
        };
        required.push("target");
      }

      for (const arg of action.args ?? []) {
        properties[arg.name] = argToJsonSchema(arg);
        if (arg.required) required.push(arg.name);
      }

      return {
        name: action.id.replaceAll(".", "_").replaceAll("-", "_"),
        description: action.description ?? action.title ?? action.id,
        inputSchema: {
          type: "object",
          additionalProperties: false,
          properties,
          required
        },
        meta: {
          uiapActionId: action.id,
          risk: action.risk,
          idempotency: action.idempotency,
          executionModes: action.executionModes,
          success: action.success
        }
      };
    }

    function argToJsonSchema(arg: ActionArgDescriptor): Record<string, unknown> {
      if (arg.type === "enum") {
        return { type: "string", enum: arg.enum ?? [] };
      }

      if (arg.type === "array") return { type: "array" };
      if (arg.type === "object") return { type: "object" };
      if (arg.type === "number") return { type: "number" };
      if (arg.type === "boolean") return { type: "boolean" };

      return { type: "string" };
    }

2.4 Do Not Confuse Expected Results With Tool Args
`success` signals generally do not belong in the tool input, but in the tool metadata or in the controller logic. The tool should semantically say: “I want `video.create` with these args,” not: “And here are eight internal verification details, because the model gets anxious otherwise.”
A workable tool return is shaped more like this:

    type ToolResult =
      | { status: "accepted"; actionHandle: string }
      | { status: "waiting_confirmation"; actionHandle: string; preview?: unknown }
      | { status: "waiting_user"; note: string }
      | { status: "completed"; result: ActionResultPayload }
      | { status: "blocked"; reason: string };

2.5 Curated Tool Exposition
Not every Capability needs to be visible to the LLM at all times. A good host layer shows per turn only:
- Actions that match the current route or active scopes
- Actions backed by visible `supportedActions`
- Domain actions from top workflows
- A few safe primitives like `ui.read`, `ui.focus`, `ui.enterText`, `ui.activate`, `nav.navigate`
The rest remains internally available but outside the current planner budget.
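A per-turn selection could be sketched as follows. The function name, the priority order, and the `maxTools` default are assumptions; route and scope matching is assumed to have happened already when `locallySupported` was collected.

```typescript
// Safe primitives as listed above.
const SAFE_PRIMITIVES = ["ui.read", "ui.focus", "ui.enterText", "ui.activate", "nav.navigate"];

// Assumed inputs: ids from the global Capability Document, ids found in
// supportedActions of currently relevant elements, and ids used by top workflows.
function selectActionIdsForTurn(
  capabilityActionIds: string[],
  locallySupported: string[],
  workflowActionIds: string[],
  maxTools = 15
): string[] {
  const known = new Set(capabilityActionIds);
  const selected: string[] = [];
  // Order encodes priority: local actions, then workflow actions, then safe primitives.
  for (const id of [...locallySupported, ...workflowActionIds, ...SAFE_PRIMITIVES]) {
    // Only expose actions the Capability Document actually declares.
    if (known.has(id) && selected.indexOf(id) === -1) selected.push(id);
  }
  return selected.slice(0, maxTools);
}
```

Actions declared globally but not backed by anything on the current screen, such as a settings action on an unrelated route, stay out of the prompt until they become relevant.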
3. Workflow Definitions -> Planning Context
3.1 Workflows Are Neither Holy Scripture Nor Mere Decoration
An agent should use available workflow definitions in three modes:
- As Executable Recipes: When a workflow matches well, the required inputs are known, and the desired mode (`guide`, `assist`, `auto`) is permitted, `uiap.workflow.start` is often better than freeform step-by-step planning.
- As Plan Skeletons: When the workflow fundamentally fits but the current UI or inputs deviate slightly, the planner can use the steps as a template and adapt locally.
- As Negative Guardrails: `handoff`, `collect`, `ensure`, and `confirm` patterns tell the agent where it should not get creative.
3.2 Do Not Put the Entire Catalog Into the Prompt
A workflow catalog can be large. In the planner context, the top 1 to 3 candidates usually suffice. The matching process matters:
- Match intent or goal text
- Check `routeId` and current scopes
- Check `requiredActions` against capabilities
- Check roles, grants, and policy situation
- Filter by `mode`

    interface WorkflowRecipe {
      workflowId: string;
      score: number;
      reason: string;
      missingInputs: string[];
      steps: Array<{
        id: string;
        type: string;
        actionId?: string;
        parameterNames?: string[];
      }>;
    }

    async function buildWorkflowRecipes(goal: string, routeId?: string): Promise<WorkflowRecipe[]> {
      const matches = await workflowClient.match({
        intent: goal,
        routeId,
        mode: "assist",
        maxResults: 3
      });

      return matches.candidates.map((candidate) => ({
        workflowId: candidate.workflowId,
        score: candidate.score,
        reason: candidate.reason ?? "workflow matched",
        missingInputs: candidate.missingInputs ?? [],
        steps: projectWorkflow(candidate.workflowId)
      }));
    }

3.3 What the LLM Should Actually See
Instead of the full definition, this format often suffices:

    [
      {
        "workflowId": "video.create_first_video",
        "score": 0.92,
        "missingInputs": ["title"],
        "steps": [
          { "id": "collect_title", "type": "collect", "parameterNames": ["title"] },
          { "id": "go_to_form", "type": "action", "actionId": "nav.navigate" },
          { "id": "fill_title", "type": "action", "actionId": "ui.enterText" },
          { "id": "create_video", "type": "action", "actionId": "video.create" },
          { "id": "done", "type": "complete" }
        ]
      }
    ]

For planning purposes, this is usually far more useful than the complete workflow definition with every localization and review detail.
3.4 Take Authoring and Discovery Provenance Seriously
When workflows come from an authoritative bundle, they can serve as genuine recipes. When they are merely discovery candidates, they should be treated more as hints or skeletons, not as autonomous truth. Discovery delivers candidates with confidence and review requirements; Authoring turns them into published, effective runtime artifacts.
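Provenance and the three usage modes from 3.1 could be combined into a small classification step. Everything in this sketch is hypothetical: the `provenance` and `reviewed` fields, the usage labels, and the `0.8` score threshold are assumptions, not part of the workflow specification.

```typescript
// Hypothetical usage labels mirroring the modes described in 3.1.
type WorkflowUsage = "executable_recipe" | "plan_skeleton" | "hint_only";

// Hypothetical candidate shape; `provenance` and `reviewed` are assumed fields.
interface WorkflowCandidate {
  workflowId: string;
  provenance: "bundle" | "discovery";
  reviewed?: boolean;
  score: number; // match score, assumed to lie in [0, 1]
  missingInputs: string[];
}

function classifyWorkflowUsage(candidate: WorkflowCandidate, minScore = 0.8): WorkflowUsage {
  // Unreviewed discovery output never becomes an executable recipe.
  if (candidate.provenance === "discovery" && !candidate.reviewed) return "hint_only";
  const readyToRun = candidate.score >= minScore && candidate.missingInputs.length === 0;
  if (candidate.provenance === "bundle" && readyToRun) return "executable_recipe";
  // Everything else can still guide planning, but the engine does not lead.
  return "plan_skeleton";
}
```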
3.5 A Good Division of Labor Between LLM and Workflow Engine
A robust division of labor looks like this:
- Workflow Engine: Applicability, step ordering, checkpoints, policy integration, resume
- LLM: Support intent matching, procure missing inputs, formulate suggestions, locally adapt when UI and recipe diverge, guide the user intelligibly through `waiting_confirmation` or `waiting_user`
When the workflow fits cleanly, the engine should lead. Freeform planning is not heroic when a good recipe already exists.
4. Policy as Constraint
4.1 Policy Does Not Primarily Belong in the Prompt
The Policy specification is deliberately decision-oriented, locally enforceable, and to be evaluated before non-trivial actions. Practically, this means:
- The Policy document itself is not the primary interface to the LLM.
- The Policy decision is the operative truth.
- The LLM gets at most a compact summary of global rules plus the concrete decision per action.
The sensible structure is three-layered:
- System/Planner Hints: Short, stable rules such as: no credential entry, `confirm` does not mean continue autonomously, `handoff` is not a failure.
- Policy Summary in Context: What grants does the current Principal have? Which domains tend toward handoff or deny?
- Preflight Before Every Action: The host layer calls `policy.evaluate` and treats the result as a hard constraint.
4.2 A Small Planner View Usually Suffices

    interface PolicyHints {
      principal: {
        id: string;
        roles?: string[];
        grants?: string[];
      };
      hardStops: string[];
      confirmRules: string[];
      handoffRules: string[];
      redaction: Array<{
        applyTo: string[];
        replacement: string;
      }>;
    }

Example:

    {
      "principal": {
        "id": "workspace-admin",
        "roles": ["admin"],
        "grants": ["observe", "guide", "draft", "act"]
      },
      "hardStops": [
        "credential/secret data is never exposed directly to the model",
        "blocked actions are not executed autonomously"
      ],
      "confirmRules": [
        "confirm-risk actions require explicit confirmation before execution"
      ],
      "handoffRules": [
        "user activation, credential entry and payment approval cause handoff"
      ],
      "redaction": [
        { "applyTo": ["snapshot", "audit", "returnValue"], "replacement": "[REDACTED]" }
      ]
    }

4.3 Preflight Is the Actual Enforcement

    async function preflightAction(input: {
      actionId: string;
      target?: { stableId?: string; role?: string; scopeId?: string; name?: string };
      risk?: RiskDescriptor;
      dataClasses?: string[];
      sideEffectClass?: string;
      args?: Record<string, unknown>;
    }): Promise<PolicyDecision> {
      return policyClient.evaluate({
        context: {
          principal: currentPrincipal,
          actionId: input.actionId,
          target: input.target,
          risk: input.risk,
          dataClasses: input.dataClasses,
          sideEffectClass: input.sideEffectClass,
          args: input.args
        }
      });
    }

The result is then treated strictly:

    function handlePolicyDecision(decision: PolicyDecision):
      | { kind: "proceed" }
      | { kind: "confirm"; obligations?: unknown[] }
      | { kind: "handoff"; obligations?: unknown[] }
      | { kind: "deny"; reasonCodes: string[] } {
      switch (decision.decision) {
        case "allow":
          return { kind: "proceed" };
        case "confirm":
          return { kind: "confirm", obligations: decision.obligations };
        case "handoff":
          return { kind: "handoff", obligations: decision.obligations };
        case "deny":
        default:
          return { kind: "deny", reasonCodes: decision.reasonCodes };
      }
    }

4.4 Keep Redaction and Action Admissibility Separate
Policy models redaction separately from action permission. This matters for agent integration:
- A field can be redacted without the entire screen disappearing from planning.
- A snapshot for the model can contain placeholders while the host layer continues to work internally with structural knowledge.
- The LLM should see red zones as visible but masked, not as “invisible,” if the surroundings would otherwise become unintelligible.
4.5 Treat confirm and handoff as Structured States
These cases should not end as free-form chat instructions like “I need a moment of help.” Better is a structured state in the controller:
- `confirm` -> explicit confirmation request, then `action.confirmation.grant` or `deny`
- `handoff` -> clear transfer to user responsibility
- `deny` -> alternative planning
The model may explain, but must not simulate enforcement.
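Such a structured state could be modeled as a discriminated union with explicit transitions. The state names, the `toPendingState` and `resolveConfirmation` helpers, and the `confirmation_denied` reason code are assumptions made for this sketch.

```typescript
type PolicyOutcome = "allow" | "confirm" | "handoff" | "deny";

// Hypothetical controller states; the model can describe them but not change them.
type PendingState =
  | { kind: "ready"; actionId: string }
  | { kind: "awaiting_confirmation"; actionId: string }
  | { kind: "awaiting_user"; actionId: string }
  | { kind: "replanning"; reasonCodes: string[] };

function toPendingState(
  outcome: PolicyOutcome,
  actionId: string,
  reasonCodes: string[] = []
): PendingState {
  switch (outcome) {
    case "allow":
      return { kind: "ready", actionId };
    case "confirm":
      return { kind: "awaiting_confirmation", actionId };
    case "handoff":
      return { kind: "awaiting_user", actionId };
    default:
      // Treat "deny" and any unknown outcome as a reason to replan.
      return { kind: "replanning", reasonCodes };
  }
}

// Only an explicit user answer moves a pending confirmation forward.
function resolveConfirmation(state: PendingState, granted: boolean): PendingState {
  if (state.kind !== "awaiting_confirmation") return state;
  return granted
    ? { kind: "ready", actionId: state.actionId }
    : { kind: "replanning", reasonCodes: ["confirmation_denied"] };
}
```

Because the transition out of `awaiting_confirmation` takes a boolean from the host layer, no amount of model output can advance a confirm-risk action by itself.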
5. Observation Loop
5.1 Recommended Standard Loop
A sensible agent loop looks like this:
- Establish session
- Load Capabilities / Policy / Workflows
- Start Snapshot or Observe stream
- Build redacted planning context
- Plan next step
- Policy Preflight
- Execute Action or Workflow
- Observe result, deltas, and signals
- Update context and replan
With UIAP messages, this is typically:

    agent -> session.initialize
    app -> session.initialized

    agent -> capabilities.get
    agent -> uiap.policy.get (or uicp.policy.get in older drafts)
    agent -> uiap.workflow.get
    agent -> web.observe.start

    app -> capabilities.list
    app -> uiap.policy.document
    app -> uiap.workflow.document
    app -> web.state.snapshot
    app -> web.state.delta*
    app -> web.signal*

    agent -> action.request
    app -> action.accepted
    app -> action.progress*
    app -> action.confirmation.request?
    agent -> action.confirmation.grant / deny?
    app -> action.result
    app -> web.state.delta*
    app -> web.signal*

5.2 Maintain a Local State Store
The State Store is not glamorous, but indispensable.

    class StateStore {
      private graph?: PageGraph;
      private revision?: string;
      private recentSignals: WebSignal[] = [];

      applySnapshot(graph: PageGraph) {
        this.graph = graph;
        this.revision = graph.revision;
        this.recentSignals = graph.signals ?? [];
      }

      applyDelta(delta: WebStateDeltaPayload) {
        if (!this.graph || delta.baseRevision !== this.revision) {
          throw new Error("Revision gap: full snapshot required");
        }

        this.graph = applyWebDelta(this.graph, delta.ops);
        this.graph.revision = delta.revision;
        this.revision = delta.revision;
        this.recentSignals.push(...(delta.signals ?? []));
        this.recentSignals = this.recentSignals.slice(-20);
      }

      current(): PageGraph {
        if (!this.graph) throw new Error("No snapshot available");
        return this.graph;
      }
    }

When `baseRevision` does not match, the agent should not guess but request a new `web.state.get` snapshot.
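The revision check can also be factored into a small pure function, so the caller decides between applying a delta and resynchronizing without relying on exceptions. The names `DeltaEnvelope`, `DeltaDisposition`, and `classifyDelta` are assumptions for this sketch.

```typescript
// Minimal view of a delta; only the two revision fields matter here.
interface DeltaEnvelope {
  baseRevision: string;
  revision: string;
}

type DeltaDisposition =
  | { kind: "apply" }
  | { kind: "resync"; reason: string };

function classifyDelta(
  currentRevision: string | undefined,
  delta: DeltaEnvelope
): DeltaDisposition {
  if (currentRevision === undefined) {
    return { kind: "resync", reason: "no snapshot yet" };
  }
  if (delta.baseRevision !== currentRevision) {
    // A gap means at least one delta was missed; patching further would corrupt state.
    return {
      kind: "resync",
      reason: `expected base ${currentRevision}, got ${delta.baseRevision}`,
    };
  }
  return { kind: "apply" };
}
```

On `resync`, the host layer would issue `web.state.get` and feed the response into `applySnapshot`; deltas that arrive in the meantime can be dropped, since the fresh snapshot supersedes them.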
5.3 A Turn Controller

    class UIAPAgentController {
      constructor(
        private readonly state: StateStore,
        private readonly policy: PolicyGateway,
        private readonly tools: ToolRegistry,
        private readonly workflows: WorkflowIndex,
        private readonly planner: Planner,
        private readonly runtime: RuntimeClient
      ) {}

      async next(goal: string) {
        const rawGraph = this.state.current();
        const modelGraph = await this.policy.redactGraph(rawGraph);
        const context = buildPlanningContext(modelGraph);

        const plannerInput: PlannerInput = {
          goal,
          policyHints: await this.policy.hints(),
          context,
          workflows: await this.workflows.match(goal, context),
          tools: this.tools.forContext(context)
        };

        const plan = await this.planner.next(plannerInput);

        if (plan.kind === "workflow.start") {
          return this.runtime.startWorkflow(plan.workflowId, plan.inputs);
        }

        if (plan.kind === "action") {
          const decision = await this.policy.preflight(plan.toPolicyContext());
          const policyResult = handlePolicyDecision(decision);

          if (policyResult.kind === "deny") {
            return { kind: "replan", reason: policyResult.reasonCodes.join(",") };
          }

          if (policyResult.kind === "handoff") {
            return { kind: "waiting_user", obligations: policyResult.obligations };
          }

          if (policyResult.kind === "confirm") {
            return { kind: "waiting_confirmation", obligations: policyResult.obligations };
          }

          return this.runtime.requestAction(plan);
        }

        return { kind: "respond", message: plan.message };
      }
    }

5.4 Take Verification Seriously
`action.result` is important, but not the sole truth. Good controllers use at least three signals:
- `action.result.verification`
- Observed `web.signal` events
- `web.state.delta` / revision progress
Especially with ui.activate, ui.submit, or domain actions, the agent should not infer success merely from a tool saying “ok.” A clean UI transition, toast, dialog state, or entity signal is considerably more reliable.
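Combining the three sources might look like the sketch below. The `VerificationEvidence` shape is a hypothetical summary built by the host layer, and the rule that two independent corroborations are required is an assumption, not a protocol requirement.

```typescript
// Hypothetical summary of what the controller observed after an action request.
interface VerificationEvidence {
  resultStatus: "ok" | "failed"; // assumed summary of action.result
  verificationPassed?: boolean; // assumed summary of action.result.verification
  signalKinds: string[]; // kinds of web.signal events seen after the request
  revisionBefore: string;
  revisionAfter: string;
}

type Verdict = "verified" | "unverified" | "failed";

function verifyOutcome(evidence: VerificationEvidence, expectedSignalKinds: string[]): Verdict {
  if (evidence.resultStatus === "failed" || evidence.verificationPassed === false) {
    return "failed";
  }
  const sawExpectedSignal = expectedSignalKinds.some(
    (kind) => evidence.signalKinds.indexOf(kind) !== -1
  );
  const stateAdvanced = evidence.revisionAfter !== evidence.revisionBefore;

  let corroborations = 0;
  if (evidence.verificationPassed === true) corroborations += 1;
  if (sawExpectedSignal) corroborations += 1;
  if (stateAdvanced) corroborations += 1;

  // Assumption: an "ok" result alone is not enough; require two independent sources.
  return corroborations >= 2 ? "verified" : "unverified";
}
```

An `unverified` verdict is a cue to wait for further deltas or load more scope detail before the controller reports success.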
5.5 Workflow Loop Instead of Individual Actions
When a workflow has been started, the loop is similar but shifts to workflow messages:
- `uiap.workflow.started`
- `uiap.workflow.progress`
- `uiap.workflow.input.request`
- `uiap.workflow.input.provide`
- `uiap.workflow.result`
The controller should still consider workflow and action events together, because action steps internally run through the runtime again.
6. Context Budget Management
6.1 The Key Principle
PageGraph can grow large. Yet the solution is almost never to compress even more aggressively and throw away semantics until only “there’s something with a button” remains. A better approach is progressive detailing:
- Overview first
- Then the active scope
- Then targeted detail request for the current step
6.2 Three Budget Tiers
Tier A: Overview Context
For the planner start:
- Route
- Active dialogs/drawers/popovers
- Focus
- 15 to 30 relevant elements
- Last 5 to 10 signals
- 1 to 3 workflow recipes
- Curated tool list
Tier B: Scope Detail
When a particular area becomes relevant:
- Only one or two scopes
- All important controls in the scope
- Validation and status states
- Possibly relations like label -> field or submit -> form
Tier C: Execution Detail
Only when a concrete action is imminent or ambiguity exists:
- `documentId`
- Actionability fields
- `bbox`
- Runtime hints
- Possibly additional same-named candidates for disambiguation
6.3 Using Scope Filtering in Practice
The Web Profile already provides the right levers for this:
- `web.state.get.scopes`
- `web.state.get.documents`
- `includeHidden`
- `includeNonInteractive`
- `maxNodes`
A host layer can thus load targeted details:

    async function ensureScopeDetail(scopeId: string) {
      if (stateHasEnoughScopeDetail(scopeId)) return;

      const snapshot = await transport.request<WebStateSnapshotPayload>("web.state.get", {
        scopes: [scopeId],
        includeHidden: false,
        includeNonInteractive: false,
        maxNodes: 80
      });

      stateStore.applySnapshot(mergeScopedSnapshot(stateStore.current(), snapshot.graph));
    }

6.4 Element Prioritization
A robust prioritization favors:
- `stableId` over purely heuristic targets
- Visible, interactive elements over passive text
- Open modal scope over background content
- invalid/required/busy/open over neutral state
- Elements with domain action over purely generic controls
- Annotated or registry-backed semantics over `inferred`
When WebSemantics.sources or discovery evidence relies only on heuristics, the planner should treat this as lower confidence. This helps the model, on fuzzy surfaces, ask for detail or confirmation rather than confidently building nonsense. A rare but lovely virtue.
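A confidence derivation along these lines is one option. The source labels `annotated`, `registry`, and `inferred` are assumed values for `WebSemantics.sources`, and the three-level mapping is an illustrative choice.

```typescript
type Confidence = "high" | "medium" | "low";

// Assumption: sources contains labels such as "annotated", "registry", or "inferred".
function deriveConfidenceFromSources(sources: string[], hasStableId: boolean): Confidence {
  const authoritative = sources.some(
    (source) => source === "annotated" || source === "registry"
  );
  if (authoritative && hasStableId) return "high";
  if (authoritative || hasStableId) return "medium";
  // Heuristics only, and no stable identity: the planner should ask or load detail.
  return "low";
}
```

A planner prompt can then instruct the model to request scope detail or confirmation before acting on any `low` element.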
6.5 Summarize Collections
Large tables or lists should not land fully in the prompt. A better approach is a summary shape like:

    interface CollectionSummary {
      scopeId: string;
      name?: string;
      count: number;
      visibleItems: Array<{
        stableId?: string;
        role: string;
        name?: string;
        selected?: boolean;
        supportedActions: string[];
      }>;
      omittedCount: number;
    }

The planner then sees something like:
- “37 invoices visible, 5 in the current viewport, 1 of which is selected”
- “each row item supports `ui.activate` and `billing.openSettings`”
Only when a specific row becomes relevant is its scope or row context loaded on demand.
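Producing such a summary could be sketched as follows. The `CollectionItem` input shape with its `inViewport` flag, the ranking of selected items ahead of viewport items, and the `maxVisible` default are assumptions.

```typescript
// Assumed per-row input; `inViewport` is a hypothetical flag from the host layer.
interface CollectionItem {
  stableId?: string;
  role: string;
  name?: string;
  selected?: boolean;
  inViewport?: boolean;
  supportedActions: string[];
}

interface CollectionSummary {
  scopeId: string;
  name?: string;
  count: number;
  visibleItems: Array<{
    stableId?: string;
    role: string;
    name?: string;
    selected?: boolean;
    supportedActions: string[];
  }>;
  omittedCount: number;
}

function summarizeCollection(
  scopeId: string,
  items: CollectionItem[],
  maxVisible = 5,
  label?: string
): CollectionSummary {
  // Selected rows rank above rows that are merely in the viewport.
  const rank = (item: CollectionItem) => (item.selected ? 2 : 0) + (item.inViewport ? 1 : 0);
  const visibleItems = items
    .filter((item) => rank(item) > 0)
    .sort((a, b) => rank(b) - rank(a))
    .slice(0, maxVisible)
    .map(({ stableId, role, name, selected, supportedActions }) => ({
      stableId, role, name, selected, supportedActions,
    }));
  return {
    scopeId,
    name: label,
    count: items.length,
    visibleItems,
    omittedCount: items.length - visibleItems.length,
  };
}
```

The `omittedCount` field tells the planner that more rows exist, which is what makes an on-demand detail request a sensible next step.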
6.6 Separate Cached and Stable Contexts From Live Context
Not everything belongs in every turn:
- Stable across session: Capabilities, global policy hints, workflow metadata
- Slowly changing: Route, visible main scopes, principal role
- Highly dynamic: Focus, validation, toasts, busy states, action results
A good prompt contains only what is currently needed. Capabilities or workflow descriptions do not need to be re-serialized every time a spinner appears.
6.7 Recommended Default Budget per Planner Turn
As a starting point, this often works:
- 1 route context
- Up to 4 active scopes
- 20 to 30 elements
- 8 current signals
- 1 to 3 workflow candidates
- 8 to 15 tools or 1 generic runtime tool + action shortlist
This is small enough for plannable turns and large enough that the model still genuinely understands the interface.
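The default budget could be encoded as a configuration object plus a trimming step. The type and function names are assumptions, and the sketch assumes that scopes, elements, workflows, and tools arrive sorted by priority while signals arrive oldest first.

```typescript
interface TurnBudget {
  maxScopes: number;
  maxElements: number;
  maxSignals: number;
  maxWorkflows: number;
  maxTools: number;
}

// Upper bounds taken from the list above.
const DEFAULT_TURN_BUDGET: TurnBudget = {
  maxScopes: 4,
  maxElements: 30,
  maxSignals: 8,
  maxWorkflows: 3,
  maxTools: 15,
};

interface TurnParts<S, E, G, W, T> {
  scopes: S[];
  elements: E[];
  signals: G[];
  workflows: W[];
  tools: T[];
}

function applyBudget<S, E, G, W, T>(
  turn: TurnParts<S, E, G, W, T>,
  budget: TurnBudget = DEFAULT_TURN_BUDGET
): TurnParts<S, E, G, W, T> {
  return {
    scopes: turn.scopes.slice(0, budget.maxScopes),
    elements: turn.elements.slice(0, budget.maxElements),
    // Keep the most recent signals; guard against slice(-0) returning everything.
    signals: budget.maxSignals > 0 ? turn.signals.slice(-budget.maxSignals) : [],
    workflows: turn.workflows.slice(0, budget.maxWorkflows),
    tools: turn.tools.slice(0, budget.maxTools),
  };
}
```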
7. Practical Guidelines for a Robust Integration
7.1 The LLM Plans, but the System Decides
The model may make suggestions such as:
- Next action
- Preferred workflow
- Required additional details
- Natural language to the user
The host layer, however, decides on:
- Policy admissibility
- Redaction
- Actual action requests
- Confirmation/Handoff
- State authority and revisions
7.2 Think supportedActions Locally, actions[] Globally
The Capability Document says what is fundamentally possible. The PageGraph says what is sensibly addressable here and now on the current screen. An agent should combine both levels, not confuse them.
7.3 When in Doubt, Load More Details Rather Than Guessing
When two buttons share the same name or a target was only found heuristically, the right response is not “it’ll probably work,” but:
- Narrow the scope
- Explicitly show target candidates in the context
- If needed, include `bbox` or neighborhood information
- Or obtain additional clarification from the user/workflow
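Detecting the ambiguous case before acting can be as small as the sketch below. The `TargetCandidate` shape and the function name are assumptions; matching on role and name only is a deliberately crude stand-in for real target resolution.

```typescript
interface TargetCandidate {
  stableId?: string;
  scopeId?: string;
  role: string;
  name?: string;
}

// Returns the competing candidates when more than one matches, otherwise an empty list.
function findAmbiguousTargets(
  candidates: TargetCandidate[],
  role: string,
  name: string,
  scopeId?: string
): TargetCandidate[] {
  const matches = candidates.filter(
    (candidate) =>
      candidate.role === role &&
      candidate.name === name &&
      (scopeId === undefined || candidate.scopeId === scopeId)
  );
  return matches.length > 1 ? matches : [];
}
```

A non-empty result is the trigger for the steps above: retry with a narrower `scopeId`, and if the ambiguity remains, present the candidates explicitly instead of picking one.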
7.4 User Activation and Human Handoff Are Normal States
When Runtime or Policy reports `waiting_for_user`, `user_activation_required`, or `handoff`, this is not a failure of the agent but the correct response to platform and security boundaries.
7.5 Discovery Is Input, Not Runtime Truth
Discovery packages are excellent for preparing bindings, actions, and workflow candidates. In live agent integration, however, unreviewed discovery candidates should not have the same status as published authoring/bundle artifacts.
8. A Simple Reference Strategy
If you boil the whole thing down to one practical statement, it goes like this:
Keep the full UIAP state locally, show the LLM only a redacted, scope-limited semantic view, compile Capabilities into Tools, treat Workflows as prioritized recipes, enforce Policy hard outside the model, and verify every side effect through results, deltas, and signals.