UIAP Agent Integration Guide

Status: Informative (non-normative)
Version: 0.1
Date: 2026-03-27
Dependencies: [UIAP-CORE], [UIAP-CAP], [UIAP-WEB], [UIAP-ACTION], [UIAP-POLICY], [UIAP-WORKFLOW]
Editors: Patrick

Companion document. Informative, not normative.

UIAP describes transport, capability model, web profile, runtime, policy and workflows, but deliberately leaves open how an agent plans internally. This is exactly the gap where everyone otherwise builds their own ad hoc bridge. This guide describes a pragmatic integration layer for an LLM-based agent that takes UIAP artifacts seriously without making the model the source of truth.

The basic idea is simple:

  • The complete UIAP state lives outside the LLM in a local state store.
  • The LLM sees only a redacted, compressed and step-relevant view of this state.
  • Capability, Policy and Workflow are not dumped as loose documentation into the prompt, but translated into dedicated runtime components.
  • Every write or non-trivial action passes through policy preflight, action runtime and observation/verification.
  • Success is never “believed” but derived from action.result, web.state.delta and web.signal.

When authoring bundles are available, agent integration should ideally consume the compiled bundle, not loose manifests. The bundle is the runtime-proximate source for effective actions, policies and workflows.

Note: The examples in this guide consistently use the namespace uiap.policy.*.


A reasonable agent host layer separates six responsibilities:

  1. Session/Transport: Handshake, capability fetch, observe subscription, action requests.

  2. State Store: Holds the latest complete PageGraph, applies deltas, buffers signals and revisions.

  3. Policy Gateway: Evaluates policy before actions and additionally produces a redacted model view of snapshots, signals and return values.

  4. Workflow Index: Matches available workflows against goal, route, roles, grants and mode.

  5. Tool Registry: Translates ActionDescriptor[] into LLM tools or into a smaller, context-dependent tool selection.

  6. Context Builder: Builds a compact planner view for the LLM from the raw UIAP state.

A minimal runtime model looks like this:

interface AgentRuntime {
  state: StateStore;
  policy: PolicyGateway;
  workflows: WorkflowIndex;
  tools: ToolRegistry;
  context: ContextBuilder;
  planner: Planner;
}

interface PlannerInput {
  goal: string;
  policyHints: PolicyHints;
  context: PlanningContext;
  workflows: WorkflowRecipe[];
  tools: LLMTool[];
}

What matters is the separation between raw state and prompt state. The LLM must never be the sole store for route, scope, revision or latest results. Models forget, mix things up and happily invent. Truly endearing, but unhelpful.


1.1 Do Not Dump the Entire Graph into the Prompt


The PageGraph is already a semantically reduced web state, not a full image of the DOM. Nevertheless, it can grow large. The recommended strategy is therefore not “everything into the model” but a two-tier model:

  • State Store holds the complete, latest known graph.
  • Context Builder produces a smaller planner view from it per turn.

The planner generally does not need the full raw structure. For planning, these fields are primarily important:

Primary for planning:

  • route.routeId, pathname, title
  • scopeId, stableId, role, name
  • States such as visible, enabled, focused, editable, required, invalid, open, busy, loading
  • affordances, supportedActions
  • risk, success, current signals, focus
  • Annotated meaning such as meaning or defaultAction

Primary for execution / recovery:

  • documentId
  • bbox, zIndexHint
  • targetHints.runtime.css, targetHints.runtime.xpath
  • semantics.attached, inViewport, obscured, stable
  • shadowHostId, framePath
  • Low-level viewport/scroll details

Rule of thumb:

  • Planning works with stable semantics.
  • Execution works with target resolution, actionability and runtime hints.
  • CSS/XPath almost never belong in the LLM planning context. They are technical fallbacks, not identity.

For the LLM, an object like this usually suffices:

type PlanningElement = {
  stableId?: string;
  scopeId?: string;
  role: string;
  name?: string;
  meaning?: string;
  defaultAction?: string;
  state: Record<string, unknown>;
  supportedActions: string[];
  risk?: {
    level: "safe" | "confirm" | "blocked";
    tags?: string[];
  };
  success?: Array<Record<string, unknown>>;
  confidence?: "high" | "medium" | "low";
};

interface PlanningContext {
  revision: string;
  route?: {
    routeId?: string;
    pathname?: string;
    title?: string;
  };
  activeScopes: Array<{
    scopeId: string;
    kind: string;
    stableId?: string;
    name?: string;
    parentScopeId?: string;
  }>;
  focus?: {
    stableId?: string;
    role?: string;
    name?: string;
  };
  candidateElements: PlanningElement[];
  recentSignals: Array<{
    kind: string;
    level?: string;
    text?: string;
    scopeId?: string;
  }>;
}

1.3 Filter by Relevance Instead of Blindly Truncating


A useful heuristic for candidateElements prioritizes:

  • Elements in open dialogs, drawers, popovers
  • Elements in the focused scope
  • Elements with a stableId
  • Elements with a defaultAction or domain action in supportedActions
  • invalid, required, busy, open or selected states
  • Elements with risk.level = confirm|blocked
  • Visible feedback elements such as toast, alert, status

And it deprioritizes:

  • Purely decorative items
  • Repeated list entries without current relevance
  • Large amounts of text without control relevance
  • Offscreen or obscured targets, as long as they are not needed for the current step

A simple builder function can look like this:

function buildPlanningContext(graph: PageGraph, maxElements = 30): PlanningContext {
  const activeScopeIds = pickActiveScopes(graph);

  const candidateElements = graph.elements
    .filter((el) => isRelevantForPlanning(el, activeScopeIds))
    .map((el) => ({
      stableId: el.stableId,
      scopeId: el.scopeId,
      role: el.role,
      name: el.name,
      meaning: el.targetHints?.annotations?.meaning,
      defaultAction: el.targetHints?.annotations?.defaultAction,
      state: pickState(el.state, [
        "visible",
        "enabled",
        "focused",
        "editable",
        "required",
        "invalid",
        "selected",
        "expanded",
        "open",
        "busy",
        "loading"
      ]),
      supportedActions: el.supportedActions,
      risk: el.risk,
      success: el.success,
      confidence: deriveConfidence(el)
    }))
    .sort((a, b) => scorePlanningElement(b) - scorePlanningElement(a))
    .slice(0, maxElements);

  return {
    revision: graph.revision,
    route: graph.route
      ? {
          routeId: graph.route.routeId,
          pathname: graph.route.pathname,
          title: graph.route.title
        }
      : undefined,
    activeScopes: pickScopes(graph, activeScopeIds).map((scope) => ({
      scopeId: scope.scopeId,
      kind: scope.kind,
      stableId: scope.stableId,
      name: scope.name,
      parentScopeId: scope.parentScopeId
    })),
    focus: resolveFocus(graph),
    candidateElements,
    recentSignals: (graph.signals ?? []).slice(-8).map((sig) => ({
      kind: sig.kind,
      level: sig.level,
      text: sig.text,
      scopeId: sig.scopeId
    }))
  };
}
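The builder above leans on two helper functions, isRelevantForPlanning and scorePlanningElement, which it does not define. A minimal sketch of what they could look like; the weights and cutoffs are illustrative assumptions, not part of UIAP:

```typescript
// Illustrative relevance scoring for planner candidates.
// All weights here are assumptions, not defined by UIAP.
type ScorableElement = {
  stableId?: string;
  scopeId?: string;
  state?: Record<string, unknown>;
  supportedActions?: string[];
  risk?: { level?: string };
  role?: string;
};

function scorePlanningElement(el: ScorableElement): number {
  let score = 0;
  if (el.stableId) score += 4; // stable identity beats heuristic targets
  if (el.supportedActions?.some((a) => !a.startsWith("ui."))) score += 3; // domain actions
  for (const key of ["invalid", "required", "busy", "open", "selected"]) {
    if (el.state?.[key]) score += 2; // attention-worthy states
  }
  if (el.risk?.level === "confirm" || el.risk?.level === "blocked") score += 2;
  if (["alert", "status", "toast"].includes(el.role ?? "")) score += 2; // feedback surfaces
  return score;
}

function isRelevantForPlanning(el: ScorableElement, activeScopeIds: Set<string>): boolean {
  // Keep everything in an active scope, plus anything with a planning-relevant signal.
  if (el.scopeId && activeScopeIds.has(el.scopeId)) return true;
  return scorePlanningElement(el) > 0;
}
```

The exact numbers matter less than the ordering they induce: stable, actionable, attention-worthy elements float to the top of the candidate list.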

1.4 Separate Planning Context and Execution Context


The model should not immediately see all execution details. For the next step, the semantic form almost always suffices:

{
  "route": { "routeId": "videos.new", "title": "New Video" },
  "activeScopes": [
    { "scopeId": "scope_form", "kind": "form", "stableId": "video.create.form", "name": "Create Video" }
  ],
  "focus": { "stableId": "video.title", "role": "textbox", "name": "Title" },
  "candidateElements": [
    {
      "stableId": "video.title",
      "role": "textbox",
      "name": "Title",
      "state": { "enabled": true, "required": true },
      "supportedActions": ["ui.focus", "ui.enterText", "ui.clearText"]
    },
    {
      "stableId": "video.submit",
      "role": "button",
      "name": "Create Video",
      "state": { "enabled": true },
      "supportedActions": ["ui.activate", "video.create"],
      "risk": { "level": "confirm", "tags": ["external_effect"] }
    }
  ]
}

When the planner then actually selects an action, the host layer can enrich the target for execution with additional information:

interface ExecutionTarget {
  stableId?: string;
  documentId: string;
  scopeId?: string;
  role: string;
  name?: string;
  bbox?: { x: number; y: number; width: number; height: number };
  runtimeHints?: { css?: string; xpath?: string };
  actionability?: {
    attached?: boolean;
    inViewport?: boolean;
    obscured?: boolean;
    stable?: boolean;
  };
}
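The enrichment step itself can be a plain projection from the raw graph element. A sketch under a simplified, assumed raw element shape (the field names mirror this guide, but the exact raw structure is an assumption):

```typescript
// Sketch: project a raw graph element into an execution target.
// GraphElement is a simplified assumption about the raw state shape.
interface GraphElement {
  stableId?: string;
  documentId: string;
  scopeId?: string;
  role: string;
  name?: string;
  bbox?: { x: number; y: number; width: number; height: number };
  targetHints?: { runtime?: { css?: string; xpath?: string } };
  actionability?: { attached?: boolean; inViewport?: boolean; obscured?: boolean; stable?: boolean };
}

function toExecutionTarget(el: GraphElement) {
  return {
    stableId: el.stableId,
    documentId: el.documentId,
    scopeId: el.scopeId,
    role: el.role,
    name: el.name,
    bbox: el.bbox,
    runtimeHints: el.targetHints?.runtime, // CSS/XPath stay out of the planner view
    actionability: el.actionability
  };
}
```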

If the snapshot contains sensitive values, the host layer should redact first and then build the planner context. Otherwise the password field with masking might not end up in the prompt, but the raw value is still embedded in textValue or semanticValue of an earlier intermediate representation. People call this sort of thing a “minor oversight.” Auditors tend to see it differently.


2.1 Yes, the Capability Document Can Be Directly Translated into Tools


The capability document is practically a template for tool definitions already. An ActionDescriptor already contains:

  • id
  • kind
  • targetKinds
  • requiredAffordances
  • executionModes
  • args
  • idempotency
  • risk
  • success

Pragmatically, this means:

  • Domain actions are almost always exposed as individual tools.
  • Primitive actions can either be directly exposed or routed through a generic run_uiap_action tool.
  • The currently prompt-visible tools are ideally the intersection of the global capability document and the local supportedActions on currently relevant elements/scopes.

Approach A: One Tool per Action

Good for smaller apps or when you want to explicitly give the model many domain actions.

Approach B: Generic Runtime Tool + Action Shortlist

Good for large catalogs. The model then receives a small list of permitted actions in context and calls a single tool that builds action.request.

In practice, B is often more stable because the model does not have to juggle 200 function names like an overtired circus director.
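A sketch of the generic tool for Approach B. The tool name run_uiap_action comes from the text above; the exact schema layout and the shortlist mechanism are assumptions:

```typescript
// Sketch of Approach B: one generic tool whose schema restricts actionId
// to the per-turn shortlist supplied by the host layer.
function buildRunUiapActionTool(shortlist: string[]) {
  return {
    name: "run_uiap_action",
    description:
      "Request a UIAP action. Only actionIds from the current shortlist are accepted.",
    inputSchema: {
      type: "object",
      additionalProperties: false,
      properties: {
        actionId: { type: "string", enum: shortlist },
        target: {
          type: "object",
          properties: { stableId: { type: "string" }, scopeId: { type: "string" } }
        },
        args: { type: "object" }
      },
      required: ["actionId"]
    }
  };
}
```

Because the enum is rebuilt every turn, the model physically cannot name an action outside the current shortlist, which is usually more robust than hoping it remembers the rules.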

interface LLMTool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  meta: {
    uiapActionId: string;
    risk: ActionDescriptor["risk"];
    idempotency?: ActionDescriptor["idempotency"];
    executionModes: ActionDescriptor["executionModes"];
    success?: ActionDescriptor["success"];
  };
}

function compileActionTool(action: ActionDescriptor): LLMTool {
  const needsTarget = action.targetKinds.some((kind) => kind !== "none");
  const properties: Record<string, unknown> = {};
  const required: string[] = [];

  if (needsTarget) {
    properties.target = {
      type: "object",
      additionalProperties: false,
      properties: {
        stableId: { type: "string" },
        scopeId: { type: "string" },
        role: { type: "string" },
        name: { type: "string" }
      }
    };
    required.push("target");
  }

  for (const arg of action.args ?? []) {
    properties[arg.name] = argToJsonSchema(arg);
    if (arg.required) required.push(arg.name);
  }

  return {
    name: action.id.replaceAll(".", "_").replaceAll("-", "_"),
    description: action.description ?? action.title ?? action.id,
    inputSchema: {
      type: "object",
      additionalProperties: false,
      properties,
      required
    },
    meta: {
      uiapActionId: action.id,
      risk: action.risk,
      idempotency: action.idempotency,
      executionModes: action.executionModes,
      success: action.success
    }
  };
}

function argToJsonSchema(arg: ActionArgDescriptor): Record<string, unknown> {
  if (arg.type === "enum") {
    return { type: "string", enum: arg.enum ?? [] };
  }
  if (arg.type === "array") return { type: "array" };
  if (arg.type === "object") return { type: "object" };
  if (arg.type === "number") return { type: "number" };
  if (arg.type === "boolean") return { type: "boolean" };
  return { type: "string" };
}

2.4 Do Not Confuse Expected Results with Tool Args


success signals typically belong not in the tool input but in the tool metadata or in the controller logic. The tool should semantically say: “I want video.create with these args,” not: “And here are eight more internal verification details because the model would otherwise get nervous.”

A useful tool return shape is more like this:

type ToolResult =
  | { status: "accepted"; actionHandle: string }
  | { status: "waiting_confirmation"; actionHandle: string; preview?: unknown }
  | { status: "waiting_user"; note: string }
  | { status: "completed"; result: ActionResultPayload }
  | { status: "blocked"; reason: string };

Not every capability needs to be visible in the LLM at all times. A good host layer shows per turn only:

  • Actions that match the current route or active scopes
  • Actions supported by visible supportedActions
  • Domain actions from the top workflows
  • A few safe primitives like ui.read, ui.focus, ui.enterText, ui.activate, nav.navigate

The rest remains internally available but outside the current planner budget.
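One way to compute that per-turn selection is the intersection described above: visible supportedActions merged with a few safe primitives, filtered against the global catalog. A sketch; the helper names and the merge order are assumptions:

```typescript
// Sketch: per-turn tool shortlist = (safe primitives + visible supportedActions
// + workflow actions), filtered against the global capability catalog.
const SAFE_PRIMITIVES = ["ui.read", "ui.focus", "ui.enterText", "ui.activate", "nav.navigate"];

function shortlistActionIds(
  catalogIds: Set<string>,
  visibleSupported: string[][],
  workflowActionIds: string[] = []
): string[] {
  const wanted = new Set<string>(SAFE_PRIMITIVES);
  for (const actions of visibleSupported) {
    for (const id of actions) wanted.add(id);
  }
  for (const id of workflowActionIds) wanted.add(id);
  // Only expose what the capability document actually declares.
  return [...wanted].filter((id) => catalogIds.has(id));
}
```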


3. Workflow Definitions -> Planning Context


3.1 Workflows Are Neither Holy Scripture Nor Mere Decoration


An agent should use available workflow definitions in three modes:

  1. As executable recipes: When a workflow matches well, the required inputs are known and the desired mode (guide, assist, auto) is permitted, uiap.workflow.start is often better than freeform individual planning.

  2. As a plan skeleton: When the workflow generally fits but the current UI or inputs deviate slightly, the planner can use the steps as a template and adapt locally.

  3. As a negative guardrail: handoff, collect, ensure and confirm patterns show the agent where it should not get creative right now.

3.2 Do Not Put the Entire Catalog into the Prompt


A workflow catalog can be large. In the planner context, the top 1 to 3 candidates usually suffice. For that, the matching process matters:

  • Match intent or goal text
  • Check routeId and current scopes
  • Check requiredActions against capabilities
  • Check roles, grants and policy situation
  • Filter by mode

interface WorkflowRecipe {
  workflowId: string;
  score: number;
  reason: string;
  missingInputs: string[];
  steps: Array<{
    id: string;
    type: string;
    actionId?: string;
    parameterNames?: string[];
  }>;
}

async function buildWorkflowRecipes(goal: string, routeId?: string): Promise<WorkflowRecipe[]> {
  const matches = await workflowClient.match({
    intent: goal,
    routeId,
    mode: "assist",
    maxResults: 3
  });

  return matches.candidates.map((candidate) => ({
    workflowId: candidate.workflowId,
    score: candidate.score,
    reason: candidate.reason ?? "workflow matched",
    missingInputs: candidate.missingInputs ?? [],
    steps: projectWorkflow(candidate.workflowId)
  }));
}

Instead of the full definition, this format often suffices:

[
  {
    "workflowId": "video.create_first_video",
    "score": 0.92,
    "missingInputs": ["title"],
    "steps": [
      { "id": "collect_title", "type": "collect", "parameterNames": ["title"] },
      { "id": "go_to_form", "type": "action", "actionId": "nav.navigate" },
      { "id": "fill_title", "type": "action", "actionId": "ui.enterText" },
      { "id": "create_video", "type": "action", "actionId": "video.create" },
      { "id": "done", "type": "complete" }
    ]
  }
]

For planning, this is usually much more useful than the complete workflow definition with every localization and review detail.

3.4 Take Authoring and Discovery Provenance Seriously


When workflows come from an authoritative bundle, they can serve as genuine recipes. When they originate only from discovery candidates, they should be treated more as hints or a skeleton, not as autonomous truth. Discovery delivers candidates with confidence and review requirements; authoring turns them into published, effective runtime artifacts.

3.5 Good Division of Labor Between LLM and Workflow Engine


A robust division of labor looks like this:

  • Workflow engine: applicability, step ordering, checkpoints, policy integration, resume
  • LLM: support intent matching, gather missing inputs, formulate suggestions, locally adapt when the UI diverges from the recipe, guide the user intelligibly through waiting_confirmation or waiting_user

When the workflow fits cleanly, the engine should lead. Freeform planning is not heroic when a good recipe already exists.


4.1 Policy Does Not Primarily Belong in the Prompt


The policy specification is deliberately decision-oriented, locally enforceable and to be evaluated before non-trivial actions. Practically, this means:

  • The policy document itself is not the primary interface to the LLM.
  • The policy decision is the operative truth.
  • The LLM receives at most a compact summary of global rules plus the concrete decision per action.

The sensible structure has three layers:

  1. System/Planner Hints: Short, stable rules, for example: no credential entry, confirm does not mean continue autonomously, handoff is not a failure.

  2. Policy Summary in Context: Which grants does the current principal have? Which domains are generally heavy on handoff or deny?

  3. Preflight Before Every Action: The host layer calls policy.evaluate and treats the result as a hard constraint.

interface PolicyHints {
  principal: {
    id: string;
    roles?: string[];
    grants?: string[];
  };
  hardStops: string[];
  confirmRules: string[];
  handoffRules: string[];
  redaction: Array<{
    applyTo: string[];
    replacement: string;
  }>;
}

Example:

{
  "principal": {
    "id": "workspace-admin",
    "roles": ["admin"],
    "grants": ["observe", "guide", "draft", "act"]
  },
  "hardStops": [
    "credential/secret data is never exposed directly to the model",
    "blocked actions are not executed autonomously"
  ],
  "confirmRules": [
    "confirm-risk actions require explicit confirmation before execution"
  ],
  "handoffRules": [
    "user activation, credential entry and payment approval cause handoff"
  ],
  "redaction": [
    { "applyTo": ["snapshot", "audit", "returnValue"], "replacement": "[REDACTED]" }
  ]
}

The preflight call itself can look like this:

async function preflightAction(input: {
  actionId: string;
  target?: { stableId?: string; role?: string; scopeId?: string; name?: string };
  risk?: RiskDescriptor;
  dataClasses?: string[];
  sideEffectClass?: string;
  args?: Record<string, unknown>;
}): Promise<PolicyDecision> {
  return policyClient.evaluate({
    context: {
      principal: currentPrincipal,
      actionId: input.actionId,
      target: input.target,
      risk: input.risk,
      dataClasses: input.dataClasses,
      sideEffectClass: input.sideEffectClass,
      args: input.args
    }
  });
}

The result is then treated strictly:

function handlePolicyDecision(decision: PolicyDecision):
  | { kind: "proceed" }
  | { kind: "confirm"; obligations?: unknown[] }
  | { kind: "handoff"; obligations?: unknown[] }
  | { kind: "deny"; reasonCodes: string[] } {
  switch (decision.decision) {
    case "allow":
      return { kind: "proceed" };
    case "confirm":
      return { kind: "confirm", obligations: decision.obligations };
    case "handoff":
      return { kind: "handoff", obligations: decision.obligations };
    case "deny":
    default:
      return { kind: "deny", reasonCodes: decision.reasonCodes };
  }
}

4.4 Keep Redaction and Action Eligibility Separate


Policy models redaction separately from action permission. This is important for agent integration:

  • A field can be redacted without the entire screen disappearing for planning purposes.
  • A snapshot for the model can contain placeholders while the host layer continues to work internally with structural knowledge.
  • The LLM should see red zones as visible but masked, not as “invisible,” if the surrounding context would otherwise become incomprehensible.
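A sketch of masked-but-visible redaction along these lines. The element shape and the data-class names are illustrative assumptions:

```typescript
// Sketch: redact values but keep structure, so the planner still sees the field.
// Element and field names are assumptions for illustration.
type RedactableElement = {
  stableId?: string;
  dataClasses?: string[];
  textValue?: string;
  semanticValue?: unknown;
};

function redactElement(el: RedactableElement, redactedClasses: Set<string>): RedactableElement {
  const sensitive = (el.dataClasses ?? []).some((c) => redactedClasses.has(c));
  if (!sensitive) return el;
  // Keep identity and structure; replace only the values.
  return {
    ...el,
    textValue: el.textValue !== undefined ? "[REDACTED]" : undefined,
    semanticValue: el.semanticValue !== undefined ? "[REDACTED]" : undefined
  };
}
```

The point is that the redacted element still carries its stableId, role and state, so the planner can reason about the field without ever seeing its content.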

4.5 Treat confirm and handoff as Structured States


These cases should not end as a free-form chat instruction like “I need a bit of help.” Better is a structured state in the controller:

  • confirm -> explicit confirmation request, followed by action.confirmation.grant or deny
  • handoff -> clear transfer to user responsibility
  • deny -> alternative planning

The model may explain, but must not simulate enforcement.


A reasonable agent loop looks like this:

  1. Establish session
  2. Load capabilities / policy / workflows
  3. Start snapshot or observe stream
  4. Build redacted planning context
  5. Plan next step
  6. Policy preflight
  7. Execute action or workflow
  8. Observe result, deltas and signals
  9. Update context and replan

With UIAP messages, this is typically:

agent -> session.initialize
app -> session.initialized
agent -> capabilities.get
agent -> uiap.policy.get (or uicp.policy.get in older drafts)
agent -> uiap.workflow.get
agent -> web.observe.start
app -> capabilities.list
app -> uiap.policy.document
app -> uiap.workflow.document
app -> web.state.snapshot
app -> web.state.delta*
app -> web.signal*
agent -> action.request
app -> action.accepted
app -> action.progress*
app -> action.confirmation.request?
agent -> action.confirmation.grant / deny?
app -> action.result
app -> web.state.delta*
app -> web.signal*

The state store is not glamorous, but indispensable.

class StateStore {
  private graph?: PageGraph;
  private revision?: string;
  private recentSignals: WebSignal[] = [];

  applySnapshot(graph: PageGraph) {
    this.graph = graph;
    this.revision = graph.revision;
    this.recentSignals = graph.signals ?? [];
  }

  applyDelta(delta: WebStateDeltaPayload) {
    if (!this.graph || delta.baseRevision !== this.revision) {
      throw new Error("Revision gap: full snapshot required");
    }
    this.graph = applyWebDelta(this.graph, delta.ops);
    this.graph.revision = delta.revision;
    this.revision = delta.revision;
    this.recentSignals.push(...(delta.signals ?? []));
    this.recentSignals = this.recentSignals.slice(-20);
  }

  current(): PageGraph {
    if (!this.graph) throw new Error("No snapshot available");
    return this.graph;
  }
}

When baseRevision does not match, the agent should not guess but request a new web.state.get snapshot.
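A resync sketch for that case, with deliberately simplified transport and store shapes (both are assumptions):

```typescript
// Sketch: on a revision gap, request a fresh snapshot instead of guessing.
// Transport and payload shapes are simplified assumptions.
interface SnapshotGraph { revision: string; signals?: unknown[] }

async function resync(
  transport: { request: (method: string, params: unknown) => Promise<{ graph: SnapshotGraph }> },
  store: { applySnapshot: (g: SnapshotGraph) => void }
): Promise<string> {
  const snapshot = await transport.request("web.state.get", {});
  store.applySnapshot(snapshot.graph);
  return snapshot.graph.revision;
}
```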

class UIAPAgentController {
  constructor(
    private readonly state: StateStore,
    private readonly policy: PolicyGateway,
    private readonly tools: ToolRegistry,
    private readonly workflows: WorkflowIndex,
    private readonly planner: Planner,
    private readonly runtime: RuntimeClient
  ) {}

  async next(goal: string) {
    const rawGraph = this.state.current();
    const modelGraph = await this.policy.redactGraph(rawGraph);
    const context = buildPlanningContext(modelGraph);

    const plannerInput: PlannerInput = {
      goal,
      policyHints: await this.policy.hints(),
      context,
      workflows: await this.workflows.match(goal, context),
      tools: this.tools.forContext(context)
    };

    const plan = await this.planner.next(plannerInput);

    if (plan.kind === "workflow.start") {
      return this.runtime.startWorkflow(plan.workflowId, plan.inputs);
    }

    if (plan.kind === "action") {
      const decision = await this.policy.preflight(plan.toPolicyContext());
      const policyResult = handlePolicyDecision(decision);
      if (policyResult.kind === "deny") {
        return { kind: "replan", reason: policyResult.reasonCodes.join(",") };
      }
      if (policyResult.kind === "handoff") {
        return { kind: "waiting_user", obligations: policyResult.obligations };
      }
      if (policyResult.kind === "confirm") {
        return { kind: "waiting_confirmation", obligations: policyResult.obligations };
      }
      return this.runtime.requestAction(plan);
    }

    return { kind: "respond", message: plan.message };
  }
}

action.result is important but not the only truth. Good controllers use at least three signals:

  • action.result.verification
  • Observed web.signal events
  • web.state.delta / revision progress

Especially with ui.activate, ui.submit or domain actions, the agent should not derive success merely from a tool saying “ok.” A clean UI transition, toast, dialog state or entity signal is considerably more reliable.
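One possible way to fold these signals into a single verdict. The input shape and the exact precedence are assumptions, not UIAP semantics:

```typescript
// Sketch: success is derived, not believed. Shapes and precedence are assumptions.
interface VerificationInput {
  resultOk: boolean;            // action.result reported ok
  verificationPassed?: boolean; // action.result.verification outcome, if present
  revisionAdvanced: boolean;    // web.state.delta moved the revision
  successSignalSeen: boolean;   // a matching web.signal was observed
}

function assessOutcome(v: VerificationInput): "verified" | "likely" | "unverified" {
  if (!v.resultOk) return "unverified";
  if (v.verificationPassed === false) return "unverified";
  if ((v.verificationPassed ?? false) && (v.revisionAdvanced || v.successSignalSeen)) {
    return "verified";
  }
  if (v.revisionAdvanced || v.successSignalSeen) return "likely";
  return "unverified";
}
```

A "likely" outcome is a reasonable place to either continue cautiously or request targeted detail before building on the result.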

5.5 Workflow Loop Instead of Individual Actions


When a workflow has been started, the loop is similar but shifted to workflow messages:

  • uiap.workflow.started
  • uiap.workflow.progress
  • uiap.workflow.input.request
  • uiap.workflow.input.provide
  • uiap.workflow.result

The controller should still consider workflow and action events together because action steps internally run through the runtime again.


PageGraph can grow large. Yet the solution is almost never to compress even more aggressively and discard semantics until only “there is something with a button” remains. Better is progressive detailing:

  • First an overview
  • Then the active scope
  • Then targeted detail requests for the current step

Tier A: Overview Context

For the planner start:

  • Route
  • Active dialogs/drawers/popovers
  • Focus
  • 15 to 30 relevant elements
  • Last 5 to 10 signals
  • 1 to 3 workflow recipes
  • Curated tool list

Tier B: Scope Detail

When a specific area becomes relevant:

  • Only one or two scopes
  • All important controls in the scope
  • Validation and status states
  • Possibly relations like label -> field or submit -> form

Tier C: Execution Detail

Only when a concrete action is imminent or ambiguity exists:

  • documentId
  • Actionability fields
  • bbox
  • Runtime hints
  • Possibly additional same-named candidates for disambiguation

The web profile already provides the right levers for this:

  • web.state.get.scopes
  • web.state.get.documents
  • includeHidden
  • includeNonInteractive
  • maxNodes

A host layer can thereby load detail on demand:

async function ensureScopeDetail(scopeId: string) {
  if (stateHasEnoughScopeDetail(scopeId)) return;

  const snapshot = await transport.request<WebStateSnapshotPayload>("web.state.get", {
    scopes: [scopeId],
    includeHidden: false,
    includeNonInteractive: false,
    maxNodes: 80
  });

  stateStore.applySnapshot(mergeScopedSnapshot(stateStore.current(), snapshot.graph));
}

A robust prioritization prefers:

  • stableId over purely heuristic targets
  • Visible, interactive elements over passive text
  • Open modal scope over background content
  • invalid/required/busy/open over neutral state
  • Elements with domain actions over purely generic controls
  • Annotated or registry-backed semantics over inferred

When WebSemantics.sources or discovery evidence relies only on heuristics, the planner should treat that as lower confidence. This helps the model ask for detail or confirmation when facing ambiguous surfaces, rather than determinedly producing nonsense. A rare but admirable virtue.
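A sketch of how provenance could feed the confidence field on PlanningElement. The function name and source labels here are illustrative assumptions; the builder earlier uses a deriveConfidence(el) helper, and this shows only the provenance-to-level mapping such a helper might use:

```typescript
// Sketch: map semantics provenance to planner confidence.
// Source labels are illustrative, not fixed UIAP values.
function confidenceFromSources(sources: string[] | undefined): "high" | "medium" | "low" {
  if (!sources || sources.length === 0) return "low";
  // Annotated or registry-backed semantics beat inferred ones.
  if (sources.some((s) => s === "annotation" || s === "registry")) return "high";
  if (sources.every((s) => s === "heuristic")) return "low";
  return "medium";
}
```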

Large tables or lists should not end up fully in the prompt. Better is a signature like:

interface CollectionSummary {
  scopeId: string;
  name?: string;
  count: number;
  visibleItems: Array<{
    stableId?: string;
    role: string;
    name?: string;
    selected?: boolean;
    supportedActions: string[];
  }>;
  omittedCount: number;
}

The planner then sees, for example:

  • “37 invoices visible, 5 in current viewport, 1 of those selected”
  • “each row item supports ui.activate and billing.openSettings”

Only when a specific row becomes relevant is its scope or row context loaded on demand.
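A sketch of producing such a summary, assuming a simplified item shape:

```typescript
// Sketch: summarize a large collection instead of serializing every row.
// CollectionItem is a simplified assumption about row elements.
interface CollectionItem {
  stableId?: string;
  role: string;
  name?: string;
  selected?: boolean;
  inViewport?: boolean;
  supportedActions: string[];
}

function summarizeCollection(scopeId: string, items: CollectionItem[], maxVisible = 5) {
  const visibleItems = items
    .filter((it) => it.inViewport)
    .slice(0, maxVisible)
    .map((it) => ({
      stableId: it.stableId,
      role: it.role,
      name: it.name,
      selected: it.selected,
      supportedActions: it.supportedActions
    }));

  return {
    scopeId,
    count: items.length,
    visibleItems,
    omittedCount: items.length - visibleItems.length
  };
}
```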

6.6 Separate Caching and Stable Contexts from Live Context


Not everything belongs in every turn:

  • Stable across session: capabilities, global policy hints, workflow metadata
  • Slowly changing: route, visible main scopes, principal role
  • Highly dynamic: focus, validation, toasts, busy states, action results

A good prompt contains only what is currently needed. Capabilities or workflow descriptions do not need to be re-serialized every time a spinner appears.

6.7 Recommended Default Budget per Planner Turn

As a starting point, this often works:

  • 1 route context
  • Up to 4 active scopes
  • 20 to 30 elements
  • 8 current signals
  • 1 to 3 workflow candidates
  • 8 to 15 tools or 1 generic runtime tool + action shortlist

That is small enough for plannable turns and large enough for the model to still genuinely understand the surface.
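Captured as a host-layer configuration object; the values are the starting points above, not normative limits:

```typescript
// Default planner budget per turn. Starting points, not normative.
const DEFAULT_PLANNER_BUDGET = {
  routes: 1,
  maxActiveScopes: 4,
  maxElements: 30,
  maxSignals: 8,
  maxWorkflowCandidates: 3,
  maxTools: 15 // or 1 generic runtime tool plus an action shortlist
} as const;
```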


7. Practical Guidelines for a Robust Integration


The model may make suggestions such as:

  • Next action
  • Preferred workflow
  • Needed additional details
  • Natural language to the user

The host layer, however, decides on:

  • Policy eligibility
  • Redaction
  • Actual action requests
  • Confirmation/handoff
  • State authority and revisions

7.2 Think supportedActions Locally, actions[] Globally


The capability document says what is fundamentally possible. The PageGraph says what here and now on the current screen is meaningfully addressable. An agent should combine both levels, not confuse them.

7.3 When Ambiguous, Load Details Rather Than Guess


When two buttons share the same name or a target was only found heuristically, the correct reaction is not “it will probably work,” but:

  • Narrow the scope
  • Show target candidates explicitly in context
  • If needed, include bbox or neighborhood information
  • Or obtain additional clarification from the user/workflow

7.4 User Activation and Human Handoff Are Normal States


When runtime or policy reports waiting_for_user, user_activation_required or handoff, that is not a failure of the agent but the correct response to platform and security boundaries.

Discovery packages are excellent for preparing bindings, actions and workflow candidates. In live agent integration, however, unreviewed discovery candidates should not have the same status as published authoring/bundle artifacts.


If one boils this down to a single practical statement, it is this:

Keep the full UIAP state locally, show the LLM only a redacted and scope-scoped semantic view, compile capabilities into tools, treat workflows as prioritized recipes, let policy enforce hard constraints outside the model, and verify every side effect through results, deltas and signals.

That is not a magical architecture. It is simply the variant in which an agent does not immediately stand like a confused intern in front of a half-open modal dialog after the third UI change.