Research · read-only

The agent prompt system

How mods/ai_agent assembles the system prompt and user turn, what identity it injects (and what it conspicuously doesn't), how this worktree's memory & learning layer plugs in, and whether it follows Anthropic prompt-caching guidance — as groundwork for personalizing agents to act as the business / assigned user.

Scope · contact-facing domain agents (Text Reply, web chat, follow-up)
Model · anthropic/claude-sonnet-4.6 via OpenRouter + rig 0.38
Verified against · source on this branch

↳ Headline finding for your goal:

The contact-facing agent prompt injects no current-user name, no roles, and no organization data. Persona & goals are static operator-authored text that say "the business" generically but never name the business, owner, or assigned user.
The only runtime-resolved identity is the contact's (name + the line they texted) — i.e. who the agent is talking to, not who it is acting as.
That makes "personalize the agent to act as the current/assigned user" a net-new prompt layer, not a tweak — and there's a clean, cache-safe seam to add it (§7).

00 · TL;DR

Seven things to know before reading the detail.

01 · LAYERED

The system prompt is assembled static→dynamic in build_system_prompt_service.rs: Tier 0 platform-core → Tier 0.5 org/agent rules → persona+goals → delegated block → Tier 1 delivery directive → skills → invokable agents → tool list.

02 · NO IDENTITY

Zero injection of user name, user role, org name, hours, signature, or timezone. Grep of the builder + executor for those fields returns nothing.

03 · CONTACT ONLY

The only identity resolved at runtime is the contact's: display name plus the from-line, framed into the user turn so the agent replies "from the same line."

04 · ASSISTANT ≠ AGENT

The one personalized actor is the internal Loquent Assistant flavor (Vernis), which rebuilds the member's Session + personalization. Contact-facing agents do not get this.

05 · LEARNING

Active learning version is summarized to ≤10 bullets and injected into the user turn, never the system prompt — deliberately, to keep the cached prefix byte-stable.

06 · MEMORY

Long-term memory is typed blocks read/written on-demand via tools (read_my_memory/update_my_memory). It is not injected into any prompt.

07 · CACHING

Prompt caching is ON — with_prompt_caching() puts one cache_control: ephemeral breakpoint on the system prompt. Conversation history is not cached — the biggest remaining win.

01 · The call path

One agent turn, end to end. Everything is reconstructed from the DB each turn — the API is stateless and there is no message store (schema frozen in Phase 1).

enqueue event ──▶ claim active_idle→active_running ──▶ run_worker_loop
                                                            │
                                                            ▼
run_ai_agent_thread()   (one turn)
src/mods/ai_agent/services/run_ai_agent_thread_service.rs
 ├─ load thread + agent + skills + invokable agents
 ├─ drain pending queue events ──────────────▶ the NEW user input for this turn
 ├─ derive runtime layers:
 │    • include_platform_core = attach_domain_tools && !assistant_flavor
 │    • org_rules / agent_rules = load_tier_0_5_rules(org, agent) + legacy config_payload
 │    • delivery_directive = f(send_mode, reply_ctx, open_escalations)   ← only turn-varying tier
 │    • learning_summary = summarize(active|pinned learning version)     ← user-turn layer
 │    • contact few-shot = top-k past draft corrections for (agent, contact)
 ├─ build_system_prompt(...) ───────────────▶ the CACHED static preamble
 ├─ build_chat_history(thread_id) ──────────▶ replayed prior turns (≤100 / source)
 ├─ user_message = [learning] + [few-shot] + [source-framed events]
 ├─ resolve model + openrouter_client() + completion_model().with_prompt_caching()
 ├─ rig_agent.chat( Message::user(user_message), &mut history ) ── AgentTurnModel records usage
 └─ ONE transaction: ai_usage_log + ai_thread_log + consume events + pin learning version

Two flavors share this executor

A domain agent (Text Reply, web chat, follow-up drafter) uses the assembled persona/goals prompt below. The assistant flavor (the per-user Loquent Assistant clone) swaps in a full Vernis turn whose preamble replaces the assembled prompt and binds the member's Session-scoped tools. This document is about the domain-agent path unless noted — that's the one that talks to contacts.

02 · System prompt anatomy

The preamble is a pure conditional assembler — the executor decides what's included and passes pre-rendered strings in. Ordering is deliberately static→dynamic for cache stability.

System prompt (= rig .preamble(), the cached prefix)

Tier	Section	Contents	When included
TIER 0	Platform core	identity · tool-action contract · operator-vs-contact source rules · "ground your replies, never make things up"	conditional
TIER 0.5	Organization rules	operator-authored, org-wide; always empty today at the executor seam (forward-compat hook)	when set
TIER 0.5	Agent rules	from `ai_agent_rule` table + legacy `config_payload["agent_rules"]`	when set
PERSONA	Persona	`ai_agent.persona`: static, operator-authored, verbatim (multi-line allowed)	static
GOALS	Goals	`ai_agent.goals`, rendered as "Your goals:\n{goals}"	static
DELEG	Delegated sub-agent block	only when the thread has a `parent_thread_id`; relay via `enqueue_parent`	when delegated
TIER 1	Delivery directive	autonomous-send / suggest-draft / observe + escalation-reconcile; the only section that varies per turn	per-turn
SKILLS	Skills	full bodies, or a compact lazy index (id+title+desc) when the agent has `list_my_skills`/`load_skill`	per-agent
INVOKE	Agents you can invoke	id + name + role hint, scrubbed of newlines (a callee may be authored by a different user)	per-agent
TOOLS	Tool allowlist	"You can use the following tools: …", names only, in allowlist order	per-agent

The literal assembly is one format string — note that learning, memory, and contact data are absent here by design:

src/mods/ai_agent/services/build_system_prompt_service.rs:337
let system_prompt = format!(
  "{platform_block}{org_rules_block}{agent_rules_block}{persona}\n\n\
   Your goals:\n{goals}{delegated_block}{delivery_block}\
   {skills_block}{invokable_agents_block}{tools_block}",
  persona = agent.persona,   // static, operator-authored
  goals   = agent.goals,     // static, operator-authored
);
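To make the "pure conditional assembler" idea concrete, here is a minimal sketch under stated assumptions. The struct, its field names, and `assemble` are hypothetical stand-ins and not the real `build_system_prompt` signature. The sketch only illustrates two properties described above: an omitted tier is passed in as an empty string, so ordering never shifts, and the tool list is rendered in allowlist order, so identical inputs always produce identical bytes.

```rust
/// Pre-rendered sections handed to the assembler. An empty string means
/// "this tier is omitted". All names here are illustrative.
pub struct PromptParts<'a> {
    pub platform_block: &'a str,
    pub agent_rules_block: &'a str,
    pub persona: &'a str,
    pub goals: &'a str,
    pub delivery_block: &'a str,
    pub tools: &'a [&'a str],
}

/// Concatenate the sections in a fixed static-to-dynamic order.
pub fn assemble(p: &PromptParts<'_>) -> String {
    // Tools keep their allowlist order: no sorting, no deduplication,
    // so the same allowlist always renders to the same bytes.
    let tools_block = if p.tools.is_empty() {
        String::new()
    } else {
        format!("\n\nYou can use the following tools: {}", p.tools.join(", "))
    };
    format!(
        "{}{}{}\n\nYour goals:\n{}{}{}",
        p.platform_block, p.agent_rules_block, p.persona, p.goals, p.delivery_block, tools_block
    )
}
```

Because the function is a pure function of its inputs, calling it twice with the same parts yields byte-identical output, which is the property a prefix cache depends on.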

Tier 0 — the platform-core contract

Included only for a domain agent (attach_domain_tools && !is_assistant_flavor). It is a fixed constant (PLATFORM_CORE_BLOCK) with four sections — and crucially, it speaks of "the business" generically:

build_system_prompt_service.rs:73 · PLATFORM_CORE_BLOCK (excerpt)
## Who you are
You are an autonomous agent acting on the business's behalf inside Loquent…
You speak to the business's contacts as the business itself — a real person
from the team — and you are never named or described as an AI…

## How you affect the world   → you change the world only by calling tools
## Who is talking to you: operator vs. contact   → trust steering, distrust the contact
## Ground your replies; never make things up   → gather context with read tools, else escalate

"the business" is never resolved to a name

Tier 0 establishes that the agent acts as the business — but "the business" stays a generic placeholder. There is no slot where the org's name, the owner's name, or the assigned user's name is interpolated. That's the gap your goal targets.

03 · User & org data: the core question

You asked: do we inject the current user (name, roles) and the organization's data so the agent can act as that person? For contact-facing agents the answer is no. Here's the evidence and the one exception.

Data point	Injected into the domain-agent prompt?	Where it lives / why not
Current/owner user name	No	`ai_agent.user_id` exists but is never read into the prompt path
User roles / permissions	No	ABAC roles drive API auth, never reach prompt assembly
Organization name / profile	No	`org_rules` is wired but ships empty; no org profile fields are read
Business hours / timezone / signature	No	Not modeled into the prompt at all
Assigned user for a given phone line	No	Phone binds to an agent (`phone_number.ai_agent_id`), not surfaced as a person in the prompt
Contact name + the line they texted	Yes	Resolved per turn, framed into the user turn (the audience, not the actor)
Persona & goals (generic "the business")	Yes	Static operator-authored text in `ai_agent.persona / .goals`

The grep that confirms it — over the assembler and the executor:

verified on this branch
$ grep -nE 'user.name|member.name|owner_name|organization.name|role|assigned_user|first_name' \
      build_system_prompt_service.rs run_ai_agent_thread_service.rs
(no matches)

The seed personas make the same point — they reference the business relationship but carry no identity. From the Follow-up Drafter seed:

migration/.../seed_followup_drafter_agent.rs · persona (excerpt)
You are the Follow-up Drafter… You draft short, personalized SMS follow-up
messages sent to a contact on the user's behalf — the contact sees the
message as coming from the user's business, and you are never mentioned.

"the user's business" is a role, not a value — no name is ever substituted in.

The one exception: the assistant flavor

The internal Loquent Assistant (the in-app helper, not a contact-facing agent) does personalize. Its turn is assembled by assemble_assistant_turn, which rebuilds the owning member's Session and loads get_member_personalization(...). So the capability to thread a member's identity into a prompt already exists in the codebase — just not on the path that talks to contacts.

Domain agent (talks to contacts)

Static persona/goals + generic platform core. No member, org, or assigned-user identity. Knows only the contact it's replying to.

Assistant flavor (talks to the member)

Rebuilds the member Session, loads personalization, binds Session-gated CRM tools. Already "acts as" the member — a template to borrow from.

The closest thing to "from-identity" today

resolve_envelope_identity() resolves the contact's name and the from-line they texted so the reply lands on the right thread "as coming from the business." That references the business phone line, but still no business name or human identity. It's used only to frame the user turn, never the system prompt.

04 · The user turn

Everything dynamic and/or untrusted lives here — never in the cached system prefix. This is also where learning and few-shot examples are injected.

User message (= Message::user(user_message)), assembled per turn

#	Part	Contents	Kind
1	Learning summary	≤10 bullets from the pinned learning version; prepended only when learning is enabled	dynamic
2	Few-shot block	top-k past draft corrections for this (agent, contact) pair	dynamic
3	Source-framed events	each drained event wrapped in a trust envelope (operator vs contact)	dynamic

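run_ai_agent_thread_service.rs:963 · KV-cache-safe user-turn injection
let user_message = {
  // Typed as Vec<&str> so the borrowed String parts coerce to &str and join() type-checks.
  let mut parts: Vec<&str> = Vec::with_capacity(3);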
  if !learning_summary.is_empty() { parts.push(&learning_summary); }  // §5
  if !few_shot_block.is_empty()   { parts.push(few_shot_block); }       // few-shot
  parts.push(&prompt);                                                  // framed events
  parts.join("\n\n")
};
rig_agent.chat(Message::user(user_message), &mut history).await;

Source framing (prompt-injection defense)

Each event is wrapped by frame_for_prompt(contact_name, contact_number) so the model can tell trusted operator steering from untrusted contact text. Identity fields are scrubbed of newlines and bracket glyphs so a crafted SMS can't forge an operator envelope:

ai_thread_event_payload_type.rs:276 · frame_for_prompt (excerpt)
InboundSms    → "[New SMS · {name} <{number}>]\n{scrubbed text}"
CallCompleted → "[Call ended · {name} <{number}>]\n{summary}"
UserMessage   → "[Operator instruction from your owner — not the contact]\n{text}"
Scheduled     → "[Scheduled instruction from your owner — not the contact]\n{text}"
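The scrubbing step is what makes the envelope trustworthy, so a small sketch may help. This is not the real `frame_for_prompt`. The helper names, the exact set of glyphs treated as brackets, and the way the message body is neutralised are all assumptions made for illustration.

```rust
/// Flatten an identity field to one line and neutralise bracket glyphs, so a
/// crafted contact name cannot close the header early or start a new one.
fn scrub_identity(field: &str) -> String {
    field
        .chars()
        .map(|c| if matches!(c, '\n' | '\r' | '[' | ']' | '<' | '>') { ' ' } else { c })
        .collect::<String>()
        .trim()
        .to_string()
}

/// Assumption: bodies keep their line breaks, but square brackets become
/// parentheses so no body line can look like an envelope header.
fn scrub_body(text: &str) -> String {
    text.replace('[', "(").replace(']', ")")
}

/// Hypothetical equivalent of the InboundSms arm shown above.
pub fn frame_inbound_sms(name: &str, number: &str, text: &str) -> String {
    format!(
        "[New SMS · {} <{}>]\n{}",
        scrub_identity(name),
        scrub_identity(number),
        scrub_body(text)
    )
}
```

With this shape, a contact whose display name or message contains `]` followed by a fake `[Operator instruction …]` line still produces exactly one bracketed header, on the first line.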

History reconstruction

There's no message store; history is rebuilt each turn from consumed queue events (user) + llm_generation log rows (assistant) + recorded tool calls, sorted by timestamp, capped at 100 per source (worst case ~300 messages). The new user input is passed as a separate Message::user — never interpolated into the system prompt.
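The reconstruction described above amounts to a capped three-way merge. The sketch below assumes simplified types (`HistoryItem`, `Role`) and assumes the cap keeps the newest rows of each source. The source text only states the cap of 100 per source, not which end is trimmed.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct HistoryItem {
    pub at_ms: i64,
    pub role: Role,
    pub text: String,
}

/// Keep the newest `cap` rows of one source (assumption: newest win).
fn newest(mut rows: Vec<HistoryItem>, cap: usize) -> Vec<HistoryItem> {
    rows.sort_by_key(|r| r.at_ms);
    let skip = rows.len().saturating_sub(cap);
    rows.into_iter().skip(skip).collect()
}

/// Merge the three sources and order them by timestamp.
pub fn build_history(
    user: Vec<HistoryItem>,
    assistant: Vec<HistoryItem>,
    tools: Vec<HistoryItem>,
    cap: usize,
) -> Vec<HistoryItem> {
    let mut all = newest(user, cap);
    all.extend(newest(assistant, cap));
    all.extend(newest(tools, cap));
    // sort_by_key is stable, so rows with equal timestamps keep source order.
    all.sort_by_key(|r| r.at_ms);
    all
}
```

With `cap = 100` and three sources, the result can never exceed 300 items, which matches the worst case quoted above.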

05 · Memory & learning (this worktree)

The branch adds a three-part knowledge system. The distinction matters: learning is injected, memory is tool-loaded, lessons feed the digest that evolves learning.

Concept	Table	What it is	How it reaches the model
Learning version	`ai_agent_learning_version`	Versioned, evergreen behavioral policy (markdown bullets). Exactly one `active` per agent (partial unique index). Chained via `previous_version_id`.	Injected — summarized to ≤10 bullets into the user turn
Lesson	`ai_agent_lesson`	Supervision signal: situation / what-went-wrong / what-to-do-instead, with source + support_count + confidence.	Not injected — consumed by the digest
Memory block	`ai_agent_memory` (JSONB blocks)	Long-term facts, typed by label: PersonaBelief · BusinessFacts · Preferences. `read_only` blocks are owner-pinned.	On-demand — via memory tools only
Draft correction	`ai_draft_correction`	Operator edit of a draft before sending — the "gold signal."	Few-shot examples in the user turn (per contact)

How learning gets in (and why it's in the user turn)

On the first turn of a thread the executor resolves the agent's active learning version and pins its id to ai_thread.pinned_learning_version_id; every later turn reads the pinned version. This freezes the policy for the life of a conversation — a digest activating a new version mid-thread can't swap it underneath. The pinned body is summarized and prepended to the user message:

run_ai_agent_thread_service.rs:615 · learning resolution (condensed)
let learning_summary = if agent.enable_learning {
  match thread.pinned_learning_version_id {
    // later turns: reuse the version pinned on the first turn
    Some(pinned) => summarize_learning_version(&get_learning_version_body(pinned)?),
    // first turn: `v` is the agent's active learning version (lookup elided in this excerpt);
    // its id is persisted to the thread in the turn's single transaction
    None => { pin_version_id = Some(v.id); summarize_learning_version(&v.body_markdown) }
  }
} else { String::new() };
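The pin-on-first-turn rule can be isolated as a small pure function. The sketch below uses hypothetical types and names (`Thread`, `LearningVersion`, `resolve_learning`, `first_bullets`). It also assumes that "summarize" means keeping the first ten bullet lines, which is a guess made for illustration; the real summarizer may work differently.

```rust
pub struct Thread {
    pub pinned_learning_version_id: Option<u64>,
}

pub struct LearningVersion {
    pub id: u64,
    pub body_markdown: String,
}

/// Returns (version to use this turn, id to pin at the end of the turn).
/// A pinned thread always reads its pin, even if a newer version is active.
pub fn resolve_learning<'a>(
    enable_learning: bool,
    thread: &Thread,
    active: Option<&'a LearningVersion>,
    load: impl Fn(u64) -> Option<&'a LearningVersion>,
) -> (Option<&'a LearningVersion>, Option<u64>) {
    if !enable_learning {
        return (None, None);
    }
    match thread.pinned_learning_version_id {
        Some(id) => (load(id), None),
        None => (active, active.map(|v| v.id)),
    }
}

/// Assumption: summarizing is approximated by keeping the first `max` bullets.
pub fn first_bullets(body: &str, max: usize) -> String {
    body.lines()
        .filter(|l| l.trim_start().starts_with("- "))
        .take(max)
        .collect::<Vec<_>>()
        .join("\n")
}
```

The second tuple element is only `Some` on the first turn, which mirrors how the pin is written once and then read on every later turn.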

✓

Why user-turn, not system prompt

Putting the (turn-varying) learning summary in the system prefix would invalidate the KV cache every time learning changed. Keeping it in the user turn lets the system prefix stay byte-stable. The summary is still mirrored into capture.learning for the debug timeline — observability only, not the injection mechanism.

Memory is tool-loaded, never injected

read_my_memory renders the typed blocks as labelled markdown on demand; update_my_memory applies a diff of add/update/delete ops (honoring read_only). There is also contact-scoped memory (read_contact_memory / update_contact_memory). The system-prompt builder explicitly leaves memory_snapshot: None — memory never enters the prompt automatically.
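A diff of add/update/delete ops that honors `read_only` could look roughly like the following. The `Block` and `MemoryOp` shapes are assumptions for illustration, not the real JSONB schema or tool payload, and rejecting (rather than erroring on) a blocked op is also an assumption.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Block {
    pub label: String,
    pub value: String,
    pub read_only: bool,
}

pub enum MemoryOp {
    Add { label: String, value: String },
    Update { label: String, value: String },
    Delete { label: String },
}

/// Apply ops in order and return the labels of the ops that were rejected.
/// Owner-pinned (`read_only`) blocks can be neither updated nor deleted.
pub fn apply_ops(blocks: &mut Vec<Block>, ops: Vec<MemoryOp>) -> Vec<String> {
    let mut rejected = Vec::new();
    for op in ops {
        match op {
            MemoryOp::Add { label, value } => {
                if blocks.iter().any(|b| b.label == label) {
                    rejected.push(label); // label already present
                } else {
                    blocks.push(Block { label, value, read_only: false });
                }
            }
            MemoryOp::Update { label, value } => {
                match blocks.iter_mut().find(|b| b.label == label) {
                    Some(b) if !b.read_only => b.value = value,
                    _ => rejected.push(label), // missing or read-only
                }
            }
            MemoryOp::Delete { label } => {
                match blocks.iter().position(|b| b.label == label && !b.read_only) {
                    Some(i) => {
                        blocks.remove(i);
                    }
                    None => rejected.push(label), // missing or read-only
                }
            }
        }
    }
    rejected
}
```

Returning the rejected labels lets the tool report back to the model which writes did not land, instead of failing the whole diff.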

Evolution / digest loop

operator edits a draft / answers an escalation / turn dies
        │   (capture is UNCONDITIONAL, even when enable_learning is off)
        ▼
ai_agent_lesson   (deduped, support_count++)
        │   learning_digest_poller_job (every minute)
        ▼
run_learning_digest()   GATED on enable_learning
 ├─ load undigested lessons (≤100) + recent corrections (≤50)
 ├─ frequency gate: support_count ≥ 2 OR confidence ≥ 0.4 → promotable
 ├─ LLM proposes new body_markdown + change_summary + folded_lesson_ids
 ├─ guardrails: drop hallucinated ids · reject empty/unchanged body
 └─ insert new version (auto → active, or approval → pending) + prune to ≤20
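The frequency gate and the two guardrails in the digest loop are simple predicates. The sketch below writes them out under assumed types. `Lesson` is reduced to the three fields the gate needs, and the function names are hypothetical.

```rust
use std::collections::HashSet;

pub struct Lesson {
    pub id: u64,
    pub support_count: u32,
    pub confidence: f32,
}

/// Frequency gate: a lesson is promotable when it has been seen at least
/// twice or carries enough confidence on its own.
pub fn promotable(l: &Lesson) -> bool {
    l.support_count >= 2 || l.confidence >= 0.4
}

/// Guardrail: keep only folded ids that refer to lessons actually loaded
/// for this digest, dropping any id the model made up.
pub fn keep_known_ids(proposed: &[u64], loaded: &[Lesson]) -> Vec<u64> {
    let known: HashSet<u64> = loaded.iter().map(|l| l.id).collect();
    proposed.iter().copied().filter(|id| known.contains(id)).collect()
}

/// Guardrail: reject a proposed body that is empty or unchanged
/// (compared after trimming surrounding whitespace).
pub fn accept_body(current: &str, proposed: &str) -> bool {
    let p = proposed.trim();
    !p.is_empty() && p != current.trim()
}
```

Comparing trimmed bodies is a design choice of this sketch: it stops a whitespace-only edit from creating a new version and consuming one of the ≤20 retained slots.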

Apply mode is org-level: auto activates immediately; approval lands a pending version an owner approves. Risk is bounded by the ≤20-version cap, the one-active-per-agent invariant, and the reward-hacking / hallucination guards. Shadow replay + regression fixtures exist as the validation surface.

Retrieval observability (#1566)

A metadata-only retrieval_context records which learning version, how many learning bullets, and how many few-shot examples informed a turn — stored in the audit capture, never injected, and carrying ids/counts only (no verbatim cross-contact example text).

06 · Prompt caching vs Anthropic guidance

Caching is enabled and the architecture is genuinely cache-aware. But it uses only one of four available breakpoints, and the biggest cost — replayed history — is uncached.

What's wired

src/mods/ai/rig/client.rs:38
pub fn completion_model(client, model_id) -> openrouter::CompletionModel {
  client.completion_model(model_id).with_prompt_caching()
  // attaches cache_control: {"type":"ephemeral"} to the SYSTEM PROMPT
}

Provider path is OpenRouter → Anthropic (anthropic/claude-sonnet-4.6); OpenRouter forwards the cache_control marker. Cache hits are observable — the streaming layer reconciles cached_input_tokens and cache_creation_input_tokens, and read-side cached tokens are logged into ai_usage_log.

Anthropic best practice	Status here	Notes
Stable content first, volatile last	Followed	Static→dynamic tiers; the only per-turn system section (delivery directive) sits after the static identity
No silent invalidators (timestamps / UUIDs / unsorted JSON in the prefix)	Followed	System prompt is deterministic; learning & few-shot are in the user turn, not the prefix
Frozen system prompt; inject dynamic context later	Followed	Exactly the user-turn-injection design for learning (#1560) and few-shot (#1562)
Deterministic tool set (render order tools→system→messages)	Followed	Tools built in allowlist order; the system breakpoint covers tools+system as one prefix
Breakpoint on the latest turn for incremental multi-turn cache	Missing	Only the system prefix is cached. Replayed history (up to ~300 msgs) is re-processed uncached each turn
Use up to 4 breakpoints	1 of 4	Room for a tools/system split and a history breakpoint
Verify hits via usage fields	Followed	`cached_input_tokens` tracked & logged

The history-cache gap

with_prompt_caching() marks the system prompt only. On a long-lived thread, every turn re-sends the full reconstructed history at full input price. A breakpoint on the last block of the most recent turn would let history accrue cache hits incrementally — the single biggest token-cost lever for chatty threads. Whether rig 0.38's OpenRouter provider exposes message-level cache_control placement needs a quick capability check before committing to it.
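A rough back-of-the-envelope model shows why this gap matters. It uses Anthropic's published multipliers for the 5-minute ephemeral cache (writes at 1.25x base input price, reads at 0.1x) and makes several simplifying assumptions: OpenRouter passes those rates through unchanged, every turn adds the same number of history tokens, the system prefix is ignored, and every turn arrives before the cache entry expires. That last assumption is optimistic for slow SMS threads, where a gap longer than the TTL forces a fresh cache write.

```rust
// Costs are in hundredths of a base input token so the multipliers stay integers.
const FULL: u64 = 100; // 1.00x: uncached input
const CACHE_WRITE: u64 = 125; // 1.25x: writing a new cache entry
const CACHE_READ: u64 = 10; // 0.10x: reading a cached prefix

/// No history breakpoint: turn k re-sends all k chunks at full price.
pub fn history_cost_uncached(turns: u64, tokens_per_turn: u64) -> u64 {
    (1..=turns).map(|k| k * tokens_per_turn * FULL).sum()
}

/// Breakpoint on the latest turn: turn k reads the previous k-1 chunks
/// from cache and writes only the newest chunk.
pub fn history_cost_cached(turns: u64, tokens_per_turn: u64) -> u64 {
    (1..=turns)
        .map(|k| (k - 1) * tokens_per_turn * CACHE_READ + tokens_per_turn * CACHE_WRITE)
        .sum()
}
```

For a 20-turn thread adding 200 tokens per turn, the model gives 42,000 token-equivalents uncached versus 8,800 with the breakpoint, roughly a fifth of the cost. For a single-turn thread the cached variant is slightly more expensive (250 versus 200), because it pays the write premium with nothing to read back.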

Minimum-cacheable-prefix caveat

On Anthropic, the minimum cacheable prefix for the Sonnet-4.x family is ~1–2K tokens. A terse persona+goals agent with no skills may fall under that floor and silently not cache (cache_creation_input_tokens: 0). Worth confirming real agents clear the floor — and a reason an identity block (§7) is close to free: it adds stable prefix bytes that improve cacheability rather than hurt it.

07 · How to enhance the prompt structure

Directions to consider for personalizing agents to act as the business / assigned user, plus the caching win. Each is annotated with where it would slot in and its cache implication. These are options, not a committed plan.

OPP-1

Add a "Who you represent" identity block (Tier 0.7)

A new static section after Tier 0.5, before persona — interpolating business name, owner/assigned-user display name, role/title, signature, hours, timezone, locale. It varies per agent, not per turn, so it lands in the cache-stable prefix at near-zero marginal cost and actually helps cacheability (more stable prefix bytes). This is the most direct answer to "act as the current user."

## Who you represent
You are messaging on behalf of {business_name}.
Your point of contact on the team is {owner_name} ({owner_title}).
Business hours: {hours} ({timezone}). Sign off as {signature} when appropriate.

Source fields from ai_agent.organization_id (org profile) and ai_agent.user_id (owner). The capability to resolve a member already exists in assemble_assistant_turn / get_member_personalization — reuse it on the domain path.
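One possible shape for the renderer, as a sketch only: the `Identity` struct, its fields, and `render_identity_block` are hypothetical, and nothing like them exists in the builder today. The sketch collapses every value to a single line, skips lines whose data is missing instead of emitting empty placeholders, and returns an empty string when there is no business name, so an unconfigured agent's prompt stays byte-identical to what it is now.

```rust
/// Hypothetical per-agent identity, resolved once from org and owner records.
pub struct Identity<'a> {
    pub business_name: &'a str,
    pub owner_name: Option<&'a str>,
    pub owner_title: Option<&'a str>,
    pub hours: Option<&'a str>,
    pub timezone: Option<&'a str>,
    pub signature: Option<&'a str>,
}

/// Collapse all whitespace (including newlines) to single spaces.
fn one_line(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Normalise an optional field; blank values are treated as absent.
fn clean(s: Option<&str>) -> Option<String> {
    s.map(one_line).filter(|v| !v.is_empty())
}

pub fn render_identity_block(id: &Identity<'_>) -> String {
    let business = match clean(Some(id.business_name)) {
        Some(b) => b,
        None => return String::new(), // nothing to say: omit the tier entirely
    };
    let mut out = format!("## Who you represent\nYou are messaging on behalf of {business}.\n");
    if let Some(owner) = clean(id.owner_name) {
        match clean(id.owner_title) {
            Some(title) => out.push_str(&format!(
                "Your point of contact on the team is {owner} ({title}).\n"
            )),
            None => out.push_str(&format!("Your point of contact on the team is {owner}.\n")),
        }
    }
    if let Some(hours) = clean(id.hours) {
        match clean(id.timezone) {
            Some(tz) => out.push_str(&format!("Business hours: {hours} ({tz}).\n")),
            None => out.push_str(&format!("Business hours: {hours}.\n")),
        }
    }
    if let Some(sig) = clean(id.signature) {
        out.push_str(&format!("Sign off as {sig} when appropriate.\n"));
    }
    out.push('\n'); // blank line before the persona that follows
    out
}
```

Because the output depends only on per-agent data, it can sit in the static prefix without introducing any per-turn variation.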

OPP-2

Per-phone assigned user — static vs per-turn

"The user assigned to a given phone" is subtler. Phone numbers bind to an agent today (phone_number.ai_agent_id), and a thread can receive events across lines. Two shapes:

Per-agent default (recommended first) — resolve one owning/assigned user for the agent and put it in the Tier 0.7 static block. Cache-safe, simplest, covers the common one-line-per-agent case.
Per-line override — if a single agent fronts multiple lines with different assigned users, resolve the assigned user from the inbound line and inject it in the user turn (alongside the envelope), keeping the system prefix stable. More precise, slightly more plumbing.
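The per-line override can be sketched as a lookup that produces an extra user-turn line only when the inbound line has its own assigned user. Everything here is hypothetical: the `LineAssignments` map, the `actor_note` function, and the wording of the note are illustrative, and no such mapping is modeled in the schema described above.

```rust
use std::collections::HashMap;

/// Hypothetical lookup: inbound line -> display name of the user assigned to it.
pub type LineAssignments = HashMap<String, String>;

/// Collapse whitespace and drop square brackets so the value cannot
/// break out of, or imitate, a bracketed envelope.
fn scrub(s: &str) -> String {
    s.replace(|c: char| c == '[' || c == ']', " ")
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

/// Returns a user-turn note only when the line has its own assigned user.
/// Lines without an override return None, so the agent-level default in
/// the static prefix applies and the prefix itself never changes.
pub fn actor_note(inbound_line: &str, assignments: &LineAssignments) -> Option<String> {
    let name = scrub(assignments.get(inbound_line)?);
    if name.is_empty() {
        return None;
    }
    Some(format!(
        "[Platform note · on line {} you are replying for {}]",
        scrub(inbound_line),
        name
    ))
}
```

The note would be pushed into the user turn alongside the source-framed events, which keeps the per-turn variation out of the cached prefix.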

OPP-3

Populate the dormant org-rules hook

Tier 0.5 org_rules is wired through the assembler but ships empty at the executor seam. If org-wide identity/voice/policy belongs anywhere shared, this is the seam that already exists — no schema change to the prompt path, just a loader.

OPP-4

Cache the conversation history

Add a breakpoint on the latest turn so replayed history accrues incremental cache hits (the §6 gap). Biggest token-cost lever for long threads. Gate on a rig 0.38 capability check for message-level cache_control placement through OpenRouter.

⚠

Two guardrails to respect for any identity injection

1. Trust boundary. Owner/org identity is trusted operator content (like persona/goals) and belongs in the system prefix; never let contact-supplied data masquerade as identity, and keep the source-framing discipline.
2. Cache stability. Anything that varies per turn (per-line assigned user, time-of-day greeting) must go in the user turn, not the prefix, or it defeats the whole static-prefix design.

08 · File reference map

Where each piece lives, for the implementation session.

Concern	File · symbol
System prompt assembler + Tier 0 constant	`ai_agent/services/build_system_prompt_service.rs` · `build_system_prompt`, `PLATFORM_CORE_BLOCK` (line 73), format string (line 337)
Per-turn executor (all wiring)	`ai_agent/services/run_ai_agent_thread_service.rs` · `run_ai_agent_thread` (learning 615, user turn 963, model 822, chat 978, envelope 1643)
Tier 1 delivery directives	same file · `CAPABILITY__BLOCK` / `OBSERVE__BLOCK` consts + `render_delivery_directive`
Tier 0.5 rules loader	`ai_agent/services/load_tier_0_5_rules_service.rs`
Source framing of the user turn	`ai_agent/types/ai_thread_event_payload_type.rs` · `frame_for_prompt` (line 276)
Prompt caching switch	`ai/rig/client.rs` · `completion_model().with_prompt_caching()` (line 38)
Usage / cache token reconciliation	`ai/rig/streaming.rs` (241) · `ai/services/log_ai_usage_service.rs`
Learning version resolve/summarize/pin	`get_active_learning_version_service.rs` · `summarize_learning_version_service.rs`
Digest / evolution loop	`run_learning_digest_service.rs` · `jobs/learning_digest_poller_job.rs`
Memory tools + reconcile	`tools/{read_my,update_my,read_contact,update_contact}_memory_tool.rs` · `reconcile_memory_blocks_service.rs`
Member personalization (assistant flavor — template to reuse)	`assistant/services/assemble_assistant_turn_service.rs` · `get_member_personalization`