Architecture

Inside the RSI loop

Four symbiotic stages, running on a continuous cycle. Each one feeds the next; together they compose Honey Nudger’s recursive self-improvement loop — the engine that powers every deployment, whether it’s running against the public Hivemind or as a private instance for your team.

Autonomous

No manual prompt engineering. The loop runs itself, discovering new optimizations without human review bottlenecks.

Statistically Verified

Every hint is treated as a hypothesis and validated through Bayesian A/B testing before it ships.

Recursively Improving

Each cycle compounds on the last. Better hints change what the system sees next, which changes what gets distilled, and so on.

Two-Endpoint Integration

Send prompts, report outcomes. The four-stage loop sits behind those two endpoints.

The four-stage RSI loop

Four stages. One symbiotic cycle.

Each stage feeds the next, and the handoffs between them are the architecture.

1. Observe

Every interaction the agent has is captured alongside the business outcome it produced — purchases, satisfaction scores, click-throughs, whatever the host system reports back.

Feeds Distill with a stream of outcome-labeled interactions.
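One way to picture an "outcome-labeled interaction" is a record that joins a served nudge with the KPI events later reported against it. The sketch below is illustrative only: the `Interaction` class, its field layout, and the rule that total reward is a plain sum of reported values are assumptions, not the product's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Interaction:
    """One captured interaction (hypothetical shape, not the real schema)."""
    nudge_id: str
    session_id: str
    hints: list[str]
    outcomes: dict[str, float] = field(default_factory=dict)  # metric -> summed value

    @property
    def total_reward(self) -> float:
        # Assumption: reward is the plain sum of all reported metric values.
        return sum(self.outcomes.values())


def label_with_outcomes(interactions, events):
    """Attach each (nudge_id, metric, value) event to its interaction."""
    by_id = {i.nudge_id: i for i in interactions}
    for nudge_id, metric, value in events:
        interaction = by_id.get(nudge_id)
        if interaction is None:
            continue  # outcome for an unknown nudge: skip rather than guess
        interaction.outcomes[metric] = interaction.outcomes.get(metric, 0.0) + value
    return list(by_id.values())
```

Keying everything on the nudge ID mirrors the integration described later on this page, where outcomes are reported against the ID returned with the hints.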

2. Distill

The system mines the highest-performing interactions for the patterns that drove them, condensing each pattern into a candidate Optimization Hint — a portable, reusable nudge.

Feeds Verify with newly distilled candidate hints.
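The first half of Distill, picking out the highest-performing interactions, can be sketched as a simple top-fraction selection. This is an assumption-laden illustration: the `top_performers` helper and its `fraction` knob are hypothetical, and the pattern-mining step that turns the selected interactions into hints is not described here, so it is left out.

```python
import math


def top_performers(rewards, fraction=0.1):
    """Return the ids of the highest-reward interactions.

    rewards: dict mapping interaction id -> total reward.
    fraction: share of interactions to keep (an assumed knob, not a product setting).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not rewards:
        return []
    keep = max(1, math.ceil(len(rewards) * fraction))
    ranked = sorted(rewards, key=rewards.get, reverse=True)
    return ranked[:keep]
```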

3. Verify

Candidate hints are treated as hypotheses and tested live via Bayesian A/B testing (Thompson Sampling). Traffic shifts toward the candidates with the strongest evidence so far, and nothing graduates without statistical evidence.

Feeds Promote with statistically verified champions.

4. Promote

Proven winners enter production and start nudging the next generation of interactions. Underperformers are automatically retired. Each promotion changes what Observe sees next — and the cycle compounds.

Feeds Observe by changing the distribution of interactions captured next.

Loops back to step 1

Inside Verify

Nothing ships without proof

The Verify stage is what stops Distill’s ideas from becoming production noise. Every candidate hint is treated as a hypothesis and validated through controlled A/B experiments before it earns a promotion.

Thompson Sampling

Bayesian multi-armed bandit allocation dynamically routes traffic to maximize learning while minimizing exposure to underperformers.
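To make the allocation concrete, here is a minimal Thompson Sampling sketch. It assumes each outcome is a binary success or failure with a uniform Beta(1, 1) prior per hint. Real KPIs such as satisfaction scores are not binary, so treat this as an illustration of the technique rather than the system's implementation. The `BetaArm`, `choose_arm`, and `simulate` names are hypothetical.

```python
import random


class BetaArm:
    """Beta-Bernoulli posterior for one hint, starting from a Beta(1, 1) prior."""

    def __init__(self, name):
        self.name = name
        self.successes = 0
        self.failures = 0

    def sample(self, rng):
        # Draw a plausible conversion rate from the current posterior.
        return rng.betavariate(1 + self.successes, 1 + self.failures)

    def update(self, converted):
        if converted:
            self.successes += 1
        else:
            self.failures += 1


def choose_arm(arms, rng):
    """Thompson Sampling: draw once from each posterior and serve the highest draw."""
    return max(arms, key=lambda arm: arm.sample(rng))


def simulate(true_rates, rounds=2000, seed=0):
    """Route simulated traffic and return how often each arm was served."""
    rng = random.Random(seed)
    arms = [BetaArm(name) for name in true_rates]
    pulls = {name: 0 for name in true_rates}
    for _ in range(rounds):
        arm = choose_arm(arms, rng)
        pulls[arm.name] += 1
        arm.update(rng.random() < true_rates[arm.name])
    return pulls
```

Running `simulate({"champion": 0.10, "candidate": 0.30})` sends most traffic to the stronger arm as evidence accumulates, while the weaker arm still receives occasional exploratory traffic.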

Delay-Gated Maturity

Promote/reject decisions wait until the KPI outcomes attributed to each hint have fully matured, so no conclusion is drawn prematurely.
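A maturity gate can be as simple as ignoring any nudge that is younger than a waiting window. The sketch below assumes a fixed window and a `served_at` timestamp per nudge; both are hypothetical, since the actual maturity rule is not specified here.

```python
from datetime import datetime, timedelta, timezone


def mature_only(served_at, now, maturity_window):
    """Return ids of nudges old enough for their outcomes to count.

    served_at: dict nudge_id -> timezone-aware datetime the hints were served.
    maturity_window: timedelta to wait before judging (an assumed setting).
    """
    cutoff = now - maturity_window
    return [nudge_id for nudge_id, ts in served_at.items() if ts <= cutoff]
```

For example, with a seven-day window, a nudge served two hours ago is excluded from the decision, while one served eight days ago is included.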

Auto Promote / Reject

Winners are automatically promoted to production. Losers are retired. No human review bottleneck.
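One common Bayesian decision rule, shown here as an assumption rather than the system's actual criterion, is to estimate the probability that a candidate beats the current champion and act only when that probability crosses a threshold. The function names and the 0.95 / 0.05 thresholds are illustrative, and outcomes are again assumed to be binary.

```python
import random


def prob_candidate_beats_champion(cand, champ, draws=20000, seed=0):
    """Monte Carlo estimate of P(candidate rate > champion rate).

    cand, champ: (successes, failures) tuples; Beta(1, 1) priors are assumed.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        c = rng.betavariate(1 + cand[0], 1 + cand[1])
        k = rng.betavariate(1 + champ[0], 1 + champ[1])
        wins += c > k
    return wins / draws


def decide(cand, champ, promote_at=0.95, reject_at=0.05):
    """Return 'promote', 'reject', or 'keep testing' (thresholds are illustrative)."""
    p = prob_candidate_beats_champion(cand, champ)
    if p >= promote_at:
        return "promote"
    if p <= reject_at:
        return "reject"
    return "keep testing"
```

A candidate with 300 successes in 1,000 trials against a champion with 100 in 1,000 is promoted; with the counts swapped it is rejected; two arms with identical small samples stay in testing.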

Integration

Two Endpoints. That’s It.

Send your LLM payloads, get back Optimization Hints, report business outcomes. The system handles everything else.

1. Get Optimization Hints

Send your agent’s system prompt, messages, and a session identifier. Honey Nudger returns contextual Optimization Hints tailored to this specific interaction.

Request

POST /v1/nudge

{
  "session_id": "user-session-abc",
  "system": "You are a helpful customer service agent...",
  "messages": [
    { "role": "user", "content": "I want to return my order" }
  ]
}
Response
{
  "nudge_id": "ndg_7f3a2b",
  "hints": [
    "Lead with empathy: acknowledge frustration before discussing process",
    "Mention the 30-day hassle-free guarantee early in the conversation"
  ]
}

Place Hints in Your Compiled Prompt

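# original_system, messages, nudge_response (the parsed /v1/nudge reply),
# and llm are assumed to already exist in your application.
compiled_prompt = original_system + """

## Optimization Hints
"""
# Append each hint as a bullet under the new section
for hint in nudge_response["hints"]:
    compiled_prompt += f"- {hint}\n"

# Send to your LLM as normal
response = llm.chat(
    system=compiled_prompt,
    messages=messages
)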

2. Report Business Outcomes

When a KPI event occurs — purchase, satisfaction score, click-through, or any metric — report it with the nudge ID. The system automatically attributes the outcome and closes the learning loop.

Request

POST /v1/honey/ndg_7f3a2b

{
  "metric": "customer_satisfaction",
  "value": 5
}
Response
{
  "status": "recorded",
  "total_reward": 5.0
}
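
Both calls can be wrapped in a small client. The sketch below uses only the Python standard library and the two paths and payload fields shown above. The base URL is a placeholder, and authentication is omitted because no scheme is documented here; both would need to be filled in for a real deployment. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://honey-nudger.example"  # placeholder host, not a real endpoint


def _post_json(url, payload):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_nudge_request(session_id, system, messages, base_url=BASE_URL):
    """Request for POST /v1/nudge with the fields shown in the example above."""
    payload = {"session_id": session_id, "system": system, "messages": messages}
    return _post_json(f"{base_url}/v1/nudge", payload)


def build_outcome_request(nudge_id, metric, value, base_url=BASE_URL):
    """Request for POST /v1/honey/<nudge_id> reporting one KPI event."""
    return _post_json(f"{base_url}/v1/honey/{nudge_id}", {"metric": metric, "value": value})


def send(request, timeout=10):
    """Send a prepared request and decode the JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

Separating request construction from `send` keeps the payload logic easy to test without network access; in application code you would call `send(build_nudge_request(...))`, keep the returned `nudge_id`, and later pass it to `build_outcome_request`.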

That’s it. From here, the four-stage loop takes over — Distill mines the wins, Verify A/B tests new hypotheses against current performance, and Promote graduates statistically proven winners. The Observe stage then sees the world the new champions create, and the cycle compounds.

See the loop in motion.

Every version of the system under test is scored on the COMB benchmark and published in the live ledger.

See the COMB RSI Benchmark