Skip to content

Coherence Is Safety

Without coherence, autonomous agents are a security risk:

  • An agent that doesn’t remember it already sent an email sends it again
  • An agent that doesn’t track its own decisions contradicts itself
  • An agent without values makes expedient choices

Instar’s safety features are coherence features.

Every significant decision is recorded with reasoning. The agent can explain why it did what it did, and detect when it’s drifting from purpose.

Terminal window
# Query the decision journal
curl localhost:4040/intent/journal
# Check alignment score
curl localhost:4040/intent/alignment

External actions are evaluated by an LLM-supervised gate before execution:

  • Risk classification — Every external operation scored on mutability, reversibility, and scope
  • Bulk deletes and irreversible sends require explicit approval
  • Adaptive trust — Trust levels evolve per service based on track record
  • New services start supervised — consistent success earns autonomy

Born from a real incident where an AI agent deleted a user’s emails.

Compares decision patterns across time windows. Detects when behavior shifts from stated purpose. Measures:

  • Conflict frequency
  • Confidence trends
  • Principle consistency

Trust elevation rewards consistent, value-aligned behavior with increasing independence. Safety that grows with the agent.

Emergency stop is always available — say “stop everything” and the MessageSentinel halts operations before normal routing.

Structural guardrails that fire automatically:

HookTypeWhat it does
Dangerous command guardPreToolUse (blocking)Blocks rm -rf, force push, database drops
External operation gatePreToolUse (blocking)LLM-supervised safety for external calls
Grounding before messagingPreToolUse (advisory)Forces identity re-read before communication
Deferral detectorPreToolUse (advisory)Catches the agent deferring work it could do

Every safety feature exists because coherence is the safety mechanism. An agent that knows who it is, who you are, and what you both stand for — that’s an agent you can trust.