Every AI You Use Forgets You Exist on Monday
It can write better prose than most humans, generate working code from a sentence, and wake up Monday with no idea who you are.
Not "it lost some context." It does not know you exist. Your preferences, your history, your relationships, the meaning accumulated across years of digital life. Gone. Every session is a cold start.
The industry's answer is to bolt persistence onto session machinery. Memory features. Conversation history. Append-only logs. Everyone trying to make a transient engine behave less transiently.
This is backwards.
The wrong layer
Intelligence in most AI products lives in the model plus the current session. Session ends, context evaporates. Memory features improve this - they save summaries, preferences, and distilled facts that get replayed into future conversations. Better than nothing. But it is still session machinery simulating persistence.
The memory is a cache of interactions, not a representation of your world.
Corpus-first
What if intelligence lived in the data itself?
We have been building a system where the primary artifact is not a conversation or a model. It is a classified, interpreted, continuously maintained corpus of the owner's actual data: email, messages, files, photos, contacts, calendar, notes. 1.55 million entities and growing, harvested through background daemons, classified through a cascade architecture, and projected through an interpretation layer the owner controls.
The system does not remember conversations. It knows things. When you ask it a question, it queries a governed corpus where the answer already lives, classified and attributed, with provenance on every field.
Cold start does not exist. Not because the system saved the last conversation, but because it never stopped knowing.
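As a sketch of what "provenance on every field" could mean in practice, here is a minimal entity record in Python. Every name here - `Provenance`, `Entity`, the `rule:12` attribution, the field layout - is a hypothetical illustration, not the system's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Provenance:
    """Where a field's value came from and how it was classified."""
    source: str          # e.g. "imap-harvester" (hypothetical name)
    classified_by: str   # "rule:<id>", "model:<name>", or "owner"
    confidence: float
    observed_at: datetime

@dataclass
class Entity:
    """One corpus entity; every field carries its own provenance."""
    entity_id: str
    entity_type: str     # "email", "contact", "photo", ...
    fields: dict         # field name -> (value, Provenance)

    def get(self, name):
        return self.fields[name]

# A question like "who sent this?" resolves to a stored, attributed
# fact, not to a replayed conversation summary.
msg = Entity(
    entity_id="email:8f3a",
    entity_type="email",
    fields={
        "sender": ("ana@example.com",
                   Provenance("imap-harvester", "rule:12", 1.0,
                              datetime(2025, 3, 1, tzinfo=timezone.utc))),
    },
)
value, prov = msg.get("sender")
print(value, prov.classified_by)   # ana@example.com rule:12
```

The point of the shape is that attribution travels with the value: any answer the system gives can name the harvester and the rule or model that produced it.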
The compounding loop
Raw data enters through harvesters watching email, messages, files, photos, calendar, contacts. Each entity gets staged, ingested, and passed through a classification cascade. First a deterministic rules engine - fast, cheap, precise. Then a language model for semantic classification. Then a feedback loop that promotes high-confidence model outputs into new rules.
The corpus improves itself. Every classification makes the next one better. Every promoted rule handles more cases without touching a model. The system gets smarter by running, not by being trained.
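The cascade-and-promotion loop can be sketched in a few lines. This is a toy under assumed details - a single promotion threshold, a trivially cheap `entity_key` signature, stand-in rule and model callables - none of which are claimed to match the real implementation:

```python
PROMOTE_THRESHOLD = 0.95   # assumed cutoff for promoting a model output

def entity_key(entity):
    # Assumed: a cheap deterministic signature, here the sender domain.
    return entity["sender"].split("@")[-1]

def classify(entity, rules, model):
    # Stage 1: deterministic rules - fast, cheap, precise.
    for rule in rules:
        label = rule(entity)
        if label is not None:
            return label, "rule"
    # Stage 2: semantic classification by a language model.
    label, confidence = model(entity)
    # Stage 3: feedback - promote high-confidence outputs into rules,
    # so the next matching case never touches the model.
    if confidence >= PROMOTE_THRESHOLD:
        rules.append(lambda e, k=entity_key(entity), v=label:
                     v if entity_key(e) == k else None)
    return label, "model"

# Demo with a stand-in model: the first call pays for inference,
# the second is handled by the freshly promoted rule.
rules = []
def fake_model(entity):
    return "newsletter", 0.97

e1 = {"sender": "news@updates.example.com"}
label1, path1 = classify(e1, rules, fake_model)
label2, path2 = classify(e1, rules, fake_model)
print(path1, path2)   # model rule
```

The compounding lives in stage 3: each promotion permanently shrinks the set of entities that require a model call.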
This only works because the corpus, the classification engine, and the interpretation layer are co-resident. The loop depends on physical proximity. You can approximate it across an API boundary, but the compounding weakens as the layers separate.
And sovereignty falls out as a consequence. We did not set out to build a privacy system. We set out to build intelligence that does not cold-start. But once the compounding loop requires corpus and inference to live together, the data no longer needs to leave to become useful.
The interpretation layer
A million emails in a database is just a million emails. The step that turns corpus into intelligence is interpretation, and most systems quietly let the model control that layer.
In this architecture, interpretation is its own persistent layer - a registry of versioned projections that define what each entity type means, what fields matter, how relationships resolve. The owner controls it.
If the owner says "this email address maps to this person, always, regardless of what the model thinks," that is a structural fact, not a suggestion.
This includes authored silence: the owner's explicit declaration that certain information is absent. Not a model saying "I don't know." A record stating "this field has been declared empty." No model fabricates an answer there. The absence is part of the system's knowledge, not a hole in it.
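One way to picture the precedence this implies - owner facts first, authored silence second, model inference last - is a small resolver. The registry shape and method names are invented for illustration:

```python
OWNER_FACT = "owner_fact"
DECLARED_EMPTY = "declared_empty"

class Interpretation:
    """Hypothetical owner-controlled override registry."""

    def __init__(self):
        self.overrides = {}   # (entity_id, field) -> (kind, value)

    def declare(self, entity_id, field, value):
        # A structural fact: always wins, regardless of the model.
        self.overrides[(entity_id, field)] = (OWNER_FACT, value)

    def declare_empty(self, entity_id, field):
        # Authored silence: absence recorded as knowledge, not a gap.
        self.overrides[(entity_id, field)] = (DECLARED_EMPTY, None)

    def resolve(self, entity_id, field, model_guess=None):
        kind, value = self.overrides.get((entity_id, field), (None, None))
        if kind == OWNER_FACT:
            return value        # model output ignored
        if kind == DECLARED_EMPTY:
            return None         # no model may fabricate an answer here
        return model_guess      # model fills only true unknowns

interp = Interpretation()
interp.declare("contact:ana", "email", "ana@example.com")
interp.declare_empty("contact:ana", "middle_name")

print(interp.resolve("contact:ana", "email", model_guess="wrong@x.com"))
print(interp.resolve("contact:ana", "middle_name", model_guess="Marie"))
```

The first call prints `ana@example.com` despite the model's guess; the second prints `None`, because the declared absence is itself a record, not a hole the model is allowed to fill.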
We wrote separately about why the distinction between architectural properties and prompted policies matters for governance-critical systems.
The industry is converging. On the wrong thing.
The industry is converging on persistence. Background operation, scheduled memory, always-on behavior. Users clearly want systems that do not reset every morning.
But most of the industry builds from the session outward. Start with a conversational agent. Add memory. Add daemons. Each addition fights the original architecture.
The alternative is to build from the corpus inward. Start with the data. Classify it. Interpret it. Maintain it. Let interaction be one interface to an intelligence that already exists.
Session-continuity systems remember interactions. Corpus-first systems know the owner's world. These sound similar. They are architecturally opposite.
The numbers
This is not a sketch. 1.55 million entities across 9 source types. A 7-tier classification taxonomy with 75 values. 62 deterministic rules handling most classifications without a model. A cascade loop that promotes model outputs into permanent rules.
For semantic classification that requires a model, we distilled labeled output into purpose-built models running at 15+ tokens/sec on commodity hardware, replacing a 70B model at 0.22 tokens/sec. $0.002 per governance session versus $0.05 for the equivalent frontier call. The smaller model wins on compliance not because it is smarter but because it is structurally incapable of some failure modes that come with general capability.
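Taking the quoted figures at face value, the gap is easy to quantify:

```python
# Back-of-envelope comparison using the figures quoted above.
distilled_tps = 15.0      # tokens/sec, distilled purpose-built model
frontier_tps = 0.22       # tokens/sec, 70B model on the same hardware
cost_distilled = 0.002    # $ per governance session
cost_frontier = 0.05      # $ per equivalent frontier call

speedup = distilled_tps / frontier_tps
cost_ratio = cost_frontier / cost_distilled
print(f"{speedup:.0f}x faster, {cost_ratio:.0f}x cheaper")
# 68x faster, 25x cheaper
```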
The bet
The industry will converge on persistence because users demand it. Persistence built from sessions outward will always be a cache pretending to be continuity. Persistence built from corpus inward is a durable intelligence substrate - one that gets smarter by running, whose governance is structural rather than prompted, whose sovereignty is a consequence of where the intelligence lives.
Every other approach starts with the model and asks: how do we make it remember?
This one starts with the data and asks: how do we make it know?