Time is the missing modality.

“The model observes tokens, not elapsed time.” (Tan, Tan & Soatto, Can LLMs Perceive Time?, 2026)

The next architectural shift is the one that fixes that.

64.7

TML’s TimeSpeak score. GPT-realtime-2 minimal scores 4.3 on the same benchmark.

5–10×

Error margin when an LLM estimates its own task duration (Tan, Tan & Soatto, 2026).

~65%

Prompt-level alignment after adding timestamps. Up from ~50%. Then it plateaus.

3

Independent research camps using “time-aware LLM” to mean three different things.

I · THE FRAMING PROBLEM

Three labs. Same phrase. Different problems.

In May, Thinking Machines released their first model and called it “time-aware.” What they mean by this is specific. The model wakes up every 200 milliseconds, senses whatever is happening in the conversation, and reacts to it. It does not wait for the user to finish talking before doing anything.
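To make the difference from turn-taking concrete, here is a minimal sketch of a fixed-interval loop. Only the 200 ms interval comes from the description above. The `Event` type, `run_ticks`, and `turn_based` are hypothetical names of mine, time is simulated rather than read from a real clock, and none of this is TML's actual implementation.

```python
from dataclasses import dataclass

TICK_MS = 200  # interval quoted in the text; everything else here is hypothetical


@dataclass
class Event:
    t_ms: int      # arrival time of an input fragment, in milliseconds
    payload: str   # e.g. a partial transcript chunk


def run_ticks(events, duration_ms, tick_ms=TICK_MS):
    """Simulate a loop that wakes every tick_ms and sees whatever arrived since the last wake.

    Returns one (wake_time_ms, [payloads]) pair per tick.
    """
    pending = sorted(events, key=lambda e: e.t_ms)
    i = 0
    ticks = []
    for now in range(tick_ms, duration_ms + 1, tick_ms):
        seen = []
        # Collect every fragment that has arrived by this wake-up.
        while i < len(pending) and pending[i].t_ms <= now:
            seen.append(pending[i].payload)
            i += 1
        ticks.append((now, seen))
    return ticks


def turn_based(events):
    """Contrast case: nothing is seen until the last fragment of the utterance has arrived."""
    ordered = sorted(events, key=lambda e: e.t_ms)
    return [(ordered[-1].t_ms, [e.payload for e in ordered])] if ordered else []
```

With fragments arriving at 50, 260, and 590 ms, `run_ticks` gives the model three chances to react (at 200, 400, and 600 ms), while `turn_based` gives it one, at 590 ms, after everything has already been said.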

The phrase “time-aware LLM,” though, has been around in other research lines for a while. Going back to 2022, a TACL paper by Dhingra and colleagues used the phrase for a model that knows what was true in 2017 versus what is true today — the knowledge-cutoff problem. A 2025 survey called It’s High Time treats it as a research program for tracking fact decay, event ordering, and temporal expressions. The time scale here is years, not milliseconds.

In late 2025, the TicToc-v1 paper coined another phrase: “temporal blindness.” The concern this time is the agent loop: does the agent register how much wall-clock time has passed between your message and its tool call? Usually not. The finding worth pausing on is that adding timestamps to the prompt only moves agent-alignment accuracy from about 50% to about 65%, and then it stops. Prompting alone does not close the gap.
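For readers who have not seen it, prompt-level timestamping of the kind described above might look roughly like this. It is a sketch under my own assumptions: the message format, the `stamp_messages` helper, and the wording of the elapsed-time line are hypothetical, not the TicToc-v1 protocol.

```python
from datetime import datetime, timezone


def format_elapsed(seconds):
    """Render a duration in seconds as a short human-readable string."""
    seconds = int(seconds)
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}h {minutes}m"
    if minutes:
        return f"{minutes}m {secs}s"
    return f"{secs}s"


def stamp_messages(messages, now):
    """Prefix each (sent_at, role, text) message with its timestamp, then append
    a line stating how long ago the most recent user message arrived."""
    lines = [f"[{sent_at.isoformat()}] {role}: {text}" for sent_at, role, text in messages]
    user_times = [sent_at for sent_at, role, _ in messages if role == "user"]
    if user_times:
        gap = (now - max(user_times)).total_seconds()
        lines.append(
            f"[{now.isoformat()}] system: {format_elapsed(gap)} since the last user message"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    sent = datetime(2026, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    now = datetime(2026, 1, 1, 13, 5, 0, tzinfo=timezone.utc)
    print(stamp_messages([(sent, "user", "deploy when ready")], now))
```

The point of the sketch is how little it does: the elapsed time reaches the model as more tokens to read, which is consistent with a gain that shows up and then plateaus.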

Three research camps. Three meanings of the same phrase. None of them cites the others.

I don’t think this is a coincidence. The same modality is being rediscovered in three places at once.

II · THE THREE CLOCKS

Each camp picks one clock and is blind to the other two.

It helps to lay them out next to each other.

Three clocks run inside any deployed language model. Right now, almost no system handles more than one of them. TML’s release made this visible by claiming the phrase “time-aware” for the fastest of the three — and, in the process, making the gaps on the other two harder to ignore.

Interaction Clock (open problem)
Scale: milliseconds, 10¹ to 10³ ms
Question: Can the model perceive the live flow of an audio or video stream, mid-utterance, without waiting for a turn boundary?
Representative work: Moshi (Kyutai, 2024) · Synchronous LLMs (Meta, 2024) · TML Interaction (2026) · Voila · Seeduplex

Agent Clock (open problem)
Scale: minutes to hours, 10⁵ to 10⁷ ms
Question: Inside a long-horizon agent loop, does the model know how much wall-clock time has elapsed since the user’s last input, and act accordingly?
Representative work: TicToc-v1 (2025) · “Can LLMs Perceive Time?” (2026) · Ralph Wiggum drift (Huntley, 2025)

Knowledge Clock (open problem)
Scale: months to years, 10¹⁰ to 10¹² ms
Question: Does the model know that its training cutoff is two years stale, and can it reason explicitly about when a fact was last true?
Representative work: Dhingra et al. 2022 (TACL) · TimeR4 (EMNLP 2024) · TimE (NeurIPS 2025) · LiveFact (2026) · “Do LMs Know Time Passes?” (2026)
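Putting all three clocks on one millisecond axis shows how far apart they sit. The sketch below uses the ranges listed above. The `classify` helper and the Julian-year constant are my own illustrative choices.

```python
CLOCKS = [
    # (name, lower bound in ms, upper bound in ms), taken from the ranges listed above
    ("interaction", 1e1, 1e3),
    ("agent", 1e5, 1e7),
    ("knowledge", 1e10, 1e12),
]

MS_PER_MINUTE = 60_000
MS_PER_HOUR = 3_600_000
MS_PER_YEAR = 365.25 * 24 * MS_PER_HOUR  # Julian year, an assumption for illustration


def classify(duration_ms):
    """Return the clock whose range contains duration_ms, or None if it falls in a gap."""
    for name, lo, hi in CLOCKS:
        if lo <= duration_ms <= hi:
            return name
    return None


if __name__ == "__main__":
    for label, ms in [
        ("200 ms wake-up", 200),
        ("10 s pause", 10_000),
        ("30 min tool run", 30 * MS_PER_MINUTE),
        ("2-year-old cutoff", 2 * MS_PER_YEAR),
    ]:
        print(f"{label}: {classify(ms)}")
```

In everyday units, 10⁵ ms is about 1.7 minutes and 10⁷ ms is about 2.8 hours, while 10¹⁰ ms is roughly 3.8 months and 10¹² ms is roughly 32 years. The ranges do not touch: a 10-second pause, for example, belongs to none of the three.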

None of these communities treats the others’ problem as the same as theirs. The interaction-clock researchers consider agent reasoning out of scope. The knowledge-clock researchers do not think about real-time streams. The agent-clock researchers are not worried about sub-second perception. Each of them is blind to two-thirds of the cognitive failure they all describe.

What TML did, in effect, was take a phrase one of these communities had already been using and apply it to a different problem. The three communities are now in a position where they have to look at each other.

III · TIME AS COGNITION

Time isn’t an output capability. It’s an input modality.

I think the right way to understand this is that today’s LLMs don’t perceive time. They read about time. The distinction matters more than it might first appear.

The cleanest framing I have seen is from Joscha Bach: the perception of time, as duration, reference to events, and reference to internal clocks, is a feature of cognitive processing, and not an alternative to physics. So a model with no duration sense is missing a piece of the cognitive stack, even when it can describe time fluently. The analogous gap is GPT-3 with images. GPT-3 could write paragraphs about a sunset. It had never seen one.

A paper from earlier this year, Tan-Tan-Soatto, makes the same diagnosis empirically: the model observes tokens, not elapsed time. They measure the gap directly — when asked to estimate their own task duration, LLMs are off by 5–10×. That is Bach’s philosophical point made measurable.
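A figure like “off by 5–10×” is a multiplicative error, not an additive one. The text here does not spell out the paper's exact metric, so the symmetric ratio below is an assumption of mine, and `error_factor` is a hypothetical helper.

```python
def error_factor(estimated_s, actual_s):
    """Multiplicative error between an estimated and an actual duration, in seconds.

    Symmetric by construction: a 5x overestimate and a 5x underestimate
    both return 5.0. (Assumed metric, not necessarily the paper's.)
    """
    if estimated_s <= 0 or actual_s <= 0:
        raise ValueError("durations must be positive")
    return max(estimated_s / actual_s, actual_s / estimated_s)
```

Under this reading, a model that predicts one minute for a task that takes five has an error factor of 5, and so does one that predicts five minutes for a task that takes one.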

It helps to compare this to hallucination, the field’s defining cognitive failure from 2022 to 2024. The first wave of fixes treated it as a bug to be patched: RLHF, instruction tuning, sampling tricks. None of them worked structurally; the hallucination rate kept plateauing.

What eventually worked was the reframe — treating hallucination as a structural property of next-token sampling rather than a bug to be patched. The reframe came before the engineering progress, and in many ways the reframe was the engineering progress. Once the problem was named correctly, the fixes followed.

Temporal blindness is on the same arc. Right now it looks like a missing timestamp. I think in two years the framing will be that it is a missing perceptual dimension.

IV · THREE BETS

If the framing is right, three concrete things follow.

Each bet has a horizon and a named falsifier. I think each is more likely than not. None of them is obvious yet.

Bet 1 · The structural-fix bet

Temporal blindness is the next hallucination-scale problem, and prompting won’t fix it.

TicToc-v1 already has the data: prompt-level timestamps move agent alignment from ~50% to ~65%, then plateau. Tan-Tan-Soatto’s companion result, a 5–10× error on self-time estimation, points to where the failure actually lives: inside the model’s implicit world model, not at the surface of the prompt. This is the same arc hallucination followed: prompt fixes plateau, structural fixes work. I’d bet temporal awareness gets reframed from a prompting problem to a training-objective problem within two years.

Horizon: 24 months · Falsifier: A prompt-only technique — no architectural change, no post-training — that closes more than half of TicToc-v1’s gap on the original benchmark, replicated independently.
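One way to make this falsifier checkable is to turn it into arithmetic. The sketch below reads “the gap” as the distance between the ~65% plateau and a perfect score. That reading, the 100% ceiling, and both function names are assumptions for illustration; a different definition of the gap would move the threshold.

```python
def fraction_of_gap_closed(score, plateau=65.0, ceiling=100.0):
    """Share of the remaining gap (ceiling minus plateau) that a new score closes.

    plateau=65.0 is the approximate figure quoted above; ceiling=100.0 is assumed.
    """
    return (score - plateau) / (ceiling - plateau)


def falsifies_bet_1(score, plateau=65.0, ceiling=100.0):
    """True if a prompt-only result closes more than half of the gap under this reading."""
    return fraction_of_gap_closed(score, plateau, ceiling) > 0.5
```

Under these assumptions the bar sits at 82.5%: a replicated prompt-only result above that would falsify the bet, and one at or below it would not.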

Bet 2 · The retrospective bet

Three years from now, an LLM with no sense of time will look as primitive as a text-only LLM looks today.

The 60.4-point gap between TML (64.7) and GPT-realtime-2 (4.3) on TimeSpeak is the first time anyone has measured the deficit head-on. It’s wider than any current MMLU differential. Right now, almost no production benchmark scores it. My guess is that by 2029, time-perception will sit next to vision on every frontier model card — and the absence of it will read the way “text-only” reads today: a model from an earlier generation.

Horizon: 3 years · Falsifier: A 2029 frontier model card from any of the major Western labs that reports nothing in the TimeSpeak / CueSpeak / RepCount / TicToc family or successor — and no equivalent measure of temporal perception.

Bet 3 · The synthesis bet

World Model labs absorb time-aware interaction as a primitive — and they move first.

The interaction thesis and the embodied-AI thesis look like rival paradigms. Murati says how we work with AI matters as much as how smart it is. LeCun says language manipulation isn’t intelligence; we need World Models. But the two camps are diagnosing the same gap from different sides — that LLMs are decoupled from the world. dgallitelli95 already framed TML’s release as “accidentally proving the embodied-AI thesis,” and LeCun’s AMI Labs launched in April. The World Model side has the deeper apparatus — predictive systems need duration as a building block, not a wrapper. I’d bet AMI Labs or a comparable World Model program publishes work within 24 months that treats time-aware interaction as an architectural primitive, and the interaction-vs-autonomy framing stops being the relevant axis.

Horizon: 24 months · Falsifier: By mid-2028, AMI Labs and other major World Model programs have published nothing that explicitly treats time-aware interaction (full-duplex perception, micro-turn architecture, or sub-second visual reactivity) as an architectural primitive.

V · THE STEEL-MAN

Four objections worth taking seriously.

I will try to steelman each of them.

1. The Bitter-Lesson reversal. Scaling eats hand-crafted time-awareness the same way it ate hand-crafted search. The fix is not architectural; the fix is more parameters and lower latency. Kingy AI argues this most cleanly.
2. The dual-clock consistency hazard. A slow background model can contradict what the foreground model has already said out loud, and no released paper says how to resolve it. If the resolution requires collapsing the two clocks back into one, the multi-clock architecture unwinds. Sean Goedecke raised this.
3. The embodied-AI counter. Time-awareness is a downstream symptom of the deeper missing capability, which is causal physical prediction. Solve World Models and time-awareness comes for free; solve it without World Models and you have solved the wrong problem. This is LeCun’s position.
4. The continual-learning gap. The hardest version of “time-aware” isn’t perceiving time within a session. It is updating weights across sessions, so that a system actually carries memory rather than losing it at the edge of the context window. None of the three labs in this piece is doing that. Karpathy keeps naming this gap.

I think the bets survive objections 2, 3, and 4. If LeCun is right about World Models, time-awareness is a near-term piece of his program rather than a competitor to it. If Karpathy is right about continual learning, time-awareness is a precondition for the harder problem rather than a substitute. The Bitter-Lesson reversal is the only one that kills the bets outright. That is the one I would actually watch.

VI · THE NAMING PROBLEM

Whoever names this correctly defines the next era.

Hallucination resisted fixes for about two years. People tried prompt engineering, retrieval augmentation, sampling tricks. None of it worked structurally. Progress came only after the reframe described in section III: hallucination as a structural property of sampling rather than a bug to be patched.

The same gap is open now around time. Three labs are demonstrating, from three angles, that the same modality is missing. None of them has the language to claim all three. Each of them is one-third right. Whoever finds the unified frame — the cross-clock benchmark, the single architectural story — gets to define what “time-aware LLM” means for the rest of the decade.

My bet, in one sentence: the next frontier model card will have a row for time.