Peike Li · Notes
Insight · The Bets

Time is the missing modality.

“The model observes tokens, not elapsed time.” (Tan, Tan & Soatto, Can LLMs Perceive Time?, 2026) The next architectural shift is the one that fixes that.

64.7
TML’s TimeSpeak score. GPT-realtime-2 minimal scores 4.3 on the same benchmark.
5–10x
Error margin when an LLM estimates its own task duration (Tan, Tan & Soatto, 2026).
~65%
Prompt-level alignment after adding timestamps, up from ~50%. Then it plateaus.
3
Independent research camps using “time-aware LLM” to mean three different things.
I  ·  THE FRAMING PROBLEM

Three labs. Same phrase. Different problems.

In May 2026, Thinking Machines Lab (TML) shipped its first model and called it “time-aware.” What Mira Murati’s team means by that: the model runs on 200-millisecond beats and senses the flow of a conversation instead of waiting for the user to stop talking.
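To make the beat idea concrete, here is a minimal sketch of a beat-driven loop. It assumes 16 kHz mono audio and a hypothetical `respond` policy, and it reflects nothing about TML’s actual implementation. It only shows the scheduling difference: the model gets a decision point every 200 ms, whether or not the user has stopped talking.

```python
from typing import Callable, List, Sequence

SAMPLE_RATE_HZ = 16_000   # assumption: 16 kHz mono PCM input
BEAT_MS = 200             # beat length described for TML's model
SAMPLES_PER_BEAT = SAMPLE_RATE_HZ * BEAT_MS // 1000  # 3200 samples per beat

def beats(samples: Sequence[float]) -> List[Sequence[float]]:
    """Split an audio buffer into consecutive 200 ms frames.
    The final frame may be shorter if the buffer doesn't divide evenly."""
    return [samples[i:i + SAMPLES_PER_BEAT]
            for i in range(0, len(samples), SAMPLES_PER_BEAT)]

def beat_driven(samples: Sequence[float],
                respond: Callable[[int, Sequence[float]], str]) -> List[str]:
    """Give the (hypothetical) policy a decision point on every beat,
    whether or not the speaker has paused. A turn-based system would
    instead call it once, after detecting the end of the utterance."""
    return [respond(t, frame) for t, frame in enumerate(beats(samples))]

# One second of (silent) audio yields five decision points, not one.
one_second = [0.0] * SAMPLE_RATE_HZ
actions = beat_driven(one_second, lambda t, frame: "listen" if t < 4 else "backchannel")
print(actions)  # ['listen', 'listen', 'listen', 'listen', 'backchannel']
```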

But “time-aware LLM” is also the title of a completely different research line. Dhingra et al. (TACL 2022) used it to mean a model that knows what was true in 2017 versus what is true in 2026 — the knowledge-cutoff problem. A 2025 survey, It’s High Time, frames it as a program for tracking fact decay, event ordering, and temporal expressions. Different clock. The unit is years.

Then, in late 2025, TicToc-v1 coined a third: “temporal blindness.” Their target was the agent loop: does the agent know that five hours passed between your message at 9 AM and its tool call at 2 PM? Usually not. The finding that matters: just dropping timestamps into the prompt nudged accuracy from ~50% to ~65%, then plateaued. Prompting alone can’t fix it. The model has to be retrained.
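Dropping timestamps into the prompt can be as simple as the sketch below. The `Message` type and the `with_elapsed_time` and `humanize` helpers are hypothetical, not TicToc-v1’s actual setup. Note what this gives the model: more tokens describing time, not any direct perception of it, which is one way to read why this kind of fix plateaus.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class Message:
    role: str           # "user", "assistant", or "tool"
    content: str
    sent_at: datetime   # wall-clock time the message was produced

def humanize(seconds: float) -> str:
    """Render an elapsed duration coarsely, e.g. '5h 0m' or '45m'."""
    minutes = int(seconds // 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes}m" if hours else f"{minutes}m"

def with_elapsed_time(history: List[Message], now: datetime) -> str:
    """Serialize a transcript with absolute timestamps, then append a line
    stating how long it has been since the user's most recent message."""
    lines = [f"[{m.sent_at:%Y-%m-%d %H:%M}] {m.role}: {m.content}" for m in history]
    user_times = [m.sent_at for m in history if m.role == "user"]
    if user_times:
        elapsed = (now - max(user_times)).total_seconds()
        lines.append(f"[now {now:%Y-%m-%d %H:%M}] {humanize(elapsed)} since the user's last message.")
    return "\n".join(lines)
```

For a user message stamped 09:00 and `now` at 09:45 on the same day, the last line ends with "45m since the user's last message."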

Three labs. Three meanings of the same phrase. None of them cites the others. People keep treating the overlap as coincidence.

It’s not coincidence. The field is rediscovering, in three places at once, that the same modality is missing.


II  ·  THE THREE CLOCKS

Each camp picks one clock and is blind to the other two.

Put them side by side.

Three clocks run inside any deployed language model. Right now, almost no system handles more than one of them. TML’s release made this visible by claiming the word “time-aware” for the fastest of the three — and, in the process, exposing how little anyone has touched the other two.

TML · solved
Interaction Clock

Milliseconds

10¹–10³ ms

Can the model perceive the live flow of an audio or video stream, mid-utterance, without waiting for a turn boundary?

Representative work: Moshi (Kyutai, 2024) · Synchronous LLMs (Meta, 2024) · TML Interaction (2026) · Voila · Seeduplex
Open problem
Knowledge Clock

Years

10⁷–10¹⁰ ms

Does the model know that its training cutoff is two years stale, and can it reason explicitly about when a fact was last true?

Representative work: Dhingra et al. 2022 (TACL) · TimeR4 (EMNLP 2024) · TimE (NeurIPS 2025) · LiveFact (2026) · “Do LMs Know Time Passes?” (2026)
Open problem
Agent Clock

Minutes — hours

10⁶–10⁷ ms

Inside a long-horizon agent loop, does the model know how much wall-clock time has elapsed since the user’s last input, and act accordingly?

Representative work: TicToc-v1 (2025) · “Can LLMs Perceive Time?” (2026) · Ralph Wiggum drift (Huntley, 2025)

None of these communities treats the others’ problem as its own. Interaction-clock people consider agent reasoning out of scope. Knowledge-clock people don’t think about real-time streams. Agent-clock people aren’t worried about sub-second perception. Each sees one-third of the cognitive failure they all describe and is blind to the other two-thirds.

What TML did, basically, was take a phrase that belonged to two of these groups and use it to mean the third. They forced the conversation.


III  ·  TIME AS COGNITION

Time isn’t an output capability. It’s an input modality.

Joscha Bach, the cognitive scientist, has the cleanest version of why this matters: the perception of time, as duration, reference to events, and reference to internal clocks, is a feature of cognitive processing, and not an alternative to physics. Translated for the LLM stack: a system that can’t perceive duration isn’t building the same kind of internal model as one that can. Today’s LLMs talk about time the way GPT-3 talked about images it had never seen. They’ve read about it. They haven’t measured it.

Tan, Tan & Soatto make the same diagnosis empirically: the model observes tokens, not elapsed time. It does not directly perceive wall-clock duration while generating, and it does not accumulate sensorimotor memories. The 5–10x error on self-time estimation is Bach’s point made measurable. Compare hallucination, the field’s defining failure of 2022–2024. Initial framing: prompt-engineering bug. Eventual framing: structural property of next-token sampling. The reframe came before the engineering, not after. Temporal blindness is on the same arc. It looks at first like a missing timestamp. It’ll turn out to be a missing perceptual dimension.
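A “5–10x error” is a multiplicative claim. One natural way to score it, assumed here rather than taken from the paper, is the symmetric ratio between estimated and actual duration, so that overestimates and underestimates count alike, averaged in log space across tasks. Both function names are illustrative.

```python
import math
from typing import Iterable, Tuple

def ratio_error(estimated_s: float, actual_s: float) -> float:
    """Symmetric multiplicative error between an estimated and an actual
    duration: 1.0 is perfect, 5.0 means off by 5x in either direction."""
    if estimated_s <= 0 or actual_s <= 0:
        raise ValueError("durations must be positive")
    return max(estimated_s / actual_s, actual_s / estimated_s)

def geometric_mean_error(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average multiplicative errors in log space, so one 10x miss and one
    perfect estimate don't average to a misleading 5.5x."""
    logs = [math.log(ratio_error(est, act)) for est, act in pairs]
    if not logs:
        raise ValueError("need at least one (estimate, actual) pair")
    return math.exp(sum(logs) / len(logs))
```

Under this metric, guessing 2 minutes for a task that took 15 scores 7.5, and guessing 20 minutes for one that took 2 scores 10.0; both fall in the 5–10x band.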


IV  ·  THREE BETS

If the framing is right, three concrete things follow.

Three bets. Each has a horizon and a named falsifier. I think each one is more likely than not. None of them is obvious yet.

Bet 1 · The retrospective bet

Five years from now, an LLM with no sense of time will look as primitive as a text-only LLM looks today.

The 60.4-point gap between TML (64.7) and GPT-realtime-2 (4.3) on TimeSpeak is the first head-on measurement of the deficit. It’s wider than any current MMLU differential. Right now, almost no production benchmark scores it. My guess is that by 2031, time perception will sit next to vision on every frontier model card, and its absence will read the way “text-only” reads today: a model from an earlier generation.

Horizon: 5 years · Falsifier: A 2031 frontier model card from any of the major Western labs that reports nothing in the TimeSpeak / CueSpeak / RepCount / TicToc family or successor — and no equivalent measure of temporal perception.
Bet 2 · The structural-fix bet

Temporal blindness is the next hallucination-scale problem, and prompting won’t fix it.

TicToc-v1 already has the data: prompt-level timestamps move agent alignment from ~50% to ~65%, then plateau. Tan, Tan & Soatto’s related result, 5–10x error on self-time estimation, points to where the failure actually lives: inside the model’s implicit world model, not at the surface of the prompt. It’s the same arc hallucination followed: prompt fixes plateau, structural fixes work. I’d bet temporal awareness gets reframed from a prompting problem to a training-objective problem within two years.

Horizon: 24 months · Falsifier: A prompt-only technique — no architectural change, no post-training — that closes more than half of TicToc-v1’s gap on the original benchmark, replicated independently.
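The falsifier hinges on what “half of the gap” means. Under one reading, assumed here and not stated by TicToc-v1, the gap runs from the ~50% baseline to a perfect 100%. The arithmetic then looks like this (`gap_closed` and `falsifier_threshold` are illustrative helpers):

```python
def gap_closed(baseline: float, ceiling: float, achieved: float) -> float:
    """Fraction of the baseline-to-ceiling gap that a technique closes."""
    if ceiling <= baseline:
        raise ValueError("ceiling must exceed baseline")
    return (achieved - baseline) / (ceiling - baseline)

def falsifier_threshold(baseline: float, ceiling: float, fraction: float = 0.5) -> float:
    """Score a prompt-only method would need in order to close `fraction` of the gap."""
    return baseline + fraction * (ceiling - baseline)

# Assumed reading: the gap runs from the ~50% baseline to a perfect 100%.
print(gap_closed(baseline=50.0, ceiling=100.0, achieved=65.0))  # 0.3
print(falsifier_threshold(baseline=50.0, ceiling=100.0))        # 75.0
```

On that reading, timestamp prompting closed about 30% of the gap, and a prompt-only method would need to exceed roughly 75% alignment to trigger the falsifier. A different ceiling, such as human performance, would move the threshold.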
Bet 3 · The convergence bet

The next frontier architecture runs multiple clocks in parallel, not one.

TML’s under-discussed contribution isn’t the 200-ms beat; Moshi was already doing full-duplex streaming in 2024. It’s the dual-clock split: a foreground interaction model on a millisecond budget and an async background model that can take minutes for tool use and sustained reasoning, with results flowing back into the foreground’s context. The mature version, the one a frontier lab ships by 2028, runs three clocks: interaction, agent, knowledge. Probably a fourth too: the within-session continual-learning clock Karpathy keeps talking about. The architecture won’t pick a clock. It’ll run them concurrently and arbitrate.
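A minimal asyncio sketch of the dual-clock split, assuming nothing about TML’s actual design: a foreground loop that must produce output on every short beat, a background task that may take far longer, and a shared queue through which background results flow back into the foreground’s context. The names `foreground` and `background` and the timing constants are hypothetical, scaled down so the example runs in well under a second.

```python
import asyncio
from typing import List

BEAT_S = 0.02        # stand-in for a 200 ms interaction beat (scaled down)
BACKGROUND_S = 0.1   # stand-in for minutes of tool use or reasoning

async def background(task: str, results: asyncio.Queue) -> None:
    """Slow clock: long-running work that reports back when it finishes."""
    await asyncio.sleep(BACKGROUND_S)
    await results.put(f"result for {task!r}")

async def foreground(n_beats: int, results: asyncio.Queue) -> List[str]:
    """Fast clock: emits something on every beat and never blocks on the
    background; finished background work is folded into its context."""
    context: List[str] = []
    transcript: List[str] = []
    for t in range(n_beats):
        while not results.empty():
            context.append(results.get_nowait())
        latest = context[-1] if context else "still listening"
        transcript.append(f"beat {t}: {latest}")
        await asyncio.sleep(BEAT_S)
    return transcript

async def main() -> List[str]:
    results: asyncio.Queue = asyncio.Queue()
    bg = asyncio.create_task(background("look up flights", results))
    transcript = await foreground(n_beats=10, results=results)
    await bg  # normally finished by now; awaiting surfaces any exception
    return transcript

if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

Because the foreground never awaits the background task directly, early beats still produce output (“still listening”) and later beats pick up the background result once it lands. What the sketch leaves out is arbitration: nothing reconciles a background result with something the foreground has already said, which is the gap Goedecke raises in section V.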

Horizon: 24–36 months · Falsifier: A frontier flagship from any of the major Western labs that ships explicit time-perception capability but handles only one of the three clocks — not two, not three.

V  ·  THE STEEL-MAN

Four counter-frames worth taking seriously.

(i) Kingy AI’s Bitter-Lesson reversal: scaling eats hand-crafted time-awareness the same way it ate hand-crafted search. (ii) Sean Goedecke on TML’s dual-clock: a slow background model can contradict what the foreground has already spoken aloud, and no released paper says how to resolve it. (iii) Yann LeCun: time-awareness is a downstream symptom of the deeper missing capability, causal physical prediction; the answer is World Models. (iv) Andrej Karpathy: the hardest version of “time-aware” is continual learning across sessions, which none of the three labs in this piece is doing.

The bets survive (ii)–(iv). Goedecke’s objection is an open arbitration problem inside the multi-clock design that Bet 3 predicts, not an argument against building it. If LeCun is right about World Models, time-awareness is a near-term component of his program, not a competitor. If Karpathy is right about continual learning, time-awareness is a precondition for it. Only (i), the Bitter Lesson reversal, kills the bets outright. That’s the one I’d watch.


VI  ·  THE NAMING PROBLEM

Whoever names this correctly defines the next era.

For two years, hallucination was unfixable because the field called it “mistakes” and tried to prompt around it. The reframe — calling it a structural property of sampling, not a bug — came before the engineering. It was the engineering.

The same gap is open now around time. Three labs are demonstrating, from three angles, that the same modality is missing. None has the language to claim all three. Each is one-third right. Whoever finds the unified frame — the cross-clock benchmark, the single architectural story — gets to define what “time-aware LLM” means for the rest of the decade.

My bet, in one sentence: the next frontier model card will have a row for time.