Milliseconds
Can the model perceive the live flow of an audio or video stream, mid-utterance, without waiting for a turn boundary?
“The model observes tokens, not elapsed time.” The next architectural shift is the one that fixes that.— Tan, Tan & Soatto, Can LLMs Perceive Time? (2026)
In May 2026, Thinking Machines shipped its first model and called it “time-aware.” What Mira Murati’s team means by that: the model runs on 200-millisecond beats and senses the flow of a conversation, instead of waiting for the user to stop talking.
But “time-aware LLM” is also the title of a completely different research line. Dhingra et al. (TACL 2022) used it to mean a model that knows what was true in 2017 versus what is true in 2026 — the knowledge-cutoff problem. A 2025 survey, It’s High Time, frames it as a program for tracking fact decay, event ordering, and temporal expressions. Different clock. The unit is years.
Then late 2025, TicToc-v1 coined a third one: “temporal blindness.” Their target was the agent loop — does the agent know that two hours passed between your message at 9 AM and its tool call at 2 PM? Usually no. The finding that matters: just dropping timestamps into the prompt nudged accuracy from ~50% to ~65%, then plateaued. Prompting can’t fix it. The architecture has to be retrained.
Three labs. Three meanings of the same phrase. None of them cite each other. People keep treating the overlap as coincidence.
It’s not coincidence. The field is rediscovering, in three places at once, that the same modality is missing.
Put them side by side.
Three clocks run inside any deployed language model. Right now, almost no system handles more than one of them. TML’s release made this visible by claiming the word “time-aware” for the fastest of the three — and, in the process, exposing how little anyone has touched the other two.
Can the model perceive the live flow of an audio or video stream, mid-utterance, without waiting for a turn boundary?
Does the model know that its training cutoff is two years stale, and can it reason explicitly about when a fact was last true?
Inside a long-horizon agent loop, does the model know how much wall-clock time has elapsed since the user’s last input, and act accordingly?
None of these communities treats the others’ problem as the same as theirs. Interaction-clock people think agent reasoning is out of scope. Knowledge-clock people don’t think about real-time streams. Agent-clock people aren’t worried about sub-second perception. Each one is half-blind to two-thirds of the cognitive failure they all describe.
What TML did, basically, was take a phrase that belonged to two of these groups and use it to mean the third. They forced the conversation.
Joscha Bach, the cognitive scientist, has the cleanest version of why this matters: the perception of time, as duration, reference to events, and reference to internal clocks, is a feature of cognitive processing, and not an alternative to physics. Translated for the LLM stack: a system that can’t perceive duration isn’t building the same kind of internal model as one that can. Today’s LLMs talk about time the way GPT-3 talked about images it had never seen. They’ve read about it. They haven’t measured it.
Tan, Tan & Soatto make the same diagnosis empirically: the model observes tokens, not elapsed time. It does not directly perceive wall-clock duration while generating, and it does not accumulate sensorimotor memories. The 5–10x error on self-time estimation is Bach’s point made measurable. Compare hallucination, the field’s defining failure of 2022–2024. Initial framing: prompt-engineering bug. Eventual framing: structural property of next-token sampling. The reframe came before the engineering, not after. Temporal blindness is on the same arc. It looks at first like a missing timestamp. It’ll turn out to be a missing perceptual dimension.
Three bets. Each has a horizon and a named falsifier. I think each one is more likely than not. None of them is obvious yet.
The 60.4-point gap between TML (64.7) and GPT-realtime-2 (4.3) on TimeSpeak is the first time anyone has measured the deficit head-on. It’s wider than any current MMLU differential. Right now, almost no production benchmark scores it. My guess is that by 2031, time-perception will sit next to vision on every frontier model card — and the absence of it will read the way “text-only” reads today: a model from an earlier generation.
TicToc-v1 already has the data: prompt-level timestamps move agent alignment from ~50% to ~65%, then plateau. Tan-Tan-Soatto’s companion result — 5–10x error on self-time estimation — points to where the failure actually lives: inside the model’s implicit world model, not at the surface of the prompt. Same arc hallucination followed: prompt fixes plateau, structural fixes work. I’d bet temporal awareness gets reframed from a prompting problem to a training-objective problem within two years.
TML’s under-discussed contribution isn’t the 200-ms beat — Moshi did that in 2024. It’s the dual-clock split: a foreground interaction model on a millisecond budget, an async background model that can take minutes for tool use and sustained reasoning, results flowing back into the foreground’s context. The mature version, the one a frontier lab ships by 2028, runs three clocks: interaction, agent, knowledge. Probably a fourth too — the within-session continual-learning clock Karpathy keeps talking about. The architecture won’t pick a clock. It’ll run them concurrently and arbitrate.
(i) Kingy AI’s Bitter-Lesson reversal: scaling eats hand-crafted time-awareness the same way it ate hand-crafted search. (ii) Sean Goedecke on TML’s dual-clock: a slow background model can contradict what the foreground has already spoken aloud, and no released paper says how to resolve it. (iii) Yann LeCun: time-awareness is a downstream symptom of the deeper missing capability, causal physical prediction; the answer is World Models. (iv) Andrej Karpathy: the hardest version of “time-aware” is continual learning across sessions, which none of the three labs in this piece is doing.
The bets survive (ii)–(iv). If LeCun is right about World Models, time-awareness is a near-term component of his program, not a competitor. If Karpathy is right about continual learning, time-awareness is a precondition to it. Only (i) — the Bitter Lesson reversal — kills the bets outright. That’s the one I’d watch.
For two years, hallucination was unfixable because the field called it “mistakes” and tried to prompt around it. The reframe — calling it a structural property of sampling, not a bug — came before the engineering. It was the engineering.
The same gap is open now around time. Three labs are demonstrating, from three angles, that the same modality is missing. None has the language to claim all three. Each is one-third right. Whoever finds the unified frame — the cross-clock benchmark, the single architectural story — gets to define what “time-aware LLM” means for the rest of the decade.
My bet, in one sentence: the next frontier model card will have a row for time.