“The model observes tokens, not elapsed time.” (Tan, Tan & Soatto, Can LLMs Perceive Time?, 2026)
The next architectural shift is the one that fixes that.
In May, Thinking Machines released their first model and called it “time-aware.” What they mean by this is specific. The model wakes up every 200 milliseconds, senses whatever is happening in the conversation, and reacts to it. It does not wait for the user to finish talking before doing anything.
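To make the 200-millisecond rhythm concrete, here is a minimal sketch in Python. It is not TML's implementation. Only the tick length comes from their description; the names `run_ticks` and `react`, and the idea of replaying a pre-recorded list of timestamped events, are my own assumptions for illustration. What it shows is the shape of the loop: the reaction hook fires on every tick, including silent ones, rather than once at the end of the user's turn.

```python
from typing import Callable, Iterable

TICK_MS = 200  # tick length from the text; everything else here is hypothetical


def run_ticks(
    events: Iterable[tuple[int, str]],
    react: Callable[[int, list[str]], None],
    tick_ms: int = TICK_MS,
) -> int:
    """Replay timestamped events in fixed ticks, calling react once per tick.

    events: (timestamp_ms, payload) pairs, sorted by timestamp.
    react:  hypothetical hook that receives the tick's end time and whatever
            arrived during that tick (possibly nothing).
    Returns the number of ticks processed.
    """
    pending = list(events)
    if not pending:
        return 0
    last_ts = pending[-1][0]
    ticks = 0
    idx = 0
    tick_end = tick_ms
    while True:
        # Collect everything that arrived before this tick boundary.
        window: list[str] = []
        while idx < len(pending) and pending[idx][0] < tick_end:
            window.append(pending[idx][1])
            idx += 1
        # The hook runs even when the window is empty: silence is also a signal.
        react(tick_end, window)
        ticks += 1
        if tick_end > last_ts:
            break
        tick_end += tick_ms
    return ticks
```

Replaying four speech fragments that arrive at 50, 180, 450, and 610 ms produces four ticks, and the second one (ending at 400 ms) is empty. A turn-based system would have done nothing until after the last fragment.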
The phrase “time-aware LLM,” though, has been around in other research lines for a while. Going back to 2022, a TACL paper by Dhingra and colleagues used the phrase for a model that knows what was true in 2017 versus what is true today — the knowledge-cutoff problem. A 2025 survey called It’s High Time treats it as a research program for tracking fact decay, event ordering, and temporal expressions. The time scale here is years, not milliseconds.
In late 2025, the TicToc-v1 paper coined another phrase: “temporal blindness.” The concern this time is the agent loop — does the agent register how much wall-clock time has passed between your message and its tool call? Usually not. The finding worth pausing on is that adding timestamps into the prompt only moves agent-alignment accuracy from about 50% to about 65%, and then it stops. Prompting can’t fix it.
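For readers who have not seen it, the prompt-level fix looks roughly like the sketch below. This is my own illustration, not TicToc-v1's format: the function names and the bracketed header layout are hypothetical. It simply stamps each message with the current time and the wall-clock time elapsed since the user's last input, which is the kind of intervention the paper reports as plateauing.

```python
from datetime import datetime, timedelta, timezone


def format_elapsed(delta: timedelta) -> str:
    """Render a duration compactly, e.g. '42m 5s'. Negative values clamp to 0s."""
    seconds = max(0, int(delta.total_seconds()))
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}h {minutes}m"
    if minutes:
        return f"{minutes}m {secs}s"
    return f"{secs}s"


def with_time_context(message: str, user_sent_at: datetime, now: datetime) -> str:
    """Prefix a message with the current time and the time since the user spoke.

    The header layout is an assumption made for this sketch, not a standard.
    """
    elapsed = now - user_sent_at
    return (
        f"[now: {now.isoformat()}] "
        f"[elapsed since user message: {format_elapsed(elapsed)}]\n"
        f"{message}"
    )


if __name__ == "__main__":
    sent = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    later = sent + timedelta(minutes=42, seconds=5)
    print(with_time_context("Check whether the build finished.", sent, later))
```

The model now has the number in front of it. Whether it acts on that number is a separate question, and that is where the plateau shows up.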
Three research camps. Three meanings of the same phrase. None of them cites the others.
I don’t think this is a coincidence. The same modality is being rediscovered in three places at once.
It helps to lay them out next to each other.
Three clocks run inside any deployed language model. Right now, almost no system handles more than one of them. TML’s release made this visible by claiming the phrase “time-aware” for the fastest of the three — and, in the process, making the gaps on the other two harder to ignore.
The interaction clock: can the model perceive the live flow of an audio or video stream, mid-utterance, without waiting for a turn boundary?
The agent clock: inside a long-horizon agent loop, does the model know how much wall-clock time has elapsed since the user’s last input, and act accordingly?
The knowledge clock: does the model know that its training cutoff is two years stale, and can it reason explicitly about when a fact was last true?
None of these communities treats the others’ problems as the same as its own. The interaction-clock researchers consider agent reasoning out of scope. The knowledge-clock researchers do not think about real-time streams. The agent-clock researchers are not worried about sub-second perception. Each of them is blind to two-thirds of the cognitive failure they all describe.
What TML did, in effect, was take a phrase one of these communities had already been using and apply it to a different problem. The three communities are now in a position where they have to look at each other.
I think the right way to understand this is that today’s LLMs don’t perceive time. They read about time. The distinction matters more than it might first appear.
The cleanest framing I have seen is from Joscha Bach: the perception of time, as duration, reference to events, and reference to internal clocks, is a feature of cognitive processing, and not an alternative to physics. So a model with no duration sense is missing a piece of the cognitive stack, even when it can describe time fluently. The analogous gap is GPT-3 with images. GPT-3 could write paragraphs about a sunset. It had never seen one.
A paper from earlier this year, Tan-Tan-Soatto, makes the same diagnosis empirically: the model observes tokens, not elapsed time. They measure the gap directly — when asked to estimate their own task duration, LLMs are off by 5–10×. That is Bach’s philosophical point made measurable.
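One way to read “off by 5–10×” is as a multiplicative error factor. The sketch below is my assumption about how such a number could be computed, not the paper's stated metric: it takes the larger of the two ratios between estimated and actual duration, so that over- and under-estimates are scored symmetrically. The function name and the example figures are made up.

```python
def error_factor(estimated_s: float, actual_s: float) -> float:
    """Multiplicative error between an estimated and an actual duration.

    Returns a value >= 1.0: 1.0 means a perfect estimate, 8.0 means the
    estimate was off by a factor of eight in either direction.
    """
    if estimated_s <= 0 or actual_s <= 0:
        raise ValueError("durations must be positive")
    return max(estimated_s / actual_s, actual_s / estimated_s)
```

With invented numbers, a model that guesses 30 seconds for a task that takes 240 seconds scores `error_factor(30, 240) == 8.0`, which falls inside the range the paper reports.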
It helps to compare this to hallucination, the field’s defining cognitive failure from 2022 to 2024. The first wave of fixes worked at the surface: prompt engineering, RLHF, instruction tuning, sampling tricks. None of them worked structurally; the hallucination rate kept plateauing.
What eventually worked was the reframe — treating hallucination as a structural property of next-token sampling rather than a bug to be patched. The reframe came before the engineering progress, and in many ways the reframe was the engineering progress. Once the problem was named correctly, the fixes followed.
Temporal blindness is on the same arc. Right now it looks like a missing timestamp. I think in two years the framing will be that it is a missing perceptual dimension.
Three bets follow from the framing, each with a horizon and a named falsifier. I think each is more likely than not. None of them is obvious yet.
Bet 1. TicToc-v1 already has the data: prompt-level timestamps move agent alignment from ~50% to ~65%, then plateau. Tan-Tan-Soatto’s companion result, a 5–10× error on self-time estimation, points to where the failure actually lives: inside the model’s implicit world model, not at the surface of the prompt. Hallucination followed the same arc: prompt fixes plateau, structural fixes work. I’d bet temporal awareness gets reframed from a prompting problem to a training-objective problem within two years.
Bet 2. The 60.4-point gap between TML (64.7) and GPT-realtime-2 (4.3) on TimeSpeak is the first time anyone has measured the deficit head-on. It’s wider than any current MMLU differential. Right now, almost no production benchmark scores time perception. My guess is that by 2029, time perception will sit next to vision on every frontier model card, and its absence will read the way “text-only” reads today: as the mark of a model from an earlier generation.
Bet 3. The interaction thesis and the embodied-AI thesis look like rival paradigms. Murati says how we work with AI matters as much as how smart it is. LeCun says language manipulation isn’t intelligence; we need World Models. But the two camps are diagnosing the same gap from different sides: LLMs are decoupled from the world. dgallitelli95 already framed TML’s release as “accidentally proving the embodied-AI thesis,” and LeCun’s AMI Labs launched in April. The World Model side has the deeper apparatus: predictive systems need duration as a building block, not a wrapper. I’d bet AMI Labs or a comparable World Model program publishes work within 24 months that treats time-aware interaction as an architectural primitive, and the interaction-vs-autonomy framing stops being the relevant axis.
There are objections to these bets, and I will try to steelman each of them.
I think the bets survive objections 2, 3, and 4. If LeCun is right about World Models, time-awareness is a near-term piece of his program rather than a competitor to it. If Karpathy is right about continual learning, time-awareness is a precondition for the harder problem rather than a substitute. The Bitter-Lesson reversal is the only one that kills the bets outright. That is the one I would actually watch.
Hallucination was unfixable for about two years. People tried prompt engineering, retrieval augmentation, sampling tricks. None of it worked structurally. What eventually worked was the reframe: calling hallucination a structural property of sampling rather than a bug to be patched. The reframe came before the engineering progress, and in many ways the reframe was the engineering progress.
The same gap is open now around time. Three labs are demonstrating, from three angles, that the same modality is missing. None of them has the language to claim all three. Each of them is one-third right. Whoever finds the unified frame — the cross-clock benchmark, the single architectural story — gets to define what “time-aware LLM” means for the rest of the decade.
My bet, in one sentence: the next frontier model card will have a row for time.