NU · neighbordoorsrecords over spin

Same Car, Next Frame: How Software Keeps One ID on a Moving Car

Picture standing on an overpass at rush hour, asked to count how many distinct cars pass under you in ten minutes. Easy enough, until two silver sedans cross paths, one ducks behind a truck for a second, and you lose the thread. Did three cars just pass, or did you count the same one twice? That flicker of doubt, the moment you stop trusting your own eyes, is exactly the problem software faces frame by frame. And it's a much harder problem than people assume.

Detection is not tracking

Most people picture computer vision as "the box around the car." That part — detection — is the solved-feeling half. A modern detector looks at a single still image and answers one question: what objects are here, and where? It draws a box, slaps on a label ("car," 0.94 confidence), and moves on. It has no memory. Run it on frame 199 and frame 200 and you get two completely independent answers, with no idea that the box on the left in both frames is the same vehicle.

Tracking is the part that adds memory. Its job is to assign a persistent ID — call it car #7 — and keep that same ID glued to that same car across hundreds of frames, even as the car moves, shrinks into the distance, gets briefly hidden, or drives near three other cars that look identical. Detection asks "what's in this picture?" Tracking asks "is this the thing I was already watching?"

That second question has no pixels to read off. It has to be inferred.

The core trick: predict, then match

Nearly every tracker in this family, from the classic SORT (Simple Online and Realtime Tracking) to its widely used successor ByteTrack, runs the same two-step loop on every new frame.

Step one: predict. Before even looking at the new frame, the tracker guesses where each car it already knows about should appear. A car that was moving right at a steady clip will probably be a bit further right. SORT does this with a Kalman filter, a decades-old piece of math (it helped guide Apollo spacecraft) that models position and velocity and produces a best estimate of the next location. It's not magic — it's just "things in motion tend to keep moving the same way," written down formally with an honest accounting of uncertainty.
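To make the predict step concrete, here is a minimal sketch of a constant-velocity Kalman prediction for a single coordinate. It is an illustration under stated assumptions, not SORT's actual code: the AxisState class, the predict function, and the noise value q are all hypothetical names and numbers chosen for this example, and a real tracker predicts the whole box rather than one coordinate.

```python
from dataclasses import dataclass


@dataclass
class AxisState:
    """Belief about one coordinate of a car (hypothetical helper, not SORT's API)."""
    pos: float      # estimated position, in pixels
    vel: float      # estimated velocity, in pixels per frame
    var_pos: float  # uncertainty (variance) of the position
    cov: float      # covariance between position and velocity
    var_vel: float  # uncertainty (variance) of the velocity


def predict(s: AxisState, dt: float = 1.0, q: float = 0.01) -> AxisState:
    """Kalman predict step for a constant-velocity model.

    Implements x' = F x and P' = F P F^T + Q with F = [[1, dt], [0, 1]]
    and Q = diag(q, q). The value of q is an illustrative assumption.
    """
    return AxisState(
        pos=s.pos + s.vel * dt,  # things in motion keep moving the same way
        vel=s.vel,               # velocity is assumed unchanged
        var_pos=s.var_pos + 2 * dt * s.cov + dt * dt * s.var_vel + q,
        cov=s.cov + dt * s.var_vel,
        var_vel=s.var_vel + q,
    )


# A car at x = 100 px moving right at 5 px per frame.
car = AxisState(pos=100.0, vel=5.0, var_pos=1.0, cov=0.0, var_vel=1.0)
guess = predict(car)
print(guess.pos, guess.var_pos)
```

Note that the position variance grows with every prediction. That is the "honest accounting of uncertainty": the longer the tracker goes without seeing the car, the less sure it is about where the car has gone. The update step that shrinks the uncertainty again when a matching detection arrives is not shown here.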

Step two: match. Now the new detections arrive. The tracker has a set of predictions (where my known cars should be) and a set of fresh boxes (what the detector actually found). It has to pair them up: which new box belongs to which existing ID? This is called data association.

The usual measure of "do these two boxes refer to the same car" is IoU, or Intersection over Union: the area where the predicted box and the detected box overlap, divided by the total area the two boxes cover together. It runs from 0 (no overlap) to 1 (identical boxes). High overlap, probably the same car. The matching itself is often solved with the Hungarian algorithm, a clean method for finding the best overall set of pairings rather than greedily grabbing the first decent match. New box with no good match? Start a new ID. Predicted car with no box? Mark it missing, and remember it for a few frames in case it comes back.
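The match step can be sketched in a few lines. This is an illustration, not any tracker's real implementation: boxes are assumed to be (x1, y1, x2, y2) tuples, the function names and the 0.3 overlap threshold are hypothetical, and the search below simply tries every possible pairing. That brute-force search is a stand-in that finds the same best overall pairing the Hungarian algorithm would, but it is only practical for a handful of boxes.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def associate(predicted, detected, min_iou=0.3):
    """Pair predicted boxes with detected boxes to maximize total IoU.

    Returns (matches, missing_tracks, new_detections), all as index lists.
    Brute force over all pairings: fine for a sketch, too slow for real use.
    """
    n, m = len(predicted), len(detected)
    if n <= m:
        candidates = (list(enumerate(p)) for p in permutations(range(m), n))
    else:
        candidates = ([(t, d) for d, t in enumerate(p)]
                      for p in permutations(range(n), m))

    best_pairs, best_score = [], -1.0
    for pairs in candidates:
        score = sum(iou(predicted[t], detected[d]) for t, d in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score

    # Reject pairings whose overlap is too small to be the same car.
    matches = [(t, d) for t, d in best_pairs
               if iou(predicted[t], detected[d]) >= min_iou]
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    missing_tracks = [t for t in range(n) if t not in matched_t]
    new_detections = [d for d in range(m) if d not in matched_d]
    return matches, missing_tracks, new_detections


predicted = [(0, 0, 10, 10), (20, 0, 30, 10)]           # where known cars should be
detected = [(21, 0, 31, 10), (1, 0, 11, 10), (100, 100, 110, 110)]
print(associate(predicted, detected))
```

In this example each known car finds the detection that has shifted one pixel from its prediction, and the far-away third box has no partner, so it would start a new ID.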

Why ByteTrack's small idea mattered

Here's a detail that sounds boring and turns out to be the whole game. Every detection comes with a confidence score, and the obvious move is to throw away the low-confidence ones — they're often junk. SORT did exactly that.

But think about a car driving behind a lamppost. For a few frames it's half-hidden, so the detector's confidence drops to, say, 0.3. Toss those low boxes and the car vanishes from tracking, the ID dies, and when it re-emerges clear it gets a brand-new ID. To a counting system, one car just became two.

ByteTrack's contribution, published in 2022, was almost embarrassingly simple: don't throw the low-confidence boxes away — use them in a second matching pass. First match the strong, confident detections. Then, for tracks still left unmatched, try to associate them with the leftover weak boxes. A low-confidence box that lands right where you predicted a known car is probably that car, just temporarily obscured. That one change pushed ByteTrack to the top of public tracking benchmarks, and it did it without a fancier detector — just by being smarter about the evidence it already had.
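The two-pass idea can be sketched as follows. This is a simplified illustration, not ByteTrack's actual code: the function names are hypothetical, the thresholds (0.6, 0.1, and 0.3) are illustrative assumptions, and the pairing uses a simple greedy best-overlap-first rule in place of the Hungarian algorithm to keep the example short.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_match(track_boxes, det_boxes, min_iou):
    """Pair tracks with detections, best overlap first (simplified matcher).

    track_boxes: {track_id: box}, det_boxes: {detection_index: box}.
    Returns {track_id: detection_index}.
    """
    pairs = sorted(
        ((iou(tb, db), tid, di)
         for tid, tb in track_boxes.items()
         for di, db in det_boxes.items()),
        reverse=True,
    )
    matched, used = {}, set()
    for overlap, tid, di in pairs:
        if overlap < min_iou:
            break  # everything after this overlaps even less
        if tid not in matched and di not in used:
            matched[tid] = di
            used.add(di)
    return matched


def two_pass_associate(predicted, detections, high=0.6, low=0.1, min_iou=0.3):
    """predicted: {track_id: box}; detections: list of (box, confidence)."""
    strong = {i: b for i, (b, c) in enumerate(detections) if c >= high}
    weak = {i: b for i, (b, c) in enumerate(detections) if low <= c < high}

    # Pass 1: match known cars against the confident detections.
    first = greedy_match(predicted, strong, min_iou)

    # Pass 2: cars still unmatched get a chance with the weak boxes.
    leftover = {t: b for t, b in predicted.items() if t not in first}
    second = greedy_match(leftover, weak, min_iou)

    matched = {**first, **second}
    lost = [t for t in predicted if t not in matched]
    # Only confident leftovers may start new IDs; weak leftovers are dropped.
    new = [i for i in strong if i not in first.values()]
    return matched, lost, new


predicted = {7: (0, 0, 10, 10), 8: (50, 0, 60, 10)}
detections = [((1, 0, 11, 10), 0.9),          # car 7, clearly visible
              ((51, 0, 61, 10), 0.3),         # car 8, half hidden
              ((200, 200, 210, 210), 0.2)]    # weak box far from any car
print(two_pass_associate(predicted, detections))
```

Here car #8 keeps its ID because the weak 0.3 box sits right where it was predicted to be, while the weak box far from any prediction is ignored rather than becoming a phantom new car.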

Why this is genuinely harder than detection

A few reasons the "same ID" problem stays stubborn:

The takeaways

Next time a sign flashes "12 cars in queue," remember there's a quiet loop behind it, guessing where each car will be and checking whether the thing it sees is the thing it was already watching. The box is the easy part. Keeping the name is the trick.

NU original: sourced analysis of the public record. Browse more at neighbordoors.com.

Transparency: NU articles are AI-assisted and editor-reviewed, built from the cited primary sources. We label what's proven, alleged, and opinion.