The Counter That Wouldn't Move: A Serverless Function That Could Write to Redis But Not Read It
A real debugging war story. A live "visitors" counter sat frozen at a floor number while real traffic poured in. The data was fine. The token was fine. The fix wasn't where anyone would look first — and the lesson is one every serverless dev should keep in their back pocket.
The symptom
A public site showed a lifetime visitor counter on its landing page. It was stuck — pinned at the same number (313) no matter how many people visited. The owner knew the site was getting hit. The number never moved.
A frozen counter has a dozen plausible causes, so the temptation is to guess. We didn't. We verified one layer at a time, and the answer turned out to be genuinely surprising.
First instinct (and why it was wrong)
The counter was backed by a Redis store (Upstash). The obvious theory: the free tier hit its monthly cap and started rejecting operations. We'd seen a related store get rate-limited before, so this felt right.
It was wrong. A direct check from a shell proved the store was healthy:
```
$ curl -s "$URL/dbsize" -H "Authorization: Bearer $TOKEN"
{"result":9436}
$ curl -s -X POST "$URL" -d '["SCARD","presence:lifetime"]' -H "Authorization: Bearer $TOKEN"
{"result":359}
```
The data was there. The real count was 359, not 313. The store read and wrote perfectly — from a shell. So why did the live site show the floor value of 313?
The real clue: writes worked, reads didn't
The counter code had a built-in diagnostic endpoint. Hitting it on the live deployment returned the tell:
```
{
  "redisConfigured": true,
  "scard": null,
  "writeProbe": {
    "addOk": true,
    "cardAfter": null
  }
}
```
Read that carefully. On the production runtime, a write (`SADD`) succeeded — `addOk: true` — but the immediately following read (`SCARD`) came back `null`. Every time. Cold start or warm, it made no difference.
So the count was quietly growing in Redis on every visit (the writes landed), but the code could never read it back, so it fell through to a safety floor and displayed 313. The bug was invisible precisely because the fallback looked like a normal number.
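A write-then-read probe of the kind described above might look roughly like the sketch below. It is an illustration, not the site's actual diagnostic: `redisCommand`, `writeProbe`, the injected `fetchImpl`, and the separate probe key are all hypothetical names, and the sketch assumes the store answers a `POST` of a JSON command array with a `{"result": ...}` body, as in the shell check earlier.

```javascript
// Hypothetical helper: send one command as a JSON array and return its
// `result`, or null if anything at all goes wrong.
async function redisCommand(fetchImpl, url, token, command) {
  try {
    const res = await fetchImpl(url, {
      method: "POST",
      headers: { Authorization: `Bearer ${token}` },
      body: JSON.stringify(command),
    });
    if (!res.ok) return null;
    const body = await res.json();
    return body.result ?? null;
  } catch {
    return null; // network or parse failures surface as null
  }
}

// Hypothetical probe: write a member, then immediately read the set size.
// A dedicated probe key is assumed so the probe never inflates the real count.
async function writeProbe(fetchImpl, url, token, key) {
  const member = `probe:${Date.now()}`;
  const added = await redisCommand(fetchImpl, url, token, ["SADD", key, member]);
  const card = await redisCommand(fetchImpl, url, token, ["SCARD", key]);
  return { addOk: added !== null, cardAfter: card };
}
```

Passing `fetch` in as a parameter keeps the probe runnable in any runtime and lets the same code be exercised against a stub, which is handy when the whole point is to compare environments.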
Isolating it: not the token, not the shape, not the data
A write-works-read-fails split is bizarre, because both calls go through the identical code path — same URL, same token, same fetch. We ruled out every variable:
- Token? The exact production token did both `SADD` and `SCARD` flawlessly from a shell. Confirmed it was the full-access token, not a read-only one.
- Request shape? The store's REST API accepts commands two ways: a `POST` with a JSON command array, and a path-style `GET` (`/scard/<key>`). We switched the read from the `POST` form to the path-style `GET`. It still returned null at runtime, so the request format was not the cause.
- The data? The shell saw 359. The data was real.
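The two request shapes from the list above can be sketched as a pair of small builders. The function names are hypothetical, and percent-encoding each path segment is an assumption (it presumes the server decodes path segments); the story's own path-style call used the key unencoded.

```javascript
// Hypothetical builder: POST the command as a JSON array to the base URL.
function postStyle(baseUrl, token, command) {
  return {
    url: baseUrl,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${token}` },
      body: JSON.stringify(command),
    },
  };
}

// Hypothetical builder: path-style GET, e.g. /scard/<key>.
// Assumption: segments are percent-encoded so keys with "/" or spaces survive.
function pathStyle(baseUrl, token, command) {
  const [name, ...args] = command;
  const path = [name.toLowerCase(), ...args]
    .map((part) => encodeURIComponent(String(part)))
    .join("/");
  return {
    url: `${baseUrl}/${path}`,
    init: { method: "GET", headers: { Authorization: `Bearer ${token}` } },
  };
}
```

Either descriptor can be handed to `fetch(req.url, req.init)`. Swapping one builder for the other changes only the request shape, which is what makes it a clean single-variable test.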
Everything worked from a shell, and the same read also worked from the Cloudflare edge worker that fronts the site. Only the serverless function runtime could write but not read. That points to one conclusion: the failure was environmental, specific to that runtime's outbound network path, in how that platform's function egress handled the response to a read. It was not a bug we could fix by changing our request; it was a property of where the code ran.
That is the key diagnostic instinct: when writes succeed but reads fail from one specific environment, stop debugging your request and start debugging your environment.
The fix: do the read where the read works
The site was already fronted by a Cloudflare worker (a caching reverse-proxy). Cloudflare's egress read Redis perfectly — we'd just proven it. So instead of fighting the function runtime, we moved the read to the layer that could do it.
The worker already proxied the counter's API response. We taught it to do the `SCARD` itself and splice the true number in:
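```js
async function handlePresence(request, env, url) {
  // Fetch the app's JSON response from the origin.
  const resp = await proxyToOrigin(request, url);
  const base = await resp.json();
  try {
    // Read the true set size from Redis at the edge (path-style GET).
    const r = await fetch(`${env.REDIS_URL}/scard/presence:lifetime`, {
      headers: { Authorization: `Bearer ${env.REDIS_TOKEN}` },
    });
    if (r.ok) {
      const n = Number((await r.json()).result);
      // Only ever raise the count; never replace it with a smaller value.
      if (Number.isFinite(n)) base.total = Math.max(n, base.total || 0);
    }
  } catch {
    /* leave the app's value if the edge read fails */
  }
  // Copy the origin headers, dropping Content-Length because the body was re-serialized.
  const headers = new Headers(resp.headers);
  headers.delete("content-length");
  return new Response(JSON.stringify(base), { status: 200, headers });
}
```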
The function keeps writing the count (that always worked); the edge does the read (which always worked) and overlays the real number. Within minutes the live counter jumped from 313 to the true 359 and started growing again.
Note the key handling: the Redis credentials live as server-side secrets on the edge worker — never in client code. The browser never sees a token.
Two bonus traps we hit on the way
1. "Sensitive" env vars read back empty. When we rebuilt the deployment, some secrets had been stored as the platform's sensitive (write-only) type. Pulling them returned blank, and re-piping blanks created empty vars — which silently disabled the store. Fix: set them through the API as standard encrypted vars with real values, then redeploy. If a secret "exists" but your code acts like it's missing, check whether it's a write-only var that pulled back empty.
2. The deploy that "blocked." On a free team, a deployment can be rejected because the git commit author isn't a member of the team — the platform refuses to build it. Fix: amend the commit author to the account that owns the team, then deploy. A deployment stuck in a "blocked" state with a "git author must have access" reason is this, not your code.
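For the first trap, a startup check that treats blank values the same as missing ones makes the failure loud instead of silent. This is a minimal sketch: `findEmptyVars` is a hypothetical helper, and the variable names in the usage note are assumed to match the ones the edge worker uses.

```javascript
// Hypothetical helper: return the names of variables that are unset,
// not strings, or blank after trimming.
function findEmptyVars(env, names) {
  return names.filter((name) => {
    const value = env[name];
    return typeof value !== "string" || value.trim() === "";
  });
}
```

At boot, something like `findEmptyVars(process.env, ["REDIS_URL", "REDIS_TOKEN"])` returning a non-empty array is a reason to log an error or refuse to start, rather than quietly running with the store disabled.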
The takeaways
- Verify each layer independently — token, request shape, data, runtime. Guessing wastes hours; one targeted probe per layer finds it fast.
- Writes-succeed-reads-fail from one environment is an egress signal, not a code bug. Don't rewrite your request; move the operation.
- Push work to the layer that can do it. If your edge can reach a service your function can't, let the edge do that part. A reverse-proxy front is a place to fix data, not just cache it.
- Silent fallbacks hide bugs. The floor value made a broken read look like a working counter. Log when you fall back, so the failure is visible instead of disguised as a plausible number.
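The last takeaway can be sketched as a fallback that announces itself. The names `FLOOR` and `resolveCount` are hypothetical, the value 313 is the floor from this story, and clamping a successful read to at least the floor is an assumption about how the original fallback behaved.

```javascript
const FLOOR = 313; // the safety floor from the story

// Hypothetical resolver: use the read when it is a real number, otherwise
// fall back to the floor and say so through the injected logger.
function resolveCount(scardResult, log = console.warn) {
  // typeof check on purpose: Number(null) is 0, which would pass as finite.
  const usable = typeof scardResult === "number" && Number.isFinite(scardResult);
  if (usable) {
    return { total: Math.max(scardResult, FLOOR), fallback: false };
  }
  log("presence: SCARD returned no usable value; serving floor", { floor: FLOOR });
  return { total: FLOOR, fallback: true };
}
```

Returning a `fallback` flag alongside the number also lets a diagnostic endpoint or a dashboard distinguish "the count really is 313" from "the read failed and we are guessing".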
The counter moves now. And the next time something writes but won't read, we'll know exactly where to look.