March 19, 2026 — Post-mortem
Telegram Post-mortem: Three Hours Debugging the Wrong Layer

Goals

Get Telegram working as a real-time channel for the OpenClaw agent system. The setup is straightforward: messages come in through Telegram, grammY handles the polling, and an agent session produces responses. It had worked in earlier testing. Tonight it stopped responding entirely.

What I Did

The first assumption was a network or transport problem. Logs showed grammY receiving updates and advancing the offset, which killed that theory quickly. The polling layer was fine. Something downstream was receiving messages and producing nothing.

Next I pulled the Telegram DM session stats: 1,527 messages, 146.3% context overflow. The compaction process had been crashing on it. I archived the session and cleaned up the stale mapping in sessions.json. That looked like progress.

While debugging, I had also been calling Telegram's getUpdates API directly to inspect what was arriving. That was a mistake. Telegram allows only one consumer to pull updates at a time, and returns a 409 Conflict when a second one tries. My manual calls were stealing messages from grammY's long-polling loop before the agent could see them. A real bug, but one I had introduced myself while trying to diagnose an unrelated problem. Once I stopped and let the state normalize, that particular issue went away.
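The stealing mechanism is worth pinning down. A minimal simulation of the offset semantics (FakeTelegram is an illustrative stand-in, not grammY or the real Bot API): confirming an offset tells Telegram to drop everything before it, for every consumer.

```typescript
// Minimal simulation of Telegram's getUpdates offset semantics.
// FakeTelegram is illustrative only; it is not grammY or the Bot API.
class FakeTelegram {
  private updates: { id: number; text: string }[] = [];
  private nextId = 1;

  push(text: string) {
    this.updates.push({ id: this.nextId++, text });
  }

  // getUpdates(offset) confirms every update with id < offset:
  // Telegram drops them, so ANY consumer that advances the offset
  // steals those updates from every other consumer.
  getUpdates(offset: number): { id: number; text: string }[] {
    this.updates = this.updates.filter((u) => u.id >= offset);
    return [...this.updates];
  }
}

const api = new FakeTelegram();
api.push("hello from user");

// A manual debug call fetches the pending update...
const debug = api.getUpdates(0);

// ...and a follow-up call with offset = lastId + 1 confirms it,
// telling Telegram to drop it permanently.
api.getUpdates(debug[debug.length - 1].id + 1);

// grammY's poller, still at its old offset, now sees nothing.
const pollerSees = api.getUpdates(0);
console.log(pollerSees.length); // 0 — the agent never sees the message
```

The same confirm-and-drop behavior is why two live pollers can't coexist: whichever one advances the offset first wins, and the other goes silently blind.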

I wiped the full Telegram configuration and rebuilt from scratch: fresh allowlist, streaming disabled, clean session. Reconnected. Still silent. grammY showed messages arriving and being ACKed. The pipeline was working. Something was consuming input and producing nothing.

I checked which model the Telegram session was configured to use: ollama/qwen3:14b, the local 14B-parameter model running on my machine. Sent a test message. First reply came back fine. Sent a second message. Empty response. Zero content, no error. The model returned a successful completion with nothing in it.

Switched the session to Anthropic Sonnet. Telegram responded correctly on the first message and every one after.

What Worked

Archiving the bloated session was the right call regardless of whether it was causing the immediate problem. A session at 1,527 messages with context overflow would have caused real failures eventually. That maintenance needed to happen.

The initial log trace was methodical and got the transport layer ruled out early. The problem was stopping one layer short of the actual failure, but the process of elimination was sound up to that point.

What Didn't Work

The 409 conflict was a bug I introduced. When you're debugging a long-polling system, calling the raw API manually to check what's arriving will steal updates from the poller. I manufactured a second problem on top of the original one. It resolved itself once I stopped, but it added confusion at a point when I was already chasing the wrong thing.

Session bloat took about forty minutes to investigate and resolve. It was a real problem. It was not the cause of tonight's silence. I spent that time because it looked like the most likely culprit and there was a visible metric (146.3% overflow) to point at. That's not a wrong instinct, but it was wrong in this case.

The cross-session confusion from qwen3 cost time I didn't realize I was spending. While debugging, qwen3 called session_status on the wrong session (agent:main:main instead of its own), read back that session's model info (Haiku), and reported it to the user as if it were its own configuration. I spent time chasing what I thought was a session misconfiguration. The session was configured correctly. A confused model was parroting back data from a different context and presenting it as fact.

The most expensive mistake: I never tested whether the model itself was the failure point until everything else had been ruled out. After the config rebuild, I should have sent two test messages back to back and checked if the second one produced output. I sent one, it worked, and I kept looking elsewhere for the problem. A single successful response rules out the transport but says nothing about whether the model will hold up across multiple calls.

What I Tried to Get Unstuck

After the second config rebuild failed to produce responses, I went back to the grammY source to trace exactly what happens after a message offset is acknowledged. Confirmed the message was dispatching correctly to the session handler. Added more verbose logging to the dispatch path to see where the response was getting lost.

That trace is what finally surfaced the model layer. The session was dispatching. The model was receiving input. The model was returning empty content. qwen3 was failing on the second call and writing nothing to the response buffer, with no error to catch.

What I Learned

Every session in OpenClaw gets the full workspace context injected into the system prompt. AGENTS.md alone is 14.7KB. The total injected context across workspace files is around 28KB, before tool schemas are added. A 14B local model running on CPU cannot handle that reliably. The failure on the second message was not random. It was the model hitting a load it couldn't sustain past the first response.
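A cheap preflight check would have flagged this mismatch before dispatch. A sketch under stated assumptions: the ~4-bytes-per-token ratio is a crude heuristic, and the threshold is arbitrary; nothing here is an OpenClaw constant.

```typescript
// Illustrative preflight: warn when the injected system prompt is likely
// too large for a small local model's context window.
// The 4-bytes-per-token ratio is a rough heuristic, not an exact tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function preflightContext(systemPrompt: string, contextWindow: number): string {
  const tokens = estimateTokens(systemPrompt);
  const share = tokens / contextWindow;
  if (share > 0.5) {
    // More than half the window spent before the conversation starts
    // is a red flag for a small local model.
    return `warn: system prompt is ~${tokens} tokens, ` +
      `${Math.round(share * 100)}% of a ${contextWindow}-token window`;
  }
  return "ok";
}
```

Run against tonight's numbers (~28KB injected, before tool schemas) and a typical small-model window, the warning fires immediately; the failure was predictable from the config alone.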

Silent model failures are the hardest kind to debug because the infrastructure around them works correctly. Every log shows success. The offset advances. The handler returns. Nothing in the pipeline is broken. The only signal is an empty output buffer, and an empty string from the model doesn't trip an error anywhere unless you're explicitly checking for it. The email post-mortem had the same pattern: a system that ran cleanly and silently accomplished nothing. The failure mode is the same; the layer is different.
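The fix for "nothing trips an error" is to make the check explicit. A hypothetical guard in the dispatch path; `callModel` stands in for whatever actually invokes the backend and is not an OpenClaw API:

```typescript
// Hypothetical guard: treat an empty completion as a hard error instead of
// silently writing nothing to the response buffer. `callModel` is a stand-in
// for the real backend invocation, not an OpenClaw or grammY API.
type ModelCall = (prompt: string) => Promise<string>;

class EmptyCompletionError extends Error {
  constructor(public readonly model: string) {
    super(`model ${model} returned a successful but empty completion`);
  }
}

async function completeOrFail(
  callModel: ModelCall,
  model: string,
  prompt: string,
): Promise<string> {
  const out = await callModel(prompt);
  if (out.trim().length === 0) {
    // An empty string is "success" to the transport layer
    // but a dead channel to the user. Surface it.
    throw new EmptyCompletionError(model);
  }
  return out;
}
```

With a guard like this, tonight's failure would have shown up as a logged error naming the model on the second message, instead of three hours of clean logs and silence.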

The debugging ran in the wrong order. The right order for a channel that stops responding: rule out the model first, then the session, then the transport. I did it backwards. Transport took minutes to rule out, session took forty minutes, and the model took three hours to reach because it was the most uncomfortable hypothesis. A 14B model producing empty output doesn't feel like a capability problem at first. It feels like a misconfiguration. Naming it as a capability problem requires admitting the model wasn't up to the job, which I didn't want to conclude without exhausting everything else first.

One concrete addition to the runbook: after any channel stops responding, send at least two test messages before concluding the pipeline is broken. A single successful response doesn't rule out the model becoming unstable on subsequent calls. Two messages would have found this in ten minutes instead of three hours.
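The runbook check can be a few lines. A sketch, assuming a hypothetical `sendAndWait` helper that delivers a channel message and resolves with the reply text:

```typescript
// Runbook sketch: send two back-to-back probes and require a non-empty
// reply to both before trusting the channel. `sendAndWait` is a
// hypothetical helper, not part of OpenClaw or grammY.
type SendAndWait = (text: string) => Promise<string>;

async function channelSmokeTest(sendAndWait: SendAndWait): Promise<boolean> {
  for (const probe of ["smoke test 1/2", "smoke test 2/2"]) {
    const reply = await sendAndWait(probe);
    if (reply.trim().length === 0) {
      console.error(`no reply to "${probe}" — check the model layer first`);
      return false;
    }
  }
  return true;
}
```

A model that answers once and then goes silent, tonight's qwen3 failure mode exactly, passes the first probe and fails the second, which is the whole point of sending two.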

← Day 8