April 12, 2026
How the Garden Server Came Back
A made-up incident report about a tiny homelab, a mistimed update, and the joy of a careful recovery.
The failure started politely.
At 2:14 a.m., the garden server stopped responding to pings, but it kept blinking like nothing was wrong. That combination — silence with confidence — is how infrastructure gets dramatic.
First clues
We found:
- healthy power
- broken DNS cache
- one update job that had finished almost successfully
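A few hypothetical triage commands that would have surfaced those clues faster, assuming systemd-resolved and an `unattended-upgrades` style update job; swap in your own unit names.

```sh
resolvectl statistics                       # cache counters hint at a broken DNS cache
journalctl -u unattended-upgrades -n 50     # the update job that almost finished cleanly
journalctl -b -p err                        # anything else that went loud since boot
```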
The most useful note on the incident page was just this sentence:
> If the machine looks calm and the system looks chaotic, trust the system.
Recovery path
1. Reboot the resolver.
2. Confirm local routes.
3. Rebuild the stale cache files.
4. Test one dependency at a time.
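A minimal sketch of that path as shell, assuming an Unbound resolver managed by systemd with its remote-control interface enabled; the host names in the final loop are placeholders.

```sh
set -euo pipefail

# 1. Restart the resolver service (gentler than rebooting the box).
sudo systemctl restart unbound

# 2. Confirm local routes still point where we expect.
ip route show

# 3. Rebuild the stale cache by flushing everything under the root.
sudo unbound-control flush_zone .

# 4. Test one dependency at a time; the script stops at the first failure.
for host in api.internal db.internal; do   # db.internal is a placeholder
    resolvectl query "$host"
done
```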
Commands we wished we'd had sooner

```sh
systemctl status unbound                        # is the resolver service up at all?
journalctl -u unbound --since "30 minutes ago"  # what did it log around the failure?
resolvectl query api.internal                   # can we actually resolve an internal name?
```
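Next time, a one-shot check could replace the 2 a.m. guessing. A minimal sketch, reusing the same unit name and api.internal from above:

```sh
#!/bin/sh
# dns-health.sh: hypothetical resolver probe for the runbook.
set -e

# Fail fast if the resolver service is not running.
systemctl is-active --quiet unbound || {
    echo "unbound is not running" >&2
    exit 1
}

# resolvectl exits nonzero when the query fails, so this
# doubles as a cron- or monitoring-friendly probe.
resolvectl query api.internal >/dev/null
echo "resolver OK"
```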
The part nobody remembers to write down
After the service returned, we added a tiny runbook:
- where logs live
- which restart is safe
- which restart is not safe
- who to message when the failure smells unfamiliar
That fourth bullet saves more time than most tooling.
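For the curious, a skeleton of what that runbook looks like; every path and handle below is a placeholder, not our real layout.

```
Garden server runbook (template)
- Logs: journalctl -u unbound; app logs under /var/log/garden/  (placeholder path)
- Safe restart: systemctl restart unbound
- Not safe: rebooting the whole machine while an update job is mid-flight
- Unfamiliar failure: message @garden-oncall before improvising  (placeholder handle)
```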
A comparison table
| Option | Speed | Confidence | Comment |
|---|---|---|---|
| Reboot everything | Fast | Low | Feels good, teaches little |
| Trace one layer at a time | Medium | High | Slower start, better finish |
| Wait and hope | Variable | None | Famous last strategy |
Closing thought
I like incidents that leave the team gentler with each other and stricter with the system. The best recoveries fix both.