April 12, 2026

How the Garden Server Came Back

A made-up incident report about a tiny homelab, a mistimed update, and the joy of a careful recovery.

The failure started politely.

At 2:14 a.m., the garden server stopped responding to pings, but it kept blinking like nothing was wrong. That combination — silence with confidence — is how infrastructure gets dramatic.

First clues

We found (a quick check for each is sketched after the list):

  • healthy power
  • broken DNS cache
  • one update job that had finished almost successfully
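In hindsight, each clue had a one-line check. A minimal sketch, assuming a systemd-based box; the apt log path is an assumption, so substitute your package manager's own log:

systemctl --failed                             # anything that died quietly overnight
journalctl -u unbound --since "2 hours ago"    # what the resolver was complaining about
resolvectl statistics                          # cache hit/miss counts; a cold or broken cache shows up here
tail -n 50 /var/log/apt/history.log            # what that update job actually did (apt assumed)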

The most useful note on the incident page was just this sentence:

If the machine looks calm and the system looks chaotic, trust the system.

Recovery path

  1. Reboot the resolver.
  2. Confirm local routes.
  3. Flush the stale caches and let them rebuild.
  4. Test one dependency at a time (the whole path is sketched in commands below).
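Here is that path as shell, a sketch rather than our literal transcript. It assumes unbound fronted by systemd-resolved (which the resolvectl commands elsewhere suggest); the gateway address and the /healthz endpoint are placeholders:

# 1. Restart the resolver; unbound-control reload would also flush its cache.
sudo systemctl restart unbound

# 2. Confirm local routes before suspecting anything upstream.
ip route show
ping -c 3 192.0.2.1    # placeholder gateway; substitute yours

# 3. Flush the stale caches on the systemd-resolved side.
sudo resolvectl flush-caches

# 4. Test one dependency at a time, nearest first.
resolvectl query api.internal
curl -sS --max-time 5 http://api.internal/healthz    # /healthz is an assumed endpoint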

Commands we actually wished we had sooner

systemctl status unbound                          # is the resolver even running?
journalctl -u unbound --since "30 minutes ago"    # what it was doing when things went quiet
resolvectl query api.internal                     # can we resolve the one name we actually care about?

The part nobody remembers to write down

After the service returned, we added a tiny runbook:

  • where logs live
  • which restart is safe
  • which restart is not safe
  • who to message when the failure smells unfamiliar

That fourth bullet saves more time than most tooling.
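For the curious, the whole thing fits on a sticky note. A sketch with placeholder values, not our literal runbook:

  • logs: journalctl -u unbound, plus /var/log/ for the update trail
  • safe restart: systemctl restart unbound
  • unsafe restart: a full reboot while an update job is mid-flight
  • escalation: a named person, not a channel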

A comparison table

Option                     Speed     Confidence  Comment
Reboot everything          Fast      Low         Feels good, teaches little
Trace one layer at a time  Medium    High        Slower start, better finish
Wait and hope              Variable  None        Famous last strategy
Closing thought

I like incidents that leave the team gentler with each other and stricter with the system. The best recoveries fix both.