Monitoring & health
A sign is healthy when the kiosk is connected, the assigned URL is reachable, and the device telemetry is within normal ranges. The dashboard surfaces all three on every sign detail page; this page is the reference for what each indicator means.
Heartbeats
Every claimed sign sends a heartbeat to the backend over the WebSocket every 5 seconds. Each heartbeat carries:
- status: online / offline / error / maintenance
- mode: active / monitoring (orthogonal to status; see The "Monitoring" badge)
- currentUrl: what the kiosk is currently displaying
- deviceInfo: platform, OS, MAC, IP, Tailscale IP, hostname, screen resolution, free disk, RAM, CPU
- appVersion: desktop sign version
- uptime: seconds since the sign app launched
- cacheState: whether content is cached locally for offline operation
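As a rough illustration, the heartbeat can be modeled as a typed JSON message that the backend validates on receipt. This TypeScript sketch assumes the field names above (plus the mode field described under The "Monitoring" badge) are the JSON keys; the interface names, the units in DeviceInfo, the boolean cacheState, and the parseHeartbeat helper are hypothetical, not the product's actual wire format.

```typescript
// Hypothetical heartbeat message; field names follow the list above, units are assumptions.
type SignStatus = "online" | "offline" | "error" | "maintenance";
type SignMode = "active" | "monitoring";

interface DeviceInfo {
  platform: string;
  osVersion: string;
  mac: string;
  ip: string;
  tailscaleIp: string; // "" when Tailscale isn't installed
  hostname: string;
  screenResolution: string; // e.g. "1920x1080"
  freeDiskBytes: number;
  ramBytes: number;
  cpuPercent: number;
}

interface Heartbeat {
  status: SignStatus;
  mode: SignMode;
  currentUrl: string;
  deviceInfo: DeviceInfo;
  appVersion: string;
  uptime: number; // seconds
  cacheState: boolean; // assumed: true when content is cached for offline use
}

const STATUSES: readonly SignStatus[] = ["online", "offline", "error", "maintenance"];

// Backend-side guard: reject messages with an unknown status or a bad uptime
// before they reach Redis. Other fields are trusted as-is in this sketch.
function parseHeartbeat(raw: string): Heartbeat {
  const msg = JSON.parse(raw) as Partial<Heartbeat>;
  if (msg.status === undefined || !STATUSES.includes(msg.status)) {
    throw new Error(`unknown status: ${String(msg.status)}`);
  }
  if (typeof msg.uptime !== "number" || msg.uptime < 0) {
    throw new Error("uptime must be a non-negative number of seconds");
  }
  return msg as Heartbeat;
}
```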
The backend writes the latest heartbeat into Redis (for fast reads) and a rolling window into PostgreSQL (for the uptime timeline).
Online / offline transitions
The dashboard transitions a sign between Online and Offline based on heartbeat freshness:
| State | Trigger |
|---|---|
| Online | Heartbeat received within the last 15 seconds |
| Offline | No heartbeat for 15+ seconds (3 missed in a row) |
| Online (recovered) | Was offline, now received a heartbeat — flagged briefly as "recently reconnected" |
Why 15 seconds? Heartbeats fire every 5 seconds, so 15 seconds is exactly 3 missed in a row — strong enough to filter transient packet loss, fast enough that you find out about a real disconnect within the venue's typical "is something wrong?" reaction time.
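A minimal sketch of the freshness rule, assuming the dashboard keeps the receive time of each sign's last heartbeat. The 5 s interval and 15 s threshold come from the text; SignTracker, classify, onHeartbeat, recentlyReconnected, and the 30 s length of the "recently reconnected" flag are hypothetical.

```typescript
const HEARTBEAT_INTERVAL_MS = 5_000;
const OFFLINE_AFTER_MS = 3 * HEARTBEAT_INTERVAL_MS; // 15 s = 3 missed heartbeats
const RECONNECTED_FLAG_MS = 30_000; // hypothetical: how long "recently reconnected" stays on

interface SignTracker {
  lastHeartbeatMs: number | null; // receive time of the latest heartbeat
  reconnectedAtMs: number | null; // set when a heartbeat arrives after an offline period
}

function classify(t: SignTracker, nowMs: number): "online" | "offline" {
  if (t.lastHeartbeatMs === null) return "offline";
  return nowMs - t.lastHeartbeatMs < OFFLINE_AFTER_MS ? "online" : "offline";
}

function onHeartbeat(t: SignTracker, nowMs: number): void {
  // A heartbeat that ends an offline period marks the sign as recently reconnected.
  if (t.lastHeartbeatMs !== null && classify(t, nowMs) === "offline") {
    t.reconnectedAtMs = nowMs;
  }
  t.lastHeartbeatMs = nowMs;
}

function recentlyReconnected(t: SignTracker, nowMs: number): boolean {
  return t.reconnectedAtMs !== null && nowMs - t.reconnectedAtMs < RECONNECTED_FLAG_MS;
}
```

Using a strict < puts a heartbeat that is exactly 15 s old on the Offline side, matching "15+ seconds" in the table.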
The backend stores each heartbeat in Redis with a 15 s TTL and runs a background sweep every 10 s that marks signs with an expired entry as offline. A true offline transition can therefore take up to ~25 s to surface: the heartbeat goes silent at T=0, the Redis entry expires at T+15 s, and the sweep next runs by T+25 s. Notifications are deferred a further 60 seconds to give the sign a chance to reconnect; see Notifications for why.
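To see where the ~25 s upper bound comes from, this sketch simulates one silent sign against a periodic sweep. It assumes, as in the T+15 s step of the timeline above, that the entry expires 15 s after the last heartbeat, and that the 10 s sweep runs at an arbitrary phase relative to that heartbeat. The function names are hypothetical, and treating the entry as expired from exactly the TTL onward is a simplification.

```typescript
const TTL_MS = 15_000; // entry expires 15 s after the last heartbeat (per the timeline)
const SWEEP_INTERVAL_MS = 10_000; // background sweep period
const NOTIFY_DEFER_MS = 60_000; // extra wait before notifying

// Time (ms after the last heartbeat) of the first sweep that finds the entry expired.
// sweepPhaseMs is when the first sweep after the last heartbeat runs, in [0, interval).
function offlineDetectedAtMs(ttlMs: number, intervalMs: number, sweepPhaseMs: number): number {
  let sweep = sweepPhaseMs;
  while (sweep < ttlMs) sweep += intervalMs; // entry still present at this sweep
  return sweep;
}

// Worst case over every possible sweep phase, at 1 ms resolution.
function worstCaseDetectionMs(ttlMs: number, intervalMs: number): number {
  let worst = 0;
  for (let phase = 0; phase < intervalMs; phase++) {
    worst = Math.max(worst, offlineDetectedAtMs(ttlMs, intervalMs, phase));
  }
  return worst;
}

const detectMs = worstCaseDetectionMs(TTL_MS, SWEEP_INTERVAL_MS);
console.log(`offline surfaces by ~${detectMs} ms; notification by ~${detectMs + NOTIFY_DEFER_MS} ms`);
```

The worst case is 24,999 ms: the sweep ran just before expiry, so the next one lands one full interval later, which is just under TTL plus one sweep interval. Adding the 60 s deferral puts the notification roughly 85 s after the sign goes silent.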
Sign states
A sign is in exactly one state at a time. The dashboard renders each with a consistent color:
| State | Color | Meaning |
|---|---|---|
| Online | Green | Connected, heartbeating, content displaying |
| Offline | Yellow | Heartbeat is stale (no signal for 15+ s), likely a network or kiosk problem |
| Error | Red | Sign reported an explicit failure (e.g., couldn't load assigned URL) |
| Maintenance | Blue | Operator-controlled state. Ctrl+Shift+Q on the kiosk exits the sign app for maintenance (the watchdog won't relaunch while the .maintenance sentinel is present). The maintenance state on the dashboard is set by the dashboard itself, not by the kiosk's heartbeat. See Crash recovery. |
| Unlinked | Grey | Sign record exists but no physical device is linked yet |
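One way to keep the colors consistent everywhere is a single exhaustive lookup table shared by every surface. In this sketch, STATE_COLOR, SignSnapshot, resolveState, and especially the precedence order (unlinked, then maintenance, then offline, then error) are assumptions chosen so that every sign lands in exactly one state; the authoritative rules live in Reference → Sign states.

```typescript
type SignState = "online" | "offline" | "error" | "maintenance" | "unlinked";
type StateColor = "green" | "yellow" | "red" | "blue" | "grey";

// One table for every surface (grid, detail page, mobile app, badges).
const STATE_COLOR: Record<SignState, StateColor> = {
  online: "green",
  offline: "yellow",
  error: "red",
  maintenance: "blue",
  unlinked: "grey",
};

interface SignSnapshot {
  linkedDeviceId: string | null; // null until a physical device is linked
  maintenanceSetByDashboard: boolean;
  heartbeatFresh: boolean;
  reportedError: boolean; // sign reported an explicit failure
}

// Hypothetical precedence: earlier checks win, so exactly one state comes out.
function resolveState(s: SignSnapshot): SignState {
  if (s.linkedDeviceId === null) return "unlinked";
  if (s.maintenanceSetByDashboard) return "maintenance";
  if (!s.heartbeatFresh) return "offline";
  if (s.reportedError) return "error";
  return "online";
}
```

Typing the table as Record<SignState, StateColor> makes the compiler reject any new state that has no color assigned.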
The color is consistent across the dashboard sign grid, the sign detail page, the mobile app, and notification badges.
State transitions are written to the Audit tab so you can answer "when did this sign go offline?" without grepping logs.
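A sketch of how transitions, rather than every heartbeat, could end up in an audit trail. AuditLog and its methods are hypothetical; the point is that an entry is written only when a sign's state changes, which keeps "when did this sign go offline?" a simple lookup.

```typescript
type SignState = "online" | "offline" | "error" | "maintenance" | "unlinked";

interface AuditEntry {
  signId: string;
  from: SignState | null; // null for the first observation of a sign
  to: SignState;
  atMs: number;
}

class AuditLog {
  private readonly current = new Map<string, SignState>();
  readonly entries: AuditEntry[] = [];

  // Appends an entry only when the state actually changes, so repeated
  // heartbeats in the same state don't flood the log.
  observe(signId: string, state: SignState, atMs: number): void {
    const prev = this.current.get(signId) ?? null;
    if (prev === state) return;
    this.current.set(signId, state);
    this.entries.push({ signId, from: prev, to: state, atMs });
  }

  // "When did this sign go offline?": the most recent transition into offline.
  lastWentOffline(signId: string): number | null {
    for (let i = this.entries.length - 1; i >= 0; i--) {
      const e = this.entries[i];
      if (e.signId === signId && e.to === "offline") return e.atMs;
    }
    return null;
  }
}
```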
The "Monitoring" badge
A sign in monitoring mode shows a Monitoring badge on its dashboard card alongside the state color. Monitoring is not one of the states above: the heartbeat carries a separate, orthogonal mode field whose value is 'monitoring' or 'active' (see Sign states → Orthogonal mode field). When the badge is showing:
- The sign is healthy (heartbeat is current; sign is online)
- The wall is intentionally dark — display hidden, audio muted
- This is not a failure to escalate; the operator put the sign in this mode
Toggle off via the dashboard's Exit Monitoring button or Ctrl+Shift+M at the kiosk. See Remote control → Monitoring mode and Hotkeys.
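The badge logic reduces to two checks, sketched here under stated assumptions: the badge requires a current heartbeat (a stale heartbeat's mode value is not trusted), and displayVisible is a hypothetical flag for whether the wall is showing content. DisplayReport, showsMonitoringBadge, and darkWallIsUnexpected are illustrative names.

```typescript
type SignMode = "active" | "monitoring";

interface DisplayReport {
  heartbeatFresh: boolean; // last heartbeat within the offline threshold
  mode: SignMode; // orthogonal mode field from that heartbeat
  displayVisible: boolean; // hypothetical: whether the wall is showing content
}

function showsMonitoringBadge(r: DisplayReport): boolean {
  return r.heartbeatFresh && r.mode === "monitoring";
}

// A dark wall is only worth escalating when monitoring mode doesn't explain it.
function darkWallIsUnexpected(r: DisplayReport): boolean {
  return !r.displayVisible && !showsMonitoringBadge(r);
}
```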
Uptime tracking
The dashboard computes uptime two ways:
- Per-sign uptime %: online time / event time, over the lifetime of the event
- Per-event uptime %: average across all signs in the event
You'll see both on the event detail page. A few patterns worth recognizing:
- Above 99% is normal for a properly-deployed event
- 95-99% typically reflects venue Wi-Fi flapping rather than kiosk failure — the wall is up, the dashboard just sees the connection drop briefly
- Below 95% suggests genuine trouble — either a network problem you can fix or a sign in a flaky state
Uptime resets at the start of an event, so historical events keep their stats and a new event starts fresh.
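The two uptime figures and the rough bands above can be sketched as plain functions. This assumes online and event time are tracked in milliseconds and that the per-event figure is an unweighted mean of the per-sign percentages; the function names and the exact boundary handling (99% itself falls in the middle band) are illustrative.

```typescript
// Per-sign uptime: online time / event time, as a percentage.
function signUptimePct(onlineMs: number, eventMs: number): number {
  if (eventMs <= 0) throw new Error("event duration must be positive");
  return (onlineMs / eventMs) * 100;
}

// Per-event uptime: unweighted mean of the per-sign percentages.
function eventUptimePct(signPcts: number[]): number {
  if (signPcts.length === 0) throw new Error("event has no signs");
  return signPcts.reduce((sum, p) => sum + p, 0) / signPcts.length;
}

type UptimeBand = "normal" | "likely-wifi-flapping" | "genuine-trouble";

// Rules of thumb from the list above.
function uptimeBand(pct: number): UptimeBand {
  if (pct > 99) return "normal";
  if (pct >= 95) return "likely-wifi-flapping";
  return "genuine-trouble";
}
```

Because every sign's percentage uses the same event time as its denominator, the unweighted mean equals total online time divided by (number of signs × event time).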
Device info
Every heartbeat carries device telemetry. The dashboard surfaces it on the Device info card:
| Field | Source |
|---|---|
| Platform / OS version | os.platform() + os.release() |
| Hostname | os.hostname() — useful when you set custom names like LOBBY-SIGN-01 |
| MAC | First non-internal NIC at first boot (stored, doesn't change) |
| Local IP | Current primary interface IP |
| Tailscale IP | tailscale ip -4 if installed, blank otherwise |
| Screen resolution | Per the primary display |
| CPU / RAM / Free disk | Snapshot at heartbeat time |
| Sign app version | Build version of the desktop sign |
| Uptime | Seconds since the sign app launched (reset by Reboot app or Reboot device) |
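Since the sources in the table are Node.js calls, a heartbeat-time snapshot might look like the sketch below. It uses only os.platform(), os.release(), os.hostname(), os.networkInterfaces(), os.cpus(), os.totalmem(), os.freemem(), and the tailscale ip -4 command from the table. The DeviceSnapshot shape and helper names are hypothetical, "first non-internal IPv4 address" only approximates the primary interface, and MAC, screen resolution, and free disk are left out because they need stored or platform-specific values.

```typescript
import * as os from "os";
import { execFileSync } from "child_process";

interface DeviceSnapshot {
  platform: string;
  osVersion: string;
  hostname: string;
  localIp: string;
  tailscaleIp: string;
  cpuCount: number;
  totalRamBytes: number;
  freeRamBytes: number;
}

// First non-internal IPv4 address, or "" if there is none.
function primaryIpv4(): string {
  for (const ifaces of Object.values(os.networkInterfaces())) {
    for (const iface of ifaces ?? []) {
      if (iface.family === "IPv4" && !iface.internal) return iface.address;
    }
  }
  return "";
}

// Runs `tailscale ip -4`; returns "" if the CLI is missing, fails, or times out.
function tailscaleIpv4(): string {
  try {
    const out = execFileSync("tailscale", ["ip", "-4"], {
      encoding: "utf8",
      timeout: 2_000,
      stdio: ["ignore", "pipe", "ignore"],
    });
    return out.trim().split("\n")[0] ?? "";
  } catch {
    return "";
  }
}

function deviceSnapshot(): DeviceSnapshot {
  return {
    platform: os.platform(),
    osVersion: os.release(),
    hostname: os.hostname(),
    localIp: primaryIpv4(),
    tailscaleIp: tailscaleIpv4(),
    cpuCount: os.cpus().length,
    totalRamBytes: os.totalmem(),
    freeRamBytes: os.freemem(),
  };
}
```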
Telemetry is for triage, not surveillance: use it to answer "is this sign stuck?" or "did somebody reboot the device an hour ago?", not to build performance dashboards.
Content reachability
Independent of the kiosk's connection to us, the kiosk monitors whether the assigned URL is reachable by sending an HTTP HEAD request every 60 seconds. The dashboard reports this on a per-sign and per-event basis:
- Reachable — last HEAD succeeded (2xx or 3xx)
- Unreachable — last HEAD failed (timeout, 4xx, 5xx, DNS failure)
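A minimal sketch of one reachability probe, assuming Node 18+ (global fetch and AbortSignal.timeout). The 2xx/3xx rule and the 60 s cadence come from the text; the 10 s timeout, the function names, and the use of redirect: "manual" (so Node's fetch returns the 3xx response itself instead of following it) are assumptions.

```typescript
type Reachability = "reachable" | "unreachable";

// 2xx and 3xx count as reachable; 4xx, 5xx, and anything else do not.
function classifyStatus(status: number): Reachability {
  return status >= 200 && status < 400 ? "reachable" : "unreachable";
}

// One HEAD probe. Timeouts, DNS failures, and connection errors throw inside
// fetch and are reported as unreachable.
async function probe(url: string, timeoutMs = 10_000): Promise<Reachability> {
  try {
    const res = await fetch(url, {
      method: "HEAD",
      redirect: "manual",
      signal: AbortSignal.timeout(timeoutMs),
    });
    return classifyStatus(res.status);
  } catch {
    return "unreachable";
  }
}

// Probes immediately, then every intervalMs; returns a function that stops polling.
function watchReachability(
  url: string,
  onResult: (r: Reachability) => void,
  intervalMs = 60_000,
): () => void {
  const tick = () => {
    void probe(url).then(onResult);
  };
  tick();
  const id = setInterval(tick, intervalMs);
  return () => clearInterval(id);
}
```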
Reachability state is independent of the sign's online/offline state:
- A sign can be Online but with Unreachable content — the kiosk reaches us, but its content origin is down
- A sign can be Offline with Reachable content (last known) — the kiosk lost its WebSocket but the content URL was working at last check
When content goes Unreachable, the kiosk continues displaying the cached version and notifies subscribers (see Notifications). The wall doesn't blank — you have time to fix the content side without an audience seeing the failure.
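The "don't blank the wall" behavior amounts to a small decision after every probe. In this sketch, the DisplayDecision type, the keep-current fallback when nothing is cached, and notifying only on the Reachable → Unreachable edge are assumptions rather than documented behavior.

```typescript
type Reachability = "reachable" | "unreachable";

interface CachedContent {
  url: string;
  cachedAtMs: number;
}

type DisplayDecision =
  | { kind: "live"; url: string }
  | { kind: "cached"; cached: CachedContent }
  | { kind: "keep-current" };

// Never blank the wall: fall back to the cache, or keep whatever is on screen.
function decideDisplay(reach: Reachability, url: string, cache: CachedContent | null): DisplayDecision {
  if (reach === "reachable") return { kind: "live", url };
  if (cache !== null && cache.url === url) return { kind: "cached", cached: cache };
  return { kind: "keep-current" };
}

// Notify subscribers once when content goes unreachable, not on every failed probe.
function shouldNotify(prev: Reachability | null, next: Reachability): boolean {
  return next === "unreachable" && prev !== "unreachable";
}
```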
Local diagnostics on the kiosk
Sometimes you want to look at health from the sign's side rather than the dashboard's. With keyboard access to the kiosk, press Ctrl+Shift+S to open the Status Dashboard overlay on the kiosk itself:
- Connection status (WebSocket state, last heartbeat sent, last command received)
- Sign ID, short code, MAC
- Backend URL, WebSocket URL
- IP addresses (LAN + Tailscale)
- Cache status (items cached, size, last sync)
- Recent error count
Press Esc to dismiss. Techs also use this overlay when triaging a misbehaving sign in person: it answers "is this device even reaching the backend?" without leaving the venue.
When to escalate
A few patterns and what to do about them:
| Pattern | What it means | Action |
|---|---|---|
| One sign offline for >2 minutes | Kiosk-specific — the others are fine | Troubleshoot offline |
| Multiple signs offline at once | Network or backend issue | Check venue Wi-Fi first. Multiple signs across multiple venues going offline at the same time usually means a backend incident — we'll email an alert if so |
| All signs online, content unreachable | Your content URL is down | Fix content side, or assign a fallback URL |
| Sign cycling online/offline rapidly | Wi-Fi flapping or kiosk DNS issues | Network resilience |
| Sign in Error state | Content failed to load | Fetch logs, look for the failed URL or HTTP error |
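The table can be read as a first-match rule set over the fleet. This sketch is illustrative only: SignObservation, triage, the action names, the flapping cutoff of 6 transitions per hour, and checking fleet-wide patterns before per-sign ones are all assumptions; it also treats "offline at once" as "currently offline".

```typescript
interface SignObservation {
  signId: string;
  venueId: string;
  online: boolean;
  offlineForMs: number; // 0 while online
  contentReachable: boolean;
  errorState: boolean;
  flapsLastHour: number; // hypothetical: online/offline transitions in the past hour
}

type Action =
  | "backend-incident"
  | "check-venue-wifi"
  | "troubleshoot-offline"
  | "fix-content"
  | "network-resilience"
  | "fetch-logs"
  | "none";

const LONG_OFFLINE_MS = 2 * 60_000; // "offline for >2 minutes"
const FLAP_THRESHOLD = 6; // hypothetical cutoff for "cycling rapidly"

// Fleet-wide patterns are checked before per-sign ones; the first match wins.
function triage(signs: SignObservation[]): Action {
  const offline = signs.filter((s) => !s.online);
  if (offline.length > 1) {
    const venues = new Set(offline.map((s) => s.venueId));
    return venues.size > 1 ? "backend-incident" : "check-venue-wifi";
  }
  if (offline.some((s) => s.offlineForMs > LONG_OFFLINE_MS)) return "troubleshoot-offline";
  if (offline.length === 0 && signs.some((s) => !s.contentReachable)) return "fix-content";
  if (signs.some((s) => s.flapsLastHour >= FLAP_THRESHOLD)) return "network-resilience";
  if (signs.some((s) => s.errorState)) return "fetch-logs";
  return "none"; // includes a single sign offline for under 2 minutes
}
```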
See Troubleshooting for symptom-by-symptom playbooks.
What's next
- Notifications — push-based alerts for the state transitions covered here
- Remote control — the commands you'll typically pair with monitoring (fetch logs first, then refresh or reboot)
- Reference → Sign states — exhaustive state transition table