Stability over features

This page captures the philosophy that informs the rest of the docs. Most of DisplaySync's design choices look conservative compared to a typical SaaS — slow update cadences, image-first deployment, freeze windows, manual remote commands. They look that way until 6 AM the morning of a show, when every conservative choice is the reason the wall is still up.

If you're an AV company evaluating DisplaySync, this is the page that explains why the deployment guide looks like it does. If you're an event manager wondering why a feature you'd expect (auto-update, hot-reload) is intentionally absent, this is the rationale.

Verified product behavior; recommendations need team vetting

The conservative-defaults inventory below describes actual product behavior (heartbeat cadence, offline thresholds, command tokens, etc.) verified against the codebase. The operational recommendations that pair with it ("cut the production tag 10 days before load-in," freeze-window practices) are suggested patterns rather than DisplaySync-mandated policies. Treat them as defaults to adapt for your team.

The default is "boring"

For a feature to ship in DisplaySync's hot path, it has to fail safely. That eliminates a lot of patterns that work fine for office software:

Auto-update during events — works fine on a developer's laptop. Disastrous when an update is downloading mid-keynote.
Background reloads — fine for a dashboard. Catastrophic when "background" includes a wall in front of an audience.
Optimistic UI on the kiosk side — fine for snappy interaction. Bad when "optimistic" means showing wrong content for 5 seconds.
Continuous deployment to production kiosks — fine for a SaaS. A non-starter for a frozen event tag.

The defaults below are the result of saying no to features that would be reasonable in any other context.

Image-first deployment

The deployment guide is sequential and detailed because the image is the contract. Once a kiosk is imaged, it's expected to run unattended for months. Every configuration that needs to be right has to be right at image-build time, not patched in the field.

Why this matters:

A kiosk you can't reach over Wi-Fi is a kiosk you can't fix remotely.
A configuration drift across a fleet of 50 signs is 50 different problems to debug.
An image that boots into kiosk mode in 60 seconds saves you from thousands of "is the wall up yet?" questions during load-in.

The image-first approach is more upfront work than "install the app and configure later." It pays back the first time you deploy 30 signs to a venue and they all just work.

See Windows kiosk image for the full guide.

Production tag freezes during events

The desktop sign has a production tag that doesn't move while events are running. Backend deploys preserve compatibility with the currently-shipping desktop-sign version, so the backend can keep shipping while frozen production kiosks keep running.

Mechanics:

Production tag: the version in your captured image. Moves only when you re-image or explicitly run an update command from the dashboard.
Staging tag: ahead of production. New features, bug fixes, in-progress work. Used for non-event installs and dogfooding.
Backend ↔ desktop wire compat: the backend stays compatible with the currently-shipping desktop-sign version. The freeze window below is what you should rely on, not the compatibility window.

The freeze is an organization-level discipline, not enforced by the product. We recommend treating it that way:

Cut the production tag 10 days before load-in — see Live event checklist → T-10 days. Test on staging during that window.
Don't update production during the event window. Even if a critical fix lands. The fix can wait; an unstable update during a keynote can't be undone.
Re-image after the event — staging features land on the next production tag, image regenerates, fleet rolls forward at the next deployment.

This is conservative in the way airline software is conservative. The cost of the discipline is small; the cost of breaking it once is large.
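The date arithmetic behind the recommendation can be sketched in a few lines. This is a minimal illustration, not product behavior: the function names are hypothetical, and it assumes the freeze runs from the tag cut through the last event day, inclusive.

```python
from datetime import date, timedelta

# Lead time from the T-10 days recommendation above.
TAG_CUT_LEAD_DAYS = 10


def tag_cut_date(load_in: date) -> date:
    """Date on which the production tag should be cut."""
    return load_in - timedelta(days=TAG_CUT_LEAD_DAYS)


def in_freeze_window(today: date, load_in: date, event_end: date) -> bool:
    """True from the tag cut through the last event day (assumed inclusive)."""
    return tag_cut_date(load_in) <= today <= event_end


load_in = date(2025, 3, 20)
event_end = date(2025, 3, 23)
print(tag_cut_date(load_in))                                    # 2025-03-10
print(in_freeze_window(date(2025, 3, 15), load_in, event_end))  # True
print(in_freeze_window(date(2025, 3, 24), load_in, event_end))  # False
```

A check like this fits naturally in whatever script or runbook your team uses to gate production updates; since the product does not enforce the freeze, the gate has to live on your side.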

See Releases for the versioning policy.

Push-to-deploy isn't on the kiosk side

The backend ships continuously via push-to-deploy, and so does the web dashboard. The mobile app ships through the app stores.

The desktop sign does not. Its production tag moves only when:

You explicitly run an update command from the dashboard, or
You re-image with a newer captured image

This is the right asymmetry for fleets that depend on uniformity. The kiosks aren't where you want surprise.

Conservative defaults inventory

A non-exhaustive list of the choices that pair with the philosophy:

Choice	Conservative because...
Webpage-only content in v1	A webpage is a known browser surface. Video / image / playlist content adds rendering complexity and we'd rather get it right in v2 than ship it half-baked in v1.
5-second heartbeat	Tighter intervals mean faster offline detection, but also more network chatter and more false positives from venue Wi-Fi flapping. 5 s strikes a balance we've validated across many events.
15-second offline threshold	Three missed heartbeats. Fast enough to find real issues; slow enough to filter blips.
60-second notification deferral	A sign that goes offline and comes back within ~75 s never wakes anyone up. Deliberate trade-off — slightly slower critical alerts in exchange for vastly less notification noise.
Cached offline display	The wall keeps showing correct content during outages. Optional in some signage products; load-bearing in DisplaySync.
Token-signed destructive commands	Reboots, fetches, updates carry a 30 s replay-protected token. Adds latency you'd notice if you were debugging interactively; eliminates a class of cross-org attacks.
No bulk delete from the API	Bulk operations require an explicit dashboard action. No accidental "delete every sign in this org" via a misconfigured script.
403 with the same message for permission denied AND not-found	You can't enumerate signs you don't have access to. Slightly worse error UX in the rare case; eliminates a security flag.
Tier limits enforced server-side, not just client-side	Doesn't matter for the dashboard, where client-side limits are fine. Matters for the eventual API where clients can ignore the dashboard.
Staging environment available	A `staging.displaysync.live` mirror lets customers test changes safely before applying to production. Actual cost; actual value.
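
To make the token-signed command row concrete, here is one way a 30 s replay-protected token could work. This is an illustrative sketch only: the table states the 30 s lifetime and the replay protection, but the HMAC construction, the field layout, the nonce cache, and every name below are assumptions, not DisplaySync's actual scheme.

```python
import hashlib
import hmac
import time
from typing import Callable, Dict

TOKEN_TTL_S = 30  # token lifetime from the table above


def sign_command(secret: bytes, org_id: str, sign_id: str,
                 command: str, nonce: str, issued_at: int) -> str:
    """Hex HMAC-SHA256 over the command and its context (hypothetical layout)."""
    message = "|".join([org_id, sign_id, command, nonce, str(issued_at)])
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


class CommandVerifier:
    """Accepts each signed command at most once, and only while it is fresh."""

    def __init__(self, secret: bytes, clock: Callable[[], float] = time.time) -> None:
        self._secret = secret
        self._clock = clock
        self._seen: Dict[str, float] = {}  # nonce -> time the token expires

    def verify(self, org_id: str, sign_id: str, command: str,
               nonce: str, issued_at: int, signature: str) -> bool:
        now = self._clock()
        # Forget nonces whose tokens have expired; they would be rejected anyway.
        self._seen = {n: exp for n, exp in self._seen.items() if exp >= now}
        age = now - issued_at
        if age < 0 or age > TOKEN_TTL_S:
            return False  # future-dated or older than 30 s
        expected = sign_command(self._secret, org_id, sign_id, command, nonce, issued_at)
        if not hmac.compare_digest(expected, signature):
            return False  # tampered, or signed for another org, sign, or command
        if nonce in self._seen:
            return False  # replay of a token that was already used
        self._seen[nonce] = issued_at + TOKEN_TTL_S
        return True
```

Binding the organization and sign identifiers into the signed message is what would stop a token issued for one org from being accepted for another; the short lifetime keeps the nonce cache small.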

Each of these is small in isolation. Together they're why fleets running DisplaySync don't have surprise incidents during live events.
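
The heartbeat, offline-threshold, and notification-deferral rows combine into a simple timeline: 15 s without a heartbeat marks a sign offline, and a further 60 s must pass before anyone is notified, which is where the ~75 s figure comes from. A minimal sketch of that arithmetic follows; the state names and the exact boundary handling are assumptions for illustration.

```python
HEARTBEAT_INTERVAL_S = 5   # sign sends a heartbeat every 5 s
OFFLINE_THRESHOLD_S = 15   # three missed heartbeats
NOTIFY_DEFERRAL_S = 60     # grace period before a notification is sent


def sign_status(seconds_since_last_heartbeat: float) -> str:
    """Classify a sign by how long it has been silent (hypothetical state names)."""
    if seconds_since_last_heartbeat < OFFLINE_THRESHOLD_S:
        return "online"
    if seconds_since_last_heartbeat < OFFLINE_THRESHOLD_S + NOTIFY_DEFERRAL_S:
        return "offline_pending"  # shown as offline, nobody paged yet
    return "offline_notify"       # silent for 75 s or more


print(OFFLINE_THRESHOLD_S // HEARTBEAT_INTERVAL_S)  # 3 missed heartbeats
print(sign_status(10))   # online
print(sign_status(40))   # offline_pending
print(sign_status(75))   # offline_notify
```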

When to break the rules

Not all defaults are sacred. Times when reaching past the conservative defaults is right:

Pilot deployments and demos — Pilot tier exists for a reason; not every install is a live event.
Internal staging environments — break things on staging on purpose. Discover failure modes before production sees them.
Single-user setups — solo organizers running a tiny event don't need fleet-scale conservatism.
Post-event experimentation — try the new desktop sign release on archived hardware after a show. Catches incompatibilities before the next one.

The right read isn't "always be conservative" — it's "be deliberate about which defaults you're departing from and why."

How this shapes the docs

A few corollaries that show up across these docs:

Deployment is detailed and sequential. Image-first means every step matters at build time.
Operations is task-first, not feature-first. Day-of work is a checklist, not a tour of every dashboard widget.
Troubleshooting is symptom-indexed. When something breaks, search by symptom — fast retrieval matters more than encyclopedic coverage.
Best practices exists as its own section: a tour of the why behind the what. AV companies evaluating us read this section to decide if they trust our judgment.

If a doc page feels too cautious or too sequential — that's the philosophy. The cost of cautious is some friction on routine work; the cost of un-cautious shows up on the worst day of the year.

Self-healing signs

We ship three in-process watchdogs because event-day reliability beats feature velocity. They complement the scheduled-task watchdog, which catches the case where the sign app crashes entirely. The in-process watchdogs catch the cases where the app is running but broken: blank frames, frozen JS, renderer process death.

Watchdog	What it detects	Cadence	Threshold	Action	Analytics event
FrameWatchdog	White / blank frame	Every 5s	95% white in 5×5 grid sample	`invalidate()` → `reload()`	`frame_anomaly`
PageWatchdog	Frozen JS / unresponsive renderer	Preload pings every 10s	30s freeze	`reload()`	`page_frozen`
GpuMonitor	Renderer process crash	Event-driven	Renderer crash	1s deferred `reload()`	`process_crash`
GpuMonitor	GPU/Utility/Plugin crash	Event-driven	(any)	Logged only, no auto-reload	`process_crash` (with `kind` field)
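
As an illustration of the FrameWatchdog threshold, the sketch below samples a 5×5 grid of pixels from a frame and flags it when at least 95% of the samples are white. With 25 samples, that means 24 or more. The sampling positions, the per-channel whiteness cutoff, and the function names are assumptions; the table only states the grid size and the 95% figure.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255

GRID = 5                         # 5x5 sample grid from the table above
WHITE_FRACTION_THRESHOLD = 0.95  # 95% of samples must be white
WHITE_MIN_CHANNEL = 250          # assumed cutoff for "white" per channel


def sample_grid(frame: Sequence[Sequence[Pixel]], grid: int = GRID) -> List[Pixel]:
    """Pick the pixel at the centre of each cell in a grid x grid layout."""
    height, width = len(frame), len(frame[0])
    samples = []
    for row in range(grid):
        for col in range(grid):
            y = (2 * row + 1) * height // (2 * grid)
            x = (2 * col + 1) * width // (2 * grid)
            samples.append(frame[y][x])
    return samples


def is_blank_frame(frame: Sequence[Sequence[Pixel]]) -> bool:
    """True when at least 95% of the sampled pixels are (near-)white."""
    samples = sample_grid(frame)
    white = sum(1 for pixel in samples if min(pixel) >= WHITE_MIN_CHANNEL)
    return white / len(samples) >= WHITE_FRACTION_THRESHOLD
```

Sampling a fixed grid rather than scanning every pixel keeps a check that runs every 5 s cheap, at the cost of missing content that happens to fall between sample points.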

A 60-second RecoveryCooldown is shared across the three watchdogs. If a fault triggers multiple watchdogs simultaneously (a renderer crash that also produces a white frame), only one reload fires; the others are suppressed for the cooldown window. This prevents reload loops when something keeps re-failing immediately.
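
The shared-cooldown idea can be sketched as a small gate that every watchdog consults before reloading. This is a minimal illustration under assumptions: only the `RecoveryCooldown` name and the 60-second window come from the description above; the `try_acquire` method and the injectable clock are hypothetical.

```python
import time
from typing import Callable, List, Optional


class RecoveryCooldown:
    """Lets one recovery action through, then suppresses others for the window."""

    def __init__(self, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._window_s = window_s
        self._clock = clock
        self._last_recovery: Optional[float] = None

    def try_acquire(self) -> bool:
        """Return True if the caller may reload now; False while cooling down."""
        now = self._clock()
        if self._last_recovery is not None and now - self._last_recovery < self._window_s:
            return False
        self._last_recovery = now
        return True


# Usage with a fake clock: one fault trips two watchdogs at once.
now = [0.0]
cooldown = RecoveryCooldown(clock=lambda: now[0])
reloads: List[str] = []


def on_fault(source: str) -> None:
    if cooldown.try_acquire():
        reloads.append(source)  # a real watchdog would reload the page here


on_fault("GpuMonitor")     # renderer crash: reload fires
on_fault("FrameWatchdog")  # white frame from the same fault: suppressed
now[0] = 61.0
on_fault("PageWatchdog")   # cooldown has elapsed: reload fires
print(reloads)             # ['GpuMonitor', 'PageWatchdog']
```

Because all three watchdogs share one gate, a fault that keeps recurring produces at most one reload per window instead of a reload loop.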

The watchdogs are paused while monitoring mode is on — see Monitoring mode as a stability tool below.

To verify the watchdogs work on a specific hardware target, run Test 9 in the testing-the-image guide before Sysprep.

Monitoring mode as a stability tool

Monitoring mode is the surface for "I want this sign telemetered but visually off." Reach for it when:

Storage / shipping — kiosk powered (so you can confirm health remotely) but display dark
Pre-event venue prep — sign is in place, audio muted, display hidden, you'll flip them all on at showtime
Wrong-content emergency — a sign is showing the wrong thing and you can't fix the URL fast enough; flip it to monitoring while you sort the dashboard side, telemetry keeps flowing so you know it stays healthy

For full mechanics see Remote control → Monitoring mode. The stability framing: monitoring mode is a way to say "this sign is intentionally not displaying — don't escalate" without taking it offline.

The dashboard surfaces this with a Monitoring badge on the sign card so on-call engineers don't get woken up over an intentionally-dark wall.