Architecture briefing · Internal

Jimmy's Janice Jumpstart Journal

An honest look at what Janice does today, where it's bloated, and how we want to split it so the permanent work is durable and the temporary work can retire cleanly.

Drafted 2026-04-16 · Valeron platform team · For Jimmy Bhaktha

What Janice actually does today

Concrete enumeration — not the 6-month-old mental model, the 2026-04 reality.

You remembered Janice as the supervisory control layer: the YAML-driven rule engine that chose nav mode transitions. That's still in there, but it's now a small piece of what the service owns. Since absorbing ace_onboard_api earlier this year (and per ADR-014, the canonical pairing flow), Janice has accreted thirteen distinct responsibilities across four audiences. The ~13% CPU footprint you observed is real, and the breakdown below shows where it comes from.

Safety & control (on-robot)

| Responsibility | Consumed by | Est. CPU | What breaks if removed today |
|---|---|---|---|
| Rule / sequence engine (YAML state machine) | Demo sequences | ~2–3% | Demo orchestration; high-level state transitions during play |
| Armed / disarmed circuit-breaker | Any command path | <1% | Rules engine could fire commands with no supervision |
| Geofence no-go / zone veto rules | Nav transitions | <1% | Robot can enter no-go zones without block (your ws will replace this) |
| Command execution relay | Field Control, Sequencer | <1% | No centralized audit log of commands; nav stack stands alone |

Phone-on-AP gateway (the "BFF" layer)

| Responsibility | Consumed by | Est. CPU | What breaks if removed today |
|---|---|---|---|
| JWT verification (offline JWKS cache) | Golfer phone | <1% | Phone can't authenticate → no access to shots, video, state |
| Session token lifecycle (issue/validate/revoke) | Golfer phone | <1% | Phone can't start/end sessions; sessions lost on restart |
| Shot webhook receiver + broadcast | Golfer phone (live feed) | <1% | Phone can't see live shots as they happen |
| Video / thumbnail proxy (Capture → phone) | Golfer phone | ~1% (bursts) | Phone can't reach Capture Jetson directly, gallery dies |
| Cloud sync (shot batch upload, 5-min) | Ace Cloud | <1% | Shots stuck on robot; no web portal playback |
| Static file serving (golfer web app, operator UI) | Any client on AP or Tailscale | <1% | No way to load the apps |

Observability / convenience

| Responsibility | Consumed by | Est. CPU | What breaks if removed today |
|---|---|---|---|
| Robot state polling + caching (2 Hz /v1/nav) | Operator UI, rules engine | ~1% | Your ws replaces this entirely |
| OAK-D camera MJPEG stream → operator PiP | Operator dashboard (one view) | ~5–7% | Operator loses the PiP window; nothing else cares |
| Position relay to Ace Mapper (Supabase push) | KYT "Ping Robot" feature | <1% | Mapper loses live tracking dot |

The headline: the single biggest CPU driver on Janice is the OAK-D camera encoding, at ~5–7% of a ~13% total. That is roughly half the footprint and larger than any other single row. It feeds a PiP window nobody outside the operator dashboard looks at. That's not a code-quality problem. It's a "we kept a dev feature running in production" problem.
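
As a rough consistency check on the breakdown, the per-row estimates can be summed as ranges and compared with the observed ~13%. This is a sketch under our own reading of the table: it assumes "<1%" means somewhere between 0 and 1, and "~1%" means exactly 1. The dictionary keys are shorthand labels for this example, not identifiers from the codebase.

```python
# Est. CPU per responsibility as (low, high) percentage bounds.
# Assumption: "<1%" is read as (0, 1) and "~1%" as (1, 1).
ESTIMATES = {
    "rule_engine": (2.0, 3.0),
    "armed_gate": (0.0, 1.0),
    "geofence_veto": (0.0, 1.0),
    "command_relay": (0.0, 1.0),
    "jwt_verification": (0.0, 1.0),
    "session_tokens": (0.0, 1.0),
    "shot_webhook": (0.0, 1.0),
    "video_proxy": (1.0, 1.0),
    "cloud_sync": (0.0, 1.0),
    "static_serving": (0.0, 1.0),
    "state_polling": (1.0, 1.0),
    "camera_pip": (5.0, 7.0),
    "position_relay": (0.0, 1.0),
}

OBSERVED_TOTAL = 13.0  # the ~13% footprint observed on the robot


def total_range(estimates):
    """Sum the low bounds and the high bounds separately."""
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return low, high


def share_of_total(name, estimates, total):
    """Fraction of the observed total attributed to one row (low, high)."""
    lo, hi = estimates[name]
    return lo / total, hi / total


low, high = total_range(ESTIMATES)
pip_low, pip_high = share_of_total("camera_pip", ESTIMATES, OBSERVED_TOTAL)
print(f"sum of estimates: {low:.0f}-{high:.0f}% (observed ~{OBSERVED_TOTAL:.0f}%)")
print(f"camera PiP share of observed total: {pip_low:.0%}-{pip_high:.0%}")
```

Under those assumptions the rows sum to somewhere between 9% and 21%, which brackets the observed ~13%, and the camera PiP alone accounts for roughly 38–54% of it. The estimates are coarse, so this only shows the table is internally plausible, not that it is measured.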

How it got here

Not making excuses — showing the accretion path so the bloat is legible.

The original Janice (2026-Q1)

A supervisory control layer. Poll nav status, evaluate YAML rules, issue transition commands. ~1,500 LOC. Named after Janus — god of beginnings, endings, transitions. The name fit.

+ Pairing absorbed from ace_onboard_api

When we retired the old onboard-API service, its responsibilities moved into Janice rather than standing up a third on-robot service. Session tokens, JWT verification, user profile fetches — all came along.

+ Shot pipeline (webhook / video / sync)

Capture Jetson posts shot_complete webhooks; phone wants live feed; cloud wants uploaded batches. Janice became the broker between three parties that can't otherwise talk to each other because the phone is on an internet-less AP (ADR-004).

+ Static serving (golfer web app, operator UI)

Janice already had an HTTP server, so serving two Svelte apps from it was "free." It's still free in the strict sense — but it's blurred the line between "what Janice orchestrates" and "what Janice hosts."

None of these accretions were wrong in isolation. But the cumulative effect is that the name "Janice" now covers two logically distinct things: the supervisory control layer you remember, and a phone-on-AP API gateway that emerged from the ace_onboard_api absorption. That's the framing we want to fix.

Our new POV: split Janice into two services

One stays Python + Janus-themed. One is a new, durable TypeScript API gateway.

Today

Janice (Python · ~5,720 LOC)

One Python process. One systemd unit. One codename covering everything.

  • Rules / sequences / geofence
  • JWT + session tokens
  • Shot broker + video proxy
  • Cloud sync
  • Camera PiP / state polling
  • Static file serving

After split

Sequencer (Python · shrinking)

The original Janice. Keeps the Janus codename.

  • Rule / sequence engine
  • Armed / disarmed safety gate
  • High-level state transitions
  • (Geofence veto → migrates to nav stack)

ace_gateway_api (TypeScript · permanent)

New repo. Mirrors robot-capture-service naming + conventions.

  • JWT verification, session tokens
  • Shot webhook receiver + WebSocket broadcast
  • Video / thumbnail proxy
  • Cloud sync (shots → Supabase)
  • Static serving for golfer + Field Control apps

The split is conceptually useful immediately (we name the pieces differently in conversation and docs) and physically separable at the ACL-87 trigger (the already-planned TypeScript migration for Janice). Doing the split in TS lets us carve out the Gateway cleanly while leaving the Sequencer in Python on a shrinking horizon — two rewrites at once would be a mistake.

Why this framing is load-bearing: the Gateway's responsibilities are permanent (they flow from ADR-004 — phone on AP loses internet, so someone has to be the on-robot broker). The Sequencer's responsibilities are shrinking (L4 nav will absorb most of its decision logic over time). Naming them apart lets us invest in one with confidence and retire the other without coupling.

Language policy alignment

This follows ADR-017 (Valeron Language Policy, 2026-04-14).

ADR-017 codifies our language split: realtime-core is C++, service-layer is TypeScript, Python remains only where an ML/vision library forces it or as acknowledged debt. The Janice rewrite to TypeScript is already filed as ACL-87 ("acknowledged debt, not urgent"). The Gateway carve-out is the natural moment to cash that in.

| Component | Language | Policy status | Why |
|---|---|---|---|
| Nav stack | C++ | Conformant | Realtime-core. Hard latency, direct hardware interfaces. ADR-017. |
| robot-capture-service | TypeScript | Conformant | Greenfield service-layer rewrite. Deno 2 runtime (ADR-018). |
| ace_gateway_api (new) | TypeScript | Conformant | Mirrors robot-capture-service. Service-layer default. |
| Sequencer (formerly Janice) | Python | Acknowledged debt | Shrinking horizon. Full port is disproportionate investment against a dying surface. Kept on Python by intention, not inertia. |
| Capture pipeline (dora-rs, perception) | Python | Conformant carve-out | ML-adjacent: torch + opencv native bindings. ADR-017 carve-out. |
| Web apps (cloud) | TypeScript | Conformant | Next.js/React per ADR-021 (just drafted). Cloud-hosted convention. |
| Web apps (robot-served) | Svelte + TypeScript | Conformant | Svelte per ADR-021. TS migration filed for the three JS holdouts. |

Specifically on the Python / TS question: we are not porting the Sequencer to TS as part of this change. The rule/sequence engine is small, stable, YAML-driven, and heading toward a much smaller surface as L4 matures. Putting engineering cycles into a 4–6 week TS rewrite of a service on a shrinking trajectory is bad allocation. The Gateway — which is permanent — is where we spend the TS investment.

UI offloading

Getting work off the robot where it doesn't need to be there.

Three concrete moves, ordered by leverage:

1. Kill the OAK-D camera PiP from the operator UI

The PiP is consumed by exactly one Svelte component (CameraPiP) in one view (MainView.svelte). Not by the golfer app, not by the Capture Jetson, not by enrollment. It's a dev convenience we forgot to turn off in production. Setting camera.enabled: false in configs/field-test.yaml is a config-only change (hot-reloadable, no deploy) that should reclaim the estimated ~5–7% CPU. No code change, no coordination, and the only thing lost is the operator's PiP window.

2. Rename "Observer UI" → "Sequencer UI", move to cloud hosting

The operator dashboard has always been misnamed. It's where we build and monitor demo sequences, which makes it a sequencer interface. Renaming it aligns the UI with its backing service (Sequencer service ↔ Sequencer UI). After the PiP is gone and the Stop button moves to Field Control under the two-operator model (see below), the Sequencer UI is view-only and has no safety responsibilities, so it can be cloud-hosted (Vercel or similar) without any latency concern. That removes another ~1% CPU of state-polling and broadcast load from the robot.

3. Field Control → direct nav-stack WebSocket (needs your ws)

Today Field Control routes commands through Janice. Once your ws lands on the nav stack with its own armed/disarmed safety state, Field Control can bypass Janice entirely for command execution: no Janice hop, and we expect sub-millisecond latency on the LAN. Demo operation becomes a two-person model, with one operator on the Sequencer UI watching and one on Field Control with E-stop ready. Cleaner ops, lower latency, safer.
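
To make the "own armed/disarmed safety state" idea concrete, here is a minimal sketch of a gate that answers every command with an explicit accepted / rejected result. It is written in Python purely for illustration; the nav stack is C++ and its real API is yours to design. The names ArmedGate, CommandResult, and the command strings are hypothetical, and the rule that "stop" always passes is our assumption, not something specified today.

```python
from dataclasses import dataclass
from enum import Enum


class ArmState(Enum):
    DISARMED = "disarmed"
    ARMED = "armed"


@dataclass(frozen=True)
class CommandResult:
    accepted: bool
    reason: str = ""


class ArmedGate:
    """Circuit-breaker: motion commands pass only while the gate is armed."""

    def __init__(self):
        # Start disarmed so nothing moves until an operator opts in.
        self.state = ArmState.DISARMED

    def arm(self):
        self.state = ArmState.ARMED

    def disarm(self):
        self.state = ArmState.DISARMED

    def submit(self, command: str) -> CommandResult:
        # Assumption: a stop request is never blocked by the gate.
        if command == "stop":
            return CommandResult(accepted=True)
        if self.state is ArmState.DISARMED:
            return CommandResult(accepted=False, reason="disarmed")
        return CommandResult(accepted=True)


gate = ArmedGate()
print(gate.submit("goto_tee"))  # rejected while disarmed
gate.arm()
print(gate.submit("goto_tee"))  # accepted once armed
```

The point of returning a result object rather than silently dropping commands is that Field Control can show the operator why a command did not run, which matters more once there is no Janice hop in between to log it.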

Scenarios — pick what we ship

Projected robot state (status quo baseline)

  • Janice CPU (est.): ~13%
  • Python LOC on robot: 5,720
  • Distinct on-robot services: 1 (Janice)
  • UIs served from the robot: Golfer, Field Control, Sequencer UI
  • Net complexity verdict: status quo, Janice monolith as today.

What we're asking from you

Three concrete collaboration points, none urgent.

1. When you design the nav-stack WebSocket, bake in armed/disarmed state as a first-class concept.
This is the safety gate that currently lives in Janice. If the nav stack owns it internally with a clear command accepted / rejected API, Field Control can talk to the nav stack directly with more safety than today, not less. We don't want the armed-state to end up in a no-man's-land between Janice and the nav stack during the migration.
2. Help us scope geofence veto into the nav stack.
Today Janice rejects transitions that would cross into no-go zones. That logic wants to live in the nav stack — it's a spatial-awareness concern, not a supervisory-orchestration concern. When you're ready, we'd like to collaborate on the API shape so the Sequencer can hand this responsibility off cleanly.
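
As a starting point for the API-shape conversation, a veto of this kind can be sketched as a point-in-polygon test against a list of no-go zones. This is an illustrative sketch, not Janice's current implementation: the function names and the representation of a zone as a list of (x, y) vertices are assumptions, and it only checks the target point, not whether the path to it crosses a zone.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test. Points exactly on an edge are not handled specially."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def veto_transition(target: Point, no_go_zones: List[List[Point]]) -> bool:
    """Return True if the transition to target should be rejected."""
    return any(point_in_polygon(target, zone) for zone in no_go_zones)


# Hypothetical zone: a 4 x 4 square with one corner at the origin.
zones = [[(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]]
print(veto_transition((2.0, 2.0), zones))  # target inside the zone
print(veto_transition((5.0, 2.0), zones))  # target outside the zone
```

Whatever shape the real check takes, the handoff question is the same: the Sequencer asks "may I transition to this target?" and the nav stack answers yes or no with a reason.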
3. Sanity-check our CPU hypothesis by measuring after the PiP kill.
We think the PiP is the dominant contributor to Janice's ~13% footprint. Flipping camera.enabled: false is cheap — a one-line config change, hot-reloadable. If we measure Janice CPU before and after, we get empirical ground truth for the rest of the migration planning. We can run that experiment any time the robot is back on Tailscale.
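
The before/after comparison itself is simple arithmetic. The sketch below assumes we collect a handful of CPU-percentage samples for the Janice process on each side of the config flip, with whatever tool is at hand; the function names and the sample values are illustrative only, not measurements from the robot.

```python
from statistics import mean


def cpu_delta(before, after):
    """Mean CPU % before minus mean CPU % after (positive = CPU reclaimed)."""
    if not before or not after:
        raise ValueError("need at least one sample on each side")
    return mean(before) - mean(after)


def consistent_with_estimate(delta, low=5.0, high=7.0):
    """True if the measured saving falls inside the ~5-7% estimate."""
    return low <= delta <= high


# Illustrative numbers only, not measurements from the robot.
before = [13.2, 12.8, 13.1, 12.9]
after = [7.1, 6.9, 7.0, 7.2]
delta = cpu_delta(before, after)
print(f"reclaimed: {delta:.1f} points")
print(f"within the 5-7% estimate: {consistent_with_estimate(delta)}")
```

If the measured delta lands well outside the 5–7 point range, that is the signal to revisit the other rows in the breakdown before planning the rest of the migration around them.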
The bottom line: you remembered Janice as the demo sequence maker — and that's still in there, small and intact. What may not have been visible from your side is how much else it absorbed since: a phone-on-AP gateway's worth of pairing, auth, shot brokering, video proxy, and cloud sync. Our plan honors both realities — keep the original sequencer as Janice (Python, shrinking naturally with L4 maturity), and carve the accreted gateway work out as a durable TypeScript ace_gateway_api. Your nav-stack ws is the enabler that lets most of the cleanup happen.