From Click-to-Go to Programmable Missions: Routes, Actions & Detection (Part 2)

In Part 1 I covered the foundation: a web UI showing live costmaps, point clouds, and a 2D goal navigation flow for a Unitree G1 humanoid. That works great for one-off "go here" commands. But real robot work isn't single goals — it's sequences. "Walk to position A. Stand up. Play an audio greeting. Walk to position B. Dock to an AprilTag. Pick up the box. Move to position C. Put it down."

That's a route with actions. This is the part where the system turns from a remote-control toy into something that can actually run a demo on its own.

Like Part 1, everything in this post is built from scratch — the route editor UI, the waypoint state management, the action library on the robot, the WebSocket protocol that ties it all together, even the Detectron2 inference and detection persistence pipeline. No off-the-shelf "robot mission planner" was harmed in the making of this blog.

Why routes?

Click-to-navigate is fantastic for tele-op. But the moment you have a multi-step demo — say, an inspection round where the robot stops at five points, scans something at each, and reports back — you can't be the one clicking. You need to define the route once and run it later.

I needed:

A way to place waypoints by clicking on the map
A way to attach actions to each waypoint (audio, body movements, manipulation)
A way to save and reload routes from the backend
A way to execute routes and watch progress in real time
Recovery options when something fails (because something always fails)

The Route Creation page

Route Creation page with a pre-loaded route, waypoints, side panel, and yellow dashed route line

Route editor — 3D scene on the left, waypoint side panel on the right, route polyline in yellow

This is the route editor. You can see:

The 3D scene on the left — same Three.js viewer as Part 1, with the costmap, point cloud overlay, and waypoint markers (green arrows with labels)
The yellow dashed line — the route polyline connecting waypoints in execution order
The side panel on the right — an editable list of waypoints, each expandable to show its actions
The bottom toolbar — Save / Load / Execute / Stop / Pause buttons

To place a waypoint: enable the Add Waypoint tool, click on the map where you want it, drag to set the heading, release. To reposition: with no tool active, left-click drag the marker. To rotate: right-click drag. To reorder: drag the rows in the side panel. Delete: trash icon per row.

Routes are persisted as JSON in the backend (scans/<map>/routes/<name>.json). The schema went through a refactor recently — it's now segment-based to support an external orchestrator API, but old waypoint-flat routes auto-migrate at load time so nothing breaks.

The action library

Each waypoint has a list of actions that run after the robot arrives. The action library covers the basic categories you need to script a real demo:

Timing — simple pauses for N seconds (the most-used action, by a wide margin)
Movement — closed-loop forward/backward translation by an exact distance, closed-loop rotation by an exact angle, plus open-loop duration-based fallbacks for when localization gets flaky
Body posture — enter walk-ready state, sit/park, via the standard LocoClient calls in the SDK
Audio — stream an audio file through the onboard speaker via the SDK's audio client
Manipulation — vision-guided pick-and-place using a depth camera, plus AprilTag-based docking that aligns yaw and approaches a fixed offset

Every one of these sits behind a clean async API on the robot — handwritten glue between ROS topics and the unitree_sdk2py SDK.

The frontend doesn't need to know how any of them work. It just sends a JSON action dict over WebSocket and the executor on the robot picks it up and runs it.

Live execution

Here's a full waypoint route in action:

Creating a 3-waypoint route, attaching actions, and executing it end to end

What's happening in the video:

I create 3 waypoints by clicking on the map and dragging for orientation
For each waypoint, I attach an action — in this demo, simple wait and movement actions to keep things visible
I link the route to the backend execution API (the route is sent over WebSocket as a route_start message with all segments + initial pose)
I hit Execute — the robot starts navigating to the first waypoint via move_base
The side panel shows live progress — a yellow pulsing dot on the active waypoint, green dots on completed ones, gray on pending
At each waypoint the robot stops, runs its actions, then continues to the next
When all 3 are done the panel shows "Route completed"

The progress feedback comes from the executor on the robot sending progress messages back through the WebSocket relay every time the phase changes (navigating / action / pre_action / post_action).

Robustness — because navigation fails

You can't ship a robot demo where one failed waypoint kills the entire route. move_base aborts goals all the time — wheels slip, costmaps update, the robot oscillates. So I added a recovery layer:

Pause / Resume
At any point during execution, hit Pause. The current move_base goal is cancelled, the robot stops, and the executor blocks. Hit Resume and it re-issues the same goal and keeps going. The state survives — your route doesn't reset.

Retry / Skip
When move_base returns ABORTED for a waypoint, instead of failing the whole route, the executor sends a route_paused message with the failed waypoint index. The UI shows an amber pulsing dot on that waypoint, and below the row two buttons appear: Retry (re-issue the goal — useful when the obstacle has moved) and Skip (mark this waypoint as done and move to the next one — useful when the goal is genuinely unreachable).

Stop
The big red button. Cancels everything, stops the robot, drops the WebSocket. Use when something is wrong and you don't want to debug right now.

All of this is wired up via three additional WebSocket message types: route_pause, route_resume, route_retry, route_skip. Each maps to an asyncio.Event in the executor that the navigation loop polls between move_base goal attempts.

Object detection — Detectron2 in the loop

When the pick action runs, the executor needs to know where the object actually is. The only sensor that knows is the RealSense camera on the robot's head, and the only way to find a specific object in an RGB image is a vision model.

I integrated Detectron2 running directly on the Jetson with a custom-trained model. The flow is the standard "RGB-D object pick" recipe:

Grab the latest frame from /camera/color/image_raw
Run Detectron2 inference and pick the target detection
Look up depth inside the bounding box from /camera/aligned_depth_to_color/image_raw and deproject the bbox center into a 3D point in the camera frame
Apply the head-tilt rotation to convert into the robot's body frame
Hand the 3D point off to the arm controller for the actual grasp

But the more interesting bit (for a blog post anyway) is: how does the operator see what the robot detected?

I added two new ROS topics published by the vision controller after every detection:

/detection_image/compressed — the annotated camera frame with bounding boxes drawn (JPEG)
/detection_result — a JSON string with {timestamp, status, location, objects: [{class_name, confidence, bbox}]}

A background WebSocket listener in the React app subscribes to both, POSTs each detection to the backend, and the backend persists them to detections/ (one .json + one .jpg per detection). They survive restarts.

In the UI, click the side menu (top right) → Detections:

Side menu opened with Detection option visible

Side menu — Detection panel is one click away from any view

You get a panel with the full detection history.

Detection panel card view showing past detections with bounding boxes, class names, confidence, and location

Detection card view — every past detection with its annotated image, class, confidence, and location

Each card shows:

The annotated image with bounding boxes
Class name and confidence percentage
Bounding box coordinates
Status (Detected / None)
Location identifier (configurable per deployment)
Timestamp

Switch to table view if you want a denser layout. Click any thumbnail for a full-size modal.

The whole thing is opt-in observability — the detection panel doesn't have to be open for detections to be saved. The background listener catches every detection regardless.

What's next

I'm working on multi-robot orchestration. Right now each segment in a route is gated locally by the executor. But in a multi-robot setup, you want a central scheduler that says "OK, your turn, do segment 3 now." That's the external-orchestrator integration — each segment can have an optional trigger field that tells the backend to wait until an external scheduler publishes a matching step before executing. The backend polls the scheduler, releases segments as their triggers fire, and reports completions back so the scheduler can advance.

That's already implemented but not yet wired to a real multi-robot deployment — I'll write a follow-up once it's running end-to-end.

Closing

This whole project — the React + Three.js UI, the FastAPI backend, the FAST-LIO localization tuning, the move_base configuration, the Unitree SDK action library, the Detectron2 inference pipeline, the WebSocket relay, the route schema, the orchestrator integration — was built from scratch. Every layer.

If you're a robotics person who's been thinking "I wish I had a real web UI for my robot but it sounds like too much work" — it's not impossible. It's just a lot of small problems stacked on top of each other. Pick the layer that scares you most, build it first, and the rest follows.

If you missed it, Part 1 covers the foundation: the localization stack, the live visualization, and the click-to-go navigation flow. Part 3 goes inside the robot — FAST-LIO, move_base, and the action controllers.

Built on a Unitree G1 humanoid with a Livox MID360 LiDAR and a Jetson Orin. UI in React 19 + TypeScript + Three.js, backend in FastAPI, robot side in Python with unitree_sdk2py and a fair bit of hand-tuning of FAST-LIO and move_base.

Unitree G1 Humanoid Mission Planner Detectron2 Object Detection AprilTag move_base asyncio WebSocket RealSense