Stretch Lab

Stretch 5 Planning

Lessons learned, competitive landscape, technology trends, AI/ML roadmap, and architecture decisions for the next generation

Strategic Priorities

| Priority | Initiative | Rationale |
|---|---|---|
| P0 | Compute upgrade | Enables VLA models, future-proofs AI stack |
| P1 | Dedicated I2C buses | Eliminates critical firmware reliability risk (I2C reentrancy) |
| P2 | ESP32-S3 with defined UART protocol | Clean WiFi/BLE/OTA architecture |
| P3 | eFuse power protection | Safety + FCC compliance |
| P4 | FCC Class B design-in | Required for consumer market |
| P5 | Enhanced Dex Teleop + data pipeline | Data flywheel is the long-term moat |

Key Strategic Insight

Stretch's competitive moat is its AI ecosystem, not its hardware. With 216+ GitHub stars on stretch_ai, the largest mobile manipulation research community, and integration into Open X-Embodiment, Hello Robot is the default platform for home-robot AI research. Stretch 5 should double down by upgrading compute and making the AI developer experience frictionless. The defensible position is the best-integrated AI platform for home manipulation research.

Competitive Landscape

| Platform | Company | Price | Status | Threat | Differentiator |
|---|---|---|---|---|---|
| Stretch 3 | Hello Robot | $24,950 | Shipping | Low | Lightest, most affordable research platform |
| TIAGo | PAL Robotics | ~$80K+ | Shipping | Low | Full humanoid torso, industrial-grade, ROS |
| Mobile ALOHA | Stanford (open-source) | ~$30K BOM | Research | High | Bimanual, learning-from-demonstration focus |
| Unitree G1/H1 | Unitree | $16K–$90K | Shipping | High | Humanoid, legged locomotion, Chinese supply chain |

Technology Trends

Vision-Language-Action (VLA) Models

Single neural networks that take images + language instructions and output robot actions directly. Models from Physical Intelligence and Google DeepMind represent the leading edge.

Diffusion Policy
Implication for Stretch 5

Stretch 5 compute must support 3B+ parameter models at >5 Hz inference, which requires a dedicated GPU.
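A rough sanity check on that requirement (illustrative arithmetic, not a benchmark; the 2-FLOPs-per-parameter-per-token rule, the ~10 action tokens per step, and the 30% utilization figure are all assumptions, not numbers from this document):

```python
# Back-of-envelope check: sustained compute needed for a 3B-parameter
# model producing ~10 action tokens per step at a 5 Hz control rate.
# Assumes ~2 FLOPs per parameter per token and 30% GPU utilization.

def min_tflops(params_billion: float, tokens_per_step: int,
               hz: float, utilization: float = 0.3) -> float:
    flops_per_step = 2 * params_billion * 1e9 * tokens_per_step
    return flops_per_step * hz / utilization / 1e12

print(round(min_tflops(3.0, 10, 5.0), 1))  # → 1.0 (TFLOPS, sustained)
```

Even with pessimistic utilization this is comfortably within a Jetson-class module, so memory capacity and bandwidth, not raw FLOPS, are the binding constraints.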

AI/ML Integration Roadmap

Object Grasping
Current

OWL-v2 detection → heuristic grasp

With Foundation Models

A grasping foundation model generates grasp trajectories end-to-end

Task Planning
Current

LLM prompt → fixed operation sequence

With Foundation Models

LLM with affordance grounding — plans only feasible actions

Navigation
Current

A* / RRT on voxel map

With Foundation Models

Learned navigation policies for dynamic obstacles + social norms

Voice Interaction
Current

Whisper STT → GPT-4o → Piper TTS

With Foundation Models

On-device multimodal model (Gemma/Qwen) for low-latency, private interaction

Compute Requirements for AI Stack

| Model Class | Example | Parameters | Min VRAM | Speed |
|---|---|---|---|---|
| Vision encoder | SigLIP-so400m | 400M | 2 GB | 30+ FPS |
| Object detector | OWL-v2 large | 300M | 3 GB | 10+ FPS |
| Segmentation | SAM2-base | 90M | 2 GB | 15+ FPS |
| LLM (local) | Qwen2.5-7B | 7B | 6 GB | 20 tok/s |
| VLA model | small variant | 3B | 8 GB | 5+ Hz |
| Full stack | Perception + LLM + VLA | — | 12–16 GB | Pipelined |

Recommendation: Orin NX 16 GB as the baseline, AGX Orin 64 GB for the flagship. This future-proofs Stretch 5 for the VLA model wave.
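A quick consistency check on the table's numbers (assumption: "pipelined" means the 8 GB VLA policy time-shares VRAM rather than staying resident alongside everything else):

```python
# Weights-only VRAM sum for the always-resident models in the table
# above; activations and KV cache push the practical budget into the
# quoted 12-16 GB range, which is why 16 GB is the floor.
vram_gb = {
    "vision_encoder (SigLIP-so400m)": 2,
    "object_detector (OWL-v2 large)": 3,
    "segmentation (SAM2-base)": 2,
    "local_llm (Qwen2.5-7B)": 6,
}
total = sum(vram_gb.values())
print(f"{total} GB")  # → 13 GB of weights before runtime overhead
```

An 8 GB module cannot hold even the perception + LLM set, so the 16 GB baseline is a hard requirement rather than headroom.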

Executive Summary

Why Rust: the body server sits in the real-time bridge between the AI software and six boards. The proposal removes Python GC pauses from the 1 kHz coordination path, removes GIL contention from multi-axis concurrency, and replaces runtime protocol checks with compile-time validation.

Key finding for leadership: Hello Robot is the only vendor still running Python directly in the real-time control path. The migration keeps the Python user API through a PyO3 layer while moving transport, stepper, sensor fusion, and server internals to Rust.

4-Phase Timeline

Program window: 11 months of implementation + 6 months of parallel support

Transport → Stepper → Sensor Fusion → Robot Server

Rust migration phase plan
Phase 1: Transport
Months 1–3
Replace TransportPySerial, TransportCSerial, libtransport.so, and CobbsFraming with a Rust stretch-transport crate exposed through PyO3. Duration: 3 months. Risk: high.
  • COBS framing + Modbus CRC in Rust transport crate
  • Transport.do_rpc compatibility wrapper
  • 72-hour soak test over all stepper and PowerPeriph RPC paths
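For context on what "byte-wise framing in Python" means here, a minimal COBS encoder/decoder sketch, the textbook algorithm rather than the actual cobbs_framing.py code; loops like these run per byte, per RPC frame, inside the interpreter today:

```python
def cobs_encode(data: bytes) -> bytes:
    """COBS: stuff out zero bytes so 0x00 can delimit frames on the wire."""
    out = bytearray([0])            # placeholder for the first code byte
    code_idx, code = 0, 1
    for byte in data:
        if byte == 0:
            out[code_idx] = code    # close the current group
            code_idx, code = len(out), 1
            out.append(0)           # new placeholder
        else:
            out.append(byte)
            code += 1
            if code == 0xFF:        # max group length reached
                out[code_idx] = code
                code_idx, code = len(out), 1
                out.append(0)
    out[code_idx] = code
    return bytes(out)

def cobs_decode(data: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(data):
        code = data[i]
        out.extend(data[i + 1:i + code])
        i += code
        if code < 0xFF and i < len(data):  # implicit zero between groups
            out.append(0)
    return bytes(out)

print(cobs_encode(b"\x01\x02\x00\x03").hex())  # → 0301020203
```

Note the decoder trusts `code` without length validation, the same class of unchecked parsing the audit flags; the Rust crate gets both the speed and the checking for free.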
Phase 2: Stepper
Months 4–6
Rewrite core/stepper.py and subsystem motion wrappers so the hot path no longer crosses Python in the control loop. Duration: 3 months. Risk: high.
  • Type-safe RPC command/status structs for P7/P8/P9
  • Rust trajectory and calibration logic for stepper devices
  • Arm/Lift/OmniBase compatibility layer
Phase 3: Sensor Fusion
Months 7–8
Migrate power_periph and the IMU fusion pipeline into Rust to remove unchecked reply parsing in sensor status handling. Duration: 2 months. Risk: medium.
  • Rust PowerPeriph with full RPC coverage
  • IMU config/status pack-unpack and sync trigger coordination
  • Deterministic DeviceTimestamp rollover handling
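One way the deterministic rollover handling can work, sketched under the assumption of a 32-bit wrapping counter sampled at least once per wrap period (the real DeviceTimestamp width and units are not specified here):

```python
# Assumption: DeviceTimestamp is a wrapping 32-bit counter. Extend it
# to a monotonic 64-bit value by counting observed rollovers. This is
# deterministic as long as samples arrive at least once per wrap period.

WRAP = 1 << 32

class TimestampUnwrapper:
    def __init__(self) -> None:
        self.epoch = 0        # number of rollovers seen so far
        self.last = None      # previous raw sample

    def unwrap(self, raw: int) -> int:
        if self.last is not None and raw < self.last:
            self.epoch += 1   # counter rolled over since the last sample
        self.last = raw
        return self.epoch * WRAP + raw

u = TimestampUnwrapper()
print(u.unwrap(WRAP - 10))  # just before rollover
print(u.unwrap(5))          # after rollover: output keeps increasing
```

In Rust the same logic becomes a small pure function over `u64`, trivially unit-testable, which is what makes the behavior deterministic across the fusion pipeline.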
Phase 4: Robot Server
Months 9–11
Replace robot_server, robot, and client_server with an async Rust server and keep Python API compatibility through a client bridge. Duration: 3 months. Risk: medium.
  • Async command dispatch + status broadcast replacing REQ/REP
  • Graceful shutdown and process lock management
  • Production-ready static binary + Python client compatibility

PyO3 Bridge Architecture

Python SDK → PyO3 bridge (stretch-ffi) → Rust server on async runtime → Transport + Stepper + Sensor Fusion crates → motor and power firmware boards
PyO3 bridge architecture layers

| Layer | Component | Interface | Responsibility |
|---|---|---|---|
| Python API | Existing robot.arm.move_to() scripts | Python methods | Keep researcher and customer workflows unchanged during migration. |
| Compatibility Bridge | stretch-ffi | PyO3 boundary | Expose Rust classes with Python-compatible signatures and status dictionaries. |
| Rust Control Plane | stretch-server | Async channels + RPC dispatch | Run command orchestration, status publication, and lifecycle control without GIL contention. |
| Rust Device Crates | stretch-transport, stretch-stepper, stretch-power | Framed USB RPC | Provide low-latency, deterministic control for transport, motion, and sensor fusion. |
| Firmware | Motor + power boards | RPC V0/V1 frames | Execute 1 kHz board loops with synchronized motor start and status feedback. |
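The compatibility-bridge idea can be sketched in Python. Everything below is hypothetical shape, not real stretch_body or stretch-ffi code; the stand-in backend exists only so the sketch runs standalone:

```python
# Hypothetical sketch: the public Python class keeps the existing
# robot.arm.move_to() call shape while delegating to a swappable
# backend (the PyO3-exported Rust object in the real migration; a
# pure-Python stub here).

class StubBackend:
    """Stand-in for the Rust stretch-ffi extension object."""
    def move_to(self, joint: str, pos_m: float) -> dict:
        return {"joint": joint, "pos_m": pos_m, "success": 1}

class Arm:
    def __init__(self, backend=None):
        self._backend = backend if backend is not None else StubBackend()

    def move_to(self, pos_m: float) -> dict:
        # Same signature and status-dict shape as before, so researcher
        # scripts keep working regardless of which backend is loaded.
        return self._backend.move_to("arm", pos_m)

print(Arm().move_to(0.25))
```

The point of the pattern is that swapping `StubBackend` for the Rust extension is invisible at the call site, which is what lets the migration proceed crate by crate.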

Summary Stats

3,455 total quantified issues across memory, GIL, latency, type safety, and error handling categories:

  • 2,804 (critical): type safety gaps
  • 259 (high): memory safety issues
  • 205 (high): GIL contention points
  • 148 (medium): latency-critical paths
  • 39 (moderate): error handling issues

Issue Category Breakdown

Type safety gaps: 81.2% (critical)

Dict chain access and bare RPC constants dominate the codebase and hide interface breakage until runtime.

Memory safety issues: 7.5% (high)

Unchecked reply indexing and unpack_slice patterns are concentrated in firmware communication modules.
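The "unchecked reply indexing" pattern, in illustrative form (hypothetical 6-byte frame layout; not the actual stretch_body parsing code):

```python
import struct

# Hypothetical status frame: 2-byte header + little-endian float payload.

def parse_status_unchecked(reply: bytes) -> float:
    # Trusts the firmware: a short or truncated reply surfaces as a
    # struct.error deep inside the caller instead of a clear error.
    return struct.unpack_from("<f", reply, 2)[0]

def parse_status_checked(reply: bytes) -> float:
    # Validate length before unpacking, so a bad reply becomes an
    # explicit, catchable error at the protocol boundary.
    if len(reply) < 6:
        raise ValueError(f"short status reply: {len(reply)} bytes")
    return struct.unpack_from("<f", reply, 2)[0]

print(parse_status_checked(b"\x00\x01" + struct.pack("<f", 12.5)))  # → 12.5
```

In the proposed Rust crates this check disappears into the type system: a typed reply struct either deserializes completely or returns an error, so the unchecked variant cannot be written.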

GIL contention points: 5.9% (high)

Threading and sleep-heavy control paths serialize status and command work in Python.

Latency-critical paths: 4.3% (medium)

Byte-wise COBS/CRC processing and busy waits sit directly in transport and control-loop paths.

Error handling issues: 1.1% (moderate)

Bare except and sys.exit in library code create fail-stop behavior instead of recoverable flows.
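An illustrative contrast (hypothetical code, not from the library) between the fail-stop pattern the audit flags and a recoverable alternative:

```python
import sys

# Anti-pattern: bare except + sys.exit inside library code.
def read_sensor_failstop(transport):
    try:
        return transport.read()
    except:             # bare except also swallows KeyboardInterrupt
        sys.exit(1)     # and then kills the caller's whole process

# Recoverable: narrow except + a typed error the caller can handle.
class SensorReadError(RuntimeError):
    """Raised when a sensor read fails; caller decides the policy."""

def read_sensor_recoverable(transport):
    try:
        return transport.read()
    except (OSError, TimeoutError) as e:   # only expected failures
        raise SensorReadError("sensor read failed") from e
```

The Rust equivalent is a `Result` return type, which makes the fail-stop version unrepresentable without an explicit `panic!` or `process::exit`.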

Top 10 Highest-Risk Files

Top audit hotspot files by issue count and category

| File | Notes | Memory | GIL | Latency | Type | Errors | Total |
|---|---|---|---|---|---|---|---|
| core/stepper.py | Largest concentration of unchecked firmware reply parsing. | 99 | 8 | 1 | 357 | 0 | 465 |
| subsystem/power_periph.py | IMU and power status unpacking path has high reply-index risk. | 83 | 2 | 2 | 249 | 0 | 336 |
| robot/robot.py | Status thread orchestration heavily contends on Python threading. | 0 | 41 | 0 | 63 | 0 | 104 |
| core/transport/transport_pyserial.py | Primary transport bottleneck for framing, CRC, and RPC dispatch. | 30 | 1 | 45 | 23 | 0 | 99 |
| robot/robot_client.py | Sleep-heavy request flow with runtime-typed status access. | 0 | 16 | 0 | 64 | 1 | 81 |
| robot/robot_server.py | Dispatch and status paths rely on runtime tuple and dict shape assumptions. | 0 | 0 | 0 | 45 | 0 | 45 |
| core/transport/transport_util.py | High-frequency struct pack and unpack in the transport hot path. | 18 | 0 | 18 | 3 | 0 | 39 |
| core/feetech/protocol_packet_handler.py | Servo packet parsing uses unchecked indexing and busy polling loops. | 20 | 0 | 4 | 0 | 0 | 24 |
| core/feetech/feetech_SM_chain.py | Threaded servo polling plus bare except handling. | 3 | 16 | 0 | 0 | 2 | 21 |
| core/transport/cobbs_framing.py | Byte-wise framing and CRC loops run in Python in every RPC cycle. | 0 | 0 | 20 | 0 | 0 | 20 |

Counts shown are the quantified file-level findings explicitly called out in PYTHON-AUDIT.md.

Key Findings

  • Memory safety hotspots: stepper.py (99), power_periph.py (83), and transport_pyserial.py (30) account for 212 of 259 memory safety issues.
  • GIL contention is concentrated in status polling and transport paths, including robot.py threading and transport serial locking.
  • Latency-critical path concentration: 85 transport latency points in 1,515 lines, dominated by COBS framing and CRC processing in Python.
  • Type safety is the largest risk class with 2,804 findings, mostly dynamic dict access and raw RPC constants.
Stretch 3 Lessons Learned

  • V1 PIMU: Added INA228 power monitor, pre-charge circuit, safe PC shutdown, system shutdown mode (103 µA vs 12–30 mA), brake button float mode, IMU+Mag, better USB hub ICs
  • Charging improved: 10A (2.5hrs vs 4.5hrs)

    Requires 36V 8A adapter specifically

  • BMS comms limited to 9600bps

    Blocking at 1kHz was problematic

  • EMC: USB hub re-enumeration from ESD fixed with new ICs

    Grounding still critical

  • Stretch 3 passed Class A EMC in both operational and charging modes

Development Roadmap

Phase 1Architecture & Schematic
  • Define power architecture (eFuse, INA228 alerts)
  • Consolidate 3V3 rails
  • Master/slave UART protocol spec
  • Select Class B EMC-friendly USB hub ICs
  • Select Orin NX vs AGX Orin for compute
  • Design with dedicated I2C buses (IMU/Mag/INA228 separated)
Phase 2Prototype & Bring-up
  • In-house pre-compliance EMC scans
  • Actuator protection validation
  • Non-blocking implementation + transfers
  • Reverse current path testing
  • Orin integration + AI stack validation
  • ESP32-S3 UART protocol bring-up
Phase 3Validation & Compliance
  • FCC Class B formal testing
  • Automated RDK test suite complete
  • SOC accuracy validation (INA228 vs BMS)
  • System grounding audit
  • VLA model inference benchmarks on Orin
  • Dex Teleop + data pipeline validation
Phase 4Production Readiness
  • Final BOM review & cost optimization
  • Manufacturing test fixtures
  • Firmware OTA update pipeline (via ESP32-S3)
  • Documentation & handoff
  • 'Contribute data' toggle for fleet learning