
RAPIDS: Robot AI Agents Pipeline for
Intelligent Data Collection & Skill Learning

Minjong Yoo*, WooKyung Kim*, Soyoung Kim, Hyunsuk Cho, Sihyung Yoon, SangHyun Ahn, Seungchan An
PRISM (Physical Robot Intelligence & Simulation)
CSI Agent Lab, Sungkyunkwan University
2026 AI Co-Scientist Challenge Korea | Track 2

Abstract

High-quality demonstration data is the primary bottleneck in robot learning research. Teleoperation-based collection yields only 10–20 episodes per hour, requires skilled operators, and suffers from quality variance due to operator fatigue and proficiency differences.

We present RAPIDS, a multi-agent system that fully automates the robot data collection and learning pipeline. From a single natural language instruction, RAPIDS autonomously converts the task into a structured specification (A-1), generates a simulation environment in IsaacLab (A-2), collects demonstration data via LLM-generated code in both simulation and the real world (A-3), validates quality through dual pre/post verification (A-4), executes on heterogeneous robots via a unified control framework (B-1), and fine-tunes VLA models with LoRA adapters (B-2).

RAPIDS produced 57 GB of simulation data across 13 tasks (78% episode success rate) and 10 GB of real-world data across 14 tasks on 3 robot platforms (SO-101 single-arm, SO-101 dual-arm, UR7e), reducing new-task pipeline time from 4–8 weeks to 17 minutes and per-episode cost from $10–50 (operator labor) to ~$0.10 (API cost).

RAPIDS end-to-end autonomous pipeline demo

6 AI Agents
82 Task YAMLs
67 GB Collected Data
4 Robot Types
17 min New Task E2E

Research Acceleration

Open Data Ecosystem

Industrial Applicability

Demo Gallery

Stacking

Tray Stacking

Multi Pick&Place

Can Pick&Place

Distribute to Plates

3-Block Stacking

Dual: Block Transfer

Dual: Towel Folding

Battery Assembly

Nut & Bolt Assembly

Precision Marker

Precision Placement

System Overview

Six specialized AI agents run the pipeline end-to-end, from a natural-language instruction to a trained VLA model.

RAPIDS pipeline overview

RAPIDS 6-Agent Pipeline: NL Input → Task Spec (A-1) → Sim/Real Environment (A-2, A-3) → Verification (A-4) → Verified Dataset

A-1 Task Definition
A-2 Sim Generation
A-3 Data Collection
A-4 Verification
B-1 Robot Control
B-2 VLA Training

Key Differentiators

 End-to-End Autonomy

NL input → 6 agents → trained VLA model. No manual steps. 4–8 weeks → 17 min.

 Closed-Loop Quality

Auto self-refinement, 3-tier validation, AST + VLM dual verification, 10Hz real-time monitoring.

 Physical World Design

6-DOF IK, PhysX 5 physics, auto Domain Randomization, low-level safety limits.

Simulation Environment Generation

Task Definition [A-1]

Input
  • Natural language task description
Process
  • RAG search (82 tasks)
  • LLM parsing
Output
  • YAML task specification
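The A-1 step above can be sketched as follows. The real agent retrieves the closest of the 82 task YAMLs via RAG and has an LLM fill in the specification; here a simple keyword heuristic stands in for the LLM call, and all field names are illustrative rather than the actual schema.

```python
# Minimal sketch of A-1 (NL -> structured task spec). A keyword heuristic
# replaces the RAG + LLM parsing step; field names are illustrative only.

def nl_to_task_spec(instruction):
    verbs = {"stack": "stacking", "sort": "sorting", "pick": "pick_and_place"}
    text = instruction.lower()
    # Pick the first matching verb, else fall back to "unknown".
    task_type = next((t for v, t in verbs.items() if v in text), "unknown")
    return {
        "task_name": text.replace(" ", "_"),
        "task_type": task_type,
        "instruction": instruction,
        "success_criteria": f"{task_type}_completed",  # LLM-written in practice
    }

spec = nl_to_task_spec("Stack the red block on the tray")
# spec["task_type"] == "stacking"
```

In the real pipeline the returned dict would be serialized to YAML and passed to A-2.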

Sim Generation [A-2]

Input
  • YAML task spec
Process
  • LLM code generation
  • Self-refinement (5x)
  • VLM visual check
Output
  • IsaacLab environment code
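The generate-and-refine loop in A-2 can be sketched like this: ask an LLM for environment code, statically check it, and feed any error back as the next refinement hint, for up to five attempts. `llm_generate` is a hypothetical stand-in for the real model call, which would return IsaacLab environment code; the actual agent additionally runs a VLM visual check on the rendered scene.

```python
# Sketch of the A-2 self-refinement loop (up to 5 attempts).

def llm_generate(spec, feedback):
    # Stub: returns broken code first, "fixed" code once feedback arrives.
    if feedback is None:
        return "def make_env(:"              # syntax error on purpose
    return "def make_env():\n    return 'env'"

def self_refine(spec, max_iters=5):
    feedback = None
    for _ in range(max_iters):
        code = llm_generate(spec, feedback)
        try:
            compile(code, "<env>", "exec")   # cheap static check before a sim run
            return code                      # passed; hand off to IsaacLab
        except SyntaxError as err:
            feedback = str(err)              # the error becomes the next prompt
    return None                              # gave up after max_iters
```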
StackTray environment

Stack in Tray

ColorSort environment

Color Sort

MultiPickPlace environment

Multi Pick&Place

Drawer environment

Drawer Pick&Place

Data Collection & Verification

Data Collection [A-3]

Input
  • Task instruction
  • Environment
  • Skill primitives
Process
  • Code-as-Policies
  • 6-DOF IK (Pinocchio)
  • Forward-Reset loop
Output
  • LeRobot v3.0 dataset
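The Forward-Reset loop in A-3 can be sketched as: roll the generated policy forward, keep the episode if it succeeded, reset the scene, repeat. `StubEnv` replaces the IsaacLab environment; the real agent executes Code-as-Policies skill primitives through Pinocchio-based 6-DOF IK and writes LeRobot v3.0 episodes.

```python
# Sketch of the A-3 Forward-Reset collection loop with a stub environment.
import random

class StubEnv:
    def reset(self):
        pass                                  # scene randomization goes here
    def rollout(self):
        # 0.78 mirrors the reported sim success rate; purely illustrative.
        return {"frames": 120, "success": random.random() < 0.78}

def collect(env, n_episodes):
    dataset = []
    for _ in range(n_episodes):
        env.reset()                           # "Reset" phase
        episode = env.rollout()               # "Forward" phase
        if episode["success"]:                # only verified episodes are kept
            dataset.append(episode)
    return dataset

random.seed(0)
data = collect(StubEnv(), 10)
```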

Pre & Post Validator [A-4]

Input
  • Generated code
  • YAML spec
Process
  • AST static analysis
  • VLM visual scoring
Output
  • 100-point score
  • Violation feedback
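The pre-validator's AST pass can be sketched as follows: before any generated code runs, parse it and flag calls outside an allowed skill-primitive whitelist. The whitelist and penalty weights here are illustrative; the real validator combines this static pass with VLM visual scoring into the 100-point score.

```python
# Sketch of an A-4-style AST pre-check with an illustrative call whitelist.
import ast

ALLOWED_CALLS = {"move_to", "grasp", "release", "get_pose"}

def pre_validate(code):
    violations = []
    tree = ast.parse(code)
    for node in ast.walk(tree):
        # Flag every top-level-name call that is not a known skill primitive.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id not in ALLOWED_CALLS:
                violations.append(f"disallowed call: {node.func.id}")
    score = max(0, 100 - 20 * len(violations))  # simple penalty scheme
    return score, violations

score, v = pre_validate("move_to(get_pose('cube'))\ngrasp()")
# score == 100, v == []
```

Violations would be fed back to the code-generating agent as textual feedback, closing the loop.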

 Simulation

Env Code Gen
IsaacLab Sim
6-DOF IK Exec
3-Tier Verify
LeRobot v3.0
57 GB Total Data (13 tasks, LeRobot v3.0)
78% Episode Success (401 / 514 episodes)

Stacking

Tray Stacking

Multi Pick&Place

Can Pick&Place

 Real-World

Scene Capture
VLM Perception
Code Generation
Execution
Verification
10 GB Total Data (14 tasks, 3 robot configs)
68–92% Episode Success (SO-101 single/dual, UR7e)
VLM-Driven Autonomous Pipeline: Scene → Detect → Grasp Plan → Workspace → Complete

Autonomous Forward-Reset data collection

Robot Control & VLA Training

Robot Control [B-1]

Input
  • NL command
  • RGB/RGBD image
Process
  • Perception
  • Planner
  • Controller
  • Monitor
Output
  • Robot trajectory
  • Joint commands
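The B-1 loop structure (perception, planner, controller, monitor) can be sketched as a single control tick with a safety gate; the real system runs this at 10 Hz over ROS2. All components here are stubs, and the joint limits are illustrative, not the actual robot limits.

```python
# Sketch of one B-1 control tick with a low-level safety monitor.

JOINT_LIMITS = (-3.14, 3.14)   # illustrative symmetric limits, radians

def monitor(joints):
    # Low-level safety gate: every joint must stay inside its limit.
    lo, hi = JOINT_LIMITS
    return all(lo <= q <= hi for q in joints)

def control_step(target, current, gain=0.5):
    # Simple proportional step from the current joints toward the planned target.
    cmd = [q + gain * (t - q) for q, t in zip(current, target)]
    if not monitor(cmd):
        return current          # violation: hold position instead of moving
    return cmd

q = control_step(target=[1.0] * 6, current=[0.0] * 6)
# q == [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
```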

VLA Fine-Tuning [B-2]

Input
  • Demo data
  • Pre-trained VLA
Process
  • Data preprocessing
  • LoRA adapter fine-tuning
Output
  • Task-specific VLA model
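The LoRA idea behind B-2 can be shown in a few lines: instead of updating the full weight matrix W, train a low-rank pair (A, B) and add B·A at inference. Swapping tasks then means swapping only the small (A, B) pair, which is what makes sub-10 ms adapter swaps plausible. Pure-Python matrix math for illustration; the actual pipeline fine-tunes a pre-trained VLA with LoRA adapters in PyTorch.

```python
# Toy illustration of a LoRA forward pass: y = (W + B @ A) x,
# where A and B have rank r much smaller than the dimension of W.

def matmul(X, Y):
    # Naive dense matrix multiply (lists of rows).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(W, A, B, x):
    delta = matmul(B, A)                              # low-rank update B @ A
    Wp = [[w + d for w, d in zip(wr, dr)]             # W' = W + B @ A
          for wr, dr in zip(W, delta)]
    return matmul(Wp, [[v] for v in x])               # column-vector product

# 2x2 identity W with a rank-1 adapter (A is 1x2, B is 2x1):
y = lora_forward([[1, 0], [0, 1]], [[1, 1]], [[1], [0]], [1, 1])
# y == [[3], [1]]
```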
73.5% LIBERO Success (+1.1% vs. baseline)
14.7% RoboCasa Success (+8.5% vs. baseline)
< 10 ms Adapter Swap (task-specific LoRA)
2 h Training Time (H100 x 4 GPUs)

Results

System Efficiency

Metric                     | Conventional                            | RAPIDS
---------------------------|-----------------------------------------|------------------------------------------
Task definition time       | Days to weeks (manual reward/env code)  | 1 min (NL → YAML)
Environment setup time     | 1–2 weeks (manual sim authoring)        | 9 min (LLM generation + self-refinement)
Data collection throughput | 10–20 episodes/hr (teleop)              | ~8.5 episodes/hr (fully autonomous)
Full pipeline (new task)   | 4–8 weeks                               | 17 minutes
Cost per episode           | $10–50 (operator labor)                 | ~$0.10 (API cost only)

Hardware & Software

Hardware

GPU Server: NVIDIA H100 80GB x 4
GPU Workstation: RTX 5090 / RTX 3090
SO-101 Arms: 5+1 DOF x 2 units
UR7e Arm: 6+1 DOF x 1 unit
Cameras: Intel RealSense D435 x 1
3D Printer: Bambu Lab H2

Software

OS: Ubuntu 22.04 LTS
Simulator: Isaac Sim 5.1.0 + IsaacLab v2.3.2
Robot Framework: ROS2 Humble
CUDA / Python: 12.1 / 3.11 (Docker)
ML Framework: PyTorch 2.7.0
Data Format: LeRobot v3.0 (OXE compatible)
Robot Hardware Preparation

SO-101 robot parts 3D printing

Open Source & Data

PRISM Team

CSI Agent Lab, Sungkyunkwan University

Team Lead
Minjong Yoo Ph.D.
Framework Design
RL & Embodied Agents (7+ yrs)
Integration Lead
WooKyung Kim Ph.D.
Pipeline Integration
RL & Embodied Agents (5+ yrs)
Soyoung Kim M.S.
A-1, A-2 Dev
Sim Embodied Agents
Hyunsuk Cho M.S./Ph.D.
A-3, A-4 Dev
Agents & CV
Sihyung Yoon M.S./Ph.D.
B-1, B-2 Dev
RL & Embodied Agents
SangHyun Ahn M.S.
A-4, B-1, B-2 Dev
Embodied Agents
Seungchan An M.S./Ph.D.
A-1, A-2, A-4 Dev
Embodied Agents

14 top-tier publications in embodied agents and reinforcement learning over the past 3 years

Citation

@misc{rapids2026,
  author    = {Yoo, Minjong and Kim, WooKyung and Kim, Soyoung and
               Cho, Hyunsuk and Yoon, Sihyung and Ahn, SangHyun and An, Seungchan},
  title     = {RAPIDS: Robot AI Agents Pipeline for Intelligent Data Collection
               and Skill Learning},
  year      = {2026},
  note      = {2026 AI Co-Scientist Challenge Korea, Track 2},
  institution = {PRISM, CSI Agent Lab, Sungkyunkwan University}
}