
RAPIDS: Robot AI Agents Pipeline for
Intelligent Data Collection & Skill Learning

Minjong Yoo*, WooKyung Kim*, Soyoung Kim, Hyunsuk Cho, Sihyung Yoon, SangHyun Ahn, Seungchan An
PRISM (Physical Robot Intelligence & Simulation)
CSI Agent Lab, Sungkyunkwan University
2026 AI Co-Scientist Challenge Korea | Track 2

Abstract

High-quality demonstration data is the primary bottleneck in robot learning research. Teleoperation-based collection yields only 10–20 episodes per hour, requires skilled operators, and suffers from quality variance due to operator fatigue and proficiency differences.

We present RAPIDS, a multi-agent system that fully automates the robot data collection and learning pipeline. From a single natural language instruction, RAPIDS autonomously converts the task into a structured specification (A-1), generates a simulation environment in IsaacLab (A-2), collects demonstration data via LLM-generated code in both simulation and the real world (A-3), validates quality through dual pre/post verification (A-4), executes on heterogeneous robots via a unified control framework (B-1), and fine-tunes VLA models with LoRA adapters (B-2).

RAPIDS produced 57 GB of simulation data across 13 tasks (78% episode success rate) and 10 GB of real-world data across 14 tasks on 3 robot platforms (SO-101 single-arm, SO-101 dual-arm, UR7e), reducing new-task pipeline time from 4–8 weeks to 17 minutes and per-episode cost from $10–50 (operator labor) to ~$0.10 (API cost).

RAPIDS end-to-end autonomous pipeline demo

6 AI Agents
82 Task YAMLs
67 GB Collected Data
4 Robot Types
17 min New Task E2E

Research Acceleration

Open Data Ecosystem

Industrial Applicability

Demo Gallery

Stacking

Tray Stacking

Multi Pick&Place

Can Pick&Place

Distribute to Plates

3-Block Stacking

Dual: Block Transfer

Dual: Towel Folding

Battery Assembly

Nut & Bolt Assembly

Precision Marker

Precision Placement

System Overview

Six specialized AI agents run the pipeline end-to-end, from a natural-language instruction to a trained VLA model.

RAPIDS pipeline overview

RAPIDS 6-Agent Pipeline: NL Input → Task Spec (A-1) → Sim/Real Environment (A-2, A-3) → Verification (A-4) → Verified Dataset

A-1 Task Definition
A-2 Sim Generation
A-3 Data Collection
A-4 Verification
B-1 Robot Control
B-2 VLA Training

Key Differentiators

 End-to-End Autonomy

NL input → 6 agents → trained VLA model. No manual steps. 4–8 weeks → 17 min.

 Closed-Loop Quality

Auto self-refinement, 3-tier validation, AST + VLM dual verification, 10Hz real-time monitoring.

 Physical World Design

6-DOF IK, PhysX 5 physics, auto Domain Randomization, low-level safety limits.

Simulation Environment Generation

Task Definition [A-1]

Input
  • Natural language task description
Process
  • RAG search (82 tasks)
  • LLM parsing
Output
  • YAML task specification
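The A-1 step above can be sketched as follows. The real agent retrieves the closest of the 82 task YAMLs via RAG and has an LLM fill in the specification; here a simple keyword heuristic stands in for the LLM call, and all field names are illustrative rather than the actual schema.

```python
# Minimal sketch of A-1 (NL -> structured task spec). A keyword heuristic
# replaces the RAG + LLM parsing step; field names are illustrative only.

def nl_to_task_spec(instruction):
    verbs = {"stack": "stacking", "sort": "sorting", "pick": "pick_and_place"}
    text = instruction.lower()
    # Pick the first matching verb, else fall back to "unknown".
    task_type = next((t for v, t in verbs.items() if v in text), "unknown")
    return {
        "task_name": text.replace(" ", "_"),
        "task_type": task_type,
        "instruction": instruction,
        "success_criteria": f"{task_type}_completed",  # LLM-written in practice
    }

spec = nl_to_task_spec("Stack the red block on the tray")
# spec["task_type"] == "stacking"
```

In the real pipeline the returned dict would be serialized to YAML and passed to A-2.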

Sim Generation [A-2]

Input
  • YAML task spec
Process
  • LLM code generation
  • Self-refinement (5x)
  • VLM visual check
Output
  • IsaacLab environment code
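The generate-and-refine loop in A-2 can be sketched like this: ask an LLM for environment code, statically check it, and feed any error back as the next refinement hint, for up to five attempts. `llm_generate` is a hypothetical stand-in for the real model call, which would return IsaacLab environment code; the actual agent additionally runs a VLM visual check on the rendered scene.

```python
# Sketch of the A-2 self-refinement loop (up to 5 attempts).

def llm_generate(spec, feedback):
    # Stub: returns broken code first, "fixed" code once feedback arrives.
    if feedback is None:
        return "def make_env(:"              # syntax error on purpose
    return "def make_env():\n    return 'env'"

def self_refine(spec, max_iters=5):
    feedback = None
    for _ in range(max_iters):
        code = llm_generate(spec, feedback)
        try:
            compile(code, "<env>", "exec")   # cheap static check before a sim run
            return code                      # passed; hand off to IsaacLab
        except SyntaxError as err:
            feedback = str(err)              # the error becomes the next prompt
    return None                              # gave up after max_iters
```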
StackTray environment

Stack in Tray

ColorSort environment

Color Sort

MultiPickPlace environment

Multi Pick&Place

Drawer environment

Drawer Pick&Place

Data Collection & Verification

Data Collection [A-3]

Input
  • Task instruction
  • Environment
  • Skill primitives
Process
  • Code-as-Policies
  • 6-DOF IK (Pinocchio)
  • Forward-Reset loop
Output
  • LeRobot v3.0 dataset
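The Forward-Reset loop in A-3 can be sketched as: roll the generated policy forward, keep the episode if it succeeded, reset the scene, repeat. `StubEnv` replaces the IsaacLab environment; the real agent executes Code-as-Policies skill primitives through Pinocchio-based 6-DOF IK and writes LeRobot v3.0 episodes.

```python
# Sketch of the A-3 Forward-Reset collection loop with a stub environment.
import random

class StubEnv:
    def reset(self):
        pass                                  # scene randomization goes here
    def rollout(self):
        # 0.78 mirrors the reported sim success rate; purely illustrative.
        return {"frames": 120, "success": random.random() < 0.78}

def collect(env, n_episodes):
    dataset = []
    for _ in range(n_episodes):
        env.reset()                           # "Reset" phase
        episode = env.rollout()               # "Forward" phase
        if episode["success"]:                # only verified episodes are kept
            dataset.append(episode)
    return dataset

random.seed(0)
data = collect(StubEnv(), 10)
```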

Pre & Post Validator [A-4]

Input
  • Generated code
  • YAML spec
Process
  • AST static analysis
  • VLM visual scoring
Output
  • 100-point score
  • Violation feedback
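The pre-validator's AST pass can be sketched as follows: before any generated code runs, parse it and flag calls outside an allowed skill-primitive whitelist. The whitelist and penalty weights here are illustrative; the real validator combines this static pass with VLM visual scoring into the 100-point score.

```python
# Sketch of an A-4-style AST pre-check with an illustrative call whitelist.
import ast

ALLOWED_CALLS = {"move_to", "grasp", "release", "get_pose"}

def pre_validate(code):
    violations = []
    tree = ast.parse(code)
    for node in ast.walk(tree):
        # Flag every top-level-name call that is not a known skill primitive.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id not in ALLOWED_CALLS:
                violations.append(f"disallowed call: {node.func.id}")
    score = max(0, 100 - 20 * len(violations))  # simple penalty scheme
    return score, violations

score, v = pre_validate("move_to(get_pose('cube'))\ngrasp()")
# score == 100, v == []
```

Violations would be fed back to the code-generating agent as textual feedback, closing the loop.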

 Simulation

Env Code Gen
IsaacLab Sim
6-DOF IK Exec
3-Tier Verify
LeRobot v3.0
57 GB Total Data (13 tasks, LeRobot v3.0)
78% Episode Success (401 / 514 episodes)

Stacking

Tray Stacking

Multi Pick&Place

Can Pick&Place

 Real-World

Scene Capture
VLM Perception
Code Generation
Execution
Verification
10 GB Total Data (14 tasks, 3 robot configs)
68–92% Episode Success (SO-101 single/dual, UR7e)
VLM-Driven Autonomous Pipeline: Scene → Detect → Grasp Plan → Workspace → Complete

Autonomous Forward-Reset data collection

Robot Control & VLA Training

Robot Control [B-1]

Input
  • NL command
  • RGB/RGBD image
Process
  • Perception
  • Planner
  • Controller
  • Monitor
Output
  • Robot trajectory
  • Joint commands
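The B-1 loop structure (perception, planner, controller, monitor) can be sketched as a single control tick with a safety gate; the real system runs this at 10 Hz over ROS2. All components here are stubs, and the joint limits are illustrative, not the actual robot limits.

```python
# Sketch of one B-1 control tick with a low-level safety monitor.

JOINT_LIMITS = (-3.14, 3.14)   # illustrative symmetric limits, radians

def monitor(joints):
    # Low-level safety gate: every joint must stay inside its limit.
    lo, hi = JOINT_LIMITS
    return all(lo <= q <= hi for q in joints)

def control_step(target, current, gain=0.5):
    # Simple proportional step from the current joints toward the planned target.
    cmd = [q + gain * (t - q) for q, t in zip(current, target)]
    if not monitor(cmd):
        return current          # violation: hold position instead of moving
    return cmd

q = control_step(target=[1.0] * 6, current=[0.0] * 6)
# q == [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
```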

VLA Fine-Tuning [B-2]

Input
  • Demo data
  • Pre-trained VLA
Process
  • Data preprocessing
  • LoRA adapter fine-tuning
Output
  • Task-specific VLA model
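The LoRA idea behind B-2 can be shown in a few lines: instead of updating the full weight matrix W, train a low-rank pair (A, B) and add B·A at inference. Swapping tasks then means swapping only the small (A, B) pair, which is what makes sub-10 ms adapter swaps plausible. Pure-Python matrix math for illustration; the actual pipeline fine-tunes a pre-trained VLA with LoRA adapters in PyTorch.

```python
# Toy illustration of a LoRA forward pass: y = (W + B @ A) x,
# where A and B have rank r much smaller than the dimension of W.

def matmul(X, Y):
    # Naive dense matrix multiply (lists of rows).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(W, A, B, x):
    delta = matmul(B, A)                              # low-rank update B @ A
    Wp = [[w + d for w, d in zip(wr, dr)]             # W' = W + B @ A
          for wr, dr in zip(W, delta)]
    return matmul(Wp, [[v] for v in x])               # column-vector product

# 2x2 identity W with a rank-1 adapter (A is 1x2, B is 2x1):
y = lora_forward([[1, 0], [0, 1]], [[1, 1]], [[1], [0]], [1, 1])
# y == [[3], [1]]
```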
73.5% LIBERO Success (+1.1% vs. baseline)
14.7% RoboCasa Success (+8.5% vs. baseline)
< 10 ms Adapter Swap (task-specific LoRA)
2 h Training Time (H100 x 4 GPUs)

Results

System Efficiency

Metric                     | Conventional                            | RAPIDS
---------------------------|-----------------------------------------|------------------------------------------
Task definition time       | Days to weeks (manual reward/env code)  | 1 min (NL → YAML)
Environment setup time     | 1–2 weeks (manual sim authoring)        | 9 min (LLM generation + self-refinement)
Data collection throughput | 10–20 episodes/hr (teleop)              | ~8.5 episodes/hr (fully autonomous)
Full pipeline (new task)   | 4–8 weeks                               | 17 minutes
Cost per episode           | $10–50 (operator labor)                 | ~$0.10 (API cost only)

Hardware & Software

Hardware

GPU Server: NVIDIA H100 80GB x 4
GPU Workstation: RTX 5090 / RTX 3090
SO-101 Arms: 5+1 DOF x 2 units
UR7e Arm: 6+1 DOF x 1 unit
Cameras: Intel RealSense D435 x 1
3D Printer: Bambu Lab H2

Software

OS: Ubuntu 22.04 LTS
Simulator: Isaac Sim 5.1.0 + IsaacLab v2.3.2
Robot Framework: ROS2 Humble
CUDA / Python: 12.1 / 3.11 (Docker)
ML Framework: PyTorch 2.7.0
Data Format: LeRobot v3.0 (OXE compatible)
Robot Hardware Preparation

SO-101 robot parts 3D printing

Open Source & Data

PRISM Team

CSI Agent Lab, Sungkyunkwan University

Team Lead
Minjong Yoo Ph.D.
Framework Design
RL & Embodied Agents (7+ yrs)
Integration Lead
WooKyung Kim Ph.D.
Pipeline Integration
RL & Embodied Agents (5+ yrs)
Soyoung Kim M.S.
A-1, A-2 Dev
Sim Embodied Agents
Hyunsuk Cho M.S./Ph.D.
A-3, A-4 Dev
Agents & CV
Sihyung Yoon M.S./Ph.D.
B-1, B-2 Dev
RL & Embodied Agents
SangHyun Ahn M.S.
A-4, B-1, B-2 Dev
Embodied Agents
Seungchan An M.S./Ph.D.
A-1, A-2, A-4 Dev
Embodied Agents

14 top-tier publications in embodied agents and reinforcement learning over the past 3 years

Citation

@misc{rapids2026,
  author    = {Yoo, Minjong and Kim, WooKyung and Kim, Soyoung and
               Cho, Hyunsuk and Yoon, Sihyung and Ahn, SangHyun and An, Seungchan},
  title     = {RAPIDS: Robot AI Agents Pipeline for Intelligent Data Collection
               and Skill Learning},
  year      = {2026},
  note      = {2026 AI Co-Scientist Challenge Korea, Track 2},
  institution = {PRISM, CSI Agent Lab, Sungkyunkwan University}
}