Robot Learning Benchmarks
Standardized evaluation for robot manipulation: RLBench, LIBERO, CALVIN, and more. Covers success rates, task completion, and related evaluation metrics.
Evaluation
Benchmarks for Manipulation
Simulation
RLBench
100+ manipulation tasks built on PyRep and CoppeliaSim. Widely used for VLA evaluation. Reported success rates: BridgeVLA 88.2%; InternVLA 95%+ on task subsets.
Simulation
LIBERO
Lifelong learning benchmark with 130 tasks, including spatial, object, and goal suites. Built on robosuite. Reported state of the art: 95.9% success (InternVLA).
Simulation
CALVIN
Composing Actions from Language and Vision. Targets long-horizon, language-conditioned manipulation. RoboFlamingo is a strong baseline.
Real Robot
Google Robot Benchmark
Real-world manipulation with 700+ tasks across various embodiments, including WidowX. Results are reported as success rates under multi-task evaluation.
Simulation
COLOSSEUM
Generalization benchmark built on RLBench. Evaluates policies on diverse tasks under systematic environment perturbations, with a real-world counterpart. Reported success rate: BridgeVLA 64%.
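The success rates quoted above are aggregates over many episodes and tasks, and the way they are averaged affects the headline number. The following is a minimal sketch, not any benchmark's official scoring code, of computing per-task success rates together with a macro average (mean of per-task rates) and a micro average (pooled over all episodes). The input format, the function name `success_rates`, and the task names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def success_rates(episodes):
    """Aggregate episode outcomes into success rates.

    `episodes` is an iterable of (task_name, succeeded) pairs.
    This input format is an assumption made for this sketch.
    Returns (per_task, macro_average, micro_average).
    """
    counts = defaultdict(lambda: [0, 0])  # task -> [successes, trials]
    for task, succeeded in episodes:
        counts[task][0] += int(bool(succeeded))
        counts[task][1] += 1
    if not counts:
        raise ValueError("no episodes provided")

    # Success rate for each task separately.
    per_task = {task: s / n for task, (s, n) in counts.items()}
    # Macro average: every task counts equally, regardless of episode count.
    macro = sum(per_task.values()) / len(per_task)
    # Micro average: every episode counts equally.
    total_successes = sum(s for s, _ in counts.values())
    total_trials = sum(n for _, n in counts.values())
    micro = total_successes / total_trials
    return per_task, macro, micro


# Hypothetical outcomes: 4 episodes of one task, 2 of another.
episodes = [
    ("open_drawer", True), ("open_drawer", True),
    ("open_drawer", False), ("open_drawer", True),
    ("stack_blocks", False), ("stack_blocks", True),
]
per_task, macro, micro = success_rates(episodes)
```

When tasks have unequal episode counts, as in this toy data, the macro and micro averages differ, so a reported number is only comparable across papers if the averaging scheme and the task subset match.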