
Best Robot Learning Datasets 2025

A curated guide to the top open-source datasets for imitation learning, VLA fine-tuning, and robot learning research.

Top Datasets for Robot Learning

Choosing the right dataset depends on your robot, task, and model. Here are the most widely used datasets in 2025.

1. Open X-Embodiment

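Combines BridgeData, DROID, and dozens of other datasets (more than 60 in total, spanning 22 robot embodiments) into a unified format. It is the training set for the RT-X models and is used to train foundation models like OpenVLA and Octo. Best for: pre-training generalist policies. See Open X-Embodiment.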
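To make "unified format" concrete: Open X-Embodiment distributes its data as RLDS episodes, where each episode is a sequence of steps carrying an observation, an action, and boundary flags. The sketch below packs a robot-specific trajectory into that general shape using plain Python dicts. The function name, the joint_pos observation key, and storing the instruction on every step are illustrative assumptions, not the official conversion tooling.

```python
from typing import Any, Dict, List


def to_rlds_style_episode(
    observations: List[Dict[str, Any]],
    actions: List[List[float]],
    instruction: str,
) -> Dict[str, Any]:
    """Pack parallel observation/action lists into one RLDS-style episode dict."""
    if len(observations) != len(actions):
        raise ValueError("observations and actions must have the same length")
    last = len(observations) - 1
    steps = []
    for i, (obs, act) in enumerate(zip(observations, actions)):
        steps.append({
            "observation": obs,
            "action": act,
            "is_first": i == 0,   # marks the start of the episode
            "is_last": i == last,  # marks the final step
            # Assumption: the instruction is repeated on each step.
            "language_instruction": instruction,
        })
    return {"steps": steps}


# Hypothetical two-step trajectory from a robot-specific logger.
episode = to_rlds_style_episode(
    observations=[{"joint_pos": [0.0, 0.1]}, {"joint_pos": [0.0, 0.2]}],
    actions=[[0.0, 0.1], [0.0, 0.0]],
    instruction="pick up the block",
)
print(len(episode["steps"]), episode["steps"][-1]["is_last"])
```

Once every source dataset is expressed in one step schema like this, a single data loader can mix trajectories from different robots during pre-training.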

2. DROID

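Large-scale, in-the-wild manipulation collected on a single platform (Franka Panda) rather than across many robot types. 76K trajectories (about 350 hours) across 564 scenes. Best for: scene and task generalization, foundation model training. See DROID.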

3. BridgeData

WidowX manipulation; BridgeData V2 contains about 60K trajectories across 24 environments. Widely used in research. Best for: single-arm manipulation, WidowX compatibility. See BridgeData.

4. ALOHA / Stanford Datasets

Bimanual teleoperation demonstrations, including kitchen and mobile manipulation tasks. Best for: bimanual tasks, Mobile ALOHA. See ALOHA.

5. LeRobot

Community datasets hosted on the Hugging Face Hub. Easy to add your own. Best for: quick experiments, sharing data. See LeRobot.

How to Choose

  • Same robot as dataset? Use that dataset (e.g., WidowX → BridgeData).
  • Different robot? Pre-train on Open X-Embodiment for cross-embodiment transfer, or on DROID for scene diversity.
  • Custom task? Collect your own or use our data services.
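The checklist above can be sketched as a small lookup helper. This is a minimal illustration rather than a definitive rule set: the function name, the robot-to-dataset table, and the choice to check for a custom task first are all assumptions made for the example.

```python
from typing import List

# Assumption: illustrative mapping from a robot platform to a dataset
# collected on that platform, taken from the entries in this guide.
SAME_ROBOT_DATASETS = {
    "widowx": "BridgeData",
    "aloha": "ALOHA",
}


def recommend_datasets(robot: str, custom_task: bool = False) -> List[str]:
    """Return dataset suggestions following the checklist in this guide."""
    # Assumption: a custom task takes priority, since no public dataset
    # is guaranteed to cover it.
    if custom_task:
        return ["collect your own data"]
    match = SAME_ROBOT_DATASETS.get(robot.strip().lower())
    if match is not None:
        return [match]  # same robot as an existing dataset
    # Different robot: fall back to the large pre-training datasets.
    return ["Open X-Embodiment", "DROID"]


print(recommend_datasets("WidowX"))
print(recommend_datasets("UR5"))
print(recommend_datasets("WidowX", custom_task=True))
```

In practice the choices are not exclusive: a common pattern is to pre-train on a large multi-dataset mixture and then fine-tune on a small set of demonstrations from your own robot.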

Full Catalog

See our complete Datasets catalog with links to all datasets, papers, and download pages.
