Open-Source VLA & VLM Robot Models
A curated catalog of open-source Vision-Language-Action (VLA) and Vision-Language (VLM) models for robot manipulation — with links to official sites, GitHub, and Hugging Face.
VLA & VLM Models for Robotics
Each model has a dedicated page with description, architecture, benchmarks, and official links.

OpenVLA
7B-parameter VLA built on Llama 2 with fused DINOv2/SigLIP vision encoders. Trained on 970K demonstrations from Open X-Embodiment; outperforms RT-2-X with 7× fewer parameters. MIT license; weights on Hugging Face. Loading sketch below.
View model →
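Because the OpenVLA weights are served through the Hugging Face transformers interface, inference can be wired up in a few lines. The sketch below follows the openvla/openvla-7b model card; the image path, instruction text, and unnorm_key value are illustrative placeholders, and the exact prompt template should be checked against the card.

```python
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

# Load processor and model from the Hugging Face Hub (custom modeling code lives in the repo).
processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda:0")

image = Image.open("frame.png")  # placeholder: current third-person camera frame
prompt = "In: What action should the robot take to pick up the cup?\nOut:"

inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
# predict_action returns a 7-DoF end-effector action; unnorm_key selects the
# dataset statistics used to un-normalize it (here the BridgeData V2 mixture).
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
print(action)
```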
Octo
Transformer-based generalist policy with a diffusion action head. 27M (Small) and 93M (Base) parameter variants, trained on 800K Open X-Embodiment trajectories. Multi-robot, with language and goal-image conditioning. MIT license; weights on Hugging Face. Loading sketch below.
View model →
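Octo ships as a JAX model with its own loading utilities. The sketch below is based on the Octo README; the checkpoint id, the observation keys (which have changed between releases), and the dummy zero image are assumptions meant to illustrate the call pattern, not a drop-in controller.

```python
import jax
import numpy as np
from octo.model.octo_model import OctoModel

# Pull the released checkpoint from the Hugging Face Hub (repo id per the Octo README).
model = OctoModel.load_pretrained("hf://rail-berkeley/octo-base-1.5")

# Dummy observation: batch of 1, context window of 2 frames. Real code supplies
# camera images; key names ("image_primary", "timestep_pad_mask") vary by release.
observation = {
    "image_primary": np.zeros((1, 2, 256, 256, 3), dtype=np.uint8),
    "timestep_pad_mask": np.array([[True, True]]),
}
task = model.create_tasks(texts=["pick up the spoon"])  # language-conditioned task
actions = model.sample_actions(observation, task, rng=jax.random.PRNGKey(0))
print(actions.shape)
```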
RT-X / RT-1-X
Models from the Open X-Embodiment collaboration. Checkpoints released for JAX and TensorFlow. Multi-robot, language-conditioned. Apache 2.0 license.
View model →
InternVLA-M1
Spatially guided VLA with a two-stage design: spatial grounding, then action generation. Reports 71–81% on Google Robot tasks and 95.9% on LIBERO. MIT license; weights on Hugging Face.
View model →
RoboFlamingo
OpenFlamingo-based VLM adapted for robot control with a policy head trained by imitation learning. Strong results on the CALVIN benchmark. MIT license; weights on Hugging Face.
View model →
BridgeVLA
3D VLA with aligned input and output spaces: 2D heatmap pre-training followed by 3D point-cloud fine-tuning. 88.2% on RLBench, 64% on COLOSSEUM.
View model →
Diffusion Policy
Formulates visuomotor policy learning as conditional denoising diffusion; reports an average 46.9% improvement over prior state of the art. Uses receding-horizon control and a time-series diffusion transformer. Open source. Toy sampling sketch below.
View model →
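To make the "denoising diffusion over action sequences, executed with a receding horizon" idea concrete, here is a deliberately toy sketch. The denoiser, its dimensions, and the simplified update rule are hypothetical stand-ins for the paper's trained time-series transformer and its DDPM/DDIM schedule; only the control flow (iteratively denoising an action chunk, then executing a prefix and replanning) mirrors the method.

```python
import torch

class ToyDenoiser(torch.nn.Module):
    """Stand-in for the trained noise-prediction network (a time-series
    transformer conditioned on recent observations in the actual paper)."""
    def __init__(self, obs_dim=10, act_dim=7, horizon=16):
        super().__init__()
        self.horizon, self.act_dim = horizon, act_dim
        self.net = torch.nn.Linear(obs_dim + horizon * act_dim + 1, horizon * act_dim)

    def forward(self, obs, noisy_actions, t):
        x = torch.cat([obs, noisy_actions.flatten(1), t.float().unsqueeze(1)], dim=-1)
        return self.net(x).view(-1, self.horizon, self.act_dim)

@torch.no_grad()
def sample_action_sequence(model, obs, steps=50):
    """Iteratively denoise a random action sequence conditioned on the observation.
    The update below is a crude simplification, not a real DDPM/DDIM step."""
    actions = torch.randn(obs.shape[0], model.horizon, model.act_dim)
    for t in reversed(range(steps)):
        noise_pred = model(obs, actions, torch.full((obs.shape[0],), t))
        actions = actions - noise_pred / steps
    return actions

obs = torch.zeros(1, 10)                        # placeholder observation features
plan = sample_action_sequence(ToyDenoiser(), obs)
to_execute = plan[:, :8]                        # receding horizon: run a prefix, then replan
```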
LeRobot
Hugging Face robotics framework bundling policies such as ACT and SmolVLA (450M params). End-to-end imitation and reinforcement learning, with dataset, training, and deployment tooling. PyTorch; models and datasets on the Hugging Face Hub. Dataset-loading sketch below.
View model →
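As one example of the framework's dataset tooling, the sketch below loads a demonstration dataset from the Hub. Module paths have moved between lerobot releases, so the import shown (the lerobot.common.* layout) and the lerobot/pusht repo id are assumptions to verify against the current docs.

```python
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Downloads and caches the dataset from the Hugging Face Hub on first use.
dataset = LeRobotDataset("lerobot/pusht")
print(len(dataset))    # total number of frames across episodes
sample = dataset[0]    # dict of image, state, and action tensors for one frame
```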