RoboFlamingo
Vision-Language Foundation Models as Effective Robot Imitators, built on OpenFlamingo.
Overview
RoboFlamingo builds on OpenFlamingo, pairing the model's single-frame vision-language understanding with an explicit policy head for sequential robot control. It is fine-tuned with imitation learning on language-conditioned manipulation data and can be trained on a single GPU server.
Architecture & Performance
- OpenFlamingo backbone (3B, 4B, and 9B variants)
- Policy head for sequential decision-making
- Strong performance on the CALVIN language-conditioned manipulation benchmark
- Supports open-loop control and deployment on low-resource platforms
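The backbone-plus-policy-head split above can be sketched as follows. This is a minimal, hypothetical PyTorch illustration, not the official implementation: the feature dimension, hidden size, LSTM choice, and the 6-DoF pose + gripper action split are illustrative assumptions, and `feats` stands in for per-frame features pooled from the VLM backbone.

```python
import torch
import torch.nn as nn

class LSTMPolicyHead(nn.Module):
    """Sketch: recurrent policy head on top of per-frame VLM features.

    Dimensions and the pose/gripper split are assumptions for
    illustration, not the official RoboFlamingo configuration.
    """
    def __init__(self, feat_dim=1024, hidden_dim=512):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.pose = nn.Linear(hidden_dim, 6)     # relative end-effector pose
        self.gripper = nn.Linear(hidden_dim, 1)  # open/close logit

    def forward(self, feats, state=None):
        # feats: (batch, time, feat_dim) pooled VLM outputs, one per frame
        h, state = self.rnn(feats, state)
        return self.pose(h), self.gripper(h), state

def bc_loss(pose_pred, grip_logit, pose_tgt, grip_tgt):
    # Behaviour-cloning objective: regress the pose, classify the gripper.
    return (nn.functional.mse_loss(pose_pred, pose_tgt)
            + nn.functional.binary_cross_entropy_with_logits(grip_logit, grip_tgt))

head = LSTMPolicyHead()
pose, grip, _ = head(torch.randn(2, 8, 1024))  # 2 trajectories, 8 frames each
```

Keeping the sequential modelling in a small head like this is what lets the (much larger) vision-language backbone stay single-frame while the robot still acts over a trajectory; at deployment the `state` tuple carries history across timesteps.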
Official Links
- roboflamingo.github.io — Project site
- github.com/RoboFlamingo/RoboFlamingo — Code (MIT)
- Hugging Face: robovlms/RoboFlamingo — Models
Citation
See the project site for BibTeX and paper references.