RoboFlamingo
Vision-Language Foundation Models as Effective Robot Imitators, built on OpenFlamingo.
Overview
RoboFlamingo builds on OpenFlamingo, pairing the model's single-frame vision-language understanding with an explicit policy head for sequential robot control. It is fine-tuned with imitation learning on language-conditioned manipulation data and can be trained on a single GPU server.
Architecture & Performance
- OpenFlamingo backbone (3B, 4B, and 9B variants)
- Policy head for sequential decision-making
- Strong performance on the CALVIN language-conditioned manipulation benchmark
- Supports open-loop control and deployment on low-resource platforms
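The backbone-plus-policy-head split above can be sketched as follows. This is a minimal, hypothetical PyTorch illustration, not the official implementation: the feature dimension, hidden size, LSTM choice, and the 6-DoF pose + gripper action split are illustrative assumptions, and `feats` stands in for per-frame features pooled from the VLM backbone.

```python
import torch
import torch.nn as nn

class LSTMPolicyHead(nn.Module):
    """Sketch: recurrent policy head on top of per-frame VLM features.

    Dimensions and the pose/gripper split are assumptions for
    illustration, not the official RoboFlamingo configuration.
    """
    def __init__(self, feat_dim=1024, hidden_dim=512):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.pose = nn.Linear(hidden_dim, 6)     # relative end-effector pose
        self.gripper = nn.Linear(hidden_dim, 1)  # open/close logit

    def forward(self, feats, state=None):
        # feats: (batch, time, feat_dim) pooled VLM outputs, one per frame
        h, state = self.rnn(feats, state)
        return self.pose(h), self.gripper(h), state

def bc_loss(pose_pred, grip_logit, pose_tgt, grip_tgt):
    # Behaviour-cloning objective: regress the pose, classify the gripper.
    return (nn.functional.mse_loss(pose_pred, pose_tgt)
            + nn.functional.binary_cross_entropy_with_logits(grip_logit, grip_tgt))

head = LSTMPolicyHead()
pose, grip, _ = head(torch.randn(2, 8, 1024))  # 2 trajectories, 8 frames each
```

Keeping the sequential modelling in a small head like this is what lets the (much larger) vision-language backbone stay single-frame while the robot still acts over a trajectory; at deployment the `state` tuple carries history across timesteps.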
Official Links
- roboflamingo.github.io — Project site
- github.com/RoboFlamingo/RoboFlamingo — Code (MIT)
- Hugging Face: robovlms/RoboFlamingo — Models
Citation
See the project site for BibTeX and paper references.