Run FlexGen on Google Colab

2 min readMar 14


What is FlexGen?

FlexGen is a high-throughput generation engine for running large language models with limited GPU memory. FlexGen allows high-throughput generation by IO-efficient offloading, compression, and large effective batch sizes.

Can We run it on Google Colab?

Yes, we can. However, the free version resource is limited, so we need to use the lightest model.

!git clone
!pip install flexgen
cd FlexGen
!git checkout 9d888e5e3e6d78d6d4e1fdda7c8af508b889aeae
!python flexgen/apps/ --model facebook/opt-125m

As you can see, the conversation didn’t make sense but actually, we could run FlexGen on Google Colab 😂

opt-350m is not implemetned

!python flexgen/apps/ --model facebook/opt-1.3b

If you pay some money, probably you will be able to run opt-1.3b + more.

What is opt?

OPT (Open Pre-trained Transformer)is a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters.




software engineer works for a Biotechnology Research startup in Brooklyn. #CreativeCoding #Art #IoT #MachineLearning #python #typescript #javascript #reactjs