Run an LLM and API server with ngrok on Google Colab

Nov 6, 2023

What is this?

This is a Jupyter notebook that serves an LLM through API endpoints. It uses llama.cpp, ngrok, and a model from TheBloke; the base notebook uses zephyr-7b.

How to use


step0. Create the accounts

You need a Google account and an ngrok account; create them if you don’t have them.

step1. Copy the Jupyter notebook

step2. Create a secret key

Click the key icon in Google Colab’s sidebar and add your ngrok token as a secret. In the notebook, I named it NGROK; you can change that to anything you want.
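For reference, reading the secret inside the notebook looks roughly like this. This is a sketch: `userdata.get` is Colab’s secrets API, and the environment-variable fallback is only an assumption for running the same code outside Colab.

```python
import os

def get_ngrok_token(name: str = "NGROK") -> str:
    """Return the ngrok token from Colab's secret store, or an env var as a fallback."""
    try:
        from google.colab import userdata  # only importable inside Colab
        return userdata.get(name)
    except ImportError:
        # Outside Colab (e.g. local testing), read from the environment instead.
        return os.environ.get(name, "")
```

If you renamed the secret in step 2, pass that name instead of the default `"NGROK"`.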

step3. Run the jupyter notebook

After setting the secret key, run the notebook to start the API server.
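Roughly, what the notebook does at this step is start a FastAPI server with uvicorn and tunnel it through ngrok. A hedged sketch of that flow; the app path `main:app`, port 8000, and the use of pyngrok are assumptions, so check the actual notebook cells:

```python
import subprocess

def uvicorn_command(port: int = 8000) -> list[str]:
    """Build the uvicorn launch command; 'main:app' is a placeholder app path."""
    return ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", str(port)]

# Inside the notebook you would run something like:
#   server = subprocess.Popen(uvicorn_command())   # start FastAPI in the background
#   from pyngrok import ngrok                      # if the notebook uses pyngrok
#   public_url = ngrok.connect(8000).public_url    # expose the server to the internet
```

ngrok prints a forwarding URL (the `public_url` above); that is the address you use in the next steps.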

step4. Check the API server

If everything works properly, you can access https://ngrok_address/docs and see FastAPI’s interactive docs page, which lists all the available endpoints.
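You can also check the server programmatically. A sketch: `/openapi.json` is the schema FastAPI builds the /docs page from, and the base URL is your ngrok forwarding address.

```python
import json
import urllib.request

def docs_url(base_url: str) -> str:
    """URL of the interactive docs page."""
    return base_url.rstrip("/") + "/docs"

def list_endpoints(base_url: str) -> list[str]:
    """Fetch the OpenAPI schema and return the paths the server exposes."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/openapi.json") as resp:
        return sorted(json.load(resp).get("paths", {}))
```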

Do not run the last two lines until you want to shut everything down: the first kills the FastAPI server and the second kills ngrok.

!pkill uvicorn
!pkill ngrok

step5. Call the endpoint

Now you can call the endpoint from any language you like. In the repo, I included a Python sample; all you need to change is the endpoint URL.
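As an illustration in Python (a sketch, assuming the server exposes llama-cpp-python’s OpenAI-compatible `/v1/completions` endpoint; swap in your own ngrok forwarding URL):

```python
import json
import urllib.request

def build_request(base_url: str, prompt: str, max_tokens: int = 128) -> urllib.request.Request:
    """Build a POST request for the completions endpoint."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def complete(base_url: str, prompt: str) -> str:
    """Send the prompt to the server and return the generated text."""
    with urllib.request.urlopen(build_request(base_url, prompt)) as resp:
        return json.load(resp)["choices"][0]["text"]
```

Usage would be something like `complete("https://your-ngrok-address", "Hello")`; the response schema above mirrors the OpenAI completions format that llama-cpp-python’s server follows.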

The response speed won’t be great, but this setup is useful for quickly prototyping ideas that combine an LLM with something else, without spending much time or money on an environment for running LLMs.




Software engineer at a biotechnology research startup in Brooklyn.