Run Llama3 with Ollama on Google Colab (T4) Free Tier

2 min read · May 16, 2024




In this post, we will run Llama3 on the Google Colab free tier (T4 GPU) with Ollama and use it as an API server.

Step 1. Setup Ollama

First, select the T4 GPU runtime (Runtime → Change runtime type → T4 GPU).

!curl -fsSL https://ollama.com/install.sh | sh

!echo 'debconf debconf/frontend select Noninteractive' | sudo debconf-set-selections
!sudo apt-get update && sudo apt-get install -y cuda-drivers

import os
# Point the dynamic loader at the system NVIDIA libraries so Ollama can find CUDA.
os.environ.update({'LD_LIBRARY_PATH': '/usr/lib64-nvidia'})
# Listen on all interfaces so the cloudflared tunnel can reach the server.
os.environ.update({'OLLAMA_HOST': '0.0.0.0'})

!wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
!dpkg -i cloudflared-linux-amd64.deb

import subprocess
import threading
import time
import socket

def iframe_thread(port):
    # Wait until the Ollama server starts accepting connections on the port.
    while True:
        time.sleep(0.5)
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        result = sock.connect_ex(('127.0.0.1', port))
        sock.close()
        if result == 0:
            break
    # Open a cloudflared tunnel to the local server and print its public URL.
    p = subprocess.Popen(
        ["cloudflared", "tunnel", "--url", f"http://127.0.0.1:{port}"],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    for line in p.stderr:
        l = line.decode()
        if "trycloudflare.com" in l:
            print("running ollama server\n\n", l[l.find("http"):], end='')

threading.Thread(target=iframe_thread, daemon=True, args=(11434,)).start()

!ollama serve

When Step 1 runs successfully, you will see a URL like the following.

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFV6YKzw4J7jqYfTNrwkB6h41sdeGDY644nCHUJCPqor

2024/05/16 05:00:08 routes.go:1008: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:*****] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-05-16T05:00:08.999Z level=INFO source=images.go:704 msg="total blobs: 0"
time=2024-05-16T05:00:08.999Z level=INFO source=images.go:711 msg="total unused blobs removed: 0"
time=2024-05-16T05:00:09.000Z level=INFO source=routes.go:1054 msg="Listening on [::]:11434 (version 0.1.38)"
time=2024-05-16T05:00:09.000Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2437403823/runners

running ollama server |

Step 2. Install Llama3

This step will take some time since the model needs to be downloaded.
If you want to try another model, you can find one on the Ollama model site.

curl url_you_got/api/pull -d '{ "name": "llama3" }'
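The /api/pull endpoint streams back newline-delimited JSON status objects as the download progresses, which is handy for scripted use. A minimal standard-library sketch — the URL below is a placeholder, and `parse_pull_status`/`pull_model` are illustrative names, not part of Ollama:

```python
import json
import urllib.request

# Placeholder -- replace with the trycloudflare.com URL printed in Step 1.
OLLAMA_URL = "https://example.trycloudflare.com"

def parse_pull_status(line: bytes) -> str:
    # Each line streamed back by /api/pull is a JSON object with a "status"
    # field, e.g. {"status":"pulling manifest"} or {"status":"success"}.
    return json.loads(line)["status"]

def pull_model(base_url: str, name: str) -> None:
    req = urllib.request.Request(
        base_url + "/api/pull",
        data=json.dumps({"name": name}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:
            print(parse_pull_status(line))

# pull_model(OLLAMA_URL, "llama3")  # uncomment once the server is up
```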

Step 3. Test prompt

curl url_you_got/api/chat -d '{ "model": "llama3", "stream": false, "messages": [{ "role": "user", "content": "What is LLM?" }]}'

If you use jq:

curl url_you_got/api/chat -d '{ "model": "llama3", "stream": false, "messages": [{ "role": "user", "content": "What is LLM?" }]}' | jq ".message.content"
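The same chat call can be made from Python instead of curl, which is more convenient when using the server as an API backend. A minimal standard-library sketch — the URL is a placeholder, and `build_chat_payload`/`chat` are illustrative names of my own, not Ollama APIs:

```python
import json
import urllib.request

# Placeholder -- replace with the trycloudflare.com URL printed in Step 1.
OLLAMA_URL = "https://example.trycloudflare.com"

def build_chat_payload(model: str, prompt: str) -> dict:
    # JSON body expected by Ollama's /api/chat endpoint; stream=False
    # asks for a single complete response instead of token chunks.
    return {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(base_url: str, model: str, prompt: str) -> str:
    body = json.dumps(build_chat_payload(model, prompt)).encode()
    req = urllib.request.Request(
        base_url + "/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The reply text lives at .message.content, as in the jq example above.
        return json.loads(resp.read())["message"]["content"]

# print(chat(OLLAMA_URL, "llama3", "What is LLM?"))  # uncomment once the server is up
```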



