Trying a small multimodal model, imp-v1-3b, on Google Colab

Jan 31, 2024

MILVLG/imp-v1-3b · Hugging Face

1. Install dependencies

!pip install -U transformers
!pip install -q pillow accelerate einops
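Before loading the model, it can help to confirm that the packages above are actually present in the current runtime. The sketch below uses only the Python standard library; the helper name `installed_versions` is introduced here for illustration and is not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


# Report the packages installed in step 1.
for name, ver in installed_versions(
    ["transformers", "pillow", "accelerate", "einops"]
).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If any line reports NOT INSTALLED, rerunning the install cell (or restarting the runtime) is usually the first thing to try.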

2. Load the model and set instruction

Before running the following code, upload the image you want to try with imp-v1-3b to the Colab runtime. The code opens it as result.png, so either use that filename or change the path.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

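# Create new tensors on the GPU by default (a GPU runtime is required).
torch.set_default_device("cuda")

# trust_remote_code=True lets transformers run the custom model code from the Hub repo.
model = AutoModelForCausalLM.from_pretrained(
    "MILVLG/imp-v1-3b",
    torch_dtype=torch.float16,  # half precision to reduce GPU memory use
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "MILVLG/imp-v1-3b",
    trust_remote_code=True
)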

text = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\nDescribe the person in the image. ASSISTANT:"
image = Image.open("result.png")
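The prompt above follows a fixed layout: a system sentence, then `USER: <image>\n` followed by the question, then `ASSISTANT:`. To ask other questions about the same image, the string can be assembled by a small helper. This is only a sketch: `SYSTEM_PROMPT` and `build_prompt` are names introduced here for illustration, and the template is simply copied from the prompt used above.

```python
# System sentence copied verbatim from the prompt used in this post.
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(question, system_prompt=SYSTEM_PROMPT):
    """Assemble a single-turn prompt in the same layout as the one above."""
    # "<image>" is the image placeholder from the original prompt;
    # a newline separates it from the question text.
    return f"{system_prompt} USER: <image>\n{question.strip()} ASSISTANT:"


# Rebuilds exactly the prompt used in this post.
text = build_prompt("Describe the person in the image.")
print(text)
```

Changing only the question, for example `build_prompt("What is the person wearing?")`, keeps the rest of the prompt identical.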

3. Generate a description of the image

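# Tokenize the prompt and preprocess the image with the model's own helper.
input_ids = tokenizer(text, return_tensors="pt").input_ids
image_tensor = model.image_preprocess(image)
output_ids = model.generate(
    input_ids,
    max_new_tokens=100,
    images=image_tensor,
    use_cache=True)[0]
# Skip the first input_ids.shape[1] tokens (the prompt) and decode the rest.
print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())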
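The slice `output_ids[input_ids.shape[1]:]` in the last line drops the first `input_ids.shape[1]` positions, which suggests the returned sequence begins with the prompt tokens and only the remainder is newly generated text. The toy example below illustrates the same slicing with plain Python lists; the token ids are made up and `strip_prompt` is a hypothetical helper, not a transformers function.

```python
def strip_prompt(output_ids, prompt_length):
    """Keep only the tokens that come after the first prompt_length positions."""
    return output_ids[prompt_length:]


# Made-up token ids, for illustration only (not real tokenizer output).
prompt_ids = [101, 7, 42, 9]
full_output = prompt_ids + [55, 66, 77]  # prompt followed by newly generated tokens

new_tokens = strip_prompt(full_output, len(prompt_ids))
print(new_tokens)  # [55, 66, 77]
```

Without the slice, the decoded string would start with the whole prompt before the description.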

I tried it on the following image, which was generated by DDColor.

The generated description is below.

The image features a man dressed in a traditional Japanese kimono, standing in front of a wooden podium. He is wearing a black kimono with a white belt and a white hat. The man appears to be posing for a photograph, as he is standing in front of the podium with a smile on his face.

Written by 0𝕏koji