What is this?
This is a jupyter notebook that serves an LLM through API endpoints. It uses llama.cpp, ngrok, and a model from TheBloke; the base jupyter notebook uses zephyr-7b.
How to use
[Requirements]
- Google account for Google Colab https://colab.google/
- ngrok account https://ngrok.com
step0. Create the above accounts
Create a Google account and an ngrok account if you don't have them.
step1. Copy the jupyter notebook
step2. Create a secret key
Click the key icon in Google Colab's sidebar and add your ngrok auth token as a secret. In the jupyter notebook, I named it NGROK. You can change that to anything you want.
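Inside Colab, the secret can be read with the `google.colab.userdata` API. The sketch below is a hedged example, not code from the notebook itself: it assumes the secret is named NGROK and falls back to an environment variable of the same name when run outside Colab.

```python
import os


def get_ngrok_token(name="NGROK"):
    """Return the ngrok auth token from Colab's secret store.

    Falls back to an environment variable of the same name when
    running outside Colab (e.g. for local testing).
    """
    try:
        from google.colab import userdata  # only importable inside Colab
        return userdata.get(name)
    except ImportError:
        return os.environ.get(name, "")
```

Remember to toggle "Notebook access" on for the secret in the sidebar, or Colab will refuse to hand it to the notebook.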
step3. Run the jupyter notebook
After setting the secret key, you can run the jupyter notebook to start the API server.
step4. Check the API server
If everything works properly, you can access https://ngrok_address/docs and you will see something like 👇
It lists all available endpoints.
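The same check can be scripted. This is a minimal sketch using only the standard library; the base URL is a placeholder you replace with the forwarding address ngrok printed, and `/docs` is FastAPI's auto-generated documentation page.

```python
import urllib.request


def docs_url(base):
    """Build the URL of FastAPI's auto-generated /docs page."""
    return base.rstrip("/") + "/docs"


def server_is_up(base, timeout=10):
    """Return True if the /docs page answers with HTTP 200."""
    try:
        with urllib.request.urlopen(docs_url(base), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # DNS failure, refused connection, timeout, HTTP error
        return False


# Replace with the forwarding URL from ngrok before running:
# print(server_is_up("https://your-ngrok-address"))
```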
Do not run the last two lines.
The first one kills the FastAPI server and the second one kills ngrok.
!pkill uvicorn
!pkill ngrok
step5. Call the endpoint
Now you can call the endpoint from any language you like. This repo includes a sample Python script; all you need to change is the endpoint URL.
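A call looks roughly like the sketch below, using only the standard library. The base URL, the `/generate` path, and the `{"prompt": ...}` payload are all assumptions for illustration — check your server's /docs page for the actual endpoint names and request schema.

```python
import json
import urllib.request

# Placeholder: replace with the forwarding URL ngrok printed.
BASE = "https://your-ngrok-address"


def build_request(prompt, base=BASE, path="/generate"):
    """Build a JSON POST request for a (hypothetical) generation endpoint."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        base.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Uncomment once BASE points at your running server:
# with urllib.request.urlopen(build_request("Hello, who are you?")) as resp:
#     print(json.loads(resp.read()))
```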
Response speed won't be great, but this setup is useful for quickly prototyping combinations of an LLM and something else, without spending much time or money on a dedicated LLM environment.