Quietly building the future.
2024-07-08
If my notes are helpful to you, sign up for Vast.ai using my referral link.
Add your SSH key to https://cloud.vast.ai/account/
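If you don't already have a key pair, generating one takes a few seconds (standard OpenSSH, nothing Vast-specific):

# Generate an ed25519 key pair, then paste the public half into the Vast account page
ssh-keygen -t ed25519
cat ~/.ssh/id_ed25519.pub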
Choose ollama/ollama:0.1.38 as the Docker image. (This succeeded for me, whereas ollama/ollama:latest failed.)
Create a new instance. I picked a $0.111/hr RTX 2080 Ti with 11GB of VRAM. A better GPU with more memory would have given me more headroom, but I wanted to keep costs low to start.
That one failed when I ran ollama run mixtral (in hindsight, not surprising: Mixtral's default 4-bit quantization is roughly a 26GB download, far more than 11GB of VRAM), so I tried a $0.144/hr RTX A4000 with 16GB of VRAM.
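Before pulling a large model, it's worth checking what the instance actually has. A quick check, assuming the image exposes nvidia-smi (Vast's GPU containers generally do):

# Report the GPU model plus total and free VRAM
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv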
Once it started up, I clicked the Connect button, copied the SSH command, and pasted it into my terminal.
ssh -p <port> root@<ip>
I couldn't get ollama to run with systemd, so I ran it in the background with:
(curl -fsSL https://ollama.com/install.sh | sh && ollama serve > ollama.log 2>&1) &
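To confirm the server actually came up, one option is to hit the local API (the /api/tags endpoint lists installed models) or just watch the log:

# The server should answer on localhost once it's up
curl http://localhost:11434/api/tags
tail ollama.log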
Then I was able to run ollama from the terminal. Rather than trying to run mixtral, I ran a tiny model, orca-mini, which worked fine:
ollama run orca-mini
And just like that, I was chatting with the model in the terminal!
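ollama run also accepts a prompt as an argument, which is handy for a quick one-shot test without an interactive session:

ollama run orca-mini "Why is the sky blue?"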
I tried hitting the API from my own machine as well, but that didn't work:
curl http://<ip>:11434/api/chat -d '{
  "model": "orca-mini",
  "messages": [
    { "role": "user", "content": "hey there, how are you doing?" }
  ]
}'
curl: (7) Failed to connect to <ip> port 11434 after 88 ms: Couldn't connect to server
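I suspect two things are going on: ollama serve binds to 127.0.0.1 by default (it only listens on other interfaces if you set OLLAMA_HOST=0.0.0.0 before starting it), and Vast instances sit behind NAT, so port 11434 isn't reachable on the public IP unless you map it. One workaround that sidesteps both, sketched here with the same <port> and <ip> placeholders as above, is to tunnel the port over SSH:

# Forward local port 11434 to the instance's loopback interface
ssh -p <port> -L 11434:localhost:11434 root@<ip>

# Then, in another local terminal, the same request works against localhost
curl http://localhost:11434/api/chat -d '{
  "model": "orca-mini",
  "messages": [
    { "role": "user", "content": "hey there, how are you doing?" }
  ]
}'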
Next up, I'll repeat roughly these steps on another host with better support for external IPs, so I can try the API again.