Highly automated wrapper for various open source LLMs.
curl https://ollama.ai/install.sh | sh
ollama run llama2
And bang, a download later, you get a prompt. On the P14s it runs on CPU and generates a few tokens at a time, which is quite usable for quick interactive play.
As mentioned at github.com/jmorganca/ollama/blob/0174665d0e7dcdd8c60390ab2dd07155ef84eb3f/docs/faq.md, the models download to under
/usr/share/ollama/.ollama/models/
and ncdu tells me:
--- /usr/share/ollama ----------------------------------
3.6 GiB [###########################] /.ollama
4.0 KiB [ ] .bashrc
4.0 KiB [ ] .profile
4.0 KiB [ ] .bash_logout
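The ollama CLI itself can also report which models have been pulled and how big they are; assuming the stock subcommands, something like:
# list locally downloaded models and their sizes
ollama list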
We can also do it non-interactively with:
/bin/time ollama run llama2 'What is quantum field theory?'
which gave me:
0.13user 0.17system 2:06.32elapsed 0%CPU (0avgtext+0avgdata 17280maxresident)k
0inputs+0outputs (0major+2203minor)pagefaults 0swaps
but note that there is a random seed that affects each run by default.
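Non-interactive use can also go through the local HTTP API that the ollama server exposes; a minimal sketch, assuming the default port 11434 and the /api/generate endpoint:
# ask the locally running ollama server for a completion;
# by default it streams back one JSON object per generated token
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "What is quantum field theory?"
}'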
Some other quick benchmarks from Amazon EC2 GPUs. On an Nvidia T4:
0.07user 0.05system 0:16.91elapsed 0%CPU (0avgtext+0avgdata 16896maxresident)k
0inputs+0outputs (0major+1960minor)pagefaults 0swaps
On an Nvidia A10G:
0.03user 0.05system 0:09.59elapsed 0%CPU (0avgtext+0avgdata 17312maxresident)k
8inputs+0outputs (1major+1934minor)pagefaults 0swaps
So it's not too bad: a short article's worth of text in about 10 seconds.
It tends to babble quite a lot by default, but eventually decides to stop.
TODO: is it possible to make it deterministic from the CLI? There is a "seed" parameter documented at github.com/jmorganca/ollama/blob/31f0551dab9a10412ec6af804445e02a70a25fc2/docs/modelfile.md#parameter
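A sketch of what that could look like, untested: bake the seed into a derived model via a Modelfile and run that instead. The file name Modelfile.seed and the model name llama2-seeded are made up here; the PARAMETER names are the ones from the linked documentation.
# write a Modelfile that fixes the seed (and zeroes the temperature) on top of llama2
cat > Modelfile.seed <<'EOF'
FROM llama2
PARAMETER seed 42
PARAMETER temperature 0
EOF
# build a derived model from it and query it as usual
ollama create llama2-seeded -f Modelfile.seed
ollama run llama2-seeded 'What is quantum field theory?'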