github.com/jmorganca/ollama
Ollama is a highly automated open source wrapper that makes it very easy to run multiple Open weight LLM models on either CPU or GPU.
Its README alone is of great value, serving as a fantastic list of the most popular Open weight LLM models in existence.
Install with:
curl https://ollama.ai/install.sh | sh
The below was tested on Ollama 0.1.14 from December 2023.
Download llama2 7B and open a prompt:
ollama run llama2
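To just download the model without opening the interactive prompt, the pull subcommand can be used instead:
ollama pull llama2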
On a P14s it runs on CPU and generates a few tokens per second, which is quite usable for quick interactive play.
As mentioned at github.com/jmorganca/ollama/blob/0174665d0e7dcdd8c60390ab2dd07155ef84eb3f/docs/faq.md the models are downloaded to under /usr/share/ollama/.ollama/models/ and ncdu tells me:
--- /usr/share/ollama ----------------------------------
    3.6 GiB [###########################] /.ollama
    4.0 KiB [                           ]  .bashrc
    4.0 KiB [                           ]  .profile
    4.0 KiB [                           ]  .bash_logout
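The downloaded models can also be listed without digging into the filesystem (the exact output format may vary across versions):
ollama list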
We can also do it non-interactively with:
/bin/time ollama run llama2 'What is quantum field theory?'
which gave me:
0.13user 0.17system 2:06.32elapsed 0%CPU (0avgtext+0avgdata 17280maxresident)k
0inputs+0outputs (0major+2203minor)pagefaults 0swaps
but note that a random seed affects each run by default, so outputs are not reproducible. Support for setting the seed was apparently added per github.com/ollama/ollama/issues/2773, but Ciro Santilli doesn't know how to set it from the CLI.
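Ollama also serves an HTTP API on localhost:11434 while it is running, so as an untested sketch, the same non-interactive query could be sent with curl, and the options object is presumably where a fixed seed would go:
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "What is quantum field theory?",
  "stream": false,
  "options": {"seed": 42, "temperature": 0}
}'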
Some other quick benchmarks from Amazon EC2 GPU instances: on a g4dn.xlarge instance, which has an Nvidia Tesla T4:
0.07user 0.05system 0:16.91elapsed 0%CPU (0avgtext+0avgdata 16896maxresident)k
0inputs+0outputs (0major+1960minor)pagefaults 0swaps
and on an Nvidia A10G in a g5.xlarge instance:
0.03user 0.05system 0:09.59elapsed 0%CPU (0avgtext+0avgdata 17312maxresident)k
8inputs+0outputs (1major+1934minor)pagefaults 0swaps
So it's not too bad: a short article in about 10 seconds.
It tends to babble quite a lot by default, but eventually decides to stop.
TODO is it possible to make it deterministic on the CLI? There is a "seed" parameter somewhere: github.com/jmorganca/ollama/blob/31f0551dab9a10412ec6af804445e02a70a25fc2/docs/modelfile.md#parameter
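As an untested sketch of how that Modelfile parameter might be used from the CLI: bake the seed into a derived model (the name llama2-seeded is made up for this example) and run that instead:
# Modelfile: hypothetical example fixing the seed for reproducible runs
FROM llama2
PARAMETER seed 42
PARAMETER temperature 0
and then:
ollama create llama2-seeded -f Modelfile
ollama run llama2-seeded 'What is quantum field theory?'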
