The future for local LLMs is bright

Last week I was excited to hear about a new LLM paradigm: 1.58-bit large language models (LLMs). In other words, an LLM whose parameters are stored as ternary values {-1, 0, 1} (each carrying log2(3) ≈ 1.58 bits of information, hence the name), rather than as 16-bit floating-point numbers, for example. This clearly has advantages in terms of latency, memory, and energy consumption but, surprisingly (to me at least), the paper linked below also reports comparable performance.
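For the curious, here is a minimal sketch of the absmean quantization scheme the paper describes: scale the weight matrix by its mean absolute value, then round each entry to the nearest of {-1, 0, 1}. This is my own illustration in Python/NumPy; the function name and structure are assumptions, not the authors' code.

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-5):
    """Quantize a weight tensor to ternary {-1, 0, 1} via the absmean scheme."""
    gamma = np.abs(w).mean()                          # per-tensor scale: mean absolute weight
    w_ternary = np.clip(np.round(w / (gamma + eps)), -1, 1)
    return w_ternary.astype(np.int8), gamma           # gamma is kept to rescale outputs

# Example: a random weight matrix collapses to just three values
w = np.random.randn(4, 4).astype(np.float32)
q, gamma = absmean_ternary_quantize(w)
print(q)       # entries are only -1, 0, or 1
print(gamma)   # a single float scale stored alongside the ternary weights
```

The appeal is that matrix multiplication against {-1, 0, 1} weights reduces to additions and subtractions, with no floating-point multiplies needed for the weights.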

The authors demonstrate that, across 1.3B to 70B parameter variants of a LLaMA model, the new 1.58-bit architecture performs comparably in accuracy (or, surprisingly, better in some cases) to a standard 16-bit architecture, whilst having up to a 7.16x smaller memory footprint and 4.1x lower latency.
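As a rough sanity check on those figures (my own back-of-the-envelope arithmetic, not from the paper): packing ternary weights at ~1.58 bits each gives a theoretical ceiling of roughly 10x savings over 16-bit weights, so the measured 7.16x looks plausible once presumably higher-precision components (activations, embeddings) are accounted for.

```python
# Back-of-the-envelope memory comparison for a 70B-parameter model.
# Illustrative only: real figures depend on packing, activations, caches, etc.
params = 70e9
fp16_gb = params * 16 / 8 / 1e9        # 16 bits per weight   -> ~140 GB
ternary_gb = params * 1.58 / 8 / 1e9   # ~1.58 bits per weight -> ~13.8 GB
print(f"FP16:    {fp16_gb:.0f} GB")
print(f"Ternary: {ternary_gb:.1f} GB")
print(f"Ratio:   {fp16_gb / ternary_gb:.1f}x")  # ~10x theoretical ceiling
```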

Perhaps we’ll see LLMs running on a Raspberry Pi in the near future.

From the paper's abstract: “It matches the full-precision LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption.”

Tags

artificial intelligence, data & connectivity, digital transformation