Last week I was excited to hear about a new LLM paradigm: 1.58-bit large language models (LLMs). In other words, an LLM whose parameters are stored as ternary values {-1, 0, 1} rather than as, say, 16-bit floating-point numbers (1.58 because a three-valued weight carries log2(3) ≈ 1.58 bits of information). This clearly has advantages in terms of latency, memory, and energy consumption, but, surprisingly (to me at least), according to the paper linked below, performance is also comparable.
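To make the idea concrete, here is a minimal sketch of how a tensor of floating-point weights might be mapped to ternary values. It follows the general absmean scheme described in the paper (scale by the mean absolute value, then round and clip to [-1, 1]); the function name and exact details here are illustrative, not the paper's reference implementation.

```python
def absmean_ternary_quantize(weights, eps=1e-8):
    """Quantize a list of float weights to ternary {-1, 0, 1}.

    Sketch of an absmean scheme: scale each weight by the mean
    absolute value of the tensor, then round and clip to [-1, 1].
    The scale gamma lets you approximate the original as q * gamma.
    """
    gamma = sum(abs(w) for w in weights) / len(weights) + eps
    quantized = [max(-1, min(1, round(w / gamma))) for w in weights]
    return quantized, gamma

# Example: quantize a few weights
q, gamma = absmean_ternary_quantize([0.9, -0.05, -1.3, 0.4])
# q == [1, 0, -1, 1]
```

Note that each quantized weight now needs under two bits of storage instead of sixteen, and multiplying by {-1, 0, 1} reduces matrix multiplication to additions and subtractions, which is where the latency and energy savings come from.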
The authors demonstrate that, across 1.3B to 70B parameter variants of a LLaMA model, the new 1.58-bit architecture performed comparably in accuracy to a standard 16-bit architecture (and in some cases surprisingly better), while having up to a 7.16x smaller memory footprint and 4.1x lower latency.
Perhaps we’ll see LLMs running on a Raspberry Pi in the near future.