2/22/2024 1:56:17 PM | 1 minute read

Mixture of Experts: Old but gold

Jeremy Russ

Trainee Patent Attorney

Mike Williams

Partner

Through the news, I’ve recently become aware of an interesting large language model (LLM) architecture, Mixture of Experts (MoE). The concept was established in a 1991 paper but has only recently come to prominence.

In MoE, the overall system is made up of several separate models, each specialised, or expert, in one or more domains as a result of its training. During inference, only a subset of those models is called upon, chosen according to the nature of the particular prompt and the suitability of each model. This reportedly improves computational efficiency and scalability.
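The selection step described above can be illustrated with a minimal sketch. The article does not say how experts are chosen, so the code assumes one common approach: a small gating function scores every expert, only the `top_k` highest-scoring experts are evaluated, and their outputs are mixed using renormalised gate weights. All names here (`moe_forward`, `make_linear_expert`, `gate_weights`) are hypothetical, and the experts are toy linear functions rather than real language models.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_linear_expert(weights):
    """Return a toy expert: a function computing a dot product with fixed weights."""
    def expert(x):
        return sum(w * xi for w, xi in zip(weights, x))
    return expert


def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs.

    Returns (output, selected_indices). Only the selected experts are evaluated.
    """
    # Gate (assumed linear): one score per expert, converted to probabilities.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    # Keep the top_k experts and renormalise their gate weights to sum to 1.
    ranked = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)
    selected = ranked[:top_k]
    norm = sum(probs[i] for i in selected)
    output = sum((probs[i] / norm) * experts[i](x) for i in selected)
    return output, selected


if __name__ == "__main__":
    rng = random.Random(0)
    dim, num_experts = 4, 8
    experts = [
        make_linear_expert([rng.uniform(-1, 1) for _ in range(dim)])
        for _ in range(num_experts)
    ]
    gate_weights = [
        [rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)
    ]
    x = [0.5, -1.0, 0.25, 2.0]
    y, used = moe_forward(x, experts, gate_weights, top_k=2)
    print(f"output={y:.4f}, experts used={used} of {num_experts}")
```

In this sketch only two of the eight experts do any work for a given input, which is where the reported efficiency gain comes from: total capacity grows with the number of experts, while the cost per input grows only with `top_k`.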

MoE has advanced significantly from 2010 onwards, including being scaled up to a 100B+ parameter LSTM applied to natural language processing tasks in 2017. Historically, a number of hurdles had to be overcome in order to realise MoE’s full potential. For instance, because of their branching nature, MoE models have not been particularly well suited to computation on graphics processing units (GPUs). However, thanks to innovations in how training and inference are carried out for MoE models, they’re becoming more and more popular.

Innovations that improve computational efficiency through their specific training or inference implementation are often found to be technical before the European Patent Office (EPO), regardless of their application to any particular field of technology. Advantageously, a patent application directed to an AI system’s specific technical implementation may provide relatively broad protection, including in the field of natural language processing, which the EPO generally considers less patentable than, say, image processing.

MoE works in a divide-and-conquer strategy where a complex task is broken up into several simpler and smaller subtasks, and individual learners (called experts) are trained for different subtasks.

machinelearningmastery.com/...
