    Small language models are growing in popularity — but they have a “hidden fallacy” that enterprises must come to terms with

    By George Fitzmaurice

    Small language models (SLMs) are in the spotlight following OpenAI’s release of GPT-4o mini, but experts have warned ITPro that these lighter, more cost-efficient models won’t be a silver bullet for frugal enterprises.

    Now available to all ChatGPT users, GPT-4o mini boasts strong performance, particularly in mathematical reasoning. It scored 82% on massive multitask language understanding (MMLU) and 87% on multilingual grade school math (MGSM).

    Where the model really shines is in its cost, though. Priced at 15 cents per million input tokens and 60 cents per million output tokens, GPT-4o mini is more than 60% cheaper than GPT-3.5 Turbo. OpenAI expects its new model to drive an uptick in application development by “making intelligence much more affordable.”
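    To put those per-token prices in perspective, a quick back-of-envelope calculation shows what a sustained workload might cost at the published GPT-4o mini rates. The sketch below is illustrative only; the request volume and per-request token counts are assumptions, not figures from OpenAI or ITPro.

```python
# Back-of-envelope API cost estimate at GPT-4o mini's published pricing:
# $0.15 per million input tokens, $0.60 per million output tokens.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60

def monthly_cost(requests: int, input_tokens: int, output_tokens: int) -> float:
    """Estimate monthly spend for a fixed per-request token profile."""
    total_input_m = requests * input_tokens / 1_000_000
    total_output_m = requests * output_tokens / 1_000_000
    return total_input_m * INPUT_PRICE_PER_M + total_output_m * OUTPUT_PRICE_PER_M

# Hypothetical workload: 1 million requests a month,
# 1,000 input tokens and 300 output tokens per request.
print(f"${monthly_cost(1_000_000, 1_000, 300):,.2f} per month")  # -> $330.00
```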

    This is true across the board in the SLM field as other companies roll out their own reduced-size models. Take Microsoft’s Phi-3, for example, released earlier this year.

    Phi-3 is a family of SLMs designed to make “fine-tuning or customization easier and more affordable,” with lower computational requirements that drive down associated operational costs.

    According to Arun Subramaniyan, founder and CEO of Articul8, the cost comes down to the number of GPUs required to deploy a model: the smaller the model, the fewer GPUs are needed, and therefore the lower the cost.
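    A rough illustration of why parameter count drives GPU requirements: at 16-bit precision, a model needs roughly two bytes of accelerator memory per parameter just to hold its weights. The sketch below is a simplified estimate under assumed figures (80 GB GPUs, FP16 weights, a flat overhead factor) and ignores KV caches, batching, and parallelism details.

```python
import math

def gpus_needed(params_billion: float, bytes_per_param: float = 2,
                gpu_memory_gb: float = 80, overhead: float = 1.2) -> int:
    """Rough count of GPUs needed just to hold model weights for inference.

    Assumes FP16 weights (~2 bytes per parameter) and 80 GB accelerators,
    with a flat overhead factor for activations. All figures are
    illustrative assumptions, not vendor specifications.
    """
    weights_gb = params_billion * bytes_per_param
    return math.ceil(weights_gb * overhead / gpu_memory_gb)

# A ~7B-parameter SLM fits on a single GPU under these assumptions...
print(gpus_needed(7))    # -> 1
# ...while a ~175B-parameter model needs several.
print(gpus_needed(175))  # -> 6
```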

    “The costs add up very, very quickly, even for small-scale use cases, and so these smaller models reduce cost quite a bit,” Subramaniyan told ITPro.

    With costs spiralling and some estimates putting a price tag of $100 billion on model development in the next decade, SLMs seem an attractive option for the prospective business user.

    Small language models won't solve every problem

    There is a “hidden fallacy” in the conversation on SLMs, however. While their capabilities are “good enough to get started,” they are not “good enough to actually get to production,” according to Subramaniyan. “You will get started here and get stuck and then go to the larger models,” he told ITPro.

    SLMs are often built to excel in specific areas, he noted, creating barriers when companies attempt to use the models outside their core remit.

    “One of the main tasks that they made sure that model [GPT-4o mini] did very well was mathematical reasoning and overall reasoning capabilities,” Subramaniyan said.

    “To do that, they can get away with small models because they're increasing the capability of the model along one dimension,” he added. Other capabilities of the model are not at the same level so users must “be careful” when they’re assessing “what they want to use it for.”

    If usage is in line with the benchmarks the model scores most highly on, a user will see positive results. At the same time, though, users shouldn’t expect great performance across all functions from an SLM.

    No one model is sufficient

    While SLMs and large language models (LLMs) have their relative advantages and disadvantages, the future will likely not demand that enterprises make an outright choice between the two.

    “No one model can actually serve the purpose of enterprise level, expert level use cases,” Subramaniyan said.

    “Multiple models have to come together - to work together in sync - in a nearly autonomous fashion, in order to get you the outcomes,” he added.
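    In practice, one simple way to combine models along these lines is a router that sends each request to a small or large model depending on the task. The sketch below is hypothetical; the model names, task categories, and route_request helper are illustrative and not a description of Articul8’s or OpenAI’s actual systems.

```python
from dataclasses import dataclass

@dataclass
class Route:
    model: str   # which model handles the request
    reason: str  # why it was chosen

# Hypothetical routing rules: a cheap SLM for narrow, well-benchmarked tasks,
# a larger model for open-ended or multi-step work.
def route_request(task_type: str) -> Route:
    slm_tasks = {"classification", "extraction", "math_reasoning"}
    if task_type in slm_tasks:
        return Route("small-model", "task matches the SLM's strong benchmarks")
    return Route("large-model", "open-ended task needs broader capability")

print(route_request("math_reasoning"))  # routed to the small model
print(route_request("legal_drafting"))  # routed to the large model
```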

    Just as businesses are now embracing the necessity of a multi-cloud strategy when building their cloud architectures, companies will also need to think about their future in AI in a similar way.
