    Tom's Hardware

    AI researchers run AI chatbots at a lightbulb-esque 13 watts with no performance loss — stripping matrix multiplication from LLMs yields massive gains

    By Christopher Harper

    2024-06-26


    A research paper from UC Santa Cruz and an accompanying writeup describe how AI researchers found a way to run modern, billion-parameter-scale LLMs on just 13 watts of power. That's about the same as a 100W-equivalent LED bulb, but more importantly, it's about 50 times more efficient than the 700W of power needed by data center GPUs like the Nvidia H100 and H200, never mind the upcoming Blackwell B200 that can use up to 1200W per GPU.
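
    For a rough sense of scale, the ratios behind those claims work out as simple division on the figures quoted above; the variable names below are illustrative, not from the paper:

    ```python
    # Back-of-the-envelope check of the efficiency figures quoted above.
    matmul_free_watts = 13   # reported power draw of the MatMul-free setup
    h100_watts = 700         # Nvidia H100/H200-class data center GPU
    b200_watts = 1200        # upcoming Nvidia Blackwell B200

    print(h100_watts / matmul_free_watts)  # ~53.8, i.e. "about 50 times"
    print(b200_watts / matmul_free_watts)  # ~92.3
    ```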

    The work was done using custom FPGA hardware, but the researchers clarify that most of their efficiency gains can be applied through open-source software and tweaks to existing setups. Most of the gains come from removing matrix multiplication (MatMul) from the LLM training and inference processes.

    How was MatMul removed from a neural network while maintaining the same performance and accuracy? The researchers combined two methods. First, they converted the numeric system to a "ternary" system using -1, 0, and 1, which makes computation possible with summing rather than multiplying numbers. They then introduced time-based computation, giving the network an effective "memory" that lets it run faster with fewer operations.
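
    To illustrate the first idea, here is a minimal sketch of how ternary weights turn a matrix-vector product into pure additions and subtractions. This is not the paper's actual implementation: the quantization threshold and function names are illustrative assumptions, and the time-based component layered on top is not shown.

    ```python
    import numpy as np

    def quantize_ternary(w, threshold=0.05):
        # Map full-precision weights to {-1, 0, +1}. The thresholding rule
        # here is a simplified stand-in for the paper's quantization scheme.
        q = np.zeros_like(w, dtype=np.int8)
        q[w > threshold] = 1
        q[w < -threshold] = -1
        return q

    def ternary_matvec(q, x):
        # "Multiply" a ternary weight matrix by a vector without any
        # multiplications: add the inputs where the weight is +1,
        # subtract them where it is -1, and skip the zeros.
        out = np.zeros(q.shape[0], dtype=x.dtype)
        for i in range(q.shape[0]):
            out[i] = x[q[i] == 1].sum() - x[q[i] == -1].sum()
        return out

    # Sanity check: the add/subtract version matches a regular matmul
    # performed with the same ternary weights.
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 8))
    x = rng.normal(size=8)
    q = quantize_ternary(w)
    print(np.allclose(ternary_matvec(q, x), q @ x))  # True
    ```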

    The mainstream model that the researchers used as a reference point is Meta's LLaMa LLM. The endeavor was inspired by a Microsoft paper on using ternary numbers in neural networks, though Microsoft did not go as far as removing matrix multiplication or open-sourcing their model like the UC Santa Cruz researchers did.

    It boils down to an optimization problem. Rui-Jie Zhu, one of the graduate students working on the paper, says, "We replaced the expensive operation with cheaper operations." Whether the approach can be universally applied to AI and LLM solutions remains to be seen, but if viable it has the potential to radically alter the AI landscape.

    We've witnessed a seemingly insatiable desire for power from leading AI companies over the past year. This research suggests that much of this has been a race to be first while using inefficient processing methods. We've heard warnings from reputable figures like Arm's CEO that, if AI power demands continue to increase at current rates, AI would consume one fourth of the United States' power by 2030. Cutting power use down to 1/50 of the current amount would represent a massive improvement.

    Here's hoping Meta, OpenAI, Google, Nvidia, and all the other major players will find ways to leverage this open-source breakthrough. Faster and far more efficient processing of AI workloads would bring us closer to human brain levels of functionality: by some estimates, a brain gets by on approximately 0.3 kWh of energy per day, or 1/56 of what an Nvidia H100 consumes in the same period. Of course, many LLMs require tens of thousands of such GPUs and months of training, so our gray matter isn't quite outdated just yet.
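
    The 1/56 figure appears to come from comparing daily energy rather than instantaneous power; a quick check under that assumption:

    ```python
    # Assumed interpretation of the 1/56 figure: the daily energy of one H100
    # running continuously versus an estimated daily energy budget of the brain.
    brain_kwh_per_day = 0.3
    h100_kwh_per_day = 700 / 1000 * 24   # 700 W for 24 hours = 16.8 kWh
    print(h100_kwh_per_day / brain_kwh_per_day)  # 56.0
    ```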
