    Ex-Twitter dev reminisces about finding 700 unused Nvidia GPUs after takeover — forgotten cluster was 'powered on and idle'

    By Mark Tyson, Tom's Hardware

    8 hours ago

    An engineer who worked at Twitter during the seismic Agrawal-Musk transition has been publicly reminiscing about finding a cluster of 700 Nvidia V100 GPUs. Tim Zaman, who now works as a software engineer at Google DeepMind, discovered the sizable cluster powered on but sitting idle in the data center of X’s chirpy ancestor.

    The warm humming mass of Nvidia silicon and PCBs in the Twitter data center was poetically described as “the forgotten remains of an honest attempt to make a cluster within Twitter 1.0” by Zaman in a Twitter/X post on Monday. The engineer had been spurred to write about his surprise discovery of this silicon treasure trove after reading about xAI’s Memphis Supercluster getting to work training Grok 3, powered by 100,000 liquid-cooled Nvidia H100 accelerators on a single RDMA fabric.

    Zaman underlined what many of you will be thinking – Twitter had 700 of the world's most powerful GPUs humming along without purpose for years. “How times have changed!” he exclaimed. Indeed, Nvidia's first Volta-architecture V100 data center GPUs arrived on the market during the first great GPU shortage of 2017, and Zaman found the 700-card V100 cluster still running idle in mid-2022. That’s a lot of computing time and resources wasted.

    Another moment of mirth for Zaman was discovering that the 700 Nvidia V100s were PCIe cards rather than the SXM2 form-factor variants with their much higher-bandwidth NVLink interconnects. Of course, we don’t know why 2017-era Twitter bought PCIe rather than SXM2 V100 GPUs for this sizable installation, and perhaps we never will.

    Zaman’s post also contained some interesting musings about Musk’s new ‘Gigafactory of Compute.’ Running “100k GPUs on the same fabric must be an epic challenge,” commented the engineer. “At that scale, the only guarantee is failure, and it's all about graceful failure management.” With this in mind, Zaman pondered disaggregating resources into distinct domains so that a single failure wouldn’t bring the whole house down.
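    To illustrate the kind of design Zaman is hinting at, here is a minimal, hypothetical Python sketch (not taken from xAI or Zaman's post; the domain size and helper names are invented for illustration) of splitting a large GPU fleet into independent failure domains, so that a fault in one domain removes only that slice of capacity rather than stalling the whole training job:

    ```python
    # Hypothetical sketch: partition a 100k-GPU fleet into independent failure domains.
    # A fault inside one domain costs at most that domain's GPUs; the rest keep training.
    # All numbers and names here are illustrative, not xAI's actual configuration.

    TOTAL_GPUS = 100_000
    DOMAIN_SIZE = 4_096  # e.g. one network "island" per domain

    # Assign GPUs to domains by simple contiguous partitioning of their IDs.
    domains = {
        d: list(range(d * DOMAIN_SIZE, min((d + 1) * DOMAIN_SIZE, TOTAL_GPUS)))
        for d in range((TOTAL_GPUS + DOMAIN_SIZE - 1) // DOMAIN_SIZE)
    }

    def usable_gpus(failed_domains: set[int]) -> int:
        """Capacity that survives when entire domains are lost."""
        return sum(len(gpus) for d, gpus in domains.items() if d not in failed_domains)

    print(usable_gpus(set()))   # 100000 -- healthy cluster
    print(usable_gpus({3}))     # 95904  -- one full domain down, blast radius capped
    ```

    In a real deployment the partitioning would follow the physical topology (racks, leaf switches, power feeds) rather than contiguous ID ranges, but the principle is the same: cap the blast radius of any single failure at one domain.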

    The engineer was also fascinated by the question of how many GPUs could ultimately sit on a single fabric. With tech titans racing to build bigger and bigger AI training clusters, both the predictable and the unforeseen limits on the maximum number of GPUs on the same fabric are bound to become apparent.
