
AMD Talks ‘1.2 Million GPU’ AI Supercomputer, Says ‘Sober People’ Willing to Spend Billions in AI Race

AMD has revealed that it has received an inquiry to build a massive supercomputer based on 1.2 million data center GPUs, an enormous figure given current market dynamics.

AMD could see a ‘gold rush’ amid the hype around next-gen AI clusters, as the company reveals its involvement in a potential 1.2 million GPU supercomputer

Well, Team Red may have found its next “behemoth” customer, as the company claims it could be involved in building an AI cluster housing 1.2 million GPUs. Speaking to The Next Platform, Forrest Norrod, executive vice president and general manager of AMD’s Data Center Solutions Group, said that AMD has received inquiries from unnamed customers seeking a massive number of AI accelerators, and he confirmed the figure when asked directly whether anyone was contemplating such an undertaking.

TPM: What’s the biggest AI training cluster that anyone is seriously interested in? You don’t need to name names. Has anyone come to you and said, with the MI500, I need 1.2 million GPUs or something?

Forrest Norrod: Is it in that range? Yes.

TPM: You can’t just say “it’s in this range.” What’s the largest real number?

Forrest Norrod: I’m very serious, it’s in that range.

TPM: For a machine.

Forrest Norrod: Yes, I’m talking about a machine.

TPM: It makes you a little dizzy, you know?

Forrest Norrod: I understand that. The scale of what is being contemplated is mind-boggling. Will it all come to fruition? I don’t know. But there are public reports that very sober people are contemplating spending tens of billions of dollars, maybe even a hundred billion dollars, on training clusters.

Forrest Norrod – Executive Vice President of AMD (via The Next Platform)

Let’s refresh your memory a bit. If you still think 1.2 million GPUs is not a huge number, consider that the world’s largest supercomputer, Frontier, uses about 38,000 GPUs; a 1.2 million GPU machine would represent a more than 30x gap in GPU count alone, which is shocking. And the interconnect for a GPU fabric that large is simply mind-boggling, perhaps impossible with today’s technology.
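For a rough sense of that gap, the comparison above can be sketched as a quick calculation, using the article’s approximate figure of 38,000 GPUs for Frontier:

```python
# Rough scale comparison: the proposed AI cluster vs. Frontier.
# Frontier's ~38,000-GPU count is the approximate figure cited above.
FRONTIER_GPUS = 38_000
PROPOSED_GPUS = 1_200_000

ratio = PROPOSED_GPUS / FRONTIER_GPUS
print(f"The proposed cluster would have roughly {ratio:.0f}x Frontier's GPU count")
```

The exact multiple depends on which Frontier GPU count you use, but any reasonable figure lands in the 30x-plus territory the article describes.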

AMD-powered Frontier supercomputer used 3,000 of its 37,000 MI250X GPUs to achieve a staggering 1-trillion-parameter LLM training run

Do we believe it is impossible to put 1.2 million GPUs in an AI cluster? Well, no. The reason is that, given the pace at which AI is progressing, the need for adequate computing power has grown rapidly, and, as Norrod himself says, “sober people” are willing to spend billions to build large-scale data centers to meet market demand.

If you were to equip a supercomputer with 1.2 million AMD Instinct MI300X AI accelerators, it would cost about $18 billion for the GPUs alone, considering that a single unit costs about $15,000. And that’s before factoring in the power requirements of such a machine. If AI continues to accelerate at its current rate, we can expect such supercomputers to pop up all over the world. They will be huge investments and take years to complete, but once finished, they will be among the fastest computing platforms on the planet.
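The cost estimate above is simple back-of-the-envelope arithmetic, and the power question can be sketched the same way. The ~$15,000 unit price comes from the article; the ~750 W per-GPU power draw is an assumed illustrative figure, and real facility power would be higher once CPUs, networking, and cooling are included:

```python
# Back-of-the-envelope estimate for a hypothetical 1.2M-GPU cluster.
# The ~$15,000 unit price is the article's figure; the 750 W per-GPU
# power draw is an assumption for illustration, GPUs only.
NUM_GPUS = 1_200_000
UNIT_PRICE_USD = 15_000
ASSUMED_GPU_POWER_W = 750

gpu_cost_usd = NUM_GPUS * UNIT_PRICE_USD                 # total GPU spend
gpu_power_mw = NUM_GPUS * ASSUMED_GPU_POWER_W / 1e6      # megawatts, GPUs alone

print(f"GPU cost: ${gpu_cost_usd / 1e9:.0f} billion")
print(f"GPU power draw: {gpu_power_mw:.0f} MW (accelerators only)")
```

Under these assumptions the GPUs alone would draw on the order of 900 MW, which is why the interview's talk of tens of billions of dollars covers far more than the accelerators themselves.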

NVIDIA CEO Jensen Huang has said that the data center segment is expected to become a trillion-dollar market in the coming years, and rumor has it that Microsoft and OpenAI are planning to build a $100 billion supercomputer, Stargate. So the 1.2 million GPU figure isn’t entirely implausible. Will big tech companies choose AMD over NVIDIA? Only time will tell.

Information source: The Next Platform