Here’s what caught my attention

The Summit attracted over 1,000 attendees this year, with dozens of presentations and hundreds of AI leaders from large companies as well as many startups.

Since 2019, the AI HW Summit in the San Francisco Bay Area has been the focal point for new AI technologies every September. While the event, hosted by UK-based Kisaco, started with semiconductors, it has steadily expanded its scope to include software, models, networking, and full data center optimization. Next year it will return as the AI Infra Summit, recognizing that AI has become a full-stack enterprise that consumes entire data centers.

Some may be surprised to learn that Nvidia was not present at the event. The company apparently doesn't see the point: everyone already knows who Nvidia is and how fast its GPUs are.

Here are some highlights from the event.

A food fight has broken out over the claim of “the fastest inference on the planet”

The battle over inference services is really heating up, with Cerebras, Groq, and SambaNova all claiming to offer the fastest tokens-as-a-service. Now, I'm pretty sure no one is lying here, but let's just say that each company is cherry-picking the size of the Llama 3.1 model it wants to promote. And they're mostly referring to tests run by Artificial Analysis, whose results are available on its website.

Here are Cerebras’ benchmark results:

And SambaNova claims the fastest service for the largest Llama 3.1 model, the 405B. There's been a lot of discussion about why Groq and Cerebras haven't run this model (yet). It could be that they don't have enough SRAM in their systems to do well on it. Or they simply haven't had the time. (Note: OctoAI is reportedly in acquisition talks with Nvidia.)

And here are the results from Groq. Groq appears to be gaining momentum, having secured $640 million in investment from BlackRock and others. The company's developer cloud has rapidly grown to over 360,000 developers building AI on GroqCloud. Groq also landed a major deal with Aramco to build a giant data center in Saudi Arabia that could grow to some 200,000 Language Processing Units (LPUs).

So, I went to the AA website (no, not THAT AA, although maybe I should) and found a very interesting chart that proclaims Cerebras the 70B winner in both performance and price, at just under 50 cents per million tokens.

Confused? Me too. But here's the thing: Artificial Analysis runs a wide variety of models on whatever hardware each serving vendor happens to use. It does no tuning, and Nvidia is represented only by vendors running unspecified Nvidia GPUs. AA does not disclose how many accelerators are used in each run, or what lower-level software stack is involved.

Inference will become a bigger market than AI training, and these three companies have demonstrated a huge leap forward in reducing the cost of using AI in real-world applications. Good job! Now, if you could just publish some MLPerf results, we would all feel better. AA provides a great service, but it is not a substitute for peer-reviewed benchmarks such as MLPerf, whose vendor-submitted results are all reviewed before publication.
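To see why these "fastest" claims are hard to compare, it helps to look at the two numbers a comparison like AA's reduces each provider to: a blended price per million tokens and an end-to-end response time. Here is a minimal sketch; all figures are made-up placeholders, not measured results for Cerebras, Groq, SambaNova, or anyone else, and the 3:1 input:output blend is just one common convention.

```python
# Illustrative only: how a tokens-as-a-service comparison boils down to a
# couple of numbers per provider. All prices and speeds below are
# hypothetical placeholders, not any vendor's measured results.

def blended_price_per_m(input_price, output_price,
                        input_ratio=3, output_ratio=1):
    """Blend $-per-million-token input/output prices at a fixed ratio
    (a 3:1 input:output mix is one common convention for chat workloads)."""
    total = input_ratio + output_ratio
    return (input_price * input_ratio + output_price * output_ratio) / total

def response_time(ttft_s, output_tokens, tokens_per_s):
    """End-to-end latency = time to first token + generation time."""
    return ttft_s + output_tokens / tokens_per_s

# Hypothetical provider: $0.50/M input, $1.50/M output, 1,000 tokens/s
print(blended_price_per_m(0.50, 1.50))   # → 0.75 ($ per million tokens)
print(response_time(0.25, 500, 1000.0))  # → 0.75 (seconds for 500 tokens)
```

Note how a provider can lead on price while trailing on latency, or vice versa, and how the chosen model size changes both, which is exactly why each vendor can truthfully claim a different crown.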

Optics is the next big thing

How many times have we heard this? It's always coming soon. Yes, optical interconnects are widely used for rack-to-rack connectivity in modern data centers, to get around copper's length limitations and the need for reclocking. But optics are rarely used within a rack, where cable lengths are short enough for cheaper copper solutions.

But that may be about to change. Celestial AI is developing a sleek, high-performance design that it touted at the conference. Its approach could help break through the "memory wall" GPUs face today by providing access to over 33 TB of shared HBM memory. The company claims over 25x lower cost, 8x lower power consumption, and 5x lower RDMA latency, all while delivering 4x the bandwidth. We'll be watching these folks closely as they finish their first-generation design.

What happened to analog computing?

There is a lot of research underway at IBM, Intel, and elsewhere to develop high-performance in-memory analog computing. It looks great in PowerPoint, but the analog-to-digital and digital-to-analog converters add latency, and the available memory capacity is not conducive to running the LLMs that attract billions of dollars of investment these days.

Enter Mentium, a UC Santa Barbara startup building a platform that combines a digital processor with an analog in-memory computing processor that it says offers the best of both worlds.

Mentium also made the choice to move from hosting its EDA tools internally to the Synopsys Cloud, hosted on Microsoft Azure. This change saved the company months of development and costs, while reducing the complexity it faced when using on-premises EDA tools.

Enfabrica’s Mega-NIC is coming soon

One of Nvidia’s major strengths is NVLink, which interconnects up to 512 GPUs at 100 GB/s per link, and is 14 times faster than PCIe. But what about the “rest of the story”: how do you connect the GPU nodes? It takes a lot of switches.
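The "lot of switches" point lends itself to back-of-the-envelope arithmetic. The sketch below counts switches for a generic non-blocking two-tier leaf-spine fabric; the node count, NICs per node, and 64-port radix are illustrative assumptions, not figures from any vendor's deployment.

```python
import math

def leaf_spine_switch_count(num_nodes, nics_per_node, switch_radix=64):
    """Rough switch count for a non-blocking two-tier leaf-spine fabric.
    Simplification: half of each leaf switch's ports face the nodes,
    the other half are uplinks to the spine layer."""
    endpoints = num_nodes * nics_per_node
    down_ports = switch_radix // 2                  # node-facing ports per leaf
    leaves = math.ceil(endpoints / down_ports)
    uplinks = leaves * (switch_radix - down_ports)  # total leaf-to-spine links
    spines = math.ceil(uplinks / switch_radix)
    return leaves, spines

# e.g. a hypothetical 1,024-node GPU cluster with 8 NICs per node
print(leaf_spine_switch_count(1024, 8))  # → (256, 128): 384 switches total
```

Even this simplified model needs hundreds of switches for a modest cluster, which is the scaling problem Enfabrica's high-radix NIC approach is aimed at.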

Enfabrica emerged from stealth at last year's AI HW Summit, with backing from Nvidia and a who's who of venture capitalists. This year the company is moving closer to productization and has expanded its value proposition to include the failover features that are so important to AI training.

When products ship in 2025, we expect Enfabrica to become an industry darling and see significant adoption.

Other stories worth telling

Microsoft, AWS, and Meta all shared data center news, more than we can cover on this blog. But their presentations and those of others reinforced the message that AI is now at data center scale, with tens of thousands of GPUs. Meta predicted a tenfold increase in cluster sizes by 2030, or millions of GPUs. And while AMD and Intel shared their stories and roadmaps (nothing new to see here; move on), there were plenty of interesting stories from entrepreneurs at the event. Here are a few:

Positron:

Furiosa AI:

South Korean startup Furiosa AI demonstrated its approach to efficient AI using tensor contraction, rather than matmul, as the primitive operation. In data centers with typical current power densities of around 15 kW per rack, this could be interesting, although many other companies, such as Hailo and d-Matrix, are pursuing the same energy-efficiency path using SRAM for weights.
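The tensor-contraction point is easy to illustrate. The plain-Python toy below (a conceptual sketch, nothing to do with Furiosa's actual silicon) shows that matmul is one special case of tensor contraction, while a contraction-first primitive handles multi-index sums directly instead of reshaping everything into matrices first.

```python
# Matmul is a contraction over a single shared index j:
#   C[i][k] = sum_j A[i][j] * B[j][k]
def matmul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

# A more general contraction sums over two indices (h and d) at once:
#   Y[b][o] = sum_h sum_d X[b][h][d] * W[h][d][o]
# A contraction-first machine can execute this directly, rather than
# flattening X and W into 2-D matrices and calling matmul.
def contract_hd(X, W):
    nb, nh, nd = len(X), len(X[0]), len(X[0][0])
    no = len(W[0][0])
    return [[sum(X[b][h][d] * W[h][d][o]
                 for h in range(nh) for d in range(nd))
             for o in range(no)] for b in range(nb)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # → [[19, 22], [43, 50]]
```

The efficiency argument is that keeping the higher-dimensional structure exposes data reuse that is lost when operands are first flattened for a matmul engine.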

Broadcom and the Ultra Ethernet Consortium

Now that Nvidia has joined the party, there is no doubt that UEC will become a mainstream networking standard when it is released in 2026 (?). While we are confident that UEC will be the next standard, that doesn’t mean that Nvidia will stop innovating in its networking, including NVLink and InfiniBand, as well as its own Spectrum Ethernet.

Conclusions

Phew! There were a lot of slides across four full days of companies working to improve the effectiveness of AI. And the focus went well beyond the chips that power AI. For example, Meta presented failure data and a three-pronged strategy for dealing with the certainty of failure: avoid failures, detect them quickly, and tolerate the ones that inevitably occur.

If you only have time to attend two conferences next year, Nvidia GTC and the AI Infra Summit (new name) are the two you should attend.

I hope to see you there!