Here’s what caught my attention

The Summit attracted over 1,000 attendees this year, with dozens of presentations and hundreds of AI leaders from large companies as well as many startups.

Since 2019, the AI HW Summit in the San Francisco Bay Area has been the focal point for new AI technologies every September. While the event, hosted by UK-based Kisaco, started with semiconductors, it has steadily expanded its scope to include software, models, networking, and full data center optimization. Next year it will return as the AI Infra Summit, recognizing that AI has become a full-stack enterprise that consumes entire data centers.

Some may be surprised to learn that Nvidia was not present at the event. The company apparently doesn't see the point: everyone already knows who Nvidia is and how fast its GPUs are.

Here are some highlights from the event.

A food fight has broken out over the claim of “the fastest inference on the planet”

The battle over inference services is really heating up, with Cerebras, Groq, and SambaNova all claiming to offer the fastest tokens-as-a-service. Now, I'm pretty sure no one is lying here, but let's just say that each company is cherry-picking the size of the Llama 3.1 model it wants to promote. And they're mostly referring to tests run by Artificial Analysis, whose results are available on its website.

Here are Cerebras’ benchmark results:

And SambaNova claims the fastest service for the largest Llama 3.1 model, the 405B. There's been a lot of discussion about why Groq and Cerebras haven't run this model (yet). It could be that they don't have enough SRAM in their systems to do well on it. Or they simply haven't had the time. (Note: OctoAI is reportedly in acquisition talks with Nvidia.)

And here are the results from Groq. Groq appears to be gaining momentum, having secured $640 million in investment from BlackRock and others. The company's developer cloud has rapidly grown to over 360,000 developers building AI on GroqCloud. Groq also landed a major deal with Aramco to build a giant data center in Saudi Arabia that could grow to some 200,000 Language Processing Units (LPUs).

So, I went to the AA website (no, not THAT AA, although maybe I should) and found a very interesting chart that proclaims Cerebras the 70B winner in both performance and price, at just under 50 cents per million tokens.

Confused? Me too. But here's the thing: Artificial Analysis runs a wide variety of models on whatever hardware each serving vendor happens to use. It does no tuning, and Nvidia is represented only by vendors running unspecified Nvidia GPUs. AA does not disclose how many accelerators are used in each run, or what lower-level software stack is involved.

Inference will become a bigger market than AI training, and these three companies have demonstrated a huge leap forward in reducing the cost of using AI in real-world applications. Good job! Now, if you could just publish some MLPerf results, we would all feel better. AA provides a great service, but it is not a substitute for peer-reviewed benchmarks such as MLPerf, whose vendor-submitted results are all reviewed before publication.
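To see why these "fastest" claims are hard to compare, it helps to look at the two numbers a comparison like AA's reduces each provider to: a blended price per million tokens and an end-to-end response time. Here is a minimal sketch; all figures are made-up placeholders, not measured results for Cerebras, Groq, SambaNova, or anyone else, and the 3:1 input:output blend is just one common convention.

```python
# Illustrative only: how a tokens-as-a-service comparison boils down to a
# couple of numbers per provider. All prices and speeds below are
# hypothetical placeholders, not any vendor's measured results.

def blended_price_per_m(input_price, output_price,
                        input_ratio=3, output_ratio=1):
    """Blend $-per-million-token input/output prices at a fixed ratio
    (a 3:1 input:output mix is one common convention for chat workloads)."""
    total = input_ratio + output_ratio
    return (input_price * input_ratio + output_price * output_ratio) / total

def response_time(ttft_s, output_tokens, tokens_per_s):
    """End-to-end latency = time to first token + generation time."""
    return ttft_s + output_tokens / tokens_per_s

# Hypothetical provider: $0.50/M input, $1.50/M output, 1,000 tokens/s
print(blended_price_per_m(0.50, 1.50))   # → 0.75 ($ per million tokens)
print(response_time(0.25, 500, 1000.0))  # → 0.75 (seconds for 500 tokens)
```

Note how a provider can lead on price while trailing on latency, or vice versa, and how the chosen model size changes both, which is exactly why each vendor can truthfully claim a different crown.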

Optics is the next big thing

How many times have we heard this? It's always coming soon. Yes, optical interconnects are widely used for rack-to-rack connectivity in modern data centers, to get around copper's length limitations and the need for reclocking. But optics are rarely used within a rack, where cable lengths are short enough for cheaper copper solutions.

But that may be about to change. Celestial AI is developing a sleek, high-performance design that it touted at the conference. Its approach could help break through the "memory wall" GPUs face today by providing access to over 33 TB of shared HBM memory. The company claims over 25x lower cost, 8x lower power consumption, and 5x lower RDMA latency, all while delivering 4x the bandwidth. We'll be watching these folks closely as they finish their first-generation design.

What happened to analog computing?

There is a lot of research underway at IBM, Intel, and elsewhere to develop high-performance in-memory analog computing. It looks great in PowerPoint, but the analog-to-digital and digital-to-analog converters add latency, and the available memory capacity is not conducive to running the LLMs that attract billions of dollars of investment these days.

Enter Mentium, a UC Santa Barbara startup building a platform that combines a digital processor with an analog in-memory computing processor that it says offers the best of both worlds.

Mentium also made the choice to move from hosting its EDA tools internally to the Synopsys Cloud, hosted on Microsoft Azure. This change saved the company months of development and costs, while reducing the complexity it faced when using on-premises EDA tools.

Enfabrica’s Mega-NIC is coming soon

One of Nvidia’s major strengths is NVLink, which interconnects up to 512 GPUs at 100 GB/s per link, and is 14 times faster than PCIe. But what about the “rest of the story”: how do you connect the GPU nodes? It takes a lot of switches.
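The "lot of switches" point lends itself to back-of-the-envelope arithmetic. The sketch below counts switches for a generic non-blocking two-tier leaf-spine fabric; the node count, NICs per node, and 64-port radix are illustrative assumptions, not figures from any vendor's deployment.

```python
import math

def leaf_spine_switch_count(num_nodes, nics_per_node, switch_radix=64):
    """Rough switch count for a non-blocking two-tier leaf-spine fabric.
    Simplification: half of each leaf switch's ports face the nodes,
    the other half are uplinks to the spine layer."""
    endpoints = num_nodes * nics_per_node
    down_ports = switch_radix // 2                  # node-facing ports per leaf
    leaves = math.ceil(endpoints / down_ports)
    uplinks = leaves * (switch_radix - down_ports)  # total leaf-to-spine links
    spines = math.ceil(uplinks / switch_radix)
    return leaves, spines

# e.g. a hypothetical 1,024-node GPU cluster with 8 NICs per node
print(leaf_spine_switch_count(1024, 8))  # → (256, 128): 384 switches total
```

Even this simplified model needs hundreds of switches for a modest cluster, which is the scaling problem Enfabrica's high-radix NIC approach is aimed at.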

Enfabrica emerged from stealth at last year's AI HW Summit, with backing from Nvidia and a who's who of venture capitalists. This year the company is moving closer to productization and has expanded its value proposition to include the failover features that are so important to AI training.

When products ship in 2025, we expect Enfabrica to become an industry darling and see significant adoption.

Other stories worth telling

Microsoft, AWS, and Meta all shared data center news, more than we can cover on this blog. But their presentations and those of others reinforced the message that AI is now at data center scale, with tens of thousands of GPUs. Meta predicted a tenfold increase in cluster sizes by 2030, or millions of GPUs. And while AMD and Intel shared their stories and roadmaps (nothing new to see here; move on), there were plenty of interesting stories from entrepreneurs at the event. Here are a few:

Positron:

Furiosa AI:

South Korean startup Furiosa AI demonstrated its approach to efficient AI using tensor contraction, rather than matmul, as the primitive operation. In data centers with typical current power densities of around 15 kW per rack, this could be interesting, although many other companies, such as Hailo and d-Matrix, are pursuing the same energy-efficiency path using SRAM for weights.
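The tensor-contraction point is easy to illustrate. The plain-Python toy below (a conceptual sketch, nothing to do with Furiosa's actual silicon) shows that matmul is one special case of tensor contraction, while a contraction-first primitive handles multi-index sums directly instead of reshaping everything into matrices first.

```python
# Matmul is a contraction over a single shared index j:
#   C[i][k] = sum_j A[i][j] * B[j][k]
def matmul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

# A more general contraction sums over two indices (h and d) at once:
#   Y[b][o] = sum_h sum_d X[b][h][d] * W[h][d][o]
# A contraction-first machine can execute this directly, rather than
# flattening X and W into 2-D matrices and calling matmul.
def contract_hd(X, W):
    nb, nh, nd = len(X), len(X[0]), len(X[0][0])
    no = len(W[0][0])
    return [[sum(X[b][h][d] * W[h][d][o]
                 for h in range(nh) for d in range(nd))
             for o in range(no)] for b in range(nb)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # → [[19, 22], [43, 50]]
```

The efficiency argument is that keeping the higher-dimensional structure exposes data reuse that is lost when operands are first flattened for a matmul engine.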

Broadcom and the Ultra Ethernet Consortium

Now that Nvidia has joined the party, there is no doubt that UEC will become a mainstream networking standard when it is released in 2026 (?). While we are confident that UEC will be the next standard, that doesn’t mean that Nvidia will stop innovating in its networking, including NVLink and InfiniBand, as well as its own Spectrum Ethernet.

Conclusions

Phew! There were a lot of slides across four full days of companies working to improve the effectiveness of AI. And the focus went well beyond the chips that power AI. For example, Meta presented failure data and a three-pronged strategy for dealing with the certainty of failure: avoid failures, detect them quickly, and tolerate the ones that inevitably occur.

If you only have time to attend two conferences next year, Nvidia GTC and the AI Infra Summit (new name) are the two you should attend.

I hope to see you there!