
Thursday 5 March 2026, 06:28 AM

How Ultra Ethernet Transport (UET) eliminates RDMA bottlenecks in hyperscale AI clusters

Discover how the UEC 1.0.1 specification and Ultra Ethernet Transport (UET) use connectionless delivery and PCM to eliminate RDMA bottlenecks in AI clusters.


The network is the computer

I spend a lot of time talking to teams here in the Bay Area who are trying to train the next generation of massive Large Language Models. In almost every conversation, we eventually hit the exact same wall: the network. When you are stringing together clusters of over 100,000 GPUs, the network doesn't just connect the computers—it literally becomes the computer.

Up until recently, hyperscale AI networking has felt like a walled garden. You either paid the premium for Nvidia's proprietary InfiniBand to get the performance you needed, or you tried to wrestle legacy RDMA over Converged Ethernet (RoCEv2) into submission. But I've been digging into the newly minted UEC 1.0.1 specification, and it is a breath of fresh air. The Ultra Ethernet Consortium (UEC) is fundamentally re-architecting Ethernet to eliminate RDMA bottlenecks, and it opens up a playground of possibilities for how we scale AI infrastructure.

Packet spraying and the end of traffic jams

The most exciting part of the UEC 1.0.1 specification is the Ultra Ethernet Transport (UET) protocol. UET tosses out the legacy RoCEv2 playbook in favor of a clean-slate, connectionless architecture.

To understand why this is huge, think about how traditional Ethernet routes traffic. Equal-Cost Multi-Path (ECMP) routing hashes each flow onto a single path, so two heavy flows can collide on the same link while parallel links sit idle, and one congested path can stall an entire collective. UET completely decouples packet delivery from strict ordering constraints. Instead, it uses per-packet multipathing, affectionately known as "packet spraying": it distributes the packets of a single transfer evenly across every available network path, acting more like a highly coordinated swarm of bees than a single-file convoy.
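The difference is easy to see in a toy model. Here is a minimal sketch (path names, the hash function, and the round-robin policy are all illustrative assumptions, not anything from the spec) contrasting flow-pinned ECMP with per-packet spraying:

```python
import hashlib
from itertools import cycle

PATHS = ["spine-0", "spine-1", "spine-2", "spine-3"]

def ecmp_path(src, dst, sport, dport):
    """Classic ECMP: hash the flow tuple once, pinning the whole flow to one path."""
    key = f"{src}:{dst}:{sport}:{dport}".encode()
    return PATHS[int(hashlib.sha256(key).hexdigest(), 16) % len(PATHS)]

def spray(packets, paths=PATHS):
    """Per-packet spraying: distribute packets round-robin over every path."""
    lanes = cycle(paths)
    return [(pkt, next(lanes)) for pkt in packets]

# Every packet of an ECMP flow rides the same single path...
flow = [ecmp_path("10.0.0.1", "10.0.0.2", 49152, 4793) for _ in range(8)]
assert len(set(flow)) == 1

# ...while sprayed packets cover all four paths evenly.
sprayed = spray(range(8))
assert {lane for _, lane in sprayed} == set(PATHS)
```

Two flows that happen to hash to the same lane in the ECMP model collide for its full bandwidth; the sprayed transfer never concentrates load on any single link.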

This means we can achieve near-100% link utilization. When you pair this with Programmable Congestion Management (PCM) and high-fidelity Congestion Signaling (CSIG), the network can adapt on the fly to the massive incast traffic spikes generated by AI collective operations. The head-of-line blocking that historically choked AI cluster performance? Gone. We no longer have to rely on Priority Flow Control (PFC) and massive headroom buffers to keep things moving.

Scaling to infinity (and beyond O(N^2))

If you've ever tried to scale a massive distributed system, you know that state memory is the silent killer. In the old RoCEv2 world, Fabric Endpoints relied on persistent Queue Pairs: every endpoint holds connection state for every peer it talks to, so aggregate network state memory scales quadratically, at an agonizing O(N^2).

UET flips the script by utilizing ephemeral Packet Delivery Contexts (PDCs). Because contexts are created when a transfer starts and torn down when it completes, there is no persistent per-peer connection to store, and per-endpoint state memory stays at a beautiful, flat O(1) regardless of cluster size. Theoretically, this allows clusters to scale to millions of endpoints without exhausting memory. For anyone dreaming about the next frontier of distributed computing, this is the scalability unlock we've been waiting for.
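A back-of-the-envelope model shows why this matters at hyperscale. The functions and the in-flight-transfer count below are illustrative assumptions, not figures from the spec:

```python
def qp_state_entries(n_endpoints):
    """RoCEv2-style persistent Queue Pairs: every endpoint keeps state for
    every peer, so fabric-wide state grows as O(N^2)."""
    return n_endpoints * (n_endpoints - 1)

def pdc_state_entries(n_endpoints, inflight_per_endpoint=64):
    """UET-style ephemeral Packet Delivery Contexts: state exists only for
    in-flight transfers, so each endpoint's footprint is a small constant
    and fabric-wide state grows only linearly with endpoint count."""
    return n_endpoints * inflight_per_endpoint

# At 100,000 endpoints the gap is roughly three orders of magnitude.
print(qp_state_entries(100_000))   # ~10 billion entries
print(pdc_state_entries(100_000))  # 6.4 million entries
```

Note the fabric-wide totals: per-endpoint memory is O(1) under PDCs, so the aggregate grows linearly instead of quadratically, which is the difference between "fits in NIC SRAM" and "doesn't fit anywhere".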

Real silicon and battle-tested roots

I have a healthy allergy to vaporware, so my first question when reading any new spec is always: Is this actually shipping? The short answer is yes, and it is built on an incredibly solid foundation.

Roughly 75% of the UET protocol is derived directly from Hewlett Packard Enterprise's (HPE) Slingshot interconnect. This isn't untested theory; this is the exact technology powering the world's top three supercomputers.

The broader ecosystem is already falling into place. The Internet Assigned Numbers Authority (IANA) has officially assigned UDP port 4793 to UET, ensuring standardized UDP encapsulation across global IP infrastructures. On the hardware side, silicon vendors are already pushing UEC 1.0 validated gear out the door. Broadcom's Tomahawk 6 ASIC is out in the wild delivering a staggering 102.4 Terabits per second of switching capacity, and Nokia is actively integrating it into their data center switching portfolio.
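Standardized UDP encapsulation means a UET datagram is just an ordinary UDP packet aimed at port 4793. Here is a minimal sketch of that framing; the source port and zeroed checksum are simplifying assumptions (real stacks compute the checksum over the IP pseudo-header), and the payload here is a placeholder rather than a real UET header.

```python
import struct

UET_UDP_PORT = 4793  # IANA-assigned destination port for UET

def udp_encapsulate(src_port: int, payload: bytes) -> bytes:
    """Wrap a payload in a minimal UDP header:
    source port, destination port, length, checksum (left at 0 here)."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, UET_UDP_PORT, length, 0)
    return header + payload

datagram = udp_encapsulate(49152, b"\x00" * 32)
dst_port = struct.unpack("!H", datagram[2:4])[0]
assert dst_port == 4793
```

Because the transport rides inside vanilla UDP, existing IP routing, telemetry, and middleboxes see nothing exotic on the wire, which is a big part of the interoperability story.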

Securing the swarm and scaling up

Security in hyperscale environments is notoriously tricky, usually requiring massive state overhead from protocols like IPsec. The UEC 1.0.1 specification elegantly sidesteps this by introducing a Transport Security Sublayer (TSS). Using AES-GCM encryption and highly efficient group keying, thousands of endpoints within a single AI job's security domain can communicate securely without bogging down the hardware.
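The trick that keeps TSS lightweight is group keying: every endpoint in a job's security domain can derive the same traffic keys locally instead of negotiating pairwise sessions. Here is a sketch of that idea using a single HKDF-expand-style HMAC step; the function name, label format, and epoch scheme are my own assumptions for illustration, and the derived key would feed an AES-GCM cipher in a real implementation.

```python
import hashlib
import hmac

def derive_epoch_key(group_secret: bytes, job_id: str, epoch: int) -> bytes:
    """Derive a per-epoch 256-bit key from the job-wide group secret.
    Every endpoint in the security domain computes the same key locally,
    so no per-pair key exchange or per-peer key state is needed."""
    info = f"uet-tss|{job_id}|epoch-{epoch}".encode()
    return hmac.new(group_secret, info, hashlib.sha256).digest()

secret = b"\x07" * 32  # distributed once to the job's security domain
k0 = derive_epoch_key(secret, "train-llm-42", 0)
k1 = derive_epoch_key(secret, "train-llm-42", 1)
assert k0 != k1 and len(k0) == 32
```

Rotating the epoch number rekeys the whole domain in one step, which is how thousands of endpoints stay encrypted without the O(N^2) session state that pairwise IPsec tunnels would require.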

But what really hints at the massive opportunity ahead happened in October 2025 at the OCP Global Summit. A coalition of UEC heavyweights—AMD, Broadcom, Meta, and Nvidia—launched the ESUN (Ethernet for Scale-Up Networking) working group. Their goal is to extend these open UEC standards beyond scale-out networks and push them directly into the scale-up, GPU-to-GPU communication domain.

We are looking at a democratized, interoperable future where standard, multi-vendor Ethernet hardware delivers InfiniBand-class tail latency and reliability. With analyst projections showing UET-capable Ethernet surpassing InfiniBand in AI data centers by 2027, the era of proprietary lock-in is coming to an end. It's a great time to be building in this space.


Copyright © 2026 Tech Vogue