To structure or not to structure IT Cabling for AI Clusters - Part 4: Shifting the Status Quo Ante: The Future of AI Connectivity

In the first three parts of this blog series, we explored various critical aspects of IT cabling for AI clusters, emphasizing the impact of high bandwidth and latency requirements on connectivity solutions, the practical considerations of deployment and installation, and the sustainability implications of different cabling choices.

 

Part 1: Operational Sensitivity and High Bandwidth Demands

We began by differentiating between training and inference AI clusters, highlighting their unique connectivity requirements. Training clusters, with their substantial number of GPUs and CPUs, demand high bandwidth, low latency, and operational stability. We discussed how the increased complexity of modulation protocols and latency requirements influence the choice of IT cabling solutions, with a focus on Direct Attach Copper (DAC) and Active Optical Cables (AOC).

 

Part 2: Deployment and Installation Considerations

In the second part, we delved into the deployment and installation aspects of AI clusters, comparing point-to-point cabling with structured cabling. We analyzed their impact on cost, flexibility, modularity, and deployment speed. Key points included the advantages of pre-mounted and pre-cabled racks, the benefits of using top-of-rack panels for managing numerous connections, and the operational challenges associated with different cabling methods.

 

Part 3: Sustainability

The third part focused on the sustainability of IT cabling infrastructure. We discussed the significant impact of cabling choices on greenhouse gas (GHG) emissions and overall data center sustainability. The importance of low-power solutions like DAC for short distances and the considerations for long-distance fiber solutions were highlighted. We also explored the embedded GHG footprint and plastic waste associated with connectors, cables, and patch panels, emphasizing the need for sustainable practices in maintenance and operations.

 

In Part 4, we look at the future of AI connectivity and at the technologies and architectures on the horizon for the age of AI and quantum computing.

 

Right now, players such as AWS, Microsoft Azure, Google, NVIDIA, and others are each devising their own strategies and architectures to advance AI compute and connectivity infrastructure. This “status quo ante” is a transitional phase in which innovative and more cohesive strategies are being developed to handle the increasing demands of AI.

 

One of the aspects we discussed in a previous blog is already shaping the connectivity technology roadmap for AI: the drive to bring optical signaling closer and closer to the xPU itself, in order to reduce power consumption and latency and to increase bandwidth. With substantial financial backing for advancements in Co-Packaged Optics (CPO) and On-Interposer Optics (OIO), the world of terabit connections is about to dawn on us.

Co-Packaged Optics (CPO): Co-Packaged Optics (CPO) represents a significant leap forward in data center connectivity. By integrating optical and electronic components into a single package, CPO dramatically shortens the electrical path a signal must travel before it is converted to light, thereby reducing latency and power consumption. This technology is poised to handle the ever-increasing bandwidth demands of AI workloads efficiently.

Figure 1: Co-Packaged Optics

Source: lightwaveonline.com

On-Interposer Optics (OIO): On-Interposer Optics (OIO) takes the integration a step further by embedding optical components directly onto the interposer. This integration enhances signal integrity and reduces latency, making OIO particularly suited for high-performance AI applications. OIO, combined with CPO, ensures that high-speed data transfer within switches and equipment remains efficient and reliable.

Figure 2: On-Interposer Optics (OIO)

Source: semianalysis.com

What does this have to do with your cabling infrastructure, you might ask?

It means that as we move to all-optical connectivity, AI clusters are no longer limited by distance and, with optical switching, can move to a truly meshed cluster architecture. This will increase the number of fibers used in AI clusters and drive the need to handle ever higher densities. We will see growing use of VSFF connectors to meet the parallel-optics and density requirements in the data center.
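To get a feel for why a meshed architecture drives fiber counts up so quickly, here is a minimal back-of-the-envelope sketch. It assumes a full mesh, in which every node has a direct link to every other node, and a hypothetical figure of 8 fibers per parallel-optics link. Both the function names and the numbers are illustrative assumptions, not values taken from any specific deployment.

```python
def full_mesh_links(nodes: int) -> int:
    """Point-to-point links needed so that every node connects to every other node."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes * (nodes - 1) // 2


def total_fibers(links: int, fibers_per_link: int) -> int:
    """Total fiber count for a given number of links."""
    return links * fibers_per_link


# Assumption for illustration only: 4 transmit + 4 receive fibers per link.
FIBERS_PER_LINK = 8

for nodes in (8, 32, 128):
    links = full_mesh_links(nodes)
    fibers = total_fibers(links, FIBERS_PER_LINK)
    print(f"{nodes:>4} nodes: {links:>5} links, {fibers:>6} fibers")
```

The link count grows roughly with the square of the node count, which is why density per connector and per rack unit becomes the limiting factor long before the optics themselves do.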

VSFF Connectors: Very Small Form Factor (VSFF) connectors, such as SN-MT and MMC, play a crucial role in future connectivity technologies. These connectors allow for higher fiber density and more efficient use of space, which is essential for the compact designs of CPO and OIO.

Figure 3: Optical switch with VSFF 3D rendering

The parallel optics that support terabit (Tb) bandwidth requirements will also have to change, as we cannot simply keep increasing the number of fibers per connector. That is where Flexible Circuit Fiber (FCF) and Multi-Core Fiber (MCF) become potential contenders for the future of fiber cabling.

Flexible Circuit Fiber (FCF) and Multi-Core Fiber (MCF): Flexible Circuit Fiber (FCF) and Multi-Core Fiber (MCF) offer promising advancements in cable technology. FCF provides flexibility and durability, making it well suited to high-density environments where space is at a premium. MCF, on the other hand, places several cores inside a single fiber, so it can transmit multiple signals simultaneously and significantly increase bandwidth capacity without requiring additional physical space.

Figure 4: Multi-Core Fiber

Source: researchgate.net
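The density argument for multi-core fiber comes down to simple arithmetic, sketched below. The connector size and core counts are hypothetical values chosen for illustration, and the helper names are not taken from any standard or product.

```python
def channels_per_connector(fibers_per_connector: int, cores_per_fiber: int = 1) -> int:
    """Optical channels available through one connector (one channel per core)."""
    return fibers_per_connector * cores_per_fiber


def fibers_needed(target_channels: int, cores_per_fiber: int = 1) -> int:
    """Fibers required to carry a target number of channels, rounded up."""
    if cores_per_fiber < 1:
        raise ValueError("cores_per_fiber must be at least 1")
    return -(-target_channels // cores_per_fiber)  # ceiling division on integers


# Assumptions for illustration only: a 16-fiber connector and a 4-core fiber.
single_core = channels_per_connector(16)
multi_core = channels_per_connector(16, cores_per_fiber=4)
print(f"Single-core: {single_core} channels, 4-core MCF: {multi_core} channels")
print(f"Fibers for 64 channels with 4-core MCF: {fibers_needed(64, 4)}")
```

Under these assumptions, the same connector footprint carries four times as many channels, or equivalently the same channel count needs a quarter of the fibers.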

From an application perspective, AI will not stop at the data center. The broadband industry will not escape the age of AI either: AI is expected to become part of 6G and 7G mobile architectures, both to manage the network and to bring AI closer to the end user. Antenna base stations and central offices will therefore also need to be ready to accommodate AI. Compute processing is constantly advancing as well. Google recently tested its quantum computer successfully, and quantum computing is expected to speed up AI capabilities (read: bandwidth) and to affect quantum encryption (read: latency).

As exciting as this is from an innovation perspective, it is just as daunting for data center design: which technology should be chosen to ensure a sound financial and sustainable return on investment? Here, standardization will eventually play an important role in ensuring interoperability between the different technologies.

“The future belongs to those who believe in the beauty of their dreams.” — Eleanor Roosevelt
