To structure or not to structure IT Cabling for AI Clusters - Part 3 Sustainability

It took a while to get Part 3 going, as I was immersed in the cabling architecture of the NVIDIA H100 and GB200 ultra-large clusters and how it impacts the cabling product portfolio. I am glad to share some lessons from that here.

Recap

Part 1: Explored different AI models, signaling bandwidth challenges, and connectivity requirements.

Part 2: Examined the differences between Point-to-Point and structured cabling and their impact on cost, flexibility, modularity, and deployment.

Part 3: The Sustainability Aspect

In this third part, we delve into the sustainability of IT cabling infrastructure. While much attention has been given to the increased power and cooling requirements of AI, the choice and future readiness of IT cabling also significantly impact sustainability.

As the IPCC states:

“Each incremental reduction in greenhouse gas emissions helps mitigate the impacts of climate change, leading to a more sustainable future.”

DAC and Point-to-Point Copper Solutions

Given the immense number of compute-to-compute links required and their high-bandwidth, low-latency demands, the most sustainable solution at the moment is Direct Attach Copper (DAC) or another point-to-point copper solution. These draw a hundredfold less power than fiber transceivers.

Long-Distance Connectivity

For longer distances, fiber is the only solution, but it is important to consider the long-term operational impact. Power consumption is similar for Active Optical Cables (AOC) and for transceivers with fiber cabling, whether Point-to-Point or structured. The maintenance- and operations-related GHG emissions in large AI clusters, however, become a significant differentiator.

Maintenance and GHG Emissions

Meta has identified network cable failure as one of the top five causes of AI issues. In a 100,000 GPU cluster, assuming an industry-standard 5-year mean time between failures (MTBF) for transceivers, the expected time to the first failure is only 26.3 minutes, which works out to an estimated 54 potential failures per day. This makes the rip-and-replace approach with AOC cables both a sustainability burden and an operational one.
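The arithmetic behind these figures can be reproduced with a small sketch. It assumes a constant failure rate and independent failures, and it assumes a population of 100,000 transceivers (one per GPU), which is what the 26.3-minute figure implies. The function names are illustrative only, and a real cluster will typically hold more transceivers than GPUs.

```python
# Sketch: expected time to first failure in a fleet of identical parts.
# Assumptions (not from the source): constant failure rate, independent
# failures, and one transceiver per GPU (100,000 units in total).

MINUTES_PER_DAY = 24 * 60
MINUTES_PER_YEAR = 365.25 * MINUTES_PER_DAY


def minutes_to_first_failure(mtbf_years: float, unit_count: int) -> float:
    """With N identical units, the fleet-level MTBF is the unit MTBF divided by N."""
    return mtbf_years * MINUTES_PER_YEAR / unit_count


def expected_failures_per_day(mtbf_years: float, unit_count: int) -> float:
    """Average number of failures per day across the whole fleet."""
    return MINUTES_PER_DAY / minutes_to_first_failure(mtbf_years, unit_count)


if __name__ == "__main__":
    first = minutes_to_first_failure(5, 100_000)
    per_day = expected_failures_per_day(5, 100_000)
    print(f"{first:.1f} minutes to first failure")
    print(f"{per_day:.1f} expected failures per day")
```

Under these assumptions the result is about 26.3 minutes and just under 55 failures per day, in line with the figures quoted above. Doubling the transceiver count halves the time to first failure and doubles the daily failure count.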

Connectors, Cables, and Patch Panels

Connectors, cables, and patch panels contribute to the embedded GHG footprint and plastic waste. Deploying Very Small Form Factor (VSFF) connectors and couplers in combination with reduced-cladding and reduced-coating cables can significantly reduce material usage, space usage, and the associated GHG footprint. While Point-to-Point cabling eliminates the need for patch panels, it poses operational challenges and complicates future connectivity, potentially increasing Total Cost of Ownership (TCO). Balancing operational efficiency, cost, and sustainability is essential.

Material Reduction:

  • VSFF Connectors: Increase fiber density by up to three times compared to traditional connectors [1].
  • Reduced Cladding Fiber Cables: Decrease the diameter of the fiber coating from 200 µm to 125 µm, reducing material usage [2].
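To get a feel for what a smaller diameter means in material terms, here is a rough geometric sketch. It takes the two diameters quoted above at face value and assumes that material per unit length scales with the circular cross-sectional area of the coated fiber. It ignores jackets, strength members, and connectors, so it is an illustration rather than a product figure. The function name is hypothetical.

```python
# Sketch: relative cross-sectional area of a round fiber after a diameter change.
# Assumption (not from the source): material per metre scales with cross-section.


def area_ratio(new_diameter_um: float, old_diameter_um: float) -> float:
    """Area scales with the square of the diameter, so pi and the factor 1/4 cancel."""
    return (new_diameter_um / old_diameter_um) ** 2


if __name__ == "__main__":
    ratio = area_ratio(125, 200)
    reduction_pct = (1 - ratio) * 100
    print(f"{reduction_pct:.0f}% less cross-sectional area per fiber")
```

Because area goes with the square of the diameter, a 37.5% smaller diameter removes roughly 61% of the cross-section per fiber under these assumptions.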

Plastic Usage:

  • VSFF Connectors: Reduce plastic usage by approximately 50% due to higher fiber density and smaller connector size [1][3].

GHG Emissions:

  • Manufacturing: Lower material usage results in a 20-30% reduction in GHG emissions during production [2][3].
  • Transportation: Reduced weight of cables and connectors lowers transportation emissions by 10-15% [2][3].

Operational Efficiency:

  • Space Optimization: Higher fiber density and smaller connectors optimize space within data centers, reducing the need for additional infrastructure [2][3].

Conclusion

Because of the huge number of connections needed to support parallel neural networks in AI, the impact of connectivity and IT cabling on data center sustainability has grown and needs to be considered at design time.

Fortunately, advancements in transceiver power usage and fiber cabling offer a brighter and more sustainable future. Stay tuned for more about these developments in the next installment of our blog series.

Part 4: The Future of AI Connectivity

In this final part, we will look at what future connectivity could look like, with acronyms such as CPO, OIO, FCF, and MCF to share.

Happy dAIs and stay connected

[1] USCONEC – A Novel, Low-loss, Multi-Fiber Connector with Increased Usable Fiber Density

[2] USCONEC – A Novel, Low-loss, Multi-Fiber Connector Compatible with Reduced Coating Diameter Fiber

[3] Corning – Port Breakout and Very Small Form Factor (VSFF)
