To structure or not to structure IT Cabling for AI Clusters - Part 2

In the previous blog I covered the differences between training and inference AI, the increased bandwidth requirements, and how these impact the operational considerations around AI connectivity. I also looked at how the increased complexity of the modulation schemes and the latency requirements greatly influence the IT cabling solution for the ICI, FrontEnd, BackEnd and InBand Management connectivity.

In this blog we’ll have a look at what aspects of the deployment and installation influence the choice of IT cabling.

To gain time on the deployment of the clusters, the equipment (servers and switches) is often pre-mounted and pre-cabled in the racks offsite. Sometimes the equipment is even pre-configured and tested, ready for when the floor space and power are completed. The racks are then shipped complete to the data center, to be wheeled in, connected and powered up. The advantage is that the numerous ICI links between the servers and the switch can be neatly managed within the rack. However, several links also need to connect to devices outside the rack, such as the ICI L1 switch to ICI L2 switch links and the FrontEnd, BackEnd, InBand and Out of Band Management connections. These connections can number several dozen, with a mixture of SMF and MMF, parallel and duplex optical channels.

Therefore it is not uncommon to also run these cables to a top-of-rack panel. This solution offers a couple of advantages:

  • The numerous cables in the rack can be installed at the correct length and properly dressed;
  • It reduces the risk to the very expensive transceivers in the equipment, as they don’t need to be touched during the installation by technicians, who are under pressure to connect the rack as soon as possible and are often not trained in the proper handling of fibre cables;
  • The entire rack can be tested offsite up to the cable demarcation point at the top of the rack.

With so many connections it is important to have a solution that is very high density, modular and flexible. It is also important to have a good vertical cable management system in the racks to handle and maintain the patch leads.

I see three options to get the computer room ready to accommodate the racks:

  • Point-to-point cabling from the AI Network racks to the AI Compute racks, be it with DAC, AOC or fibre patch leads;
  • Structured cabling between the AI Network racks and the AI Compute racks, with the patch panels ready on top of the cable pathways and the racks adapted so the patch panels can be fitted through the top;
  • Structured cabling with the patch panels installed in an Overhead Enclosure (OHE), patching from there into the AI Compute racks.

Each method has its advantages and disadvantages.

Point-to-Point Cabling

+ Cost of DAC or AOC is lower
+ Power usage of DAC is low; AOC and fibre are higher
+ Latency of DAC is low; AOC and fibre are higher
– Reach of DAC at high bandwidths is short; fibre and AOC reach further
– After installation, the link can’t be tested for functionality until it is connected to the equipment
+ No couplings in between that could cause return loss (RL) or insertion loss (IL) issues
– When there is an issue with the link, the complete link needs to be replaced
– At End of Life (EoL) or scale-up of the equipment, a complete rip and replace of the infrastructure is needed
– Very high volume, 1000+ cables in the network rack, which has limited vertical cable management space

Structured Cabling

– Cost of transceivers and structured cabling is higher
– Power usage of fibre transceivers is higher
– Latency of fibre transceivers is higher
+ Reach of fibre connectivity is long
+ Installation quality can be tested before patching
– A proper Inspect, Clean, Connect process needs to be in place before connecting each connector, also at the equipment
+ Troubleshooting can identify whether the transceiver or the fibre link is at fault, and individual components can be repaired
+ At EoL or scale-up, only the equipment needs to be adjusted or replaced; the structured cabling can stay in place
+ Option to use an Optical Distribution Frame (ODF) for patching, with efficient cable management for large patching volumes

The choice between point-to-point cabling and structured cabling is a trade-off between the initial cost on one side and the operational risk, operational cost and long-term cost on the other. The difference between structured cabling into the racks and structured cabling to the OHE is more about the operational risk of exposed fibre patch leads.
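The point-to-point advantage of having no couplings in between comes down to loss-budget arithmetic: every mated connector pair that structured cabling adds consumes part of the channel’s insertion loss budget. The Python sketch below adds fibre attenuation and per-pair connection loss and compares the total with a budget. All numbers in it (3.0 dB/km fibre attenuation, 0.35 dB per mated pair, a 1.9 dB budget, a 50 m channel with two intermediate couplings) are illustrative assumptions, not figures from this blog; substitute the values from your transceiver’s standard or datasheet and your connector supplier. Return loss is not modelled.

```python
"""Rough insertion-loss (IL) budget check for a fibre channel.

All numeric defaults are illustrative assumptions, not values taken
from the article or from a specific standard.
"""
from dataclasses import dataclass


@dataclass
class Channel:
    length_m: float                # total fibre length of the channel
    mated_pairs: int               # intermediate mated connector pairs (couplings)
    fibre_db_per_km: float = 3.0   # assumed MMF attenuation
    pair_loss_db: float = 0.35     # assumed loss per mated pair (low-loss connectors)

    def insertion_loss_db(self) -> float:
        """Fibre attenuation plus connection loss, in dB."""
        fibre = self.length_m / 1000.0 * self.fibre_db_per_km
        connections = self.mated_pairs * self.pair_loss_db
        return fibre + connections


def margin_db(channel: Channel, budget_db: float) -> float:
    """Remaining margin against an assumed channel budget (negative means it fails)."""
    return budget_db - channel.insertion_loss_db()


if __name__ == "__main__":
    BUDGET_DB = 1.9  # hypothetical budget; use the value for your transceiver type
    # Point-to-point fibre patch lead: no intermediate couplings.
    p2p = Channel(length_m=50, mated_pairs=0)
    # Structured: network rack panel -> top-of-rack panel or OHE -> compute rack.
    structured = Channel(length_m=50, mated_pairs=2)
    for name, ch in (("point-to-point", p2p), ("structured", structured)):
        print(f"{name}: IL={ch.insertion_loss_db():.2f} dB, "
              f"margin={margin_db(ch, BUDGET_DB):.2f} dB")
```

With these assumed values the two couplings of the structured channel consume an extra 0.7 dB of the budget. This is one reason low-loss connectors and a disciplined Inspect, Clean, Connect process matter more as the number of couplings grows.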

Installing the Patch Panel in the AI Compute Rack

+ Patch cables will be inside the rack
– Additional space needs to be reserved in the racks
– Moving the patch panels into the rack runs the risk of damage and would require re-testing
– The patch panels need to be removed when the AI Compute rack is decommissioned

Patching from the OHE to the AI Compute Rack

– Patch leads will be partly unmanaged outside the rack
+ No additional space needed in the racks
+ Structured cabling can be tested in advance
+ Clear demarcation between the cabling infrastructure and the scope under the responsibility of the AI Compute rack supplier

One of the considerations above is whether the structured cabling will be reusable, and what bandwidths future platforms will require. This is something we’ll address in the next part of this blog. Feel free to provide your comments and to challenge these thoughts. “Anyone who has never made a mistake has never tried anything new.” – Albert Einstein
