Guide · April 2, 2026

How Direct-to-Chip Liquid Cooling Actually Works: The Engineering Behind the Cold Plate

Vendor presentations show a cold plate on a chip and an arrow labeled "coolant." The reality involves four distinct engineering stages, each with its own fluid dynamics, pressure constraints, material compatibility requirements, and failure modes. This is the full loop, stage by stage, with the operating parameters that matter.

Stage 1: The Cold Plate

The cold plate is a metal block, typically copper or aluminum, machined with internal channels that sit directly on the processor package. Thermal interface material (TIM) fills the gap between the chip's integrated heat spreader and the cold plate's contact surface. The quality of this thermal interface is the single most important variable in the entire system. A 0.1 mm air gap in the TIM application can increase thermal resistance by 20% or more.
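To see why the interface matters so much, the temperature rise across the TIM is just power times thermal resistance. A minimal sketch, using assumed round numbers (a 6 cm² contact area and a specific resistance of 0.10 K·cm²/W, neither taken from any particular cold plate):

```python
# Illustrative TIM thermal-resistance math. The area and specific
# resistance below are assumptions, not measured values.
def tim_temperature_rise(power_w, area_cm2, r_specific_k_cm2_per_w):
    """Temperature rise across the TIM layer: dT = P * (r / A)."""
    r_abs = r_specific_k_cm2_per_w / area_cm2  # absolute resistance, K/W
    return power_w * r_abs

base = tim_temperature_rise(500, 6.0, 0.10)      # well-applied TIM
degraded = tim_temperature_rise(500, 6.0, 0.12)  # +20% resistance
print(f"baseline rise: {base:.1f} C, degraded: {degraded:.1f} C")
```

With these numbers, a 20% resistance increase costs roughly 1.7 degrees Celsius of extra junction temperature at 500 W, which is why application quality dominates.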

Coolant enters the cold plate through an inlet port, flows through the internal channel geometry, picks up heat by convection from the channel walls (heat the processor has conducted through the TIM and into the plate), and exits through an outlet port. The channel geometry varies by manufacturer. Some use simple serpentine patterns. Others use microfinned or skived fin structures that increase the surface area in contact with the coolant. More surface area means more heat transfer per unit of flow, but also a higher pressure drop through the cold plate.

Single-phase operating parameters: Coolant is a propylene glycol and water blend, typically 30/70 or 40/60 ratio. Inlet temperature 25 to 40 degrees Celsius. Flow rate 1.0 to 1.5 LPM per cold plate for a 300 to 500W chip, scaling to 1.5 to 2.5 LPM for chips above 700W. Pressure drop across the cold plate: 3 to 8 PSI depending on channel geometry and flow rate. Outlet temperature 10 to 15 degrees Celsius above inlet for a fully loaded chip.
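The sensible heat balance ties these numbers together: Q = m_dot × cp × ΔT. A quick sketch, using assumed blend properties (density ~1030 kg/m³ and specific heat ~3750 J/kg·K are typical for a 30/70 PG/water mix, but check the coolant datasheet):

```python
# Single-phase sensible heat balance: Q = m_dot * cp * dT.
# Property values are typical assumptions for 30/70 PG/water.
RHO = 1030.0  # kg/m^3, blend density (assumed)
CP = 3750.0   # J/(kg*K), blend specific heat (assumed)

def outlet_delta_t(power_w, flow_lpm):
    """Coolant temperature rise across one cold plate."""
    m_dot = RHO * flow_lpm / 1000 / 60  # kg/s
    return power_w / (m_dot * CP)

print(f"{outlet_delta_t(500, 1.0):.1f} C rise")  # ~7.8 C at 500 W, 1.0 LPM
```

The rise scales inversely with flow, which is why per-plate flow allocation matters: halving the flow to a plate doubles its outlet temperature rise.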

Two-phase operating parameters: Coolant is a refrigerant (historically 3M Novec, transitioning to HFO-based fluids like Chemours Opteon). Inlet pressure 130 to 150 PSI. The refrigerant enters as a subcooled liquid and undergoes partial or full vaporization at the chip surface. Outlet is a two-phase mixture of liquid and vapor. The ratio of vapor to liquid (quality) at the outlet depends on the thermal load. A chip at full TDP might push outlet quality to 60 to 80% vapor. Flow rates are lower than single-phase, typically 0.3 to 0.8 LPM per cold plate, because the latent heat absorbed during vaporization carries far more energy per unit mass.
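The lower two-phase flow rates fall out of the latent heat math: Q ≈ m_dot × x × h_fg, where x is outlet quality. A sketch with placeholder fluid properties (a latent heat of 150 kJ/kg and liquid density of 1300 kg/m³ are plausible for an HFO-class refrigerant; the real datasheet governs):

```python
# Two-phase flow sizing sketch: Q ≈ m_dot * x_out * h_fg,
# neglecting the sensible subcooling term. Properties are
# placeholder values for an HFO-class refrigerant.
H_FG = 150e3      # J/kg, latent heat of vaporization (assumed)
RHO_LIQ = 1300.0  # kg/m^3, liquid density at inlet (assumed)

def required_flow_lpm(power_w, outlet_quality):
    """Liquid volumetric flow needed to absorb power_w at a target outlet quality."""
    m_dot = power_w / (outlet_quality * H_FG)  # kg/s
    return m_dot / RHO_LIQ * 1000 * 60         # liquid LPM at the inlet

print(f"{required_flow_lpm(500, 0.7):.2f} LPM")  # ~0.22 LPM for 500 W at x=0.7
```

Even before design margin, that is a fraction of the single-phase flow for the same chip, which is where the pump energy savings come from.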

Stage 2: The Manifold and Distribution

Coolant travels between cold plates and the CDU through a manifold system. The manifold is the plumbing backbone of the rack. It typically consists of a supply header that distributes coolant from the CDU to individual server nodes, and a return header that collects heated coolant (or two-phase mixture) and routes it back.

The connections between servers and the manifold are where the engineering gets interesting. Quick-disconnect couplings allow servers to be connected and disconnected from the cooling loop without draining the manifold. These couplings must be leak-free at operating pressure, compatible with the fluid chemistry, and operable without specialized tools. In a production data center, a technician needs to be able to swap a server node without shutting down the cooling loop for the rest of the rack.

Blind-mate connectors take this a step further. The server slides into the rack on rails and the fluid connections engage automatically as the server seats. No manual coupling required. The connector alignment is mechanical, built into the rail and server chassis design. This is the direction the OEMs are heading for high-volume deployments.

The manifold is also where flow balancing either happens or does not. Each branch of the manifold feeding a cold plate can include a flow restriction device, either a fixed orifice or a pressure-compensated regulator, to ensure even distribution. Without flow balancing, the cold plates closest to the pump (lowest pressure drop path) receive more flow than the cold plates at the end of the line. In a rack with 8 GPU nodes, the difference in flow between the first and last cold plate can be 15 to 25% without correction.
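The maldistribution can be sketched with a toy parallel-branch model: every branch sees the same supply-to-return pressure differential, so flow per branch follows Q_i = sqrt(ΔP / k_i), where k_i lumps the cold-plate resistance plus the header losses up to that branch. The coefficients below are illustrative, not from any real manifold:

```python
# Toy parallel-branch flow split. Each branch sees the same dP,
# so Q_i = sqrt(dP / k_i). Resistance coefficients are illustrative.
import math

def branch_flows(dp_psi, k_values):
    """Flow in each branch for a shared pressure differential."""
    return [math.sqrt(dp_psi / k) for k in k_values]

# 8 branches; lumped resistance grows toward the end of the line
# because later branches accumulate more header loss.
ks = [0.8 + 0.05 * i for i in range(8)]
flows = branch_flows(5.0, ks)
mismatch = (flows[0] - flows[-1]) / flows[0] * 100
print(f"first {flows[0]:.2f} LPM, last {flows[-1]:.2f} LPM, mismatch {mismatch:.0f}%")
```

With these assumed coefficients the first-to-last mismatch lands around 17%, inside the 15 to 25% range quoted above, and a fixed orifice in each branch works by deliberately adding resistance until the k values equalize.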

Operating parameters: Manifold supply pressure is typically 40 to 50 PSI for single-phase and 130 to 150 PSI for two-phase. Total rack flow rates range from 10 to 40 LPM for single-phase (depending on the number of nodes and TDP per chip) and 3 to 15 LPM for two-phase. The manifold itself is usually stainless steel or high-grade polymer tubing rated for the operating pressure and fluid chemistry. Copper is avoided in manifolds carrying water-glycol because of galvanic corrosion risk when copper manifold components connect to aluminum cold plates. The material pairing matters.

Stage 3: The Coolant Distribution Unit (CDU)

The CDU is the heat exchanger that sits between the rack-level cooling loop and the facility-level heat rejection system. It receives heated coolant from the rack manifold return, transfers that heat to a secondary fluid loop (typically facility water), and sends cooled coolant back to the rack supply manifold.

In a single-phase system, the CDU is a liquid-to-liquid heat exchanger with a pump. Heated water-glycol from the rack loop passes through one side. Facility chilled water or condenser water passes through the other side. Heat transfers across the exchanger surfaces. The CDU pump drives the rack-side loop. Some CDUs include a secondary pump for the facility side. Most rely on the facility water loop's own pumps.
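The exchanger's duty can be estimated with a simple effectiveness model: Q = ε × C_min × (T_hot,in − T_cold,in). A sketch with assumed round numbers (an effectiveness of 0.7 and a minimum capacity rate of 1.0 kW/K are illustrative, not a rating):

```python
# Effectiveness model for the CDU's liquid-to-liquid exchanger:
# Q = eps * C_min * (T_hot_in - T_cold_in). Values are assumed.
def exchanger_duty_kw(eps, c_min_kw_per_k, t_hot_in_c, t_cold_in_c):
    """Heat transferred from rack loop to facility loop, in kW."""
    return eps * c_min_kw_per_k * (t_hot_in_c - t_cold_in_c)

# Rack return at 45 C meeting facility water at 20 C:
print(f"{exchanger_duty_kw(0.7, 1.0, 45, 20):.1f} kW")  # 17.5 kW
```

The formula makes the dependency explicit: duty is linear in the inlet temperature difference, which sets up the facility-water sensitivity discussed below.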

In a two-phase system, the CDU is a condenser. Two-phase mixture (liquid and vapor) returns from the rack. The vapor passes through a condenser where it rejects its latent heat to the facility water loop and returns to liquid form. A receiver tank collects the condensed liquid. A pump or gravity feed returns subcooled liquid to the rack supply manifold. The condenser design is critical because the rate at which it can process incoming vapor determines the maximum thermal load the system can handle. If the condenser backs up, vapor pressure in the return manifold increases and flow to the cold plates drops.

Operating parameters: CDU capacity is rated in kilowatts. A rack-level CDU for a single-phase system typically handles 50 to 150 kW. For two-phase, 80 to 200 kW. Row-level CDUs can handle 200 to 500 kW or more. The facility water temperature entering the CDU matters enormously. At 15 degrees Celsius supply water, a CDU has maximum heat rejection capacity. At 30 degrees Celsius (warm water), the delta-T across the exchanger shrinks and capacity drops. NVIDIA has been pushing the industry toward 45 degree Celsius supply water, which eliminates the need for chillers entirely but requires CDU and cold plate designs optimized for that higher inlet temperature.
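The facility-water sensitivity can be made concrete with a linear scaling model: capacity is roughly proportional to the delta-T between rack return and facility supply. A sketch, normalized to a 15 degree Celsius reference supply (the 50 degree return temperature is an assumption for illustration):

```python
# Capacity scales roughly with the delta-T between rack return
# and facility supply water (illustrative linear model only).
def relative_capacity(t_rack_return_c, t_facility_supply_c, t_ref_supply_c=15):
    """CDU capacity relative to operation at the reference supply temperature."""
    return (t_rack_return_c - t_facility_supply_c) / (t_rack_return_c - t_ref_supply_c)

for t_sup in (15, 30, 45):
    print(f"{t_sup} C supply -> {relative_capacity(50, t_sup):.0%} of reference")
```

At 45 degree supply water the same exchanger delivers a small fraction of its chilled-water capacity, which is why chiller-free designs need cold plates and CDUs with much larger surface areas or flows engineered for that regime.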

Stage 4: Heat Rejection

Heat exits the building through one of three paths.

Chilled water plant: Traditional approach. Mechanical chillers cool facility water to 7 to 15 degrees Celsius. That chilled water feeds the CDUs. The chillers reject heat to the outside air via cooling towers (evaporative, consuming water) or dry coolers (air-cooled, consuming no water but requiring more energy and physical space). This is the architecture that most existing data centers use. It works, but the chiller plant is a massive energy consumer and the cooling towers can use millions of gallons of water per year.
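The water figure is easy to sanity-check. Evaporative towers lose on the order of 1.5 to 2 liters of water per kWh of heat rejected (an assumed rule-of-thumb rate; actual loss depends on climate and cycles of concentration):

```python
# Rough cooling-tower water use. The liters-per-kWh rate is an
# assumed rule of thumb, not a measurement for any specific site.
def annual_water_use_gal(heat_kw, liters_per_kwh=1.8, hours=8760):
    """Approximate annual evaporative water loss, in US gallons."""
    liters = heat_kw * hours * liters_per_kwh
    return liters / 3.785  # liters -> US gallons

print(f"{annual_water_use_gal(1000):,.0f} gallons/year")  # ~4.2 million for 1 MW
```

A single megawatt of continuous heat rejection lands in the millions of gallons per year, consistent with the claim above.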

Warm water direct rejection: If the cooling loop operates at high enough temperatures (supply water above 35 to 45 degrees Celsius), the CDU's facility-side water can be warm enough to reject its heat directly to the outside air through dry coolers, without a chiller in between. No mechanical refrigeration. No evaporative water loss. The energy savings are substantial: eliminating the chiller can cut cooling energy consumption by 30 to 50%. The constraint is climate. In hot, humid climates where ambient air temperature exceeds 40 degrees Celsius for extended periods, the delta-T between the warm water and the ambient air becomes too small for dry coolers to work. Supplemental cooling is needed for those conditions.
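The 30 to 50% figure can be sketched from assumed specific-energy numbers: a chiller plant at a COP of 4 spends about 0.25 kWh of electricity per kWh of heat rejected, while dry coolers plus pumps might spend around 0.14 kWh per kWh (both values are illustrative assumptions, not benchmarks):

```python
# Rough cooling-energy comparison. The kWh-per-kWh-of-heat figures
# are assumed round numbers for illustration, not site data.
def annual_cooling_kwh(heat_kw, kwh_per_kwh_heat, hours=8760):
    """Annual cooling electricity for a given continuous heat load."""
    return heat_kw * hours * kwh_per_kwh_heat

chiller = annual_cooling_kwh(1000, 0.25)  # mechanical chiller plant
dry = annual_cooling_kwh(1000, 0.14)      # dry coolers, no chiller
print(f"savings: {(chiller - dry) / chiller:.0%}")  # ~44%
```

With these assumptions the savings land mid-range of the 30 to 50% quoted above; the exact number depends heavily on climate and chiller efficiency.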

Heat reuse: The heated facility water, instead of rejecting its thermal load to the atmosphere, feeds a district heating system, an industrial process, or an agricultural operation. Germany's Energy Efficiency Act mandates heat reuse for new data centers above 500 kW. The economics work when there is a consistent heat consumer within reasonable piping distance. The engineering works when the cooling loop operates at temperatures high enough to be useful as a heat source, typically above 50 degrees Celsius on the return side.

The Full Loop in Numbers

For a single-phase DTC system cooling a rack of 8 GPUs at 500W each (4 kW total GPU thermal load, roughly 8 to 10 kW total rack thermal load including memory, networking, and power delivery):

Coolant: propylene glycol/water 30/70. Total rack flow: 12 to 16 LPM. Supply pressure at manifold: 45 PSI. Pressure drop per cold plate: 5 PSI. Supply temperature: 35 degrees Celsius. Return temperature: 45 to 48 degrees Celsius. CDU capacity: 15 kW. Facility water supply: 20 degrees Celsius. Pump power: 50 to 100W.
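These numbers hang together under the same sensible heat balance from Stage 1. A back-of-envelope check, with ~9 kW of rack heat into ~14 LPM of 30/70 PG/water (blend properties assumed as before):

```python
# Consistency check on the single-phase loop numbers above.
RHO, CP = 1030.0, 3750.0  # kg/m^3 and J/(kg*K), assumed blend properties

def return_temp(supply_c, load_w, flow_lpm):
    """Return temperature given supply temperature, heat load, and flow."""
    m_dot = RHO * flow_lpm / 1000 / 60  # kg/s
    return supply_c + load_w / (m_dot * CP)

print(f"{return_temp(35, 9000, 14):.1f} C")  # ~45 C, matching the stated range
```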

For a two-phase DTC system cooling the same rack:

Coolant: HFO-based refrigerant. Total rack flow: 4 to 8 LPM. Supply pressure at manifold: 145 PSI. Pressure drop per cold plate: 10 to 20 PSI (variable with vapor generation). Supply temperature: 25 degrees Celsius (subcooled liquid). Return: two-phase mixture at 50 to 70% vapor quality. CDU/condenser capacity: 15 kW. Facility water supply: 20 degrees Celsius. Pump power: 30 to 60W (lower flow rate compensates for higher pressure).
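The stated vapor quality can be checked the same way with the latent heat relation x ≈ Q / (m_dot × h_fg), again with placeholder HFO properties (150 kJ/kg latent heat, 1300 kg/m³ liquid density):

```python
# Consistency check on the two-phase loop numbers above.
H_FG, RHO_LIQ = 150e3, 1300.0  # J/kg and kg/m^3, assumed properties

def outlet_quality(load_w, flow_lpm):
    """Outlet vapor quality for a given heat load and liquid inlet flow."""
    m_dot = RHO_LIQ * flow_lpm / 1000 / 60  # kg/s
    return load_w / (m_dot * H_FG)

print(f"x = {outlet_quality(9000, 5.0):.2f}")  # ~0.55 at 9 kW and 5 LPM
```

A mid-range flow of 5 LPM against the full 9 kW rack load lands squarely in the 50 to 70% quality window quoted above.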

The two-phase system uses less fluid, less flow, and less pump energy for the same thermal load. It also operates at three times the pressure, requires vapor-rated connectors, needs a condenser instead of a simple heat exchanger, and introduces two-phase flow dynamics that make the system harder to predict, harder to balance, and harder to scale.

Both systems work. The choice between them is not a thermal performance question. It is an operations question, a supply chain question, and increasingly, a regulatory question. The engineering is solvable either way. The decision is about which set of tradeoffs the operator is equipped to manage for the next 15 years.