AI Factory Analysis
AI factories are massive data centers purpose-built for training frontier AI models. US hyperscalers have deployed the largest clusters, though China is rapidly building capacity.
Key Metrics
Capacity (accelerator-hours) = (Accelerators installed) × (Average utilization) × (Hours online)
Power Density (kW/rack) drives cooling architecture, floor design, and failure rates
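The capacity formula above can be sketched in code. This is an illustrative example, not from the source; the function name and the 60% utilization figure are assumptions chosen for the sketch.

```python
def effective_capacity(accelerators: int, avg_utilization: float, hours_online: float) -> float:
    """Accelerator-hours of useful compute: accelerators × utilization × time online."""
    return accelerators * avg_utilization * hours_online

# Example: a 100,000-GPU cluster at 60% average utilization over a 30-day month.
cap = effective_capacity(100_000, 0.60, 30 * 24)
print(f"{cap:,.0f} accelerator-hours")  # 43,200,000 accelerator-hours
```

The multiplication makes the leverage points explicit: doubling utilization is worth as much as doubling the installed fleet, at a fraction of the capital cost.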
What matters in this layer
The limiting factors are often mundane: permitting, transformers, chilled water, and network lead times. Operators who can compress these timelines turn capital into compute faster.
Time from site selection to first training run determines advantage. Standardized designs, supply contracts, and execution discipline compound over repeated builds.
Achieving high utilization requires strong SRE practices, failure recovery, and workload scheduling. Reliability is a strategic capability, not a back‑office concern.
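A rough model makes the reliability point concrete. The sketch below is hypothetical (the function, its parameters, and the example numbers are assumptions, not from the source): it estimates the fraction of wall-clock time a training run spends on useful work once failures, restarts, and checkpointing are accounted for.

```python
def goodput_fraction(mtbf_hours: float, restart_hours: float,
                     checkpoint_interval_hours: float, checkpoint_cost_hours: float) -> float:
    """Fraction of wall-clock time spent on useful training steps.

    Each failure costs the restart time plus, on average, half a checkpoint
    interval of redone work; writing checkpoints also consumes time.
    """
    lost_per_failure = restart_hours + checkpoint_interval_hours / 2
    failure_overhead = lost_per_failure / mtbf_hours
    checkpoint_overhead = checkpoint_cost_hours / checkpoint_interval_hours
    return max(0.0, 1.0 - failure_overhead - checkpoint_overhead)

# Example: one failure every 8 hours, 0.5 h to restart,
# checkpoint every hour at 2 minutes per checkpoint.
print(round(goodput_fraction(8.0, 0.5, 1.0, 2 / 60), 3))  # 0.842
```

Even these modest assumed numbers erase roughly 16% of the cluster's capacity, which is why the section treats SRE practice and failure recovery as strategic rather than back-office work.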
xAI's Colossus Cluster Goes Live
Elon Musk's xAI has brought its Colossus cluster online with more than 100,000 H100 GPUs, making it one of the largest AI training installations in the world.
Meta Plans 2GW Data Center Campus
Meta has announced plans for a 2 gigawatt AI data center campus, representing one of the largest single-site compute deployments ever planned.
China Builds National AI Compute Network
China is constructing a national network of AI compute centers, pooling resources across state-backed entities to maximize utilization of domestic chips.