
GPU Infrastructure for Zero-Knowledge Proofs: Architecture and Operations

Emad Siddiq · 14 min read

Zero-knowledge proof generation is one of the most computationally intensive workloads in the blockchain ecosystem. ZK rollups, ZK bridges, and privacy protocols all depend on prover infrastructure that can generate proofs fast enough to meet on-chain deadlines. GPUs have become the hardware of choice for this work — but building and operating GPU clusters for ZK proving is a distinct discipline from general cloud GPU usage.

Why GPUs for ZK Proofs?

Zero-knowledge proof systems like Groth16, PLONK, and STARKs rely heavily on mathematical operations that are highly parallelizable: multi-scalar multiplication (MSM), number-theoretic transforms (NTT), and polynomial evaluations. These operations map efficiently onto GPU architectures, which excel at executing thousands of threads in parallel.
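To make the parallelism concrete, here is a toy Pippenger-style bucket MSM in Python over the additive group of integers mod a prime, standing in for elliptic-curve points. The modulus and inputs are illustrative, but the bucket decomposition is the structural reason MSM splits into independent work per window and per bucket:

```python
# Toy Pippenger-style bucket MSM over the additive group of integers
# mod a prime -- a stand-in for elliptic-curve points. Each window's
# bucket accumulation is independent work, which is what maps onto
# thousands of GPU threads.
P = 2**31 - 1  # illustrative prime modulus

def msm_bucket(scalars, points, window_bits=4):
    """Compute sum(s_i * g_i) mod P via scalar-digit buckets."""
    n_windows = (max(s.bit_length() for s in scalars) + window_bits - 1) // window_bits
    total = 0
    for w in range(n_windows):
        # Accumulate points into buckets keyed by this window's scalar digit.
        buckets = [0] * (1 << window_bits)
        for s, g in zip(scalars, points):
            digit = (s >> (w * window_bits)) & ((1 << window_bits) - 1)
            buckets[digit] = (buckets[digit] + g) % P
        # Weighted bucket sum, shifted into the window's bit position.
        window_sum = sum(d * b for d, b in enumerate(buckets)) % P
        total = (total + (window_sum << (w * window_bits))) % P
    return total

# Matches the naive computation.
assert msm_bucket([3, 7, 11], [5, 9, 2]) == (3*5 + 7*9 + 11*2) % P
```

On a GPU, each thread block owns a window (or a bucket range within one), and the final weighted sums are reduced at the end.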

A single NVIDIA H100 can generate ZK proofs 10-50x faster than the equivalent CPU computation, depending on the proof system and circuit size. For time-sensitive applications like ZK rollup sequencing, this speedup isn't just nice to have — it's operationally required.

Hardware Selection

Not all GPUs are created equal for ZK workloads. The key factors are:

VRAM Capacity

ZK circuits are memory-intensive. Large circuits (billions of constraints) require storing witness data, intermediate polynomial evaluations, and proof artifacts in GPU memory. For production ZK rollup proving, 80 GB VRAM (H100/H800) is becoming the baseline. Smaller circuits can run on consumer-grade GPUs like the RTX 4090 (24 GB) or RTX 5090 (32 GB).
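As a back-of-envelope sketch of why VRAM fills up, here is a rough sizing calculation. The 32-byte field element and the buffer count are illustrative assumptions; real provers vary by proof system and implementation:

```python
# Back-of-envelope VRAM estimate for a ZK circuit. The 32-byte field
# element size and the buffer count are illustrative assumptions.
FIELD_BYTES = 32  # a typical 256-bit scalar field element

def vram_estimate_gib(n_constraints, poly_buffers=6):
    """Rough GPU memory: one field element per constraint for the
    witness, plus several same-sized polynomial/evaluation buffers."""
    total_bytes = n_constraints * FIELD_BYTES * (1 + poly_buffers)
    return total_bytes / 2**30

# A 2^27 (~134M) constraint circuit already needs tens of GiB under
# these assumptions; billion-constraint circuits must be chunked or
# streamed even on 80 GB cards.
print(f"{vram_estimate_gib(2**27):.0f} GiB")  # → 28 GiB
```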

Compute Capability

MSM and NTT operations benefit from high FP32 throughput and large L2 cache. NVIDIA's Ada Lovelace and Hopper architectures are well-suited. The H100's 80 GB HBM3 memory with 3.35 TB/s bandwidth is particularly effective for the memory-bound phases of proof generation.
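The NTT's access pattern is why bandwidth matters: every butterfly stage streams the whole array through memory. A minimal iterative radix-2 NTT over the common NTT-friendly prime 998244353 (chosen here for illustration, not tied to any particular prover) shows the structure:

```python
# Minimal iterative radix-2 NTT mod the NTT-friendly prime 998244353.
# Each stage touches the entire array, which is why NTT phases of
# proving tend to be memory-bandwidth-bound.
P = 998244353  # = 119 * 2^23 + 1, primitive root 3

def ntt(a):
    n = len(a)  # must be a power of two
    a = a[:]
    # Bit-reversal permutation so butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(3, (P - 1) // length, P)  # root of unity of order `length`
        for start in range(0, n, length):
            w = 1
            for k in range(length // 2):
                u = a[start + k]
                v = a[start + k + length // 2] * w % P
                a[start + k] = (u + v) % P
                a[start + k + length // 2] = (u - v) % P
                w = w * w_len % P
        length <<= 1
    return a
```

GPU implementations batch many such transforms and tile the stages to keep data resident in L2 and shared memory for as long as possible.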

Multi-GPU Scaling

Many proving systems can distribute work across multiple GPUs. NVLink interconnects (available on H100 NVL, H200) enable high-bandwidth GPU-to-GPU communication. For PCIe-connected GPUs, the proving software needs to account for the narrower interconnect bandwidth.

CUDA and Driver Management

ZK proving software is typically built on CUDA and requires specific driver and toolkit versions. This sounds straightforward until you're managing a fleet of GPUs across multiple machines:

  • Driver pinning — CUDA toolkit versions must match driver versions. Upgrading one without the other breaks everything. We pin specific NVIDIA driver versions per fleet and test upgrades in staging.
  • Container isolation — NVIDIA Container Toolkit enables GPU passthrough to Docker containers with controlled driver and CUDA versions. This is essential for running multiple proving workloads with different CUDA requirements on the same machine.
  • Error detection — GPU memory errors (ECC failures), thermal throttling, and driver crashes need automated detection and response. A GPU producing incorrect proofs is worse than a GPU producing no proofs at all — invalid proofs waste gas and can delay batches.
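A minimal triage sketch for the error-detection bullet: it parses per-GPU CSV lines as produced by `nvidia-smi --query-gpu=index,temperature.gpu,ecc.errors.uncorrected.volatile.total --format=csv,noheader,nounits`. The query fields are real nvidia-smi fields; the 83 °C threshold and the action names are our illustrative policy choices, not a standard:

```python
# Health-triage sketch for a GPU fleet. Input is one CSV line per GPU from:
#   nvidia-smi --query-gpu=index,temperature.gpu,ecc.errors.uncorrected.volatile.total \
#              --format=csv,noheader,nounits
# The temperature threshold and action names are illustrative policy.
TEMP_LIMIT_C = 83

def triage(csv_line):
    idx, temp, ecc = (field.strip() for field in csv_line.split(","))
    if ecc not in ("0", "[N/A]"):        # uncorrected ECC errors observed
        return int(idx), "quarantine"    # bad proofs are worse than no proofs
    if int(temp) >= TEMP_LIMIT_C:
        return int(idx), "throttle-risk"
    return int(idx), "healthy"

print(triage("0, 61, 0"))   # → (0, 'healthy')
print(triage("2, 70, 3"))   # → (2, 'quarantine')
```

Note the ordering: an ECC error quarantines the card even at safe temperatures, because correctness failures are strictly worse than performance failures.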

Fleet Orchestration

Operating a multi-node GPU cluster for ZK proving requires orchestration beyond what standard container orchestrators provide:

Workload Scheduling

Different ZK networks have different proving cadences. A ZK rollup might need continuous proving with sub-minute latency. A ZK bridge might have batch proving every 10 minutes. The orchestration layer needs to schedule workloads across available GPUs, handle priority levels, and support preemption for time-sensitive proofs.
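A minimal sketch of priority scheduling with preemption, not our production scheduler; the job names and the single preemption rule (evict the lowest-priority running job for a more urgent queued one) are illustrative:

```python
import heapq

# Minimal priority scheduler sketch for proving jobs. Lower number =
# more urgent. Job names and the preemption rule are illustrative.
class ProverScheduler:
    def __init__(self, n_gpus):
        self.queue = []        # (priority, seq, job) min-heap
        self.running = {}      # gpu_id -> (priority, job)
        self.free = list(range(n_gpus))
        self.seq = 0

    def submit(self, job, priority):
        heapq.heappush(self.queue, (priority, self.seq, job))
        self.seq += 1
        self._dispatch()

    def _dispatch(self):
        # Fill idle GPUs with the most urgent queued jobs.
        while self.queue and self.free:
            prio, _, job = heapq.heappop(self.queue)
            self.running[self.free.pop()] = (prio, job)
        # Preempt while a queued job outranks the worst running job.
        while self.queue:
            prio, _, _ = self.queue[0]
            victim = max(self.running, key=lambda g: self.running[g][0], default=None)
            if victim is None or self.running[victim][0] <= prio:
                break
            _, _, job = heapq.heappop(self.queue)
            v_prio, v_job = self.running.pop(victim)
            heapq.heappush(self.queue, (v_prio, self.seq, v_job))  # requeue victim
            self.seq += 1
            self.running[victim] = (prio, job)

sched = ProverScheduler(n_gpus=1)
sched.submit("bridge-batch", priority=1)
sched.submit("rollup-block", priority=0)   # preempts the bridge job
print(sched.running)   # → {0: (0, 'rollup-block')}
```

In practice preemption also has to checkpoint or discard the victim's partial proof; this sketch only models the queueing decision.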

State Management

Proving workloads are stateful — they need access to circuit parameters (often gigabytes of trusted setup data), witness data from the sequencer, and sometimes intermediate state from previous proving rounds. This state needs to be available on the machine where the workload runs, which means either pre-distributing data or maintaining fast shared storage.
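One way to pre-distribute that state is to stage parameters on each node keyed by content hash, so a stale or corrupted copy is never handed to a prover. The cache layout and names below are illustrative assumptions:

```python
import hashlib
import pathlib
import tempfile

# Sketch of pre-staging circuit parameters on a prover node, keyed by
# content hash. Layout and names are illustrative assumptions.
def stage(cache_dir, name, data: bytes):
    """Write params once; later rounds reuse the cached copy."""
    digest = hashlib.sha256(data).hexdigest()
    path = pathlib.Path(cache_dir) / f"{name}-{digest[:16]}.params"
    if not path.exists():        # already staged from a previous round
        path.write_bytes(data)
    return path, digest

def verify(path, expected_digest):
    """Re-hash before use so bit rot or partial writes are caught."""
    return hashlib.sha256(path.read_bytes()).hexdigest() == expected_digest

with tempfile.TemporaryDirectory() as d:
    path, digest = stage(d, "groth16-circuit", b"\x00" * 1024)
    assert verify(path, digest)  # safe to hand to the prover process
```

For multi-gigabyte trusted-setup files, the same idea applies with chunked hashing and a shared object store behind the per-node cache.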

Instant Workload Switching

ZK prover networks often incentivize speed: the first prover to submit a valid proof earns the reward. Our orchestration system can switch GPU workloads in seconds — killing the current process, loading new circuit parameters, and beginning proof generation for a different network. This flexibility is critical when operating across multiple ZK ecosystems.

Monitoring GPU Infrastructure

GPU monitoring has unique requirements compared to traditional infrastructure monitoring:

  • GPU utilization and memory — Track per-GPU utilization, memory usage, and temperature. Low utilization might indicate a stuck workload; high temperature indicates cooling issues.
  • Proof throughput — Proofs per minute per GPU. This is the primary performance metric. Sudden drops indicate software issues, thermal throttling, or hardware degradation.
  • Proof validity — Sample-verify generated proofs to catch hardware errors before they impact on-chain submissions.
  • Power consumption — GPU workloads are power-intensive. Track power draw per node to detect anomalies and manage datacenter power budgets.
  • PCIe bandwidth — Monitor host-to-GPU data transfer rates. Bottlenecks here indicate storage or system bus issues.
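The proof-throughput metric is the one worth automating an alert on. A rolling-window sketch, where the window sizes and the 0.7 degradation ratio are illustrative operational choices:

```python
from collections import deque

# Rolling proof-throughput monitor: flag when the recent rate drops
# well below the longer-term baseline. Window sizes and the 0.7 ratio
# are illustrative operational choices.
class ThroughputMonitor:
    def __init__(self, window=10, baseline=60, ratio=0.7):
        self.recent = deque(maxlen=window)    # proofs/min samples
        self.history = deque(maxlen=baseline)
        self.ratio = ratio

    def record(self, proofs_per_min):
        self.recent.append(proofs_per_min)
        self.history.append(proofs_per_min)

    def degraded(self):
        if len(self.history) < self.history.maxlen:
            return False                      # not enough baseline yet
        recent = sum(self.recent) / len(self.recent)
        baseline = sum(self.history) / len(self.history)
        return recent < self.ratio * baseline

mon = ThroughputMonitor()
for _ in range(60):
    mon.record(12)
for _ in range(10):
    mon.record(6)          # sudden drop: throttling or a stuck workload
print(mon.degraded())      # → True
```

Comparing against a per-GPU baseline rather than a fixed number keeps the alert meaningful across mixed fleets of consumer and datacenter cards.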

Cost Optimization

GPU infrastructure is expensive. Optimizing cost while maintaining proving performance is an ongoing operational challenge:

  • Right-sizing GPUs — Not every workload needs H100s. Many circuits fit in 24 GB of VRAM and run efficiently on consumer GPUs that cost a fraction of datacenter cards.
  • Multi-tenancy — When one ZK network's proving demand is low, those GPUs should be serving another workload — whether that's a different ZK prover, AI inference, or model training.
  • Bare metal vs. cloud — For sustained GPU workloads, bare metal hosting is typically 60-70% cheaper than cloud GPU instances. The tradeoff is operational complexity.
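The bare-metal arithmetic is simple enough to sanity-check directly. The hourly rates below are placeholder assumptions, not quotes:

```python
# Illustrative monthly cost for one sustained GPU. Hourly rates are
# placeholder assumptions, not quotes.
HOURS_PER_MONTH = 730

def monthly(rate_per_hour):
    return rate_per_hour * HOURS_PER_MONTH

cloud = monthly(4.00)        # hypothetical on-demand datacenter-GPU rate
bare_metal = monthly(1.40)   # hypothetical amortized bare-metal rate
savings = 1 - bare_metal / cloud
print(f"{savings:.0%}")      # → 65%
```

The gap only holds for sustained utilization; bursty proving demand can flip the comparison back toward on-demand cloud capacity.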

What We've Learned Operating ZK GPU Infrastructure

After months of operating GPU clusters for multiple ZK prover networks, here are the lessons that aren't in any documentation:

  • Driver updates break things more often than CUDA code changes. Test every driver update on a staging GPU before rolling out.
  • GPU memory errors are more common than you'd expect. ECC monitoring catches them before they produce invalid proofs.
  • Thermal management is an operational discipline. GPUs that consistently run at 85 °C degrade faster and throttle unpredictably.
  • The difference between good and bad proving performance often comes down to data pipeline optimization — getting witness data to the GPU fast enough, not the proof computation itself.
  • Container orchestration for GPUs is still maturing. We've built custom tooling on top of standard orchestrators to handle GPU-specific scheduling and lifecycle management.

Need GPU infrastructure for ZK proving?

Merkle Labs operates multi-node GPU clusters with custom fleet orchestration for ZK proof generation. Consumer and datacenter GPUs available. Talk to us about your proving infrastructure.