Benchmarking LLM Inference on NVIDIA B200, H200, H100, and RTX PRO 6000

CloudRift’s Post

CloudRift


3mo

B200 vs H200 vs H100 vs RTX PRO 6000. Who is the king of cost-efficient inference?

Dmitry Trifonov

3mo

NVIDIA's Blackwell architecture has recently landed in datacenters with the B200 and the RTX PRO 6000, promising major improvements in both performance and efficiency over the previous Hopper generation. But how do these gains translate to real-world LLM inference? I present an LLM inference throughput benchmark comparing the RTX PRO 6000 SE, H100, H200, and B200. It was run with vLLM, using vllm serve to host the model and vllm bench serve to generate load and measure throughput.

The Pro 6000 is significantly cheaper despite having the latest Blackwell architecture, but it uses slower GDDR7 memory (instead of HBM) and lacks NVLink. The B200 has much stronger specs than all the others, but does its premium price make sense compared with more affordable alternatives? https://lnkd.in/guABDz26
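To make "cost-efficient" concrete, here is a minimal sketch of how benchmark throughput and rental price can be combined into a single comparable number: dollars per million output tokens. It assumes you have an aggregate output throughput (tokens/s) from a run such as vllm bench serve and an hourly price covering every GPU used in that run. The GpuRun class, the function names, and the example figures are hypothetical illustrations, not results from this benchmark.

```python
from dataclasses import dataclass


@dataclass
class GpuRun:
    name: str
    output_tokens_per_s: float  # aggregate output throughput measured by the benchmark
    price_per_hour_usd: float   # hourly rental price for ALL GPUs used in the run


def cost_per_million_tokens(run: GpuRun) -> float:
    """USD spent to generate one million output tokens at the measured throughput."""
    tokens_per_hour = run.output_tokens_per_s * 3600
    return run.price_per_hour_usd / tokens_per_hour * 1_000_000


def rank_by_cost(runs: list[GpuRun]) -> list[tuple[str, float]]:
    """Sort runs from cheapest to most expensive per million output tokens."""
    return sorted(
        ((r.name, cost_per_million_tokens(r)) for r in runs),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not measured results.
    runs = [
        GpuRun("gpu-b", output_tokens_per_s=2000.0, price_per_hour_usd=9.0),
        GpuRun("gpu-a", output_tokens_per_s=1000.0, price_per_hour_usd=3.6),
    ]
    for name, cost in rank_by_cost(runs):
        print(f"{name}: ${cost:.2f} per 1M output tokens")
```

With these made-up inputs, gpu-a costs about $1.00 per million tokens and gpu-b about $1.25, so the slower but cheaper card wins on cost even though gpu-b has twice the throughput. This is the trade-off the question above is asking about.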

Benchmarking LLM Inference on NVIDIA B200, H200, H100, and RTX PRO 6000 (medium.com)
