CloudRift’s Post

1,451 followers

1w Edited

V100 32GB VMs are now on CloudRift at $0.29 per GPU/hour, in partnership with Cato Digital.

A few things worth knowing:

The V100 is older silicon (Volta, launched 2017) but still capable. A single 32GB V100 fits a LoRA fine-tune of Llama 3 8B, Whisper Large inference, or a batch embedding pipeline. For workloads that do not require Hopper-class hardware, it is more than enough.

The same GPU costs more than $3 per GPU/hour on AWS or Azure, and the 32GB variant is usually sold only in 8-GPU bundles. CloudRift offers it as a single-GPU VM, billed by the second, with no minimums.

The capacity comes from Cato Digital. Rather than commissioning new servers, they redeploy enterprise GPU systems retired from Meta and NVIDIA fleets, which extends the hardware's operational life and gives it a smaller carbon footprint than buying new.

If you are running fine-tuning, batch inference, rendering, or scientific compute on a budget, this is roughly a tenfold reduction in compute cost for workloads where it fits.

Thanks to the Cato Digital team and Colin Murcray for the partnership!

→ https://lnkd.in/gPChGrQM

#GPU #AI #SustainableAI #MLOps
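The price and memory claims above are easy to sanity-check. A minimal sketch, assuming fp16 base weights and a flat overhead allowance for LoRA adapters, optimizer state and activations; that `overhead_gb` value is a hypothetical placeholder, not a measurement, and the real figure depends on batch size, sequence length and whether gradient checkpointing is on. The $3.00 hyperscaler price is the post's lower bound, so the computed ratio is a lower bound too.

```python
# Back-of-envelope checks for the V100 32GB offer. Prices come from the post;
# the LoRA overhead allowance is a hypothetical placeholder, not a measurement.

CLOUDRIFT_PER_GPU_HOUR = 0.29    # USD, from the post
HYPERSCALER_PER_GPU_HOUR = 3.00  # USD; the post says "above $3", so a lower bound
V100_VRAM_GB = 32


def job_cost(seconds: int, price_per_hour: float) -> float:
    """Cost of a job billed per second with no minimum charge."""
    return seconds * price_per_hour / 3600


def lora_vram_estimate_gb(params_billion: float,
                          bytes_per_param: int = 2,
                          overhead_gb: float = 6.0) -> float:
    """Rough VRAM need for LoRA: frozen base weights plus a flat allowance.

    bytes_per_param=2 assumes fp16 weights (Volta has no native bf16).
    overhead_gb is a hypothetical allowance for adapters, optimizer state and
    activations; in practice it grows with batch size and sequence length.
    """
    return params_billion * bytes_per_param + overhead_gb


if __name__ == "__main__":
    seconds = 90 * 60  # a hypothetical 90-minute fine-tuning run
    print(f"CloudRift:   ${job_cost(seconds, CLOUDRIFT_PER_GPU_HOUR):.2f}")
    print(f"Hyperscaler: ${job_cost(seconds, HYPERSCALER_PER_GPU_HOUR):.2f} or more")
    ratio = HYPERSCALER_PER_GPU_HOUR / CLOUDRIFT_PER_GPU_HOUR
    print(f"Price ratio: {ratio:.1f}x or more")
    need = lora_vram_estimate_gb(8.0)  # Llama 3 8B, roughly 8B parameters
    print(f"LoRA estimate: {need:.0f} GB of {V100_VRAM_GB} GB")
```

At these list prices a 90-minute run costs well under a dollar on CloudRift, and with per-second billing a job that finishes early is charged only for the seconds it used.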

1 Comment

Colin Murcray 1w

Very excited about this partnership with CloudRift! Take our GPUs for a test drive on their platform. You can’t beat this price and performance.


More Relevant Posts

Dmitry Trifonov
1w
An interesting idea from Cato Digital: upcycle V100 GPUs into very capable 16-GPU, NVLink-connected rigs with 512 GB of VRAM and 1.3 TB of RAM. Excited to check out what these machines can do.
Heiko Polinski
1w
If you're a founder, ML engineer, or researcher running compute on a budget: we just put V100 32GB cards on CloudRift at $0.29/hr, about 10x cheaper than AWS. The hardware is refurbished from retired Meta and NVIDIA fleets, so the carbon footprint is smaller too.
Claudia P.
1mo
AI spending on infrastructure will hit $1.37 trillion in 2026. That's 54% of total AI investment.

The numbers tell a clear story: infrastructure is where AI lives or dies.

NetApp just earned a spot on CRN's AI 100 list for Infrastructure and Edge Computing, and for good reason. While others focus on compute, we're solving the data problem.

We've partnered with NVIDIA to deliver DGX SuperPOD-certified systems. Translation: your GPUs get fed the data they need, when they need it.

The AI race isn't won by the fastest chip. It's won by the smartest data strategy.

The 25 Hottest Infrastructure And Edge Computing Companies: The 2026 CRN AI 100 crn.com
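The two figures in the post imply the size of the total. A quick worked check, assuming the 54% share is measured against total 2026 AI investment:

```latex
\text{Total AI investment}_{2026} \approx \frac{\$1.37\ \text{trillion}}{0.54} \approx \$2.54\ \text{trillion}
```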
Andre Serafim
3w
🚀 Training Larger LLMs on TPUs More Efficiently and at Lower Cost

As large language models scale to hundreds of billions of parameters, accelerator memory has become one of the main bottlenecks driving cost and performance challenges during training.

A new post on the Google Open Source Blog, written in collaboration with Intel, shows how to address this challenge with host offloading in JAX, using the large memory capacity and compute power of Intel® Xeon® processors (5th and 6th Gen, with AMX) alongside TPUs on Google Cloud.

💡 Key takeaways:
+ Selectively offloading activations (e.g., the Q/K/V projections) from TPU memory to host memory significantly reduces accelerator memory pressure.
+ It avoids excessive recomputation (rematerialization), improving overall throughput.
+ Reported results show up to ~10% reduction in training time, which translates directly into fewer TPU core-hours and lower TCO.
+ The approach is especially effective for large models such as PaliGemma2 28B and Llama2-13B, using JAX and MaxText.

This kind of hybrid CPU + accelerator architecture shows how software and infrastructure decisions directly affect cost, scale, and sustainability for generative AI workloads.

🔍 A must-read for anyone working on LLM training, TPUs, JAX, FinOps, or AI cost optimization:
👉 https://lnkd.in/dcFti8Se

#AI #LLM #TPU #JAX #IntelXeon #GoogleCloud #CloudComputing #FinOps #OpenSource #AIInfrastructure
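The trade-off behind the first two takeaways can be put in numbers. A minimal sketch, assuming standard multi-head attention (Q, K and V each shaped [batch, seq_len, hidden]), bf16 activations, and ignoring transient buffers; the shape in the usage lines is hypothetical. It compares three ways to handle the Q/K/V activations the backward pass needs: keep them on the accelerator, offload them to host memory (paying in transfers), or rematerialize them (paying in FLOPs). In JAX itself this choice is expressed through `jax.checkpoint` (remat) policies; the code only models the accounting and does not use the JAX API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    batch: int
    seq_len: int
    hidden: int
    bytes_per_elem: int = 2  # bf16 activations (assumption)


def qkv_bytes(s: Shape) -> int:
    # Q, K and V are each [batch, seq_len, hidden] under standard multi-head
    # attention (assumption: no grouped-query attention).
    return 3 * s.batch * s.seq_len * s.hidden * s.bytes_per_elem


def qkv_matmul_flops(s: Shape) -> int:
    # Three [tokens, hidden] x [hidden, hidden] matmuls, 2 FLOPs per multiply-add.
    return 3 * 2 * s.batch * s.seq_len * s.hidden * s.hidden


def policy_cost(s: Shape, n_layers: int, policy: str) -> dict:
    """Per-step cost of keeping Q/K/V activations available for the backward pass."""
    act = n_layers * qkv_bytes(s)
    if policy == "save":     # keep everything on the accelerator
        return {"device_bytes": act, "host_transfer_bytes": 0, "recompute_flops": 0}
    if policy == "offload":  # copy to host in forward, copy back in backward
        return {"device_bytes": 0, "host_transfer_bytes": 2 * act, "recompute_flops": 0}
    if policy == "remat":    # drop in forward, recompute in backward
        return {"device_bytes": 0, "host_transfer_bytes": 0,
                "recompute_flops": n_layers * qkv_matmul_flops(s)}
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    shape = Shape(batch=1, seq_len=2048, hidden=5120)  # hypothetical shape
    for name in ("save", "offload", "remat"):
        print(name, policy_cost(shape, n_layers=40, policy=name))
```

Offloading wins when the host link can move those bytes while the accelerator is busy with other work, so the transfer hides behind compute instead of adding recomputation to the backward pass.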
Kinjal Vaishnav
1w
Most people think the bottleneck in AI training is compute. It's not. It's data movement.

When you're training a model across 1,000 GPUs, every single training step requires syncing gradients between all of them. With normal networking, that means:
→ The CPU gets interrupted
→ Data is copied into a kernel buffer
→ Copied again to the NIC
→ Sent over the network
→ The remote CPU wakes up
→ More copies on the other side

That's 4 copies and 2 CPU wakeups for every sync, across thousands of GPUs, over and over on every training step.

This is why RDMA exists.

---

RDMA (Remote Direct Memory Access) lets one machine read or write directly into another machine's registered memory, bypassing the remote CPU and the OS kernel, with no intermediate data copies. The NIC moves the data at hardware speed, straight between memory regions, over the network.

Latency drops from ~100 microseconds (TCP) to ~1 microsecond (RDMA). Remote CPU cycles used: 0.

---

In AI infrastructure, this shows up everywhere:
🔷 NCCL (NVIDIA's GPU communication library) uses RDMA for all-reduce when the network supports it
🔷 GPUDirect RDMA: data goes Network → NIC → GPU memory, skipping system RAM entirely
🔷 AWS EFA, Azure RDMA, GCP GPUDirect: all major cloud providers expose this to serious AI workloads
🔷 Major GPU-trained LLMs such as GPT-4 and Llama were trained on clusters where RDMA is the fabric

---

And it connects to SmartNICs too. Modern SmartNICs (NVIDIA BlueField, AMD Pensando) combine RDMA offload with programmable compute. They're not just fast pipes. They're intelligent fabric nodes.

---

If you work in infra, ML systems, HPC, or cloud networking, RDMA isn't optional knowledge anymore. It's the invisible layer that makes large-scale AI possible.

🖼️ Built a full visual explainer: normal TCP vs RDMA, how it works, protocols (InfiniBand / RoCE / iWARP), the GPU connection, and who uses it.

#RDMA #InfiniBand #RoCE #GPUDirect #AIInfrastructure #Networking #HPC #MLSystems #CloudInfrastructure #SmartNIC
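The gradient sync described above is an all-reduce, and ring all-reduce is one of the algorithms NCCL can use for it. A minimal single-process simulation of the ring version (no networking and no RDMA; each "worker" is just a Python list), plus the textbook alpha-beta estimate of how long it takes. The latencies in the usage lines are the post's rough ~100 µs and ~1 µs figures, used for illustration only.

```python
from typing import List


def ring_allreduce(buffers: List[List[int]]) -> List[List[int]]:
    """Sum-all-reduce over n simulated workers arranged in a ring.

    Each worker holds a vector whose length is divisible by n. Returns the
    final buffer of every worker; the inputs are left unmodified.
    """
    n = len(buffers)
    size = len(buffers[0])
    if size % n:
        raise ValueError("vector length must be divisible by the worker count")
    chunk = size // n
    data = [list(b) for b in buffers]

    def region(c: int) -> slice:
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: in step s, worker i sends chunk (i - s) mod n to
    # worker i + 1, which adds it to its own copy. After n - 1 steps worker i
    # holds the fully summed chunk (i + 1) mod n.
    for s in range(n - 1):
        sends = [(i, (i - s) % n) for i in range(n)]
        payloads = [data[i][region(c)] for i, c in sends]  # snapshot before writes
        for (i, c), payload in zip(sends, payloads):
            dst = (i + 1) % n
            data[dst][region(c)] = [a + b for a, b in zip(data[dst][region(c)], payload)]

    # Phase 2, all-gather: pass the finished chunks around the ring so every
    # worker ends up with every fully summed chunk.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n) for i in range(n)]
        payloads = [data[i][region(c)] for i, c in sends]
        for (i, c), payload in zip(sends, payloads):
            data[(i + 1) % n][region(c)] = payload
    return data


def ring_allreduce_seconds(n: int, message_bytes: float,
                           latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Alpha-beta estimate: 2(n - 1) sequential steps, each paying one message
    latency plus the time to move message_bytes / n over the link."""
    return 2 * (n - 1) * (latency_s + message_bytes / n / bandwidth_bytes_per_s)


if __name__ == "__main__":
    workers = [[10 * w + k for k in range(8)] for w in range(4)]
    print(ring_allreduce(workers)[0])
    for name, latency in (("TCP ~100 us", 100e-6), ("RDMA ~1 us", 1e-6)):
        t = ring_allreduce_seconds(1000, 0, latency, 1.0)  # latency term only
        print(f"{name}: {t * 1e3:.1f} ms of pure latency per all-reduce")
```

With 1,000 workers the ring has 1,998 sequential steps, so in this model per-message latency alone adds roughly 200 ms per all-reduce at ~100 µs, versus about 2 ms at ~1 µs, before any bandwidth cost. That latency term is also why NCCL implements tree-based algorithms alongside rings for large jobs.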
Mark Hirsch
1w
𝗜𝗻𝘃𝗲𝗻𝘁𝗲𝗰 𝗷𝘂𝘀𝘁 𝗰𝗮𝗹𝗹𝗲𝗱 𝘁𝗵𝗲 𝗴𝗲𝗻𝗲𝗿𝗮𝗹-𝗽𝘂𝗿𝗽𝗼𝘀𝗲 𝘀𝗲𝗿𝘃𝗲𝗿 𝗺𝗮𝗿𝗸𝗲𝘁 𝗯𝗮𝗰𝗸 𝗳𝗿𝗼𝗺 𝘁𝗵𝗲 𝗱𝗲𝗮𝗱.

The Taiwanese ODM is forecasting AI server growth this year and general-purpose server orders extending through 2028, per DigiTimes.

That second part is the real story.

General-purpose server demand has been flat for two years while hyperscalers poured everything into AI infrastructure. Enterprises kept running seven-year-old servers instead of refreshing. The market wrote off CPU-based capacity as yesterday's problem.

Inventec sees it coming back, and not just as a blip: orders run through 2028.

The timing matters. This is not one segment cooling off while another heats up. Inventec is saying both categories will pull on the supply chain simultaneously: AI servers for GPU clusters, and general-purpose servers for the CPU infrastructure that agentic AI workloads actually need to function at scale.

DigiTimes Research reported last week that cloud providers are adjusting the ratio between GPUs and CPUs because agentic tools rely more heavily on traditional compute than anyone expected. That lines up with what Inventec is forecasting.

If both segments are competing for the same fab capacity, HBM supply, power delivery components, and cooling systems, every bottleneck that exists today gets worse. TSMC does not have idle capacity sitting around. Neither does the rest of the supply chain.

The dual-demand scenario also kills the narrative that enterprises waiting for cheaper servers will get relief. If Inventec is right, lead times stay long and pricing stays firm through 2028.

Inventec's customers include major US and Chinese CSPs as well as tier-two cloud providers. Their forecast is not speculative. They are an ODM: they build what customers commit to buy.

Worth watching whether Wistron, Quanta, and Foxconn echo this view over the next quarter. If they do, the supply chain just got a lot tighter than the market thinks.
Click here to subscribe to The Hirsch Report: https://lnkd.in/gyzanhxn Source Article: https://lnkd.in/g_7Yv4pF #AI #Servers #SupplyChain #Inventec #DataCenter #CloudComputing #Infrastructure #HirschReport #DeepFactChecked
Smart Systems, Inc.

270 followers
3w
Hidden Bottlenecks in AI Infrastructure: Why GPU as a Service Solves More Than Just Compute https://bit.ly/4ucnoMb #GPUaaS #AIInfrastructure #HiddenBottlenecks #CloudGPU #ScalableAI #AIOptimization #HighPerformanceComputing #MLOps #AIDeployment #ComputePower #SmartSystems

Hidden Bottlenecks in AI Infrastructure: Why GPU as a Service Solves More Than Just Compute https://www.smartsystems.ai
Ludovic Rota
4w
"According to data collected from 23,000 clusters across thousands of companies using Cast AI's AI agent, average GPU utilization across enterprise servers is at 5%, meaning roughly 95% of provisioned GPU capacity is not being used. CPU utilization, the report says, is similarly low, at 8% of total capacity." #technology #ai https://lnkd.in/eiM9Z_sj

Companies are hoarding AI compute because of FOMO — and they're sitting on most of it businessinsider.com
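The 5% figure is a ratio of used to provisioned capacity, and how it is averaged across clusters matters; the quoted excerpt does not say which averaging the report uses. A minimal sketch with hypothetical cluster numbers (not data from the report) that contrasts a capacity-weighted average with a simple mean of per-cluster percentages, and prices the idle share at an assumed flat hourly rate:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Cluster:
    provisioned_gpu_hours: float
    used_gpu_hours: float


def weighted_utilization(clusters: Sequence[Cluster]) -> float:
    """Total used GPU-hours divided by total provisioned GPU-hours."""
    provisioned = sum(c.provisioned_gpu_hours for c in clusters)
    used = sum(c.used_gpu_hours for c in clusters)
    return used / provisioned if provisioned else 0.0


def mean_utilization(clusters: Sequence[Cluster]) -> float:
    """Unweighted mean of per-cluster utilization; a tiny cluster counts as much as a huge one."""
    ratios = [c.used_gpu_hours / c.provisioned_gpu_hours
              for c in clusters if c.provisioned_gpu_hours]
    return sum(ratios) / len(ratios) if ratios else 0.0


def idle_cost(clusters: Sequence[Cluster], price_per_gpu_hour: float) -> float:
    """Cost of the unused share of provisioned capacity at a flat hourly price."""
    idle = sum(c.provisioned_gpu_hours - c.used_gpu_hours for c in clusters)
    return idle * price_per_gpu_hour


if __name__ == "__main__":
    fleet = [Cluster(1000, 20), Cluster(10, 8)]  # hypothetical numbers
    print(f"weighted: {weighted_utilization(fleet):.1%}")
    print(f"mean:     {mean_utilization(fleet):.1%}")
    print(f"idle cost at an assumed $3/GPU-hour: ${idle_cost(fleet, 3.0):,.2f}")
```

The same hypothetical fleet reads as about 2.8% utilized by capacity but 41% by simple mean, so the averaging method has to be known before two utilization figures can be compared.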