Hacking a Server-Grade NVIDIA GPU Into a Home Desktop
Oscar Molnar scored an old data center NVIDIA V100 to crush AI workloads on a budget — but it took some work to install it in a desktop.
What's the deal with computer hardware lately? Memory and storage prices have gone stratospheric. A top-of-the-line Raspberry Pi 5 now costs more than $300. Don’t even ask about a GPU — if you have to ask, you probably can’t afford one. With the days of $35 single-board computers and dirt-cheap RAM behind us, we have to start getting creative again, like back when turning Wi-Fi routers into computers was a big thing.
That’s what Oscar Molnar is doing, and it landed him a nice deal on a GPU. For about $250, he picked up an NVIDIA Tesla V100 SXM2 16GB module. The trick is that this GPU was plucked from a data center. While it still has a lot of life left in it, it is obsolete for commercial applications. But repurposing a server-grade GPU for home use isn’t easy. There is no PCIe connector, and the power connector isn’t what you’d expect either.
The Tesla V100 SXM2 was originally designed for NVIDIA DGX servers and hyperscale computing systems. Unlike a normal graphics card, it plugs into a proprietary socket and relies on specialized server hardware for power, cooling, and communication. Fortunately for Molnar, third-party SXM2-to-PCIe adapter boards are available. By combining one of these adapters with the secondhand V100, he was able to install the card alongside his existing RTX 4080.
The V100 may date back to 2017, but it still packs 16GB of HBM2 memory and an impressive 900GB/s of memory bandwidth. That actually exceeds the roughly 717GB/s available on the more recent RTX 4080. For AI inference workloads, bandwidth is often the limiting factor: generating each token means streaming most of the model's weights out of memory, so the rate at which the GPU can read them puts a ceiling on tokens per second.
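To see why bandwidth matters so much, here is a rough back-of-the-envelope sketch in Python. It assumes a dense model generating one token at a time, where every token requires reading all of the weights once, so memory bandwidth divided by model size gives an upper bound on speed. The 8GB model size is a made-up illustration, not a figure from Molnar's setup, and the RTX 4080 number is NVIDIA's published spec rather than anything he measured.

```python
def max_tokens_per_second(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound ceiling on decode speed.

    Assumes each generated token reads every weight exactly once,
    and ignores compute time, KV cache reads, and other overhead.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


# Memory bandwidth in GB/s.
GPUS = {
    "Tesla V100 SXM2": 900.0,  # figure quoted in the article
    "RTX 4080": 716.8,         # NVIDIA's published spec
}

# Hypothetical size of the quantized weights, for illustration only.
HYPOTHETICAL_WEIGHTS_GB = 8.0

for name, bandwidth in GPUS.items():
    ceiling = max_tokens_per_second(bandwidth, HYPOTHETICAL_WEIGHTS_GB)
    print(f"{name}: at most ~{ceiling:.1f} tokens/s")
```

Real-world throughput will land below this ceiling, but the ratio between the two cards shows why an older GPU with fast HBM2 can hold its own for inference.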
However, there was still some work to be done to make this GPU suitable for home use. The adapter’s cooling fan was about as loud as a vacuum cleaner — Molnar measured the stock setup at 82 dB. After some experimentation, he discovered that the fan used standard PWM control signals despite its unusual connector. A handful of jumper wires and a custom cable allowed the fan to be connected directly to a motherboard header, reducing the noise dramatically while keeping temperatures below 50°C under load.
With both GPUs installed, Molnar’s system now has 32GB of combined VRAM, 16GB on each card. Using tensor splitting in llama.cpp, large language models can be distributed across both devices, with each GPU holding part of the model. In one test, a quantized 27-billion-parameter Qwen3.6 model with a 128,000-token context window achieved 32 tokens per second during inference.
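For the curious, a tensor-split launch might look something like the Python sketch below, which just assembles a llama.cpp command line. The flags (-m, --n-gpu-layers, --tensor-split, --ctx-size) are real llama.cpp options, but the model filename, the even 1,1 split, and the layer count are assumptions for illustration. The article does not list the exact options Molnar used.

```python
import shlex


def build_llama_command(
    model_path: str,
    tensor_split=(1, 1),       # relative share of the model per GPU (assumed even)
    ctx_size: int = 128000,    # context window mentioned in the article
    gpu_layers: int = 99,      # large value = offload every layer (assumption)
    binary: str = "llama-server",
) -> list[str]:
    """Assemble a llama.cpp command that spreads a model over several GPUs."""
    split = ",".join(str(part) for part in tensor_split)
    return [
        binary,
        "-m", model_path,
        "--n-gpu-layers", str(gpu_layers),
        "--tensor-split", split,
        "--ctx-size", str(ctx_size),
    ]


# "model.gguf" is a placeholder filename, not the file Molnar used.
cmd = build_llama_command("model.gguf")
print(shlex.join(cmd))
```

The split values are proportions rather than gigabytes, and they follow the order in which the GPUs are enumerated, which varies from system to system. Two 16GB cards suggest an even split as a starting point, though the card that also drives the display may need a slightly smaller share.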
The software setup required some hacking, particularly because newer NVIDIA drivers have dropped support for the Volta architecture used by the V100. By pinning specific driver, kernel, and CUDA versions under NixOS, Molnar was able to get both the RTX 4080 and V100 working together reliably.
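When juggling driver versions like this, it helps to confirm what the installed driver actually sees. The Python sketch below parses the output of nvidia-smi and flags Volta cards by their compute capability (7.0 for the V100). It assumes a driver recent enough to support the compute_cap query field. The sample text is illustrative, not output captured from Molnar's machine.

```python
import subprocess

# Real nvidia-smi options; compute_cap is only available on newer drivers.
QUERY = ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"]


def parse_gpus(csv_text: str) -> list[tuple[str, str]]:
    """Turn 'name, compute_cap' lines into (name, compute_cap) pairs."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, cap = line.rsplit(",", 1)
        gpus.append((name.strip(), cap.strip()))
    return gpus


def is_volta(compute_cap: str) -> bool:
    """The V100 reports compute capability 7.0."""
    return compute_cap == "7.0"


def list_gpus() -> list[tuple[str, str]]:
    """Query the live system; requires nvidia-smi on the PATH."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpus(result.stdout)


# Illustrative sample of what a dual-GPU system might report.
SAMPLE = "NVIDIA GeForce RTX 4080, 8.9\nTesla V100-SXM2-16GB, 7.0\n"

for name, cap in parse_gpus(SAMPLE):
    note = " (Volta, needs a driver that still supports it)" if is_volta(cap) else ""
    print(f"{name}: compute capability {cap}{note}")
```

On a real machine, calling list_gpus() in place of the sample text would show whether both cards are visible to the pinned driver.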
This kind of project certainly isn’t for everyone, but it is a good reminder that if you are willing to get creative, you can still get plenty of computing power for a reasonable price.