NVIDIA Unveils the 30-PetaFLOP Rubin CPX for Million-Token-Scale Context Windows

LLMs, VLMs, and more given a big boost in context processing — when the hardware launches late next year, at least.

NVIDIA has announced a new graphics processor that, it hopes, will provide the computational power required for "massive-context processing" in artificial intelligence systems — to a claimed million-token scale.

"The Vera Rubin platform will mark another leap in the frontier of AI computing — introducing both the next-generation Rubin GPU and a new category of processors called CPX," Jensen Huang, NVIDIA founder and chief executive officer, says of the company's latest launch. "Just as RTX revolutionized graphics and physical AI, Rubin CPX is the first CUDA GPU purpose-built for massive-context AI, where models reason across millions of tokens of knowledge at once."

The large language models (LLMs) underpinning the current AI boom are statistical token manipulators: trained on vast troves of often-illegitimately-gained data, they boil everything down into "tokens" — then, when presented with an input prompt that has itself been turned into tokens, respond with the most statistically likely tokens by way of continuation. If all has gone well, those tokens represent an answer to your query; otherwise, they represent an answer-shaped object that (the LLM being entirely incapable of anything resembling thought or reasoning, regardless of marketing departments' claims otherwise) bears little or no resemblance to facts or reality.
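That "most statistically likely continuation" idea can be illustrated with a deliberately tiny toy — a bigram counter, nothing like a real transformer, with a made-up corpus chosen purely for demonstration:

```python
# Toy illustration only (not a real LLM): count which token most often
# follows each token, then emit the statistically likeliest continuation.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

# "Training": tally each observed (token, next-token) pair.
follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

def next_token(token):
    """Return the most frequently observed continuation, if any."""
    candidates = follows.get(token)
    return candidates.most_common(1)[0][0] if candidates else None

print(next_token("the"))  # → "cat" ("cat" followed "the" twice, "mat" once)
```

A real model predicts over tens of thousands of tokens using billions of learned weights rather than raw counts, but the principle — continuation by statistical likelihood, not understanding — is the same.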

The more tokens you can provide, the more likely the answer-shaped token stream provided will be of use — but the computational complexity increases, leaving most models limited to relatively small "context windows." That's where Rubin, named for astronomer and physicist Vera Rubin, comes in, with NVIDIA claiming it provides a way to scale LLMs and other generative AI models — including image and video generation models, which work similarly — to context windows of up to a million tokens.
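The computational complexity mentioned above comes largely from self-attention, which compares every token in the context against every other — so the work grows quadratically with context length. A back-of-the-envelope sketch (illustrative figures, not NVIDIA's):

```python
# Naive self-attention performs on the order of n-squared token-pair
# comparisons for a context of n tokens.
def attention_pairs(context_len):
    """Token-pair comparisons for a naive attention pass over the context."""
    return context_len ** 2

for n in (4_096, 128_000, 1_000_000):
    print(f"{n:>9} tokens -> {attention_pairs(n):.2e} token-pair comparisons")
```

Going from a 128,000-token window to a million tokens is roughly a 60-fold increase in attention work — which is why long-context inference gets its own silicon.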

The Rubin CPX, NVIDIA claims, delivers up to 30 petaFLOPS (peta floating-point operations per second) of NVFP4-precision compute, and includes 128GB of GDDR7 memory — trading the bandwidth of high-bandwidth memory (HBM) for the ability to fit more capacity on the board. Compared to NVIDIA's Grace-Blackwell GB300 NVL72 systems, the company says it can deliver a tripling in attention performance — a measure of how quickly a model can process long context sequences.
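Why capacity matters so much at million-token scale: during inference the model keeps a key-value (KV) cache for every token in the context. A rough sizing sketch — every model parameter below is an assumption picked for illustration, not an NVIDIA-published figure:

```python
# Hypothetical KV-cache sizing; layers/heads/dims are assumed values
# for illustration only, not the specs of any real model.
def kv_cache_gib(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    # Each token stores one key and one value vector per layer (the 2x),
    # each of kv_heads * head_dim values at bytes_per_value precision.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value / 2**30

# e.g. an assumed model: 64 layers, 8 KV heads of 128 dims, FP16 values.
print(f"{kv_cache_gib(1_000_000, 64, 8, 128):.1f} GiB per million-token context")
```

Even with these modest assumed figures the cache runs to hundreds of gibibytes for a single million-token context — which is the pressure the CPX's 128GB of on-board GDDR7 (spread across many boards in a rack) is meant to relieve.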

The company isn't expecting anyone to make use of a single Rubin CPX, though: NVIDIA envisions the boards being combined with non-CPX Rubin GPUs and Vera CPUs, showing off a fully-stocked rack implementation dubbed the Vera Rubin NVL144 CPX — a combination of 144 Rubin CPX GPUs, 144 standard Rubin GPUs, and 36 Vera CPUs for a total of eight exaFLOPS of NVFP4 compute. While this is unlikely to be cheap, NVIDIA makes a bold claim of profitability: $100 million spent on its Rubin-based hardware could deliver, the company claims, "as much as" $5 billion in revenue.

More information on the Rubin CPX is available on the NVIDIA Developer Technical Blog; hardware is expected to become available at the end of next year — at an as-yet-unannounced price point.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.