NVIDIA Launches Chat with RTX, a Free, Personalizable Large Language Model Chatbot for GeForce GPUs

Pulling in context from local files and YouTube videos, Chat with RTX serves as a demo of NVIDIA's TensorRT-LLM RAG project.

Gareth Halfacree
3 months ago β€’ Machine Learning & AI

NVIDIA has released a free tech demo, Chat with RTX, which lets users run a customized generative artificial intelligence (gen AI) chatbot on their own machine — providing they've got an NVIDIA GeForce RTX 30-series GPU or higher with at least 8GB of video RAM (VRAM), anyway.

"Chat with RTX uses retrieval-augmented generation (RAG), NVIDIA TensorRT-LLM software, and NVIDIA RTX acceleration to bring generative AI capabilities to local, GeForce-powered Windows PCs," NVIDIA's Jesse Clayton explains. "Users can quickly, easily connect local files on a PC as a dataset to an open source large language model like Mistral or Llama 2, enabling queries for quick, contextually relevant answers."

NVIDIA wants to put a large language model on your GPU with Chat with RTX, a free tech demo for Windows machines. (πŸ“Ή: NVIDIA)

It's the customization aspect that NVIDIA hopes will make Chat with RTX stand out from the software-as-a-service offerings flooding the market: the chatbot can be linked to stores of local files — from plain text to Microsoft Word documents and PDF files — as well as to YouTube videos and playlists, providing data and context missing from its training and improving its ability to formulate useful responses.
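On the YouTube side, it's a video's transcript, not the footage, that supplies the text to index. A minimal sketch of fetching that transcript, assuming the third-party youtube-transcript-api package — NVIDIA hasn't documented which tooling Chat with RTX uses for this step:

```python
# Fetch a YouTube video's transcript so its text can be indexed as RAG
# context. Uses the third-party youtube-transcript-api package; this is an
# illustrative assumption, not necessarily the mechanism Chat with RTX uses.
from youtube_transcript_api import YouTubeTranscriptApi

def transcript_text(video_id: str) -> str:
    """Concatenate a video's caption segments into one plain-text document."""
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    return " ".join(seg["text"] for seg in segments)

# The resulting string can be chunked and embedded like any local file.
print(transcript_text("dQw4w9WgXcQ")[:200])
```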

"Since Chat with RTX runs locally on Windows RTX PCs and workstations, the provided results are fast β€” and the user’s data stays on the device," Clayton adds. "Rather than relying on cloud-based LLM services, Chat with RTX lets users process sensitive data on a local PC without the need to share it with a third party or have an internet connection."

While Chat with RTX is described by the company as a "tech demo" β€” compatible with Windows 10 or higher, NVIDIA GeForce RTX 30-series GPUs with 8GB of VRAM or higher, and the company's latest graphics card drivers β€” NVIDIA is hoping it will lead to more.

"Chat with RTX shows the potential of accelerating LLMs with RTX GPUs," Clayton says. "The app is built from the TensorRT-LLM RAG developer reference project, available on GitHub. Developers can use the reference project to develop and deploy their own RAG-based applications for RTX, accelerated by TensorRT-LLM."

Chat with RTX is now available to download on the NVIDIA website.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.