Gemma 3 Supercharges AI on the Edge

Google's Gemma 3 packs powerful AI into small models that run on phones and laptops with a huge context window and top-tier performance.

Nick Bild
Machine Learning & AI
The Gemma 3 family of large language models has just been released (📷: Google)

Google’s foray into the latest generation of generative artificial intelligence (AI) tools has been a bit rocky, to say the least. They were slow to release their first chatbot, Bard, and when they finally did, it failed to impress most users. Between that stumble, the controversy surrounding the early model’s heavy-handed guardrails, and accusations that bias had been intentionally baked in, Google fell behind the pack.

To some extent, they are still playing catch-up as a result of these early missteps. However, things are looking far better these days. The more recent Gemini models are much more capable than their predecessors, and last year’s release of the Gemma family freed these powerful language models from the confines of massive compute clusters, allowing them to run on even a typical laptop computer.

AI on the edge is where it’s at these days, so it should not come as a huge surprise that Google has released an updated family of their smallest models, called Gemma 3. These open-ish models were developed using the same research and technology that went into building the flagship Gemini 2.0 models. But the latest Gemma models differ in that they were built for speed. As such, they can run directly on devices such as phones.

Gemma 3 comes in four sizes — 1 billion, 4 billion, 12 billion, and 27 billion parameters — allowing developers to choose the best fit for their specific hardware and performance needs. Unlike larger models that require massive computing power, Gemma 3 is optimized to run efficiently on a single GPU or TPU, making it more accessible for independent developers and startups. However, Google’s definition of “open” does not necessarily match what would be expected of truly open-source software, so check the license closely before you decide to use a Gemma model commercially.
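
To put that single-GPU claim in perspective, here is a rough, back-of-the-envelope sketch of how much memory each Gemma 3 variant needs just to hold its weights at common precisions. These are weights-only estimates; activations and the KV cache add real overhead on top.

```python
# Back-of-the-envelope VRAM estimates for holding Gemma 3 weights in memory.
# Weights only: activations and the KV cache are not included.
SIZES_B = {"gemma-3-1b": 1, "gemma-3-4b": 4, "gemma-3-12b": 12, "gemma-3-27b": 27}
BYTES_PER_PARAM = {"bf16": 2, "int8": 1, "int4": 0.5}

for model, billions in SIZES_B.items():
    estimates = ", ".join(
        f"{precision}: {billions * 1e9 * nbytes / 2**30:.1f} GiB"
        for precision, nbytes in BYTES_PER_PARAM.items()
    )
    print(f"{model}: {estimates}")
```

By this rough math, even the 27B model drops to roughly 13 GiB once quantized to 4-bit, comfortably within reach of a single 24 GB consumer GPU.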

According to Google, Gemma 3 outperformed larger models like Llama 3.1 405B, DeepSeek-V3, and OpenAI’s o3-mini in their testing. It also significantly improves global language coverage, with pretrained support for over 140 languages.

One of the most notable features of Gemma 3 is its enhanced context window, which allows it to process up to 128,000 tokens at a time (the 1 billion parameter model is limited to a 32,000-token window). This makes it well-suited for handling complex tasks, including long-form text generation, document analysis, and code completion. Furthermore, the new models introduce function calling capabilities, enabling developers to create AI-driven automation workflows more effectively.
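
To get a feel for what that context window means in practice, the short sketch below counts a document’s tokens with the Gemma 3 tokenizer before deciding whether it fits. The model ID and file name are illustrative, and downloading the tokenizer requires accepting Google’s Gemma license on Hugging Face first.

```python
# Minimal sketch: check whether a long document fits in Gemma 3's context
# window before sending it to the model. Model ID and file name are
# illustrative examples, not fixed requirements.
from transformers import AutoTokenizer

CONTEXT_WINDOW = 128_000  # 32_000 for the 1B variant

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")

with open("annual_report.txt") as f:  # hypothetical long document
    document = f.read()

n_tokens = len(tokenizer(document)["input_ids"])
print(f"{n_tokens:,} tokens ({n_tokens / CONTEXT_WINDOW:.0%} of the window)")
if n_tokens > CONTEXT_WINDOW:
    print("Too long: chunk the document or summarize it in stages.")
```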

To encourage widespread adoption, Google has made it easy for developers to integrate Gemma 3 into their existing workflows. The models are compatible with popular AI frameworks, including Hugging Face Transformers, PyTorch, JAX, Keras, and Google AI Edge. NVIDIA has also made it easy to start experimenting with Gemma 3 by adding the models to their API Catalog. Additionally, developers can fine-tune Gemma 3 on platforms like Google Colab, Vertex AI, or even consumer-grade gaming GPUs.
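
As a concrete starting point, a minimal inference sketch with the Hugging Face Transformers chat pipeline might look like the following. The prompt is illustrative, the 1B instruction-tuned checkpoint is used to keep the download small, and a recent Transformers release with Gemma 3 support is assumed.

```python
# Minimal sketch: run the Gemma 3 1B instruction-tuned model locally with
# the Hugging Face Transformers chat pipeline. Prompt and generation
# settings are illustrative.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-3-1b-it",
    device_map="auto",          # use a GPU if one is available
    torch_dtype=torch.bfloat16,
)

messages = [
    {"role": "user", "content": "In two sentences, why run an LLM on-device?"}
]

result = generator(messages, max_new_tokens=96)
# The pipeline returns the full chat; the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```

Swapping in a larger variant is just a matter of changing the model ID, provided the hardware has enough memory for it.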

With the release of Gemma 3, Google has better positioned themselves to push forward in the competitive landscape of generative AI. By focusing on smaller, more efficient models that can run on everyday hardware, they aim to make AI more accessible to developers and researchers worldwide.

Nick Bild
R&D, creativity, and building the next big thing you never knew you wanted are my specialties.