The Gemma 3 Family
Google released Gemma 3 in March 2025 in four sizes: 1B, 4B, 12B, and 27B. The 4B, 12B, and 27B models accept image and text input, offer a 128k-token context window, and cover 140+ languages; the 1B model is text-only with a 32k context window. The 27B flagship is the focus here: it is the model that attracted attention for outperforming much larger models.
Benchmark Results
| Model | MMLU | Parameters | Context |
|---|---|---|---|
| Gemma 3 27B | 67.5% | 27B | 128k |
| Llama 3.3 70B | 65.4% | 70B | 128k |
| Gemma 3 12B | 62.4% | 12B | 128k |
| Qwen 2.5 32B | 71.1% | 32B | 128k |
The 27B versus 70B comparison is the headline: Gemma 3 27B (67.5%) outperforms Llama 3.3 70B (65.4%) on MMLU with less than half the parameters (about 39%). On MATH (competition mathematics), Gemma 3 27B scores 89.0%, twelve points above Llama 3.3 70B's 77.0%.
Qwen 2.5 32B (71.1%) still leads Gemma 3 27B on MMLU, so among the models compared here Qwen remains the top performer in the 27–32B range for pure language tasks. Gemma 3 27B's advantage is multimodal support combined with competitive language scores.
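To make the size comparison concrete, the short sketch below recomputes the parameter ratio and ranks the four models, using only the figures from the table above. The `models` dictionary and the variable names are illustrative and not part of any library.

```python
# Figures copied from the table above: (MMLU %, parameters in billions).
models = {
    "Gemma 3 27B": (67.5, 27),
    "Llama 3.3 70B": (65.4, 70),
    "Gemma 3 12B": (62.4, 12),
    "Qwen 2.5 32B": (71.1, 32),
}

# Parameter ratio behind the "less than half" claim.
ratio = models["Gemma 3 27B"][1] / models["Llama 3.3 70B"][1]
print(f"Gemma 3 27B has {ratio:.0%} of Llama 3.3 70B's parameters")

# Rank by MMLU score, highest first.
ranking = sorted(models, key=lambda name: models[name][0], reverse=True)
for name in ranking:
    mmlu, params = models[name]
    print(f"{name:<14} {mmlu:5.1f}%  {params:>3}B")
```

The ranking puts Qwen 2.5 32B first and Gemma 3 27B second, ahead of the much larger Llama 3.3 70B.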
Multimodal Image Understanding
```python
from transformers import AutoProcessor, Gemma3ForConditionalGeneration
import torch
from PIL import Image

model_id = "google/gemma-3-27b-it"
processor = AutoProcessor.from_pretrained(model_id)
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
).eval()

image = Image.open("product_photo.jpg")

# One user turn: the image first, then the text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe all the product details visible in this image, including brand, model, color, and condition."},
        ],
    }
]

# return_dict=True yields input_ids, attention_mask, and pixel_values together,
# which is what model.generate(**inputs) needs.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
print(processor.decode(output[0][input_len:], skip_special_tokens=True))
```
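The nested `messages` structure is the part of this example that is easiest to get wrong. As a minimal sketch, the hypothetical helper below (`build_image_messages` is not part of transformers) builds the same single-turn structure for one or more images followed by a text prompt. The file names are placeholders standing in for the PIL images you would pass in practice.

```python
from typing import Any


def build_image_messages(images: list[Any], prompt: str) -> list[dict]:
    """Build one user turn: every image first, then the text prompt."""
    content = [{"type": "image", "image": img} for img in images]
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]


# Placeholder strings; in the example above these would be PIL images.
messages = build_image_messages(["front.jpg", "back.jpg"], "Compare these two photos.")
print([part["type"] for part in messages[0]["content"]])
```

The returned list has the same shape as the hand-written `messages` above, so it can be handed to `processor.apply_chat_template` unchanged.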
140+ Language Support
Gemma 3 was trained on a significantly more multilingual dataset than Gemma 2. Llama 3.3 70B is primarily English with good multilingual coverage, whereas Gemma 3 was explicitly designed for breadth: 140+ languages, with an emphasis on quality in under-resourced languages.
This makes Gemma 3 relevant for deployments targeting South Asian, African, and Southeast Asian language markets where most frontier models have limited training data.
Free GPU Access via Kaggle
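Kaggle provides free access to Nvidia T4 (15GB) and P100 (16GB) GPUs. Gemma 3 27B only comes within reach of this tier when quantized to 4 bits, at roughly 16GB of weights (about the size of a Q4_K_M GGUF build). That is more than a single T4 can hold, so select the dual-T4 accelerator and let device_map="auto" split the layers across both cards. The notebook cell here uses bitsandbytes 4-bit loading rather than a GGUF file: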
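```python
# Kaggle notebook with a free GPU accelerator
!pip install -q bitsandbytes transformers accelerate

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "google/gemma-3-27b-it"

# Store the weights as 4-bit NF4 values via bitsandbytes.
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",  # spread layers across the available GPUs
)
```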
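A quick back-of-the-envelope calculation shows why quantization is unavoidable on this hardware. The sketch below estimates weight storage alone from parameter count and bits per weight. The helper `weight_memory_gb` is hypothetical, and the estimate deliberately ignores quantization scales, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring all overhead."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9


# Gemma 3 27B at three common precisions.
for label, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gb(27, bits):.1f} GB")
# bf16: 54.0 GB, int8: 27.0 GB, 4-bit: 13.5 GB
```

Even the 4-bit lower bound of 13.5GB leaves almost no headroom on a 15GB or 16GB card, which is why the practical figure of about 16GB calls for more than one GPU.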
KerasHub Integration
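```python
import keras_hub

# from_preset fetches the weights and the matching tokenizer together.
gemma = keras_hub.models.Gemma3CausalLM.from_preset("gemma3_instruct_27b")
# generate() returns the text as a string, so print it to see the result.
print(gemma.generate("Explain gradient descent in three sentences:", max_length=200))
```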
KerasHub handles weight loading, tokenization, and generation in a few lines. Because Keras 3 runs on JAX as well as TensorFlow and PyTorch, the same code can run on TPUs through the JAX backend for high-throughput serving.
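Keras chooses its backend from the `KERAS_BACKEND` environment variable, which has to be set before Keras is first imported. The sketch below shows one way to arrange that. The `load_gemma` wrapper is a hypothetical helper that defers the `keras_hub` import until after the variable is set, and it assumes the same preset name used above.

```python
import os

# Must be set before Keras is imported for the first time in this process.
os.environ["KERAS_BACKEND"] = "jax"


def load_gemma(preset: str = "gemma3_instruct_27b"):
    """Load the preset lazily so the backend variable is already in place."""
    import keras_hub  # imported here on purpose, after KERAS_BACKEND is set

    return keras_hub.models.Gemma3CausalLM.from_preset(preset)


print(os.environ["KERAS_BACKEND"])
```

Calling `load_gemma()` then returns a model running on JAX. In a notebook where Keras has already been imported, the kernel needs a restart for the setting to take effect.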