A Research Lab Joins the Open Model Race
The Technology Innovation Institute (TII) in Abu Dhabi entered the open LLM space with Falcon in 2023, and Falcon 40B quickly took the top spot on the Hugging Face Open LLM Leaderboard. Their approach centered on one insight: if your pretraining data is high enough quality, you need fewer parameters to reach a given performance level.
RefinedWeb: The Data Advantage
RefinedWeb is a roughly 5-trillion-token English dataset built from Common Crawl with aggressive filtering and deduplication (a 600-billion-token extract was released publicly):
- URL filtering: remove adult content, spam, and known low-quality domains
- Trafilatura extraction: extract the main body text, dropping boilerplate, navigation, and ads
- Language identification: keep only high-confidence English content
- Deduplication: URL-level dedup, fuzzy near-dedup of whole documents using MinHash, and exact substring dedup for repeated passages
- Heuristic quality filtering: length, punctuation density, symbol ratios
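Two of these stages, heuristic filtering and MinHash near-deduplication, can be sketched in a few lines of standard-library Python. This is a toy illustration rather than TII's pipeline: the function names, thresholds, symbol list, and 5-word shingle size are all assumptions chosen for readability, and signatures are compared pairwise where a production system would bucket them with LSH.

```python
import hashlib
import re

TERMINAL_PUNCT = (".", "!", "?", '"')

def passes_heuristics(text, min_words=20, max_symbol_ratio=0.1, min_terminal_ratio=0.5):
    """Return True if a document clears three simple quality checks (assumed thresholds)."""
    words = text.split()
    if len(words) < max(min_words, 1):               # length check
        return False
    symbols = sum(text.count(s) for s in ("#", "{", "}", "|"))
    if symbols / len(words) > max_symbol_ratio:      # symbol-ratio check
        return False
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    terminated = sum(ln.endswith(TERMINAL_PUNCT) for ln in lines)
    return terminated / len(lines) >= min_terminal_ratio  # punctuation check

def shingles(text, n=5):
    """Set of overlapping n-word shingles, lowercased."""
    tokens = re.findall(r"\w+", text.lower())
    return {" ".join(tokens[i:i + n]) for i in range(max(len(tokens) - n + 1, 1))}

def minhash_signature(text, num_perm=64):
    """For each seeded hash function, keep the minimum hash over all shingles."""
    grams = shingles(text)
    signature = []
    for seed in range(num_perm):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{g}".encode("utf-8"), digest_size=8).digest(),
                "big",
            )
            for g in grams
        ))
    return signature

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def dedup(docs, threshold=0.8):
    """Keep a document only if it is not a near-duplicate of one already kept."""
    kept, kept_sigs = [], []
    for doc in docs:
        sig = minhash_signature(doc)
        if all(estimated_jaccard(sig, s) < threshold for s in kept_sigs):
            kept.append(doc)
            kept_sigs.append(sig)
    return kept
```

The all-pairs comparison in `dedup` is quadratic in the number of documents, which is why web-scale pipelines group signatures into LSH bands and only compare documents that collide in at least one band.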
The result is a corpus whose tokens are substantially more information-dense than raw Common Crawl. Falcon 7B was trained on 1.5T tokens, drawn mostly from RefinedWeb and supplemented with curated corpora, and the data quality showed in benchmarks.
Multi-Query Attention for Faster Inference
Falcon 7B uses multi-query attention (MQA): all query heads share a single key-value head rather than each having a dedicated one. This shrinks the KV cache during inference by a factor equal to the number of query heads (71 for Falcon 7B). Model weights are unaffected, so the end-to-end memory saving is smaller and depends on batch size and context length, but for a 7B model serving many concurrent requests the cache is often the limiting factor, and MQA enables noticeably higher batch sizes on the same GPU. Falcon 40B uses a grouped variant of the same idea with 8 key-value heads.
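To make the cache saving concrete, here is a back-of-the-envelope calculation. It assumes Falcon 7B's published shape (32 layers, 71 query heads, head dimension 64), bfloat16 values at 2 bytes each, and a 2048-token context; the function name and arguments are illustrative, and real serving stacks add their own overhead.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch_size, bytes_per_value=2):
    """Bytes needed to cache keys and values for every layer and position."""
    # Factor of 2: one tensor for keys and one for values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_value

# Assumed Falcon 7B shape: 32 layers, 71 query heads, head dimension 64.
N_LAYERS, N_HEADS, HEAD_DIM = 32, 71, 64

# Standard multi-head attention: one KV head per query head.
mha = kv_cache_bytes(N_LAYERS, N_HEADS, HEAD_DIM, seq_len=2048, batch_size=1)
# Multi-query attention: a single shared KV head.
mqa = kv_cache_bytes(N_LAYERS, 1, HEAD_DIM, seq_len=2048, batch_size=1)

print(f"MHA: {mha / 2**20:.0f} MiB, MQA: {mqa / 2**20:.0f} MiB, ratio: {mha // mqa}x")
# prints "MHA: 1136 MiB, MQA: 16 MiB, ratio: 71x"
```

Per sequence, the cache drops from over 1 GiB to 16 MiB under these assumptions, which is where the extra batch capacity comes from; the roughly 14 GB of bfloat16 weights stays the same either way.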
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load weights in bfloat16 and let accelerate place them on available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = """User: What are the main advantages of transformer architecture over RNNs?
Assistant:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=300,
    do_sample=True,
    temperature=0.7,
    eos_token_id=tokenizer.eos_token_id,
)

# The decoded text contains the prompt followed by the generated completion.
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
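For multi-turn use, the same plain-text `User:` / `Assistant:` layout can be assembled programmatically. The helper below is a hypothetical convenience that simply mirrors the prompt in the snippet above; it is not an official Falcon chat template.

```python
def build_prompt(turns, next_speaker="Assistant"):
    """Render (speaker, text) turns as 'Speaker: text' lines and cue the next reply."""
    lines = [f"{speaker}: {text.strip()}" for speaker, text in turns]
    lines.append(f"{next_speaker}:")  # trailing cue the model completes
    return "\n".join(lines)

history = [
    ("User", "What are the main advantages of transformer architecture over RNNs?"),
]
prompt = build_prompt(history)
```

After each generation, append the model's reply as an `("Assistant", ...)` turn and the next user message as a `("User", ...)` turn before calling `build_prompt` again.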
Falcon 40B and the Apache 2.0 License
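TII released both the 7B and 40B variants under Apache 2.0, following MPT-7B's commercially friendly precedent (Falcon 40B initially shipped under a custom TII license and was relicensed shortly afterwards). Falcon 40B held the top position on the open leaderboard for about two months, until Llama 2 was released in July 2023.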
The Falcon-Instruct Fine-Tunes
Falcon-7B-Instruct was fine-tuned on a combination of Baize, GPT4All, and GPTeacher datasets for conversational use. It is significantly more practical than the base model for direct deployment but shows the limitations of instruction data quality available in early 2023.
Evolution of the Open Model Landscape
Falcon's appearance marked a turning point: a well-funded research lab outside the US (and outside the Big Tech AI labs) had produced a competitive open model. This validated that open model training was not exclusively a Meta/Google activity and accelerated investment in open model research globally.
Today Falcon has been superseded by Mistral, Llama 3, and Qwen families, but the RefinedWeb dataset methodology continues to influence how teams think about pretraining data quality.