The Text-to-Text Unification
The T5 paper introduced a simple but powerful idea: frame every NLP task as converting input text to output text. Translation, summarization, classification, and question answering all become the same type of problem:
- Classification: "classify sentiment: This product is terrible." → "negative"
- Translation: "translate English to French: The weather is nice." → "Le temps est beau."
- Summarization: "summarize: [long article...]" → "[short summary]"
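This unified interface enables multitask learning, because T5 can be trained on all tasks at once with the same model, loss function, and decoding procedure. It also makes fine-tuning for new tasks straightforward: a new task only needs its own input/output text pairs.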
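As a minimal sketch of the framing, every task can be reduced to building one input string. The helper name `to_text_to_text` and the `TASK_PREFIXES` table are hypothetical, and the prefixes simply mirror the examples above rather than an official list.

```python
# Hypothetical helper: the prefixes mirror the examples in the text,
# they are not an exhaustive or official list of T5 task prefixes.
TASK_PREFIXES = {
    "classification": "classify sentiment: ",
    "translation": "translate English to French: ",
    "summarization": "summarize: ",
}


def to_text_to_text(task: str, text: str) -> str:
    """Return the model input string for a task by prepending its prefix."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task!r}")
    return TASK_PREFIXES[task] + text


print(to_text_to_text("classification", "This product is terrible."))
print(to_text_to_text("translation", "The weather is nice."))
```

Because every task ends up as a plain string, examples from different tasks can be mixed in one training batch without task-specific heads.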
T5 vs BERT for Generation Tasks
BERT is an encoder-only model: it has no decoder and is not designed to generate text autoregressively. For any task requiring generated output (summarization, translation, question generation, text completion), T5 or another encoder-decoder model is the appropriate baseline, not BERT or RoBERTa.
Use BERT/RoBERTa for: classification and token-level tagging such as named entity recognition. Use T5 for: summarization, translation, question answering with long answers, and any other generative task.
Flan-T5: Instruction Tuning at Scale
Flan-T5 is T5 instruction-tuned on more than 1,800 tasks from the FLAN (Fine-tuned Language Net) collection. This substantially improves zero-shot and few-shot performance on unseen tasks. Flan-T5-XL (3B) outperforms GPT-3 (175B) on several benchmarks, which suggests that instruction tuning can offset a large gap in parameter count.
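To make the zero-shot versus few-shot distinction concrete, here is a small sketch that assembles both kinds of prompt as plain text. The function `build_prompt` and its `Input:`/`Output:` layout are assumptions chosen for illustration, not a format prescribed by Flan-T5.

```python
def build_prompt(instruction, query, examples=()):
    """Assemble a zero-shot (no examples) or few-shot prompt as plain text."""
    parts = [instruction]
    # Each solved example is shown to the model before the real query.
    for source, target in examples:
        parts.append(f"Input: {source}\nOutput: {target}")
    # The final block leaves the output empty for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


INSTRUCTION = "Classify the sentiment as positive or negative."

zero_shot = build_prompt(INSTRUCTION, "This product is terrible.")
few_shot = build_prompt(
    INSTRUCTION,
    "This product is terrible.",
    examples=[("I love it.", "positive"), ("It broke in a day.", "negative")],
)

print(zero_shot)
print("---")
print(few_shot)
```

The zero-shot prompt relies entirely on the instruction, while the few-shot prompt adds solved examples in the same layout.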
Fine-Tuning Flan-T5 With LoRA
```python
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

# Attach low-rank adapters to the attention query and value projections.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # roughly 0.7% of parameters

# Custom dataset: input/output text pairs
def preprocess(examples):
    # No padding here: padded label positions would otherwise count toward the loss.
    inputs = tokenizer(examples["input"], max_length=512, truncation=True)
    outputs = tokenizer(text_target=examples["output"], max_length=128, truncation=True)
    inputs["labels"] = outputs["input_ids"]
    return inputs

# Pads each batch dynamically and sets label padding to -100 so the loss ignores it.
# Pass this to Seq2SeqTrainer via its data_collator argument.
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
```
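The trainable fraction reported by `print_trainable_parameters()` can be estimated by hand. The sketch below assumes the flan-t5-base shape (12 encoder and 12 decoder layers, model width 768, square query and value projections) and a frozen base of 247,577,856 parameters; these figures are assumptions stated for the arithmetic, not values read from the checkpoint.

```python
def lora_trainable_params(rank, d_in, d_out, num_adapted_modules):
    """Each adapted linear layer gains A (rank x d_in) and B (d_out x rank)."""
    return num_adapted_modules * rank * (d_in + d_out)


# Assumed flan-t5-base shape: 12 encoder + 12 decoder layers, d_model = 768.
ENCODER_LAYERS = 12
DECODER_LAYERS = 12
D_MODEL = 768
BASE_PARAMS = 247_577_856  # assumed parameter count of the frozen base model

# Encoder layers have one attention block; decoder layers have two
# (self-attention and cross-attention). Each block adapts "q" and "v".
attention_blocks = ENCODER_LAYERS + 2 * DECODER_LAYERS
adapted_modules = attention_blocks * 2

trainable = lora_trainable_params(16, D_MODEL, D_MODEL, adapted_modules)
fraction = trainable / (BASE_PARAMS + trainable)
print(f"{trainable:,} trainable parameters, {fraction:.2%} of the total")
```

Under these assumptions, rank 16 on the query and value projections trains about 1.77M parameters, or roughly 0.7% of the model. Doubling `r` doubles the adapter size, since the count is linear in the rank.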
Inference With Beam Search
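```python
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="google/flan-t5-base",
    max_new_tokens=150,
    num_beams=4,  # keep the 4 best partial outputs at each decoding step
    early_stopping=True,  # stop once num_beams complete candidates exist
)

long_text = "..."  # placeholder: replace with the document to summarize
result = summarizer("summarize: " + long_text)
print(result[0]["summary_text"])
```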
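To illustrate what `num_beams` changes, here is a toy beam search in plain Python. The `NEXT_TOKEN_PROBS` table is a made-up stand-in for a model, and the sketch omits the length penalty and early-stopping logic that the real generation code applies, so treat it as an illustration of the idea only.

```python
import math

# Hypothetical toy model: next-token probabilities given the tokens so far.
NEXT_TOKEN_PROBS = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"cat": 0.3, "dog": 0.3, "<eos>": 0.4},
    ("a",): {"cat": 0.9, "<eos>": 0.1},
    ("the", "cat"): {"<eos>": 1.0},
    ("the", "dog"): {"<eos>": 1.0},
    ("a", "cat"): {"<eos>": 1.0},
}


def beam_search(num_beams, max_steps=5):
    """Keep the num_beams highest-scoring partial sequences at every step."""
    beams = [((), 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_steps):
        # Extend every live beam by every possible next token.
        candidates = []
        for tokens, score in beams:
            for token, prob in NEXT_TOKEN_PROBS[tokens].items():
                candidates.append((tokens + (token,), score + math.log(prob)))
        # Keep only the best num_beams candidates; set aside completed ones.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for tokens, score in candidates[:num_beams]:
            if tokens[-1] == "<eos>":
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    # Fall back to unfinished beams if nothing reached <eos> in time.
    return max(finished or beams, key=lambda item: item[1])[0]


print(beam_search(num_beams=1))  # greedy decoding
print(beam_search(num_beams=2))
```

With one beam the search commits to "the" because it is the most likely first token, and ends with a sequence of probability 0.24. With two beams it keeps "a" alive long enough to find "a cat", which has probability 0.36.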
Flan-T5-XXL vs GPT-3.5 Cost Comparison
For high-volume summarization (10M requests/month):

| Option | Monthly cost at 10M requests |
|---|---|
| GPT-3.5-turbo API | ~$5,000 |
| Flan-T5-XXL (2x A10G) | ~$800 |
| Flan-T5-XL (1x A10G) | ~$400 |

By these estimates, Flan-T5-XL (3B) achieves roughly 85-90% of GPT-3.5-turbo quality on structured summarization tasks at about a tenth of the API cost (8% using the table's figures).
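The cost ratios can be checked with a few lines of arithmetic. The sketch below uses only the monthly figures quoted above; the function name `cost_summary` is hypothetical, and the dollar amounts are the article's estimates rather than measured prices.

```python
REQUESTS_PER_MONTH = 10_000_000

# Monthly cost estimates in USD, taken from the comparison above.
MONTHLY_COST_USD = {
    "GPT-3.5-turbo API": 5_000,
    "Flan-T5-XXL (2x A10G)": 800,
    "Flan-T5-XL (1x A10G)": 400,
}


def cost_summary(costs, requests, baseline):
    """Return {option: (USD per 1,000 requests, fraction of baseline cost)}."""
    base = costs[baseline]
    return {
        name: (cost / requests * 1_000, cost / base)
        for name, cost in costs.items()
    }


summary = cost_summary(MONTHLY_COST_USD, REQUESTS_PER_MONTH, "GPT-3.5-turbo API")
for name, (per_thousand, share) in summary.items():
    print(f"{name}: ${per_thousand:.2f} per 1,000 requests, {share:.0%} of API cost")
```

On these figures the self-hosted options come to 16% (XXL) and 8% (XL) of the API bill. The comparison only holds if the GPUs are kept busy, since the hosting cost is fixed while the API cost scales with request volume.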