What Is Axolotl?
Axolotl is an open-source fine-tuning framework built on top of HuggingFace Transformers and PEFT that reduces the entire training setup to a single YAML configuration file. You define your model, dataset format, training method, and hardware config in YAML — Axolotl handles the rest.
This makes it particularly valuable for teams that want reproducible, version-controlled fine-tuning runs without maintaining bespoke training scripts.
Supported Dataset Formats
One of Axolotl's biggest strengths is flexible dataset ingestion. It natively supports:
- Alpaca — instruction/input/output triples
- ShareGPT — multi-turn conversations in the ChatML format
- JSONL with custom field mapping — specify source/target fields in YAML
- Raw completion — for continued pre-training on text corpora
You can mix multiple datasets in one run by listing them in the YAML config.
Sample YAML Configuration
base_model: meta-llama/Meta-Llama-3-8B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
- q_proj
- v_proj
datasets:
- path: mosaicml/dolly_hhrlhf
type: sharegpt
conversation: chatml
sequence_len: 4096
sample_packing: true
val_set_size: 0.02
output_dir: ./outputs/llama3-qlora
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 2e-4
wandb_project: my-fine-tune
wandb_run_id: llama3-qlora-run1
Then run:
pip install axolotl
accelerate launch -m axolotl.cli.train config.yaml
Multi-GPU Training
Axolotl integrates directly with DeepSpeed and FSDP. For multi-GPU runs, add a DeepSpeed config path to your YAML:
deepspeed: deepspeed_configs/zero2.json
The zero2.json config handles gradient sharding across GPUs automatically.
Axolotl vs Unsloth
Both tools target LLM fine-tuning, but with different priorities. Unsloth is optimized for maximum single-GPU speed with hand-tuned CUDA kernels. Axolotl is optimized for flexibility: it handles more dataset formats, more training methods, and scales more naturally to multi-GPU and multi-node setups. For a quick single-GPU LoRA run, Unsloth is faster. For complex multi-dataset, multi-GPU production pipelines, Axolotl is the more flexible choice.