Llamafactory Distributed Training

Wed, 21 Jan 2026 23:06:53 +0800

Content

Conduct SFT training using the llamafactory framework on L20*8 servers with Ubuntu 22.04. Utilize both single-node multi-GPU and multi-node multi-GPU modes. Selected base model: Qwen3-32B.

Environment Configuration

Clone the code repository, set up a new conda environment, and install dependencies.

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
conda activate llamafactory_env
pip install -e .
pip install -r requirements/metrics.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

Prepare SFT data, place it in the data folder, and register it in dataset_info.json.

cd ./data
Open dataset_info.json
and add the dataset, for example:

"my_example": {
    "file_name": "my_example.json"
  },

# Use Alpaca or ShareGPT format for SFT data. Alpaca format example is used here.

# Alpaca format: (where `instruction` and `input` are automatically concatenated with `\n`)
[{
"instruction": "Human instruction (required)",
"input": "Human input (optional)",
"output": "Model response (required)",
"system": "System prompt (optional)",
"history":
[
["First round instruction (optional)", "First round response (optional)"],
["Second round instruction (optional)", "Second round response (optional)"]
]
}]

Single-Node Multi-GPU Training

# Prepare a yaml file by referring to existing templates in `./examples` and run it.

# If using deepspeed for multi-GPU training, specify the number of GPUs via CUDA_VISIBLE_DEVICES.

CUDA_VISIBLE_DEVICES=0,1 FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/qwen3_30b_lora_sft.yaml

# After training, the parameters can be found in the path specified by `output_dir` in the yaml file.

After training, you can invoke the LoRA adapter using vLLM. Here is a docker compose template.

Finetuning on 🌲Treetopia🌲

Llamafactory Distributed Training

Content

Environment Configuration

Single-Node Multi-GPU Training