## Content

Conduct SFT training with the LLaMA-Factory framework on L20*8 servers running Ubuntu 22.04, using both single-node multi-GPU and multi-node multi-GPU modes. Selected base model: Qwen3-32B.
## Environment Configuration

Clone the code repository, create and activate a new conda environment, and install the dependencies:

```bash
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
conda create -n llamafactory_env -y   # create the environment first if it does not exist yet
conda activate llamafactory_env
pip install -e .
pip install -r requirements/metrics.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```

Prepare the SFT data, place it in the `data` folder, and register it in `dataset_info.json`:

```bash
cd ./data
```

Open `dataset_info.json` and add the dataset, for example:

```json
"my_example": {
  "file_name": "my_example.json"
}
```

SFT data can use either the Alpaca or the ShareGPT format; the Alpaca format is used here. In the Alpaca format, `instruction` and `input` are automatically concatenated with `\n`:

```json
[
  {
    "instruction": "Human instruction (required)",
    "input": "Human input (optional)",
    "output": "Model response (required)",
    "system": "System prompt (optional)",
    "history": [
      ["First round instruction (optional)", "First round response (optional)"],
      ["Second round instruction (optional)", "Second round response (optional)"]
    ]
  }
]
```

## Single-Node Multi-GPU Training

Prepare a yaml file by referring to the existing templates in `./examples`, then run it. When using DeepSpeed for multi-GPU training, specify the GPUs to use via `CUDA_VISIBLE_DEVICES`:

```bash
CUDA_VISIBLE_DEVICES=0,1 FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/qwen3_30b_lora_sft.yaml
```

After training, the trained weights can be found in the path specified by `output_dir` in the yaml file. The LoRA adapter can then be served with vLLM. Here is a docker compose template.
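A malformed entry in the SFT data will only surface as an error once training starts, so it can be worth sanity-checking the file before registering it. The sketch below is an assumption, not part of LLaMA-Factory: a minimal validator for the Alpaca fields described above (`my_example.json` is the hypothetical dataset file from the registration example).

```python
# Minimal sanity check for an Alpaca-format SFT dataset (illustrative helper,
# not part of LLaMA-Factory). Checks required/optional keys and history shape.
import json

REQUIRED = ("instruction", "output")
OPTIONAL = ("input", "system", "history")

def validate_alpaca(records):
    """Return a list of (record_index, message) problems; empty means OK."""
    if not isinstance(records, list):
        return [(-1, "top level must be a JSON list of records")]
    errors = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            errors.append((i, "record is not a JSON object"))
            continue
        for key in REQUIRED:
            if not rec.get(key):
                errors.append((i, f"missing required field '{key}'"))
        for key in rec:
            if key not in REQUIRED + OPTIONAL:
                errors.append((i, f"unknown field '{key}'"))
        history = rec.get("history", [])
        if not all(isinstance(t, list) and len(t) == 2 for t in history):
            errors.append((i, "history must be a list of [instruction, response] pairs"))
    return errors

if __name__ == "__main__":
    # Hypothetical path matching the dataset_info.json example above.
    with open("data/my_example.json", encoding="utf-8") as f:
        problems = validate_alpaca(json.load(f))
    for idx, msg in problems:
        print(f"record {idx}: {msg}")
    print("OK" if not problems else f"{len(problems)} problem(s) found")
```

Run it once after editing the data file; an empty problem list means every record has the required `instruction` and `output` fields and a well-formed `history`.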
...