Llamafactory Distributed Training

Conduct SFT training with the LLaMA-Factory framework on 8x L20 servers running Ubuntu 22.04, using both single-node multi-GPU and multi-node multi-GPU modes. Selected base model: Qwen3-32B.

Environment Configuration

Clone the code repository, set up a new conda environment, and install dependencies:

```shell
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
conda activate llamafactory_env
pip install -e .
pip install -r requirements/metrics.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```

Prepare the SFT data, place it in the `data` folder, and register it in `dataset_info.json`:

```shell
cd ./data
```

Open `dataset_info.json` and add the dataset, for example:

```json
"my_example": {
  "file_name": "my_example.json"
},
```

SFT data must use the Alpaca or ShareGPT format; the Alpaca format is used here. In this format, `instruction` and `input` are automatically concatenated with `\n`:

```json
[{
  "instruction": "Human instruction (required)",
  "input": "Human input (optional)",
  "output": "Model response (required)",
  "system": "System prompt (optional)",
  "history": [
    ["First round instruction (optional)", "First round response (optional)"],
    ["Second round instruction (optional)", "Second round response (optional)"]
  ]
}]
```

Single-Node Multi-GPU Training

Prepare a yaml file by referring to the existing templates in `./examples` and run it. If using DeepSpeed for multi-GPU training, specify the GPUs via `CUDA_VISIBLE_DEVICES`:

```shell
CUDA_VISIBLE_DEVICES=0,1 FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/qwen3_30b_lora_sft.yaml
```

After training, the adapter weights can be found in the path specified by `output_dir` in the yaml file, and the LoRA adapter can be invoked using vLLM. Here is a docker compose template. ...
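As a complement to the compose template above, the same serving setup can be sketched as a bare vLLM CLI call. This is a minimal sketch, not the post's actual configuration: the model ID, adapter name, and adapter path are placeholders you would replace with your own `output_dir`.

```shell
# Serve the base model through vLLM's OpenAI-compatible server and
# attach the trained LoRA adapter (paths/names are placeholders).
vllm serve Qwen/Qwen3-32B \
  --tensor-parallel-size 8 \
  --enable-lora \
  --lora-modules my_sft_adapter=/path/to/output_dir
```

Requests can then target either the base model or the adapter by passing `my_sft_adapter` as the `model` field in the OpenAI-style API call.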

2026-01-21 · 2 min · 349 words · Me

DeepSeek-671B Distributed Deployment

1. Overview
   a. This guide describes the deployment of the DeepSeek-671B model across two servers, each equipped with 8x NVIDIA L20 GPUs. The technology stack utilizes Docker for containerization, the vLLM high-performance inference engine, and the Ray distributed computing framework.
   b. Official Documentation: vLLM-Distributed
   c. The official tutorial involves complex steps requiring frequent switching between multiple SSH sessions. To simplify the process, this article consolidates and optimizes the official workflow into a systematic, one-stop deployment guide. ...
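The Docker + Ray + vLLM stack described above boils down to three steps: start a Ray head, join the second node, then launch vLLM sharded across both machines. The sketch below shows those steps under stated assumptions: the head-node IP and the model ID are placeholders, and the parallelism split (tensor parallel within a node, pipeline parallel across the two nodes) is one common choice for a 2x8-GPU layout, not necessarily the post's exact settings.

```shell
# On the head node: start the Ray cluster.
ray start --head --port=6379

# On the worker node: join the cluster (replace with the head node's IP).
ray start --address=<HEAD_NODE_IP>:6379

# Back on the head node: launch vLLM across the 16 GPUs,
# tensor-parallel within each node, pipeline-parallel across nodes.
vllm serve <DEEPSEEK_671B_MODEL_PATH> \
  --tensor-parallel-size 8 \
  --pipeline-parallel-size 2
```

`ray status` on either node is a quick way to confirm both machines and all 16 GPUs are visible before launching vLLM.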

2026-01-06 · 4 min · 731 words · Me

L20 8-GPU Server Deep Dive: Integrated Deployment Guide for Multimodal AI Systems (LLM + VLM + RAG + ASR + Dify + MinerU)

Overview

This guide provides a step-by-step walkthrough for deploying a full-stack multimodal AI system on a single server equipped with 8x NVIDIA L20 GPUs. The stack includes LLM, VLM, Embedding/Reranker (RAG), ASR, Dify (LLM Orchestration Agent Platform), and MinerU (PDF Extraction).

VRAM Estimation for LLMs

Key Strategy: Since Large Language Model (LLM) performance correlates more strongly with parameter scale (B) than with quantization level, we prioritize models with higher parameter counts. For this deployment, we selected the int4 AWQ versions of Qwen3-235B and GLM-4.5V-106B to maximize overall intelligence and performance within the available VRAM. ...
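The trade-off above can be sanity-checked with back-of-envelope arithmetic. This is my rough rule of thumb, not the post's figures: int4 weights take about 0.5 bytes per parameter, so an N-billion-parameter model needs roughly N/2 GB for weights alone, before KV cache and activation overhead.

```shell
# Rough int4 weight footprint: ~0.5 bytes/parameter -> params(B) / 2 in GB.
for model in "Qwen3-235B 235" "GLM-4.5V-106B 106"; do
  set -- $model
  echo "$1: ~$(( $2 / 2 )) GB of weights at int4"
done
```

Under this estimate the two models need roughly 117 GB + 53 GB of weights, which leaves headroom on an 8x L20 node (8 × 48 GB = 384 GB) for KV cache and the remaining services.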

2026-01-05 · 3 min · 612 words · Me