Z-Image-Turbo


This article explains how to use Z-Image-Turbo with ComfyUI.

Advantages of Z-Image-Turbo:

Strong adherence to Chinese prompts and strong Chinese character generation.
Generates images in only 8 inference steps. With a compact 6B-parameter model, it can run on consumer-grade GPUs with 16 GB of VRAM when quantized.
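As a rough check on the 16 GB figure, the memory needed for the weights alone is the parameter count times the bits per weight, divided by 8. The sketch below assumes exactly 6 × 10^9 parameters and uses illustrative bit widths. It ignores the text encoder, VAE, activations and framework overhead, all of which need additional VRAM.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights in gigabytes (10**9 bytes)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    params = 6e9  # assumption: 6B parameters, as stated above
    # Illustrative precisions; the actual quantized files may use other formats.
    for label, bits in [("16-bit (bf16/fp16)", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label:>20}: {weight_memory_gb(params, bits):.1f} GB")
```

At 16 bits the diffusion weights alone come to about 12 GB, which leaves little headroom on a 16 GB card for the other components. Quantized weights shrink that share to roughly 6 GB (8-bit) or 3 GB (4-bit).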

Network restrictions in some regions prevent ComfyUI-Manager from downloading files automatically, so a direct download link is provided for every file so that it can be installed manually.

Prerequisites

Set up ComfyUI and install the ControlNet components. Optionally, install the llama-cpp-vlm extension to generate text descriptions from images (image-to-text interrogation) with Qwen3-VL. To display the generated text in your workflow, also install ComfyUI-custom-scripts.

ControlNet Repository: Here
Llama-cpp-vlm Repository: Here
ComfyUI-custom-scripts Repository: Here

Note: The llama-cpp-python wheel (.whl) required by llama-cpp-vlm is linked below, together with the Qwen3-VL model download.

Model Download Summary

Z-Image-Turbo model files (diffusion model, text encoder, and VAE): The download includes both full-precision and quantized versions. When running the workflow, select the quantized diffusion model and text encoder to reduce VRAM usage. Place the files in the ComfyUI directories as shown in the image: Download Link

directory-1
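After copying the files, a short script can confirm that each one is in the expected subfolder of ComfyUI's models directory. This is a minimal sketch. The folder names diffusion_models, text_encoders and vae follow ComfyUI's standard layout, but the file names in EXPECTED_FILES are hypothetical placeholders; replace them with the names of the files you actually downloaded.

```python
import sys
from pathlib import Path

# Hypothetical file names: substitute the files you actually downloaded.
EXPECTED_FILES = {
    "diffusion_models": ["z_image_turbo_quantized.safetensors"],
    "text_encoders": ["text_encoder_quantized.safetensors"],
    "vae": ["vae.safetensors"],
}


def missing_model_files(comfyui_root: str, expected=EXPECTED_FILES) -> list[str]:
    """Return paths (relative to comfyui_root) of expected files that are absent."""
    models_dir = Path(comfyui_root) / "models"
    missing = []
    for subfolder, names in expected.items():
        for name in names:
            if not (models_dir / subfolder / name).is_file():
                missing.append(f"models/{subfolder}/{name}")
    return missing


if __name__ == "__main__":
    # Pass the path to your ComfyUI folder as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "ComfyUI"
    problems = missing_model_files(root)
    if problems:
        print("Missing files:")
        for path in problems:
            print("  " + path)
    else:
        print("All expected model files are in place.")
```

Run it as `python check_models.py /path/to/ComfyUI` (the script name is arbitrary). Any path it prints still needs to be copied into place.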

ControlNet Base Model: Download Link

directory-2

ControlNet Human Pose Control Model (requires body_pose_model.pth, hand_pose_model.pth, and facenet.pth): Download Link

directory-3

ControlNet Depth Control Model: Download Link

directory-4

Qwen3-VL model and wheel files for the required plugins: Download Link and Wheel Download. Before downloading a wheel, check that it matches your operating system and Python version.

directory-5
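Wheel file names encode the Python version and platform they were built for, following PEP 427: name-version[-build]-pythontag-abitag-platformtag.whl. The sketch below reads these tags from a wheel file name and compares the Python tag with the running interpreter. It is not a full compatibility check: it does not handle abi3 wheels or manylinux platform rules, which the packaging library's tag matching covers. The example file name in the `__main__` block is hypothetical.

```python
import platform
import sys


def parse_wheel_tags(filename: str) -> dict:
    """Split a wheel file name into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name: {filename}")
    # An optional build tag may follow the version, so the python, ABI and
    # platform tags are always the last three fields.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_current_python(filename: str) -> bool:
    """True if the wheel's Python tag names the running CPython version or a generic pyX tag."""
    major, minor = sys.version_info.major, sys.version_info.minor
    accepted = {f"cp{major}{minor}", f"py{major}", f"py{major}{minor}"}
    # A tag may list several versions separated by dots, e.g. "cp310.cp311".
    tags = set(parse_wheel_tags(filename)["python"].split("."))
    return bool(accepted & tags)


if __name__ == "__main__":
    wheel = "llama_cpp_python-0.3.16-cp312-cp312-win_amd64.whl"  # hypothetical example
    info = parse_wheel_tags(wheel)
    print("Wheel built for:", info["python"], info["platform"])
    print("This system:", f"cp{sys.version_info.major}{sys.version_info.minor}",
          platform.system(), platform.machine())
    print("Python version matches:", matches_current_python(wheel))
```

Compare the printed platform tag (for example win_amd64) with your operating system and CPU architecture as well, because a wheel whose Python tag matches can still be built for another platform.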