# FLUX.2

**Frontier Visual Intelligence** — State-of-the-art image generation and editing from [Black Forest Labs](https://bfl.ai).

---

<p align="center">
<a href="https://docs.bfl.ai">API Docs</a> •
<a href="https://huggingface.co/black-forest-labs">Hugging Face</a> •
<a href="https://bfl.ai/blog">Blog</a>
</p>

This repo contains minimal inference code to run image generation & editing with our FLUX.2 open-weight models.
## News
- **[15.01.2026]** Today, we release the FLUX.2 [klein] family of models, our fastest models yet. Sub-second generation on consumer GPUs. Read more about it in our [blog post](https://bfl.ai/blog/flux2-klein-towards-interactive-visual-intelligence).
- **[25.11.2025]** We are releasing FLUX.2 [dev], a 32B parameter model for text-to-image generation and image editing (single and multiple reference images).
## Model Overview
| Name | Step-distilled | Guidance-distilled | Text-to-Image | Image Editing (Single reference) | Image Editing (Multi-reference) | License |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| [FLUX.2 [klein] 4B](https://huggingface.co/black-forest-labs/FLUX.2-klein-4B) | ✅ | ✅ | ✅ | ✅ | ✅ | [apache-2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md) |
| [FLUX.2 [klein] 9B](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B) | ✅ | ✅ | ✅ | ✅ | ✅ | [FLUX Non-Commercial License](model_licenses/LICENSE-FLUX-NON-COMMERICAL) |
| [FLUX.2 [klein] 4B Base](https://huggingface.co/black-forest-labs/FLUX.2-klein-base-4B) | ❌ | ❌ | ✅ | ✅ | ✅ | [apache-2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md) |
| [FLUX.2 [klein] 9B Base](https://huggingface.co/black-forest-labs/FLUX.2-klein-base-9B) | ❌ | ❌ | ✅ | ✅ | ✅ | [FLUX Non-Commercial License](model_licenses/LICENSE-FLUX-NON-COMMERICAL) |
| [FLUX.2 [dev]](https://huggingface.co/black-forest-labs/FLUX.2-dev) | ❌ | ✅ | ✅ | ✅ | ✅ | [FLUX.2-dev Non-Commercial License](model_licenses/LICENSE-FLUX-DEV) |

**All models support**: Text-to-Image ✅ | Single-ref Editing ✅ | Multi-ref Editing ✅
## Which Model Should I Use?
| Need | Recommended |
| :--- | :--- |
| Real-time apps, interactive workflows | [klein] 4B or 9B (distilled) |
| Consumer GPU (e.g. RTX 3090/4070) | [klein] 4B |
| Fine-tuning, LoRA training | [klein] Base or FLUX.2 [dev] |
| Maximum quality, no latency constraints | FLUX.2 [dev] |
## `FLUX.2 [klein]`
FLUX.2 [klein] is our fastest model family — generating and editing (multiple) images in under a second without sacrificing quality. Built for real-time applications, creative iteration, and deployment on consumer hardware.

### Key Capabilities

- **Sub-second inference** — Generate or edit images in under a second on modern hardware
- **Unified generation & editing** — Text-to-image, image editing, and multi-reference in one model
- **Runs on consumer GPUs** — Klein 4B fits in ~8GB VRAM (RTX 3090/4070 and up)
- **Apache 2.0 on 4B** — Open-source license permitting fine-tuning and customization
### Performance
Klein models define the Pareto frontier for quality vs. latency and VRAM across text-to-image, single-reference editing, and multi-reference generation:

<p align="center">
<img src="assets/klein_benchmark.jpg" alt="FLUX.2 [klein] vs Baselines — Elo vs Latency and VRAM" width="800"/>
</p>

<sub>Higher Elo + Lower Latency/VRAM = Better.</sub>
### The Klein Family
| Model | Best For |
|:---|:---|
| **[klein] 4B** | Maximum speed, consumer hardware, edge deployment |
| **[klein] 9B** | Best quality-to-latency ratio, production apps |
| **[klein] 4B Base** | Fine-tuning on limited hardware, full customization |
| **[klein] 9B Base** | Research, LoRA training, maximum output diversity |

**Distilled vs Base:**

- Use **Distilled** (4-step) for production apps and real-time generation
- Use **Base** (50-step) for fine-tuning, LoRA training, and maximum flexibility (see the sketch below)
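
To make the distinction concrete, here is a minimal sketch of the two operating points. It assumes the klein checkpoints load through the same diffusers `Flux2Pipeline` used in the lower-VRAM example below; klein support in diffusers and the exact call signature are assumptions here, not the official API:

```python
import torch
from diffusers import Flux2Pipeline  # assumption: klein loads via the same pipeline as [dev]

prompt = "a tiny red lighthouse on a cliff at dusk"

# Distilled checkpoint: step-distilled, so a 4-step schedule is the intended operating point.
fast_pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B", torch_dtype=torch.bfloat16
).to("cuda")
fast_image = fast_pipe(prompt, num_inference_steps=4).images[0]

# Base checkpoint: undistilled, sampled with a full schedule (e.g. 50 steps).
base_pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-base-4B", torch_dtype=torch.bfloat16
).to("cuda")
base_image = base_pipe(prompt, num_inference_steps=50).images[0]
```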
**Licensing:** 4B models are [Apache 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md). 9B models use the [FLUX Non-Commercial License](model_licenses/LICENSE-FLUX-NON-COMMERICAL).
### Text-to-image examples
Example focused on realism

![FLUX.2 [klein] text-to-image example focusing on realistic image generation](assets/t2i_realism.jpg)

Example focused on output diversity

![FLUX.2 [klein] text-to-image example focusing on output diversity](assets/t2i_diversity.jpg)

### Editing examples

![FLUX.2 [klein] editing example](assets/editing_example.jpg)
## `FLUX.2 [dev]`
`FLUX.2 [dev]` is a 32B parameter flow matching transformer model capable of generating and editing (multiple) images. The model is released under the [FLUX.2-dev Non-Commercial License](model_licenses/LICENSE-FLUX-DEV) and can be found [here](https://huggingface.co/black-forest-labs/FLUX.2-dev).
## Local installation
The inference code was tested on GB200 and H100 (with CPU offloading).

### GB200

On GB200, we tested `FLUX.2 [dev]` using CUDA 12.9 and Python 3.12.
```bash
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e . --extra-index-url https://download.pytorch.org/whl/cu129 --no-cache-dir
```
### H100
On H100, we tested `FLUX.2 [dev]` using CUDA 12.6 and Python 3.10.

```bash
python3.10 -m venv .venv
source .venv/bin/activate
pip install -e . --extra-index-url https://download.pytorch.org/whl/cu126 --no-cache-dir
```
## Run the CLI
Before running the CLI, you may download the weights from [here](https://huggingface.co/black-forest-labs/FLUX.2-dev) and set the following environment variables.
```bash
export FLUX2_MODEL_PATH="<flux2_path>"
export AE_MODEL_PATH="<ae_path>"
export KLEIN_4B_MODEL_PATH="<klein_4b_path>"
export KLEIN_4B_BASE_MODEL_PATH="<klein_4b_base_path>"
export KLEIN_9B_MODEL_PATH="<klein_9b_path>"
export KLEIN_9B_BASE_MODEL_PATH="<klein_9b_base_path>"
```
If you don't set the environment variables, the weights will be downloaded automatically.
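
If you would rather fetch a checkpoint explicitly and point the CLI at a local copy, here is a minimal sketch using `huggingface_hub.snapshot_download`; that each `*_MODEL_PATH` variable accepts the downloaded checkpoint directory is an assumption here:

```python
import os

from huggingface_hub import snapshot_download

# Download the klein 4B checkpoint once, then point the CLI at the local copy.
# Assumption: KLEIN_4B_MODEL_PATH accepts the checkpoint directory as-is.
local_dir = snapshot_download(repo_id="black-forest-labs/FLUX.2-klein-4B")
os.environ["KLEIN_4B_MODEL_PATH"] = local_dir
```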
You can start an interactive session to do both text-to-image generation as well as editing one or multiple images with the following command:
```bash
PYTHONPATH=src python scripts/cli.py
```
On H100, we additionally set the flag `--cpu_offloading True`.
## Watermarking
We've added an option to embed invisible watermarks directly into the generated images via the [invisible watermark library](https://github.com/ShieldMnt/invisible-watermark).

Additionally, we recommend implementing a solution to mark the metadata of your outputs, such as [C2PA](https://c2pa.org/).
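
As a concrete example of post-hoc embedding with that library, here is a minimal sketch following its documented usage; the payload string and file names are placeholders:

```python
import cv2
from imwatermark import WatermarkEncoder

# Read the generated image (OpenCV loads as BGR) and embed a byte payload
# using the library's default DWT+DCT method.
bgr = cv2.imread("flux2_output.png")
encoder = WatermarkEncoder()
encoder.set_watermark("bytes", "my-watermark".encode("utf-8"))  # placeholder payload
bgr_marked = encoder.encode(bgr, "dwtDct")
cv2.imwrite("flux2_output_marked.png", bgr_marked)
```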
## 🧨 Lower VRAM diffusers example
The example below should run on an RTX 4090. For more examples, check the [diffusers quantization guide](docs/flux2_dev_hf.md).
```python
import io

import requests
import torch
from diffusers import Flux2Pipeline
from diffusers.utils import load_image
from huggingface_hub import get_token

repo_id = "diffusers/FLUX.2-dev-bnb-4bit"
device = "cuda:0"
torch_dtype = torch.bfloat16


def remote_text_encoder(prompts):
    # Offload prompt encoding to a hosted endpoint so the large text encoder
    # never has to fit in local VRAM; the endpoint returns serialized embeddings.
    response = requests.post(
        "https://remote-text-encoder-flux-2.huggingface.co/predict",
        json={"prompt": prompts},
        headers={
            "Authorization": f"Bearer {get_token()}",
            "Content-Type": "application/json",
        },
    )
    prompt_embeds = torch.load(io.BytesIO(response.content))
    return prompt_embeds.to(device)


# text_encoder=None: prompts are pre-encoded remotely, so only the 4-bit
# transformer and the autoencoder are loaded locally.
pipe = Flux2Pipeline.from_pretrained(
    repo_id, text_encoder=None, torch_dtype=torch_dtype
).to(device)

prompt = "Realistic macro photograph of a hermit crab using a soda can as its shell, partially emerging from the can, captured with sharp detail and natural colors, on a sunlit beach with soft shadows and a shallow depth of field, with blurred ocean waves in the background. The can has the text `BFL Diffusers` on it and it has a color gradient that start with #FF5733 at the top and transitions to #33FF57 at the bottom."

image = pipe(
    prompt_embeds=remote_text_encoder(prompt),
    # image=load_image("https://huggingface.co/spaces/zerogpu-aoti/FLUX.1-Kontext-Dev-fp8-dynamic/resolve/main/cat.png"),  # optional image input for editing
    generator=torch.Generator(device=device).manual_seed(42),
    num_inference_steps=50,  # 28 steps can be a good trade-off
    guidance_scale=4,
).images[0]

image.save("flux2_output.png")
```
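
For editing, the commented-out `image` argument above is the entry point. Below is a hedged sketch of a multi-reference edit, reusing `pipe`, `remote_text_encoder`, and the imports from the block above; that `image` accepts a list of references, and the second reference file, are assumptions here:

```python
# Multi-reference editing sketch. Assumption: `image` accepts a list of reference images.
refs = [
    load_image("https://huggingface.co/spaces/zerogpu-aoti/FLUX.1-Kontext-Dev-fp8-dynamic/resolve/main/cat.png"),
    load_image("background.png"),  # placeholder: a local reference image
]
edited = pipe(
    prompt_embeds=remote_text_encoder("place the cat in front of the background"),
    image=refs,
    generator=torch.Generator(device=device).manual_seed(42),
    num_inference_steps=50,
    guidance_scale=4,
).images[0]
edited.save("flux2_edit.png")
```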
## Citation
If you find the provided code or models useful for your research, consider citing them as: