FLUX.2 [klein]

Commit b56ac61450 by timudk, 2026-01-15 15:12:38 +01:00 (parent ab7cca6801)
12 changed files with 530 additions and 119 deletions

README.md
# FLUX.2
**Frontier Visual Intelligence** — State-of-the-art image generation and editing from [Black Forest Labs](https://bfl.ai).
---
<p align="center">
<a href="https://docs.bfl.ai">API Docs</a> •
<a href="https://huggingface.co/black-forest-labs">Hugging Face</a> •
<a href="https://bfl.ai/blog">Blog</a>
</p>
This repo contains minimal inference code to run image generation & editing with our FLUX.2 open-weight models.
## News
- **[15.01.2026]** Today, we release the FLUX.2 [klein] family of models, our fastest models yet. Sub-second generation on consumer GPUs. Read more about it in our [blog post](https://bfl.ai/blog/flux2-klein-towards-interactive-visual-intelligence).
- **[25.11.2025]** We are releasing FLUX.2 [dev], a 32B parameter model for text-to-image generation, and image editing (single reference image and multiple reference images).
## Model Overview
| Name | Step-distilled | Guidance-distilled | Text-to-Image | Image Editing (Single reference) | Image Editing (Multi-reference) | License |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| [FLUX.2 [klein] 4B](https://huggingface.co/black-forest-labs/FLUX.2-klein-4B) | ✅ | ✅ | ✅ | ✅ | ✅ | [apache-2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md) |
| [FLUX.2 [klein] 9B](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B) | ✅ | ✅ | ✅ | ✅ | ✅ | [FLUX Non-Commercial License](model_licenses/LICENSE-FLUX-NON-COMMERICAL) |
| [FLUX.2 [klein] 4B Base](https://huggingface.co/black-forest-labs/FLUX.2-klein-base-4B) | ❌ | ❌ | ✅ | ✅ | ✅ | [apache-2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md) |
| [FLUX.2 [klein] 9B Base](https://huggingface.co/black-forest-labs/FLUX.2-klein-base-9B) | ❌ | ❌ | ✅ | ✅ | ✅ | [FLUX Non-Commercial License](model_licenses/LICENSE-FLUX-NON-COMMERICAL) |
| [FLUX.2 [dev]](https://huggingface.co/black-forest-labs/FLUX.2-dev) | ❌ | ✅ | ✅ | ✅ | ✅ | [FLUX.2-dev Non-Commercial License](model_licenses/LICENSE-FLUX-DEV) |
**All models support**: Text-to-Image ✅ | Single-ref Editing ✅ | Multi-ref Editing ✅
## Which Model Should I Use?
| Need | Recommended |
|------|-------------|
| Real-time apps, interactive workflows | [klein] 4B or 9B (distilled) |
| Consumer GPU (e.g. RTX 3090/4070) | [klein] 4B |
| Fine-tuning, LoRA training | [klein] Base or FLUX.2 [dev] |
| Maximum quality, no latency constraints | FLUX.2 [dev] |
## `FLUX.2 [klein]`
FLUX.2 [klein] is our fastest model family — generating and editing (multiple) images in under a second without sacrificing quality. Built for real-time applications, creative iteration, and deployment on consumer hardware.
### Key Capabilities
- **Sub-second inference** — Generate or edit images in under a second on modern hardware
- **Unified generation & editing** — Text-to-image, image editing, and multi-reference in one model
- **Runs on consumer GPUs** — Klein 4B fits in ~8GB VRAM (RTX 3090/4070 and up)
- **Apache 2.0 on 4B** — Open license for commercial use, fine-tuning, and customization
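A minimal quickstart sketch for the distilled klein 4B, assuming the klein checkpoints load through the same diffusers `Flux2Pipeline` used in the FLUX.2 [dev] example further down; the repo id and sampler settings below are assumptions, so check the model card for the authoritative snippet:

```python
import torch
from diffusers import Flux2Pipeline

# Assumption: klein 4B loads via the same Flux2Pipeline as FLUX.2 [dev].
pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="A watercolor fox in a misty forest",
    num_inference_steps=4,  # the distilled klein models are 4-step samplers
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("klein_4b.png")
```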
### Performance
Klein models define the Pareto frontier for quality vs. latency and VRAM across text-to-image, single-reference editing, and multi-reference generation:
<p align="center">
<img src="assets/klein_benchmark.jpg" alt="FLUX.2 [klein] vs Baselines — Elo vs Latency and VRAM" width="800"/>
</p>
<sub>Higher Elo + Lower Latency/VRAM = Better.</sub>
### The Klein Family
| Model | Best For |
|:---|:---|
| **[klein] 4B** | Maximum speed, consumer hardware, edge deployment |
| **[klein] 9B** | Best quality-to-latency ratio, production apps |
| **[klein] 4B Base** | Fine-tuning on limited hardware, full customization |
| **[klein] 9B Base** | Research, LoRA training, maximum output diversity |
**Distilled vs Base:**
- Use **Distilled** (4-step) for production apps and real-time generation
- Use **Base** (50-step) for fine-tuning, LoRA training, and maximum flexibility
**Licensing:** 4B models are [Apache 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md). 9B models use the [FLUX Non-Commercial License](model_licenses/LICENSE-FLUX-NON-COMMERICAL).
### Text-to-image examples
Example focused on realism
![t2i-klein-grid](assets/t2i_klein_realism.jpg)
Example focused on output diversity
![t2i-klein-others](assets/t2i_klein_others.jpg)
### Editing examples
![i2i-klein](assets/i2i_klein.jpg)
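Editing uses the same pipeline as generation: you pass one or more reference images alongside the prompt. A minimal sketch, assuming the diffusers `Flux2Pipeline` accepts reference images through its optional `image` argument (a list for multi-reference), as hinted by the commented-out `image=` input in the diffusers example below; the argument behavior and repo id here are assumptions:

```python
import torch
from diffusers import Flux2Pipeline
from diffusers.utils import load_image

pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B", torch_dtype=torch.bfloat16
).to("cuda")

# Hypothetical local reference images; a single image (rather than a list)
# would be single-reference editing.
refs = [load_image("person.png"), load_image("jacket.png")]

edited = pipe(
    prompt="The person wearing the jacket, studio lighting",
    image=refs,  # assumption: a list of images enables multi-reference editing
    num_inference_steps=4,
).images[0]
edited.save("klein_edit.png")
```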
## `FLUX.2 [dev]`
`FLUX.2 [dev]` is a 32B parameter flow matching transformer model capable of generating and editing (multiple) images. The model is released under the [FLUX.2-dev Non-Commercial License](model_licenses/LICENSE-FLUX-DEV) and can be found [here](https://huggingface.co/black-forest-labs/FLUX.2-dev).
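As a flow matching model, FLUX.2 learns a velocity field that transports Gaussian noise to images, and sampling numerically integrates that field over a sequence of steps. A toy Euler integrator to illustrate the idea only; this is not the FLUX.2 API, and the real velocity field is a 32B transformer conditioned on text embeddings:

```python
import torch

def euler_sample(velocity, x, num_steps=50):
    """Integrate dx/dt = velocity(x, t) from t=1 (noise) down to t=0 (data)."""
    ts = torch.linspace(1.0, 0.0, num_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t) * velocity(x, t)  # one explicit Euler step
    return x

# Stand-in velocity field that shrinks the state toward zero, for illustration.
toy_velocity = lambda x, t: x
sample = euler_sample(toy_velocity, torch.randn(4))
```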
## Local installation
The inference code was tested on GB200 and H100 (with CPU offloading).
### GB200
On GB200, we tested `FLUX.2 [dev]` using CUDA 12.9 and Python 3.12.
```bash
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e . --extra-index-url https://download.pytorch.org/whl/cu129 --no-cache-dir
```
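After installing, a quick sanity check (plain PyTorch, not part of this repo) confirms that the wheel you pulled matches your CUDA setup:

```python
import torch

print(torch.__version__)          # expect a cu129 build when installed as above
print(torch.cuda.is_available())  # True once the GPU is visible
print(torch.cuda.get_device_name(0))
```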
### H100
On H100, we tested `FLUX.2 [dev]` using CUDA 12.6 and Python 3.10.
```bash
python3.10 -m venv .venv
source .venv/bin/activate
pip install -e . --extra-index-url https://download.pytorch.org/whl/cu126 --no-cache-dir
```
## Run the CLI
Before running the CLI, you may download the weights from [Hugging Face](https://huggingface.co/black-forest-labs) and point the following environment variables at them.
```bash
export FLUX2_MODEL_PATH="<flux2_path>"
export AE_MODEL_PATH="<ae_path>"
export KLEIN_4B_MODEL_PATH="<klein_4b_path>"
export KLEIN_4B_BASE_MODEL_PATH="<klein_4b_base_path>"
export KLEIN_9B_MODEL_PATH="<klein_9b_path>"
export KLEIN_9B_BASE_MODEL_PATH="<klein_9b_base_path>"
```
If you don't set the environment variables, the weights will be downloaded automatically.
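Presumably an explicit path wins and the Hub is the fallback. A minimal sketch of that behavior, illustrative only; this repo's actual loader, as well as the exact repo id and checkpoint filename below, may differ:

```python
import os
from huggingface_hub import hf_hub_download

def resolve_weights(env_var: str, repo_id: str, filename: str) -> str:
    """Return a local checkpoint path, downloading from the Hub if unset."""
    path = os.environ.get(env_var)
    if path:
        return path
    return hf_hub_download(repo_id=repo_id, filename=filename)

# Hypothetical usage; the real checkpoint filename may differ.
flux2_path = resolve_weights(
    "FLUX2_MODEL_PATH", "black-forest-labs/FLUX.2-dev", "flux2-dev.safetensors"
)
```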
You can start an interactive session to do both text-to-image generation and editing of one or multiple images with the following command:
```bash
PYTHONPATH=src python scripts/cli.py
```
On H100, we additionally set the flag `--cpu_offloading True`.
## Watermarking
We've added an option to embed invisible watermarks directly into the generated images via the [invisible watermark library](https://github.com/ShieldMnt/invisible-watermark).

Additionally, we recommend marking the metadata of your outputs with a provenance standard such as [C2PA](https://c2pa.org/).
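For reference, embedding a payload with that library looks roughly like the sketch below; the payload string and method choice are illustrative, and the repo's built-in option handles this for you:

```python
import cv2
from imwatermark import WatermarkEncoder

# Embed an invisible payload into a saved image using the DWT+DCT method.
bgr = cv2.imread("flux2_output.png")
encoder = WatermarkEncoder()
encoder.set_watermark("bytes", "bfl-flux2".encode("utf-8"))  # illustrative payload
watermarked = encoder.encode(bgr, "dwtDct")
cv2.imwrite("flux2_output_wm.png", watermarked)
```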
## 🧨 Lower VRAM diffusers example
The example below should run on an RTX 4090. For more examples, check the [diffusers quantization guide](docs/flux2_dev_hf.md).
```python
import io

import requests
import torch
from diffusers import Flux2Pipeline
from diffusers.utils import load_image
from huggingface_hub import get_token

repo_id = "diffusers/FLUX.2-dev-bnb-4bit"
device = "cuda:0"
torch_dtype = torch.bfloat16


def remote_text_encoder(prompts):
    # Run the prompt through Hugging Face's hosted text encoder endpoint so the
    # large text encoder never has to be loaded into local VRAM.
    response = requests.post(
        "https://remote-text-encoder-flux-2.huggingface.co/predict",
        json={"prompt": prompts},
        headers={
            "Authorization": f"Bearer {get_token()}",
            "Content-Type": "application/json",
        },
    )
    prompt_embeds = torch.load(io.BytesIO(response.content))
    return prompt_embeds.to(device)


pipe = Flux2Pipeline.from_pretrained(
    repo_id, text_encoder=None, torch_dtype=torch_dtype
).to(device)

prompt = "Realistic macro photograph of a hermit crab using a soda can as its shell, partially emerging from the can, captured with sharp detail and natural colors, on a sunlit beach with soft shadows and a shallow depth of field, with blurred ocean waves in the background. The can has the text `BFL Diffusers` on it and it has a color gradient that starts with #FF5733 at the top and transitions to #33FF57 at the bottom."

image = pipe(
    prompt_embeds=remote_text_encoder(prompt),
    # image=load_image("https://huggingface.co/spaces/zerogpu-aoti/FLUX.1-Kontext-Dev-fp8-dynamic/resolve/main/cat.png"),  # optional image input
    generator=torch.Generator(device=device).manual_seed(42),
    num_inference_steps=50,  # 28 steps can be a good trade-off
    guidance_scale=4,
).images[0]
image.save("flux2_output.png")
```
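Note the design choice here: passing `text_encoder=None` and fetching `prompt_embeds` from the hosted endpoint keeps the text encoder's weights off the local GPU entirely, which is a large part of why the 4-bit checkpoint fits in consumer VRAM. If you'd rather stay fully local, load the pipeline with its text encoder and pass `prompt=` directly, at the cost of significantly more memory.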
## Citation
If you find the provided code or models useful for your research, consider citing them as: