ECNU-CILab/ArtAug-lora-FLUX.1dev-v1

Text-to-Image

Diffusers

lora


Artiprocher committed on Dec 18, 2024

Commit 994909d · verified · 1 Parent(s): 1a35a1b

Update README.md

Files changed (1)

README.md +28 -12

@@ -2,22 +2,38 @@
 license: apache-2.0
 ---
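-# FLUX Art Enhancement LoRA
-## Introduction
-This is a LoRA model trained for FLUX.1-dev that makes generated images better match human aesthetic preferences, including but not limited to rich detail, beautiful lighting and shadows, aesthetic composition, and sharp visuals. This model does not require any trigger words.
-* Paper: coming soon
-* Open-source code: https://github.com/modelscope/DiffSynth-Studio
-* Model:
     * ModelScope: https://www.modelscope.cn/models/DiffSynth-Studio/ArtAug-lora-FLUX.1dev-v1
-    * HuggingFace: coming soon
-* Online demo: click "Generate with one click" in the top-right corner
-## Usage
-This model was trained with DiffSynth-Studio, and we recommend using DiffSynth-Studio for generation.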
 ```shell
 git clone https://github.com/modelscope/DiffSynth-Studio.git
@@ -42,9 +58,9 @@ image = pipe(prompt="a house", seed=0)
 image.save("image_artaug.jpg")
 ```
-Because this model is packaged in the common FLUX LoRA format, it can be loaded by most LoRA loaders, so you can plug this LoRA model into your own workflow.
-## Image Examples
 |FLUX.1-dev|FLUX.1-dev + ArtAug LoRA|
 |-|-|

 license: apache-2.0
 ---
+# FLUX Aesthetics Enhancement LoRA
+## Introduction
+This is a LoRA model trained for FLUX.1-dev, which enhances the aesthetic quality of images generated by the model. The improvements include, but are not limited to: rich details, beautiful lighting and shadows, aesthetic composition, and clear visuals. This model does not require any trigger words.
+* Paper: https://arxiv.org/abs/2412.12888
+* Open-source project: https://github.com/modelscope/DiffSynth-Studio
+* Model:
     * ModelScope: https://www.modelscope.cn/models/DiffSynth-Studio/ArtAug-lora-FLUX.1dev-v1
+    * HuggingFace: Coming soon
+* Demo: Coming soon
+## Methodology
+![](workflow.jpg)
+The ArtAug project is inspired by reasoning approaches like GPT-o1, which rely on model interaction and self-correction. We developed a framework aimed at enhancing the capabilities of image generation models through interaction with image understanding models. The training process of ArtAug consists of the following steps:
+1. **Synthesis-Understanding Interaction**: After generating an image using the image generation model, we employ a multimodal large language model (Qwen2-VL-72B) to analyze the image content and suggest modifications, which are then used to regenerate a higher-quality image.
+2. **Data Generation and Filtering**: Interactive generation involves long inference times and sometimes produces poor image content. Therefore, we generate a large batch of image pairs offline, filter them, and use them for subsequent training.
+3. **Differential Training**: We apply differential training techniques to train a LoRA model, enabling it to learn the differences between images before and after enhancement, rather than directly training on the dataset of enhanced images.
+4. **Iterative Enhancement**: The trained LoRA model is fused into the base model, and the entire process is repeated multiple times with the fused model until the interaction algorithm no longer provides significant enhancements. The LoRA models produced in each iteration are combined to produce this final model.
+This model integrates the aesthetic understanding of Qwen2-VL-72B into FLUX.1-dev, leading to an improvement in the quality of generated images.
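The fusing and combining in step 4 can be illustrated with a small NumPy sketch. It is not the DiffSynth-Studio implementation: the helper names `fuse_lora` and `combine_loras`, the toy shapes, and the choice to combine LoRAs by concatenating their low-rank factors are all assumptions made for illustration. It only shows that fusing one LoRA per iteration into a weight matrix gives the same result as fusing a single combined, higher-rank LoRA.

```python
import numpy as np


def fuse_lora(weight, lora_down, lora_up, scale=1.0):
    """Return the weight with a LoRA update folded in: W + scale * (up @ down)."""
    return weight + scale * (lora_up @ lora_down)


def combine_loras(loras):
    """Merge several (down, up) pairs into one higher-rank pair.

    Concatenating along the rank axis makes up @ down equal to the sum of
    the individual up_i @ down_i products. This is one possible way to
    combine LoRAs (an assumption, not necessarily what ArtAug does).
    """
    down = np.concatenate([d for d, _ in loras], axis=0)  # (total_rank, in_dim)
    up = np.concatenate([u for _, u in loras], axis=1)    # (out_dim, total_rank)
    return down, up


rng = np.random.default_rng(0)
out_dim, in_dim, rank = 8, 6, 2  # toy sizes, hypothetical

# Stand-in for one linear layer of the base model.
base = rng.normal(size=(out_dim, in_dim))

# Two hypothetical enhancement iterations, each producing one LoRA.
loras = [
    (rng.normal(size=(rank, in_dim)), rng.normal(size=(out_dim, rank)))
    for _ in range(2)
]

# Iterative view: fuse each iteration's LoRA into the current weights.
fused_iteratively = base
for down, up in loras:
    fused_iteratively = fuse_lora(fused_iteratively, down, up)

# Combined view: build one LoRA of rank 4 and fuse it once.
combined_down, combined_up = combine_loras(loras)
fused_at_once = fuse_lora(base, combined_down, combined_up)

print(np.allclose(fused_iteratively, fused_at_once))
```

Under these assumptions, a single combined LoRA can stand in for the whole chain of per-iteration LoRAs, at the cost of a rank equal to the sum of the individual ranks.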
+## Usage
+This model was trained using [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio). We recommend using DiffSynth-Studio for inference.
 ```shell
 git clone https://github.com/modelscope/DiffSynth-Studio.git
 image.save("image_artaug.jpg")
 ```
+Since this model is encapsulated in the universal FLUX LoRA format, it can be loaded by most LoRA loaders, allowing you to integrate this LoRA model into your own workflow.
+## Examples
 |FLUX.1-dev|FLUX.1-dev + ArtAug LoRA|
 |-|-|