This project provides an SDXL version of the GLIGEN adapters, with a Hugging Face-style pipeline. It is part of the effort to create InteractDiffusion XL. More details are available in the GitHub repo.
Motivation
IGLIGEN reproduces GLIGEN on the diffusers framework and makes the training procedure easier to replicate. Its authors have released code and pretrained weights for SD v1.4/v1.5 and SD v2.0/v2.1, but SDXL support has not yet been released. This repo open-sources the pretrained weights of a GLIGEN adapter for SDXL, together with the diffusers pipeline and training code. We thank the authors of GLIGEN and IGLIGEN for their work.
Usage
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "jiuntian/gligen-xl-512", trust_remote_code=True, torch_dtype=torch.float16
).to("cuda")

prompt = "An image of grassland with a dog."

# Image generation with GLIGEN
output_images = pipeline(
    prompt,
    num_inference_steps=50,
    height=512, width=512,
    gligen_scheduled_sampling_beta=0.4,
    gligen_boxes=[[0.1, 0.6, 0.3, 0.8]],
    gligen_phrases=["a dog"],
    num_images_per_prompt=1,
    output_type="pt"
).images
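The box and beta arguments above can be prepared with small helpers. The following is a minimal sketch, not part of this repo: it assumes that `gligen_boxes` follows the convention of the upstream diffusers GLIGEN pipeline (each box is `[xmin, ymin, xmax, ymax]` given as fractions of the image size in [0, 1]) and that `gligen_scheduled_sampling_beta` is the fraction of denoising steps during which grounding is applied. The helper names `normalize_boxes` and `num_grounding_steps` are hypothetical.

```python
from typing import List, Sequence


def normalize_boxes(
    pixel_boxes: Sequence[Sequence[float]], width: int, height: int
) -> List[List[float]]:
    """Convert [xmin, ymin, xmax, ymax] pixel boxes to fractions of the image size.

    Assumption: the pipeline expects normalized corner coordinates in [0, 1],
    as the upstream diffusers GLIGEN pipeline does.
    """
    normalized = []
    for xmin, ymin, xmax, ymax in pixel_boxes:
        # Reject boxes that are empty, inverted, or outside the image.
        if not (0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height):
            raise ValueError(f"invalid box: {[xmin, ymin, xmax, ymax]}")
        normalized.append(
            [xmin / width, ymin / height, xmax / width, ymax / height]
        )
    return normalized


def num_grounding_steps(beta: float, num_inference_steps: int) -> int:
    """Number of initial denoising steps that use the grounding inputs.

    Assumption: mirrors the upstream diffusers GLIGEN pipeline, which
    truncates beta * number_of_steps to an integer.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be in [0, 1]")
    return int(beta * num_inference_steps)
```

Under these assumptions, a box drawn in pixels on a 512x512 canvas can be passed as `gligen_boxes=normalize_boxes(pixel_boxes, 512, 512)`, and the settings in the example above (beta 0.4, 50 steps) would apply grounding during the first 20 steps.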
Citation
The authors of this repo (IGLIGEN-XL) are not affiliated with the authors of GLIGEN and IGLIGEN. Since IGLIGEN-XL is based on GLIGEN and IGLIGEN, if you use the IGLIGEN-XL code or adapters, please consider citing the original GLIGEN and IGLIGEN papers:
@article{li2023gligen,
  title={GLIGEN: Open-Set Grounded Text-to-Image Generation},
  author={Li, Yuheng and Liu, Haotian and Wu, Qingyang and Mu, Fangzhou and Yang, Jianwei and Gao, Jianfeng and Li, Chunyuan and Lee, Yong Jae},
  journal={CVPR},
  year={2023}
}
@article{lian2023llmgrounded,
  title={LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models},
  author={Lian, Long and Li, Boyi and Yala, Adam and Darrell, Trevor},
  journal={arXiv preprint arXiv:2305.13655},
  year={2023}
}
Because this project is part of the effort to create InteractDiffusion XL, please also consider citing InteractDiffusion if you use the IGLIGEN-XL code or trained weights.
@inproceedings{hoe2023interactdiffusion,
  title={InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models},
  author={Jiun Tian Hoe and Xudong Jiang and Chee Seng Chan and Yap-Peng Tan and Weipeng Hu},
  year={2024},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
}
Model tree for jiuntian/gligen-xl-512
Base model: stabilityai/stable-diffusion-xl-base-1.0