---
license: apache-2.0
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- art
- t2i-adapter
- stable-diffusion
- image-to-image
---
# T2I-Adapter-SDXL - Sketch
T2I-Adapter is a network that provides additional conditioning to Stable Diffusion. Each T2I-Adapter checkpoint takes a different type of conditioning as input and is used with a specific base Stable Diffusion checkpoint.

This checkpoint provides conditioning on sketches for the StableDiffusionXL checkpoint.
## Model Details
- **Developed by:** T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
- **Model type:** Diffusion-based text-to-image generation model
- **Language(s):** English
- **License:** Apache 2.0
- **Resources for more information:** [GitHub Repository](https://github.com/TencentARC/T2I-Adapter), [Paper](https://arxiv.org/abs/2302.08453).
- **Cite as:**

```bibtex
@misc{mou2023t2iadapter,
  title={T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models},
  author={Chong Mou and Xintao Wang and Liangbin Xie and Yanze Wu and Jian Zhang and Zhongang Qi and Ying Shan and Xiaohu Qie},
  year={2023},
  eprint={2302.08453},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
## Checkpoints
| Model Name | Control Image Overview |
|---|---|
| Adapter/t2iadapter_canny_sdxlv1<br/>*Trained with canny edge detection* | A monochrome image with white edges on a black background. |
| Adapter/t2iadapter_sketch_sdxlv1<br/>*Trained with PidiNet edge detection* | A hand-drawn monochrome image with white outlines on a black background. |
| Adapter/t2iadapter_depth_sdxlv1<br/>*Trained with Midas depth estimation* | A grayscale image with black representing deep areas and white representing shallow areas. |
| Adapter/t2iadapter_openpose_sdxlv1<br/>*Trained with OpenPose bone image* | An OpenPose bone image. |
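Each of these checkpoints can be loaded the same way via `T2IAdapter.from_pretrained`. A minimal sketch (the hub IDs come from the table above; the dtype and device choices are assumptions matching the example below):

```python
import torch
from diffusers import T2IAdapter

# Load one adapter from the table; swap the hub ID to change the
# conditioning type (canny, sketch, depth, openpose).
adapter = T2IAdapter.from_pretrained(
    "Adapter/t2iadapter_depth_sdxlv1", torch_dtype=torch.float16
).to("cuda")
```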
## Example
To get started, first install the required dependencies:

```bash
pip install git+https://github.com/huggingface/diffusers.git@t2iadapterxl # for now
pip install git+https://github.com/patrickvonplaten/controlnet_aux.git # for conditioning models and detectors
pip install transformers accelerate safetensors
```
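If you want to verify the installs before running the example, a quick optional check (a minimal sketch; nothing here is specific to this model):

```python
# Confirm the freshly installed packages import cleanly.
import controlnet_aux  # noqa: F401 -- import check only
import diffusers
import transformers

print(diffusers.__version__, transformers.__version__)
```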
- Images are first downloaded and converted into the appropriate control image format.
- The control image and prompt are passed to the `StableDiffusionXLAdapterPipeline`.
Let's have a look at a simple example using the Sketch Adapter.
```python
import torch

from diffusers import (
    AutoencoderKL,
    EulerAncestralDiscreteScheduler,
    StableDiffusionXLAdapterPipeline,
    T2IAdapter,
)
from diffusers.utils import load_image, make_image_grid
from controlnet_aux.pidi import PidiNetDetector

# Load the sketch adapter.
adapter = T2IAdapter.from_pretrained(
    "Adapter/t2i-adapter-sketch-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Load the euler_a scheduler and an fp16-safe VAE.
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, vae=vae, adapter=adapter, scheduler=euler_a, torch_dtype=torch.float16, variant="fp16",
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()

# Load PidiNet to turn the input photo into a sketch-style control image.
pidinet = PidiNetDetector.from_pretrained("lllyasviel/Annotators").to("cuda")

url = "https://raw.githubusercontent.com/lllyasviel/ControlNet/main/test_imgs/cyber.png"
image = load_image(url)
image = pidinet(
    image, detect_resolution=512, image_resolution=1024, apply_filter=True
).resize((896, 1152))

prompt = "a robot, mount fuji in the background, 4k photo, highly detailed"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"

gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=1,
    cond_tau=1,
).images
gen_images[0]
```
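To save the result, or to compare the sketch control image with the generated image side by side, the `make_image_grid` helper imported above can be used (a minimal follow-up sketch; the output filenames are arbitrary):

```python
# Save the generated image.
gen_images[0].save("sketch_robot.png")

# Control image and result side by side for comparison.
grid = make_image_grid([image, gen_images[0]], rows=1, cols=2)
grid.save("sketch_robot_grid.png")
```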