Model Card for VRoid Diffusion
This is a latent text-to-image diffusion model intended to demonstrate how U-Net training affects the generated images.
- Text Encoder is from OpenCLIP ViT-H/14, MIT License, Training Data : LAION-2B
- VAE is from Mitsua Diffusion One, Mitsua Open RAIL-M License, Training Data: Public Domain/CC0 + Licensed
- U-Net is trained from scratch using the full version of VRoid Image Dataset Lite with some modifications.
- VRoid is a trademark or registered trademark of Pixiv inc. in Japan and other regions.
Model Details
vroid_diffusion_test.safetensors
- base variant.
vroid_diffusion_test_invert_red_blue.safetensors
- red and blue in the caption are swapped.
- pink and skyblue in the caption are swapped.
vroid_diffusion_test_monochrome.safetensors
- all training images are converted to grayscale.
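The caption modification used for the invert_red_blue variant can be illustrated with a minimal sketch. This is not the authors' preprocessing code: the function name `swap_colors`, the whole-word matching, and the lowercase-only handling are assumptions made for illustration. The swap is done in a single regex pass so that a word that has already been swapped is not swapped back.

```python
import re

# Color pairs named in the model card; each word maps to its partner.
SWAP = {"red": "blue", "blue": "red", "pink": "skyblue", "skyblue": "pink"}

# Whole-word match (assumption), so "blue" inside "skyblue" or "red" inside
# "redhead" is not touched. Longer words are listed first for clarity.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(SWAP, key=len, reverse=True)) + r")\b"
)


def swap_colors(caption: str) -> str:
    """Swap red<->blue and pink<->skyblue in one pass over the caption."""
    return _PATTERN.sub(lambda m: SWAP[m.group(1)], caption)


if __name__ == "__main__":
    print(swap_colors("a girl with red hair and blue eyes"))
```

A model trained on captions rewritten this way should learn to draw blue where the prompt says red, which is the kind of effect the variant is meant to expose.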
Model Variant
- VRoid Diffusion Unconditional
- This is an unconditional image generator without CLIP.
Model Description
- Developed by: Abstract Engine.
- License: Mitsua Open RAIL-M License.
Uses
Direct Use
Text-to-Image generation for research and educational purposes.
Out-of-Scope Use
Any deployed use case of the model.
Training Details
- Trained resolution : 256x256
- Batch Size : 48
- Steps : 45k
- LR : 1e-5 with 1000 warmup steps
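As a rough illustration of the hyperparameters above, the sketch below computes a learning rate per step and the total number of samples drawn during training. The card only states the peak LR and the warmup length, so the linear warmup shape and the constant rate afterwards are assumptions. The sample count also assumes that 48 is the effective global batch size, with no gradient accumulation.

```python
BASE_LR = 1e-5        # from the card
WARMUP_STEPS = 1000   # from the card
TOTAL_STEPS = 45_000  # "45k" from the card
BATCH_SIZE = 48       # from the card


def lr_at(step: int) -> float:
    """Learning rate at a 0-indexed step.

    Assumption: linear warmup to BASE_LR, then constant.
    The card does not specify the schedule shape.
    """
    if step < WARMUP_STEPS:
        return BASE_LR * (step + 1) / WARMUP_STEPS
    return BASE_LR


# Samples drawn over the whole run (with repetition across epochs),
# assuming BATCH_SIZE is the global batch size.
samples_seen = BATCH_SIZE * TOTAL_STEPS

if __name__ == "__main__":
    print(lr_at(0), lr_at(WARMUP_STEPS - 1), lr_at(TOTAL_STEPS - 1))
    print(samples_seen)
```

Under these assumptions the run draws about 2.16 million training samples at 256x256 resolution.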
Training Data
We use the full version of VRoid Image Dataset Lite with some modifications.