---
license: creativeml-openrail-m
---

# Converting Models to Core ML

**Step 1:** Create a Python environment and install dependencies:

```bash
conda create -n guernika python=3.8 -y
conda activate guernika
cd /path/to/unzipped/scripts/location
pip install -e .
```
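
To verify the installation, you can print the converter's help text. This assumes the scripts expose the `python_coreml_stable_diffusion` package under the same module path used in Step 4:

```bash
# Should list the available conversion flags if the install succeeded.
python -m python_coreml_stable_diffusion.torch2coreml --help
```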

**Step 2:** Log in to or register for your Hugging Face account, generate a User Access Token, and use this token to set up Hugging Face API access by running `huggingface-cli login` in a Terminal window.
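
For example:

```bash
# Authenticate this machine with your Hugging Face token;
# paste the User Access Token when prompted.
huggingface-cli login
```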

**Step 3a:** Navigate to the version of Stable Diffusion that you would like to use on Hugging Face Hub and accept its Terms of Use. The default model version is `CompVis/stable-diffusion-v1-4`. The model version may be changed by the user as described in the next step.

**Step 3b:** You may also convert an existing model from a `.ckpt` checkpoint by using the `convert_original_stable_diffusion_to_diffusers.py` script. After converting it, you can continue by using the `--model-location` argument to indicate the location of your converted model.
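
For instance, a conversion from a local checkpoint might look like the following. The flag names follow the diffusers version of this script and may differ between releases; the paths are placeholders:

```bash
# Convert a .ckpt checkpoint into a diffusers-style model directory.
python convert_original_stable_diffusion_to_diffusers.py \
  --checkpoint_path /path/to/model.ckpt \
  --dump_path /path/to/converted-model

# Then point the converter at it in Step 4:
#   --model-location /path/to/converted-model
```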

**Step 4:** Execute the following command from the Terminal to generate the Core ML model files (`.mlpackage`) and a Guernika-compatible model:

```bash
python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-encoder --convert-vae-decoder --convert-safety-checker -o <output-mlpackages-directory> --bundle-resources-for-swift-cli
```

**WARNING:** This command may download several GB worth of PyTorch checkpoints from Hugging Face.

This generally takes 15-20 minutes on an M1 MacBook Pro. Upon successful execution, the 4 neural network models that comprise Stable Diffusion will have been converted from PyTorch to Core ML (`.mlpackage`) and saved into the specified `<output-mlpackages-directory>`. Some additional notable arguments (a combined example follows the list):

- `--model-version`: The model version defaults to `CompVis/stable-diffusion-v1-4`. Developers may specify other versions that are available on Hugging Face Hub, e.g. `stabilityai/stable-diffusion-2-base` and `runwayml/stable-diffusion-v1-5`.

- `--model-location`: The location of a local converted model; defaults to `None`.

- `--bundle-resources-for-swift-cli`: Compiles all 4 models and bundles them, along with the necessary resources for text tokenization, into `<output-mlpackages-directory>/Resources`, which should be provided as input to the Swift package. This flag is not necessary for the diffusers-based Python pipeline.

- `--chunk-unet`: Splits the Unet model into two approximately equal chunks (each with less than 1 GB of weights) for mobile-friendly deployment. This is required for ANE deployment on iOS and iPadOS, but not for macOS. The Swift CLI is able to consume both the chunked and regular versions of the Unet model but prioritizes the former. Note that the chunked Unet is not compatible with the Python pipeline, because the Python pipeline is intended for macOS only; chunking is for on-device deployment with Swift.

- `--attention-implementation`: Defaults to `SPLIT_EINSUM`, which is the implementation described in [Deploying Transformers on the Apple Neural Engine](https://machinelearning.apple.com/research/neural-engine-transformers). `--attention-implementation ORIGINAL` will switch to an alternative that should be used for non-ANE deployment. Please refer to the Performance Benchmark section for further guidance.

- `--check-output-correctness`: Compares the original PyTorch model's outputs to the final Core ML model's outputs. This flag increases RAM consumption significantly, so it is recommended only for debugging purposes.
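
Putting these arguments together, a sketch of a conversion targeting `runwayml/stable-diffusion-v1-5` for on-device (iOS/iPadOS) deployment might look like this; the output directory is a placeholder:

```bash
# Convert all 4 models, chunk the Unet for ANE deployment on iOS/iPadOS,
# and bundle the compiled resources for the Swift package.
python -m python_coreml_stable_diffusion.torch2coreml \
  --convert-unet --convert-text-encoder \
  --convert-vae-encoder --convert-vae-decoder \
  --convert-safety-checker \
  --model-version runwayml/stable-diffusion-v1-5 \
  --chunk-unet \
  --bundle-resources-for-swift-cli \
  -o <output-mlpackages-directory>
```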