SD-πXL: Generating Low-Resolution Quantized Imagery via Score Distillation.
Introduction • Installation • Quickstart • User Guide • Citation • Acknowledgment • License
This is the codebase for our work SD-πXL: Generating Low-Resolution Quantized Imagery via Score Distillation, which will be presented at SIGGRAPH Asia 2024. SD-πXL is designed to create low-resolution, color-limited images like pixel art, which are popular in video games and digital design. This tool allows users to input a text prompt or an image, choose a specific image size and color palette, and then automatically generates the desired artwork. SD-πXL ensures the images are both aesthetically pleasing and useful for practical applications like cross-stitch embroidery. Our technical paper with detailed explanations can be found here.
Prior to running the code, it is advised to create a virtual environment. You can simply do so via the following command:
```
conda env create -f environment.yml
```
You can also manually install the required packages via
```
conda create --name SD-piXL python=3.10
conda activate SD-piXL
pip3 install torch torchvision torchaudio
pip install matplotlib accelerate omegaconf einops transformers scipy tensorboard openai-clip xformers opencv-python
pip install git+https://github.com/huggingface/diffusers
pip3 install -U scikit-learn
pip install -U peft
```
After installing the dependencies, simply run:
```
accelerate launch main.py -c config.yaml --download
```
The first time you run SD-πXL, initialization might take some time because the program downloads the necessary model weights from 🤗 Hugging Face. As our approach is based on optimization instead of deep neural network inference, it can be quite slow (a full run can take a few hours to complete). It also requires a lot of VRAM (24GB is recommended). To decrease the memory requirements, see the section Memory Requirements.
After completion of the program, intermediate and final results are stored in `workdir/config/folderName`, where `folderName` is generated based on the date of execution. With the default config file, you should obtain the following result.
Input Image | SD-πXL Output (argmax) | SD-πXL Output (softmax) |
SD-πXL saves intermediate model checkpoints every 5000 steps (this can be changed with the `--save_step` argument). The following section provides detailed instructions on how to adjust parameters to achieve various outcomes.
The full command is
```
accelerate launch main.py --config [CONFIG] --seed [SEED_ID] -pt [PROMPT] -npt [NEGATIVE_PROMPT] --download --input_image [SRC_INPUT_IMG] --size [H,W] --palette [SRC_PALETTE] --verbose --make_video --video_size [H,W] --fps [FPS]
```
Each of these arguments has a default value, either in `main.py` or in the config file. We explain each argument below.
- `--config`: Config file. Default is `config.yaml`.
- `--seed`: Seed. Default is in the config file.
- `-pt`: Prompt. Default is in the config file.
- `-npt`: Negative prompt. Default is in the config file.
- `--download`: If added, downloads the required resources, such as SDXL's weights, ControlNet's weights, etc. They are only downloaded the first time they are needed.
- `--input_image`: Input image used for conditioning the generation. Default is in the config file.
- `--size`: Size of the generated image. Default is in the config file. To be provided as a tuple, e.g. `--size 64,64`.
- `--palette`: Path to the palette file. Default is in the config file.
- `--verbose`: Run in verbose mode.
- `--make_video`: Creates a video from the saved intermediate steps.
- `--video_size`: Size of the video. Default is `512,512`.
- `--fps`: FPS of the video. Default is `30`.
```
accelerate launch main.py --config config.yaml --seed 0 -pt "a Chinese dragon flying through the air on a dark background with smoke coming out of its mouth and tail." -npt "" --download --input_image assets/image/chinese_dragon.png --size 96,96 --palette assets/palettes/lospec/neon-space.hex --verbose --make_video --video_size 288,288 --fps 24
```
Input Image | SD-πXL Output (argmax) | SD-πXL Output (softmax) | Video (compressed as a gif) |
By adding `--verbose` to the command, you will enter verbose mode. Not only will you save intermediate results, but the program will also:
- Save the input image.
- Save the ControlNet conditioning images, typically the Canny edge and depth map of the input image.
- Save the gradients. The saved gradient is taken at a randomly sampled `t` value.
- Save the entropy per pixel.
- Save the results of the Gumbel-reparameterized generation (both softmax and argmax), in addition to the non-Gumbel-reparameterized versions.
- Save the augmented image of the current iteration.
- Log the variance, entropy, and max probability with TensorBoard. Run `tensorboard --logdir workdir/Path/tensorboard/` to visualize them.
- Log more states of the program, such as additional loss information.
Initialization | Canny edge | Depth map | Augmentation | Gradient | Entropy | Gumbel-Argmax | Gumbel-Softmax |
The configuration file must be in the `./config` directory. Only provide the name of the file as input, not the full path. The config file is automatically saved to the result directory. Fields in the provided `config/config.yaml` are either self-explanatory or commented with possible options; please refer to it. We provide some additional details below.
- Saving intermediate steps: `saving_resize` is the size used to save the intermediate and final results. It must be an integer multiple of the target size (`generator.image_H`, `generator.image_W`). Intermediate results are saved every `training.save_steps` steps.
- Initialization: `generator.initialize_renderer` determines whether the renderer is initialized with the input image. The initialization method depends on `initialization_method` and `init_distance`.
- K-means: if `initialization_method` is set to `kmeans`, then `kmeans_nb_colors` is the number of colors used for the color decomposition.
- Softmax: set `smooth_softmax` to `True` to use softmax during training, or to `False` to use argmax.
- Gumbel: `gumbel` determines whether to use the Gumbel-softmax reparameterization with parameter `tau` (see the first sketch after this list). If you want a crisp pixel art result that adheres exactly to the color palette, set `gumbel` to `True`. If you merely want colors that lie in the convex hull of the color palette, set `gumbel` to `False`. If `training.augmentation.random_tau` is `True`, then `tau` will be randomly picked between `random_tau_min` and `random_tau_max`.
- Optimization: the optimization runs for `training.steps` steps. Other parameters for the optimizer are self-explanatory in the configuration file.
- Image augmentation: the generated image is augmented before being fed to the latent diffusion model. The resize mode is given in `training.resize_mode` (`bilinear` is recommended), and `training.augmentation` allows modifying the grayscale and horizontal flip probabilities.
- Losses: different losses can be used with our method. Our paper only presents the fast Fourier transform loss, whose scale can be tweaked via `training.fft_scale` (default 20). Other losses are left for users willing to experiment with different smoothing techniques.
- Diffusion model: `diffusion.model_id` is either `sdxl` or `ssd1b`. We recommend using `taesdxl` for the variational autoencoder.
- ControlNet: you can toggle off the use of ControlNet by setting `use_controlnet` to `False`. You can set up either one or two ControlNet conditionings (depth or Canny); each of them has three variants and is associated with a ControlNet scale. The scales are additive and can produce deformed results if their sum exceeds 1. Removing both ControlNet ids leads to an error.
- LoRA: you can use any LoRA compatible with SDXL by setting `diffusion.lora_path` to a valid LoRA id. An example of such an id is `nerijs/pixel-art-xl`.
- Score distillation: we provide several parameters for score distillation. The most sensitive is the `guidance_scale`, which controls how much the prompt influences the produced image. Refer to the paper for more information. `sampling_method_t` is set to `bounded_max`, which means that `t` is sampled uniformly in `[t_min, t_max]`, with `t_max` linearly decreasing to reach `t_bound_max` when the epoch reaches `t_bound_reached × training.steps` (see the second sketch after this list).
- Image generation: if no input image is provided but you still want to initialize the generator by setting `generator.initialize_renderer` to `True`, the program will use the input prompt to generate an image using SDXL. The number of inference steps, the guidance scale, and the number of reference images to be generated are configurable.
- Caption: if, on the contrary, you do not want to give a prompt, you can set `automatic_caption` to `True` for SD-πXL to use BLIP to generate a caption for the input image. This caption will then be used as the prompt.
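For intuition, here is a minimal sketch (not the repository's actual implementation; the function name, tensor shapes, and toy usage are illustrative assumptions) of how a Gumbel-softmax reparameterization can map per-pixel logits to palette colors, relying only on PyTorch's built-in `torch.nn.functional.gumbel_softmax`:

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_image(logits, palette, tau=1.0, hard=False):
    """Map per-pixel class logits to palette colors.

    logits:  (H, W, K) unnormalized scores over K palette entries.
    palette: (K, 3) RGB colors in [0, 1].
    tau:     temperature; lower values give crisper, more one-hot assignments.
    hard:    if True, use the straight-through Gumbel-argmax, so the output
             uses exact palette colors while gradients flow through the softmax.
    """
    # Adds Gumbel noise to the logits and applies a tempered softmax.
    weights = F.gumbel_softmax(logits, tau=tau, hard=hard, dim=-1)  # (H, W, K)
    # Blend the palette colors by the (soft or hard) assignment weights.
    return weights @ palette  # (H, W, 3)

# Toy usage: a 64x64 image quantized to a 10-color palette.
logits = torch.randn(64, 64, 10, requires_grad=True)
palette = torch.rand(10, 3)
image = gumbel_softmax_image(logits, palette, tau=0.5, hard=True)
```

And a sketch of the `bounded_max` schedule for sampling `t`, written directly from the description above (the function name and float-valued bounds are assumptions):

```python
import random

def sample_t_bounded_max(step, total_steps, t_min, t_max, t_bound_max, t_bound_reached):
    """Sample t uniformly in [t_min, current_max], where current_max decays
    linearly from t_max to t_bound_max over the first
    t_bound_reached * total_steps optimization steps."""
    progress = min(step / (t_bound_reached * total_steps), 1.0)
    current_max = t_max + progress * (t_bound_max - t_max)
    return random.uniform(t_min, current_max)
```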
The memory requirements to run SD-πXL can be quite high, and it is recommended to use a GPU with 24GB of VRAM. To decrease the VRAM usage, you can modify the config file as follows:
- Change the `model_id` from `sdxl` to `ssd1b`. SSD-1B is a distilled, 50% smaller version of SDXL.
- Change `canny_mid` and `depth_mid` to `canny_small` and `depth_small`.
- Remove either the Canny or the depth conditioning (removing both will result in an error).
Using another model architecture would also substantially decrease the memory requirements and might even speed up the algorithm (though at the potential expense of quality). Unfortunately, it would probably also require rewriting the score distillation pipeline for each architecture.
Palettes can be found in `assets/palettes/` as `.hex` files. Each line is the hexadecimal code of a color, following the RGB convention.
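For illustration, a minimal loader for this format might look as follows (a sketch, assuming one `RRGGBB` code per line, possibly prefixed with `#`; the function name is hypothetical):

```python
def load_hex_palette(path):
    """Read a .hex palette file: one RRGGBB color code per line."""
    colors = []
    with open(path) as f:
        for line in f:
            code = line.strip().lstrip("#")
            if not code:
                continue
            # Split "RRGGBB" into three channel values in [0, 1].
            colors.append(tuple(int(code[i:i + 2], 16) / 255.0 for i in (0, 2, 4)))
    return colors

# e.g. load_hex_palette("assets/palettes/lospec/neon-space.hex")
```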
You can visualize all the palettes in a given folder via the following script:
```
PYTHONPATH=./ python assets/palettes/palette_show.py --hex_dir=assets/palettes/lospec/
```
The palettes provided in the `assets/palettes/lospec` folder are not our creations. Visit Lospec for more high-quality palettes. We share a few palettes in this repo and attribute them to their respective authors below:
The palettes provided in the two other folders are our own creations. The palettes in `assets/palettes/lattices` are simply a regular decomposition of the RGB cube. For instance, the file `4_4_4_lattice.hex` divides each coordinate of the RGB cube into 4 evenly spaced values and takes all their combinations, resulting in a color palette of 4×4×4 = 64 colors.
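As an illustration, such a lattice palette could be generated with a few lines of Python (a sketch, assuming the lattice levels are evenly spaced and include both endpoints of each channel):

```python
from itertools import product

def lattice_palette(n=4):
    """All RGB colors whose channels take n evenly spaced values in [0, 255]."""
    levels = [round(i * 255 / (n - 1)) for i in range(n)]
    return [f"{r:02x}{g:02x}{b:02x}" for r, g, b in product(levels, repeat=3)]

# 4 levels per channel -> 4 x 4 x 4 = 64 colors, one hex code per line.
print("\n".join(lattice_palette(4)))
```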
You can visualize the result layer by layer via this command:
```
python models/pixelart.py --checkpoint=workdir/config/folderName/checkpoint_10000/model_checkpoint.pth --save_dir=results
```
This will save the results in a `results/` folder. For the example above, you would obtain the following:
Layer 0 | Layer 1 | Layer 2 | Layer 3 | Layer 4 | Layer 5 | Layer 6 | Layer 7 | Layer 8 | Layer 9 |
You can add a `--palette` argument, as long as the provided palette has the same number of colors as the palette used to optimize the image generator. For instance, since the palette used above was `lospec/neon-space.hex` with 10 colors, you can use any other 10-color palette:
```
python models/pixelart.py --checkpoint=workdir/config/folderName/checkpoint_10000/model_checkpoint.pth --save_dir=results --palette=assets/palettes/other/aqua_verde.hex
```
Alongside the layer decomposition, it also saves the result:
Argmax - Palette recolor | Softmax - Palette recolor |
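Recoloring works because the generator represents the image as per-pixel assignments over palette entries rather than raw colors, so a palette of the same size can simply be substituted. Conceptually (a sketch with hypothetical names, not the repository's code):

```python
import numpy as np

def recolor(class_map, new_palette):
    """class_map: (H, W) integer palette indices (e.g. the argmax output);
    new_palette: (K, 3) RGB colors. Fancy indexing maps each index to its color."""
    return new_palette[class_map]

class_map = np.random.randint(0, 10, size=(96, 96))   # 10-color assignment map
image = recolor(class_map, np.random.rand(10, 3))     # (96, 96, 3)
```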
Low-resolution images with a low number of colors are particularly suitable for fabrication. You can find examples of such fabrications below.
SD-πXL | Embroidery | Fuse beads | Interlocking bricks |
The embroidery was made using a PFAFF® creative Icon 2 machine. The interlocking bricks were made and rendered using the Mecabricks software.
If this work or this codebase is useful to you, please cite it using the BibTeX entry below:
```
@article{Binninger:SDpiXL:2024,
    title={SD-piXL: Generating Low-Resolution Quantized Imagery via Score Distillation},
    author={Binninger, Alexandre and Sorkine-Hornung, Olga},
    journal={SIGGRAPH ASIA 2024, Technical Papers},
    year={2024},
    doi={10.1145/3680528.3687570}
}
```
We thank the anonymous reviewers for their constructive feedback. Ximing Xing's open-source version of VectorFusion was instrumental in the development and design of our source code. This work was supported in part by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 101003104, ERC CoG MYCLOTH).
This repository is licensed under the MIT License. The provided input images in ./assets/image were produced via DALL·E.