Releases: bghira/SimpleTuner
v1.0.1
This is a maintenance release with few new features.
What's Changed
- fix reference error to use_dora by @bghira in #929
- fix merge error by @bghira in #930
- fix use of --num_train_epochs by @bghira in #932
- merge fixes by @bghira in #934
- documentation updates, deepspeed config reference error fix by @bghira in #935
- Fix caption_with_cogvlm.py for cogvlm2 + textfile strategy by @burgalon in #936
- dependency updates, cogvlm fixes, peft/lycoris resume fix by @bghira in #939
- feature: zero embed padding for t5 on request by @bghira in #941
- merge by @bghira in #942
- comet_ml validation images by @burgalon in #944
- Allow users to init their LoKr with perturbed normal w2 by @AmericanPresidentJimmyCarter in #943
- merge by @bghira in #948
- fix typo in PR by @bghira in #949
- update arg name for norm init by @bghira in #950
- configure script should not set dropout by default by @bghira in #955
- VAECache: improve startup speed for large sets by @bghira in #956
- Update FLUX.md by @anae-git in #957
- mild bugfixes by @bghira in #963
- fix bucket worker not waiting for all queue worker to finish by @burgalon in #967
- merge by @bghira in #968
- fix DDP for PEFT LoRA & minor exit error by @bghira in #974
Full Changelog: v1.0...v1.0.1
v1.0 the total recall edition
Everything has changed! And yet, nothing has. Some defaults may be different; no, they will be. It's hard to know which ones.
For those who can do so, it's recommended to use configure.py to reconfigure your environment on this new release.
It should go without saying, but for those in the middle of a training run, do not upgrade to this release until you finish.
Refactoring and Enhancements:
- Refactor train.py into a Trainer Class:
  - The core logic of train.py has been restructured into a Trainer class, improving modularity and maintainability.
  - Exposes an SDK for reuse elsewhere.
- Model Family Unification:
  - References to specific model types (--sd3, --flux, etc.) have been replaced with a unified --model_family argument, streamlining model specification and reducing clutter in configurations.
- Configuration System Overhaul:
  - Switched from .env configuration files to JSON (config.json), with multiple backends supporting JSON configuration loading. This allows more flexible and readable configuration management. (See the config.json sketch after this list.)
  - Updated the configuration loader to auto-detect the best backend when launching.
- Enhanced Argument Handling:
  - Deprecated old argument references and moved argument parsing to helpers/configuration/cmd_args.py for better organization.
  - Introduced support for new arguments such as --model_card_safe_for_work, --flux_schedule_shift, and --disable_bucket_pruning.
- Improved Hugging Face Integration:
  - Modified configure.py to avoid asking for Hugging Face model name details unless required.
  - Added the ability to pass the SFW (safe-for-work) argument into the training script.
- Optimizations and Bug Fixes:
  - Fixed several references to learning rate (lr) initialization and corrected --optimizer usage.
  - Addressed issues with attention mask swapping and fixed the persistence of text encoders in RAM after refactoring.
- Training and Validation Enhancements:
  - Added better dataset examples with support for multiple resolutions and mixed configurations.
  - Configured training scripts to disable gradient accumulation steps by default and provided better control over training options via the updated config files.
- Enhanced Logging and Monitoring:
  - Improved the handling of Weights & Biases (wandb) logs and updated tracker argument references.
- Documentation Updates:
  - Revised documentation to reflect changes in model family handling, argument updates, and configuration management.
  - Added guidance on setting up the new configuration files and examples for multi-resolution datasets.
- Miscellaneous Improvements:
  - Support for NSFW tags in model cards is now enabled by default.
  - Updated train.sh to minimal requirements, reducing complexity and streamlining the training process.
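To make the new format concrete, here is a minimal, hedged sketch of a config.json. The config/ path and the use of CLI flag names as JSON keys are assumptions drawn from the notes above rather than a copy of config.json.example, and the values are purely illustrative; running configure.py remains the recommended way to generate a known-good file.

```bash
# Minimal sketch only: key names are assumed to mirror the CLI flags named in
# these notes (--model_family, --model_card_safe_for_work, --flux_schedule_shift,
# --disable_bucket_pruning); values are illustrative, not recommendations.
mkdir -p config
cat > config/config.json <<'EOF'
{
  "--model_family": "flux",
  "--model_card_safe_for_work": true,
  "--flux_schedule_shift": 3,
  "--disable_bucket_pruning": true
}
EOF
```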
More detailed change log
- lycoris model card updates by @bghira in #820
- Generate and store attention masks for T5 for flux by @AmericanPresidentJimmyCarter in #821
- Fix validation by @AmericanPresidentJimmyCarter in #822
- backwards-compatible flux embedding cache masks by @bghira in #823
- merge by @bghira in #824
- parquet add width and height columns by @frankchieng in #825
- quanto: remove warnings about int8/fp8 confusion as it happened so long ago now; add warning about int4 by @bghira in #826
- remove clip warning by @bghira in #827
- update lycoris to dev branch, 3.0.1dev3 by @bghira in #828
- Fix caption_with_blip3.py on CUDA by @anhi in #833
- fix quanto resuming by @bghira in #834
- lycoris: resume should use less vram now by @bghira in #835
- (#644) temporarily block training on multi-gpu setup with quanto + PEFT, inform user to go with lycoris instead by @bghira in #837
- quanto + deepspeed minor fixes for multigpu training by @bghira in #839
- deepspeed sharding by @bghira in #840
- fix: only run save full model on main process by @ErwannMillon in #838
- merge by @bghira in #841
- clean-up by @bghira in #842
- follow-up fixes for quanto limitation on multigpu by @bghira in #846
- merge by @bghira in #850
- (#851) remove shard merge code on load hook by @bghira in #853
- csv backend updates by @williamzhuk in #645
- csv fixes by @bghira in #856
- add schedulefree optim w/ kahan summation by @bghira in #857
- merge by @bghira in #858
- merge by @bghira in #861
- schedulefree: return to previous stable settings and add a new preset for aggressive training by @bghira in #862
- fix validation image filename only using resolution from first img, and, unreadable/untypeable parenthesis by @bghira in #863
- (#519) add side by side comparison with base model by @bghira in #865
- merge fixes by @bghira in #870
- (#864) add flux final export for full tune by @bghira in #871
- wandb gallery mode by @bghira in #872
- sdxl: dtype inference followup fix by @bghira in #873
- merge by @bghira in #878
- combine the vae cache clear logic with bucket rebuild logic by @bghira in #879
- flux: mobius-style training via augmented guidance scale by @bghira in #880
- track flux cfg via wandb by @bghira in #881
- multigpu VAE cache rebuild fixes; random crop auto-rebuild; mobius flux; json backend now renamed to discovery ; wandb guidance tracking by @bghira in #888
- fixing typo in flux document for preserve_data_backend_cache key by @riffmaster-2001 in #882
- reintroduce timestep dependent shift as an option during flux training for dev and schnell, disabled by default by @bghira in #892
- adding SD3 timestep-dependent shift for Flux training by @bghira in #894
- fix: set optimizer details to empty dict w/ deepspeed by @ErwannMillon in #895
- fix: make sure wandb_logs is always defined by @ErwannMillon in #896
- merge by @bghira in #900
- Dataloader Docs - Correct caption strategy for instance prompt by @barakyo in #902
- refactor train.py into Trainer class by @bghira in #899
- Update TRAINER_EXTRA_ARGS for model_family by @barakyo in #903
- Fix text encoder nuking regression by @mhirki in #906
- added lokr lycoris init_lora by @flotos in #907
- Fix Flux schedule shift and add resolution-dependent schedule shift by @mhirki in #905
- Swap the attention mask location, because Flux swapped text and image… by @AmericanPresidentJimmyCarter in #908
- support toml, json, env config backend, and multiple config environments by @bghira in #909
- Add "none" to --report_to argument by @twri in #911
- Add support for tiny PEFT-based Flux LoRA based on TheLastBen's post on Reddit by @mhirki in #912
- Update lycoris_config.json.example with working defaults by @mhirki in #918
- fix constant_with_warmup not being so constant or warming up by @bghira in #919
- follow-up fix for setting last_epoch by @bghira in #920
- fix multigpu schedule issue with LR on resume by @bghira in #921
- multiply the resume state step by the number of GPUs in an attempt to overcome accelerate v0.33 issue by @bghira in #922
- default to json/toml before the env file in case multigpu is configured by @bghira in #923
- fix json/toml configs str bool values by @bghira in #924
- bypass some "helpful" diffusers logic that makes random decisions to run on CPU by @bghira in #925
- v1.0 merge by @bghira in #910
*...
v0.9.3.9 - bugfixes, better defaults
What's Changed
LyCORIS
- lycoris model card updates by @bghira in #820
- lycoris: resume should use less vram now by @bghira in #835
- update lycoris to dev branch, 3.0.1dev3 by @bghira in #828
- (#644) temporarily block training on multi-gpu setup with quanto + PEFT, inform user to go with lycoris instead by @bghira in #837
Flux
- Generate and store attention masks for T5 for flux by @AmericanPresidentJimmyCarter in #821
- backwards-compatible flux embedding cache masks by @bghira in #823
- remove clip warning by @bghira in #827
- fix quanto resuming by @bghira in #834
- (#864) add flux final export for full tune by @bghira in #871
- flux: mobius-style training via augmented guidance scale by @bghira in #880
- track flux cfg via wandb by @bghira in #881
Misc features
- add schedulefree optim w/ kahan summation by @bghira in #857
- schedulefree: return to previous stable settings and add a new preset for aggressive training by @bghira in #862
- (#519) add side by side comparison with base model by @bghira in #865
- wandb gallery mode by @bghira in #872
- combine the vae cache clear logic with bucket rebuild logic by @bghira in #879
Misc bug-fixes
- parquet add width and height columns by @frankchieng in #825
- quanto: remove warnings about int8/fp8 confusion as it happened so long ago now; add warning about int4 by @bghira in #826
- Fix caption_with_blip3.py on CUDA by @anhi in #833
- quanto + deepspeed minor fixes for multigpu training by @bghira in #839
- deepspeed sharding fixes by @bghira in #840
- fix: only run save full model on main process by @ErwannMillon in #838
- (#851) remove shard merge code on load hook by @bghira in #853
- csv backend updates by @williamzhuk in #645
- csv fixes by @bghira in #856
- fix validation image filename only using resolution from first img, and, unreadable/untypeable parenthesis by @bghira in #863
- sdxl: dtype inference followup fix by @bghira in #873
New Contributors
- @anhi made their first contribution in #833
- @ErwannMillon made their first contribution in #838
- @williamzhuk made their first contribution in #645
Full Changelog: v0.9.8.3.2...v0.9.3.9
v0.9.8.3.2 - 2 releases a day keeps the bugs away
What's Changed
- fix merged model being exported by accident instead of normal lora safetensors after LyCORIS was introduced by @bghira in #817
Full Changelog: v0.9.8.3.1...v0.9.8.3.2
v0.9.8.3.1 - state dict fix for final resulting safetensors
Minor, but important: the intermediary checkpoints weren't impacted; just part of the weights in the final export ended up mis-labeled.
What's Changed
- state dict export for final pipeline by @bghira in #812
- lycoris: disable text encoder training (it doesn't work here, yet)
- state dict fix for the final pipeline export after training by @bghira in #813
- lycoris: final export fix, correctly save weights by @bghira in #814
- update lycoris doc by @bghira in #815
- lycoris updates by @bghira in #816
Full Changelog: v0.9.8.3...v0.9.8.3.1
v0.9.8.3 - essential fixes and improvements
What's Changed
General
- Non-BF16 capable optimisers removed in favour of a series of new Optimi options
- new crop_aspect option closest that uses crop_aspect_buckets as a list of options (see the sketch after this list)
- fewer images are discarded, and a minimum image size is no longer set for you by default
- better behaviour with mixed datasets, more equally sampling large and small sets
  - caveat: dreambooth training now probably wants --data_backend_sampling=uniform instead of auto-weighting
- multi-caption fixes, it was always using the first caption before (whoops)
- TF32 now enabled by default for users with configure.py
- New argument --custom_transformer_model_name_or_path to use a flat repository or local dir containing just the transformer model
- InternVL captioning script contributed by @frankchieng
- ability to change constant learning rate on resume
- fix SDXL controlnet training, allowing it to work with quanto
- DeepSpeed fixes, caveat broken validations
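A hedged dataloader sketch for the closest crop mode: crop_aspect and crop_aspect_buckets are the keys named above, while the surrounding fields, values, and file path are illustrative assumptions (a real multidatabackend.json also needs caption and cache settings per the dataloader docs).

```bash
# Sketch only: crop_aspect / crop_aspect_buckets come from this release's notes;
# the other keys, values, and the path are assumptions for illustration.
mkdir -p config
cat > config/multidatabackend.json <<'EOF'
[
  {
    "id": "my-dataset",
    "type": "local",
    "instance_data_dir": "/path/to/images",
    "crop": true,
    "crop_aspect": "closest",
    "crop_aspect_buckets": [0.75, 1.0, 1.5]
  }
]
EOF
```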
Flux
- New LoRA targets ai-toolkit and context-ffs, with context-ffs behaving more like text encoder training (see the flag sketch after this list)
- New LoRA training resumption support via --init_lora
- LyCORIS support
- Novel attention masking implementation via --flux_attention_masked_training thanks to @AmericanPresidentJimmyCarter (#806)
- Schnell --flux_fast_schedule fixed (still not great)
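And a hedged sketch of the new Flux flags in one place: in this release they would typically be appended to TRAINER_EXTRA_ARGS in config.env, and the safetensors path below is a placeholder, not a real file.

```bash
# Illustrative only: flag names are taken from the notes above; the LoRA path
# is a placeholder, and valid combinations may differ per the Flux docs.
export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} \
  --flux_lora_target=ai-toolkit \
  --flux_attention_masked_training \
  --init_lora=/path/to/previous_lora.safetensors"
```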
Pull Requests
- Fix --input_perturbation_steps so that it actually has an effect by @mhirki in #772
- add ai-toolkit option in --flux_lora_target choices by @benihime91 in #773
- Create caption_with_internvl.py by @frankchieng in #778
- Add LyCORIS training to SimpleTuner by @AmericanPresidentJimmyCarter in #776
- (#782) fix type comparison in configure script by @bghira in #783
- update path in documentation by @yggdrasil75 in #784
- Add Standard as default LoRA type by @AmericanPresidentJimmyCarter in #787
- Lora init from file by @kaibioinfo in #789
- wip: optimi by @bghira in #785
- add new lora option for context+ffs by @kaibioinfo in #795
- add auto-weighting for dataset selection with user probability modulation by @bghira in #797
- fix for sampling population smaller than request by @bghira in #802
- fix instance prompt sampling multiple prompts always taking the first… by @bghira in #801
- Fixes for --flux_fast_schedule by @mhirki in #803
- tf32, custom transformer paths, and error log for short batch, add fixed length calc by @bghira in #805
- Add attention masking to the custom helpers/flux/transformer.py by @AmericanPresidentJimmyCarter in #806
New Contributors
- @benihime91 made their first contribution in #773
- @frankchieng made their first contribution in #778
- @yggdrasil75 made their first contribution in #784
- @kaibioinfo made their first contribution in #789
Full Changelog: v0.9.8.2...v0.9.8.3
v0.9.8.2 - fixed comfyUI inference, better aspect bucketing limits
Main issues solved:
- Updated quickstart/flux documentation to have newer recommendations.
- New pixel_area mode for resolution_type reduces guesswork (see the sketch after this list).
- bfloat16 error
- ComfyUI not reading the LoRAs created this week. That is my fault; it is now back to the way of initialising adapters we used in the previous release.
  - stick with the latest release tag via git checkout v0.9.8.2 to avoid issues with the main branch creeping up on you in the future, if necessary
- batch sizes no longer need to be perfectly aligned with the dataset
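A hedged sketch of pixel_area in a dataloader entry: roughly, resolution is then given in pixels (1024 meaning approximately a 1024x1024 area target) rather than in megapixels. The keys shown besides resolution_type and resolution are illustrative assumptions, so check the dataloader documentation for the full schema.

```bash
# Sketch only: resolution_type / resolution are the keys this mode concerns;
# the other keys, values, and path are assumptions for illustration.
mkdir -p config
cat > config/multidatabackend.json <<'EOF'
[
  {
    "id": "my-dataset",
    "type": "local",
    "instance_data_dir": "/path/to/images",
    "resolution_type": "pixel_area",
    "resolution": 1024
  }
]
EOF
```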
Thank you for testing the changes and helping make it better.
What's Changed
- fix sentencepiece dep, hub username missing in model card by @bghira in #727
- Make tokeniser 2 load failure permanent by @bghira in #729
- Make Docker foolproof by @komninoschatzipapas in #732
- minor fixes follow-up by @bghira in #744
- refactor train.py to reduce LoCs. by @sayakpaul in #739
- add pixel_area resolution type by @bghira in #756
- for #757 - refactor file discovery, update provider req by @bghira in #758
- Add input perturbation by @mhirki in #723
- merge follow-ups and other fixes to release branch by @bghira in #764
- Follow up from #739 by @sayakpaul in #762
- update defaults + docs by @bghira in #767
- follow-up residuals by @bghira in #768
- residuals by @bghira in #769
- fix double divide by @bghira in #770
- revert from get_peft_model to add_adapter by @bghira in #771
Full Changelog: v0.9.8.1...v0.9.8.2
v0.9.8.1 - much-improved flux training
Dreambooth results from this release
What's Changed
- Quantised Flux LoRA training (but only non-quantised models can resume training still)
- More VRAM and System memory use reductions contributed by team and community members
- CUDA 12.4 requirement bump as well as blacklisting of Python 3.12
- Dockerfile updates, allowing deployment of the latest build without errors
- More LoRA training options for Flux Dev
- Basic (crappy) Schnell training support
- Support for preserving Flux Dev's distillation or introducing CFG back into the model for improved creativity
- CFG skip logic to ensure no blurry results on undertrained LoRAs without requiring a CFG-capable base model
Detailed change list
- release: follow-ups, memory reduction, quanto LoRA training by @bghira in #638
- quanto: allow further vram reduction with bf16 base weights, reorder model loading operations to evacuate text encoders before DiT loads by @bghira in #647
- merge vram reductions & regression fixes by @bghira in #649
- CUDA 12.4; more efficient model loading, lower overall sysmem / vram usage; enforce hashed VAE object names by default; reduce default VAE batch size; change default LR scheduler to polynomial by @bghira in #660
- update something about kolor in the train.py by @chongxian in #658
- refactor saving utitilites part ii by @sayakpaul in #661
- Residues of #661 by @sayakpaul in #662
- Update Dockerfile by @komninoschatzipapas in #675
- merge: reorder model casting, fix kolors on newer diffusers, update cuda, refactor save hooks by @bghira in #666
- Flux: Purge text encoders from system RAM. by @mhirki in #694
- [Tests] basic save hook tests by @sayakpaul in #679
- Add the ability to train all nn.Linear for flux by @AmericanPresidentJimmyCarter in #707
- Add real guidance scale to flux validation (optional) by @AmericanPresidentJimmyCarter in #706
- flux: import some of kohya suggestions (wip) by @bghira in #711
- Add CFG skip for Flux lora training, fix precomputed embeds bug for C… by @AmericanPresidentJimmyCarter in #712
- fixed flux training by @bghira in #714
- final flux updates for release by @bghira in #716
New Contributors
- @chongxian made their first contribution in #658
Full Changelog: v0.9.8...v0.9.8.1
v0.9.8 - flux 24 gig training has entered the chat
Flux
It's here! Runs on 24G cards using Quanto's 8bit quantisation, or 25.7G on a MacBook system (slowly)!
If you're after accuracy, a 40G card will do Just Fine, with 80G cards being somewhat of a sweet spot for larger training efforts.
What you get:
- LoRA, full tuning (but probably just don't do that)
- Documentation to get you started fast
- Probably better for just square crop training for now; it might artifact at weird resolutions
- Quantised base model unlocking the ability to safely use Adafactor, Prodigy, and other neat optimisers as a consolation prize for losing access to full bf16 training (AdamWBF16 just won't work with Quanto)
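A hedged sketch of what switching on the 8bit path might look like; the flag name and value are assumptions based on the quanto work listed below, so treat the Flux quickstart and OPTIONS.md as the authoritative reference.

```bash
# Assumption: the Quanto int8 path is selected via --base_model_precision=int8-quanto,
# appended to TRAINER_EXTRA_ARGS in config.env for this release. Verify the exact
# flag name against OPTIONS.md / the Flux quickstart before relying on it.
export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} \
  --base_model_precision=int8-quanto"
```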
What's Changed
- trainer: simplify check by @bghira in #592
- documentation updates, apple pytorch 2.4 by @bghira in #595
- staged storage for image embed support by @bghira in #596
- fix: loading default image embed backend by @bghira in #597
- fix: loading default image embed backend by @bghira in #598
- multi-gpu console output improvements by @bghira in #599
- vae cache: hash_filenames option for image sets by @bghira in #601
- multi-gpu console output reduction by @bghira in #602
- fix for relative cache directories with NoneType being unsubscriptable by @bghira in #603
- multigpu / relative path fixes for caching by @bghira in #604
- backend for csv based datasets by @bghira in #600
- CSV data backend by @bghira in #605
- config file versioning to allow updating defaults without breaking backwards compat by @bghira in #607
- config file versioning for backwards compat by @bghira in #608
- experiment: small DiT model by @bghira in #609
- merge by @bghira in #610
- Fix crash when using jsonl files by @swkang-rp in #611
- merge by @bghira in #612
- flux training by @bghira in #614
- update base_dir to output_dir by @bghira in #615
- merge by @bghira in #616
- flux: validations should ignore any custom schedulers by @bghira in #618
- release: flux by @bghira in #617
- bugfix: correctly set hash_filenames to true or false for an initial dataset creation by @bghira in #620
- release: minor follow-up fixes by @bghira in #628
- Flux: Fix random validation errors due to some tensors being on the cpu by @mhirki in #629
- Improve config support for transformers with accelerate by @touchwolf in #630
- quanto: exploring low-precision training. by @bghira in #622
- remove all text encoders from memory correctly by @bghira in #637
New Contributors
- @swkang-rp made their first contribution in #611
- @touchwolf made their first contribution in #630
Full Changelog: v0.9.7.8...v0.9.8
v0.9.7.8 - Kwai Kolors
What's Changed
Kolors is now a first-class citizen with full training support (no ControlNet).
Full Changelog: v0.9.7.7...v0.9.7.8