Getting Started
In this guide, we’ll walk through how to train LORAs for the Flux AI Image generation model end to end including:
- Tools and tips for building and organizing the dataset
- Captioning options
- Training the model using a Comfy UI workflow
- Testing the LORA
This method was used to generate:
Specs Required
This guide was written with a 24GB Geforce RTX 3090 and 24GB RAM. I have heard Flux Dev training gets an OOM error when using a 16GB 4090.
Flux Dev or Flux Schnell?
We will be using the Flux Dev model, but the Flux Schnell model should work in a similar way.
If you need to download the dev model, go here. You’ll need to login and accept the ToS. Place the model in your checkpoints folder.
Preparing a Dataset
As often is the case, preparation is key to success here. You must have high quality, properly captioned images of your subject or style to produce a good LORA.
How many images you need in your dataset depends on your subject.
If you’re making a LORA of a person or something specific, 25 images will work fine.
For general styles/looks, LORAs that can work across a variety of prompts/settings, or otherwise conceptual LORAs, 100-200 images has worked well for me.
The larger your dataset is, the more chance you will have blurry or incorrectly captioned images.
Also, using images of various sizes and aspect ratios will produce better final results. We’ll walk through choosing specific resolutions later.
Finding Images
If you’re building a conceptual or style LORA, I’d recommend spending some time looking for a good image source/archive. (The source of the NASA LORA shown above was the APOD NASA archive site.)
If you’re manually collecting images across the web or out of a big archive, check out PureRef which lets you drag and drop any image into an infinite canvas and organize, resize, and save them as one file.
If you’ve found a large set of images somewhere, like in the NASA example, I have had a TON of success using ChatGPT to write Python scraping scripts to download the images using Scrapy in a single prompt. You can also do this with YouTube videos using similar libraries.
If you scrape a large set of images, it’s worth asking the LLM to also sort them into different aspect ratio folders, which will save time later. Example ChatGPT Prompt Here
Organizing and Normalizing Images
Before we train our LORA, we need to properly organize our dataset images into folders based on their input size. If you have very large images, you may also need to resize them.
Resizing Images
The LORA training script will resize input images, but doing this beforehand gives you more control over the input images and lets you see poor resizes before you spend hours training, so this step is recommended.
Even if you do not manually resize your images, or they don’t need to be resized, the images need to be organized into folders by resolution, unless you are training on a single resolution.
Imagemagick
If you need to resize or crop images in bulk, or if you need to normalize images of a similar aspect ratio to specific dimensions, see Imagemagick.
ChatGPT/LLMs can help you write good convert
or mogrify
(change images in place) commands for your dataset.
Choosing Resolutions
Before training, you’ll need to pick your 1-3 resolutions/aspect ratios from the tables below for your dataset. Create datasets/Your_Loras_name/
folder(s) in your ComfyUI folder. Inside, create a folder for each of your chosen resolutions. I have been using this list of resolutions.
It is possible to have images or resize them to resolutions that will fail to process in the training step, so it’s recommended to use a resolution from the table.
The maximum resolutions will take 2-3 days to train to 3000 steps on similar hardware. Only use the minimum resolutions if your source images are very small.
AR | Minimum | Recommended | Maximum |
---|---|---|---|
1:1 | 320 x 320 | 1024 x 1024 | 1408 x 1408 |
3:2 | 384 x 256 | 1216 x 832 | 1728 x 1152 |
4:3 | 448 x 320 | 1152 x 896 | 1664 x 1216 |
16:9 | 448 x 256 | 1344 x 768 | 1920 x 1088 |
21:9 | 576 x 256 | 1536 x 640 | 2176 x 960 |
Image Review and Moving Images
For large datasets, review and remove any poor quality, irrelevant, or duplicate images now.
If your images have watermarks, they will impact the LORA results. Consider using an img2img workflow with ComfyUI’s mask editor (right click on the load image node) to remove them.
I like to review images in bulk using XN View MP which also lets you sort files by image dimensions, which can be useful if your scraping script didn’t sort the images for you.
Captioning
Download the Captioning ComfyUI Workflow here.
For small datasets, I’ve had good success manually captioning images. Even if you use the automatic method below, you should review/modify them for best results.
To automatically caption large datasets, we’ll use Miaoshouai Tagger which is fine-tuned using Civit.ai image tags and images. You can use the workflow below to bulk caption your images.
Caption files need to be txt files in the same folder as the image with the exact same name. Example: coolLora/myimage.jpg coolLora/myimage.txt
Training
Download the Lora Training Comfy UI Workflow here.
Training should take 2-8 hours with proper settings and using reasonably sized images, even with very large datasets. If things are running too slow (you can see your it/s in the console), try lowering your image resolutions.
Running the Training Workflow
Enable/Disable the 3 Dataset Buckets, enter the path to your image/caption folders, and set the dimensions
In the Lora Training Config section, enter your Lora name, trigger word, save directory, and see other options.
Make sure you load the correct Transformer and T5
Sample Prompts will get generated at each loop (750 steps by default)
Other optional training settings can be found in the Settings group
Testing
Download the ComfyUI Flux LORA Testing Workflow here.
Your Lora and the intermediate steps should be saved to your output location. Move the LORA to your ComfyUI/models/loras
folder and you’re ready to use your new LORA!
For testing various prompts, strengths and settings, try the Lora Testing workflow linked above. This will generate 2x1 grids with the Lora on and off using configurable strength ranges.
Share!
Make sure to share your LORA (unless it’s of you or your dog) with the CivitAI community and, if you used this guide, make sure to leave a link to your LORA in the comments below.
Check out other useful ComfyUI Workflows in this Github Repo.
Appendix: Tools and Workflows
ChatGPT Example Chat for Data Scraping - For generating Python scraping scripts, captioning commands, and image processing workflows. ComfyUI Workflows Repository - Collection of useful workflows. Miaoshouai Tagger Github - Automatically captions large datasets using Civit.ai image tags.Click to expand
Flux Models
Workflows
Dataset Preparation Tools
Other