The Stable Diffusion web UI, known as A1111 among users, is the preferred graphical user interface for advanced users. Thanks to its enthusiastic community, most new features come to this free Stable Diffusion GUI first. However, it is not the easiest software to operate, and the sparse documentation adds to the challenge. The sheer range of functions it provides can also be overwhelming.
You have the option of using AUTOMATIC1111 on Google Colab, Windows, or Mac. If you are new to Stable Diffusion, refer to the Quick Start Guide to determine which setup suits you.
This guide aims to instruct you on how to use AUTOMATIC1111 GUI. It can be used as a tutorial with numerous examples to follow step-by-step. Additionally, it can be utilized as a reference manual. You can skim through it and revisit it when you need to use a specific function.
Numerous examples are included to demonstrate the impact of each setting, because examples are the most effective way to clarify a setting's function.
Upon launching the GUI, you will immediately notice the presence of the txt2img tab. Its primary purpose is to perform the fundamental function of Stable Diffusion, which involves transforming a given text prompt into images.
The Stable Diffusion checkpoint dropdown is where you select the model used to generate your images. The v1.5 base model is recommended for first-time users. Download it and save it in the models > Stable-diffusion subfolder of your AUTOMATIC1111 installation folder.
To create your image, start by describing the painting you want to see, for example a cat inspired by Picasso’s artwork or another artist of your choice. Aim for a dreamlike quality with a dark, moody atmosphere that captures the essence of Picasso’s art.
Specify the width and height of the output image, with at least one side set to 512 pixels when using a v1 model. For instance, you can choose a portrait image with a 2:3 aspect ratio and set the width to 512 and height to 768.
Don’t forget to set the batch size, which determines the number of images generated each time. We recommend generating a few images to test the prompt as each one will be different.
Finally, click the Generate button and wait for the image(s) to be created.
By default, you will also receive an additional image of composite thumbnails (an image grid), which provides a quick overview of the generated images.
If you wish to save a specific image to your local storage, you can select it from the thumbnails displayed below the main image canvas. Once you’ve chosen your desired image, right-click on it to bring up the context menu. From there, you can choose to either save the image or copy it to the clipboard.
This covers the basics of image selection and saving. However, if you want a more in-depth understanding of each function, continue reading the rest of this section for a detailed explanation of each feature. Keep in mind that in the AUTOMATIC1111 settings you can choose to save every generated image by default, in PNG or JPEG format.
Image generation parameters
The Stable Diffusion checkpoint provides a dropdown menu for selecting models, and to use it, you must place the model files in the folder stable-diffusion-webui > models > Stable-diffusion. To update the list of models, you can click the refresh button next to the dropdown menu after adding a new model.
To generate images, you need to enter a prompt in the text box. It is important to be detailed and specific in your prompt, using tried-and-true keywords. If you want to exclude specific elements from your image, you can enter a negative prompt in the respective text box. Negative prompts are recommended when using v2 models, and you can use a universal negative prompt for this purpose.
Here’s an example of prompt input keywords:

| Keyword | Effect |
| --- | --- |
| Portrait | Focuses the image on the face / headshot |
| Digital painting | Digital art style |
| Concept art | Illustration style, 2D |
| Ultra realistic illustration | Drawings that are very realistic; good to use with people |
| Underwater portrait | Use with people; underwater, hair floating |
| Underwater steampunk | Underwater with wash color |
These keywords refine the art style beyond your initial prompt:

| Keyword | Effect |
| --- | --- |
| Hyperrealistic | Improves details and resolution |
| Modernist | Saturated color, high contrast |
| Art nouveau | Adds details, building style |
These keywords control the kind of output you desire in terms of resolution and rendering:

| Keyword | Effect |
| --- | --- |
| Unreal engine | Very realistic and detailed 3D |
| Sharp focus | Increases resolution |
| 8k | Increases resolution, though it can look more artificial; makes the image more camera-like and realistic |
| Vray | 3D rendering; best for objects, landscapes, and buildings |
The Sampling method determines the algorithm for the denoising process, with DPM++ 2M Karras being a recommended option for balancing speed and quality. Ancestral samplers, marked with an “a,” may produce unstable images even at large sampling steps, which makes tweaking an image difficult. Sampling steps set the number of denoising steps; 25 steps generally work for most cases, while 20 is the minimum with the “Euler a” sampling method.
The dimensions of the resulting image can be adjusted by setting the width and height values. For v1 models, it is recommended to have at least one side of the image set to 512 pixels, while for the v2-768px model, at least one side should be 768 pixels.
Batch count refers to the number of times the image generation process is run, while batch size determines the number of images generated each time the process is executed. The total number of images generated can be calculated by multiplying the batch count with the batch size. It is generally preferable to adjust the batch size for faster processing, while batch count may need to be adjusted if memory issues arise.
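The arithmetic is simple; as a quick sketch (the helper function is illustrative, not part of AUTOMATIC1111):

```python
def total_images(batch_count: int, batch_size: int) -> int:
    """Total images generated by one click of the Generate button:
    the generation process runs batch_count times, producing
    batch_size images per run."""
    return batch_count * batch_size

# e.g. batch count 2 with batch size 4 yields 8 images per click
print(total_images(2, 4))  # 8
```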
CFG scale, or Classifier Free Guidance scale, is a parameter that controls the extent to which the model adheres to the provided prompt. Setting a value of 1 will result in the prompt being mostly ignored, while a value of 30 will lead to strict adherence to the prompt. A value of 7 provides a good balance between following the prompt and allowing for creative freedom. It is important to avoid setting CFG values too high or low, as this can result in unstable behavior or overly saturated colors in the generated images. The effect of changing CFG can be observed in sample images with fixed seed values.
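Under the hood, classifier-free guidance blends the model’s unconditional and prompt-conditioned noise predictions at each sampling step. A minimal numpy sketch of the blending formula (the array contents are illustrative stand-ins for noise predictions):

```python
import numpy as np

def cfg_blend(uncond: np.ndarray, cond: np.ndarray, cfg_scale: float) -> np.ndarray:
    """Classifier-free guidance: push the prediction away from the
    unconditional output, toward (and beyond) the prompt-conditioned one."""
    return uncond + cfg_scale * (cond - uncond)

uncond = np.array([0.0, 0.0])  # toy "no prompt" noise prediction
cond = np.array([1.0, 2.0])    # toy "with prompt" noise prediction

print(cfg_blend(uncond, cond, 0.0))  # pure unconditional prediction
print(cfg_blend(uncond, cond, 1.0))  # pure conditional prediction, no amplification
print(cfg_blend(uncond, cond, 7.0))  # amplified guidance, the usual default
```

This makes the extremes easy to see: low scales barely push toward the prompt, while high scales amplify the difference so strongly that artifacts and oversaturation can appear.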
Seed refers to the initial random tensor in the latent space, which is used to generate the image. Each generated image has its own unique seed value, and setting the seed allows the content of the image to be fixed while the prompt is adjusted. If the seed is set to -1, AUTOMATIC1111 will use a random value.
Fixing the seed is often done to maintain the same content while adjusting the prompt. For example, if an image was generated using a specific prompt, fixing the seed would allow for the same content to be used while tweaking the prompt to generate variations of the same image.
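The role of the seed can be sketched with numpy (AUTOMATIC1111 actually draws the initial latent with PyTorch, but the principle is the same: the same seed always yields the same starting noise, which is why the composition stays fixed):

```python
import numpy as np

def initial_latent(seed: int, shape=(4, 64, 64)) -> np.ndarray:
    """Draw the initial random latent tensor from a seeded generator.
    For a 512x512 v1 image the latent is 4 channels of 64x64
    (512 / 8 = 64); the exact shape here is illustrative."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape)

a = initial_latent(2603366125)
b = initial_latent(2603366125)
print(np.array_equal(a, b))  # True: same seed, same starting noise
```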
If you like this dog image, put the seed “2603366125” in the seed interface:
Now add the term “small hat” to the prompt “photo of dog, dress, city night background, small hat”:
The composition of the generated scene can be significantly altered by strong keywords, potentially resulting in a completely different image. In such cases, swapping in a keyword at a later sampling step may help achieve the desired result. In this particular case, the introduction of the keyword “small hat” removed one of the dog’s front legs. Such issues happen often with Stable Diffusion.
To set the seed back to a random value (-1), you can look for the dice icon, which is usually located near the seed value input field. Clicking on the dice icon will generate a new random seed value, allowing for the creation of a completely new image.
The high-resolution fix option applies an upscaler to increase the size of the generated image. This is necessary because the native resolution of Stable Diffusion is only 512 pixels (or 768 pixels for certain v2 models), which may be too small for many use cases.
Simply setting the width and height to higher values like 1024 pixels would deviate from the native resolution and could result in issues with image composition, such as generating images with multiple heads or other distortions. Therefore, it is necessary to first generate a small image of 512 pixels on either side, and then scale it up to a larger size using an upscaler. This approach ensures that the image composition remains stable and the resulting image is of high quality.
Check Hires. fix to enable the high-resolution fix, then select an upscaler to enhance the image resolution. Refer to this article for an overview.
The Latent upscaler options offer several methods to upscale the image in the latent space. This is done following the text-to-image generation sampling steps, similar to image-to-image processing. For other upscalers, a combination of traditional and AI techniques is utilized.
If you choose a Latent upscaler, consider adjusting the Hires steps. This parameter determines the number of sampling steps performed after the latent image is upscaled.
The denoising strength parameter, also applicable only to Latent upscalers, controls the amount of noise added to the latent image before Hires sampling steps are executed. It functions similarly to its use in image-to-image processing.
To achieve a clear image, the denoising strength parameter should be greater than 0.5. However, setting it too high may significantly alter the image.
The advantage of employing a latent upscaler is that it avoids the upscaling artifacts that other upscalers, such as ESRGAN, may produce. The Stable Diffusion decoder generates the image, ensuring consistent style. However, depending on the denoising strength value, the images may be altered to some degree.
The upscale factor determines the image’s size, indicating how many times larger it will be. For instance, setting it to 2 scales a 512-by-768 pixel image to 1024-by-1536 pixels. Alternatively, the “resize width to” and “resize height to” options can be specified to establish the new image size.
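The size arithmetic above can be sketched as (the helper name is illustrative):

```python
def upscaled_size(width: int, height: int, factor: float) -> tuple[int, int]:
    """New image dimensions after applying the hires-fix upscale factor."""
    return int(width * factor), int(height * factor)

# upscale factor 2 on a 512-by-768 image
print(upscaled_size(512, 768, 2))  # (1024, 1536)
```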
To avoid the complexity of determining the appropriate denoising strength, you could use an AI upscaler like ESRGAN. In general, splitting the txt2img and upscaling into two stages offers more versatility. Rather than utilizing the high-resolution fix option, the Extra page can be used to execute upscaling.
Image file actions
You will notice a series of buttons that execute several functions on the generated images. From left to right, they are as follows:
Open folder: This button opens the output folder where the images are saved. However, it may not work on all systems.
Save: This button allows you to save a single image. Once clicked, it will display a download link below the buttons. If you select the image grid, it will save all images.
Zip: This button allows you to compress the selected image(s) for download.
Send to img2img: By clicking this button, you can send the chosen image to the img2img tab.
Send to inpainting: This button sends the selected image to the inpainting tab, located within the img2img tab.
Send to extras: This button transfers the selected image to the Extras tab.
The img2img tab is primarily used for image-to-image functions such as inpainting and image transformation. Many users visit this tab to perform these tasks.
In the img2img tab, one common use case is to create new images that follow the composition of the base image using image-to-image functions.
Here are the steps to achieve this:
- Step 1: Start by dragging and dropping the base image onto the image canvas on the img2img page.
- Step 2: Adjust the width or height to match the aspect ratio of the new image. The image canvas will display a rectangular frame indicating the aspect ratio. For instance, you could set the width to 760 and keep the height at 512 for a landscape image.
- Step 3: Set the sampling method and sampling steps. For example, you could use DPM++ 2M Karras with 25 steps.
- Step 4: Set the batch size to 4.
- Step 5: Write a prompt for the new image. For instance, you could use “a photorealistic dragon”.
- Step 6: Click the Generate button to generate images. Adjust the denoising strength (0.3, 0.6 and 0.8) and repeat the process for different outputs.
Several settings are shared with txt2img, and only the new ones are explained. For instance, you can use the following resize modes to reconcile differences in aspect ratios:
- Just resize: scales the input image to fit the new image dimension, which may cause stretching or squeezing of the image.
- Crop and resize: fits the new image canvas into the input image, preserving the aspect ratio of the original image while removing parts that don’t fit.
- Resize and fill: fits the input image into the new image canvas and fills the extra part with the average color of the input image, preserving the aspect ratio.
- Just resize (latent upscale): similar to “Just resize”, but the scaling is done in the latent space. Set the denoising strength to a value larger than 0.5 to avoid producing blurry images.
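As an illustration of the dimension math behind these modes, here is a sketch of the center crop a “Crop and resize”-style mode would compute (not AUTOMATIC1111’s actual code; box coordinates are (left, top, right, bottom)):

```python
def crop_and_resize_box(in_w: int, in_h: int, out_w: int, out_h: int):
    """Center crop of the input that matches the output aspect ratio:
    preserve the aspect ratio, trim the parts that don't fit."""
    in_ratio = in_w / in_h
    out_ratio = out_w / out_h
    if in_ratio > out_ratio:          # input too wide: trim left/right
        crop_w = round(in_h * out_ratio)
        crop_h = in_h
    else:                             # input too tall: trim top/bottom
        crop_w = in_w
        crop_h = round(in_w / out_ratio)
    left = (in_w - crop_w) // 2
    top = (in_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h

# 512x512 input onto a 768x512 landscape canvas: trim ~85px top and bottom
print(crop_and_resize_box(512, 512, 768, 512))  # (0, 85, 512, 426)
```

The cropped region is then scaled to the new canvas size, which is why the original aspect ratio survives while the edges are lost.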
The denoising strength parameter controls the amount of change in the generated image. A setting of 0 means no changes will occur, while a value of 1 means the generated image will not follow the input image. A good starting point is usually a denoising strength of 0.75; at 0.8, our cat disappears and is replaced by a dragon.
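As a rough sketch of how denoising strength interacts with sampling steps: in img2img the input image enters the diffusion process part-way through, so only roughly the last fraction of the steps is actually run. This is an approximation of the webui's behavior, not its exact implementation:

```python
def effective_steps(sampling_steps: int, denoising_strength: float) -> int:
    """Approximate number of denoising steps actually executed in img2img:
    only the last `denoising_strength` fraction of the schedule runs."""
    return max(1, int(sampling_steps * denoising_strength))

for strength in (0.3, 0.6, 0.8, 1.0):
    print(strength, effective_steps(25, strength))
```

This also explains why very low strengths produce images that barely deviate from the input: only a handful of denoising steps are performed.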
Additionally, you can use the built-in script “poor man’s outpainting” to extend an image outside its boundaries.
Rather than uploading an image, it is also possible to create the initial picture by sketching it. To enable the color sketch tool, launch the webui with the argument “--gradio-img2img-tool color-sketch” (this is only needed on older versions; recent versions include the tool by default). Then:
- Step 1: Go to the Sketch tab on the img2img page.
- Step 2: Upload a background image to the canvas. You can choose either the black or white backgrounds provided below.
- Step 3: Use the color sketch tool to draw your creation on the canvas.
- Step 4: Write a prompt for the new image. For example, “Award winning house”.
- Step 5: Click the Generate button to generate images based on your sketch and prompt.
Instead of creating a sketch from scratch, you can use the sketch function to modify an existing image. For example, you can remove an element from an image by painting over it and then applying image-to-image processing. To achieve this, use the eye dropper tool to select a color from the surrounding area and then use the paintbrush tool to cover the element you want to remove. Once you’ve finished your modifications, you can apply image-to-image processing by following the steps outlined in the previous sections.
The inpainting function is one of the most frequently used features in the img2img tab. Suppose you have created an image that you are satisfied with in the txt2img tab, but there is a minor flaw that you would like to fix. For instance, the facial features may be distorted or unclear. In such a scenario, you can use the “Send to inpaint” button to transfer the image from the txt2img tab to the img2img tab, where you can use the inpainting function to regenerate the affected area.
When you switch to the Inpaint tab of the img2img page after sending your image from the txt2img tab using the “Send to inpaint” button, you should be able to view your image. To regenerate the garbled area, you can use the paintbrush tool to create a mask over the affected region.
Because you used the “Send to inpaint” feature, parameters such as the image size are already configured correctly. To further optimize the image, follow these steps:
- Adjust the denoising strength: Begin with a value of 0.75, and increase it for greater changes, or reduce it for minor adjustments.
- Select the original option for Mask Content.
- Choose “Inpaint masked” for Mask Mode.
- Set the batch size to 4.
- Click the Generate button and select the preferred option from the results.
Inpaint sketch is a unique feature that blends the capabilities of both inpainting and sketching tools. With Inpaint sketch, you can create sketches and paintings with precision, and only the painted area will be regenerated. This means that the unpainted area will remain untouched, resulting in a refined and seamless finish. Here’s an example to illustrate this feature.
Inpaint upload is a feature that helps you remove unwanted objects or blemishes from an image. Typically, when inpainting, you would need to manually draw a mask around the area you wish to regenerate. With Inpaint upload, you have the added convenience of uploading a separate mask file instead of drawing it yourself.
This feature can be particularly useful if you have a pre-existing mask file that you would like to use, or if you prefer to use a more advanced masking tool to create your mask. By uploading a separate mask file, you can save time and ensure that your mask is accurate and precisely tailored to your needs.
Overall, the ability to upload a separate mask file with Inpaint upload is a valuable feature that can make image editing tasks easier and more efficient.
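A mask for Inpaint upload is just a black-and-white image of the same size as the input: white pixels mark the region to regenerate, black pixels the region to keep. A numpy sketch of building one programmatically (in practice you would save it as a PNG with an image library before uploading):

```python
import numpy as np

def rectangle_mask(width: int, height: int, box) -> np.ndarray:
    """Build a binary inpainting mask: white (255) inside `box` marks the
    region to regenerate, black (0) elsewhere marks the region to keep.
    box is (left, top, right, bottom) in pixel coordinates."""
    mask = np.zeros((height, width), dtype=np.uint8)
    left, top, right, bottom = box
    mask[top:bottom, left:right] = 255
    return mask

mask = rectangle_mask(512, 512, (100, 50, 300, 200))
print(mask.sum() // 255)  # number of white pixels: 200 * 150 = 30000
```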
The Batch feature of Inpaint allows you to inpaint or perform image-to-image operations on multiple images at once. This can save you a significant amount of time and effort when dealing with a large number of images that require similar editing.
With the Batch feature, you can select multiple images and apply the same inpainting or image-to-image operations to all of them simultaneously. This can be particularly useful if you have a large collection of images that require the same type of editing, such as removing watermarks, logos, or other unwanted objects. Rather than manually editing each image one by one, you can simply select them all and let Inpaint’s Batch feature do the work for you.
Additionally, the Batch feature allows you to customize the settings for each individual image, so you can apply different levels of editing to different images as needed. This flexibility allows you to tailor your editing process to the specific requirements of each image, while still benefiting from the time-saving capabilities of the Batch feature.
Overall, the Batch feature in Inpaint is a powerful tool that can streamline your image editing workflow and help you achieve consistent, high-quality results across a large number of images.
Get prompt from an image
The Interrogate CLIP button by AUTOMATIC1111 is a helpful feature that automates the prompt-guessing process for images uploaded to the img2img tab. This is particularly useful when you are working with images that you don’t have a specific prompt for.
To use this feature, you would first navigate to the img2img page on AUTOMATIC1111’s platform. Then, you would upload an image that you want to work on to the img2img tab. Once the image is uploaded, you can simply click the Interrogate CLIP button to automatically generate a prompt based on the content of the image.
By automating the prompt-guessing process, the Interrogate CLIP button saves you time and effort while still providing a useful starting point for your image editing work. You can then use the generated prompt as a basis for further refining and customizing your edits, based on your specific goals and preferences.
Overall, the Interrogate CLIP button is a convenient feature that enhances the usability and functionality of AUTOMATIC1111’s image editing platform.
A prompt will appear in the prompt text box. However, when it comes to anime images, the Interrogate CLIP feature may not always provide the best prompts. In such cases, use the Interrogate DeepBooru button instead: it is specifically designed to generate prompts for anime images, using a database of anime-related tags and keywords.
AUTOMATIC1111 provides a valuable solution for users who need to upscale their images. While there are AI upscalers available online, these may not always be easily accessible or affordable for everyone. By providing a free AI-powered image upscaling tool on its platform, this tool enables users to achieve high-quality enlargements without having to pay for an external service.
Moreover, the upscaling tool uses advanced deep-learning algorithms to produce high-quality results that are superior to traditional upscaling methods. This means that users can achieve greater clarity, detail, and sharpness in their upscaled images, which is particularly useful for tasks like printing or graphic design.
To upscale an image using AUTOMATIC1111, follow these steps:
- Go to the Extras page in the AUTOMATIC1111 GUI.
- Upload the image you want to upscale onto the image canvas.
- Set the desired scale factor under the “resize” label. This will determine how much larger the new image will be on each side.
- Choose an upscaler, such as the R-ESRGAN 4x+ for general-purpose upscaling.
- Click the “Generate” button to generate a new image on the right side of the screen.
With AUTOMATIC1111’s advanced upscaling tools, users can achieve high-quality results with ease and convenience, without the need for expensive software or external services. By following these simple steps, anyone can upscale their images to meet their specific needs and achieve professional-grade results.
After upscaling an image using AUTOMATIC1111, it is important to carefully inspect the new image at its full resolution. This will help ensure that any artifacts or imperfections introduced by the upscaler are visible and can be addressed if necessary. One way to do this is to open the new image in a new tab and disable auto-fit, so that the image is displayed at its actual size.
Even if you don’t need to enlarge the image by a factor of 4x, it can still be beneficial to do so and then resize it later. This can help improve the sharpness and overall quality of the final image.
In addition to setting a scale factor, AUTOMATIC1111 also provides the option to specify the dimensions to resize in the “scale to” tab. This allows for more precise control over the final size of the image and can be useful in situations where specific dimensions are required.
The AUTOMATIC1111 platform provides a range of upscalers to choose from, including both traditional and AI-based options. The Upscaler dropdown menu offers several built-in options, and additional upscalers can be installed as well.
Traditional upscalers like Lanczos and Nearest have been around for some time, and while they may not offer the same level of power as AI-based upscalers, they are predictable in their behavior.
AI upscalers like ESRGAN, R-ESRGAN, ScuNet, and SwinIR are designed to literally make up content in order to increase resolution. Some are even trained for specific styles. To determine which upscaler works best for a particular image, it is recommended to test them out and closely examine the results at full resolution.
For situations where a combination of upscalers is desired, AUTOMATIC1111 also provides Upscaler 2, which allows users to blend the results of two upscalers together. The amount of blending can be controlled using the Upscaler 2 Visibility slider.
If none of the available upscalers meet a user’s needs, it is possible to install additional upscalers from external sources.
During the upscaling process, you have the option to restore faces using one of two available options: GFPGAN and CodeFormer. Enabling either of these options will apply correction to the faces in the image. It is recommended to set the visibility of the correction to the lowest value possible to avoid affecting the style of the image. To do this, adjust the visibility slider until the correction is just enough to restore the faces without drastically altering the overall style of the image.
Many Stable Diffusion GUIs, including AUTOMATIC1111, have a feature that writes the generation parameters to the png file of the generated image. This feature allows for quick and easy retrieval of the generation parameters. If the image was generated using AUTOMATIC1111, the Send to buttons can be used to quickly copy the parameters to various pages. This is particularly useful when you come across an image online and want to check if the prompt used to generate the image is still embedded in the file. Even for non-generated images, this function can be helpful in quickly sending the image and its dimensions to a page.
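In AUTOMATIC1111-generated PNGs, the settings are stored in a PNG tEXt chunk under the key “parameters”. As a sketch, such a chunk can be read with nothing but the Python standard library (the GUI’s own PNG Info tab does this for you; the function name here is illustrative):

```python
import struct

def png_text_chunks(data: bytes) -> dict:
    """Parse the uncompressed tEXt chunks of a PNG byte string into a dict.
    AUTOMATIC1111 writes its generation settings under the key 'parameters'."""
    assert data[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG file"
    chunks, pos = {}, 8
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 8-byte header + data + 4-byte CRC
        if ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            chunks[key.decode("latin-1")] = value.decode("latin-1")
        if ctype == b"IEND":
            break
    return chunks

# usage: png_text_chunks(open("output.png", "rb").read()).get("parameters")
```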
Checkpoint merger is a tool or software that allows you to merge two or more machine learning models or checkpoints into a single model. This can be useful in situations where you want to combine the strengths of different models, or when you want to continue training a model from a checkpoint but with additional data.
The checkpoint merger takes the weights and parameters from each checkpoint and combines them into a single checkpoint. This new checkpoint can then be used to initialize a new model or continue training an existing one.
Checkpoint merger is commonly used in deep learning applications, especially in fields such as computer vision and natural language processing where large amounts of data and complex models are used.
AUTOMATIC1111’s checkpoint merger allows users to combine up to three models to create a new model, which is typically used to mix the styles of two or more models. However, the resulting merge is not guaranteed to be desirable and may produce unwanted artifacts.
The primary models, A, B, and C, are the input models used for merging, and the merging is done according to the displayed formula, which varies depending on the interpolation method selected. The available interpolation methods are:
- No interpolation: This method uses only model A and is useful for file conversion or replacing the VAE.
- Weighted sum: This method merges two models, A and B, with a multiplier weight M applying to B. The formula for this method is A * (1 – M) + B * M.
- Add difference: This method merges three models using the formula A + (B – C) * M.
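The two interpolation formulas can be sketched in numpy; real checkpoint merging applies them to every weight tensor in the models’ state dictionaries, and the toy arrays below stand in for one such tensor:

```python
import numpy as np

def weighted_sum(a: np.ndarray, b: np.ndarray, m: float) -> np.ndarray:
    """Merge two model tensors: A * (1 - M) + B * M."""
    return a * (1 - m) + b * m

def add_difference(a: np.ndarray, b: np.ndarray, c: np.ndarray, m: float) -> np.ndarray:
    """Merge three model tensors: A + (B - C) * M."""
    return a + (b - c) * m

# one weight tensor from each (toy) model
a, b, c = np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([2.0, 2.0])
print(weighted_sum(a, b, 0.5))       # [2. 3.]
print(add_difference(a, b, c, 1.0))  # [2. 4.]
```

Note how “Add difference” transplants the delta between B and C onto A, which is why it is often used to apply a fine-tune (B minus its base C) to a different model A.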
The checkpoint format options available in AUTOMATIC1111 are ckpt and SafeTensors. SafeTensors is a new model format developed by Hugging Face that is considered safer than ckpt because loading a SafeTensor model won’t execute any malicious codes, even if they are present in the model.
In the context of AUTOMATIC1111’s checkpoint merger, the “Bake in VAE” option allows you to replace the VAE (Variational Autoencoder) decoder of one of the input models (A, B, or C) with a better one released by Stability. The VAE decoder is responsible for generating the images from the latent space vectors produced by the VAE encoder. By replacing the VAE decoder with a better one, you can potentially improve the quality of the generated images.
To use this option, select the desired VAE decoder from the “Bake in VAE” dropdown menu. When the merged model is generated, the selected VAE decoder will be used in place of the original one in the input model. It’s important to note that not all models are compatible with all VAE decoders, so it’s recommended to experiment with different combinations to find the best results.
Training a model using the Train page typically involves the following steps:
Prepare the data: You need to prepare a dataset of input-output pairs for the model to learn from. For textual inversion, this would typically involve a collection of texts and their corresponding embeddings. For hypernetworks, this would involve a set of tasks and their corresponding network architectures.
Define the model: You need to define the architecture of the model you want to train. This involves specifying the number of layers, the activation functions, and the other hyperparameters of the model.
Set the training parameters: You need to specify the hyperparameters for training the model, such as the learning rate, batch size, number of epochs, etc.
Train the model: Once the data, model, and training parameters are set, you can start training the model. This involves feeding the input-output pairs to the model and updating the model parameters based on the error between the predicted output and the actual output.
Evaluate the model: Once the model is trained, you need to evaluate its performance on a validation set to ensure that it is not overfitting. You may also want to test the model on a separate test set to see how well it generalizes to new data.
Save the model: If the model performs well, you can save its parameters to a file so that you can use it later for prediction or further training.
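The steps above can be instantiated on the smallest possible example: a toy linear model trained with gradient descent in numpy. This is only a sketch of the workflow, not an actual textual-inversion or hypernetwork run:

```python
import numpy as np

# Step 1: prepare input-output pairs (toy data: y = 2x + 1 with a little noise)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2 * x + 1 + rng.normal(0, 0.01, size=100)

# Step 2: define the model (a single linear layer: y_hat = w*x + b)
w, b = 0.0, 0.0

# Step 3: set the training parameters
learning_rate, epochs = 0.1, 200

# Step 4: train by updating the parameters against the squared error
for _ in range(epochs):
    y_hat = w * x + b
    grad_w = 2 * np.mean((y_hat - y) * x)
    grad_b = 2 * np.mean(y_hat - y)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Step 5: evaluate (here, just the training error for brevity;
# a real run would use a held-out validation set)
mse = np.mean((w * x + b - y) ** 2)

# Step 6: save the parameters for later use,
# e.g. np.save("model.npy", np.array([w, b]))
print(round(w, 1), round(b, 1))  # close to 2.0 and 1.0
```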
It is important to note that training a model can be a complex and time-consuming process, and requires a good understanding of machine learning algorithms and techniques. It is recommended that you have some prior experience with machine learning before attempting to train a model using the Train page.
Settings and more
AUTOMATIC1111’s settings page provides users with an extensive list of customizable settings to optimize their experience with the tool. It is important to note that after making any changes to the settings, the user should click the “Apply settings” button to ensure that the changes are saved.
One of the most important settings to consider is the face restoration method. AUTOMATIC1111 offers two options for face restoration: GFPGAN and CodeFormer. CodeFormer is considered the better option, as it produces more natural-looking results.
Another useful setting is SD VAE, under the Stable Diffusion section. It lets you download and select a VAE (Variational Autoencoder) released by Stability AI to enhance the quality of the generated images, particularly the eyes and faces, in v1 models.
While there are numerous other settings available, it is recommended to experiment with them to determine which work best for your specific needs.
You can download the VAE for Stable Diffusion v1.5 as a safetensors file to use with AUTOMATIC1111; save it in the “models\VAE” folder. You can also download the Stable Diffusion v1.5 model (also in safetensors format) and save it in the “models\Stable-diffusion” folder. The two safetensors files are compatible with each other, but don’t mix ckpt and safetensors versions, otherwise they won’t work.