I'm glad to notice that we have more 3D artists starting to use AI to enhance their works lately.
I can't claim to be an expert on Stable Diffusion, but since I've spent more time with the tool than most people on CF, I decided to share some tips that I think could help them along the journey:
Hardware
- If you are looking to buy a PC that can be used for AI image generation, get an Nvidia card with the most VRAM you can afford. Consider 6 GB to be the absolute minimum for running Stable Diffusion (SD) locally (see the quick check after this list).
- Get a tablet if you can. It doesn't matter if you can actually draw or not, but it'll make your life so much easier if you have one. There are a lot of tasks involving masking a region or drawing a crude sketch when using SD, which can be quite painful to do with a mouse.
- If you don't have a good enough PC to run SD locally, try one of the many websites that offer SD as an online service. There are even free ones like Stable Horde (and you can run the popular A1111 client on top of it).
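For reference, here's a minimal sketch for checking how much VRAM your card actually has before committing to a local install. It assumes PyTorch with CUDA support is already installed; the 6 GB threshold is just the practical minimum mentioned above.

```python
# Quick VRAM check before attempting a local SD install.
import torch

if not torch.cuda.is_available():
    print("No CUDA-capable GPU detected; consider an online service instead.")
else:
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / (1024 ** 3)
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    if vram_gb < 6:
        print("Below the ~6 GB practical minimum; expect to rely on "
              "low-VRAM options or online services.")
```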
Software
- AUTOMATIC1111's WebUI (or "A1111" for short) is the most popular choice for running SD locally. Personally, I recommend Vladmandic's fork, which is an opinionated version of the same client. If you want an easier but less feature-rich client, you may find one you like here. (If you'd rather drive SD from a script instead of a UI, see the sketch after this list.)
- Using SD in conjunction with other image manipulation software isn't mandatory but is strongly recommended. In particular, you'll get a nice synergy if you can use a bitmap image editor (e.g. GIMP, Photoshop, Paint.NET), a drawing tool (e.g. Krita, Adobe Illustrator), or a photo retouching tool (e.g. Darktable).
- If you are an artist familiar with a non-AI 2D/3D tool like Daz3D or Blender, try to find a way to incorporate SD into your workflow instead of replacing it. In particular, learn how to generate control map images from the tool you're familiar with. There are even plugins that can automate the process for certain programs.
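The WebUI clients aren't the only way to run SD, either. If you're comfortable with Python, a plain script can be enough; here's a rough sketch using the Hugging Face diffusers library (the model name and prompt are placeholders, so swap in whatever checkpoint you actually use):

```python
# Minimal scripted txt2img with diffusers, as an alternative to a WebUI client.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # placeholder; use your preferred checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a cosy cabin in a snowy forest, golden hour",
             num_inference_steps=25).images[0]
image.save("output.png")
```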
General
- SD, or AI in general, is a fast-moving field (it hasn't even been a year since the first public release of SD!). So try to find a source from which you can get the latest news, techniques, models, and workflows. r/StableDiffusion is a good place to start, and Civitai is invaluable if you want to find a model or a guide.
- Try to establish your own workflow. One of the charms of AI-assisted image generation is that there can be many different ways to achieve a goal, so spend some time experimenting and take hints from what other people are doing.
- See yourself as more of a film director than a painter. Working with SD is like having a great team of actors, actresses, cinematographers, and all the other technicians needed to make a good film at your disposal. But it's your job to provide them with a good script and direction. You can't hope to make a good film by simply telling them "Hey, make me some cool fantasy story". You need to learn how to express your ideas precisely to the cast and crew, and know what they can and cannot do. Sometimes you may need to describe what you want in words, while at other times it might be better to draw them a sketch or show them a reference image.
- Try to get familiar with the whole ecosystem instead of focusing too much on prompting. This chart can be helpful if you want to learn what tools/techniques are currently available and what they do in a nutshell.
- Try to establish an iterative workflow. Remember you can use the output of one stage as the input of another, and this includes control net images and even steps involving external tools like Photoshop/Krita. For example, you can generate an image in img2img, convert it into a lineart map using a suitable preprocessor, edit it in an external program, then use it as a control net input to generate another image. Be creative and do some experiments (see the API sketch after this list for a simple chained example).
- If you're using A1111's client or one of its variants, try to spend some time exploring the extension repository. Some extensions like Image Browser can be very handy, while others may open up interesting possibilities like making videos with SD.
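Here's a rough sketch of such a chained workflow using A1111's built-in HTTP API (you need to launch the WebUI with the --api flag for this to work; the prompts and settings below are placeholders):

```python
# Feed a txt2img result straight into img2img as a refinement pass via the
# A1111 WebUI API. Requires the WebUI to be started with --api.
import base64
import requests

URL = "http://127.0.0.1:7860"

t2i = requests.post(f"{URL}/sdapi/v1/txt2img", json={
    "prompt": "rough composition of the scene",   # placeholder
    "steps": 20, "width": 512, "height": 512,
}).json()
base_image = t2i["images"][0]                     # base64-encoded PNG

i2i = requests.post(f"{URL}/sdapi/v1/img2img", json={
    "init_images": [base_image],
    "prompt": "same scene, refined, photorealistic",  # placeholder
    "denoising_strength": 0.5,
    "steps": 30,
}).json()

with open("refined.png", "wb") as f:
    f.write(base64.b64decode(i2i["images"][0]))
```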
Model
- One of the very first tasks you may want to do after installing SD locally or registering with a suitable online service is to find a good base model (or "checkpoint") that suits your taste. The default Stable Diffusion models aren't very impressive and are utterly lacking when it comes to generating NSFW content. Visit Civitai.com and filter for checkpoints.
- This might be a subjective opinion, but avoid checkpoints (and other models) targeted for Stable Diffusion 2.x. There's a good reason why the vast majority of user contributions on Civitai are still based on SD 1.5. Skip SD 2.0/2.1 but keep your eyes open for the upcoming SDXL.
- You'll need at least one general-purpose checkpoint, and optionally another for inpainting. For photorealism, there are a few popular choices which you can search on Civitai using a tag like "#photorealistic" or "#photorealism".
- For most tasks (except maybe upscaling), a "pruned" model, which is smaller in size, is good enough. If the model's format is "PickleTensor", use it at your own risk: while such a model should be safe to use in practice, it's a security risk in principle. Use a "SafeTensor" version whenever possible (see the sketch after this list for why).
- Learn how to use extra networks. It's very easy to use them, especially with a proper client or extensions like Civitai Helper. You can find many interesting models on Civitai and use them for free. And it's not too difficult to train your own if you can't find what you're looking for.
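If you're curious why the format matters, here's a small illustration (the file names are hypothetical): a PickleTensor checkpoint is loaded through Python's pickle machinery, which can execute arbitrary code embedded in the file, while the safetensors format only stores raw tensor data.

```python
# Loading a checkpoint's weights: safetensors reads raw tensors only,
# while torch.load() unpickles the file and can run code embedded in it.
from safetensors.torch import load_file
import torch

state_dict = load_file("some_model.safetensors")   # hypothetical file; no code execution
# state_dict = torch.load("some_model.ckpt", map_location="cpu")  # may execute pickled code

print(f"Loaded {len(state_dict)} tensors")
```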
Prompting
- Start simple and build from there. The more words (or "tokens") in the prompt, the less emphasis each token gets. The order of the words also matters, so start with the most important traits.
- Don't blindly copy & paste long prompts you've found on the internet. You'd be surprised to learn how much "cargo-culting" is going on with prompting practices, especially with negative prompts. Always do an experiment before following a prompting suggestion found on the internet.
- Learn how to adjust the weight of each token; the exact syntax may differ from one client to another (see the sketch after this list for the A1111-style syntax).
- Don't strive to do too much with prompting. Prompting is important but far from everything SD can offer, and you won't go very far relying on prompting alone. After all, SD is pretty dumb at understanding what you want when you describe it in words. It's much more competent, however, at understanding your intention when it's suggested as an image (i.e. via control net).
- Learn its limits. This may change with future versions of SD, but there are things that simply cannot be done by prompting alone. Notably, it's almost impossible to compose a scene involving many subjects with different traits (e.g. naked women in agony and a grinning man wearing clothes) or interacting with each other in a non-trivial manner (e.g. a man whipping a woman). There are better tools for such tasks (e.g. inpainting, control map images, etc.) than simple prompting.
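As an example of token weighting, this is the attention syntax A1111-style clients understand (other clients and libraries use their own schemes, so check your client's docs):

```python
# A1111-style attention/weighting syntax, shown as a plain prompt string.
prompt = "(masterpiece:1.3), portrait of a woman, (freckles:1.2), [busy background]"
# (token:1.3)  -> weight set explicitly to 1.3
# (token)      -> weight multiplied by 1.1
# [token]      -> weight divided by 1.1 (de-emphasised)
print(prompt)
```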
Txt2Img
- Start with a small image (512x512, 768x512, 512x768, etc.) and build from there. Also, it's better to use a fast sampler (e.g. UniPC or Euler) with few sampling steps (e.g. 10-30) until you find the combination of prompt and settings that suits your needs.
- While experimenting, using the same seed can be helpful for determining the impact of an individual change to a setting or the prompt. Use the X/Y plot script if you want more comprehensive data.
- The above tips can be applied to other tools like Img2Img as well.
- Don't try to get everything perfect in Txt2Img, which may not even be possible at all. Instead, try to generate a decent "base image" which can be further refined using other tools like Img2Img or inpainting.
- Once you are satisfied with the combination of prompt and parameters, try "seed hunting": finding the best seed by generating large batches with random seeds (see the sketch after this list).
- After you find the best prompt/parameters/seed, use Hires. Fix to finalise the base image. It's often not possible to depict all the details in a 512x512 image, so focus on determining the general composition/atmosphere/tone of the scene before this stage instead of spending too much time on the details.
- If you run out of VRAM while upscaling with Hires. Fix, try the Tiled VAE extension. It also has other cool features like tiled upscaling and regional prompting.
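A scripted version of seed hunting might look like this with diffusers (the checkpoint name and prompt are placeholders; the same idea works in any client that lets you set the seed):

```python
# Generate a batch of low-step previews with random seeds, keeping the seed in
# each filename so a promising one can be reproduced later with Hires. Fix.
import random
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # placeholder checkpoint
    torch_dtype=torch.float16,
).to("cuda")

prompt = "your finalised prompt here"   # placeholder
for _ in range(16):
    seed = random.randint(0, 2**32 - 1)
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, num_inference_steps=20, width=512, height=512,
                 generator=generator).images[0]
    image.save(f"preview_{seed}.png")
```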
Img2Img / Inpaint
- Understanding how denoising strength works is crucial for using this tool effectively (see the sketch after this list for a quick way to get a feel for it).
- When inpainting, understanding what "context" the AI sees is important. For instance, if you draw a mask over a small part of a character's hand and inpaint using the "masked only" mode, it's unlikely that you'll get a good result, because the AI will struggle to understand that the area belongs to a human hand. The more context you provide, the higher the chance of a consistent result, at the cost of losing detail.
- If you have difficulty inserting a new object in an area or converting it to something entirely different, try the "fill" mode instead of "original".
- If you're trying to convert an image to a different style (e.g. converting a Daz3D render to a photorealistic image), inpainting might not be the best way to do it. When you inpaint the character in a 3D render, for example, the AI will try to make the new content consistent with the rest of the image, which means it'll mimic the 3D render style instead of a photorealistic one.
- Sometimes, using inpaint sketch is enough to convince the AI to change a part of the image as you want. It's far easier, for example, to paint the character's hair a different colour or draw a rope around their hands using inpaint sketch than to create a full set of control net images.
- Overusing inpainting may have a detrimental effect on the overall consistency of the image, making it look like a photo manipulation instead of a real photograph. The reason is that it's difficult to inpaint new content that exactly matches the tone, focus (e.g. depth of field), proportions, etc. of the rest of the image. You can think of it as having a certain "consistency budget" which you can spend on inpainting to fix errors or add more details. When you go over the budget, it may be better to run the whole image through an img2img or even txt2img pass again to make it consistent.
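To get an intuition for denoising strength, it helps to run the same source image, prompt, and seed at a few different strength values and compare the results. A rough diffusers sketch (file name, prompt, and checkpoint are placeholders):

```python
# The same seed and prompt at several denoising strengths shows how much of
# the source image survives: low strength follows the source closely, high
# strength mostly follows the prompt.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # placeholder checkpoint
    torch_dtype=torch.float16,
).to("cuda")

init = Image.open("base_image.png").convert("RGB").resize((512, 512))  # placeholder file
prompt = "photorealistic portrait, detailed skin"                      # placeholder

for strength in (0.3, 0.5, 0.75):
    generator = torch.Generator("cuda").manual_seed(1234)
    out = pipe(prompt=prompt, image=init, strength=strength,
               generator=generator).images[0]
    out.save(f"img2img_strength_{strength}.png")
```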
Control Net
- It's difficult to cover the subject comprehensively because it's a fast-changing field with many different models and many creative uses of them. So I'll just emphasise their importance: learn how to use the various control net images. You won't get very far without them, so this might be the single most important tip in this post.
- It's often useful to manipulate, or even create a control net image using an external tool. For example, drawing stick figures in Krita to determine the general composition of a scene, or deleting extra fingers in a generated lineart map can be very handy.
- To save rendering time, you can generate a control map image using "preview" and use the result as the input with the "none" preprocessor (the sketch after this list shows the same idea done outside the UI).
- In "pixel perfect" mode, the dimension of the output image determines the size of the generated control map image as well.
- Sometimes, using an excessive number of control net images, or using them in the wrong combination, may result in unnatural images. When that happens, try a different model or disable the ones that aren't necessary.
- In general, lineart + realistic preprocessor is a good way to preserve the content of an image, especially with a depth map. It can sometimes be more precise than the canny model while being much easier to manipulate.
- OpenPose maps can be very useful but it's difficult to generate them by hand. There are ways to generate them out of an existing image, or even from a 3D program like Blender or Daz3D directly.
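For those who prefer scripting, here's a rough sketch of preparing a control map yourself using diffusers and OpenCV: extract a canny edge map from an existing render (you could just as well edit it in Krita at this point) and feed it directly to a ControlNet pipeline. The model IDs are the commonly used public ones; the file name and prompt are placeholders.

```python
# Extract a canny edge map from a render and use it to guide a new generation.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

source = np.array(Image.open("daz_render.png").convert("RGB"))   # placeholder file
edges = cv2.Canny(source, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 3-channel map
control_image.save("canny_map.png")   # edit this by hand first if you like

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("photorealistic version of the same scene",   # placeholder prompt
             image=control_image, num_inference_steps=25).images[0]
image.save("controlled.png")
```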
I didn't intend to make it such a long post when I started to write it. I'll stop it here for now but I hope this thread can become a place where people can share their tips and workflows with each other regarding the AI-assisted image generation process, instead of arguing about it.