
AI Howto

Disclaimer: Based on @jucundus's suggestion, I'm creating this thread to share tips and workflows for members who utilise AI tools to create content. If you want to talk about non-technical aspects of AI, please direct the discussion to a more suitable place instead.

One of the advantages of AI-assisted art is that it offers a variety of tools, models, and workflows to accommodate many different needs and preferences. Additionally, the field of AI is rapidly evolving, which means that the information discussed here may not be suitable for everyone and could become outdated by the time you read it.

As such, I suggest considering this as a collection of general tips to discover the best tools for you and create your own workflow rather than a step-by-step tutorial aimed at achieving the optimal result.

I will explain the workflow I currently use for my renders in a series of posts, starting with the basics and moving on to more advanced topics. I encourage other members to provide feedback or share their own tips and workflows in this thread.

First Choice: Local Installation or Online Service​

If you're new to AI art, the first thing to consider is how you want to use the necessary tools. Installing these programs on your PC is ideal because it helps you avoid paying for subscriptions or credits and addresses concerns about censorship and privacy. However, this requires a decent video card with substantial video memory.

Although running them on a video card with 8GB or less VRAM is possible, I'd consider 10GB the minimum system requirement for SDXL and 12GB for FLUX. (I will explain what SDXL and FLUX mean later.) For an ideal experience, a decent video card with 16GB or more video memory is recommended. If you are considering buying new hardware to run AI tools, prioritise video memory over graphics performance. NVIDIA is preferred (because of CUDA), while AMD and Intel might be problematic.

If your graphics card does not meet the hardware requirements, your best option would be to utilise an online service that allows you to access their hardware. There are several types of these services, which can be roughly categorised as follows:
  1. Those providing their own image generator.
  2. Those offering popular open-source AI tools.
  3. Those renting only hardware.
The first option seems to be the most popular among our members, which is unsurprising, as it is the most prevalent and accessible choice. However, there is a caveat: the generator they offer tends to lack features compared to the popular tools preferred by more serious content creators. Additionally, they typically provide a limited selection of models, which may not be suitable for generating NSFW images. On the positive side, some of these platforms offer free credits, allowing you to generate a few images daily without requiring a subscription. This could be a good option if you're not yet ready to commit to spending money on creating AI art.

The second option is similar to the first, but it offers well-known open-source tools such as Fooocus, Forge, ReForge, Invoke AI, and ComfyUI. These tools tend to be more feature-rich and are generally more up-to-date with the latest advancements in AI technology than the custom tools provided by the services mentioned earlier. Among these options, ComfyUI stands out as the most advanced, but it is also the most complex. Typically, when something impressive happens in the generative AI field, it is first implemented in ComfyUI. However, its complexity and node-based workflow may not be suitable for everyone. It's worth noting that there are tools built on top of ComfyUI that offer a more user-friendly interface. I personally use one of these tools that integrates with Krita, which I will introduce later.

The last option is to rent GPUs by the hour. This is the most flexible and private choice available, as it allows you to use the hardware for nearly any purpose. If you're comfortable typing commands in a Linux console, you can install any of the popular tools mentioned earlier, download models from Civitai, train your own models, or even use the GPU for non-image-related tasks, such as roleplaying with a language model. Additionally, this option is ideal for resource-intensive tasks like video production.

However, there is a downside: you are often charged for network storage, which means you incur costs even when you're not actively using the service. Nevertheless, the overall cost remains reasonable, as renting the GPU itself is pretty affordable—typically around $0.35 per hour for a 24GB video card.
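As a rough, back-of-the-envelope illustration of how the storage charge adds up, here is a tiny Python sketch; the GPU rate is the figure mentioned above, while the storage rate and usage numbers are purely hypothetical placeholders you should swap for your provider's actual pricing:

# Hypothetical monthly cost of a rented GPU setup (illustrative numbers only).
gpu_rate_per_hour = 0.35       # e.g. a 24GB card, as mentioned above
hours_used_per_month = 20
storage_gb = 100               # persistent network volume holding your models
storage_rate_gb_month = 0.07   # HYPOTHETICAL rate; check your provider

compute_cost = gpu_rate_per_hour * hours_used_per_month
storage_cost = storage_gb * storage_rate_gb_month   # billed even when idle
print(f"GPU: ${compute_cost:.2f}, storage: ${storage_cost:.2f}, "
      f"total: ${compute_cost + storage_cost:.2f} per month")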

To determine the best option for you, we need to discuss some of the mentioned factors in detail, such as the availability of models and features, which I will address in my next post.
 
Thank you, @fallenmystic, a much needed tutorial! :)

The technospeak can be bewildering to a technophobe such as me, but Fallenmystic is a patient guy and will explain anything you ask.

So, I think CUDA is a kind of software that uses the GPU for image processing while leaving the CPU free for other tasks?
 
Yes, you're correct. CUDA is the de facto standard software stack for GPU computing, and pretty much all popular AI tools support it. I really don't like how it has enabled NVIDIA to enjoy a dominant position in the market. The prospect of such a powerful technology being monopolised by a few large corporations could be an existential threat to humanity.

Fortunately, both AMD and Intel have been working hard to catch up. However, for the moment, NVIDIA remains the best option for AI, especially if you don't use Linux and aren't familiar with patching and building software like PyTorch.
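If you want a quick sanity check of whether your own machine is usable, one option is to ask PyTorch (the library most of these tools run on) whether it can see a CUDA device; a minimal sketch:

import torch

# Check whether PyTorch can see a CUDA-capable GPU and report its VRAM.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA device found - generation would fall back to the (slow) CPU.")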

You can see the details in this Wiki document from ComfyUI.
 

Choosing AI Model: Part 1​

The previous post discussed various methods for running generative AI tools. However, I only provided a broad overview without specific recommendations. This was because there are other important factors to consider when determining which tools would be ideal for individual circumstances. Today, we will focus on the most crucial aspect of the image generation pipeline: the model.

While browsing posts about AI art on CF, I noticed that people often do not clearly distinguish between the software and the model. For instance, "What is the best site for creating NSFW images?" isn't really the right question, since it usually isn't the online service that applies the censorship but the model it uses (and a service can offer more than one).

In summary, the model serves as the foundation for image generation, while tools and services offer ways to utilise that model. There are three main categories of image-generation models: base models, finetunes, and "additional (or extra) networks".

Base models are the foundational models that companies and research groups develop by training on billions of images and then release to the public. Due to the immense resources this process requires, only a limited number of base models are currently available.

Note that all popular base models are censored to varying degrees. Fortunately, you can enhance these base models by training them with additional images, which can help restore their ability to depict NSFW subjects. These improved versions are known as "finetunes" of the base models, and both are referred to as "checkpoints."

Additional networks are small supplementary models that can be used alongside a checkpoint. The most common type is known as 'LoRA'. While there are other variations, such as DoRA or embeddings, you will primarily hear about LoRAs, as they are by far the most widely used. They are used to depict a concept (e.g. "a whipped body"), style (e.g. "analogue film style"), or subject (e.g. "The Batman") that the checkpoint wasn't originally trained to depict.
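To make the relationship between a checkpoint and a LoRA concrete, here is a minimal sketch using the Hugging Face diffusers library; the file names are placeholders for whatever checkpoint and LoRA you actually download:

import torch
from diffusers import StableDiffusionXLPipeline

# Load a finetuned SDXL checkpoint from a single .safetensors file.
pipe = StableDiffusionXLPipeline.from_single_file(
    "my_sdxl_finetune.safetensors",   # placeholder checkpoint file
    torch_dtype=torch.float16,
).to("cuda")

# Attach a LoRA on top of the checkpoint; the scale controls how strongly
# the extra network influences the result.
pipe.load_lora_weights("my_concept_lora.safetensors")   # placeholder LoRA file
pipe.fuse_lora(lora_scale=0.8)

image = pipe("analogue film style, a whipped body", num_inference_steps=30).images[0]
image.save("output.png")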

In summary, to get started with AI art, you'll need to choose a 'finetune' of a base model at the minimum. But which model should you select, and where can you find the available options? We will discuss these topics in the next post.
 

Great description!

Sadly, what we call finetunes, made with kohya-ss or OneTrainer, are like LoRAs weighted against the chosen model - sometimes better than a LoRA. That is the reason why you can rip the concepts from the finetuned model as LoRAs. I hope that there will be real finetunes/extensions at some point.
 
That's how I see them mostly: 'checkpoints with some preinstalled LoRas'. :) I don't have much problem with this, though. But when it comes to Flux, I read that they still haven't found a way to finetune the base model without deteriorating quality. I haven't used the base model often enough to know how much popular checkpoints damage the original.

I'm waiting for Flux to mature, but I have a feeling that everyone may move to something else (e.g. hopefully one from OAI) before it does.
 
I haven't used the base model often enough to know how much popular checkpoints damage the original.
Well, you can create a Flux fine-tuned model without losses if you use LoRA in combination – but if you want to use it with any LoRA, you have to train the LoRA with the fine-tuned model, NOT with the Flux base model – otherwise the weights will be messed up ^^

Best for me: train missing clothes, shackles, devices, whips, and riding crops using the kohya_ss or OneTrainer finetuning options - and then use the finetuned model to create LoRAs.
I don't add characters/people to the finetuned model because at some point you lose their characteristics.

At the moment I'm creating some model LoRAs using RunPod in combination with the good old ai-toolkit with an A40 :) Testing the results via ComfyUI.
This time I'm creating a Tina LoRA (aka sweet-trixie / @LXXT), due to the fact that her sets are no longer available (grrrrr! :p)

(PS: the quality usually gets better around 2600 to 4000 steps... The example below was made at 900 steps; the A40 is now at step 1279.)

[Attached: two example images from the LoRa training run]
 
As promised, a (not so) short illustrated guide to how I made my last image here. I am by far not as advanced as the two guys above; to be honest, I don't understand half of what @Anwendungsfehler is talking about.

So my tutorial will be a bit more down to earth; it is written for people who have made at least simple AI images before and want to try out new things. Please read @fallenmystic's explanations above if you are a total beginner. Here, I cannot give a complete guide to the use of the software or to Stable Diffusion concepts such as sampler, scheduler, denoising strength, CFG, etc. If you don't know these things, you'll have to read a bit first in order to benefit from this tutorial.

Setup:
I use a local installation to produce images with Stable Diffusion. I do not have extremely beefy hardware. All my images are produced on a gaming laptop with an 8GB NVIDIA RTX 4070 graphics card, an AMD A7 CPU and 32 GB of regular RAM. This is enough for running SDXL and Flux models. Until some months ago, I did my images with a 6 GB 3070 card. As software I use Forge (for the link, see the post above by @fallenmystic). It is good for PCs like mine with limited hardware, but it has fallen a bit behind during the last couple of months.

ControlNet:
I heavily rely on a feature called ControlNet, which is implemented quite well in the Forge software for the models I use (SDXL Pony variants). ControlNet lets you inject an additional guiding image into the image generation, and this is the key to transforming drawn art or 3D renders into AI images, or to capturing the essence (e.g., the pose) of a "good" AI text-prompt creation for further processing. ControlNet consists of two parts, a pre-processor and a model that interprets the pre-processed image. Various pre-processors are available in Forge. My favorite pre-processors transform the input image into some sort of lineart by an image analysis strategy called edge detection. The most common such pre-processor is called "canny"; I prefer one that is called "lineart-realistic". A ControlNet model that works very well with "lineart-realistic" is called MistoLine. It is not native to the Forge software, but you can download the file "mistoLine_fp16.safetensors" here and put it in the path "...\forge\webui\models\ControlNet", then it will be available.
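For readers who prefer scripting to the Forge UI, the same two-part idea (pre-processor plus ControlNet model) looks roughly like this in Python with diffusers and controlnet_aux; the repository names are the ones these models appear under on Hugging Face, but treat the exact identifiers and file names as placeholders to verify:

import torch
from controlnet_aux import LineartDetector
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

# Part 1: the pre-processor turns the source picture into a lineart guide image.
lineart = LineartDetector.from_pretrained("lllyasviel/Annotators")
guide = lineart(load_image("source_render.png"))   # placeholder input file

# Part 2: a ControlNet model (here MistoLine) interprets that guide image
# while the checkpoint generates the actual picture.
controlnet = ControlNetModel.from_pretrained(
    "TheMistoAI/MistoLine", torch_dtype=torch.float16, variant="fp16"
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",   # stand-in for your Pony checkpoint
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("1girl, crucified, wooden post", image=guide).images[0]
image.save("controlnet_result.png")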

Figure 1: Modify the original image.
a) First I resized Quoom's original 600x600 pixel picture (A) in Photoshop (B) and trimmed it a bit. My hardware can handle images of around 1200 to 1600 pixels with an SDXL model. Larger is not necessarily better: many SDXL models are trained around 1024 x 1024 pixels and start producing funny artifacts when the image dimensions are too large. But this is less of a problem when also using ControlNet, and larger images tend to have better details.

b) Quoom's picture had some features that I am not able to reproduce nicely with AI, in particular body parts and background figures. So I had to remove them in Photoshop (B), even though the "realism" of the scene suffered. I also had to correct the rope that was not really vertical, and alter those parts of the ground that were really different from the rest. Some illumination correction was also necessary, e.g., the face of the left guy.


Figure 2: ControlNet, Round 1:
a) I made a ControlNet Image by going to the txt2Img tab in Forge, uploading the resized corrected image (B) to the ControlNet unit and transforming it with the "lineart-realistic" pre-processor. Checking the box "pixel-perfect" makes the newly produced lineart image as large as the input image.

b) I copied the pre-processed image and worked on it in Photoshop as depicted in Figure 2. Specifically, I removed the sleeve of the right guy, because it was an obvious geometry artifact.


Figure 3: First version using a non-realistic model.
Note: I obtain my models from Civitai. You may have to create a free account to access them if they are NSFW.

a) I don't know if this is really necessary, but I often find that a non-realistic image like Quoom's render does not immediately translate well when using a realistic SDXL model. So I did a first round with a non-realistic one. My favorite SDXL models are all derived from "Pony Diffusion XL", which is good for NSFW anime-type images. For the first round I used "Atomix Pony 3D XL", which has the advantage of being very fast (see the model page for specifics on how many steps and which sampler to use).

b) I used the "img2img" mode to generate images. This means that I used the resized Quoom image (Fig 1, B) as a reference, setting the "denoising strength" to 0.8, which is high (the AI is allowed to change 80% of all image pixels). The touched-up lineart image (Fig 2, D) served as the ControlNet input. Since this image was already lineart, I set the pre-processor to "none" and the ControlNet model to "mistoLine" (see above). There are two more ControlNet parameters, a "Weight" (how important ControlNet is; I set it to 0.9) and a "Timestep Range" (until which fraction of the generation process ControlNet should be active; I set it to 0.75). These parameters, together with the denoising strength, are the main switches to play around with and give you a lot of flexibility.
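For those doing this in code rather than in Forge, the three switches mentioned above map roughly onto parameters of the diffusers SDXL ControlNet img2img pipeline (the mapping is approximate, and the checkpoint and image files below are placeholders):

import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetImg2ImgPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "TheMistoAI/MistoLine", torch_dtype=torch.float16, variant="fp16"
)
pipe = StableDiffusionXLControlNetImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",   # stand-in for your Pony checkpoint
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="from above, 1 nude girl, 2 dressed guys, ...",
    image=load_image("resized_reference.png"),           # img2img reference (Fig 1, B)
    control_image=load_image("touched_up_lineart.png"),  # ControlNet input (Fig 2, D)
    strength=0.8,                        # "denoising strength"
    controlnet_conditioning_scale=0.9,   # ControlNet "Weight"
    control_guidance_end=0.75,           # roughly the "Timestep Range" end point
).images[0]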

For prompting I used a Crux-specific LoRa made by @Hornet1ba to get the bloody arms and nails right. Just ask him for it, he has a whole set. My prompt was:

Positive:
"from above. 1 nude girl 2 dressed guys. blonde teenage girl crucified to a wooden post with nails. arms apart, blood streaming. hands, fingers, feet, toes, long hair, screaming, mouth open, eyes closed, pain. 2 evil (grinning guys), boots, lifting wooden beam, ropes, (ladder, ground, vivid colors) <lora: PonyXL_crux_H1b v4-000009:1>"

Negative:
"monochrome"

In the prompts I also included so-called embeddings, which are like magical words supposed to improve particular image aspects. My favorite positive embeddings for non-realistic Pony models are found here. Disclaimer: I totally don't understand prompting and it is always trial and error for me ("blonde" often seems to work). The phrase "crucified to a wooden post with nails" is required to trigger @Hornet1ba's LoRa.

Usually I generate quite a lot of images, like 25 to 50. Sometimes they all fail at an important common feature; then one has to revise the prompt or the three parameters mentioned above. One never gets a single image in which everything is right, so I produce a "consensus" image in Photoshop from individual elements of different images (Fig. 3, E). Usually I also correct contrast, brightness, gamma, color, etc. in the final version. Plus I remove common image artifacts like dots.

I used this image to make another ControlNet lineart image (Fig. 3, F) before I switched to a realistic model.


Figure 4: Realistic version
Often I try a couple of different realistic Pony models to see which one fits best. For this image I used the realistic model Tame Pony 2.5, which is actually quite good. Same procedure as above. The prompt evolved over time to:

Positive:
"from above. 1 girl 2 guys. blonde nude 18yo girl crucified to a wooden post with nails. screaming, mouth open, pain, blood streaming. (armpits, hands, fingers, feet, toes), long messy hair. 2 (ugly grinning guys lifting wooden beam), tunic, belt, boots. (ropes), ladder, ground, (dramatic lighting, shadows) <lora: PonyXL_crux_H1b v4-000009:1>"

Negative:
"dirty, lipstick, painted toenails"

Embeddings: Negative_&_Positive_Embeddings_By_Stable_Yogi

I again produced a consensus image from several generations (Fig 4, G) and eventually used this consensus image as an input for further generations instead of the above Fig 3, E.


In this last round of generations, I also used a plugin of the Forge software called Adetailer. It is basically an automated way of inpainting faces (i.e. regenerating them at higher magnification). This brought me to a consensus image as depicted in Fig 4, H.

The next step was an upscaling of the image (available in the "Extra" tab of the Forge software) using the "R-ESRGAN 4x+" upscaling algorithm. You may have noted that the longer side of all my AI images is 3200 pixels for no particular reason other than habit.

Inpainting
The last step is boring: I put the large image back into the img2img tab, copy it to the "inpaint" tab and redo a lot of details as outlined in Fig. 4, H. For this I use the inpaint "only masked" option with a size of 1024 x 1024 pixels (native SDXL resolution). In this image, I actually inpainted her whole body and head piece by piece, with small prompts that only listed what I wanted to see (e.g., for her upper torso: "detailed shiny sweaty skin, (bloody whip marks, bruises, welts, scratches:1.2), toned, armpit, neck tension, shoulders, breasts, nipples, rib cage"). This can take an incredibly long time. Hands and feet are special cases; for these I often use a Flux model called Flux Dev Hyper NF, which is a faster, lighter version of the "normal" Flux model. But I feel that this is getting into too much detail. Altogether, inpainting is incredibly time consuming, but it can also improve image quality a lot.
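As a rough equivalent in code, the "only masked" idea (regenerate just a cropped region around the mask) can be sketched with the diffusers SDXL inpaint pipeline; the file names are placeholders, and padding_mask_crop is, as far as I know, the parameter that approximates Forge's "only masked" behaviour:

import torch
from diffusers import StableDiffusionXLInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLInpaintPipeline.from_single_file(
    "my_pony_checkpoint.safetensors",   # placeholder checkpoint file
    torch_dtype=torch.float16,
).to("cuda")

result = pipe(
    prompt="detailed shiny sweaty skin, bloody whip marks, toned, armpit, shoulders",
    image=load_image("large_working_image.png"),   # the big image being refined
    mask_image=load_image("torso_mask.png"),       # white = region to regenerate
    strength=0.5,
    padding_mask_crop=32,   # inpaint only a crop around the mask ("only masked"-style)
).images[0]
result.save("inpainted.png")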

So, now you probably understand why it takes me so many hours to complete a single AI image, and why I laugh about AI haters who whine about the flood of "rapidly produced" soulless AI trash. Also, I would really like to learn how @MICHELE PATRI managed to produce this amazing image in less than one tenth of the time it took me to make a similar but not better one. Could you possibly make a small tutorial for us, please, @MICHELE PATRI?

(I am also proudly declaring to administrators that I neither exceeded the 5 images per day nor the 500 MB per image limit :D)
 
Disclaimer: I totally don't understand prompting and it is always trial and error for me ("blonde" often seems to work). The phrase "crucified to a wooden post with nails" is required to trigger @Hornet1ba's LoRa.
It’s a fantastic tutorial! I was a bit worried my posts might be too slow and too conceptual. But with such good guides from other knowledgeable members like you, I feel mine would be more useful too. Thanks much! :)

As for prompting, I would suggest using Danbooru tags if you rely on Pony variant models. It's the best part of using a Pony model, so there's no reason not to use them. It's also a good idea to add the "score tags" to the positive and negative prompts (e.g. starting the positive prompt with score_9, score_8_up, score_7_up), as shown in the linked guide.

It feels so much better to exchange little tips like this with other AI creators than bickering with AI haters. It was a great idea of yours to suggest making a dedicated thread for this.
 
Positive:
"from above. 1 girl 2 guys. blonde nude 18yo girl crucified to a wooden post with nails. screaming, mouth open, pain, blood streaming. (armpits, hands, fingers, feet, toes), long messy hair. 2 (ugly grinning guys lifting wooden beam), tunic, belt, boots. (ropes), ladder, ground, (dramatic lighting, shadows) <lora:ponyXL_crux_H1b v4-000009:1>"
I'd like to point out one rather important element of text prompting:

The section at the end between the 'less than' and 'greater than' symbols is where @jucundus calls the LoRa he used to help guide what Forge generates. Unfortunately, the forum software interpreted a couple of important bits as an emoji. They are actually a colon and the letter 'p'. This is what the prompt should look like:

[lora-call.png: a screenshot of the prompt with the colon and 'p' intact in the <lora:...> call]
 
It’s a fantastic tutorial! I was a bit worried my posts might be too slow and too conceptual. But with such good guides from other knowledgeable members like you, I feel mine would be more useful too. Thanks much! :)

As for prompting, I would suggest using Danbooru tags if you rely on Pony variant models. It’s the best part of using a Pony model, so there’s no reason not to use them. It’s also a good idea to put the “score tags” to the positive and negative prompts as shown in the linked guide.

Thanks! I am very happy that you like it. I am seriously considering switching to Krita AI (to try the Flux ControlNet working in ComfyUI), so I would love to hear more about its pros and cons.

I know about the Danbooru tags, but I feel that they are hit or miss (e.g. the "Dutch angle" camera angle does exactly nothing, and the emotions and expressions ASCII tags are totally useless). The pdxl embeddings should replace the score tags, and many of the realistic models have abandoned them altogether.
 
Is there a free program that can be safely used to create erotic AI art that may or may not involve death? Without the need to sign up?
 
I'd like to point out one rather important element of text prompting:

The section at the end between the 'greater than' and 'less than' symbols is where @jucundus is calling the lora he used to help guide Forge what to generate. Unfortunately, the forum software interpreted a couple of important bits as an emoji. They are actually a colon and the letter 'p'. This is what the prompt should look like:
Thanks a lot for pointing this out! I have now worked around this issue by introducing a space, which should not affect the LoRA syntax.
 
Is there a free program that can be safely used to create erotic AI art that may or may not involve death? Without the need to sign up?
Yes. It's called Stability Matrix, and it can be used to install and manage several different offline AI web UIs, including the Forge UI referenced in @fallenmystic's original post and @jucundus' tutorial.

 
Thanks! I am very happy that you like it. I am seriously considering switching to Krita AI (to try the flux controlnet working in comfy UI), so I would love to hear more about it's pros and cons.

I know about the danbooru tags, but I feel that they are hit or miss, (e.g the "Dutch angle" camera angle does exactly nothing, and the Emotions and expressions ASCII tags are totally useless). The pdxl embeddings should replace the score tags, and many of the realistic models have abandoned them altogether.
You are right about the score tags. I didn't know there was such an embedding nowadays. As for Danbooru tags, however, I find them indispensable in my case. As I mostly prefer a realistic style, I wouldn't have used Pony models if it were not for the ease of prompting they bring. If they weren't as effective for you, I wonder what the difference could be. Usually, things like excessive conditioning (this includes LoRAs and embeddings too) or using wrong tags (e.g. tags with very few usages on the site, or a missing underscore in multi-word tags such as long_hair) can negatively affect the output.
 