
Discussion about A.I.

I enjoyed reading the colourful descriptions of how it went wrong. :D

Seriously though, I'm glad you tried it and shared your experience with the process. I'm impressed you managed to run it on a video card with only 6 GB of VRAM. I'm not sure how much that affects usability, but there's always the cloud option (which the plugin natively supports) in case you decide to do serious work with the setup.

Anyway, from what I gathered from your post, it looks like the setup went correctly, but you may need a bit more time to familiarise yourself with the tool.

I thought it might be helpful if I created a render using the workflow you tried. So, here's a brief explanation of how I used the workflow you mentioned:


The line drawing doesn't have to be very detailed, so I just scribbled this on a blank paint layer:

View attachment 1399791

There are a few ways to refine this control map image using an iterative method, but I'll just use this rudimentary drawing as it is this time. The important thing is to add a control layer and choose the right kind of model for the image. As shown above, I added a "Line Art" entry and made it point to the paint layer containing my drawing.

Then I added the prompt text and hit generate to get this image:

View attachment 1399792

You can keep generating until you stumble upon an image that gives you the right vibe. The overall atmosphere is more important than the details in this stage since you can't expect to get all the details right in a single pass (especially if you have to use 512x512 resolution).

As expected, you can see the image contains several problematic areas. The AI apparently didn't recognise the rifle in my crude drawing, so I selected the area and refined the prompt:

View attachment 1399790

You may have noted that I also included the girl's face area instead of just selecting the rifle part. The reason is that it usually works better when the AI has sufficient context to understand what is happening in the image. A great thing about the AI plugin is that it supports the usual layer-based workflow of Krita itself, meaning I can always delete unnecessary parts in a generated layer or merge several of them.

After generating an image, you can choose "Apply" to save it as a layer, and you can erase everything except for the good parts of it. Also, note that I changed my prompt to describe only what is depicted in the selected area so that the AI wouldn't get distracted by other irrelevant terms.

The "strength" I highlighted in the above image is "denoising strength" in other AI frontends, meaning how much of the underlying image you want to change. Beware that setting it 100% has a special meaning with the plugin, so try to lower it if you don't get what you want.

Now all I need to do is repeat the process to refine all the areas I want to change. Using a control net to keep the composition while changing a region is common. I could use the Line Art layer again, but it's based on a crude drawing which must have a lot of errors in proportions and may not even match the current image exactly.

In this case, I can generate a new control map from the current image instead. I chose the depth map model because it preserves the overall composition without locking in the details, which I want to change rather than keep. You can use the highlighted button shown below to generate a depth map layer:

View attachment 1399788
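For reference, doing the same thing outside Krita is only a few lines with the transformers depth-estimation pipeline; the default model it picks is an assumption here, not necessarily what the plugin uses under the hood.

```python
# Generate a depth control map from the current render (illustrative sketch).
from transformers import pipeline
from PIL import Image

depth_estimator = pipeline("depth-estimation")
image = Image.open("current_render.png")
depth_map = depth_estimator(image)["depth"]  # a PIL image, usable as a ControlNet input
depth_map.save("depth_control.png")
```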

Also, I wanted to add body hair to suit my personal preferences, which can be difficult to achieve without using Loras. So I temporarily changed the settings to add relevant Loras in the preference dialogue as shown below:

View attachment 1399789
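If you'd rather do this step outside the plugin, this is roughly what loading a Lora looks like in diffusers; the file name and scale below are placeholders, not the Loras I actually used.

```python
# Loading a LoRA on top of a base checkpoint in diffusers; names are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("path/to/body_hair_lora.safetensors")
pipe.fuse_lora(lora_scale=0.7)  # comparable to the Lora strength slider in the plugin
```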

When all was done, I switched to the upscale mode to increase the resolution to 2048x2048, which resulted in this final output:


(I've already reached the maximum number of image attachments for this post, so I just linked the image from the other thread.)
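In case you want to reproduce the upscale step without the plugin, one option is the Stable Diffusion x4 upscaler, sketched below. The plugin's upscale mode may well work differently (e.g. tiled img2img), so treat this purely as an illustration.

```python
# One way to take a 512x512 image to 2048x2048 (4x) outside the plugin.
import torch
from diffusers import StableDiffusionUpscalePipeline
from PIL import Image

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

low_res = Image.open("final_512.png").convert("RGB")
upscaled = pipe(prompt="photo of a girl with a rifle", image=low_res).images[0]
upscaled.save("final_2048.png")
```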

Hope this helps. Please feel free to let me know if you need further assistance. :)

I have to admit that with this post you have captured my interest and attention. I will definitely do some experiments myself. Thanks @malins, too.
 
"Uncanny Valley" is the effect whereby something that is 99% realistic can look weirder to the human eye than an image that is only 80% realistic. This happens because the near-realism of the 99% image makes its few faults more glaring... the brain unconsciously applies a higher standard to that image. Of course all brains vary in exactly where that effect kicks in.

Since, as you've said, it takes a lot of skill to get through the valley and out the other side to 99.9% where the effect goes away again, I think people who are not yet that skilled in AI (so that's not you, obviously) might be better off aiming for less realism and more artistic value.
Yeah, I'm well acquainted with the term, having spent significant time trying to overcome the effect ever since I first dabbled in Daz3D almost 20 years ago.

Despite your generous estimation of my AI skills, I haven't yet completely gotten past the problem, although Stable Diffusion was the first tool that allowed me to step into the photorealistic territory.

Actually, I don't think drawing hands and feet is the heart of the problem, although it can often be frustrating. The current generation of SD can easily produce photorealistic results when the subject is simple, like a portrait of a woman. It may not produce images as crisp and high-resolution as its proprietary alternatives, like the examples I posted above, but the output can still be practically impossible to tell apart from real photos.

But the quality sharply deteriorates when you attempt to generate images with complex compositions using various techniques (e.g. inpainting, latent composition, regional prompting, etc.). And that's why most images that I posted in my own AI thread are not actually photorealistic - I try to envision a scenario/story that I have in my mind instead of generating random portraits of naked girls, which requires complex compositions.

That being said, I can't help but feel that you have an extremely high - or, to be perfectly honest, rather unreasonable - standard, especially for AI works. Even though what people post in the AI images thread is much less realistic than the above Midjourney examples, few seem to have much problem enjoying what others share.

I can enjoy what other members in this community created using Daz3D, not because I haven't seen more realistic 3D renders elsewhere, but because I'm far more interested in what they wanted to express with the medium instead of in their technical skills as 3D artists.

Of course, I can't force you to like this or that kind of style because it's a matter of taste. But if you can't enjoy non-stylised AI kink renders unless they have 100% perfect hands and feet, I don't see the point in mentioning it over and over, unless your intention is to discuss ways to improve the anatomical correctness of those body parts in AI renders.
 
As I mentioned in this thread, I'm a long-time supporter of the FOSS movement, so I'd still have preferred Stable Diffusion to proprietary alternatives like Midjourney or Dall-E, even if they didn't ban generating NSFW content.

But I can't deny that they are still quite a bit ahead of Stable Diffusion, even though it's been evolving rapidly. Today, I saw these photorealistic examples generated with Midjourney and had to wonder how long I will have to wait until I can expect such quality from Stable Diffusion:
...these pics would probably fool me. I feel the age of "photo proof" is over once and for all.
 
Runway just released a new feature called "Multi Motion Brush" for their video generation model, which allows you to draw a mask over different areas of an input image and specify respective motions to control the output:


It'll be exciting if we can have an uncensored open-source model with a similar feature someday so that we can create personal kink videos at home.
 
I'm not sure if anyone is using ComfyUI here yet. But I made a breakthrough in my struggle with the consistency issue, so I'll briefly mention it.

Since we now have at least a few people using inpainting to do more than generate random images with a single prompt, I assume they'll have the same problem as I do.

Inpainting is an invaluable tool, without which it becomes almost impossible to compose a complex scene. But every time you inpaint, it destroys some image consistency in the affected area, like the lighting, tone, etc. It may not look apparent at first, but as you keep adding details or fixing deformities using inpainting, it will add up until you feel the "uncanny valley" effect.

If you look at my previous works, you'll notice the problem in most of them. I've tried to find a way to fix the issue but haven't had much success so far.

However, while working on a joint project with another CF member, I managed to find a method that at least mitigates the problem. I'll spare you the details, so here's a simplified breakdown of my new workflow:

  1. Prepare a sketch to be used as the base image for the control net as usual, whether a hand-drawn image or a 3D render.
  2. Load up the image in ComfyUI and set up a regional sampler workflow like the attached example below.
  3. Repeat inpainting (preferably using a more convenient tool than ComfyUI) until all the details are added and deformities fixed, which will inevitably introduce some inconsistencies.
  4. Bring the image back into ComfyUI to generate high-resolution controlnet images. Then set up an iterative upscale workflow, preferably with a photorealistic SD 1.5 checkpoint like epiCRealism, as shown in the example below. Experiment with the CFG / denoise strength / seed until you get the result you want, since this will be the final image for the most part.
  5. Inpaint all the deformities yet again, using the output and control net images from the previous step. This won't add as much inconsistency as step 3 did, since now we're using a high-resolution, high-quality image as a basis and only refining small areas (e.g. faces) instead of significantly changing the image (e.g. adding a new character).
And here's the example ComfyUI node graph mentioned above:

Screenshot_20240120_133809.jpg

This is the setup I used to generate an image that consists of 5 different characters. The blue node is a custom group I made for convenience, which just contains the usual setup for sampling with a specific prompt and a controlnet image.

Screenshot_20240120_133937.jpg

This is the upscale process I used. I don't feel the specific values I used are optimal for every case, so try experimenting until you find the ones that suit your needs.
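By the way, if anyone wants to rerun a graph like this without clicking through the UI, ComfyUI can also be driven from a script via its HTTP API. A minimal sketch, assuming you've exported the workflow with "Save (API Format)" and ComfyUI is running on its default port (the file name is hypothetical):

```python
# Queue a saved ComfyUI workflow (API-format JSON) through the /prompt endpoint.
import json
import urllib.request

with open("regional_sampler_workflow_api.json") as f:  # hypothetical file name
    workflow = json.load(f)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # contains the prompt_id of the queued job
```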
 
I talked with someone on a Discord server about AI art, during which I argued it doesn't take much skill to draw an image to instruct AI with. And I made this to prove my point :):

Screenshot_20240123_121946.jpeg

P.S.: I used the scribble control net model only because I mentioned "drawing a stick figure" during the discussion. For practical purposes, something like a seg model or even a depth map would be a better choice in case you lack the skills to draw a good enough image for a lineart, canny, or soft edge model.
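For those who don't use a GUI at all, this is roughly how a control map drives generation in diffusers; swap the ControlNet checkpoint (scribble / depth / seg, etc.) to match whatever control image you have. The model names and prompt here are just illustrative.

```python
# Stick-figure scribble -> image via a ControlNet in diffusers (illustrative only).
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

scribble = Image.open("stick_figure.png")
image = pipe(
    "a standing woman, photorealistic",
    image=scribble,
    num_inference_steps=30,
).images[0]
image.save("output.png")
```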
 
I also tested AllTalk TTS today, an extension for SillyTavern (which I use for AI roleplaying) that can read out the text produced by AI characters.

The last time I checked, the available TTS extensions sounded too robotic, so I decided to skip them. But AllTalk seems to sound much more natural and it can even speak emotes in a different voice, so I'll spend more time using it to see if I should keep it in my setup.

And here's a short clip I recorded during the testing:
View attachment st_output_1705982549_combined.mp3
 
I read about someone who wrote a mod for Skyrim to give all the NPCs AI-generated dialogue.

One program turned his speech into text input, so he just talked into a microphone.

Another had samples of all the recorded dialogue for each NPC to recreate their voice. The result was quite robotic, but each NPC had a distinct voice.

The third generated original responses based on their original sayings and what the player said to them. It tended to wax philosophical about the meaning of the Dragonborn returning, which seemed a bit weird for Nords.

But remember this is just some guy on his own writing a mod. Will the next generation of computer games have this as standard, so instead of three lines of recorded dialogue, every character you talk to in game responds to what you say?
 
I believe the technology that would enable what you said is already there - I can program such an NPC with currently available libraries myself - but the biggest obstacle could be the hardware requirements.

Probably the smallest LLM that could generate believable dialogue would be a 13B variant, which requires around 12 GB of VRAM to operate. And it would take even more VRAM to convert the text output into actual voice files. Considering that video games tend to consume a lot of VRAM in the first place, only a few players would be able to run such a game on their computers.
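As a rough back-of-the-envelope check (the 20% overhead factor for the KV cache and activations is just my assumption):

```python
# Rough VRAM estimate: parameters x bytes per parameter, plus ~20% overhead.
def estimate_vram_gb(params_billions: float, bytes_per_param: float,
                     overhead: float = 1.2) -> float:
    return params_billions * bytes_per_param * overhead

print(estimate_vram_gb(13, 1.0))  # 13B at 8-bit  -> ~15.6 GB
print(estimate_vram_gb(13, 0.5))  # 13B at 4-bit  -> ~7.8 GB
print(estimate_vram_gb(7, 2.0))   # 7B in fp16    -> ~16.8 GB
```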

But there's a cloud option, the cost of which game studios probably wouldn't be willing to pay (it's a per-player cost). So, if we are to see such a feature in games in the near future, I think it will be either a mod or an option that requires an API key from the players themselves, so they pay for what their game generates (it's pretty affordable, by the way).

On a side note, I managed to put together a voice RP setup using the aforementioned programs, and it worked great. The only issues I had were the delay between the text and the voice generation, and how it sometimes struggled to recognise certain vulgar words I had to speak (e.g. the C-word :D). But all in all it was quite enjoyable, and I would recommend it to anyone who's interested in AI roleplaying.
 
Here is a link. It is 8 months old so later versions already exist.

 
I can certainly imagine it, at least as a supplement to scripted dialogue. The main issue would probably be preventing them from going off on contradictory tangents.
 
It'll use a similar approach to the one involved in AI roleplaying. When making an RP setup with an LLM, you are expected to provide the AI with enough information for it to understand the context.

With SillyTavern, for example, a context typically consists of "character cards", world lore, and the history of previous conversations. Ideally, this should prevent the AI from saying something irrelevant or inappropriate, but in practice, the effectiveness of the method varies depending on the quality and size of the model and the way the prompt is constructed.

Yet again, it boils down to the question of how large a model (in terms of parameters) & context (in terms of tokens) you can run on the specified setup (local or on a cloud).

In my experience, the 70B+ models that I tried were very good at understanding the racial slavery lore I provided as context, so they rarely said something out of place when I played RP sessions with them.
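To make that concrete, here's a minimal sketch of how such a context might be flattened into a single prompt. It isn't SillyTavern's actual template, just the general shape, and the NPC here is made up for illustration.

```python
# Assemble character card + world lore + chat history into one prompt string.
character_card = (
    "Name: Hilde\n"
    "Role: Nord innkeeper in Riverwood\n"
    "Personality: blunt, superstitious, gossipy"
)
world_lore = "Lore: dragons have recently returned; a civil war divides Skyrim."
history = [
    "Player: Have you heard any rumours lately?",
    "Hilde: Only that the Jarl's men ride out at dawn.",
]

prompt = "\n\n".join([
    "You are roleplaying the character described below. Stay in character.",
    character_card,
    world_lore,
    "\n".join(history),
    "Player: What do you make of these dragon attacks?",
    "Hilde:",
])
print(prompt)  # this string is what actually gets sent to the LLM
```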
 
Yes, but a single session is much more self-contained than a game, which might have hundreds of conversations in it.

There would have to be a way to add information created in said conversations into the worldlore in a consistent way.
 

One great thing about LLMs is that they're intelligent enough to regulate/create their own input and don't require any strict data format to understand it. For example, when providing previous conversations as context in a prompt, you can just run another prompt to ask the AI to summarise them.

Also, you can simply ask the AI what data in the lore should be updated after each conversation. World lore is often stored as unstructured documents in a vector store, so all that needs to be done is to search for the relevant documents and update their content. Again, you can even tell the AI to do that using another prompt.

As you can see, LLMs can be very flexible and don't require sophisticated programming or strict data structure to perform various tasks. So, the main challenge, in this case, lies not in any difficulty in updating the lore but in the overhead (i.e. network transfer & processing time) introduced by multiple roundtrips between the cloud provider and the game client.
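Here's a hedged sketch of what I mean, with a hypothetical llm() helper standing in for whatever completion backend you use (local server, OpenAI-compatible API, etc.):

```python
# Let the model maintain its own context: summarise conversations and
# rewrite lore entries. llm() is a hypothetical helper, not a real library call.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your own LLM backend here")

def summarise_conversation(transcript: str) -> str:
    return llm(
        "Summarise this conversation in a few sentences, keeping any facts "
        "the characters learned:\n\n" + transcript
    )

def update_lore_entry(lore_entry: str, transcript: str) -> str:
    return llm(
        "Here is a world-lore entry:\n" + lore_entry + "\n\n"
        "Here is a new conversation:\n" + transcript + "\n\n"
        "Rewrite the lore entry so it stays consistent with what was just said. "
        "Return only the updated entry."
    )
```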
 
amazing how AI manages to censor
View attachment 1423818
I suspect it could have more to do with the lack of NSFW images in the training dataset than any active attempt at censorship in the generation process. There's a possibility that they applied a filter to remove lewd keywords from the prompt, but proprietary models are known to be trained on heavily censored datasets anyway, which would make them ignorant of what naked girls should look like.

I believe both of the images would be quite achievable with inpainting and a Lora in Stable Diffusion. But even SD would struggle to depict them without a good checkpoint, since the base SD model wasn't trained on explicit images either.

In short, you must use SD if you want to make any kind of NSFW content.
 
ok. Thank you
 