I have an image I’m using as a basis for my prompt here, and I want Midjourney to modify this image so that the two characters (people) are depicted under the table, searching for a sheet of paper. The new image is intended for a 404 error page. No matter what I do, Midjourney always draws both characters next to the table and always with a laptop, regardless of whether I add parameters like --no or --iw. I’ve never even been able to convince Midjourney to add a magnifying glass to the characters. Does anyone know how to achieve this?
I think you might be asking too much of the program. AI is powerful, but it can only work with what it has. I don’t think anyone can input a random picture and tell it to output something completely different while keeping a certain element. At least I haven’t seen it done. You might need to try generating the image you are looking for, or upload something closer to it.
I could be way off base, but I just haven’t seen it in my AI dabblings.
As @RedKittieKat said, your expectations are exceeding what it’s currently capable of doing.
I’ve been doing a lot of reading on how Large Language Model (LLM) AIs work. They’re trained by exposing them to huge amounts of data.
First: since there are probably millions of images on the internet of men and women with captions, HTML alt attributes, or surrounding text describing the images as containing a man or a woman, the AI can predict that a user requesting an image of a woman or a man is requesting something that corresponds to what it’s learned from those millions of images. It can create those images in hundreds of thousands of different ways because of the huge number of images of men and women it’s encountered.
The same is true of a table. The AI has probably encountered tens of thousands of images described as tables. If you ask the AI to place a man and a woman seated around a table, it has less, but still enough, information to draw on since there are fewer, but still many, images of that sort of thing.
When you ask it to create an image of two people looking beneath the table, it’s likely encountered very few images of people looking beneath a table, so it focuses on those things it does understand and ignores what it doesn’t understand.
It’s important to consider that the AI doesn’t really know what a table is or what a man and a woman are. It simply knows that when a user requests a man and a woman and a table, the request has keywords corresponding to images that are part of the data set it’s been exposed to.
Second: Large Language Models have trouble with negation statements, especially regarding images. A negation statement describes an image in terms of what something is not rather than what it is.
For example, if you asked for an image of a waterfall, the AI would likely return an image of a waterfall with trees, since trees surround waterfalls in most photos. The AI might not know what you mean when you ask for a waterfall without trees. The AI doesn’t actually understand what an image of a waterfall is, and it might be unable to differentiate the waterfall from the surrounding trees. Even after it’s learned the difference between the falls and the trees, it will likely be confused by the negation word “without” because it’s never encountered a photo with a caption that says “this is a waterfall without trees.”
Third: the AI knows the kinds of images a user wants when asking for images of a man and woman and a table, for example. As I’ve mentioned, it doesn’t actually understand what a table is or what a man and woman are; it only knows that its gigantic database contains images with descriptions of men and women with tables. The AI is much less capable of looking at an image you upload and determining that a man, woman, and table are in the image. All it sees are shapes, colors, and lines in various combinations. All it can do is give you similar shapes, colors, and lines. From a conceptual viewpoint, it’s unlikely to recognize what’s in the image you uploaded. It’s even less likely that it could recognize the man and the woman and redraw them looking beneath the table.
As time passes and new generations of LLM AIs improve, they’ll probably be able to overcome some of these problems, but for now, they have their limitations.