Hello again. I missed the last two days because I was busy: I went to Providence to visit some friends and got the website for this blog published while on the train. I wanted to make some AI art based on the train ride, so I tried SDXL (Mobius, Proteus, and Juggernaut XL), Dalle 3, and Midjourney. None of them captured the vibe I wanted; here is the best one I managed to get.
Image Details
- Model: Midjourney V5
- Prompt: looking out the train window in the city suburbs by alena aenami, pov, you can see the seats in front of you
- Aspect Ratio: 16:9
- Upscaled
AI Art Model Review
SDXL
For the local models, Fooocus was my tool of choice. Its optimizations are all very nice and allow for a user experience similar to what Midjourney and Dalle offer. The catch is that while you can run any open source SDXL checkpoint in it, Fooocus may not be configured to run them correctly out of the box. This happened with Datavoid’s Proteus model, where the outputs were noisy and incoherent across every setting I tried. Mobius worked the best: it captures the warm colors I like to have in my art and does fairly well with the Alena Aenami style I tend to go for. Juggernaut feels like it is meant for more realistic generations, or maybe I just don’t know how to prompt it well. None of the models were able to capture the framing that I wanted for the image (and to be fair, Dalle and MJ didn’t really either).
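As a sanity check on whether the tool or the checkpoint is at fault, it helps to load the model directly with Hugging Face’s diffusers library and set the scheduler and guidance yourself. Here is a minimal sketch; the checkpoint ID, prompt, and settings are placeholders, not what I actually ran.

```python
import torch
from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler

# Load an SDXL checkpoint directly; swap in any community model ID here
# (the base SDXL ID is just a stand-in, not the exact checkpoint I used).
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Many community checkpoints expect a specific sampler; setting it explicitly
# rules out the GUI silently picking a mismatched one.
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt="looking out the train window in the city suburbs, pov, "
           "you can see the seats in front of you",
    negative_prompt="people, portrait",
    width=1344, height=768,   # roughly 16:9 within SDXL's native pixel budget
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]
image.save("train_window.png")
```

If a checkpoint is still noisy when run like this, the model itself is the problem; if the output cleans up, the GUI’s defaults were.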
I should also say that the art I want to create very often has no people in it, since I am usually making a piece where the viewer is the person in the scene; I want to place the viewer in a particular location or evoke a feeling that is very personal to them. Judging by the majority of AI-generated images out there, that puts me in a minority: most people seem to want people in their images, with those people as the focal point of the piece. There is nothing wrong with that, but it feels like the models are almost overfit toward having a person prominently displayed, which makes generating what I want a bit harder.
Dalle 3
I have never really liked Dalle 3. Dalle 2 was at least a bit interesting because of the abstract designs it would sometimes come up with, but Dalle 3 has always felt very plasticky to me. Everything is smooth and cartoonish, and not being able to use artists’ names, or any IP whatsoever, in the prompt made it a pain to work with. I don’t see myself going back to Dalle anytime soon; the only pro is that it’s free.
Midjourney
It got the Alena Aenami style down well and had the best instruction following and framing. The result still wasn’t right, but it was the best I got today.
Final thoughts
- I need to get better at prompting
- Use ComfyUI, or raw-dog the diffusers library, when running open source models
- I feel the open source models shine when you lean on the fact that you can keep generating images with them for just the cost of electricity
- I want to be able to give an LLM the idea of what I want, have it create 100 variations for me, rate the top versions, and then have it create more until I have an image I am satisfied with (kind of like what ChatGPT does); see the sketch after this list
- Can we use steering vectors to fine-tune the model in real time?
- Preference classifier to automatically discard bad generations?
- Is there existing data for this yet?
- I am exhausted rn, which may factor into the low quality of all the results
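To make the generate-rate-iterate bullet concrete, here is a rough sketch of the loop I have in mind, building on a diffusers pipeline like the one above. Everything here is hypothetical glue: `propose_variants` stands in for an LLM call, and the CLIP similarity score is a crude stand-in for a real preference classifier trained on my ratings.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image, text):
    # Image-text similarity as a crude automatic quality filter.
    inputs = proc(text=[text], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        return clip(**inputs).logits_per_image.item()

def propose_variants(idea, feedback, n):
    # Hypothetical LLM call: rewrite `idea` into n prompt variants,
    # steering toward whatever was liked in previous rounds.
    raise NotImplementedError("wire up your LLM of choice here")

def rate_interactively(candidates):
    # Minimal human-in-the-loop: show each image, keep the ones I like.
    liked = []
    for prompt, image in candidates:
        image.show()
        if input(f"keep '{prompt}'? [y/N] ").strip().lower() == "y":
            liked.append(prompt)
    return liked

def refine(idea, pipe, rounds=3, per_round=20, keep=5):
    feedback, best = [], []
    for _ in range(rounds):
        prompts = propose_variants(idea, feedback, per_round)
        images = [(p, pipe(prompt=p).images[0]) for p in prompts]
        # Auto-discard weak generations; only surface the top few for rating.
        images.sort(key=lambda pi: clip_score(pi[1], idea), reverse=True)
        best = images[:keep]
        feedback = rate_interactively(best)
    return best
```

CLIP similarity only measures prompt faithfulness, not taste, so the y/N ratings are the real signal here; once enough of them accumulate, a small classifier on CLIP embeddings could replace the stub, which is where the steering vector and training data questions above come in.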