Six models. Each genuinely good at something different. The landscape shifts monthly, but in April 2026, these are the ones that matter and the jobs they are best at.

The quick version

If you want the answer before the explanation:

| Strength | Model | Why |
| --- | --- | --- |
| Highest arena score | GPT Image 1.5 | Leads Artificial Analysis (Elo 1,265). Best instruction-following. |
| Best value + 4K resolution | Nano Banana 2 | Near-parity quality at half the cost. Native 4K. 14 reference images. |
| Cinematic aesthetic | Midjourney V7 / V8 Alpha | Dramatic compositions no other model matches. V8 adds native 2K and better text. |
| Photorealism + anatomy | Flux 2 | Skin texture, fabric detail, anatomical precision. Strong with camera prompts. |
| Text in images | Ideogram 3.0 | Built specifically for legible, styled text in images. |
| Full control, self-hosted | Stable Diffusion 3.5 / Flux 2 Klein | Open weights, no per-image cost, deepest customisation ecosystem. |

Now the detail.


Nano Banana (Google)

The name started as a joke. A Google DeepMind product manager submitted their model anonymously to a public arena at 2:30 in the morning and needed a pseudonym. “Nano Banana” stuck when the model topped the leaderboard. Google eventually embraced it.

There are two versions that matter:

Nano Banana 2 (built on Gemini 3.1 Flash Image) is the everyday workhorse. Fast — 4 to 6 seconds at standard resolution — and surprisingly capable. It has a unique feature called Image Search Grounding: during generation, it retrieves real images from Google Search and uses them as visual context. This noticeably improves accuracy for real-world subjects like landmarks and brand logos. It accepts up to 14 reference images (10 object + 4 character) in a single generation. Output goes up to 4K.

Nano Banana Pro (built on Gemini 3 Pro Image) trades speed for polish. Richer textures, more natural lighting, better spatial composition. Slower — 10 to 20 seconds at standard resolution — but the results look like they came from professional design software. Accepts up to 11 reference images (6 object + 5 character). Same 4K ceiling. Pro has text-based Search Grounding (pulling factual information from Google Search) but not the Image Search Grounding that Nano Banana 2 has.

Both are thinking models. This is the detail that matters most for prompt engineering. Unlike Flux or Stable Diffusion, which process your prompt as a single conditioning signal, Nano Banana Pro and NB2 have a reasoning chain between your prompt and the generated image. They interpret, plan, and make compositional decisions before rendering. Pro even generates up to two interim images internally to test composition before producing the final output. This is why structured, phased prompts — where you walk the model through sequential design decisions — produce dramatically better results with Nano Banana than with non-thinking models. We break this down in detail here.
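
To make the "structured, phased prompt" idea concrete, here is a minimal sketch of how such a prompt might be assembled programmatically. The persona, phase labels, and wording are illustrative choices for this example, not an official Nano Banana prompt format.

```python
# Sketch: assembling a phased prompt for a thinking model.
# The persona and phase wording below are illustrative, not an
# official Nano Banana prompt format.

def build_phased_prompt(persona, phases, final_instruction):
    """Join a persona line and numbered design phases into one prompt."""
    lines = [f"You are {persona}."]
    for i, phase in enumerate(phases, start=1):
        lines.append(f"Phase {i}: {phase}")
    lines.append(final_instruction)
    return "\n".join(lines)

prompt = build_phased_prompt(
    persona="a senior brand designer",
    phases=[
        "Decide the colour palette and overall mood.",
        "Plan the composition: subject placement, negative space.",
        "Choose lighting and texture treatment.",
    ],
    final_instruction="Now render the final image at 4K.",
)
print(prompt)
```

The point is the shape, not the specific phases: a thinking model can walk the numbered decisions in order, whereas a non-thinking model flattens them into one conditioning signal.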

The difference: Pro has an edge in absolute quality — richer textures, better lighting, deeper reasoning. Nano Banana 2 is 3 to 5 times faster, half the price, has Image Search Grounding, and accepts more reference images. For most creative work, start with Nano Banana 2 and move to Pro when the brief demands the polish.

Pricing: Via Google’s API, roughly $0.067 per image (NB2) or $0.134 per image (Pro). Free access through Google AI Studio with limits.
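
At campaign scale the 2x price gap compounds quickly. A quick batch-cost calculation using the per-image prices quoted above:

```python
# Batch-cost comparison using the per-image API prices quoted above.
NB2_PRICE = 0.067   # USD per image, Nano Banana 2
PRO_PRICE = 0.134   # USD per image, Nano Banana Pro

def batch_cost(n_images, price_per_image):
    """Total cost in USD for a batch at a flat per-image price."""
    return round(n_images * price_per_image, 2)

# A 100-image campaign: the Pro premium equals the whole NB2 budget again.
print(batch_cost(100, NB2_PRICE))  # 6.7
print(batch_cost(100, PRO_PRICE))  # 13.4
```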

Where to use it: Google AI Studio (free tier available), Vertex AI API, the Gemini app, or multi-model platforms like Flora Fauna.


Flux (Black Forest Labs)

Flux is the photorealism model. Black Forest Labs, founded by the team behind Stable Diffusion, released Flux 2 in November 2025 and closed a $300 million funding round the following month.

The current lineup:

  • Flux 2 Max — highest quality, includes web-grounded generation
  • Flux 2 Pro — production-grade, the one most professionals use
  • Flux 2 Flex — optimised for text rendering in images
  • Flux 2 Dev — open weights on Hugging Face, non-commercial licence
  • Flux 2 Klein (4B and 9B) — small, fast models for consumer hardware

What Flux does best: Photorealism. Skin texture, fabric detail, anatomical accuracy, product photography. If you need an image that could be mistaken for a photograph, Flux is the first choice. The LoRA ecosystem is deep — thousands of fine-tuned adapters for specific styles, characters, and aesthetics on Civitai and Hugging Face.

Where it falls short: Flux is not a thinking model. It processes your prompt as a single conditioning signal without reasoning through it. This means keyword-style prompts work fine, but complex structured prompts (phased workflows, persona assignment, brand intelligence) do not produce the same leap in quality that they do with Nano Banana. No consumer-facing interface. The best models are API-only. Running Flux locally requires serious hardware — the full model needs around 33 GB of VRAM, and even with quantisation you need at least 16 GB. The open-weights models have non-commercial licences (except Klein 4B, which is Apache 2.0).

Pricing: Credit-based through the BFL API. Roughly $0.10 per image for Flux 2 Pro. Klein starts at $0.014. Also available through hosted platforms like Replicate, Fal.ai, and Flora Fauna.


Midjourney

Midjourney V7 has been the default model since June 2025. V8 Alpha launched in March 2026 on alpha.midjourney.com — not yet available on the main site or Discord, but already showing significant improvements. The platform is no longer Discord-only — there is now a proper web interface at midjourney.com.

What Midjourney does best: Art. Cinematic compositions, dramatic lighting, emotional depth, editorial illustration, game concept art. No other model consistently produces images with this level of aesthetic intentionality. V7 introduced personalisation profiles that learn your visual preferences over time. V8 Alpha builds on this with 4 to 5 times faster generation, native 2K resolution (via the --hd parameter), and significantly improved text rendering — text placed in quotation marks within your prompt now renders with much greater accuracy.

Midjourney’s other great advantage is its style reference system. --sref codes are numeric addresses that carry a whole aesthetic — palette, light, grain, framing — so you can paste a number and skip the long stylistic paragraph that other models need. The Midjourney style reference pack is a curated set of thirty-six codes organised by aesthetic family, with a preview for each code — the fastest way to stop typing long stylistic paragraphs and start painting with a vocabulary.
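
In practice an --sref code is just a flag appended to the prompt string, alongside parameters like --hd. A small sketch of composing such a prompt; the code number below is a placeholder, not a real code from the pack:

```python
# Sketch: appending Midjourney style-reference and parameter flags to a
# prompt string. The sref code 1234567890 is a placeholder, not a real
# code from the pack.

def midjourney_prompt(subject, sref=None, hd=False):
    """Compose a Midjourney prompt with optional --sref and --hd flags."""
    parts = [subject]
    if sref is not None:
        parts.append(f"--sref {sref}")
    if hd:
        parts.append("--hd")  # native 2K on V8 Alpha
    return " ".join(parts)

print(midjourney_prompt("lighthouse at dusk", sref=1234567890, hd=True))
# lighthouse at dusk --sref 1234567890 --hd
```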

Where it falls short: Even with V8’s improvements, text rendering still trails Ideogram and Nano Banana for complex multi-line typography. Character consistency degrades past 3 to 5 images in a series. And there is no free trial — you pay before you see results. Like Flux, Midjourney is not a thinking model — it does not reason through your prompt the way Nano Banana does.

There is also the aesthetic question: Midjourney outputs have a distinctive cinematic polish. Beautiful, but recognisable. If the brief calls for raw, documentary, or deliberately imperfect visuals, this aesthetic can work against you.

Pricing: Basic $10/month, Standard $30/month, Pro $60/month, Mega $120/month. Annual billing saves 20%.

Access: midjourney.com (web) and Discord. V8 Alpha at alpha.midjourney.com. No public API — Midjourney is not available on third-party platforms or multi-model tools.


GPT Image 1.5 (OpenAI)

DALL-E is gone. OpenAI deprecated DALL-E 3 on March 4, 2026. GPT Image 1.5, introduced in December 2025, is their current model.

It currently leads the Artificial Analysis Text-to-Image Arena with an Elo rating of 1,265 — the highest of any tested model. It also leads the Image Editing Arena.

What GPT Image does best: Following complex instructions. Multi-step compositions, spatial reasoning, in-image text. It is the most reliable model for “generate exactly what I described.” The multi-modal integration means it can reason about images, not just generate from text.

Where it falls short: The outputs have a characteristic commercial polish — clean, professional, but sometimes visibly artificial. Resolution caps below what Google offers. Multi-face complex scenes can produce degraded faces. Sequential character consistency is improving but not yet reliable enough for comic strips or storyboards.

Pricing: $0.009 per image (low quality) to $0.133 per image (high quality) through the API. Also available in all ChatGPT paid plans.


Ideogram 3.0

Released March 2025 and still current. Ideogram’s entire identity is text in images.

What Ideogram does best: Generating legible, styled, accurate text within images. Posters, social media graphics, advertisements, book covers — anything where words need to appear inside the image and look intentional. Version 3.0 added style references (upload up to 3 reference images), batch generation from CSV files, and inpainting.
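
The batch-from-CSV workflow amounts to one prompt per row. A sketch of what that looks like in code; the column names here are invented for the example, and the real Ideogram upload format may differ:

```python
# Sketch of CSV-driven batch prompting, mirroring Ideogram 3.0's
# batch-from-CSV feature. Column names are invented for this example;
# the real upload format may differ.
import csv
import io

CSV_DATA = """headline,style
Summer Sale,bold retro poster
New Arrivals,minimal sans-serif banner
"""

def prompts_from_csv(text):
    """Yield one text-in-image prompt per CSV row."""
    for row in csv.DictReader(io.StringIO(text)):
        yield f'Poster with the text "{row["headline"]}", {row["style"]}'

for p in prompts_from_csv(CSV_DATA):
    print(p)
```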

Where it falls short: Portrait rendering. Skin textures can appear unnatural, proportions can be inconsistent. For non-text-heavy scenes, other models outperform it on photorealism and artistic quality. Complex fantasy compositions can be unpredictable.

Pricing: Free tier available (roughly 40 images per day). Plus at $15/month, Pro at $48/month.

Access: ideogram.ai (web), API available. Also available on Flora Fauna.


Stable Diffusion

Still relevant — but for a specific reason. Stable Diffusion’s advantage in 2026 is not raw output quality (Flux and GPT Image surpass it on benchmarks). Its advantage is control.

SD 3.5 is the current version. Fully open weights with a community licence — free for any use unless your company earns over $1M annually. SDXL (the previous generation) remains the most widely deployed version due to its ecosystem depth: tens of thousands of fine-tuned checkpoints, LoRAs, and ControlNets on Civitai and Hugging Face.

What Stable Diffusion does best: Customisation. If you need a model fine-tuned to your specific art style, trained on your product catalogue, or configured for a niche aesthetic that no general model handles — Stable Diffusion is the only practical choice. No per-image costs. Full control over the generation pipeline.

Where it falls short: The learning curve is real. ComfyUI’s node-based interface is powerful but intimidating. Local running requires a dedicated GPU. And the gap between SD 3.5’s base output quality and the leading proprietary models is visible without fine-tuning.

Pricing: The model is free. Hosted access through platforms like Replicate or Flora Fauna charges per-image.


What the benchmarks say

The Artificial Analysis Text-to-Image Arena ranks models by blind user preference votes:

| Rank | Model | Elo | Approximate cost per image |
| --- | --- | --- | --- |
| 1 | GPT Image 1.5 | 1,265 | $0.13 |
| 2 | Nano Banana 2 | 1,258 | $0.07 |
| 3 | Nano Banana Pro | 1,215 | $0.13 |
| 4 | Flux 2 Max | 1,201 | $0.07 |

Midjourney and Ideogram do not participate in this arena, so their absence is not a quality judgment. The arena favours photorealism and prompt adherence. Midjourney’s artistic strengths and Ideogram’s text rendering are not well captured by this methodology.
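
It helps to translate Elo gaps into something tangible. Assuming the arena uses the conventional Elo expected-score formula, a 7-point gap between the top two models is almost a coin flip:

```python
# What a 7-point Elo gap means in practice, assuming the arena uses the
# conventional Elo expected-score formula.

def elo_win_probability(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# GPT Image 1.5 (1,265) vs Nano Banana 2 (1,258): near a coin flip.
p = elo_win_probability(1265, 1258)
print(round(p, 3))  # 0.51
```

In other words, blind voters prefer the leader only about 51% of the time over the runner-up, which is why the cost column matters as much as the rank column.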

How to choose

The honest answer: most of these models can do most things. The difference is in the details — and in how you prompt them. If you want the condensed decision tree — “use this model for this kind of job” — see Choosing Your Model, which is the practical lookup table I reach for when a brief lands on my desk.

What actually matters for commercial work:

Thinking vs non-thinking. This is the distinction most people miss. Nano Banana Pro and NB2 are thinking models — they reason through your prompt before generating. GPT Image 1.5 has similar multi-modal reasoning. Flux, Stable Diffusion, and Midjourney process your prompt as a single conditioning signal. The practical impact: if you use structured, phased prompts (persona assignment, sequential design phases, brand intelligence), thinking models produce dramatically better output. If you use keyword-style prompts (“beautiful sunset, 8K, cinematic”), the gap is smaller. Your prompting style should influence your model choice.

Resolution. Nano Banana (both Pro and 2) outputs at up to 4K natively. Flux 2 reaches approximately 4 megapixels (roughly 2000x2000). GPT Image 1.5 caps at 1536px on the long edge. Midjourney V7 produces approximately 1K native with 2x upscaling available; V8 Alpha adds native 2K with --hd. Ideogram 3.0 generates up to 1536px natively. If you need print-ready or large-format work, resolution matters more than any benchmark score — and right now Nano Banana leads.
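
For print work the rule of thumb is 300 DPI, so maximum output width converts directly into maximum print size. A quick check, taking 4K as 3840 px on the long edge (an assumption; vendors define 4K slightly differently):

```python
# How large each model's maximum output prints at 300 DPI.
# Pixel widths follow the figures quoted above; 4K is taken as
# 3840 px on the long edge (an assumption, since vendors vary).

def max_print_inches(pixels_long_edge, dpi=300):
    """Longest printable edge, in inches, at a given DPI."""
    return round(pixels_long_edge / dpi, 1)

print(max_print_inches(3840))  # Nano Banana 4K -> 12.8 inches
print(max_print_inches(1536))  # GPT Image 1.5 -> 5.1 inches
```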

Reference images. Nano Banana 2 accepts up to 14 reference images (10 object + 4 character) in a single generation. Nano Banana Pro accepts up to 11 (6 object + 5 character). Midjourney has character references and personalisation profiles that encode your aesthetic preferences. Flux has IP-Adapter workflows via ComfyUI. If your work requires consistency across a series — product shots, character designs, brand assets — reference support is the differentiator, not raw quality.

Prompting for realism. Any of the top models can produce photorealistic output with the right prompt engineering. A widely used community technique: referencing specific film stocks (Kodak Portra 400, Fuji Pro 400H), camera types (iPhone 15 Pro, Leica M11), and shot descriptions (shallow depth of field, golden hour, handheld). This works because these terms connect to large bodies of training data with those aesthetic characteristics. Flux and Midjourney are the most community-tested for camera-optical references. No model maker officially documents this as a feature — it is practitioner knowledge, not a guarantee.
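
A small sketch of folding those camera-optical terms into a prompt. The vocabulary lists are examples drawn from community practice, not an official feature of any model:

```python
# Sketch: folding camera-optical terms into a realism prompt.
# The descriptors are examples from community practice, not an
# official feature of any model.

def realism_prompt(subject, film_stock=None, camera=None, shot_notes=()):
    """Append film-stock, camera, and shot descriptors to a subject."""
    parts = [subject]
    if film_stock:
        parts.append(f"shot on {film_stock}")
    if camera:
        parts.append(f"with a {camera}")
    parts.extend(shot_notes)
    return ", ".join(parts)

print(realism_prompt(
    "street market at dusk",
    film_stock="Kodak Portra 400",
    camera="Leica M11",
    shot_notes=("shallow depth of field", "golden hour", "handheld"),
))
# street market at dusk, shot on Kodak Portra 400, with a Leica M11,
# shallow depth of field, golden hour, handheld
```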

Consistency at scale. Generating one beautiful image is easy. Generating fifty that look like they belong to the same campaign is hard. Midjourney’s personalisation profiles help here. Nano Banana’s reference image support helps. Stable Diffusion’s LoRA fine-tuning is the most controllable option. There is no single winner — it depends on your pipeline.

Text in images. If you need legible, styled text inside the image — a poster, a social graphic, an advertisement — Ideogram 3.0 is still the most reliable. Nano Banana and GPT Image have improved, but Ideogram was built for this.

The practical approach: Most working creatives do not pick one model. They use two or three depending on the brief. Midjourney for mood and concept, Nano Banana for production-quality 4K output, Ideogram when text is involved. If you find yourself switching between platforms constantly, a multi-model workspace like Flora lets you access Nano Banana, Flux, GPT Image, Ideogram, Stable Diffusion, and dozens more from one canvas. Try Flora — 25% off for 12 months →

Copyright and commercial use is its own topic — licensing terms differ across models and change without notice. We will cover this in a dedicated guide.


Art & Algorithms publishes guides, tutorials, and prompt packs at the intersection of art and code. Subscribe for the full archive.