Type "a cat" into Midjourney, DALL-E, and Flux and you get three different cats in three different styles on three different backgrounds. Type a 6-part structured prompt and you get a portfolio-grade render that looks the way you actually imagined it. The gap between those two outputs is not the model — it is the prompt. This guide is the 6-part image prompt formula that closes that gap, with model-specific tuning for the three generators that matter in 2026.
Everything below draws on our general prompt engineering guide and on the official documentation: Midjourney's Prompt Basics, the FLUX.2 Prompting Guide from Black Forest Labs, and OpenAI's DALL-E guidance. If you have ever pasted a one-line idea and gotten a generic image back, the rest of this article is the fix. For a side-by-side comparison of how the same prompt lands on ChatGPT, Claude, and Gemini, see our Claude vs GPT-5 vs Gemini breakdown. Image generation is a separate but related skill, and the same model-specific thinking applies.
Why "A Cat" Is Not a Prompt
Image generators are not search engines. They do not retrieve "a cat"; they synthesize a cat from noise, conditioned on every word in your prompt. A vague prompt hands the model most of the creative decisions. A structured prompt makes the 4-6 decisions that matter and leaves the model free to spend the rest of its capacity on rendering quality. The result of the second kind of prompt is sharper, more on-brief, and closer to what you actually pictured.
Midjourney's own docs make the point sharper: "Short and simple prompts typically generate the best images with Midjourney." The trick is that "short" does not mean "vague." It means "every word does work." The FLUX.2 Prompting Guide agrees: priority order is main subject → key action → critical style → essential context → secondary details. Every word in a good image prompt is on that priority list.
Black Forest Labs is the most prescriptive about length: the FLUX.2 prompt length guidance is 2-3 sentences, with the most important elements first. OpenAI's DALL-E guidance, by contrast, tolerates (and rewards) longer natural-language descriptions. The three models want the same information, but they want it in three different shapes. That is the whole game.
The 6-Part Image Prompt Formula
Before we go model-by-model, here is the universal formula. It works on Midjourney, DALL-E, and Flux (with the model-specific tweaks in the next sections). Use it as a checklist whenever you sit down to write a prompt — most of the time, the difference between a weak output and a strong one is whether you filled in all 6 parts.
- Subject — the main object, person, animal, or scene. Be specific: "a calico cat" beats "a cat". "three children in raincoats" beats "kids."
- Action / Pose — what the subject is doing. "mid-leap," "looking over its shoulder," "hands in pockets." If you skip this, the model guesses and the guess is usually "standing still and looking at the camera."
- Environment / Setting — where the subject is. "on a wet cobblestone street at dusk," "in a glass terrarium," "floating in zero gravity."
- Lighting — how the scene is lit. "soft window light from camera left," "harsh neon underlight," "golden hour backlight." Lighting is the single biggest factor in mood and the easiest one to skip.
- Style / Medium — the artistic rendering. "shot on 35mm film," "watercolor illustration," "3D isometric render in Blender," "Studio Ghibli animation cel."
- Composition / Framing — where the camera is. "wide shot, low angle, rule of thirds," "extreme close-up of hands," "top-down flat lay."
💡 The 6-part rule: If you can fill in all 6 parts, you have a professional image prompt. If you can only fill in 2-3, the model will fill in the rest — and it will guess generic. Specify what you can, leave the rest to the model, and be honest about which parts matter most for the shot.
Here is the same 6-part formula applied to a real prompt. Read it top to bottom and notice how every part has a job:
A calico cat (Subject)
mid-leap off a wooden fence (Action)
in a sunlit cottage garden full of lavender (Setting)
lit by warm golden-hour backlight (Lighting)
shot on 35mm film with shallow depth of field (Style + Composition)
--ar 3:2 --v 7 (Midjourney parameters)
With the Midjourney parameters on the last line dropped, that single prompt would land well on all three models. The next three sections show you how to tune it for each one specifically.
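The checklist can also be kept as a small data structure. The sketch below is a hypothetical helper: the `ImagePrompt` class and its field names are illustrative and not part of any generator's API. It reports which of the six parts are still empty and joins the filled ones in formula order.

```python
from dataclasses import dataclass, fields


@dataclass
class ImagePrompt:
    """The six parts of the formula; every name here is illustrative."""
    subject: str = ""
    action: str = ""
    environment: str = ""
    lighting: str = ""
    style: str = ""
    composition: str = ""

    def missing_parts(self) -> list[str]:
        # Parts left empty are the ones the model will guess for you.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self, separator: str = ", ") -> str:
        # Join only the filled parts, in formula order.
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        return separator.join(p for p in parts if p)


cat = ImagePrompt(
    subject="A calico cat",
    action="mid-leap off a wooden fence",
    environment="in a sunlit cottage garden full of lavender",
    lighting="lit by warm golden-hour backlight",
    style="shot on 35mm film",
)
print(cat.missing_parts())  # the composition part has not been filled in yet
print(cat.render(" "))
```

Running the checklist before generating makes the gaps explicit, so leaving a part to the model becomes a decision rather than an oversight.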
How to Write the Best Midjourney Prompt
Midjourney is tuned for short, evocative, comma-separated prompts with a clear subject, medium, and mood. The official Prompt Basics doc is explicit: "Short and simple prompts typically generate the best images." But "short" means every word pulls weight, not that the prompt is underspecified. The trick is to think in modifiers, not sentences.
1. Lead with the Subject and Action
Midjourney weighs the first few words the most. Put the subject and what it is doing at the front, then layer on style and environment. The model treats modifiers as ranking cues — earlier words dominate the output, later words adjust.
A weathered sailor in a yellow slicker, steering a fishing boat through a storm, cinematic photograph, dramatic lighting, ocean spray, shot on 35mm film --ar 16:9 --v 7 --style raw
The --style raw parameter reduces Midjourney's default beautification and gets you closer to the prompt you actually wrote. Use it for product shots, technical illustrations, and any time the default looks "too polished."
2. Use the --no Parameter Instead of "No" Words
The Midjourney docs are clear: "Describe what you do want instead of what you don't. If you mention a party with 'no cake,' a cake might still appear." Negations are unreliable in diffusion models. Use the --no parameter to exclude elements you do not want in the output.
Editorial portrait of a female CEO in a minimalist office, soft window light, 85mm lens, neutral expression --no smile, teeth, makeup
The --no parameter is a strong steer rather than a guarantee: Midjourney pushes the image away from those terms, but an excluded element can still slip through. Keep the list short (3-5 terms), specific, and comma-separated. Do not try to exclude abstract concepts like "boring" or "generic." Stick to concrete visual elements.
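If prompts are assembled in code, the exclusion list is easy to keep honest. This is a minimal sketch with a hypothetical helper name (`with_exclusions`); it assumes the terms are joined with commas and uses the 3-5 term guidance above as an upper bound.

```python
def with_exclusions(prompt: str, exclusions: list[str], max_terms: int = 5) -> str:
    """Append a --no parameter listing concrete things to exclude.

    Hypothetical helper: max_terms mirrors the 3-5 term guidance in the
    text and is not a limit enforced by Midjourney itself.
    """
    terms = [t.strip() for t in exclusions if t.strip()]
    if len(terms) > max_terms:
        raise ValueError(f"keep the --no list to {max_terms} terms or fewer")
    if not terms:
        return prompt
    # Assumption: multiple terms are comma-separated after --no.
    return f"{prompt.rstrip()} --no {', '.join(terms)}"


print(with_exclusions(
    "Editorial portrait of a female CEO in a minimalist office, neutral expression",
    ["smile", "teeth", "makeup"],
))
```

Raising an error on long lists is a design choice: a list that keeps growing is usually a sign that the positive description needs work.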
3. Stack Style References with --sref for Visual Consistency
Midjourney's --sref (style reference) parameter lets you anchor a generation to the visual language of another image. For brand work, character sheets, and product families, this is the most underrated tool in the suite. Pick a single reference image whose style you want to imitate, upload it, and pass its URL with --sref URL.
A minimalist product hero shot of a ceramic coffee mug on a marble surface, soft natural light, lifestyle photography --sref https://example.com/your-brand-style.jpg --ar 4:5 --v 7
For a series, use the same --sref URL across 20-30 prompts and the output will hold a consistent visual language: same color grade, same lighting feel, same level of polish. This is the closest thing Midjourney offers to a "brand kit," and it is the highest-leverage tool for any work that ships in volume.
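For a series, the prompts can be generated from a list of subjects so the reference URL and parameters never drift. A small sketch, assuming a hypothetical `series_prompts` helper and the placeholder URL from the example above:

```python
def series_prompts(subjects: list[str], shared_look: str, sref_url: str,
                   params: str = "--ar 4:5 --v 7") -> list[str]:
    """Build one prompt per subject, all anchored to the same style reference."""
    return [f"{subject}, {shared_look} --sref {sref_url} {params}" for subject in subjects]


prompts = series_prompts(
    subjects=[
        "A ceramic coffee mug on a marble surface",
        "A ceramic teapot on a marble surface",
        "A ceramic sugar bowl on a marble surface",
    ],
    shared_look="soft natural light, lifestyle photography",
    sref_url="https://example.com/your-brand-style.jpg",  # placeholder URL
)
for p in prompts:
    print(p)
```

Only the subject changes between prompts; the lighting, the style reference, and the parameters stay fixed, which is what keeps the set visually consistent.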
4. Use Multi-Prompts and Weights for Fine Control
Midjourney's :: syntax lets you weight parts of a prompt. A part with ::2 is twice as influential as a part with no weight. This is the right tool when one element needs to dominate the output and another needs to be a background detail.
red fox in a snowy forest::2 autumn leaves on the ground::1 soft bokeh background::0.5 --ar 3:2
The first part (red fox) is the dominant subject, the second (autumn leaves) is mid-priority, the third (bokeh) is a subtle background modifier. Read the weights as a hierarchy, not as a recipe.
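One way to read the hierarchy is as relative shares. The sketch below is a reading aid, not Midjourney's implementation: it assumes weights behave as ratios (consistent with "::2 is twice as influential"), that every part ends with `::weight`, and that parameters such as --ar have been removed before parsing. The function names are hypothetical.

```python
import re


def parse_weights(multi_prompt: str) -> list[tuple[str, float]]:
    """Split 'text::weight text::weight' into (text, weight) pairs."""
    pairs = []
    # Each segment is some text, then '::', then an optional number.
    for text, weight in re.findall(r"(.+?)::(-?\d+(?:\.\d+)?)?\s*", multi_prompt):
        # A bare '::' with no number is treated as weight 1.
        pairs.append((text.strip(), float(weight) if weight else 1.0))
    return pairs


def relative_shares(pairs: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Express each weight as a fraction of the total, rounded to 2 places."""
    total = sum(weight for _, weight in pairs)
    return [(text, round(weight / total, 2)) for text, weight in pairs]


pairs = parse_weights(
    "red fox in a snowy forest::2 autumn leaves on the ground::1 soft bokeh background::0.5"
)
for text, share in relative_shares(pairs):
    print(f"{share:.2f}  {text}")
```

Under that assumption the fox carries a little over half of the total weight, the leaves a bit more than a quarter, and the bokeh the remainder.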
How to Write the Best DALL-E Prompt
DALL-E is the most "conversational" of the three image generators. It accepts natural-language paragraphs and tends to follow longer, more descriptive prompts better than Midjourney or Flux. OpenAI's guidance: be specific about the scene, the lighting, the camera, the mood, and the style, and let the model interpret the prose.
1. Write a Scene Description, Not a Tag List
The DALL-E 3 prompting community has converged on a clear pattern: paragraphs beat tag lists. A scene description reads like a film director's shot list — subject, action, environment, lighting, camera. The model interprets the prose, infers the relationships, and produces a coherent image.
A cinematic wide shot of a lone astronaut standing on the edge of a red Martian canyon at sunset. The astronaut's visor reflects the canyon walls. The sky is dusty pink and amber. Low camera angle, looking up at the astronaut. Photorealistic, shot on ARRI Alexa, anamorphic lens flare.
Notice the structure: opening framing (cinematic wide shot), subject (astronaut), setting (Martian canyon at sunset), specific detail (visor reflection), color (dusty pink, amber), composition (low angle, looking up), rendering (photorealistic, ARRI Alexa). DALL-E 3 handles this kind of paragraph better than a comma-separated tag list.
2. Specify Style and Reference Artists Carefully
DALL-E will produce images in the style of named artists, art movements, and even specific films — but the results depend heavily on the model's training data. Photography styles (35mm, polaroid, shot on iPhone, National Geographic) work very reliably. Art movements (Bauhaus, art deco, impressionism) work well. Named living artists are often rejected or produce inconsistent results. If you want a 1940s noir look, say so explicitly: "1940s film noir, high-contrast black and white, venetian blind shadows, femme fatale silhouette." That works. "Like that movie" does not.
3. Avoid "Don't" — Use "Do" Instead
Like Midjourney, DALL-E handles negations unreliably. Saying "no text" or "no watermark" often produces text and watermarks. The fix is to say what you want instead of what you do not want, and to make the positive instruction as concrete as possible.
A clean product photograph of a white ceramic vase on a seamless light gray background, soft even studio lighting, no visible text, no logos, no props, no reflections
Notice the trick: the prompt does use negations ("no visible text, no logos, no props, no reflections"), but they come after positive framing ("clean product photograph," "seamless light gray background") that anchors the model in the do-state. Pure negation produces unpredictable results; positive framing with selective negative qualifiers is much more reliable.
4. For Text in Images, Use Quotation Marks and Specify the Font
One of the most-requested DALL-E features is reliable text rendering. The community guidance: use quotation marks around the literal text, specify a font style, and keep the text short (3-5 words is the sweet spot).
A vintage travel poster for Paris, the text "PARIS" in large bold art-deco serif letters at the top, "BONJOUR" in smaller letters below, soft watercolor background of the Eiffel Tower at sunset
For longer text, DALL-E still struggles. If you need a poster with a paragraph of body copy, the right workflow is to render the image without text, then composite the text in a tool like Figma or Photoshop. Treating DALL-E as the image layer and a design tool as the typography layer is the reliable production workflow.
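The render-or-composite decision can be checked mechanically before spending a generation. A minimal sketch, assuming literal text is always wrapped in straight double quotes and treating the 5-word upper end of the sweet spot as the threshold; both function names are hypothetical.

```python
import re


def quoted_text_word_counts(prompt: str) -> dict[str, int]:
    """Map each double-quoted string in the prompt to its word count."""
    return {quoted: len(quoted.split()) for quoted in re.findall(r'"([^"]+)"', prompt)}


def needs_compositing(prompt: str, max_words: int = 5) -> bool:
    """True if any quoted string is longer than the generator is likely to render well."""
    return any(count > max_words for count in quoted_text_word_counts(prompt).values())


poster = (
    'A vintage travel poster for Paris, the text "PARIS" in large bold art-deco '
    'serif letters at the top, "BONJOUR" in smaller letters below'
)
print(quoted_text_word_counts(poster))
print(needs_compositing(poster))
print(needs_compositing('A poster with the text "Visit the city of light this spring and fall in love"'))
```

When the check returns True, the safer route is the one described above: generate the image without the text and add the typography in a design tool.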
How to Write the Best Flux Prompt
FLUX (Black Forest Labs) is the newest of the three, and its prompt format reflects that. The official FLUX.2 Prompting Guide is explicit: "No negative prompts. FLUX.2 does not support negative prompts." And: "FLUX.2 generates photorealistic images from simple, natural language prompts." The format is closer to DALL-E (natural language paragraphs) than to Midjourney (tag lists with parameters).
1. Use the Subject + Action + Style + Context Framework
Black Forest Labs' official guidance for FLUX.2 is a clean four-part framework: Subject + Action + Style + Context. Priority order is main subject → key action → critical style → essential context → secondary details. Put the most important elements at the start of the prompt; let the modifiers fall where they may.
A determined young woman (Subject) climbing a sheer rock face (Action), in cinematic wide-angle style with golden hour backlighting (Style), in a misty mountain valley at sunrise (Context)
The four parts are easy to write quickly, and the priority order is the same as the priority order of attention in the model. The result is a prompt that the model can render with high fidelity on the first try.
2. Use Hex Codes for Brand Colors
One of FLUX.2's most underrated features: it understands hex codes. The official docs are explicit: "Use hex codes for brand text: 'The logo text "ACME" in color #FF5733'." For brand work — logos, packaging, marketing assets — this is the cleanest way to lock in a specific color.
A minimalist product render of a stainless steel water bottle on a pure white background, the brand text "HYDRO" in bold sans-serif letters in color #2A9D8F on the side of the bottle, soft studio lighting, no shadows
Compare that to the DALL-E version of the same prompt, which would have to describe the color as "teal green" or "muted blue-green" and would produce variable results. The hex code removes ambiguity.
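When brand colors come from a style guide or a config file, it is worth validating them before they reach the prompt. The helper below is a hypothetical sketch: it checks for a 6-digit hex code and produces a clause in the same wording as the example prompt above.

```python
import re

# Matches a '#' followed by exactly six hexadecimal digits.
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def brand_text_clause(text: str, hex_color: str, font: str = "bold sans-serif") -> str:
    """Build the brand-text part of a prompt with a validated hex color."""
    if not HEX_COLOR.fullmatch(hex_color):
        raise ValueError(f"expected a 6-digit hex color like #2A9D8F, got {hex_color!r}")
    return f'the brand text "{text}" in {font} letters in color {hex_color.upper()}'


print(brand_text_clause("HYDRO", "#2a9d8f"))
```

Failing fast on a malformed code such as "2A9D8F" or "#2A9" avoids a generation where the model has to guess at the color.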
3. Do Not Use Negative Prompts (Use Positive Framing)
FLUX.2 has no equivalent of Midjourney's --no parameter and no separate negative-prompt field. To exclude something, describe the output you want in positive terms; if the positive description is strong enough, the model is unlikely to render the unwanted element. "A clean product photograph on a pure white background, no visible shadow, no other objects, no people" works because the positive framing ("clean," "pure white background") anchors the model. The negations reinforce; they do not lead. Same approach as DALL-E, same reason it works.
4. For Photorealism, Specify Camera and Lens
FLUX.2 is trained heavily on photographic data, and the model's strongest photorealism comes from prompts that name the camera, lens, and lighting setup. The community has converged on a clean pattern: "shot on [camera] with [lens] under [lighting]".
A candid street photograph of an elderly man feeding pigeons in a European city square, shot on a Leica M11 with a 35mm Summicron lens, late afternoon natural light, shallow depth of field, grain
The named camera (Leica M11) and lens (35mm Summicron) steer the model toward the look of photographs associated with that equipment. The result is a more "photographic" image than a generic "candid street photograph" would produce. This is the single highest-leverage technique for FLUX.2 photorealism, and it carries into the comparison that follows: same idea, three different formats, three different results.
Side-by-Side: The Same Prompt, Three Models
Here is a real test. Same idea, three models, three different prompt formats. The differences are immediate and instructive.
The Test Idea
A tired barista in a small independent coffee shop at 6am, golden hour light through the front window, steaming espresso in hand, candid documentary photography style.
Midjourney Prompt
Tired barista holding a steaming espresso, early morning, golden hour sunlight through window, documentary photography, 35mm film, shallow depth of field --ar 3:2 --v 7 --style raw
Verdict: Short, comma-separated, parameters at the end. Midjourney will produce a moody, film-grain image with strong directional light. The --style raw keeps it from looking "too AI." The short format is exactly what Midjourney is tuned for.
DALL-E Prompt
A candid documentary-style photograph of a tired barista in a small independent coffee shop at 6am, holding a steaming espresso. Golden hour light streams through the front window, catching the steam. The barista looks directly at the camera with a quiet, end-of-shift expression. Shot on 35mm film, shallow depth of field, warm tones.
Verdict: Natural-language paragraph with explicit scene direction. DALL-E will produce a more "narrative" image — the model interprets the paragraph as a story beat and renders the relationships (steam in light, barista's expression, the window direction). The format is what DALL-E is tuned for.
Flux Prompt
A tired barista in a small independent coffee shop at 6am, holding a steaming espresso. Golden hour light streams through the front window, catching the steam. The barista looks directly at the camera with a quiet, end-of-shift expression. Shot on a Leica M11 with a 35mm Summicron lens, warm tones, shallow depth of field.
Verdict: Almost identical to the DALL-E prompt, with one key swap: "shot on 35mm film" → "shot on a Leica M11 with a 35mm Summicron lens." That single change unlocks FLUX.2's strongest photorealism. The rest of the paragraph structure is the same — natural language, four-part framework, explicit lighting.
🔍 Reading the test: The three prompts are 80% identical. The differences are the parts each model is tuned for: parameters and modifiers (Midjourney), paragraph-form scene description (DALL-E), camera + lens specifics (Flux). The skill is not writing three different prompts — it is writing one prompt with the right shape for the model you are using.
The Unified Template That Works on All Three
If you have to write one prompt that runs on any of the three models, a common situation in production apps that route to different providers for cost, latency, or capability, use this template. It is built on the 6-part formula from our Claude prompting guide, which applies to image prompts too: Claude and other LLMs are great for expanding a one-line idea into a structured image prompt. It is also tuned to avoid the model-specific failure modes above.
SUBJECT (1 line): [Main subject + key identifying detail]
ACTION (1 line): [What the subject is doing, in a specific verb]
ENVIRONMENT (1 line): [Where the subject is, with 1-2 sensory details]
LIGHTING (1 line): [Direction, quality, and color of the light]
STYLE (1 line): [Camera/lens, art movement, or rendering style]
COMPOSITION (1 line): [Framing, angle, depth of field]
PARAMETERS (optional, Midjourney only): --ar 16:9 --v 7 --style raw
NEGATIVE (optional, Midjourney only): --no [3-5 specific terms to exclude]
Fill in the six labeled lines, and add the last two only when the target is Midjourney. The result is a prompt that produces a strong output on all three models without rewriting for each.
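In an app that routes between providers, the template can be stored once and rendered per model. The sketch below is one possible shape, not a required one: the function names and the dictionary keys are hypothetical, the Midjourney variant joins the parts with commas and appends parameters, and the DALL-E and Flux variant turns the same parts into short sentences with no parameters.

```python
PARTS = ["subject", "action", "environment", "lighting", "style", "composition"]


def render_midjourney(parts: dict[str, str], parameters: str = "",
                      exclude: tuple[str, ...] = ()) -> str:
    """Comma-separated modifiers, then parameters, then an optional --no list."""
    body = ", ".join(parts[name] for name in PARTS if parts.get(name))
    tail = parameters.strip()
    if exclude:
        tail = f"{tail} --no {', '.join(exclude)}".strip()
    return f"{body} {tail}".strip()


def _sentence(text: str) -> str:
    # Capitalize the first letter and end with a single period.
    text = text.strip().rstrip(".")
    return text[0].upper() + text[1:] + "." if text else ""


def render_natural_language(parts: dict[str, str]) -> str:
    """Scene sentence (subject + action + environment), then one sentence per remaining part."""
    scene = " ".join(parts[name].strip() for name in PARTS[:3] if parts.get(name, "").strip())
    sentences = [_sentence(scene)] + [_sentence(parts.get(name, "")) for name in PARTS[3:]]
    return " ".join(s for s in sentences if s)


barista = {
    "subject": "a tired barista in a green apron",
    "action": "holding a steaming espresso",
    "environment": "in a small independent coffee shop at 6am",
    "lighting": "golden hour light through the front window",
    "style": "candid documentary photography, 35mm film",
    "composition": "medium shot, shallow depth of field",
}
print(render_midjourney(barista, "--ar 3:2 --v 7 --style raw"))
print(render_natural_language(barista))
```

For Flux, the same natural-language renderer applies; swapping the style value for a specific camera and lens is the only change the earlier section calls for.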
3 Image Prompt Mistakes to Avoid
Even with a great template, three patterns consistently produce bad output. Spot them before you burn 50 generations.
Mistake 1: Vague Subjects
"A person" is not a subject — it is a category. The model has to invent the person from scratch, and the result is generic. A real subject has at least three identifying details: who they are, what they are wearing, and one piece of context. "A tired barista in a green apron, holding a steaming espresso." That is a subject. "A woman in a coffee shop." That is not.
Mistake 2: Skipping Lighting
Lighting is the single biggest factor in mood, and it is the easiest one to skip. A prompt that says nothing about lighting gets the model's default — flat, even, "product photography" lighting. If you want a mood (moody, dramatic, hopeful, warm), you have to describe the light. "Soft window light from camera left" is better than "moody." Specific lighting cues produce specific results.
Mistake 3: Trying to Get Text in the First Generation
Text rendering is the worst-supported feature in image generation. Midjourney is best at short words in stylized fonts. DALL-E handles short quotes in quotation marks. Flux is best with brand colors and hex codes. None of them are reliable for body copy. If you need a poster with a paragraph of text, the right workflow is to render the image first, then composite the text in a design tool. Trying to get 15 words right in a single generation is the fastest path to 50 wasted credits.
🚀 Test in 60 seconds: Pick one of the templates above, fill in the six lines for a real product or scene, then run it on Midjourney, DALL-E, and Flux side-by-side. Use the same subject and lighting across all three; the only difference should be the format. The model that produces the closest match to your reference is the one to standardize on — and the format that produced it is the one to save as a reusable template.
Putting It All Together
The 6-part formula — subject, action, environment, lighting, style, composition — is the spine. Model-specific tuning is the muscle. Midjourney wants short, comma-separated modifiers with parameters. DALL-E wants natural-language scene descriptions. Flux wants natural language plus specific camera and lens vocabulary. Pick the model by the use case, pick the format that fits the model, and run the 6-part checklist before you generate. One formula, three models, no magic keywords.
Try these techniques in PromptLab. Use the prompt builder to draft the 6-part formula, run it against multiple image generation models, and save the winning prompt as a reusable template.
Related: 9 Tips to Write a Claude Prompt That Actually Works — the deep-dive on Claude-specific prompting with 9 practical rules, bad/good examples, and a copy-paste template that works just as well for expanding a one-line idea into a structured image prompt.