Visual Quality Control (The Exorcist) 🧛‍♀️

Zirelia includes an advanced AI-based quality control system to prevent posting uncanny, distorted, or "cursed" images.

1. The Image Critic (`critic.py`)

Every image generated by Replicate is analyzed by GPT-4o-mini (Vision) before it is posted. The Critic checks for: * Anatomical Horrors: Heads turned 180° (Exorcist style). * Hand Distortion: Too many fingers, claw hands, or impossible grips. * Face Melt: Distorted facial features.

Workflow

Generate: Image is created via Replicate (FLUX).
Verify: URL is sent to OpenAI Vision API with a strict prompt.
- PASS: Image is approved for posting.
- REJECT: Image is discarded.
Retry: If rejected, the system waits 3 minutes (to reset API limits) and tries again (max 3 attempts).

2. Safety Injection (Prompt Optimization)

To save costs and reduce rejection rates, the system proactively modifies prompts that are known to cause issues.

The Problem: Hands & Cups

AI models struggle with hands holding objects like coffee cups. This often results in alien fingers or floating mugs.

The Solution: `_optimize_prompt_for_safety`

When the system detects keywords like coffee, cup, latte, or drink, it automatically injects a safety modifier to simplify the composition:

Original Concept	Safety Injection (Randomized)	Result
"drinking coffee"	`cup resting on table next to her`	No hands visible (Safe)
"holding a latte"	`drinking from cup, close up face, hands out of frame`	Hands hidden (Safe)
"morning coffee"	`holding cup with both hands, detailed fingers`	High Detail (Risky but better)

This reduces the "hallucination rate" by avoiding complex hand-object interactions when possible.

Configuration

This feature is enabled automatically if OPENAI_API_KEY is present. To disable it (not recommended), you can remove the key or modify core/image_gen/pipeline.py.

3. Creative Expansion (The Muse) 🎨

To avoid repetitive images (e.g., getting the same "Morning Coffee" shot every time), the system now uses an LLM to expand simple topics into unique, detailed visual descriptions before generation.

Example: * Original Topic: "Morning coffee" * Expanded Prompt: "Sienna sitting on a sunlit balcony wearing a silk robe, holding a ceramic mug with both hands, soft morning haze, ocean view in background, candid smile."

This ensures variety while maintaining the persona's vibe.