Skip to content

Visual Quality Control (The Exorcist) 🧛‍♀️

Zirelia includes an advanced AI-based quality control system to prevent posting uncanny, distorted, or "cursed" images.

1. The Image Critic (critic.py)

Every image generated by Replicate is analyzed by GPT-4o-mini (Vision) before it is posted. The Critic checks for: * Anatomical Horrors: Heads turned 180° (Exorcist style). * Hand Distortion: Too many fingers, claw hands, or impossible grips. * Face Melt: Distorted facial features.

Workflow

  1. Generate: Image is created via Replicate (FLUX).
  2. Verify: URL is sent to OpenAI Vision API with a strict prompt.
    • PASS: Image is approved for posting.
    • REJECT: Image is discarded.
  3. Retry: If rejected, the system waits 3 minutes (to reset API limits) and tries again (max 3 attempts).

2. Safety Injection (Prompt Optimization)

To save costs and reduce rejection rates, the system proactively modifies prompts that are known to cause issues.

The Problem: Hands & Cups

AI models struggle with hands holding objects like coffee cups. This often results in alien fingers or floating mugs.

The Solution: _optimize_prompt_for_safety

When the system detects keywords like coffee, cup, latte, or drink, it automatically injects a safety modifier to simplify the composition:

Original Concept Safety Injection (Randomized) Result
"drinking coffee" cup resting on table next to her No hands visible (Safe)
"holding a latte" drinking from cup, close up face, hands out of frame Hands hidden (Safe)
"morning coffee" holding cup with both hands, detailed fingers High Detail (Risky but better)

This reduces the "hallucination rate" by avoiding complex hand-object interactions when possible.

Configuration

This feature is enabled automatically if OPENAI_API_KEY is present. To disable it (not recommended), you can remove the key or modify core/image_gen/pipeline.py.

3. Creative Expansion (The Muse) 🎨

To avoid repetitive images (e.g., getting the same "Morning Coffee" shot every time), the system now uses an LLM to expand simple topics into unique, detailed visual descriptions before generation.

Example: * Original Topic: "Morning coffee" * Expanded Prompt: "Sienna sitting on a sunlit balcony wearing a silk robe, holding a ceramic mug with both hands, soft morning haze, ocean view in background, candid smile."

This ensures variety while maintaining the persona's vibe.