MovieLenz: An AI-powered framework for video quality evaluation and prompt optimization

Authors: Aman Tyagi and Hemanth Boinpally, Generative AI Specialists, Google Cloud

The challenge of evaluating generated videos

As text-to-video AI models become increasingly sophisticated, a critical question emerges: How do we objectively evaluate whether a generated video matches our prompt? While human evaluation works at small scales, it quickly becomes impractical when you need to iterate on prompts or evaluate thousands of generated videos.

This is where MovieLenz comes in — a comprehensive framework that combines video quality assessment, prompt optimization, and iterative refinement to help you generate better videos with AI.

What is MovieLenz?

MovieLenz is an open-source Python framework that provides three core capabilities:

  1. Video Evaluation: Assess video quality against text queries using UVQ (Universal Video Quality) and VQA (Video Quality Assessment) metrics
  2. Prompt Creation: Generate optimized prompts for video generation using AI-powered scene understanding
  3. Prompt Optimization: Iteratively refine prompts based on actual video generation results

The MovieLenz code base can be found here.

Think of it as a complete feedback loop for video generation: create a prompt, generate a video, evaluate how well it matches your intent, then refine and improve.

Fig: MovieLenz modes of operation.

The architecture: How it works

MovieLenz combines several AI and machine learning components into an elegant workflow. At its core, the system takes your prompt and video, generates structured evaluation questions, and produces both quantitative metrics and qualitative feedback.

Fig: MovieLenz architecture.

The beauty of this architecture lies in its modularity. The runner.py module orchestrates the entire workflow, managing temporary files and coordinating between components. The prompt_optimizer_main.py module handles question generation, while evaluate_media.py brings together both technical (UVQ) and semantic (VQA) evaluation.
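To make the coordination concrete, here is a minimal sketch of that flow. This is an illustration only: the function bodies are stubs, and the real interfaces of runner.py, prompt_optimizer_main.py, and evaluate_media.py may differ.

```python
import tempfile

def generate_questions(prompt: str) -> list:
    # Stands in for prompt_optimizer_main.py's question generation.
    return [f"Is this visible in the video: {prompt}?"]

def evaluate_media(video_path: str, questions: list) -> dict:
    # Stands in for evaluate_media.py, which combines technical (UVQ)
    # and semantic (VQA) evaluation. Scores here are placeholders.
    return {"uvq": {"compression_content": 0.9},
            "vqa": {"Evaluation Score": 0.85}}

def run_pipeline(prompt: str, video_path: str) -> dict:
    # runner.py-style orchestration: set up a temporary workspace,
    # generate structured questions, then score the video against them.
    with tempfile.TemporaryDirectory():
        questions = generate_questions(prompt)
        report = evaluate_media(video_path, questions)
        report["questions"] = questions
        return report

report = run_pipeline("A chef cooking pasta", "chef.mp4")
```

The key design point is that each stage can be swapped out independently: a different question generator or evaluator plugs into the same orchestration shape.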

The secret sauce: DSG and CMQ questions

One of MovieLenz’s unique innovations is its use of two types of structured questions that break down your prompt into testable components:

DSG (Davidsonian Scene Graphs)

Hierarchical questions that break down your prompt into specific elements — agents, actions, objects, locations, and attributes. For example, if your prompt is “A chef cooking pasta in a professional kitchen,” DSG questions might include:
— “Is there a chef present?”
— “Is the chef actively cooking?”
— “Is pasta visible in the scene?”
— “Does the setting appear to be a professional kitchen?”
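A DSG decomposition like the one above is naturally represented as structured data, with each question tagged by the prompt element it tests. The field names below are illustrative, not MovieLenz's actual schema:

```python
# DSG-style decomposition for "A chef cooking pasta in a professional
# kitchen". The element categories (agent, action, object, location)
# follow the article; the dict layout is a hypothetical representation.
dsg_questions = [
    {"element": "agent",    "question": "Is there a chef present?"},
    {"element": "action",   "question": "Is the chef actively cooking?"},
    {"element": "object",   "question": "Is pasta visible in the scene?"},
    {"element": "location", "question": "Does the setting appear to be a professional kitchen?"},
]

# Group questions by element type, e.g. for a per-element report.
by_element = {}
for q in dsg_questions:
    by_element.setdefault(q["element"], []).append(q["question"])
```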

CMQ (Common Mistake Questions)

Questions designed to catch typical errors in video generation, like static scenes, incorrect actions, or missing elements. These help identify when the AI model has misunderstood or failed to execute parts of your prompt. For instance:
— “Is the chef moving (not static)?”
— “Is the chef cooking pasta (not other foods)?”
— “Is a kitchen environment visible (not outdoors)?”

This dual question approach ensures comprehensive evaluation from both positive (what should be there) and negative (what shouldn’t be wrong) perspectives.
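One natural way to roll both question sets into a single number is the fraction of yes answers, which resembles the 0–1 Evaluation Score described below. This is a sketch of the idea only; MovieLenz's actual scoring may weight DSG and CMQ questions differently:

```python
def adherence_score(dsg_answers: list, cmq_answers: list) -> float:
    """Fraction of 'yes' answers across both question sets.

    Illustrative only: a hypothetical aggregation, not MovieLenz's
    actual scoring formula.
    """
    answers = dsg_answers + cmq_answers
    if not answers:
        return 0.0
    return sum(answers) / len(answers)

# Chef example: all four DSG checks pass, but one CMQ check fails
# (the chef appears static in the generated video).
score = adherence_score([True, True, True, True], [False, True, True])
# score == 6/7 ≈ 0.857
```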

Dual metrics for comprehensive evaluation

MovieLenz doesn’t rely on a single metric. Instead, it provides two complementary assessments:

UVQ (Universal Video Quality) (reference)

A PyTorch-based neural network that analyzes:
— Compression quality: How well the video maintains visual fidelity
— Content quality: The richness and detail of the scene
— Distortion metrics: Technical quality issues

VQA (Video Quality Assessment)

An LLM-powered evaluation that provides:
— Evaluation Score: A 0–1 ratio indicating prompt adherence
— Detailed Feedback: Natural language explanation of what works and what doesn’t
— Refined Prompt: An improved version of your original prompt

Together, these metrics give you both the technical quality (Is this a good video?) and semantic accuracy (Does it match my prompt?).
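In a pipeline, the two metric families can be combined into a simple quality gate: accept a video only if both the technical and semantic scores clear a threshold. The thresholds below are illustrative choices, not defaults shipped with MovieLenz:

```python
def passes_quality_gate(uvq: dict, vqa: dict,
                        uvq_threshold: float = 0.8,
                        vqa_threshold: float = 0.8) -> bool:
    """Require every UVQ sub-score and the VQA Evaluation Score to
    clear their (hypothetical) thresholds before accepting a video."""
    technical_ok = min(uvq.values()) >= uvq_threshold
    semantic_ok = vqa["Evaluation Score"] >= vqa_threshold
    return technical_ok and semantic_ok

uvq = {"compression_content": 0.92, "content_distortion": 0.85}
vqa = {"Evaluation Score": 0.85}
accepted = passes_quality_gate(uvq, vqa)  # True with these thresholds
```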

Three modes for every use case

Mode 1: Evaluate

Already have a video? Get comprehensive quality metrics:

python main.py evaluate \
  --query "An old man frustrated about cleaning lawn" \
  --video-path test.mp4 \
  --image-path oldman_image.png

Output:
{
  "uvq": {
    "compression_content_distortion": 0.87,
    "compression_content": 0.92,
    "content_distortion": 0.85,
    "compression_distortion": 0.88
  },
  "vqa": {
    "Refined Prompt": "An elderly man with visible frustration…",
    "Evaluation Score": 0.85,
    "Feedback": "The video effectively shows an elderly man…"
  }
}
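Because the output is plain JSON, it is easy to consume downstream, for example to decide whether a video is good enough or should be regenerated with the refined prompt. The 0.9 threshold here is an illustrative choice:

```python
import json

# The sample evaluate-mode output from above, abbreviated.
payload = """{
  "uvq": {"compression_content": 0.92, "content_distortion": 0.85},
  "vqa": {
    "Refined Prompt": "An elderly man with visible frustration...",
    "Evaluation Score": 0.85,
    "Feedback": "The video effectively shows an elderly man..."
  }
}"""

results = json.loads(payload)
score = results["vqa"]["Evaluation Score"]
refined = results["vqa"]["Refined Prompt"]

# Keep the video if it scores well; otherwise retry with the
# refined prompt (hypothetical threshold).
next_prompt = None if score >= 0.9 else refined
```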

Mode 2: Create — AI-powered prompt enhancement

Generate an optimized prompt before video generation. This is where MovieLenz really shines — transforming simple ideas into production-ready prompts.

Fig: Prompt enhancement example used in MovieLenz.

Usage:
python main.py create --prompt "A person running in a park"

The system uses AI to enhance your simple prompt into a detailed, production-ready description complete with camera angles, lighting, movement patterns, and scene composition.
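As a toy stand-in for what this step produces, the function below appends the kinds of details the article mentions (camera, lighting, movement, composition) to a simple prompt. The real create mode uses an LLM; this fixed template is purely illustrative:

```python
def enhance_prompt(simple_prompt: str) -> str:
    """Hypothetical stand-in for MovieLenz's AI-powered create mode.

    A fixed template adds cinematic detail; the actual system generates
    these details with scene understanding, not a template.
    """
    return (
        f"{simple_prompt}. Medium tracking shot, golden-hour lighting, "
        "smooth steady camera movement, subject centered in frame."
    )

enhanced = enhance_prompt("A person running in a park")
```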

Mode 3: Optimize — The complete feedback loop

The most powerful mode: analyze your generated video and get an improved prompt for the next iteration.

Fig: Iterative prompt optimization loop used in MovieLenz.

Usage:
python main.py optimize \
  --prompt "Dancing" \
  --video-path dance.mp4 \
  --start-frame start.png

This mode evaluates your video and provides a refined prompt that addresses any shortcomings, creating a systematic improvement cycle.
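The improvement cycle can be sketched as a loop: generate, evaluate, and feed the refined prompt back in until the score clears a target. Both `generate_video` and `optimize` below are stubs standing in for a text-to-video model and MovieLenz's optimize mode; the scoring rule is invented purely so the loop terminates:

```python
def generate_video(prompt: str) -> str:
    # Stub: would call a text-to-video model and return a file path.
    return f"video_for::{prompt}"

def optimize(prompt: str, video_path: str) -> tuple:
    # Stub: pretend each refinement round improves adherence and
    # appends detail to the prompt.
    score = min(1.0, (6 + 2 * prompt.count("|")) / 10)
    return score, prompt + " | more detail"

def refine_until(prompt: str, target: float = 0.9, max_rounds: int = 5) -> tuple:
    score = 0.0
    for _ in range(max_rounds):
        video = generate_video(prompt)
        score, refined = optimize(prompt, video)
        if score >= target:
            break
        prompt = refined  # feed the refined prompt back into the loop
    return prompt, score

final_prompt, final_score = refine_until("Dancing")
```

A real loop would additionally track the UVQ scores per round and stop early if quality plateaus, but the feedback structure is the same.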

Why this matters

As AI video generation tools like Runway, Pika, and others become mainstream, the bottleneck shifts from “Can we generate video?” to “How do we generate the RIGHT video?”

MovieLenz addresses this by:

  • Reducing iteration time: Get objective feedback instead of relying solely on human judgment
  • Improving prompt quality: Learn what makes an effective video generation prompt through concrete feedback
  • Scaling evaluation: Assess hundreds or thousands of videos programmatically
  • Closing the feedback loop: Use evaluation results to systematically improve your prompts

Real-world applications

  • Content creators: Iterate faster on AI-generated video content with measurable quality improvements
  • Research teams: Benchmark different video generation models objectively with standardized metrics
  • Production studios: Quality-check AI-generated assets at scale before final delivery
  • ML engineers: Build automated pipelines for video generation and evaluation workflows

Getting started

MovieLenz is designed to be developer-friendly with both CLI and Python API support.

Installation

git clone <repository-url>

cd movieLenz

python -m venv venv

source venv/bin/activate # On Linux/Mac

pip install -r requirements.txt

Configuration

# Set up Google Cloud authentication
gcloud auth application-default login
gcloud config set project YOUR_PROJECT_ID

Programmatic Usage

import runner

results = runner.run_evaluate_media_for_video_path(
    video_query="Cooking demonstration",
    video_path="cooking.mp4",
    input_image_path="chef.png",
    duration=10,
)
print(results)

The framework handles the complexity of video processing, LLM interaction, and quality assessment — you just provide the video and prompt.

The future of video generation evaluation

As AI video generation continues to evolve, frameworks like MovieLenz represent a critical piece of infrastructure. They transform video generation from a trial-and-error process into a systematic, measurable workflow.

The combination of neural network-based quality metrics (UVQ) and LLM-powered semantic understanding (VQA) provides both technical and creative evaluation — a dual approach that mirrors how humans assess video quality but at machine scale.

Open source and ready to use

MovieLenz is open source and ready for you to explore, extend, and integrate into your own projects. Whether you’re building the next great video generation tool or just trying to get better results from existing ones, MovieLenz provides the evaluation infrastructure you need.

Check out the project: GitHub Link

