Framework v2.1
Last updated April 2026

Methodology

A structured way to measure whether AI-generated images and video are actually deliverable — technically, perceptually, and commercially. Nine dimensions, three gates, automated scorecards.

In video QA, “artifact” often names a defect. HarteFact scores outputs anyway — assets, streams, pixels, facts.

Local-first. No cloud dependencies. Designed to run on Apple Silicon using open-source components. The framework is incremental — each phase produces infrastructure consumed by later phases.

Core principles

Model-agnostic by design

Most metrics measure properties of the output file — resolution, texture, temporal stability, color accuracy, identity consistency — regardless of which model produced it. Scoring does not require recalibration when models change.

Algorithmic vs. AI-evaluated

Every score is labeled algorithmic or ai_evaluated. VLM scores are reported with mean and variance and are never presented as equivalent to deterministic metrics.
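One way to keep the two labels from blurring is to make the label part of the score record itself. The sketch below is illustrative only: the `Score` class, its field names, and the `from_vlm_samples` helper are hypothetical, and the choice of population variance is an assumption rather than something the framework specifies.

```python
from dataclasses import dataclass
from statistics import mean, pvariance
from typing import Optional, Sequence


@dataclass(frozen=True)
class Score:
    """One metric result, always tagged with how it was produced."""
    metric: str
    kind: str                         # "algorithmic" or "ai_evaluated"
    value: float                      # deterministic value, or mean of VLM samples
    variance: Optional[float] = None  # only set for ai_evaluated scores
    samples: int = 1

    def __post_init__(self):
        if self.kind not in ("algorithmic", "ai_evaluated"):
            raise ValueError(f"unknown score kind: {self.kind!r}")


def from_vlm_samples(metric: str, samples: Sequence[float]) -> Score:
    """Collapse repeated VLM judgments into a mean and a variance."""
    if len(samples) < 2:
        raise ValueError("need at least two VLM samples to report variance")
    return Score(metric, "ai_evaluated", mean(samples), pvariance(samples), len(samples))


# Hypothetical values, for illustration only.
sharpness = Score("laplacian_sharpness", "algorithmic", 412.7)
framing = from_vlm_samples("framing", [0.8, 0.7, 0.9])
```

Because `variance` stays `None` for algorithmic scores, a report renderer can tell the two kinds apart without consulting a separate lookup table.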

Tiered gating

Three gates avoid wasting compute on content that has already failed. A clip with the wrong codec never consumes GPU cycles on identity-drift analysis.

Versioned, reproducible

Every run logs framework version, calibration version, and model versions. Re-evaluations are new runs, not silent replacements. Score history is queryable per asset.
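A minimal sketch of what an append-only run log could look like. `RunRecord`, `ScoreHistory`, and every field name and version string below (other than framework version 2.1) are assumptions made for illustration, not the framework's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class RunRecord:
    asset_id: str
    framework_version: str
    calibration_version: str
    model_versions: Dict[str, str]
    scores: Dict[str, float]
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class ScoreHistory:
    """Append-only log: a re-evaluation adds a run, it never overwrites one."""

    def __init__(self) -> None:
        self._runs: List[RunRecord] = []

    def append(self, run: RunRecord) -> None:
        self._runs.append(run)

    def for_asset(self, asset_id: str) -> List[RunRecord]:
        return [r for r in self._runs if r.asset_id == asset_id]


history = ScoreHistory()
history.append(RunRecord("clip-001", "2.1", "cal-A", {"vlm": "model-x-1"}, {"D01": 1.0}))
# Re-evaluating with a new calibration is a second run, not an edit of the first.
history.append(RunRecord("clip-001", "2.1", "cal-B", {"vlm": "model-x-1"}, {"D01": 1.0}))

serialized = json.dumps([asdict(r) for r in history.for_asset("clip-001")])
```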

Pipeline architecture

Three gates separate fast, cheap checks from expensive deep analysis. Each gate is a checkpoint: content that fails a gate skips every later gate and the deep analysis, which keeps costs low and feedback fast. The failure report names the dimension that failed, so the fix can start immediately.

Gate 1: Technical specs
Dimension 1
Pass / fail on file specs, codec, resolution, audio packaging.
Gate 2: Spatial quality
Dimension 2
Pass / fail on catastrophic spatial failures (severe artifacts, banding).
Gate 3: Temporal & audio basics
Dimensions 3 + 4 (parallel)
Pass / fail on flicker, scene-cut sanity, audio levels, sync offset.
Deep: Identity, lighting, brand, prompt adherence
Dimensions 5 – 9
Per-character analysis, scene integrity, client-compliance scoring.
Output: Versioned scorecard
Pass/fail summary, per-dimension detail, annotated frame thumbnails, timeline visualization, per-frame metric trends, client threshold reference.
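The gate sequence above can be sketched as a short-circuiting loop. Everything here is hypothetical: the gate functions, the asset fields (`codec`, `banding`, `sync_offset_ms`), and the thresholds are placeholders chosen to show the control flow, not the framework's real checks.

```python
from typing import Callable, List, Optional, Tuple

GateResult = Tuple[bool, Optional[str]]
Gate = Callable[[dict], GateResult]


def gate_1_technical(asset: dict) -> GateResult:
    if asset.get("codec") not in {"h264", "hevc", "prores"}:  # assumed allow-list
        return False, f"D01: unsupported codec {asset.get('codec')!r}"
    return True, None


def gate_2_spatial(asset: dict) -> GateResult:
    if asset.get("banding", 0.0) > 0.5:  # assumed threshold
        return False, "D02: severe banding"
    return True, None


def gate_3_temporal_audio(asset: dict) -> GateResult:
    if abs(asset.get("sync_offset_ms", 0.0)) > 80:  # assumed threshold
        return False, "D04: audio sync offset out of range"
    return True, None


GATES: List[Tuple[str, Gate]] = [
    ("gate_1", gate_1_technical),
    ("gate_2", gate_2_spatial),
    ("gate_3", gate_3_temporal_audio),
]


def run_pipeline(asset: dict, gates: List[Tuple[str, Gate]],
                 deep: Callable[[dict], dict]) -> dict:
    report = {"passed_gates": [], "failed_at": None, "reason": None, "deep": None}
    for name, gate in gates:
        passed, reason = gate(asset)
        if not passed:
            report["failed_at"], report["reason"] = name, reason
            return report  # stop here: later gates and deep analysis never run
        report["passed_gates"].append(name)
    report["deep"] = deep(asset)  # only reached when every gate passed
    return report
```

Returning at the first failure is what keeps a clip with the wrong codec away from the expensive stages; the report still carries the failing dimension for feedback.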

The nine dimensions

Each dimension owns a distinct axis of output quality. Build phases follow the dependency map: each phase produces infrastructure later phases reuse, so no work is thrown away.

D01

Technical Delivery Compliance

Phase 1

File specs, codecs, container, color space, VMAF, audio packaging. The non-negotiable foundation.

Resolution / frame rate
Codec & container
VMAF score
Color space
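As an illustration of a spec check, the sketch below validates the JSON that `ffprobe -v error -print_format json -show_streams <file>` produces. Using ffprobe at all is an assumption; the `spec` keys and the frame-rate tolerance are hypothetical. The stream fields read here (`codec_type`, `codec_name`, `width`, `height`, `r_frame_rate`, `color_space`) are standard ffprobe output fields.

```python
from fractions import Fraction
from typing import List


def check_video_stream(probe: dict, spec: dict) -> List[str]:
    """Return a list of human-readable failures; an empty list means pass."""
    video = next(
        (s for s in probe.get("streams", []) if s.get("codec_type") == "video"),
        None,
    )
    if video is None:
        return ["no video stream found"]

    failures = []
    if video.get("codec_name") != spec["codec"]:
        failures.append(f"codec {video.get('codec_name')!r}, expected {spec['codec']!r}")

    size = (video.get("width"), video.get("height"))
    if size != tuple(spec["resolution"]):
        failures.append(f"resolution {size}, expected {tuple(spec['resolution'])}")

    # ffprobe reports frame rate as a rational string such as "30000/1001".
    fps = float(Fraction(video.get("r_frame_rate", "0/1")))
    if abs(fps - spec["fps"]) > 0.01:  # assumed tolerance
        failures.append(f"frame rate {fps:.3f}, expected {spec['fps']}")

    if video.get("color_space") != spec["color_space"]:
        failures.append(
            f"color space {video.get('color_space')!r}, expected {spec['color_space']!r}"
        )
    return failures
```

VMAF is left out of this sketch because it is a full-reference metric and needs a reference encode to compare against.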

D02

Spatial & Texture Integrity

Phase 2

Per-frame visual quality. Compression artifacts, texture noise, banding, VAE seam detection.

BRISQUE / NIQE
Laplacian sharpness
Color banding
Wavelet noise analysis
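Laplacian sharpness is commonly computed as the variance of a frame's Laplacian response: blurry frames have few strong edges, so the variance is low. The dependency-free sketch below applies the 4-neighbour Laplacian kernel to a grayscale frame given as nested lists. It is an assumption that the framework uses this exact variant, and a production version would more likely use an array library.

```python
from statistics import pvariance
from typing import List, Sequence


def laplacian_variance(gray: Sequence[Sequence[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels."""
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("frame must be at least 3x3")
    responses: List[float] = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


# A flat frame has no edges; a checkerboard has an edge at every pixel.
flat = [[128] * 5 for _ in range(5)]
checker = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]
```

The score is only meaningful relative to a threshold calibrated on comparable content, since absolute values depend on resolution and subject matter.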

D03

Temporal Consistency & Motion

Phase 3

Stability across frames. Background flicker, optical flow consistency, scene-cut detection.

Background SSIM
Optical flow
Flicker detection
Scene cuts
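One simple way to separate flicker from a scene cut is to look at how per-frame mean luminance behaves: a one-frame spike that immediately reverts looks like flicker, while a sustained shift looks more like a cut or a lighting change. The heuristic and the threshold below are assumptions for illustration, not the framework's detector.

```python
from typing import List, Sequence


def flicker_frames(mean_luma: Sequence[float], threshold: float = 10.0) -> List[int]:
    """Indices of frames whose mean luminance spikes and reverts within one frame."""
    flagged = []
    for i in range(1, len(mean_luma) - 1):
        before = mean_luma[i] - mean_luma[i - 1]
        after = mean_luma[i] - mean_luma[i + 1]
        # Same sign on both sides means frame i is a local peak or dip.
        is_spike = before * after > 0
        if is_spike and abs(before) > threshold and abs(after) > threshold:
            flagged.append(i)
    return flagged


spike = flicker_frames([100, 100, 130, 100, 100])      # frame 2 jumps and reverts
sustained = flicker_frames([100, 100, 130, 130, 130])  # level shift, not flicker
```

Applying the same test to background regions only, rather than the whole frame, would reduce false positives caused by legitimate foreground motion.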

D04

Audio Quality

Phase 4

Loudness, clipping, sync offset. Runs in parallel with the temporal pipeline.

LUFS measurement
Clipping detection
Sync offset
Spectral integrity
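Clipping detection can be sketched as a search for runs of consecutive samples pinned at full scale. The sketch assumes floating-point samples normalised to the range -1.0 to 1.0; the threshold and minimum run length are hypothetical defaults. LUFS measurement is not shown because it requires the K-weighting filter and gating defined in ITU-R BS.1770.

```python
import math
from typing import List, Sequence, Tuple


def find_clipping(samples: Sequence[float], threshold: float = 0.999,
                  min_run: int = 3) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of clipped samples."""
    runs = []
    start = None
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # Handle a run that extends to the end of the buffer.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - start))
    return runs


def peak_dbfs(samples: Sequence[float]) -> float:
    """Sample peak in dB relative to full scale (negative infinity for silence)."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")
```

Requiring a minimum run length avoids flagging a single legitimate full-scale sample as clipping.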

D05

Lip Sync Precision

Phase 5

Combines mouth aspect ratio (MAR) with audio phoneme timing via DTW alignment.

MAR extraction
DTW alignment
WhisperX phonemes
Sync drift over time
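To make the DTW step concrete, here is a minimal textbook dynamic time warping implementation in plain Python. It assumes the MAR series and an audio-derived mouth-openness series have already been sampled at the same rate and normalised to a common range; how the framework actually derives the audio series from WhisperX phonemes is not specified here.

```python
from typing import List, Sequence, Tuple


def dtw(a: Sequence[float], b: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total alignment cost, warping path as index pairs)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = abs(a[i - 1] - b[j - 1])
            cost[i][j] = dist + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


mar = [0.0, 0.0, 1.0, 0.0, 0.0]    # mouth opens at video frame 2
audio = [0.0, 1.0, 0.0, 0.0, 0.0]  # matching sound occurs at frame 1
total_cost, path = dtw(mar, audio)
offsets = [j - i for i, j in path]  # per-step lag between the two series
```

A low total cost means the two series have the same shape once warped; the per-step offsets along the path indicate how the lag changes over the clip, which is one way to read sync drift.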

D06

Character & Identity Integrity

Phase 6

Face identity drift, hand failures, body proportions, teeth, clothing consistency.

InsightFace cosine similarity
Hand failure logging
Body proportions
Skin tone stability
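Identity drift can be expressed as the cosine similarity between a reference face embedding and the embedding from each frame. The sketch below treats embeddings as plain lists of floats, however they were extracted; the 0.6 threshold is a placeholder, not a calibrated value.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return dot / norm


def identity_drift(reference: Sequence[float],
                   per_frame: Sequence[Sequence[float]],
                   min_similarity: float = 0.6) -> List[Tuple[int, float]]:
    """Return (frame_index, similarity) for frames that fall below the threshold."""
    flagged = []
    for index, embedding in enumerate(per_frame):
        similarity = cosine_similarity(reference, embedding)
        if similarity < min_similarity:
            flagged.append((index, similarity))
    return flagged


# Toy 2-D embeddings; real face embeddings have hundreds of dimensions.
reference = [1.0, 0.0]
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
drifted = identity_drift(reference, frames)
```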

D07

Lighting & Scene Integrity

Phase 7

Shadow coherence, luminance tracking, color temperature stability, reflection plausibility.

Shadow masking
Luminance per region
Color temperature drift
Reflection flagging
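Color temperature drift can be approximated by estimating a correlated color temperature (CCT) per frame and tracking its range across the clip. The sketch below uses McCamy's approximation on the frame's mean linear sRGB value; it is an assumption that the framework estimates CCT this way, and the input is assumed to be already linearised.

```python
from typing import Sequence, Tuple


def estimate_cct(linear_rgb: Tuple[float, float, float]) -> float:
    """Approximate CCT in kelvin from a mean linear sRGB triple (McCamy's formula)."""
    r, g, b = linear_rgb
    # Linear sRGB to CIE XYZ (D65 white point).
    x_cap = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y_cap = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z_cap = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    total = x_cap + y_cap + z_cap
    if total == 0:
        raise ValueError("cannot estimate CCT of a black frame")
    x, y = x_cap / total, y_cap / total
    n = (x - 0.3320) / (0.1858 - y)
    return 449 * n ** 3 + 3525 * n ** 2 + 6823.3 * n + 5520.33


def cct_drift(per_frame_rgb: Sequence[Tuple[float, float, float]]) -> float:
    """Spread between the warmest and coolest frame, in kelvin."""
    ccts = [estimate_cct(rgb) for rgb in per_frame_rgb]
    return max(ccts) - min(ccts)
```

McCamy's formula is only reliable for near-neutral colors, so a practical version would estimate from neutral or skin-free regions rather than the full frame average.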

D08

Brand & Client Compliance

Phase 8

Per-client palette, talent reference, logo placement, LUT comparison, typography.

Brand HEX Delta-E
Talent face match
LUT comparison
Logo / wordmark presence
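A brand HEX comparison can be sketched by converting both colors to CIELAB and taking the Euclidean distance, which is the CIE76 definition of Delta-E. The source does not say which Delta-E formula the framework uses, so CIE76 is an assumption made for brevity; CIEDE2000 tracks perception more closely.

```python
import math
from typing import Tuple


def hex_to_lab(hex_color: str) -> Tuple[float, float, float]:
    """Convert an sRGB hex string such as '#1A2B3C' to CIELAB (D65)."""
    h = hex_color.lstrip("#")
    srgb = [int(h[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB transfer function.
    r, g, b = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4 for c in srgb]
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t: float) -> float:
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)  # D65 reference white
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def delta_e76(hex_a: str, hex_b: str) -> float:
    """CIE76 color difference between two hex colors."""
    return math.dist(hex_to_lab(hex_a), hex_to_lab(hex_b))
```

A common rule of thumb treats a CIE76 difference of around 2 as close to the just-noticeable threshold, but per-client tolerances would still need to be set explicitly.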

D09

Prompt & Action Adherence

Phase 9

VLM-evaluated framing, composition, physics plausibility, object/spatial flagging.

VLM scene description
Framing & composition
Physics flags
Slideshow detection

Includes ai_evaluated scores; reported with mean + variance.

What this framework is not

- Not a scoring rubric for taste, creativity, or commercial appeal. Aesthetic judgment remains human.
- Not a model leaderboard. The framework benchmarks output properties; model comparisons are a separate activity built on top of the same infrastructure.
- Not a SaaS dashboard. Phase 1 ships a local pipeline and a versioned scorecard format, not a hosted product.
- Not a substitute for human QC on edge cases. The system is designed to scale review, not to replace the final sign-off on high-stakes deliverables.

Print-on-demand extension

A separate addendum extends the framework with print-specific quality metrics: CMYK gamut warnings, ink coverage limits, transparency edge fringing, design placement safety, and pre-generation input validation.

Read the POD addendum

Pilot engagements

Phase 1 (Technical Delivery) and Phase 1b (Identity Consistency) are in active build. We're scoping a small number of pilot engagements with production studios, agencies, and POD operators for the second half of 2026.

Get in touch