Framework v2.1
Last updated April 2026

Methodology

A structured way to measure whether AI-generated images and video are actually deliverable — technically, perceptually, and commercially. Nine dimensions, three gates, automated scorecards.

In video QA, “artifact” usually names a defect. HarteFact uses the word in its broader sense and scores every output artifact: assets, streams, pixels, facts.

Local-first. No cloud dependencies. Designed to run on Apple Silicon using open-source components. The framework is incremental — each phase produces infrastructure consumed by later phases.

Core principles

Model-agnostic by design

Most metrics measure properties of the output file — resolution, texture, temporal stability, color accuracy, identity consistency — regardless of which model produced it. Scoring does not require recalibration when models change.

Algorithmic vs. AI-evaluated

Every score is labeled algorithmic or ai_evaluated. VLM scores are reported with mean and variance and are never presented as equivalent to deterministic metrics.
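One way to make the label impossible to lose is to attach it to every score record and validate it when the record is created. The sketch below assumes a Python record type; the `Score` class, its field names, and the `display` format are hypothetical illustrations, not the framework's actual schema.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Hypothetical record layout; field names are illustrative, not a published schema.
@dataclass(frozen=True)
class Score:
    dimension: str                      # e.g. "D02"
    metric: str                         # e.g. "laplacian_sharpness"
    kind: Literal["algorithmic", "ai_evaluated"]
    value: float                        # point value, or the mean for ai_evaluated scores
    variance: Optional[float] = None    # required for ai_evaluated, forbidden for algorithmic
    samples: int = 1

    def __post_init__(self) -> None:
        if self.kind not in ("algorithmic", "ai_evaluated"):
            raise ValueError(f"unknown score kind: {self.kind!r}")
        if self.kind == "ai_evaluated" and (self.variance is None or self.samples < 2):
            raise ValueError("ai_evaluated scores need a variance from at least two samples")
        if self.kind == "algorithmic" and self.variance is not None:
            raise ValueError("algorithmic scores are deterministic and carry no variance")

    def display(self) -> str:
        # VLM scores always render as mean ± std so they never read like a deterministic metric.
        if self.kind == "ai_evaluated":
            std = self.variance ** 0.5
            return f"{self.metric}: {self.value:.2f} ± {std:.2f} (ai_evaluated, n={self.samples})"
        return f"{self.metric}: {self.value:.2f} (algorithmic)"
```

Rejecting an `ai_evaluated` score without a variance at construction time means a VLM number cannot slip into a scorecard looking like a deterministic one.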

Tiered gating

Three gates avoid wasting compute on content that has already failed. A clip with the wrong codec never consumes GPU cycles on identity-drift analysis.

Versioned, reproducible

Every run logs framework version, calibration version, and model versions. Re-evaluations are new runs, not silent replacements. Score history is queryable per asset.
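A minimal sketch of append-only run logging, assuming SQLite from the Python standard library as the store. The table layout and the `record_run` / `score_history` helpers are hypothetical; the point is that a re-evaluation inserts a new row instead of updating an old one, so every historical score stays queryable next to the versions that produced it.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per evaluation run; rows are never updated in place.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    run_id INTEGER PRIMARY KEY AUTOINCREMENT,
    asset_id TEXT NOT NULL,
    framework_version TEXT NOT NULL,
    calibration_version TEXT NOT NULL,
    model_versions TEXT NOT NULL,  -- JSON object: model name -> version
    scores TEXT NOT NULL,          -- JSON object: metric -> value
    created_at TEXT NOT NULL
)
"""

def record_run(db, asset_id, framework_version, calibration_version, model_versions, scores):
    """Insert a new run and return its id; a re-evaluation is simply another row."""
    cur = db.execute(
        "INSERT INTO runs (asset_id, framework_version, calibration_version, "
        "model_versions, scores, created_at) VALUES (?, ?, ?, ?, ?, ?)",
        (asset_id, framework_version, calibration_version,
         json.dumps(model_versions, sort_keys=True), json.dumps(scores, sort_keys=True),
         datetime.now(timezone.utc).isoformat()),
    )
    db.commit()
    return cur.lastrowid

def score_history(db, asset_id):
    """All runs for one asset, oldest first, as (run_id, framework, calibration, scores)."""
    rows = db.execute(
        "SELECT run_id, framework_version, calibration_version, scores "
        "FROM runs WHERE asset_id = ? ORDER BY run_id",
        (asset_id,),
    ).fetchall()
    return [(r[0], r[1], r[2], json.loads(r[3])) for r in rows]
```

Usage: open a connection with `sqlite3.connect("runs.db")`, execute `SCHEMA` once, then call `record_run` after every evaluation.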

Pipeline architecture

Three gates separate fast, cheap checks from expensive deep analysis. Each gate is a checkpoint: content that fails a gate never reaches the costlier stages behind it (a Gate 1 failure never reaches Gate 2, and nothing that fails any gate reaches deep analysis), which keeps costs low and feedback fast. Failed content gets immediate, specific feedback naming the failing dimension, without paying for downstream scoring.

  1. Gate 1: Technical specs
    Dimension 1

    Pass / fail on file specs, codec, resolution, audio packaging.

  2. Gate 2: Spatial quality
    Dimension 2

    Pass / fail on catastrophic spatial failures (severe artifacts, banding).

  3. Gate 3: Temporal & audio basics
    Dimensions 3 + 4 (parallel)

    Pass / fail on flicker, scene-cut sanity, audio levels, sync offset.

  4. Deep: Identity, lighting, brand, prompt adherence
    Dimensions 5–9

    Per-character analysis, scene integrity, client-compliance scoring.

  5. Output: Versioned scorecard

    Pass/fail summary, per-dimension detail, annotated frame thumbnails, timeline visualization, per-frame metric trends, client threshold reference.
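The gating logic reduces to a short-circuiting loop. This sketch assumes each gate is a list of check functions that return failure messages; the `Gate`, `Scorecard`, and `run_pipeline` names are hypothetical, and Gate 3's parallel temporal and audio checks are run sequentially here for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Check = Callable[[dict], List[str]]  # returns failure messages; an empty list means pass

@dataclass
class Gate:
    name: str
    checks: List[Check]

@dataclass
class Scorecard:
    passed: bool
    failed_gate: Optional[str] = None
    failures: List[str] = field(default_factory=list)
    gates_run: List[str] = field(default_factory=list)
    deep_scores: Dict[str, float] = field(default_factory=dict)

def run_pipeline(asset: dict, gates: List[Gate],
                 deep_analysis: Callable[[dict], Dict[str, float]]) -> Scorecard:
    card = Scorecard(passed=True)
    for gate in gates:
        card.gates_run.append(gate.name)
        failures = [msg for check in gate.checks for msg in check(asset)]
        if failures:
            # Short-circuit: later gates and the deep stage never run.
            card.passed = False
            card.failed_gate = gate.name
            card.failures = failures
            return card
    # Only content that cleared every gate reaches the expensive stage (Dimensions 5–9).
    card.deep_scores = deep_analysis(asset)
    return card
```

Because a failing gate returns immediately with its own messages, the scorecard identifies the failing dimension and the deep stage is never invoked for that asset.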

The nine dimensions

Each dimension owns a distinct axis of output quality. Build phases follow the dependency order: each phase produces infrastructure that later phases reuse, so no work is thrown away.

D01

Technical Delivery Compliance

Phase 1

File specs, codecs, container, color space, VMAF, audio packaging. The non-negotiable foundation.

  • Resolution / frame rate
  • Codec & container
  • VMAF score
  • Color space
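
A Gate 1 check can be driven entirely by `ffprobe` metadata. The sketch below assumes ffprobe is installed and that the client spec is a plain dictionary; the spec keys and the `check_delivery_spec` helper are hypothetical. VMAF is omitted because it needs a reference encode; FFmpeg's `libvmaf` filter is one common way to compute it.

```python
import json
import subprocess

def probe(path: str) -> dict:
    """Run ffprobe (assumed to be on PATH) and return its JSON description of the file."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-show_streams", "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

def parse_rate(rate: str) -> float:
    """ffprobe reports frame rates as 'num/den' strings such as '24000/1001'; '0/0' means unknown."""
    num, _, den = rate.partition("/")
    den_f = float(den) if den else 1.0
    return float(num) / den_f if den_f else 0.0

def check_delivery_spec(info: dict, spec: dict) -> list:
    """Compare parsed ffprobe output with a client spec; returns failure messages (empty = pass)."""
    streams = info.get("streams", [])
    video = [s for s in streams if s.get("codec_type") == "video"]
    audio = [s for s in streams if s.get("codec_type") == "audio"]
    if not video:
        return ["no video stream"]
    v, failures = video[0], []
    if v.get("codec_name") not in spec["video_codecs"]:
        failures.append(f"video codec {v.get('codec_name')} not in {spec['video_codecs']}")
    if (v.get("width"), v.get("height")) != tuple(spec["resolution"]):
        failures.append(f"resolution {v.get('width')}x{v.get('height')} does not match {spec['resolution']}")
    fps = parse_rate(v.get("r_frame_rate", "0/0"))
    if abs(fps - spec["fps"]) > 0.01:
        failures.append(f"frame rate {fps:.3f} does not match {spec['fps']}")
    if v.get("color_primaries") != spec["color_primaries"]:
        failures.append(f"color primaries {v.get('color_primaries')} != {spec['color_primaries']}")
    # format_name is a comma-separated list of demuxer names, e.g. "mov,mp4,m4a,3gp,3g2,mj2".
    container = info.get("format", {}).get("format_name", "")
    if spec["container"] not in container.split(","):
        failures.append(f"container {container!r} is not {spec['container']}")
    if spec.get("audio_required"):
        if not audio:
            failures.append("no audio stream")
        elif int(audio[0].get("sample_rate", 0)) != spec["sample_rate"]:
            failures.append(f"audio sample rate {audio[0].get('sample_rate')} != {spec['sample_rate']}")
    return failures
```

Usage: `check_delivery_spec(probe("clip.mp4"), client_spec)`. Keeping the probe and the comparison separate lets the comparison be tested without any media files.
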
D02

Spatial & Texture Integrity

Phase 2

Per-frame visual quality. Compression artifacts, texture noise, banding, VAE seam detection.

  • BRISQUE / NIQE
  • Laplacian sharpness
  • Color banding
  • Wavelet noise analysis
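
Laplacian sharpness is usually computed as the variance of a Laplacian-filtered frame: blurry or over-smoothed regions have few strong second derivatives, so the variance drops. This NumPy-only sketch uses the same 4-neighbour kernel that `cv2.Laplacian(gray, cv2.CV_64F).var()` applies by default, but skips border pixels instead of padding them; the tile size and any pass/fail threshold are assumptions to calibrate.

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels; low values suggest blur."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def sharpness_map(gray: np.ndarray, tile: int = 64) -> np.ndarray:
    """Per-tile Laplacian variance, so locally smeared regions stand out from sharp ones."""
    h, w = gray.shape
    return np.array([
        [laplacian_variance(gray[y:y + tile, x:x + tile])
         for x in range(0, w - tile + 1, tile)]
        for y in range(0, h - tile + 1, tile)
    ])
```

A per-tile map is more useful than a single frame score for generated content, where a sharp subject can sit next to a smeared background.
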
D03

Temporal Consistency & Motion

Phase 3

Stability across frames. Background flicker, optical flow consistency, scene-cut detection.

  • Background SSIM
  • Optical flow
  • Flicker detection
  • Scene cuts
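
Two of the temporal checks have simple algorithmic baselines. The sketch below assumes grayscale uint8 frames as NumPy arrays: a scene cut is flagged where the luma histogram changes abruptly between consecutive frames, and flicker is scored as the spread of frame-to-frame changes in mean luma over an optional background mask, so a steady fade scores near zero while alternating brightness scores high. Thresholds and bin counts are placeholders, and in practice flicker would be scored within shots, between detected cuts.

```python
import numpy as np

def luma_histograms(frames, bins=64):
    """Normalised luma histogram per frame (frames: iterable of 2-D uint8 arrays)."""
    hists = []
    for f in frames:
        h, _ = np.histogram(f, bins=bins, range=(0, 256))
        hists.append(h / max(h.sum(), 1))
    return np.array(hists)

def detect_scene_cuts(frames, threshold=0.5):
    """Indices i where frame i starts a new shot (histogram distance to frame i-1 > threshold)."""
    h = luma_histograms(frames)
    dist = 0.5 * np.abs(np.diff(h, axis=0)).sum(axis=1)  # half L1 distance, in [0, 1]
    return [int(i) + 1 for i in np.flatnonzero(dist > threshold)]

def flicker_score(frames, mask=None):
    """Std of frame-to-frame changes in mean luma, optionally over a boolean background mask."""
    means = np.array([f[mask].mean() if mask is not None else f.mean() for f in frames],
                     dtype=np.float64)
    return float(np.diff(means).std()) if len(means) > 2 else 0.0
```

Scoring the spread of the differences, rather than their size, keeps intentional fades and slow exposure ramps from being flagged as flicker.
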
D04

Audio Quality

Phase 4

Loudness, clipping, sync offset. Runs in parallel with the temporal pipeline.

  • LUFS measurement
  • Clipping detection
  • Sync offset
  • Spectral integrity
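
A sketch of two of the audio checks, assuming float samples in [-1, 1] and NumPy. Clipping is reported as runs of consecutive samples at full scale, and sync offset is estimated by cross-correlating an audio envelope (for example RMS per video frame) with a visual envelope such as per-frame mouth opening; both envelopes are hypothetical inputs here. Integrated loudness in LUFS needs ITU-R BS.1770 K-weighting and gating, so it is better taken from a dedicated meter (the `pyloudnorm` package is one option) than reimplemented.

```python
import numpy as np

def clipping_runs(samples, threshold=0.999, min_run=3):
    """(start, end) index ranges where >= min_run consecutive samples sit at or above threshold.
    Assumes float audio normalised to [-1, 1]."""
    hot = np.abs(np.asarray(samples, dtype=np.float64)) >= threshold
    runs, start = [], None
    for i, is_hot in enumerate(hot):
        if is_hot and start is None:
            start = i
        elif not is_hot and start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    if start is not None and len(hot) - start >= min_run:
        runs.append((start, len(hot)))
    return runs

def estimate_sync_offset(audio_env, video_env, rate, max_lag_s=0.5):
    """Lag in seconds that maximises the normalised cross-correlation of two envelopes
    sampled at the same rate (e.g. one value per video frame). Positive = audio lags video."""
    n = min(len(audio_env), len(video_env))
    a = np.asarray(audio_env[:n], dtype=np.float64)
    v = np.asarray(video_env[:n], dtype=np.float64)
    a = (a - a.mean()) / (a.std() + 1e-12)
    v = (v - v.mean()) / (v.std() + 1e-12)
    max_lag = min(int(max_lag_s * rate), n - 1)
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Positive lag compares audio sample t + lag with video sample t.
        x, y = (a[lag:], v[:n - lag]) if lag >= 0 else (a[:n + lag], v[-lag:])
        score = float(np.dot(x, y)) / len(x)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / rate
```

Dividing each correlation by its overlap length keeps large lags, which compare fewer samples, from being penalised simply for being shorter.
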
D05

Lip Sync Precision

Phase 5

Combines mouth aspect ratio (MAR) with audio phoneme timing, aligned via dynamic time warping (DTW).

  • MAR extraction
  • DTW alignment
  • WhisperX phonemes
  • Sync drift over time
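
The two building blocks can be sketched as follows. MAR is computed here from eight inner-lip landmarks in an assumed order (left corner, three top points left to right, right corner, three bottom points right to left); real landmark models index points differently, and other MAR variants normalise differently. The DTW function is the textbook dynamic-programming form over two 1-D sequences, for example a per-frame MAR curve and a per-frame speech-activity curve derived from WhisperX timings; that pairing, like the hypothetical `mean_lag` summary, is an assumption.

```python
import numpy as np

def mouth_aspect_ratio(pts):
    """Mean of three vertical lip gaps divided by mouth width (assumed 8-point ordering)."""
    p = np.asarray(pts, dtype=np.float64)
    vertical = (np.linalg.norm(p[1] - p[7]) + np.linalg.norm(p[2] - p[6])
                + np.linalg.norm(p[3] - p[5]))
    horizontal = np.linalg.norm(p[0] - p[4])
    return vertical / (3.0 * horizontal)

def dtw(x, y):
    """Classic O(n*m) dynamic time warping on 1-D sequences; returns (cost, path)."""
    x, y = np.asarray(x, dtype=np.float64), np.asarray(y, dtype=np.float64)
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from (n, m) to recover which x index matched which y index.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(D[n, m]), path[::-1]

def mean_lag(path):
    """Mean (y index minus x index) along the path; a trend in this over time suggests drift."""
    return float(np.mean([j - i for i, j in path]))
```

Computing the lag over sliding windows of the path, rather than once per clip, is what turns a single offset into a "sync drift over time" curve.
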
D06

Character & Identity Integrity

Phase 6

Face identity drift, hand failures, body proportions, teeth, clothing consistency.

  • InsightFace cosine similarity
  • Hand failure logging
  • Body proportions
  • Skin tone stability
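
Identity drift reduces to comparing each frame's face embedding against a reference. This sketch assumes embeddings are already extracted (for example the `normed_embedding` of faces returned by InsightFace's `FaceAnalysis`) and passed in as NumPy vectors; the 0.5 threshold is a placeholder that would need calibration per model and per client.

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    a, b = np.asarray(a, dtype=np.float64), np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identity_drift(reference, frame_embeddings, threshold=0.5) -> dict:
    """Similarity of each frame's face embedding to the reference, plus frames below threshold."""
    sims = [cosine_similarity(reference, e) for e in frame_embeddings]
    if not sims:
        return {"similarities": [], "min": None, "mean": None, "flagged_frames": []}
    return {
        "similarities": sims,
        "min": min(sims),
        "mean": float(np.mean(sims)),
        "flagged_frames": [i for i, s in enumerate(sims) if s < threshold],
    }
```

Reporting the minimum alongside the mean matters: a brief identity swap in a few frames barely moves the mean but is exactly what a reviewer needs to see.
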
D07

Lighting & Scene Integrity

Phase 7

Shadow coherence, luminance tracking, color temperature stability, reflection plausibility.

  • Shadow masking
  • Luminance per region
  • Color temperature drift
  • Reflection flagging
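
A rough color-temperature tracker, assuming frames are sRGB float arrays in [0, 1]: average the linearised pixels, convert to CIE xy chromaticity, and apply McCamy's approximation for correlated color temperature. Using the frame mean as the illuminant estimate is a gray-world assumption, and McCamy's formula is only reasonable near the Planckian locus, so treat the output as a drift signal rather than an absolute measurement.

```python
import numpy as np

# Linear sRGB (D65) to CIE XYZ.
SRGB_TO_XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                        [0.2126729, 0.7151522, 0.0721750],
                        [0.0193339, 0.1191920, 0.9503041]])

def srgb_to_linear(c):
    c = np.asarray(c, dtype=np.float64)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def frame_cct(rgb) -> float:
    """Approximate CCT in kelvin of a frame's mean colour (rgb in [0, 1], shape (..., 3))."""
    lin = srgb_to_linear(rgb).reshape(-1, 3).mean(axis=0)
    X, Y, Z = SRGB_TO_XYZ @ lin
    x, y = X / (X + Y + Z), Y / (X + Y + Z)
    n = (x - 0.3320) / (0.1858 - y)  # McCamy (1992)
    return float(449.0 * n**3 + 3525.0 * n**2 + 6823.3 * n + 5520.33)

def cct_drift(frames):
    """Range of per-frame CCT across a clip, plus the per-frame values."""
    ccts = [frame_cct(f) for f in frames]
    return max(ccts) - min(ccts), ccts
```

Running the same function on masked regions (for example background only) separates a lighting shift from a subject simply changing clothes.
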
D08

Brand & Client Compliance

Phase 8

Per-client palette, talent reference, logo placement, LUT comparison, typography.

  • Brand HEX Delta-E
  • Talent face match
  • LUT comparison
  • Logo / wordmark presence
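
Palette checks usually measure color difference in CIELAB. The sketch below converts sRGB HEX values to Lab (D65 white) and uses the simple CIE76 Euclidean ΔE for brevity; CIEDE2000 is the more perceptually uniform choice for production tolerances. The `nearest_brand_color` helper and any pass threshold are hypothetical and client-specific.

```python
import numpy as np

SRGB_TO_XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                        [0.2126729, 0.7151522, 0.0721750],
                        [0.0193339, 0.1191920, 0.9503041]])
WHITE_D65 = np.array([0.95047, 1.0, 1.08883])

def hex_to_rgb(h: str) -> np.ndarray:
    h = h.lstrip("#")
    return np.array([int(h[i:i + 2], 16) for i in (0, 2, 4)]) / 255.0

def srgb_to_lab(rgb) -> np.ndarray:
    c = np.asarray(rgb, dtype=np.float64)
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    t = (SRGB_TO_XYZ @ lin) / WHITE_D65
    d = 6 / 29
    f = np.where(t > d**3, np.cbrt(t), t / (3 * d**2) + 4 / 29)
    return np.array([116 * f[1] - 16, 500 * (f[0] - f[1]), 200 * (f[1] - f[2])])

def delta_e76(hex1: str, hex2: str) -> float:
    """CIE76 colour difference: Euclidean distance in CIELAB."""
    return float(np.linalg.norm(srgb_to_lab(hex_to_rgb(hex1)) - srgb_to_lab(hex_to_rgb(hex2))))

def nearest_brand_color(sample_hex: str, palette):
    """(ΔE, palette entry) for the closest brand colour to a sampled pixel colour."""
    return min((delta_e76(sample_hex, p), p) for p in palette)
```

Matching each sampled color to its nearest palette entry, then comparing that ΔE with the client's tolerance, avoids having to know in advance which brand color a region was meant to be.
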
D09

Prompt & Action Adherence

Phase 9

VLM-evaluated framing, composition, physics plausibility, object/spatial flagging.

  • VLM scene description
  • Framing & composition
  • Physics flags
  • Slideshow detection

Includes ai_evaluated scores; reported with mean + variance.
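Mean and variance require asking the VLM more than once. A minimal sketch, assuming a hypothetical `ask_vlm(frame_path, question)` callable that returns a numeric rating; how the model is served and prompted is out of scope here.

```python
import statistics
from typing import Callable

def ai_evaluated_score(ask_vlm: Callable[[str, str], float], frame_path: str,
                       question: str, n: int = 5) -> dict:
    """Ask the same question n times and report mean and sample variance."""
    if n < 2:
        raise ValueError("need at least two samples to estimate variance")
    ratings = [float(ask_vlm(frame_path, question)) for _ in range(n)]
    return {
        "kind": "ai_evaluated",
        "mean": statistics.fmean(ratings),
        "variance": statistics.variance(ratings),  # sample variance (n - 1 denominator)
        "n": n,
        "samples": ratings,
    }
```

Sample variance is used because a handful of ratings is a small sample of the model's output distribution; keeping the raw ratings lets a reviewer see whether disagreement is spread out or driven by one outlier.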

What this framework is not

  • Not a scoring rubric for taste, creativity, or commercial appeal. Aesthetic judgment remains human.
  • Not a model leaderboard. The framework benchmarks output properties; model comparisons are a separate activity built on top of the same infrastructure.
  • Not a SaaS dashboard. Phase 1 ships a local pipeline and a versioned scorecard format, not a hosted product.
  • Not a substitute for human QC on edge cases. The system is designed to scale review, not to replace the final sign-off on high-stakes deliverables.

Print-on-demand extension

A separate addendum extends the framework with print-specific quality metrics: CMYK gamut warnings, ink coverage limits, transparency edge fringing, design placement safety, and pre-generation input validation.

Read the POD addendum

Pilot engagements

Phase 1 (Technical Delivery) and Phase 1b (Identity Consistency) are in active build. We're scoping a small number of pilot engagements with production studios, agencies, and POD operators for the second half of 2026.

Get in touch