Per-Metric Severity: Four-Level Health Assessment for Your Mix

Learn how per-metric severity ratings transform raw measurement values into intuitive four-level health assessments through industry-standard thresholds.

Raw LUFS readings, crest factor values, and true peak measurements mean little without reference knowledge. An engineer checking their first master sees "−8.2 LUFS" and must consult charts, recall streaming guidelines, or compare against reference tracks to know whether that number signals a problem. Per-metric severity ratings eliminate this friction by converting raw measurements into immediate four-level health signals: Good, Caution, Warning, or Critical. Each rating is color-coded, standards-based, and requires no memorization.

What per-metric severity reveals (and why it matters)

Per-metric severity is an interpretive layer that sits between raw measurement data and user decision-making. Each of the five core metrics—integrated loudness (LUFS), true peak (dBTP), crest factor (dB), loudness range (LU), and stereo correlation—produces a numeric value. Per-metric severity takes that value and classifies it into one of four health levels through hardcoded threshold logic (Source: inputs/articles/per-metric-severity/brief.md#Core message).

The four levels are:

  • Good (green): The measurement falls within healthy commercial range. No action needed.
  • Caution (yellow/amber): The measurement sits outside ideal but does not yet present an urgent issue. Worth monitoring.
  • Warning (orange): The measurement indicates a real issue affecting streaming playback or commercial viability. Needs attention.
  • Critical (red): The measurement reveals a serious problem that must be fixed before distribution.

This system solves a specific problem: engineers should not need to memorize whether −12 LUFS is loud or quiet, whether a crest factor of 6 dB is competitive or problematic, or whether a true peak of −0.8 dBTP risks clipping on streaming platforms. The severity rating provides that context instantly.

Each metric card displays the raw value prominently alongside its severity badge, and the card's border and background color change to match the severity level. An engineer sees "−11.4 LUFS" with a green border and a "Competitive level" label and knows immediately that integrated loudness is healthy. Another mix shows "−5.1 LUFS" with a red border and "Critical: Too loud for streaming", and the issue is clear without consulting guidelines.

How per-metric severity works: technical methodology

Each metric has its own interpret* function containing hardcoded threshold checks. These functions receive a raw measurement value and return a severity level based on a series of boundary conditions. There is no machine learning, no scoring model, and no probabilistic weighting. The system runs pure threshold logic: if the value is above X, return Critical; if between X and Y, return Warning; if between Y and Z, return Caution; otherwise return Good (Source: inputs/articles/per-metric-severity/brief.md#Key accuracy requirements).
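The threshold cascade described above can be sketched in a few lines of Python. This is a minimal illustration, not the product's code: the function name `interpret_metric` and the boundary values `X`, `Y`, and `Z` are hypothetical placeholders, and a real interpret* function would hardcode its own boundaries rather than take them as arguments.

```python
def interpret_metric(value: float, x: float, y: float, z: float) -> str:
    """Classify a value against three descending boundaries (x > y > z).

    Hypothetical helper used only to show the shape of the logic.
    """
    if value > x:
        return "Critical"
    if value > y:
        return "Warning"
    if value > z:
        return "Caution"
    return "Good"


# Placeholder boundaries, chosen only for illustration.
X, Y, Z = 10.0, 7.0, 4.0

for reading in (12.0, 8.5, 5.0, 1.0):
    print(reading, interpret_metric(reading, X, Y, Z))
```

Because the function has no state and no randomness, calling it twice with the same arguments always returns the same level.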

The thresholds are not arbitrary. They derive from published industry standards: EBU R128 for integrated loudness, streaming platform guidelines for true peak headroom, engineering convention for crest factor and dynamic range, and phase correlation principles for stereo width. Where standards exist, the thresholds align with them. Where standards are less formal—such as crest factor ranges for genre-appropriate punch versus over-compression—the thresholds reflect commercial practice documented in mastering literature and streaming platform recommendations (Source: inputs/articles/per-metric-severity/brief.md#Page structure sections).

The implementation follows a deterministic model. The same input value always produces the same severity rating. There is no training data, no contextual adjustment based on genre or reference library, and no drift over time. If a mix measures −14.2 LUFS today, it will receive the same severity rating tomorrow and next year unless the threshold definitions themselves are updated to reflect changes in industry standards.

This design choice prioritizes transparency and auditability. An engineer can verify the rating by checking the raw value against documented thresholds. A student learning mix analysis can understand exactly why a measurement received a particular severity rating. A producer reviewing a mastering engineer's work can trace the health signal back to the specific threshold that triggered it.

Interpreting per-metric severity values and outputs

Each metric card displays five pieces of information:

  1. Raw value: The actual measurement (e.g., "−11.4 LUFS")
  2. Severity badge: A status label indicating health level (e.g., "Competitive level")
  3. Color-coded border and background: Visual signal matching the severity level
  4. Plain-English interpretation sentence: Brief explanation of what the value means
  5. Benchmark note: The target range or threshold for reference
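The five pieces of information on a card map naturally onto a small record type. The sketch below is one possible shape, assuming Python dataclasses; the names `MetricCard`, `build_card`, and `SEVERITY_COLORS`, as well as the badge wording in the example, are hypothetical and not taken from the product.

```python
from dataclasses import dataclass

# Colors follow the four severity levels described in this article.
SEVERITY_COLORS = {
    "Good": "green",
    "Caution": "yellow",
    "Warning": "orange",
    "Critical": "red",
}


@dataclass(frozen=True)
class MetricCard:
    raw_value: str       # 1. the measurement as displayed
    severity_badge: str  # 2. status label
    color: str           # 3. border and background color
    interpretation: str  # 4. plain-English sentence
    benchmark_note: str  # 5. target range for reference


def build_card(raw_value: str, severity: str, badge: str,
               interpretation: str, benchmark_note: str) -> MetricCard:
    """Assemble a card, deriving the color from the severity level."""
    return MetricCard(raw_value, badge, SEVERITY_COLORS[severity],
                      interpretation, benchmark_note)


card = build_card(
    raw_value="-0.8 dBTP",
    severity="Warning",
    badge="Warning: low headroom",  # hypothetical label text
    interpretation="Leaves insufficient headroom for encoding overhead.",
    benchmark_note="Target: below -1.0 dBTP",
)
print(card.color)  # orange
```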

The severity badge and color coding provide the fastest signal. An engineer scanning multiple mixes can identify problems at a glance by looking for orange or red borders. The interpretation sentence provides context for less experienced users or when a measurement sits near a threshold boundary. The benchmark note anchors the rating to a specific industry standard, making the system educational rather than opaque.

The thresholds differ by metric because each measurement represents a different aspect of mix health:

LUFS (integrated loudness)

  • Critical: Below −20 LUFS or above −6 LUFS
  • Warning: −20 to −16 LUFS, or −9 to −6 LUFS
  • Caution: −16 to −12 LUFS
  • Good: −12 to −9 LUFS

The Good range aligns with competitive streaming loudness without risking normalization penalties. The Critical range flags mixes that are too quiet for commercial playback (below −20 LUFS) or too loud for streaming platforms (above −6 LUFS), which will trigger aggressive normalization. The Caution range (−16 to −12 LUFS) catches mixes that are quieter than ideal but still usable, while the two Warning ranges flag mixes drifting toward either Critical extreme: too quiet (−20 to −16 LUFS) or too loud (−9 to −6 LUFS) (Source: inputs/articles/per-metric-severity/brief.md#Page structure sections).
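As a rough sketch, the LUFS ranges listed above translate into the following checks. The function name `interpret_lufs` is hypothetical. The listed ranges share their endpoints (−16, −12, and −9 each appear in two ranges), so this sketch assumes a shared endpoint resolves to the healthier level; the actual boundary handling is not documented here.

```python
def interpret_lufs(lufs: float) -> str:
    """Classify integrated loudness using the ranges listed in this article."""
    if lufs < -20.0 or lufs > -6.0:
        return "Critical"   # too quiet or too loud
    if lufs < -16.0 or lufs > -9.0:
        return "Warning"    # -20 to -16, or -9 to -6
    if lufs < -12.0:
        return "Caution"    # -16 to -12
    return "Good"           # -12 to -9 (shared endpoints assumed Good)


for reading in (-22.0, -18.0, -14.2, -10.5, -8.2, -5.0):
    print(reading, interpret_lufs(reading))
```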

True peak (dBTP)

  • Critical: Above −0.5 dBTP
  • Warning: −1.0 to −0.5 dBTP
  • Good: Below −1.0 dBTP

True peak estimates the highest level of the reconstructed analog waveform, including inter-sample peaks that fall between digital samples and can exceed the highest sample value. Streaming platforms and broadcast standards require headroom to prevent clipping during lossy encoding. Peaks above −0.5 dBTP risk distortion on most platforms (Critical), peaks between −1.0 and −0.5 leave insufficient headroom for encoding overhead (Warning), and peaks below −1.0 meet industry guidelines (Good).
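The true peak ranges reduce to two comparisons. A minimal sketch, with the hypothetical name `interpret_true_peak`; because Good is defined as strictly below −1.0 dBTP and Critical as strictly above −0.5 dBTP, both endpoints land in Warning.

```python
def interpret_true_peak(dbtp: float) -> str:
    """Classify a true peak reading using the ranges listed in this article."""
    if dbtp > -0.5:
        return "Critical"   # risks distortion on most platforms
    if dbtp >= -1.0:
        return "Warning"    # -1.0 to -0.5: too little encoding headroom
    return "Good"           # below -1.0


for reading in (-0.2, -0.8, -1.0, -1.5):
    print(reading, interpret_true_peak(reading))
```

Note that this metric has no Caution level in the ranges given.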

Crest factor (dB)

  • Critical: Below 4 dB
  • Warning: 4–8 dB
  • Good: 8–14 dB
  • Caution: Above 14 dB

Crest factor measures the difference between peak level and average (RMS) level, indicating dynamic range and compression intensity. Values below 4 dB signal severe over-compression that flattens transients and reduces impact (Critical). Values between 4 and 8 dB indicate heavy compression that may suit some modern commercial genres but risks listener fatigue (Warning). Values between 8 and 14 dB preserve punch and dynamics appropriate for most commercial music (Good). Values above 14 dB may indicate under-compression or excessive dynamic range that reduces perceived loudness (Caution).
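To make the measurement concrete, the sketch below computes a crest factor from raw samples and then classifies it. It assumes "average level" means RMS and that the input is not silent; the names `crest_factor_db` and `interpret_crest_factor` are hypothetical, and the shared endpoint at 8 dB is assumed to count as Good.

```python
import math
from typing import Sequence


def crest_factor_db(samples: Sequence[float]) -> float:
    """Peak-to-RMS ratio in dB (assumes a non-silent signal)."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)


def interpret_crest_factor(db: float) -> str:
    """Classify a crest factor using the ranges listed in this article."""
    if db < 4.0:
        return "Critical"   # severe over-compression
    if db < 8.0:
        return "Warning"    # heavy compression
    if db <= 14.0:
        return "Good"       # 8 to 14 dB (8 assumed Good)
    return "Caution"        # above 14 dB


# One full cycle of a sine wave has a crest factor of about 3 dB.
sine = [math.sin(2 * math.pi * k / 100) for k in range(100)]
cf = crest_factor_db(sine)
print(round(cf, 2), interpret_crest_factor(cf))
```

A pure sine wave landing in Critical is expected: real program material has transients that push the peak well above the RMS level.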

Loudness range (LU)

  • Critical: Below 2 LU
  • Caution: 2–5 LU
  • Good: 5–12 LU
  • Caution: Above 12 LU

Loudness range measures variation in loudness over the track duration. Ranges below 2 LU indicate extreme dynamic compression that removes expression (Critical). Ranges between 2 and 5 LU are tight but may suit high-energy genres where consistency is prioritized (Caution). Ranges between 5 and 12 LU preserve dynamic variation typical of well-mastered commercial tracks (Good). Ranges above 12 LU may indicate inconsistent loudness that disrupts playlist cohesion (Caution).
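The loudness range bands follow the same pattern, with Caution on both sides of the Good band. A minimal sketch under the hypothetical name `interpret_loudness_range`; the shared endpoint at 5 LU is assumed to count as Good.

```python
def interpret_loudness_range(lu: float) -> str:
    """Classify loudness range using the ranges listed in this article."""
    if lu < 2.0:
        return "Critical"   # extreme dynamic compression
    if lu < 5.0:
        return "Caution"    # tight, may suit high-energy genres
    if lu <= 12.0:
        return "Good"       # 5 to 12 LU (5 assumed Good)
    return "Caution"        # above 12 LU: possibly inconsistent loudness


for reading in (1.5, 3.0, 8.0, 15.0):
    print(reading, interpret_loudness_range(reading))
```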

Stereo correlation

  • Critical: Below 0
  • Warning: 0–0.4 or above 0.92
  • Good: 0.4–0.75
  • Caution: 0.75–0.92

Stereo correlation measures phase relationship between left and right channels. Values below 0 indicate phase cancellation that causes mono compatibility issues and weak low-end (Critical). Values between 0 and 0.4 show excessive stereo width that weakens center image and risks translation problems (Warning). Values between 0.4 and 0.75 balance width and solidity for commercial playback (Good). Values between 0.75 and 0.92 indicate narrow stereo imaging that may lack space (Caution). Values above 0.92 approach mono and lose stereo information (Warning).
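One common way to compute a correlation value is the normalized zero-lag cross-correlation of the two channels, sketched below together with the classification. Whether the product measures correlation this way is an assumption, as are the names `stereo_correlation` and `interpret_stereo_correlation`, the choice to return 0.0 for silent input, and the treatment of the shared endpoints 0.4 and 0.75 as Good.

```python
import math
from typing import Sequence


def stereo_correlation(left: Sequence[float], right: Sequence[float]) -> float:
    """Normalized cross-correlation of L and R at zero lag, in [-1, 1]."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den else 0.0  # silent input: assumed 0.0


def interpret_stereo_correlation(corr: float) -> str:
    """Classify correlation using the ranges listed in this article."""
    if corr < 0.0:
        return "Critical"   # phase cancellation
    if corr < 0.4 or corr > 0.92:
        return "Warning"    # too wide, or close to mono
    if corr <= 0.75:
        return "Good"       # 0.4 to 0.75
    return "Caution"        # 0.75 to 0.92: narrow image


left = [0.5, -0.25, 0.75, -1.0]
inverted = [-s for s in left]
print(interpret_stereo_correlation(stereo_correlation(left, left)))      # Warning
print(interpret_stereo_correlation(stereo_correlation(left, inverted)))  # Critical
```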

These thresholds provide quick-reference boundaries, but the full metric documentation offers deeper context on why each range matters and how to adjust a mix when a measurement falls outside the Good range.

How per-metric severity integrates with other systems

Per-metric severity ratings function as independent health signals, but they inform two downstream systems: the overall quality tier calculation and the AI coaching pipeline (Source: inputs/articles/per-metric-severity/brief.md#Page structure sections).

The quality tier is a composite score that classifies a mix into one of several production quality categories (e.g., Demo, Rough Mix, Competition-Ready). Four of the five metrics directly influence the tier calculation: integrated loudness, true peak, crest factor, and stereo correlation. Each metric's severity rating contributes to the tier determination. A mix with multiple Critical or Warning ratings will receive a lower tier classification than a mix with all Good ratings. Loudness range informs AI coaching recommendations but does not affect the tier score directly, because dynamic range choices vary significantly by genre and artistic intent.
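The split between tier inputs and coaching-only inputs can be sketched as a simple selection step. The tier formula itself is not described in this article, so the sketch stops at choosing the four contributing metrics and counting their non-Good ratings; the metric keys and function names are hypothetical.

```python
from typing import Dict

# The four metrics the article says feed the tier calculation.
# Loudness range is deliberately absent.
TIER_METRICS = ("integrated_loudness", "true_peak",
                "crest_factor", "stereo_correlation")


def tier_inputs(ratings: Dict[str, str]) -> Dict[str, str]:
    """Keep only the ratings that contribute to the quality tier."""
    return {name: ratings[name] for name in TIER_METRICS}


def count_problem_ratings(ratings: Dict[str, str]) -> int:
    """Count Warning and Critical ratings among the tier inputs."""
    return sum(1 for level in tier_inputs(ratings).values()
               if level in ("Warning", "Critical"))


ratings = {
    "integrated_loudness": "Good",
    "true_peak": "Warning",
    "crest_factor": "Critical",
    "loudness_range": "Critical",   # ignored by the tier step
    "stereo_correlation": "Good",
}
print(count_problem_ratings(ratings))  # 2
```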

The AI coaching system uses per-metric severity ratings as diagnostic flags. When a metric receives a Warning or Critical rating, the coaching pipeline generates targeted recommendations explaining why the value is problematic and suggesting specific adjustments. For example, a Critical crest factor triggers advice on parallel compression or transient shaping, while a Warning true peak triggers guidance on limiting headroom and inter-sample peak management.
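The flagging step described above amounts to filtering for Warning and Critical ratings. The sketch below is illustrative only: `coaching_flags`, the metric keys, and the advice strings are hypothetical, with the two example entries paraphrasing the cases mentioned in this article.

```python
from typing import Dict, List, Tuple

COACHING_LEVELS = {"Warning", "Critical"}

# Example advice for the two cases mentioned in the article.
ADVICE = {
    ("crest_factor", "Critical"):
        "Consider parallel compression or transient shaping.",
    ("true_peak", "Warning"):
        "Review limiter headroom and inter-sample peaks.",
}


def coaching_flags(ratings: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (metric, severity, advice) for every flagged metric."""
    flags = []
    for metric, severity in ratings.items():
        if severity in COACHING_LEVELS:
            advice = ADVICE.get((metric, severity),
                                "No example advice in this sketch.")
            flags.append((metric, severity, advice))
    return flags


for flag in coaching_flags({
    "integrated_loudness": "Good",
    "true_peak": "Warning",
    "crest_factor": "Critical",
    "loudness_range": "Caution",
}):
    print(flag)
```

Good and Caution ratings produce no flags in this sketch, matching the description that coaching is triggered by Warning or Critical ratings.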

This integration allows per-metric severity to act as both a standalone interpretive tool and a foundational input for higher-level analysis. An engineer can understand a single metric's health at a glance, or use the severity ratings as entry points into deeper diagnostic workflows.

Practical application and workflow

Per-metric severity ratings fit naturally into three common workflows:

During mixing: Engineers can check severity ratings on individual metric cards as they adjust levels, EQ, and dynamics. A quick glance confirms whether recent changes have pushed a measurement into Warning or Critical territory. This provides real-time feedback without interrupting the creative process. If crest factor drops into the Warning range after adding a limiter, the engineer knows immediately that compression is becoming aggressive and can pull back or adjust attack/release settings.

During mastering: Mastering engineers use severity ratings to verify that a mix meets commercial standards before applying final processing. A mix arriving with Critical integrated loudness or true peak requires correction before mastering can proceed. A mix with Caution ratings in loudness range or stereo correlation may benefit from targeted adjustments but does not block the mastering chain.

During quality review: Producers and mix engineers reviewing multiple mixes can scan severity ratings to prioritize which tracks need attention. A mix with all Good ratings moves forward. A mix with one or two Caution ratings gets flagged for a closer listen. A mix with Warning or Critical ratings returns to the mixing stage with specific issues identified.
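The quality review rules above can be written as a short triage function. This is a sketch with hypothetical names and return strings; it assumes any number of Caution ratings (not only one or two) leads to a closer listen, as long as no Warning or Critical rating is present.

```python
from typing import Dict


def triage(ratings: Dict[str, str]) -> str:
    """Decide the next step for a mix from its per-metric ratings."""
    levels = set(ratings.values())
    if levels & {"Warning", "Critical"}:
        return "return to mixing"
    if "Caution" in levels:
        return "closer listen"
    return "move forward"


print(triage({"true_peak": "Good", "crest_factor": "Good"}))
print(triage({"true_peak": "Good", "loudness_range": "Caution"}))
print(triage({"true_peak": "Critical", "loudness_range": "Caution"}))
```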

The color-coded visual design supports these workflows by making health status scannable. An engineer reviewing ten mixes in a session can identify problems by looking for orange or red borders rather than reading numeric values or consulting threshold charts.

What is per-metric severity?

Per-metric severity is a four-level classification system (Good, Caution, Warning, Critical) that interprets raw metric values through hardcoded industry-standard thresholds, providing immediate visual health signals for integrated loudness, true peak, crest factor, loudness range, and stereo correlation.

How does per-metric severity work?

Each metric has an interpret* function containing hardcoded threshold checks based on EBU R128, streaming platform guidelines, and engineering convention. The function receives a raw measurement value and returns a severity level through deterministic logic with no machine learning or probabilistic weighting.

How do I interpret per-metric severity outputs?

Each metric card displays a raw value, severity badge, color-coded border (green for Good, yellow for Caution, orange for Warning, red for Critical), plain-English interpretation, and benchmark note. Scan for color-coded borders to identify issues quickly, then read the interpretation sentence and benchmark for context.

Summary and key takeaways

Per-metric severity ratings transform raw measurements into actionable health signals through hardcoded threshold logic anchored to industry standards. The four-level system—Good, Caution, Warning, Critical—provides immediate visual feedback without requiring engineers to memorize target ranges or consult reference charts (Source: inputs/articles/per-metric-severity/brief.md#Core message).

Each metric has its own thresholds, reflecting published standards and commercial practice. Integrated loudness thresholds align with streaming normalization guidelines. True peak thresholds match encoding headroom requirements. Crest factor, loudness range, and stereo correlation thresholds reflect mastering convention and genre-appropriate dynamics.

The ratings function as standalone interpretive tools and as inputs to downstream systems including quality tier classification and AI coaching recommendations. They integrate into mixing, mastering, and quality review workflows by making health status scannable through color-coded visual signals.

Key threshold values to remember: LUFS Good range is −12 to −9, true peak must stay below −1.0 dBTP to avoid Warning, crest factor should sit between 8 and 14 dB for most commercial music, loudness range between 5 and 12 LU preserves dynamic expression, and stereo correlation between 0.4 and 0.75 balances width and solidity.