In an extended review of the new CLSI EP21 standard, Dr. Paulo Pereira discusses how this guideline reaffirms total analytical error as the primary principle for quantitative performance assessment.
Clinical stakeholders do not experience “bias” and “imprecision” as separate phenomena - they experience a result that is either accurate enough for a medical decision - or not. EP21 is built around that decision-facing reality by operationalizing total analytical error (TAE) as the integrated expression of analytical deviation for a quantitative measurement procedure (1) and by linking the evaluation to predefined acceptability criteria (2) (see Guest Essay: New edition of CLSI EP46: best practices for determining allowable total error).
This framing is not “philosophy”; it is the practical translation of the two classical analytical pillars - systematic effects and random effects - into an actionable performance conclusion. EP21 therefore functions as a synthesis standard that allows laboratories, manufacturers, and assessors to answer the question that matters most: does this procedure meet clinical needs across the measuring interval?
EP21’s integrated view is fully consistent with metrology’s core vocabulary. In the International Vocabulary of Metrology (VIM), measurement precision is defined as closeness of agreement among indications or measured quantity values obtained by replicate measurements under specified conditions, and measurement bias is an estimate of systematic measurement error (3). These concepts remain the foundation for method validation and verification in laboratory medicine, and they map directly onto the way performance evidence is generated through established CLSI protocols for precision evaluation, verification, and method comparison (4-6).
EP21’s “reaffirmation” is therefore not a return to earlier paradigms; it formalizes the need to translate bias and imprecision into an integrated, decision-relevant statement of analytical acceptability.
A key (and sometimes underappreciated) point in CLSI EP21 (3rd ed.) is that its recommended evaluation of TAE is naturally implemented as a non-parametric (distribution-free) estimate derived directly from the empirical distribution of paired patient-sample differences between a candidate and a comparator measurement procedure (7). In EP21, TAE is commonly defined as the central 95% region of observed differences, operationalized via the 2.5th and 97.5th percentiles (P_low = 0.025; P_high = 0.975) (1), and then judged against a pre-set allowable total error (ATE) goal/limit (2).
EP21 non-parametric model (percentile-based)
For each unique patient sample i (i = 1…n), compute the paired difference:
d_i = y_cand,i − ȳ_comp,i
where ȳ_comp,i is the mean of R replicate comparator results for sample i (EP21 explicitly allows R to depend on the imprecision ratio between candidate and comparator).
Then define the TAE interval estimate for a chosen central region (most often 95%):
TAE_lower = Q_0.025(d), TAE_upper = Q_0.975(d)
EP21 notes that TAE is often reported either as this interval (TAE_lower, TAE_upper) or as a single conservative magnitude such as:
TAE = max(|TAE_lower|, |TAE_upper|)
and emphasizes that, by design, the EP21 protocol does not aim to separately estimate bias and imprecision from this experiment - it estimates their combined effect as it manifests in patient-sample result differences (while bias/precision can still be evaluated independently in separate studies).
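The percentile-based estimate described above can be sketched in a few lines of Python. This is an illustration only: the function names, the nearest-rank quantile convention, and the synthetic data are assumptions, not EP21 text; EP21 specifies its own ranking and study-design details.

```python
import math

def empirical_quantile(values, p):
    """Nearest-rank empirical quantile: the smallest observed value whose
    cumulative fraction is at least p (a simple distribution-free
    convention; EP21 defines its own ranking rules)."""
    s = sorted(values)
    k = max(1, math.ceil(p * len(s)))  # 1-based rank
    return s[k - 1]

def tae_estimate(cand, comp_replicates, p_low=0.025, p_high=0.975):
    """Distribution-free TAE estimate from paired patient-sample differences.

    cand            -- candidate result for each unique sample i
    comp_replicates -- list of R replicate comparator results per sample;
                       the replicate mean plays the role of y_comp,i
    Returns (TAE_lower, TAE_upper, conservative magnitude)."""
    diffs = [y - sum(reps) / len(reps)
             for y, reps in zip(cand, comp_replicates)]
    lo = empirical_quantile(diffs, p_low)
    hi = empirical_quantile(diffs, p_high)
    return lo, hi, max(abs(lo), abs(hi))

# Illustrative call with tiny synthetic data; a real EP21 study requires
# far more unique patient samples across the measuring interval.
lo, hi, mag = tae_estimate([10.2, 9.8, 10.5, 10.0],
                           [[10.0, 10.0]] * 4)
```

Note that the function returns both reporting forms EP21 mentions: the interval itself and the conservative single magnitude max(|TAE_lower|, |TAE_upper|).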
EP21 links the minimum n needed for a non-parametric estimate of the central (1-α)⋅100% region of differences to a simple bound:
n ≥ 2/α
So, for the central 95% region (α = 0.05), n ≥ 40 is the minimum; for the central 99% region (α = 0.01), n ≥ 200. EP21 also makes clear that percentile estimates based on small n (e.g., 40) are more sensitive to extreme observed values, and therefore provides practical recommendations for typical studies (e.g., 120 unique patient samples across the analytical measuring interval (AMI), and per-subinterval minima such as 60 for 2 subintervals or 40 for ≥3 subintervals). This “minimum/robust” framing is one reason EP21 is implementable under manufacturer constraints while remaining clinically anchored.
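The sample-size bound can be checked directly; a one-line sketch (the function name is an illustrative assumption):

```python
import math

def minimum_n(alpha):
    """Smallest n satisfying n >= 2/alpha, the simple lower bound EP21
    links to a non-parametric estimate of the central (1 - alpha)*100%
    region of differences."""
    return math.ceil(2 / alpha)

# Central 95% region (alpha = 0.05) -> n >= 40
# Central 99% region (alpha = 0.01) -> n >= 200
```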
Many laboratorians and manufacturers are familiar with the classical parametric total-error formulation that models the analytical error distribution using separately estimated components - typically bias (systematic error) and within-laboratory SD (random error) - and then derives an error limit under normality assumptions (8).
EP46 summarizes the historical Westgard approach as an error interval driven by bias and imprecision, with a chosen normal-distribution multiplier z (e.g., 1.96 for 95% two-sided; 1.65 for 95% one-sided) (2):
(|Bias| − z·SD_WL, |Bias| + z·SD_WL)
In practice (e.g., with positive bias), the “worst-case” magnitude for the central 95% is often taken as:
TAE ≈ |Bias| + z·SD_WL
with analogous handling for negative bias. EP46 also explicitly notes key limitations of this parametric model in real use, e.g., it assumes normal errors and may not reflect additional real-world error sources such as rare gross outliers, interferences, drift, lot effects, and other contributors.
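The parametric worst-case magnitude reduces to a one-line computation; a minimal sketch, assuming the symbol names (bias, within-laboratory SD, z) used above:

```python
def parametric_tae(bias, sd_wl, z=1.96):
    """Classical parametric worst-case total error under normality:
    |bias| + z * within-laboratory SD (z = 1.96 for a central 95%
    two-sided region)."""
    return abs(bias) + z * sd_wl

# e.g., bias = 1.0 unit, SD_WL = 2.0 units -> TAE of about 4.92 units
```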
EP21’s percentile-based method is different in the statistics (non-parametric, distribution-free) but identical in the governing principle: bound the combined analytical error with high probability and judge it against an allowable limit.
In other words: EP21 does not contradict the Westgard logic. It preserves the same clinical question (“How large can the analytical error be, with high probability, under intended-use conditions?”) while offering a robust estimation route that avoids imposing a parametric form when the empirical distribution of patient-sample differences does not behave ideally across the measuring interval.
EP21 explicitly provides a “minimum” approach alongside more robust options, which is one reason it is so implementable across stakeholders. Practicality is not a minor feature: IVD-MD manufacturers often face feasibility constraints (specimen availability, timelines, multi-site logistics, and iteration during development). The minimum design guidance - while still anchored in patient specimens and measuring-interval coverage - makes EP21 realistic to apply under manufacturer conditions, without requiring complex distributional modeling or very large datasets.
That feasibility advantage is amplified when EP21 is used as part of a structured CLSI evidence chain: precision characterization using EP05/EP15 (4,6), method comparison and bias considerations per EP09 (5) consistent with EP32 (9), then integrated into a TAE conclusion per EP21.
TAE is only actionable when compared with an allowable specification. EP46 is designed to support the determination of allowable total error goals and limits for quantitative measurement procedures, providing the conceptual and practical basis for defining ATE/TE goals that are fit for purpose. EP21 then provides the evaluation pathway to estimate TAE and compare it to those goals/limits.
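The acceptability comparison itself is a simple decision rule; a sketch under the assumption of a symmetric ATE limit (the function name is illustrative):

```python
def meets_ate(tae_lower, tae_upper, ate):
    """EP21-style acceptability decision: the estimated TAE interval must
    lie within the predefined allowable total error; equivalently, the
    conservative magnitude max(|lower|, |upper|) must not exceed ATE."""
    return max(abs(tae_lower), abs(tae_upper)) <= ate
```

Asymmetric ATE limits would require comparing each percentile against its own bound, but the governing logic is the same.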
This pairing is strategically important for all stakeholders.
Even where EP21 is not directly referenced in regulation, its analytical logic aligns closely with what regulators expect manufacturers to demonstrate, document, and communicate about quantitative test performance.
Implication for global manufacturers: EP21 can function as a “best-practice backbone” across jurisdictions - compatible with the analytical-performance vocabulary embedded in EU regulation and simultaneously aligned with US regulatory expectations for performance characterization and communication - while offering a standardized, stakeholder-legible method to judge whether quantitative performance is acceptable for intended use.
EP21 is most persuasive when it is presented not as a standalone “total error slogan,” but as the decision layer built on established evidence components: precision characterization, method comparison, and bias assessment.
This is precisely why EP21 is a strong tool for manufacturers and agencies: it converts a set of technical performance descriptors into a decision-relevant performance statement that can be communicated and defended.
Measurement uncertainty (MU) is indispensable to measurement science, but it answers a different primary question than EP21.
EP21/TAE is a performance assessment framework: it is fundamentally about deciding whether a procedure is fit for intended clinical purpose by integrating the effects of systematic and random error and comparing them to allowable limits. MU is fundamentally a result property: it describes the dispersion around a reported value under defined conditions and supports result interpretation and decision-making, particularly near clinical decision levels.
So, MU and EP21 do not compete. A coherent, scientifically sound picture uses EP21/TAE to judge whether a procedure is fit for its intended clinical purpose, and MU to characterize the dispersion of each reported result.
EP21 reaffirms that TAE is the primary principle for quantitative performance assessment because it expresses what stakeholders need to know: the combined effect of bias and imprecision on results, judged against allowable limits, across the measuring interval.
Its recommended non-parametric, percentile-based implementation provides a robust statistical route to the same core principle long used in parametric total-error practice - while often being simpler to apply and more resilient to distributional irregularities. The inclusion of a feasible “minimum” approach makes EP21 particularly actionable in IVD-MD manufacturer settings, where study logistics and development cycles demand practicable yet defensible designs.
Measurement uncertainty remains essential in ISO 15189 frameworks - but as a property of results and as an input to decision rules in conformity and compliance, not as a replacement for EP21’s non-parametric method-performance framework or the Westgardian parametric acceptability framework.