QC Design
Assessing an instrument by allowable u:Rw, MAU, and Sigma-metrics
Now that new performance specifications have been set for measurement uncertainty, how do instruments in the real world stack up? A well-designed measurement uncertainty (MU) study of a Roche cobas 6000 in Turkey provides a unique opportunity to benchmark.
Measuring uncertainty: Benchmarking a cobas 6000 in Turkey using u:Rw, MAU, and Sigma-metrics.
Sten Westgard, MS
November 2024
2024 will go down as a major milestone for measurement uncertainty. New specifications for performance have been issued, broken down into allowable uncertainties for reference, calibration, and laboratory imprecision. Now it falls to laboratories to figure out whether these specifications are realistic, impossible, or somewhere in between.
One major drawback for benchmarking measurement uncertainty is that the vast majority of laboratories do it wrong. They estimate measurement uncertainty using their intermediate reproducibility (a long-term CV) from their controls. While this is convenient, since labs are already running those controls, Panteghini et al have insisted that such estimates are not correct.
Luckily for us, there is a study that did its best to estimate laboratory MU, even if it did not generate the preferred "top-down" estimates:
Evaluation of Measurement Uncertainties of Immunoassays Analytes According to Biological Variation Databases. Muhammad F Kilinckaya, Turan Turhan, Clin. Lab 2022;68:1784-1791.
This study took place at the Ankara Numune Training and Research Hospital in Ankara, Turkey. It measured immunoassay performance on a Roche cobas 6000 modular series analyzer. IQC data was collected from the Roche-provided controls from August 2017 to January 2018. EQA data was also collected from the RIQAS EQA program from June 2017 to January 2018 to provide bias estimates.
This study is further distinguished by the fact that it converted bias into a form that could be combined with imprecision. It calculated the root-mean-square bias and combined that with u:Rw to produce a combined standard uncertainty. Double that, and you have the expanded uncertainty (U), which can be compared against the MAU benchmarks listed in the EFLM biological variation database.
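As I read it, the study's approach can be sketched in a few lines; the function names and the coverage factor k=2 reflect the standard GUM convention and are my own framing, not code from the paper, and the illustrative numbers below are not taken from the study's tables:

```python
import math

def combined_standard_uncertainty(u_rw, rms_bias):
    """Combine within-lab imprecision (u:Rw) with RMS bias in quadrature (all in %)."""
    return math.sqrt(u_rw**2 + rms_bias**2)

def expanded_uncertainty(u_rw, rms_bias, k=2):
    """Expanded uncertainty U = k * u_c, with the conventional coverage factor k=2."""
    return k * combined_standard_uncertainty(u_rw, rms_bias)

# Illustrative values only: u:Rw = 3%, RMS bias = 4%  ->  u_c = 5%, U = 10%
print(expanded_uncertainty(3.0, 4.0))  # 10.0
```

Note that the RMS bias in the study was derived from the monthly EQA results, so the U values in the table below cannot be reproduced from the single bias estimate alone.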
All that said, here is the data from the study combined with the various uncertainty benchmarks:
Analyte | u:Rw | allowable u:Rw (min) | allowable u:Rw (des) | U | MAU (min) | MAU (des) | Verdict |
AFP | 2.71 | -- | -- | 14.78 | 6.9 | 4.6 | fails MAU |
CA 125 | 2.3 | -- | -- | 7.94 | 13 | 8.7 | passes MAU |
CA 15.3 | 2.74 | -- | -- | 11.29 | -- | -- | nothing to benchmark against |
CA 19.9 | 2.23 | -- | -- | 8.54 | 6.4 | 4.3 | fails MAU |
CEA | 2.08 | -- | -- | 8.4 | 10.2 | 6.8 | passes MAU |
Cortisol | 2.24 | -- | -- | 8.67 | 24.2 | 16.1 | passes MAU |
Ferritin | 1.96 | 3.24 | 2.16 | 8.26 | 19.4 | 12.9 | passes both u:Rw and MAU |
Folate | 3.69 | 6.00 | 4.00 | 10.11 | 16.4 | 10.9 | passes both u:Rw and MAU |
FSH | 1.64 | -- | -- | 7.39 | 14.3 | 9.5 | passes MAU |
Insulin | 2.28 | -- | -- | 19.11 | 38.1 | 25.4 | passes MAU |
LH | 1.76 | -- | -- | 5.46 | 37.8 | 25.2 | passes MAU |
Estradiol | 2.44 | -- | -- | 6.93 | 22.5 | 15 | passes MAU |
fPSA | 1.61 | -- | -- | 6.95 | 10.6 | 7.1 | passes MAU |
tPSA | 1.55 | 2.55 | 1.7 | 11.98 | 10.2 | 6.8 | passes both u:Rw goals, fails MAU |
PTH | 2.23 | 5.9 | 3.925 | 27.47 | 22 | 14.7 | passes both u:Rw goals, fails MAU |
fT3 | 1.35 | 1.765 | 1.175 | 7.18 | 7.6 | 5.1 | passes u:Rw min, passes MAU |
fT4 | 1.15 | 2.1 | 1.4 | 9.2 | 7.2 | 4.8 | passes both u:Rw goals, fails MAU |
Testosterone | 2.17 | -- | -- | 9.67 | 21.8 | 14.5 | passes MAU |
TSH | 1.74 | 2.17 | 1.45 | 9.68 | 26.8 | 17.9 | passes u:Rw min, passes MAU |
One quickly notices that the new measurement uncertainty specifications for u:Rw don't exist for most of these immunoassays, making a true test out of reach. However, MAU specifications exist for almost all of the analytes, and we can see that MAU is a relatively easy bar to pass: of the 18 analytes with an MAU benchmark, 13 passed and only 5 failed. Notice that in the few assays where we have both a u:Rw specification and an MAU specification, there are some discrepancies. Sometimes an assay will pass u:Rw but fail MAU; this is where the impact of bias is being felt. u:Rw does not take bias into account; in this study's calculation, bias enters through U (although it is arguable whether this is the correct way to account for bias).
From this we can speculate that MAU is the more forgiving uncertainty benchmark, except in the case of significant bias. We must also realize that u:Rw specifications are going to be less demanding on immunoassays than they are on biochemistry assays, particularly electrolytes.
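The verdict column above can be reproduced with a simple comparison of U against the MAU tiers. This classifier is my own sketch; the distinction between "desirable" and "minimum only" passes is an assumption on my part, since the table above mostly reports a simple pass/fail:

```python
def mau_verdict(U, mau_min=None, mau_des=None):
    """Classify expanded uncertainty U against MAU minimum/desirable goals (all in %)."""
    if mau_min is None and mau_des is None:
        return "nothing to benchmark against"
    if mau_des is not None and U <= mau_des:
        return "passes MAU (desirable)"
    if mau_min is not None and U <= mau_min:
        return "passes MAU (minimum only)"
    return "fails MAU"

# CA 125 from the table: U = 7.94 against MAU min 13, MAU des 8.7
print(mau_verdict(7.94, 13, 8.7))    # passes MAU (desirable)
# AFP from the table: U = 14.78 against MAU min 6.9, MAU des 4.6
print(mau_verdict(14.78, 6.9, 4.6))  # fails MAU
```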
The study also provided CV data from two levels, as well as an EQA-calculated bias, so we have some other ways to assess the performance. If we use the EFLM minimum TEa specifications, we can generate the following Normalized Method Decision chart:
The verdict of analytical Sigma-metrics using the minimum EFLM goals is much more approving. Over 70% of the QC levels measured are 4, 5, and 6 Sigma, with the bulk of that in the Six Sigma bull's-eye. Of course there are some unhappy analytes in the 2 Sigma and less-than-2 Sigma zones, which would indicate major effort needed. The biggest problem is CA 15.3, which has no other MU goals to compare against. Then fT3, fT4, folate, and AFP round out the most concerning assays. That means fT3 and fT4 are judged more harshly by Sigma-metrics than by u:Rw and MAU.
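For reference, the Sigma-metrics behind these charts follow the standard formula, Sigma = (TEa − |bias|) / CV, with everything expressed in percent. The TEa value in this sketch is a hypothetical 20%, chosen purely for illustration, not an EFLM or CLIA goal:

```python
def sigma_metric(tea, bias, cv):
    """Analytical Sigma-metric: (TEa - |bias|) / CV, all in percent units."""
    return (tea - abs(bias)) / cv

# TSH level 1 from the raw data later in this post (bias 3.6%, CV 3.72%),
# with a hypothetical TEa of 20% purely for illustration:
print(round(sigma_metric(20.0, 3.6, 3.72), 2))  # 4.41
```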
If we benchmark against the new CLIA 2025 goals, we see a more stringent verdict.
The number of assays in the bull's-eye drops dramatically; the majority of performance is down in the 4 Sigma zone. Note there are fewer assays benchmarked here because CLIA 2025 doesn't cover all the immunoassays in the study (CA 15.3, for example, gets neglected again). Unlike the EFLM minimum assessment, here about one-third of the performance is 3 Sigma or lower.
Five or ten years ago, if you had told someone that CLIA goals would end up being the most difficult targets to hit, they would have been incredulous. But this is how goals have evolved. EFLM switched away from desirable to minimum goals, CLIA got tighter, and measurement uncertainty generated a set of specifications easy to hit, and another set of specifications that are not so easy to hit.
Imprecision and Bias data
In case you want to make your own calculations using some other set of specifications, here are the "raw" CV and bias data, with two QC levels per analyte:
Analyte | Bias (%) | CV (%) |
AFP | 6.1 | 5.48 |
AFP | 6.1 | 5.36 |
CA 125 | 2.5 | 4.59 |
CA 125 | 2.5 | 4.60 |
CA 15.3 | 3.8 | 5.81 |
CA 15.3 | 3.8 | 5.13 |
CA 19.9 | 3.3 | 4.70 |
CA 19.9 | 3.3 | 4.21 |
CEA | 3.3 | 4.12 |
CEA | 3.3 | 4.18 |
Cortisol | 3.1 | 4.20 |
Cortisol | 3.1 | 4.75 |
Estradiol | 2.2 | 5.67 |
Estradiol | 2.2 | 3.94 |
Ferritin | 2.6 | 3.83 |
Ferritin | 2.6 | 4.01 |
Folate | 3.2 | 8.34 |
Folate | 3.2 | 6.26 |
FSH | 2.8 | 3.36 |
FSH | 2.8 | 3.21 |
Insulin | 7.9 | 4.67 |
Insulin | 7.9 | 4.43 |
LH | 1.9 | 3.81 |
LH | 1.9 | 3.20 |
Parathyroid Hormone | 11.3 | 4.85 |
Parathyroid Hormone | 11.3 | 4.05 |
PSA, free | 2.7 | 3.41 |
PSA, free | 2.7 | 3.02 |
PSA, Total | 5.0 | 3.20 |
PSA, Total | 5.0 | 3.01 |
T3, Free | 2.0 | 3.16 |
T3, Free | 2.0 | 2.13 |
T4, Free | 3.5 | 2.06 |
T4, Free | 3.5 | 2.50 |
Testosterone | 3.7 | 4.04 |
Testosterone | 3.7 | 4.63 |
TSH | 3.6 | 3.72 |
TSH | 3.6 | 3.23 |
Only one estimate of bias was made, so it is listed twice for each analyte. Note that since the controls used were those provided by the manufacturer, the imprecision estimates may actually be optimistic.
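If you do make your own calculations, one common way to collapse the two QC levels into a single imprecision estimate is a root-mean-square pooling of the level CVs. This is my own sketch of that convention (it assumes similar numbers of measurements per level), not the method used in the study:

```python
import math

def pooled_cv(*level_cvs):
    """Root-mean-square pooling of per-level CVs (assumes similar n per level)."""
    return math.sqrt(sum(cv**2 for cv in level_cvs) / len(level_cvs))

# AFP levels from the table above: 5.48% and 5.36%
print(round(pooled_cv(5.48, 5.36), 2))  # 5.42
```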