A recent paper by Marrington, Sinclair, and MacKenzie raises important questions: why do methods that are not fit for purpose stay in the marketplace? Given all the institutions and regulations, why is the ability to remove a method from use so weak?
The paper, from the Annals of Clinical Biochemistry, poses its provocative question right in the title:
The missing piece: Who is responsible for ensuring clinical chemistry assays used in the UK are fit for purpose? Rachel Marrington, Gordon Sinclair, Finlay MacKenzie. Annals of Clinical Biochemistry: International Journal of Laboratory Medicine. 2025;0(0). doi:10.1177/00045632251367288
The authors write from the perspective of UK NEQAS and the regulatory environment of the United Kingdom, but the questions apply to every laboratory around the world.
In the paper, the authors share multiple examples of method problems that were not only pervasive among specific diagnostic manufacturers, but also persistent across time. Failures of EQA that were not the result of individual laboratory issues, but were caused by the diagnostic manufacturers themselves. Biases that lasted for months, clearly identified but nevertheless uncorrected, often left unsanctioned. The clear implication: labs are using methods that are not fit for purpose, and EQA lacks the power to compel a remedy.
There is a long-established regulatory architecture in place to approve methods into service, conduct mandatory EQA/PT, and implement mandatory QC in laboratories. Yet even in combination, these programs are failing to prevent poorly performing methods from reaching patients.
The truth is that the laboratory quality infrastructure has become ossified, comfortable with soft-gloved mandates, complacent in the status quo. QC and EQA have in many ways become theater: conspicuous compliance, motions we go through without applying any demanding standards. Even when there is a failure, controls can be repeated (and repeated and repeated), EQA surveys give multiple chances, and failures sometimes carry only minor consequences, particularly in “educational” surveys. Will a series of stern letters really improve a laboratory locked into a 5-year instrument contract?
Even when the method is to blame, it’s the laboratory that bears the brunt of the consequences. The lab gets the failure; the manufacturer might only get a letter of concern from the EQA/PT provider. The lab must do better, while the manufacturer is gently encouraged to improve. The EQA provider rarely has any authority to compel a correction by the manufacturer, beyond inflicting bad publicity and embarrassment (sometimes that is enough, but it depends on whether a manufacturer chooses to brazen out a problem or admit it and make the correction).
The global view is murkier still. While Marrington, Sinclair, and MacKenzie characterize the problems of the UK system and seek a remedy, there is at least a chance that within the UK system, a solution could be found. Globally, there simply isn’t a unified way to hold methods accountable.
Traceability, Standardization/standardisation, and Harmonization/harmonisation are all scientific ways to drive more reliable results. Difficult, laudable work, which measures progress in years, sometimes decades. More assays are becoming traceable every day, but this work has not scaled quickly enough to provide solutions for most assays. And as with EQA/PT, the traceability architecture can only scold manufacturers and exert pressure on them to improve. It possesses no legal ability to remove a bad method from the marketplace.
On the other hand, optics remain important in a free marketplace. If a single country or organization publishes a valid criticism of a particular diagnostic manufacturer, the greater danger the instrument maker faces is not from regulators but from customers worldwide, who may hesitate to purchase an instrument tainted by known problems.
Marrington, Sinclair, and MacKenzie note that the barrier to entry into the laboratory marketplace is not as high as many assume.
“CE marking may be incorrectly viewed to be the same as an accreditation certificate and both further extrapolated to mean that an assay is fit for purpose and will always give the correct results. We as healthcare professionals know that this is not necessarily the case[.]”
In the US, there is a similar assumption that any method cleared by the FDA is automatically “fit for purpose” and will only generate correct results. History has shown this to be a bad assumption. FDA clearance prevents many bad methods from reaching the market, but it mainly enforces label/claim adherence (your method must perform as you claim it performs, not necessarily well enough for useful patient diagnosis), and it does not compare performance against quality standards, only against predicate devices. For a time, as long as a method was just as bad as an older method already on the market, the FDA would clear it. With the Laboratory-Developed Test (LDT) loophole, more and more methods are being offered that have never been through any clearance process, leading to such spectacular frauds as Theranos.
As with many modern phenomena, taking in all of the problems that exist can lead to despair. At the very moment we need strong leadership to impose demanding standards, we face a world populated by institutions and politicians bending knees to the wealthy and powerful, bending rules to favor the haves and ignore the have-nots.
One easy, but inconvenient, step for every laboratory to take: stop assuming every test offered to you is fit for purpose. Ignore the CE mark or FDA clearance. The larger the menu you are being offered, the higher the probability that a few of the assays on that box are not great. There is so much incentive to add assays to an instrument menu just to check the same number of boxes as the competitors, and far less scrutiny of how well each individual assay performs.
The regular reader will not be surprised to learn that Westgard advocates the use of analytical Sigma metrics to judge the performance of methods and instruments. In the right context, they provide an objective benchmark. Six Sigma assays are easy to QC, easy to pass in PT/EQA, and provide highly reliable results.
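For readers who want the arithmetic: the standard Sigma-metric calculation takes the allowable total error (TEa), subtracts the observed bias, and divides by the observed imprecision (CV), all expressed in percent at the same medical decision level. A minimal sketch in Python, with hypothetical numbers chosen only for illustration:

```python
def sigma_metric(tea_pct: float, bias_pct: float, cv_pct: float) -> float:
    """Analytical Sigma metric: the quality margin (TEa minus bias), expressed in CVs."""
    return (tea_pct - abs(bias_pct)) / cv_pct

# Hypothetical assay: TEa goal of 10%, observed bias 2%, observed CV 1.3%
print(round(sigma_metric(10.0, 2.0, 1.3), 2))  # 6.15 -> a Six Sigma assay
```

An assay at Six Sigma has so much margin between its performance and its quality requirement that simple QC rules catch errors easily; at three Sigma or below, even elaborate QC struggles.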
Within a single country, it’s easier to impose a standard. The UK, the NHS, and NEQAS can impose one set of performance goals for participating laboratories. Not so easy when the labs are spread out all over the world. The Germans have their RiliBÄK, the Australians their RCPA goals, the Chinese their own national goals, and the US has CLIA. These goals do not agree with one another, almost by design. Because RiliBÄK and CLIA carry more severe financial and legal consequences for failure, their goals are more permissive. In contrast, educational programs, where consequences are minimal, have more stringent performance standards.
There’s little incentive for EQA/PT programs to agree on goals. Competitive voluntary surveys want to differentiate themselves, sometimes by being more permissive, other times by being more stringent. For goals written directly into legislation, the challenge is worse: getting the German, Chinese, and US legislative bodies to agree on the same set of standards would take some kind of miracle.
Nor are these goals fixed in stone. CLIA updated its goals from 1992 to 2025, China updated its goals in 2022, and the EFLM biological variation recommendations have switched from desirable to minimum specifications and from TEa to uncertainty, and appear to be tightening continuously as the protocols for how the underlying studies must be conducted grow ever more stringent. We cannot expect to achieve one agreement on goals and be done with it. New uses for old tests, new interpretations that become possible with method improvements, and of course new tests entering the marketplace all mean that the debate over goals can be expected to be perpetual.
Caveat emptor. Do your own research. You can ask for Sigma metrics from the manufacturer, but you’ll need to know all the details: what TEa goals? How was imprecision determined? How was bias determined? What controls were used, over what time period? Get any of these wrong and the metrics are distorted, quite possibly deliberately in the manufacturer’s favor (the sketch after the next paragraph shows how large that distortion can be). Find independent, objective data and evaluations. Make sure the goals being used are the ones you will use in your own laboratory (for US labs, knowing that an instrument achieves RiliBÄK goals is not helpful, though in the reverse direction it could be).
Find a study of analytical Sigma metrics where you can see all the data and details, to make sure the calculations meet your needs. Want further due diligence? Find a customer who’s currently using the instrument/method and get their performance data directly. Then you can do the calculations yourself, without worrying that they were nudged, tweaked, or massaged by a manufacturer eager to close the sale.
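Here is a minimal sketch of that do-it-yourself calculation. Everything in it is hypothetical: the QC values, the target, and the two candidate TEa goals are invented for illustration, not taken from any real instrument:

```python
import statistics

# Hypothetical month of QC results obtained from a current customer (invented values)
qc_values = [4.9, 5.1, 5.0, 5.2, 4.8, 5.1, 5.0, 4.9, 5.2, 5.0,
             5.1, 4.8, 5.0, 5.1, 4.9, 5.0, 5.2, 5.1, 4.9, 5.0]
target = 4.9  # assigned value of the control material

mean = statistics.mean(qc_values)
cv_pct = 100 * statistics.stdev(qc_values) / mean   # observed imprecision
bias_pct = 100 * abs(mean - target) / target        # observed bias vs. target

# The same observed performance, judged against two different TEa goals
for label, tea_pct in [("permissive goal, TEa = 10%", 10.0),
                       ("stringent goal, TEa = 5%", 5.0)]:
    sigma = (tea_pct - bias_pct) / cv_pct
    print(f"{label}: Sigma = {sigma:.1f}")
```

With these invented numbers, the same assay scores roughly Sigma 3.1 against the 10% goal and 1.1 against the 5% goal, which is exactly why the TEa behind any quoted Sigma metric matters as much as the metric itself.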
Finally, make analytical performance part of the contract, if you can. Write performance standards into the contract, with hard penalties for the manufacturer if they fail to achieve them. If a manufacturer is unwilling to guarantee its analytical performance claims, that’s a sign those claims aren’t what’s actually being experienced in the real world.
Marrington, Sinclair, and MacKenzie conclude that there is an urgent “need [for] a mechanism in place that cuts through the legal/institutional niceties and gives the general public the correct results (within the usual boundaries of experimental error and the like and at a cost that will not bankrupt the NHS) because that is the expectation patients have when visiting their GP [general practitioner, i.e. their clinician] and having their illnesses correctly diagnosed. The British public deserves something better.”
This appeal echoes globally. We all deserve better performance than what our current diagnostic landscape is giving us.