Three Books to Make You Think Twice about Quality Measurements
In an age of mushrooming measurements, we must be as vigilant in selecting our indicators as in implementing them. Three books provide an opportunity to assess the downside of too many measurements, metrics, and indicators.
Sten Westgard, MS
April 2019
At a recent IHI meeting, the highly respected quality advocate Dr. Don Berwick lamented some of his earlier faith in measurement. He has changed his mind, and he no longer agrees with the phrase “If you don’t measure it, you can’t manage it.” In the current era, where indicators proliferate, where metrics are metastasizing across healthcare, and where endless articles, conferences, and committees sit in constant judgment over daily practices, he now believes that “[W]e need to stop excessive measurement. I vote for a 50 percent reduction in all metrics being used.” This is a strong statement, coming from a longtime proponent of quality measurement in healthcare.
In the spirit of rigorous self-examination, I wanted to review three recent books that propose a contrarian view: that quality measures are not an unalloyed good, and that regulations, checklists, and indicators, when misused, help neither patients nor productivity. The common wisdom that we can “metric” our way to success may not be true.
Book 1: Skin in the Game, by Nassim Nicholas Taleb
Taleb, an iconoclast if there ever was one, is most famous for his book The Black Swan. His books on the failures of Wall Street, financiers, banks, and economists made him extremely popular in the era of market crashes. But his withering gaze is not restricted to traders and hedge fund managers. He suffers no fools, and in his eyes most of us are foolish. There is really no aspect of society he won’t critique. It can be uncomfortable reading Taleb: on one page you are cheering him on as he excoriates other professions for their shortcomings, but on the very next page you realize his caustic wit is now trained mercilessly upon you.
In his latest book, Skin in the Game, he presents a simple thesis: those who have no stake in the outcome of a process cannot be trusted for the advice they give. Coining his own Yogi Berra-ism, Taleb states, “in academia there is no difference between academia and the real world; in the real world, there is.” [p.3] The point of Skin in the Game, to be simplistic, is that whenever we suggest a solution, if we are not affected by that solution, we have no real motivation to get the suggestion right. When we have a direct stake in the outcome of a process, failures are costly, and if we make big failures we will be removed from the system (a Darwinian necessity, for Taleb). This has a strong evolutionary impact and provides a useful incentive as well as a check against hubris. In academia and other fields where experts have no skin in the game, Taleb elaborates, those who create nonsensical theories and impractical approaches are allowed to survive, thrive, and pollute the marketplace of ideas. Another way to look at it: pilots have what’s known as “maintenance compulsion” – they inspect their planes rigorously before take-off, because if the plane fails, they go down with it. Doctors and laboratorians face no such consequence when patients are harmed by our results. If every time we sent out a wrong number our own health were impacted, you would see a much, much stronger interest in quality.
Taleb notes that our era of progress has been marked by intellectualism, “[T]he belief that one can separate an action from the results of such action, that one can separate theory from practice, and that one can always fix a complex system by hierarchical approaches, that is in a (ceremonial) top-down manner.” [p.25] This in turn has given rise to “scientism”: “a naïve interpretation of science as complicated rather than science as a process and a skeptical enterprise. Using mathematics when it’s not needed is not science but scientism.” As we watch the endless parade of new indicators, new models, and new theories, we need to regard them with a skeptical eye. We must make sure we don’t value a new theory simply because it is new, or a new approach simply because it is more complex. In our own experience, we have seen over and over again that the Westgard theories must be accompanied by practical tools (and even when they are, if there is no real-world need to adopt them, most laboratories won’t). As our professional organizations organize task-never-finished committees, perpetual conferences, and chains of publications devoted to prolonging the problem rather than finalizing a solution, we risk falling into that trap: seeking more complex solutions simply out of an academic taste for novelty. Taleb states, “Academia has a tendency, when unchecked (from lack of skin in the game), to evolve into a ritualistic self-referential publishing game.” Oh, how familiar that sounds to my ears.
Closer to the healthcare pathway, Taleb notes that regulatory efforts can actually be counter-productive, something that comes as no shock to us:
“The legal system and regulatory measures are likely to put the skin of the doctor in the wrong game….A doctor is pushed by the system to transfer risk from himself to you, and from the present into the future, or from the immediate future into a more distant future.”[p.44]
As we become compliance-oriented and risk-averse, we lose our patient-centeredness. We in the lab are peculiarly distant from the patient, despite the fact that our results play such a strong role in their care. Our temptation is to chase process outcomes, rather than invest the effort in tracking our true connection and impact on patient outcomes. We can lose ourselves in efforts to sustain “in control” workarounds rather than pursue actual improvement and excellence in our methods.
Book 2: The Safety Anarchist, by Sidney Dekker
This book comes from another giant in the field of quality and safety, Sidney Dekker, who wrote the seminal The Field Guide to Understanding ‘Human Error’. As one of the pioneers who helped midwife the movement for quality and safety, it is telling that he is turning against the movement – or, to be more specific, against its corporatization and bureaucratization. His thesis is that the over-bureaucratization of safety has paralyzed business, government, and healthcare. The proliferation of rules and regulations is starting to suffocate organizations. Regulations, once they are put in place, are often difficult to dislodge or amend, partly because “[b]ureaucracies….have replaced churches.” Following the rules has become an organizational religion.
Dekker follows the thinking of Amalberti in differentiating the types of safety present in the workplace:
“Controlled safety is imposed by regulations, rules and procedures. It follows the desire for standardization of technologies, behaviors, and cultures. It comes at the cost of increased rigidity and workers who are less capable of adapting to surprises.
“Managed safety is based on the experience and expertise of the workers, which not only allows them to adapt any kind of guidance to local circumstances but also has developed in them a nuanced understanding of when to adapt, improvise and innovate their routines, and when not.”[p.20]
Those familiar with the history of laboratory medicine understand that at the dawn of the testing age, there was only managed safety. Controlled safety didn’t come into play for years. Regulations didn’t fall into place for decades. We are now firmly in the controlled safety era, however, where workers have less and less freedom to maneuver, as new checklists, requirements, and protocols are put into place.
Dekker calls this the Bureaucratization of safety, and lists a number of unintended consequences, such as “an inability to predict unexpected events, structural secrecy and a focus on bureaucratic accountability, quantification and ‘numbers games.’ Bureaucracy has hampered innovation and created its own new safety concerns. It has imposed both real and perceived constraints on organization members’ personal expertise for how to do work.”[p.72]
Dekker posits that once the Safety Bureaucracy is in place, an important coda must be added to our earlier maxim about quality:
“Once a measure becomes a target, it is no longer a measure…. What gets measured, gets manipulated.” [p.76]
“Focus on behaviors, make people comply, and control their every move, and you might actually make people less adaptive and responsive to unexpected events, thereby increasing risk of major injury and fatality.” [p.94]
Book 3: The Tyranny of Metrics, by Jerry Z. Muller
Our final book, The Tyranny of Metrics, focuses on the manipulations and insidious corruptions of measurement and metrics. While Muller outlines the symptoms of “metric fixation,” what the reader may find more interesting is his description of the myriad ways that people and organizations manipulate their metrics when too many are imposed on their work: “Professionals tend to resent the impositions of goals that may conflict with their vocational ethos and judgment, and thus morale is lowered. Almost inevitably, many people become adept at manipulating performance indicators through a variety of methods, many of which are ultimately dysfunctional for their organizations. They fudge the data or deal only with cases that will improve performance indicators. They fail to report negative instances. In extreme cases, they fabricate the evidence.” [p.19]
These manipulations are pervasive enough that they have become informal “laws” like the infamous “Murphy’s Law”:
Campbell’s Law: “[t]he more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”[p.19]
Or, more succinctly, Goodhart’s Law: “Any measure used for control is unreliable.”[p.19]
With respect to healthcare, pay-for-performance programs, which have been touted as a cure for our ills, are increasingly being found to be counter-productive. Instead of improving healthcare, metric fixation produces what Muller calls “goal diversion”: P4P programs “can reward only what can be measured and attributed, a limitation that can lead to less holistic care and inappropriate concentration of the doctor’s gaze on what can be measured rather than what is important.” [pp.116-117]
Statistician, Measure Thyself?
What must be done? Shall we abandon all measurement and return to an era of improvisational management, running work by gut instinct, hoping that experienced workers will cure what ails modern work? Of course not; no such extreme solution exists. Eliminating all metrics and indicators would be impossible. But there must be a balance of measured and managed work. Dekker calls this solution “Vernacular Safety,” a clever way of saying that the safety that comes from the bottom up should be valued above the imposition of top-down protocols.
None of the authors seriously suggest eliminating measurement or metrics, but they are all in favor of measuring the right things and in the right way. Right now we’re witnessing an explosion of measurement, with little discussion about which measurements are useful. Muller states: “[T]here are immediate benefits to discovering poorly performing outliers. The problem is that metrics continue to get collected from everyone. And at some point the marginal costs exceed the marginal benefits.” [Tyranny, p.118]
After this litany of ills, one may wonder if we should be measuring at all. One wonders if we should cease our Sigma-metrics and rule out the “Westgard Rules.” Would it be better to rely on bench-level instincts for QC? Unfortunately, we can’t rely on today’s training and experience. The early generations of medical technologists, back in the day when regulations were scant, had to rely more on their own skills and experience, and the results were pretty good for labs and patients. As the laboratory has progressed technologically, some professional skills have actually atrophied – partly because of stagnant training curricula in the area of quality control, but also because of lower pay, which gives workers more incentive to find a different healthcare profession than medical technology. As we narrow the tasks we assign to technologists, converting them into factory workers rather than professionals, we see the loss of their capability to act thoughtfully, to respond to unknown or unusual challenges, to innovate. (That’s a broad brush, I know; certainly there are still amazing people at all levels of the profession, but the days when every worker understood the “Westgard Rules” or could perform a method validation study are long past.)
Our own experience with the use of Sigma-metrics is that we’re still at a stage where the metric is uncovering a lot of “poorly performing outliers.” It’s possible to foresee a time when analytical quality becomes so uniformly high across segments of the diagnostic industry that the utility of the Sigma-metric will decline. If everyone hits 6 Sigma all the time, we won’t need to measure Sigma-metrics anymore. But as long as we are still discovering some labs at 3 Sigma and some at 6 Sigma, some methods below 3 Sigma and some above 6 Sigma, there is a practical, indeed urgent, benefit to the use of this metric.
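For readers who haven’t calculated the metric themselves, here is a minimal sketch of the standard Sigma-metric calculation – Sigma = (TEa – |bias|) / CV, with all terms expressed in percent. The specific TEa, bias, and CV values below are hypothetical, chosen only to illustrate the spread between a marginal method and a world-class one:

```python
# Minimal sketch of the Sigma-metric calculation:
#   Sigma = (TEa - |bias|) / CV, with all terms in percent.
# The numbers below are hypothetical, for illustration only.

def sigma_metric(tea_pct: float, bias_pct: float, cv_pct: float) -> float:
    """Sigma of a method, given allowable total error (TEa),
    observed bias, and observed imprecision (CV), all in percent."""
    return (tea_pct - abs(bias_pct)) / cv_pct

# Same 10% TEa, two hypothetical methods:
print(sigma_metric(tea_pct=10.0, bias_pct=1.5, cv_pct=2.0))  # 4.25 Sigma
print(sigma_metric(tea_pct=10.0, bias_pct=0.5, cv_pct=1.5))  # ~6.33 Sigma
```

With the same allowable total error, modest shifts in bias and imprecision move a method from marginal to world-class – exactly the spread the metric is meant to expose.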
The other attractive feature of the Sigma-metric is that it is not mandatory – it’s a choice. If and when it becomes mandatory, all the incentives to manipulate and corrupt the metric will come into play. But for now, you use the Sigma-metric only if you want to, not because you have to. Why measure it just to fool yourself? If you’re seeking self-delusion, you don’t need a Sigma-metric to achieve it.
When should we measure? When is it appropriate to add another statistic?
Muller provides a useful checklist to diagnose whether or not you should add a new metric to your lineup. Some (but not all) of the points are germane to the laboratory:
- “What kind of information are you thinking of measuring?” If the process is influenced by the act of measurement (e.g., you are watching a staff member and they see you watching them), the metric is less reliable. Here we are in good territory: we are monitoring analytical methods, which may be operated by humans, but the instruments themselves are reliably un-self-conscious. An instrument doesn’t change its behavior when we measure its CV and trueness.
- “How useful is the information? If this information is not a useful proxy for what you’re aiming at, don’t do it.” There are two outcomes of Sigma-metric measurement, one which optimizes the QC within the laboratory, and that’s a very concrete benefit. The second outcome is not often directly quantified, and that’s the relationship of high quality results with high quality patient outcomes. It’s extremely difficult to conduct a study which evaluates that linkage. We can see anecdotal evidence of the opposite effect – how very poor methods can result in misdiagnoses, for example. So overall, we need more research, but if we base our decision on what we know now, the Sigma-metric information should be considered useful.
- “How useful are more metrics?” “Remember that measured performance, when useful, is more effective in identifying outliers, especially poor performers or true misconduct. It is likely to be less useful in distinguishing between those in the middle or near the top of the ladder of performance.” [Tyranny, p.178] The Sigma-metric is unique in the laboratory, providing a definitive benchmark of analytical performance. It complements other metrics currently in use, but it can also replace such things as simple total error measurement. So we can reduce our measurements a bit if we would like, or at least break even.
- “What are the costs of not relying upon standardized measurement?” Would it be better to trust the bench experience of the technologists who operate the methods and the instruments? I’m afraid that technologists don’t experience the range of performance available in instrumentation during their careers, and as a result, they may become habituated to the quirks and deficiencies of a particular instrument, without realizing that other instruments exist where these problems don’t occur. So we need the standards of Six Sigma to help them realize when their instrument, however familiar and comfortable, is not up to par.
- “To what purposes will the measurement be put?” “Measurement instruments, such as tests, are invaluable, but they are most useful to internal analysis by practitioners rather than for external evaluation by public audiences who may fail to understand their limits.” [Tyranny, p.179] Here we can see that Sigma-metrics and “Westgard Rules” are tools that help the internal operation of the laboratories, while outside the laboratory, few, if any, even know that these things exist.
“If…the scheme of reward and punishment is meant to elicit behavior that the practitioners consider useless or harmful, the metrics are more likely to be manipulated.” [Tyranny, p.179] The good news again is that the Sigma-metric and “Westgard Rules” are tools that help simplify, where possible, the QC process. And when the metrics highlight an outlier – a poorly performing process, method, or instrument – there are ready alternatives: labs can invest in serious QC, improve the method, or replace the instrument itself.
- “What are the costs of acquiring the metrics?” In many organizations, the additional effort to collect metrics is a burden they cannot afford. In the laboratory, we are already collecting all the data we need – imprecision, trueness – to calculate Sigma-metrics. The Sigma-metric approach simply leverages the data we already have to develop more advanced insights. Fortunately, software is rapidly automating the collection and even the calculation of Sigma-metrics, so the tool is becoming easier and faster to use.
- “Remember that even the best measures are subject to corruption or goal diversion.” The good news here is that it is difficult to fake the SD, CV, bias, or trueness of a method. These statistics are calculated by the laboratory, and if the laboratory ignores them – if it sets its charts up with ranges different from actual performance – that is already a warning flag, one that appears even before the Sigma-metric is calculated. Since peer group data and EQA/PT data are handled externally, those programs provide an additional check on the corruption of bias estimation. Without question, no statistic is immune to corruption or manipulation, but there are more reasons to be confident in the calculation of Sigma-metrics than in other laboratory indicators.
- “Remember that sometimes, recognizing the limits of the possible is the beginning of wisdom. Not all problems are soluble, and even fewer are soluble by metrics. It’s not true that everything can be improved by measurement, or that everything that can be measured can be improved.” [Tyranny, pp.182-183] Here we agree that there are definitely cases where perfection is the enemy of the good. There are well-meaning efforts to develop the perfect measurement of measurement uncertainty, or perfect allowable total error specifications, that result in completely impractical, unachievable, un-implementable “solutions.” One of the most useful findings of the Milan 2014-2015 consensus is that it recognized the limitations of biological variation-based performance specifications and conceded that some analytes must be monitored with goals closer to the “state of the art,” like CLIA, RCPA, and other EQA/PT goals. With Sigma-metrics, there are some tests where performance cannot be calculated or counted – because the test is qualitative, there is no practical way to determine a performance specification, or the method is so novel that no performance specification yet exists. We don’t recommend applying Sigma-metrics to every test type, only to the quantitative tests where we can most usefully establish performance specifications.
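To make the connection between the metric and QC practice concrete, here is a rough sketch of the kind of rule-selection logic embodied in the published “Westgard Sigma Rules.” The thresholds and rule sets below follow the familiar Sigma Rules diagram in spirit, but treat the exact cutoffs as an illustrative assumption to be verified against the original reference:

```python
# Rough sketch of Sigma-driven QC rule selection, loosely following the
# published "Westgard Sigma Rules." The thresholds and rule sets below
# are illustrative assumptions; verify against the original reference.

def select_qc_rules(sigma: float) -> dict:
    """Map a method's Sigma-metric to a candidate QC strategy:
    the higher the Sigma, the fewer rules and controls needed."""
    if sigma >= 6.0:
        return {"rules": ["1:3s"], "controls_per_run": 2}
    elif sigma >= 5.0:
        return {"rules": ["1:3s", "2:2s", "R:4s"], "controls_per_run": 2}
    elif sigma >= 4.0:
        return {"rules": ["1:3s", "2:2s", "R:4s", "4:1s"], "controls_per_run": 4}
    else:
        # Below 4 Sigma: maximum QC, plus a hard look at the method itself.
        return {"rules": ["1:3s", "2:2s", "R:4s", "4:1s", "8:x"],
                "controls_per_run": 4}

print(select_qc_rules(4.25))  # a marginal method earns the heavier rule set
```

The design choice is the point: a 6 Sigma method earns a single simple rule, while a marginal method earns the full multirule battery, so the metric directly reduces QC effort wherever that effort is not needed.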
The best news is that Sigma-metrics can solve problems, particularly those caused by improper QC practices. In our view, the metric passes the “Muller Test”: it efficiently provides useful solutions and improvements for the laboratory. Other metrics, particularly measurement uncertainty, do not pass this test and should be considered carefully before they are implemented in the laboratory.