

At the 40th birthday party of the "Westgard Rules", a generous serving of questions and answers was dished up, along with some virtual cake.

Questions and Answers from the 40th Westgard Rules Celebration

Sten Westgard, MS
March 2021

At the "Westgard Rules" 40th birthday party, there was a lively question and answer session. Many questions were still left at the end of the party, and we have tried to answer some here.

Question: Operating with the lab's own Target Mean and SD, sometimes for a certain test the SD gets larger than planned (resulting in a higher CV). What is the statistically acceptable limit for the maximum rise of a targeted SD? 0.5 times? Or some other multiplier? From Turkey

We would prefer to use the actual SD/CV to figure out the Sigma-metric. If the SD rises but the Sigma-metric remains in an acceptable zone (you can judge what is acceptable, whether you need it to stay above 6, or 5, or 4 or even 3 Sigma), then the SD is also acceptable. Thus, designing QC with the right TEa is very important.
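As a rough illustration of that judgment, here is a minimal sketch in Python; the TEa, bias, and CV values are invented for the example, not recommendations. You recompute the Sigma-metric with the larger CV and see whether it stays in your acceptable zone.

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma-metric = (TEa - |bias|) / CV, all expressed in percent."""
    return (tea_pct - abs(bias_pct)) / cv_pct

# Hypothetical numbers: TEa = 10%, bias = 1%.
before = sigma_metric(tea_pct=10.0, bias_pct=1.0, cv_pct=1.5)  # (10 - 1) / 1.5 = 6.0 Sigma
after  = sigma_metric(tea_pct=10.0, bias_pct=1.0, cv_pct=2.5)  # (10 - 1) / 2.5 = 3.6 Sigma

# If the metric stays in your acceptable zone (above 6, 5, 4, or even 3 Sigma,
# depending on your policy), the larger SD/CV can still be judged acceptable.
print(f"before: {before:.1f} Sigma, after: {after:.1f} Sigma")
```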

If you are not using Sigma-metrics, perhaps you could lean on the biological variation databases from EuBIVAS or Ricos, using the calculations for minimum acceptable precision. If your CV exceeded the minimum performance specification for precision from one of those sources, you could use that as a rule of thumb and judge the SD unacceptable.
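For reference, the precision specifications derived from biological variation are usually expressed as fractions of the within-subject variation (CVi). The sketch below uses the common optimal/desirable/minimum fractions; the CVi value is a placeholder, not a quoted database entry.

```python
def precision_specs(cvi_pct):
    """Allowable imprecision (CV) derived from within-subject biological variation (CVi)."""
    return {
        "optimal":   0.25 * cvi_pct,
        "desirable": 0.50 * cvi_pct,
        "minimum":   0.75 * cvi_pct,
    }

# Placeholder CVi of 6%: a lab CV above the "minimum" specification (4.5%)
# would be judged unacceptable under this rule of thumb.
print(precision_specs(6.0))
```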

Using an arbitrary multiplier (for example, 1.5 times the package insert CV) should be avoided. If we accept that the package insert SDs and CVs are already going to be larger than what an individual laboratory experiences (the variation of many labs > the variation of 1 lab), then having a single lab with a CV that is 1.5 times the CV of a group of laboratories is a very bad sign.

 

Question: When two controls are run, a low-level control and a high-level control, and the low-level control is out by 3s while the high-level control is within 1s, can one reject only the patient samples with low analyte levels and accept the samples with high analyte levels? In a similar example on your website, the answer states Reject, referring to the whole sample set. Anonymous

When we typically use the Westgard Rules with multiple controls, we take advantage of the measurements across the control levels to interpret our rules. When one level is “out” and the other level is “in”, the run is still out of control. You should still reject the run and halt patient samples. During your troubleshooting, however, you may determine the source of the error and find that its impact was confined to results in the lower range, therefore allowing you to release the results in the upper range and lowering the number of patient samples you need to retest.

But the short answer is, when you use the control rules, if one control is out, the run is out.
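As a minimal sketch of that decision (assuming a simple 1:3s check applied to every control level; the means, SDs, and observed values are invented for illustration):

```python
def run_rejected_1_3s(controls):
    """Reject the whole run if ANY control level falls outside its mean +/- 3 SD.
    `controls` maps a level name to (observed value, established mean, established SD)."""
    for level, (value, mean, sd) in controls.items():
        z = (value - mean) / sd
        if abs(z) >= 3:
            return True, level
    return False, None

# Invented example: the low level is out beyond 3s, the high level is within 1s.
rejected, violating_level = run_rejected_1_3s({
    "low":  (62.0, 50.0, 3.0),    # z = +4.0 -> rule violation
    "high": (201.0, 200.0, 5.0),  # z = +0.2 -> within limits
})
print(rejected, violating_level)  # True low : one level out means the run is out
```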

 

Question: Do you have any advice on setting SD limits for multiple analysers, as SD is still the fundamental basis of the rules? Setting an appropriate SD that is suitable for multiple analysers, sites, reagent lots, etc. can be difficult whilst ensuring only true errors are investigated. From the UK

Question: We have 2 identical analyzers and create one mean/SD by averaging data from both analyzers. I'm not comfortable with this process and would like to know how we can use the Westgard Rules better. From the USA.

These two questions really epitomize the challenge of the 21st century for many laboratories. Many laboratories are now massive factories of results, and the struggle to harmonize and simplify the operations is real. One persistent desire is to use the same mean, the same SD, and the same rules on all tests and all instruments. It would be so efficient.

Actually, it may only be mentally efficient to impose a one-size-fits-all approach. It's definitely easier to think about if you only have one mean, SD, and rule for everything. But if you think any further about it, you will undoubtedly recognize that one-size-fits-all is inevitably an oversimplification that becomes inefficient. One size never actually fits all. Your one-size-fits-all mean will be incorrect for some instruments, and you may therefore have built a systematic error into the results coming off that box. Your one-size-fits-all SD will be too wide for one instrument, meaning you will miss real errors, and too tight for another instrument, meaning you will flag outliers when no actual error is present. Your one-size-fits-all rules will be overkill on one instrument, and too lenient on another.

One size fits all, in practice, may be that one size fits none.

If only there was a way to actually determine the "right size" and then fit the QC to that size.

Oh wait, we can do that now.

Instead of clinging to a one-size-fits-all fantasy, embrace a higher simplification: one-design-approach-for-all. Adopt a single approach: Six Sigma. Six Sigma gives you a unified set of tools and benchmarks, then allows you to be flexible and customize the necessary QC for each test and each clinical use (TEa) of the test.

True simplification can occur when you have determined that, in fact, two instruments are the same “size”. If you know that two instruments are Six Sigma, that gives you the flexibility to consider a unified mean, a unified SD, or a unified QC procedure. You can always model the impact of that using Sigma-metrics. If shifting the mean (adding a bias) drops the instrument down a Sigma category (i.e. it was 6 Sigma along with the other instruments, but the bias results in a drop to 5 Sigma), then you know that you probably shouldn't make that change. If you are in the blissful state of having a bunch of 6 Sigma methods that don't change Sigma when you “unify” them, you can be confident about treating them the same way.
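Here is a minimal sketch of that modelling step, assuming the standard Sigma-metric formula and invented numbers: treat the offset between the candidate common mean and the instrument's own mean as an added bias (worst case, adding to the existing bias), then see whether the Sigma category changes.

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma-metric = (TEa - |bias|) / CV, all expressed in percent."""
    return (tea_pct - abs(bias_pct)) / cv_pct

def sigma_with_common_mean(own_mean, common_mean, tea_pct, bias_pct, cv_pct):
    """Fold the offset from adopting a shared mean into the bias, then recompute Sigma.
    Worst-case assumption: the offset adds to the existing bias."""
    added_bias_pct = abs(common_mean - own_mean) / own_mean * 100.0
    return sigma_metric(tea_pct, bias_pct + added_bias_pct, cv_pct)

# Invented numbers: ~6 Sigma on the instrument's own mean...
before = sigma_metric(tea_pct=10.0, bias_pct=1.0, cv_pct=1.5)
# ...but adopting a common mean 1.5% away drops it to ~5 Sigma.
after = sigma_with_common_mean(own_mean=100.0, common_mean=101.5,
                               tea_pct=10.0, bias_pct=1.0, cv_pct=1.5)
print(f"{before:.1f} Sigma -> {after:.1f} Sigma")  # a category drop argues against unifying
```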

But even when you have proof of great performance to support your use of common means, SDs, or QC rules, you will still need to be able to access the underlying individual performance data. You won't ever shed the need to know the individual mean and SD of every method on every instrument. Knowing that performance is what allows you to treat them the same, and knowing that performance also lets you know when an instrument has strayed too far from the herd to be treated as part of it. Some instruments and methods will inevitably deteriorate or change, and you will need to separate them out with an individual mean, SD, and QC procedure. We just aren't in an era of perfect performance reproducible across methods, instruments, and laboratories. Maybe not ever.

Question: Can you give us some insight into how you use QC data together with EQA data, e.g. if you get a flag in one but not in the other? Furthermore, how do you use patient means data?

Try to think of your QC and your Peer Report or EQA as a way to triangulate the source of an error. But also realize that the QC and the Peer Group are best at detecting different kinds of error. It’s entirely likely that your QC may have a flag while your EQA is fine. Or vice versa.

Running controls is at heart about imprecision and reproducibility, but it can also tell you something about trueness and bias. A random error is likely to be caught by QC, but not by the peer group, and certainly not by the EQA. Your random error won't show up in a 3-times-a-year proficiency testing sample (your peer group has a better chance of catching that individual event, but then it will likely be lost in the rest of the group noise).

A creeping bias may not show up in your QC, but become apparent in the Peer Group or EQA.

What's particularly helpful is to use all these sources when you are troubleshooting. Are you seeing random errors AND an increase in the group SD of the Peer Group? That might be a reagent issue that everyone is experiencing. Do you see a higher mean on your QC, and that same bias also showing up in the EQA results? You have confirmation that your individual lab may have a bias, even though the group does not. But if you see an individual shift, and the whole EQA or Peer Group has also shifted in the same direction, that points to an issue beyond a single laboratory, something more likely in the reagent, the calibrator, or the instrument.
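Purely as an illustration of that triangulation logic (the categories and wording below are ours, not from any standard):

```python
def triangulate(qc_flag, eqa_confirms_my_bias, group_also_shifted, group_sd_increased):
    """Rough troubleshooting hints from combining QC, Peer Group, and EQA signals."""
    if qc_flag and group_sd_increased:
        return "Random errors plus a wider group SD: possibly a reagent issue everyone is experiencing."
    if qc_flag and group_also_shifted:
        return "You and the group shifted together: suspect reagent, calibrator, or instrument, not just your lab."
    if qc_flag and eqa_confirms_my_bias:
        return "QC and EQA both show your shift, group unchanged: suspect a bias in your own laboratory."
    if qc_flag:
        return "QC flag only: troubleshoot locally; peer/EQA data do not (yet) confirm."
    return "No QC flag: keep monitoring."

print(triangulate(qc_flag=True, eqa_confirms_my_bias=True,
                  group_also_shifted=False, group_sd_increased=False))
```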

Just standing on your own with your QC results, you never quite know, “Is it Just Me? Or is it everyone else, too?” That’s why EQA and PT participation is often mandated, and why Peer Group programs are so popular.

Question: How do you see the future of the Westgard Rules? Will Artificial Intelligence become part of integrated algorithms for these rules, with monitoring and predictive capabilities to help laboratories achieve that Quality Utopia?

Let's start by unpacking what “Artificial Intelligence” means, because my definition of it is rather narrow. If we build something out of software that is intelligent and sentient, I hope we don't condemn it to just running quality control. I'm afraid an AI tasked with (sentenced to) QC charts would seek out a silicon form of suicide. But if we lower the bar to something like “expert systems”, then certainly the algorithms for Westgard Rules and Sigma-metrics have a future. We are already seeing more and more QC capability being absorbed into the instrument or middleware itself. As we face greater and greater staffing challenges, finding a way to have software run more and more of the QC task is an efficient choice. I don't believe that humans will be eliminated from QC, but they will move further upstream: in the design of the instrument and the algorithms, there will be a moment where laboratory professionals, engineers, and clinicians can express the performance specifications for the test and then design the system that can meet those needs.

The Westgard Rules are a tool, and as long as that tool is valuable, it will continue to be used. Some tests in the future may reach a level of performance where the Westgard Rules will no longer be routinely necessary. But as we have all seen, more and more tests are coming to the laboratory, and new and more demanding clinical uses are being found for existing tests. As long as there are new methods that are unsteady on their feet, as long as there are clinical expectations pushing the envelope of method performance, the Westgard Rules will provide a useful tool for monitoring the performance of those methods.

Question: We use an interlaboratory comparison platform with built-in SQC and run frequency design, and we routinely "over-rule" the Westgard Rules (not all of them), because results of measurements for a test are frequently close to the comparison target, or simply because they are well within the Total Error for that test, which means it has a good Sigma value and the risk of bad results is low. Only when I see a large random error or a clear systematic error do I start looking for corrections in the test system. Is this acceptable?

First, it's great that you have a QC program that includes a peer group program and QC frequency design. We wish all labs had access to those resources.

Second, if you have the correct QC design, you may not need to “over-rule” the rules. If you design QC using Sigma-metrics, that will cut out any rules that are unnecessary. So you won’t have anything to overrule anymore, because you will only be alerted when a significant error has occurred and it’s unlikely it’s a false alarm.
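As a rough sketch of how a Sigma-based design trims the rule set (this follows the general shape of the Westgard Sigma Rules, but treat the exact cut-offs, rule lists, and N values here as an approximation to check against your own QC design resources):

```python
def qc_design_for_sigma(sigma):
    """Pick control rules and the number of control measurements (N) from the Sigma-metric.
    Illustrative approximation only, not a prescription."""
    if sigma >= 6:
        return {"rules": ["1:3s"], "N": 2}
    if sigma >= 5:
        return {"rules": ["1:3s", "2:2s", "R:4s"], "N": 2}
    if sigma >= 4:
        return {"rules": ["1:3s", "2:2s", "R:4s", "4:1s"], "N": 4}
    return {"rules": ["1:3s", "2:2s", "R:4s", "4:1s", "8:x"], "N": 8}

print(qc_design_for_sigma(5.4))  # {'rules': ['1:3s', '2:2s', 'R:4s'], 'N': 2}
```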

Third, it's hard to know what you mean by results being within the Total Error for the test. Some software programs will draw “Total Error limits” directly on the chart, which is not how Total Allowable Error goals were meant to be used. We saw one example where these “Total Error limits” were effectively 11 SDs from the mean. That is far too wide. So while you extrapolate that you have tests with good Sigma-metrics, it's better to actually calculate the Sigma-metric and know for sure. Then the Sigma-metric can guide your QC selection.
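To see how “Total Error limits” drawn directly on a chart can balloon, note that such limits sit roughly TEa/CV standard deviations from the mean (ignoring bias). The numbers below are invented simply to reproduce the kind of 11-SD limit mentioned above.

```python
def te_limit_in_sd(tea_pct, cv_pct):
    """How many SDs from the mean a control limit drawn at +/- TEa would sit (ignoring bias)."""
    return tea_pct / cv_pct

# Invented example: a TEa of 33% with a CV of 3% puts the "limit" 11 SDs out --
# far too wide to detect any real, medically relevant error.
print(te_limit_in_sd(tea_pct=33.0, cv_pct=3.0))  # 11.0
```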

Didn't see the question you wanted answered? Please feel free to submit more questions. We're always happy to answer.