Sigma Metric Analysis
A POC Chemistry Device
Now that we know how to translate the manufacturer's performance claims into Six Sigma metrics, let's take a hard look at some real-world data. Using a performance study supplied for a "near-patient" chemistry analyzer, we find out just how good (and how bad) test performance is at the point of care (POC).
FROM METHOD PERFORMANCE CLAIMS TO SIX SIGMA METRICS: A POC CHEMISTRY ANALYZER
 Recap: What do you need to go from Method Validation to Six Sigma?
 What calculations do you perform and in what order?
 Estimate Bias at the same levels where the Precision studies were performed
 What's a Quality Requirement and where do I find it?
 Calculating Sigma Metrics from Bias, CV and Quality Requirement
 Calculating Sigma Metrics at the Critical Medical Decision Level
 Conclusion
 Postscript: How would you QC this instrument?
[Note: This QC application is an extension of the lesson From Method Validation to Six Sigma: Translating Method Performance Claims into Sigma Metrics. This article assumes that you have read that lesson first, and that you are also familiar with the concepts of QC Design, Method Validation, and Six Sigma. If you aren't, follow the link provided.] 
Recap: What do you need to go from method validation to Six Sigma?
From the Method Validation study provided by the manufacturer:
From other sources:
 Quality Requirement (CLIA, clinical, biologic or otherwise)
 Calculators (Six Sigma)
 Medical Decision Levels
 A QC Design tool (optional, but useful)
What calculations do I have to perform, and in what order?
 Use the regression equation to estimate bias at the levels where precision studies were performed
 Find the quality requirement for those levels.
 Calculate Six Sigma metrics.
Here is the Method Validation study data from our anonymous instrument:
| Test Name | Control/Level | CV % | Slope | Y-Int | R with Comments |
|---|---|---|---|---|---|
| Glucose | I: 217.9 | 0.79 | 1.0377 | 5.37 | correlation of the instruments is extraordinary at 100% |
| | II: 81.5 | 0.93 | | | |
| BUN | I: 11.3 | 4.11 | 1.0219 | 2.8 | correlation between instruments is almost perfect at 99. |
| | II: 43.3 | 1.07 | | | |
| Creatinine | I: 0.63 | 25.3 | 1.0523 | 0.09 | correlation is almost perfect at 99. |
| | II: 3.28 | 2.92 | | | |
| Creatine Kinase | I: 176.5 | 2.82 | 1.0419 | 44.11 | the correlation coefficient between the analyzers is excellent at 9. |
| | II: 514.3 | 1.68 | | | |
| Sodium | I: 140.6 | 1.14 | 1.1193 | -4.82 | correlation is, again, almost perfect at 99. |
| | II: 118 | 0.64 | | | |
| Potassium | I: 6.18 | 2.08 | 1.0055 | -0.70 | correlation between the two instruments is outstanding at 98. |
| | II: 4.23 | 1.79 | | | |
| tCO2 | I: 25.4 | 10.52 | 0.7339 | 3.54 | (94.4%) The accuracy data shows how noisy the method is in both instruments by the scattering of the data points….This is inherent to the methodology of measuring tCO2. |
| | II: 12.6 | 12.66 | | | |
At first glance, the report looks clearly favorable. It’s hard to understand the real meaning of the numbers, but the words the report uses about the correlation are clear: almost perfect, excellent, and outstanding. When the correlation coefficient isn’t that great, it’s not the new instrument’s fault; it’s the fault of all tCO2 methods.
Now, let’s take this manufacturer supplied data and work with it.
Estimating Bias at the same levels where the Precision studies were performed.
How do you do this? By using the Regression Equation:
Yc = a + b × Xc, where Yc and Xc represent the test and comparison values, respectively, at a concentration level of interest, b is the slope, and a is the y-intercept. The slope and y-intercept are given by the comparison of methods experiment.
Use a level close to the mean of the data where your imprecision study was performed as your Xc value. For instance, for Glucose level I at 217.9, use 220 as the Xc value. Then solve the Regression Equation for Yc. This estimates the value the test method would report when the comparison method reads Xc.
Next, take the difference Yc - Xc, divide it by Xc, and multiply by 100. This gives you a % bias estimate at that level.
At the end of these calculations, you have estimates of bias and CV at the same level.
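The bias calculation just described can be sketched in a few lines of Python. This is only an illustration: the function name `percent_bias` is made up for this example, and the inputs are the glucose and BUN regression statistics from the manufacturer's study.

```python
def percent_bias(slope, intercept, xc):
    """Estimate the % bias of the test method at comparison level xc."""
    yc = intercept + slope * xc        # regression equation: Yc = a + b * Xc
    return (yc - xc) / xc * 100.0      # difference expressed as a percentage of Xc

# Glucose, control level I: slope 1.0377, y-intercept 5.37, Xc = 220
glucose_bias = percent_bias(slope=1.0377, intercept=5.37, xc=220)
print(round(glucose_bias, 1))  # 6.2

# BUN, control level II: slope 1.0219, y-intercept 2.8, Xc = 43.0
bun_bias = percent_bias(slope=1.0219, intercept=2.8, xc=43.0)
print(round(bun_bias, 1))  # 8.7
```

Because the y-intercept is a fixed amount, its contribution to the % bias grows as Xc gets smaller, which is why the low-level controls tend to show the larger biases.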
Here’s what our example data looks like after we’ve performed these calculations:
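| Test Name | Control/Level | CV % | Bias % | Slope | Y-Int | Level used for Xc calculations |
|---|---|---|---|---|---|---|
| Glucose | I: 217.9 | 0.79 | 6.2 | 1.0377 | 5.37 | 220 |
| | II: 81.5 | 0.93 | 10.5 | | | 80 |
| BUN | I: 11.3 | 4.11 | 27.6 | 1.0219 | 2.8 | 11.0 |
| | II: 43.3 | 1.07 | 8.7 | | | 43.0 |
| Creatinine | I: 0.63 | 25.3 | 20.2 | 1.0523 | 0.09 | 0.6 |
| | II: 3.28 | 2.92 | 8.0 | | | 3.2 |
| Creatine Kinase | I: 176.5 | 2.82 | 29.4 | 1.0419 | 44.11 | 175 |
| | II: 514.3 | 1.68 | 12.8 | | | 510 |
| Sodium | I: 140.6 | 1.14 | 8.5 | 1.1193 | -4.82 | 140 |
| | II: 118 | 0.64 | 7.5 | | | 110 |
| Potassium | I: 6.18 | 2.08 | 11.1 | 1.0055 | -0.70 | 6.0 |
| | II: 4.23 | 1.79 | 16.9 | | | 4.0 |
| tCO2 | I: 25.4 | 10.52 | 12.4 | 0.7339 | 3.54 | 25.0 |
| | II: 12.6 | 12.66 | 2.9 | | | 12.0 |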
Note that even after those calculations, it’s still difficult to judge the quality of these methods. Certainly, we can look at methods that have high CV and high bias and wonder about them, but we really don’t have an intuitive feel for what the best values for those quantities should be. That’s why we need a quality requirement for each test.
What’s a quality requirement and where do I find it?
Finding or defining quality requirements is a critical step in the QC Design Process. We refer you to those articles on the website for more explanation. Since we are working with a chemistry instrument, we are in luck. CLIA has defined the quality requirements for all the tests on our new instrument. Let’s add those to our table:
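| Test Name | Control/Level | Q.R. % | CV % | Bias % | Slope | Y-Int | Level used for Xc calculations |
|---|---|---|---|---|---|---|---|
| Glucose | I: 217.9 | 10 | 0.79 | 6.2 | 1.0377 | 5.37 | 220 |
| | II: 81.5 | 10 | 0.93 | 10.5 | | | 80 |
| BUN | I: 11.3 | 18.2 | 4.11 | 27.6 | 1.0219 | 2.8 | 11.0 |
| | II: 43.3 | 9 | 1.07 | 8.7 | | | 43.0 |
| Creatinine | I: 0.63 | 50 | 25.3 | 20.2 | 1.0523 | 0.09 | 0.6 |
| | II: 3.28 | 15 | 2.92 | 8.0 | | | 3.2 |
| Creatine Kinase | I: 176.5 | 30 | 2.82 | 29.4 | 1.0419 | 44.11 | 175 |
| | II: 514.3 | 30 | 1.68 | 12.8 | | | 510 |
| Sodium | I: 140.6 | 2.8 | 1.14 | 8.5 | 1.1193 | -4.82 | 140 |
| | II: 118 | 3.6 | 0.64 | 7.5 | | | 110 |
| Potassium | I: 6.18 | 8.3 | 2.08 | 11.1 | 1.0055 | -0.70 | 6.0 |
| | II: 4.23 | 12.5 | 1.79 | 16.9 | | | 4.0 |
| tCO2 | I: 25.4 | 20 | 10.52 | 12.4 | 0.7339 | 3.54 | 25.0 |
| | II: 12.6 | 41.66 | 12.66 | 2.9 | | | 12.0 |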
One important thing to note is that the CLIA quality requirements are sometimes stated as fixed percentages, while others are stated in concentration units, so the equivalent percentage varies with the level. That’s why the table presents different quality requirements at different levels.
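As a rough illustration of why the percentage changes with level, the sketch below converts a requirement stated in concentration units into a percentage at a given level, and picks the larger of the two when a requirement is stated as "a fixed amount or a percentage, whichever is greater." The function name is hypothetical, and the limits used (0.5 mmol/L for potassium; 2 mg/dL or 9% for BUN) are assumptions chosen because they reproduce the values in the table; check them against the current CLIA criteria before relying on them.

```python
def requirement_percent(level, percent_limit=None, absolute_limit=None):
    """Quality requirement at `level`, expressed as a percentage.

    percent_limit  -- requirement already stated in percent (optional)
    absolute_limit -- requirement stated in concentration units (optional)
    When both are given, the larger resulting percentage is used.
    """
    candidates = []
    if percent_limit is not None:
        candidates.append(float(percent_limit))
    if absolute_limit is not None:
        candidates.append(absolute_limit / level * 100.0)
    if not candidates:
        raise ValueError("supply percent_limit, absolute_limit, or both")
    return max(candidates)

# Potassium (assumed limit: 0.5 mmol/L) at the two Xc levels
print(round(requirement_percent(6.0, absolute_limit=0.5), 1))  # 8.3
print(round(requirement_percent(4.0, absolute_limit=0.5), 1))  # 12.5

# BUN (assumed limit: 2 mg/dL or 9%, whichever is greater)
print(round(requirement_percent(11.0, percent_limit=9, absolute_limit=2), 1))  # 18.2
print(round(requirement_percent(43.0, percent_limit=9, absolute_limit=2), 1))  # 9.0
```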
Now that we’ve added quality requirements, you can already see that some tests aren’t performing so well. For instance, if Potassium has an 8.3% quality requirement at a level of 6.18, a CV of 2.08 and a bias of 11.1 probably aren’t good. How can you fit the simple addition (2.08 + 11.1) into 8.3?
In any case, we’re ready to get Six Sigma metrics! Now we’ll really be able to see how the tests stand up.
Calculating Sigma Metrics from Bias, CV and Quality Requirement.
Again, the website has already covered the relationship between Six Sigma metrics and bias, CV, and quality requirements. There is even a free online calculator on Westgard Web to perform the calculations.
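For readers who want to check the arithmetic by hand, the commonly used relationship is Sigma = (quality requirement - |bias|) / CV, with all three terms in percent. The sketch below applies it to a few rows of the example data; the function name is made up for illustration, and the online calculator may round intermediate values differently.

```python
def sigma_metric(quality_requirement, bias, cv):
    """Sigma metric from allowable total error, bias, and CV (all in %)."""
    # Bias counts against the error budget regardless of its direction.
    return (quality_requirement - abs(bias)) / cv

# Creatine Kinase, level II: Q.R. 30%, bias 12.8%, CV 1.68%
print(round(sigma_metric(30, 12.8, 1.68), 1))   # 10.2

# BUN, level II: Q.R. 9%, bias 8.7%, CV 1.07%
print(round(sigma_metric(9, 8.7, 1.07), 2))     # 0.28

# Potassium, level II: the bias alone exceeds the requirement,
# so the metric comes out below zero.
print(sigma_metric(12.5, 16.9, 1.79) < 0)       # True
```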
Let’s see the Sigma Metrics:
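| Test Name | Control/Level | Q.R. % | CV % | Bias % | Sigma Metric | Slope | Y-Int | Level used for Xc calculations |
|---|---|---|---|---|---|---|---|---|
| Glucose | I: 217.9 | 10 | 0.79 | 6.2 | 4.56 | 1.0377 | 5.37 | 220 |
| | II: 81.5 | 10 | 0.93 | 10.5 | negative | | | 80 |
| BUN | I: 11.3 | 18.2 | 4.11 | 27.6 | negative | 1.0219 | 2.8 | 11.0 |
| | II: 43.3 | 9 | 1.07 | 8.7 | 0.28 | | | 43.0 |
| Creatinine | I: 0.63 | 50 | 25.3 | 20.2 | 1.18 | 1.0523 | 0.09 | 0.6 |
| | II: 3.28 | 15 | 2.92 | 8.0 | 2.39 | | | 3.2 |
| Creatine Kinase | I: 176.5 | 30 | 2.82 | 29.4 | 0.21 | 1.0419 | 44.11 | 175 |
| | II: 514.3 | 30 | 1.68 | 12.8 | 10.2 | | | 510 |
| Sodium | I: 140.6 | 2.8 | 1.14 | 8.5 | negative | 1.1193 | -4.82 | 140 |
| | II: 118 | 3.6 | 0.64 | 7.5 | negative | | | 110 |
| Potassium | I: 6.18 | 8.3 | 2.08 | 11.1 | negative | 1.0055 | -0.70 | 6.0 |
| | II: 4.23 | 12.5 | 1.79 | 16.9 | negative | | | 4.0 |
| tCO2 | I: 25.4 | 20 | 10.52 | 12.4 | 0.72 | 0.7339 | 3.54 | 25.0 |
| | II: 12.6 | 41.66 | 12.66 | 2.9 | 3.05 | | | 12.0 |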
At this point, we expect that there may be some shock and incredulity. There are some wild and wide-ranging numbers here, and not many of them are high. Can this data really reflect the performance of an actual method? Remember, this is method validation performance data supplied by the manufacturer of the instrument itself. The manufacturer gave us these numbers. But the manufacturer clearly doesn’t understand how those numbers convert into Sigma metrics.
What does it mean when a test has a NEGATIVE Sigma metric?
Once the Sigma metric drops below zero, the actual value is unimportant. A negative value means you have far more variation than your quality requirement allows. Just looking at the table explains it: for Potassium, when the quality requirement is 12.5%, you can’t have a 16.9% bias and a 1.79% CV. Those two numbers don’t add up to less than 12.5.
The final meaning of a negative Sigma metric for a test is this: there is so much variation in that process it can’t provide quality results of any kind. Find a better method.
What does it mean when a test has two widely different Sigma metrics?
To those more comfortable with Six Sigma, it is probably disconcerting to find that a single test process has two different Sigma metrics. We are used to encountering just one metric associated with one process. However, it’s certainly not surprising that a test performs differently at different levels. It would be far more unusual if a test performed the same at all the levels of concentration.
For some of the tests, the two different values are close enough to give an overall feeling about the test. Both Sigma metrics for Potassium are negative; that’s bad. For Creatinine, the metrics are 1.18 and 2.39. That gives you a range of performance and an idea that this isn’t a great method, either. But for a method like Creatine Kinase, you’ve got a 10.2 Sigma metric and then a 0.21 Sigma metric. One is great. The other is bad. What does that mean?
Calculating Sigma Metrics at the Critical Medical Decision Level
Remember that these Sigma metrics are calculated at the levels where controls are being run. Are those the best levels to judge the performance of the test? Or are there better, more appropriate levels to use? If you think about it, the Sigma metrics at the control levels ultimately matter less. We are more interested in finding the Sigma performance at the level where medical decisions are being made, and where patients are being most affected by the test results.
Dr. Bernard Statland has a critical reference for this area. He has graciously allowed us to post some of those values on the website. Using those medical decision levels, we can recalculate the Sigma metrics at medically important levels.
The process for working with the critical medical decision levels is similar to our earlier calculations. We use the regression equation again to estimate Yc and the difference Yc - Xc, from which we obtain a bias estimate. However, for CV, we will need to rely on the precision studies. The practice here is to use the CV estimate from the level closest to the critical level. So for glucose, where the known CV values are found at levels of 217.9 and 81.5, and the critical medical decision level is 120, we would use the CV value from the study at 81.5, since that is the closest.
Otherwise, the process is identical. We find quality requirements for that critical level, then we recalculate the Six Sigma metric.
To summarize the steps here:
 Find a critical medical decision level.
 Use the regression equation to estimate bias at that level.
 Pick the closest precision study to estimate CV at that level.
 Find the quality requirement for that level.
 Calculate Six Sigma metrics.
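The steps above can be strung together in a short sketch. It uses the glucose example from the text (decision level 120, precision studies at 217.9 and 81.5) and assumes the 10% quality requirement also applies at 120; the function name and data layout are invented for illustration only.

```python
def sigma_at_decision_level(level, slope, intercept, precision_studies,
                            quality_requirement):
    """Sigma metric at a medical decision level.

    precision_studies is a list of (level, cv_percent) pairs from the
    manufacturer's precision experiments.
    """
    # Steps 1-2: estimate % bias at the decision level from the regression.
    yc = intercept + slope * level
    bias = (yc - level) / level * 100.0
    # Step 3: borrow the CV from the precision study closest to that level.
    _, cv = min(precision_studies, key=lambda study: abs(study[0] - level))
    # Steps 4-5: apply the quality requirement and compute the metric.
    return (quality_requirement - abs(bias)) / cv

glucose_studies = [(217.9, 0.79), (81.5, 0.93)]
sigma = sigma_at_decision_level(
    level=120, slope=1.0377, intercept=5.37,
    precision_studies=glucose_studies, quality_requirement=10,
)
print(round(sigma, 2))  # roughly 1.9, using the CV measured at 81.5
```

Under these assumptions the estimated bias at 120 is a little over 8%, which uses up most of the 10% requirement before imprecision is even considered.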
Having completed this process for all the tests, here are the final results:

Conclusion: We wouldn’t want this Point-of-Care device anywhere near us.
Based on the final calculations, at all critical levels, for all tests on the instrument, the Sigma metrics are below 3. As you may recall, in industry, any process below 3 sigma is considered too unstable for routine use. Therefore, your final judgement on this instrument should not be positive, to put it mildly. These tests have far too much variation. The quality required by the tests is not being met by the performance of the instrument.
Postscript: How would you QC this instrument?
For a moment, let’s assume that you already have this instrument and you’re stuck with it – there’s no money in the budget to get a new one for quite some time. If this instrument is the only method to provide test results, you’ll still have to use it, no matter how bad the performance is.
If the Sigma metrics were above 3 sigma, we would recommend using a QC Design or QC Planning tool like the Normalized OPSpecs charts available on the website, or the software programs QC Validator® 2.0, or EZ Rules®. But in this case, performance is so poor that a blanket recommendation will suffice.
For methods below 3 sigma, you want to use the "full Westgard Rules" with as many controls as you can afford: for example, 1_3s/2_2s/R_4s/4_1s/8_x with four or more control measurements.
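To make the multirule idea concrete, here is a minimal sketch that checks a series of control results against the rules named above (written in the code as 1_3s, 2_2s, R_4s, 4_1s, 8_x). It assumes the results have already been converted to z-scores, that is, (result - control mean) / control SD, and it simplifies matters by applying every rule to one chronological series, whereas in practice R_4s is applied within a run and the other rules may be applied across controls and runs. The function name is hypothetical, and this is not a substitute for a validated QC program.

```python
def westgard_violations(z_scores):
    """Return the set of multirule names violated by a z-score series."""
    z = list(z_scores)
    violated = set()

    def consecutive_beyond(count, limit):
        # True if `count` consecutive values all lie beyond +limit,
        # or all lie beyond -limit (same side of the mean).
        for start in range(len(z) - count + 1):
            window = z[start:start + count]
            if all(v > limit for v in window) or all(v < -limit for v in window):
                return True
        return False

    if any(abs(v) > 3 for v in z):          # one result beyond 3 SD
        violated.add("1_3s")
    if consecutive_beyond(2, 2):            # two in a row beyond the same 2 SD limit
        violated.add("2_2s")
    if any((a > 2 and b < -2) or (a < -2 and b > 2)
           for a, b in zip(z, z[1:])):      # adjacent results more than 4 SD apart
        violated.add("R_4s")
    if consecutive_beyond(4, 1):            # four in a row beyond the same 1 SD limit
        violated.add("4_1s")
    if consecutive_beyond(8, 0):            # eight in a row on one side of the mean
        violated.add("8_x")
    return violated

print(westgard_violations([0.5, 2.3, 2.4, -0.2]))  # {'2_2s'}
```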