Tools, Technologies and Training for Healthcare Laboratories

Troubles with Tracking Tests

The New York City Board of Health and Mental Hygiene has amended the city’s Health Code to require laboratories to report all hemoglobin A1c test results in order to monitor diabetic patients. Will monitoring approximately 100,000 results each month improve quality for anyone?

  • NYC plans to track HbA1c results
  • What quality is required?
  • How would test results be interpreted?
  • What’s wrong with this approach?
  • What happens if ADA guidelines are followed?
  • What if labs use the same analytic method?
  • What if methods were calibrated to a uniform standard?
  • What’s the point?
  • References
  • March 2006

    “The government wants to know” was the headline of the Washington Report by Linda Kohl in the February 2006 issue of Medical Lab Observer [1]. That got my attention because there are some things I want the government to know, like the lunacy of “equivalent QC,” the impracticality of unannounced laboratory inspections, the irrelevance of current recommendations for laboratory quality indicators, and the lack of quality criteria for such important tests as PSA, troponin, and glycated hemoglobin (or Hemoglobin A1c). However, this did NOT turn out to be a request for knowledge, but rather another crazy proposal that displays the lack of knowledge about the quality of laboratory testing today!

    NYC plans to track HbA1c results

    That is the real subject of the MLO article and also the subheading of another article, “Putting Diabetes Under Surveillance,” authored by Julie McDowell in the March 2006 issue of Clinical Laboratory News [2]. The New York City Board of Health and Mental Hygiene has amended the city’s Health Code to require laboratories to report all hemoglobin A1c test results in order to monitor diabetic patients. This sounds like a noble effort to improve healthcare by monitoring diabetic patients, who constitute the largest chronic disease group and one of the most expensive treatment groups. Regardless of the good intentions, it will be big trouble for all who are involved – laboratories, patients, physicians, and the NYC Health Department.

    Government officials acknowledge concerns about the large number of HbA1c test results involved (approximately 100,000 per month) and the need for patient privacy and data security, but there is another issue that should be of even more concern – the mistaken assumption that these test results will actually be meaningful! There is an expectation that patient treatment can be improved, which requires that the HbA1c test results be “medically useful,” which in turn requires that they be analytically correct. How do we know that the test results will be medically useful unless we have carefully assessed the analytical quality of today’s methods?

    What quality is required?

    Getting back to information that the government needs to know, the most important thing is the quality that is required for the HbA1c test. Given that HbA1c does not appear on the list of regulated analytes, there is no regulatory criterion for acceptable performance, and only 2 specimens are required per PT event, rather than the 5 specimens required for regulated analytes. Therefore, it is necessary to turn to guidelines from the American Diabetes Association (ADA) to understand the quality needed for this test. According to ADA, the desirable level for HbA1c is 7.0% or lower, and patients with levels of 8.0% and higher should have their treatment plans re-evaluated. Information from CDC suggests that 20% of the diabetic population has HbA1c values of 9.0% or greater. Thus, the critical medical decision level is 7.0% and changes of 1.0% to 2.0% are medically important. A laboratory test for a patient whose true homeostatic set point is 7.0 %Hb should never give a result as high as 8.0% or 9.0% because that would cause the patient to be mistreated.

    How would test results be interpreted?

    Here is a set of HbA1c results: 9.58, 8.88, 8.13, 8.65, 7.94, 8.83, 9.50, 8.09, 7.06, 10.17, 8.72, 8.83, 9.14, 8.90. Consider what to do if applying the clinical decision interval from 7.0% to 9.0%.

    • 9.58: Change treatment
    • 8.88: Continue same treatment
    • 8.13: Continue same treatment
    • 8.65: Continue same treatment
    • 7.94: Continue same treatment
    • 8.83: Continue same treatment
    • 9.50: Change treatment
    • 8.09: Continue same treatment
    • 7.06: Pay for Performance
    • 10.17: Change treatment
    • 8.72: Continue same treatment
    • 8.83: Continue same treatment
    • 9.14: Change treatment
    • 8.90: Continue same treatment

    While a computer database makes it possible to store and analyze all these numbers, a graphical display may still be very helpful for interpretation. If these data were plotted on a control chart with a limit of 9.0, there would be 4 points above the limit, representing “out-of-control” situations where corrective action is required (i.e., changes in treatment).
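    The simple decision rule just described can be sketched in a few lines of Python. This is purely illustrative (the values and the 9.0 cutoff come from the discussion above; the function name is my own):

```python
# Illustrative sketch: applying the simple decision rule described above.
# Results above the 9.0 %Hb limit trigger a treatment change; all others
# continue the same treatment. (Function and variable names are mine.)

RESULTS = [9.58, 8.88, 8.13, 8.65, 7.94, 8.83, 9.50,
           8.09, 7.06, 10.17, 8.72, 8.83, 9.14, 8.90]

def decide(result, limit=9.0):
    """Return the action implied by a single HbA1c result (%Hb)."""
    return "Change treatment" if result > limit else "Continue same treatment"

flagged = [r for r in RESULTS if decide(r) == "Change treatment"]
print(f"{len(flagged)} of {len(RESULTS)} results exceed the limit: {flagged}")
```

    Running this flags the same 4 "out-of-control" points noted above (9.58, 9.50, 10.17, and 9.14).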

    What’s wrong with this approach?

    This looks relatively easy and straightforward, but there is a problem – all these results are for the same specimen! The differences represent different analytical methods, not differences in the concentrations of different specimens. These values are the method means for 14 different methods tabulated in a proficiency testing survey (MLE 2004 M1 survey specimen GH-1) for a specimen whose average concentration is about 9.0 %Hb. In other words, given a patient whose true value for HbA1c is 9.0 %Hb, any of these results might be obtained by current analytical methods. It depends on the laboratory that did the testing and the accuracy of its particular method. Any decision to change treatment is wrong! Any decision to continue the same treatment may also be wrong, because it depends on a previous value which could have been wrong, or the current value which could also be wrong!

    What happens if ADA guidelines are followed?

    Let’s look at another example, this time for a proficiency testing specimen whose average value is close to 8.0 %Hb (AAB 2004 3rd Survey, Specimen 2): 8.16, 7.85, 8.10, 7.75, 8.31, 7.93, 9.26, 8.28, 7.73, 8.00, 9.04, 8.61, 8.13, 7.56, 7.87, 7.54. Consider applying the tighter clinical decision interval from 7.0 to 8.0 that follows the ADA interpretative guidelines.

    • 8.16: Change treatment
    • 7.85: Continue same treatment
    • 8.10: Change treatment
    • 7.75: Continue same treatment
    • 8.31: Change treatment
    • 7.93: Continue same treatment
    • 9.26: Change treatment
    • 8.28: Change treatment
    • 7.73: Continue same treatment
    • 8.00: Change treatment
    • 9.04: Change treatment
    • 8.61: Change treatment
    • 8.13: Change treatment
    • 7.56: Continue same treatment
    • 7.87: Continue same treatment
    • 7.54: Continue same treatment
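    The same sketch with the tighter ADA-based cutoff makes the point numerically. Again this is only an illustration, using the second specimen's method means from the text and treating 8.0 %Hb and above as the trigger for re-evaluating treatment:

```python
# Illustrative sketch: the tighter ADA-based rule (re-evaluate treatment at
# 8.0 %Hb and above) applied to the second PT specimen's method means.
# Remember: every one of these values came from the SAME specimen.

RESULTS = [8.16, 7.85, 8.10, 7.75, 8.31, 7.93, 9.26, 8.28,
           7.73, 8.00, 9.04, 8.61, 8.13, 7.56, 7.87, 7.54]

changes = [r for r in RESULTS if r >= 8.0]
print(f"{len(changes)} of {len(RESULTS)} methods would trigger a treatment change")
```

    More than half the method groups would now call for a change in treatment, even though the specimen – and the hypothetical patient – never changed.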

    It gets worse, doesn’t it! And if you change the order of the methods in the list, the treatment actions may also change. Again, if the patient goes to a different laboratory that uses a different analytical method, the treatment decision may change – not because the patient changed, but because the laboratory method gives a different answer.

    What if labs use the same analytic method?

    One would think that having all the tests done by the same method would solve the problem, but there still is lab-to-lab variation that must be considered. The test results discussed in the two examples above are actually the mean values obtained from laboratories that employ the same analytic methods. Individual test results would show even more variation. The proficiency testing results also present the SD for each of the method groups, which can be used to assess the range of variation expected for each of these methods.

    For example, in the first set of numbers, the value 9.58 is the mean of a group of 6 laboratories that utilize the same method. The observed SD for that group of laboratories is 0.66, which means the 95% range of expected values (mean −2SD to mean +2SD) is from 8.26 to 10.90. If we consider the high and low values in the first group of values, the high of 10.17 represents the mean for a group of 14 laboratories for which the observed SD is 1.37, giving an expected range from 7.43 to 12.91. The lowest value in the first example is a mean of 7.06 for a group of 5 laboratories for which the observed SD is 0.90, which leads to an expected 95% range from 5.26 to 8.86. Thus, even when the same method is used by different laboratories, there is tremendous random variation in addition to the systematic bias of that method group. Taken together, a patient whose true value is 9.0 might get a test result as low as 5.26 or as high as 12.91, due to analytical variation of the laboratory methods.
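    The mean ± 2SD arithmetic above is easy to verify. A small sketch, using the three method-group means and SDs quoted in the text (the group labels are mine):

```python
# Illustrative sketch: 95% ranges (mean +/- 2 SD) for three method groups
# from the first PT example. Means and SDs are taken from the text;
# the group labels are my own.

groups = {  # name: (method-group mean, observed between-lab SD)
    "9.58 group (6 labs)":    (9.58, 0.66),
    "highest group (14 labs)": (10.17, 1.37),
    "lowest group (5 labs)":   (7.06, 0.90),
}

ranges = {name: (round(mean - 2 * sd, 2), round(mean + 2 * sd, 2))
          for name, (mean, sd) in groups.items()}

for name, (low, high) in ranges.items():
    print(f"{name}: {low:.2f} to {high:.2f} %Hb")
```

    The lowest and highest limits across the three groups (5.26 and 12.91 %Hb) bracket the spread of results that a single true value near 9.0 %Hb could produce.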

    What if methods were calibrated to a uniform standard?

    Good point! The problem is that these methods have already been certified to be in agreement according to the National Glycohemoglobin Standardization Program (NGSP). These results represent the current “state of the art,” where there is tremendous variation in HbA1c test results from method to method and laboratory to laboratory. There is still a great need for improvement of calibration and standardization!

    What’s the point?

    It is pure folly to think that HbA1c values are consistent from method to method and lab to lab! To attempt to monitor the diabetic population with such poor analytical methodology is a waste of time and money! To make such a recommendation is an indication of the lack of scientific methodology and understanding in today’s supposedly “evidence-based laboratory medicine.”

    As we have observed on the national political scene today, good intentions do not necessarily lead to good actions. Here is a glaring example in our own field of laboratory medicine and it points out the need for critical thinking based on real data from the field!


    1. Kohl L. The government wants to know. Medical Laboratory Observer 2006;February:48.
    2. McDowell J. Putting diabetes under surveillance: How NYC plans to track HbA1c results. Clinical Laboratory News 2006;March:1,7-8.

    James O. Westgard, PhD, is a professor of pathology and laboratory medicine at the University of Wisconsin Medical School, Madison. He also is president of Westgard QC, Inc., (Madison, Wis.) which provides tools, technology, and training for laboratory quality management.