Questions
Method Validation: Class Questions
This group of questions comes from students in Chem 555, a graduate course in Clinical Chemistry at West Chester University, West Chester, PA. Dr. Al Caffo was the course instructor and an old friend from the DuPont/Dade/Behring days (the good old days). It's been great to have this opportunity to discuss method validation practices.
A: The adequacy of the regulatory approach seems to be challenged by the recent FDA/Abbott consent decree, which illustrates the need for laboratories to maintain a quality system that is independent of the manufacturer. Even if the regulations were completely implemented, including an FDA QC clearance process to review manufacturers' QC instructions, the FDA-Abbott consent decree would still pose a significant problem for laboratories. I would hope that professional organizations would adopt standards that require laboratories to establish and maintain independent quality systems, in which case the standards of deemed organizations will need to go beyond the CLIA requirements.
Q: What statistics should be applied to compare methods that are reported in different units? (For example, CK-MB assays that measure mass or activity, where one uses ng/mL and the other U/L) (from Jody Williams, Chem 555, WCU/Caffo)
A: In this situation, a comparison plot and regression statistics will still be useful to demonstrate the relationship between the methods. The slope and intercept are still important to quantitatively describe the relationship. While the correlation coefficient may also be useful, remember that the value will depend on the analytical range that is studied. The fit of the data to a straight line is actually best described using the standard deviation of the points about the regression, which will provide a quantitative estimate of the scatter or random error in units of the new method (i.e., y-axis units). The comparison plot should be carefully inspected for linearity and outliers, which should be investigated further if necessary. More effort should be expended in establishing reference intervals since these will be critical for the clinical use and interpretation of the data.
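For illustration, the regression statistics described above can be sketched in a few lines of Python: slope, intercept, correlation coefficient, and the standard deviation of the points about the regression line (in y-axis units). The paired data in the usage example are hypothetical, not results from any actual CK-MB comparison.

```python
def regression_stats(x, y):
    """Return slope, intercept, correlation r, and S(y|x) --
    the standard deviation of the points about the regression line --
    for paired method-comparison results (x = comparison method,
    y = new method; the two axes may be in different units)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    # S(y|x): scatter of the points about the line, in y-axis units,
    # with n-2 degrees of freedom (two parameters estimated)
    ss_resid = sum((yi - (slope * xi + intercept)) ** 2
                   for xi, yi in zip(x, y))
    sy_x = (ss_resid / (n - 2)) ** 0.5
    return slope, intercept, r, sy_x

# Hypothetical paired results (x in U/L, y in ng/mL)
slope, intercept, r, sy_x = regression_stats(
    [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
```

Note that the slope and intercept here describe the conversion between the two sets of units, while S(y|x) quantifies the random scatter in the units of the new (y-axis) method.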
Q: How important is an initial replication study, since an estimate of random error can be obtained from the duplicate determinations done as a part of the method comparison study? (from Ruth Mortimer, Chem 555, WCU/Caffo)
A: The short-term replication study is very useful. First of all, it gives immediate information on your technique or the best performance of the analytical system. If you can't reproduce results under these conditions, there's no need to go any further. If you can reproduce results, then you should have some confidence that your technique or the instrument is working properly. You should also remember that the estimates of imprecision from the patient duplicates in a method comparison study will only include short-term variation, essentially within-run variation. It's still important to do a long-term replication study using control materials.
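The within-run statistics from a short-term replication study reduce to a mean, SD, and CV. A minimal sketch, using hypothetical replicate results rather than data from any particular analyzer:

```python
def replication_stats(results):
    """Mean, SD (n-1 denominator), and CV (%) for a set of
    replicate measurements from a short-term replication study."""
    n = len(results)
    mean = sum(results) / n
    sd = (sum((r - mean) ** 2 for r in results) / (n - 1)) ** 0.5
    cv = 100 * sd / mean
    return mean, sd, cv

# Hypothetical within-run glucose replicates, mg/dL
mean, sd, cv = replication_stats([100, 102, 98, 101, 99])
```

The CV from such a run represents best-case, within-run performance; the long-term study with control materials will almost always show a larger figure.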
Q: Is there a way to account for lot-to-lot reagent variation when establishing a reportable range? (from Nicole Bethke, Chem 555, WCU/Caffo)
A: You could do this if you had several different lots of materials available to you; however, that is seldom the case in a real laboratory situation. It's more likely that you will test each new lot of materials to be sure the reportable range remains adequate. Over time, you will accumulate data that demonstrate the stability of the reportable range. That information may help you optimize the way you perform the linearity experiment, or perhaps establish a way to monitor the limits of the reportable range as part of routine QC.
Q: How can a Reportable Range start at zero, since the detection limit will restrict the reportable results to some value greater than zero? (from Peter Szczerba, Chem 555, WCU/Caffo)
A: In principle, you could argue that the detection limit, rather than "zero," should define the bottom of any reportable range. However, for many tests, there's no need to determine "zero" so exactly. The clinically important test values are always considerably above zero. Take glucose, for example. Low values (less than, say, 50 mg/dL or so) are important, but it's not critical to know whether "zero" really means 0.0 mg/dL, 3 mg/dL, or 5 mg/dL.
Q: From your experience with clinical chemistry analyzer systems, what is the primary source of imprecision: the design of the method or the manufacturing consistency of the reagents? (from Maria Gonzalez, Chem 555, WCU/Caffo)
A: I think precision is very much related to the level (or generation) of automation. Highly automated systems, such as today's 4th and 5th generation chemistry analyzers, tend to be very precise because operator variability has been almost completely eliminated, environmental variability has been controlled, and instrument variability has been reduced with improved components. With these systems, manufacturing consistency might be more of an issue for accuracy than precision. With manual methods, operator variability would certainly be important, as might reagent variability.
Q: In the past, we have been able to cross over to a new lot of QC material by running it in parallel with the old lot over 20 days. With the economic situation in our lab now, we don't have the time to do this any longer. Is there a minimum number of days for cross over? Can we run more testing on fewer days? (from Vivian Anton, Chem 555, WCU/Caffo)
A: It's still good practice to obtain at least 20 measurements as a starting point for new lots of materials. However, you might get those measurements in a shorter period of time if necessary. A two-week period might be more practical, but then you should be sure to calculate cumulative means, SDs, and limits, and update these calculated values with each week's additional data. If you go to a one-week period, it may be better to establish new mean values from the new data (with a minimum of 10-20 measurements over a 5-day period) but use your old CVs to set preliminary control limits. Then initiate the use of the actual control limits after two weeks' data have been collected, and update the cumulative values with each additional week's data.
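The calculations in this answer can be sketched as follows. Both functions are hypothetical illustrations: one pools all data collected so far into cumulative statistics and control limits, and the other derives preliminary limits from the new lot's mean combined with the old lot's CV.

```python
def cumulative_limits(weekly_batches, num_sd=2.0):
    """Pool all control data collected so far and return the
    cumulative mean, SD, and control limits (mean +/- num_sd * SD).
    weekly_batches is a list of lists, one list per week."""
    all_values = [v for week in weekly_batches for v in week]
    n = len(all_values)
    mean = sum(all_values) / n
    sd = (sum((v - mean) ** 2 for v in all_values) / (n - 1)) ** 0.5
    return mean, sd, (mean - num_sd * sd, mean + num_sd * sd)

def preliminary_limits(new_mean, old_cv_pct, num_sd=2.0):
    """Preliminary control limits: the new lot's mean with an SD
    back-calculated from the old lot's CV (%)."""
    sd = new_mean * old_cv_pct / 100
    return new_mean - num_sd * sd, new_mean + num_sd * sd

# Hypothetical use: new lot mean 200, old lot CV 2.0%
lo, hi = preliminary_limits(200.0, 2.0)
# Hypothetical use: recompute cumulative limits as weeks accumulate
mean, sd, (lcl, ucl) = cumulative_limits([[100, 102, 98], [101, 99]])
```

Recomputing from the pooled raw data each week, as above, keeps the "cumulative" mean and SD exact without any incremental-update bookkeeping.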
Q: How can acceptable statistical limits for a patient correlation be defined when the reference range of the test method differs from the reference range of the comparison method (there is a known bias between methods)? (from Sandy Krakowsky, Chem 555, WCU/Caffo)
A: The reason for doing a method comparison study is to see if the new method can replace the old method without changing any of the test values. If there are systematic errors between the methods, the first issue is to determine which method is correct. If the "correct" method is the new method, then you will have to carefully establish the reference intervals for the new method. It may also be necessary to investigate clinically important populations to demonstrate the range of values expected for clinical applications.
Q: When developing a new method in the research laboratory (e.g. for a new investigational drug), replication, linearity, interference experiments can be done, but how can a method comparison be performed if this is the only method in existence? (from Heather Bonner, Chem 555, WCU/Caffo)
A: When no comparison method is available, you have to depend on the interference and recovery experiments to assess systematic errors. This means much more extensive testing with these experiments, much more effort in establishing reference intervals, and most likely considerable effort in establishing the range of values expected in different clinical populations. When a comparison method exists, the advantage is that you can transfer what is known about the usefulness of the existing test by demonstrating comparability with it.