Interview with NRL scientists Dr. Wayne Dimech and Joe Vincini
Serology testing is a different animal. Gaussian statistical quality control does not work, state Wayne Dimech and Joe Vincini of NRL. Thus traditional QC does not apply, and neither do Westgard Rules or analytical Sigma metrics. Only a non-Gaussian approach is appropriate.
Where traditional rules cannot tread: an interview with NRL scientists about Serology Testing QC.
June 2025
Interview conducted by email over several months in early 2025.
Both Dimech and Vincini have reviewed and approved the text.
Recently, a series of arguments and counter-arguments has appeared in the scientific literature over the design and implementation of QC for serology testing.
Specifically, two of our papers have proposed adapting the analytical Six Sigma metric approach for serology QC:
- Applying sigma metrics to assess quality control processes in the transfusion transmissible infection screening laboratory of a blood center. Sonu Bhatnagar, Sten Westgard, Nguyen Thi Thanh Dung, Tran Ngoc Que, Bach Quoc Khanh, Nguyen Ha Thanh. October 2024. PLOS One 19(10). DOI: 10.1371/journal.pone.0312422
- Sigma Metrics Analysis of Serology Screening Assays to Enhance Quality and Efficiency in New Zealand Blood Services. Greg Scheurich, Sonu Bhatnagar, Sten Westgard. July 2024. Diagnostic Microbiology and Infectious Disease 110(4):116451. DOI: 10.1016/j.diagmicrobio.2024.116451
These papers have elicited counter-arguments from Dr. Wayne Dimech and Joe Vincini, from NRL:
- Evidence-based assessment of the application of Six Sigma to infectious disease serology quality control. Wayne Dimech, Giuseppe Vincini. February 2025. Clinical Chemistry and Laboratory Medicine 63(6):1228-1236. DOI: 10.1515/cclm-2024-1455
- Cost benefit analysis of two quality control approaches for infectious disease testing. Wayne Dimech, Patricia Mitchell, Giuseppe Vincini. March 2025. Pathology. DOI: 10.1016/j.pathol.2025.01.002
In an effort to get more detail on the conflict between NRL limits and the Westgard approach, we decided to reach out to Dimech and Vincini directly.
This interview is with Dr Wayne Dimech, Executive Manager – Scientific and Business Relations, and Joe Vincini, QC Services Manager at the National Serology Reference Laboratory, Australia (NRL). NRL is an Australian not-for-profit scientific organization that exists to promote the quality of infectious disease testing globally. NRL is also a WHO Collaborating Centre and has conducted quality control (QC) programs for infectious diseases for more than 30 years. Wayne, Joe, and others have developed, and support, a QC concept called QConnect, designed for infectious disease testing.
Westgard: Are the controls you provide for NRL QConnect Gaussian? That is, do they demonstrate a Normal, Gaussian distribution? A related follow-up question: is it possible to apply traditional statistical control techniques to the results of these controls? And a final follow-up question: can you plot the results on a Levey-Jennings chart and expect a certain percentage of the results to fall within 1 SD, 2 SD, etc.?
Dimech and Vincini: DiaMex (Heidelberg, Germany) has been producing quality control materials for infectious diseases for 12+ years. The samples are manufactured using real human plasma. Each QC is designed to be reactive at a level that is optimized for the assays for which it was designed, in accordance with the "QConnect" concept. NRL was instrumental in the design and development of the DiaMex Optitrol QC samples. We employ a manufacturing process that ensures minimal lot-to-lot variation. DiaMex distributes the Optitrol range of products globally (except North America) through a series of distributors.
Laboratories are encouraged to use the NRL QConnect approach to monitoring the Optitrol QC results. NRL has published extensively on QC monitoring processes for infectious disease testing [1-9].
The question as to whether the QC samples demonstrate a Normal (Gaussian) distribution is tricky to answer. Yes, they would demonstrate a Normal distribution, provided the test system allowed it. Generally, the distribution of the QC results is a function of the test system (the assay) rather than of the QC samples.
So we believe it is the wrong question. The question should be "can the assays that any infectious disease serology QC sample is used on provide an environment that is Normally distributed?", to which the answer is "only within the environment of a single, controlled reagent lot". The moment you move out of this environment, such as by changing reagent lots, Normal distribution goes out the window: the QC results have a bimodal distribution. So we cannot use Gaussian distribution concepts, i.e. that 95% of Normal results fall within the mean ± 2 SD when the mean is established using a small data set of 20 or 30 results. It's a case of confusing what we want an assay to do with what it can actually do.
Therefore, asking a laboratory to apply traditional statistics is setting it up to fail. This is why we firmly believe in the QConnect concept, which uses historical data to set a range that effectively represents the allowable error of an assay/QC combination.
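To make the bimodality argument concrete, here is a minimal simulation (a sketch with hypothetical S/Co values, not NRL data): each lot is Gaussian on its own, but limits set from 20 results on one lot flag a new lot whose mean has shifted.

```python
# Minimal simulation (illustrative numbers only): per-lot QC results are
# Gaussian, but pooling two reagent lots with different means is bimodal,
# so limits set on 20 results from one lot flag the next lot.
import statistics
import random

random.seed(1)

# Hypothetical S/Co values: lot A centred at 3.0, lot B shifted to 3.6,
# both with the same within-lot SD (as described in the interview).
lot_a = [random.gauss(3.0, 0.15) for _ in range(20)]
lot_b = [random.gauss(3.6, 0.15) for _ in range(20)]

mean_a = statistics.mean(lot_a)
sd_a = statistics.stdev(lot_a)
low, high = mean_a - 2 * sd_a, mean_a + 2 * sd_a

outside = sum(1 for x in lot_b if not (low <= x <= high))
print(f"Lot A limits (mean +/- 2 SD): {low:.2f} - {high:.2f}")
print(f"Lot B results outside those limits: {outside}/20")
```

Each lot passes a normality check on its own; it is the pooled, cross-lot data that breaks the Gaussian assumption.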
Westgard: If the controls are non-Gaussian, is there any value in using a traditional Levey-Jennings chart? There are no standards for 1s, 2s, 3s lines to use. Calculating a mean or SD for any individual lot also seems unhelpful and unnecessary in QConnect.
You make a point in your letter that there is a downside to traditional statistics after an out-of-control event is identified using traditional methods. You state that this problem occurs after a Westgard Rule failure, but more correctly, it occurs after a failure of any control rule used for rejection purposes (the "fault" is not confined to the Westgard Rules). After an erroneous run, the previous statistics no longer reflect the stable performance of the method; they must be reset. In the QConnect approach, the limits do not change, even after an error condition that traditional methods would identify. If you fail the QConnect limits, you must reject the run, then you return to using the same limits.
Dimech and Vincini: We would like to stress that we have nothing against the Westgard rules per se. They have served medical pathology well for decades (and still do for chemistry) but are demonstrably not fit for purpose for infectious disease serology (and possibly other immunochemistry assays, but we don't play in that field and don't have evidence to support this hypothesis) [3]. This is because when manufacturers bring out new reagent lots, the reactivity of the QC changes when those new lots are used [10]. This does not mean that the assay is "out of control", as the assay is designed to identify the presence or absence of antibodies, not to measure a quantity of something [1]. It really doesn't matter if a patient's HIV result has an S/Co of 2.0 or 5.0 or 10.0. The result is still positive, and they still have the disease. Obviously, we don't want assays that vary uncontrollably, but the differences we see in lot-to-lot variation are not clinically significant. We use the historical range that the QC moves in, including lot-to-lot variation, to establish QConnect limits [5].
Unfortunately, due to the ubiquitous nature of the Bio-Rad Unity software in clinical chemistry (which has Westgard Rules as a default), and the fact that infectious disease testing has moved onto the same instruments used by clinical chemistry, these QC rules have been adopted for serology without thought or validation. Our mission in life is to highlight that each time a reagent lot changes, the QC rules fail and the lab staff spend heaps of time and expense to "investigate". We see your posts on LinkedIn and other social media calling out the futility and the waste of time on multiple repeats, recalibrations, etc., with which we agree, as there is a simple explanation: it's just Normal lot-to-lot variation of qualitative, antibody-based assays causing the shift in QC reactivity. The outstanding question we are trying to answer is "how much change is acceptable", as there comes a point when too much change does reflect an "out of control" episode.
We do this using historical data and the QConnect concept [5]. Basically, labs use the same QC on the same assay. We have collected 100,000+ measurements from tens of labs over periods of months or years. In this way, you get a picture of what variation is inherent in that assay, accounting for the reagent lot-to-lot variation over time. The QCs that DiaMex manufactures are made so that the lot-to-lot variation of the QC itself is minimal; essentially, each QC lot reacts the same on a particular assay, so the variation identified is due to the assay or test system. The QConnect limits simply use the variance of these data to determine an upper and lower limit.
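As an illustration of the pooled-history idea just described (a sketch only; NRL's actual QConnect computation uses real multi-lab data and is more involved), fixed limits can be derived from the simple variance of results pooled across many lots:

```python
# Sketch of the idea described above (not NRL's actual algorithm):
# pool historical QC results spanning many reagent lots and derive a
# single fixed upper/lower limit from the simple variance of the pool.
import statistics
import random

random.seed(2)

# Hypothetical history: many lots, each with its own mean (lot-to-lot
# variation) but a common within-lot SD.
lot_means = [3.0, 3.4, 2.8, 3.6, 3.1, 3.3]
history = [random.gauss(m, 0.15) for m in lot_means for _ in range(200)]

pooled_mean = statistics.mean(history)
pooled_sd = statistics.stdev(history)
lower, upper = pooled_mean - 3 * pooled_sd, pooled_mean + 3 * pooled_sd
print(f"Fixed limits from pooled history: {lower:.2f} - {upper:.2f}")
```

Because the pooled SD absorbs the lot-to-lot shifts, the resulting limits stay fixed across lot changes instead of being re-established with every new lot.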
The other concept that we support is that the manufacturer has defined what is acceptable and what is not acceptable when setting their kit control limits. An assay "run" is valid if the kit controls are within the manufacturer's limits as specified in the Instructions for Use. They have presented dossiers of evidence to government regulators (FDA, EU, TGA etc) and their sensitivity and specificity data are based on this assumption (that the tests are valid if the kit controls are valid). So, we believe that third party QC results should not override the kit controls. If a result is outside the QConnect limits, then this is an unexpected and statistically different finding and should be investigated [6].
Note: We understand that in chemistry, the labs can use third-party QCs to validate their assays. However, in infectious disease testing, generally the manufacturer of the assay provides kit controls and validation ranges for those kit controls. In many countries, the use of these kit controls for the validation of the assay is mandatory. Third party QCs such as Optitrol are used in addition to the kit controls.
Westgard: In the case where we mention the waste of out-of-control events as detected by traditional methods, the causes we cite are not lot-to-lot variation, nor anything in the control material or the instrument. It's limits that are inappropriately tight. It's a statistical effect, not an effect caused by any condition of the test or reagent.
In a previous email, I asked if the controls you provide could support Gaussian statistics. I don't feel like I got a full answer. "That's not the right question to ask" is one answer, but I sense that in fact the controls are Gaussian and could provide traditional QC - it's just that the reagent changes and the ranges being provided are non-Gaussian.
One traditional approach to solving this is to strip out the units and normalize everything. Yes, there's a shift between lots, but the inherent standard deviation of the controls is stable. The CV% of the control on a previous lot is still expected to be the CV% on the new lot. So the mean gets re-established (it takes but a handful of measurements to do this), but you can still build out a standard deviation over time. You are saying that this doesn't hold with your serology controls?
In the paper you have also mentioned that the controls don't stabilize over time. That the one-month CV does not match the 3-month CV, does not match the 10-month CV. In other controls, we do indeed see performance stabilize over time. Dr. Panteghini recommends taking a 6-month period to establish the lab measurement uncertainty. As you know, our own recommendations, some of the previous CLSI recommendations, are/were to take 3-6 months of data to build the most appropriate estimate of precision. Is there a magic number of measurements or time for your controls where they will stabilize? Or they never do?
The question of 20 measurements is a thorny one. It's not enough, but it represents a minimum practical starting point for many regulations and guidelines. I would love to tell labs to run 100 controls at the start of every lot. It's just that this will never happen in our resource-constrained world. If 20 measurements aren't enough for serology, how is it that this minimum is fine for chemistry? The implications of your approach appear to be universal – implying that our chemistry QC is as deeply flawed as the serology QC, and we should convert over to QConnect limits for all tests and all systems.
Dimech and Vincini: We would like to run through the following scenarios to provide further explanation in response to your email questions. First, some background points:
- We believe that the Westgard rules, and more recently Sigma Metrics, have served clinical chemistry testing for decades and are still the preferred option for monitoring QC in clinical chemistry and some other disciplines. This is because, among other reasons, the lot-to-lot variation within those tests can be controlled by (re)calibration, and Total Allowable Error and bias are relatively easy to calculate or define.
- Note that we do not directly benefit financially from advocating QConnect. Although we have developed the concept, we have no personal income derived from its use. We would receive the same salary from NRL irrespective of our QConnect activities. Elon need not worry about being overtaken as the world's richest man.
- Both of us are medical laboratory scientists majoring in microbiology. We both have extensive experience in diagnostic laboratories and can relate to what senior scientists in laboratories must cope with. We also audit laboratories and talk frequently to lab staff regarding QC and best practices.
- NRL does obtain revenue from the sale of Optitrol QCs and the delivery of EDCNet software to Australian laboratories, and receives a small royalty from sales outside ANZ. NRL is a charitable organisation, and any surplus is channelled into providing support to public health laboratories in the Asia-Pacific region or supporting basic research by our parent company, St Vincent's Institute of Medical Research. No individual at NRL receives financial benefits directly.
With this in mind, we present two graphs below. These are two examples from testing the same QC on the Abbott Architect anti-HCV and anti-HIV assays over a seven-to-nine-month period. They are representative of what happens with each assay from each manufacturer of serology assays. We can see multiple changes in the reactivity of the same QC sample over time for each of the two assays, with reagent lots labelled A-D for anti-HCV and 1-3 for anti-HIV.
In response to one of your questions: the population of results from testing the QC sample within each individual reagent lot is Normally distributed (Gaussian). That is, the data from lot A are Normal, as are the data from lot B, and so on. So, to clearly answer your question whether "the controls you provide could support Gaussian statistics": the answer is yes, if the results are obtained from a single reagent lot. The answer is generally no if the results are from two lots (they are bi-modal) but, as we increase the number of lots represented by the population, the answer tends towards Normal again. So, taking the data in the graphs above, the data from each single reagent lot are Normal (Gaussian); data from lots A and B, or 1 and 2, are bi-modal; and data from A-D or 1-3 combined tend towards Normal.
It is clear that, if you take 20 consecutive data points (anywhere in the data presented in the graphs), calculate the mean and standard deviation, and apply mean ± 2 SD or ± 3 SD as your acceptance limits, then when a new reagent lot is introduced you will (highly likely) get failures when applying Westgard rules, if not for a 1:3s rule then ultimately a 10:x rule. If the assay is six sigma and we use just mean ± 3 SD, we may not see failures from Lot C to Lot D, or from Lots 1 to 2, but will see failures from Lots B to C or 2 to 3, as examples. Therefore, as you have presented in a recent paper, we agree that use of mean ± 3 SD in a six-sigma assay will decrease the number of "false rejections."
If the 20 results were obtained from a mixture of two lots (e.g. 10 results from Lot A and 10 results from Lot B), then the SD will be much wider and the failures may not be as frequent.
If we took 100 data points (and, by assumption, results from multiple lots), we again have a much wider SD and less frequent failures. We mapped this situation out in a previous publication in 2015 [5].
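A small sketch of the effect described in the last three paragraphs, using made-up numbers: the SD of the baseline widens as the baseline spans more lots, which is why limits set from a single lot's 20 points reject new lots so readily.

```python
# Sketch (illustrative numbers): how the source of the baseline
# results changes the SD, and hence how often a new lot "fails".
import statistics
import random

random.seed(3)

def draw(mean, n, sd=0.15):
    """Simulate n QC results from one reagent lot."""
    return [random.gauss(mean, sd) for _ in range(n)]

one_lot   = draw(3.0, 20)                      # 20 results, lot A only
two_lots  = draw(3.0, 10) + draw(3.4, 10)      # 10 from A, 10 from B
many_lots = sum((draw(m, 25) for m in (3.0, 3.4, 2.8, 3.6)), [])

for label, data in [("one lot", one_lot), ("two lots", two_lots),
                    ("four lots", many_lots)]:
    print(f"{label:>9}: SD = {statistics.stdev(data):.3f}")
# The wider the baseline SD, the less often a shifted new lot
# falls outside mean +/- 3 SD.
```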
QConnect uses up to hundreds of thousands of results from testing the same QC on the same assay by many labs over a long period of time. This data set includes many hundreds of different lot numbers. We strongly believe these historical data represent the behaviour of a QC on a specific assay.
You mention that “…. there's a shift between lots, but the inherent standard deviation of the controls is stable. A CV% of the control on a previous lot is still expected to be the CV% on the new lot. So the mean gets re-established, it takes but a handful of measurements to do this, but you can still build out a standard deviation over time. You are saying that this doesn't hold with your serology controls?”
We agree (and you can see from both graphs) that the imprecision (CV or SD) is generally stable between reagent lots. It is the mean (bias) that changes. So, definitely a lab can re-set the limits for each new lot, and from our discussions with labs, many/most that use Westgard rules/Unity program do this. But there are a number of considerations necessary for labs taking this approach.
- When do they re-establish their limits? Ideally, before the new lot is introduced into general use. However, the barrier to doing this is that the instruments have limited space for reagents. Some labs will be testing up to 30 different analytes, so space in the instrument may not allow testing two different lots (old and new) in parallel.
- If they do have space and can re-establish limits before the new lot is put into use, they are still having to test QCs 10 or 20 times to set the limits. If they test fewer, then the limits will be extremely tight. There is a cost to this testing.
- If they cannot re-establish prior to use and add the new lot when the old lot is exhausted, then we have multiple new problems. First, the lab will go through a series of re-testing of the QC to ensure that the reason for the change is the new lot. Sometimes they will re-calibrate the instrument and only then, re-establish the limits. Again, many episodes of re-testing and associated cost. Our new paper tries to quantify the cost of this approach.
- Then we have the issue of delay to the patient samples while all this is going on, especially if there are multiple re-tests, calibrations and re-setting. Again, we quantify this in the new paper.
- We have previously reported in a comparison of QC methods “The mean number of QC test results reported for each reagent lot varied from assay to assay but ranged from 19 to 83. However, of the 638 reagents lots analyzed, 373 (58.5%) had ≤30 QC test results and 504 (79.0%) had ≤50 QC test results, meaning that setting new limits based on the first 20 results obtained for each new reagent lot was impractical, as the reagent lot would be exhausted before, or soon after, the 20 data points were collected.”[3]
- The change in the mean (bias) is the cause, but labs ignore the bias and just reset the limits, which are based on imprecision. As you point out, the SD (imprecision) usually doesn't change from reagent lot to lot. To what avail? If we re-set limits with each new lot, then we are accepting all the results within a time period, irrespective of the mean (red lines in the graphs below). But we haven't addressed the cause of the change (bias). In the graphs above, over time, the lab has accepted all the results within the red lines.
- Finally, and most importantly, re-setting the limits using 10-20 data points from the new lot ignores the reason for the change, i.e. the bias. Why not just say that an assay has a CV of x% and run with it? Because the problem everyone is worried about is a change in the reactivity of patient samples that may be observed when lot variation occurs. If we re-set the limits without reference to how much change has occurred, or any context as to whether that variation is acceptable, it is possible to miss a real issue.
To this point, let’s revisit our graphs with QConnect limits added.
We can see that in the right-hand graph (anti-HIV) all the data are within the QConnect range, even though there is just as much lot-to-lot variation as in the left-hand graph (anti-HCV). However, on the anti-HCV assay, reagent lot B has about half the results outside the QConnect limits. If the lab had just re-established their limits using CV, they would have accepted these data. We published on this situation previously and, in collaboration with Abbott, found that the change in reactivity was due to a new source of a non-biological component of the assay [8]. Once the component was changed back to the initial supplier, the reactivity went straight back to expected levels. The graph below shows the summary results of testing the same QC on the anti-HCV assay, sorted by reagent lot number. You will see six lots in the red box. The 30+ lots prior were all within the QConnect limits, and the few lots after the red box are also within the QConnect limits. This was a real issue identified by QConnect but missed by labs that re-set their limits.
Another comment was “In the paper you have also mentioned that the controls don't stabilize over time. That the one-month CV does not match the 3-month CV, does not match the 10-month CV. In other controls, we do indeed see performance stabilize over time.” The point we made was that in one month maybe one or two lots are used, so the CV is relatively small. After 3 months, we expect several lots of reagent to be used, so the CV potentially becomes larger and after 10 months we have a number of lots and potentially an even greater CV. But this is not universal. It depends on the number of lots used and how those lots behave.
Taking our examples, we will see that the first lots, A and 1, will have a smaller CV than if we combine A & B or 1 & 2. But if we take the CV of the data from all of the lots A-D or 1-3, we would expect that any future lots (Lot E onwards, or 4 onwards) have a good chance of falling within the mean ± 3 SD. This is because A-D includes a larger amount of the Normal variation experienced in a lab, whereas the mean ± 3 SD of lot A alone does not.
The final comment was "Dr. Panteghini recommends taking a 6-month period to establish the lab measurement uncertainty. As you know, our own recommendations, some of the previous CLSI recommendations, are/were to take 3-6 months of data to build the most appropriate estimate of precision." This is exactly the point made in the previous paragraph. By increasing the number of data points, you add more Normal variation and widen the CV. But isn't this the opposite of re-setting limits using "a handful of measurements"? What rules should the lab follow? We agree that a larger number of data points is appropriate for setting limits. This is exactly what QConnect does, using data not just from a single lab, but from all labs testing the same QC on the same assay over a long period of time.
You ask "Is there a magic number of measurements or time for your controls where they will stabilize? Or they never do?" The mean and CV of QC data stabilise over time. We use a minimum of 2 QC lots (noting that QConnect QC lots have minimal lot-to-lot variation due to their design and manufacture) and 100 data points from each QC, or we will not apply QConnect limits. This is not based on maths or statistics; however, we deem 100 results significant enough to rely on initially. Prior to setting QConnect limits, we use a cumulative mean ± 3 SD as a substitute until we have sufficient data (in line with Sigma Metrics but derived independently). Once we have the 100 results from two QC lots, we establish the QConnect limits but review them every few weeks. However, generally the QConnect limits do not change unless the manufacturer of the reagent changes something (which they do, and sometimes don't tell the labs!) or our manufacturer changes the QC (and then we get grumpy with them). The lack of two QC lots does not pose much of a problem for setting QConnect limits when there are assay changes, as these QC lots already exist and 100 data points can quickly be established, since we have hundreds of labs providing QC data. It is typically when a new QC product is introduced to the market, where only one QC lot exists, that NRL uses the 'cumulative mean' approach.
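Our reading of the provisional-limit workflow just described, as a hypothetical sketch (the constants and function names below are ours, not NRL's): cumulative mean ± 3 SD limits are used until at least 100 results from at least two QC lots exist, after which fixed limits apply.

```python
# A minimal sketch of the provisional-limit workflow described above
# (our reading of it; thresholds and function names are hypothetical).
import statistics

MIN_RESULTS = 100   # per the interview: 100 data points from each QC ...
MIN_QC_LOTS = 2     # ... drawn from at least two QC lots

def ready_to_freeze(history, qc_lots):
    """True once enough data exist to set fixed QConnect-style limits."""
    return len(history) >= MIN_RESULTS and len(set(qc_lots)) >= MIN_QC_LOTS

def current_limits(history, frozen_limits=None):
    """Fixed limits if set; otherwise cumulative mean +/- 3 SD to date."""
    if frozen_limits is not None:
        return frozen_limits
    m, sd = statistics.mean(history), statistics.stdev(history)
    return (m - 3 * sd, m + 3 * sd)

# Example: before freezing, each new result is judged against limits
# recomputed from all results accumulated so far.
history, lots = [3.1, 3.0, 3.2, 2.9], ["L1", "L1", "L2", "L2"]
low, high = current_limits(history)
print(f"Provisional limits: {low:.2f} - {high:.2f}",
      "| ready to freeze:", ready_to_freeze(history, lots))
```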
We feel we have answered the 20 vs 100 data point question, both above and through published evidence. The next question was "If 20 measurements aren't enough for serology, how is it that this minimum is fine for chemistry?" First, let's define what "chemistry" is. We define it as testing for inert chemicals in serum (e.g. iron, urea, glucose, potassium). These analytes have certified reference materials, and the tests are calibrated to an international standard. You can buy them in a jar from Sigma-Aldrich or any chemical supplier. The assay calibrators are traceable to SI units and can be used to "recalibrate" the assay. You want to be sure that a glucose test gets the same quantitative result irrespective of the lab, the assay, or even a self-test. Therefore, by definition, chemists can remove bias; they do not experience lot-to-lot variation, and if they do, they recalibrate and remove the bias. So the mean and SD of 20 QC results generally should represent future data [1].
If we define "chemistry" as including immunoassays, then we would postulate that they have the same issues as infectious disease testing: the technologies are in the serology realm even though the testing has traditionally been in the chemistry realm. But we are not experts in this area and leave it to others to "discover" this situation; if asked, though, we would be very keen to explore this side of the coin.
In infectious disease testing, lot-to-lot variation cannot be removed from the manufacturing process, so 20 data points patently do not represent future results.
There is probably a longer discussion to be had on the technologies in use. Whether the technologies used in qualitative testing, such as infectious disease testing, support a Normal distribution, and whether this matters for patient outcomes or QC data, should be explored.
The final comment was "The implications of your approach appear to be universal - our chemistry QC is as deeply flawed as the serology QC, and we should convert over to QConnect limits for all tests and all systems." If this statement is implying that clinical chemistry should adopt QConnect, we would not agree. Westgard rules and Sigma Metrics are absolutely relevant for all assays that are not significantly impacted by lot-to-lot variation. We are not chemists, but we expect that this includes the vast majority of clinical chemistry analytes, in particular analytes that have a chemical formula (iron, glucose, potassium, urea, creatinine, etc.). It is most likely also relevant for disciplines that count things (e.g. red blood cells, urine analysis). But we would strongly argue, and have presented what we believe is irrefutable data, that Westgard rules are not appropriate for assays that experience lot-to-lot variation.
Westgard: Whenever you show me your QConnect graphs, with multiple lots displayed, I can't help but see that within the experience of any one single lot, there could be a significant outlier, something that would exceed perhaps 3 SD of that individual lot's SD, but that point would still fall within the QConnect limits. I don't see how your limits can guard against a local outlier: an error significant within the context of a single lot, but one that remains "in" compared to wider limits developed over multiple lots. This is essentially the same problem as with manufacturer limits; your QConnect limits are just narrower than theirs.
Dimech and Vincini: The premise of the question is that a result within a set of QC results obtained from the same lot number can be "significantly different" from the other results obtained from that same lot. As an example, a QC result obtained from Lot 1 can be outside a mean ± 3 SD calculated from the first 20 QC results obtained from testing the QC on Lot 1. It is entirely possible for results to fall outside this statistically defined limit of 3 SD, and such a result is therefore "statistically significant" by definition. There are two remaining questions. The first: "is this result clinically significant?" And the second: "does it tell us anything about the performance of the test kit?"
To answer question 1, we would argue (with evidence) that it is highly unlikely to be clinically significant. Remember, we are looking for the presence or absence of antibodies. It doesn't matter if the S/Co is 2.0 or 200. The person has antibodies, and the clinical interpretation remains the same (yes, they have HIV, or no, they don't). The only time this may have a potential clinical impact is when a person is going through a seroconversion episode (i.e. having recently been infected and just starting to mount an immune response). The time an individual has a very low antibody response (less than, say, 2.0x cutoff) is about 48-72 hours [8]. Even then, it assumes that there is commutability between the QC sample (made of a dilution of chronically infected patient samples) and the infected person's antibodies (which are from a primary antibody response and often comprise different antibody subclasses directed at different antigens). In our paper investigating this very situation, we found that of 44 seroconverting samples, only 3 reported negative results when tested on a reagent lot that had experienced a huge decrease in QC reactivity (detected by QConnect). All 3 samples had an S/Co of less than 2.0 when tested on unaffected reagent lots, and the negative results were reported on only 3/6 affected reagent lots. In the same paper, we also reviewed EQAS samples made from blood donors; the reactivity of these chronically infected antibodies actually increased on the affected reagent lots [8].
We also refer you to a paper you referenced in your presentation at the Abbott Health Institute, in which we reviewed the S/Co of about 5 million donor results from the Australian Red Cross. Of these, only five donors had an S/Co of less than 2.0 on the HBsAg assay, and no donor had an S/Co of less than 2.0 for HIV [2]. It is acknowledged that this is a very "clean" population, with donors deferred if they had any risk factors, in addition to likely having already been deferred if they previously had a spurious result, so a lab testing a higher-risk population may see more acutely infected individuals. But a true seroconversion sample in a clinical setting is a very rare situation, and testing is most often performed in association with a molecular assay (especially in blood screening), which would detect the seroconversion.
The second question is more nuanced. If a QC result is outside a mean ± 3 SD set on 20 data points from the same lot number, it is likely to be re-tested. If the subsequent result is back within the range, this is considered a random error. It is usually impossible to determine the root cause of a random error because, well, it's random. Labs will usually retest and, if the results are OK upon retesting, just go on testing patients. But this ignores the fact that the random error may be occurring in patients as well. So, yes, a random error that falls outside the mean ± 3 SD but within QConnect may be indicative of a testing issue. We would also include shifts and drifts of QC results that fall within QConnect but may be indicative of something unexpected happening in the assay. We agree that these may be signs to investigate what's happening with the test, especially if they occur more frequently than expected. The QConnect concept does not preclude these investigations. We would encourage labs to be vigilant regarding these situations. But the bottom line is that they are "warnings" and not clinically significant events.
There should also be thought given to the importance of diagnostic sensitivity vs analytical sensitivity. Results outside the mean ± 3 SD but clearly still positive may be viewed as a possible change in analyte concentration; has the analytical sensitivity of the assay therefore changed? Possibly, but it is unlikely. Retest the same sample and it typically goes back to normal; was there any change in antibodies in the original sample? Not likely. What is more important here is the diagnostic sensitivity: the ability to detect an analyte or disease if it is present, regardless of how consistently a serology assay reports a number. This is far and away more important than analytical sensitivity in serology testing. When challenging assays with low-reactive samples, we have confidence that they will be detected, due to well-understood diagnostic sensitivity claims. Drawing conclusions about a change in the analytical performance of an assay from such small changes in a QC sample's S/Co value is not the purpose of an external QC.
Westgard: I also wonder about the historical accumulations: what allowable biases are accepted between reagent lots, and what allowable imprecision is accepted within a single reagent lot? If a lot is from the past, are all its data automatically acceptable? Just because data are old doesn't make them correct. If all past data are accepted, over time, won't the QConnect limits widen?
Dimech and Vincini: Fair question. In our experience, as we add historical data to the population, the limits widen up to a point. Then, as more data are added, the limits remain stable (see image below). If a new lot with a major difference in reactivity were introduced, two things would happen. First, it would be identified by the QConnect limits as being different. Second, whether adding the QC results of these lots to the population changes the limits depends on the number of results already in the population used to establish the QConnect limits. Generally, we have many hundreds of thousands of results, so the addition of hundreds of QC results from the affected lot will not change the limits materially. We will make the call if they look like they will impact the limits, but this is in conversation with the manufacturer. What we find is that sometimes the manufacturer will make a change to the assay without informing the users. This can be a minor change (e.g. a change in buffer) or a major change (e.g. different monoclonals). If the change will be ongoing, we will re-establish the QConnect limits.
The question regarding imprecision is a good one. Generally, we see the imprecision of QCs within a lot as being similar from lot to lot. That is, Lot A will generally have the same CV as Lots B and C, even if the mean is different. What we do see sometimes is that a particular lot has a significant change in CV but is still within QConnect limits. This may be an important indicator that something changed in the test process (again, not clinically significant but important to investigate). The NRL QC Services team has been working on a review of CVs for each assay to determine if we can add some metrics around CV acceptability even within QConnect limits.
Westgard: Next, I think there is some confusion over the difference between re-setting the mean and re-setting the SD. If the imprecision is constant and reproducible across lots, but bias shifts the mean, then we could use the same %CV to establish limits across multiple lots, simply multiplying the CV% by the new mean. Thus, we have new limits without the dreaded 20-measurement requirement.
Dimech and Vincini: On the face of it, this would seem a sensible approach. However, the concern is that this approach does not determine what extent of change in bias is acceptable. An example we use in our presentations is below. Objectively, the lots look the same: the CV for each lot is similar. So we just re-set the limits on the mean of the results using historical CVs.
But the purple results on the left-hand side are from the lot number with a significant change. The lab would have missed this if it relied solely on precision. If we apply QConnect limits, we can see that about half the results are outside the range (note: same data set but different timelines). The graph on the left-hand side shows the results from different batches of reagent, with 7 lots affected.
So, whereas we agree that the CV of QC results is usually the same between lots, it is important to have context as to how different the mean is allowed to be (i.e. what change in bias is acceptable). Otherwise, a lab will automatically, and inappropriately, accept the difference in the mean. Historical data provide that context. And with this context, perhaps a heavy reliance on precision is not so necessary. There should be some strong reservation about applying CV as a broad-brush approach; it will not be the panacea here, as it would simply replace an unnecessarily stringent traditional QC model with another stringent, tight range that adds questionable value. Case in point: choose any reagent lot in the previous chart and you can see regular 'peaks' and 'troughs' when viewed purely from a precision standpoint (such as applying a CV range). And yet with context (application of historical data, standing back and asking what an S/Co change from, e.g., 2.2 to 2.6 truly means), these are likely red herrings that would otherwise be investigated unnecessarily.
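To illustrate this exchange with made-up numbers (a sketch of the argument, not of any lab's actual procedure): a new lot with the same CV but a large bias passes limits that are re-centred on its own mean, while fixed limits derived from history flag it.

```python
# Sketch contrasting the two approaches on a hypothetical shifted lot:
# CV-based limits re-centred on the new lot's own mean accept it,
# while fixed limits derived from history flag it.
import statistics
import random

random.seed(5)

# History spanning four lots with modest lot-to-lot variation.
history = [random.gauss(m, 0.15)
           for m in (3.0, 3.2, 2.9, 3.1) for _ in range(100)]
hist_mean, hist_sd = statistics.mean(history), statistics.stdev(history)
fixed_low, fixed_high = hist_mean - 3 * hist_sd, hist_mean + 3 * hist_sd

shifted_lot = [random.gauss(4.2, 0.15) for _ in range(20)]  # big bias, same CV

# CV-only reset: limits re-centred on the shifted lot's own mean.
m = statistics.mean(shifted_lot)
cv = hist_sd / hist_mean
cv_low, cv_high = m * (1 - 3 * cv), m * (1 + 3 * cv)

in_cv = sum(cv_low <= x <= cv_high for x in shifted_lot)
in_fixed = sum(fixed_low <= x <= fixed_high for x in shifted_lot)
print(f"Accepted by re-centred CV limits: {in_cv}/20")
print(f"Accepted by fixed historical limits: {in_fixed}/20")
```

The CV-based limits accept the shifted lot in full because they carry no memory of where the mean used to sit; the fixed limits reject it because they do.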
Westgard: The bias between lots is an issue addressed by the CLSI guideline EP26. It recommends in particular that patient samples should be used to demonstrate that there is no significant difference between lots. If that is done, the QC mean can move up and down without a problem, because the QC mean shift is a result of the matrix issues.
Dimech and Vincini: This approach is not relevant to infectious disease serology. Take HIV as an example. The vast majority of samples will be negative, i.e. no antibodies detected. Does it make sense to map the numerical value of negative patient samples, given that by definition there are no antibodies, so the numerical value is actually background noise? Alternatively, the lab could have a panel of positive samples (chronically infected patients) with a high S/Co. What would be the acceptance criteria? They will undoubtedly be positive irrespective of the change in bias due to a lot number. As we pointed out above, an affected lot actually increased the signal of chronically infected patients. Finally, they could test seroconversion panels. This is cost-prohibitive and not at all practical.
Again, this is where the biochemistry approach just cannot be retrofitted to serology. It is a nonsense and should be strongly discouraged.
Westgard: I also was reading in your paper that of the 70 outliers, only 1 was considered a true failure by the laboratory manager. However, you list that there were 15 outliers detected by QConnect. How many of those were considered true failures by the same judge?
Dimech and Vincini: It was the same data set, so there was 1 "failure" over the 5-month period. That one "failure" was self-reported by the senior scientist as a deviation that they considered put patients at risk. The traditional methods flagged 70 "alerts" and QConnect flagged 15 "alerts" within that 5-month period. So, the difference was that for 70 episodes the lab performed multiple QC replicates, recalibrations, and resettings of limits. For the 15 QConnect failures, they would have recourse to NRL QC Services to investigate on their behalf. These will have consumed some effort, but the point is that it was about five times less effort than otherwise.
Westgard: Do you have an expectation of the error detection and false rejection capabilities of the QConnect limits? Is the error detection 100% and false rejection 0%?
Dimech and Vincini: Absolutely not. We expect that the detection rate for clinically significant errors is extremely high. Using QConnect without any other tools may result in some operationally important episodes being undetected (as we mention above: the random QC outlier, a trend of QC results within QConnect, or a larger CV than expected). However, these are indicators of a change in the test system, not clinically significant changes. We do recommend that labs review their QC charts periodically (e.g. weekly or monthly) for trends, and we are working on CV as a tool to monitor performance (watch this space). Many of the assessments we have discussed reflect the approach that laboratories would, and should, take when looking at infectious disease QC data. This is typical of our approach, where not just statistics are used to make a decision on QC; kit controls, patient result data, scientist/lab expert experience, peer review, the IFU, understanding of how assays work, and/or manufacturer evidence are also taken into consideration to decide whether the QC result and/or patient result should be accepted or rejected.
References
- Dimech W. The standardization and control of serology and nucleic acid testing for infectious diseases. Clin Microbiol Rev 2021;34.
- Dimech W, Freame R, Smeh K, Wand H. A review of the relationship between quality control and donor sample results obtained from serological assays used for screening blood donations for anti-HIV and hepatitis B surface antigen. Accred Qual Assur 2013;18:11-18.
- Dimech W, Karakaltsas M, Vincini GA. Comparison of four methods of establishing control limits for monitoring quality controls in infectious disease serology testing. Clin Chem Lab Med 2018;56:1970-1978.
- Dimech W, Vincini G. Evaluation of a Multimarker Quality Control Sample for Monitoring the Performance of Multiplex Blood Screening Nucleic Acid Tests. Vox Sang 2017;112.
- Dimech W, Vincini G, Karakaltsas M. Determination of quality control limits for serological infectious disease testing using historical data. Clin Chem Lab Med 2015;53:329-36.
- Dimech W, Vincini G, McEwan B. External quality control processes for infectious disease testing. Microbiology Australia 2024.
- Dimech W, Walker S, Jardine D, Read S, Smeh K, Karakaltsas K, et al. Comprehensive quality control programme for serology and nucleic acid testing using an Internet-based application. Accred Qual Assur 2024;8:148-151.
- Dimech WJ, Vincini GA, Cabuang LM, Wieringa M. Does a change in quality control results influence the sensitivity of an anti-HCV test? Clin Chem Lab Med 2020;58:1372-1380.
- Dimech WJ, Vincini GA, Plebani M, Lippi G, Nichols JH, Sonntag O. Time to address quality control processes applied to antibody testing for infectious diseases. Clin Chem Lab Med 2023;61:205-212.
- Dimech WJ, Vincini GA, Plebani M, Lippi G, Nichols JH, Sonntag O. Response to Tony Badrick regarding "Letter to the Editor regarding the article by Wayne J. Dimech et al. Time to address quality control processes applied to antibody testing for infectious diseases. Clin Chem Lab Med 2023;61(2):205-212". Clin Chem Lab Med 2023;61:e137-e139.