November 1, 2020

Towards complete and accurate reporting of studies of diagnostic accuracy: The STARD initiative. Therefore, SnNout: if a diagnostic test with high sensitivity (Sn) returns a negative result (N), the diagnosis can be ruled out (out) 2-3. Carl Heneghan and Douglas Badenoch: Evidence-based Medicine Toolkit, Second Edition. One could also compute the proportion of New Test+ subjects that are non-reference standard+ and get a different number. = 100% x 40/41 = 97.6%, negative percent agreement (new / non ref. FDA recommends you define the conditions of use under which the candidate test and the reference standard or comparative method are performed. These claims may be appropriate if the archived specimens are representative of specimens from subjects in the intended use population, with and without the target condition, including unclear cases. Last Updated on April 18, 2020 by Sagar Aryal. To demonstrate this phenomenon, start with the data from Table 6A.

Performance measures should be interpreted in the context of the study population and study design. Discrepant analysis: How can we test a test? Statistical Methods in Medical Research, 7, 354–370.

Dealing with discrepancy analysis part 1: The problem of bias. Comparison of a screening test and a reference test in epidemiologic studies. Weinstein S, Obuchowski NA, Lieber ML. Proceedings of the 2003 Joint Statistical Meetings, Biopharmaceutical Section, San Francisco, CA. Because the calculations of sensitivity and specificity from such a revised 2x2 table are not valid estimates of performance, they should not be reported. the limitations introduced through selective sampling. The purpose of the meeting was to obtain recommendations on “appropriate data collection, analysis, and resolution of discrepant results, using sound scientific and statistical analysis to support indications for use of the in vitro diagnostic devices when the new device is compared to another device, a recognized reference method or ‘gold standard,’ or other procedures not commonly used, and/or clinical criteria for diagnosis.” Using the input from that meeting, a draft guidance document was developed discussing some statistically valid approaches to reporting results from evaluation studies for new diagnostic devices. Assumption: You have a new rapid diagnostic test being evaluated for COVID-19 screening, based on the specific antibodies produced against the virus, SARS-CoV-2.

These are only estimates for sensitivity and specificity because they are based on only a subset of subjects from the intended use population; if another subset of subjects were tested (or even the same subjects tested at a different time), then the estimates of sensitivity and specificity would probably be numerically different. Agreement of a new test with the non-reference standard is numerically different from agreement of the non-reference standard with the new test (contrary to what the term “agreement” implies). Therefore, FDA recommends the set of subjects and specimens to be tested include: If the set of subjects and specimens to be evaluated in the study is not sufficiently representative of the intended use population, the estimates of diagnostic accuracy can be biased. A practice called discrepant resolution has been suggested to get around the bias problem.
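
The asymmetry of "agreement" described above can be checked with a small calculation. This is a minimal sketch, not the guidance's own worked example: the agreeing cells (40 and 171) follow the counts quoted in the text, while the two disagreeing cells (5 and 4) are assumed values chosen only for illustration.

```python
# 2x2 comparison of a new test against a non-reference standard.
# a and d follow the agreeing cells quoted in the text (40 and 171);
# b and c are ASSUMED values used only for illustration.
a = 40   # new test +, non-reference standard +
b = 5    # new test +, non-reference standard -  (assumed)
c = 4    # new test -, non-reference standard +  (assumed)
d = 171  # new test -, non-reference standard -


def percent(numerator, denominator):
    """Return numerator/denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


# Agreement of the new test with the non-reference standard:
# among non-reference standard positives, how many are new test positive?
new_with_nonref = percent(a, a + c)

# Agreement of the non-reference standard with the new test:
# among new test positives, how many are non-reference standard positive?
nonref_with_new = percent(a, a + b)

# Overall percent agreement uses all four cells.
overall = percent(a + d, a + b + c + d)

print(f"new test vs non-reference standard: {new_with_nonref:.1f}%")
print(f"non-reference standard vs new test: {nonref_with_new:.1f}%")
print(f"overall percent agreement:          {overall:.1f}%")
```

The two positive agreement figures differ because they use different denominators (column total versus row total), which is why the direction of the comparison should always be stated.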

Similarly, the specificity of the test is estimated as the proportion of subjects without the target condition in whom the test is negative (see the Appendix for an example of this calculation). (1998). Results are typically reported in a 2x2 table such as Table 1. Kruskal, W., & Mosteller, F. (1979).

You can contact statisticians at the FDA Center for Devices and Radiological Health, Office of Surveillance and Biometrics, Division of Biostatistics, at (240) 276-3133. This section provides an explanation of the concepts relevant to this guidance. Miller, W.C. (1998a). It is not appropriate to: use the terms “sensitivity” and “specificity” to describe the comparison of a new test to a non-reference standard; discard equivocal new test results when calculating measures of diagnostic accuracy or agreement; or use outcomes that are altered or updated by discrepant resolution to estimate the sensitivity and specificity of a new test or agreement between a new test and a non-reference standard. Selection of the appropriate set of subjects and/or specimens is not in itself sufficient to ensure high external validity. When the original results (new test and non-reference standard) agree, discrepant resolution assumes (often incorrectly) that they are both correct and makes no changes to the table. Two-sided 95% score confidence intervals for sensitivity and specificity are (74.3%, 93.2%) and (96.7%, 99.9%), respectively. Have a well-established gold standard test to determine the prevalence of disease or characteristic, e.g. To address this issue, one option is to report two different sets of performance measures.
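
The two-sided 95% score confidence intervals quoted above can be reproduced with the Wilson score formula. The sketch below is illustrative only. The denominators (51 subjects with the condition, 169 without) appear in the text; the numerators (44 and 168) are assumptions, chosen because they reproduce the quoted intervals. The function name `wilson_score_interval` is likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_score_interval(successes, n, confidence=0.95):
    """Two-sided Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# ASSUMED counts: 44 of 51 subjects with the condition test positive,
# 168 of 169 subjects without the condition test negative.
sens_lo, sens_hi = wilson_score_interval(44, 51)
spec_lo, spec_hi = wilson_score_interval(168, 169)

print(f"sensitivity 95% CI: ({100 * sens_lo:.1f}%, {100 * sens_hi:.1f}%)")
print(f"specificity 95% CI: ({100 * spec_lo:.1f}%, {100 * spec_hi:.1f}%)")
```

Unlike the simple normal-approximation interval, the score interval stays within 0% to 100% even when the estimate is close to 100%, as it is for specificity here.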

Some respondents requested greater attention to the use of standard terminology. Shoukri, M.M. Subjects with the condition of interest are indicated as reference standard (+), and subjects without the condition of interest are indicated as reference standard (−). In this case FDA recommends you retest a sufficient number of subjects to estimate sensitivity and specificity with reasonable precision. Normally, when there is a disease outbreak, diagnostic tests are done to determine if an individual has the disease or not. Stage 1: Testing all subjects using the new test and the non-reference standard, Stage 2: When the new test and non-reference standard disagree, using a, when the original two results agree, you assume (without supporting evidence) that they are both correct and do not make any changes to the table. To show how condition prevalence affects agreement, suppose that the condition prevalence in the study population is much lower, but the agreement between the new test and non-reference standard in subjects with and without the condition remains the same. From Table 2, in truth, 51 subjects have the condition of interest, and 169 do not.
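
The effect of prevalence on overall agreement follows from a weighted average: overall agreement is the agreement rate among subjects with the condition, weighted by prevalence, plus the agreement rate among subjects without it, weighted by one minus prevalence. The sketch below uses hypothetical agreement rates and prevalences (none of them taken from the guidance's tables) purely to show the direction of the effect.

```python
def overall_agreement(prevalence, agree_with_condition, agree_without_condition):
    """Overall agreement as a prevalence-weighted average of the
    agreement rates in subjects with and without the condition."""
    return (prevalence * agree_with_condition
            + (1 - prevalence) * agree_without_condition)


# HYPOTHETICAL agreement rates, held fixed while prevalence changes.
AGREE_WITH_CONDITION = 0.80
AGREE_WITHOUT_CONDITION = 0.99

for prevalence in (0.25, 0.10, 0.02):
    agreement = overall_agreement(
        prevalence, AGREE_WITH_CONDITION, AGREE_WITHOUT_CONDITION
    )
    print(f"prevalence {prevalence:.0%}: overall agreement {100 * agreement:.1f}%")
```

With these assumed rates, overall agreement rises as prevalence falls, even though the tests behave identically within each group. An overall agreement figure therefore says little without the prevalence of the study population.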

The draft of this document was issued on March 12, 2003. * All subject results incorrectly assumed to be correct (see Table 6A for the correct results for 40* and 171*). Information on the accuracy or “correctness” of the new test cannot be estimated directly. The complements of the predictive values are the False Discovery Rate (FDR), which equals 100% minus the Positive Predictive Value (PPV), and the False Omission Rate (FOR), which equals 100% minus the Negative Predictive Value (NPV). Hayden, C. L., & Feldstein, M. L. (2000, Jan/Feb). This document provides guidance for the submission of premarket notification (510(k)) and premarket approval (PMA) applications for diagnostic devices (tests). FDA recommends you provide a complete accounting of all subjects and test results, including: FDA recommends you provide the number of ambiguous results for candidate tests, stratified by reference standard outcome or comparative outcome. New York: John Wiley & Sons. If you determine that using a reference standard on all subjects is impractical or not feasible, FDA recommends you obtain estimates of sensitivity and specificity using the new test and a comparative method (other than a reference standard) on all subjects, and use the reference standard on just a subset of subjects (sometimes called partial verification studies or two-stage studies). The third consequence needs further explanation. No test (physical or laboratory) is perfect: this is where sensitivity and specificity come into play 4.
[...]: the proportion of non-reference standard positive subjects in whom the new test is positive
predictive value of a negative result (sometimes called negative predictive value or NPV): the proportion of test negative patients who do not have the target condition; calculated as 100 × TN/(TN+FN)
predictive value of a positive result (sometimes called positive predictive value or PPV): the proportion of test positive patients who have the target condition; calculated as 100 × TP/(TP+FP)
prevalence: the frequency of a condition of interest at a given point in time, expressed as a fraction of the number of individuals in a specified group with the condition of interest compared to the total number of individuals (those with the condition plus those without the condition of interest) in the specified group; pretest probability of the condition of interest in a specified group
reference standard: the best available method for establishing the presence or absence of the target condition; the reference standard can be a single test or method, or a combination of methods and techniques, including clinical follow-up
sensitivity: the proportion of subjects with the target condition in whom the test is positive; calculated as 100 × TP/(TP+FN)
specificity: the proportion of subjects without the target condition in whom the test is negative; calculated as 100 × TN/(FP+TN)
study population: the subjects/patients (and specimen types) included in the study
target condition (condition of interest): a particular disease, a disease stage, health status, or any other identifiable condition within a patient, such as staging a disease already known to be present, or a health condition that should prompt clinical action, such as the initiation, modification, or termination of treatment
TN: the number of subjects/specimens with true negative test results
TP: the number of subjects/specimens with true positive test results
true negative result: a negative test result for a subject in whom the condition of interest is absent (as determined by the designated reference standard)
true positive result: a positive test result for a subject in whom the condition of interest is present (as determined by the designated reference standard)

Albert, P. S. (2006). There are different ways to describe diagnostic accuracy.
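
The formulas in the definitions above translate directly into a short calculation. The following is a sketch under stated assumptions: the function name `accuracy_measures` is hypothetical, and the four cell counts are assumed values chosen to be consistent with the totals mentioned in the text (51 subjects with the condition, 169 without), not figures copied from a table.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Compute diagnostic accuracy measures (in percent) from the four
    cells of a 2x2 table of test result vs. reference standard."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": 100 * tp / (tp + fn),  # 100 x TP/(TP+FN)
        "specificity": 100 * tn / (fp + tn),  # 100 x TN/(FP+TN)
        "ppv": 100 * tp / (tp + fp),          # 100 x TP/(TP+FP)
        "npv": 100 * tn / (tn + fn),          # 100 x TN/(TN+FN)
        "prevalence": 100 * (tp + fn) / total,
    }


# ASSUMED counts for illustration: 51 subjects with the condition
# (44 TP + 7 FN) and 169 without (1 FP + 168 TN).
measures = accuracy_measures(tp=44, fp=1, fn=7, tn=168)

for name, value in measures.items():
    print(f"{name}: {value:.1f}%")
```

Sensitivity and specificity are computed within the reference standard columns, while the predictive values are computed within the test result rows, so the predictive values depend on the prevalence in the study population.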

