The problem with specificities and false positive rates
By HUW LLEWELYN, May 29 2021 11:08PM
The concepts of 'sensitivity' and 'specificity' were used for radar during World War II. If the 'receiver-operator' turned the detection knob one way, it increased sensitivity but decreased specificity, so that unwanted objects (e.g. sea birds) tended to be detected as well. If the knob was turned the other way, it increased specificity and reduced the risk of detecting unwanted objects (e.g. sea birds), but it decreased sensitivity, so that smaller enemy aircraft were at risk of being missed.
It was assumed by some that this is how diagnosis works too. The 'sensitivity' of a finding with respect to a diagnosis is the frequency with which that finding occurs in those with the diagnosis, e.g. the frequency of localised right lower quadrant (LRLQ) pain in patients with appendicitis (e.g. 50/100). The 'specificity', however, is the frequency with which the finding is ABSENT in those WITHOUT the diagnosis. In turn, the false positive rate is '1 minus the specificity': the frequency with which a finding (e.g. LRLQ pain) occurs in people WITHOUT appendicitis. It is also assumed that if a finding occurs equally frequently in those with and without a diagnosis, then it is useless, whether used alone or in combination with other findings.
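A minimal Python sketch of these definitions, assuming a simple 2×2 count of patients. The 50/100 figure for LRLQ pain in appendicitis comes from the text; the counts for patients without appendicitis (30 of 200 with LRLQ pain) are hypothetical and chosen only for illustration, as is the class name TwoByTwo.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts of patients cross-classified by finding and diagnosis (hypothetical helper)."""
    with_dx_finding_present: int      # e.g. appendicitis AND LRLQ pain
    with_dx_finding_absent: int       # appendicitis, no LRLQ pain
    without_dx_finding_present: int   # no appendicitis, LRLQ pain
    without_dx_finding_absent: int    # no appendicitis, no LRLQ pain

    def sensitivity(self) -> float:
        """Frequency of the finding in those WITH the diagnosis."""
        return self.with_dx_finding_present / (
            self.with_dx_finding_present + self.with_dx_finding_absent)

    def specificity(self) -> float:
        """Frequency of the finding being ABSENT in those WITHOUT the diagnosis."""
        return self.without_dx_finding_absent / (
            self.without_dx_finding_present + self.without_dx_finding_absent)

    def false_positive_rate(self) -> float:
        """Frequency of the finding in those WITHOUT the diagnosis (= 1 - specificity)."""
        return self.without_dx_finding_present / (
            self.without_dx_finding_present + self.without_dx_finding_absent)


# 50/100 with appendicitis have LRLQ pain (from the text);
# 30/200 without appendicitis is a HYPOTHETICAL figure.
lrlq = TwoByTwo(50, 50, 30, 170)
print(lrlq.sensitivity())          # 0.5
print(lrlq.specificity())          # 0.85
print(lrlq.false_positive_rate())  # 0.15
```

Note that the sensitivity is computed only from patients with appendicitis, while the specificity and false positive rate are computed only from patients without it.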
There is a big problem with 'specificity' and the 'false positive rate' (i.e. '1 minus the specificity'). It is the issue of whom we should regard as 'those without a diagnosis', e.g. whom should we regard as those 'without appendicitis'? Is it those in a ward, in a whole hospital or in the whole community? In other words, these values depend on the population in which 'those without' the diagnosis or finding were counted. There is also another problem: 'those without a diagnosis' will include patients with other diagnoses. To understand this, look at Figure 1 below and then read on.
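To illustrate the dependence on the reference population, the Python sketch below keeps the sensitivity of LRLQ pain fixed at 50% (from the text) and varies only the group counted as 'without appendicitis'. All counts for the three populations are hypothetical, chosen only to show the effect.

```python
def specificity(without_present: int, without_total: int) -> float:
    """Fraction of the 'without diagnosis' group in whom the finding is absent."""
    return (without_total - without_present) / without_total


def likelihood_ratio(sensitivity: float, without_present: int, without_total: int) -> float:
    """Positive likelihood ratio = sensitivity / false positive rate."""
    return sensitivity / (without_present / without_total)


sens = 0.5  # LRLQ pain in appendicitis (50/100, from the text)

# HYPOTHETICAL counts of LRLQ pain among people WITHOUT appendicitis,
# depending on which population is used as the comparison group.
populations = {
    "surgical ward": (40, 100),
    "whole hospital": (100, 1_000),
    "whole community": (100, 10_000),
}

for name, (present, total) in populations.items():
    print(f"{name:16s} specificity={specificity(present, total):.3f} "
          f"LR+={likelihood_ratio(sens, present, total):.2f}")
```

With these made-up numbers the specificity rises from 0.60 in the ward to 0.99 in the community, and the likelihood ratio from 1.25 to 50, although nothing about the finding in patients with appendicitis has changed.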
If localised right lower quadrant (LRLQ) pain occurs in 50% of those with appendicitis and 50% of those without appendicitis, the likelihood ratio (50%/50%) is 1 and it seems unhelpful. Similarly, if guarding occurs in 50% of those with appendicitis and 50% of those without appendicitis, the likelihood ratio is 1 and guarding also seems to be unhelpful. These two findings will also seem unhelpful in combination, because the combined likelihood ratio, assuming statistical independence, is 1 x 1 = 1.
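A short Python sketch of this calculation, using only the 50% figures given in the text. Multiplying likelihood ratios, as done here with math.prod, is valid only under the assumption of statistical independence that the paragraph mentions.

```python
import math


def likelihood_ratio(freq_with: float, freq_without: float) -> float:
    """Positive likelihood ratio: P(finding | diagnosis) / P(finding | no diagnosis)."""
    return freq_with / freq_without


# Frequencies from the text: each finding occurs in 50% of patients
# with appendicitis and in 50% of patients without it.
findings = {
    "LRLQ pain": (0.5, 0.5),
    "guarding": (0.5, 0.5),
}

lrs = {name: likelihood_ratio(w, wo) for name, (w, wo) in findings.items()}
# Under the assumption of statistical independence, LRs simply multiply.
combined = math.prod(lrs.values())

print(lrs)       # {'LRLQ pain': 1.0, 'guarding': 1.0}
print(combined)  # 1.0
```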
However, assume that 50% of those without appendicitis had 'non-specific abdominal pain' (NSAP), that all of these patients with NSAP had LRLQ pain, and that the others without appendicitis or NSAP never had LRLQ pain. Also assume that none of those without appendicitis who had NSAP had guarding (see Figure 1).
This means that if a patient has LRLQ pain, he or she must have appendicitis or NSAP. If the patient also has guarding, then, as this never occurs in NSAP but often occurs in appendicitis, the diagnosis must be appendicitis. So despite all the likelihood ratios being 1 (and the findings apparently being useless), the combination of LRLQ pain and guarding predicts appendicitis with certainty (showing that they are very useful indeed). This is how reasoning by elimination using LRLQ pain and guarding works. It is an important method of reasoning in medicine and is different to applying Bayes' rule to sensitivities and specificities.
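The scenario can be checked by counting patients. The Python sketch below builds a population of 100 patients with appendicitis and 100 without (hypothetical group sizes) following the assumptions above; the even split of appendicitis patients across the four combinations of findings is also a hypothetical assumption, since only the 50% frequencies are given. It shows why the product of likelihood ratios misleads here: among those without appendicitis, LRLQ pain and guarding never occur together, so the two findings are not statistically independent.

```python
# Keys: (diagnosis, has LRLQ pain, has guarding); values: number of patients.
POPULATION = {
    # HYPOTHETICAL: 100 appendicitis patients split evenly over the four
    # combinations, giving 50% LRLQ pain and 50% guarding as in the text.
    ("appendicitis", True, True): 25,
    ("appendicitis", True, False): 25,
    ("appendicitis", False, True): 25,
    ("appendicitis", False, False): 25,
    # NSAP: half of those without appendicitis; all have LRLQ pain, none guarding.
    ("NSAP", True, False): 50,
    # Remaining patients without appendicitis ("other" is a hypothetical label):
    # no LRLQ pain, and all have guarding so that guarding stays at 50%.
    ("other", False, True): 50,
}


def freq(pred, among):
    """Fraction of patients satisfying `among` who also satisfy `pred`."""
    denom = sum(n for key, n in POPULATION.items() if among(key))
    num = sum(n for key, n in POPULATION.items() if among(key) and pred(key))
    return num / denom


def appendicitis(key):
    return key[0] == "appendicitis"


def no_appendicitis(key):
    return key[0] != "appendicitis"


def lrlq_pain(key):
    return key[1]


def guarding(key):
    return key[2]


def lrlq_and_guarding(key):
    return key[1] and key[2]


# Each finding alone: 50% with vs 50% without appendicitis, so each LR is 1.
print(freq(lrlq_pain, appendicitis), freq(lrlq_pain, no_appendicitis))  # 0.5 0.5
print(freq(guarding, appendicitis), freq(guarding, no_appendicitis))    # 0.5 0.5

# The combination never occurs without appendicitis ...
print(freq(lrlq_and_guarding, no_appendicitis))  # 0.0
# ... so every patient with both findings has appendicitis.
print(freq(appendicitis, lrlq_and_guarding))     # 1.0
```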
This is a serious issue because 'sensitivity' and 'specificity' currently play a central role in deciding whether the results of new tests are going to be useful for diagnosis, and therefore whether use of a test should be allowed. However, this example shows that 'specificity' is unreliable, and that reasoning by elimination uses only the 'sensitivities' (and false negative rates): the frequencies of a positive or negative finding in those with a specified diagnosis (e.g. the frequency of guarding in those with appendicitis (50%) and in those with NSAP (0%)).
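A minimal sketch of reasoning by elimination in Python, assuming that a diagnosis can be ruled out when a present finding has a frequency of 0% in it, or an absent finding a frequency of 100%. The function name eliminate and the label "other" (for patients with neither appendicitis nor NSAP) are hypothetical; the frequencies follow the scenario described in this post. Only the frequencies of findings within each diagnosis are used; no specificity appears anywhere.

```python
# Frequency of each finding in each diagnosis (i.e. sensitivities).
# Appendicitis and NSAP values come from the scenario in the text; "other"
# is a HYPOTHETICAL label for the remaining patients without appendicitis.
FINDING_FREQ = {
    "appendicitis": {"LRLQ pain": 0.5, "guarding": 0.5},
    "NSAP": {"LRLQ pain": 1.0, "guarding": 0.0},
    "other": {"LRLQ pain": 0.0, "guarding": 1.0},
}


def eliminate(differential, findings):
    """Return the diagnoses not ruled out by the observed findings.

    A diagnosis is eliminated if a PRESENT finding never occurs in it
    (frequency 0) or an ABSENT finding always occurs in it (frequency 1,
    i.e. a false negative rate of 0).
    """
    remaining = []
    for dx in differential:
        freqs = FINDING_FREQ[dx]
        ruled_out = any(
            (present and freqs[name] == 0.0) or (not present and freqs[name] == 1.0)
            for name, present in findings.items()
        )
        if not ruled_out:
            remaining.append(dx)
    return remaining


differential = ["appendicitis", "NSAP", "other"]
print(eliminate(differential, {"LRLQ pain": True}))
# ['appendicitis', 'NSAP']
print(eliminate(differential, {"LRLQ pain": True, "guarding": True}))
# ['appendicitis']
```

LRLQ pain alone leaves appendicitis and NSAP; adding guarding eliminates NSAP, leaving appendicitis, which mirrors the reasoning in the text.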