Take-home message

In this meta-analysis, we found that physical examination findings (fever, purulent secretions), chest radiography, endotracheal aspirate cultures, bronchoscopic cultures, and Clinical Pulmonary Infection Score (CPIS) have poor accuracy for diagnosing ventilator-associated pneumonia. These findings have important implications for patient management, antibiotic stewardship, and quality measurement.

Introduction

Ventilator-associated pneumonia (VAP) is the most common and fatal nosocomial infection in Intensive Care Units (ICU) [1, 2]. VAP is associated with prolonged duration of mechanical ventilation and ICU length of stay, increased hospital costs, and possibly an increased risk of dying [3,4,5,6]. VAP is also a major driver of antibiotic use in ICU patients [7].

Early identification of VAP is critical because delayed administration of antimicrobial therapy has been associated with increased mortality [8,9,10]. However, the importance of rapid antibiotic administration must be balanced against the risks of unnecessary antibiotics, including antibiotic resistance and superinfections [11], particularly in the ICU [12]. Finding the right balance is challenging because VAP is difficult to diagnose. As there is no practical reference standard for VAP, perceived VAP rates and outcomes vary widely depending on the definition applied [13], and up to two-thirds of patients treated for VAP may not actually have VAP [14, 15]. Improved methods to diagnose VAP and inform the initiation of empiric antibiotics are urgently needed.

Clinicians typically rely upon clinical, radiographic, and laboratory indicators to diagnose VAP and initiate empiric antibiotics. These include fever, purulent secretions, hypoxemia, new or progressive infiltrate on chest radiography, elevated white blood cell count, and positive cultures of endotracheal aspirates (ETA) or bronchoscopic samples (bronchoalveolar lavage [BAL] and protected specimen brush [PSB]). Some of these have been combined into clinical models, the most popular of which is the Clinical Pulmonary Infection Score (CPIS) [16]. However, despite widespread use of these signs and tests, their accuracy for diagnosing VAP is poorly characterized [1, 17]. We conducted a systematic review and meta-analysis to evaluate the diagnostic performance (including sensitivity and specificity) of these signs and tests, compared with either histopathology of lung tissue or quantitative BAL cultures as reference standards. We hypothesized that these individual tests, in isolation, have poor diagnostic accuracy for VAP.

Methods

We structured this systematic review according to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines for Diagnostic Test Accuracy [18, 19], the Cochrane Handbook for Diagnostic Test Accuracy [20], and existing guidelines for reviews of diagnostic accuracy [21], as performed previously [22, 23]. We registered the study protocol with PROSPERO (CRD42019124907).

Search strategy

We searched six databases (MEDLINE, PubMed, EMBASE, Scopus, Web of Science, and the Cochrane Database of Systematic Reviews) from inception until September 1, 2019. An experienced health sciences librarian assisted in the development of the search strategy. The search was conducted using the terms “ventilator-associated pneumonia” and “ventilator-acquired pneumonia” (Supplemental Figure 1). We used the Science Citation Index to retrieve reports citing the relevant articles identified from our search. We conducted further surveillance searches using the “Related Articles” feature of PubMed [24].

Study selection

We included all English-language articles describing retrospective and prospective observational studies and randomized controlled trials (RCTs). We included studies meeting the following criteria: 1) ≥ 90% adult patients (≥ 16 years); 2) conducted in the ICU; 3) included patients with ≥ 48 h of invasive mechanical ventilation; and 4) evaluated one or more of the following for the diagnosis of VAP: fever (defined as body temperature ≥ 38 °C), purulent secretions, leukocytosis (any threshold), chest radiography, gram stain, culture from ETA (≥ 10⁵ colony-forming units [CFU]/mL), PSB (≥ 10³ CFU/mL), or BAL (≥ 10⁴ CFU/mL), or CPIS. For our primary analysis, we used a reference standard of histopathology (obtained from lung tissue biopsy) for definitive diagnosis of VAP. However, because histopathology may not always be easily obtained and because this reference standard may limit the generalizability of results, we performed a secondary analysis using BAL with ≥ 10⁴ CFU/mL of a pathogen known to cause VAP as the reference standard.

We excluded case reports, case series, animal studies, and pediatric studies. For all included studies, we extracted a 2 × 2 table of true positive, false negative, true negative, and false positive counts. We contacted authors for further information when these values could not be obtained from study reports. If the corresponding author did not respond after three attempts, the study was excluded.
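As a minimal illustration of how per-study accuracy measures follow from each extracted 2 × 2 table, the Python sketch below computes them from hypothetical counts. The class and property names are our own, the counts are invented for illustration, and no continuity correction for zero cells is applied; the pooled estimates reported in this review came from the HSROC model, not from this per-study arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical per-study counts of an index test against the reference standard."""
    tp: int  # index test positive, reference standard positive (VAP)
    fp: int  # index test positive, reference standard negative (no VAP)
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        # Positive predictive value; depends on VAP prevalence in the study
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        # Negative predictive value; also prevalence-dependent
        return self.tn / (self.tn + self.fn)

    @property
    def lr_positive(self) -> float:
        return self.sensitivity / (1 - self.specificity)

    @property
    def lr_negative(self) -> float:
        return (1 - self.sensitivity) / self.specificity

    @property
    def diagnostic_odds_ratio(self) -> float:
        # Cross-product ratio; algebraically equal to lr_positive / lr_negative
        return (self.tp * self.tn) / (self.fp * self.fn)


# Invented counts for illustration only (not data from any included study)
study = TwoByTwo(tp=40, fp=30, fn=10, tn=20)
print(study.sensitivity, study.specificity)
print(round(study.diagnostic_odds_ratio, 2))
```

Unlike sensitivity and specificity, the predictive values computed this way are tied to the prevalence of VAP in each study's population, which is one reason the review reports sensitivity and specificity as its primary measures.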

We screened studies using Covidence (Melbourne, Australia). Two reviewers (SMF and A. Tran) independently screened the titles and abstracts of all studies, in order to identify potentially eligible studies. The same two reviewers then independently assessed full texts of these potentially eligible studies. Disagreements were resolved by consensus.

Data extraction

One investigator (SMF) collected the following variables from included articles using a pre-designed data extraction sheet (Supplemental Table 1): Author name, year of publication, study design, eligibility criteria, and number of patients. Two investigators (SMF and A. Tran) independently extracted the true positive, false positive, false negative, and true negative counts for all included articles. Disagreements were resolved through consensus. A third investigator (WC) verified all extracted data.

Quality assessment

Two reviewers (SMF and A. Tran) independently assessed the risk-of-bias of included studies using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [25]. QUADAS-2 assesses risk of bias and applicability to the research question across four domains: (1) Patient selection; (2) Index test: potential risk-of-bias noted if the index test results were interpreted without explicit blinding to the reference standard; (3) Reference standard: potential risk-of-bias noted if the reference standard could misclassify the target condition; and (4) Flow and timing: potential risk-of-bias noted if not all patients had the diagnostic test applied using the same criteria, if the diagnostic test was measured at an inappropriate time interval before the definitive VAP reference standard, or if patients were inappropriately excluded. Studies with potential risk-of-bias in any of these domains were judged as high risk-of-bias overall. We performed two separate risk-of-bias assessments, one for each reference standard.

Evidence synthesis

We presented individual study results graphically by plotting sensitivity and specificity estimates on one-dimensional forest plots (ordered by sensitivity) and in Receiver Operating Characteristic (ROC) space, to visually assess heterogeneity. To pool the bivariate results, we applied the Hierarchical Summary ROC (HSROC) model [26], which incorporates both within-study and between-study variability by defining a separate model for each. The within-study variation is described using a binomial distribution for the number of positive tests as a function of patients’ true VAP status, while the between-study variation allows both the “positivity” and “accuracy” parameters to vary between studies. We obtained summary point estimates of sensitivity and specificity, positive and negative predictive values, diagnostic odds ratios, and likelihood ratios, with corresponding 95% confidence intervals (CI). Summary estimates of test accuracy were plotted in ROC space together with the summary ROC curve. We conducted all analyses using the MetaDAS (version 1.3) [27] macro in SAS (SAS Institute), as recommended by the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy [20]. The Handbook does not recommend univariate tests for heterogeneity in sensitivity and specificity, because such tests do not account for heterogeneity explained by positivity threshold effects [20]. Instead, it is preferable to demonstrate heterogeneity graphically, using the scatterplot and the confidence region of the bivariate summary point in ROC space together with the descriptive forest plots. For our primary analysis using histopathology as the reference standard, we conducted a predefined sensitivity analysis excluding studies that did not also use tissue culture for the diagnosis of VAP.
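For readers less familiar with the HSROC model, its structure as it is usually written in the diagnostic test accuracy literature is summarized below. This is a general statement of the model in our own notation, not a reproduction of the exact MetaDAS parameterization used in this review.

```latex
% Level 1 (within study i): number of positive index tests y_{ij} in group j
y_{ij} \sim \operatorname{Binomial}(n_{ij}, \pi_{ij}), \qquad
\operatorname{logit}(\pi_{ij}) = (\theta_i + \alpha_i X_{ij})\, e^{-\beta X_{ij}}
% X_{ij} = +1/2 for patients with VAP       (\pi_{ij} = sensitivity)
% X_{ij} = -1/2 for patients without VAP    (\pi_{ij} = 1 - specificity)

% Level 2 (between studies): positivity and accuracy vary randomly
\theta_i \sim \mathcal{N}(\Theta, \sigma_\theta^2), \qquad
\alpha_i \sim \mathcal{N}(\Lambda, \sigma_\alpha^2)

% Summary ROC curve implied by the mean parameters
\operatorname{logit}(\mathrm{Se}) = \Lambda\, e^{-\beta/2} + e^{-\beta}\, \operatorname{logit}(1 - \mathrm{Sp})
```

Here $\theta_i$ is the study-specific positivity (threshold) parameter, $\alpha_i$ the accuracy parameter, and $\beta$ a shape parameter; when $\beta = 0$ the summary ROC curve is symmetric and $\alpha_i$ equals the log diagnostic odds ratio of study $i$.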

We assessed the overall certainty in pooled diagnostic effect estimates using the Grading of Recommendations, Assessments, Development and Evaluation (GRADE) approach [28, 29]. The overall confidence in effect estimates was categorized as high, moderate, low, or very low. We created a GRADE evidence profile for each parameter using the guideline development tool (gradepro.org).

Results

Search results

We identified 1,464 potentially relevant citations (Fig. 1). Following duplicate removal, 1,042 studies were screened, and 38 underwent full-text review. We included 25 studies in the meta-analysis [30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54]. Of these, 15 used histopathology from lung biopsy as reference standard [30, 31, 33, 36, 37, 39, 40, 43, 45, 46, 49, 51,52,53,54], while 10 used positive BAL culture as reference standard [32, 34, 35, 38, 41, 42, 44, 47, 48, 50]. Unpublished data from one study were obtained from the principal investigators (A. Torres, OTR) [38].

Fig. 1 Flow chart summarizing evidence search and study selection

Study characteristics

Study characteristics are displayed in Table 1, with a detailed review in Supplemental Table 2. Of the included studies, 17 (68.2% of patients) were performed in Europe, 6 (21.3% of patients) in North America, 1 (8.5% of patients) in Asia, and 1 (2.1% of patients) in South America. Twenty-one studies (61.8% of patients) were prospective cohort studies, while the remaining 4 (38.2% of patients) were retrospective cohort studies.

Table 1 Characteristics of the 25 included studies

Quality assessment

Quality assessments are summarized in Supplemental Figures 2–3. Three studies were considered to have high risk-of-bias: one because it included only patients with acute respiratory distress syndrome (ARDS) [30], and two because patients who underwent bronchoscopy (at the discretion of the treating clinician) were retrospectively included, leading to potential bias in patient selection [35, 47]. Four retrospective studies were judged to have unclear risk-of-bias [30, 45, 51, 54] because it was not explicitly stated whether diagnostic adjudicators (i.e., those evaluating histology) were blinded to the index test results of individual patients.

Results of synthesis

Primary analysis—reference standard of histology

Pooled sensitivity and specificity of clinical signs, relative to histopathology, are shown in Table 2. Corresponding forest plots and HSROC curves are shown in Supplemental Figures 4–11. GRADE evidence profiles are depicted in Supplemental Tables 3–10.

Table 2 Summary estimates of the performance of physical examination, chest radiography, laboratory values, and CPIS for the diagnosis of ventilator-associated pneumonia, relative to reference standard of histopathology from lung biopsy

Among physical examination features, the presence of fever had a pooled sensitivity of 66.4% (95% CI 40.7–85.0, low certainty) and specificity of 53.9% (95% CI 34.5–72.2, low certainty). Purulent secretions had a sensitivity of 77.0% (95% CI 64.7–85.9, moderate certainty) and specificity of 39.0% (95% CI 25.8–54.0, moderate certainty). Presence of infiltrate on plain chest radiography had a sensitivity of 88.9% (95% CI 73.9–95.8, low certainty) and specificity of 26.1% (95% CI 15.1–41.4, low certainty). Leukocytosis was evaluated in only three studies, of which two defined leukocytosis as a white blood cell count ≥ 10 × 10⁹/L [31, 37], while one defined it as a count ≥ 12 × 10⁹/L [40]. Pooled sensitivity for leukocytosis was 64.2% (95% CI 46.9–78.4, low certainty) and specificity 59.2% (95% CI 45.0–72.0, low certainty).

We evaluated pathogen growth from three different sampling techniques. Positive ETA had a sensitivity of 75.7% (95% CI 51.5–90.1, very low certainty) and specificity of 67.9% (95% CI 40.5–86.8, very low certainty). PSB from bronchoscopy had a sensitivity of 61.4% (95% CI 43.7–76.5, low certainty) and specificity of 76.5% (95% CI 64.2–85.6, low certainty), while BAL from bronchoscopy had a sensitivity of 71.1% (95% CI 49.9–85.9, low certainty) and specificity of 79.6% (95% CI 66.2–88.6, low certainty). There were insufficient studies available for meta-analysis of gram stain accuracy.

Finally, CPIS > 6 had a pooled sensitivity of 73.8% (95% CI 50.6–88.5, low certainty) and specificity of 66.4% (95% CI 43.9–83.3, low certainty).

Although several of the included studies examined combinations of signs or tests (Table 3), none of the specific combinations were evaluated in multiple studies, thus precluding meta-analysis. Three studies [40, 51, 52] evaluated combinations of clinical signs with concurrent infiltrate on chest radiography. The presence of infiltrate coupled with ≥ 1 of (fever, purulent secretions, or leukocytosis) had a sensitivity of 65–85% and specificity of 33–36%, whereas an infiltrate and all 3 signs had a sensitivity of 16–23% and specificity of 91–92%.

Table 3 Summary estimates of the performance of combinations of clinical findings from individual studies

Sensitivity analyses—excluding studies with histology alone as reference standard

We repeated our primary analysis after excluding studies that used histopathology alone (without tissue culture) as the reference standard. Pooled estimates and conclusions were consistent with the primary analysis (Supplemental Table 11 and Supplemental Figures 12–15).

Secondary analysis—reference standard of Bronchoalveolar lavage

The results of our secondary analysis of studies using BAL ≥ 10⁴ CFU/mL of a known VAP pathogen as the reference standard are shown in Table 4. Corresponding forest plots and HSROC curves are shown in Supplemental Figures 16–18. GRADE evidence profiles are depicted in Supplemental Tables 11–13.

Table 4 Summary estimates of the performance of physical examination, chest radiography, laboratory values, and CPIS for the diagnosis of ventilator-associated pneumonia, relative to reference standard of bronchoalveolar lavage

Three features had a suitable number of studies for meta-analysis using this reference standard. Presence of purulent secretions had a sensitivity of 87.9% (95% CI 68.3–96.1, low certainty) and specificity of 38.8% (95% CI 20.4–61.0, low certainty). Presence of an infiltrate on chest radiography had a sensitivity of 85.2% (95% CI 52.9–96.7, low certainty) and specificity of 18.0% (95% CI 4.7–49.7, low certainty). CPIS > 6 had a sensitivity of 75.4% (95% CI 38.5–93.7, low certainty) and specificity of 68.3% (95% CI 42.9–86.1, low certainty).

Discussion

We performed a systematic review and meta-analyses to investigate the accuracy of physical examination findings, leukocytosis, chest radiography, ETA (≥ 10⁵ CFU/mL), bronchoscopic sample cultures [PSB (≥ 10³ CFU/mL) and BAL (≥ 10⁴ CFU/mL)], and CPIS (> 6) for diagnosis of VAP in critically ill adults. When evaluating these signs against the reference standard of lung histopathology, we found that none were very accurate. The presence of infiltrate on plain chest radiography had the highest sensitivity (88.9%); however, all signs had poor specificity. Results were similar when using BAL as the reference standard: purulent secretions and infiltrate on chest radiography had modest sensitivity (87.9% and 85.2%, respectively), but all signs had poor specificity. CPIS was inaccurate regardless of the reference standard used. Our study suggests that the individual signs and tests clinicians routinely use to diagnose VAP and inform initiation of antibiotics in the ICU are neither sensitive nor specific.

Some clinicians perceive that BAL overcomes the limited accuracy of clinical signs, but our results suggest that quantitative BAL cultures are also subject to high rates of both false positives and false negatives [49]. Several studies have used BAL as a reference standard in the evaluation of other VAP diagnostic tools, such as biomarkers [42, 55]. The limited accuracy of BAL to diagnose VAP should be considered when interpreting these studies. A more reliable reference standard for VAP diagnosis is histopathology from lung biopsy [39], but this is impractical for routine diagnosis, may be influenced by the area of the lung that is biopsied, and is itself subject to disagreement between pathologists [1, 56]. The accuracy of various clinical signs has been evaluated in two previous systematic reviews [1, 17]; however, neither performed a meta-analysis summarizing pooled sensitivity and specificity. As such, this study provides novel data highlighting the poor accuracy of the signs and tests commonly used to diagnose VAP in mechanically ventilated patients.

Our analyses suggest that none of the indicators we evaluated has sufficient sensitivity to safely rule out VAP in susceptible patients. Our findings also highlight the need to remain mindful of the possibility of VAP, even in patients without classic clinical signs. Conversely, none of the evaluated clinical indicators had sufficient specificity to rule in a diagnosis of VAP. Bayesian analysis demonstrates that the individual signs we evaluated do little to alter the clinician's pre-test probability (Supplemental Tables 14–15). For example, in a patient with a 50% pre-test probability of VAP, absence of infiltrate on chest x-ray only reduces the post-test probability to 30%, while presence of infiltrate only increases it to 55%. We also sought to evaluate the accuracy of gram stain, but there were not enough studies to meta-analyze it for either reference standard.
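The chest radiography example above can be reproduced with Bayes' theorem in odds form (post-test odds = pre-test odds × likelihood ratio). The Python sketch below is a minimal illustration under the assumption that the pooled point estimates against histopathology (sensitivity 88.9%, specificity 26.1%) are used directly; it ignores the uncertainty in those estimates, and the function name is our own.

```python
def post_test_probability(pretest: float, sensitivity: float,
                          specificity: float, test_positive: bool) -> float:
    """Convert a pre-test probability to a post-test probability via likelihood ratios."""
    if test_positive:
        lr = sensitivity / (1 - specificity)   # positive likelihood ratio
    else:
        lr = (1 - sensitivity) / specificity   # negative likelihood ratio
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)


# Pooled point estimates for infiltrate on chest radiography vs histopathology
sens, spec = 0.889, 0.261
print(round(post_test_probability(0.50, sens, spec, test_positive=True), 2))   # 0.55
print(round(post_test_probability(0.50, sens, spec, test_positive=False), 2))  # 0.3
```

The positive likelihood ratio here is only about 1.2 and the negative about 0.43, which is why a 50% pre-test probability moves so little in either direction.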

It remains unclear whether combinations of particular signs increase diagnostic accuracy. The CPIS combines a number of patient features, but our analysis found that it still has poor sensitivity and specificity. These findings therefore suggest that neither individual signs nor the CPIS is a reliable indicator of VAP, and relying upon any of these tools may result in undertreatment or overtreatment.

The limited accuracy of clinical signs and tests for VAP has important implications. Suspected respiratory infections, including VAP, are the biggest drivers of antimicrobial use in the ICU [57]. Qualitative analyses, however, suggest that much of this use is unnecessary [7]. Our results indicate that clinicians should not initiate antibiotics solely on the basis of any of these signs in isolation. We suggest instead tailoring the urgency of antibiotics to severity of illness. In patients with suspected VAP and hemodynamic instability or severe hypoxemia, antibiotics should be started expeditiously regardless of diagnostic certainty. In more stable patients, however, more conservative approaches should be considered. Computed tomography, if available, may improve accuracy compared with plain chest radiography, although this has not been sufficiently studied [58, 59]. Clinicians might also consider treating alternative causes of respiratory instability (e.g., diuresis, suctioning of mucus plugs) before starting antimicrobials in hemodynamically stable patients who can tolerate some delay in antibiotic initiation [60,61,62]. This approach has the potential to reduce antimicrobial utilization but requires further investigation to confirm its safety. Notably, many RCTs have used VAP rates as primary outcomes [63]. Our study suggests this is an unreliable outcome. More concrete outcomes, such as mortality, duration of mechanical ventilation, and antimicrobial utilization, may be more appropriate in such trials [63].

The difficulties inherent in VAP diagnosis informed the decision by the Centers for Disease Control and Prevention (CDC) to move from VAP surveillance to ventilator-associated event (VAE) surveillance for the purposes of internal benchmarking and quality improvement [64]. This approach follows from the recognition that accurately diagnosing VAP using clinical signs is very challenging, a view supported by the results of this review. Instead, VAE surveillance focuses on detecting nosocomial deterioration in respiratory function from any cause, since this can be objectively defined by examining the trajectory of patients’ ventilator settings. This approach emphasizes the importance of preventing both infectious and non-infectious complications of mechanical ventilation. However, the VAE definitions are designed to inform surveillance, not real-time clinical care.

This review was performed using a comprehensive search with clear inclusion and exclusion criteria, examined various clinical indicators for VAP against two different reference standards, and included unpublished data provided by study authors. We assessed risk of bias for individual studies and used GRADE methodology to assess and contextualize our findings based on our overall certainty in effect estimates. This review also has limitations. First, we evaluated clinical signs independently, whereas in practice providers typically use combinations of signs and tests to arrive at a diagnosis. Not all combinations were amenable to meta-analysis, although we did evaluate the accuracy of CPIS and summarize some combinations of signs evaluated in individual studies. Second, because we performed a meta-analysis of the published literature, details on the included patients were limited, which precluded subgroup analyses to further identify sources of heterogeneity. In particular, the accuracy of clinical indicators may be influenced by prior exposure to antibiotics. Unfortunately, the included studies did not consistently report whether patients had been treated with antimicrobials before sampling, which precluded a sensitivity analysis restricted to patients without prior antimicrobial exposure. Third, our pooled estimates were mostly of low certainty owing to imprecision and inconsistency, underscoring the need for high-quality studies on this important topic. Fourth, there is likely clinical heterogeneity across studies, particularly for subjective indicators such as purulent secretions and chest radiography, whose interpretation may vary between raters. Fifth, not all tests evaluated against histopathology could also be evaluated against BAL as the reference standard. Finally, our study may suffer from spectrum bias. In particular, the reference standard of histopathology was largely applied to deceased patients and may therefore reflect test performance in patients with more severe disease. Our findings were consistent, however, when using quantitative BAL cultures as the reference standard.

Conclusion

This systematic review and meta-analysis found that traditionally used clinical indicators, including fever, purulent secretions, leukocytosis, chest radiography, cultures from three sampling techniques (ETA, PSB, BAL), and CPIS, had poor specificity for the diagnosis of VAP. Reliance upon the presence of any of these indicators may result in misdiagnosis and possibly unnecessary antimicrobial utilization. These findings highlight the uncertainty of diagnosing VAP and underscore the need for better tools to help clinicians know when to start and stop empiric antibiotics for possible VAP.