Lung cancer is the most common cancer worldwide. In 2008, the number of incident cases was estimated at around 1.6 million (13% of all incident cancers). Mortality is high, with 1.4 million deaths in the same year (18% of all cancer deaths) (www.globocan.iarc.fr). Overall 5-year survival is <20%, but outcomes are highly heterogeneous, and the search for prognostic factors has led to the publication of an impressive number of papers. However, because of the design and often retrospective nature of prognostic factor studies, few of these factors can actually be used in routine care to guide management and determine prognosis. More recently, with the development of so-called targeted therapies, increasing attention has been paid to the identification of predictive factors, which might be better tools to guide therapy.
In the present paper, we review how prognostic and predictive factors differ and how they can be identified. We also present some well-known and important factors, without attempting an exhaustive report. Such a report would be virtually impossible: in 2002, Brundage et al., in a systematic overview of prognostic factors for non-small cell lung cancer, identified 887 articles published over a decade and more than 150 possible prognostic factors.
Prognostic versus predictive factor
A prognostic factor is generally defined as a factor, measured before treatment, that affects a patient's outcome “independently” of the treatment received or of the general class of treatment. The patient populations used to identify prognostic factors may be very broad (from resected stage I patients to stage IV patients scheduled to receive chemotherapy) or more specific, such as patients treated with radical radiotherapy or stage III patients. Outcome is most often defined as overall survival, but other outcome measures may be used, such as progression-free survival, response to anti-tumour treatment, disease-free survival or the proportion of patients alive at a specific time point.
A predictive factor is a factor expected to identify patients who will benefit from a specific treatment. For such a factor, the assumption that the treatment effect is the same in all patients, apart from random variation, no longer holds. If a predictive factor is validated for a particular treatment, it will obviously guide therapy.
Figures 1–3 schematically illustrate the difference between a prognostic and a predictive factor using a binary outcome for ease of graphical representation, although time-to-event variables are usually more relevant when searching for prognostic or predictive factors. The same characteristic may be both a prognostic and a predictive factor.
A well-known example of a predictive factor is HER2 status in breast cancer, which predicts response to trastuzumab and has been used for >10 years to guide breast cancer treatment. This targeted agent has indeed revolutionised the management of HER2+ breast cancer, which was previously associated with a poor prognosis.
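To make the distinction concrete, the following Python sketch encodes a toy version of the binary-outcome picture. The probabilities of a good outcome in each marker-by-treatment cell are hypothetical and chosen only for illustration, and the `Scenario` class and `classify` function are illustrative names, not an established tool. Effects are compared on the absolute (risk-difference) scale; on a ratio scale the same numbers can lead to a different conclusion about interaction.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical probability of a good outcome in each marker/treatment cell."""
    neg_control: float
    neg_treated: float
    pos_control: float
    pos_treated: float


def treatment_effects(s):
    """Absolute treatment benefit in marker-negative and marker-positive patients."""
    return s.neg_treated - s.neg_control, s.pos_treated - s.pos_control


def classify(s, tol=1e-9):
    """Return (prognostic, predictive) flags under this toy, risk-difference definition."""
    effect_neg, effect_pos = treatment_effects(s)
    # Prognostic: outcome differs by marker status even without the treatment.
    prognostic = abs(s.pos_control - s.neg_control) > tol
    # Predictive: the treatment effect itself differs by marker status (interaction).
    predictive = abs(effect_pos - effect_neg) > tol
    return prognostic, predictive


print(classify(Scenario(0.2, 0.3, 0.5, 0.6)))  # (True, False): prognostic only
print(classify(Scenario(0.3, 0.3, 0.3, 0.6)))  # (False, True): predictive only
print(classify(Scenario(0.2, 0.2, 0.4, 0.7)))  # (True, True): both
```

In the first scenario the marker shifts baseline outcome but the treatment adds the same 10 points in both groups; in the second, baseline outcome is identical and only marker-positive patients benefit.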
These definitions have implications for the design of studies carried out to identify and/or validate a prognostic or a predictive factor. Prognostic factors can be studied in patient cohorts, prospectively or retrospectively, because the treatment administered does not truly matter, even if authors sometimes restrict their analysis to series of patients treated with a similar strategy. For predictive factors, the situation is totally different: one has to prove that patients bearing a specific characteristic respond better to, or benefit more from, one treatment than another. Therefore, before a predictive factor can guide therapeutic choice in clinical practice, its predictive ability must be validated using data from randomised clinical trials. A retrospective validation using the databases of randomised clinical trials might, however, be considered provided that certain conditions are met: data available for almost all patients, a prespecified hypothesis regarding the treatment effect stratified by the levels of the predictive factor, an a priori analysis plan, a priori agreement on how to measure the predictive factor (for a biological covariate, a standardised assay and scoring method should exist) and an upfront, justified sample size. These last conditions should also be met when validating a prognostic factor. Several designs for validating predictive factors have been proposed. Basically, true validation trials need to randomise patients both with and without the putative predictive factor and to show the absence of a treatment effect in one group and its presence in the other, using either separate analyses or an analysis testing the interaction between the marker and the tested treatment. Enriched designs, however, can be used to demonstrate an effect in one selected subgroup.
The choice between all-comers and enriched designs must be made carefully, depending on whether there are strong biologically founded assumptions, whether an assay exists to measure the marker accurately, whether a threshold defining marker positivity is known, and on the prevalence of the marker. Readers interested in the methodology of predictive factor validation are referred to references [3–5].
For decades, lung cancers have been divided into two general entities: small cell lung cancer, which accounts for around 15% of all lung cancers and is almost never treated surgically, and non-small cell lung cancer.
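The influence of marker prevalence on an enriched design can be shown with simple arithmetic: if only marker-positive patients are eligible, roughly n / prevalence patients must be screened to enrol n of them. The Python sketch below assumes that every screened patient yields a valid assay result and uses hypothetical target sizes and prevalences; `patients_to_screen` is an illustrative helper, not a published method.

```python
import math
from fractions import Fraction


def patients_to_screen(n_marker_positive: int, prevalence: float) -> int:
    """Expected number of patients to screen to enrol n marker-positive patients.

    Assumes (hypothetically) that every screened patient has a valid assay result.
    Converting via Fraction(str(...)) keeps the division exact for decimal inputs.
    """
    if n_marker_positive < 0:
        raise ValueError("n_marker_positive must be non-negative")
    p = Fraction(str(prevalence))
    if not 0 < p <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return math.ceil(n_marker_positive / p)


# Hypothetical target: 300 marker-positive patients.
for prev in (0.5, 0.1, 0.04):
    print(f"prevalence {prev:.0%}: screen {patients_to_screen(300, prev)} patients")
```

Dropping the prevalence from 10% to 4% raises the screening burden from 3,000 to 7,500 patients for the same number enrolled.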
Non-small cell lung cancer
Cancer stage is one of the most reproducible prognostic factors; it is defined by the TNM classification, based on tumour size (T), nodal involvement (N) and metastatic involvement (M). The TNM system was first described by Denoix (1946), and successive TNM classifications of malignant tumours have been published by the Union for International Cancer Control (UICC) since 1968. The 7th edition incorporates the changes proposed by the International Association for the Study of Lung Cancer (IASLC), based on a retrospective worldwide database of 17,726 patients (training set) and a careful statistical analysis using recursive partitioning and amalgamation algorithms. The staging system is described in table 1, and stage is one of the factors most often used to guide therapy. By clinical stage, 5-year survival rates range from 50% for stage IA to 2% for stage IV, whereas by pathological stage they range from 73% for stage IA to 13% for stage IV (table 1). Stage is a powerful prognostic variable that summarises the information contained in three separate factors: T, N and M. Taken separately, these factors are also prognostic: increasing tumour size worsens prognosis, and lymph node involvement is per se a major prognostic characteristic that also determines whether surgery is possible (N3 involvement generally being a contraindication to surgery). Pleural dissemination is a negative prognostic feature and, since the 7th edition, a patient with pleural dissemination is classified as M1a. In metastatic patients, a single metastatic site is less detrimental than multiple metastases.
Classical host-related and tumour-related factors
The second most reproducible prognostic factor, which is also very useful for guiding therapy, is performance status, measured on the Karnofsky scale or the Eastern Cooperative Oncology Group (ECOG) scale, although its value has mostly been demonstrated in non-resected patients [6, 8]. Some authors have therefore argued that chemotherapy for stage IV patients should be limited to patients with ECOG performance status 0 or 1; however, other publications suggest that some patients with performance status 2 may also benefit from treatment.
Female sex is also a fairly reproducible favourable factor, as recently confirmed by a meta-analysis. The authors selected 39 studies including 86,800 patients; these studies were heterogeneous in terms of stage, histology, geographic origin, treatment distribution and covariates used for adjustment. The authors combined hazard ratios extracted from both univariate and multivariate analyses. The conclusions were similar, with combined hazard ratios in favour of women (0.79 and 0.78, respectively; both p<0.0001).
Histology may be an independent prognostic factor, although it has not been shown to be highly reproducible, except that tumours with neuroendocrine characteristics have a worse prognosis.
In resected patients, age does not seem to be a major prognostic factor. Histology may play a role in outcome: a large Norwegian population-based study (n = 3,211 resected patients) suggested that adenocarcinoma and large cell histology might carry a worse prognosis. The histological classification has recently been revised: in a retrospective series of 210 resected patients with stage I–III adenocarcinoma, an Australian team showed that subtype might be of prognostic importance, with a very good prognosis for adenocarcinoma in situ, minimally invasive adenocarcinoma and lepidic-predominant adenocarcinoma, whereas micropapillary-predominant and solid-with-mucin-predominant adenocarcinomas could be associated with particularly poor survival.
Similarly, in the very large series collected by the IASLC Lung Cancer Staging Project, an analysis of 9,137 patients showed that histology was statistically significant after adjustment for stage, age and sex, with the following hazard ratios: adenocarcinoma versus bronchioloalveolar carcinoma (BAC), 1.35 (although BAC no longer exists in the new classification); squamous versus adenocarcinoma, 0.86; and large cell versus squamous, 1.16 (all p<0.001). A haemoglobin level <12 g·dL−1 might be associated with higher mortality, and a high preoperative Cyfra 21-1 level has been associated with a higher risk of relapse. Blood vessel invasion is associated with an increased risk of relapse and death, as shown by a meta-analysis (multivariate combined hazard ratio 3.98 (95% CI 2.24–7.06) for relapse-free survival and 1.90 (95% CI 1.65–2.19) for overall survival).
Surgical procedures more extensive than lobectomy might also indicate a poor prognosis, but this variable may simply be correlated with the other factors that determined the type of surgery. Conversely, more limited resections may not be sufficient.
In more advanced, non-resectable disease, younger age might be associated with a better prognosis, although competing risks may have a greater impact on mortality in older patients. Among routine biological parameters, normal white blood cell and neutrophil counts and normal lactate dehydrogenase (LDH), calcium, haemoglobin and albumin levels have been identified as favourable independent prognostic factors. An individual-data meta-analysis showed that Cyfra 21-1 level also has independent prognostic value, and a systematic quantitative review showed anaemia to be an independent prognostic factor in patients with cancer, especially those with lung cancer.
There are plenty of publications on biological markers not measured routinely in clinical practice. Most often, these factors are not reproducible, and their independent prognostic value after adjustment for well-known prognostic factors has not been proven. We cite only those that have been studied in meta-analyses or pooled analyses of selected trials, although published data generally do not allow the independent value of the possible prognostic marker to be assessed. The following features have been suggested to be associated with a more favourable prognosis: normal p53 status; no EGFR expression; low microvessel count; low VEGF expression; no overexpression of c-erbB-2, with an effect possibly restricted to non-squamous histology; Bcl-2 expression; low Ki67 expression; absence of KRAS mutation; TTF-1 positivity; high p16 expression; low or no ERCC1 expression (in advanced NSCLC treated with platinum-based chemotherapy); low class III β-tubulin expression, in resected patients; low survivin expression, in resected patients only; and low lymphatic microvessel density, in surgically treated patients. Regarding the prognostic value of angiogenesis, microvessel count was confirmed as a prognostic factor in an individual-data meta-analysis, but only when assessed by the Chalkley method.
Numerous studies have examined the prognostic value of tumour metabolic activity as measured by [18F]-fluoro-2-deoxy-D-glucose positron emission tomography. A meta-analysis of these studies showed that high metabolic activity is indeed a univariate prognostic factor (estimated hazard ratio 2.08). Its independent value remains to be proven, and the conclusion holds mainly for limited tumours, as few stage IV patients were included in the published studies.
Some prognostic classifications integrating several independent classical prognostic factors have been published [35, 36], but they need to be validated before being used in clinical practice. If validated, they could serve as standard covariates for adjustment in the search for further clinically useful factors. In resected patients, some publications have examined genetic signatures, most often using small-to-moderate series of patients divided into training and validation sets. These studies nevertheless provide very promising results. For example, in resected patients, Chen et al. derived a five-gene signature with an impressive hazard ratio between low- and high-risk patients: 3.36 for overall survival (95% CI 1.35–8.35; p = 0.009) in the validation series (n = 86). Zhu et al. published a 15-gene signature with a larger effect in resected patients, independent of stage, with an overall HR of 15.02 (95% CI 5.12–44.04) and consistent results in stage I and stage II. Although very interesting and promising, the additional prognostic value of such signatures should be validated with adjustment for classical prognostic factors. Multiple testing and over-fitting may prevent the models from being reproduced in external validation series. These signatures are not ready for use in clinical practice.
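Pooling hazard ratios across studies is usually done on the log scale. The Python sketch below shows a standard fixed-effect, inverse-variance combination that recovers each study's standard error from its reported 95% CI. It is not necessarily the method used in the cited meta-analysis, and the three study results in the example are hypothetical.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_hr_and_se(hr, lower, upper):
    """Return log(HR) and its standard error, assuming a symmetric 95% CI on the log scale."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z975)


def pool_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling of (HR, lower, upper) tuples.

    Returns the pooled HR and its 95% CI.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for hr, lower, upper in studies:
        log_hr, se = log_hr_and_se(hr, lower, upper)
        weight = 1.0 / se ** 2  # more precise studies count more
        total_weight += weight
        weighted_sum += weight * log_hr
    pooled = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled),
            math.exp(pooled - Z975 * pooled_se),
            math.exp(pooled + Z975 * pooled_se))


# Hypothetical HRs (women versus men) from three studies.
studies = [(0.80, 0.70, 0.91), (0.75, 0.60, 0.94), (0.85, 0.72, 1.00)]
hr, lo, hi = pool_fixed_effect(studies)
print(f"pooled HR {hr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Given how heterogeneous the included studies were, a random-effects model (not shown) may be more appropriate than this fixed-effect sketch.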
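A simulation makes it easy to see why signatures selected on small training sets can fail externally. In the hypothetical Python sketch below, neither the 500 simulated "genes" nor the outcome carries any signal. The gene most correlated with outcome in a training set of 50 patients is then re-evaluated in an independent validation set of 200. A continuous outcome and Pearson correlation stand in for survival data and a Cox model purely for simplicity.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_genes=500, n_train=50, n_valid=200, seed=1):
    """Select the 'best' noise gene in training data and re-test it in validation data.

    Returns the absolute correlation of the selected gene in each data set.
    """
    rng = random.Random(seed)

    def sample(n):
        # Genes and outcome are independent standard normal noise: no true signal.
        genes = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_genes)]
        outcome = [rng.gauss(0.0, 1.0) for _ in range(n)]
        return genes, outcome

    train_genes, train_y = sample(n_train)
    valid_genes, valid_y = sample(n_valid)
    # "Discovery" step: keep the gene most strongly correlated with outcome in training.
    best = max(range(n_genes), key=lambda g: abs(pearson(train_genes[g], train_y)))
    return (abs(pearson(train_genes[best], train_y)),
            abs(pearson(valid_genes[best], valid_y)))


train_r, valid_r = simulate()
print(f"|r| in training: {train_r:.2f}; in validation: {valid_r:.2f}")
```

Selecting the best of many candidates inflates the training correlation, while the validation correlation of the same gene stays close to zero. External validation is meant to expose exactly this optimism.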
Small cell lung cancer
Small cell lung cancer is a highly chemosensitive tumour, but progression-free and overall survival remain extremely poor. Long-term survival is rare, and <5% of patients are cured. For years, treatment of small cell lung cancer has been guided by the extent of the disease: limited disease (generally defined as disease confined to the hemithorax of origin, the mediastinum and the supraclavicular lymph nodes, which can be encompassed in a radiation field) versus extensive disease, with median survival times of 15–20 and 8–13 months, respectively. Recently, within the IASLC Lung Cancer Staging Project, data on 12,620 small cell lung cancer cases were collected; complete clinical TNM staging was available for 3,430 cM0 patients and complete pathological TNM staging for 343 cases. In that series, increasing T was associated with progressively lower survival, as were increasing N and increasing stage (6th and 7th editions), although the numbers of patients staged IA, IB and IIA were quite small. The revised staging system was also tested on a larger Surveillance, Epidemiology and End Results (SEER) series of 4,884 patients diagnosed between 1998 and 2000. Median survival times in months were: IA, 26; IB, 21; IIA, 15; IIB, 12; IIIA, 13; IIIB, 11; and IV, 6. The authors concluded that TNM stage should be used to stratify patients with stage I–III disease in clinical trials. As in non-resected non-small cell lung cancer, performance status is also a reproducible prognostic factor. Among other classical factors easily measured in routine practice, female sex, younger age, no or little weight loss, low LDH level, normal neutrophil count, normal haemoglobin level, and normal NSE and Cyfra 21-1 levels have been reported as independent favourable prognostic factors.
Some authors have also suggested that disease extent could be replaced by several laboratory parameters (albumin, sodium and alkaline phosphatase levels). Other molecular parameters, such as BCL2 expression, normal p53 status or absence of HER2 overexpression, have been suggested, but the evidence is less clear. Four collaborative research groups have constructed prognostic classifications using only independent prognostic factors. These classifications, although based on different covariates, were recently validated on external data and can be used for stratification in clinical trials. For example, the classification published by the European Lung Cancer Working Party (ELCWP) has four groups defined by Karnofsky performance index, sex, disease extent and neutrophil count. In the validation series, the four groups had median survival times of 19, 11, 7 and 6 months, respectively.
Targeted therapies for non-small cell lung cancer are developing rapidly. By “targeted therapies”, we mean treatments designed to act on a specific characteristic of the tumour, a target that is expected to be a predictive factor. Most research on predictive factors in lung cancer has been devoted to non-small cell lung cancer, and we will restrict this part of the review to that disease.
However, identifying a predictive factor is not straightforward, and some new drugs have been developed without the target being specifically known or without a method to measure it with adequate reproducibility. Most predictive factors are molecular biological factors, but this is not always the case. Indeed, histology, which has not been proven to be a strong, independent and reproducible prognostic factor, predicts the benefit of pemetrexed in non-squamous non-small cell lung cancer irrespective of the setting: pemetrexed plus cisplatin versus cisplatin plus gemcitabine in chemotherapy-naïve patients, maintenance pemetrexed versus placebo, and pemetrexed versus docetaxel in second-line treatment. In each of these three randomised phase III studies, a treatment-by-histology interaction was identified.
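In practice, using a prognostic classification for stratification usually means randomising separately within each prognostic group, so that treatment arms stay balanced on prognosis. A minimal Python sketch of permuted-block randomisation within strata follows. The class name, the arm labels "A" and "B", the block size of 4 and the use of four prognostic groups as strata are illustrative assumptions, not details of any trial cited here.

```python
import random
from collections import defaultdict


class StratifiedBlockRandomiser:
    """Permuted-block randomisation carried out separately within each stratum.

    Hypothetical helper: strata could be the groups of a validated prognostic
    classification.
    """

    def __init__(self, arms=("A", "B"), block_size=4, seed=None):
        if block_size % len(arms) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self.arms = tuple(arms)
        self.block_size = block_size
        self.rng = random.Random(seed)
        self.pending = defaultdict(list)  # assignments left in the current block, per stratum

    def assign(self, stratum):
        """Return the treatment arm for the next patient in `stratum`."""
        queue = self.pending[stratum]
        if not queue:
            # Start a new block containing each arm equally often, in random order.
            block = list(self.arms) * (self.block_size // len(self.arms))
            self.rng.shuffle(block)
            queue.extend(block)
        return queue.pop(0)


randomiser = StratifiedBlockRandomiser(seed=2012)
for group in (1, 2, 1, 3, 1, 4, 2, 1):  # hypothetical prognostic group of each new patient
    print(f"patient in prognostic group {group}: arm {randomiser.assign(group)}")
```

Each completed block of four contains two patients per arm, so within any prognostic group the imbalance between arms never exceeds two patients.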
EGFR and TKIs
Tyrosine kinase inhibitors (TKIs) targeting EGFR, such as gefitinib and erlotinib, were first tested in randomised clinical trials without patient selection, in addition to chemotherapy, in chemotherapy-naïve patients [43–45]. These trials failed to show any benefit of TKIs, although some clinical factors were suggested to predict benefit: Asian ethnicity, female sex, never-smoking status and non-squamous histology. The true predictive factor was identified later: the patients who benefit from TKIs in terms of progression-free survival are those with somatic mutations in the EGFR gene (exons 19 and 21). Further studies, either subgroup analyses of the first randomised trials or randomised trials using an enrichment design (i.e. a design in which only patients harbouring the predictive characteristic are eligible), have undoubtedly proven that patients with an EGFR mutation benefit from TKIs in terms of progression-free survival, although the benefit in overall survival is less clear. EGFR has thus become the first molecular target in advanced non-small cell lung cancer of definite clinical usefulness in routine practice [47–53]; giving a TKI as part of first-line treatment is now standard for patients with an EGFR mutation, although chemotherapy still has a role.
KRAS and TKIs
KRAS lies downstream of EGFR, linking the EGFR pathway to cell proliferation and survival, and KRAS mutations have been suggested to mediate resistance to EGFR inhibitors. A retrospective analysis of the BR.21 trial, as well as a meta-analysis, confirmed that the presence of a KRAS mutation is a negative predictive factor for benefit from TKIs in advanced non-small cell lung cancer (HR 1.97, 95% CI 1.16–3.33 for KRAS-mutated tumours; HR 0.79, 95% CI 0.59–1.05 for wild-type tumours; p-value for interaction 0.003).
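Whether a treatment effect differs between two marker subgroups can be checked approximately from published subgroup estimates. A standard approach compares the two log hazard ratios, using standard errors recovered from their 95% CIs. The Python sketch below implements that approximate test; `interaction_test` is an illustrative helper. Applied to the KRAS figures quoted above, it gives a two-sided p of about 0.003, close to the reported interaction p-value. Exact agreement is not expected, since the original analysis may have used a different model.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)


def log_hr_se(lower, upper):
    """Standard error of log(HR) recovered from a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z975)


def interaction_test(hr1, ci1, hr2, ci2):
    """Approximate test that two independent subgroup HRs differ.

    Returns the ratio of hazard ratios (hr1 / hr2) and a two-sided p-value.
    """
    diff = math.log(hr1) - math.log(hr2)
    se = math.sqrt(log_hr_se(*ci1) ** 2 + log_hr_se(*ci2) ** 2)
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return math.exp(diff), p


# KRAS-mutated versus wild-type tumours (figures from the text above).
ratio, p = interaction_test(1.97, (1.16, 3.33), 0.79, (0.59, 1.05))
print(f"ratio of HRs {ratio:.2f}, interaction p = {p:.4f}")
```

A ratio of hazard ratios far from 1, with a small p-value, is what distinguishes a predictive marker from one whose subgroup results merely look different by chance.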
EML4-ALK and crizotinib
The fusion between echinoderm microtubule-associated protein-like 4 (EML4) and anaplastic lymphoma kinase (ALK) has recently been identified in a subset of non-small cell lung cancers. EML4-ALK is most often found in never-smoking patients with lung cancer. It is mutually exclusive with KRAS and EGFR mutations; it has no prognostic value, but it is a predictive factor for the efficacy of the ALK inhibitor crizotinib. Early trials of crizotinib led to its approval, but confirmatory trials are still ongoing [57, 58].
Vascular endothelial growth factor receptors
VEGF and VEGFR-2 were investigated as predictive biomarkers in the BATTLE study (Biomarker-Integrated Approaches of Targeted Therapy for Lung Cancer Elimination), in which heavily pretreated patients were assessed for 11 biomarkers and assigned to four different targeted treatments. Their predictive value remains to be further investigated.
Predictive factors for chemotherapy activity
Although chemotherapy drugs were not developed on the hypothesis that a specific molecular characteristic is targeted, some studies have also tried to identify predictive factors that might help in choosing a chemotherapy regimen. These studies are extremely important, as chemotherapy remains a cornerstone of the treatment of early and advanced non-small cell lung cancer.
ERCC1 and p27 and adjuvant cisplatin-based chemotherapy in completely resected patients
Adjuvant chemotherapy provides a demonstrated overall survival benefit in resected patients but also causes toxicity. It was hypothesised that not all patients benefit from adjuvant chemotherapy, and several biomarkers have been studied to identify subgroups of sensitive patients. Among them, ERCC1 has been tested, and the results suggest that patients with low or no ERCC1 expression benefit from chemotherapy (HR 0.65, 95% CI 0.50–0.86), whereas those with high ERCC1 expression do not benefit at all (HR 1.14, 95% CI 0.84–1.55), with a significant interaction test showing that the chemotherapy effect indeed differs between the two subgroups. A recent meta-analysis reaches the same conclusion, albeit through indirect comparisons: among patients treated with cisplatin-based chemotherapy, those with high ERCC1 expression have worse survival than those with low ERCC1 expression (HR 1.61, 95% CI 1.23–2.10), whereas this is not the case when no chemotherapy is given (HR 0.80, 95% CI 0.51–1.31).
A retrospective analysis of the IALT trial suggests that p27-negative status may also be a predictive factor for benefit from cisplatin-based adjuvant chemotherapy.
RRM1 in more advanced non-small cell lung cancer
The predictive role of RRM1 for sensitivity to gemcitabine, an antimetabolite frequently used in combination with platinum, has recently been studied in a randomised trial comparing cisplatin, docetaxel and gemcitabine with cisplatin–vinorelbine. Although the analysis was performed retrospectively on a subgroup of 261 patients (of the 443 randomised), the results surprisingly suggest that RRM1 is predictive of sensitivity to cisplatin–vinorelbine, with better outcomes observed in RRM1-negative patients: better disease control rate, better progression-free survival (6.9 versus 3.9 months; p<0.001) and better overall survival (11.6 versus 7.4 months; p = 0.002).
The signature proposed as prognostic by Zhu et al. might also predict benefit from adjuvant chemotherapy (cisplatin and vinorelbine) in resected stage IB and II patients.
Prognostic factors are very useful for providing information on the course of the disease and for constructing homogeneous groups of patients. They can sometimes guide therapy and identify subgroups of patients in whom more aggressive therapy is needed. They can also be used as stratification factors. However, they are not powerful enough to be used at the individual level. Moreover, there is no consensus on the appropriate methodology for searching for and identifying new prognostic factors; indeed, there is no agreement on the set of factors that should systematically be used to adjust the effect of new factors, nor on how to assess the additional independent value that a new factor brings. For example, promising genetic signatures are not necessarily validated when adjusted for known classical prognostic factors.
Predictive factors are more directly useful in clinical practice, as they relate to the efficacy of a specific treatment. A few of them now have a definite place in guiding therapeutic decisions in non-small cell lung cancer, and we are on the way to personalised medicine for this disease. However, their development and validation are more difficult and may require very large sample sizes, particularly when the prevalence of the predictive biomarker is low. Integrating several targets is also a challenge for future research.
- ©ERS 2012