Back to Journals » Clinical Epidemiology » Volume 10

Describing the association between socioeconomic inequalities and cancer survival: methodological guidelines and illustration with population-based data

Authors Belot A, Remontet L, Rachet B, Dejardin O, Charvat H, Bara S, Guizard AV, Roche L, Launoy G, Bossard N

Received 5 September 2017

Accepted for publication 5 February 2018

Published 17 May 2018 Volume 2018:10 Pages 561—573


Checked for plagiarism Yes

Review by Single-blind

Peer reviewers approved by Dr Amy Norman

Peer reviewer comments 3

Editor who approved publication: Professor Vera Ehrenstein

Aurélien Belot,1-3 Laurent Remontet,3,4 Bernard Rachet,1 Olivier Dejardin,5,6 Hadrien Charvat,7 Simona Bara,8 Anne-Valérie Guizard,5,9 Laurent Roche,3,4 Guy Launoy,5,6 Nadine Bossard3,4

1Cancer Survival Group, Department of Non-Communicable Disease Epidemiology, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom; 2Non-Communicable Diseases and Trauma Direction, The French Public Health Agency, Saint-Maurice, France; 3Department of Biostatistics and Bioinformatics, Hospices Civils de Lyon, Lyon, France; 4UMR 5558, Biometry and Evolutionary Biology Laboratory, Biostatistics Health Group, CNRS, University Lyon 1, Lyon, France; 5National Institute of Health and Medical Research U1086 ANTICIPE, Caen, France; 6Calvados Digestive Cancer Registry, Centre Hospitalier Universitaire, Caen, France; 7Prevention Division, Center for Public Health Sciences, National Cancer Center, Tokyo, Japan; 8Manche General Cancer Registry, Centre Hospitalier Public du Cotentin, Cherbourg-en-Cotentin, France; 9Calvados General Cancer Registry, Centre François Baclesse, Caen, France

Background: Describing the relationship between socioeconomic inequalities and cancer survival is important but methodologically challenging. We propose guidelines for addressing these challenges and illustrate their implementation on French population-based data.
Methods: We analyzed 17 cancers. Socioeconomic deprivation was measured by an ecological measure, the European Deprivation Index (EDI). The Excess Mortality Hazard (EMH), ie, the mortality hazard among cancer patients after accounting for other causes of death, was modeled using a flexible parametric model, allowing for nonlinear and/or time-dependent association between the EDI and the EMH. The model included a cluster-specific random effect to deal with the hierarchical structure of the data.
Results: We reported the conventional age-standardized net survival (ASNS) and described the changes of the EMH over the time since diagnosis at different levels of deprivation. We illustrated nonlinear and/or time-dependent associations between the EDI and the EMH by plotting the excess hazard ratio according to EDI values at different times after diagnosis. The median excess hazard ratio quantified the general contextual effect. Lip–oral cavity–pharynx cancer in men showed the widest deprivation gap, with 5-year ASNS at 41% and 29% for deprivation quintiles 1 and 5, respectively, and we found a nonlinear association between the EDI and the EMH. The EDI accounted for a substantial part of the general contextual effect on the EMH. The association between the EDI and the EMH was time dependent in stomach and pancreas cancers in men and in cervix cancer.
Conclusion: The methodological guidelines proved efficient in describing the way socioeconomic inequalities influence cancer survival. Their use would allow comparisons between different health care systems.

Keywords: cancer net survival, socioeconomic inequalities, European Deprivation Index, excess mortality hazard, flexible parametric model


Assessing the relationship between socioeconomic deprivation and cancer survival is important as socioeconomic differences in cancer survival are still observed even in countries with universal health care coverages.16 Describing this relationship at the population level calls for population-based cancer registry data, but the way of performing the analysis is challenging. Indeed, several methodological conditions should be met: 1) the use of a relevant measure of deprivation, usually ecological (ie, defined at an area level) as individual level of deprivation is not routinely collected in population-based data; 2) the use of a relevant mortality indicator such as the excess disease-specific mortality among cancer patients vs noncancer subjects; 3) the use of a regression model able to deal first with nonlinear functional forms of the association between continuous prognostic factors and the excess mortality (eg, between the deprivation index and the excess mortality throughout the range of the deprivation index values) and then deal with time-dependent associations (eg, nonconstant association between the deprivation index and the excess mortality over the time elapsed since diagnosis); 4) the use of an appropriate method for statistical inference that accounts for the statistical dependency between patients who share similar characteristics because they live in the same area across which the ecological deprivation index is defined (this is especially important when the interest lies in estimating regression parameters associated with the ecological deprivation variable);79 and 5) the use of a measure that summarizes the “importance” of the cluster level on the Excess Mortality Hazard (EMH).

Several recent ecological population-based studies addressing the question of the association between social deprivation and cancer survival were found in the literature.46,1012 Disease-specific mortality was generally the outcome of interest, usually estimated using an excess mortality approach, while cause-specific mortality approach was also used (using the cause of death). Statistical methods relied on either nonparametric or parametric regression modeling approaches. When modeling approaches were used, the adopted deprivation index was considered as a categorical variable (quintiles). When regression models included the age at diagnosis as a prognostic factor, it was most of the time considered as a categorical variable. Finally, some studies explored the time-dependent associations between the variables and the hazard but it is not the rule, and none had taken into account the hierarchical structure of the data.

The present work proposes methodological guidelines for addressing the above-mentioned challenges (1–5) and illustrates their use through an investigation of the association between socioeconomic deprivation and cancer survival from solid tumor cancers up to 10 years after diagnosis. It is worth noticing that, although the article uses the term “effect” in a few places, no causal association is implied.

Materials and methods


We used population-based cancer registry data that cover two contiguous Départements [French administrative areas] of West France (Calvados and Manche, nearly 1.1 million inhabitants). The quality and exhaustiveness of the included registries are certified every 4 years by an audit of the National Institute of Health and Medical Research (INSERM), the “Santé Publique France” agency, and the French National Cancer Institute. The incidence data from those registries are regularly included in the “Cancer Incidence in Five Continents” monograph series of the International Agency of Research on Cancer, where their quality and exhaustiveness are also assessed.

We analyzed cancer cases diagnosed between 1997 and 2010 in people aged >15 years at diagnosis. The follow-up of all cases ended on June 30, 2013. The 17 cancers under study are displayed in Tables S1 and S2.

The data from these registries are not publicly available. We analyzed these data under the ethical approval obtained by each registry from the French institute “Commission Nationale de l’Informatique et des Libertés” (“998018” for the Calvados digestive cancer registry, “981001 V1” for the Calvados general cancer registry, and “912669” for the Manche cancer registry).

The measure of social deprivation

Because individual levels of deprivation are not routinely collected, ecological measures defined at area levels have been proposed.13,14 These measures are considered as good proxies of individual deprivation in relatively small areas15 and measure additionally the patients’ social and economic environment (“contextual variables”).7,16,17

The European Deprivation Index (EDI) was developed using information from the European Union Statistics on Income and Living Conditions (EU-SILC) survey as well as other country-specific information.18 The ultimate goal of this index is to have in each European country an ecological deprivation index based on (country-specific) census variables using the same methodological approach for its construction while accounting for cultural and social specificities of each European country. The approach relies on the concept of relative deprivation, first proposed by the sociologist Peter Townsend.19 Deprivation refers to unmet fundamental needs caused by the lack of resources of all kinds (not only financial), those fundamental needs differing between societies (thus “relative” as it refers to deprivation specifically for a given society). Individuals can be said deprived when they lack the resources to obtain those types of needs (diet, type of living conditions, amenities, or services), which are obtained by the majority of people in the societies to which they belong to.

The EU-SILC is organized every year in every country of the EU-28. Based on a representative panel of European household, individuals answer some detailed questions on their living condition in each country. The construction of the EDI can be summarized as follows: First, fundamental needs are identified for each European country using the EU-SILC data. Among them, those associated with both objective poverty and subjective poverty are used to build a deprivation indicator at the individual level. Then, after identifying which variables are available at both the individual level (EU-SILC) and the area level (census), the area-level variables that are best correlated with the deprivation indicator built in the previous step are used to finally construct the area-based deprivation index. Details of concepts and construction methods are available in the previous methodologic papers.18,20

In France, this EDI is assigned to each IRIS (Îlot Regroupé pour l’Information Statistique, a geographical area of nearly 2000 individuals); it was then assigned to each patient from a given IRIS. The correspondence between a patient and an IRIS was determined according to the patient’s address at the time of diagnosis. This used a Geographic Information System software (ArcGIS 10.2) and a street map database (BD TOPO premium). In this work, we used the EU-SILC from 2006 to derive the EDI, which ranges for France from −17.3 to 51.1 (quintile 1 [Q1]: [−17.3;−2.9], Q2: [−2.9;−1.4], Q3: [−1.4;0], Q4: [0;2.1], and Q5: [2.1;51.1]).

The Excess Mortality Hazard

A relevant disease-specific mortality indicator is needed. Cancer-specific mortality using the cause of death is very popular but hardly usable in our context. Actually, the cause of death may be inaccurate or unreliable, especially for long-term studies, because it may be diversely coded over time and between regions. Besides, attributing a single cause of death to elderly people is debatable.21 Alternative approaches called “EMH methods” have been then developed;2226 these do not require the knowledge of the cause of death.2730 The basic idea of EMH methods is comparing the mortality between cancer patients and noncancer subjects with the same sex, age, and other main characteristics. The mortality of cancer-free subjects, called “expected mortality”, is assumed to be correctly given by the general-population mortality, which is a known value. The EMH is then estimated by subtracting the expected mortality from the mortality of cancer patients; it provides the excess mortality due (directly or indirectly) to cancer at any time after diagnosis. For the expected mortality hazards, we used the French population mortality rates by sex, age (0–99 years), Département [French administrative area], and calendar year (1997–2013) as provided by the French Institut National de la Statistique et des Études Économiques.

From the EMH, we derived directly the net survival, using the classical relationship between hazard and survival. Net survival is then the probability of survival of cancer patients if the cancer under study was the only cause of death. In population-based studies, this key indicator allows comparisons between countries or periods and is not affected by differences in mortalities from other causes.31

Regression modeling of the EMH

In cancer patients, the relationship between a prognostic factor (such as EDI) and EMH may be complex.27,32,33 A multivariable regression model has to consider these complex relationships using flexible functions.22,34 We defined a “full model” that modeled the EMH (on a log scale) as a function of time, age at diagnosis, year of diagnosis, and EDI, with these last three variables having time-dependent coefficients and nonlinear functional forms (thus leading to time-dependent and nonlinear log excess hazard ratios [EHRs], as denoted hereafter).

In addition, the EDI being an ecological variable and the individuals living in a given area sharing similar characteristics (including the EDI variable), a specific statistical method should allow dealing with the hierarchical structure of the data (ie, multilevel data with dependence between individuals at each level).79 This was done by including a normally distributed random effect at the IRIS level.34

Thus, in formula, the “full model” for the EMH λ+ is written as follows:

where λ0(t) is the baseline hazard, a the age at diagnosis, y the year of diagnosis, i the EDI, and w the random effect defined at the IRIS level (with mean 0 and standard deviation σ). The logarithm of the baseline hazard and the functions h, k, n were modeled with quadratic B-splines with knots located at 1 and 5 years, and the nonlinear functional forms g, j, m were modeled using quadratic splines with one knot (located at 70 years for age at diagnosis, at 2000 for the year of diagnosis, and at 0 for the EDI).

Finally, because the estimated standard deviation of the random effect per se is difficult to interpret, we summarized the “importance” of the cluster level on the EMH using the median excess hazard ratio (MEHR).35 This value reflects the influence of the cluster context as a whole, thus measuring the “general contextual effect”.17,35 The MEHR corresponds to the median relative change in the EMH when comparing identical subjects from two randomly selected different clusters that are ordered by risk.35

The analysis was separately conducted in men and women and used the iterative model-building strategy recommended by Wynant and Abrahamowicz.36 Starting with the “full model”, this strategy eliminates spurious time-dependent and nonlinear EHR functions of the three variables using the likelihood ratio test and 0.05 as significance threshold. This led to retain a final model for each sex-cancer couple. However, unlike the original proposal,36 we kept by default the simplest EHR (ie, linear and time-constant) for each of the three variables.

To implement the advocated statistical methods, we developed a specific package named mexhaz (version 1.1), which runs on R software (version 3.2.0). Both the software and the package may be freely downloaded from the CRAN repository (

Indicators produced

For each sex-cancer couple, we predicted from the final model the age-standardized net survival (ASNS) at 1, 5, and 10 years after diagnosis per deprivation quintile of the French population using the International Cancer Survival Standard weights.37 We used the delta method to derive the 95% confidence intervals (CIs) for the ASNSs assuming the normality of the log of the cumulative excess hazard.

The change in the EMH over the time elapsed since diagnosis was illustrated for three values of age and three values of the EDI: the 10th, 50th, and 90th percentiles of each variable distribution observed in each sex-cancer couple.

When the EDI was retained in the final model with a time-constant coefficient and a linear functional form, we reported the EHR for 1-unit increase of the EDI with its 95% CI. When the EDI was retained in the final model with time-constant coefficients and with a nonlinear form, we plotted the EHR vs the EDI values. When the EDI was retained in the final model with the time-dependent coefficient, we plotted the EHR vs the EDI values at various times after diagnosis. Because the sample size was usually small in this work, we focused on the effect size and its pattern rather than on the statistical significance in interpreting differences in function of the EDI.

For each sex-cancer couple, we calculated the MEHR with and without adjustment on the EDI from the final model to compare the general contextual effect on the EMH.


Data description

Tables S1 and S2 display the number of cases and deaths over 10 years after diagnosis. The highest numbers of deaths were found in deprivation quintiles Q4 and Q5 that group the most deprived people. These deaths represent almost 50% of all events in most cancers (Tables S1 and S2). A few sex-cancer couples were not analyzed because of the low number of deaths (<300 in each of esophagus, liver, and larynx cancers in women; breast cancer in men; and thyroid cancer in men and women).

Deprivation (EDI)

A constant-in-time EHR of the EDI with a linear functional form was retained in most cancer sites, except lip–oral cavity–pharynx (LOCP; nonlinear EHR in both sexes), stomach (time-dependent EHR), pancreas (nonlinear and time-dependent EHR) in men, and cervix uteri (nonlinear and time-dependent EHR; Table S3).

Five-year ASNS according to EDI

In men, a substantial difference in 5-year ASNS was seen between deprivation quintiles Q1 (the least deprived) and Q5 (the most deprived) regarding LOCP cancers (41%; 95% CI: [38;43] vs 29% [27;31]). A similar difference was seen regarding skin melanoma (87% [84;89] vs 76% [72;80]; Table 1). The difference in 5-year ASNS between Q1 and Q5 was nearly 7% for colon–rectum and bladder cancers, 6% for kidney, 5% for prostate cancer, 4% for lung and liver cancers, 3% for stomach and larynx cancers, and ≤2% for esophagus and pancreas cancers. Tables S4 and S5 show the results of 1- and 10-year ASNS by deprivation quintile. For pancreas cancer, the absence of the impact of EDI on the 5-year ASNS contrasts greatly with the substantial difference observed in 1-year ASNS (36% [33;40] in Q1 vs 25% [22;28] in Q5) (Table S4). This is due to a special time-dependent EHR of the EDI that we explain later.

Table 1 Age-standardized 5-year net survivals by cancer and deprivation quintiles (Q1–Q5, from the less to the more deprived) and EHRs for 1-unit increase of the EDI, in men and women, with their 95% confidence intervals

Notes: The ICSS weights were used for standardization. The EHR estimates are adjusted on age at diagnosis and year of diagnosis and based on the final model retained (see Materials and methods section) for cancer with a time-constant coefficient and a linear functional form of the EDI. For cancers with an NL functional form and/or a TD coefficient of the EDI, the EHRs are reported in Figures 3 and 4. The bounds of the deprivation quintiles here correspond to those of the deprivation quintiles observed in the general French population. *Due to sparse data, we merged age groups ([15–45] and [45–55] into a single group [15–55] [adding up the corresponding ICSS weights]) to calculate the age-standardized net survival.

Abbreviations: CNS, central nervous system; EDI, European Deprivation Index; EHR, excess hazard ratio; ICSS, International Cancer Survival Standard; LOCP, lip–oral cavity–pharynx; NA, not available; NL, nonlinear; TD, time-dependent.

In women, as in men, a substantial difference in 5-year ASNS was observed between Q1 and Q5 regarding LOCP cancers, 55% [49;62] vs 43% [39;47], with a higher predicted ASNS for Q2 (60% [56;64]) than Q1. The difference in 5-year ASNS between Q1 and Q5 was around 6% for bladder cancer; 4% for breast and melanoma cancers; 3% for colon–rectum, lung, and ovary cancers; and ≤2% for CNS, pancreas, and stomach cancers. In contrast, for cervix uteri, the 5-year ASNS was lower in the less deprived women in comparison with the most deprived: 55% [51;59] for Q1 vs 64%–66% for the other quintiles. This pattern was also found for 1- and 10-year ASNS (Tables S4 and S5) for that cancer site.

Changes over time since diagnosis of the EMH according to EDI and age, and complementary illustrations of the relationship between EDI and EMH

The changes of the EMH over the time elapsed since diagnosis are given for three values of age and three values of EDI (the 10th, 50th, and 90th percentiles) for 1) LOCP in men, LOCP in women, and melanoma in men (Figure 1); 2) pancreas and stomach in men and cervix uteri (Figure 2); and 3) all other cancer sites (Figures S1–S4). Marked differences were seen by age at diagnosis; the EMHs were higher in old than in young patients, especially during the first year(s) after diagnosis. Changes of the EMH over time since diagnosis illustrate how and when the EDI impact takes place across the follow-up at specific ages at diagnosis and complement the previously given net survival results. For example, the graphs allow illustrating the strong association between the EDI and the EMH for LOCP cancer in both sexes: the curve is always higher in deprived people. A quick look at the graphs might give the false impression that the EHR of the EDI depends on time (see the middle box in Figure 1 where the curves are not parallel for LOCP in women aged 61.5 years). This is because the hazards are proportional on the log-scale, whereas the graphs use an arithmetic scale. On an arithmetic scale, a value of a time-constant EHR of 2, for example, will display a larger difference between hazards when the baseline hazard is high rather than low.

Figure 1 Changes over time since diagnosis of the excess mortality hazard for the 10th, 50th, and 90th percentiles of the age distribution (left, middle, and right column, respectively) and for the 10th, 50th, and 90th percentiles of the EDI distribution (curves with black circles, red triangles, and green crosses, respectively) regarding LOCP in men and women, and melanoma in men; patients diagnosed in 2010.

Abbreviations: EDI, European Deprivation Index; LOCP, lip–oral cavity–pharynx.

Figure 2 Changes over time since diagnosis of the excess mortality hazard for the 10th, 50th, and 90th percentiles of the age distribution (left, middle, and right column, respectively) and for the 10th, 50th, and 90th percentiles of the EDI distribution (curves with black circles, red triangles, and green crosses, respectively) regarding stomach and pancreas cancers in men, and cervix uteri; patients diagnosed in 2010.

Abbreviation: EDI, European Deprivation Index.

Figure 3 Excess hazard ratio of the EDI on lip–oral cavity–pharynx cancer in men (A) and women (B) with 95% confidence intervals (shaded area).

Notes: We limited the EDI values on the x-axis to the 5th and 95th percentiles of the observed EDI distribution in the sex-cancer couple. Rug plots indicate the locations of the observed EDI values.

Abbreviation: EDI, European Deprivation Index.

For LOCP cancers in both sexes, the model-building strategy retained a nonlinear functional form (though time-constant) for the log-EHR of the EDI (Table S4). In men, the EHR increased according to EDI values but then plateaued in the more deprived people (Figure 3A); however, in women, a plateau is seen in both the least and the most deprived people (Figure 3B).

We also observed a substantial association between the EDI and the EMH in melanoma in men (bottom plots of Figure 1): the retained EHR of EDI was constant in time with a linear functional form (Table S3).

For stomach cancer in men, we observed higher EMHs in deprived patients starting from 5 years after diagnosis (Figure 2, upper plots), and thus, weak differences between Q1 and Q5 regarding 1- and 5-year ASNS (Table 1 and Table S4) but a substantial difference (11%) regarding 10-year ASNS (Table S5). Figure 4A shows the time-dependent EHR of the EDI in stomach cancer, especially a substantial impact in late follow-up (5 years), even if the EMHs are quite low after 5 years (Figure 2).

Figure 4 Excess hazard ratio of the EDI at different times after diagnosis for stomach (A) and pancreas (B) cancers in men and cervix uteri (C) cancer in women.

Notes: We limited the EDI values on the x-axis to the 5th and 95th percentiles of the observed EDI distribution in the sex-cancer couple. Rug plots indicate the locations of the observed EDI values.

Abbreviation: EDI, European Deprivation Index.

For pancreas cancer in men, we observed a very complex pattern associated with the EDI, especially a lower EMH within the first year in the less deprived patients vs other patients and, in contrast, a lower EMH between years 1 and 4 in deprived vs less deprived patients. Therefore, the impact of deprivation on net survival was high over the first year after diagnosis and resulted in a substantial difference in 1-year ASNS between the less deprived to the other patients (Table S4). This difference shrunk at 5 years because of the reverse association observed after 1 year (Figure 4B). At 6 months, the EHR is <1 at small EDI values (ie, in the less deprived patients) and ~1 at other values. At 3 years, the EHR is slightly >1 in the less deprived patients and slightly <1 in the more deprived (Figure 4B). At 5 years, the EHR should be interpreted with caution because the prognosis of pancreas cancer at 5 years is rather poor, and thus, the number of patients still at risk is rather low.

Finally, for cervix uteri cancer, the EMH was higher in the less deprived people than in people with a median EDI whatever the time since diagnosis and the age at diagnosis (Figure 2). Therefore, the 1-, 5-, and 10-year ASNS was lower in the less deprived people than in others (Tables 1, S4, and S5). This corresponds to very complex nonlinear and time-dependent EHRs of the EDI (Figure 4C), the main information relying on the U-shape of the curves (ie, EHRs >1 were observed in the least and the most deprived people).

General contextual effect

The MEHRs with and without adjustment on the EDI are given in Table S6. For LOCP cancers in men, the median increase in the EMH between similar patients from IRIS with a high vs a low excess mortality was 25.5% before adjustment on the EDI (MEHR=1.255) and 21.4% after adjustment (MEHR=1.214).35 The figures in women were also substantial: the median increase in the EMH was 23.8% before adjustment vs 8.1% after adjustment. This reveals an important general contextual effect for LOCP cancers; the EDI seems to explain an important part of EMH variability between IRIS, especially in women. We also observed an important decrease (before vs after adjustment on the EDI) of the MEHR for prostate, melanoma, and pancreas cancers in men (Table S6).


In an international context of increasing socioeconomic inequalities,38 describing and quantifying the association between socioeconomic inequalities and the excess cancer-related mortality hazard is important. Here, we used a strategy able to deal with specific methodological requirements: the use of a relevant measure of deprivation and a relevant mortality indicator (the EMH) estimated using a flexible regression model able to deal with nonlinear and time-dependent associations. The approach should account for the fact that individuals within a cluster share similar characteristics and should also allow to summarize the “importance” of the cluster level on the EMH. We applied this approach to 17 solid tumors diagnosed in a specific area of France and followed up over 10 years after diagnosis to investigate the change over time of the excess mortality by age and socioeconomic level. We summarize the recommendations we believe important to describe the association between socioeconomic deprivation and the EMH (Table 2).

Table 2 Summary of the guidelines for describing the association between socioeconomic inequalities and cancer survival

Abbreviation: EDI, European Deprivation Index.

Using population-based cancer registry data ensures depicting the full picture of cancer survival inequalities. For decades, the notifications of cancer cases come from many different sources (public and private pathology laboratories and hospital discharge databases as well as databases of the National Health System). Even if the number of data sources has dramatically increased since 1997, it was to collect further information on cancer cases such as treatment, thus not affecting the core of the cancer registry data and their exhaustiveness. For these reasons, we do not suspect any differential ascertainment over the study time period nor between areas of residence or individual and area-level socioeconomic determinants. We used the EDI to quantify the deprivation as this index was built to be reproducible in European countries.18 We assumed that 1) the EDI assigned to each IRIS remains constant from 1997 to 2010, and 2) the patient’s deprivation corresponds to the EDI measured at the time of diagnosis (no misclassification). We considered these assumptions reasonable because 1) the crude level of the EDI has little significance per se: it is more the ranking of each IRIS across the overall distribution which is of interest and this ranking is less influenced by time, and 2) the number of patients moving after the diagnosis of cancer, which can be seen as a misclassification problematic, should be low for different reasons (access to cancer treatment, preservation of social network, etc). Bryère et al showed that the bias of such misclassification on the association between deprivation and cancer incidence was minimal in their study context.39 However, more research should be conducted in the context of deprivation and cancer survival.

We recommend using flexible parametric regression models and underline the importance of examining the changes of the EMH over time since diagnosis together with the net survival (Figures 1 and 2) and the EHRs (Figures 3 and 4); this ensures relevant and complementary clinical information.4043 Indeed, at a given time t, the probability of net survival is a cumulative measure up to time t, whereas the EMH gives an instantaneous picture of what happens specifically at time t. It quantifies the instantaneous rate at which subjects experience an excess death (given they survived up to t) and, being a rate, the EMH may be >1. When the EMH is low (say <0.1) and practically constant over the year, its value is very close to the annual probability of death from the disease. With higher values, a back-transformation on the probability scale (using the classical relationship between hazard and survival) may be advantageous for clinical interpretation because it provides a conditional probability.

Caution should be taken when interpreting the changes of the EMH over time because its decrease in a population (“marginal” EMH) could be due either to true decreases of individual EMHs or to a “selection effect” over time.44 For example, when a population includes a mix of 1) patients with localized cancer stages and low and constant-in-time EMH and 2) patients with advanced stages and high and constant-in-time EMH, the analysis of this population as a whole (in the absence of information on stage) will estimate a “marginal” EMH that will decrease with time. The more “frail” individuals (with the higher hazards) will die early, whereas the more “robust” individuals (with the lower hazards) will stay at risk (are “not selected to die”): the marginal EMH will then decrease and approach the EMH of the more robust subjects.44 Nevertheless, the possibility to estimate and depict those quantities (EMH and EHR) over time using flexible functional forms is an important advantage of our proposed methodology compared to using a simpler model with either shape-restricted baseline hazard (such as monotonic for the Weibull distribution) or assuming proportional hazard ratio. As an illustration, we fitted a simple model without a random effect and assumed a Weibull distribution and linear and proportional hazard ratios for each prognostic factor. We applied this simple model to the LOCP cancer in men and in women, and to pancreas cancer in men. In LOCP cancers, using this simple model would not allow to identify the plateau of the EHR for the most deprived men nor both plateaus for the less deprived and the most deprived women (Figure 3). From this simple model, the estimated EHR comparing women with EDI=4 to women with EDI=0 was 1.21 [1.09;1.34] compared to 1.50 [1.19;1.90] with our approach. Neglecting the time-dependent effect of the EDI for pancreas cancers with the simple model would also lead to a substantial oversimplification, showing no evidence of an association between the EDI and the EMH (EHR for 1- unit increase of the EDI=1.00 [0.98;1.022]), compared to the complex time-varying association found with our approach (Figure 4B).

We advocated the use of a model-building strategy to eliminate spurious time-dependent and nonlinear EHR functions from a flexible regression model. We used the one proposed by Wynant and Abrahamowicz,36 but an alternative model-building strategy could be used, such as the one proposed by Royston and Sauerbrei.45 However, the development of algorithms for model building is still an active area of statistical research, and studies comparing the ability of model-building strategies to eliminate spurious time-dependent and nonlinear EHR functions would be useful for giving advice to the analyst. Whatever the choice of the model-building strategy, fitting regression models requires observing enough events for providing reliable estimates, and this may be an issue in small sample studies or when studying cancer with a very good prognosis. In our work, we did not analyze some sex-cancer couples because of insufficient observed events for fitting the “full model” based on the “rule” of observing at least 10 events per parameters,46 even though this “rule” is still debatable.47

We evidenced an association (linear and constant-in-time) between the EDI and the EMH in colon–rectum, lung, melanoma, and prostate cancers in men as in breast cancer in women, with lower survivals in the most deprived. We also found a substantial deprivation gap in LOCP cancers in both sexes with >10% differences in 5-year ASNS between deprivation Q1 and Q5. The main drivers of LOCP cancer are alcohol and tobacco consumptions, and both are associated with other comorbidities; this limits the treatment possibilities and leads to poor prognoses. In France, the prevalence of tobacco smokers in men or women is generally higher in deprived than in most affluent people though women with management responsibilities seem more prone to smoking than others.48,49 Regarding alcohol consumption, the picture is more complex and differs with sex: excessive alcohol consumption is more frequent among women with management responsibilities vs other women but affects both extremes of the deprivation scale in men.48 In addition, the probability of alcohol avoidance is quite high among deprived people.48 These observations are in line with the patterns of the EHR of the EDI (Figure 3A and B).

For stomach cancer in men, deprived patients were found exposed to a higher excess mortality at 5 years after diagnosis vs less deprived patients, whereas the EDI plays no role at 1 or 3 years after diagnosis (Figure 4A). This may be due to 1) more comorbidities among deprived patients that may preclude the recourse to the best treatment strategies and lead to higher risks of relapse in the long term and/or 2) lower patient adherence to cancer follow-up among deprived patients. For cervix uteri, we showed a higher excess mortality among the less deprived patients (Figure 4C): it may be linked to a higher participation to cervical screening among the less deprived subjects,50 which would eliminate a higher number of curable precancerous lesions in affluent than in deprived people.

The interpretation of such relationships would benefit from additional information on cancer stage at diagnosis and comorbidities. Such data were not available for the present study but French registries have started the systematic collection of stage at diagnosis. Another limitation of the study is the lack of deprivation-specific expected mortality rates in France. Therefore, the use of the general-population mortality as expected mortality rate overestimates the excess hazard in the more deprived people (because their expected mortality is usually higher than the “average” mortality in the general population) and underestimates it in the less deprived ones. This may lead to amplify the impact of the EDI and highlight the urgent need to produce deprivation-specific life tables in France.

In the present article, we predicted the ASNSs from the fitted regression model and obtained the ASNSs even in case of sparse data because model-predicted NSs can be obtained after the date of the last event in a specific stratum (which is another advantage of using our proposed methodology compared to using only nonparametric estimates). However, these predictions rely on the assumption that the regression model is correctly specified. For each sex-cancer couple, we checked the goodness of fit of the model by comparing the model-based ASNS with the nonparametric ASNS as given by the Pohar-Perme estimator24 for each deprivation quintile and each period of diagnosis ([1997–2000], [2001–2005], [2006–2010], and all periods combined). Comparing the 5-year ASNSs showed the good accuracy of model-based NS prediction (Figure S5).

Quantities that measure between- and within-cluster variability may help interpreting the results. We extended the median hazard ratio proposed by Austin and Merlo35 to our context of EMH to reach a better understanding of the impact of a within-IRIS clustering on the EMH. According to these authors, one would additionally compare the MEHR to the estimated EHR of each prognostic factor. However, in our final explanatory model, we rarely retained a single parameter for each prognostic factor, which makes impossible such a comparison. So, though the MEHR has the merit of simplicity, an interesting perspective would be to extend the approach proposed by Oliveira et al.51 These authors derived an intraclass correlation coefficient for time-to-event regression models with a random effect (frailty). As in a linear model, this coefficient is defined as a ratio of variance components, which allows interpreting the coefficient as the proportion of the total variance due to the between-IRIS variability.52 However, the approach proposed by Oliveira et al suits their specific models that include closed forms of marginal variance, which leads to closed forms of intraclass correlation.51 A future work would check whether their approach may be applied to our model.

Evaluating the interactions between prognostic factors is a further important step when describing the association between deprivation and cancer survival. For example, the interactions allow checking whether the EHR of the EDI is the same whatever the age at diagnosis. In an exploratory analysis that used Royston and Sauerbrei’s methodology45 to study interaction, our preliminary results suggested that such interactions do exist for some cancers (results not shown). These results still need a validation of a robust statistical approach to test the interactions. Another important research area would be to extend this analysis by including socioeconomic measures defined at both the individual level and the area level. Indeed, with the EDI being an ecological variable,53 the estimated effect of deprivation actually combines individual and contextual effects. Adjusting for both subject- and area-specific measures would allow disentangling individual from contextual effects of deprivation.16,5456

International comparisons of the association between socioeconomic deprivation and cancer survival are useful to understand differences between health care systems. Several studies have already reported poorer prognoses in deprived vs less deprived cancer patients.2,46,57 However, comparing the results is difficult because of distinct study designs, statistical analysis methods, and deprivation indexes. We hope the proposed approach will provide a methodological basis for such explorations. The use of the present approach with the EDI in other European countries20 will ease comparisons between European health care systems.


We thank the IRESP (Institut de Recherche en Santé Publique) for supporting the study (grant for the ANGEFLEX study, Convention AAR2013-13 “Soutien à la recherche statistique et mathématique appliquée à la cancérologie”). This work was also partly supported by Cancer Research UK grant number C7923/A18525. The authors thank Jean Iwaz (Hospices Civils de Lyon) for the revision of the final draft of this manuscript.

Author contributions

AB, LR, BR, OD, GL, and NB developed the concept and the design of the study. AB analyzed the data and drafted the manuscript. All authors interpreted the data, drafted the manuscript, revised it critically, and read and approved its final version.


The authors report no conflicts of interest in this work.



Kogevinas M, Porta M. Socioeconomic differences in cancer survival: a review of the evidence. In: Kogevinas M, Pearce N, Susser M, Boffetta P, editors. Social Inequalities and Cancer. (IARC Scientific Publications No. 138). Lyon: International Agency for Research on Cancer; 1997:177–206.


Coleman MP, Rachet B, Woods LM, et al. Trends and socioeconomic inequalities in cancer survival in England and Wales up to 2001. Br J Cancer. 2004;90(7):1367–1373.


Woods LM, Rachet B, Coleman MP. Origins of socio-economic inequalities in cancer survival: a review. Ann Oncol. 2006;17(1):5–19.


Ito Y, Nakaya T, Nakayama T, et al. Socioeconomic inequalities in cancer survival: a population-based study of adult patients diagnosed in Osaka, Japan, during the period 1993–2004. Acta Oncol. 2014;53(10):1423–1433.


Jansen L, Eberle A, Emrich K, et al; GEKID Cancer Survival Working Group. Socioeconomic deprivation and cancer survival in Germany: an ecological analysis in 200 districts in Germany. Int J Cancer. 2014;134(12):2951–2960.


Stanbury JF, Baade PD, Yu Y, Yu XQ. Cancer survival in New South Wales, Australia: socioeconomic disparities remain despite overall improvements. BMC Cancer. 2016;16:48.


Subramanian SV. The relevance of multilevel statistical methods for identifying causal neighborhood effects. Soc Sci Med. 2004;58(10):1961–1967.


Duchateau L, Janssen P. The Frailty Model. New York, NY: Springer; 2008.


Wienke A. Frailty Models in Survival Analysis. Boca Raton, FL: CRC Press; 2011.


Antunes L, Mendonca D, Bento MJ, Rachet B. No inequalities in survival from colorectal cancer by education and socioeconomic deprivation – a population-based study in the North Region of Portugal, 2000–2002. BMC Cancer. 2016;16:608.


Kish JK, Yu M, Percy-Laurry A, Altekruse SF. Racial and ethnic disparities in cancer survival by neighborhood socioeconomic status in Surveillance, Epidemiology, and End Results (SEER) Registries. J Natl Cancer Inst Monogr. 2014;2014(49):236–243.


Dialla PO, Arveux P, Ouedraogo S, et al. Age-related socio-economic and geographic disparities in breast cancer stage at diagnosis: a population-based study. Eur J Public Health. 2015;25(6):966–972.


Townsend P, Phillimore P, Beattie A. Health and Deprivation: Inequality and the North. London: Croom Helm; 1988.


Carstairs V, Morris R. Deprivation: explaining differences in mortality between Scotland and England and Wales. BMJ. 1989;299(6704):886–889.


Woods LM, Rachet B, Coleman MP. Choice of geographic unit influences socioeconomic inequalities in breast cancer survival. Br J Cancer. 2005;92(7):1279–1282.


Diez Roux AV. Investigating neighborhood and area effects on health. Am J Public Health. 2001;91(11):1783–1789.


Diez Roux AV. A glossary for multilevel analysis. J Epidemiol Community Health. 2002;56(8):588–594.


Pornet C, Delpierre C, Dejardin O, et al. Construction of an adaptable European transnational ecological deprivation index: the French version. J Epidemiol Community Health. 2012;66(11):982–989.


Townsend P. Deprivation. J Soc Policy. 1987;16(2):125–146.


Guillaume E, Pornet C, Dejardin O, et al. Development of a cross-cultural deprivation index in five European countries. J Epidemiol Community Health. 2016;70(5):493–499.


Moreno-Betancur M, Sadaoui H, Piffaretti C, Rey G. Survival analysis with multiple causes of death: extending the competing risks model. Epidemiology. 2017;28(1):12–19.


Remontet L, Bossard N, Belot A, Estève J; French network of cancer registries FRANCIM. An overall strategy based on regression models to estimate relative survival and model the effects of prognostic factors in cancer survival studies. Stat Med. 2007;26(10):2214–2228.


Estève J, Benhamou E, Croasdale M, Raymond L. Relative survival and the estimation of net survival: elements for further discussion. Stat Med. 1990;9(5):529–538.


Pohar MP, Stare J, Estève J. On estimation in relative survival. Biometrics. 2012;68(1):113–120.


Danieli C, Remontet L, Bossard N, Roche L, Belot A. Estimating net survival: the importance of allowing for informative censoring. Stat Med. 2012;31(8):775–786.


Roche L, Danieli C, Belot A, et al. Cancer net survival on registry data: use of the new unbiased Pohar-Perme estimator and magnitude of the bias with the classical methods. Int J Cancer. 2013;132(10):2359–2369.


Bossard N, Velten M, Remontet L, et al. Survival of cancer patients in France: a population-based study from the Association of French Cancer Registries (FRANCIM). Eur J Cancer. 2007;43(1):149–160.


Mariotto AB, Noone AM, Howlader N, et al. Cancer survival: an overview of measures, uses, and interpretation. J Natl Cancer Inst Monogr. 2014;2014(49):145–186.


De Angelis R, Sant M, Coleman MP, et al. Cancer survival in Europe 1999-2007 by country and age: results of EUROCARE-5 – a population-based study. Lancet Oncol. 2014;15(1):23–34.


Allemani C, Weir HK, Carreira H, et al; CONCORD Working Group. Global surveillance of cancer survival 1995–2009: analysis of individual data for 25,676,887 patients from 279 population-based registries in 67 countries (CONCORD-2). Lancet. 2015;385(9972):977–1010.


Pohar Perme M, Esteve J, Rachet B. Analysing population-based cancer survival – settling the controversies. BMC Cancer. 2016;16(1):933.


Quantin C, Abrahamowicz M, Moreau T, et al. Variation over time of the effects of prognostic factors in a population-based study of colon cancer: comparison of statistical models. Am J Epidemiol. 1999;150(11):1188–1200.


Moller H, Sandin F, Robinson D, et al. Colorectal cancer survival in socioeconomic groups in England: variation is mainly in the short term after diagnosis. Eur J Cancer. 2012;48(1):46–53.


Charvat H, Remontet L, Bossard N, et al; CENSUR Working Survival Group. A multilevel excess hazard model to estimate net survival on hierarchical data allowing for non-linear and non-proportional effects of covariates. Stat Med. 2016;35(18):3066–3084.


Austin PC, Wagner P, Merlo J. The median hazard ratio: a useful measure of variance and general contextual effects in multilevel survival analysis. Stat Med. 2017;36(6):928–938.


Wynant W, Abrahamowicz M. Impact of the model-building strategy on inference about nonlinear and time-dependent covariate effects in survival analysis. Stat Med. 2014;33(19):3318–3337.


Corazziari I, Quinn M, Capocaccia R. Standard cancer patient population for age standardising survival ratios. Eur J Cancer. 2004;40(15):2307–2316.


Commission E. Why Socio-Economic Inequalities Increase? Facts and Policy Responses in Europe. 1st ed. Luxembourg: Publications Office of the European Union; 2010.


Bryere J, Pornet C, Dejardin O, Launay L, Guittet L, Launoy G. Correction of misclassification bias induced by the residential mobility in studies examining the link between socioeconomic environment and cancer incidence. Cancer Epidemiol. 2015;39(2):256–264.


Binquet C, Abrahamowicz M, Astruc K, Faivre J, Bonithon-Kopp C, Quantin C. Flexible statistical models provided new insights into the role of quantitative prognostic factors for mortality in gastric cancer. J Clin Epidemiol. 2009;62(3):232–240.


Corm S, Roche L, Micol JB, et al. Changes in the dynamics of the excess mortality rate in chronic phase-chronic myeloid leukemia over 1990–2007: a population study. Blood. 2011;118(16):4331–4337.


Hess KR, Levin VA. Getting more out of survival data by using the hazard function. Clin Cancer Res. 2014;20(6):1404–1409.


Mounier M, Bossard N, Remontet L, et al; EUROCARE-5 Working Group; CENSUR Working Survival Group. Changes in dynamics of excess mortality rates and net survival after diagnosis of follicular lymphoma or diffuse large B-cell lymphoma: comparison between European population-based data (EUROCARE-5). Lancet Haematol. 2015;2(11):e481–e491.


Vaupel JW, Yashin AI. Heterogeneity’s ruses: some surprising effects of selection on population dynamics. Am Stat. 1985;39(3):176–185.


Sauerbrei W, Royston P, Look M. A new proposal for multivariable modelling of time-varying effects in survival data based on fractional polynomial time-transformation. Biom J. 2007;49(3):453–473.


Vittinghoff E, McCulloch CE. Relaxing the rule of ten events per variable in logistic and Cox regression. Am J Epidemiol. 2007;165(6):710–718.


Heinze G, Wallisch C, Dunkler D. Variable selection – a review and recommendations for the practicing statistician. Biom J. Epub 2018 Jan 2.


Com-Ruelle L, Dourgnon P, Jusot F, Lengagne P. Les problèmes d’alcool en France: quelles sont les populations à risque? [The problems of alcohol in France: what are at-risk populations?] Institut De Recherche Et Documentation En Economie De La Santé. 2008;129:1–6. French.


Guignard R, Beck F, Richard JB, Lermenier A, Wilquin JL, Nguyen-Thanh V. La consommation de tabac en France en 2014: caractéristiques et évolutions récentes [The use of tobacco in France in 2014: characteristics and recent developments]. Évolutions. 2015;31:1–6. French.


Duport N, Serra D, Goulard H, Bloch J. [Which factors influence screening practices for female cancer in France?]. Rev Epidemiol Sante Publique. 2008;56(5):303–313. French.


Oliveira IR, Molenberghs G, Demetrio CG, Dias CT, Giolo SR, Andrade MC. Quantifying intraclass correlations for count and time-to-event data. Biom J. 2016;58(4):852–867.


Goldstein H, Browne W, Rasbash J. Partitioning variation in multilevel models. Understand Stat. 2002;1(4):223–231.


Bryere J, Pornet C, Copin N, et al. Assessment of the ecological bias of seven aggregate social deprivation indices. BMC Public Health. 2017;17(1):86.


Diez-Roux AV. Multilevel analysis in public health research. Annu Rev Public Health. 2000;21:171–192.


Sloggett A, Young H, Grundy E. The association of cancer survival with four socioeconomic indicators: a longitudinal study of the older population of England and Wales 1981–2000. BMC Cancer. 2007;7:20.


Singer S, Bartels M, Briest S, et al. Socio-economic disparities in long-term cancer survival-10 year follow-up with individual patient data. Support Care Cancer. 2017;25(5):1391–1399.


Skyrud KD, Bray F, Eriksen MT, Nilssen Y, Moller B. Regional variations in cancer survival: impact of tumour stage, socioeconomic status, comorbidity and type of treatment in Norway. Int J Cancer. 2016;138(9):2190–2200.

Creative Commons License This work is published by Dove Medical Press Limited, and licensed under a Creative Commons Attribution License. The full terms of the License are available at The license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Download Article [PDF]