OncoTargets and Therapy, Volume 8

Novel prognostic genes of diffuse large B-cell lymphoma revealed by survival analysis of gene expression data

Authors Li C, Zhu B, Chen J, Huang X

Received 7 June 2015

Accepted for publication 25 September 2015

Published 18 November 2015 Volume 2015:8 Pages 3407–3413

DOI https://doi.org/10.2147/OTT.S90057

Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 4

Editor who approved publication: Dr Faris Farassati



Chenglong Li,1,2 Biao Zhu,1,2 Jiao Chen,1,2 Xiaobing Huang1,2

1Department of Hematology, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, Sichuan, People’s Republic of China; 2Department of Hematology, Affiliated Medical School of University of Electronic Science and Technology, Chengdu, Sichuan, People’s Republic of China

Objective: This study aimed to identify prognostic genes for diffuse large B-cell lymphoma (DLBCL), using bioinformatic methods.
Methods: Five gene expression data sets were downloaded from the Gene Expression Omnibus database. The significance analysis of microarrays algorithm was used to identify differentially expressed genes (DEGs) from two data sets. Functional enrichment analysis was performed for the DEGs with the Database for Annotation, Visualization and Integrated Discovery (DAVID). Survival analysis was performed for the other three data sets with the Kaplan–Meier method, using the function survfit from the R package survival. Cox univariate regression analysis was used to further screen out prognostic genes.
Results: Thirty-one common DEGs were identified in the two data sets, mainly enriched in the regulation of lymphocyte activation, immune response, and interleukin-mediated signaling pathway. Combined with 47 DLBCL-related genes acquired by literature retrieval, a total of 78 potential prognostic genes were obtained. Cases from the other three data sets were used in hierarchical clustering, and the 78 genes could cluster them into several subtypes with significant differences in survival curves. Cox univariate regression analysis revealed 45, 33, and eleven prognostic genes in the three data sets, respectively. Five common prognostic genes were revealed, including LCP2, TNFRSF9, FUT8, IRF4, and TLE1, among which LCP2, FUT8, and TLE1 were novel prognostic genes.
Conclusion: Five prognostic genes of DLBCL were identified in this study. They could not only be used for molecular subtyping of DLBCL but also be potential targets for treatment.

Keywords: diffuse large B-cell lymphoma, gene expression profile, differentially expressed genes, survival analysis, subtype

Introduction

Diffuse large B-cell lymphoma (DLBCL) is one of the most common types of non-Hodgkin lymphoma, which occurs primarily in older individuals. It is an aggressive tumor. R-CHOP, an improved form of cyclophosphamide, doxorubicin, vincristine, and prednisone (CHOP) with the addition of rituximab, is a standard treatment for DLBCL.

Many subtypes of the lymphoid neoplasms are established based on the World Health Organization classification system, and DLBCL is the most common type in Asians.1 However, classification based merely on morphology and clinical information is difficult, and thus a considerable percentage of cases are not classified. Gene expression profiling studies have attempted to distinguish heterogeneous groups of DLBCL from each other.2–4 For instance, gene expression profiling identified the germinal center B-cell-like and the activated B-cell-like groups, which are recognized as two DLBCL subtypes in the current World Health Organization classification.5 The study by Lenz et al6 provides genetic evidence that the DLBCL subtypes are distinct diseases that use different oncogenic pathways. Thus, DNA microarrays provide a better understanding of the biology of DLBCL and advance the development of novel diagnostic tools.7

Meanwhile, many genes with prognostic effect have been reported in DLBCL, such as BCL2 and BCL6.8,9 Hu et al10 suggested that MYC/BCL2 coexpression, rather than cell-of-origin classification, is a better predictor of prognosis in patients with DLBCL treated with R-CHOP. Additionally, Gratzinger et al11 reported the prognostic value of vascular endothelial growth factor and vascular endothelial growth factor receptors in DLBCL patients treated with anthracycline-based chemotherapy. Hussain et al12 found that X-linked inhibitor of apoptosis expression is a poor prognostic factor for DLBCL.

Due to the heterogeneity of DLBCL, more work is necessary to advance molecular subtyping and to discover prognostic genes. In this study, two gene expression data sets were analyzed to identify differentially expressed genes (DEGs), which were regarded as potential prognostic genes for DLBCL. We then examined whether these genes could distinguish the subtypes of DLBCL in three other expression profile data sets.

Methods

Gene expression data

All five gene expression data sets were downloaded from the Gene Expression Omnibus.

  1. The GSE32918 data set13,14 collected gene expression profiles of 172 DLBCL samples. The platform of Illumina GPL8432 (Illumina HumanRef-8 WG-DASL v3.0) was used. It included a total of 294 expression profiles, since some samples were assayed repeatedly.
  2. The GSE10846 data set15,16 included gene expression profiles of 181 clinical samples from chemotherapy-treated patients and 233 clinical samples from rituximab–chemotherapy-treated patients. The platform was Affymetrix GPL570 (Affymetrix Human Genome U133 Plus 2.0 Array). A total of 416 gene expression profiles were included.
  3. The GSE11318 data set6 consisted of gene expression profiles of 203 DLBCL samples, based on the platform of Affymetrix GPL570.
  4. The GSE9327 data set17 collected gene expression profiles of 36 DLBCL samples and eight reactive lymph node samples, which were used as controls. The platform of GPL6011 (CNIO Human Oncochip 1.0, 1.2, and 2.0) was used.
  5. The GSE30881 data set18 contained gene expression profiles of 23 DLBCL samples and ten healthy controls, collected in order to investigate the changes in NF-κB pathway activation. The platform was Affymetrix GPL3738 (Affymetrix Canine Genome 2.0 Array).

Pretreatment of raw data

Probes were mapped to genes according to the annotation files. For a gene corresponding to more than one probe, the average probe value was taken as the expression value of that gene.19 Subsequently, log2 conversion and quantile normalization20 were applied to the data.
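The three pretreatment steps can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the probe and gene identifiers, and the toy intensities are all hypothetical, and ties in quantile normalization are broken arbitrarily for simplicity.

```python
import math
from collections import defaultdict

def average_probes(probe_values, probe_to_gene):
    """Collapse probe-level rows to gene-level rows by averaging per sample."""
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # probes without annotation are skipped
            grouped[gene].append(values)
    return {
        gene: [sum(col) / len(col) for col in zip(*rows)]
        for gene, rows in grouped.items()
    }

def log2_transform(matrix):
    """Apply log2 to every expression value (values must be positive)."""
    return {gene: [math.log2(v) for v in vals] for gene, vals in matrix.items()}

def quantile_normalize(matrix):
    """Force every sample (column) to share the same value distribution."""
    genes = list(matrix)
    n_samples = len(matrix[genes[0]])
    columns = [[matrix[g][j] for g in genes] for j in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(vals) / n_samples for vals in zip(*sorted_cols)]
    normalized = {g: [0.0] * n_samples for g in genes}
    for j, col in enumerate(columns):
        order = sorted(range(len(genes)), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[genes[i]][j] = reference[rank]
    return normalized

# Hypothetical raw intensities: four probes measured in two samples.
probe_values = {
    "p1": [4.0, 8.0],
    "p2": [16.0, 8.0],
    "p3": [2.0, 32.0],
    "p4": [64.0, 4.0],
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B", "p4": "GENE_C"}

gene_level = average_probes(probe_values, probe_to_gene)
normalized = quantile_normalize(log2_transform(gene_level))
```

After normalization, the sorted values of every sample are identical, which is the defining property of quantile normalization.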

A total of 4,356 and 16,454 unique genes were identified in GSE9327 and GSE30881, respectively. Both GSE10846 and GSE11318 were obtained using GPL570, and a total of 20,693 unique genes were acquired. Besides, 18,403 unique genes were identified in GSE32918.

Clinical information

The expression profiles of GSE10846 and GSE11318 provided clinical information such as age, sex, stage, lactate dehydrogenase (LDH) level, extranodal versus nodal presentation, treatment, subtype, survival time, and survival status. GSE32918 described age, sex, treatment, subtype, survival time, and survival status. According to these three data sets, we found that “stage” could well separate samples into different groups with diverse survival time while “age”, “sex”, and “treatment” could not.

Screening of DEGs

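The significance analysis of microarrays (SAM) algorithm21 was adopted to screen out DEGs. It reduces the false-positive rate in multiple testing by controlling the false discovery rate. The relative difference (statistic d) is calculated as follows:

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s + s_0} \tag{1}$$

Statistic d measures the relative difference in gene expression levels and is a corrected t-statistic. $\bar{x}_1$ represents the average expression level of a gene under one state, $\bar{x}_2$ represents the average expression level of the gene under the other state, $s$ represents the gene-specific scatter (the standard deviation of repeated expression measurements), and $s_0$ is the small positive constant that SAM adds to the denominator so that genes with very low variance do not produce inflated values of d.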

An adjusted P-value <0.05 and |log fold change| >1.5 were set as the thresholds for selecting DEGs.
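For illustration, the relative difference d of the published SAM method can be computed for a single gene as in the sketch below. The function name sam_d, the toy expression values, and the fixed value s0=0.5 are assumptions made for the example; the actual SAM procedure chooses s0 from the data and assesses significance by permutation, neither of which is shown here.

```python
import math

def sam_d(group1, group2, s0):
    """Relative difference d for one gene: mean difference over (s + s0)."""
    n1, n2 = len(group1), len(group2)
    mean1 = sum(group1) / n1
    mean2 = sum(group2) / n2
    # Pooled sum of squared deviations around each group mean.
    ss = sum((x - mean1) ** 2 for x in group1) + sum((x - mean2) ** 2 for x in group2)
    # Gene-specific scatter, identical to the denominator of a pooled t-test.
    s = math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (mean1 - mean2) / (s + s0)

# Hypothetical log2 expression values of one gene in tumors and controls.
tumor = [8.0, 9.0, 10.0]
control = [5.0, 6.0, 7.0]

d_plain = sam_d(tumor, control, s0=0.0)  # reduces to the ordinary t-statistic
d_sam = sam_d(tumor, control, s0=0.5)    # s0 > 0 shrinks d toward zero
```

With s0=0 the statistic equals the ordinary two-sample t-statistic; a positive s0 damps it, most strongly for genes whose scatter s is small.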

Functional enrichment analysis

Gene ontology enrichment analysis and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis were performed for the DEGs using DAVID to examine the potentially altered functions and pathways of these DEGs.22 A false discovery rate <0.05 was set as the cutoff.
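The idea behind such an enrichment test can be sketched with a plain hypergeometric tail probability. This is a simplification for illustration only: DAVID reports a modified Fisher exact P-value, which is more conservative, and the background size, term size, and overlap used below are hypothetical numbers rather than values from this study (only the list size of 31 genes is taken from the text).

```python
from math import comb

def enrichment_p_value(n_background, n_term, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random
    from a background in which n_term genes carry the annotation."""
    total = comb(n_background, n_selected)
    upper = min(n_term, n_selected)
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical scenario: 20,000 background genes, 200 of them annotated with
# a given term, and 5 of the 31 selected genes carrying that annotation.
p_value = enrichment_p_value(20000, 200, 31, 5)
```

In practice, one such P-value is obtained per term and the whole set is then corrected for multiple testing before the false discovery rate cutoff is applied.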

Survival analysis

The Kaplan–Meier method (K–M method; product-limit method) is suitable for analysis with small sample sizes. The procedure is as follows: 1) Sort the samples in ascending order of survival time and rank them i=1, 2, …, n. 2) List the number of subjects still at risk at the beginning of each time point (in practice, a short interval). 3) Calculate the probability of death q and the survival probability p (p=1−q) at each time point. 4) Calculate the survival rate S(ti) at each time point, which equals the product of the survival probabilities from the starting point to ti: S(ti)=p1×p2×…×pi. Finally, plot the survival curves with survival time on the abscissa and survival rate on the ordinate.
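The four steps above can be sketched in a few lines of Python. This is an illustrative re-implementation, not the survfit routine used in the study; the function name kaplan_meier and the toy survival data are assumptions, and censored observations (status 0) are handled in the usual product-limit way, which the step list does not spell out.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct death time.

    events[i] is 1 if subject i died at times[i] and 0 if censored.
    """
    data = sorted(zip(times, events))  # step 1: ascending survival time
    at_risk = len(data)                # step 2: subjects still at risk
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every subject whose recorded time equals t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            q = deaths / at_risk       # step 3: probability of death at t
            survival *= 1.0 - q        # step 4: running product of p = 1 - q
            curve.append((t, survival))
        at_risk -= removed             # deaths and censored subjects leave
    return curve

# Hypothetical follow-up times (months) and status (1 = death, 0 = censored).
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

The curve only drops at death times; a censored subject lowers the number at risk for later time points without producing a step.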

Survival analysis was performed with the function survfit from the R package survival.23 Differences in survival curves between groups were analyzed with the log-rank test using the function survdiff from the same package.24
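For readers unfamiliar with the log-rank test, a minimal two-group version is sketched below. It is written in Python for illustration and is not the survdiff code used in the study; the function name and the toy data are hypothetical, and only the two-group case with one degree of freedom is covered, whereas the comparisons in this study involve three or four subtypes.

```python
import math

def logrank_two_groups(times, events, groups):
    """Two-group log-rank test; groups[i] is 0 or 1.

    Returns (chi_square, p_value) with one degree of freedom.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)                       # at risk
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d1 = sum(
            1 for x, e, g in zip(times, events, groups)
            if x == t and e == 1 and g == 1
        )
        observed1 += d1
        expected1 += d * n1 / n  # expected deaths in group 1 under H0
        if n > 1:                # hypergeometric variance of d1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi_square = (observed1 - expected1) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical data: group 1 dies early, group 0 dies late, no censoring.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 1, 0, 0, 0, 0]
chi_square, p_value = logrank_two_groups(times, events, groups)
```

The statistic compares the observed number of deaths in one group with the number expected if both groups shared the same hazard at every death time.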

Screening of risk factors

Cox univariate regression analysis was carried out using function coxph from package survival to screen out risk factors related to survival.25 The formula is as follows:

$$h(t, X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m) \tag{2}$$

h0(t) is the baseline hazard function, that is, the hazard when all covariates X1, X2, …, Xm are 0 or under standard conditions; it is generally unknown. h(t, X) represents the hazard function when each covariate X is given a fixed value, and it is proportional to h0(t). Therefore, the model is also known as the proportional hazards model. X1, X2, …, Xm are covariates, while β1, β2, …, βm are regression coefficients. When the regression coefficient βi>0, that is, the hazard ratio exp(βi)>1, the covariate is a risk factor: the greater its value, the shorter the expected survival time. When βi<0, that is, the hazard ratio <1, the covariate is a protective factor: the greater its value, the longer the expected survival time.
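How a single regression coefficient is estimated can be sketched as follows. This is an illustrative Python re-implementation of univariate Cox regression by Newton–Raphson maximization of the partial likelihood, not the coxph routine used in the study; the function name, the binary "high expression" covariate, and the toy data are hypothetical, ties are treated in the Breslow manner, and the P-value is a Wald test.

```python
import math

def cox_univariate(times, events, x, n_iter=25, tol=1e-9):
    """Fit h(t, x) = h0(t) * exp(beta * x); return (beta, hazard ratio, p)."""
    beta = 0.0
    information = 0.0
    for _ in range(n_iter):
        score = 0.0
        information = 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if e_i != 1:
                continue  # censored subjects only appear in risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:  # subject j is still at risk at time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean = s1 / s0
            score += x_i - mean                   # first derivative
            information += s2 / s0 - mean ** 2    # minus second derivative
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    standard_error = 1.0 / math.sqrt(information)
    z = beta / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))    # two-sided Wald test
    return beta, math.exp(beta), p_value

# Hypothetical data: x = 1 marks high expression of a gene; all deaths observed.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
x = [1, 0, 1, 1, 0, 1, 0, 0]
beta, hazard_ratio, p_value = cox_univariate(times, events, x)
```

In this toy example beta is positive, so the hazard ratio exceeds 1 and high expression would be read as a risk factor, although with eight subjects the Wald test is far from significant.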

Results

Differentially expressed genes and enriched biological functions

According to the aforementioned criteria, a total of 437 DEGs were identified in DLBCL from the data set GSE9327 and 1,457 DEGs from the data set GSE30881. Thirty-one overlapping genes were selected, and functional enrichment analysis showed that they are mainly involved in the regulation of lymphocyte activation, immune response, and interleukin-mediated signaling pathway (Figure 1), suggesting that the 31 DEGs may be closely associated with the development of DLBCL.

Figure 1 Functional enrichment analysis result for the 31 differentially expressed genes (DEGs) (top 20 gene ontology [GO] terms ranked by the significance).
Notes: X-axis represents the adjusted P-value transformed by log2, and Y-axis denotes the enriched GO terms.
Abbreviation: IL, interleukin.

Moreover, 47 DLBCL-related genes were acquired via literature retrieval.2,15,26–31

Survival analysis result

The 31 DEGs and the 47 DLBCL-related genes were combined into a total of 78 potential prognostic genes, which were used to group the samples from the other three data sets and to test whether the resulting groups differed in survival time.

  1. In the data set of GSE10846, 71 out of the 78 genes were detected. Using hierarchical clustering, the 71 genes could well cluster the 416 DLBCL samples into four subtypes (Figure 2A). The differences in survival curves of the four subtypes were found to be significant (P=7.65e-11; Figure 2B).
  2. In the data set of GSE11318, 71 out of the 78 genes were detected. Using hierarchical clustering, the 71 genes could well classify the 203 DLBCL samples into three subtypes (Figure 2C). The difference in survival curves of the three subtypes was found to be significant (P=7.5e-05; Figure 2D).
  3. In the data set of GSE32918, 69 out of the 78 genes were detected. Some samples were sequenced repeatedly, and thus average expression levels were calculated as the final values. Using hierarchical clustering, the 69 genes could cluster the 172 DLBCL samples into three subtypes (Figure 2E). The difference in survival curves of the three subtypes was found to be significant (P=0.013; Figure 2F).
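The clustering step can be illustrated with a small agglomerative procedure. The sketch below is written in Python for illustration; the study does not state the distance measure or linkage rule, so Euclidean distance and average linkage are assumptions, as are the function names and the toy expression profiles.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hierarchical_clusters(profiles, k):
    """Agglomerative clustering with average linkage, cut at k clusters.

    Returns a list of clusters, each a sorted list of sample indices.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                dist = sum(
                    euclidean(profiles[i], profiles[j])
                    for i in clusters[a]
                    for j in clusters[b]
                ) / (len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical profiles: six samples measured on two genes.
profiles = [
    [1.0, 1.1], [0.9, 1.0],
    [5.0, 5.2], [5.1, 4.9],
    [9.0, 0.5], [9.2, 0.4],
]
subtypes = hierarchical_clusters(profiles, k=3)
```

Each resulting cluster plays the role of one subtype, whose survival curve can then be estimated and compared with those of the other clusters.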

Figure 2 Subtyping of diffuse large B-cell lymphoma (DLBCL) in three gene data sets using the 78 predicted and curated DLBCL-related genes.
Notes: (A, C, and E) Hierarchical clustering that denotes the subtypes of DLBCL clustered by the 78 genes in the gene data sets of GSE10846, GSE11318, and GSE32918, respectively; (B, D, and F) Kaplan–Meier survival curves of the subtypes in the gene data sets of GSE10846, GSE11318, and GSE32918, respectively.

Prognostic genes

The association between each gene and the survival of DLBCL patients was assessed with Cox univariate regression analysis to further screen out genes with prognostic value. In the data set of GSE10846, 45 genes were found to have a significant prognostic effect, while in GSE11318, 33 genes had a prognostic effect, and in GSE32918, eleven genes showed prognostic value. Five prognostic genes were common among the three data sets (Figure 3; Table 1). According to the sign of the regression coefficient, lymphocyte cytosolic protein 2 (LCP2) and tumor necrosis factor receptor superfamily member 9 (TNFRSF9) might be related to poor prognosis, while fucosyltransferase 8 (FUT8), interferon regulatory factor 4 (IRF4), and transducin-like enhancer of split 1 (TLE1) might be associated with favorable prognosis.

Figure 3 Venn diagram of the prognostic genes from three gene expression data sets (GSE10846, GSE11318, and GSE32918).

Table 1 Five common prognostic genes

Discussion

In this study, five gene expression data sets were downloaded from the Gene Expression Omnibus. Thirty-one common DEGs were identified from two gene expression data sets. They were mainly enriched in the regulation of lymphocyte activation, immune response, and interleukin-mediated signaling pathway, which are closely associated with the development of DLBCL. Combined with 47 DLBCL-related genes acquired by literature retrieval, 78 potential prognostic genes were obtained, which could cluster the DLBCL samples from the other three gene expression data sets into several subtypes with significant differences in survival. Prognostic genes were screened out via Cox univariate regression analysis, and five common genes were identified, namely LCP2, TNFRSF9, FUT8, IRF4, and TLE1.

TNFRSF9 and IRF4 are two known prognostic genes of DLBCL.32,33 TNFRSF9 is a member of the TNF-receptor superfamily that can induce proliferation in peripheral monocytes. Alizadeh et al32 indicate that expression levels of LIM domain only 2 (LMO2) and TNFRSF9 powerfully predict the overall survival in patients with DLBCL. TNFRSF9 can also serve as a target for the treatment of DLBCL. The study by Houot et al34 demonstrates that anti-CD137 therapy has a potent antilymphoma activity in a mouse model. IRF4 belongs to the interferon regulatory factor (IRF) family of transcription factors. Salaverria et al35 report that translocations activating IRF4 identify a subtype of germinal center-derived B-cell lymphoma affecting predominantly children and young adults. Therefore, it may be a therapeutic target of DLBCL.36

LCP2, FUT8, and TLE1 may be novel prognostic genes of DLBCL. LCP2 plays a positive role in promoting T-cell development and activation as well as mast cell and platelet function. FUT8 is an enzyme belonging to the family of fucosyltransferases. It may contribute to the malignancy of cancer cells and to their invasive and metastatic capabilities.37 Chen et al38 found that FUT8 is upregulated during epithelial–mesenchymal transition via the transactivation of β-catenin/lymphoid enhancer-binding factor (LEF)-1. Based on these findings, we speculate that FUT8 might play a similar role in DLBCL and thus contribute to the metastasis of DLBCL. TLE1 is a multitasked transcriptional corepressor that acts through the acute myelogenous leukemia 1, Wnt, and Notch signaling pathways. Promoter CpG island hypermethylation-associated inactivation of TLE1 has been observed in DLBCL.39 Fraga et al40 further point out that TLE1 epigenetic inactivation contributes to the development of hematologic malignancies by disrupting critical differentiation and growth-suppressing pathways. However, the exact role of TLE1 in DLBCL remains to be explored. We expect that further research may reveal clinical applications of the three genes.

Overall, five critical genes with prognostic effect were identified in DLBCL via bioinformatic analysis of existing gene expression data. Two of the five genes have been reported previously, while the other three are novel predictors. Further research on these genes may benefit molecular subtyping and provide potential therapeutic targets for DLBCL.

Highlights

  1. A set of 31 common DEGs was identified from two gene expression data sets.
  2. In total, 78 potential prognostic genes were proposed for subtyping of DLBCL.
  3. Five prognostic genes, including three novel ones, were identified in DLBCL.

Disclosure

The authors report no conflicts of interest in this work.


References

1.

Morton LM, Wang SS, Devesa SS, Hartge P, Weisenburger DD, Linet MS. Lymphoma incidence patterns by WHO subtype in the United States, 1992–2001. Blood. 2006;107(1):265–276.

2.

Alizadeh AA, Eisen MB, Davis RE, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000;403(6769):503–511.

3.

Hoefnagel JJ, Dijkman R, Basso K, et al. Distinct types of primary cutaneous large B-cell lymphoma identified by gene expression profiling. Blood. 2005;105(9):3671–3678.

4.

Visco C, Li Y, Xu-Monette ZY, et al. Comprehensive gene expression profiling and immunohistochemical studies support application of immunophenotypic algorithm for molecular subtype classification in diffuse large B-cell lymphoma: a report from the International DLBCL Rituximab-CHOP Consortium Program Study. Leukemia. 2012;26(9):2103–2113.

5.

Xu Q, Tan C, Ni S, et al. Identification and validation of a two-gene expression index for subtype classification and prognosis in diffuse large B-cell lymphoma. Sci Rep. 2015;5:10006.

6.

Lenz G, Wright GW, Emre NC, et al. Molecular subtypes of diffuse large B-cell lymphoma arise by distinct genetic pathways. Proc Natl Acad Sci U S A. 2008;105(36):13520–13525.

7.

Lossos IS, Morgensztern D. Prognostic biomarkers in diffuse large B-cell lymphoma. J Clin Oncol. 2006;24(6):995–1007.

8.

Dunleavy K, Wilson WH. Differential role of BCL2 in molecular subtypes of diffuse large B-cell lymphoma. Clin Cancer Res. 2011;17(24):7505–7507.

9.

Winter JN, Weller EA, Horning SJ, et al. Prognostic significance of Bcl-6 protein expression in DLBCL treated with CHOP or R-CHOP: a prospective correlative study. Blood. 2006;107(11):4207–4213.

10.

Hu S, Xu-Monette ZY, Tzankov A, et al. MYC/BCL2 protein coexpression contributes to the inferior survival of activated B-cell subtype of diffuse large B-cell lymphoma and demonstrates high-risk gene expression signatures: a report from The International DLBCL Rituximab-CHOP Consortium Program. Blood. 2013;121(20):4021–4031.

11.

Gratzinger D, Zhao S, Tibshirani RJ, et al. Prognostic significance of VEGF, VEGF receptors, and microvessel density in diffuse large B cell lymphoma treated with anthracycline-based chemotherapy. Lab Invest. 2008;88(1):38–47.

12.

Hussain AR, Uddin S, Ahmed M, et al. Prognostic significance of XIAP expression in DLBCL and effect of its inhibition on AKT signalling. J Pathol. 2010;222(2):180–190.

13.

Barrans SL, Crouch S, Care MA, et al. Whole genome expression profiling based on paraffin embedded tissue can be used to classify diffuse large B-cell lymphoma and predict clinical outcome. Br J Haematol. 2012;159(4):441–453.

14.

Care MA, Cocco M, Laye JP, et al. SPIB and BATF provide alternate determinants of IRF4 occupancy in diffuse large B-cell lymphoma linked to disease heterogeneity. Nucleic Acids Res. 2014;42(12):7591–7610.

15.

Lenz G, Wright G, Dave SS, et al; Lymphoma/Leukemia Molecular Profiling Project. Stromal gene signatures in large-B-cell lymphomas. N Engl J Med. 2008;359(22):2313–2323.

16.

Cardesa-Salzmann TM, Colomo L, Gutierrez G, et al. High microvessel density determines a poor outcome in patients with diffuse large B-cell lymphoma treated with rituximab plus chemotherapy. Haematologica. 2011;96(7):996–1001.

17.

Ruiz-Vela A, Aggarwal M, de la Cueva P, et al. Lentiviral (HIV)-based RNA interference screen in human B-cell receptor regulatory networks reveals MCL1-induced oncogenic pathways. Blood. 2008;111(3):1665–1676.

18.

Mudaliar MA, Haggart RD, Miele G, et al. Comparative gene expression profiling identifies common molecular signatures of NF-kappaB activation in canine and human diffuse large B cell lymphoma (DLBCL). PLoS One. 2013;8(9):e72591.

19.

Ma H, Schadt EE, Kaplan LM, Zhao H. COSINE: condition-specific sub-network identification using a global optimization method. Bioinformatics. 2011;27(9):1290–1298.

20.

Ferrari F, Bortoluzzi S, Coppe A, et al. Novel definition files for human GeneChips based on GeneAnnot. BMC Bioinformatics. 2007;8:446.

21.

Larsson O, Wahlestedt C, Timmons JA. Considerations when using the significance analysis of microarrays (SAM) algorithm. BMC Bioinformatics. 2005;6:129.

22.

Dennis G Jr, Sherman BT, Hosack DA, et al. DAVID: database for annotation, visualization, and integrated discovery. Genome Biol. 2003;4(5):3.

23.

Xu Y, Gao X, Wang Z. [Nonparametric method of estimating survival functions containing right-censored and interval-censored data]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2014;31(2):267–272.

24.

Jones MP, Crowley J. A general class of nonparametric tests for survival analysis. Biometrics. 1989;45(1):157–170.

25.

Andersen PK, Gill RD. Cox’s regression model for counting processes: a large sample study. Ann Stat. 1982;10(4):1100–1120.

26.

Rosenwald A, Wright G, Chan WC, et al; Lymphoma/Leukemia Molecular Profiling Project. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med. 2002;346(25):1937–1947.

27.

Shipp MA, Ross KN, Tamayo P, et al. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med. 2002;8(1):68–74.

28.

Lossos IS, Czerwinski DK, Alizadeh AA, et al. Prediction of survival in diffuse large-B-cell lymphoma based on the expression of six genes. N Engl J Med. 2004;350(18):1828–1837.

29.

Cai YD, Huang T, Feng KY, Hu L, Xie L. A unified 35-gene signature for both subtype classification and survival prediction in diffuse large B-cell lymphomas. PLoS One. 2010;5(9):e12726.

30.

Rimsza LM, Unger JM, Tome ME, Leblanc ML. A strategy for full interrogation of prognostic gene expression patterns: exploring the biology of diffuse large B cell lymphoma. PLoS One. 2011;6(8):e22267.

31.

Wright G, Tan B, Rosenwald A, Hurt EH, Wiestner A, Staudt LM. A gene expression-based method to diagnose clinically distinct subgroups of diffuse large B cell lymphoma. Proc Natl Acad Sci U S A. 2003;100(17):9991–9996.

32.

Alizadeh AA, Gentles AJ, Alencar AJ, et al. Prediction of survival in diffuse large B-cell lymphoma based on the expression of 2 genes reflecting tumor and microenvironment. Blood. 2011;118(5):1350–1358.

33.

Richards KL, Motsinger-Reif AA, Chen HW, et al. Gene profiling of canine B-cell lymphoma reveals germinal center and postgerminal center subtypes with different survival times, modeling human DLBCL. Cancer Res. 2013;73(16):5029–5039.

34.

Houot R, Goldstein MJ, Kohrt HE, et al. Therapeutic effect of CD137 immunomodulation in lymphoma and its enhancement by Treg depletion. Blood. 2009;114(16):3431–3438.

35.

Salaverria I, Philipp C, Oschlies I, et al; Molecular Mechanisms in Malignant Lymphomas Network Project of the Deutsche Krebshilfe, German High-Grade Lymphoma Study Group, Berlin-Frankfurt-Münster-NHL trial group. Translocations activating IRF4 identify a subtype of germinal center-derived B-cell lymphoma affecting predominantly children and young adults. Blood. 2011;118(1):139–147.

36.

Shaffer AL, Emre NC, Romesser PB, Staudt LM. IRF4: immunity. malignancy! therapy? Clin Cancer Res. 2009;15(9):2954–2961.

37.

Ito Y, Miyauchi A, Yoshida H, et al. Expression of alpha1,6-fucosyltransferase (FUT8) in papillary carcinoma of the thyroid: its linkage to biological aggressiveness and anaplastic transformation. Cancer Lett. 2003;200(2):167–172.

38.

Chen CY, Jan YH, Juan YH, et al. Fucosyltransferase 8 as a functional regulator of nonsmall cell lung cancer. Proc Natl Acad Sci U S A. 2013;110(2):630–635.

39.

Castellano G, Torrisi E, Ligresti G, et al. Yin Yang 1 overexpression in diffuse large B-cell lymphoma is associated with B-cell transformation and tumor progression. Cell Cycle. 2010;9(3):557–563.

40.

Fraga MF, Berdasco M, Ballestar E, et al. Epigenetic inactivation of the Groucho homologue gene TLE1 in hematologic malignancies. Cancer Res. 2008;68(11):4116–4122.

Creative Commons License © 2015 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.