24-07-2017 | Systemic lupus erythematosus | Article

Novel risk genes for systemic lupus erythematosus predicted by random forest classification

Journal: Scientific Reports

Authors: Jonas Carlsson Almlöf, Andrei Alexsson, Juliana Imgenberg-Kreuz, Lina Sylwan, Christofer Bäcklin, Dag Leonard, Gunnel Nordmark, Karolina Tandre, Maija-Leena Eloranta, Leonid Padyukov, Christine Bengtsson, Andreas Jönsen, Solbritt Rantapää Dahlqvist, Christopher Sjöwall, Anders A. Bengtsson, Iva Gunnarsson, Elisabet Svenungsson, Lars Rönnblom, Johanna K. Sandling, Ann-Christine Syvänen

Publisher: Nature Publishing Group UK

Abstract

Genome-wide association studies have identified risk loci for SLE, but a large proportion of the genetic contribution to SLE remains unexplained. To detect novel risk genes and to predict an individual’s SLE risk, we designed a random forest classifier using SNP genotype data generated on the “Immunochip” from 1,160 patients with SLE and 2,711 controls. Using gene importance scores defined by the random forest classifier, we identified 15 potential novel risk genes for SLE. Of these, 12 are associated with autoimmune diseases other than SLE, whereas three genes (ZNF804A, CDK1, and MANF) have not previously been associated with autoimmunity. Random forest classification also allowed prediction of patients at risk for lupus nephritis with an area under the curve of 0.94. By allele-specific gene expression analysis, we detected cis-regulatory SNPs that affect the expression levels of six of the top 40 genes designated by the random forest analysis, indicating a regulatory role for the identified risk variants. The top 40 genes from the prediction were overrepresented for differential expression in B and T cells according to RNA-sequencing of samples from five healthy donors, with over-expression more frequent in B cells than in T cells.
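
The abstract reports predictive performance as an area under the (ROC) curve. As a minimal illustration of what that number measures, and not a reproduction of the authors' pipeline, the sketch below computes the AUC as the fraction of case/control pairs in which the case receives the higher classifier score, with ties counted as one half. The function name `roc_auc`, the labels, and the scores are hypothetical and chosen only for illustration; they are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: sequence of 0/1 values (1 = case, 0 = control)
    scores: sequence of classifier scores, higher = more case-like
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    # Count case/control pairs ranked correctly; a tie counts as half a pair.
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical scores, e.g. the fraction of trees voting "case" per individual.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)  # 11 of the 12 case/control pairs are ranked correctly
```

Under this reading, an AUC of 0.94 means that a randomly chosen individual from the positive class would receive a higher score than a randomly chosen individual from the negative class in about 94% of such pairs. The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but not for large cohorts.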