Disease clusters subsequent to anxiety and stress-related disorders and their genetic determinants

0
Disease clusters subsequent to anxiety and stress-related disorders and their genetic determinants

Study design

The analytic process included two parts, namely phenotypic and genetic analyses (Fig. 1). In the phenotypic analysis, we undertook a phenome-wide association study (PheWAS), followed by both comorbidity network analysis and disease trajectory analysis to determine robust disease clusters (i.e., associated diseases with both temporal and non-temporal relationships) following a diagnosis of anxiety or stress-related disorders in the Swedish cohort (i.e., the exploratory dataset). To validate identified disease clusters and the diseases they entailed, we performed similar analyses in the UK cohort of similar study designs (i.e., the validation dataset). Regarding the genetic analyses based on individual-level genotyping data of the UK cohort, we first calculated a cluster-specific susceptibility score, which was designated as a quantitative index of individuals’ susceptibility to a specific disease cluster, and then performed GWAS analysis, gene mapping, and enrichment analysis to identify risk genes and biological pathways that may count for the pathogenesis of such a disease cluster after anxiety or stress-related disorders.

Swedish cohort

The Swedish Patient Register includes nearly complete health records of inpatient care since 1987 and outpatient specialist care since 2001 in Sweden49. By cross-linkage to the Total Population Register using the unique Swedish personal identification numbers, we included all Swedish-born individuals residing from 2001 to 2016 in Sweden and excluded those with any pre-existing psychiatric disorders or history of severe somatic diseases at the time of diagnosis determined by the Charlson Comorbidity Index before 200150, leading to a study population of 8,456,485 (Supplementary Fig. 1). We focused on Swedish-born individuals in the present study to reduce the heterogeneity in genetic background as well as other sociodemographic factors, including differential health-seeking behaviors. Among these, we identified all individuals who received a first primary diagnosis in specialized care of anxiety or stress-related disorders from 2001 to 2016 (N = 212,767, 63.3% with anxiety disorder), and a set of ten unaffected individuals randomly selected from the study base per exposed patient, individually matched on sex and birth year using incidence density sampling (N = 2,127,670), without history of other psychiatric disorders and severe somatic diseases. The diagnostic date of anxiety or stress-related disorders was used as the index date for the start of follow-up of both the exposed patients and their matched unaffected individuals.

Follow-up

To minimize the concern of reverse causality, we followed all participants of the Swedish cohort for all medical conditions from 6 months after the index date until death, first diagnosis of anxiety or stress-related disorders (for matched unexposed individuals), emigration, or the end of the study period (i.e., 31 December 2016), whichever occurred first.

Ascertainment of anxiety and stress-related disorders and subsequent medical conditions

In the Swedish cohort, we defined anxiety or stress-related disorders as any first specialist care diagnosis in an inpatient or outpatient hospital visit, where these disorders were identified as the primary discharge diagnosis, according to the Swedish Patient Register, using the 10th Swedish revision of the International Classification of Diseases (ICD-10) codes (anxiety: F40 and F41, stress-related disorder: F43) (Supplementary Table 1). In the PheWAS, medical conditions refer to any disease or health outcomes recorded in the Patient Register comprising inpatient and outpatient diagnoses. We ascertained medical conditions through the primary diagnosis from the Patient Register, using the 3-digit ICD-10 codes (A00 to N99) (Supplementary Table 1). The diagnostic codes for most common diseases in the Patient Register have been validated, showing a satisfactory accuracy with positive predicted values [PPV] of 85–95% for most common diseases49, 81% for social anxiety disorder51, and 75–90% for PTSD52. We obtained the highest level of education and income at the year of index date from the Swedish Longitudinal Integration Database for Health Insurance and Labor Market53.

UK cohort

The validation dataset (UK cohort) was constructed based on the UK Biobank, using a similar design (Supplementary Fig. 2). The UK Biobank (UKB) is a community-based cohort study that enrolled half a million participants aged 40–69 at recruitment between 2006 and 2010 across England, Scotland, and Wales. Details of the study design are described elsewhere54. The inpatient hospital data, obtained from the Hospital Episode Statistics database, the Scottish Morbidity Record, and the Patient Episode Database, cover all UK Biobank participants since 199755. The primary care data, provided by various general practitioner computer system suppliers, cover ~45% of participants since 198555. We first excluded individuals who had withdrawn from the UK Biobank (n = 108) or had conflicting information (n = 1), leaving 502,398 eligible participants (Supplementary Fig. 2). Among these, we constructed a matched cohort, including patients with newly inpatient/primary care diagnosed anxiety or stress-related disorders between January 1, 1997 and December 31, 2019 (N = 23,365) who had no history of severe somatic diseases or other psychiatric disorders, and up to ten unaffected individuals for each patient who were randomly selected and individually matched by sex and year of birth using incidence density sampling (N = 233,596). The diagnostic date of anxiety or stress-related disorders was used as the index date for the start of follow up of the exposed and matched unaffected individuals.

We followed all participants of the UK cohort for all medical conditions from 6 months after the index date until death, first diagnosis of anxiety or stress-related disorders (for matched unexposed individuals), loss to follow-up56, or the end of the study period (i.e., 31 December 2019), whichever occurred first.

In the UK cohort, we defined a new diagnosis of anxiety or stress-related disorder as a first primary diagnosis based on the inpatient and primary care data, using the ICD-10 codes for the inpatient hospital data (Supplementary Data 1) and the version 2 and version 3 read codes (i.e., Read v2 and Read v3) for the primary care data (Supplementary Table 4). Medical conditions were ascertained from the primary and secondary diagnoses through the inpatient hospital data (Supplementary Data 1). Information on the highest educational level and Townsend Deprivation Index (proxy for socioeconomic status, with a higher index score indicating a higher degree of deprivation)57 were collected at recruitment through questionnaires.

Statistical analyses

The age distribution differed between the Swedish cohort and the UK cohort (median age at diagnosis 32 versus 52). To facilitate validation between the two cohorts, we selected a sub-cohort of the Swedish cohort, namely participants with age >second tertile (median age at index date = 51, N = 70,026, Table 1), and used this sub-cohort as the exploration dataset throughout the main analyses. A sensitivity analysis was performed using the entire Swedish cohort with all age groups.

Exploration of associated disease clusters in the Swedish cohort

In the exploration dataset (Swedish cohort), we identified a total of 454 medical conditions diagnosed six months after the diagnosis of anxiety or stress-related disorders. To ensure statistical power, we included only medical conditions with a prevalence ≥0.5% among patients with anxiety or stress-related disorders. We performed a PheWAS to investigate the associations between anxiety or stress-related disorders and each medical condition, using Cox regression models stratified by matching variables (i.e., sex and birth year) with adjustment for highest education and income. Individuals with a prior diagnosis of the studied medical condition were excluded, when estimating the association with each medical condition. Only medical conditions with statistically significant positive associations, after adjusting for multiple testing (hazard ratio [HR] > 1, and false discovery rate [FDR] adjusted p value [i.e., q value] <0.05), were included in the following analyses.

Among the identified medical conditions from the PheWAS, we constructed all possible disease pairs as disease 1 (D1) and disease 2 (D2) pairs and only analyzed disease pairs that co-occurred with a prevalence ≥0.25% among patients with anxiety or stress-related disorders. To ensure comorbidity strength, we calculated the relative risk (RR) and Pearson’s correlation (Φ-correlation) for each disease pair. For each disease pair, a sub-cohort was formed through excluding patients with a history of D1 and D2 before their index date (i.e., the diagnosis date of anxiety or stress-related disorders). The formulas for RR and Φ-correlation were calculated using the following formulas:

$$RR_ij=\fracC_ijN_ijC_iC_j$$

$$\Phi _ij=\fracC_ijN_ij-C_iC_j\sqrtC_iC_j(N_ij-C_i)(N_ij-C_j)$$

Where \(C_ij\) is the number of patients affected by both D1 and D2, and \(N_ij\) is the number of individuals in the sub-cohort, while \(C_i\) and \(C_j\) are the number of patients affected by D1 and D2 respectively. For both RR and Φ-correlation measures, the significance of RR = 0 and Φ = 0 can be both determined using z-test (given large sample size in our study). The corresponding z-score for RR and Φ-correlation were calculated using the following formula21,58:

$$z_ij^{{{{\rmRR}}}}=\frac{{\mathrmln}\,(RR_ij)}{\sqrt{\frac1C_ij-\frac1N_ij+\frac1C_iC_j/N_ij-\frac1N_ij}}$$

$$z_ij^\varPhi =\frac{\varPhi _ij\sqrt{\max (C_ij,\, C_j)-2}}{\sqrt{1-{\varPhi _{{ij}}}^2}}$$

P values were then calculated using the z-score and adjusted for the issue of multiple testing. Only disease pairs with strong comorbidity strength (i.e., RR > 1, Φ-correlation > 0, and q value < 0.05) were included in the comorbidity network and disease trajectory analyses.

In the comorbidity network analysis, we used logistic regression to determine the magnitude of association between the disease pairs with strong comorbidity strength (i.e., significant non-temporal relationship). Disease pairs with confirmed positive association (i.e., odds ratio [OR] > 1 and q value < 0.05) were selected to construct a comorbidity network. The comorbidity network was then subdivided into different comorbidity modules with high intrinsic connectivity determined by the Louvain clustering algorithm59. For disease trajectory analysis, binomial tests were used to assess the temporal direction (i.e., D1 → D2 or D2 → D1 among D1D2 pairs) among disease pairs with strong comorbidity strength (i.e., significant temporal relationship). For each disease pair with a determined temporal order, we constructed a nested case-control dataset in the sub-cohort, by considering D2 as outcome and D1 as exposure. For each patient with D2, at most two controls were matched by sex and birth year using intensity density sampling. to confirm the magnitude of the association between the disease pair, we then used conditional logistic regression by adjusting for education level and Townsend Deprivation Index. We then included the disease pairs with positive associations (OR > 1 and q value < 0.05) to construct the disease trajectory.

As disease trajectory analysis is designed to visualize sequential disease progression while comorbidity network analysis captures disease groups with high intrinsic connectivity, the combined use of those two data-driven approaches can theoretically lead to the identification of more reliable disease clusters (i.e., groups of diseases with both temporal and non-temporal relationships). Thus, based on results from the aforementioned disease trajectory and comorbidity network analyses, we defined disease clusters as the first layer diseases (D1) and their subsequent diseases in a disease trajectory that were also located within the same comorbidity module (Fig. 3). For example, in the disease trajectories, out of all diseases in the first layer, “F32” and “F10” were located in one comorbidity module (i.e., the module predominated by psychiatric disorders) derived from the comorbidity network. We then found the following diseases of “F32” and “F10” in the trajectories which were also in such a comorbidity module to constitute a disease cluster (i.e., the disease cluster featured by psychiatric disorders including “E66”, “F10”, “F13”, “F19”, “F20”, “F30”, “F32”, “F39”, “F60”, and “F90”).

Validation of associated disease clusters in the UK cohort

To validate the identified disease clusters in the UK cohort, we used Cox models to assess the associations between anxiety and stress-related disorders and each medical condition of the identified disease cluster, comparing affected patients to their matched unaffected individuals. The models were stratified by matching variables (i.e., sex and birth year) and adjusted for highest educational level and Townsend Deprivation Index. Only medical conditions with statistically significant positive associations were included in the comorbidity network analysis (same as described above) to construct disease clusters in the UK cohort. Trajectory analysis was not performed in the UK cohort due to the lack of complete primary care and outpatient data.

Genetic determinants of associated disease clusters using data from the UK Biobank

In the UK cohort, a cluster-specific “susceptibility score” was calculated to quantify the subsequent risk of each disease cluster for each patient with anxiety or stress-related disorder. The susceptibility score was defined as an individual person’s number of diagnosed diseases included in each disease cluster, according to the inpatient hospital data.

The quality control contains two parts. For quality control on individuals from the UK Biobank, we removed individuals with non-European ancestry, inconsistent sex, or sex chromosome aneuploidy. For quality control on individual level of genetic data, we first removed SNPs with imputation quality score <0.8, minor allele frequency < 0.001, or deviations from Hardy-Weinberg equilibrium (p < 1 × 10−10)60. We removed SNPs in the extended major histocompatibility complex (MHC) region (chr6: 25–34 Mb), considering the long-range linkage disequilibrium (LD) and special genetic architecture in this region. After standard GWAS quality control on the individual-level genotyping data60, we included 22,781 patients (out of the 23,365 patients) and 13,225,429 variants for further analysis.

To assess the association between SNPs and five susceptibility scores (as a continuous variable) respectively, we used mixed linear model (MLM)-based models for GWAS analysis, adjusted for sex, birth year, genotyping array, and the first ten principal components61. Independent significant SNPs with p < 5 × 10−8 were identified for each genomic locus. The inflation of GWAS analyses was tested by LD score regression62. Together with surrounding genomic loci that were identified based on LD structure at r2 ≥ 0.6, all the SNPs were further mapped to identify potential genes using the web-based platform FUMA (http://fuma.ctglab.nl/)63. The strategies for gene mapping included positional mapping, expression quantitative trait loci mapping based on the GTEx v8 Project64, and Chromatin interaction mapping65,66. Further, these mapped genes were included in the gene-set enrichment analyses based on Metascape ( to identify underlying biological pathways for each disease cluster with the following ontology sources: KEGG Pathway, GO Biological Processes, Reactome Gene Sets, Canonical Pathways, CORUM, WikiPathways, and PANTHER Pathway67. To further investigate genetic overlap across disease clusters, the abovementioned mapping genes were included in the PPI network enrichment analysis to identify protein network components using the Molecular Complex Detection (MCODE) algorithm based on Metascape68 based on the following genomic interaction databases: STRING69, BioGrid70, OmniPath71, and InWeb_IM72. To test whether these identified cluster-specific genes are associated with any of the individual diseases in each disease cluster, we first conducted hypergeometric tests using the function of “GENE2FUNC” in the website platform ‘FUMA’. As an alternative approach serving a similar purpose, we searched GeneCards, a gene database providing all annotated and predicted human genes ( to identify traits that have been previously associated with the top 20 identified genes of each disease cluster.

In a sensitivity analysis, we repeated the genetic analysis merely for the disease clusters (and their components) that can be validated from the UK cohort. Additionally, to test whether the identified genes were primarily driven by the studied disease clusters, independent of the prior anxiety/stress-related disorders, we conducted additional GWAS analyses for those disease clusters among individuals without anxiety/stress-related disorders (n = 452,148).

Subgroup analyses

As we used the advanced age group (i.e., >second tertile) of the Swedish cohort in the main analysis, we repeated the phenotypic analyses in the entire Swedish cohort (N = 216,727). To assess whether the results would differ between anxiety and stress-related disorders, we constructed two independent matched cohorts for anxiety and stress-related disorders, separately, and repeated the main analyses to identify disease clusters associated with anxiety disorder and stress-related disorder exclusively. To explore the role of sex in disease clusters, we performed analyses separately for males and females.

The phenotypic analyses were conducted using SAS 9.4 (SAS Institute), R (Version 4.0.2), Python (Version 3.8), and Cytoscape (Version 3.8.0). PLINK (Version 1.9) and GCTA (Version 1.24) were used for genetic analyses. For multiple testing, a q value < 0.05 was considered statistically significant. This study was approved by the Ethical Vetting Board in Stockholm, Sweden (DNRs 2012/1814-31/4 and 2015/1062-32), the NHS National Research Ethics Service (16/NW/0274), and the Biomedical Research Ethics Committee of West China Hospital (2019-1171). The requirement of informed consent for Swedish participants is waived in register-based studies in Sweden, and all the participants in the UK Biobank provided written informed consent before data collection.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

link

Leave a Reply

Your email address will not be published. Required fields are marked *