Identification of clinical disease trajectories in neurodegenerative disorders with natural language processing

Identification of neuropsychiatric signs and symptoms and exploration of the labeled data

We have established a computational pipeline that consists of text parsers and NLP models to convert the extensive medical record summaries of the Netherlands Brain Bank (NBB) into clinical disease trajectories (Fig. 1a). This pipeline consists of three steps: the first parses the NBB donor files; the second defines and predicts attributes in the clinical history (Extended Data Table 1) and converts the predicted signs and symptoms into clinical disease trajectories; and the third uses the trajectories for downstream analyses. In total, we included 3,042 donor files from donors with various neuropathological diagnoses (NDs) (Extended Data Fig. 1a, Table 1 and Supplementary Tables 1 and 2).

Fig. 1: Introduction to the project.
a, Workflow of the project describing the different data types in the NBB donor files (i), the processing of the clinical history data resulting in clinical disease trajectories (ii) and downstream analyses (iii). b, Clinical attributes (signs and symptoms), their domains, and groupings, including colors and illustrative brain icons. Relevant data, meta-data and analyses for this project can be found on https://nnd.app.rug.nl.

Table 1 Overview of the most common NDs and corresponding abbreviations, including ICD-10 codes

First, we defined a new crossdisorder clinical categorization system that contains 90 neuropsychiatric signs and symptoms, associated with brain disorders and overall wellbeing/functioning, across 5 broad domains (Fig. 1b). From a random set of 293 donors, 18,917 sentences were scored by one scorer to create a dataset to refine, validate and test different NLP models (Supplementary Table 3). To determine the reliability of the scoring process, 1,000 sentences were randomly selected and scored independently by another scorer. The interannotator agreement was high, corroborating the reliability of our gold standard (Cohen’s κ = 0.86). Next, we performed an enrichment analysis to determine whether the labeled signs and symptoms were more frequently observed in each disorder than expected by random chance. This analysis identified many expected disease-specific signs and symptoms, such as ‘dementia’ being significantly enriched in AD, PDD, DLB and VD but not in PD without dementia, and ‘bradykinesia’ being enriched in PD, PDD, MSA and PSP, disorders that are known to exhibit extrapyramidal symptoms (Extended Data Fig. 1b). These observed neuropsychiatric signs and symptoms were significantly overrepresented for a priori defined signs and symptoms of diagnostic importance (χ² = 171.28, P = 1 × 10^−31).
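As an illustration of the agreement statistic reported above, the following is a minimal sketch of Cohen’s κ for two scorers who each assign one label per sentence. It is not the authors’ code: the single-label-per-sentence simplification, the function name cohens_kappa and the toy labels are assumptions made for this example.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)

# Toy data (hypothetical): six sentences, one label each.
scorer_1 = ["dementia", "none", "bradykinesia", "none", "dementia", "none"]
scorer_2 = ["dementia", "none", "bradykinesia", "dementia", "dementia", "none"]
kappa = cohens_kappa(scorer_1, scorer_2)
print(round(kappa, 3))
```

In a multi-label setting such as the one described here, κ could be computed per attribute or over flattened sentence-attribute pairs; which variant was used is not stated in this section.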

Refining NLP models and constructing clinical disease trajectories

To reliably identify neuropsychiatric signs and symptoms in individual sentences, we established a pipeline to refine and compare different NLP model architectures (Extended Data Fig. 2a). The data were divided into a training and a hold-out test set, stratified according to a relatively equal distribution of sign and symptom observations. We then employed a stratified fivefold crossvalidation approach, in which models were refined on four folds and validated on the remaining fold. Five different model architectures (bag of words (BOW) model, support vector machine (SVM), Bio_ClinicalBERT, PubMedBERT and T5) were refined and optimized with Optuna, and the best performing model, according to average micro-F1-score and average micro-precision, was selected. Almost all signs and symptoms were reliably identified by all models, but a small subset of six signs and symptoms performed considerably less well. This subset consistently comprised the same attributes, which were subsequently excluded. Next, the highest scoring iterations of each model architecture were compared using the hold-out test data, on which PubMedBERT showed the best model performance (Extended Data Fig. 2b). The optimal PubMedBERT architecture was fine-tuned again on all labeled data for the prediction of the 84 remaining signs and symptoms that exhibited a micro-precision ≥0.8 or a micro-F1-score ≥0.8 (Extended Data Fig. 2c). This final model was then used to predict whether specific signs or symptoms were described in individual sentences of the full corpus. To construct the final clinical disease trajectories (Supplementary Table 4), the predictions of multiple sentences were collapsed per year. These new clinical disease trajectories encompass a wider range of neuropsychiatric signs and symptoms, covering a longer time frame, and include a larger number of donors compared with what has been previously published (Supplementary Table 5).
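Two steps in this paragraph can be made concrete with a short sketch: micro-averaged precision and F1 over multi-label sentence predictions, and the collapsing of sentence-level predictions into per-year observations. The function names, the (donor, year, attributes) record layout and the choice to count observations per year are assumptions for illustration only; the section does not specify how the authors implemented either step.

```python
from collections import Counter, defaultdict

def micro_scores(y_true, y_pred):
    """Micro-averaged precision, recall and F1 over sets of labels per sentence."""
    tp = fp = fn = 0
    for true_set, pred_set in zip(y_true, y_pred):
        tp += len(true_set & pred_set)   # labels predicted and present
        fp += len(pred_set - true_set)   # labels predicted but absent
        fn += len(true_set - pred_set)   # labels present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def collapse_per_year(sentence_predictions):
    """Collapse (donor_id, year, attributes) records into per-year counts.

    Assumption: an attribute's yearly value is the number of sentences
    in that year in which it was predicted.
    """
    trajectories = defaultdict(Counter)
    for donor_id, year, attributes in sentence_predictions:
        trajectories[(donor_id, year)].update(attributes)
    return dict(trajectories)

# Toy data (hypothetical labels and records).
y_true = [{"dementia"}, {"bradykinesia", "muscle weakness"}, set(), {"fatigue"}]
y_pred = [{"dementia"}, {"bradykinesia"}, {"fatigue"}, {"fatigue"}]
precision, recall, f1 = micro_scores(y_true, y_pred)

records = [
    ("donor_1", 2001, {"dementia"}),
    ("donor_1", 2001, {"dementia", "fatigue"}),
    ("donor_1", 2003, {"fatigue"}),
]
trajectory = collapse_per_year(records)
```

Micro-averaging pools true and false positives over all attributes before computing the ratio, so frequent attributes weigh more heavily than rare ones.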

Interpretation of signs and symptoms across common brain disorders

The clinical disease trajectories represent a distinctive dataset documenting neuropsychiatric signs and symptoms observed on a yearly basis for each donor. Again, we performed an enrichment analysis to determine whether the predicted signs or symptoms were more frequently observed in each disorder than expected (Fig. 2a). Of the signs and symptoms, 269 were significantly enriched in specific diagnoses, of which 148 were also a priori defined to be of diagnostic importance, a highly significant enrichment (χ² = 295.96, P = 2.5 × 10^−66). Importantly, the enrichment of the predicted dataset for a priori defined signs and symptoms is much more pronounced than that of the labeled dataset, offering orthogonal evidence for the validity of our NLP approach.
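The caption of Fig. 2a names a one-sided permutation test with FDR correction for this enrichment analysis. A minimal sketch of that combination is given below, under stated assumptions: the test statistic (number of donors with both the attribute and the diagnosis), the label-shuffling scheme, the Benjamini-Hochberg procedure as the FDR method, and all function names are illustrative choices rather than details confirmed by the text.

```python
import random

def permutation_enrichment_p(has_attribute, in_disorder, n_perm=2000, seed=0):
    """One-sided permutation P value for attribute enrichment in a disorder."""
    rng = random.Random(seed)
    observed = sum(a and d for a, d in zip(has_attribute, in_disorder))
    labels = list(in_disorder)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the link between attribute and diagnosis
        stat = sum(a and d for a, d in zip(has_attribute, labels))
        if stat >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy data (hypothetical): 20 donors, the attribute occurs only in the disorder group.
has_attribute = [True] * 8 + [False] * 12
in_disorder = [True] * 8 + [False] * 12
p_enriched = permutation_enrichment_p(has_attribute, in_disorder)

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

An attribute-diagnosis pair would then be called significant if its adjusted P value falls below the chosen threshold (0.1 in the caption of Fig. 2a).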

Fig. 2: Clinical disease trajectories offer a wealth of information.
a, Integrated plot showing attribute (y axis) manifestation by NDs (x axis). The dot size corresponds to the proportion of donors in which an attribute was observed. The dot color corresponds to the mean number of observations of an attribute across donors. Orange highlight and asterisks represent attributes important for diagnostics and significantly overrepresented signs/symptoms (one-sided permutation test, FDR-corrected P < 0.1), respectively. oth. path., other pathological. b, ‘Dementia’ temporal profiling (n = 1,326 donors, of which n = 682 with ≥1 ‘dementia’) showing density plot, Kaplan–Meier plot and three violin plots (center marker, box limits and whiskers represent the median, interquartile range (IQR) and 1.5× IQR). Two-sided Mann–Whitney U-test, FDR-corrected P values: *1.00 × 10^−4 < P ≤ 1.00 × 10^−2; **1.00 × 10^−6 < P ≤ 1.00 × 10^−4; ***1.00 × 10^−8 < P ≤ 1.00 × 10^−6; ****1.00 × 10^−10 < P ≤ 1.00 × 10^−8; *****P ≤ 1.00 × 10^−10. c, ‘Bradykinesia’ temporal profiling plots (n = 762 donors, of which n = 268 with ≥1 ‘bradykinesia’). All plots as defined in b.

It is interesting that all neuropsychiatric signs and symptoms were significantly enriched in at least one brain disorder, suggesting that all these signs and symptoms were indeed relevant for (a subset of) disorders. As expected, ‘dementia’ and ‘memory impairment’ were significantly enriched in dementias including AD, FTD, DLB, VD and PDD, but not in PD without dementia. Similarly, MS showed a striking enrichment for ‘impaired mobility’, ‘muscle weakness’ and ‘fatigue’, which is very much in line with the disabling pathology of the brain and spinal cord. However, whereas ‘impaired mobility’ was significantly enriched in MS, PD, PDD, PSP, ATAXIA and MSA, ‘muscle weakness’ was enriched in VD, MND, PSP, MSA and MS, showing that our approach can detect a unique compendium of signs and symptoms in a disorder-specific manner.

Dementias are frequently clinically misdiagnosed. Hence, we aimed to determine whether we could identify neuropsychiatric signs and symptoms that could contribute to improved differential diagnosis between subsets of frequently misdiagnosed disorders. We found a number of signs and symptoms that were uniquely enriched in specific dementia subtypes, including ‘paranoia’ and ‘façade behavior’ in AD and ‘hearing problem’ and ‘muscle weakness’ in VD (Extended Data Table 2). Similarly, MSA, PD, PSP and DLB are frequently misdiagnosed13,14. We found that ‘depressed mood’ was unique to PDD, ‘apraxias’ to DLB, ‘ataxia’ and ‘muscle fasciculation’ to MSA and ‘visual impairment’ to PSP (Extended Data Table 3). These findings suggest that we have retrospectively created a unique dataset that describes the clinical signs and symptoms that are associated with various brain disorders, which could contribute to improved diagnosis.

Temporal profiling of signs and symptoms across brain disorders

We utilized the clinical disease trajectories to conduct temporal profiling of specific neuropsychiatric signs and symptoms across various disorders. To this end, we calculated three different statistics. First, we calculated the total number of year observations in each condition in relation to the donors, to determine whether specific signs and symptoms were significantly more frequently observed in different diagnoses. Second, we calculated the temporal profile of those signs and symptoms, as a distribution of the years in which they were observed. Third, we performed a survival analysis to determine whether there are differences in the overall survival rate after the first observation of a sign or symptom between donors with different NDs. As expected, we observed that the attribute ‘dementia’ was present at a significantly younger age in FTD15 than in other dementias (Fig. 2b and Supplementary Table 6). The survival analysis showed that, after the first observation of ‘dementia’, the survival of donors with VD, PD or PDD was significantly shorter than that of donors with AD or FTD. These observations are in line with clinical expectations and corroborate the temporal validity of these clinical disease trajectories.
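The third statistic corresponds to the Kaplan–Meier plots in Fig. 2b,c. A minimal sketch of the Kaplan–Meier estimator follows, with time measured in years from the first observation of a sign or symptom to death. The function name and toy durations are hypothetical. In an autopsy cohort every donor presumably has an observed death, so the censoring flag is included here only for generality.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival curve.

    durations: time from first observation of the attribute to death or censoring.
    events: 1 (or True) if death was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all donors that share the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored donors both leave the risk set
    return curve

# Toy data (hypothetical): years from first 'dementia' observation to death.
years = [2, 3, 3, 5, 8]
observed_death = [1, 1, 0, 1, 1]
curve = kaplan_meier(years, observed_death)
for t, s in curve:
    print(t, round(s, 2))
```

Curves computed per diagnosis can then be compared between groups; the specific comparison test used for the survival curves is not named in this section.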

Synucleinopathies are neurological conditions that are characterized by α-synuclein protein aggregation, including PD, PDD, DLB and MSA. There is debate about whether these synucleinopathies are different manifestations of the same underlying neuropathology occurring in different brain regions or whether there are unique neuropathological processes associated with each disorder14,16. By studying the temporal and survival profiles after the manifestation of specific symptoms, we can determine whether these disorders exhibit unique temporal features, suggesting qualitatively different neuropathological processes. To study this in more detail, we performed temporal profiling analyses with ‘bradykinesia’ (Fig. 2c and Supplementary Table 6). Similar to ‘dementia’ in FTD, we found that ‘bradykinesia’ was observed at a significantly younger age in MSA than in the other disorders. By contrast, the survival analysis showed that donors with MSA, PSP and DLB with ‘bradykinesia’ had significantly shorter survival than donors with PD and PDD. These findings are in line with the hypothesis that there are qualitatively different aspects to these synucleinopathies, in which PD and PDD are very similar, but DLB, and especially MSA, are distinct14,16. Both analyses corroborate the notion that many brain disorders exhibit partially overlapping clinical symptoms that manifest in a distinct temporal fashion, potentially indicative of the neuronal substructures that are affected.

We next compared rare and mixed dementias, including dementia-vascular encephalopathy (DEM-VE), DEM with senile involutive cortical changes (DEM-SICC) and AD-VE. Dementias are a broad category of disorders and mixed and rare forms of dementia are frequently disregarded. We found that ‘dementia’ was observed at a significantly later age in several mixed forms of dementia, including AD-VE and AD-PD, than in AD and VD (Extended Data Fig. 3), suggesting that the pathogenesis generally strikes at a later age in patients with these mixed disorders. Furthermore, survival analysis suggests that AD, DLB and FTD might exhibit an extended survival period after the manifestation of ‘dementia’ compared with several other subtypes of dementia. Our analysis deviates in certain aspects from previous studies17,18, in which the diagnosis was based only on clinical data. Future studies using neuropathologically defined cohorts are necessary to address these differences.

Finally, because it is clinically difficult to differentiate between FTD subtypes and associated conditions, we aimed to identify signs and symptoms that could differentiate subtypes (Extended Data Fig. 4a). ‘Dementia’ observations were significantly lower in PSP cases than in other FTD subtypes, suggesting that this FTD subtype is less affected by dementia, whereas ‘compulsive behavior’ was consistently higher in FTD-TAR DNA-binding protein (TDP)-B and FTD-TDP-C than in many other FTD subtypes (Extended Data Fig. 4b). Temporally, ‘dementia’ was observed earliest in FTD tauopathy (FTD-TAU) and corticobasal degeneration (CBD) and latest in Pick’s disease (PiD) and PSP. This temporal profile was consistent when these analyses were performed using ‘memory impairment’. Many of these observations were in line with, and extended, earlier work and can contribute toward a better understanding of the relationship between neuropathology and clinical syndromes in FTD disorders19.

Comparing clinical with neuropathological diagnoses

As neurodegenerative disorders are frequently clinically misdiagnosed10,11, we aimed to determine the diagnostic accuracy of this brain autopsy cohort. For this, we cleaned and linked the clinical diagnosis (CD) descriptions to the human disease ontology and compared the resulting CD labels with the ND (Fig. 3a). We then created a set of rules, exemplified in Fig. 3b, to calculate the diagnostic accuracy (Fig. 3c). Most importantly, 84% of neuropathologically defined AD donors and 83% of neuropathologically defined FTD donors were clinically diagnosed as AD (Jaccard score (JS) = 0.642) and FTD (JS = 0.466), respectively. We do note that this also includes ‘ambiguous’ diagnoses, such as a CD of ‘dementia’. MSA (JS = 0.465) was frequently clinically diagnosed as PD and both VD (JS = 0.117) and PSP (JS = 0.510) were clinically diagnosed as multiple other disorders. Donors with both AD and DLB pathology were most often clinically diagnosed only with AD. These findings suggest that the brain donors of the NBB were also frequently diagnosed inaccurately, in a disease-specific manner.
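The Jaccard score and the two percentages shown in the Venn diagrams of Fig. 3c can be illustrated with sets of donor identifiers. This is a minimal sketch with hypothetical donor IDs and a hypothetical function name; it does not reproduce the rule set the authors used to handle ‘ambiguous’ diagnoses.

```python
def diagnosis_overlap(nd_donors, cd_donors):
    """Overlap between donors with a given ND and donors with the same CD."""
    nd, cd = set(nd_donors), set(cd_donors)
    both = nd & cd
    union = nd | cd
    return {
        # Jaccard score: intersection over union of the two donor sets.
        "jaccard": len(both) / len(union) if union else 0.0,
        # Proportion of donors with the ND who received the same CD.
        "nd_with_same_cd": len(both) / len(nd) if nd else 0.0,
        # Proportion of donors with the CD who have the same ND.
        "cd_with_same_nd": len(both) / len(cd) if cd else 0.0,
    }

# Toy data (hypothetical donor IDs for a single disorder).
nd_ad = {"d1", "d2", "d3", "d4", "d5"}   # neuropathologically defined
cd_ad = {"d3", "d4", "d5", "d6"}         # clinically diagnosed
overlap = diagnosis_overlap(nd_ad, cd_ad)
print(overlap)
```

Because the Jaccard score penalizes donors in either set who are missing from the other, it can be low even when a high percentage of ND-defined donors carry the matching CD.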

Fig. 3: Comparison of CD with ND.
a, Confusion matrix heatmap of ND (y axis) versus CD (x axis). Values represent diagnosis observations and hue represents the CD observations divided by the total ND observations for each disorder group. b, Table containing illustrative examples of donors to show how CD accuracy was assessed, resulting in three clinical accuracy categories: ‘accurate’, ‘ambiguous’ and ‘inaccurate’. Clinical accuracy is colored to reflect the AD Venn diagram in c. c, Venn diagrams depicting the intersection of ND and CD for 11 disorders and control cases. Total number of donors and the corresponding JS values are shown below the disorder abbreviation. The percentage represents the proportion of donors with ND who have the same CD (left) and the proportion of donors with CD who have the same ND (right).

Predicting brain disorders using clinical disease trajectories

With the integration of machine-learning models into healthcare practices, we aimed to assess whether the ND could reliably be predicted from clinical disease trajectories. For this, we established a workflow to train a gated recurrent unit with decay (GRU-D) model, an architecture developed specifically for time-series data with missing values. This model could reliably diagnose most disorders for which we had a higher number of donors (Extended Data Fig. 5a). We also calculated the percentage of accurate diagnoses (in which the ND is considered to be the ground truth) for the GRU-D model (Extended Data Fig. 5b,c) and the CD. Out of 1,810 donors, 1,342 were accurately diagnosed by the model, 83 were ambiguously diagnosed (for example, an AD diagnosis for an AD-DLB donor) and 385 were inaccurately diagnosed. Clinically, 1,236 donors had an accurate diagnosis, 311 were ambiguous (for example, both AD and FTD written down for an AD donor) and 263 were inaccurate. This suggests that the model had a higher percentage of both accurate and inaccurate diagnoses, owing to its smaller percentage of ambiguous diagnoses.
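The comparison is easier to follow as percentages. The short calculation below uses only the donor counts stated above; the variable and function names are illustrative.

```python
TOTAL_DONORS = 1810

# Counts as reported in the text.
model_counts = {"accurate": 1342, "ambiguous": 83, "inaccurate": 385}
clinical_counts = {"accurate": 1236, "ambiguous": 311, "inaccurate": 263}

def as_percentages(counts, total):
    """Convert category counts to percentages rounded to one decimal."""
    if sum(counts.values()) != total:
        raise ValueError("category counts do not add up to the total")
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

model_pct = as_percentages(model_counts, TOTAL_DONORS)
clinical_pct = as_percentages(clinical_counts, TOTAL_DONORS)

print(model_pct)     # {'accurate': 74.1, 'ambiguous': 4.6, 'inaccurate': 21.3}
print(clinical_pct)  # {'accurate': 68.3, 'ambiguous': 17.2, 'inaccurate': 14.5}
```

The model is accurate for 74.1% of donors versus 68.3% for the CD, and inaccurate for 21.3% versus 14.5%, while ambiguous diagnoses fall from 17.2% to 4.6%.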

Compared with the CD, the GRU-D predictions (Extended Data Fig. 5d) performed better for FTD, similarly for AD and PD and worse for MS and PSP. Both model and CD performed equally poorly on DLB, VD, MND and MSA. The GRU-D model performed best for the diagnosis of donors for whom we had at least 100 training cases, whereas most rare cases were missed. Of note, a subset of donors was consistently inaccurately diagnosed by clinicians and the model, indicating that these donors exhibited atypical disease-specific symptoms. We hypothesized that there might be commonalities in the symptomatology of donors with an inaccurate CD and included these inaccurately diagnosed donors as a separate category in the next analysis.

Dimensionality reduction to characterize the clinical heterogeneity

To better understand the clinical heterogeneity of the various brain disorders, we performed dimensionality reduction and clustering on the temporal clinical disease trajectories. Six main clusters were identified (Fig. 4a) that were enriched for: (1) different types of dementias, occurring later in life (LATE-DEM); (2) PD and related disorders that manifest extrapyramidal signs (PD+); (3) different types of dementias, occurring at an early age (EARLY-DEM); (4) CON donors and asymptomatic/mild brain disorders (CTRL/ASYM.); (5) motor disorders including MS, MND and ATAXIA (MS/+); and (6) psychiatric disorders (PSYCHIATRIC) (Fig. 4b,c). Of note, some disorders were clinically more homogeneous than others. For example, donors with AD, MSA, PD, FTD, MND, MS, PSYCH and CON tend to cluster relatively closely together, whereas donors with VD, PSP and DLB were much more heterogeneous (Fig. 4b).

Fig. 4: Characterizing clinical heterogeneity through dimensionality reduction.
a, A wnn-UMAP scatterplot depicting the results of dimensionality reduction and clustering of clinical disease trajectories (n = 2,109 NBB donors) based on attribute observations and their temporal manifestation. b, A wnn-UMAP scatterplot from a depicting the NDs (as colors) and CD accuracy (shape in which circle = accurate or unknown and triangle = inaccurate). c, Bar graph showing ND distribution with results of significance testing (one-sided Fisher’s exact test) for overrepresentation of (1) ND across clusters (white asterisk) and (2) inaccurate CDs (black asterisk). FDR-corrected P values. *P ≤ 5.00 × 10^−2, **P ≤ 5.00 × 10^−4, ***P ≤ 5.00 × 10^−6. d, Heatmap showing average number of observations (obs.) of significant attributes (left) and temporal (Temp.) plot showing the median age of onset of significant attributes (right) (two-sided Wilcoxon’s rank-sum test), with width set to the s.d. and height set to percentage of donors experiencing the attribute.

To obtain insight into the signs and symptoms that differentiate the clusters, we performed a differential analysis (Fig. 4d and Supplementary Tables 7–16). Three distinct observations were made. First, EARLY-DEM and LATE-DEM shared many signs and symptoms, but differed in their temporal manifestation, hence their names. Second, we observed a high number of motor domain attributes in both cluster PD+ and MS/+, with the PD+ cluster having mainly extrapyramidal symptoms and the MS/+ cluster mainly ‘muscle weakness’ and ‘impaired mobility’. Third, the PSYCHIATRIC cluster manifested more psychiatric symptoms. These observations largely align with our previous characterizations when we compiled donors according to their diagnosis but, in addition, also illustrate the heterogeneity of these disorders.

In addition, we performed an overrepresentation analysis to determine whether clinically inaccurately diagnosed donors were overrepresented in specific clusters (Fig. 4b,c and Supplementary Table 6). It is interesting that inaccurate FTD, AD, PD, PSP and CON donors were overrepresented in clusters other than their accurately diagnosed counterparts, suggesting that these atypical donors share clinical features with each other that masquerade as another group of disorders. For example, inaccurate AD donors often masquerade as PD+ disorders, and vice versa, whereas inaccurate MSA donors often manifest as early or late dementia. This insight elucidates the difficulty of achieving precise diagnoses in a substantial proportion of patients with neurodegeneration.
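The caption of Fig. 4c names a one-sided Fisher’s exact test for this overrepresentation analysis. A minimal sketch of that test on a 2 × 2 table is shown below, computed from the hypergeometric distribution. The table layout, function name and toy counts are assumptions for illustration and not the authors’ data or code.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test (overrepresentation) on [[a, b], [c, d]].

    a: donors of interest inside the cluster   b: other donors inside the cluster
    c: donors of interest outside the cluster  d: other donors outside the cluster
    Returns P(X >= a) with both table margins held fixed.
    """
    in_cluster = a + b
    of_interest = a + c
    n = a + b + c + d
    max_a = min(in_cluster, of_interest)
    total = comb(n, of_interest)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    tail = sum(
        comb(in_cluster, k) * comb(n - in_cluster, of_interest - k)
        for k in range(a, max_a + 1)
    )
    return tail / total

# Toy data (hypothetical): 3 of 4 donors in a cluster are inaccurately diagnosed,
# versus 1 of 4 donors outside the cluster.
p_value = fisher_exact_greater(3, 1, 1, 3)
print(round(p_value, 4))
```

With many cluster-by-diagnosis comparisons, the resulting P values would then be FDR-corrected, as indicated in the figure caption.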

To assess the validity of the identified clusters, we aimed to perform an enrichment analysis for the APOE4/4 genotype, which is associated with early AD and more severe neurodegeneration in general20,21,22,23. Notably, the EARLY-DEM cluster exhibited a robust and highly significant enrichment for the APOE4/4 genotype (P = 5.50 × 10^−8), the LATE-DEM cluster showed a modest significant enrichment (P = 1.32 × 10^−3), whereas the CTRL/ASYM cluster was significantly underrepresented (P = 2.87 × 10^−4). The remaining clusters did not display significant over- or underrepresentation. These findings offer orthogonal genetic evidence for the validity of these clusters.

Subclustering analysis to identify data-driven clinical subtypes

To better understand the heterogeneity of donors within a cluster and to identify data-driven clinical subtypes of disease, we performed a subclustering analysis on donors grouped together in a main cluster.

Subclustering analysis of the merged-DEM clusters (EARLY-DEM and LATE-DEM) resulted in four subclusters (1, s-LATE-DEM; 2, s-EARLY-DEM; 3, MOTOR-DEM; and 4, PSYCH-DEM) (Fig. 5a). Subcluster 1 (s-LATE-DEM) was significantly enriched for AD and DEM-SICC and inaccurately diagnosed FTD-TDP. Subcluster 2 (s-EARLY-DEM) was significantly enriched for FTD-TDP, FTD-fused in sarcoma (FUS), FTD-TAU and PiD. The symptomatology of this cluster in general manifested at a younger age and showed more ‘compulsive behavior’. Subcluster 3 (MOTOR-DEM) was characterized by ‘muscle weakness’, ‘impaired mobility’ and other motor domain symptoms (Extended Data Fig. 6a). This cluster was also significantly enriched for inaccurate AD, which suggests that AD cases with motor disturbances are clinically frequently misdiagnosed. Subcluster 4 (PSYCH-DEM) was overrepresented for DLB, DLB-SICC, PD, PD-AD and psychiatric donors. This analysis indicates that there might be clinical subtypes of dementia that manifest beyond the boundaries of the individual diagnoses and encompass a relatively early type, a psychiatric type, a motoric type and a generic dementia type. The presence of individual psychiatric and motoric symptoms in subsets of dementia cases has been reported previously7,24,25. However, to date, no studies have performed an integrative analysis of the combination of these neuropsychiatric signs and symptoms and their temporal manifestation, resulting in data-driven subtypes. These findings suggest that psychiatric and motor symptoms might be indicative of the clinical subtypes of dementia, potentially mediated by different neurological substructures.

Fig. 5: Identification of clinical subtypes.
a–d, Subclustering analysis of 997 EARLY-DEM + LATE-DEM donors (a), 444 PD+ donors (b), 275 MS/+ donors (c) and 135 PSYCHIATRIC donors (d), based on both attribute observations and their temporal manifestation. PP, primary progressive; RR, relapsing–remitting; SP, secondary progressive. Left, wnn-UMAP scatterplot depicting the results of dimensionality reduction and clustering of clinical disease trajectories and CD accuracy (shape in which circle = accurate or unknown and triangle = inaccurate). Right, bar graph showing ND distribution with results of significance testing (one-sided Fisher’s exact test) for overrepresentation of: (1) ND across clusters (white asterisk) and (2) inaccurate diagnoses (black asterisk). FDR-corrected P values. +P ≤ 1 × 10^−1, *P < 5.00 × 10^−2, **P < 5.00 × 10^−4, ***P ≤ 5.00 × 10^−6.

Next, we performed subclustering analysis on the PD+ cluster which resulted in four subclusters (1: LATE-PD+; 2: LATE-MENTAL-PD+; 3: EARLY-PD+; and 4: EARLY-MENTAL-PD+) (Fig. 5b). It is interesting that two subclusters showed a more limited number of signs and symptoms, one of which had an early onset (EARLY-PD+, enriched for MSA) and another with late onset (LATE-PD+, enriched for PD and inaccurate PSP donors). Conversely, the remaining two subclusters manifested a broader range of signs and symptoms in the cognitive and psychiatric domains (Extended Data Fig. 6b), again with early onset (EARLY-MENTAL-PD+) and late onset (LATE-MENTAL-PD+). It has previously been described that patients with PD and related disorders can manifest cognitive and psychiatric problems7,26,27. This analysis corroborates these findings and suggests that age of onset and whether mental problems are present are independent disease features.

We also performed a subclustering analysis on the MS/+ cluster (Fig. 5c) and identified three subclusters: SENSORY-MS/+, COG/PSYCH-MS/+ and VERBAL-MOTOR-DIS. Most MS donors were clustered in subclusters 1 and 2. The SENSORY-MS/+ subcluster manifested fatigue and many other attributes from the sensory/autonomic domain. The COG/PSYCH-MS/+ subcluster showed attributes from the cognitive and psychiatric domain. Finally, the third subcluster, VERBAL-MOTOR-DIS, was significantly enriched for amyotrophic lateral sclerosis and other MNDs, controls and MSA, and manifested later in life (Extended Data Fig. 7a). MS, MSA and MND have previously been associated with sensory, mental and motor problems28,29. Our analysis expands on these observations and suggests that these motor disorders manifest these symptoms largely independently and that these data-driven subtypes are indicative of different neurological substructures being affected.

Increasing lines of evidence suggest that mental illnesses are not discrete categories but that individuals with these disorders manifest behavior along a spectrum of traits4,30. Our analysis of the PSYCHIATRIC cluster corroborates this notion because we found three subclusters beyond the confines of the psychiatric diagnosis (Fig. 5d and Extended Data Fig. 7b). Subcluster 1 (PSY-DEP) was enriched for CON and primarily exhibited ‘depressed mood’. Subcluster 2 (PSY-MANIC) was enriched for BP and primarily exhibited ‘mania’ and extrapyramidal signs. Subcluster 3 (PSY-PSYCHOSIS) exhibited many observations of ‘psychosis’ and ‘feeling suicidal’, with an early age of onset, and was enriched for SCZ donors.
