Introduction
Over the last decade, many studies have reported the existence of distinct progressive supranuclear palsy (PSP) phenotypes characterized by different initial clinical presentation and progression, with PSP-Richardson’s syndrome (PSP-RS) and PSP-parkinsonism (PSP-P) as the most frequent phenotypes [
1‐
6]. PSP-RS patients usually show a more severe disease course and an overall earlier appearance of typical PSP symptoms, but the clinical differential diagnosis between PSP phenotypes is challenging even for movement disorder specialists [
1,
2,
7‐
11]. The distinction is based on the clinical presentation at the beginning of the disease, and the main difference between PSP-RS and PSP-P lies in postural instability (PI), which must be present within the first 3 years of the disease for a PSP-RS diagnosis, whereas it usually appears late in PSP-P [
1,
2,
8,
12]. The first logical implication is that a PSP-P diagnosis requires a disease duration of at least three years to rule out the appearance of early falls, which entails a significant diagnostic delay [
1]. In addition, establishing the presence of PI can be difficult in the early stage, since the pull test is not an objective test: it is affected by variability in pull strength and patient conditioning, as well as by the patient’s attention/cognition, age and comorbidities [
13,
14]. On the other hand, in patients with advanced disease, establishing the exact time of appearance of PI is difficult, since falls may have different causes including freezing of gait, impaired balance, cognitive decline and environmental factors [
15,
16]. For these reasons, objective imaging biomarkers to support the differential diagnosis between common PSP phenotypes are urgently needed.
Most studies so far have focused on the differential diagnosis between PSP and other parkinsonian syndromes, and several imaging biomarkers have been reported to distinguish PSP-RS from Parkinson’s disease (PD) and multiple system atrophy, including planimetric MRI measures (manual or automated) [
17‐
20], brain volumetry [
21,
22], diffusion tensor imaging metrics [
23‐
25], and PET imaging with 18F-FDG [
26] or tau tracers [
27]. Among the MR planimetric measures, most studies evaluated the midbrain/pons area ratio and the Magnetic Resonance Parkinsonism Index (MRPI). The latter is an MR planimetric biomarker combining the midbrain area and the superior cerebellar peduncle width (normalized by the pons area and the middle cerebellar peduncle width, respectively, as reference structures); it is calculated by multiplying the pons/midbrain area ratio by the ratio of the middle cerebellar peduncle width to the superior cerebellar peduncle width [
19]. A few imaging biomarkers, such as the MRPI 2.0 (a second version of MRPI, obtained by multiplying the MRPI value by the third ventricle width normalized by the frontal horns width) [
28,
29] and FDG-PET [
26], also performed well in distinguishing PSP-P from PD patients. However, accurate biomarkers to distinguish between PSP-RS and PSP-P are not currently available.
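As a worked illustration of the two planimetric indices defined above, the following Python sketch computes the MRPI and the MRPI 2.0 from their component measurements. The function and argument names are hypothetical and chosen only for illustration; they do not correspond to the automated software used in this study. Because both indices are products of ratios, any consistent units can be used for the areas and for the widths.

```python
def mrpi(pons_area, midbrain_area, mcp_width, scp_width):
    """MRPI = (pons area / midbrain area) * (MCP width / SCP width).

    MCP and SCP are the middle and superior cerebellar peduncles.
    """
    if min(pons_area, midbrain_area, mcp_width, scp_width) <= 0:
        raise ValueError("all measurements must be positive")
    return (pons_area / midbrain_area) * (mcp_width / scp_width)


def mrpi_2(mrpi_value, third_ventricle_width, frontal_horns_width):
    """MRPI 2.0 = MRPI * (third ventricle width / frontal horns width)."""
    if min(mrpi_value, third_ventricle_width, frontal_horns_width) <= 0:
        raise ValueError("all inputs must be positive")
    return mrpi_value * (third_ventricle_width / frontal_horns_width)
```

With made-up inputs, `mrpi(500.0, 100.0, 8.0, 4.0)` multiplies an area ratio of 5 by a width ratio of 2, and `mrpi_2` then scales that value by the normalized third ventricle width. A smaller midbrain or a thinner superior cerebellar peduncle increases the index.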
Advancements in machine learning (ML) have permeated various domains of medicine, through the development of accurate classification or prediction models which may assist physicians in clinical decision making [
30,
31]. Several machine learning algorithms have been successfully applied to structural MRI data for the differential diagnosis of neurological diseases [
21,
22,
32,
33]. Random Forest (RF) and XGBoost are widely used classification algorithms with a decision tree-based approach: RF is an algorithm based on classification and regression trees (CART), introduced by Breiman [
34], which constructs trees in parallel and makes predictions through majority voting; the XGBoost algorithm uses eXtreme Gradient Boosting to maximize classification performance, generating trees sequentially so that each new tree corrects the errors of the previous ones [
35].
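The difference between the two ensemble strategies described above can be sketched in a few lines of plain Python. This is a toy illustration under strong assumptions, not the models used in this study: each "tree" is a one-split stump on a single feature, the bagging function mimics only the bootstrap-and-vote part of RF, and the boosting function is plain gradient boosting on the logistic loss, a simplified stand-in for XGBoost (which additionally uses second-order gradient information and regularization). All function names and parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def fit_stump(xs, targets):
    """Least-squares stump on one feature: (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, targets) if x <= t]
        right = [y for x, y in zip(xs, targets) if x > t]
        if not left or not right:
            continue  # a split must leave samples on both sides
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # no valid split: fall back to a constant prediction
        mean = sum(targets) / len(targets)
        return (xs[0], mean, mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def bagging_predict(xs, ys, x_new, n_trees=25, seed=0):
    """RF-style: independent stumps on bootstrap samples, then majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]  # sample with replacement
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        votes.append(1 if stump_predict(stump, x_new) >= 0.5 else 0)
    return Counter(votes).most_common(1)[0][0]


def boosting_predict(xs, ys, x_new, n_trees=25, learning_rate=0.3):
    """Boosting-style: each stump is fitted to the errors left by the previous ones."""
    scores = [0.0] * len(xs)  # running log-odds for the training samples
    score_new = 0.0           # running log-odds for the new sample
    for _ in range(n_trees):
        probs = [1.0 / (1.0 + math.exp(-s)) for s in scores]
        residuals = [y - p for y, p in zip(ys, probs)]  # current errors
        stump = fit_stump(xs, residuals)
        scores = [s + learning_rate * stump_predict(stump, x)
                  for s, x in zip(scores, xs)]
        score_new += learning_rate * stump_predict(stump, x_new)
    return 1 if score_new > 0 else 0
```

In the bagging function the trees never see each other's output and could be built in any order, whereas in the boosting function tree number k cannot be fitted until the residuals of trees 1 to k-1 are known.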
In the current study, we investigated whether the MRPI and MRPI 2.0, alone or combined with other structural MRI data in decision tree-based machine learning models (XGBoost and RF), could differentiate between PSP-RS and PSP-P.
Discussion
In this study, we investigated the role of several structural MRI features, including both planimetric (MRPI and MRPI 2.0) and volumetric data (cortical thickness, cortical volumes and subcortical volumes), in differentiating between PSP-RS and PSP-P patients. Machine learning models using a combination of MRPI and volumetric/thickness data showed the best classification performance in distinguishing between these two PSP phenotypes.
Differentiating between PSP-RS and PSP-P may be challenging in clinical practice [
7‐
11], suggesting the need for objective imaging biomarkers to support the differential diagnosis between these two phenotypes. Previous MR studies found smaller volumes of the midbrain, superior cerebellar peduncles (SCPs), subthalamic nucleus and cerebellum, and more widespread white matter (WM) involvement in PSP-RS than in PSP-P at the group level [
48‐
51]. Pilot studies in small PSP cohorts reported excellent performances in differentiating between PSP-RS and PSP-P using DTI metrics in the dentatorubrothalamic tract [
23,
50], but these findings were not confirmed by other authors [
52,
53], making further studies necessary to explore the potential of DTI in the differential diagnosis between PSP phenotypes. Taken together, these findings suggest that no robust imaging biomarker to accurately differentiate between the PSP-RS and PSP-P phenotypes at the individual level is currently available.
The MRPI and MRPI 2.0 (a second version of this biomarker also including the measurement of the third ventricle width) are two well-known automated biomarkers to distinguish PSP-RS and PSP-P from other parkinsonian syndromes [
17,
28]. Here, we investigated the performance of these biomarkers in distinguishing between these two PSP phenotypes. In our cohort, PSP-RS patients had higher MRPI and MRPI 2.0 values than PSP-P patients, and these biomarkers showed acceptable performance (AUC 0.88 and 0.81, respectively) on ROC analysis in differentiating between the two phenotypes. Similar results were obtained in the early PSP cohorts, where MRPI and MRPI 2.0 showed AUCs of 0.87 and 0.79, respectively, in differentiating PSP-RS from PSP-P. Our results are in line with some previous reports [
51,
54] and slightly better than others [
4,
55] showing suboptimal performances of these MR biomarkers in distinguishing between PSP phenotypes. Previous evidence demonstrated that the MRPI 2.0 was more powerful than the MRPI in distinguishing patients with PSP-P from those with Parkinson’s disease (PD) [
28,
29,
56]. In our study, however, the MRPI 2.0 was not superior to the MRPI in distinguishing between PSP-RS and PSP-P, likely due to the similar degree of third ventricle enlargement usually observed in these two PSP phenotypes [
28].
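For readers less familiar with the AUC values reported above, the sketch below shows one standard way to compute the area under the ROC curve for a single biomarker: the fraction of (positive, negative) patient pairs in which the positive case has the higher biomarker value, with ties counted as one half. The function name and the coding of groups (1 for the group expected to have higher values, for example PSP-RS for the MRPI, and 0 for the other) are assumptions made for illustration; the values in the usage note are not study data.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    scores: biomarker values, one per patient.
    labels: 1 for the group expected to score higher, 0 for the other group.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one patient in each group")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

For example, `roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` compares four pairs, of which three are correctly ordered. An AUC of 0.5 corresponds to chance-level ordering and 1.0 to perfect separation of the two groups.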
In the current study, we compared the performance of MRPI and MRPI 2.0 with that of cortical thickness, cortical volumes and subcortical volumes in differentiating between PSP-RS and PSP-P, employing two of the most widely used decision tree-based approaches for ML classification (Random Forest and XGBoost). These ML models showed that cortical thickness, cortical volumes and subcortical volumes, used separately, could not accurately distinguish between PSP-RS and PSP-P patients, and that these features were less powerful than MRPI in differentiating between these two PSP phenotypes. This result may be surprising, since PSP-RS and PSP-P showed significant differences in brain volumes and cortical thickness. Indeed, in agreement with previous imaging and pathological data [
9,
57‐
59], a reduced volume in the thalamus, globus pallidus and cerebellum was found in PSP-RS compared to PSP-P patients. On the other hand, PSP-P patients showed more widespread cortical thinning than PSP-RS patients, also involving some temporal and parietal regions in addition to the frontal lobes, which were affected in both phenotypes. These between-group differences, however, were not large enough to allow these features to accurately classify PSP phenotypes.
In an effort to improve the classification accuracy of the automated MRPI biomarkers in the differential diagnosis between PSP phenotypes, we combined MRPI and MRPI 2.0 with other structural MRI data (cortical thickness, cortical volumes and subcortical volumes) into ML models. This new approach yielded very good performance (AUC 0.94) when MRPI, cortical thickness and subcortical volumes were combined to differentiate between PSP-RS and PSP-P, outperforming these features used alone, and the performance improvement was even greater in the early cohort. The best-performing ML model used XGBoost, with MRPI selected as the most important feature in both the whole and the early cohorts. This higher classification performance obtained with the ML approach may result from combining the greater subcortical atrophy observed in PSP-RS patients (detected by MRPI and subcortical volumes) with the greater cortical involvement in PSP-P (detected by cortical thickness and volumes). These results on the combination of cortical and subcortical data are in line with very recent structural MRI studies in PSP. A recent large study [
60] demonstrated that the MRPI performed well in distinguishing pathologically proven PSP-RS patients from corticobasal degeneration (CBD) and from other neurodegenerative diseases, including frontotemporal lobar degeneration and Alzheimer’s disease, but adding cortical thickness data to the MRPI further increased classification performance, owing to the lower cortical atrophy in PSP-RS patients than in the other neurodegenerative conditions considered.
Finally, we investigated the performance of each structural MR metric in distinguishing between early and late patients, separately for PSP-RS and PSP-P, which may provide insights into the progression of brain atrophy in these common PSP phenotypes. In our cohort, cortical thickness was the best structural metric for distinguishing between early and late patients in both the PSP-RS and PSP-P cohorts. These results are in line with pathological and imaging studies showing that the neurodegenerative process usually starts in the brainstem regions and basal ganglia, and later spreads to cortical regions [
59,
61]. This time sequence thus makes brainstem atrophy more useful for the early differential diagnosis and cortical atrophy more suitable for distinguishing between early and late stages of the disease.
Overall, the two ML algorithms used in this study showed very similar results in most comparisons, with XGBoost (XGB) performing slightly better than RF in a few cases. Although these two tree-based ML algorithms share several rules for tree growing, they differ in how the ensemble of trees is created. RF uses bagging to build trees in parallel, and the prediction is then made by majority voting [
34]. In contrast, XGB builds a sequential ensemble of trees, in which each tree aims to improve on the previous one by correcting its errors [
35]. Broadly speaking, XGB may thus be slightly more powerful than RF because of its ability to learn from its wrong predictions, which are corrected by giving more weight to the misclassified instances, and because of its greater ability to deal with imbalanced datasets [
35,
62]. The main advantage of RF is that its performance may be less influenced by small changes in hyperparameter tuning than that of XGB [
62], and the very similar results obtained using RF in the present work (compared to XGB) increase the reliability of the findings.
The importance of the current study, which demonstrates a role for structural MRI in the differential diagnosis between common PSP phenotypes, is linked to the large clinical overlap between PSP-RS and PSP-P, which can make the clinical differential diagnosis difficult. Distinguishing between these two PSP phenotypes, however, is highly relevant in clinical practice because of its prognostic implications, since PSP-P is characterized by significantly slower disease progression than PSP-RS. Indeed, while PSP-RS is a rapidly progressive PSP phenotype, with death occurring after 6–8 years, PSP-P patients have a more benign disease course and longer survival [
63‐
65]. These discrepancies among PSP phenotypes may also significantly affect the results of clinical trials of new possible disease-modifying therapies in PSP patients. In fact, to avoid bias and optimize statistical power, it is crucial to include in these trials homogeneous populations with similar rates of progression over time, rather than lumping together PSP patients with different phenotypes [
7,
65]. The current study provides evidence that ML models using combined structural MRI data can accurately differentiate between PSP-RS and PSP-P even in the early stage of the disease, when patients are more suitable for enrollment in trials; thus, if further validated in independent cohorts, these automated imaging biomarkers supporting PSP phenotype classification may significantly improve future clinical trial design in PSP. A limitation to the immediate widespread use of such biomarkers is the complexity of ML approaches, which require high-level technology and expertise not yet available in clinical routine; however, there is growing interest in the use of ML for diagnostic purposes in medicine, and such approaches will likely be available in clinical practice soon.
This study has several strengths. First, we enrolled a large cohort of around 100 probable PSP patients, including 40 PSP-P patients classified according to recent international diagnostic criteria. Second, all imaging data (thickness, volumes, MRPI and MRPI 2.0 values) were obtained using fully automated, validated procedures. Third, two distinct decision tree-based ML models were compared, and their performance was assessed using fivefold cross-validation with 5 repetitions to increase the reliability of the findings. Some limitations can also be identified. First, PSP patients did not undergo autopsy; thus, it is possible that in some cases the clinical diagnosis was incorrect. However, clinical evaluations were performed according to the MDS diagnostic criteria for PSP-RS and PSP-P [
1] and the recent MAX rules [
8], by movement disorder specialists with more than 10 years of experience. Second, our study focused on PSP-RS and PSP-P only, while other PSP variants were not included because of their small sample size. Third, an independent validation cohort is missing. In this study, two different ML algorithms showed similar classification performance, increasing the robustness of the findings; however, future studies to validate the performance of these models based on structural MR data in independent patient cohorts are warranted. Fourth, in this study we used only structural MRI data, without exploring the potential of combining structural features with Quantitative Susceptibility Mapping or DTI data. However, structural data obtained from T1-weighted images have the advantage of wider availability and lower variability in MR acquisition protocols, hopefully allowing a broader use of these biomarkers.
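The validation scheme mentioned among the strengths, fivefold cross-validation with 5 repetitions, can be sketched as follows. This is a minimal illustration and not the code used in this study: the function name and the seed are hypothetical, and stratification by phenotype (keeping the class ratio similar across folds) is an assumption, since the text does not state whether the folds were stratified.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=5, seed=0):
    """Yield (train_indices, test_indices) for repeated stratified k-fold CV.

    Each repetition reshuffles the patients, so 5 folds x 5 repetitions
    gives 25 train/test splits in total.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so every fold keeps the class ratio.
            for position, index in enumerate(shuffled):
                folds[position % n_splits].append(index)
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            yield train, test
```

Within one repetition every patient appears in exactly one test fold, so each patient is classified once by a model that never saw that patient during training. Averaging a metric such as the AUC over all 25 splits reduces the dependence of the estimate on any single random partition, but it does not replace validation in an independent cohort.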
In conclusion, this study demonstrates that ML models combining MRPI values with cortical thickness and volumetric data achieved high classification performance in distinguishing PSP-RS from PSP-P patients, even in the early stage of the disease, and can thus assist the differential diagnosis between these common PSP phenotypes in vivo.