Neurofilament light and heterogeneity of disease progression in amyotrophic lateral sclerosis: development and validation of a prediction model to improve interventional trials

Interventional trials in amyotrophic lateral sclerosis (ALS) suffer from the heterogeneity of the disease as it considerably reduces statistical power. We asked if blood neurofilament light chains (NfL) could be used to anticipate disease progression and increase trial power. In 125 patients with ALS from three independent prospective studies—one observational study and two interventional trials—we developed and externally validated a multivariate linear model for predicting disease progression, measured by the monthly decrease of the ALS Functional Rating Scale Revised (ALSFRS-R) score. We trained the prediction model in the observational study and tested the predictive value of the following parameters assessed at diagnosis: NfL levels, sex, age, site of onset, body mass index, disease duration, ALSFRS-R score, and monthly ALSFRS-R score decrease since disease onset. We then applied the resulting model in the other two study cohorts to assess the actual utility for interventional trials. We analyzed the impact on trial power in mixed-effects models and compared the performance of the NfL model with two currently used predictive approaches, which anticipate disease progression using the ALSFRS-R decrease during a three-month observational period (lead-in) or since disease onset (ΔFRS). Among the parameters provided, the NfL levels (P < 0.001) and the interaction with site of onset (P < 0.01) contributed significantly to the prediction, forming a robust NfL prediction model (R = 0.67). Model application in the trial cohorts confirmed its applicability and revealed superiority over lead-in and ΔFRS-based approaches. The NfL model improved statistical power by 61% and 22% (95% confidence intervals: 54%–66%, 7%–29%). The use of the NfL-based prediction model to compensate for clinical heterogeneity in ALS could significantly increase the trial power. NCT00868166, registered March 23, 2009; NCT02306590, registered December 2, 2014.


Introduction
The Amyotrophic Lateral Sclerosis Functional Rating Scale Revised (ALSFRS-R) score has become the predominantly used primary outcome parameter in ALS trials [1,2]. The ALSFRS-R assesses the functional capability of ALS patients in daily life, and the score points lost per month are an established parameter for disease progression rate [3]. Interventional trials use the score to investigate if a treatment slows down the functional decline.
Due to the heterogeneity of the disease, the progression rates vary greatly between patients [4][5][6][7]. Figure 1a shows the variability of progression rates in an example of our trial participants. The interindividual differences hamper the ability to recognize a treatment effect and thereby reduce the statistical power. Therefore, the disease heterogeneity is a major challenge in the design of ALSFRS-R-based trials [8][9][10]. The power issues have been discussed in some positive ALSFRS-R-based phase 2 trials, whose positive results could not be reproduced in phase 3, as well as in the context of many negative trials in ALS [8,11,12,13,14,15]. Due to the low prevalence of ALS, the heterogeneity cannot easily be compensated for by increasing the number of trial participants [16]. Using prediction models to anticipate the patients' disease progression rates throughout a trial is considered a promising strategy to meet this challenge [17][18][19][20][21] (see also Fig. 1b).
However, there is still a lack of sufficiently validated prediction models for the ALSFRS-R course, and thus no implementation of such models in randomized controlled trials to date. Instead of using prediction models, current clinical trials have measured the ALSFRS-R decrease during an observational phase of several months in the enrolment process or used the monthly decrease of ALSFRS-R score since disease onset (ΔFRS) to estimate a patient's disease progression.
Neurofilaments are increasingly recognized as a prognostic biomarker for ALS [22,23]. For blood levels of neurofilament light chains (NfLs), a moderate correlation with disease progression rate has repeatedly been reported [24][25][26][27][28]. The good accessibility, objectivity, and prognostic value have made NfL a promising candidate biomarker to improve prediction models.
In this study, we investigated whether NfL levels could be used to improve prognostic models, and we evaluated the transferability of the prediction models to new datasets to test their practical applicability and to quantify the potential impact on trial power.

Study cohorts
This study included three independent ALS cohorts, an observational cohort for model development (DC) and two trial cohorts for model validation (V1 and V2) (Fig. 2). The DC cohort consists of patients with ALS who participated in an observational study at the Department of Neurology at Ulm University (German Network for Motor Neuron Diseases, MND-NET site Ulm, Ulm, Germany). At the time of analysis, the observational study included prospective clinical parameters and biosamples of 1440 patients collected between 07/2012 and 07/2019, of whom 560 had an available blood sample at diagnosis. To ensure a trial-like longitudinal cohort and the broadest possible progression spectrum for model development, we defined the following eligibility criteria: patients at their time of diagnosis with a probable (clinically or laboratory-supported) or definite ALS according to the revised version of the El Escorial criteria [29], with follow-up blood sampling between 5 and 12 months after diagnosis. Also, the patients needed a documented continuous riluzole treatment since the use of riluzole is an inclusion criterion in most interventional trials. Treatments with edaravone, rasagiline, or high-caloric nutritional supplements were defined as exclusion criteria due to potential disease-modifying effects [30][31][32].
The validation cohorts were acquired from the placebo arms of two completed ALS trials. V1 consisted of patients on placebo who participated in LIPCAL ALS, a trial investigating a high caloric fatty diet (conducted between 02/2015 and 09/2018; n = 201; follow-up time: 18 months) [32,33]. In this study, serum was collected on a voluntary basis, resulting in blood sample availability in 46 patients [33]. V2 consisted of patients who participated in the MitoTarget ALS trial investigating olesoxime, a drug interacting with mitochondrial membrane proteins and associated with neuroprotective features (conducted between 05/2009 and 09/2011; n = 512; follow-up time: 18 months) [19,34]. We used EDTA plasma samples from a randomly selected subgroup of patients who had completed the 18-month follow-up, equivalent to 33 patients on placebo [19]. In both trials, the patients were continuously treated with riluzole.

Model development
The prediction model was developed in the DC cohort in patients at their time of diagnosis using multiple linear regression, with the subsequent decrease of ALSFRS-R score per month (ALSFRS-R slope; pt/m) as the dependent variable. We investigated the predictive value of blood NfL levels at the time of diagnosis and the clinical parameters sex, age, site of onset (bulbar or spinal), body mass index, disease duration, monthly ALSFRS-R decrease since disease onset (ΔFRS), and ALSFRS-R score at diagnosis as independent variables. NfL was logarithmically transformed to ln(NfL) to achieve a normal distribution.
To determine the predictive quality of the candidate predictors, we compared all possible combinations of candidate predictors using a sequence of F-tests. To identify the variables for the final prediction model, we eliminated non-significant variables one at a time, each time removing the variable with the largest P-value in the coefficient analysis, until only variables remained that contributed statistically significantly to the prediction.

Model validation
For external model validation in the two validation cohorts, we used the patients' baseline data and our model to predict each patient's future ALSFRS-R slope. After this, we evaluated the absolute deviation between the predicted ALSFRS-R slopes and the ALSFRS-R slopes observed during the trial follow-up time, and visually checked for accuracy and systematic deviations.
In the second step, we split the follow-up time in the validation cohorts and separately computed ALSFRS-R slopes for the first three months of the study period mimicking an observational period (lead-in) and the subsequent time mimicking an interventional period. The splitting allowed us to compare the ALSFRS-R slopes during the interventional period with ALSFRS-R slopes predicted for each patient in three different ways: (1) using our prediction model with NfL levels and clinical parameters assessed at study baseline, (2) using ΔFRS, and (3) using the ALSFRS-R slope during the lead-in period.
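The split into a lead-in and an interventional period can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the visit at exactly three months is assumed to belong to both periods, and slopes are taken as ordinary least-squares slopes of score against time.

```python
def ols_slope(times, scores):
    """Ordinary least-squares slope of scores against times (points per month)."""
    n = len(times)
    if n < 2:
        raise ValueError("at least two visits are needed to fit a slope")
    t_mean = sum(times) / n
    s_mean = sum(scores) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("visits must not all share the same time point")
    sxy = sum((t - t_mean) * (s - s_mean) for t, s in zip(times, scores))
    return sxy / sxx


def split_slopes(times_months, scores, lead_in_months=3.0):
    """Return (lead-in slope, interventional slope) for one patient.

    Assumption: the visit at `lead_in_months` closes the lead-in period and
    opens the interventional period, so it is used in both fits.
    """
    visits = list(zip(times_months, scores))
    lead_in = [(t, s) for t, s in visits if t <= lead_in_months]
    # Time is re-centred to 0 at the start of the interventional period.
    later = [(t - lead_in_months, s) for t, s in visits if t >= lead_in_months]
    return ols_slope(*zip(*lead_in)), ols_slope(*zip(*later))


# Hypothetical patient: faster decline in the first three months than afterwards.
lead_in_slope, trial_slope = split_slopes(
    [0, 1.5, 3, 6, 9, 12], [44, 43, 42, 41, 40, 39]
)
```

In this made-up example the lead-in slope is twice as steep as the slope in the later period, which is the kind of mismatch that would make a lead-in estimate a poor predictor.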
The predictive quality of each prediction method was evaluated using established statistical methods: root-mean-square error (RMSE), Coefficient of Determination (CoefD), and variance change (see Statistical methods).
Finally, we analyzed each method's impact on statistical power using an approach based on mixed-effects models introduced by Küffner et al. [18] (see Statistical methods). Briefly, it computes the hypothetical reduction in trial size that could be compensated by normalizing on the disease progression rates derived from a predictive model. Using this method, trial size reduction becomes a measure for the increase in statistical power.

NfL assay
Blood samples were obtained from peripheral blood and stored with strict adherence to standard operating procedures [35]. NfL concentrations in serum (DC and V1) or EDTA-plasma (V2) were measured in the same laboratory (Department of Neurology, University of Ulm, Germany) on a SIMOA HD-1 analyzer, using a commercially available kit (Quanterix, Lexington, MA) with an analytical limit of detection of 0.038 pg/ml given by the manufacturer, equally usable for serum and EDTA-plasma. Temporal fluctuations of ln(NfL) levels were studied by relative deviations from the patient's mean ln(NfL) value.
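One way to compute such relative deviations is sketched below. It assumes the deviation is taken as an absolute value and divided by the patient's mean ln(NfL); the text does not spell out this detail, and the function name is hypothetical.

```python
import math
from statistics import mean


def relative_deviations(nfl_values_pg_ml):
    """Relative deviation of each ln(NfL) value from the patient's mean ln(NfL).

    Assumption: deviations are absolute values, so they do not cancel out
    when averaged across measurements.
    """
    logs = [math.log(v) for v in nfl_values_pg_ml]
    center = mean(logs)
    return [abs(x - center) / abs(center) for x in logs]


# Hypothetical patient with three measurements (pg/ml) around 100 pg/ml.
example = relative_deviations([80.0, 100.0, 125.0])
```

For this made-up patient the outer two measurements each deviate by just under 5% from the mean ln(NfL), even though the raw concentrations differ by more than 50%.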

Statistical methods
All statistical analyses were done in R version 4.0.0 using R packages lme4 (version 1.1.23), tidyverse (version 1.3.0), and cowplot (version 1.0.0) at a level of significance of P < 0.05.
The ALSFRS-R slopes in the DC were computed by linear regression, using all ALSFRS-R scores of a patient and their corresponding times since disease onset. The trial ALSFRS-R slopes were computed using linear regression with a patient's ALSFRS-R scores assessed throughout the trial and trial duration. The ΔFRS was computed using the formula:

\[ \Delta\mathrm{FRS} = \frac{48 - \text{ALSFRS-R score at baseline}}{\text{months between disease onset and baseline}} \]

with 48 being the maximum ALSFRS-R score.
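The two progression measures can be illustrated with a short sketch. The function names are hypothetical and the numbers are made up; note that the regression slope is negative for a declining patient, whereas ΔFRS as defined here is a positive number of points lost per month.

```python
def alsfrs_slope(times_months, scores):
    """Least-squares slope of ALSFRS-R score against time (points per month)."""
    n = len(times_months)
    if n < 2:
        raise ValueError("at least two assessments are needed")
    t_mean = sum(times_months) / n
    s_mean = sum(scores) / n
    sxx = sum((t - t_mean) ** 2 for t in times_months)
    if sxx == 0:
        raise ValueError("assessments must not all share the same time point")
    sxy = sum((t - t_mean) * (s - s_mean) for t, s in zip(times_months, scores))
    return sxy / sxx


def delta_frs(score_at_baseline, months_since_onset, max_score=48):
    """Points lost per month since disease onset, assuming the maximum score at onset."""
    if months_since_onset <= 0:
        raise ValueError("months since onset must be positive")
    return (max_score - score_at_baseline) / months_since_onset


# Hypothetical patient: baseline score 39, twelve months after onset.
baseline_rate = delta_frs(39, 12)
# Hypothetical follow-up visits at 0, 3, 6, and 9 months.
follow_up_slope = alsfrs_slope([0, 3, 6, 9], [40, 38, 36, 34])
```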
Goodness-of-fit for the internal validation was measured as the adjusted R² and its square root R. For external validation, RMSE, CoefD, and variance change were computed from p, the predicted ALSFRS-R slope; y, the ALSFRS-R slope in the interventional period; and m, the mean of y. Trend lines and standard error in Fig. 5 were calculated using ggplot2's geom_smooth function with method = `lm`.
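A sketch of the three external-validation measures, using the symbols p, y, and m from the text. RMSE and CoefD follow their standard definitions. The definition of variance change used here, the variance of the residuals y − p relative to the variance of y, is an assumption chosen to be consistent with how the results are described; it is not taken from the source.

```python
import math
from statistics import mean, variance


def rmse(p, y):
    """Root-mean-square error between predicted (p) and observed (y) slopes."""
    return math.sqrt(mean([(pi - yi) ** 2 for pi, yi in zip(p, y)]))


def coef_d(p, y):
    """Coefficient of determination: 1 - SS_res / SS_tot, with m the mean of y."""
    m = mean(y)
    ss_res = sum((yi - pi) ** 2 for pi, yi in zip(p, y))
    ss_tot = sum((yi - m) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def variance_change(p, y):
    """Assumed definition: Var(y - p) / Var(y) - 1 (negative means less variance)."""
    residuals = [yi - pi for pi, yi in zip(p, y)]
    return variance(residuals) / variance(y) - 1.0


# Hypothetical observed slopes (pt/m) and an uninformative prediction (the mean).
observed = [-0.2, -0.5, -0.8, -1.1]
mean_only = [mean(observed)] * len(observed)
```

Under these definitions, predicting the cohort mean for every patient gives a CoefD and a variance change of zero, a perfect prediction gives 1 and −1, and a prediction worse than the mean gives a negative CoefD.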
To compute the trial size savings for a randomized, placebo-controlled clinical trial, we adopted the approach by Küffner et al. [18]. We randomly assigned patients from the validation cohorts (V1/V2) to treatment and control groups of equal size, retained only timepoints from the interventional period, and centered time to 0 at the start of the interventional period for each patient. We then fitted a multivariable mixed-effects model for one cohort at a time, using in turn the predictions from the NfL model, the lead-in period, and ΔFRS. In this model, $a_{ij}$ is the ALSFRS-R value of patient $i$ at time point $j$, $a_{i1}$ is the value at the first time point, used as model offset, $t_{ij}$ is the time since baseline, $\mathrm{treatment}_i$ is 0 (control) or 1 (treatment), and $p_i$ is the predicted progression rate. $\beta_0$ is the global intercept, $\beta_1$ is the slope over time with a random effect per patient $b_i$, $\beta_2$ is the coefficient measuring the treatment effect, and $\beta_3$ and $\beta_4$ are coefficients for the predicted slope. The standard error of $\beta_2$ was used to compare models for their statistical power to detect treatment effects, based on $SE_{alt}$, the standard error of $\beta_2$ in the above model, and $SE_{null}$, the standard error in a reduced model lacking all terms involving the predictive information $p_i$. To account for each patient's random assignment to a placebo or treatment group in the mixed-effects models, we applied each model in 10,000 permutations per predictive method and cohort. The reported 95% CIs are Monte Carlo CIs, i.e. the 2.5% and 97.5% quantiles across these permutations.
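How a ratio of standard errors translates into a trial size saving can be sketched as follows. The conversion used here is an assumption: it rests on the standard error of an effect estimate scaling with one over the square root of the number of patients, so that a model with a smaller standard error reaches the same precision with a fraction (SE_alt / SE_null)² of the patients. The function names and the linear-interpolation quantile are illustrative choices, not the authors' implementation.

```python
import math


def trial_size_saving(se_alt, se_null):
    """Assumed saving: 1 - (SE_alt / SE_null)^2, from SE proportional to 1/sqrt(n)."""
    if se_null <= 0:
        raise ValueError("SE_null must be positive")
    return 1.0 - (se_alt / se_null) ** 2


def monte_carlo_ci(values, lower=0.025, upper=0.975):
    """Quantile-based interval across permutations (linear interpolation)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values supplied")

    def quantile(q):
        pos = q * (len(ordered) - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    return quantile(lower), quantile(upper)


# Hypothetical standard errors from one random treatment assignment.
saving = trial_size_saving(se_alt=0.8, se_null=1.0)
# Hypothetical savings across permutations (here simply 0, 1, ..., 100 percent).
ci_low, ci_high = monte_carlo_ci(list(range(101)))
```

In practice each permutation would contribute one saving, computed from the two fitted models, and the interval would be taken across all 10,000 of them.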

Patient characteristics
In the observational study, 46 patients were eligible for model development. ALS was diagnosed at median [...]. Table 1 displays the patient characteristics of the three cohorts.

NfL and its interaction with site of disease onset enable robust predictions
Multivariate regression, including all candidate predictors in the development cohort, showed that the ALSFRS-R slopes were significantly correlated with the ln(NfL) values (P < 0.001) and their interaction with site of onset (P < 0.01). Higher ln(NfL) levels were indicative of faster disease progression, and the interaction with site of onset resulted in a greater change of the ALSFRS-R slope per change of one NfL log unit in bulbar-onset patients compared to spinal-onset patients (Fig. 3). The candidate predictors sex, age, body mass index, disease duration, ΔFRS, and ALSFRS-R score at diagnosis did not add significantly to the prediction and hence were not included in the final model. By testing all possible compositions of predictors in multivariate linear regression models, we found that the models including ln(NfL) always outperformed corresponding models without ln(NfL). The final NfL model is linear in ln(NfL), the site of onset S (S = 1 for spinal, S = 0 for bulbar), and their interaction. Applying this model, an NfL value of 100 pg/ml results in an ALSFRS-R slope of −0.75 pt/m; at this value, patients with a spinal disease onset and those with a bulbar disease onset would have an identical ALSFRS-R slope. The model predicts a ±0.5 pt/m change of progression rate by an NfL level change of ±1.67 logarithmic units for patients with spinal onset and ±0.44 logarithmic units for patients with bulbar onset, respectively.
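The properties quoted above (identical slopes of −0.75 pt/m at 100 pg/ml, and a 0.5 pt/m change per 1.67 or 0.44 log units) are enough to back-calculate an approximate prediction line for each onset site. The sketch below does this. Its outputs are approximations derived from rounded figures in the text, not the published model coefficients, and the names are hypothetical.

```python
import math

# Figures quoted in the text.
CROSSING_NFL_PG_ML = 100.0      # NfL level at which both onset sites coincide
CROSSING_SLOPE = -0.75          # ALSFRS-R slope (pt/m) at that level
LOG_UNITS_PER_HALF_POINT = {"spinal": 1.67, "bulbar": 0.44}


def implied_line(site):
    """Return (intercept, gradient) of slope = intercept + gradient * ln(NfL)."""
    # Higher NfL means faster decline, so the gradient is negative.
    gradient = -0.5 / LOG_UNITS_PER_HALF_POINT[site]
    intercept = CROSSING_SLOPE - gradient * math.log(CROSSING_NFL_PG_ML)
    return intercept, gradient


def predicted_slope(nfl_pg_ml, site):
    """Approximate predicted ALSFRS-R slope (pt/m) for an NfL level and onset site."""
    intercept, gradient = implied_line(site)
    return intercept + gradient * math.log(nfl_pg_ml)


bulbar_intercept, bulbar_gradient = implied_line("bulbar")
spinal_intercept, spinal_gradient = implied_line("spinal")
```

The back-calculated gradient is roughly −0.30 pt/m per log unit for spinal onset and roughly −1.14 pt/m per log unit for bulbar onset, which makes the stronger NfL dependence in bulbar-onset patients explicit.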
Internal validation of the model showed a correlation of R = 0.67 between predicted and measured ALSFRS-R slopes. We verified the model, applying the same development process in the validation cohorts; this led us to the same significant predictors showing similarly high correlations (R = 0.65 and 0.62). Including the site of onset significantly improved model performance compared to using ln(NfL) as the only predictor in each cohort.

NfL levels are stable over time
To assess the temporal stability of ln(NfL) measurements for each patient, we visualized the trajectories for all three cohorts (Fig. 4). Measurements from different patients ranged over multiple orders of magnitude, while measurements from the same patient showed comparatively small variation. Importantly, these variations were small compared to the patient's average ln(NfL) value, with mean relative deviations (± SD) in the DC/V1/V2 cohorts of 2.5% (± 3.7%), 3.0% (± 3.8%), and 4.8% (± 7.2%).

The NfL prediction model is transferable to actual clinical trials and increases statistical power
As illustrated in Fig. 5a, the model could predict the correct ALSFRS-R slope with less than 0.5 pt/m error for 72%, 59%, and 89% of patients in the DC/V1/V2 cohorts, respectively. Furthermore, absolute deviations were randomly scattered around zero, indicating that the NfL model can be applied equally for patients with low and high progression rates. Importantly, we also did not observe a systematic pattern of deviation concerning disease duration (Fig. 5b). Figure 6 compares the measured ALSFRS-R slopes to the predictions computed with the NfL model, the ΔFRS method, and the lead-in period. The NfL model added valuable information in both validation cohorts, as indicated by the lowest RMSE, positive CoefD values, and a considerable decrease in slope variances. In contrast, we observed negative CoefD, high RMSE, and increased variance for the lead-in approach. Using ΔFRS-based predictions, the CoefD values and variance change remained close to zero.

How to apply the model in clinical trials to improve study power
The prediction model could be applied to compensate for heterogeneity of disease progression in a clinical trial with ALSFRS-R as the outcome parameter, using the following steps:
1. Assess the site of disease onset (bulbar or spinal) and measure NfL in serum or plasma at study baseline. (If the measurements are performed with a different assay or if different pre-processing is used, a conversion factor may have to be established.)
2. Insert the parameters into the prediction formula we developed (S = 1 for spinal, S = 0 for bulbar) to compute the predicted ALSFRS-R slope for each participant of the study.
3. For each participant, compute the ALSFRS-R slope actually observed throughout the study.
4. For each participant, compute the difference between the ALSFRS-R slope observed throughout the study and the ALSFRS-R slope predicted using the formula.
5. Compare the mean or median difference between observed and predicted ALSFRS-R slope in the placebo and the active treatment groups.
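The steps above can be sketched in a few lines. Everything here is illustrative: the participant records are made up, the function names are hypothetical, and the coefficients are placeholders back-calculated from rounded figures quoted in the results (identical slopes of −0.75 pt/m at 100 pg/ml; 0.5 pt/m per 1.67 or 0.44 log units). They are not the published coefficients and should be replaced by those of the actual model.

```python
import math
from statistics import mean

# Placeholder coefficients (approximations for illustration only).
ILLUSTRATIVE_COEF = {"intercept": 4.48, "ln_nfl": -1.14, "site": -3.85, "interaction": 0.84}


def predict_slope(nfl_pg_ml, site, coef):
    """Step 2: predicted ALSFRS-R slope (pt/m); S = 1 for spinal, S = 0 for bulbar."""
    s = 1 if site == "spinal" else 0
    x = math.log(nfl_pg_ml)
    return coef["intercept"] + coef["ln_nfl"] * x + coef["site"] * s + coef["interaction"] * x * s


def observed_slope(times_months, scores):
    """Step 3: least-squares slope of the ALSFRS-R scores observed during the study."""
    n = len(times_months)
    t_mean = sum(times_months) / n
    s_mean = sum(scores) / n
    sxx = sum((t - t_mean) ** 2 for t in times_months)
    sxy = sum((t - t_mean) * (s - s_mean) for t, s in zip(times_months, scores))
    return sxy / sxx


def mean_difference_by_arm(participants, coef):
    """Steps 4 and 5: mean of (observed - predicted) slope per study arm."""
    by_arm = {}
    for person in participants:
        predicted = predict_slope(person["nfl"], person["site"], coef)
        observed = observed_slope(person["times"], person["scores"])
        by_arm.setdefault(person["arm"], []).append(observed - predicted)
    return {arm: mean(diffs) for arm, diffs in by_arm.items()}


# Two hypothetical participants with identical baseline NfL and onset site.
participants = [
    {"arm": "placebo", "nfl": 100.0, "site": "spinal",
     "times": [0, 3, 6], "scores": [40.0, 37.75, 35.5]},
    {"arm": "treatment", "nfl": 100.0, "site": "spinal",
     "times": [0, 3, 6], "scores": [40.0, 39.0, 38.0]},
]
result = mean_difference_by_arm(participants, ILLUSTRATIVE_COEF)
```

A positive gap between the treatment and placebo values means that treated participants declined more slowly than their baseline NfL level and onset site would suggest. A real analysis would compare the two distributions with an appropriate statistical test rather than inspect two means.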
As an alternative to steps 4 and 5, the linear mixed model described in the methods can be implemented and the coefficient $\beta_2$ (interaction of treatment effect and time slope) tested for significance.
In a conventional study, only the mean or median progression rate of the active treatment and placebo groups can be compared, which are strongly dependent on the [...]. Using the prediction model, it is possible to anticipate the natural course of the disease and include it in the analysis; this significantly increases the study's statistical power.

Discussion
Interventional trials in ALS suffer from heterogeneity of the disease, which considerably reduces the statistical power of the trials [8][9][10]. Predictive models that take into account disease heterogeneity are considered a promising tool to meet this challenge but have hardly been established to date [17][18][19][20][21]. In this study, we developed an NfL-based prediction model for monthly decrease of ALSFRS-R score and demonstrated its applicability and impact on trial power.
We showed that adding NfL levels to multivariate predictive models significantly and consistently increased their predictive performance compared to models restricted to clinical parameters. Küffner et al. [18] have tested a combination of complex prediction models for ALSFRS-R slopes that incorporated hundreds of clinical variables and basic laboratory parameters, but not neurochemical biomarkers. The predictive quality of this approach is similar to the accuracy provided by blood NfL as a standalone marker [24][25][26][27][28]. This, together with our findings, highlights the predictive value of NfL, especially in comparison to clinical parameters. Interestingly, we found that the NfL-based predictions could be improved by a logarithmic transformation of NfL levels and by including the interaction between NfL levels and the site of disease onset. Our results are consistent with a recent finding that NfL could be used to increase the accuracy of a model predicting progression rates [20]. In addition, we further confirmed the finding of previous studies that blood NfL levels measured longitudinally are steady on a patient level in a timeframe most relevant for interventional trials [20,24,25,26,36]. In summary, these results show that the NfL-based predictive models meet the basic requirements for application in clinical studies.
Importantly, we demonstrated that the NfL-based prediction model is transferable to new datasets and that the predictions do not systematically depend on the progression rate or disease duration. Using mixed-effects models to simulate randomization and treatment effects of an actual trial, we observed that the prediction model could significantly increase the statistical power by up to 61%. As far as we know, only two previous studies have quantified the impact of prediction models for ALS disease progression on the statistical power. Küffner et al. [18] reported a 20% increase of trial power using the above-mentioned combination of complex models without neurochemical biomarkers. In contrast, a recently published NfL-based prediction model only yielded a modest increase of 8% in the trial power [20]. However, this model was developed incorporating a relatively large proportion of patients in later disease stages and thus tended to underrepresent patients with faster disease progression [20]. By developing our NfL model in patients at the time of diagnosis, we aimed to base the model on the broadest possible progression spectrum. We suppose that this had a crucial effect, although it also reduced the number of evaluable patients. In summary, our results can serve as a proof of concept for the use of NfL-based prediction models to target the challenge of disease heterogeneity in interventional trials.
Ultimately, the application of the NfL-based model is only useful if it is advantageous over the strategies currently used. By analyzing the predictive value of our NfL model in direct comparison to the ΔFRS and lead-in period approaches used in recent trials, we observed superiority of the NfL-based approach [30][31][32][37][38][39]. Surprisingly, we also found that a lead-in period was not a reliable method to anticipate disease progression in our cohorts. Although there could be other reasons to conduct a lead-in period, predicting the ALSFRS-R course does not seem to justify the delay in starting the intervention and the costs incurred [40]. Replacing a lead-in period by a single NfL measurement would result in a shorter interval from disease onset to therapy, potentially being favorable to detect a drug's efficacy because of less advanced motor neuron degeneration [9].
Our study has some limitations. Although the model shows a reliable test performance at the group level, we observed clinically relevant deviations in some patients, making the model less suitable for individual predictions. The predictive accuracy might be further improved by incorporating respiratory parameters and genetic status or by adjusting for confounders of NfL levels such as age and related morphologic brain changes [41], renal function, and blood volume [41][42][43]; however, the data available for our cohorts did not allow further investigations. Methods to reduce the noise of the ALSFRS-R itself during assessment might also enhance the development of prediction models [1,44,45,46]. Here, we restricted our analyses to a linear model as it facilitates direct interpretability of the influence of the candidate predictors on the accuracy of the prediction. Non-linear and nonparametric approaches are harder to interpret but might outperform linear models in terms of accuracy [7,47] and hence would be an interesting topic for future studies. An additional topic for future research could be predictive models for survival, which could be used for studies with survival as the primary endpoint or for reasons of stratified randomization and anticipation of dropouts. An exploratory analysis of the utility of NfL for this purpose is provided in Additional file 3. A further limitation is the relatively small sample size in each cohort. Multicenter evaluation in future studies or incorporation of the prediction model in actual phase 3 trials could help gain more insights into the strengths and limits of the model. Due to the limited blood sample availability for model validation, the validation cohorts represent only an excerpt of the original placebo cohorts. In the V2 cohort, this caused a shift to younger patients with a rather slow disease progression and revealed that the NfL model may perform better in cohorts including a broader progression spectrum.
Finally, the blood samples in the V2 cohort were EDTA plasma, while the DC and V1 cohorts provided serum samples. Although the NfL assay is equally usable for serum and EDTA-plasma and quality controls did not reveal deviations exceeding the interassay coefficients of variability, the difference in sample type represents a potential confounder.

Conclusions
Blood NfL is a valuable prognostic biomarker. Using the NfL-based prediction model to compensate for clinical heterogeneity could considerably improve the trial power and help distinguish treatment effects from the inter-individual variance of disease progression in future randomized controlled trials. The complementary implementation of NfL-based prediction models in ALS trials could provide further insights into the possible applications.