Enhancing Parkinson's disease detection through feature-based deep learning with autoencoders and neural networks



The proposed method was implemented in the Jupyter Notebook integrated development environment (IDE). The system configuration consists of a Windows 10 machine with 16 GB of RAM, a Core i5 processor, and a 1 TB hard disk drive, offering ample storage capacity. This setup is well suited to a range of tasks, including data analysis and ML applications. To support these tasks, a selection of essential Python packages was installed: sklearn for ML, pandas for data manipulation, seaborn for data visualization, numpy for numerical calculations, scipy for scientific computing, Python's standard library, and Keras for DL. This combination of hardware and software provides a robust environment for data-driven work, from data preprocessing and analysis to model development and evaluation, making it a versatile setup for data scientists and ML practitioners. The experimental environment supplied the computational resources needed to develop, test, and evaluate the proposed method. Python, with its extensive libraries and packages, offers a versatile platform for ML and data analysis, making it a suitable choice for conducting this research. The Jupyter Notebook IDE, with its interactive, document-oriented interface, facilitates code development, experimentation, and result visualization, improving the efficiency of the research process. The hardware configuration ensured that the experiments ran on a system capable of handling the computational demands of the proposed method.

Loading the dataset from the IPVS Dataset, which contains audio recordings of both patients with PD and healthy individuals, is the crucial first step in applying ML to the classification or analysis of PD. The audio recordings encapsulate vocal characteristics and patterns that can be indicative of the disease's presence or severity. Before any meaningful analysis or modeling can take place, however, the data must be preprocessed, a critical phase that includes techniques such as the Modified Band Pass Filter. Although this study does not focus on the challenges of recording audio data, several measures were taken to mitigate noise, inconsistencies in microphone quality, and variations in patient behavior during recording. The FB-DNN model described in this paper applies configurable noise-reduction filters and normalization as part of the data preprocessing step, so that the extracted features reflect disease-related characteristics rather than recording artefacts. Furthermore, the corpus used in this research was built from recordings that followed a controlled protocol to limit variance arising from differences in microphone quality and recording environments. Patients were given recording instructions, and an attempt was made to regulate speaking rate, loudness, and clarity across participants. Finally, the model relies on robust feature extraction, for instance dynamic filters and feature normalization, and can therefore tolerate minor inconsistencies in audio quality. Figure 5 shows the input audio.
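As a minimal illustration of the loading step (the file name and directory layout below are hypothetical, since the exact IPVS file structure is not specified here), a recording can be read with scipy, which is part of the software stack listed above:

```python
import numpy as np
from scipy.io import wavfile

def load_recording(path):
    """Load a WAV recording and return the sample rate and a peak-normalized mono signal."""
    rate, signal = wavfile.read(path)           # rate in Hz, signal as an integer/float array
    signal = signal.astype(np.float32)
    if signal.ndim > 1:                          # collapse stereo channels to mono if needed
        signal = signal.mean(axis=1)
    signal /= (np.max(np.abs(signal)) + 1e-9)    # normalize amplitude to [-1, 1]
    return rate, signal

# hypothetical file name; the actual IPVS recording names may differ
rate, signal = load_recording("ipvs/patient_01_vowel_a.wav")
```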

Fig. 5

Input audio.

The Modified Band Pass Filter is a signal processing method used to enhance specific frequency components in audio signals. In the context of PD analysis, it can be employed to isolate relevant vocal features and suppress noise. Applying this filter to the raw audio attenuates or eliminates unwanted frequency components outside a predefined range. This improves the signal-to-noise ratio, making it easier to identify patterns and characteristics in the vocal recordings that are associated with the disease. Overall, loading audio data from the IPVS Dataset and preprocessing it with the Modified Band Pass Filter sets the stage for further analysis and modeling: it yields pertinent audio features that can subsequently be fed to ML algorithms to classify or diagnose PD. This multi-step approach demonstrates the value of combining data acquisition from specialized sources with signal processing techniques to drive advances in healthcare and medical diagnostics. Figures 6 and 7 show the preprocessed data and various features of the dataset.
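A minimal sketch of a band-pass filtering step, continuing from the loading sketch above and assuming a Butterworth design from scipy.signal; the cut-off frequencies and filter order are illustrative, since the exact pass band of the Modified Band Pass Filter is not specified here:

```python
from scipy.signal import butter, filtfilt

def bandpass_filter(signal, rate, low_hz=75.0, high_hz=3000.0, order=4):
    """Attenuate frequency components outside [low_hz, high_hz]."""
    nyquist = 0.5 * rate
    b, a = butter(order, [low_hz / nyquist, high_hz / nyquist], btype="band")
    return filtfilt(b, a, signal)   # zero-phase filtering avoids introducing phase distortion

filtered = bandpass_filter(signal, rate)
```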

Fig. 6

Preprocessed dataset.

Fig. 7

Various Features of Dataset.

The correlation matrix shown in Fig. 8, computed over 19 features from the IPVS Dataset, offers a comprehensive view of the interrelationships among these variables. The matrix quantifies the degree and direction of the linear association between each pair of features, which is crucial for understanding the dataset's underlying patterns. Each cell holds a correlation coefficient, commonly calculated as Pearson's correlation, indicating the strength and direction of the association between two features. Coefficients close to 1 indicate a strong positive relationship, coefficients close to −1 indicate that one feature tends to increase as the other decreases, and coefficients near zero indicate little or no linear relationship. By analyzing these coefficients, researchers and data analysts can identify interdependencies and patterns within the data. For example, they may find that certain features are highly correlated, which informs feature selection and dimensionality reduction, or they may discover features with low association, suggesting that those variables are largely independent. Such information is useful for later steps such as feature extraction, model selection, and hypothesis development. Altogether, the correlation matrix of these 19 features serves as an efficient initial tool for characterizing the IPVS Dataset and reveals potentially significant pathways for investigating the relationship between PD and voice characteristics. To make the decision process of the Feature-Based Deep Neural Network (FB-DNN) easier to understand and to increase model explainability, SHAP (Shapley Additive Explanations) values were used. SHAP values provide feature-level assessments that show how the FB-DNN detects Parkinsonian vocal impairments. The most important features included jitter, shimmer, and the range of the fundamental frequency, consistent with the clinical understanding of how the disease manifests in patients' speech. A global feature-importance summary was produced, showing that features such as jitter push the model towards classifying a case as Parkinsonian, whereas features such as stable pitch intervals push it away from that classification. In addition, individual force plots were extracted to illustrate the effect of the features on each prediction, demonstrating that the model learned to focus on specific patterns in each case.

Fig. 8

Correlation matrix of the dataset features.
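A short sketch of how such a correlation matrix can be produced with pandas and seaborn, assuming the 19 extracted voice features are held in a DataFrame named features_df (the name and column contents are placeholders):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# features_df: DataFrame holding the 19 voice features (e.g. jitter, shimmer, F0 statistics)
corr = features_df.corr(method="pearson")     # Pearson correlation, as described in the text

plt.figure(figsize=(10, 8))
sns.heatmap(corr, cmap="coolwarm", vmin=-1, vmax=1, annot=False)
plt.title("Correlation matrix of the 19 IPVS features")
plt.tight_layout()
plt.show()
```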

Within the context of our research, we introduced a novel and powerful model named the Feature-Based DNN, or FB-DNN. This approach represents a significant advancement in DL and feature extraction for complex datasets. It revolves around the use of an Autoencoder, a well-known DL architecture that can learn on its own and automatically extract useful characteristics from raw data. Prior to feeding the data into the DNN, we harnessed the Autoencoder to perform feature extraction. This strategic step aimed to enhance the quality and representativeness of the feature set by focusing on the most salient and discriminative attributes within the dataset.
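A minimal Keras sketch of autoencoder-based feature extraction of the kind described above; the layer widths, bottleneck size, training epochs, and the X_train array are illustrative assumptions, not the exact FB-DNN configuration:

```python
from keras.models import Model
from keras.layers import Input, Dense

n_features = X_train.shape[1]                      # e.g. the 19 voice features

# Stacked autoencoder: the encoder compresses the input, the decoder reconstructs it
inp = Input(shape=(n_features,))
encoded = Dense(16, activation="relu")(inp)
encoded = Dense(8, activation="relu")(encoded)     # bottleneck representation
decoded = Dense(16, activation="relu")(encoded)
decoded = Dense(n_features, activation="linear")(decoded)

autoencoder = Model(inp, decoded)
encoder = Model(inp, encoded)

autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_train, X_train, epochs=100, batch_size=32, verbose=0)

# The encoded features are what the downstream DNN classifier receives
X_train_encoded = encoder.predict(X_train)
```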

Fig. 9

Accuracy vs. Epoch.

In pursuit of model refinement and optimization, we carried out an extensive training process spanning 1000 epochs. With more time to refine its internal representations, our FB-DNN was better able to pick up on subtleties and nuances in the training data. This extended training period was crucial for the model to reach convergence, where its predictive accuracy could be maximized. Throughout this training regimen, we tracked and documented the model's performance, yielding two key visualization plots, “Accuracy vs. Epoch” and “Loss vs. Epoch”, shown in Figs. 9 and 10. These plots served as indispensable tools for monitoring and assessing the model's progress over the course of training. The “Accuracy vs. Epoch” plot provides a clear picture of how the model's classification accuracy evolved as the epochs advanced, while the “Loss vs. Epoch” plot depicts the trajectory of the loss function over time, offering insight into the model's ability to minimize prediction errors. Our research combines Autoencoder-driven feature extraction with a DNN, culminating in the FB-DNN model. The model's effectiveness and dynamics became clearer after 1000 epochs of training and careful examination of the accuracy and loss curves. These findings highlight the opportunity for improved feature-based classification tasks and pave the way for further developments in DL and artificial intelligence.

Fig. 10

Loss vs. Epoch.
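Such accuracy and loss curves can be produced directly from the Keras training history. The sketch below assumes that model is the compiled FB-DNN classifier and that encoded training/validation arrays are available; it is illustrative rather than the authors' exact plotting code (older Keras versions report "acc"/"val_acc" instead of "accuracy"/"val_accuracy"):

```python
import matplotlib.pyplot as plt

history = model.fit(X_train_encoded, y_train,
                    validation_data=(X_val_encoded, y_val),
                    epochs=1000, batch_size=32, verbose=0)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(history.history["accuracy"], label="train")
ax1.plot(history.history["val_accuracy"], label="validation")
ax1.set_xlabel("Epoch"); ax1.set_ylabel("Accuracy"); ax1.legend()

ax2.plot(history.history["loss"], label="train")
ax2.plot(history.history["val_loss"], label="validation")
ax2.set_xlabel("Epoch"); ax2.set_ylabel("Loss"); ax2.legend()
plt.tight_layout()
plt.show()
```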

To control overfitting during DNN training (performed over 1000 epochs) and improve the model's ability to generalize to unseen data, several regularization and validation techniques were applied. A dropout rate of 0.3 was applied to the hidden layers, randomly deactivating neurons during each update. Weight regularization was implemented by adding a small L2 penalty term (lambda = 0.001) to the loss function to discourage large weights. Batch Normalization was applied after each hidden layer to stabilize learning, increase the rate of convergence, and further reduce overfitting. Training was monitored against a validation set and stopped early when the validation loss failed to improve for 50 epochs. During training, the model was validated on a separate dataset held out specifically for this purpose, and the validation loss and accuracy were tracked to analyze the trends across the training and validation sets.
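A sketch of how these regularization settings (dropout 0.3, L2 penalty with lambda = 0.001, Batch Normalization, and early stopping with 50-epoch patience) can be combined in Keras; the layer widths and the n_encoded input dimension are illustrative assumptions rather than the published architecture:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization
from keras.regularizers import l2
from keras.callbacks import EarlyStopping

model = Sequential([
    Dense(64, activation="relu", kernel_regularizer=l2(0.001), input_shape=(n_encoded,)),
    BatchNormalization(),
    Dropout(0.3),                      # randomly deactivates 30% of neurons per update
    Dense(32, activation="relu", kernel_regularizer=l2(0.001)),
    BatchNormalization(),
    Dropout(0.3),
    Dense(1, activation="sigmoid"),    # binary PD / healthy output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# stop when validation loss has not improved for 50 epochs, keeping the best weights
early_stop = EarlyStopping(monitor="val_loss", patience=50, restore_best_weights=True)
model.fit(X_train_encoded, y_train,
          validation_data=(X_val_encoded, y_val),
          epochs=1000, batch_size=32, callbacks=[early_stop], verbose=0)
```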

Performance Evaluation.

In the context of assessing the performance of a diagnostic or classification model for PD, several key metrics play a crucial role in evaluating its effectiveness. Measures such as these shed light on how well the model can distinguish between people with and without PD. Here is a detailed description of these metrics:

Classification models, such as those used to categorize cases of PD, can be evaluated with a confusion matrix, as illustrated in Table 1. It helps evaluate the accuracy of the model by comparing its predictions with the actual class labels. The confusion matrices of the various models are displayed in Fig. 11. PD classification is a binary classification problem, and the confusion matrix for such a task has four values:

True Positives (TrPos): These are instances where the model’s diagnosis of PD was spot-on.

True Negatives (TrNeg): Such instances represent successes for the model in ruling out PD.

False Positives (FaPos): These are Type I errors, in which the model mistakenly predicted that a patient had PD.

False Negatives (FaNeg): These are Type II errors, in which the model wrongly indicated that a patient did not have PD.

Table 1 Confusion Matrix.
Fig. 11

Confusion Matrix of Various Models.
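A sketch of how these four counts can be obtained with sklearn, assuming y_test holds the true labels and y_pred the FB-DNN's predicted labels (1 = PD, 0 = healthy); the variable names are assumptions for illustration:

```python
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
# with labels=[0, 1], the matrix is laid out as:
#              predicted 0   predicted 1
# actual 0      TrNeg          FaPos
# actual 1      FaNeg          TrPos
tr_neg, fa_pos, fa_neg, tr_pos = cm.ravel()
print(f"TrPos={tr_pos}, TrNeg={tr_neg}, FaPos={fa_pos}, FaNeg={fa_neg}")
```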

Accuracy:

For categorization tasks such as PD diagnosis, accuracy is a simple and widely employed statistic. It measures the percentage of cases (positive and negative) that were correctly classified. A high accuracy indicates that the model's predictions are largely correct.

$$Accuracy=\frac{TrPos + TrNeg}{TrPos + TrNeg + FaPos + FaNeg}$$

(12)

Precision:

Precision, additionally referred to as positive predictive value, measures the accuracy with which a model makes positive predictions. It indicates how precise the model is when it predicts an individual has PD. High precision implies that the model has fewer false-positive predictions.

$$Precision=\frac{TrPos}{TrPos + FaPos}$$

(13)

Recall/Sensitivity:

The recall metric assesses how well a model identifies PD patients among all actual positive cases. A high recall means the model rarely makes false-negative predictions, that is, it rarely concludes that an individual does not have Parkinson's when they in fact do.

$$Recall=\frac{TrPos}{TrPos + FaNeg}$$

(14)

Specificity:

Specificity measures how well the model identifies true negatives, that is, how well it avoids false positives (wrongly diagnosing someone with Parkinson's when they do not have the disease).

$$Specificity=\frac{TrNeg}{TrNeg + FaPos}$$

(15)

F1 Score:

The F1 Score is the harmonic mean of precision and recall. It is helpful when working with imbalanced datasets, where one class greatly outnumbers the other, because it balances these two measurements: a high F1 Score requires both precision and recall to be high.

$$F1\ Score=2\times\frac{Precision \times Recall}{Precision + Recall}$$

(16)

Each of these metrics serves a specific purpose in evaluating a PD classification model. Precision and recall capture the trade-off between false positives and false negatives, while accuracy offers an overall picture of correctness. Specificity evaluates the model's ability to correctly identify healthy individuals, and the F1 Score provides a balanced combination of precision and recall that is useful in clinical contexts. Together, these indicators provide a thorough evaluation of the model's diagnostic efficacy and guide its ongoing development and improvement.
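All five metrics follow directly from the confusion-matrix counts; a small sketch reusing the counts extracted in the earlier confusion-matrix snippet:

```python
accuracy    = (tr_pos + tr_neg) / (tr_pos + tr_neg + fa_pos + fa_neg)
precision   = tr_pos / (tr_pos + fa_pos)          # positive predictive value
recall      = tr_pos / (tr_pos + fa_neg)          # sensitivity
specificity = tr_neg / (tr_neg + fa_pos)
f1_score    = 2 * precision * recall / (precision + recall)

print(f"Accuracy={accuracy:.4f}, Precision={precision:.4f}, "
      f"Recall={recall:.4f}, Specificity={specificity:.4f}, F1={f1_score:.4f}")
```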

Table 2 Performance evaluation of various models.

Table 2 offers a comprehensive comparison of five distinct ML models: XG Boost35, Decision Tree, Neural Network (NN)36, DNN, and FB-DNN, with respect to their performance in classifying individuals with PD. Accuracy, Precision, Recall/Sensitivity, Specificity, and F1 Score are the crucial performance indicators used in the analysis. These measures are key markers of the models' efficacy, revealing their strengths and weaknesses in detecting PD. FB-DNN achieves the highest AUC-ROC score of 95.60%, reflecting its strong ability to distinguish between positive and negative cases across various threshold values.

Fig. 12

Accuracy Comparison of Various Models.

Accuracy shown in Fig. 12, the first metric, quantifies the overall correctness of the models’ predictions. Among the models, FB-DNN emerges as the frontrunner with an impressive accuracy of 96.15%. This signifies that FB-DNN excels in making accurate predictions, showcasing its robustness and reliability in the context of PD classification. DNN and NN also exhibit commendable accuracy rates, standing at 91.43% and 88.18%, respectively, demonstrating their efficacy in this task. On the other hand, XG Boost and Decision Tree achieve slightly lower accuracy scores of 79.63% and 85.45%, respectively, indicating that they are relatively less accurate in comparison. The second parameter, depicted in Fig. 13, is the model’s precision, or its ability to correctly anticipate favorable outcomes. Here, FB-DNN stands out with the highest precision score of 98.00%. This implies that when FB-DNN predicts a patient as having PD, it is highly likely to be accurate. DNN, NN, and Decision Tree also exhibit commendable precision scores, showcasing their reliability in identifying true positive cases. XG Boost, although not the highest, still maintains a respectable precision score of 86.21%.

Fig. 13

Precision Comparison of Various Models.

Recall/Sensitivity shown in Fig. 14, the third metric, assesses the models’ capability to correctly identify positive cases from all actual positive instances. In this regard, FB-DNN and NN both achieve high recall rates of 98.00% and 92.71%, respectively. This implies that these models excel in capturing a substantial portion of true positive cases, which is vital in medical diagnoses. Decision Tree also performs well, highlighting its ability to effectively identify positive cases. XG Boost and DNN exhibit decent recall rates of 88.24% and 94.90%, respectively, showcasing their proficiency. Specificity shown in Fig. 15, the fourth metric, measures the models’ ability to correctly identify negative cases from all actual negative instances. In this aspect, FB-DNN and Decision Tree display comparable scores, reflecting their capacity to avoid false-positive predictions in the context of individuals without PD. XG Boost as well as NN, however, show relatively lower specificity scores, indicating a higher likelihood of false positives in these models.

Fig. 14

Sensitivity Comparison of Various Models.

A model's ability to minimize both false positives and false negatives is measured by the F1 Score, which combines precision and recall and is depicted in Fig. 16. FB-DNN excels with an F1 Score of 98.00%, showcasing its harmonious balance between precision and recall. This is particularly noteworthy in a medical context where the consequences of false positives or false negatives can be significant. DNN and NN also exhibit impressive F1 Scores of 95.39% and 93.20%, respectively, underlining their proficiency. Decision Tree and XG Boost, while achieving respectable F1 Scores of 91.41% and 87.22%, respectively, may involve slight trade-offs between precision and recall.

Fig. 15

Specificity Comparison of Various Models.

Fig. 16

F1 Score Comparison of Various Models.

In conclusion, the table provides a comprehensive overview of the effectiveness of five different ML models in the critical task of classifying PD. The FB-DNN stands out as the top performer across multiple metrics, particularly accuracy, precision, recall, and F1 Score. These results underscore the significant potential of the FB-DNN model in the diagnosis of PD, making it a promising candidate for clinical applications and further research in the field. While the other models also demonstrate strong performance, they may involve trade-offs between precision and recall, emphasizing the importance of selecting the most suitable model for specific clinical or diagnostic requirements. Overall, this evaluation illuminates the strengths and limitations of each model, offering valuable insights into their respective contributions to PD classification and guiding decisions for future research and clinical applications in this domain.

Table 3 Performance comparison of features derived from different techniques.

Table 3 compares the performance of each feature set and shows that SAE-extracted features outperform both raw audio features and features extracted through PCA on every metric. The nonlinear and hierarchical relationships between features captured by the SAE enrich the representation and improve classification performance. These results validate the SAE's efficiency and applicability in extracting features relevant to the Parkinson's disease detection task. The usefulness of the SAE-extracted features is further demonstrated by the strong performance of the FB-DNN model. The proposed Feature-Based Deep Neural Network reaches an accuracy of 96.15%, which has direct practical value for detecting and screening Parkinson's disease in clinical settings. This level of accuracy supports the model's applicability in medical diagnostics because it reduces the number of false-positive and false-negative results while recognizing Parkinsonian features. The model therefore enables early and accurate diagnosis of Parkinson's disease, so that effective prevention and treatment measures can be implemented, potentially slowing disease progression and improving patients' quality of life.

Although base models such as SVM and KNN perform well on some classification problems, selecting a DNN in this research is reasonable because DNNs handle large datasets and complex relationships well. In response to the perceived lack of comparison, a performance comparison was made between the DNN, SVM, and KNN using the same data. This study employs Feature-Based Deep Neural Networks (FB-DNN) alongside other algorithms, including Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Random Forest, and Principal Component Analysis (PCA), for detecting Parkinson's disease (PD). SVM is a powerful algorithm that identifies the best separating hyperplane for a dataset; it is effective on small datasets and when the classes are close to linearly separable, but it scales poorly to large and complex datasets. K-Nearest Neighbors (KNN) is a simple yet effective method that classifies a data point according to its nearest neighbours, but it performs no explicit learning. Random Forest is an ensemble method based on decision trees, in which the results of multiple trees are combined to increase accuracy and cope with imbalanced data, yet it depends heavily on feature engineering. Principal Component Analysis, a well-known dimensionality reduction method, condenses the feature space by projecting the data onto orthogonal components that retain the most variance; however, PCA is restricted to linear transformations and cannot model the nonlinear patterns present in PD-related audio data. Deep Neural Networks (DNN), by contrast, are suited to high-dimensional and complex data because they can model nonlinear relations and learn hierarchical features. The proposed FB-DNN improves on this capability by including a Stacked Autoencoder (SAE) as a feature-extraction stage. The SAE is used to extract semantically salient representations from the raw audio data, discarding irrelevant dimensions and keeping only those informative for the disease. These features are then classified by the DNN, which has proven particularly useful in detecting the fine and complex voice characteristics associated with Parkinson's disease. Compared with conventional methods, FB-DNN obtains better results by integrating the SAE and DNN, making it one of the more advanced methods for PD detection. A baseline comparison of this kind is sketched below.
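A sketch of such a baseline comparison with sklearn, assuming the same SAE-derived feature matrices and labels are reused for every model; the hyperparameters shown are common defaults, not the tuned values reported in the tables:

```python
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

baselines = {
    "SVM": SVC(kernel="rbf", probability=True),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

for name, clf in baselines.items():
    clf.fit(X_train_encoded, y_train)
    y_pred = clf.predict(X_test_encoded)
    y_prob = clf.predict_proba(X_test_encoded)[:, 1]
    print(f"{name}: ACC={accuracy_score(y_test, y_pred):.4f}, "
          f"F1={f1_score(y_test, y_pred):.4f}, "
          f"AUC={roc_auc_score(y_test, y_prob):.4f}")
```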

Other architectures, including SVMs, KNN, Random Forests, and plain DNNs, were not adopted for the proposed model because they cannot manage the non-linearity and complexity of the data on their own. KNN and SVMs, though useful with a small number of features, lack the capacity to handle raw audio with complex temporal and frequency signatures. Random Forests cope adequately with imbalanced data but are not designed to discover hierarchical or non-linear patterns, which is essential here for identifying subtle Parkinsonian vocal characteristics.

Other common deep learning architectures such as CNNs, RNNs, and transformer models were not implemented in this work because they do not address certain requirements of PD diagnosis from audio data. CNNs are well suited to spatial data such as images but are not optimal for capturing the temporal and frequency-domain patterns inherent in raw audio; the additional preprocessing CNNs would require, such as conversion to spectrograms, risks discarding disease-specific vocal features. RNNs and variants such as LSTMs are designed for modelling sequential structure in data. Although useful for time-series tasks, their training is cumbersome, prone to vanishing gradients, and less suitable for the dataset sizes available in this study, and they do not provide the hierarchical feature extraction that is the main advantage here. Transformer-based models are relatively new and, despite their success in many domains, are computationally demanding and require large amounts of data to avoid overfitting. Given the limitations of the dataset and the goal of fast feature extraction and classification, transformers were not necessary in this context. Because it combines a feature-extraction layer in the form of a Stacked Autoencoder (SAE) with a classifier in the form of a Deep Neural Network (DNN), the FB-DNN was deemed the better option. Its ability to analyse the raw audio directly, yielding nonlinear and hierarchical features, also fits the requirements for detecting the disease patterns of Parkinson's disease.

Table 4 Performance comparison of DNN vs. Base models.

Table 4 shows the effectiveness of the models: the basic SVM and KNN models achieve reasonable AUC, accuracy, and F1 scores, yet the DNN model outperforms them on all of these metrics. This improvement can be ascribed to the DNN's ability to extract hierarchical and nonlinear features from the dataset, which is essential for identifying subtle characteristics in Parkinson's disease (PD) audio data. In addition, the Feature-Based Deep Neural Network (FB-DNN) performs considerably better than both the simpler models and the baseline DNN, showing that more sophisticated architectures are necessary to reach high accuracy and stability. These results underscore the need for methods such as the DNN or FB-DNN when more complicated patterns must be analyzed. Although simpler models may be adequate for some applications, they do not perform well in this particular setting and yield lower values on the metrics.

Table 5 Regularization techniques and their impact.

From Table 5, it is clear that the regularization techniques improved validation accuracy and narrowed the gap between training and validation accuracy, indicating that the model generalizes well. Together with the complementary evaluation on unseen data, these measures are sufficient to maintain the model's reliability and to prevent overfitting, the major concern when considering the model for real-world application.

Although the present work reports results based on the Italian Parkinson's Voice and Speech (IPVS) dataset, which provided substantial evidence of the effectiveness of the proposed model, it is important to assess the model's performance on larger and more diverse datasets for better generalization. Cross-validation was used during model training to examine overfitting. A 5-fold cross-validation strategy was adopted: the data were partitioned into five subsets, and each subset was used in turn for validation while the other four were used for training. This avoided dependence on a single train-test split.
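A sketch of the 5-fold procedure using sklearn's StratifiedKFold; stratification, the random seed, and the build_fb_dnn factory function are assumptions for illustration, since the paper only states that five folds were used:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_accuracies = []

for fold, (train_idx, val_idx) in enumerate(skf.split(X, y), start=1):
    model = build_fb_dnn()                      # hypothetical factory returning a fresh FB-DNN
    model.fit(X[train_idx], y[train_idx], epochs=1000, batch_size=32, verbose=0)
    y_pred = (model.predict(X[val_idx]) > 0.5).astype(int).ravel()
    acc = accuracy_score(y[val_idx], y_pred)
    fold_accuracies.append(acc)
    print(f"Fold {fold}: accuracy = {acc:.4f}")

print(f"Mean accuracy: {np.mean(fold_accuracies):.4f}")
```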

Table 6 Cross-validation results.

Table 6 reports the cross-validation of the model's performance across the five folds. The overall accuracy was 95.6%, with individual fold scores ranging between 95.5% and 96.2%. Precision averaged 97.48%, reflecting how well the model avoided false-positive results, while recall was 96.94%, indicating that true positives were well captured by the model. The table also lists precision of 94.46% and recall of 97.89%, with the F1 Score, the harmonic mean of the two, equal to 97.21%. Most notably, the model achieves the highest scores on all metrics in Fold 5, with recall and F1 Score reaching 98%. These results demonstrate the reliability and consistency of the model's classification performance.

Table 7 Ablation study results.

To evaluate the importance of the components of the proposed Feature-Based Deep Neural Network, an ablation study was performed (Table 7) in which individual modules were removed: the Stacked Autoencoder used for feature extraction, the Modified Band Pass Filter used for data preprocessing, and the regularization applied in the DNN. Performance dropped considerably when the SAE was removed; overall accuracy fell from 96.15 to 89.50%, and the F1-score dropped to 90.75%. This shows that the SAE is effective at extracting the hierarchical and nonlinear properties that are fundamental to discriminating minor Parkinsonian vocal abnormalities. Likewise, removing the Modified Band Pass Filter, which was used to isolate relevant frequency bands and enhance signal clarity, reduced accuracy to 87.30%. The regularization techniques, such as dropout and L2 regularization, were also important: removing them reduced accuracy to 85.70%, showing that overfitting must be controlled, especially with datasets such as IPVS where samples are very limited. Using raw audio features without the integrated pipeline produced the lowest accuracy of 83.25%, vindicating the combination of preprocessing and feature extraction. These results support the design decisions made in the FB-DNN and demonstrate how each module strengthens the model's ability to detect Parkinson's disease while also making it more resilient.

In this study, we also employ the Parkinson's Speech with Multiple Types of Sound Recordings Dataset37, including training and test data obtained from the Department of Neurology at Cerrahpasa Faculty of Medicine, Istanbul University. The training sample consists of recordings from 20 PwP (6 female, 14 male) and 20 HC individuals (10 female, 10 male). Sustained vowels, numbers, words, and short sentences were recorded from each participant, yielding 26 voice samples per participant. From these recordings, twenty-six linear and time-frequency based features were extracted, providing a wealth of data for analysis. Moreover, the ANN case study is derived from this dataset, using a cross-entropy loss with labels based on UPDRS expert scores for each PwP to perform the classification and regression analyses.

Table 8 Comparison of FB-DNN performance on two datasets.

Table 8 shows the comparison of FB-DNN performance on the two datasets. The proposed FB-DNN has considerable value in the identification and treatment of Parkinson's disease from a clinical perspective. The model is designed to analyze moment-to-moment changes in fundamental frequency and therefore has great utility in the detection and diagnosis of Parkinson's disease; detecting the disease at an early stage is essential, and the presented model can successfully detect even minor irregularities in voice, such as jitter or shimmer. These vocal biomarkers, which usually manifest early in the disease, are inexpensive and easy to capture, offering an alternative to costly neuroimaging or invasive diagnostics that are often unavailable in many centres. One advantage of the FB-DNN is its high sensitivity to the features characteristic of Parkinsonism, which allows clinicians to identify patients at the initial stage of the disease and prescribe appropriate treatment in a timely manner, possibly slowing its progression. Furthermore, the model's suitability for telemedicine platforms increases its usefulness for remote monitoring and diagnosis where access to specialized hospital clinics is scarce, such as in underprivileged or rural areas.

The developed FB-DNN model employs several approaches to deal with noisy audio data, making it worthwhile to apply in real-world environments. First, the preprocessing pipeline is designed specifically for noise reduction, through methods such as dynamic filtering and signal normalization. These methods help retain only Parkinson's-relevant features from the acoustic signal while minimizing the noise present in it. Second, although the effect of noise is not investigated explicitly in this paper, the training and testing datasets consist of real recordings, which always contain some level of noise. The FB-DNN achieved an accuracy of 97.00%, a precision of 98.50%, and an F1 score of 98.35%, suggesting that the model does not fundamentally rely on the absence of noise in the dataset. To improve noise tolerance in the future, this research will include experiments that deliberately introduce different types and levels of noise to examine how the model performs under those conditions. In addition, data augmentation with artificially noisy samples and adversarial training will be used to enhance the model's generalization in noisy circumstances.
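A sketch of the planned noise-augmentation idea, adding white Gaussian noise at a chosen signal-to-noise ratio; the specific SNR levels are illustrative assumptions, not values reported in the paper:

```python
import numpy as np

def add_noise(signal, snr_db):
    """Add white Gaussian noise to a signal at the requested SNR (in dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# build augmented copies of a recording at several illustrative noise levels
augmented = [add_noise(signal, snr) for snr in (20, 10, 5)]
```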

Table 9 Comparison of computational complexity across different models.

Table 9, which compares the models according to their computational cost, ability to generalize, and interpretability, reveals further differences between the methods. SVM and KNN have lower computational complexity, which makes them suitable for environments where computational resources are scarce or the task is less complicated. However, KNN generalizes poorly and its performance fluctuates widely when tested on large or heterogeneous datasets. Models such as CNN and RNN generalize well and can handle large and diverse data, albeit at the cost of high computational demands; RNN is particularly demanding in this regard because it is inherently sequential. The DNN also carries a high computational cost, accompanied by moderately good generalization and low interpretability, drawing attention to its disadvantage as a black-box model. FB-DNN stands out as a model with moderate computational complexity compared with the other models and a high level of generalization. This balance means that FB-DNN does not have the extreme resource demands of CNN or RNN while still performing well across different datasets. Moreover, FB-DNN offers moderate interpretability, slightly lower than SVM but higher than the plain DNN. This makes FB-DNN a good solution for practical problems, balancing computational cost against performance in real-world settings where scalability matters more than ultra-high precision. Considering the problems faced by the traditional models and the complications avoided by adopting a more advanced architecture, FB-DNN emerges as a solid and highly effective approach.

Challenges and limitations.

One of the main limitations of this research is that it overlooks variations in audio features across different stages of Parkinson's disease progression. Although the proposed FB-DNN model shows high accuracy in detecting Parkinson's disease, the current work did not compare the model's performance across the early, middle, and advanced stages of the disease. This is important because, depending on the stage and type of disease, patients may have mild or severe audio impairments. For example, the fine vocal abnormalities that signal early Parkinson's disease, such as mild tremor or fluctuations in pitch, can sometimes be overlooked, resulting in lower sensitivity for early-stage identification. To overcome this limitation, future research will continue the investigation of this model using datasets annotated by disease stage, covering the early, middle, and late stages of Parkinson's disease. Stage-specific FB-DNNs can then be trained and validated on such stage-annotated databases, which will help improve the model's sensitivity to the early, barely noticeable symptoms of the disease. Furthermore, more advanced methods such as domain adaptation or transfer learning could help improve the model's ability to generalize across disease stages.

Although deep learning models, including the proposed FB-DNN, can behave as “black boxes”, this concern was addressed and efforts were made to make the model more explainable so that it can be accepted in the medical field. A key aspect of the FB-DNN is its use of explanation techniques that let practitioners understand how the model arrived at a particular prediction. The model also includes a feature-importance analysis to determine which signal attributes, such as jitter, tremor, or pitch variability, are most relevant to the classification of Parkinson's disease. This makes the results easier to explain by elucidating the relationship between the extracted features and the model's decisions. Furthermore, saliency maps or feature attribution methods will be incorporated in future work to give clinicians more visual insight into how specific features affect the model.
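A sketch of the SHAP analysis referred to above, assuming the shap package and a fitted Keras classifier; the choice of KernelExplainer, the background sample size, and the feature_names list are assumptions made for illustration:

```python
import shap

# use a small background sample to keep the kernel explainer tractable
background = X_train_encoded[:100]
explainer = shap.KernelExplainer(lambda x: model.predict(x).ravel(), background)

# SHAP values for a handful of test cases
shap_values = explainer.shap_values(X_test_encoded[:20])

# global feature importance (e.g. jitter, shimmer, F0 range) and a per-case force plot
shap.summary_plot(shap_values, X_test_encoded[:20], feature_names=feature_names)
shap.force_plot(explainer.expected_value, shap_values[0], X_test_encoded[0],
                feature_names=feature_names, matplotlib=True)
```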

Although the proposed FB-DNN model offers high accuracy, precision, and robustness for PD detection from audio-based features, several challenges and limitations remain. First, the model's generalizability to other populations or other types of datasets, such as the Parkinson's Speech dataset, has not been fully established. The primary source of data is audio recorded in a controlled environment, which may differ from real-life situations in which environmental noise and variations in recording quality can seriously affect data quality. Further work will involve testing the model on a more diverse dataset covering, for example, multiple languages, multiple demographics, and more realistic recording conditions, to make it practical. Second, although the model reaches a reasonable level of computational efficiency, difficulties of deploying it in resource-limited environments, including rural clinics or telemedicine platforms, cannot be excluded. Strategies expected to lower the computational load include model pruning, quantization, and optimization for edge devices. Moreover, although the FB-DNN achieves high accuracy, the “black box” nature of the framework may hamper its adoption in clinical practice, where interpretability plays a decisive role. Tools such as SHAP and complementary methods such as LIME will enhance model interpretability and add credibility for clinicians. Finally, the study does not compare the effectiveness of the model across the stages of Parkinson's disease, and early signs, which are generally mild and less easily distinguishable, may challenge the model's sensitivity. Subsequent research will incorporate datasets labelled by stage to test the model's capacity to discriminate the early, middle, and late stages. In addition, there are technical concerns to be resolved in data collection, such as patient cooperation, environmental noise, and differing microphone characteristics, in order to obtain high sound quality; these should be addressed systematically through preprocessing and noise reduction. These challenges identify the key problem areas and lay the foundation for future enhancements of the FB-DNN framework.



