Accelerating the performance of machine learning classifiers using bacterial colony optimization for heart disease prediction
Authors
Md. Muktar Hossain
(Computer Science and Engineering)
Tanver Ahmed
(Computer Science and Engineering)
Abstract
Objectives: Cardiovascular Disease (CVD) remains one of the leading causes of global mortality, accounting for millions of deaths annually. Early and accurate diagnosis plays a critical role in reducing mortality and healthcare burden. However, conventional diagnostic approaches often suffer from misdiagnosis, delayed treatment, and increased medical costs. Machine Learning (ML) has shown significant potential in supporting clinical decision-making for early CVD detection. Nevertheless, ML models often face challenges such as computationally expensive parameter tuning and susceptibility to local minima. This study aims to address these challenges by proposing a bio-inspired optimization framework to enhance diagnostic accuracy and efficiency.
Methods: This study employs Bacterial Colony Optimization (BCO) to optimize the hyperparameters of ten machine learning classifiers: Logistic Regression, Support Vector Machine (SVM), K-Nearest Neighbors, Multilayer Perceptron, Naïve Bayes, Random Forest (RF), Decision Tree, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine, and AdaBoost. Principal Component Analysis (PCA) is integrated to handle feature dimensionality and multicollinearity. Experiments were conducted using the Cleveland Heart Disease dataset (CLE) and the IEEE DataPort dataset (HGR), applying a rigorous 5-fold Cross-Validation (CV) strategy to ensure reliability and stability.
Results: Experimental findings demonstrate that the integration of PCA, BCO, and ML classifiers significantly improves prediction performance compared to baseline models. The BCO-optimized RF model achieved the highest mean accuracy of 92.02% (95% CI: 89.93–94.10) on the HGR dataset, outperforming the baseline accuracy of 91.26%. Similarly, the BCO-SVM model achieved a mean accuracy of 85.79% on the CLE dataset. Confidence interval analysis further confirmed enhanced model stability and reduced prediction variance.
Conclusion: The proposed framework effectively enhances CVD diagnosis by improving classification accuracy and stability. By efficiently exploring the search space and mitigating local minima limitations, the framework provides a statistically robust and clinically reliable decision-support tool for early cardiovascular risk detection.
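
The abstract reports mean accuracy under 5-fold CV together with a 95% confidence interval. As an illustration only, the sketch below shows one common way such an interval can be derived from per-fold accuracies, using a Student's t interval on the fold mean. The abstract does not state which interval method was used, so the t-based approach, the function name `mean_ci95`, and the fold values are all assumptions made for this example and are not taken from the study.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t (standard table values).
# df = 4 corresponds to 5 folds; the other entries are included for convenience.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 9: 2.262}


def mean_ci95(fold_scores):
    """Return (mean, lower, upper) for a t-based 95% CI of the fold mean.

    Hypothetical helper for illustration; not the authors' procedure.
    """
    n = len(fold_scores)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError(f"no tabulated t value for df={df}")
    mean = statistics.mean(fold_scores)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(fold_scores) / math.sqrt(n)
    half_width = T_CRIT_95[df] * sem
    return mean, mean - half_width, mean + half_width


# Per-fold accuracies (%) invented purely for illustration;
# they are NOT results from the study.
example_folds = [90.0, 92.0, 91.0, 93.0, 94.0]
mean, lower, upper = mean_ci95(example_folds)
print(f"mean={mean:.2f}% (95% CI: {lower:.2f}-{upper:.2f})")
```

With only five folds the interval is driven by the large t critical value (2.776 for 4 degrees of freedom), so a narrower interval for an optimized model than for its baseline indicates lower fold-to-fold variance.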