Conference Paper
2026

Analysis of Lung Cancer Risk Associated with Long-Term Smoking Using Machine Learning Approaches

Authors
Abdullah Tamim (Center for Interdisciplinary Research (CIR))
Abstract
Lung cancer remains one of the deadliest cancers worldwide, and cigarette smoking continues to be its most significant risk factor. A major reason for its high fatality rate is that many patients are diagnosed only after the disease has progressed to its later stages, when treatment is far less effective. The disease is also more common among older adults, with most diagnoses occurring in individuals aged 65 and above. To support earlier and more accurate risk assessment, this study explores the use of machine learning for predicting lung cancer stage from a range of clinical, behavioral, and environmental factors. The research uses the Smoking and Cancer Risk Analysis dataset, which contains 3,000 patient records and 17 attributes. The dataset covers demographic information, smoking intensity and duration, alcohol use, physical activity, diet quality, secondhand smoke exposure, air pollution levels, BMI, chronic symptoms, family medical history, and the final cancer stage. Its combination of numerical and categorical features makes it suitable for statistical analysis and predictive modeling. Five widely used machine learning algorithms (Logistic Regression, Random Forest, Gradient Boosting, AdaBoost, and Extra Trees) were trained and evaluated using 3-fold cross-validation to ensure consistent and reliable performance. All models demonstrated strong predictive capability. Binary Logistic Regression (LR) achieved an accuracy of 99.5%. Multinomial LR reached 97.7%, and Random Forest performed slightly better at 98.17%. Gradient Boosting reached 97.33%, followed by Extra Trees at 96.5% and AdaBoost at 95.33%. Overall, the findings highlight the effectiveness of tree-based ensemble methods, particularly Random Forest and Gradient Boosting, for accurately assessing lung cancer risk associated with long-term smoking behavior.
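The evaluation protocol described in the abstract (five classifiers compared under 3-fold cross-validation) can be illustrated with a minimal scikit-learn sketch. This is not the author's code. The Smoking and Cancer Risk Analysis dataset is not reproduced here, so the example uses a synthetic stand-in with 3,000 rows and 16 features. The number of stage classes (4), all hyperparameters, the feature scaling step, and the random seeds are assumptions made for illustration only, and the printed accuracies will not match the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption): 3,000 records,
# 16 predictors, and 4 hypothetical cancer-stage classes.
X, y = make_classification(
    n_samples=3000,
    n_features=16,
    n_informative=8,
    n_classes=4,
    random_state=0,
)

# The five algorithm families named in the abstract. Hyperparameters are
# illustrative choices, not the settings used in the paper.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Extra Trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
}

# 3-fold cross-validation; stratification keeps class proportions
# similar in every fold.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

mean_accuracy = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

# Report models from highest to lowest mean accuracy.
for name, acc in sorted(mean_accuracy.items(), key=lambda item: -item[1]):
    print(f"{name}: {acc:.4f}")
```

To apply the same loop to real data, `X` and `y` would be replaced with the encoded feature matrix and the cancer-stage labels. Categorical attributes would first need encoding (for example, one-hot encoding), which this sketch omits.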
Publication Details
Published In: 2nd Undergraduate Conference on Intelligent Computing and Systems (UCICS 2026)
Publication Year: 2026
Publication Date: January 2026
Type: Conference Paper
Total Authors: 1