DISEASE PREDICTION FROM COVID-19 MEDICAL DATA USING DATA MINING ALGORITHM

Nafis Md Zawad(1)


(1) American International University-Bangladesh
Corresponding Author

Abstract


This study introduces a technique for disease prediction using data mining algorithms. The paper discusses the novel coronavirus (COVID-19) and the construction of a model for disease prediction. The COVID-19 pandemic has caused chaos across the world, and both developed and developing countries have suffered heavy death tolls because the means of preventing COVID-19 have been insufficient. A quick and effective way to control the spread of COVID-19 across the globe is therefore needed. Non-clinical methods such as data mining techniques can be an effective way to combat the spread of COVID-19. To reduce the immense pressure on the healthcare system, improved ways of detecting and diagnosing COVID-19 patients are required. In this study, data mining models were applied to an epidemiological dataset to forecast the extent of COVID-19 cases. Decision tree and logistic regression models were constructed, and a random forest algorithm was also applied to the dataset, using the Python programming language. The results show that the model built with the random forest algorithm is more effective at predicting the likelihood of COVID-19 infection, with an accuracy of up to 80%.
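The comparison described above can be sketched roughly as follows. This is a minimal illustration and not the study's code: it assumes scikit-learn is the library in use (the abstract names only Python), it runs on synthetic data from `make_classification` in place of the epidemiological dataset, and the function name `compare_models`, the 75/25 split, and all hyperparameters are hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, seed=0):
    """Fit three classifiers on one train/test split; return test accuracy per model."""
    # Hold out 25% of the rows for testing (an assumed split, not the study's).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    # The three model families named in the abstract, with assumed settings.
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Accuracy = fraction of held-out rows whose label is predicted correctly.
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in data (an assumption; the study's dataset is not used here).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
scores = compare_models(X, y)

# Print the models from most to least accurate on the held-out rows.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

The accuracies printed here describe only the synthetic data and say nothing about the reported 80% figure; reproducing that would require the study's dataset and preprocessing.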

Keywords


Covid-19; Data Mining Algorithm; Disease Prediction; Medical Data; Random Forest



DOI: 10.56327/ijiscs.v6i3.1312
