Diabetes is a disease characterized by elevated blood sugar in the human body. The pancreas secretes insulin to regulate blood glucose. Diabetes develops when the pancreas cannot secrete enough insulin or when the body cannot use insulin effectively. Biomarkers related to diabetes include diet, lifestyle, age, number of pregnancies, physical activity, stress, and blood pressure. Diabetes is a major contributor to complications such as kidney failure, eye problems, nerve damage, and heart attack. Currently, clinical tests are the only way to detect diabetes, but testing is time-consuming. Machine learning can support early detection of disease by identifying hidden patterns and analysing various biomarkers. The purpose of this research is to propose a model for early detection of diabetes using the PIMA dataset, sourced from the Kaggle repository, and to identify the biomarkers most strongly related to diabetes. The dataset includes 768 records and 8 attributes. In this paper, nine supervised classification algorithms are used, including logistic regression, the KNeighbors classifier, a support vector classifier, extra trees, a Bayes classifier, a gradient boosting classifier, random forest, and a decision tree classifier. Logistic regression performed best, with an accuracy of 82%, compared with the other classification algorithms. Two biomarkers, glucose and BMI, were identified as directly linked to diabetes risk: the higher the glucose and BMI values, the higher the risk of diabetes.
Keywords: Machine learning, Diabetes, Ensemble, Supervised learning, Artificial intelligence, Health analytics, AdaBoost
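
The comparison workflow summarized in the abstract (fit several supervised classifiers on the same split and rank them by test accuracy) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic and are not the PIMA dataset, only two of the classifiers are shown, both are simplified pure-Python versions rather than whatever library implementations the study used, and the names `make_synthetic`, `compare_classifiers`, and all hyperparameters are hypothetical. The accuracies it prints say nothing about the results reported in the paper.

```python
import math
import random


def make_synthetic(n=200, seed=0):
    """Synthetic stand-in data (NOT the PIMA dataset): two standardized
    features, loosely playing the role of glucose and BMI."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = int(rng.random() < 0.35)  # assumed positive rate, illustrative only
        center = 1.0 if label else -0.5   # positives tend to have higher values
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


class LogisticRegression:
    """Simplified logistic regression trained by batch gradient descent."""

    def __init__(self, lr=0.1, epochs=200):
        self.lr, self.epochs = lr, epochs

    def _prob(self, x):
        z = self.b + sum(w * v for w, v in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        n = len(X)
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * len(self.w), 0.0
            for xi, yi in zip(X, y):
                err = self._prob(xi) - yi
                grad_b += err
                for j, v in enumerate(xi):
                    grad_w[j] += err * v
            self.w = [w - self.lr * g / n for w, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict(self, X):
        return [int(self._prob(x) >= 0.5) for x in X]


class KNeighbors:
    """Simplified k-nearest-neighbours classifier with majority voting."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        preds = []
        for x in X:
            nearest = sorted(zip(self.X, self.y),
                             key=lambda pair: math.dist(pair[0], x))[: self.k]
            votes = sum(label for _, label in nearest)
            preds.append(int(2 * votes > self.k))
        return preds


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def compare_classifiers(models, X, y, test_fraction=0.25):
    """Fit each model on the same training split; return test accuracy by name."""
    split = int(len(X) * (1 - test_fraction))  # samples are already in random order
    X_train, y_train, X_test, y_test = X[:split], y[:split], X[split:], y[split:]
    return {name: accuracy(y_test, model.fit(X_train, y_train).predict(X_test))
            for name, model in models.items()}


X, y = make_synthetic()
results = compare_classifiers(
    {"logistic_regression": LogisticRegression(), "k_neighbors": KNeighbors(k=5)},
    X, y,
)
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.2f}")
```

Holding the train/test split fixed across models keeps the accuracy figures comparable; a study on real data would typically add cross-validation and metrics beyond accuracy.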