
Ravikumar R N

Sheikh Md. Jaimul Jakir

N. Sivakumar

Pappu Kumar Rai

Anand Singh

Mariappan Kandasamy

Abstract

Prostate cancer is among the most common and deadly diseases worldwide. In this work, several machine learning models were applied to a prostate cancer dataset to classify diagnoses. The Kaggle dataset contains 100 observations of 10 variables, of which 8 are continuous and 1 is categorical. Nine standard classification algorithms were used: k-Nearest Neighbors (kNN), Decision Tree (DT), Naïve Bayes (NB), Support Vector Machine (SVM), Random Forest, Logistic Regression, Gradient Boosting, AdaBoost, and Extra Trees. Model performance was assessed before and after hyperparameter optimization. When accuracy was calculated iteratively, Logistic Regression performed best, followed by AdaBoost, with the best accuracy reaching 90%. The accuracy of both models varied stochastically across iterations, as shown in the accompanying graphs. The remaining models performed more variably: kNN, Decision Tree, SVM, Random Forest, and Gradient Boosting reached accuracies between 73% and 83%. Naïve Bayes had the lowest overall scores, but its accuracy stayed constant at 77% across all iterations. These results highlight the ability of the proposed machine learning models to predict prostate cancer diagnoses and provide a foundation for further research. The best-performing models were AdaBoost and Logistic Regression.
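The abstract does not describe the implementation. The following is a minimal sketch, assuming scikit-learn, of one way the nine classifiers could be compared. It reads "iteratively calculating the accuracy" as scoring over repeated random train/test splits, and it tunes one model with a grid search. Several choices here are illustrative assumptions and are not taken from the paper:

* the helper names `build_models`, `repeated_holdout_accuracy`, and `tune_logistic_regression`;
* the 80/20 stratified split;
* standard scaling for kNN, SVM, and Logistic Regression;
* the `C` grid;
* the synthetic `make_classification` data, which only mimics the dataset's shape (100 samples, 8 continuous features) and is not the Kaggle data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models():
    """Fresh, untuned instances of the nine classifier families named in the abstract.

    Assumption: scale-sensitive models are wrapped in a StandardScaler pipeline;
    the abstract does not state how features were preprocessed.
    """
    return {
        "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Naive Bayes": GaussianNB(),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "Random Forest": RandomForestClassifier(random_state=0),
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Gradient Boosting": GradientBoostingClassifier(random_state=0),
        "AdaBoost": AdaBoostClassifier(random_state=0),
        "Extra Trees": ExtraTreesClassifier(random_state=0),
    }


def repeated_holdout_accuracy(X, y, n_iterations=10, test_size=0.2):
    """Score every model on n_iterations different stratified train/test splits.

    Returns {model name: [test accuracy per iteration]}. Accuracy changes between
    iterations because each split (random_state=seed) uses different rows.
    """
    scores = {name: [] for name in build_models()}
    for seed in range(n_iterations):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed
        )
        for name, model in build_models().items():
            model.fit(X_train, y_train)
            scores[name].append(model.score(X_test, y_test))
    return scores


def tune_logistic_regression(X, y):
    """Grid-search the inverse regularization strength C with 5-fold CV (assumed grid)."""
    pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
    search = GridSearchCV(pipeline, grid, cv=5, scoring="accuracy")
    search.fit(X, y)
    return search.best_params_, search.best_score_


if __name__ == "__main__":
    # Synthetic stand-in with the same shape as the described dataset, NOT the real data.
    X, y = make_classification(
        n_samples=100, n_features=8, n_informative=5, random_state=42
    )
    for name, accs in repeated_holdout_accuracy(X, y, n_iterations=5).items():
        print(f"{name:20s} mean={np.mean(accs):.2f} min={min(accs):.2f} max={max(accs):.2f}")
    print("Tuned Logistic Regression:", tune_logistic_regression(X, y))
```

Keeping the scaler inside each pipeline means it is fitted only on the training portion of every split or CV fold, so no test-set statistics leak into training. The same `GridSearchCV` pattern would apply to the other models with their own parameter grids.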
