
Rajesh Govind Talekar

Avinash Vasantrao Khambayat

Abstract

Big data analytics is rapidly becoming a critical area of research in computer science and in industries worldwide, with demonstrated success in sectors such as social media, the economy, finance, healthcare, and agriculture. We examine the machine learning algorithms most commonly used for big data analytics, emphasizing their ability to manage the challenges specific to big data: high velocity, large volume, uncertainty, non-stationarity, and real-time processing requirements. With gigabytes of data generated daily, traditional machine learning techniques fall short because of these distinctive features, and conventional storage and processing methods are likewise insufficient for big data environments. This paper explores the challenges of applying traditional unsupervised machine learning techniques to big data analytics and presents potential solutions. Our study investigates traditional clustering techniques within big data analysis, with a focus on Spark-enabled K-means clustering for efficient processing. Silhouette analysis is integrated to identify the optimal cluster configuration, which improves clustering accuracy on large-scale datasets. The approach is validated on multiple datasets, covering diverse and large-scale data collections. Our work highlights parallel processing, GPUs, and the Spark framework as feasible approaches to addressing these challenges. The findings contribute to ongoing efforts to develop and improve analytical methods capable of handling the complexity and scale of big data.
