
B. Lavanya

U. Vageeswari

Abstract

Due to the rapid advancement of information technology, the amount of unstructured text stored in computer databases is constantly growing, making it difficult to organise, analyse, summarise, and classify. The process of retrieving important information from unstructured text is called text mining. Latent Dirichlet Allocation (LDA), an unsupervised machine learning technique, is frequently employed for topic modelling, and its output lends itself well to categorization. However, LDA models frequently encounter domain-specific and Out-of-Vocabulary (OOV) terms. This research proposes an unsupervised framework for text categorization using LDA with enhanced vocabulary handling and domain knowledge. Domain-specific terms are eliminated, and OOV words are replaced with the most similar words in the LDA dictionary. Experiments were conducted on two datasets with different data categories. On both datasets, the proposed model outperforms alternative models, improving Accuracy, Purity, Precision, Recall, and F1-score.
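
The vocabulary-handling step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure is used to pick the replacement word, so the sketch below substitutes character-level similarity from Python's standard `difflib` module purely as a stand-in; the function name `replace_oov`, its parameters, the `cutoff` value, and the choice to drop unmatched OOV tokens are all hypothetical and are not taken from the paper.

```python
from difflib import get_close_matches


def replace_oov(tokens, dictionary, domain_terms=frozenset(), cutoff=0.8):
    """Drop domain-specific terms and map OOV tokens to a similar dictionary word.

    Assumption: similarity is character-based (difflib), used here only as a
    stand-in for whatever measure the proposed framework actually applies.
    """
    vocab = sorted(dictionary)  # stable candidate order for matching
    cleaned = []
    for tok in tokens:
        if tok in domain_terms:
            continue  # domain-specific term: eliminated
        if tok in dictionary:
            cleaned.append(tok)  # in-vocabulary token: kept unchanged
            continue
        # OOV token: look for the closest dictionary word above the cutoff
        match = get_close_matches(tok, vocab, n=1, cutoff=cutoff)
        if match:
            cleaned.append(match[0])
        # Assumption: OOV tokens with no sufficiently close match are dropped
    return cleaned


# Hypothetical toy data for illustration only
lda_dictionary = {"topic", "model", "document", "cluster"}
domain_specific = {"patient"}
tokens = ["topics", "documents", "patient", "model", "xyzzy"]
print(replace_oov(tokens, lda_dictionary, domain_specific))
```

In this toy run, "topics" and "documents" are mapped to their in-dictionary forms, "patient" is removed as a domain-specific term, and "xyzzy" is dropped because nothing in the dictionary is close to it. A semantic measure, such as embedding similarity, could be swapped in for `get_close_matches` without changing the surrounding logic.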
