Latent Dirichlet Allocation, Vocabulary Handler, and Domain Knowledge-Based Framework for Text Classification
B. Lavanya
U. Vageeswari
Abstract
With the rapid advancement of information technology, the volume of unstructured text stored in databases keeps growing, making it difficult to organise, analyse, summarise, and classify. Text mining is the process of retrieving important information from unstructured text. Latent Dirichlet Allocation (LDA), an unsupervised machine learning technique, is frequently employed for topic modelling, and its output is well suited to categorization. However, LDA models frequently encounter domain-specific and out-of-vocabulary (OOV) terms. This research proposes an unsupervised framework for text categorization using LDA with enhanced vocabulary handling and domain knowledge. Domain-specific terms are eliminated, and OOV words are replaced with the most similar words in the LDA dictionary. The experiments used two datasets covering different data categories. On both datasets, the proposed framework outperformed the alternative models, improving accuracy, purity, precision, recall, and F1-score.
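The vocabulary-handling step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure the authors use, so this example assumes plain character-level string similarity from Python's standard `difflib` module. The function name `replace_oov`, the `cutoff` parameter, and the choice to drop OOV words that have no close match are all hypothetical, introduced only for illustration.

```python
from difflib import get_close_matches


def replace_oov(tokens, dictionary, domain_terms=frozenset(), cutoff=0.8):
    """Drop domain-specific terms and map OOV tokens to similar dictionary words.

    Assumption: similarity is difflib's character-based ratio; the paper may
    use a different measure (for example, embedding similarity).
    """
    vocab = sorted(dictionary)  # fixed order keeps the matching deterministic
    result = []
    for tok in tokens:
        if tok in domain_terms:
            continue  # domain-specific terms are eliminated
        if tok in dictionary:
            result.append(tok)  # in-vocabulary tokens pass through unchanged
            continue
        # OOV token: look for the single closest dictionary word
        match = get_close_matches(tok, vocab, n=1, cutoff=cutoff)
        if match:
            result.append(match[0])
        # Assumption: an OOV token with no sufficiently close match is dropped
    return result


if __name__ == "__main__":
    lda_dictionary = {"market", "stock", "price", "team", "match"}
    tokens = ["stocks", "prices", "market", "abstract", "zzzz"]
    print(replace_oov(tokens, lda_dictionary, domain_terms={"abstract"}))
```

The cleaned token list would then be converted to a bag-of-words representation over the LDA dictionary before topic inference. A higher `cutoff` makes replacement more conservative, at the cost of discarding more OOV tokens.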
This work is licensed under a Creative Commons Attribution 4.0 International License.