Feature Selection Approach for Sentiment Analysis using Machine Learning
Text mining is used in a broad range of systems, including text summarization, text classification, named entity extraction, and opinion and emotion analysis. Text classification is the task of assigning user-defined categories to free-text documents; it is a supervised learning method. In text clustering (sometimes called document clustering), by contrast, the possible categories are not known in advance and must be discovered by grouping similar documents. Document clustering groups documents by subject, and each such group is called a cluster; it is an unsupervised learning method. The main challenge in document clustering is the high dimensionality of the data, so efficient algorithms are required that can handle clustering at this scale. High dimensionality is likewise a major obstacle to efficient text categorization. Each document in a corpus contains a large amount of irrelevant, noisy data that ultimately reduces categorization effectiveness. Most text categorization methods reduce the number of features by removing stopwords or by stemming. This helps to some extent, but the number of remaining features is still very large. For efficient text categorization it is therefore important to use feature selection to handle the high dimensionality of the data. Feature selection in text classification aims to identify the features that are relevant to the classifier and to discard the rest without degrading the classifier's accuracy. This article provides an overview of feature selection techniques.
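As a rough illustration of the idea, the sketch below removes stopwords and then ranks the remaining terms by a chi-square score, keeping only the top k as features. Chi-square is chosen here only as one common example of a feature selection criterion, not as the method this article proposes. The toy corpus, the stopword list, and the names `tokenize`, `chi_square_scores`, and `select_top_k` are all hypothetical and exist only for this example.

```python
# Toy labeled corpus (hypothetical data, for illustration only).
DOCS = [
    ("the film was great and the acting was great", "pos"),
    ("a great story and a wonderful cast", "pos"),
    ("the plot was boring and the acting was awful", "neg"),
    ("an awful film with a boring story", "neg"),
]

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "was", "with"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def chi_square_scores(docs, target):
    """Score each term with the chi-square statistic of its 2x2 table:
    term present/absent versus document in the target class or not."""
    n = len(docs)
    token_sets = [(set(tokenize(text)), label) for text, label in docs]
    n_target = sum(1 for _, label in token_sets if label == target)
    vocab = set().union(*(tokens for tokens, _ in token_sets))
    scores = {}
    for term in vocab:
        a = sum(1 for toks, lab in token_sets if term in toks and lab == target)
        b = sum(1 for toks, lab in token_sets if term in toks and lab != target)
        c = n_target - a          # term absent, target class
        d = (n - n_target) - b    # term absent, other classes
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A term that occurs in every document (or none) carries no signal.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(docs, target, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, target)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    print(select_top_k(DOCS, "pos", 3))
```

On this toy corpus, terms that appear equally often in both classes (such as "film") score zero and are discarded, while terms concentrated in one class are kept. The vocabulary of nine terms that survives stopword removal is reduced to the three most class-discriminative ones.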