Abstract: | Interest in sentiment analysis and its applications across many fields has grown in recent years. This research applies sentiment analysis to a large volume of financial texts to uncover the latent sentiment they express and uses it to predict the trend of a stock index. Recent studies have shown that supervised methods can achieve high classification accuracy, but they require manually pre-processed input data and cannot discover previously unknown classes. This research therefore proposes a hybrid solution that combines unsupervised and supervised methods. First, we use unsupervised methods to identify the themes, i.e., the topics, within the documents. Second, we calculate a sentiment index to determine each document's sentiment polarity. Third, we determine which topics' sentiment indexes are leading indicators of Taiwan's electronic sub-index (TE). Finally, we combine the sentiment indexes of these leading indicators with 24 other indirect sentiment indexes and apply supervised methods to build a classification model for TE. We find that Latent Dirichlet Allocation (LDA) clusters documents better than TFIDF-Kmeans and clusters topics more accurately than NPMI-Concor. Comparing the sentiment index with MACD, a technical indicator widely used by stock-market investors, we show that the sentiment index coincides more closely with TE than the MACD line does. We also find that the sentiment indexes derived from the business-management and macroeconomics topics are leading indicators, and that a TE prediction model that includes the sentiment index outperforms one that uses only technical indicators. Lastly, we build classification models with four supervised methods, namely kNN, Naïve Bayes, SVM, and logistic regression, and find that SVM performs best in terms of precision, recall, and F-measure. To our knowledge, this is the first study of its kind. |
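The abstract compares the sentiment index against the MACD line and tests which topic sentiment indexes lead TE, but it does not spell out either procedure. The Python sketch below illustrates one plausible way to do both, under assumptions that are not taken from the study: MACD uses the conventional 12/26-period exponential moving averages (the study's parameters are not stated), "coincidence" is read as the share of periods in which two series move in the same direction, and a leading indicator is checked by shifting that comparison by `lag` periods. The function names `ema`, `macd_line`, `direction`, and `lagged_coincidence` are hypothetical.

```python
from typing import List, Sequence


def ema(values: Sequence[float], span: int) -> List[float]:
    """Exponential moving average with alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [float(values[0])]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd_line(closes: Sequence[float], fast: int = 12, slow: int = 26) -> List[float]:
    """MACD line = fast EMA minus slow EMA (conventional 12/26 defaults, assumed here)."""
    return [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]


def direction(series: Sequence[float]) -> List[int]:
    """Period-to-period direction: +1 up, -1 down, 0 unchanged."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]


def lagged_coincidence(indicator: Sequence[float], index: Sequence[float], lag: int = 0) -> float:
    """Hypothetical coincidence measure: share of periods in which the indicator's
    direction at t matches the index's direction at t + lag (lag=0: same period)."""
    d_ind, d_idx = direction(indicator), direction(index)
    if len(d_ind) != len(d_idx):
        raise ValueError("series must have equal length")
    if lag < 0 or lag >= len(d_ind):
        raise ValueError("lag out of range")
    pairs = list(zip(d_ind[: len(d_ind) - lag], d_idx[lag:]))
    return sum(a == b for a, b in pairs) / len(pairs)
```

Under this reading, comparing `lagged_coincidence(sentiment, te)` with `lagged_coincidence(macd_line(te_closes), te)` mirrors the sentiment-versus-MACD comparison, and a topic index whose coincidence with TE stays high for some `lag > 0` would be a candidate leading indicator.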