Incorporating Multiple Knowledge Sources for Targeted Aspect-based Financial Sentiment Analysis

KELVIN DU, School of Computer Science and Engineering, Nanyang Technological University, Singapore
FRANK XING, Department of Information Systems and Analytics, National University of Singapore, Singapore
ERIK CAMBRIA, School of Computer Science and Engineering, Nanyang Technological University, Singapore

Combining symbolic and subsymbolic methods has become a promising strategy as research tasks in AI grow increasingly complicated and require higher levels of understanding. Targeted Aspect-based Financial Sentiment Analysis (TABFSA) is an example of such complicated tasks, as it involves processes like information extraction, information specification, and domain adaptation. However, little is known about the design principles of such hybrid models leveraging external lexical knowledge. To fill this gap, we define anterior, parallel, and posterior knowledge integration and propose incorporating multiple lexical knowledge sources strategically into the fine-tuning process of pre-trained transformer models for TABFSA. Experiments on the FiQA Task 1 and SemEval 2017 Task 5 datasets show that the knowledge-enabled models systematically improve upon their plain deep learning counterparts, and some outperform state-of-the-art results reported in terms of aspect sentiment analysis error. We discover that parallel knowledge integration is the most effective and domain-specific lexical knowledge is more important according to our ablation analysis.

CCS Concepts: • Computing methodologies → Natural language processing; Neural networks; • Information systems → Information retrieval.
Additional Key Words and Phrases: financial sentiment analysis, neural networks, knowledge enabled system, deep learning, transformer models

1 INTRODUCTION

Sentiment analysis analyzes people’s sentiments, attitudes, opinions, emotions, evaluations, and appraisals towards various entities such as events, topics, services, products, individuals, organizations, issues, and their attributes [38]. The main objective of sentiment analysis is to classify the polarity of a given piece of text, which can be performed at the document [16, 56], sentence [53, 81], or aspect level [24, 50]. The objective can be more challenging when sentiment analysis is applied to professional language domains, such as finance [45]. Early research in financial sentiment analysis (FSA) primarily focused on document- or sentence-level sentiment polarities. However, in financial texts it is common for a single sentence to have multiple targets or aspects with different polarities. Targeted Aspect-based Financial Sentiment Analysis (TABFSA), which aims to extract entities and aspects and detect their corresponding sentiment in financial texts, is thus a challenging but pragmatic task. The task involves target-aspect identification as well as polarity detection. Three examples of TABFSA are provided in Fig. 1. For the two examples in (a) and (b), sentence-level sentiment analysis will assign a polarity value over the complete text, and the opposite sentiments will mostly cancel each other out, resulting in an overall neutral sentiment.

Authors’ addresses: Kelvin Du, School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore, zidong001@e.ntu.edu.sg; Frank Xing, Department of Information Systems and Analytics, National University of Singapore, Singapore, Singapore, xing@nus.edu.sg; Erik Cambria, School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore, cambria@ntu.edu.sg.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). © 2023 Copyright held by the owner/author(s). 2158-656X/2023/1-ART https://doi.org/10.1145/3580480 ACM Trans. Manag. Inform. Syst.

Fig. 1. Example sentences with their target companies (blue), aspects (orange), and associated polarities detected.
(a) Example of multiple targets, single aspect and their sentiments: "London open: Taylor Wimpey and Ashtead drive markets higher, Barclays falls"
    Target: Taylor Wimpey; Aspect: Market; Sentiment: Positive
    Target: Ashtead; Aspect: Market; Sentiment: Positive
    Target: Barclays; Aspect: Market; Sentiment: Negative
(b) Example of single target, multiple aspects and their sentiments: "J&J raises dividend but cuts 2020 earnings outlook over coronavirus outbreak"
    Target: J&J; Aspect: Dividend; Sentiment: Positive
    Target: J&J; Aspect: Earning Outlook; Sentiment: Negative
(c) Example of single target, single aspect and its sentiment: "Whitbread boss Andy Harrison defends sales fall as 'just a blip'"
    Target: Whitbread; Aspect: Sales; Sentiment: Negative

In contrast, the TABFSA framework will assign positive sentiment to the targets "Taylor Wimpey" and "Ashtead" for the market aspect and negative sentiment to the target "Barclays" for the market aspect (Fig. 1). Similarly, a positive sentiment will be assigned to the target "J&J" for the dividend aspect, but a negative sentiment for the earning outlook aspect. From this perspective, TABFSA has practical significance. Finance is a highly professional field in which accurate and precise information is vital; otherwise, wrong decisions may be made and result in economic losses.
TABFSA can enhance the quality of financial sentiment analysis, which is critical for downstream applications such as financial forecasting and financial decision-making. For example, it is common for two entities to have opposite sentiments in one sentence. In this case, a market prediction based on sentence-level sentiment is inaccurate, but TABFSA can address this issue and extract sentiment for each entity for subsequent market predictions.

There are two main sub-tasks for Targeted Aspect-based Sentiment Analysis (TABSA): the first sub-task is to extract aspects mentioned in the sentence, and the second is to detect the sentiment for the corresponding targets and aspects. Generally, aspects can be extracted through frequency-based, syntax-based, unsupervised, and supervised machine learning methods, while sentiment polarity can be classified through lexicon-based or supervised machine learning approaches [62]. Current methods may not require large-scale labeled data to generate predefined aspects. Instead, aspects are learned from a few keywords as supervision [29]. The aspect extraction and sentiment detection sub-tasks could be performed either in a separate [61] or a joint manner [70].

TABSA has been studied and performed for various domains such as movies, products, hotels, restaurants and healthcare, but it remains largely unexplored in the finance domain except for a few commercial products [26]. We attribute this observation to the following three reasons. Firstly, as previous literature has pointed out [2, 41], there is a lack of high-quality and large-scale open source finance domain-specific annotations. The research in fine-grained financial sentiment analysis has only gained more attention after the release of the "SemEval 2017 Task 5" and "FiQA Task 1" datasets. Secondly, lexical resources are limited and scattered. Since finance is
a highly professional domain, general-purpose sentiment lexicons usually fail to consider the domain-specific connotations and the heavy reference to prior knowledge. For example, a word like "liability" is considered negative in general-purpose sentiment analysis but is frequent and has a neutral meaning in the financial context. This makes it difficult to generalize the sentiment classifiers and underlines the need for finance domain-specific sentiment analysis [43]. Lastly, sentiment intensity scores are more consequential and nuanced for financial sentiment analysis than for other domains, whereas most current TABSA studies still adopt a polarity detection fashion (i.e., classification into positive or negative).

We propose knowledge-enabled (k-) transformer models to address the aforementioned challenges, aiming to answer the following research questions:
(1) Can integration of lexical knowledge improve the performance of pre-trained language models in TABFSA tasks?
(2) The methods to integrate knowledge into the fine-tuning process of pre-trained language models can be generally categorized into three types: anterior, parallel, and posterior integration. When multiple sources of lexical knowledge are provided, which of anterior, parallel, or posterior integration is the most effective approach to incorporate knowledge?
(3) To improve the domain application of pre-trained language models, one method adopted by researchers is to train domain-specific pre-trained language models such as FinBERT, but this requires a large domain-specific corpus and considerable computing resources. Does incorporation of financial knowledge produce better model performance than retraining finance domain-specific language models in the TABFSA task?
In particular, our contributions can be summarized from four perspectives:
(1) We defined anterior, parallel, and posterior knowledge integration and conducted extensive experiments to examine the best approach to incorporate multiple lexicon knowledge into the fine-tuning process of transformer models, and identified that the parallel approach is more effective in combining multiple lexical knowledge sources and pre-trained language models.
(2) We proposed incorporating heterogeneous sentiment knowledge (both from domain-specific and general-purpose lexicons) into the fine-tuning process of pre-trained transformer models and demonstrated its effectiveness in complementing all the model training.
(3) We demonstrated that the incorporation of lexical knowledge produces better model performance than retraining finance domain-specific language models in TABFSA. The lack of knowledge in the FSA task makes knowledge integration valuable.
(4) We achieved the best results to our knowledge over strong benchmark models on the two fine-grained financial sentiment analysis datasets, i.e., SemEval 2017 Task 5 and FiQA Task 1.

2 RELATED WORK

2.1 Financial Sentiment Analysis

FSA is a powerful tool for financial forecasting and decision making. The application scenarios include corporate disclosures, annual reports, earning calls, financial news, social media interactions, and more [68, 71, 74]. Many exciting observations have been reported, e.g., negative sentiment predicts short-term returns and volatility [32, 72], and strong sentiments for both directions seem to be more pronounced in fraudulent company reports [22]. Luo et al. [44] categorized financial sentiment indicators into market-derived and human-annotated sentiments. The market-derived sentiments were computed from market dynamics, such as price movement and trading volume, and thus may include noise from other sources.
In this study, we investigate the subjective human-annotated sentiments, which were specifically labeled by professionals [48] or investors themselves [73]. Instead of sentence-level sentiment polarity annotations, such as those from the Financial PhraseBank [48], we focus on more fine-grained financial sentiment analysis datasets with targeted and targeted aspect-based sentiment intensity scores, which, to the best of our knowledge, are SemEval 2017 Task 5 [3] and FiQA Task 1 [47]. They are more useful for market predictions, as opposite sentiments expressed in one news headline for different targets tend to drive their market movements in opposite directions. In the remainder of this section, we review TABSA techniques that were experimented with on these two datasets and their performances, including lexicon-based [75], machine learning-based or deep learning-based methods, and hybrid methods.

2.2 SemEval 2017 Task 5

The SemEval 2017 Task 5 contains two tracks: News Statements/Headlines and Microblog Messages. The news headlines were crawled from different sources such as Yahoo Finance¹ and the microblog messages were obtained from StockTwits² and Twitter³. The evaluation of sentiment score prediction is based on weighted cosine similarity, which measures the proximity between the predicted results and the gold standard. The FSA techniques in earlier studies include lexicon-based, machine learning-based, deep learning-based methods, and hybrid methods. Lexicon-based methods detect the sentiment of the text by analyzing the semantic orientation of the words in the text. For example, general-purpose lexicon-based sentiment analyzers include TextBlob [42], SnowNLP⁴ and SentiWordNet [4]. Machine learning approaches construct features and use classification or regression algorithms to determine sentiment.
In contrast, deep learning approaches construct complicated representations from textual data with a high level of abstraction, using Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and their variants, and have achieved remarkable performance in sentiment analysis.

In the microblog messages track, an ensemble of various regressors (i.e., AdaBoost, Bagging, Random Forest, Gradient Boosting, LASSO, Support Vector Regression and XGBoost) based on linguistic, sentiment lexicon, domain-specific features and word embeddings [33] ranks first (Cosine=0.778), followed by a hybrid of deep learning and lexicon-based techniques that combined CNN, LSTM, feature-driven MLP, and vector-averaging MLP, proposed by [20] (Cosine=0.751). In the news headlines track, CNN-based methods performed well. The highest score (Cosine=0.745) was reported by [49], which combined GloVe [58] and DepecheMood [65] to represent words and fed them into a CNN followed by global max-pooling. The output was then concatenated with VADER [30] sentiment scores and passed through two levels of dropout and fully-connected layers. [34] combined the representations learned from a CNN and a bidirectional GRU with an attention mechanism with hand-engineered lexical, sentimental and metadata features, and obtained weighted cosine similarity scores of 0.723 and 0.744 for the Microblogs and News Headlines tracks, respectively. Later, [1] presented a method that ensembles results generated from LSTM, CNN, Gated Recurrent Unit (GRU) and SVM using a Multi-Layer Perceptron (MLP), and achieved state-of-the-art performance for microblogs data (Cosine=0.797) and news headlines (Cosine=0.786). Recently, MetaPro [51] was proposed to improve financial news headline sentiment analysis by identifying and interpreting metaphors. Metaphors commonly appear in financial text, causing errors for classifiers.
By paraphrasing metaphors into their literal counterparts, three state-of-the-art sentiment classifiers achieved large improvements.

¹ http://finance.yahoo.com/
² http://stocktwits.com/
³ https://twitter.com
⁴ https://pypi.org/project/snownlp/

2.3 FiQA Task 1

The FiQA (Financial Opinion mining and Question Answering challenge) Task 1 measures sentiment prediction performance mainly with Mean Squared Error (MSE) and aspect identification with F1-Score, which is different from SemEval 2017 Task 5. Since FiQA Task 1 provided both target and aspect labels, in addition to machine learning and deep learning methods, many hybrid-based and pre-trained language models were also proposed to solve the target-aspect identification problem. In particular, [12] established a strong baseline with a traditional feature engineering-based machine learning approach (MSE=0.0958) by treating aspect extraction as a classification task and sentiment detection as a regression task, using Support Vector Classifier (SVC) and Support Vector Regressor (SVR), respectively. The generated features included n-gram, tokenization, word replacements, and word embeddings using Word2Vec and Term Frequency-Inverse Document Frequency (TF-IDF). When target-aspect identification is jointly considered, the biLSTM-CNN proposed by [31] treated aspect extraction as a multi-class classification problem, as this task does not involve multiple aspects for one target. First, it adopted a bidirectional LSTM to extract aspects using word embeddings such as GloVe [58], Google-News-Word2Vec [54], Godin [21], FastText [7], and the Keras in-built embedding layer⁵. Meanwhile, a multi-channel CNN is used for the sentiment analysis task with an enhanced vector combined from the dependency tree, the sentence word vector, and the snippet and target vector.
Bayesian optimization was used for hyperparameter tuning to find the optimal parameters. This method achieved 0.69 F1 for aspect extraction and 0.112 MSE for sentiment analysis. This result was further improved by an ensemble approach with an MSE of 0.0926 [60]. This method ensembled CNNs and RNNs with a voting strategy and a ridge regression for aspect and sentiment predictions. Regarding pre-trained language models, Embeddings from Language Models (ELMo) [59] and ULMFiT [27] are suitable for TABFSA. Yang et al. [78] reported a good MSE of 0.08 using ULMFiT on the FiQA Task 1. With ULMFiT, users can transfer word representations with vectors and use a single pre-trained model architecture called AWD-LSTM for all intended tasks. ULMFiT differs from ELMo in that ELMo requires users to concatenate the outputs of each trained layer and then use the resulting fixed embeddings to perform downstream tasks, while ULMFiT fine-tunes a whole language model to its target domain, then connects it directly to downstream tasks. A more recent fine-tuned language model, FinBERT [2], reported the best performance (MSE=0.07, R²=0.55) on the FiQA Task 1.

2.4 Knowledge Incorporation

The incorporation of lexical knowledge has been demonstrated to be useful in sentiment analysis [9, 37] and low-resource learning tasks [52]. However, it was most commonly used as an input for RNNs [5, 46] and CNNs [63], and has not been much explored for the fine-tuning process of large pre-trained language models. Although BERT-like pre-trained language models are capable of capturing general language representations, they lack domain-specific knowledge [39]. To improve the domain application of pre-trained language models, researchers have attempted to pre-train domain-specific language models such as FinBERT [2, 41], although the process requires a large domain-specific corpus and significant computing resources.
Concurrently, other research in recent years has studied techniques to incorporate domain-specific knowledge, such as knowledge graphs, into the fine-tuning process of pre-trained language models [23]. For example, [39] integrated knowledge graphs, namely HowNet [15], a hybrid language and knowledge resource, and CN-DBPedia [76], a never-ending Chinese knowledge extraction system, by introducing a sentence tree for sentiment analysis, semantic similarity, Question & Answering (Q&A) and Named Entity Recognition (NER) tasks. The results show that the selected knowledge graphs have no significant effect on sentiment analysis but yield better performance in semantic similarity, Q&A and NER tasks. To improve the performance of aspect-based sentiment analysis, [82] leveraged a sentiment knowledge graph to provide external domain knowledge for BERT. Nevertheless, few earlier studies attempted to incorporate lexicon knowledge, e.g., a common knowledge base available for many domains, into pre-trained language models. In most cases, the knowledge is also single-sourced and hence does not deal with redundancy and contradiction problems.

⁵ https://keras.io/layers/embeddings/

3 OUR APPROACH

Our study focuses on the sentiment detection part, not the target-aspect identification part, of TABFSA, where the model is trained to predict the sentiment score given financial news headlines and posts and their corresponding targets and aspects. Aiming at a comprehensive framework to utilize external knowledge, we search for an effective coupling of deep text representations and multiple, individually developed knowledge sources.

3.1 Transformer Models

Although BERT is popular for TABSA [2, 70], it suffers from a pretrain-finetune discrepancy because the dependency between masked positions is neglected during the training phase.
XLNet [80], which is an extension of the Transformer-XL model, on the other hand, can address this issue by using an autoregressive method to learn the bidirectional contexts. Experiments show that XLNet has significantly outperformed BERT in 20 NLP tasks, including sentiment analysis and question answering. RoBERTa, or Robustly optimized BERT approach, is another widely used pre-trained language model [19]. Proposed by [40], it is a replication study of BERT pre-training. It proposed an improved way of training BERT, which includes (1) longer model training, with larger batches and byte-level Byte-Pair Encoding (BPE), over more data; (2) training on longer sequences; (3) removal of the next sentence prediction objective; and (4) dynamically changing the masking pattern applied to the training data. In our study, we have adopted BERT, XLNet, and RoBERTa as baseline models to examine the effectiveness of knowledge incorporation into transformer models.

3.2 Lexical Knowledge

Due to the lack of high-quality lexical resources for financial text, we incorporate multiple knowledge sources following three criteria: (1) both financial domain-specific and general-purpose lexicons are selected to balance precision and coverage; (2) the lexicons selected cover both sentiment and more fine-grained emotion knowledge; (3) lexicons that are created from social media text such as tweets and microblogs are purposely chosen for the sake of a language style similar to our evaluation datasets. In sum, we consider three finance domain lexicons plus six general-purpose lexicons.

Finance domain lexicons consist of HFD, LM, and SMSL. HFD (Henry’s Financial Dictionary) includes 104 positive and 85 negative words, and is the first dictionary explicitly created for the financial domain. It is used to measure the tone of earnings press releases, which are an essential element of the firm-investor communication process [25]. HFD has been widely used for financial sentiment analysis.
The weakness of HFD is its limited number of words and low coverage. The LM (Loughran and McDonald) sentiment word list is created from the annual reports released by firms and includes 354 positive, 2,355 negative, 297 uncertainty, 904 litigious, 19 strong modal, 27 weak modal, and 184 constraining words [43]. To our knowledge, LM is the most commonly used lexicon created for the finance domain. SMSL (Stock Market Sentiment Lexicon) is created from labeled StockTwits: tweets from a microblogging platform specialized in the stock market. SMSL includes 20,550 words and phrases and shows competitive results in measuring investor sentiments [57].

In addition, general-purpose lexicons are used to increase coverage, which comprise SenticNet [8], VADER [30], GI [66], NRC [55], OPL [28] and MPQA [69]. SenticNet is a general-purpose sentiment knowledge base with 200,000 commonsense concepts in its latest version [8]. Each concept in SenticNet is associated with rich dimensional emotion information (see Fig. 2). For example, in the record senticnet['sales_fall'] = ['-0.78', '-0.84', '0', '0', '#sadness', '#anger', 'negative', '-0.81'], '-0.78' is the introspection (joy-versus-sadness) value, '-0.84' is the temper (calmness-versus-anger) value, the two '0's are the attitude (pleasantness-versus-disgust) and sensitivity (eagerness-versus-fear) values, '#sadness' is the primary mood, '#anger' is the secondary mood, 'negative' is the polarity label and '-0.81' is the polarity value. VADER is specifically tailored for sentiments expressed in social media [30]. It records 7,520 emoticons, emojis, and words and their sentiment scores. GI (Harvard General Inquirer) is an early lexicon for text analysis: its basic spreadsheet has 11,788 entry words and their attributes, including positive, negative, strong, weak, active, and passive, etc. [66]. The positive and negative attributes of all words are used for our study. The NRC Emotion Lexicon [55] includes 14,182 words, their associated sentiments (binary), and emotion labels (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust). OPL (The Opinion Lexicon) is created by [28] and includes 2,006 positive and 4,783 negative words. Finally, the MPQA (Multi-perspective Question Answering) Subjectivity Lexicon developed by [69] has 8,222 words along with their POS tagging, polarity (positive, negative or neutral) and intensity (strong or weak).

Fig. 2. Emotion dimensions included in SenticNet. [Panels: INTROSPECTION, TEMPER, ATTITUDE, SENSITIVITY.]

3.3 Knowledge-enabled Transformer Models

We categorize the methods to integrate knowledge into three types: anterior, parallel, and posterior integration.

3.3.1 Anterior Integration. Anterior knowledge integration is the most popular approach. It incorporates the knowledge into the sentence itself, for example by forming a sentence tree whose branches hold the incorporated knowledge and feeding it into transformer models like BERT. Anterior knowledge integration augments sentences with richer sentiment information and can be helpful for model training and fine-tuning.
The main challenge of anterior integration is the potential risk of changing the meaning of the original sentence, and thus both [39] and [82] have introduced the soft-position and visible matrix to limit the impact of knowledge. We adopted the techniques introduced by [39] and [82] and studied the anterior integration of lexicon knowledge into financial sentiment analysis as a baseline.

Fig. 3. Architecture of the proposed knowledge-enabled transformer models. [Diagram: the text branch feeds financial news headlines/posts and auxiliary sentences containing targets and aspects into BERT-base, FinBERT-base, XLNet-base, or RoBERTa-base, followed by average pooling and dense layers (D); the knowledge branch passes the lexical knowledge embedding through a mutual information selector (refined knowledge embedding K), an attentive CNN or LSTM (contextual embedding C), convolution with different kernel sizes (Z), chunk max pooling (P), dropout, and concatenation.]

Fig. 4. Construction of Sentence Tree.

First, the sentence tree is constructed as shown in Fig. 4, where $w_i$ represents tokens in a sentence and $k_{i1}$ and $k_{i2}$ represent the queried lexical knowledge for $w_i$, which could be "is positive", "is neutral" or "is negative" for instance. Fig. 5 illustrates how the embedding representation is generated from a sentence tree. The soft-position index is represented by the red number and the hard-position index is signified by the green number in the sentence tree. The token embedding is formed by flattening the tokens in the sentence tree into a sequence of tokens by their hard-position index. The position embedding is generated from the soft-position index along with the token embedding.
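The soft-position and hard-position indexing described above can be sketched as follows. This is a minimal illustration in the spirit of the sentence-tree scheme of [39], not the authors' implementation; the function name `flatten_sentence_tree`, the input layout, the zero-based indices, and the omission of special tokens such as [cls] are all assumptions made for the example.

```python
def flatten_sentence_tree(tokens, knowledge):
    """Flatten a sentence tree and compute soft/hard positions (illustrative sketch).

    tokens:    trunk tokens of the original sentence.
    knowledge: dict mapping a trunk index to a list of branches,
               each branch being a list of knowledge tokens (hypothetical layout).
    """
    flat, soft, owner = [], [], []   # owner: None for trunk, else (trunk_idx, branch_idx)
    trunk_flat_index = {}            # trunk index -> index in the flattened sequence
    for i, tok in enumerate(tokens):
        trunk_flat_index[i] = len(flat)
        flat.append(tok)
        soft.append(i)               # trunk tokens keep their original order
        owner.append(None)
        for b, branch in enumerate(knowledge.get(i, [])):
            for offset, btok in enumerate(branch, start=1):
                flat.append(btok)
                soft.append(i + offset)   # branch continues numbering from its trunk token
                owner.append((i, b))
    hard = list(range(len(flat)))    # hard position = index in the flattened sequence

    def visible(p, q):
        # Trunk tokens see each other; a branch sees itself and its anchor trunk token.
        if owner[p] is None and owner[q] is None:
            return True
        if owner[p] is not None and owner[q] is not None:
            return owner[p] == owner[q]
        trunk, branch = (p, q) if owner[p] is None else (q, p)
        return trunk_flat_index[owner[branch][0]] == trunk

    matrix = [[visible(p, q) for q in hard] for p in hard]
    return flat, soft, hard, matrix


flat, soft, hard, matrix = flatten_sentence_tree(
    ["sales", "fall", "again"], {1: [["is", "negative"]]}
)
print(list(zip(flat, soft, hard)))
```

In this toy input, the branch tokens "is" and "negative" continue the numbering of "fall", while "again" resumes the trunk numbering, so its soft position differs from its hard position. The boolean matrix plays the role of the visible matrix: "again" cannot attend to the injected knowledge tokens, which limits how much the added knowledge can alter the meaning of the original sentence.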
The tokens in the original sentence are tagged as A, while the tokens in the auxiliary sentence are tagged as B for segment embedding.

Fig. 5. The process of converting a sentence tree into an embedding representation for BERT.

3.3.2 Parallel Integration. Parallel integration aims to develop a different model architecture for the knowledge base and train it in parallel with pre-trained language models. Our parallel integration model architecture is illustrated in Fig. 3. Specifically, BERT-base-cased (12-layer, 768-hidden, 12-heads, 109M parameters), XLNet-base-cased (12-layer, 768-hidden, 12-heads, 110M parameters) or RoBERTa-base (12-layer, 768-hidden, 12-heads, 125M parameters) is used to generate deep text representations. For the TABSA task, the input to the large language models (BERT/XLNet/RoBERTa) is a sentence pair. We use the same notations as [67], that is, $S = \{w_1, w_2, ..., w_n\}$ for a financial news headline or post sentence $S$, and $aux(S)$ for its auxiliary sentence containing the corresponding targets and aspects. The auxiliary sentence takes the format of "what do you think of aspect for target?" or "what do you think of target?" Then, the input is in the format of "[cls] $S$ [sep] $aux(S)$ [sep]" for BERT, "<s> $S$ </s> </s> $aux(S)$ </s>" for RoBERTa, and "$S$ [sep] $aux(S)$ [sep][cls]" for XLNet. The output $D \in \mathbb{R}^{768 \times 1}$ is average-pooled from the last hidden state.

In terms of external knowledge embedding, the nine selected lexicons are processed and formed as a master dictionary, where the key is a word or phrase, and the value is a list of associated sentiment and emotion scores. In our study, the master dictionary has 212,109 words and phrases, where each has 25 scores⁶. The scores are normalized to $[-1, +1]$ via min-max scaling, where $-1$ and $+1$ represent the most extreme sentiments.
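The construction of such a master dictionary can be sketched as below. This is only an illustrative approximation under stated assumptions: each lexicon is represented as a hypothetical `dict` contributing a single score dimension, the helper names `min_max_scale` and `build_master_dictionary` are invented for the example, and filling a missing dimension with 0.0 is an assumption rather than something the text specifies.

```python
def min_max_scale(values):
    """Rescale raw scores linearly so the minimum maps to -1 and the maximum to +1."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # degenerate lexicon: no spread to rescale
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def build_master_dictionary(lexicons):
    """Merge lexicons into {term: [score per lexicon dimension]}.

    lexicons: list of dicts mapping a word or phrase to one raw score
              (a simplification; real lexicons may contribute several dimensions).
    """
    scaled = []
    for lex in lexicons:
        terms = list(lex)
        scores = min_max_scale([lex[t] for t in terms])
        scaled.append(dict(zip(terms, scores)))
    master = {}
    for term in set().union(*scaled):
        # Assumption: a term absent from a lexicon gets 0.0 in that dimension.
        master[term] = [lex.get(term, 0.0) for lex in scaled]
    return master


# Two toy lexicons with different raw score ranges (values are made up).
finance_lexicon = {"gain": 1, "loss": -1, "liability": 0}
general_lexicon = {"gain": 3.0, "happy": 4.0, "fall": 0.0}
master = build_master_dictionary([finance_lexicon, general_lexicon])
print(master["gain"], master["happy"])
```

Scaling each dimension separately keeps lexicons with different raw ranges comparable once they sit side by side in the same score vector.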
As an example, the scores for "happy" and "strong" are shown in Fig. 6. The word "happy" has not only sentiment but also emotion scores, in contrast to "strong", which usually carries only a sentiment score. For each word $w_i \in S$, the external knowledge embedding $\mathbf{k}_{w_i}$ is looked up from this dictionary; if the word is not found, a vector of zeros is returned. The coverage of the master dictionary by lexicon on the SemEval 2017 Task 5 and FiQA Task 1 datasets is summarized in Fig. 7.

Fig. 6. Examples from Master Dictionary.

Fig. 7. Statistics of Coverage by Master Dictionary. % refers to the percentage of records in the dataset which have sentiment or emotion score from that lexicon.

⁶ Among those, 9 dimensions are contributed by SenticNet, 7 by NRC, and 1 by each else lexicon.

Because the original knowledge embedding uses lexicons that were developed individually, it contains conflicting information and noise from alien language styles. To this end, we further apply feature selection techniques to the training data to refine the most relevant knowledge for the learning process. We experimented with two popular methods to rank the feature importance, i.e., using a random forest regressor [17] or mutual information [6, 64]. Random forest measures the mean decrease in impurity, while mutual information captures various forms of dependency between variables, unlike the F-test, which captures linear dependency only. Mutual information is non-negative, and a larger value represents higher dependency between the variables. We estimate mutual information based on entropy estimates from $k$-nearest neighbor distances ($k = 3$), as a larger $k$ could introduce bias [36]. Mutual information is chosen in our study, as [18] shows that the mutual information criterion can select features that minimize MSE and MAE in regressions.
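The ranking step can be illustrated with a small, self-contained sketch. Note that it does not reproduce the $k$-nearest-neighbor entropy estimator mentioned above: to stay within the Python standard library it substitutes a simple histogram (binned) plug-in estimate of mutual information. The function names, the number of bins, and the toy data are all assumptions for illustration.

```python
import math
from collections import Counter


def binned(values, bins):
    """Map continuous values to equal-width bin indices in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant feature: everything falls in one bin
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=4):
    """Plug-in MI estimate (in nats) between two equally long lists of numbers."""
    bx, by = binned(x, bins), binned(y, bins)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def select_top_k(X, y, k, bins=4):
    """Rank feature columns of X by MI with y and return the k best column indices."""
    d = len(X[0])
    scores = [mutual_information([row[j] for row in X], y, bins) for j in range(d)]
    ranked = sorted(range(d), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Toy data: column 0 tracks the gold sentiment score, column 1 is uninformative.
y = [-0.8, -0.4, 0.0, 0.4, 0.8, -0.6, 0.2, 0.6]
X = [[value, 0.0] for value in y]
selected, scores = select_top_k(X, y, k=1)
print(selected, scores)
```

A constant column carries no information about the target and is ranked last, which mirrors the intent of discarding lexicon dimensions that do not help predict the gold sentiment scores.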
As illustrated in Fig. 3, mutual information selection is performed on the lexical knowledge embedding during the training process. Each text in the training dataset can be represented by a 25-dimensional embedding and also has a corresponding ground truth sentiment score. Our method computes the mutual information between the ground truth sentiment scores and each of the 25 dimensions during training, ranks the dimensions by importance, and selects only the lexical scores that have higher mutual information with the ground truth sentiment score in the training dataset. For instance, the original 25-dimensional lexical knowledge embedding is reduced to 5 dimensions if the number of lexicon scores to be selected is set to 5. This process generates a refined knowledge embedding $E$, which is then fed into an attentive CNN.

The refined knowledge embedding for each sentence $s$ is $E(s) \in \mathbb{R}^{N \times d}$, where $N > n$ is the maximum length of the sentences and $d$ is the number of sentiment and emotion scores across the selected lexicons (in our case, the optimal $d$ ranges from 3 to 8 out of the 25 dimensions):

$$E(s) = e_1 \oplus e_2 \oplus \cdots \oplus e_n \oplus 0^{(N-n) \times d}, \tag{1}$$

where $e_i$ is the refined knowledge embedding of word $w_i$.

Inspired by the literature on implementing the attention mechanism in deep neural networks, for each word $w_i$ we generate a context vector $g_i$ using the attention layer to determine which word and lexicon should receive more emphasis. Each sentence thus also has a contextual embedding $G(s) \in \mathbb{R}^{N \times d}$:

$$g_i = \sum_{j \neq i} \alpha_{i,j} \cdot e_j. \tag{2}$$
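The construction of the padded sentence-level knowledge embedding in Eq. (1) can be sketched as follows. This is only an illustration under stated assumptions: the dictionary entries, the four-score layout, the selected dimension indices, and the helper name `knowledge_embedding` are hypothetical, and lower-casing the tokens before lookup is a choice made here, not something the text specifies.

```python
def knowledge_embedding(tokens, master_dict, selected_dims, max_len):
    """Build an N x d matrix: one row of selected lexicon scores per token,
    zero rows for unknown words, and zero padding up to max_len rows."""
    d = len(selected_dims)
    rows = []
    for token in tokens[:max_len]:
        scores = master_dict.get(token.lower())
        if scores is None:
            rows.append([0.0] * d)  # word not found: zero vector
        else:
            rows.append([scores[i] for i in selected_dims])
    # Zero block of shape (N - n) x d that pads the sentence to max_len.
    rows.extend([0.0] * d for _ in range(max_len - len(rows)))
    return rows


# Hypothetical master dictionary with 4 scores per word, already in [-1, +1].
master_dict = {
    "cuts": [-0.6, -0.4, -0.5, 0.0],
    "budget": [0.1, 0.0, 0.2, 0.0],
}

tokens = ["Lonmin", "cuts", "budget"]
E = knowledge_embedding(tokens, master_dict, selected_dims=[0, 2], max_len=5)
# E has 5 rows (N = 5) and 2 columns (d = 2); rows 3 and 4 are padding.
```

Keeping only the columns listed in `selected_dims` is what reduces the 25-dimensional lookup to the $d$ dimensions chosen by mutual information.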
The attention weight $\alpha_{i,j}$ is obtained by normalizing the score of a word pair $(w_i, w_j)$ from an MLP through the softmax function. Given $u = |j - i| - 1$ and a decay factor $\lambda \in [0, 1)$, which penalizes the output score to reduce the impact of the noisy information produced as the sentence grows longer, the score is:

$$a(w_i, w_j) = (1 - \lambda)^{u} \cdot v^{\top} \tanh(W [e_i \oplus e_j]). \tag{3}$$

After that, we experimented with two methods to perform the subsequent convolution.

1-Dimensional Convolution. For 1-dimensional convolution, the knowledge embedding is concatenated directly with the contextual embedding to form a 1-channel embedding $X$, which is fed into CNNs to generate the feature representation. The convolving kernel sizes are set to $h = 2, 3, 4, 5$, and the number of filters is set to 4. Each convolution involves a filter $F \in \mathbb{R}^{2d \times h}$, where $d$ is the total number of sentiment and emotion scores across lexicons and $h$ is the number of words in a sliding window. A new feature $c_j$, where $j \in [1, 2, \ldots, N - h + 1]$, is generated from a sliding window over words $X_{(j:j+h-1)}$ as:

$$c_j = F \cdot X_{(j:j+h-1)} + b, \tag{4}$$

where $b$ is a bias term.

The convolved feature vector $C \in \mathbb{R}^{1 \times (N-h+1)}$, with $c_{1j} = c_j$, is represented by:

$$C = \begin{bmatrix} c_{11} & \cdots & c_{1(N-h+1)} \end{bmatrix}. \tag{5}$$

The convolved features $C$ are activated by the ReLU function and chunk-max-pooled. The pooled feature maps are concatenated to form $P \in \mathbb{R}^{p \times q}$, where $p$ is the length of the pooled vector and $q \in [1, 2, 3, 4]$.

Fig. 8. 1-Dimensional and 2-Dimensional Convolution. (Panels: 1-D Convolution and 2-D Convolution.)

2-Dimensional Convolution. The difference between 1-dimensional and 2-dimensional convolution is illustrated in Fig. 8.
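As a rough illustration of the attentive 1-dimensional path described above, the sketch below computes distance-decayed attention scores, turns them into context vectors with a softmax over the other words, concatenates knowledge and context embeddings, and slides a full-height filter over word positions. It is a sketch under assumptions, not the authors' implementation: the weights `W` and `v`, the decay value, and the toy scores are hypothetical, the exponent uses the absolute word distance, and matrices are stored with one row per word (the transpose of the $2d \times h$ filter layout in the text).

```python
import math


def attention_context(E, W, v, decay):
    """Context vectors g_i = sum_{j != i} alpha_ij * e_j, with alpha from a
    softmax over distance-decayed MLP scores. Requires at least two words."""
    n, d = len(E), len(E[0])
    contexts = []
    for i in range(n):
        scored = []
        for j in range(n):
            if j == i:
                continue
            pair = E[i] + E[j]  # concatenation of e_i and e_j, length 2d
            hidden = [math.tanh(sum(w * x for w, x in zip(row, pair))) for row in W]
            raw = sum(a * b for a, b in zip(v, hidden))
            scored.append((j, (1.0 - decay) ** (abs(j - i) - 1) * raw))
        top = max(s for _, s in scored)  # subtract the max for a stable softmax
        weights = [(j, math.exp(s - top)) for j, s in scored]
        total = sum(w for _, w in weights)
        contexts.append(
            [sum(w / total * E[j][c] for j, w in weights) for c in range(d)]
        )
    return contexts


def conv1d(X, F, bias=0.0):
    """Slide an h-row filter over word positions: c_j = F . X[j:j+h] + bias."""
    h, width = len(F), len(X[0])
    return [
        bias + sum(F[t][c] * X[j + t][c] for t in range(h) for c in range(width))
        for j in range(len(X) - h + 1)
    ]


# Toy input (hypothetical): N = 4 words, d = 2 selected lexicon scores.
E = [[0.2, 0.0], [-0.6, -0.5], [0.1, 0.2], [-0.7, -0.3]]
W = [[0.5, -0.5, 0.5, -0.5], [0.1, 0.2, 0.3, 0.4]]  # MLP weights, 2 x 2d
v = [1.0, -1.0]

G = attention_context(E, W, v, decay=0.1)
X = [e + g for e, g in zip(E, G)]       # N x 2d, knowledge next to context
F = [[1.0] * 4, [1.0] * 4]              # one filter with window size h = 2
features = conv1d(X, F)                 # N - h + 1 = 3 features
```

Because the attention weights for each word sum to one, every context vector is a weighted average of the other words' score vectors; the decay factor only shifts how that average is distributed over distance.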
Unlike 1-dimensional convolution, which captures only global similarities and differences of lexicons, 2-dimensional convolution can also capture local characteristics of the most effective lexicons that are adjacent to each other. For 2-dimensional convolution, we concatenate the knowledge embedding with the contextual embedding to form a 2-channel embedding $X$ and feed it into CNNs to generate the feature representation. The convolving kernel sizes are set to $(l, h) = (3, 2), (3, 3), (3, 4), (3, 5)$, and the number of filters is set to 4. Each convolution involves a filter $F \in \mathbb{R}^{l \times h}$, where $l$ is the number of lexicon scores and $h$ is the number of words in a sliding window. A new feature $c_{ij}$, where $i \in [1, 2, \ldots, d - l + 1]$ and $j \in [1, 2, \ldots, N - h + 1]$, is generated from a sliding window $X_{(i:i+l-1,\, j:j+h-1)}$ over lexicon scores and words as:

$$c_{ij} = F \cdot X_{(i:i+l-1,\, j:j+h-1)} + b, \tag{6}$$

where $b$ is a bias term.

The convolved feature matrix $C \in \mathbb{R}^{(d-l+1) \times (N-h+1)}$ is represented by:

$$C = \begin{bmatrix} c_{11} & \cdots & c_{1(N-h+1)} \\ c_{21} & \cdots & c_{2(N-h+1)} \\ \vdots & \ddots & \vdots \\ c_{(d-l+1)1} & \cdots & c_{(d-l+1)(N-h+1)} \end{bmatrix}. \tag{7}$$

The convolved features $C$ are activated by the ReLU function, global-max-pooled along the lexicon axis $d$, and chunk-max-pooled along the word axis $N$. The pooled feature maps are concatenated to form $P \in \mathbb{R}^{p \times q}$, where $p$ is the length of the pooled vector and $q \in [1, 2, 3, 4]$.

Subsequently, a second convolution is applied to $P$ to further extract and downsize features, and in parallel an LSTM is used to extract the sequential information. In this way, we differ from [35] by using attentive convolution and chunk max pooling, followed by additional CNN and LSTM layers in parallel to extract features further.

The output $O_{cnn} \in \mathbb{R}^{d_{cnn} \times 1}$ from the CNN and $O_{lstm} \in \mathbb{R}^{d_l \times 1}$ from the LSTM are concatenated with $T$ from the transformer model, where $d_{cnn}$ is the number of channels produced by the last convolution and $d_l$ is the dimension of the last hidden state of the LSTM.
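The 2-dimensional sliding window of Eqs. (6) and (7), followed by ReLU and the two pooling steps, can be sketched on a small matrix. For brevity this sketch assumes a single input channel rather than the 2-channel embedding, stores the input with lexicon scores as rows and words as columns, and pools over fixed-size chunks of word positions; the helper names, the all-ones filter, and the toy values are hypothetical.

```python
def conv2d(X, F, bias=0.0):
    """Valid 2-D sliding window over a d x N matrix:
    c_ij = sum(F * X[i:i+l, j:j+h]) + bias."""
    l, h = len(F), len(F[0])
    d, n = len(X), len(X[0])
    return [
        [
            bias + sum(F[a][b] * X[i + a][j + b] for a in range(l) for b in range(h))
            for j in range(n - h + 1)
        ]
        for i in range(d - l + 1)
    ]


def relu(C):
    """Element-wise max(0, c)."""
    return [[max(0.0, c) for c in row] for row in C]


def max_pool(C, chunk_size):
    """Global max over the lexicon axis, then max over consecutive chunks
    of word positions (chunk-max pooling)."""
    per_position = [max(column) for column in zip(*C)]
    return [
        max(per_position[s:s + chunk_size])
        for s in range(0, len(per_position), chunk_size)
    ]


# Toy input (hypothetical): d = 4 lexicon scores (rows), N = 5 words (columns).
X = [
    [1, 0, 0, 2, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 3],
    [1, 1, 1, 1, 1],
]
F = [[1, 1], [1, 1], [1, 1]]          # kernel size (l, h) = (3, 2)

C = relu(conv2d(X, F))                # shape (d - l + 1) x (N - h + 1) = 2 x 4
pooled = max_pool(C, chunk_size=2)    # two chunks of two word positions each
```

The output shape matches Eq. (7): the filter can be placed at $d - l + 1$ vertical and $N - h + 1$ horizontal positions. Chunk-max pooling keeps one maximum per chunk, so some positional information survives that a single global maximum would discard.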
Finally, this representation is passed through two linear layers with dropout, of sizes $(768 + d_{cnn} + d_l,\ 768 + d_{cnn} + d_l)$ and $(768 + d_{cnn} + d_l,\ 1)$. The output therefore takes the form:

$$\hat{y} = W_2 \cdot D[W_1 \cdot \tanh(T \oplus O_{cnn} \oplus O_{lstm}) + b_1] + b_2, \tag{8}$$

where $D[\cdot]$ denotes the dropout applied between the two linear layers, and $W_1, W_2, b_1, b_2$ are weights and bias terms to be optimized with MSE loss.

To enable a fair comparison, all other settings are kept the same as in the vanilla transformer models, except for the knowledge integration component.

3.3.3 Posterior Integration. Posterior integration is defined as the addition of knowledge to the output embedding from transformer models. The most straightforward approach is direct concatenation without further processing, which is used in our study as a baseline and is formulated as follows:

$$\hat{y} = W_2 \cdot D[W_1 \cdot \tanh(T \oplus E) + b_1] + b_2, \tag{9}$$

where $T$ is the output from the transformer model, $E$ is the refined lexical knowledge embedding, $D[\cdot]$ denotes dropout, and $W_1, W_2, b_1, b_2$ are weights and bias terms to be optimized with MSE loss.

3.4 The Catastrophic Forgetting Problem

A critical issue in fine-tuning pre-trained language models is the variability in error between different runs with the same configuration but different random seeds. Catastrophic forgetting and small training data size are two hypotheses for the origin of fine-tuning instability [13, 14]. To deal with the catastrophic forgetting problem, Howard and Ruder [27] proposed three training techniques: slanted triangular learning rates, gradual unfreezing, and discriminative fine-tuning. A more recent method [10], however, recalls knowledge from pre-training without the original data by using a pre-training simulation mechanism, and learns downstream tasks gradually by using an objective shifting mechanism. Specifically, it applies the idea of multi-task learning, which trains the model on the source and target tasks simultaneously to improve model performance, with the loss function defined as follows:

$$Loss_M = \lambda Loss_T + (1 - \lambda) Loss_S, \tag{10}$$
where $Loss_T$ is the loss function for the target task, $Loss_S$ is the loss function for the source task, and $\lambda \in (0, 1)$ is a hyperparameter balancing the two tasks. $Loss_S$ optimizes the negative log posterior probability of the model parameters $\theta$ given the data of the source tasks $D_S$. Similarly, $Loss_T$ optimizes the negative log posterior probability of the model parameters given the data of the target tasks $D_T$. The first challenge for multi-task learning is that the pre-training data is unavailable. Pre-training simulation is therefore introduced as a quadratic penalty between the model parameters and the pre-trained parameters to approximate the optimization objective of the source task:

$$Loss_S = -\log p(\theta \mid D_S) \approx \frac{1}{2} \gamma \sum_i (\theta_i - \theta_i^{*})^2, \tag{11}$$

where $\gamma$ is the coefficient of the quadratic penalty, $\theta$ denotes the model parameters, and $\theta^{*}$ is the local minimum of the parameter space of the source task $D_S$.

The next challenge is that the optimization objective of adaptation is $Loss_T$, which is inconsistent with multi-task learning. To address this, objective shifting is introduced to allow the loss function $Loss_M$ to gradually shift to $Loss_T$ with an annealing coefficient:

$$Loss_M = \lambda(t) Loss_T + (1 - \lambda(t)) Loss_S, \qquad \lambda(t) = \frac{1}{1 + \exp(-k \cdot (t - t_0))}, \tag{12}$$

where $\lambda(t)$ is the sigmoid annealing function, with $k \in (0, 1]$ and $t_0$ being the parameters controlling the annealing rate and timesteps, respectively.

This method has achieved state-of-the-art results on the benchmark datasets. Therefore, we apply this "recall-and-learn" training strategy [10] to prevent catastrophic forgetting in our fine-tuning process for all pre-trained language models.

4 EXPERIMENTS

4.1 Datasets

The SemEval 2017 Task 5 dataset was developed for fine-grained sentiment analysis on financial news and microblogs [11].
The training data includes 1,142 financial news headlines and 1,694 posts with their target entities and corresponding sentiment scores, but without labeled aspects. The test data has 491 financial news headlines and 794 posts. The task is to extract and detect the targets and their corresponding sentiment scores. The data were manually annotated by three independent financial experts according to the annotation guidelines defined by [11]. The final dataset was created by a fourth domain expert who consolidated the ratings. Inter-annotator agreement is used to assess the quality of the annotations. Specifically, Spearman's rank correlation on sentiment scores was computed for each pair of annotators and then averaged across pairs [11]. An example is shown in the textbox below.

    "id": 2,
    "company (target)": "Morrisons",
    "title": "Morrisons book second consecutive quarter of sales growth",
    "sentiment": 0.43

The FiQA Task 1 dataset comes from an open challenge [47] and consists of 498 financial news headlines and 675 posts with manually annotated target entities, aspects, and corresponding sentiment scores. Although smaller than SemEval 2017 Task 5, FiQA Task 1 pre-defines four Level 1 aspects and 27 Level 2 aspects, as shown in Table 1. The task, therefore, is to extract and detect the targets and aspects together with their corresponding sentiment scores. The following box is an example from the FiQA Task 1 dataset.

    "sentence": "Royal Mail chairman Donald Brydon set to step down",
    "info": [
        "snippets": "['set to step down']",
        "target": "Royal Mail",
        "sentiment_score": "-0.374",
        "aspects": "['Corporate/Appointment']"
    ]

4.2 Benchmarks

We benchmark our knowledge-enabled models against plain BERT-base-cased, FinBERT-base-cased, XLNet-base-cased, and RoBERTa-base models. BERT variants [67, 70, 77] are chosen because many were developed for the (T)ABSA task and some achieved state-of-the-art results.
Moreover, FinBERT [2] performed further pre-training to address the domain-specific language style and was ranked first for sentiment analysis on Financial PhraseBank. As in the knowledge-enabled models, the input to the pre-trained language model is a sentence pair, in which one sentence is the auxiliary sentence containing the target and aspect, and the other is the financial news headline or post. Average pooling is performed on the last hidden state, followed by dropout. Finally, a last linear layer of size (768 × 1) is added. The loss function minimizes MSE.

4.3 Other Experimental Details

The FiQA Task 1 dataset is split into 90% for training and 10% for test by performing a 10-fold split. The validation dataset, which is 25% of the training data, is used to select the best model, and the test dataset is used to report the final performance scores. Since the gold standard is not released, we perform 10-fold cross-validation on two differently seeded runs for evaluation and report the mean score. The SemEval 2017 Task 5 dataset is split into 75% training and 25% validation, the model is trained 10 times with different random seeds, and the gold standard dataset is used to report the mean performance score. Our models are configured and trained on an NVIDIA Tesla-P100-PCIe-16GB GPU with a maximum of 100 epochs, an initial learning rate of 3e-5 with a linear schedule and warm-up, and Recall Adam as the optimizer.

5 RESULTS & ANALYSIS

Consistent with previous studies [11, 47], we report cosine similarity for SemEval 2017 Task 5 and MSE for FiQA Task 1 (see Table 2 and Table 3). Additionally, we include $R^2$, which measures the percentage of variance explained by the model under evaluation.
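For reference, the three evaluation metrics named above can be computed as in the following minimal sketch. It uses the plain textbook definitions on a vector of gold scores and a vector of predictions; any task-specific weighting used by the official shared-task scorers is not reproduced here, and the toy scores are hypothetical.

```python
import math


def cosine_similarity(gold, pred):
    """Cosine of the angle between the gold and predicted score vectors."""
    dot = sum(g * p for g, p in zip(gold, pred))
    norm = math.sqrt(sum(g * g for g in gold)) * math.sqrt(sum(p * p for p in pred))
    return dot / norm if norm else 0.0


def mean_squared_error(gold, pred):
    """Average squared difference between gold and predicted scores."""
    return sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold)


def r_squared(gold, pred):
    """Share of the variance in the gold scores explained by the predictions."""
    mean = sum(gold) / len(gold)
    ss_res = sum((g - p) ** 2 for g, p in zip(gold, pred))
    ss_tot = sum((g - mean) ** 2 for g in gold)
    return 1.0 - ss_res / ss_tot


# Hypothetical gold and predicted sentiment scores for four texts.
gold = [0.5, -0.5, 0.0, 1.0]
pred = [0.4, -0.3, 0.1, 0.8]

cos = cosine_similarity(gold, pred)
mse = mean_squared_error(gold, pred)
r2 = r_squared(gold, pred)
```

Cosine similarity rewards predictions that point in the same direction as the gold vector regardless of overall scale, whereas MSE and $R^2$ also penalize scale errors, which is one reason to report them side by side.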
Under all columns and metrics, RoBERTa and XLNet outperform BERT by significant margins even before the integration of sentiment knowledge.

Footnote (Financial PhraseBank ranking): https://paperswithcode.com/sota/sentiment-analysis-on-financial-phrasebank

Level 1      Level 2
Corporate    Reputation, Company Communication, Appointment, Financial, Regulatory, Sales, M&A, Legal, Dividend Policy, Risks, Rumors, Strategy
Stock        Options, IPO, Signal, Coverage, Fundamentals, Insider Activity, Price Action, Buyside, Technical Analysis
Economy      Trade, Central Banks
Market       Currency, Conditions, Market, Volatility

Table 1. Level 1 and Level 2 Aspects in FiQA dataset

                                Headline            Post
Model                           Cosine    R^2       Cosine    R^2
Lexicon-based [42]              0.1861    0.033     0.3032    0.052
Regression ensemble [33]        0.7100    -         0.7780    -
MLP ensemble [1]                0.7860    -         0.7970    -
FinBERT [2]                     0.7969    0.635     0.7817    0.570
FinBERT [79]                    0.7798    0.609     0.7626    0.536
BERT                            0.7935    0.630     0.7886    0.581
k-BERT (anterior) [39, 82]      0.7809    0.610     0.7614    0.535
k-BERT (posterior)              0.7958    0.633     0.7903    0.584
k-BERT (parallel-1D)            0.7971    0.636     0.7916    0.587
k-BERT (parallel-2D)            0.7969    0.635     0.7912    0.586
XLNet                           0.8199    0.676     0.8031    0.608
k-XLNet (anterior) [39, 82]     0.8014    0.644     0.7754    0.560
k-XLNet (posterior)             0.8215    0.675     0.8075    0.616
k-XLNet (parallel-1D)           0.8249    0.681     0.8074    0.615
k-XLNet (parallel-2D)           0.8270    0.685     0.8074    0.615
RoBERTa                         0.8430    0.710     0.8085    0.617
k-RoBERTa (anterior) [39, 82]   0.8140    0.664     0.7754    0.560
k-RoBERTa (posterior)           0.8380    0.703     0.8063    0.614
k-RoBERTa (parallel-1D)         0.8495    0.722     0.8113    0.623
k-RoBERTa (parallel-2D)         0.8483    0.721     0.8126    0.624

Table 2. Performance of proposed knowledge-enabled transformer models in comparison to the state-of-the-art approaches on SemEval 2017 Task 5. Boldface indicates the top 2 results. We transcribe the results reported in [33] and [1]. "-" means not reported.
This confirms that RoBERTa and XLNet are more effective deep representation models than BERT for the TABFSA task. Knowledge-enabled RoBERTa achieves state-of-the-art results on both SemEval 2017 Task 5 (Cosine[h]=0.8495, Cosine[p]=0.8126) and FiQA Task 1 (MSE=0.0490, R^2=0.711). These results are an improvement of circa 5% over the previous best results on SemEval 2017 Task 5 by [1] (Cosine[h]=0.7860, Cosine[p]=0.7970), and of circa 30% over the previous best results produced by FinBERT on FiQA Task 1 by [2] (MSE=0.07, R^2=0.55). From this perspective, incorporating financial knowledge can produce better model performance than retraining finance domain-specific language models for the TABFSA task. The results also show that, overall, parallel integration is more effective than posterior integration, which in turn outperforms anterior integration. Notably, the anterior incorporation of multiple lexicon knowledge decreases model performance. The 1D and 2D convolutions in the parallel approach produce comparable results, which makes 1D convolution the more efficient choice because it requires fewer learnable parameters and less training time.

5.1 Ablation Analysis

Ablation analysis is performed to validate the external knowledge embedding module. The results of models trained with 10 different random seeds for various transformer models and knowledge-enabled transformer models are provided in Table 4 and Table 5, which show the positive impact of knowledge integration on model performance and stability.
Model                           MSE       R^2
Lexicon-based [42]              0.1720    0.040
DNN ensemble [60]               0.0926    0.414
ULMFiT fine-tuning [78]         0.0800    0.400
FinBERT [2]                     0.0700    0.550
FinBERT [79]                    0.0636    0.613
BERT                            0.0651    0.601
k-BERT (anterior) [39, 82]      0.0738    0.549
k-BERT (posterior)              0.0634    0.610
k-BERT (parallel-1D)            0.0624    0.616
k-BERT (parallel-2D)            0.0628    0.615
XLNet                           0.0549    0.665
k-XLNet (anterior) [39, 82]     0.0627    0.619
k-XLNet (posterior)             0.0522    0.693
k-XLNet (parallel-1D)           0.0538    0.669
k-XLNet (parallel-2D)           0.0532    0.674
RoBERTa                         0.0548    0.677
k-RoBERTa (anterior) [39, 82]   0.0602    0.642
k-RoBERTa (posterior)           0.0546    0.668
k-RoBERTa (parallel-1D)         0.0499    0.705
k-RoBERTa (parallel-2D)         0.0490    0.711

Table 3. Performance of proposed knowledge-enabled transformer models in comparison to state-of-the-art approaches in sentiment analysis task on FiQA Task 1. Boldface indicates the top 2 results. We transcribe the results reported in [60], [78], and [2].

We observe that the integration of external knowledge improves both accuracy and stability across benchmark models, and that knowledge selection through mutual information improves model performance further. Specifically, on the FiQA Task 1 data, BERT shows a 4% improvement in MSE with a smaller standard deviation (SD). Knowledge-enabled RoBERTa decreases the MSE by 10%, from 0.0548 to 0.0490, although the model is destabilized slightly. On SemEval 2017 Task 5, knowledge-enabled RoBERTa improves cosine similarity from 0.8430 to 0.8483 for headline data and from 0.8085 to 0.8126 for post data. We have also included the CNN approach proposed by [35] and present the results under k-RoBERTa (parallel-CNN w/ MI [35]). Overall, the proposed k-RoBERTa (parallel-2D w/ MI) still produces better or comparable results.
Lastly, we conducted a paired t-test between RoBERTa and k-RoBERTa (parallel-2D w/ MI). The difference is significant at the 0.01 level for FiQA Task 1 (p = 0.002) and at the 0.10 level for SemEval 2017 Task 5 Headline (p = 0.074); for SemEval 2017 Task 5 Post (p = 0.172), it falls short of conventional significance thresholds.

                                        Headline                      Post
Cosine Similarity                       Mean     Median   SD          Mean     Median   SD
BERT                                    0.7935   0.7904   0.0096      0.7886   0.7850   0.0108
k-BERT (parallel-2D w/o MI)             0.7932   0.7932   0.0064      0.7889   0.7888   0.0118
k-BERT (parallel-2D w/ MI)              0.7969   0.7958   0.0072      0.7912   0.7932   0.0104
FinBERT                                 0.7969   0.7987   0.0093      0.7817   0.7823   0.0093
k-FinBERT (parallel-2D w/o MI)          0.7954   0.7977   0.0063      0.7822   0.7813   0.0086
k-FinBERT (parallel-2D w/ MI)           0.8009   0.8019   0.0069      0.7853   0.7839   0.0105
XLNet                                   0.8199   0.8186   0.0151      0.8031   0.8025   0.0110
k-XLNet (parallel-2D w/o MI)            0.8261   0.8260   0.0083      0.8067   0.8066   0.0108
k-XLNet (parallel-2D w/ MI)             0.8270   0.8261   0.0091      0.8074   0.8098   0.0090
RoBERTa                                 0.8430   0.8423   0.0080      0.8085   0.8082   0.0136
k-RoBERTa (parallel-2D w/o MI)          0.8462   0.8462   0.0048      0.8117   0.8116   0.0175
k-RoBERTa (parallel-CNN w/ MI [35])     0.8455   0.8481   0.0090      0.8128   0.8122   0.0110
k-RoBERTa (parallel-2D w/ MI)           0.8483   0.8500   0.0170      0.8126   0.8118   0.0125

Table 4. Ablation Analysis for SemEval 2017 Task 5. w/ MI means Mutual Information is adopted to select lexicons. w/o MI means all lexicons are used without any selection. The parallel-2D is our proposed model and parallel-CNN means the CNN proposed by [35] is adopted.

MSE                                     Mean     Median   SD
BERT                                    0.0651   0.0602   0.0191
k-BERT (parallel-2D w/o MI)             0.0647   0.0628   0.0168
k-BERT (parallel-2D w/ MI)              0.0628   0.0573   0.0180
FinBERT                                 0.0675   0.0668   0.0172
k-FinBERT (parallel-2D w/o MI)          0.0672   0.0679   0.0163
k-FinBERT (parallel-2D w/ MI)           0.0646   0.0623   0.0157
XLNet                                   0.0549   0.0526   0.0147
k-XLNet (parallel-2D w/o MI)            0.0544   0.0528   0.0143
k-XLNet (parallel-2D w/ MI)             0.0532   0.0502   0.0119
RoBERTa                                 0.0548   0.0526   0.0173
k-RoBERTa (parallel-2D w/o MI)          0.0511   0.0488   0.0176
k-RoBERTa (parallel-CNN w/ MI [35])     0.0500   0.0447   0.0185
k-RoBERTa (parallel-2D w/ MI)           0.0490   0.0420   0.0185

Table 5. Ablation Analysis for FiQA Task 1 Sentiment Analysis. w/ MI means Mutual Information is adopted to select lexicons. w/o MI means all lexicons are used without any selection. The parallel-2D is our proposed model and parallel-CNN means the CNN proposed by [35] is adopted.

5.2 Visualization of Attention

We visualize the average-pooled contextual embedding $G$ generated by k-RoBERTa (parallel-2D) in Fig. 9. A darker color means more attention is placed on the word. For example, the words "cut" and "divest", which generally signify negative sentiment in finance, receive more attention. Likewise, words such as "approve" and "lead", which typically carry positive sentiment in finance, also receive more attention in our examples. We also visualize the attention scores $a(w_i, w_j)$ produced by k-RoBERTa (parallel-2D) in Fig. 10. Here, each row of the matrix represents a vector $\alpha_i$, and a darker green cell indicates that more attention is paid to the word in the corresponding column. As illustrated in Fig. 10, the negative sentiment patterns abandon, cut, and divest are quite significant in their respective sentences. It can be concluded from Fig. 10 that the correlation of a word pair $(w_i, w_j)$ can be understood as the degree to which $w_i$ depends on $w_j$ to indicate the sentiment of the corresponding sentence [83].

Fig. 9. Visualization of Average-Pooled Contextual Embedding $G$ Generated by k-RoBERTa (parallel-2D). (Columns: Sentence, Sentiment Score, Predicted Sentiment Score.)

Fig. 10. Visualization of Attention Scores $a(w_i, w_j)$ Generated by k-RoBERTa (parallel-2D).

5.3 Knowledge Quality Analysis

One challenge in knowledge integration is the knowledge noise issue [39]: too much knowledge integration may divert the sentence from its correct meaning. The precision and coverage of lexicons affect the effectiveness of external knowledge integration into the fine-tuning process. We observe that anterior integration is more sensitive to noisy knowledge. In contrast, for parallel and posterior integration, as the number of lexicon scores selected by mutual information increases, model performance initially increases but subsequently fluctuates or even decreases. This means that relevant knowledge is able to improve model performance, but noisy knowledge can destabilize the model. Sufficiency and redundancy of knowledge must be balanced to ensure the right coverage and precision to complement the learning process. Moreover, the closer a model is to the accuracy bound of the deep neural network, the harder it is to improve the results by including external knowledge. We find that the optimal dimension of lexicon scores ranges from 3 to 8, and that their mutual information can be used to rank and pre-select relevant knowledge (see Fig. 11).

SemEval-2017 Task 5 Headline    SemEval-2017 Task 5 Post      FiQA Task 1
VADER        0.149              SMSL               0.940      SMSL                 0.231
LM           0.145              SenticNet          0.821      HFD                  0.104
SMSL         0.139              VADER              0.485      LM                   0.084
OPL          0.131              MPQA               0.255      NRC                  0.081
NRC          0.093              OPL                0.230      VADER                0.076
MPQA         0.091              SenticNet_Joy      0.226      OPL                  0.058
GI           0.083              GI                 0.218      GI                   0.057
HFD          0.082              NRC                0.194      MPQA                 0.034
SenticNet    0.076              HFD                0.177      SenticNet_Calmness   0.029
NRC_Fear     0.060              SenticNet_Sadness  0.154      NRC_Anticipation     0.015

Fig. 11. Mutual information of lexicons for SemEval 2017 Task 5 and FiQA Task 1 datasets.

In terms of selected lexicon scores, the experiments show that both sentiment and emotion knowledge are helpful, though sentiment scores are generally more critical than emotion scores. Furthermore, the importance of finance domain-specific lexicons such as LM and SMSL is consistently higher than that of most general-purpose lexicons. In particular, SMSL, HFD, and LM sentiment scores contribute to the best model performance on FiQA Task 1, and SMSL, SenticNet, and VADER sentiment scores contribute to the best model performance on the SemEval 2017 Task 5 Post dataset. Model performance decreases when further lexicons are added. Meanwhile, the SemEval 2017 Task 5 Headline dataset has the best model performance when VADER, LM, SMSL, OPL, NRC, and MPQA sentiment scores are integrated.
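The pre-selection step can be reproduced from the mutual information values in Fig. 11 with a few lines. The sketch below simply sorts the reported values and keeps the top entries; the helper name `top_k_lexicons` is hypothetical, and the cut-off `k` is taken from the best-performing configurations reported in the text rather than derived automatically.

```python
def top_k_lexicons(mi_by_lexicon, k):
    """Return the names of the k lexicon scores with the highest mutual information."""
    ranked = sorted(mi_by_lexicon.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Mutual information values as reported in Fig. 11.
fiqa_mi = {
    "SMSL": 0.231, "HFD": 0.104, "LM": 0.084, "NRC": 0.081, "VADER": 0.076,
    "OPL": 0.058, "GI": 0.057, "MPQA": 0.034,
    "SenticNet_Calmness": 0.029, "NRC_Anticipation": 0.015,
}
headline_mi = {
    "VADER": 0.149, "LM": 0.145, "SMSL": 0.139, "OPL": 0.131, "NRC": 0.093,
    "MPQA": 0.091, "GI": 0.083, "HFD": 0.082, "SenticNet": 0.076, "NRC_Fear": 0.060,
}

fiqa_selected = top_k_lexicons(fiqa_mi, k=3)          # SMSL, HFD, LM
headline_selected = top_k_lexicons(headline_mi, k=6)  # VADER, LM, SMSL, OPL, NRC, MPQA
```

The selected sets coincide with the best-performing combinations named above, which is consistent with using the mutual information ranking as a pre-selection heuristic.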
We also observe that general-purpose lexicons can play a critical role, such as VADER in the SemEval-2017 Task 5 Headline dataset and SenticNet in the SemEval-2017 Task 5 Post dataset. As for emotion dimensions, joy, sadness, and fear tend to be more relevant for the TABFSA task.

5.4 Case Study

In most cases, incorporating external knowledge benefits the accuracy of the predicted sentiment scores. However, we also observed errors in some cases. We describe these two scenarios by comparing the sentiment scores predicted by RoBERTa and knowledge-enabled RoBERTa.

Scenario 1
    Sentence: $NKE gapping up to all time highs
    Sentiment_Ground_Truth: 0.782
    Sentiment_RoBERTa: 0.468
    Sentiment_knowledge-enabled RoBERTa: 0.603
    Lexicon_score_sum: [0.3, 0, 2.0]

In Scenario 1, knowledge-enabled RoBERTa improves the sentiment score significantly, from 0.468 to 0.603, because words such as 'up' and 'high' are consistently positive in the selected lexicons. This results in a strongly positive tone, as shown by the sum of the selected lexicon scores from SMSL, LM, and HFD. Essentially, the lexicons selected by mutual information are not only relevant at a general level but also highly correlated with this particular sentence.

In Scenario 2, however, knowledge-enabled RoBERTa is no better than the standalone RoBERTa. In this concrete example, although the snippet "invalidated by US court" is negative, the word 'invalidated' does not carry any sentiment in 2 out of the 3 selected lexicons. Meanwhile, 'patent', 'drug', and 'court' are positive words in SMSL, which pulls the overall sentiment prediction toward a more neutral score. The sentiment of the words 'drug' and 'court' in this context is an instance of the noisy knowledge mentioned earlier. Because of this noisy knowledge, the lexicons selected by mutual information are relevant at a broad level but less correlated with this specific sentence.
Scenario 2
    Sentence: AstraZeneca's patent on asthma drug invalidated by US court
    Sentiment_Ground_Truth: -0.656
    Sentiment_RoBERTa: -0.392
    Sentiment_knowledge-enabled RoBERTa: -0.252
    Lexicon_score_sum: [0.87, -1, 0.0]

6 CONCLUSION AND FUTURE WORK

This research proposes a framework that strategically combines symbolic (heterogeneous sentiment lexicons) and subsymbolic (deep language model) modules for TABFSA. Specifically, we pioneer the use of attentive CNN and LSTM to draw on multiple knowledge sources and integrate them with transformer models in parallel. Incorporating external knowledge into transformer models has achieved state-of-the-art performance on the SemEval 2017 Task 5 and FiQA Task 1 datasets. We have also discovered and demonstrated that parallel integration is more effective than anterior and posterior integration when multiple sources of lexical knowledge are incorporated. Lastly, the results show that incorporating financial and general lexicon knowledge can improve model performance more than retraining finance domain-specific language models for the TABFSA task. We plan to investigate three further issues in future work: 1) the influence of domain-specific lexicon coverage on lexicon effectiveness, 2) alternative methods for knowledge embedding, and 3) what affects the effectiveness of different transformer architectures, e.g., RoBERTa vs. XLNet.

ACKNOWLEDGMENTS

Masked for peer review.

REFERENCES

[1] Md Shad Akhtar, Abhishek Kumar, Deepanway Ghosal, Asif Ekbal, and Pushpak Bhattacharyya. 2017. A multilayer perceptron based ensemble technique for fine-grained financial sentiment analysis. In EMNLP. 540–546.
[2] Dogu Araci. 2019. Finbert: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063 (2019).
[3] Mattia Atzeni, Amna Dridi, and Diego Reforgiato Recupero. 2017. Fine-grained sentiment analysis on financial microblogs and news headlines. In Semantic Web Challenges - 4th SemWebEval Challenge at ESWC 2017.
124–128.
[4] Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10).
[5] Lingxian Bao, Patrik Lambert, and Toni Badia. 2019. Attention and lexicon regularized LSTM for aspect-based sentiment analysis. In Proceedings of ACL: Student Research Workshop. 253–259.
[6] Roberto Battiti. 1994. Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks 5, 4 (1994), 537–550.
[7] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5 (2017), 135–146.
[8] Erik Cambria, Yang Li, Frank Xing, Soujanya Poria, and Kenneth Kwok. 2020. SenticNet 6: Ensemble application of symbolic and subsymbolic AI for sentiment analysis. In The 29th ACM International Conference on Information and Knowledge Management (CIKM). 105–114.
[9] Erik Cambria, Qian Liu, Sergio Decherchi, Frank Xing, and Kenneth Kwok. 2022. SenticNet 7: a commonsense-based neurosymbolic AI framework for explainable sentiment analysis. In Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022). 3829–3839.
[10] Sanyuan Chen, Yutai Hou, Yiming Cui, Wanxiang Che, Ting Liu, and Xiangzhan Yu. 2020. Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting. In EMNLP. 7870–7881.
[11] Keith Cortis, Andre Freitas, Tobias Daudert, Manuela Huerlimann, Manel Zarrouk, Siegfried Handschuh, and Brian Davis. 2017. Semeval-2017 task 5: Fine-grained sentiment analysis on financial microblogs and news. In International Workshop on Semantic Evaluation.
[12] Dayan de França Costa and Nadia Felix Felipe da Silva. 2018. INF-UFG at FiQA 2018 Task 1: predicting sentiments and aspects on financial tweets and news headlines. In Companion Proceedings of the The Web Conference 2018. 1967–1971.
[13] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT. 4171–4186.
[14] Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, and Noah Smith. 2020. Fine-tuning pretrained language models: Weight initializations, data orders, and early stopping. arXiv preprint arXiv:2002.06305 (2020).
[15] Zhendong Dong and Qiang Dong. 2003. HowNet-a hybrid language and knowledge resource. In International Conference on Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003. IEEE, 820–824.
[16] Cuc Duong, Qian Liu, Rui Mao, and Erik Cambria. 2022. Saving Earth One Tweet at a Time through the Lens of Artificial Intelligence. In 2022 International Joint Conference on Neural Networks (IJCNN). Padua, Italy, 1–9. https://doi.org/10.1109/IJCNN55064.2022.9892271
[17] Minhaz Bin Farukee, MS Zaman Shabit, Md Rakibul Haque, and AHM Sarowar Sattar. 2020. DDoS Attack Detection in IoT Networks Using Deep Learning Models Combined with Random Forest as Feature Selector. In International Conference on Advances in Cyber Security. 118–134.
[18] Benoît Frénay, Gauthier Doquire, and Michel Verleysen. 2013. Is mutual information adequate for feature selection in regression? Neural Networks 48 (2013), 1–7.
[19] Mengshi Ge, Rui Mao, and Erik Cambria. 2022. Explainable Metaphor Identification Inspired by Conceptual Metaphor Theory. In Proceedings of AAAI. 10681–10689.
[20] Deepanway Ghosal, Shobhit Bhatnagar, Md Shad Akhtar, Asif Ekbal, and Pushpak Bhattacharyya. 2017. IITP at SemEval-2017 task 5: an ensemble of deep learning and feature based models for financial sentiment analysis.
International In Workshop on Semantic Evaluation . 899ś903. [21] Fréderic Godin, Baptist Vandersmissen, Wesley De Neve, and Rik Van de Walle. 2015. Multimedia lab@ acl wnut ner shared task: Named entity recognition for twitter microposts using distributed word representations. Proceedings In of the workshop on noisy user-generated text. 146ś153. [22] Petr Hájek and Roberto Henriques. 2017. Mining corporate annual reports for intelligent detection of inancial statement fraud - A comparative study of machine learning metho Kno ds.wledge Based Systems 128 (2017), 139ś152. [23] Sooji Han, Rui Mao, and Erik Cambria. 2022. Hierarchical Attention Network for Explainable Depression Detection on Twitter Aided by Metaphor Concept Mappings. InProceedings of the 29th International Conference on Computational Linguistics . 94ś104. [24] Kai He, Rui Mao, Tieliang Gong, Chen Li, and Erik Cambria. 2023. Meta-based Self-training and Re-weighting for Aspect-based Sentiment Analysis.IEEE Transactions on Afective Computing (2023). https://doi.org/10.1109/TAFFC.2022.3202831 [25] Elaine Henry. 2008. Are investors inluenced by how earnings press releases are written? The Journal of Business Communication (1973) 45, 4 (2008), 363ś407. [26] Shuk Ying Ho, Ka Wai (Stanley) Choi, and Fan (Finn) Yang. 2019. Harnessing Aspect-Based Sentiment Analysis: How are Tweets Associated with Forecast Accuracy? Journal of the Association for Information Systems 20, 8 (2019), 1174ś1209. [27] Jeremy Howard and Sebastian Ruder. 2018. Universal Language Model Fine-tuning for Text Classiication. ACL. 328ś339. In [28] Minqing Hu and Bing Liu. 2004. Mining and summarizing customer rPr evie oceews. dings In of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining . 168ś177. [29] Jiaxin Huang, Yu Meng, Fang Guo, Heng Ji, and Jiawei Han. 2020. Weakly-supervised aspect-based sentiment analysis via joint aspect-sentiment topic embedding.EMNLP In . 6989ś6999. 
[30] Clayton J Hutto and Eric Gilbert. 2014. Vader: A parsimonious rule-based model for sentiment analysis of socialICWSM media. text. In [31] Hitkul Jangid, Shivangi Singhal, Rajiv Ratn Shah, and Roger Zimmermann. 2018. Aspect-based inancial sentiment analysis using deep learning. In Companion Proceedings of the The Web Conference 2018 . 1961ś1966. [32] Fuwei Jiang, Joshua Lee, Xiumin Martin, and Guofu Zhou. 2019. Manager sentiment and stock Journal returns.of Financial Economics 132 (2019), 126ś149. Issue 1. [33] Mengxiao Jiang, Man Lan, and Yuanbin Wu. 2017. Ecnu at semeval-2017 task 5: An ensemble of regression algorithms with efective features for ine-grained sentiment analysis in inancial domain. International In Workshop on Semantic Evaluation . 888ś893. [34] Sudipta Kar, Suraj Maharjan, and Thamar Solorio. 2017. RiTUAL-UH at SemEval-2017 Task 5: Sentiment Analysis on Financial Data Using Neural Networks. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) . Association for Computational Linguistics, Vancouver, Canada, 877ś882. https://doi.org/10.18653/v1/S17-2150 ACM Trans. Manag. Inform. Syst. 22 • Du, Xing, and Cambria. [35] Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classiication. Proceedings In of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) . 1746ś1751. [36] Alexander Kraskov, Harald Stögbauer, and Peter Grassberger. 2004. Estimating mutual information. Physical Review E69, 6 (2004), [37] Bin Liang, Hang Su, Lin Gui, Erik Cambria, and Ruifeng Xu. 2022. Aspect-based sentiment analysis via afective knowledge enhanced graph convolutional networks. Knowledge-Based Systems 235 (2022), 107643. [38] Bing Liu. 2015. Sentiment Analysis - Mining Opinions, Sentiments, and Emotions . Cambridge University Press. [39] Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng, and Ping Wang. 2020. K-bert: Enabling language representation with knowledge graph. 
In Proceedings of the AAAI Conference on Artiicial Intelligence , Vol. 34. 2901ś2908. [40] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining ApprarXiv oach. preprint arXiv:1907.11692 (2019). [41] Zhuang Liu, Degen Huang, Kaiyu Huang, Zhuang Li, and Jun Zhao. 2020. FinBERT: A Pre-trained Financial Language Representation Model for Financial Text Mining. International In Joint Conference on Artiicial Intelligence (IJCAI) . 4513ś4519. [42] Steven Loria et al. 2018. textblob Documentation. Release 0.15 2, 8 (2018). [43] Tim Loughran and Bill McDonald. 2011. When is a liability not a liability? Textual analysis, dictionaries, The Journal of and 10-Ks. inance 66, 1 (2011), 35ś65. [44] Ling Luo, Xiang Ao, Feiyang Pan, Jin Wang, Tong Zhao, Ningzi Yu, and Qing He. 2018. Beyond Polarity: Interpretable Financial Sentiment Analysis with Hierarchical Query-driven Attention.. IJCAI. 4244ś4250. In [45] Yu Ma, Rui Mao, Qika Lin, Peng Wu, and Erik Cambria. 2023. Multi-source Aggregated Classiication for Stock Price Movement Prediction. Information Fusion 91 (2023), 515ś528. [46] Yukun Ma, Haiyun Peng, and Erik Cambria. 2018. Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM. AAAI In . 5876ś5883. [47] Macedo Maia, Siegfried Handschuh, André Freitas, Brian Davis, Ross McDermott, Manel Zarrouk, and Alexandra Balahur. 2018. WWW’18 open challenge: inancial opinion mining and question answ Companion ering. In Proceedings of the The Web Conference 2018 . 1941ś1942. [48] Pekka Malo, Ankur Sinha, Pekka Korhonen, Jyrki Wallenius, and Pyry Takala. 2014. Good debt or bad debt: Detecting semantic orientations in economic teJournal xts. of the Association for Information Science and Technology 65, 4 (2014), 782ś796. [49] Youness Mansar, Lorenzo Gatti, Sira Ferradans, Marco Guerini, and Jacopo Staiano. 2017. 
Fortia-FBK at SemEval-2017 Task 5: Bullish or Bearish? Inferring Sentiment towards Brands from Financial News Headlines. Proceedings In of the 11th International Workshop on Semantic Evaluation (SemEval-2017). Association for Computational Linguistics, Vancouver, Canada, 817ś822. https://doi.org/10.18653/v1/S17-2138 [50] Rui Mao and Xiao Li. 2021. Bridging towers of multi-task learning with a gating mechanism for aspect-based sentiment analysis and sequential metaphor identiication. ProceInedings of the AAAI Conference on Artiicial Intelligence , Vol. 35. 13534ś13542. [51] Rui Mao, Xiao Li, Mengshi Ge, and Erik Cambria. 2022. MetaPro: A computational metaphor processing model for text pre-processing. Information Fusion 86-87 (2022), 30ś43. https://doi.org/10.1016/j.infus.2022.06.002 [52] Rui Mao, Chenghua Lin, and Frank Guerin. 2018. Word Embedding and WordNet Based Metaphor Identiication and Interpretation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics , Vol. 1. 1222ś1231. [53] Rui Mao, Qian Liu, Kai He, Wei Li, and Erik Cambria. 2023. The Biases of Pre-Trained Language Models: An Empirical Study on Prompt-Based Sentiment Analysis and Emotion Detection. IEEE Transactions on Afective Computing (2023). https://doi.org/10.1109/ TAFFC.2022.3204972 [54] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jef Dean. 2013. Distributed representations of words and phrases and their compositionality Advances . In in neural information processing systems . 3111ś3119. [55] Saif M. Mohammad and Peter D. Turney. 2013. Crowdsourcing a Word-Emotion Association LeComputational xicon. Intelligence29, 3 (2013), 436ś465. [56] Rodrigo Moraes, João Francisco Valiati, and Wilson P GaviãO Neto. 2013. Document-level sentiment classiication: An empirical comparison between SVM and ANN.Expert Systems with Applications 40, 2 (2013), 621ś633. [57] Nuno Oliveira, Paulo Cortez, and Nelson Areal. 2016. 
Stock market sentiment lexicon acquisition using microblogging data and statistical measures. Decision Support Systems85 (2016), 62ś73. [58] Jefrey Pennington, Richard Socher, and Christopher D Manning. 2014. Glove: Global vectors for word representation. Proceedings In of the 2014 conference on empirical methods in natural language processing (EMNLP) . 1532ś1543. [59] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep Contextualized Word Representations.NAA In CL-HLT. 2227ś2237. [60] Guangyuan Piao and John G Breslin. 2018. Financial aspect and sentiment predictions with deep neural networks: an ensemble approach. In Companion Proceedings of the The Web Conference 2018 . 1973ś1977. [61] Marzieh Saeidi, Guillaume Bouchard, Maria Liakata, and Sebastian Riedel. 2016. SentiHood: Targeted Aspect Based Sentiment Analysis Dataset for Urban Neighbourhoods. In COLING. 1546ś1556. ACM Trans. Manag. Inform. Syst. Incorporating Multiple Knowledge Sources for Targeted Aspect-based Financial Sentiment Analysis • 23 [62] Kim Schouten and Flavius Frasincar. 2015. Survey on aspect-level sentiment analysis. IEEE Transactions on Knowledge and Data Engineering (TKDE)28, 3 (2015), 813ś830. [63] Bonggun Shin, Timothy Lee, and Jinho D. Choi. 2017. Lexicon Integrated CNN Models with Attention for Sentiment Analysis. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis . Association for Computational Linguistics, Copenhagen, Denmark. [64] Sahar Sohangir, Dingding Wang, Anna Pomeranets, and Taghi M Khoshgoftaar. 2018. Big Data: Deep Learning for inancial sentiment analysis.Journal of Big Data5, 1 (2018), 1ś25. [65] Jacopo Staiano and Marco Guerini. 2014. Depechemood: a lexicon for emotion analysis from crowd-annotate arXiv d news. preprint arXiv:1405.1605(2014). [66] Philip J. Stone, Dexter C. Dunphy, Marshall S. Smith, and Daniel M. Ogilvie The General . 
1966. Inquirer: A Computer Approach to Content Analysis. MIT Press, Cambridge, MA. [67] Chi Sun, Luyao Huang, and Xipeng Qiu. 2019. Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence. In Proceedings of NAACL-HLT . 380ś385. [68] Marjan van de Kauter, Diane Breesch, and Véronique Hoste. 2015. Fine-grained analysis of explicit and implicit sentiment in inancial news articles.Expert Systems with applications 42, 11 (2015), 4999ś5010. [69] Theresa Wilson, Janyce Wiebe, and Paul Hofmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In HLT/EMNLP. 347ś354. [70] Zhengxuan Wu and Desmond C. Ong. 2021. Context-Guided BERT for Targeted Aspect-Based Sentiment Analysis. AAAI In . 14094ś14102. [71] Frank Xing, Erik Cambria, and Roy Welsch. 2018. Natural language based inancial forecasting:Aartiicial survey. Intelligence Review 50, 1 (2018), 49ś73. [72] Frank Xing, Erik Cambria, and Yue Zhang. 2019. Sentiment-aware volatility forKno ecasting. wledge Based Systems 176 (2019), 68ś76. [73] Frank Xing, Lorenzo Malandri, Yue Zhang, and Erik Cambria. 2020. Financial Sentiment Analysis: An Investigation into Common Mistakes and Silver Bullets. COLING In . 978ś987. [74] Frank Z Xing, Erik Cambria, and Roy E Welsch. 2018. Intelligent asset allocation via market sentiment ieee ComputatioNal views. iNtelligeNCe magaziNe 13, 4 (2018), 25ś34. [75] Frank Z Xing, Filippo Pallucchini, and Erik Cambria. 2019. Cognitive-inspired domain adaptation ofInformation sentiment lexicons. Processing & Management56, 3 (2019), 554ś564. [76] Bo Xu, Yong Xu, Jiaqing Liang, Chenhao Xie, Bin Liang, Wanyun Cui, and Yanghua Xiao. 2017. CN-DBpedia: A never-ending Chinese knowledge extraction system. In International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems . Springer, 428ś438. [77] Hu Xu, Bing Liu, Lei Shu, and Philip S. Yu. 2019. 
BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis. In Proceedings of NAACL-HLT . 2324ś2335. [78] Steve Yang, Jason Rosenfeld, and Jacques Makutonin. 2018. Financial aspect-based sentiment analysis using deep representations. arXiv preprint arXiv:1808.07931(2018). [79] Yi Yang, Mark Christopher Siy Uy, and Allen Huang. 2020. Finbert: A pretrained language model for inancial communications. arXiv preprint arXiv:2006.08097(2020). [80] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. NeurIPS In . 5754ś5764. [81] Pu Zhang and Zhongshi He. 2015. Using data-driven feature enrichment of text representation and ensemble technique for sentence-level polarity classiication. Journal of Information Science 41, 4 (2015), 531ś549. [82] Anping Zhao and Yu Yu. 2021. Knowledge-enabled BERT for aspect-based sentiment analysis. Knowledge-Based Systems (2021), 107220. [83] Zhiwei Zhao and Youzheng Wu. 2016. Attention-Based Convolutional Neural Networks for Sentence Classiication.. INTERSPEECH In . 705ś709. ACM Trans. Manag. Inform. Syst.
ACM Transactions on Management Information Systems (TMIS) – Association for Computing Machinery
Published: Jun 23, 2023
Keywords: Financial sentiment analysis