Read the News, Not the Books: Forecasting Firms' Long-term Financial Performance via Deep Text Mining

SHUANG (SOPHIE) ZHAI, University of Oklahoma, USA
ZHU (DREW) ZHANG, Iowa State University, USA

In this paper, we show that textual data from firm-related events in news articles can effectively predict various firm financial ratios, with or without historical financial ratios. We exploit state-of-the-art neural architectures, including pseudo-event embeddings, Long Short-Term Memory networks, and attention mechanisms. Our news-powered deep learning models are shown to outperform standard econometric models operating on precise historical accounting data. We also observe forecasting quality improvements when integrating the textual and numerical data streams. In addition, we provide in-depth case studies for model explainability and transparency. Our forecasting models, model attention maps, and firm embeddings benefit various stakeholders with quality predictions and explainable insights. Our proposed models can be applied both when numerical historical data is available and when it is not.

CCS Concepts: • Computing methodologies → Natural language processing; Neural networks; • Applied computing → Economics.

1 INTRODUCTION

News plays an important role in understanding and predicting firm performance (Clarke et al. 2020). As a communication channel, news articles report firm events and provide decision insights. Although this can be done manually, with low efficiency, can news articles predict firm financial prospects automatically and at scale? Furthermore, how can users better understand the automated process? This paper investigates these questions and proposes deep learning models that forecast firm financial ratios using unstructured (text) and structured (accounting) data. We also open up the proposed deep learning models to provide easy-to-understand model insights for end users.

The Financial Technology (Fintech) industry is on the rise (Deloitte 2017), and the global Fintech market is expected to grow progressively and reach around $305 billion by 2025 (https://www.prnewswire.com/news-releases/fintech-industry-report-2020-2025---trends-developments-and-growth-deviations-arising-from-the-covid-19-pandemic-301080282.html). The significant market share and novel technology needs demand additional research in this area. First, the sheer volume of information is overwhelming for stakeholders to digest. Although the information overload problem is commonly acknowledged in financial and broader settings (Hemp 2009), few techniques are readily available to help stakeholders with less sophisticated technical skills overcome this challenge. Therefore, tools and models that address this need automatically are in demand. Second, not every stakeholder is proficient in analyzing traditional numerical data, such as accounting data. Therefore, tools and models that link textual and numerical information are greatly preferred. Third, although many stakeholders rely heavily on financial reports, which disclose both textual narratives and accounting figures, to seek business insights, significant prediction disadvantages exist in financial reports.

Authors' addresses: Shuang (Sophie) Zhai, sophie.zhai@ou.edu, University of Oklahoma, USA; Zhu (Drew) Zhang, zhuzhang@iastate.edu, Iowa State University, USA.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2022 Association for Computing Machinery. 2158-656X/2022/5-ART $15.00. https://doi.org/10.1145/3533018

First, a significant portion of corporate financial reports is released after the actual events. While some reports, such as the 10-K, include forward-looking statements, these are not enough for an informed understanding of company dynamics. Therefore, most of the time, the specific information provided to investors arrives after the actual events. Second, private companies are not legally required to disclose their financial data. Investors in those companies must seek other information sources, such as insider knowledge or news, to gain firm insights.

In contrast, investors can benefit from news articles in many respects. News articles report corporate events in a timely fashion (Lippmann 1946), are published online, and are mostly free. Therefore, Main Street investors have easy access to the same information as advanced investors do. Readers can also obtain alternative views to seek expert opinions and community reactions. If investors systematically follow a company's news, trends can emerge and lead to more educated decisions. More importantly, when accounting data is not available, news articles are naturally considered indispensable sources of up-to-date information for stakeholders who are constantly looking for financial opportunities. These benefits are essential for investors and stakeholders.

Public companies' financial performance is primarily anchored on stock returns, and private firms' financial details are hard to obtain. Evaluating a firm's financial performance is therefore either limited in scope or not possible at all. However, it is vital to assess a firm's financial status through a broad spectrum of measures. Financial ratios are practically useful for appraising a firm's financial performance in various aspects, such as profitability, efficiency, solvency, and others. Financial ratios are reliable measures of firm fundamentals. They can provide a tangible basis for market participants' decision-making, especially for long-term-minded investors with low trading frequency. Other active stakeholders, such as credit rating agencies and insurance companies, are eyeing firms' financial images more broadly than stock returns. Outside of the stock market, private companies participate in the economy and undergo financial scrutiny. Market participants need methods to assess private firm performance as well.

We observe very limited data-driven research that studies multiple corporate financial ratios in the existing literature, and several research gaps are identified. Unlike financial market studies, the corporate finance literature is mostly dominated by theory-testing research and lacks studies related to the prediction of firm financial ratios.
Given inancial ratios’ indispensable value for the wide range of stakeholders, their forecasting capabilities are highly desirable. Almost all inancial-ratio-related studies use numerical variables as input. For reasons we articulated above, the power of textual data awaits to be exploited. News data has inherently very high dimensionality. It becomes the Achilles’ heel for classical econometric models. As a result, the model that works primarily on text representations is limited, and the integration model with textual and numerical data is yet to emerge in the leading economics literature. On the contrary, deep learning research represents state-of-the-art advancement in machine learning research. It learns implicit representation in the text by forgoing explicit feature engineering and tackles high-dimensionality issues by operating on ixed-size dense vectors nicely. Given the following motivations, we focus on the long-term corporate fundamentals rather than frequent short- term trading activities (Harfor.d2018). et al A irm’s long-term performance relects corporate governance and keeps managerial misbehavior under control (Harfor.d2018). et al For example, long-term irm health encourages the company innovation and improves its credit risk proile (Driss . 2021).etThe al increase of long-term debt is viewed as over-leverage and perceived as harming shareholders’ wealth (D’Mello . 2018). et al In addition, the persistence of long-term debt ratios can indicate a irm’s investment constraints, and market conditions (Bontempi et al. 2020). Financial ratios are representative measures for corporate fundamentals. Analyzing inancial ratios is a decisive step to understand a irm’s inancial health and help manage risks. Meanwhile, inancial ratios are indicators for critical corporate events, such as bankruptcy (Hosaka 2019), business crisis (Lin . 2011), et and al inancial failure (Edmister 1972, Zeytinoglu and Akarim 2013). Hence, forecasting irm inancial ratios bring competitive advantages for companies, investment agencies, and individual investors. ACM Trans. Manag. Inform. Syst. Read the News: Not the Books • 3 The data sources of this study are chosen with purpose. Public news is a powerful source of inancial intelligence, and inancial ratios are concrete measures of irm’s inancial strengths and vulnerabilities. Both data streams have their unique value. Therefore, when inancial ratios are not available, the predictions made by news articles are valuable. When both data streams are available, the predictions made by integrating the two data streams are also helpful. As a result, studying how the news-based model performs is charming, and the integration between news and irm ratios is intriguing. Meanwhile, long-term irm inancial performance prediction becomes even more practically useful because most investors are not frequent traders. Therefore, we automate the news-powered and full-ledged irm long-term inancial performance forecasting task by exploiting state-of-the-art natural language processing and machine learning capabilities. Speciically, we propose deep learning models to help stakeholders acquire long-term irm intelligence, rather than market prediction, and expect to contribute on both methodological and application fronts in: • Proposing novel, low-barrier, and long-horizon (e.g., a year) models for all to forecast irms’ full-ledged inancial performance. 
• Providing low-latency and forward-looking intelligence for almost-real-time investment decisions, rooted in the timeliness of news data.
• Enhancing model transparency, explainability, and accountability in financial decision-making through the provided model insights.

2 RELATED WORK

Our study is relevant to three streams of literature. The first stream is finance literature involving textual data, which includes stock market studies, firm financial ratio studies, and corporate finance studies. The second stream is time series models, which represent the traditional treatment of sequential numerical data. The third stream is deep-learning- and natural-language-processing-related modeling techniques, which constitute the methodological foundation of our research models.

2.1 Financial Studies Using Text Input

2.1.1 Stock Market Variable as Dependent Variable. A public firm's stock market performance is often the first indicator of its overall financial health. Therefore, numerous studies have been dedicated to understanding or predicting stock markets. Due to the vast body of literature in this area, we only review those directly relevant to our research, i.e., studies based on publicly available non-numerical information as input. A representative set of theory-testing studies in this area is summarized in Table 9 (see Appendix A). Fueled by recent advancements in Artificial Intelligence (AI), researchers have also extensively studied how to predict stock market behavior using machine learning models. We summarize a representative set of such studies in Table 10 (see Appendix A).

2.1.2 Firm Financial Ratio as Dependent Variable. Besides stock returns, there exist other quantitative measures of a company's financial health, many of which form various financial ratios. A comprehensive taxonomy of firm financial ratios is defined by Wharton Research Data Services (WRDS; https://wrds-www.wharton.upenn.edu/documents/793/WRDS_Industry_Financial_Ratio_Manual.pdf), and it constitutes the focal phenomenon of our study. A set of representative theory-testing studies that involve financial ratios is summarized in Table 11 (see Appendix B). We italicize the dependent or independent variable when it is either a financial ratio or some variation thereof (e.g., the numerator or denominator).

2.1.3 Corporate Finance Studies. Textual data have gained popularity in economics research (Gentzkow et al. 2019, Loughran and McDonald 2016). Several recent finance studies exploited textual data, accompanied by machine learning techniques, to investigate firm activities such as firm organization (Hoberg and Phillips 2018), corporate culture (Li et al. 2018), corporate innovation (Bellstam et al. 2017), competitor identification (Pant and Sheng 2015), and mergers and acquisitions (Routledge et al. 2016). Notably, economics-rooted studies are still mostly driven by theory testing, viewing AI-based tools merely as modeling aids instead of centerpieces. We believe the community should diversify its methodology portfolio by embracing forecasting frameworks based on machine learning backbones.

2.2 Time Series Models

2.2.1 AutoRegressive Integrated Moving Average Model (ARIMA). Time series models study temporal dynamics in sequence data. The ARIMA model (Pankratz 1983) is the de facto standard for univariate time series analysis in econometrics. Therefore, it is a natural baseline model for our study. In an ARIMA(p, d, q) model, p denotes the order of the autoregression, d denotes the order of differencing, and q denotes the order of the moving average. We present more model details in Appendix C.
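To make this baseline concrete, the following minimal sketch fits an ARIMA(p, d, q) model with statsmodels; the series values are illustrative stand-ins, not data from our corpus:

```python
# Minimal ARIMA illustration with statsmodels; `ratio_series` is a
# hypothetical univariate history of one firm's financial ratio.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

ratio_series = pd.Series(
    [1.8, 2.1, 2.0, 2.4, 2.3, 2.6, 2.5, 2.9, 3.0, 2.8, 3.2, 3.1]
)

# order=(p, d, q): autoregression, differencing, and moving-average
# orders, as defined above.
fitted = ARIMA(ratio_series, order=(1, 1, 2)).fit()
print(fitted.forecast(steps=4))  # forecast the next four periods
```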
2.2.2 Vector Autoregression Model (VAR). The VAR model is a multivariate time series model that describes the inter-dependency between the participating time series when they influence each other. In a VAR(p) model, p denotes the order of the autoregression, and k denotes the number of variables. We present more model details in Appendix D.

2.3 Deep Learning Techniques

2.3.1 Deep Learning Models. Deep learning models (LeCun et al. 2015) are uniquely positioned to consume high-dimensional text data (Abbasi et al. 2019, Abrahams et al. 2015, Chou et al. 2010, Gruss et al. 2018, Li and Qin 2017, Zhou et al. 2018) and capture non-linear temporal dynamics. They are a family of machine learning techniques composed of multiple linear and non-linear processing layers that learn representations of data at various levels of abstraction. They discover intricate structure in large data sets by using the backpropagation algorithm (Rumelhart et al. 1986) to learn model parameters that compute the representation in each layer from the representation in the previous layer. Deep learning models, including Transformer-based models (Devlin et al. 2018, Vaswani et al. 2017), have dramatically improved the state of the art in many domains, such as language modeling, speech recognition, question answering (Liu et al. 2020), visual object detection, and genomics.

2.3.2 Sentence Embedding. A centerpiece of natural language processing is the semantic representation of words and sentences. Deep learning methods model them as word and sentence embeddings. An embedding is a mapping from a discrete object, such as a word or sentence, to a dense vector of real numbers. GloVe (Pennington et al. 2014) and Word2Vec (Mikolov et al. 2013) are widely used word embedding techniques. While one can trivially combine word embeddings to produce a sentence embedding, researchers have studied techniques to generate more robust sentence representations. A building block of our models, Smooth Inverse Frequency (SIF) (Arora et al. 2017), is an unsupervised, PCA-based technique that produces sentence embeddings for downstream NLP applications. An illustration of the conceptual process can be found in Figure 1.

Fig. 1. Deep Learning for Natural Language Processing: An Illustration

2.3.3 Long Short-Term Memory Networks (LSTMs). Long Short-Term Memory (LSTM) networks, first proposed by Hochreiter and Schmidhuber (1997), are a representative family of deep learning architectures for sequence data. They model non-linear temporal dynamics and are capable of digesting vector input and emitting vector output at every time step. Numerous studies have been conducted using LSTMs, in areas such as information extraction (Miwa and Bansal 2016), syntactic parsing (Dyer et al. 2015), speech recognition (Graves et al. 2013), machine translation (Bahdanau et al. 2015), and question answering (Wang and Nyberg 2015). We present more model details in Appendix E. A natural extension of the LSTM network is the bidirectional LSTM (BiLSTM) architecture, which passes the sequence through the network in both the forward and backward directions and aggregates the forward and backward hidden states, →h_t and ←h_t, at every time step as h_t = σ(→h_t, ←h_t).
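The sketch below illustrates this aggregation in Keras (the framework and layer sizes are assumptions for the example, not prescribed by our models): the Bidirectional wrapper runs the LSTM in both directions and concatenates the two hidden states at each time step.

```python
# Minimal BiLSTM sketch; sizes are illustrative.
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(12, 300))    # 12 windows, 300-dim vectors
hidden = layers.Bidirectional(
    layers.LSTM(64, return_sequences=True), merge_mode="concat"
)(inputs)                                   # shape: (batch, 12, 128)
model = tf.keras.Model(inputs, hidden)
model.summary()
```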
3 PROBLEM FORMULATION

In this study, we use structured numerical data (represented by financial ratios) and/or unstructured text data (represented by events from news articles) to forecast firms' long-term financial health, as embodied in the financial ratios (an instance of the "downstream models" in Figure 1). The problem formulation illustrated in Figure 2 generalizes to a broad array of economic forecasting problems with similar configurations and high-dimensional input.

Fig. 2. Problem Formulation

Specifically, we define a firm-specific pseudo-event as a sentence that mentions a firm in the news. There are two types of pseudo-events in our study. Sample sentences of this kind appear in Tables 6 and 7, each mentioning the focal company under consideration.

Meanwhile, financial ratios are indicators of an organization's financial and operating strengths and vulnerabilities. A firm's financial performance can be evaluated along various dimensions. Valuation-focused financial ratios assess the organization's value and its investment potential. Efficiency-focused ratios evaluate how the organization uses its assets to generate sales. Solvency-focused and soundness-focused ratios inspect whether the firm can meet its debt obligations. Profitability-focused ratios measure a firm's ability to make profits against its costs. Capitalization-focused ratios measure how well a firm converts its capital into firm value. Liquidity-focused ratios indicate how well a company can handle its short-term debt obligations. Based on the firm financial ratio categorization in the Wharton Research Data Services (WRDS), we assess a firm's financial performance along the valuation, efficiency, solvency, soundness, profitability, capitalization, and liquidity categories. Each category contains a list of ratios. It is impractical (and unnecessary) to work with all 70+ financial ratios defined in the WRDS document. Therefore, given the literature (see Appendix B) and the preferences of real-world analysis (https://online.hbs.edu/blog/post/how-to-determine-the-financial-health-of-a-company), we choose one representative measure from each category. The aims are to preserve generality and practical relevance. We summarize the financial ratios (used as both output and input in our framework) in Table 1.

Table 1. Financial Ratio Definitions

Category       | Output Variable | Name                                  | Definition
Valuation      | pe_exi          | Price to Earnings (diluted, excl. EI) | price / earnings excl. EI
Efficiency     | sale_nwc        | Sales to Working Capital              | sales / working capital
Solvency       | de_ratio        | Debt Ratio                            | total liabilities / shareholders' equity
Soundness      | cfm             | Cash Flow Margin                      | income before EI and depreciation / sales
Profitability  | npm             | Net Profit Margin                     | net income / sales
Capitalization | equity_invcap   | Common Equity to Invested Capital     | common equity / invested capital
Liquidity      | cash_ratio      | Cash Ratio                            | cash and short-term investments / current liabilities

The task intuition is that the semantic signals encoded in firm pseudo-events have predictive power for the firm's future financial status, measured in the form of the financial ratios in Table 1. Formally,

    Y_(i)^(t+H) = f( E_(i)j, R_(i)j ; j = t−M, ..., t )    (1)

    E_(i)j = g( S_(i)jk ; k = 0, ..., |S_(i)j| )    (2)

where
• Y denotes all target ratios, y_p denotes an individual ratio, Y = {y_1, ..., y_|Y|}, and p indexes target ratios, p ∈ {1, ..., |Y|}.
• H denotes the prediction horizon, and M denotes the memory horizon, both measured in numbers of time windows.
• R denotes all input ratios, r denotes an individual ratio, and R = [r_1; ...; r_|R|], where [;] denotes concatenation.
• i indexes companies, k indexes pseudo-events, and j indexes time windows.
• E_(i)j is the aggregate event embedding for company i in time window j; S_(i)jk is the encoding of the k-th sentence that mentions company i in time window j.
• f is a learned function that maps all pseudo-event embeddings and ratio(s) to the target value(s).
• g is a function that aggregates multiple sentence embeddings into one pseudo-event embedding.
In principle, f and g can be parameterized as any function approximator.
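The following shape-level sketch (with illustrative dimensions only) shows how Equations 1 and 2 translate into tensors: g collapses one window's sentence embeddings into a single pseudo-event embedding, and f consumes the resulting window sequence.

```python
# Shape-level sketch of Equations 1 and 2; all sizes are illustrative.
import numpy as np

M, H = 12, 12            # memory and prediction horizons (in months)
EMB, N_RATIOS = 300, 7   # embedding size and number of input ratios

def g(sentence_embs: np.ndarray) -> np.ndarray:
    """Aggregate (k, EMB) sentence embeddings into one (EMB,) vector;
    averaging is one simple choice of g (Section 4 uses max-pooling
    and attention instead)."""
    return sentence_embs.mean(axis=0)

S_ij = np.random.randn(5, EMB)       # 5 pseudo-events in window j
E_ij = g(S_ij)                       # (300,) aggregate embedding
R_ij = np.random.randn(N_RATIOS)     # that window's input ratios

# f can be any function approximator over the window sequence.
windows = np.stack([np.concatenate([E_ij, R_ij])] * (M + 1))
print(windows.shape)                 # (13, 307): input to f
```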
4 FORECASTING MODELS

Two data streams serve as model inputs in our proposed models, shown in Figure 3. One is the corporate financial ratios, and the other is firm news articles. In the ratio stream, missing data is interpolated based on rules. In the news stream, company-focused sentences (pseudo-events) are extracted before pseudo-event embeddings are generated. The ratio stream is the input for three groups of models: time series models, ratio-based deep learning models, and ratio-event integration deep learning models. The news stream is the input for two groups of models: news-based deep learning models and ratio-event integration deep learning models. In addition, we analyze two types of model insights, Event Attention Maps and Firm Embeddings, to demonstrate the interpretability of the proposed deep learning models and understand their inner workings from a business perspective.

Fig. 3. Research Framework Overview

Single-task models output individual target variables, and multi-task models output multiple target variables simultaneously. We summarize these model characteristics in Table 2.

Table 2. Models Summary

Group                          | Model | Full Name                                            | Single-task | Multi-task | Single-Ratio Input | Multi-Ratio Input | Event Input | Deep Learning
Time Series Models             | ARIMA | AutoRegressive Integrated Moving Average Model       | ✓           |            | ✓                  |                   |             |
Time Series Models             | VAR   | Vector AutoRegression Model                          |             | ✓          |                    | ✓                 |             |
Ratio-based Models             | SR    | Single-task Single-Ratio Input Model                 | ✓           |            | ✓                  |                   |             | ✓
Ratio-based Models             | SMR   | Single-task Multi-Ratio Input Model                  | ✓           |            |                    | ✓                 |             | ✓
Ratio-based Models             | MR    | Multi-task Multi-Ratio Input Model                   |             | ✓          |                    | ✓                 |             | ✓
Event-based Models             | SEMax | Single-task Event Max-pooling Model                  | ✓           |            |                    |                   | ✓           | ✓
Event-based Models             | SEA   | Single-task Event Attention Model                    | ✓           |            |                    |                   | ✓           | ✓
Event-based Models             | MEAU  | Multi-task Event Attention Unweighted Model          |             | ✓          |                    |                   | ✓           | ✓
Event-based Models             | MEAW  | Multi-task Event Attention Weighted Model            |             | ✓          |                    |                   | ✓           | ✓
Ratio-Event Integration Models | SREA  | Single-task Single-Ratio Input Event Attention Model | ✓           |            | ✓                  |                   | ✓           | ✓
Ratio-Event Integration Models | SMRE  | Single-task Multi-Ratio Input Event Attention Model  | ✓           |            |                    | ✓                 | ✓           | ✓
Ratio-Event Integration Models | MREA  | Multi-task Multi-Ratio Input Event Attention Model   |             | ✓          |                    | ✓                 | ✓           | ✓

Since the task has a natural time series setup, we chose LSTM and/or BiLSTM networks, state-of-the-art deep learning models for sequential data, as the backbone of our proposed deep learning models. The proposed models learn non-linear temporal dynamics from firm events and financial ratios to predict the firm's target ratios. We introduced the related time series models, ARIMA and VAR, in Section 2.2. In the following sections, therefore, we introduce the proposed deep learning models.

4.1 SR or SMR: Single-task Single-Ratio or Multi-Ratio Input Model

In the SR model shown in Figure 4, each company i's individual ratio history r is used as the model input. The single ratio value r_(i)j is expanded to a dense vector in the v_(i)j = Dense(r_(i)j) layer. The dense vector v_(i)j becomes the input to the next LSTM layer. After the sequential processing in the LSTM layer, the dense vector u_(i)t is generated from the u_(i)t = LSTM(v_(i)j; j = t−M, ..., t) layer, and it is used as input to predict the individual ratio at the end of the forecasting horizon H in the y_(i)p^(t+H) = Dense(u_(i)t) layer. Unlike the SR model, the SMR model concatenates all ratios into R_(i)j in the first Ratio Input layer, instead of using only one ratio as the input.

Fig. 4. SR and SMR Models

4.2 MR: Multi-task Multi-Ratio Input Model

Similar to the SMR model, the MR model in Figure 5 concatenates all ratios into R_(i)j in the Ratios Input layer. Then, all input ratios are processed at once in the v_(i)j = Dense(R_(i)j) layer, and v_(i)j becomes the input for the next LSTM layer. The LSTM layer outputs u_(i)t = LSTM(v_(i)j; j = t−M, ..., t) after the sequential processing. Finally, unlike the SR model, which predicts the target variables individually, the MR model predicts all target variables simultaneously in the Y_(i)^(t+H) = Dense(u_(i)t) layer.

Fig. 5. MR Model
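A minimal sketch of the SR architecture in Keras follows (the framework and layer widths are assumptions for illustration; the ELU activation, Adam optimizer, and MAPE loss follow Sections 6.2 and 6.3):

```python
# SR sketch: per-window scalar ratio -> Dense expansion -> LSTM ->
# Dense prediction of the target ratio at horizon t + H.
import tensorflow as tf
from tensorflow.keras import layers

M = 12                                      # memory horizon (months)
r = tf.keras.Input(shape=(M, 1))            # one ratio per time window
v = layers.TimeDistributed(layers.Dense(32, activation="elu"))(r)
u = layers.LSTM(64)(v)                      # final hidden state u_(i)t
y = layers.Dense(1)(u)                      # predicted ratio at t + H
sr_model = tf.keras.Model(r, y)
sr_model.compile(optimizer="adam", loss="mape")
```

For SMR, the input shape becomes (M, |R|); for MR, the final Dense layer outputs |Y| values so that all target ratios are predicted simultaneously.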
4.3 SEMax: Single-task Event Max-pooling Model

Unlike the SR and MR models, which use ratios as model input, the SEMax model uses textual data from company pseudo-events in the news to predict the target variables. After the pre-processing steps of pseudo-event extraction and sentence embedding, pseudo-event representations in the form of embeddings become the SEMax model's input. The challenge here is that while the LSTM layer only accepts a single vector as input per time window, multiple pseudo-events typically exist per window. Therefore, before feeding into the LSTM layer, the multiple event embeddings must be aggregated into one. In the literature, such operations are often termed "pooling", and they represent the essence of the g function in Equation 2. A pooling technique combines "nearby" values of a signal (e.g., through averaging) or picks one representative value (e.g., through maximization or sub-sampling). In our case, within every time window, we max-pool over every dimension of all pseudo-event embedding vectors in the MaxPooling layer, as shown in the highlighted pseudo-event MaxPooling component in Figure 6. Formally, we define the MaxPooling component as ∀d: v_(i)j^<d> = max_k E_(i)jk^<d>, where d indexes individual dimensions in the embedding space, k indexes pseudo-events, and v_(i)j is the aggregated vector for all pseudo-events in time window j after the pooling operation.

Fig. 6. SEMax Model

In practice, we use BiLSTM layers as the model backbone. After the BiLSTM layers, the model generates a dense vector representing the semantics of the company as reflected in news articles, from the fe_(i)t = BiLSTM(v_(i)j; j = t−M, ..., t) layer. We call this dense vector fe_(i)t the Firm Embedding. It is essentially the firm's semantic representation learned from textual data in the form of pseudo-events. Finally, the SEMax model predicts the individual target ratio from the learned firm embedding in the output layer.
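In code, the MaxPooling component above reduces to a per-dimension maximum over the window's event embeddings (sizes are illustrative):

```python
# Dimension-wise max-pooling over all pseudo-event embeddings of one
# time window, per the MaxPooling definition above.
import numpy as np

E_ij = np.random.randn(5, 300)   # 5 pseudo-events, 300-dim embeddings
v_ij = E_ij.max(axis=0)          # (300,): v^<d> = max_k E^<d>, per d
```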
4.4 SEA: Single-task Event Attention Model

Notice that the max-pooling trick in the SEMax model is quite naive and produces a crude aggregate of multiple event embeddings. A potentially more powerful approach is to preserve each event's entirety and adaptively assign weights to the events. In other words, the model should pay more "attention" to the more informative pseudo-events. The intuition behind attention mechanisms in natural language processing is to assign greater attention to text units (typically words or tokens) that carry more information for the task at hand. Bahdanau et al. (2015) were the first to apply an attention mechanism in NLP, and more specifically, in machine translation. More recently, Vaswani et al. (2017) used multi-head attention alone to solve sequence prediction problems traditionally handled by other neural networks such as LSTMs and Convolutional Neural Networks.

As with the SEMax model, the SEA model takes textual information from pseudo-events to predict single target variables. Unlike the SEMax model, which aggregates pseudo-events through max pooling, the SEA model tackles the pseudo-event aggregation task with an attention mechanism, shown in the highlighted Pseudo-event Attention component in Figure 7. Formally, we define our event attention mechanism as:

    u_(i)jk = sigmoid( W_(i)j E_(i)jk + b_(i)j )    (3)

    α_(i)jk = exp(u_(i)jk) / Σ_k exp(u_(i)jk)    (4)

    v_(i)j = Σ_k α_(i)jk E_(i)jk    (5)

where u_(i)jk is a pseudo-event's raw attention score, α_(i)jk is the normalized attention weight, and v_(i)j is the weighted-sum vector representing company i at time window j.

Fig. 7. SEA Model

Similar to the SEMax model, we also use BiLSTM as the backbone of the SEA model for sequential processing. After the BiLSTM layers, the SEA model again generates the Firm Embedding, a dense vector representation of the company, in the Dense layer, and uses it to predict the target value in the output layer.
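A NumPy sketch of Equations 3 through 5 follows; W and b are random stand-ins for the learned parameters:

```python
# Event attention sketch: score each pseudo-event, softmax-normalize,
# and take the attention-weighted sum of the embeddings.
import numpy as np

def event_attention(E, W, b):
    """E: (k, d) pseudo-event embeddings; W: (d,); b: scalar."""
    u = 1.0 / (1.0 + np.exp(-(E @ W + b)))   # Eq. 3: sigmoid raw scores
    alpha = np.exp(u) / np.exp(u).sum()      # Eq. 4: normalized weights
    v = alpha @ E                            # Eq. 5: weighted-sum vector
    return v, alpha

E = np.random.randn(5, 300)
v, alpha = event_attention(E, np.random.randn(300), 0.0)
print(alpha.round(2), v.shape)               # weights sum to 1; (300,)
```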
4.5 MEAU or MEAW: Multi-task Event Attention Unweighted or Weighted Model

While we can perform learning and inference on individual financial ratios, it is conceivable that multiple ratios share common characteristics rooted in the company's business operations. Multi-Task Learning (MTL) is a family of machine learning techniques that improves generalization by leveraging the domain-specific information contained in the training signals of related tasks (Caruana 1997). Multi-task learning's key rationale is learning the representation shared among all associated tasks, and it can also be viewed as a regularization technique. The company's shared representation is learned under the guidance of all related tasks. Furthermore, the layered architecture of deep learning models makes it quite feasible to practice multi-task learning. In Figure 8, the model is learned by optimizing an aggregate loss function that is a linear combination of the individual task losses. If every target variable gets the same weight, the model optimizes an unweighted loss in the output layer (MEAU). Otherwise, different variables can get different weights, and the model then optimizes a weighted loss in the output layer (MEAW). We report the implemented weight details of the MEAU and MEAW models in Section 7.3.

Fig. 8. MEAU and MEAW Models

4.6 SREA or SMRE: Single-task Single-Ratio or Multi-Ratio Event Attention Integration Model

In Figure 9, after the individual sequential processing of the ratio stream and the pseudo-event stream, we combine the hidden states of the two data streams in the Merge (Multiply) layer as m_(i)j = Multiply(l_(i)j, h_(i)j), where l_(i)j is the hidden state from the ratio LSTM layer, and h_(i)j is the hidden state from the pseudo-event BiLSTM layer (the layer before the last BiLSTM layer). Inspired by Luong et al. (2015), we first conduct temporal self-attention on the integrated vector m_(i)j; the attention score b_(i)j is computed as z_(i)j = ELU(W_(i) m_(i)j + o_(i)) and b_(i)j = exp(z_(i)j) / Σ_j exp(z_(i)j). To give the model more historical knowledge, we then apply the attention weights b_(i)j to the ratio LSTM hidden states and calculate the integrated weighted vector j_(i)t = Σ_{j=t−M}^{t} b_(i)j l_(i)j.

Fig. 9. SREA and SMRE Models

Next, we concatenate the integrated weighted vector j_(i)t with the last hidden state of the pseudo-event BiLSTM layer to compute the context vector C_(i)t = [j_(i)t; h_(i)t], where [;] denotes concatenation. Then, the model computes the Firm Embedding fe_(i)t based on information from both the ratio stream and the pseudo-event stream. Similar to the SEA model, the Firm Embedding is the input for the prediction of the individual target variable in the dense layers, as fe_(i)t = Dense(C_(i)t) and y_(i)p^(t+H) = Dense(fe_(i)t). Similar to the SMR model, the SMRE model concatenates all ratios into R_(i)j in the Ratio Input layer.

4.7 MREA: Multi-task Ratio Event Attention Integration Model

Just like the difference between the SR and MR models, two model components differ between the SREA and MREA models. First, in the Ratios Input layer in Figure 10, all ratios are concatenated before being fed into a dense vector in the v_(i)j = Dense(R_(i)j) layer. Second, in the Multi-task Output layer, the shared company semantic representation (the Firm Embedding) is used to predict all target variables simultaneously in the Y_(i)^(t+H) = Dense(fe_(i)t) layer.

Fig. 10. MREA Model

5 DATASET AND PREPROCESSING

5.1 Dataset

Although our framework applies to both public and private firms, we only have access to public firms' financial data. Hence, we focus on Fortune 1,000 companies in our study. For the news stream, we use news articles from a major business news service published between 2011 and 2015. (Here is another practical constraint we have to work with: though the models are capable of handling much larger text corpora, this is the maximum amount of news articles we were able to collect.) The descriptive statistics of the textual data are listed in Table 3. For the ratio stream, ticker IDs and financial ratios between 2011 and 2016 are collected from Wharton Research Data Services (WRDS).

Table 3. Textual Data Statistics

Year  | Number of Articles | Number of Pseudo-events | Number of Companies
2011  | 18,615             | 92,707                  | 707
2012  | 14,840             | 77,043                  | 697
2013  | 58,211             | 289,007                 | 819
2014  | 49,711             | 183,976                 | 817
2015  | 39,705             | 135,936                 | 810
Total | 181,082            | 778,669                 | 927 (union)

5.2 Data Pre-processing

5.2.1 Event Pre-processing. We extract news content and publication time and parse the content into sentences using Python NLTK. By consulting a pre-compiled gazetteer of company names, name variants, and abbreviations, we extract all sentences that contain mentions of each focal company, followed by simple noise-reduction heuristics. Each such sentence is treated as a pseudo-event. When multiple in-list companies appear in one sentence, the corresponding pseudo-event participates in multiple companies' event sequences. After dropping firms with no news coverage, 927 companies remain in our data. Finally, we replace all company names and name variants with a special token 'FOCOMP' (for 'focal company') to help the model generalize across firms by sharing statistical strength among similar event embeddings.
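A simplified sketch of this extraction step follows; the two-entry gazetteer is a toy stand-in for the pre-compiled one, and the snippet assumes NLTK's punkt tokenizer data is installed:

```python
# Pseudo-event extraction sketch: sentence-split each article with
# NLTK, keep sentences mentioning a gazetteer company, and replace the
# mention with the special token FOCOMP.
import re
from nltk.tokenize import sent_tokenize  # requires nltk.download("punkt")

GAZETTEER = {
    "Apple": ["Apple Inc.", "Apple"],        # longest variants first
    "Expedia": ["Expedia Group", "Expedia"],
}

def extract_pseudo_events(article_text):
    events = []  # (company, normalized pseudo-event) pairs
    for sent in sent_tokenize(article_text):
        for company, variants in GAZETTEER.items():
            pattern = "|".join(re.escape(v) for v in variants)
            if re.search(pattern, sent):
                events.append((company, re.sub(pattern, "FOCOMP", sent)))
    return events

print(extract_pseudo_events("Apple jumped 5.7 percent. Markets were flat."))
# [('Apple', 'FOCOMP jumped 5.7 percent.')]
```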
5.2.2 Ratio Pre-processing. To tackle the missing-value problem in the ratio data downloaded from WRDS, we first exclude companies with more than two missing values in the study period, then fill in the small number of remaining missing values by linear interpolation between neighboring values. After eliminating companies with too many missing values in firm financial ratios, 707 companies remain in our data.

5.2.3 Sentence (Pseudo-event) Encoding. We use the SIF sentence embedding method (Arora et al. 2017) to encode each extracted pseudo-event into a 300-dimensional dense vector.

5.2.4 Dataset Generation. To fully use all available data, we use a rolling window, shifted by one month, to generate instances for each company. As illustrated in Figure 11, instances whose forecasting periods fall between 2011 and 2014 are in the training set, and instances whose forecasting periods fall within 2015 are in the validation set. To avoid peeking into the future and to stay realistic, instances with a forecasting period at the end of 2016 are in the testing set.

Fig. 11. Model Validation and Testing

6 MODEL INSTANTIATION, TRAINING, AND EVALUATION

6.1 Model Instantiation

6.1.1 ARIMA Model Instantiation. The ARIMA model is well known to be suited to small memory sizes, so we look for the baseline model among a set of small-memory-size ARIMA models. To find the best-performing ARIMA model parameters, we experiment with combinations of the following parameter settings: autoregressive term p in {1, 2, 3}, differencing term d in {0, 1}, and moving-average term q in {0, 1, 2}. Among these variations, we find the best-performing configurations (p, d, q), shown in Table 4, in the validation phase and use them in the final testing.

Table 4. Best-performing ARIMA Parameters

Target variable | p | d | q
sale_nwc        | 1 | 1 | 2
pe_exi          | 1 | 1 | 0
de_ratio        | 1 | 0 | 2
cfm             | 1 | 0 | 0
npm             | 2 | 0 | 0
equity_invcap   | 1 | 0 | 0
cash_ratio      | 2 | 0 | 0
eps             | 1 | 1 | 0
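This selection procedure can be sketched as a small grid search; `train` and `valid` are hypothetical per-ratio series, and validation MAPE (defined in Section 6.3) is the selection criterion:

```python
# ARIMA parameter search sketch: try each (p, d, q) combination,
# score it on the validation period by MAPE, and keep the best.
from itertools import product
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def mape(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def select_arima(train, valid):
    best_order, best_err = None, np.inf
    for p, d, q in product([1, 2, 3], [0, 1], [0, 1, 2]):
        try:
            fit = ARIMA(train, order=(p, d, q)).fit()
            err = mape(valid, fit.forecast(steps=len(valid)))
        except Exception:
            continue            # some orders may fail to converge
        if err < best_err:
            best_order, best_err = (p, d, q), err
    return best_order, best_err
```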
6.1.2 VAR Model Instantiation. Similar to the ARIMA model's instantiation, we also need to find the best-performing VAR(p) model parameters. We experiment with the autoregressive term p in {1, 2, 3, 4, 5} and differencing terms in {0, 1, 2}. Among these variations, we find that the best-performing model, VAR(1) with p = 1 and differencing = 2, provides the lowest average error across variables in the validation phase, and we use it in the final testing.

6.2 Model Training

In principle, one can stack a large number of BiLSTM layers to build a deep neural network. For practical computational-complexity reasons, we stack only two BiLSTM layers in our text-based models and three BiLSTM layers in our integration models. In our experiments, we define a time window j as one month and set the memory size M = 12. In line with the vision of long-term forecasting illustrated in Figure 2, we set the forecasting horizon H = 12. We also set K = 5, the median number of pseudo-events per company per month; the K pseudo-events are selected based on the L2 norms of the pseudo-event embeddings, sorted from largest to smallest. In all deep learning models, we use Exponential Linear Units (ELUs) (Clevert et al. 2016) as the activation function and employ the Adam optimizer (Kingma and Ba 2015).

6.3 Model Evaluation

We conduct the model evaluation shown in Figure 11 in the most stringent temporal out-of-sample fashion, in two phases. In the validation phase, we use event data from 2011 to 2014 as training data and predict ratios during 2015. In this process, we also tune model hyper-parameters (e.g., number of epochs and learning rate) by validating against the target ratios. Once model parameters are determined based on validation performance, in the final testing phase we retrain the same model using all data from 2011 to 2015 and evaluate the final prediction at the end of 2016. We employ the Mean Absolute Percentage Error,

    MAPE = (100% / N) Σ_{i=1}^{N} | (Y_i − Ŷ_i) / Y_i |,

as the loss function and use it as the performance measure. The final result is the mean of ten runs of each model configuration.
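The rolling-window instance generation and temporal split (Section 5.2.4, Figure 11) can be sketched as follows; `monthly` is a hypothetical per-firm DataFrame indexed by month:

```python
# Each instance pairs M months of inputs with the value H months
# ahead; instances are bucketed by forecast-target year (train through
# 2014, validate on 2015, test on 2016).
import pandas as pd

M, H = 12, 12

def make_instances(monthly: pd.DataFrame):
    """monthly: one firm's rows with a DatetimeIndex, oldest first."""
    train, valid, test = [], [], []
    for start in range(len(monthly) - M - H + 1):
        window = monthly.iloc[start:start + M]      # M months of inputs
        target = monthly.iloc[start + M + H - 1]    # label at t + H
        year = target.name.year                     # forecast-period year
        bucket = train if year <= 2014 else valid if year == 2015 else test
        bucket.append((window, target))
    return train, valid, test
```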
7 RESULTS, ANALYSIS, AND DISCUSSION

Our model performance is summarized in Table 5.

Table 5. Model Performance (MAPE, M=12, H=12)

Output Variable | ARIMA   | VAR      | SR     | SMR    | MR     | SEMax   | SEA    | MEAU    | MEAW   | SREA   | SMRE   | MREA
sale_nwc        | 100.55% | 2885.92% | 25.79% | 27.18% | 30.11% | 57.64%  | 55.95% | 52.10%  | 52.61% | 25.57% | 27.36% | 32.51%
pe_exi          | 100.28% | 1381.58% | 50.87% | 51.06% | 53.46% | 96.83%  | 96.64% | 70.84%  | 69.88% | 48.68% | 54.15% | 54.76%
de_ratio        | 66.92%  | 246.32%  | 20.71% | 20.73% | 22.53% | 64.07%  | 55.44% | 51.07%  | 51.59% | 20.65% | 20.92% | 27.76%
cfm             | 187.38% | 205.38%  | 43.08% | 44.37% | 45.65% | 94.70%  | 89.10% | 74.33%  | 72.20% | 41.51% | 45.57% | 48.17%
npm             | 163.54% | 232.25%  | 69.59% | 71.69% | 73.07% | 112.55% | 95.85% | 104.89% | 98.67% | 65.66% | 79.95% | 85.31%
equity_invcap   | 33.15%  | 77.10%   | 19.77% | 23.13% | 24.79% | 107.69% | 90.76% | 77.67%  | 73.16% | 18.80% | 23.35% | 29.17%
cash_ratio      | 58.65%  | 144.58%  | 32.73% | 38.33% | 42.22% | 102.11% | 85.33% | 90.50%  | 81.61% | 32.45% | 38.86% | 51.46%

7.1 Time Series Models (ARIMA and VAR)

The time series models ARIMA and VAR are baseline models that operate on financial ratios. While the ARIMA model takes one ratio series at a time (single-variable input and output), the VAR model works on multiple ratio series simultaneously (multi-variable input and output). Our experimental results show that the ARIMA model performs better than VAR, indicating that the interactions among all variables can negatively influence the prediction of an individual target variable. Therefore, modeling all variables at the same time is not beneficial for our task. Furthermore, considering the higher differencing value (d = 2) in the VAR(1) model, we can also see that making the VAR model stationary is more challenging. Our observations are consistent with the findings of previous literature (Bagshaw 1986, Litterman 1986).

7.2 Ratio-based Deep Learning Models (SR, SMR, and MR)

When comparing the ratio-based deep learning models with the time series models, we observe that the deep learning models produce much stronger performance, thanks to their non-linearity and larger memory. The results suggest that when forecasting performance is the top priority, deep learning models should be the preferred tool for numerical time series. Among the SR, SMR, and MR models' performance, we notice again that lumping all variables together as input is not a good approach, even in the more flexible deep learning architectures.

So far, we have observed that traditional time series models are outperformed by their deep learning counterparts. More importantly, both groups of models lack the capacity to consume textual data. On the latter point, we witness the unique power of text in the event-based deep learning models: SEMax, SEA, MEAU, and MEAW.

7.3 Event-based Deep Learning Models (SEMax, SEA, MEAU, and MEAW)

While the event-based deep learning models have no access to numerical ratio data, they show very competitive forecasting performance. This indicates that even when no numerical data is available, predicting firm financial ratios from textual data is still feasible. The event-based models' performance is stronger than that of the traditional time series models on 5 out of 7 target ratios (sale_nwc, pe_exi, de_ratio, cfm, and npm). For the other two target variables (equity_invcap and cash_ratio), the ARIMA model produces lower MAPE than the event-based deep learning models. This outcome suggests that different dynamics underlie different financial ratios. Equity_invcap and cash_ratio are capitalization- and liquidity-category variables. Capitalization and liquidity ratios are tied to a firm's internal operations. Therefore, they are more stable, receive less news coverage, and are more amenable to an inherently linear model such as ARIMA. In contrast, the five outperforming ratios all involve sales or the stock market. They are more volatile, receive more news coverage, and consequently see more success with news-event-based non-linear deep learning models.

Also, among the event-based deep learning models, the SEA model consistently outperforms the SEMax model. This underscores the superiority of the attention mechanism in SEA over the relatively naive max-pooling trick in SEMax. Furthermore, the MEAU model (in which each task-specific loss gets an equal weight of 1) shares representations among multiple ratios and yields further performance gains over the SEA model on 5 out of 7 target ratios. To further boost the multi-task learning performance, we assign higher weights to the losses of the two "harder" target ratios, npm and cash_ratio. When moderate over-weighting is in place (roughly between 2 and 5), we observe another wave of error reduction with the MEAW model. When comparing the performance of the multi-task deep learning models with the VAR model, we notice that in the joint modeling of multiple target ratios, the news-event-induced shared representation is considerably superior to the over-parameterized interactions in the VAR model.

7.4 Ratio-Event Integration Deep Learning Models (SREA, SMRE, and MREA)

The SREA model's performance is lifted over SR by bringing in the news data. The target ratios that benefit most are the price-to-earnings ratio (pe_exi), cash flow margin (cfm), and net profit margin (npm), variables that happen to be among the winning ones for the event-based deep learning models. This strengthens our observation of the power of news data. On the other hand, the inferior performance of the SMRE and MREA models further confirms that lumping all ratio series together as model input is not a good idea, in the same way it is not a good idea in VAR and MR. Overall, SREA is the winning model across all model families, combining news data with an individual ratio series.

7.5 Model Findings

We summarize our major findings below:
• When working with numerical series alone, deep learning models outperform traditional time series models.
• When numerical data is not available, event-based deep learning models leverage the attention mechanism and the multi-task learning framework to achieve competitive forecasting performance.
• When both numerical and event data are available, the ratio-event integration models perform best in the single-task framework.
• Across all model families, uni-variate models outperform multi-variate models. Multi-task learning is beneficial only for event-based deep learning models, owing to the powerful shared representations induced from textual data.

8 MODEL EXPLAINABILITY INSIGHTS

Deep learning models are often criticized as big black boxes. While such criticism is not entirely unreasonable considering the nonparametric nature of deep learning models, this section represents our effort to open the boxes and gain insight into their inner workings. More specifically, we extract pseudo-event attention weights and firm embeddings at the testing phase, then visualize and analyze them. To preserve the purity of each target variable and focus on how news articles perform on one ratio at a time, we use the SEA model to derive the interpretability insights.

8.1 Event Attention Map

The introduced attention mechanism provides an excellent form of model interpretability, in the sense that the normalized attention scores quantify the relative importance of the pseudo-events within each time window. We extract these scores from the corresponding pseudo-events in the Pseudo-event Attention layer of the proposed models. In this example, we work with the SEA model for the ratio sale_nwc and focus on the company Apple Inc. A much more detailed example on Expedia is presented in Appendix F. We present an overview of the pseudo-event attention map in Figure 12 and zoom into the specifics in Table 6 and Table 7.

Fig. 12. Event Attention Map Visualization Example: Apple Inc. Overview

Table 6. Event Attention Map Example: sale_nwc, Apple Inc. (Month 1)

Event | Attention weight | Text
1     | 0.17             | Apple's iPhone and iPad have won over users in recent years.
2     | 0.18             | Apple was third with 10.6 percent.
3     | 0.17             | Apple, the largest company by market value, reports results.
4     | 0.27             | Cupertino, California-based Apple filed the patent application in June 2013.
5     | 0.20             | Apple jumped 5.7 percent.

Table 7. Event Attention Map Example: sale_nwc, Apple Inc. (Month 11)

Event | Attention weight | Text
1     | 0.29             | In addition to sales from its own handsets, it gains revenue from each iPhone sold because its chip unit manufactures the main processor used in Apple's phones and tablets.
2     | 0.21             | These stock grants are meant to reward them down the road for their hard work in helping to keep Apple the most innovative company in the world.
3     | 0.25             | The company sought an order for Apple to produce the documents.
4     | 0.12             | Apple, ..., added 2.5 percent to $388.83.
5     | 0.12             | Apple, meanwhile, opened its iTunes store in 2003.

In Table 6, the most-attended pseudo-event is #4, which discusses Apple Inc.'s patent filing two years earlier. Intuitively, the event likely creates a positive long-term impact on the firm's financial well-being, and the model correctly assigns it a higher weight. Table 7 reveals a very interesting mistake made by the model. Pseudo-event #1 was deemed relevant to Apple Inc. by the event extraction algorithm due to company name matching, and it was assigned a higher weight by the attention mechanism due to semantic matching (it is undoubtedly sales-related). The only problem is that the sentence actually talks about Samsung (at the time, a supplier of chips used in Apple devices), the correct understanding of which can only arise from a broader context and requires NLP capabilities arguably beyond state-of-the-art models (e.g., anaphora resolution across sentence boundaries).
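For readers who wish to reproduce this style of visualization, the sketch below renders an attention map in the spirit of Figure 12 from a matrix of normalized weights; the weights here are random stand-ins, not the model's actual scores:

```python
# Attention-map rendering sketch: one row per month, one column per
# pseudo-event, cell intensity = normalized attention weight.
import numpy as np
import matplotlib.pyplot as plt

alpha = np.random.dirichlet(np.ones(5), size=12)   # 12 months x 5 events
plt.imshow(alpha, cmap="Blues", aspect="auto")
plt.xlabel("pseudo-event")
plt.ylabel("month")
plt.colorbar(label="attention weight")
plt.show()
```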
8.2 Firm Embeddings

Another useful insight from our model is the firm embeddings. Technically, each firm embedding is a semantic representation of the firm learned by the model. Understanding of these embeddings is usually derived from firms' relative positioning in a high-dimensional space, typically visualized in a 2-D space. The notion of proximity can intuitively guide peer benchmarking for internal management and portfolio-building for outside investors.

A traditional way to understand firm relationships is through industry segmentation. Therefore, we refer to the Fama-French 5-industry portfolios (Fama and French 2019) in Table 8 for each firm's industry. Since we replaced all company names with the special token 'FOCOMP', our model generates firm embeddings without knowing the company, and certainly without knowing any industry membership.

Table 8. Fama-French 5-Industry Portfolios

Industry      | Portfolio
Consumer      | Consumer Durables, NonDurables, Wholesale, Retail, and Some Services
Manufacturing | Manufacturing, Energy, and Utilities
HiTech        | Business Equipment, Telephone and Television Transmission
Healthcare    | Healthcare, Medical Equipment, and Drugs
Others        | Mines, Construction, Building Maintenance, Transportation, Hotels, Business Services, Entertainment, and Finance

We extract each firm embedding by capturing the activations at the final Dense layer and use t-SNE (van der Maaten and Hinton 2008) to visualize the firm embeddings in a 2-D space. We demonstrate the firm embeddings encoded by two different ratios from the SEA model to reflect the power of text. The clustering phenomena in Figure 13 and Figure 14 show that the firm embeddings are trained from texts in a non-trivial manner. The firm embeddings are trained from the textual information in the news that is relevant to the target ratio, and the sentence weights are assigned during model training. The clustering behavior is a product of model training and demonstrates what has been learned in the training process. In addition, as indicated in Figure 14, the target ratio values of similar companies are not necessarily in a narrow range.

8.2.1 Sale_nwc-encoded Firm Embeddings. The firm embeddings encoded by sale_nwc are shown in Figure 13.

Fig. 13. Firm Embedding Encoded by sale_nwc

We have two main observations. First, the overall firm embedding space agrees with the Fama-French industry segmentation quite well. Most consumer companies (in blue) are clustered in the top portion of the graph: one big cluster is in the upper-right corner (region 1), and a smaller crowd is in the mid-upper region (region 2). The healthcare industry (in gold) and the HiTech industry (in green) are clustered in the bottom-left corner (region 3). The manufacturing companies are scattered across the space. Second, where there is disagreement, the firm embeddings learned by the deep learning structures can pick up interesting semantics. For example, although Netflix (in purple) is classified in the Fama-French 'Others' category, its firm embedding is situated very close to Walt Disney and CBS (in green) in region 4, likely due to their similar product offerings. In turn, though officially classified in the HiTech industry, Walt Disney and CBS are located much closer to companies in the consumer industry than to typical HiTech companies (e.g., Google and Microsoft in region 3). Another example is Nike (in region 2), which, though officially belonging to the manufacturing industry, falls into the consumer industry's neighborhood. Practically, its "persona" in public perception does appear to be more like Macy's and Foot Locker than Caterpillar.
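A minimal sketch of the projection step follows; the embedding matrix is a random stand-in for the captured Dense-layer activations:

```python
# Firm-embedding visualization sketch: project per-firm vectors to 2-D
# with t-SNE, as in Figures 13 and 14.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

firm_embeddings = np.random.randn(927, 128)        # one row per firm
xy = TSNE(n_components=2, random_state=0).fit_transform(firm_embeddings)
plt.scatter(xy[:, 0], xy[:, 1], s=5)
plt.show()
```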
8.2.2 De_ratio-encoded Firm Embeddings. We visualize the firm embeddings encoded by de_ratio in Figure 14 (debt ratio values are shown in the figure where applicable) and make the following observations:

Fig. 14. Firm Embedding Encoded by de_ratio

• Retail companies (region 3), such as Wal-Mart, Costco, Best Buy, the Home Depot, and Target, have similar debt ratios, and they are indeed placed close together by our firm embedding.
• The solvency situations of HiTech companies (region 2) are relatively similar. Hence, they are clustered together, with IBM being an exception (our models did not succeed in forecasting its relatively high debt ratio).
• Berkshire Hathaway (region 1), surprisingly, is quite far away from the other financial institutions. Does this make sense? As it turns out, its debt ratio (1.18) is much lower than that of companies such as Citigroup (7.44), JPMorgan Chase (9.49), and Morgan Stanley (10.12). The reason is that Berkshire Hathaway is known for its conservative use of debt: its chairman and CEO Warren Buffett believes in running businesses with as little debt as possible. Our model picks up this semantics from the texts precisely.

To further examine model behavior, we use the dense layer before the prediction layer to produce the corresponding plots for each target variable from the ratio-only SR model (see Appendix G for supporting details). The results show that ratio-only models do not provide interpretable and meaningful firm clusters the way the text-only SEA models do. The reason is that the firm embeddings from the text-only model, and their corresponding clustering behaviors, emerge from texts: depending on the target ratio, the applicable information about a firm in the text determines its relative location. For example, the clustering behavior in Figure 13 emerges from textual information related to sales and net working capital, and that in Figure 14 emerges from texts relevant to debt.

The above examples show that our news-event-powered deep learning models learn sophisticated firm semantic representations, much better than hard industry segmentation, which otherwise requires non-trivial knowledge of firms and industries. Meanwhile, the non-trivial clusterings show the value of texts and the interpretable, per-variable insights provided by the text models. Such interpretable and dynamic clustering phenomena do not exist when using ratios alone.

8.3 Case Study: Expedia Group Inc.

In this example, we work with two target ratios, cfm (cash flow margin) and npm (net profit margin), to show their event attention maps. In addition, we demonstrate how the event attention maps can serve as sense-making tools to facilitate concept, impact-factor, and hypothesis generation, in Figure 17. We provide the case study details in Appendix F.

8.4 Practical Relevance

Looking at the interpretation of the model results, although the relationship is not causal, we observe that our proposed text-based model provides better predictions when the predicted ratio falls outside the range of historical values (we provide more details in Appendix H and Appendix I). In other words, if a firm experiences large swings in certain business aspects, such as sales, earnings, or operations, using text data will help the decision-maker make more accurate predictions.
Meanwhile, while our model can be used universally on all companies, it is especially beneficial for industries with long-term business cycles, such as the health industry. We also observe that companies with higher market value tend to have more media coverage, and continuous media coverage helps the model extract useful information consistently. In addition, the high-attention sentences learned by the text model can help decision-makers distinguish highly relevant information automatically, effectively, and in a timely manner. The customized firm clusters generated from our firm embeddings quickly help decision-makers grasp non-obvious and non-trivial company similarities in a graph representation. The graphs are user-friendly and intuitive, and each figure is customized for one ratio. With this high emphasis on individual ratios, the graphs support decision-makers in making clear interpretations and better goal-oriented decisions. Moreover, in the case study of Expedia Group Inc., we provide the sentence weights on different ratios for comparison. The business insights offered can help decision-makers automatically identify highly relevant sentences and information across a broad spectrum of variables.

Finally, the abstract "theory-alike" interpretation derived from high-attention sentences, shown in Figure 17, can help decision-makers make logical inferences about various events and business aspects. As a result, decision-makers can use the various tools provided in this study to support their decision-making under multiple and different real-world business scenarios.

9 CONCLUSIONS

This paper shows the effectiveness of deep text mining models in forecasting firms' long-term financial performance, driven by firm-specific pseudo-events. Overall, the news-event-powered deep learning models yield impressively competitive forecasting performance compared to standard time series models based on historical accounting data. We consistently demonstrate the power of text across multiple model families, whether accounting data is present or absent. In addition, we offer two model insights, the event attention map and the firm embedding, toward model interpretation, transparency, and accountability, especially when numerical data is not available.

Our work has several considerable implications for firm stakeholders, the financial industry, and the research community. Internally, the forecasting models and their artifacts provide decision support for firm executives performing various tasks, such as strategic planning, financial forecasting, and peer benchmarking. Externally, institutional investors can benefit from the models' predictive outcomes and explanatory insights. Such enhanced transparency and accountability are even more critical when dealing with private firms that have no duty to disclose their accounting books. Main Street investors usually are not equipped or inclined to conduct sophisticated quantitative analysis based on (arguably clean) numerical accounting data. Our models and artifacts provide previously non-existent, interpretable decision aid without compromising prediction quality.
Third-party regulators and agencies represent another family of beneficiaries. For example, S&P Global Ratings' analysts can leverage the enriched information channel and its interplay with multi-faceted financial pictures, both of which are essential components of our framework. On the methodological front, our problem formulation and model architecture are problem independent; therefore, they have general applicability to a broad spectrum of economic forecasting problems. Finally, artifacts derived from our models can help members of the business research community, in particular theory-minded researchers, generate new concepts and hypotheses, and even conduct preliminary theoretical explorations.

Although we present an early attempt at bringing large-scale textual data and state-of-the-art deep learning models into financial forecasting, our work is not without limitations, and several directions can be extended. First, the success of our models relies heavily on the quality of the input; room for improvement exists in event extraction and event encoding techniques. Second, other types of non-numerical data can participate in the modeling process, e.g., the textual portions of SEC filings, sell-side reports, etc. Third, the potential of multi-task learning is yet to be fully realized by designing more educated information-sharing model architectures. Fourth, the generalizability of our model architecture needs to be explored fully in broader economic forecasting settings.

One final note: this research does not aim to set up a horse race and declare a winner between econometric models and AI-based models. While ARIMA and VAR are certainly not the apex of the former family, our deep learning-based models also have great potential for improvement. What we strive for, and hope to have achieved, is to illustrate the empirical power of textual data in economic forecasting and to demonstrate the corresponding modeling capabilities. In practice, sophisticated researchers and agencies will likely exploit both families of models. The linear form of ARIMA models and their coefficients are deemed interpretable by many researchers, but this is not necessarily the best genre of interpretability for mathematically unsophisticated laypeople. Though we do not intend to engage in a philosophical debate around model interpretability, we believe our model artifacts provide alternative avenues toward this important notion.

REFERENCES
Ahmed Abbasi, Jingjing Li, Donald Adjeroh, Marie Abate, and Wanhong Zheng. 2019. Don't Mention It? Analyzing User-Generated Content Signals for Early Adverse Event Warnings. Information Systems Research 30, 3 (2019), 1007–1028.
Alan S Abrahams, Weiguo Fan, G Alan Wang, Zhongju Zhang, and Jian Jiao. 2015. An integrated text analytic framework for product defect discovery. Production and Operations Management 24, 6 (2015), 975–990.
Ronald W Anderson and Malika Hamadi. 2016. Cash holding and control-oriented finance. Journal of Corporate Finance 41 (2016), 410–425.
Werner Antweiler and Murray Z Frank. 2004. Is all that talk just noise? The information content of internet stock message boards. The Journal of Finance 59, 3 (2004), 1259–1294.
Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. A simple but tough-to-beat baseline for sentence embeddings. Proceedings of the International Conference on Learning Representations (2017).
Michael Bagshaw. 1986. Comparison of univariate ARIMA, multivariate ARIMA and vector autoregression for forecasting. Federal Reserve Bank of Cleveland Working Papers WP 86-02 (1986).
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. Proceedings of the International Conference on Learning Representations (2015).
Gustaf Bellstam, Sanjai Bhagat, and J Anthony Cookson. 2017. Innovation in Mature Firms: A Text-Based Analysis. SSRN (2017).
Maria Elena Bontempi, Laura Bottazzi, and Roberto Golinelli. 2020. A multilevel index of heterogeneous short-term and long-term debt dynamics. Journal of Corporate Finance 64 (2020), 101666.
Rich Caruana. 1997. Multitask Learning. Mach. Learn. 28, 1 (July 1997), 41–75. https://doi.org/10.1023/A:1007379606734
Changling Chen, Jeong-Bon Kim, and Li Yao. 2017. Earnings smoothing: Does it exacerbate or constrain stock price crash risk? Journal of Corporate Finance 42 (2017), 36–54.
Chen-Huei Chou, Atish P Sinha, and Huimin Zhao. 2010. A hybrid attribute selection approach for text classification. Journal of the Association for Information Systems 11, 9 (2010), 1.
Jonathan Clarke, Hailiang Chen, Ding Du, and Yu Jeffrey Hu. 2020. Fake news, investor attention, and market reaction. Information Systems Research (2020).
Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. 2016. Fast and accurate deep network learning by exponential linear units (ELUs). Proceedings of the International Conference on Learning Representations (2016).
Alex Coad. 2010. Exploring the processes of firm growth: evidence from a vector auto-regression. Industrial and Corporate Change 19, 6 (2010), 1677–1703.
Sanjiv R Das and Mike Y Chen. 2007. Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science 53, 9 (2007), 1375–1388.
Deloitte. 2017. Report: Fintech by the numbers.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. 2015. Deep learning for event-driven stock prediction. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
Ranjan D'Mello, Mark Gruskin, and Manoj Kulchania. 2018. Shareholders' valuation of long-term debt and decline in firms' leverage ratio. Journal of Corporate Finance 48 (2018), 352–374.
Hamdi Driss, Wolfgang Drobetz, Sadok El Ghoul, and Omrane Guedhami. 2021. Institutional investment horizons, corporate governance, and credit ratings: International evidence. Journal of Corporate Finance 67 (2021), 101874.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-Based Dependency Parsing with Stack Long Short-Term Memory. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Beijing, China, 334–343. http://www.aclweb.org/anthology/P15-1033
Robert O Edmister. 1972. An empirical test of financial ratio analysis for small business failure prediction. Journal of Financial and Quantitative Analysis 7, 2 (1972), 1477–1493.
Joseph Engelberg, R David McLean, and Jeffrey Pontiff. 2018. Anomalies and news. The Journal of Finance 73, 5 (2018), 1971–2001.
Eugene F. Fama and Kenneth R. French. 2019. Detail for 5 Industry Portfolios. http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/det_5_ind_port.html.
Lily Fang and Joel Peress. 2009. Media coverage and the cross-section of stock returns. The Journal of Finance 64, 5 (2009), 2023–2052.
Murray Z Frank and Tao Shen. 2016. Investment and the weighted average cost of capital. Journal of Financial Economics 119, 2 (2016), 300–315.
Diego Garcia. 2013. Sentiment during recessions. The Journal of Finance 68, 3 (2013), 1267–1300.
Matthew Gentzkow, Bryan T Kelly, and Matt Taddy. 2019. Text as Data. Journal of Economic Literature (2019).
Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. 2013. Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 6645–6649.
Richard Gruss, Alan S Abrahams, Weiguo Fan, and G Alan Wang. 2018. By the numbers: The magic of numerical intelligence in text analytic systems. Decision Support Systems 113 (2018), 86–98.
Jarrad Harford, Ambrus Kecskés, and Sattar Mansi. 2018. Do long-term investors improve corporate decision making? Journal of Corporate Finance 50 (2018), 424–452.
Paul Hemp. 2009. Death by information overload. Harvard Business Review 87, 9 (2009), 82–9.
Gerard Hoberg and Gordon Phillips. 2018. Conglomerate industry choice and product language. Management Science 64, 8 (2018), 3735–3755.
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
Tadaaki Hosaka. 2019. Bankruptcy prediction using imaged financial ratios and convolutional neural networks. Expert Systems with Applications 117 (2019), 287–299.
Ziniu Hu, Weiqing Liu, Jiang Bian, Xuanzhe Liu, and Tie-Yan Liu. 2018. Listening to chaotic whispers: A deep learning framework for news-oriented stock trend prediction. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 261–269.
Charles Kang, Frank Germann, and Rajdeep Grewal. 2016. Washing away your sins? Corporate social responsibility, corporate social irresponsibility, and firm performance. Journal of Marketing 80, 2 (2016), 59–79.
Hyung Cheol Kang, Robert M Anderson, Kyong Shik Eom, and Sang Koo Kang. 2017. Controlling shareholders' value, long-run firm value and short-term performance. Journal of Corporate Finance 43 (2017), 340–353.
Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. Proceedings of the International Conference on Learning Representations (2015).
Sandy Klasa, Hernan Ortiz-Molina, Matthew Serfling, and Shweta Srinivasan. 2018. Protection of trade secrets and capital structure decisions. Journal of Financial Economics 128, 2 (2018), 266–286.
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444. https://doi.org/10.1038/nature14539
Kai Li, Feng Mai, Rui Shen, and Xinyan Yan. 2018. Measuring Corporate Culture Using Machine Learning. SSRN (2018).
Xiao-Bai Li and Jialun Qin. 2017. Anonymizing and sharing medical text records. Information Systems Research 28, 2 (2017), 332–352.
Fengyi Lin, Deron Liang, and Enchia Chen. 2011. Financial ratio selection for business crisis prediction. Expert Systems with Applications 38, 12 (2011), 15094–15102.
Walter Lippmann. 1946. Public Opinion. Vol. 1. Transaction Publishers.
Robert B Litterman. 1986. A statistical approach to economic forecasting. Journal of Business & Economic Statistics 4, 1 (1986), 1–4.
Zhuang Liu, Degen Huang, Kaiyu Huang, Zhuang Li, and Jun Zhao. 2020. FinBERT: A pre-trained financial language representation model for financial text mining. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI. 5–10.
Tim Loughran and Bill McDonald. 2016. Textual analysis in accounting and finance: A survey. Journal of Accounting Research 54, 4 (2016), 1187–1230.
Rui Luo, Weinan Zhang, Xiaojun Xu, and Jun Wang. 2018. A Neural Stochastic Volatility Model. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence.
Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective Approaches to Attention-based Neural Machine Translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Lisbon, Portugal, 1412–1421. https://doi.org/10.18653/v1/D15-1166
Ronny Luss and Alexandre Aspremont. 2015. Predicting abnormal returns from news using text classification. Quantitative Finance 15, 6 (2015), 999–1012.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and Their Compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems (Lake Tahoe, Nevada). 3111–3119.
Marc-Andre Mittermayer and Gerhard F Knolmayer. 2006. NewsCATS: A news categorization and trading system. In Sixth International Conference on Data Mining. IEEE, 1002–1007.
Makoto Miwa and Mohit Bansal. 2016. End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, 1105–1116. http://www.aclweb.org/anthology/P16-1105
Pisut Oncharoen and Peerapon Vateekul. 2018. Deep learning using risk-reward function for stock market prediction. In Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence. 556–561.
Alan Pankratz. 1983. Forecasting with Univariate Box-Jenkins Models: Concepts and Cases. John Wiley & Sons.
Gautam Pant and Olivia RL Sheng. 2015. Web footprints of firms: Using online isomorphism for competitor identification. Information Systems Research 26, 1 (2015), 188–209.
Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1532–1543.
David T Robinson and Berk A Sensoy. 2016. Cyclicality, performance measurement, and cash flow liquidity in private equity. Journal of Financial Economics 122, 3 (2016), 521–543.
Bryan R Routledge, Stefano Sacchetto, and Noah A Smith. 2016. Predicting merger targets and acquirers from text. Working Paper (November 2016).
David E. Rumelhart, Geoffrey E. Hinton, and R. J. Williams. 1986. Learning representations by back-propagating errors. Nature 323 (1986), 533–536.
Robert P Schumaker and Hsinchun Chen. 2009. Textual analysis of stock market prediction using breaking financial news: The AZFin Text system. ACM Transactions on Information Systems (TOIS) 27, 2 (2009), 12.
Paul C Tetlock. 2007. Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance 62, 3 (2007), 1139–1168.
Paul C Tetlock, Maytal Saar-Tsechansky, and Sofus Macskassy. 2008. More than words: Quantifying language to measure firms' fundamentals. The Journal of Finance 63, 3 (2008), 1437–1467.
Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research 9 (2008), 2579–2605.
Manuel R Vargas, Carlos EM dos Anjos, Gustavo LG Bichara, and Alexandre G Evsukoff. 2018. Deep learning for stock market prediction using technical indicators and financial news articles. In 2018 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–8.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998–6008.
Di Wang and Eric Nyberg. 2015. A Long Short-Term Memory Model for Answer Sentence Selection in Question Answering. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics, Beijing, China, 707–712. http://www.aclweb.org/anthology/P15-2116
William Yang Wang and Zhenhao Hua. 2014. A Semiparametric Gaussian Copula Regression Model for Predicting Financial Risks from Earnings Calls. In Association for Computational Linguistics. 1155–1165.
Eric Weisbrod. 2018. Stockholders' Unrealized Returns and the Market Reaction to Financial Disclosures. The Journal of Finance (2018).
Yumo Xu and Shay B Cohen. 2018. Stock Movement Prediction from Tweets and Historical Prices. In Association for Computational Linguistics, Vol. 1. 1970–1979.
Emin Zeytinoglu and Yasemin Deniz Akarim. 2013. Financial failure prediction using financial ratios: An empirical application on Istanbul Stock Exchange. Journal of Applied Finance and Banking 3, 3 (2013), 107.
Shihao Zhou, Zhilei Qiao, Qianzhou Du, G Alan Wang, Weiguo Fan, and Xiangbin Yan. 2018. Measuring customer agility from online reviews using big data text analytics. Journal of Management Information Systems 35, 2 (2018), 510–539.

A STOCK MARKET VARIABLES AS DEPENDENT VARIABLE

Table 9. Stock Market: Theory Testing Literature
Dependent Variable | Input Genre | Primary Independent Variables | Methodology | Citation
Stock return | public news | anomaly variables, information-day dummy variables | linear regression | (Engelberg et al. 2018)
Stock return, market volume | public news | news sentiment | vector auto-regressive (VAR) model | (Tetlock 2007)
Earnings, stock return | public news | news sentiment | linear regression | (Tetlock et al. 2008)
Stock return | public news | media coverage | CAPM, F-F 3-factor model, Carhart 4-factor model, Pastor-Stambaugh liquidity model | (Fang and Peress 2009)
Stock return | public news | news sentiment (fraction of positive and negative words) | time series regression, GARCH model | (Garcia 2013)
Abnormal trading volume, abnormal return | earnings announcement | capital gain overhang, unexpected earnings, idiosyncratic return volatility | Cox proportional hazard rate model, Fama-French-momentum 4-factor model | (Weisbrod 2018)
Stock price, return, volatility, trading volume | message board | message sentiment, disagreement, message volume | classifier ensembles, linear regression | (Das and Chen 2007)
Trading volume, volatility | message board | number of messages, message sentiment and the agreement index | naive Bayes classifier, linear regression | (Antweiler and Frank 2004)
Table 10. Stock Market: Prediction-related Literature
Output Variable | Input Genre | Input Variables | Method | Citation
stock price, price movement, return | public news | bag-of-words features, noun phrases, named entities | support vector machine (SVM) | (Schumaker and Chen 2009)
stock price movement | public news | event extraction, event embedding | convolutional neural network (CNN) | (Ding et al. 2015)
stock price movement | public news | word embedding | gated recurrent units (GRU) | (Hu et al. 2018)
stock price movement | press release | bag-of-words features | Rocchio, k Nearest Neighbors (kNN), linear SVM, non-linear SVM | (Mittermayer and Knolmayer 2006)
stock price movement | press release | bag-of-words features | support vector machine (SVM) | (Luss and Aspremont 2015)
stock price movement | news headline | word embedding, historical price | convolutional neural network (CNN) and long short-term memory network (LSTM) | (Oncharoen and Vateekul 2018)
stock price movement | news title | word embedding, technical indicators | convolutional neural network (CNN) and long short-term memory network (LSTM) | (Vargas et al. 2018)
stock price volatility | earnings calls | unigrams, bigrams, named entity, part-of-speech features | semiparametric Gaussian copula regression | (Wang and Hua 2014)
stock price movement | social media | word embedding | generative recurrent neural network (RNN) | (Xu and Cohen 2018)
stock price volatility | time series | historical stock price | generative neural stochastic model | (Luo et al. 2018)

B FIRM FINANCIAL RATIO AS DEPENDENT VARIABLE

Table 11. Firm Financial Ratios: Theory Testing Literature
Dependent Variable | Input Format | Primary Independent Variables | Methodology | Citation
Net book leverage, net market leverage | numerical | recognition of Inevitable Disclosure Doctrine (IDD), profit margin | difference-in-differences, difference-in-difference-in-differences, linear probability model | (Klasa et al. 2018)
Net cash flow as percentage of committed capital, net cash flow | numerical | price to dividend ratio, yield spread | linear regression | (Robinson and Sensoy 2016)
Cash holding, Tobin's Q | numerical | capital expenditures, working capital, cash flow, debt | linear regression | (Anderson and Hamadi 2016)
Cost of equity | numerical | cash flow to capital ratio | linear regression, CAPM, F-F 3-factor model, Carhart 4-factor model | (Frank and Shen 2016)
Tobin's Q, EBIT/TA | numerical | cash flow, R&D expenses, capital expenditure, debt to asset ratio | linear regression | (Kang et al. 2017)
Daily stock return | numerical | earnings smoothing | Jones model, linear regression | (Chen et al. 2017)

C AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODEL (ARIMA)
The ARIMA model has three components: an autoregression (AR) component on the variable itself, a differencing (I) component if the time series is not stationary, and a moving average (MA) component on the error terms. In an ARIMA(p, d, q) model, p denotes the order of the autoregression (AR), d denotes the order of differencing (I), and q denotes the order of the moving average (MA). In Equations 6 and 7, $L$ is the lag operator and $\epsilon_t$ is a white noise process.

$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d\, Y_t = \left(1 - \sum_{i=1}^{q} \theta_i L^i\right)\epsilon_t$   (6)

$\epsilon_t \sim N(0, \sigma^2)$   (7)

If we use the Box-Jenkins backshift operator, Equation 6 can be written as

$\phi_p(B)\,(1 - B)^d\, Y_t = \theta_q(B)\,\epsilon_t, \quad \epsilon_t \sim N(0, \sigma^2)$   (8)

where $B$ is the backshift operator, $\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$, and $\theta_q(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q$. Equation 6 and Equation 8 can also be written as

$\phi_p(B)\,\nabla^d Y_t = \theta_q(B)\,\epsilon_t, \quad \epsilon_t \sim N(0, \sigma^2)$   (9)

or

$\phi(B)\,\nabla^d Y_t = \theta(B)\,\epsilon_t, \quad \epsilon_t \sim N(0, \sigma^2)$   (10)

where $\nabla$ is the difference operator, $\phi(B) = \phi_p(B) = 1 - \sum_{i=1}^{p} \phi_i B^i$ is the p-order AR operator, $\theta(B) = \theta_q(B) = 1 - \sum_{j=1}^{q} \theta_j B^j$ is the q-order MA operator, and $\epsilon_t$ is a white noise process.
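For readers who wish to reproduce the baseline behavior, below is a minimal sketch of fitting an ARIMA model and producing a multi-step forecast with statsmodels. The series is synthetic and the (p, d, q) orders are illustrative; the paper's experiments use actual WRDS ratio histories with their own order selection.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic stand-in for one firm's historical debt-ratio series.
rng = np.random.default_rng(0)
y = pd.Series(1.0 + np.cumsum(rng.normal(0.0, 0.05, 40)), name="de_ratio")

# ARIMA(1, 1, 1): AR order p = 1, differencing order d = 1, MA order q = 1.
result = ARIMA(y, order=(1, 1, 1)).fit()
print(result.forecast(steps=4))  # forecast H = 4 periods ahead
```

The multivariate VAR baseline of Appendix D can be fit analogously with statsmodels.tsa.api.VAR on a matrix whose columns are the participating series.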
D VECTOR AUTOREGRESSION MODEL (VAR)
The Vector Autoregression (VAR) model is a multivariate time series model that describes the interdependency between participating time series that influence each other. Each time series is expressed through its own lag values, the other series' lag values, and an error term; lag values are represented by autoregression (AR) terms. For example, the VAR model has been used to identify the relationships among firm employment growth, sales growth, and profit growth (Coad 2010), and among corporate social responsibility, corporate social irresponsibility, and firm performance (Kang et al. 2016). In a VAR model, p denotes the order of the autoregression and k denotes the number of variables. We use Equation 11 to illustrate their interactions:

$Y_t = A_1 Y_{t-1} + A_2 Y_{t-2} + \cdots + A_p Y_{t-p} + c + \epsilon_t$   (11)

where each $A_i$ is a time-invariant $k \times k$ matrix, $c$ is a constant, and $\epsilon_t$ is an error term satisfying $E(\epsilon_t) = 0$ with no correlation across time. Taking $k = 2$ and $p = 1$ as an example, the VAR(1) model can be represented as

$Y_{1,t} = a_{1,1} Y_{1,t-1} + a_{1,2} Y_{2,t-1} + c_1 + \epsilon_{1,t}$   (12)
$Y_{2,t} = a_{2,1} Y_{1,t-1} + a_{2,2} Y_{2,t-1} + c_2 + \epsilon_{2,t}$   (13)

E LONG SHORT-TERM MEMORY NETWORKS (LSTMS)
Each LSTM cell has an input gate $i_t$, an output gate $o_t$, and a forget gate $f_t$ at each time step $t$. The output vector of the cell at time $t$, $h_t$, is based on the previous output $h_{t-1}$, the current input $x_t$, and the current cell state $c_t$. With the sigmoid activation function $\sigma_g$ and the hyperbolic tangent activation functions $\sigma_c$ and $\sigma_h$, the LSTM with forget gate can be written as:

$f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)$   (14)
$i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i)$   (15)
$o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)$   (16)
$c_t = f_t \odot c_{t-1} + i_t \odot \sigma_c(W_c x_t + U_c h_{t-1} + b_c)$   (17)
$h_t = o_t \odot \sigma_h(c_t)$   (18)

where $W_f$, $W_i$, $W_o$, $W_c$, $U_f$, $U_i$, $U_o$, and $U_c$ are weight matrices, and $b_f$, $b_i$, $b_o$, and $b_c$ are biases.
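As a concrete companion to Equations 14-18, here is a minimal NumPy sketch of a single LSTM step. The gate arithmetic follows the equations above; the dictionary-based parameter layout and the tiny random smoke test are illustrative conventions of ours, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step implementing Equations 14-18 (gates keyed 'f','i','o','c')."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate, Eq. 14
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate, Eq. 15
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate, Eq. 16
    c_t = f_t * c_prev + i_t * np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # Eq. 17
    h_t = o_t * np.tanh(c_t)  # hidden state, Eq. 18
    return h_t, c_t

# Smoke test: hidden size 4, input size 3, random weights, zero biases.
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(4, 3)) for k in "fioc"}
U = {k: rng.normal(size=(4, 4)) for k in "fioc"}
b = {k: np.zeros(4) for k in "fioc"}
h_t, c_t = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, U, b)
```

A bidirectional variant simply runs a second set of such steps over the reversed sequence and aggregates the two hidden states at each time step.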
F CASE STUDY: EXPEDIA GROUP INC.
In this example, we work with two target ratios - cfm (cash flow margin) and npm (net profit margin) - and hence examine two SEA models in parallel. We choose these two ratios for this case study because they share some common impact factors, and we would like to see if and how our event attention model captures the semantic subtlety. Figure 15 and Figure 16 provide a bird's-eye view of the overall attention maps.

[Fig. 15. Events Attention Visualization Example: Expedia Encoded by cfm]
[Fig. 16. Events Attention Visualization Example: Expedia Encoded by npm]

Table 12 shows the pseudo-events in year 2015, month 1. We can see that pseudo-events #2 and #3 received greater attention from the model. After looking at these two sentences carefully, we found they focus on the company's internal activities that lead to changes in cash flow and profit. For example, event #2 talks about Expedia expanding its business via a partnership with Travelocity: under the partnership, Expedia provides customer service for Travelocity and supports Travelocity's websites. Both responsibilities imply more revenue, hence more cash flow and net profit, for Expedia. Moving to the other high-attention pseudo-event, #3, we can see this sentence talks about how Expedia updates its strategies, i.e., bringing down room prices and taking commissions from hotels, to generate more revenue. This means our event attention model pays closer attention to pseudo-events about the firm's profit generation (1a) activities and internal cost control (1b) activities. Consistent with human intuition, those activities have larger impacts when forecasting a firm's future cash flow margin and net profit margin. (The codes in parentheses, such as 1a and 1b, are used in Figure 17 and discussed later.)

Table 12. Events Attention Visualization Example: Expedia Month 1
Event | cfm attention weight | npm attention weight | Text
1 | 0.18 | 0.19 | The company, which develops software for firms such as Barclays Plc and Expedia, is headquartered in the U.S. and has about a third of its 11,000 employees in Russia and Ukraine, ...
2 | 0.24 | 0.20 | Under that partnership, Expedia provided customer service for Travelocity and supported its websites in the U.S. and Canada. (1a)
3 | 0.23 | 0.24 | Online travel agents and intermediaries such as Priceline Group and Expedia are pushing down room prices and taking commissions from hotels, while enabling travelers to organize trips individually, poaching customers from tour operators. (1a, 1b)
4 | 0.19 | 0.19 | At least 30 people will be on hand to lead workshops and provide advice, including Di-Ann Eisnor, ...; John Malloy, ...; Jerry Engel, ...; and Sam Friend, ..., now owned by Expedia.
5 | 0.16 | 0.19 | You will now receive the Game Plan newsletter Last year, Google licensed hotel-booking software from Room 77, a startup backed by Expedia.

In Table 13, we can see that pseudo-event #3 talks about an acquisition activity (1c) of Expedia. Merger and acquisition activities impact a company's cash flow; therefore, this pseudo-event received greater attention. Pseudo-event #4 talks about an Expedia strategy: displacing travel agents by packaging and selling discounted airfares and hotel rooms. This strategy is likely to increase company sales and lead to higher cash flow and profits.

Table 13. Events Attention Visualization Example: Expedia Month 2
Event | cfm attention weight | npm attention weight | Text
1 | 0.20 | 0.18 | Expedia intends to keep the Orbitz brand intact, Khosrowshahi said on the call.
2 | 0.20 | 0.14 | Today, Expedia landed one for its shareholders.
3 | 0.23 | 0.27 | TripAdvisor surged 24 percent after Expedia agreed to acquire Orbitz Worldwide. (1c)
4 | 0.25 | 0.28 | The online booking company Expedia displaced travel agents by packaging and selling discounted airfares and hotel rooms. (1a, 1b)
5 | 0.11 | 0.13 | Tarran Vaillancourt, an Expedia spokeswoman, declined to comment.
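To show how per-event weights such as those in Tables 12 and 13 translate into the attention maps of Figures 15 and 16, here is a minimal plotting sketch. The matrix simply reuses the cfm weights reported above, and the heatmap styling is our assumption, not the paper's exact figure code.

```python
import numpy as np
import matplotlib.pyplot as plt

# Rows are months, columns are pseudo-events; values are the cfm attention
# weights from Tables 12 and 13, used purely for illustration.
attention = np.array([
    [0.18, 0.24, 0.23, 0.19, 0.16],  # month 1 (Table 12)
    [0.20, 0.20, 0.23, 0.25, 0.11],  # month 2 (Table 13)
])

fig, ax = plt.subplots()
im = ax.imshow(attention, cmap="Blues", aspect="auto")
ax.set_xticks(range(5))
ax.set_xticklabels([f"event {i + 1}" for i in range(5)])
ax.set_yticks([0, 1])
ax.set_yticklabels(["month 1", "month 2"])
fig.colorbar(im, label="attention weight")
ax.set_title("Event attention encoded by cfm (illustrative)")
plt.show()
```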
In Table 14, we can see several external impact factors mentioned in month 4. Pseudo-event #1 talks about Google's search engine impact on Expedia and its competitors (2b1, 2b2); more visibility in Google search results would likely generate more sales for Expedia. Pseudo-event #5 talks about a newcomer, the world's biggest online retailer, trying to compete with Expedia for market share. This pseudo-event implies the impact applies not only to Expedia itself but to the entire industry.

Table 14. Events Attention Visualization Example: Expedia Month 4
Event | cfm attention weight | npm attention weight | Text
1 | 0.26 | 0.26 | Microsoft, Expedia, publishers and others have asked the EU to examine complaints that Google favors its own services over competitors and hinders specialized search engines that compete with it. (2b1, 2b2)
2 | 0.18 | 0.21 | Expedia is joining U.S. issuers from Coca-Cola to Berkshire Hathaway, which sold notes in the shared currency this year as European Central Bank stimulus drives down funding costs in the region.
3 | 0.20 | 0.20 | With Google commanding almost all of the search market in some European countries, critics including Microsoft and Expedia are fed up with the company, which they say highlights its own Web services in query results at the expense of rivals.
4 | 0.10 | 0.12 | Expedia climbed after quarterly revenue exceeded estimates.
5 | 0.25 | 0.22 | The world's biggest online retailer by revenue will be competing with Priceline Group, Expedia, startup Airbnb and others for a piece of the online hotel booking market. (2b1, 2b2)

Table 15 shows that the models corresponding to different target variables, cfm and npm, place different attention weights on different pseudo-events. Pseudo-event #1 talks about another company buying a large amount of eLong stock from Expedia; this market activity is likely to have a larger impact on Expedia's cash flow. Pseudo-event #2 talks about Goldman's announcement about Expedia. Pseudo-event #4 talks about Expedia's stock rallying after the sell-off in biotechnology and small-cap shares. The output variable cfm focuses more on pseudo-events #1 and #2, while npm focuses more on pseudo-events #2 and #4. Pseudo-events #1, #2, and #4 are all external impacts (2c) on Expedia, and they are all stock market events.

Table 15. Events Attention Visualization Example: Expedia Month 5
Event | cfm attention weight | npm attention weight | Text
1 | 0.20 | 0.15 | Ctrip, which operates the country's biggest travel-booking website, bought the eLong stake from U.S.-based Expedia, becoming its biggest shareholder. (2c)
2 | 0.30 | 0.27 | Goldman talks about here include Caterpillar, Coca Cola, Phillips 66, United Technologies, Automatic Data Processing, and Expedia. (2c)
3 | 0.17 | 0.21 | Stocks rose, with the Standard & Poor's 500 Index paring a weekly loss, as Gilead Sciences and Expedia rallied after Thursday's selloff in biotechnology and small-cap shares. (2c)
4 | 0.16 | 0.22 | Stocks pared a weekly loss on Friday, as Gilead Sciences and Expedia rallied after Thursday's selloff in biotechnology and small-cap shares. (2c)
5 | 0.17 | 0.15 | Expedia reached a record, rising 6.7 percent for the fifth straight gain and the longest streak since January.

In Table 16, we can see that the previously mentioned eLong stake sale, event #2, which relates to the stock market (2c), receives more attention. Pseudo-event #5 receives greater attention due to its discussion of the business environment (2b1, 2b2).

Table 16. Events Attention Visualization Example: Expedia Month 7
Event | cfm attention weight | npm attention weight | Text
1 | 0.13 | 0.17 | Expedia jumped 13 percent and Amgen added 2.9 percent as results beat estimates.
2 | 0.26 | 0.24 | In May, Expedia sold its stake in eLong, a Chinese online travel company. (2c)
3 | 0.12 | 0.17 | Amgen rallied 2.9 percent and Expedia jumped 13 percent on better-than-estimated earnings.
4 | 0.16 | 0.14 | Expedia also topped a record, leading consumer-discretionary shares higher after second-quarter sales and profit topped analysts' estimates.
5 | 0.33 | 0.29 | Other homegrown technology companies include Zillow Group, Expedia and Zulily. (2b1, 2b2)
Table 17 is another example of different target variables allocating different attention weights to pseudo-events. The output variable cfm pays more attention to pseudo-events #1 and #2, while npm pays more attention to pseudo-events #1 and #3. Pseudo-events #1 and #2 talk about external impact factors on the entire business environment (2b1, 2b2), which in turn impact Expedia's cash flow. Meanwhile, pseudo-events #1 and #3 talk about competition and direct external impacts on Expedia, which affect Expedia's net profit. The attention models capture the important and related pseudo-events precisely.

Table 17. Events Attention Visualization Example: Expedia Month 8
Event | cfm attention weight | npm attention weight | Text
1 | 0.26 | 0.26 | With established tour operators facing competition from low-cost airlines and online booking sites such as Expedia.com, Thomas Cook and TUI are under pressure to invest in new offerings. (2b1, 2b2)
2 | 0.24 | 0.19 | Google's arguments were countered by Thomas Vinje, a lawyer with Clifford Chance who represents FairSearch Europe, whose members include Microsoft, Expedia and Nokia Oyj. (2b1, 2b2)
3 | 0.22 | 0.22 | Fighting Competition Priceline, whose sites include Booking.com and Kayak, has struck partnerships and made purchases to fend off competition from Google and Expedia. (2a)
4 | 0.12 | 0.13 | Expedia's shares rose to a record on July 31 after second-quarter sales and profit topped analysts' estimates.
5 | 0.17 | 0.20 | When asked whether Priceline has seen mounting competition from Google or Expedia, Huston said there has not been a tangible change.

In Table 18, the highlighted pseudo-events focus on external impact factors. Pseudo-events #2 and #3 talk about Expedia's external partnerships (2a); business partnerships are likely to impact Expedia's revenue and cash flow. Pseudo-event #5 talks about stock market movement (2c) for Expedia, which may impact Expedia's net profit.

Table 18. Events Attention Visualization Example: Expedia Month 11
Event | cfm attention weight | npm attention weight | Text
1 | 0.15 | 0.19 | Expedia lost 2.6 percent to $122.01, paring an earlier decline of 3.6 percent.
2 | 0.22 | 0.18 | Expedia has had a partnership with HomeAway for two years, Khosrowshahi said in Wednesday's statement. (2a)
3 | 0.23 | 0.21 | Bringing HomeAway into Expedia's portfolio of brands 'is a logical next step,' Khosrowshahi said. (2a)
4 | 0.22 | 0.18 | Last week, travel-booking site Expedia agreed to acquire vacation-rental company HomeAway for $3.9 billion.
5 | 0.18 | 0.24 | Online travel company Expedia rebounded 1.9 percent after sliding 2.9 percent Tuesday, while Marriott International and Carnival each added 1.2 percent. (2c)

Finally, what is extremely exciting about this somewhat lengthy case study is the realization that the event attention maps in our models (and attention maps in general) not only serve as sense-making tools but can also facilitate concept, impact factor, and hypothesis generation. We illustrate this with Figure 17, which shows a tentative concept hierarchy that emerged from our case study of Expedia upon consolidating the insights gained from examining the various attention maps (the codes map to the previously discussed pseudo-event examples). We can see that two high-level families of impact factors can affect a firm's cash flow margin and net profit margin: internal and external factors. Three kinds of internal firm activities can affect a firm's cfm and npm. External factors include three facets as well. One facet is company partnerships, which can integrate companies' strengths to gain more market share; a larger market share likely leads to greater profits. The second external factor is business environment change, which can impact the entire industry or the focal company alone. The third external impact factor is the firm's stock market activity.

[Fig. 17. Impact Factors for Cash Flow Margin (cfm) and Net Profit Margin (npm)]

G OUTPERFORMING CASES DISTRIBUTION ON NEWS-ONLY MODEL COMPARED WITH RATIO-ONLY MODEL (PREDICTION WITHIN VS. OUTSIDE OF HISTORICAL VALUES RANGE)
After investigating the model results, we found that when the predicted ratio value fluctuates outside of the historical ratio range, we tend to see more accurately predicted (outperforming) cases from the news-only model than from the ratio-only model. This means that when a firm experiences events that cause large business swings, the news information is much needed. Table 19 shows, for each ratio, how the news-only model's outperforming cases are distributed between predictions within and outside the historical values range. For example, for the sales to net working capital ratio sale_nwc, 66.75% of the outperforming cases occur when the predicted value fluctuates outside the historical values range; similarly, for the debt ratio de_ratio, the share is 73.49%. This phenomenon is general and holds across all ratios. More importantly, it shows that when a firm experiences dramatic changes, news article information is important for making better predictions.

Table 19. Outperforming Cases Distribution on News-only Model Compared with Ratio-only Model (Prediction within vs. outside of Historical Values Range)
 | sale_nwc | pe_exi | de_ratio | cfm | npm | equity_invcap | cash_ratio
within historical values range | 33.25% | 44.34% | 26.51% | 33.98% | 33.49% | 26.99% | 36.39%
outside of historical values range | 66.75% | 55.66% | 73.49% | 66.02% | 66.51% | 73.01% | 63.61%
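The tallies behind Table 19 can be reproduced with a few lines of analysis code. The sketch below is our own illustration with hypothetical column names; each row stands for one case in which the news-only model beat the ratio-only model.

```python
import pandas as pd

# Hypothetical per-case records: the predicted value plus the firm's
# historical min/max for the target ratio.
cases = pd.DataFrame({
    "ratio":      ["de_ratio", "de_ratio", "cfm", "cfm"],
    "prediction": [1.90, 0.80, 0.02, 0.12],
    "hist_min":   [0.50, 0.60, 0.05, 0.05],
    "hist_max":   [1.20, 1.50, 0.20, 0.20],
})

outside = (cases["prediction"] < cases["hist_min"]) | \
          (cases["prediction"] > cases["hist_max"])
# Per-ratio share of outperforming cases whose prediction lies outside
# the historical range, i.e., the quantity reported in Table 19.
print(outside.groupby(cases["ratio"]).mean())
```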
H INDUSTRY DISTRIBUTIONS ON THE DATASET AND OUTPERFORMING CASES DISTRIBUTION ON NEWS-ONLY MODEL COMPARED WITH RATIO-ONLY MODEL
We summarize the industry distributions of the cases in which the news-only model provides more accurate predictions (outperforming cases) than the ratio-only model. Table 20 shows the industry distribution of the entire dataset and the distribution of the outperforming cases of the news-only model vs. the ratio-only model for each variable. In Table 20, we observe that the industry distribution of the outperforming cases is similar to that of the entire dataset. Furthermore, the health industry has a consistently higher-than-dataset share, while the consumer industry has a lower one. Although not causal, this suggests our model can benefit industries with longer business cycles, such as the health industry. Meanwhile, it indicates our model learns universally across all companies: the outperforming cases span all industries, which verifies the universal applicability of our model.

Table 20. Industry Distributions on the Dataset and Outperforming Cases of News-only Model Compared with Ratio-only Model
Industry | dataset | sale_nwc | pe_exi | de_ratio | cfm | npm | equity_invcap | cash_ratio
Consumer | 30.84% | 26.44% | 22.58% | 28.81% | 26.80% | 24.78% | 30.77% | 26.97%
Health | 7.47% | 9.20% | 12.26% | 8.47% | 8.25% | 9.73% | 7.69% | 12.36%
HiTech | 20.00% | 19.54% | 23.23% | 23.73% | 18.56% | 23.89% | 16.67% | 20.22%
Manufacturing | 27.23% | 24.14% | 29.68% | 23.73% | 29.90% | 23.89% | 32.05% | 26.97%
Others | 14.46% | 20.69% | 12.26% | 15.25% | 16.49% | 17.70% | 12.82% | 13.48%

I MEDIA COVERAGE DISCUSSION
We make three additional observations on media coverage. (Note: we use the terms "media coverage" and "the number of pseudo-events" interchangeably.)

(1) Companies with higher market value tend to have more media coverage. In Table 21, we categorize companies by their market value (as of the end of the dataset period) and observe that higher-market-value companies have more media coverage on average. In Table 22, we show the average market value of each industry and its average number of pseudo-events per month. We observe that higher-market-value industries, for example Health and HiTech, tend to have more media coverage.

Table 21. Average Number of Pseudo-events per Market Value Category
Market Value Category | Average Monthly Pseudo-events
> $500,000M | 3.13
between $100,000M and $500,000M | 2.89
between $50,000M and $100,000M | 2.47
< $50,000M | 0.82

Table 22. Average Number of Pseudo-events per Industry
Industry | Average Market Value | Average Monthly Pseudo-events
Consumer | $16,599M | 0.86
Health | $55,922M | 1.63
HiTech | $45,125M | 1.31
Manufacturing | $14,240M | 0.77
Others | $11,885M | 1.15

(2) When the news-only model outperforms the ratio-only model, the volatility of the news coverage tends to be higher. We use volatility to evaluate the fluctuations in the number of pseudo-events of each company within each time window (e.g., monthly). We define the news coverage volatility as the standard deviation of the changes in the number of pseudo-events, $\Delta_t$ for $t \in 1{:}M$:

$\text{volatility} = \sqrt{\dfrac{1}{M-1} \sum_{t=1}^{M} \left(\Delta_t - \bar{\Delta}\right)^2}$   (19)

where, with $n_t$ denoting the number of pseudo-events in window $t$,

$\Delta_t = \begin{cases} \dfrac{n_t}{n_{t-1}} - 1 & \text{if } n_{t-1} > 0 \\ 0 & \text{otherwise} \end{cases}$   (20)
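A minimal NumPy sketch of this computation, assuming `monthly_counts` holds one firm's pseudo-event counts per time window:

```python
import numpy as np

def coverage_volatility(monthly_counts):
    """News-coverage volatility per Equations 19 and 20."""
    n = np.asarray(monthly_counts, dtype=float)
    prev, curr = n[:-1], n[1:]
    # Delta_t = n_t / n_{t-1} - 1 when n_{t-1} > 0, and 0 otherwise (Eq. 20).
    delta = np.where(prev > 0, curr / np.where(prev > 0, prev, 1.0) - 1.0, 0.0)
    # Sample standard deviation of the changes (Eq. 19).
    return np.std(delta, ddof=1)

print(coverage_volatility([3, 4, 2, 0, 5, 5, 6]))  # ~0.50
```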
Table 23. Average Number of Pseudo-events Volatility of Outperforming Cases from the News-only Model
Industry | sale_nwc | pe_exi | de_ratio | cfm | npm | equity_invcap | cash_ratio
Consumer | 0.53 | 0.51 | 0.52 | 0.64 | 0.54 | 0.49 | 0.59
Health | 0.65 | 0.77 | 0.78 | 1.06 | 0.78 | 0.74 | 0.75
HiTech | 0.58 | 0.56 | 0.58 | 0.53 | 0.57 | 0.68 | 0.72
Manufacturing | 0.48 | 0.47 | 0.50 | 0.53 | 0.48 | 0.59 | 0.64
Others | 0.57 | 0.39 | 0.29 | 0.50 | 0.39 | 0.49 | 0.49

Table 24. Average Number of Pseudo-events Volatility of Underperforming Cases from the News-only Model
Industry | sale_nwc | pe_exi | de_ratio | cfm | npm | equity_invcap | cash_ratio
Consumer | 0.45 | 0.45 | 0.45 | 0.42 | 0.44 | 0.46 | 0.43
Health | 0.67 | 0.50 | 0.64 | 0.53 | 0.60 | 0.65 | 0.62
HiTech | 0.52 | 0.51 | 0.53 | 0.54 | 0.52 | 0.51 | 0.48
Manufacturing | 0.42 | 0.41 | 0.43 | 0.40 | 0.42 | 0.39 | 0.38
Others | 0.31 | 0.38 | 0.40 | 0.34 | 0.38 | 0.36 | 0.36

As shown in Table 23 and Table 24, the pseudo-event volatility of the outperforming group is higher than that of the underperforming group in the majority of cases. This means outperforming companies receive regular media coverage instead of staying silent for long stretches; the continuous input helps the model extract helpful information from the news consistently.

(3) The average number of pseudo-events in the outperforming group tends to be higher than in its counterpart. As shown in Table 25 and Table 26, we observed that the number of pseudo-events of the outperforming group is higher than that of its counterpart in the majority of cases.

Table 25. Average Number of Pseudo-events of Outperforming Cases from the News-only Model
Industry | sale_nwc | pe_exi | de_ratio | cfm | npm | equity_invcap | cash_ratio
Consumer | 1.17 | 1.09 | 1.05 | 1.17 | 1.03 | 0.82 | 1.08
Health | 1.13 | 1.64 | 2.07 | 2.34 | 2.05 | 1.83 | 1.82
HiTech | 1.48 | 1.33 | 1.29 | 1.28 | 1.36 | 1.74 | 1.75
Manufacturing | 0.93 | 0.82 | 1.14 | 0.75 | 0.89 | 1.28 | 1.13
Others | 1.81 | 1.25 | 1.12 | 1.38 | 1.10 | 1.28 | 1.47

Table 26. Average Number of Pseudo-events of Underperforming Cases from the News-only Model
Industry | sale_nwc | pe_exi | de_ratio | cfm | npm | equity_invcap | cash_ratio
Consumer | 0.80 | 0.78 | 0.84 | 0.79 | 0.82 | 0.88 | 0.82
Health | 1.73 | 1.47 | 1.48 | 1.30 | 1.31 | 1.51 | 1.44
HiTech | 1.28 | 1.31 | 1.33 | 1.33 | 1.30 | 1.24 | 1.20
Manufacturing | 0.76 | 0.77 | 0.74 | 0.80 | 0.76 | 0.65 | 0.70
Others | 0.84 | 1.08 | 1.13 | 1.04 | 1.15 | 1.10 | 1.05

The above observations are reflected in our data; we provide them to shed light on the practical implications of the model.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for proit or commercial advantage and that copies bear this notice and the full citation on the irst page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior speciic permission and/or a fee. Request permissions from permissions@acm.org. © 2022 Association for Computing Machinery. 2158-656X/2022/5-ART $15.00 https://doi.org/10.1145/3533018 ACM Trans. Manag. Inform. Syst. 2 • Zhai and Zhang insights, signiicant prediction disadvantages exist in inancial reports. First, a signiicant portion of corporate inancial reports is released after the actual events. While some reports include a forward-looking statement, such as 10-K, it is not enough for an informed understanding of the company dynamics. Therefore, most of the time, the speciic information provided to investors is after the actual events. Second, private companies are not legally required to disclose their inancial data. Investors of those companies are obligated to seek other information references, such as insider or news, to gain irm insights. However, investors can beneit from news articles in many aspects. News article reports timely corporate event (Lippmann 1946), is published online, and mostly free. Therefore, Main Street investors have easy access to the same information as advanced investors do. Readers can also obtain alternative views to seek expert opinions and community reactions. If investors systematically follow a company’s news, trends can emerge and lead to more educated decisions. More importantly, when accounting data is not available, news articles are naturally considered indispensable sources of up-to-date information for stakeholders who are constantly looking for inancial opportunities. These beneits are considered essential for investors and stakeholders. Public companies’ inancial performance is primarily anchored on stock returns, and private irms’ inancial details are hard to get. Evaluating a irm’s inancial performance is either with a limited scope or not available at all. However, it is vital to assess a irm’s inancial status through a broad spectrum of measures. Therefore, inancial ratios are practically useful to appraise a irm’s inancial performance in various aspects, such as proitability, eiciency, solvency, and others. Financial ratios are reliable measures of irm fundamentals. They can provide a tangible basis for market participants’ decision-making, especially for long-term-minded investors with low trading frequency. Other active stakeholders, such as credit rating agencies and insurance companies, are eyeing irms’ inancial images more broadly than stock returns. Outside of the stock market, private companies participate in the economy and undergo inancial scrutiny. Market participants need methods to assess private irm performance as well. We observe very limited data-driven research that studies multiple corporate inancial ratios from the existing literature, and several research gaps are identiied. Unlike inancial market studies, corporate inance literature is mostly dominated by theory testing research and lacks the study of problems related to the prediction of irm inancial ratios. 
Given inancial ratios’ indispensable value for the wide range of stakeholders, their forecasting capabilities are highly desirable. Almost all inancial-ratio-related studies use numerical variables as input. For reasons we articulated above, the power of textual data awaits to be exploited. News data has inherently very high dimensionality. It becomes the Achilles’ heel for classical econometric models. As a result, the model that works primarily on text representations is limited, and the integration model with textual and numerical data is yet to emerge in the leading economics literature. On the contrary, deep learning research represents state-of-the-art advancement in machine learning research. It learns implicit representation in the text by forgoing explicit feature engineering and tackles high-dimensionality issues by operating on ixed-size dense vectors nicely. Given the following motivations, we focus on the long-term corporate fundamentals rather than frequent short- term trading activities (Harfor.d2018). et al A irm’s long-term performance relects corporate governance and keeps managerial misbehavior under control (Harfor.d2018). et al For example, long-term irm health encourages the company innovation and improves its credit risk proile (Driss . 2021).etThe al increase of long-term debt is viewed as over-leverage and perceived as harming shareholders’ wealth (D’Mello . 2018). et al In addition, the persistence of long-term debt ratios can indicate a irm’s investment constraints, and market conditions (Bontempi et al. 2020). Financial ratios are representative measures for corporate fundamentals. Analyzing inancial ratios is a decisive step to understand a irm’s inancial health and help manage risks. Meanwhile, inancial ratios are indicators for critical corporate events, such as bankruptcy (Hosaka 2019), business crisis (Lin . 2011), et and al inancial failure (Edmister 1972, Zeytinoglu and Akarim 2013). Hence, forecasting irm inancial ratios bring competitive advantages for companies, investment agencies, and individual investors. ACM Trans. Manag. Inform. Syst. Read the News: Not the Books • 3 The data sources of this study are chosen with purpose. Public news is a powerful source of inancial intelligence, and inancial ratios are concrete measures of irm’s inancial strengths and vulnerabilities. Both data streams have their unique value. Therefore, when inancial ratios are not available, the predictions made by news articles are valuable. When both data streams are available, the predictions made by integrating the two data streams are also helpful. As a result, studying how the news-based model performs is charming, and the integration between news and irm ratios is intriguing. Meanwhile, long-term irm inancial performance prediction becomes even more practically useful because most investors are not frequent traders. Therefore, we automate the news-powered and full-ledged irm long-term inancial performance forecasting task by exploiting state-of-the-art natural language processing and machine learning capabilities. Speciically, we propose deep learning models to help stakeholders acquire long-term irm intelligence, rather than market prediction, and expect to contribute on both methodological and application fronts in: • Proposing novel, low-barrier, and long-horizon (e.g., a year) models for all to forecast irms’ full-ledged inancial performance. 
• Providing low-latency and forward-looking intelligence to make almost-real-time investment decisions rooted in the timeliness of news data. • Enhancing model transparency, explainability, and accountability in inancial decision-making through provided model insights. 2 RELATED WORK Our study is relevant to three streams of literature. The irst stream is related inance literature involving textual data, which includes stock market studies, irm inancial ratio studies, and corporate inance studies. The second stream is time series models, which represent traditional treatment of sequential numerical data. The third stream is deep-learning-, and natural-language-processing- related modeling techniques, which constitute the methodological foundation of our research models. 2.1 Financial Studies using Text Input 2.1.1 Stock Market Variable as Dependent Variable. A public irm’s stock market performance is often the irst indicator of its overall inancial health. Therefore, numerous studies have been dedicated to the understanding or prediction of stock markets. Due to the vast body of literature in this area, we only review those directly relevant to our research, i.e., studies based on publicly available non-numerical information as input. A representative set of theory-testing studies in this area are summarized in .TFuele able 9d by recent advancements in Artiicial Intelligence (AI), researchers have also extensively studied how to predict stock market behavior using machine learning models. We summarize a representative set of such studies in T.able 10 2.1.2 Firm Financial Ratio as Dependent Variable. Besides stock returns, there exist other quantitative measures of a company’s inancial health, many of which form various inancial ratios. A comprehensive taxonomy of irm inancial ratios is deined by Wharton Research Data Services (WRDS ), and it constitutes the focal phenomenon of our study. A set of representative theory testing studies that involve inancial ratios are summarized in Table 11. Please see Appendix A. Please see Appendix A. https://wrds-www.wharton.upenn.edu/documents/793/WRDS_Industry_Financial_Ratio_Manual.pdf Please see Appendix B. We italicized the dependent variable or independent variable when it is either a inancial ratio or some variations (e.g., the nominator or denominator). ACM Trans. Manag. Inform. Syst. 4 • Zhai and Zhang 2.1.3 Corporate Finance Studies. Textual data have gained popularity in economics research (Gentzko.w et al 2019, Loughran and McDonald 2016). Several recent inance studies exploited textual data to investigate irm activities, such as irm organization (Hoberg and Phillips 2018), corporate cultur . 2018), e (Licorp et al orate innovation (Bellstam et . 2017), al competitor identiication (Pant and Sheng 2015), and merger and acquisition (Routledge et al . 2016), accompanied by machine learning techniques. Notably, economics-rooted studies are still mostly driven by theory testing, viewing AI-based tools merely as modeling aids instead of centerpieces. We believe the community should diversify its methodology portfolio by embracing forecasting frameworks based on machine learning backbones. 2.2 Time Series Models 2.2.1 AutoRegressive Integrated Moving Average Model (ARIMA). Time series models study temporal dynamics in sequence data. The ARIMA Model (Pankratz 1983) is the de facto standard for uni-variate time series analysis in econometrics. Therefore, it is a natural baseline model for our study. 
In an ARIMA(p, d, q) model, $p$ denotes the order of the autoregression, $d$ the order of differencing, and $q$ the order of the moving average. We present more model details in Appendix C.

2.2.2 Vector Autoregression Model (VAR). The VAR model is a multi-variate time series model that describes the inter-dependency between the participating time series when they influence each other. In a VAR(p) model, $p$ denotes the order of the autoregression, and $k$ denotes the number of variables. We present more model details in Appendix D.

2.3 Deep Learning Techniques
2.3.1 Deep Learning Models. Deep learning (LeCun et al. 2015) models are uniquely positioned to consume high-dimensional text data (Abbasi et al. 2019, Abrahams et al. 2015, Chou et al. 2010, Gruss et al. 2018, Li and Qin 2017, Zhou et al. 2018) and capture non-linear temporal dynamics. They are a family of machine learning techniques composed of multiple linear and non-linear processing layers that learn representations of data at various levels of abstraction. They discover intricate structure in large data sets by using the backpropagation algorithm (Rumelhart et al. 1986) and learn model parameters to compute the representation in each layer from the representation in the previous layer. Deep learning models, including Transformer-based models (Devlin et al. 2018, Vaswani et al. 2017), have dramatically improved the state of the art in many domains, such as language modeling, speech recognition, question answering (Liu et al. 2020), visual object detection, and genomics.

2.3.2 Sentence Embedding. A centerpiece of natural language processing is the semantic representation of words and sentences. Deep learning methods model them as word and sentence embeddings. An embedding is a mapping from a discrete object, such as a word or sentence, to a dense vector of real numbers. GloVe (Pennington et al. 2014) and Word2Vec (Mikolov et al. 2013) are widely used word embedding techniques. While one can trivially combine word embeddings to produce a sentence embedding, researchers have studied techniques to generate more robust sentence representations. A building block of our models, Smooth Inverse Frequency (SIF) (Arora et al. 2017), is an unsupervised, PCA-based technique to produce sentence embeddings for downstream NLP applications. An illustration of the conceptual process can be found in Figure 1.

Fig. 1. Deep Learning for Natural Language Processing: An Illustration
Fig. 2. Problem Formulation

2.3.3 Long Short-Term Memory Networks (LSTMs). Long Short-Term Memory (LSTM) networks, first proposed by Hochreiter and Schmidhuber (1997), are a representative family of deep learning architectures for sequence data. They model non-linear temporal dynamics and are capable of digesting vector input and emitting vector output at every time stamp. Numerous studies have been conducted using LSTMs, such as information extraction (Miwa and Bansal 2016), syntactic parsing (Dyer et al. 2015), speech recognition (Graves et al. 2013), machine translation (Bahdanau et al. 2015), and question answering (Wang and Nyberg 2015). We present more model details in Appendix E. A natural extension of the LSTM network is the bidirectional LSTM (BiLSTM) architecture, which passes the sequence through the architecture in both the forward and backward directions and aggregates the hidden states $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$ at every time step as $h_t = \sigma(\overrightarrow{h_t}, \overleftarrow{h_t})$.
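For concreteness, the following is a minimal Keras sketch of a BiLSTM regressor of the kind described above. The layer sizes, the two-layer stacking, and the single regression head are illustrative assumptions rather than the paper's exact configuration.

```python
# A minimal BiLSTM regressor sketch (illustrative sizes, not the paper's configuration).
from tensorflow.keras.layers import Bidirectional, Dense, Input, LSTM
from tensorflow.keras.models import Model

M, EMB_DIM = 12, 300                       # memory horizon (months) and embedding size

inputs = Input(shape=(M, EMB_DIM))         # one vector per time window
# Each Bidirectional wrapper runs the sequence forward and backward and
# concatenates the two hidden states at every time step.
x = Bidirectional(LSTM(64, return_sequences=True))(inputs)
x = Bidirectional(LSTM(64))(x)             # final state summarizes the sequence
outputs = Dense(1)(x)                      # regression head for one target value

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mae")
```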
3 PROBLEM FORMULATION
In this study, we use structured numerical data (represented by financial ratios) and/or unstructured text data (represented by events from news articles) to forecast firms' long-term financial health, as embodied in the financial ratios (an instance of the "downstream models" in Figure 1). The problem formulation illustrated in Figure 2 is generalizable to a broad array of economic forecasting problems with similar configurations and high-dimensional input.

Table 1. Financial Ratio Definition

| Category | Financial Ratio | Variable Name | Definition |
|---|---|---|---|
| Valuation | Price to Earnings (diluted, excl. EI) | pe_exi | price / earnings |
| Efficiency | Sales to Working Capital | sale_nwc | sales / working capital |
| Solvency | Debt Ratio | de_ratio | total liabilities / shareholders' equity |
| Soundness | Cash Flow Margin | cfm | income before EI and depreciation / sales |
| Profitability | Net Profit Margin | npm | net income / sales |
| Capitalization | Common Equity to Invested Capital | equity_invcap | common equity / invested capital |
| Liquidity | Cash Ratio | cash_ratio | cash and short-term investments / current liabilities |

Specifically, we define a firm-specific pseudo-event as a sentence that mentions a firm in the news. There are two types of pseudo-events in our study. The following are several sample sentences, which we treat as pseudo-events (the italicized entities represent the focal companies under consideration).

Meanwhile, financial ratios are indicators of an organization's financial and operating strengths and vulnerabilities. A firm's financial performance can be evaluated in various aspects. Valuation-focused financial ratios assess the organization's value and its investment potential. Efficiency-focused ratios evaluate how the organization uses its assets to generate sales. Solvency-focused and soundness-focused ratios inspect whether the firm can meet its debt obligations. Profitability-focused ratios measure a firm's ability to make profits against its costs. Capitalization-focused ratios measure how well a firm transfers its capital to firm value. Liquidity-focused ratios indicate how well a company can handle its short-term debt obligations. Based on the firm financial ratio categorization in the Wharton Research Data Services (WRDS), we assess a firm's financial performance along the valuation, efficiency, solvency, soundness, profitability, capitalization, and liquidity categories. There is a list of ratios in each category. It is impractical (and unnecessary) to work with all 70+ financial ratios defined in the WRDS document. Therefore, given the literature (see Appendix B) and the preference in real-world analysis (https://online.hbs.edu/blog/post/how-to-determine-the-financial-health-of-a-company), we choose one representative measure from each category. The aims are to keep generality and practical relevance. We summarize the financial ratios (used as both output and input in our framework) in Table 1.

The task intuition is that the semantic signals encoded in firm pseudo-events have predictive power for the firm's future financial status, measured in the form of the financial ratios in Table 1. Formally,

$$Y_{(i)}^{t+H} = f\big(\{E_{(i)j}, R_{(i)j}\}_{j=t-M}^{t}\big) \qquad (1)$$

$$E_{(i)j} = g\big(\{S_{(i)jk}\}_{k=0}^{|S_{(i)j}|}\big) \qquad (2)$$

where
• $Y$ denotes all target ratios, $y_p$ denotes an individual ratio, $Y = \{y_1, ..., y_{|Y|}\}$, and $p$ indexes target ratios, $p \in \{1, ..., |Y|\}$.
• $H$ denotes the prediction horizon, and $M$ denotes the memory horizon, both measured in terms of the number of time windows.
• $R$ denotes all input ratios, $r$ denotes an individual ratio, and $R = [r_1; ...; r_{|R|}]$, where $[;]$ denotes concatenation.
• $i$ indexes companies, $k$ indexes pseudo-events, and $j$ indexes time windows.
• $E_{(i)j}$ is the aggregate event embedding for company $i$ in time window $j$; $S_{(i)jk}$ is the encoding of a sentence $k$ that mentions company $i$ in time window $j$.
• $f$ is a learned function that maps all pseudo-event embeddings and ratio(s) to target value(s).
• $g$ is a function that aggregates multiple sentence embeddings into one pseudo-event embedding.
In principle, $f$ and $g$ can be parameterized as any function approximator.

4 FORECASTING MODELS
Two data streams serve as model inputs in our proposed models, shown in Figure 3. One is the corporate financial ratios, and the other is firm news articles. In the ratio stream, missing data is interpolated based on rules. In the news stream, company-focused sentences (pseudo-events) are extracted before generating pseudo-event embeddings. The ratio stream is the input for three groups of models: time series models, ratio-based deep learning models, and ratio-event integration deep learning models. The news stream is the input for two groups of models: event-based deep learning models and ratio-event integration deep learning models. In addition, we analyze two types of model insights - Events Attention Map and Firm Embeddings - to demonstrate the interpretability of the proposed deep learning models and understand the models' inner workings from a business perspective. Single-task models are designed to output individual target variables, and multi-task models are designed to output multiple target variables simultaneously. We summarize these model characteristics in Table 2. Since the task has a natural time series setup, we chose LSTM and/or BiLSTM networks, state-of-the-art deep learning models for sequential data, as the backbone of our proposed deep learning models. The proposed models learn non-linear temporal dynamics from firm events and financial ratios to predict the firm target ratios. We have introduced the related time series models, ARIMA and VAR, in Section 2.2. In the following sections, therefore, we introduce the proposed deep learning models.

4.1 SR or SMR: Single-task Single-Ratio or Multi-Ratio Input Model
In the SR model, shown in Figure 4, each company $i$'s individual ratio history $r$ is used as the model input. The single ratio value $r_{(i)j}$ is extended to a dense vector in the $v_{(i)j} = \mathrm{Dense}(r_{(i)j})$ layer. The dense vector $v_{(i)j}$ becomes the next LSTM layer's input. After the sequential processing in the LSTM layer, the dense vector $u_{(i)t}$ is generated from the $u_{(i)t} = \mathrm{LSTM}(\{v_{(i)j}\}_{j=t-M}^{t})$ layer, and it is used as input to predict the individual ratio $y_{(i)p}$ at the end of the forecasting horizon $H$ in the $y_{(i)p}^{t+H} = \mathrm{Dense}(u_{(i)t})$ layer. Unlike the SR model, the SMR model concatenates all ratios into $R_{(i)j}$ in the first Ratio Input layer, instead of using only one ratio as the input.

4.2 MR: Multi-task Multi-Ratio Input Model
Similar to the SMR model, the MR model in Figure 5 concatenates all ratios into $R_{(i)j}$ in the Ratios Input layer. Then, all input ratios are computed at once in the $v_{(i)j} = \mathrm{Dense}(R_{(i)j})$ layer, and $v_{(i)j}$ becomes the input for the next LSTM layer. The LSTM layer outputs $u_{(i)t} = \mathrm{LSTM}(\{v_{(i)j}\}_{j=t-M}^{t})$ after the sequential processing. Finally, unlike the SR model, which predicts the target variables individually, the MR model predicts all target variables simultaneously in the $Y_{(i)}^{t+H} = \mathrm{Dense}(u_{(i)t})$ layer.
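The following is a sketch of the SR/MR-style architecture just described: a Dense layer extends the ratio input, an LSTM processes the sequence, and a Dense head emits the prediction(s). The hidden sizes are hypothetical; SR takes one ratio and emits one target, while MR takes all ratios and emits all targets.

```python
# Sketch of the SR/MR ratio models: Dense -> LSTM -> Dense (hypothetical sizes).
from tensorflow.keras.layers import Dense, Input, LSTM
from tensorflow.keras.models import Model

M = 12                                          # memory horizon in months

def build_ratio_model(n_input_ratios: int, n_targets: int) -> Model:
    inputs = Input(shape=(M, n_input_ratios))   # ratio history, one value set per window
    v = Dense(32, activation="elu")(inputs)     # v_(i)j: ratios extended to dense vectors
    u = LSTM(64)(v)                             # u_(i)t: last hidden state
    y = Dense(n_targets)(u)                     # prediction at horizon t+H
    return Model(inputs, y)

sr_model = build_ratio_model(n_input_ratios=1, n_targets=1)   # SR
mr_model = build_ratio_model(n_input_ratios=7, n_targets=7)   # MR (multi-task)
mr_model.compile(optimizer="adam", loss="mae")
```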
Fig. 3. Research Framework Overview
Fig. 4. SR and SMR Models
Fig. 5. MR Model

4.3 SEMax: Single-task Event Max-pooling Model
Unlike the SR and MR models, which use ratios as the model input, the SEMax model uses textual data from company pseudo-events in the news to predict target variables. After the pre-processing of pseudo-event extraction and sentence embedding, pseudo-event representations in the form of embeddings become the SEMax model's input.

Table 2. Models Summary

| Model Group | Model | Full Name | Single-task | Multi-task | Single-Ratio Input | Multi-Ratio Input | Event Input | Deep Learning |
|---|---|---|---|---|---|---|---|---|
| Time Series Models | ARIMA | AutoRegressive Integrated Moving Average Model | ✓ | | ✓ | | | |
| Time Series Models | VAR | Vector AutoRegression Model | | ✓ | | ✓ | | |
| Ratio-based Models | SR | Single-task Single-Ratio Input Model | ✓ | | ✓ | | | ✓ |
| Ratio-based Models | SMR | Single-task Multi-Ratio Input Model | ✓ | | | ✓ | | ✓ |
| Ratio-based Models | MR | Multi-task Multi-Ratio Input Model | | ✓ | | ✓ | | ✓ |
| Event-based Models | SEMax | Single-task Event Maxpooling Model | ✓ | | | | ✓ | ✓ |
| Event-based Models | SEA | Single-task Event Attention Model | ✓ | | | | ✓ | ✓ |
| Event-based Models | MEAU | Multi-task Event Attention Unweighted Model | | ✓ | | | ✓ | ✓ |
| Event-based Models | MEAW | Multi-task Event Attention Weighted Model | | ✓ | | | ✓ | ✓ |
| Ratio-Event Integration Models | SREA | Single-task Single-Ratio Input Event Attention Model | ✓ | | ✓ | | ✓ | ✓ |
| Ratio-Event Integration Models | SMRE | Single-task Multi-Ratio Input Event Attention Model | ✓ | | | ✓ | ✓ | ✓ |
| Ratio-Event Integration Models | MREA | Multi-task Multi-Ratio Input Event Attention Model | | ✓ | | ✓ | ✓ | ✓ |

The challenge here is that while the LSTM layer accepts only a single vector as input per time window, multiple pseudo-events typically exist per window. Therefore, before feeding into the LSTM layer, the multiple event embeddings must be aggregated into one. In the literature, such operations are often termed "pooling" and represent the essence of the $g$ function in Equation 2. Pooling combines "nearby" values of a signal (e.g., through averaging) or picks one representative value (e.g., through maximization or sub-sampling). In our case, within every time window, we max-pool on every dimension of all pseudo-event embedding vectors in the MaxPooling layer, as shown in the highlighted pseudo-event MaxPooling component in Figure 6. Formally, we define the MaxPooling component as $\forall d, \; v_{(i)j}^{<d>} = \max_k E_{(i)jk}^{<d>}$, where $d$ indexes individual dimensions in the embedding space, $k$ indexes pseudo-events, and $v_{(i)j}$ is the aggregated vector for all pseudo-events in time window $j$ after the pooling operation.

Fig. 6. SEMax Model

In practice, we use BiLSTM layers as the model backbone. After the BiLSTM layers, the model generates a dense vector representing the semantics of the company from news articles via the $fe_{(i)t} = \mathrm{BiLSTM}(\{v_{(i)j}\}_{j=t-M}^{t})$ layer. We call this dense vector $fe_{(i)t}$ the Firm Embedding. It is essentially the firm's semantic representation learned from textual data in the form of pseudo-events. Finally, the SEMax model predicts the individual target ratio from the learned firm embedding in the output layer.
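For concreteness, the following is what the SEMax MaxPooling component computes for one firm-window; a minimal numpy sketch in which the random array stands in for SIF-encoded pseudo-events.

```python
# Dimension-wise max-pooling over one window's pseudo-event embeddings (SEMax aggregation).
import numpy as np

K, EMB_DIM = 5, 300                 # events per window and embedding size (Section 6.2)
E = np.random.randn(K, EMB_DIM)     # stand-in for SIF-encoded pseudo-events

v = E.max(axis=0)                   # v[d] = max over events k of E[k, d]
assert v.shape == (EMB_DIM,)        # one aggregated vector per time window feeds the BiLSTM
```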
4.4 SEA: Single-task Event Attention Model
Notice that the max-pooling trick in the SEMax model is quite naive and produces a crude aggregate of the multiple event embeddings. A potentially more powerful approach is to preserve each event's entirety and adaptively assign weights to the events. In other words, the model should pay more "attention" to the more informative pseudo-events. The intuition for attention mechanisms in natural language processing is to assign greater attention to text units (typically words or tokens) that contain more information for the task at hand. Bahdanau et al. (2015) were the first to apply the attention mechanism in NLP, more specifically, in machine translation. More recently, Vaswani et al. (2017) used multi-head attention alone to solve sequence prediction problems that were traditionally handled by other neural networks such as LSTMs and Convolutional Neural Networks.

As with the SEMax model, the SEA model also takes textual information from pseudo-events to predict single target variables. Unlike the SEMax model, which handles pseudo-event aggregation through max pooling, the SEA model tackles the aggregation task with an attention mechanism, shown in the highlighted Pseudo-event Attention Component in Figure 7. Formally, we define our event attention mechanism as:

$$u_{(i)jk} = \mathrm{sigmoid}(W_{(i)j} E_{(i)jk} + b_{(i)j}) \qquad (3)$$

$$\alpha_{(i)jk} = \frac{\exp(u_{(i)jk})}{\sum_{k} \exp(u_{(i)jk})} \qquad (4)$$

$$v_{(i)j} = \sum_{k} \alpha_{(i)jk} E_{(i)jk} \qquad (5)$$

where $u_{(i)jk}$ is a pseudo-event's raw attention score, $\alpha_{(i)jk}$ is the normalized attention weight, and $v_{(i)j}$ is the weighted-sum vector representing company $i$ at time window $j$. Similar to the SEMax model, we also use BiLSTM as the backbone of the SEA model for sequential processing. After the BiLSTM layers, the SEA model again generates the Firm Embedding, a dense vector representation of the company, in the Dense layer, and uses it to predict the target value in the output layer.

Fig. 7. SEA Model
Fig. 8. MEAU and MEAW Models

4.5 MEAU or MEAW: Multi-task Event Attention Unweighted or Weighted Model
While we can perform learning and inference on individual financial ratios, it is conceivable that multiple ratios share common characteristics rooted in the company's business operations. Multi-Task Learning (MTL) is a family of machine learning techniques that improves generalization by leveraging the domain-specific information contained in the training signals of related tasks (Caruana 1997). Multi-task learning's key rationale is learning the shared representation among all associated tasks, and it can also be viewed as a regularization technique. The company's shared representation is learned based on guidance from all related tasks. Furthermore, the layered architecture of deep learning models makes it quite feasible to practice multi-task learning. In Figure 8, the model is learned by optimizing an aggregate loss function that is a linear combination of the individual task losses. If every target variable gets the same weight, the model optimizes an unweighted loss in the output layer. Otherwise, different variables can get different weights, and the model then optimizes the weighted loss in the output layer. We report the implemented weight details of the MEAU and MEAW models in Section 7.3.
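The following is a sketch of such a multi-task objective: a linear combination of per-ratio losses over shared outputs. The weight values are illustrative assumptions; equal weights give the unweighted (MEAU) variant, while over-weighting the "harder" ratios (e.g., npm and cash_ratio, per Section 7.3) gives a weighted (MEAW) variant.

```python
# Sketch of a MEAU/MEAW-style multi-task loss: a weighted sum of per-task MAEs.
import tensorflow as tf

TASK_WEIGHTS = tf.constant([1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 3.0])  # hypothetical weights

def multi_task_mae(y_true, y_pred):
    """y_true, y_pred: (batch, n_tasks) tensors, one column per target ratio."""
    per_task_mae = tf.reduce_mean(tf.abs(y_true - y_pred), axis=0)
    return tf.reduce_sum(TASK_WEIGHTS * per_task_mae)

# Usage: model.compile(optimizer="adam", loss=multi_task_mae)
```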
4.6 SREA or SMRE: Single-task Single-Ratio or Multi-Ratio Event Attention Integration Model
In Figure 9, after the individual sequential processing of the ratio stream and the pseudo-event stream, we combine the hidden states of the two data streams in the Merge (Multiply) layer as $m_{(i)j} = \mathrm{Multiply}(l_{(i)j}, h_{(i)j})$, where $l_{(i)j}$ is the hidden state from the ratio LSTM layer, and $h_{(i)j}$ is the hidden state from the pseudo-event BiLSTM layer (the layer before the last BiLSTM layer).

Inspired by Luong et al. (2015), we first conduct a temporal self-attention on the integrated vector $m_{(i)j}$; the attention score $b_{(i)j}$ is computed as $z_{(i)j} = \mathrm{ELU}(W_{(i)} m_{(i)j} + o_{(i)j})$ and $b_{(i)j} = \frac{\exp(z_{(i)j})}{\sum_j \exp(z_{(i)j})}$. To give the model more historical knowledge, we then apply the attention weights $b_{(i)j}$ to the ratio LSTM hidden states and calculate the integrated weighted vector $j_{(i)t}$ as $j_{(i)t} = \sum_{j=t-M}^{t} b_{(i)j} l_{(i)j}$.

Fig. 9. SREA and SMRE Models

Next, we concatenate the integrated weighted vector $j_{(i)t}$ with the last hidden state of the pseudo-event BiLSTM layer to compute the context vector $C_{(i)t} = [j_{(i)t}; h_{(i)t}]$, where $[;]$ denotes concatenation. Then, the model computes the Firm Embedding $fe_{(i)t}$ based on information from both the ratio stream and the pseudo-event stream. Similar to the SEA model, the Firm Embedding is the input for the prediction of individual target variables in the dense layers, as $fe_{(i)t} = \mathrm{Dense}(C_{(i)t})$ and $y_{(i)p}^{t+H} = \mathrm{Dense}(fe_{(i)t})$. Similar to the SMR model, the SMRE model concatenates all ratios into $R_{(i)j}$ in the Ratio Input layer.

4.7 MREA: Multi-task Ratio Event Attention Integration Model
Just like the difference between the SR and MR models, two model components differ between the SREA and MREA models. First, in the Ratios Input layer in Figure 10, all ratios are concatenated before being fed into a dense vector in the $v_{(i)j} = \mathrm{Dense}(R_{(i)j})$ layer. Second, in the Multi-task Output layer, the shared company semantic representation, the Firm Embedding, is used to predict all target variables simultaneously in the $Y_{(i)}^{t+H} = \mathrm{Dense}(fe_{(i)t})$ layer.

Fig. 10. MREA Model

5 DATASET AND PREPROCESSING
5.1 Dataset
Although our framework applies to both public and private firms, we only have access to public firms' financial data. Hence, we focus on Fortune 1,000 companies in our study. For the news stream, we use news articles from a major business news service published between the years 2011 and 2015. (Here is another practical constraint we had to work with: though the models are capable of handling much larger text corpora, this is the maximum amount of news articles we were able to collect.) The descriptive statistics of the textual data are listed in Table 3. For the ratio stream, ticker IDs and financial ratios between the years 2011 and 2016 are collected from Wharton Research Data Services (WRDS).

Table 3. Textual Data Statistics

| Year | Number of Articles | Number of Pseudo-events | Number of Companies |
|---|---|---|---|
| 2011 | 18,615 | 92,707 | 707 |
| 2012 | 14,840 | 77,043 | 697 |
| 2013 | 58,211 | 289,007 | 819 |
| 2014 | 49,711 | 183,976 | 817 |
| 2015 | 39,705 | 135,936 | 810 |
| Total | 181,082 | 778,669 | 927 (union) |

5.2 Data Pre-processing
5.2.1 Event Pre-processing. We extract news content and publication time and parse the content into multiple sentences using Python NLTK. By consulting a pre-compiled gazetteer of company names, name variants, and abbreviations, we extract all sentences that contain mentions of each focal company, followed by simple noise-reduction heuristics. Each such sentence is treated as a pseudo-event. In the case of multiple in-list companies in one sentence, the corresponding pseudo-event participates in multiple companies' event sequences. After dropping firms with no news coverage, 927 companies remain in our data.
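The following is a simplified sketch of this extraction step, including the FOCOMP name normalization described immediately after. The two-entry gazetteer and the substring matching are toy stand-ins for the paper's compiled name lists and noise-reduction heuristics.

```python
# Simplified pseudo-event extraction: split articles into sentences with NLTK,
# keep sentences mentioning a gazetteer company, and normalize names to FOCOMP.
import nltk

nltk.download("punkt", quiet=True)

GAZETTEER = {                                  # company -> name variants (toy example)
    "Apple Inc.": ["Apple Inc.", "Apple"],
    "Expedia Group Inc.": ["Expedia Group", "Expedia"],
}

def extract_pseudo_events(article_text):
    events = {company: [] for company in GAZETTEER}
    for sentence in nltk.sent_tokenize(article_text):
        for company, variants in GAZETTEER.items():
            if any(v in sentence for v in variants):
                normalized = sentence
                for v in sorted(variants, key=len, reverse=True):
                    normalized = normalized.replace(v, "FOCOMP")
                events[company].append(normalized)   # joins this firm's event sequence
    return events

print(extract_pseudo_events("Apple jumped 5.7 percent. Markets were mixed."))
```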
Finally, we replace all company names and name variants with a special token 'FOCOMP' (meaning 'focal company') to help the model generalize across firms by sharing statistical strength among similar event embeddings.

5.2.2 Ratio Pre-processing. To tackle the missing value problem in the ratio data downloaded from WRDS, we first exclude companies with more than two missing values in the study period, then fill in the small number of remaining missing values by linear interpolation between neighboring values. After eliminating companies with too many missing values in firm financial ratios, 707 companies remain in our data.

5.2.3 Sentence (Pseudo-event) Encoding. We use the SIF sentence embedding method (Arora et al. 2017) to encode each extracted pseudo-event into a 300-dimensional dense vector.

5.2.4 Dataset Generation. To fully use all available data, we use a rolling window, advanced by one month, to generate instances for each company. As illustrated in Figure 11, instances whose forecasting periods fall between the years 2011 and 2014 are in the training set, and instances whose forecasting periods fall within the year 2015 are in the validation set. To avoid peeking into the future and to stay realistic to the real scenario, instances with a forecasting period at the end of 2016 are in the testing set.

Fig. 11. Model Validation and Testing

6 MODEL INSTANTIATION, TRAINING, AND EVALUATION
6.1 Model Instantiation
6.1.1 ARIMA Model Instantiation. The ARIMA model is well known to be suited to small memory sizes, so we look for the baseline model among a set of small-memory-size ARIMA models. To find the best-performing ARIMA model parameters, we experiment with combinations of the following parameter settings: the autoregressive term p in {1, 2, 3}, the differencing term d in {0, 1}, and the moving average term q in {0, 1, 2}. Among these variations, we find the best-performing configurations (p, d, q), shown in Table 4, in the validation phase and use them in the final testing.

Table 4. ARIMA Best-performing Parameters

| Target Variable | p | d | q |
|---|---|---|---|
| sale_nwc | 1 | 1 | 2 |
| pe_exi | 1 | 1 | 0 |
| de_ratio | 1 | 0 | 2 |
| cfm | 1 | 0 | 0 |
| npm | 2 | 0 | 0 |
| equity_invcap | 1 | 0 | 0 |
| cash_ratio | 2 | 0 | 0 |
| eps | 1 | 1 | 0 |
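A minimal sketch of this grid search is shown below. The `series` array is a toy stand-in for one firm's monthly ratio history, and the candidates are ranked by AIC to keep the example self-contained; the paper selects configurations on validation-phase performance instead.

```python
# Grid search over small ARIMA(p, d, q) configurations, mirroring Section 6.1.1.
import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=60))        # toy monthly ratio series

best_order, best_score = None, np.inf
for p, d, q in itertools.product([1, 2, 3], [0, 1], [0, 1, 2]):
    try:
        fit = ARIMA(series, order=(p, d, q)).fit()
    except Exception:
        continue                               # skip configurations that fail to converge
    if fit.aic < best_score:                   # ranked by AIC here, not validation MAPE
        best_order, best_score = (p, d, q), fit.aic

print("best (p, d, q):", best_order)
```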
6.1.2 VAR Model Instantiation. Similar to the ARIMA model's instantiation, we also need to find the best-performing VAR(p) model parameters. We experiment with the autoregressive term p in {1, 2, 3, 4, 5} and the differencing terms in {0, 1, 2}. Among these variations, we find that the best-performing model, VAR(1) with p = 1 and differencing d = 2, provides the lowest average error across variables in the validation phase, and we use it in the final testing.

6.2 Model Training
In principle, one can stack a large number of BiLSTM layers to build a deep neural network. For practical computational complexity reasons, we only stack two BiLSTM layers in our text-based models and three BiLSTM layers in our integration models. In the experiments, we define the time window $j$ as a month and set the memory size M = 12. In line with the vision of long-term forecasting illustrated in Figure 2, we set the forecasting horizon H = 12. We also set K = 5, the median number of pseudo-events per company per month; the K events are selected based on the L2 norms of the pseudo-event embeddings, sorted from largest to smallest. In all deep learning models, we use Exponential Linear Units (ELUs) (Clevert et al. 2016) as the activation function and employ the Adam optimizer (Kingma and Ba 2015).

6.3 Model Evaluation
We conduct the model evaluation shown in Figure 11 in the most stringent temporal out-of-sample fashion. Two phases are implemented in the entire process. In the validation phase, we use event data from 2011 to 2014 as training data and predict ratios during the year 2015. In this process, we also tune model hyper-parameters (e.g., the number of epochs and the learning rate) by validating against the target ratios. Once model parameters are determined based on validation performance, in the final testing phase, we retrain the same model using all data from the years 2011 to 2015 and evaluate the final prediction at the end of 2016. We employ the Mean Absolute Percentage Error (MAPE), $\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{Y_i - \hat{Y}_i}{Y_i} \right|$, as the loss function and use it as the performance measure. The final result is the mean of ten runs of each model configuration.

7 RESULTS, ANALYSIS, AND DISCUSSION
Our model performance is summarized in Table 5.

Table 5. Model Performance (MAPE, M=12, H=12)

| Output Variable | ARIMA | VAR | SR | SMR | MR | SEMax | SEA | MEAU | MEAW | SREA | SMRE | MREA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sale_nwc | 100.55% | 2885.92% | 25.79% | 27.18% | 30.11% | 57.64% | 55.95% | 52.10% | 52.61% | 25.57% | 27.36% | 32.51% |
| pe_exi | 100.28% | 1381.58% | 50.87% | 51.06% | 53.46% | 96.83% | 96.64% | 70.84% | 69.88% | 48.68% | 54.15% | 54.76% |
| de_ratio | 66.92% | 246.32% | 20.71% | 20.73% | 22.53% | 64.07% | 55.44% | 51.07% | 51.59% | 20.65% | 20.92% | 27.76% |
| cfm | 187.38% | 205.38% | 43.08% | 44.37% | 45.65% | 94.70% | 89.10% | 74.33% | 72.20% | 41.51% | 45.57% | 48.17% |
| npm | 163.54% | 232.25% | 69.59% | 71.69% | 73.07% | 112.55% | 95.85% | 104.89% | 98.67% | 65.66% | 79.95% | 85.31% |
| equity_invcap | 33.15% | 77.10% | 19.77% | 23.13% | 24.79% | 107.69% | 90.76% | 77.67% | 73.16% | 18.80% | 23.35% | 29.17% |
| cash_ratio | 58.65% | 144.58% | 32.73% | 38.33% | 42.22% | 102.11% | 85.33% | 90.50% | 81.61% | 32.45% | 38.86% | 51.46% |

7.1 Time Series Models (ARIMA and VAR)
The time series models ARIMA and VAR are baseline models that operate on financial ratios. While the ARIMA model takes one ratio series at a time (single-variable input and single-variable output), the VAR model works on multiple ratio series simultaneously (multiple-variable input and multiple-variable output). Our experimental results show that the ARIMA model performs better than VAR, which indicates that the interactions among all variables can negatively influence the prediction of individual target variables. Therefore, modeling all variables at the same time is not beneficial in our task. Furthermore, considering the higher differencing value (d = 2) in the VAR(1) model, we can also see that making the VAR model stationary is more challenging. Our observations are consistent with the findings of the previous literature (Bagshaw 1986, Litterman 1986).

7.2 Ratio-based Deep Learning Models (SR, SMR, and MR)
When comparing the ratio-based deep learning models and the time series models, we observe that the deep learning models produce much stronger performance than the traditional time series models, thanks to their non-linearity and larger memory. The results suggest that when forecasting performance is the top priority, deep learning models should be the preferred tool for numerical time series. Among the SR, SMR, and MR models' performance, we notice again that lumping all variables together in the input is not a good approach, even in the more flexible deep learning architectures.

So far, we have observed that traditional time series models are outperformed by their deep learning counterparts. More importantly, both groups of models lack the capacity to consume textual data.
To the latter point, we witness the unique power of text in the event-based deep learning models: SEMax, SEA, MEAU, and MEAW.

7.3 Event-based Deep Learning Models (SEMax, SEA, MEAU, and MEAW)
While the event-based deep learning models have no access to numerical ratio data, they show very competitive forecasting performance. This indicates that even when no numerical data is available, predicting firm financial ratios from textual data is still feasible. The event-based models' performance is stronger than that of the traditional time series models on 5 out of 7 target ratios (sale_nwc, pe_exi, de_ratio, cfm, and npm). For the remaining two target variables (equity_invcap and cash_ratio), the ARIMA model produces lower MAPE than the event-based deep learning models. This outcome suggests that different dynamics underlie different financial ratios. Equity_invcap and cash_ratio are capitalization- and liquidity-category variables. Capitalization and liquidity ratios are tied to the firm's internal operations. Therefore, they are more stable, receive less news coverage, and are more amenable to an inherently linear model such as ARIMA. In contrast, the five outperforming ratios all involve sales or the stock market. They are more volatile, receive more news coverage, and consequently see more success with the news-event-based non-linear deep learning models.

Also, among the event-based deep learning models, the SEA model consistently outperforms the SEMax model. This emphasizes the superiority of the attention mechanism in SEA over the relatively naive max-pooling trick in SEMax. Furthermore, the MEAU model (in which each task-specific loss gets an equal weight of 1) shares representations among multiple ratios and yields further performance gains over the SEA model on 5 out of 7 target ratios. To further boost the multi-task learning performance, we assign higher weights to the losses of the two "harder" target ratios, npm and cash_ratio. When moderate over-weighting is in place (roughly between 2 and 5), we observe another wave of error reduction in the MEAW model. When comparing the performance of the multi-task deep learning models with the VAR model, we notice that in the joint modeling of multiple target ratios, the news-event-induced shared representation is considerably superior to the over-parameterized interactions in the VAR model.

7.4 Ratio-Event Integration Deep Learning Models (SREA, SMRE, and MREA)
The SREA model's performance is lifted over SR by bringing in the news data. The most benefited target ratios are the price-to-earnings ratio (pe_exi), cash flow margin (cfm), and net profit margin (npm), variables that happen to be among the winning ones from the event-based deep learning models. This strengthens our observation of the power of news data. On the other hand, the inferior performance of the SMRE and MREA models further confirms that lumping all ratio series together as model input is not a good idea, in the same way it is not a good idea in VAR and MR. Overall, SREA, which combines news data with individual ratio series, is the winning model across all model families.

7.5 Model Findings
We summarize our major findings below:
• When working with numerical series alone, deep learning models outperform traditional time series models.
• When numerical data is not available, event-based deep learning models leverage the attention mechanism and the multi-task learning framework to achieve competitive forecasting performance.
• When both numerical and event data are available, ratio-event integration models perform the best in the single-task framework.
• Across all model families, uni-variate models outperform multi-variate models. Multi-task learning is only beneficial for event-based deep learning models, owing to the powerful shared representations induced from textual data.

8 MODEL EXPLAINABILITY INSIGHTS
Deep learning models are often criticized as being big black boxes. While such criticism is not entirely unreasonable considering the nonparametric nature of deep learning models, this section represents our effort to open the boxes and gain insights into their inner workings. More specifically, we extract pseudo-event attention weights and firm embeddings at the testing phase, and we visualize and analyze them. To preserve the purity of each target variable and focus on how news articles perform on one ratio at a time, we use the SEA model to derive the interpretability insights.

8.1 Event Attention Map
The introduced attention mechanism provides an excellent form of model interpretability, in the sense that the normalized attention scores quantify the relative importance of the pseudo-events within each time window. We extract these scores from the corresponding pseudo-events in the Pseudo-event Attention layer of the proposed models. In this example, we work with the SEA model for the ratio sale_nwc and focus on the company Apple Inc. A much more detailed example on Expedia is presented in Appendix F. We present the overview of the pseudo-event attention map in Figure 12 and zoom into the specifics in Table 6 and Table 7.

In Table 6, the most-attended pseudo-event is #4, which discusses Apple Inc.'s patent filing two years earlier. Intuitively, the event likely creates a positive long-term impact on the firm's financial well-being. The model correctly assigns it a higher weight. Table 7 highlights a very interesting mistake made by the model. Pseudo-event #1 was deemed relevant to Apple Inc. by the event extraction algorithm due to company name matching and was assigned higher weight by the attention mechanism due to semantic matching (it is undoubtedly sales-related). The only problem is that the sentence actually talks about Samsung (at the time, a supplier of chips used in Apple devices), the correct understanding of which can only arise from a broader context and requires NLP capabilities arguably beyond state-of-the-art models (e.g., anaphora resolution across sentence boundaries).

8.2 Firm Embeddings
Another useful insight from our model is the firm embeddings. Technically, each firm embedding is a semantic representation of the firm learned by the model. Understandings of these embeddings are usually derived from firms' relative positioning in a high-dimensional space, typically visualized in a 2-D space. The notion of proximity can intuitively guide peer benchmarking for internal management and portfolio building for outside investors.

Fig. 12. Events Attention Map Visualization Example: Apple Inc. Overview

Table 6. Events Attention Map Example: sale_nwc, Apple Inc., Month 1

| Event | Attention weight | Text |
|---|---|---|
| 1 | 0.17 | Apple's iPhone and iPad have won over users in recent years. |
| 2 | 0.18 | Apple was third with 10.6 percent. |
| 3 | 0.17 | Apple, the largest company by market value, reports results. |
| 4 | 0.27 | Cupertino, California-based Apple filed the patent application in June 2013. |
| 5 | 0.20 | Apple jumped 5.7 percent. |
Table 7. Events Attention Map Visualization Example: Apple Inc., Month 11

| Event | Attention weight | Text |
|---|---|---|
| 1 | 0.29 | In addition to sales from its own handsets, it gains revenue from each iPhone sold because its chip unit manufactures the main processor used in Apple's phones and tablets. |
| 2 | 0.21 | These stock grants are meant to reward them down the road for their hard work in helping to keep Apple the most innovative company in the world. |
| 3 | 0.25 | The company sought an order for Apple to produce the documents. |
| 4 | 0.12 | Apple, ..., added 2.5 percent to $388.83. |
| 5 | 0.12 | Apple, meanwhile, opened its iTunes store in 2003. |

A traditional way to understand firm relationships is through industry segmentation. Therefore, we refer to the Fama-French 5-industry portfolios (Fama and French 2019) in Table 8 for each firm's industry. Since we replaced all company names with the special token 'FOCOMP', our model generates firm embeddings without knowing the company, and certainly without knowing any industry membership. We extract each firm embedding by capturing the activations at the final Dense layer and use t-SNE (van der Maaten and Hinton 2008) to visualize the firm embeddings in a 2-D space.

Table 8. Fama-French 5-Industry Portfolios

| Industry Portfolio | Composition |
|---|---|
| Consumer | Consumer Durables, NonDurables, Wholesale, Retail, and Some Services |
| Manufacturing | Manufacturing, Energy, and Utilities |
| HiTech | Business Equipment, Telephone and Television Transmission |
| Healthcare | Healthcare, Medical Equipment, and Drugs |
| Others | Mines, Construction, Building Maintenance, Transportation, Hotels, Business Services, Entertainment, and Finance |

We demonstrate firm embeddings encoded by two different ratios from the SEA model to reflect the power of text. The clustering phenomena in Figure 13 and Figure 14 show that the firm embeddings are trained from text in a non-trivial manner. The firm embeddings are trained from the textual information in the news that is relevant to the targeted ratio, and the sentence weights are assigned during model training. The clustering behavior is a product of model training and demonstrates what has been learned in the training process. In addition, as indicated in Figure 14, the target ratio values of similar companies are not necessarily in a narrow range.

8.2.1 Sale_nwc-encoded firm embeddings. The firm embeddings encoded by sale_nwc are shown in Figure 13. We have two main observations. First, the overall firm embedding space agrees with the Fama-French industry segmentation quite well. Most consumer companies (in blue) are clustered in the top portion of the graph: one big cluster is in the upper-right corner (region 1), and a smaller crowd is in the mid-upper region (region 2). The healthcare industry (in gold) and the HiTech industry (in green) are clustered in the bottom-left corner (region 3). The manufacturing companies are scattered across the space. Second, when there is a disagreement, the firm embeddings learned by the deep learning structures can pick up interesting semantics. For example, although Netflix (in purple) is in the Fama-French 'Others' category, its firm embedding is situated very close to Walt Disney and CBS (in green) in region 4, likely due to their similar product offerings. Conversely, though officially classified in the HiTech industry, Walt Disney and CBS are located much closer to companies in the consumer industry than to typical HiTech companies (e.g., Google and Microsoft in region 3). Another example is Nike (in region 2), which, though it officially belongs to the manufacturing industry, falls into the consumer industry's neighborhood. Practically, its "persona" in public perception does appear to be more like Macy's and Foot Locker than Caterpillar.
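The following is a sketch of how such a visualization can be produced: capture the activations at the final Dense layer and project them to 2-D with t-SNE. The random array below is a stand-in for the learned embeddings of the 927 firms, and the embedding width is a hypothetical value.

```python
# Sketch of the firm-embedding visualization: final-Dense activations -> 2-D t-SNE.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
firm_embeddings = rng.normal(size=(927, 128))   # stand-in for captured activations

coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(firm_embeddings)
# coords gives the 2-D positions of the kind plotted in Figures 13 and 14;
# proximity in this space suggests firms with similar text-derived semantics.
```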
8.2.2 De_ratio-encoded firm embeddings. We visualize the firm embeddings encoded by de_ratio in Figure 14 (debt ratio values are shown in the figure where applicable) and make the following observations:
• Retail companies (region 3), such as Wal-Mart, Costco, Best Buy, the Home Depot, and Target, have similar debt ratios, and our firm embeddings indeed place them close together.
• The solvency situations of HiTech companies (region 2) are relatively similar. Hence, they are clustered together, with IBM being an exception (our models did not succeed in forecasting its relatively high debt ratio).
• Berkshire Hathaway (region 1), surprisingly, is quite far away from the other financial institutions. Does this make sense? As it turns out, its debt ratio (1.18) is much lower than those of companies such as Citigroup (7.44), JPMorgan Chase (9.49), and Morgan Stanley (10.12). The reason is that Berkshire Hathaway is known for its conservative use of debt, as its chairman and CEO Warren Buffett believes in running businesses with as little debt as possible. Our model picks up this semantics precisely from text.

To further examine and compare model behavior, we use the dense layer before the prediction layer of the ratio-only SR model to produce the corresponding plots for each target variable. The results show that the ratio-only models do not provide the interpretable and meaningful firm clusters that the text-only SEA models do. The reason is that the firm embeddings from the text-only model, and their corresponding clustering behaviors, emerge from text: a firm's relative location is determined by the textual information about the firm that is applicable to the targeted ratio. For example, the clustering behavior in Figure 13 emerges from textual information related to sales and net working capital, and that in Figure 14 emerges from text relevant to debt.

The above examples show that our news-event-powered deep learning models learn sophisticated firm semantic representations, much better than hard industry segmentation, which otherwise requires non-trivial knowledge of firms and industries. Meanwhile, the non-trivial clusterings show the value of text and the interpretable, per-variable insights provided by the text models. Such interpretable and dynamic clustering phenomena do not exist when working with ratios alone.

8.3 Case Study: Expedia Group Inc.
In this example, we work with two target ratios - cfm (cash flow margin) and npm (net profit margin) - to show their event attention maps. In addition, we demonstrate how the event attention maps can serve as sense-making tools to facilitate concept, impact factor, and hypothesis generation, in Figure 17. We provide the case study details in Appendix F.

8.4 Practical Relevance
Looking at the interpretation of the model results, although the relationship is not causal, we observe that our proposed text-based models provide better predictions when the predicted ratio falls outside the range of historical values (supporting details in Appendix G). In other words, if a firm experiences large swings in certain business aspects, such as sales, earnings, or operations, using text data will help the decision-maker make more accurate predictions.
Meanwhile, while our model can be used universally on all companies, it is especially beneficial for industries with long-term business cycles, such as the health industry. We also observe that companies with higher market value tend to have more media coverage, and continuous media coverage helps the model extract useful information consistently. In addition, the high-attention sentences learned by the text model can help decision-makers distinguish highly relevant information automatically, effectively, and in a timely manner. The customized firm clusters generated from our firm embeddings will quickly help decision-makers grasp non-obvious and non-trivial company similarities in a graph representation (more details in Appendix H and Appendix I). The graphs are user-friendly and intuitive, and each figure is customized for one ratio. With this high emphasis on individual ratios, these graphs support decision-makers in making clear interpretations and better goal-oriented decisions. Moreover, in the case study of Expedia Group Inc., we provide the sentence weights on different ratios for comparison. The business insights offered can help decision-makers automatically identify highly relevant sentences and information on a broad spectrum of variables.

Fig. 13. Firm Embedding Encoded by sale_nwc
Fig. 14. Firm Embedding Encoded by de_ratio

Finally, the abstract "theory-alike" interpretation derived from the high-attention sentences, shown in Figure 17, can help decision-makers make logical inferences about various events and business aspects. As a result, decision-makers can use the various tools provided in this study to support their decision-making under multiple and different real-world business scenarios.

9 CONCLUSIONS
This paper shows the effectiveness of deep text mining models in forecasting firms' long-term financial performance, driven by firm-specific pseudo-events. Overall, the news-event-powered deep learning models yield impressively competitive forecasting performance compared to standard time series models based on historical accounting data. We consistently demonstrate the power of text across multiple model families, whether accounting data is present or absent. In addition, we offer two model insights, the events attention map and the firm embedding, toward model interpretation, transparency, and accountability, especially when numerical data is not available.

Our work has several considerable implications for firm stakeholders, the financial industry, and the research community. Internally, the forecasting models and their artifacts provide decision support for firm executives performing various tasks, such as strategic planning, financial forecasting, and peer benchmarking. Externally, institutional investors can benefit from the models' predictive outcomes and explanatory insights. Such enhanced transparency and accountability are even more critical when dealing with private firms with no duty to disclose their accounting books. Main Street investors usually are not equipped or inclined to conduct sophisticated quantitative analysis based on (arguably clean) numerical accounting data. Our models and artifacts provide previously non-existent, interpretable decision aid without compromising prediction quality.
Third-party regulators and agencies represent another family of beneficiaries. For example, S&P Global Ratings' analyses can leverage the enriched information channel and its interplay with multi-faceted financial pictures, both of which are essential components in our framework. On the methodological front, our problem formulation and model architecture are problem-independent; therefore, they have general applicability in solving a broad spectrum of economic forecasting problems. Finally, artifacts derived from our models can help members of the business research community, in particular theory-minded researchers, generate new concepts and hypotheses, and even conduct preliminary theoretical explorations.

Although we present an early attempt at bringing large-scale textual data and state-of-the-art deep learning models into financial forecasting, our work is not without limitations, and several directions can be extended. First, the success of our models relies heavily on the quality of the input; room for improvement exists in the event extraction and event encoding techniques. Second, other types of non-numerical data can participate in the modeling process, e.g., the textual portions of SEC filings, sell-side reports, etc. Third, the multi-task learning potential is yet to be fully realized by designing more educated information-sharing model architectures. Fourth, the generalizability of our model architecture will need to be fully explored in broader economic forecasting settings.

One final note: this research does not aim to establish a horse race and declare a winner between econometric models and AI-based models. While ARIMA or VAR is certainly not the former's apex, our deep learning-based models also have great potential for improvement. What we strive for, and hope to have achieved, is to illustrate the empirical power of textual data in economic forecasting and to demonstrate the corresponding modeling capabilities. In practice, sophisticated researchers and agencies will likely exploit both families of models. The linear form of ARIMA models and their coefficients are deemed interpretable by many researchers, but this is not necessarily the best genre of interpretability for mathematically unsophisticated laypeople. Though we do not intend to engage in a philosophical debate around model interpretability, we believe our model artifacts provide alternative avenues to this important notion.

REFERENCES
Ahmed Abbasi, Jingjing Li, Donald Adjeroh, Marie Abate, and Wanhong Zheng. 2019. Don't Mention It? Analyzing User-Generated Content Signals for Early Adverse Event Warnings. Information Systems Research 30, 3 (2019), 1007–1028.
Alan S Abrahams, Weiguo Fan, G Alan Wang, Zhongju Zhang, and Jian Jiao. 2015. An integrated text analytic framework for product defect discovery. Production and Operations Management 24, 6 (2015), 975–990.
Ronald W Anderson and Malika Hamadi. 2016. Cash holding and control-oriented finance. Journal of Corporate Finance 41 (2016), 410–425.
Werner Antweiler and Murray Z Frank. 2004. Is all that talk just noise? The information content of internet stock message boards. The Journal of Finance 59, 3 (2004), 1259–1294.
Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. A simple but tough-to-beat baseline for sentence embeddings. Proceedings of the International Conference on Learning Representations (2017).
Michael Bagshaw. 1986. Comparison of univariate ARIMA, multivariate ARIMA and vector autoregression forecasting.
Federal Reserve Bank of Cleveland Working Papers WP 86-02 (1986).
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. Proceedings of the International Conference on Learning Representations (2015).
Gustaf Bellstam, Sanjai Bhagat, and J Anthony Cookson. 2017. Innovation in Mature Firms: A Text-Based Analysis. SSRN (2017).
Maria Elena Bontempi, Laura Bottazzi, and Roberto Golinelli. 2020. A multilevel index of heterogeneous short-term and long-term debt dynamics. Journal of Corporate Finance 64 (2020), 101666.
Rich Caruana. 1997. Multitask Learning. Machine Learning 28, 1 (1997), 41–75. https://doi.org/10.1023/A:1007379606734
Changling Chen, Jeong-Bon Kim, and Li Yao. 2017. Earnings smoothing: Does it exacerbate or constrain stock price crash risk? Journal of Corporate Finance 42 (2017), 36–54.
Chen-Huei Chou, Atish P Sinha, and Huimin Zhao. 2010. A hybrid attribute selection approach for text classification. Journal of the Association for Information Systems 11, 9 (2010), 1.
Jonathan Clarke, Hailiang Chen, Ding Du, and Yu Jeffrey Hu. 2020. Fake news, investor attention, and market reaction. Information Systems Research (2020).
Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. 2016. Fast and accurate deep network learning by exponential linear units (ELUs). Proceedings of the International Conference on Learning Representations (2016).
Alex Coad. 2010. Exploring the processes of firm growth: evidence from a vector auto-regression. Industrial and Corporate Change 19, 6 (2010), 1677–1703.
Sanjiv R Das and Mike Y Chen. 2007. Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science 53, 9 (2007), 1375–1388.
Deloitte. 2017. Report: Fintech by the numbers.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. 2015. Deep learning for event-driven stock prediction. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
Ranjan D'Mello, Mark Gruskin, and Manoj Kulchania. 2018. Shareholders' valuation of long-term debt and decline in firms' leverage ratio. Journal of Corporate Finance 48 (2018), 352–374.
Hamdi Driss, Wolfgang Drobetz, Sadok El Ghoul, and Omrane Guedhami. 2021. Institutional investment horizons, corporate governance, and credit ratings: International evidence. Journal of Corporate Finance 67 (2021), 101874.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-Based Dependency Parsing with Stack Long Short-Term Memory. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Beijing, China, 334–343. http://www.aclweb.org/anthology/P15-1033
Robert O Edmister. 1972. An empirical test of financial ratio analysis for small business failure prediction. Journal of Financial and Quantitative Analysis 7, 2 (1972), 1477–1493.
Joseph Engelberg, R David McLean, and Jeffrey Pontiff. 2018. Anomalies and news. The Journal of Finance 73, 5 (2018), 1971–2001.
Eugene F. Fama and Kenneth R. French. 2019. Detail for 5 Industry Portfolios. http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/det_5_ind_port.html.
Lily Fang and Joel Peress. 2009. Media coverage and the cross-section of stock returns. The Journal of Finance 64, 5 (2009), 2023–2052.
Murray Z Frank and Tao Shen. 2016. Investment and the weighted average cost of capital. Journal of Financial Economics 119, 2 (2016), 300–315.
Diego Garcia. 2013. Sentiment during recessions. The Journal of Finance 68, 3 (2013), 1267–1300.
Matthew Gentzkow, Bryan T Kelly, and Matt Taddy. 2019. Text as Data. Journal of Economic Literature (2019).
Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. 2013. Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 6645–6649.
Richard Gruss, Alan S Abrahams, Weiguo Fan, and G Alan Wang. 2018. By the numbers: The magic of numerical intelligence in text analytic systems. Decision Support Systems 113 (2018), 86–98.
Jarrad Harford, Ambrus Kecskés, and Sattar Mansi. 2018. Do long-term investors improve corporate decision making? Journal of Corporate Finance 50 (2018), 424–452.
Paul Hemp. 2009. Death by information overload. Harvard Business Review 87, 9 (2009), 82–89.
Gerard Hoberg and Gordon Phillips. 2018. Conglomerate industry choice and product language. Management Science 64, 8 (2018), 3735–3755.
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
Tadaaki Hosaka. 2019. Bankruptcy prediction using imaged financial ratios and convolutional neural networks. Expert Systems with Applications 117 (2019), 287–299.
Ziniu Hu, Weiqing Liu, Jiang Bian, Xuanzhe Liu, and Tie-Yan Liu. 2018. Listening to chaotic whispers: A deep learning framework for news-oriented stock trend prediction. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 261–269.
Charles Kang, Frank Germann, and Rajdeep Grewal. 2016. Washing away your sins? Corporate social responsibility, corporate social irresponsibility, and firm performance. Journal of Marketing 80, 2 (2016), 59–79.
Hyung Cheol Kang, Robert M Anderson, Kyong Shik Eom, and Sang Koo Kang. 2017. Controlling shareholders' value, long-run firm value and short-term performance. Journal of Corporate Finance 43 (2017), 340–353.
Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. Proceedings of the International Conference on Learning Representations (2015).
Sandy Klasa, Hernan Ortiz-Molina, Matthew Serfling, and Shweta Srinivasan. 2018. Protection of trade secrets and capital structure decisions. Journal of Financial Economics 128, 2 (2018), 266–286.
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444. https://doi.org/10.1038/nature14539
Kai Li, Feng Mai, Rui Shen, and Xinyan Yan. 2018. Measuring Corporate Culture Using Machine Learning. SSRN (2018).
Xiao-Bai Li and Jialun Qin. 2017. Anonymizing and sharing medical text records. Information Systems Research 28, 2 (2017), 332–352.
Fengyi Lin, Deron Liang, and Enchia Chen. 2011. Financial ratio selection for business crisis prediction. Expert Systems with Applications 38, 12 (2011), 15094–15102.
Walter Lippmann. 1946. Public Opinion. Vol. 1. Transaction Publishers.
Robert B Litterman. 1986. A statistical approach to economic forecasting. Journal of Business & Economic Statistics 4, 1 (1986), 1–4.
Zhuang Liu, Degen Huang, Kaiyu Huang, Zhuang Li, and Jun Zhao. 2020.
FinBERT: A pre-trained financial language representation model for financial text mining. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI. 5–10.
Tim Loughran and Bill McDonald. 2016. Textual analysis in accounting and finance: A survey. Journal of Accounting Research 54, 4 (2016), 1187–1230.
Rui Luo, Weinan Zhang, Xiaojun Xu, and Jun Wang. 2018. A Neural Stochastic Volatility Model. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence.
Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective Approaches to Attention-based Neural Machine Translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Lisbon, Portugal, 1412–1421. https://doi.org/10.18653/v1/D15-1166
Ronny Luss and Alexandre Aspremont. 2015. Predicting abnormal returns from news using text classification. Quantitative Finance 15, 6 (2015), 999–1012.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and Their Compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems (Lake Tahoe, Nevada). 3111–3119.
Marc-Andre Mittermayer and Gerhard F Knolmayer. 2006. NewsCATS: A news categorization and trading system. In Sixth International Conference on Data Mining. IEEE, 1002–1007.
Makoto Miwa and Mohit Bansal. 2016. End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, 1105–1116. http://www.aclweb.org/anthology/P16-1105
Pisut Oncharoen and Peerapon Vateekul. 2018. Deep learning using risk-reward function for stock market prediction. In Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence. 556–561.
Alan Pankratz. 1983. Forecasting with Univariate Box-Jenkins Models: Concepts and Cases. John Wiley & Sons.
Gautam Pant and Olivia RL Sheng. 2015. Web footprints of firms: Using online isomorphism for competitor identification. Information Systems Research 26, 1 (2015), 188–209.
Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1532–1543.
David T Robinson and Berk A Sensoy. 2016. Cyclicality, performance measurement, and cash flow liquidity in private equity. Journal of Financial Economics 122, 3 (2016), 521–543.
Bryan R Routledge, Stefano Sacchetto, and Noah A Smith. 2016. Predicting merger targets and acquirers from text. Working Paper (November 2016).
David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1986. Learning representations by back-propagating errors. Nature 323 (1986), 533–536.
Robert P Schumaker and Hsinchun Chen. 2009. Textual analysis of stock market prediction using breaking financial news: The AZFin Text system. ACM Transactions on Information Systems (TOIS) 27, 2 (2009), 12.
Paul C Tetlock. 2007. Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance 62, 3 (2007), 1139–1168.
Paul C Tetlock, Maytal Saar-Tsechansky, and Sofus Macskassy. 2008. More than words: Quantifying language to measure firms' fundamentals. The Journal of Finance 63, 3 (2008), 1437–1467.
Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9 (2008), 2579–2605.
Manuel R Vargas, Carlos EM dos Anjos, Gustavo LG Bichara, and Alexandre G Evsukoff. 2018. Deep learning for stock market prediction using technical indicators and financial news articles. In 2018 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–8.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998–6008.
Di Wang and Eric Nyberg. 2015. A long short-term memory model for answer sentence selection in question answering. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics, Beijing, China, 707–712. http://www.aclweb.org/anthology/P15-2116
William Yang Wang and Zhenhao Hua. 2014. A semiparametric Gaussian copula regression model for predicting financial risks from earnings calls. In Association for Computational Linguistics. 1155–1165.
Eric Weisbrod. 2018. Stockholders' unrealized returns and the market reaction to financial disclosures. The Journal of Finance (2018).
Yumo Xu and Shay B Cohen. 2018. Stock movement prediction from tweets and historical prices. In Association for Computational Linguistics, Vol. 1. 1970–1979.
Emin Zeytinoglu and Yasemin Deniz Akarim. 2013. Financial failure prediction using financial ratios: An empirical application on Istanbul Stock Exchange. Journal of Applied Finance and Banking 3, 3 (2013), 107.
Shihao Zhou, Zhilei Qiao, Qianzhou Du, G Alan Wang, Weiguo Fan, and Xiangbin Yan. 2018. Measuring customer agility from online reviews using big data text analytics. Journal of Management Information Systems 35, 2 (2018), 510–539.

A STOCK MARKET VARIABLES AS DEPENDENT VARIABLE

Table 9. Stock Market: Theory Testing Literature

Dependent Variable | Input Genre | Primary Independent Variables | Methodology | Citation
Stock return | public news | anomaly variables, information-day dummy variables | linear regression | (Engelberg et al. 2018)
Stock return, market volume | public news | news sentiment | vector auto-regressive (VAR) model | (Tetlock 2007)
Earning, stock return | public news | news sentiment | linear regression | (Tetlock et al. 2008)
Stock return | public news | media coverage | CAPM, F-F 3-factor model, Carhart 4-factor model, Pastor-Stambaugh liquidity model | (Fang and Peress 2009)
Stock return | public news | news sentiment (fraction of positive and negative words) | time series regression, GARCH model | (Garcia 2013)
Abnormal trading volume, abnormal return | earning announcement | capital gain overhang, unexpected earnings, idiosyncratic return volatility | Cox proportional hazard rate model, Fama-French-momentum 4-factor model | (Weisbrod 2018)
Stock price, return, volatility, trading volume | message board | message sentiment, disagreement, message volume | classifier ensembles, linear regression | (Das and Chen 2007)
Trading volume, volatility | message board | number of messages, message sentiment and the agreement index | naive Bayes classifier, linear regression | (Antweiler and Frank 2004)
Table 10. Stock Market: Prediction-related Literature

Output Variable | Input Genre | Input Variables | Method | Citation
stock price, price movement, return | public news | bag-of-words features, noun phrases, named entities | support vector machine (SVM) | (Schumaker and Chen 2009)
stock price movement | public news | event extraction, event embedding | convolutional neural network (CNN) | (Ding et al. 2015)
stock price movement | public news | word embedding | gated recurrent units (GRU) | (Hu et al. 2018)
stock price movement | press release | bag-of-words features | Rocchio, k Nearest Neighbors (kNN), linear SVM, non-linear SVM | (Mittermayer and Knolmayer 2006)
stock price movement | press release | bag-of-words features | support vector machine (SVM) | (Luss and Aspremont 2015)
stock price movement | news headline | word embedding, historical price | convolutional neural network (CNN) and long short-term memory network (LSTM) | (Oncharoen and Vateekul 2018)
stock price movement | news title | word embedding, technical indicators | convolutional neural network (CNN) and long short-term memory network (LSTM) | (Vargas et al. 2018)
stock price volatility | earnings calls | unigrams, bigrams, named entity, part-of-speech features | semiparametric Gaussian copula regression | (Wang and Hua 2014)
stock price movement | social media | word embedding | generative recurrent neural network (RNN) | (Xu and Cohen 2018)
stock price volatility | time series | historical stock price | generative neural stochastic model | (Luo et al. 2018)

B FIRM FINANCIAL RATIO AS DEPENDENT VARIABLE

Table 11. Firm Financial Ratios: Theory Testing Literature

Dependent Variable | Input Format | Primary Independent Variables | Methodology | Citation
Net book leverage, net market leverage | numerical | recognition of Inevitable Disclosure Doctrine (IDD), profit margin | difference-in-differences, difference-in-difference-in-differences, linear probability model | (Klasa et al. 2018)
Net cash flow as percentage of committed capital, net cash flow | numerical | price to dividend ratio, yield spread | linear regression | (Robinson and Sensoy 2016)
Cash holding, Tobin's Q | numerical | capital expenditures, working capital, cash flow, debt | linear regression | (Anderson and Hamadi 2016)
Cost of equity | numerical | cash flow to capital ratio | linear regression, CAPM, F-F 3-factor model, Carhart 4-factor model | (Frank and Shen 2016)
Tobin's Q, EBIT/TA | numerical | cash flow, R&D expenses, capital expenditure, debt to asset ratio | linear regression | (Kang et al. 2017)
Daily stock return | numerical | earnings smoothing | Jones model, linear regression | (Chen et al. 2017)

C AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODEL (ARIMA)

The ARIMA model has three components: an autoregression (AR) component on the variable itself, a differencing (I) component if the time series is not stationary, and a moving average (MA) component on the error terms. In an ARIMA(p, d, q) model, p denotes the order of the autoregression (AR), d the order of differencing (I), and q the order of the moving average (MA). In Equations 6 and 7, L is the lag operator and $\epsilon_t$ is a white noise process.

$$\Big(1-\sum_{i=1}^{p}\phi_i L^i\Big)(1-L)^d\,Y_t = \Big(1-\sum_{i=1}^{q}\theta_i L^i\Big)\epsilon_t \qquad (6)$$

$$\epsilon_t \sim N(0, \sigma^2) \qquad (7)$$

If we use the Box-Jenkins backshift operator B, Equation 6 can be written as

$$\phi_p(B)\,(1-B)^d\,Y_t = \theta_q(B)\,\epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2) \qquad (8)$$

where B is the backshift operator, $\phi_p(B) = 1-\phi_1 B-\phi_2 B^2-\cdots-\phi_p B^p$, and $\theta_q(B) = 1-\theta_1 B-\theta_2 B^2-\cdots-\theta_q B^q$. Equation 6 and Equation 8 can also be written as

$$\phi_p(B)\,\nabla^d Y_t = \theta_q(B)\,\epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2) \qquad (9)$$

or

$$\phi(B)\,\nabla^d Y_t = \theta(B)\,\epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2) \qquad (10)$$

where $\nabla = 1-B$ is the difference operator, $\phi(B) = \phi_p(B) = 1-\sum_{i=1}^{p}\phi_i B^i$ is the p-order AR operator, $\theta(B) = \theta_q(B) = 1-\sum_{j=1}^{q}\theta_j B^j$ is the q-order MA operator, and $\epsilon_t$ is a white noise process.
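To make the notation concrete, here is a minimal Python sketch (not the paper's experimental code) that fits an ARIMA(1, 1, 1) model to a toy ratio series with statsmodels and forecasts the next four periods; the series and the chosen (p, d, q) orders are illustrative assumptions.

```python
# Minimal sketch: fitting ARIMA(p, d, q) to a single firm's ratio series.
# The toy series and the (1, 1, 1) orders are illustrative only.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
ratio_series = 0.1 + rng.normal(0, 0.02, size=40).cumsum()  # toy non-stationary series

model = ARIMA(ratio_series, order=(1, 1, 1))  # p=1 AR lag, d=1 difference, q=1 MA lag
fitted = model.fit()

print(fitted.forecast(steps=4))  # forecasts for the next four periods
```

In practice the orders would be chosen via the usual Box-Jenkins identification steps (stationarity checks, ACF/PACF inspection) or an information criterion.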
D VECTOR AUTOREGRESSION MODEL (VAR)

The Vector Autoregression (VAR) model is a multivariate time series model that describes the interdependency between participating time series that influence each other. Each time series is described by its own lag values, the other time series' lag values, and an error term; lag values are represented by autoregression (AR) terms. For example, the VAR model was used to identify the relationship among firm employment growth, sales growth, and profits growth (Coad 2010), and the relationship among corporate social responsibility, corporate social irresponsibility, and firm performance (Kang et al. 2016). In a VAR model, p denotes the order of the autoregression and k denotes the number of variables. We can use Equation 11 to illustrate their interactions:

$$Y_t = A_1 Y_{t-1} + A_2 Y_{t-2} + \cdots + A_p Y_{t-p} + c + \epsilon_t \qquad (11)$$

where each $A_i$ is a time-invariant k x k matrix, c is a constant vector, and $\epsilon_t$ is an error term satisfying $E(\epsilon_t) = 0$ with no correlation across time. Taking k = 2 and p = 1 as an example, the VAR(1) model can be represented as

$$Y_{1,t} = a_{1,1} Y_{1,t-1} + a_{1,2} Y_{2,t-1} + c_1 + \epsilon_{1,t} \qquad (12)$$

$$Y_{2,t} = a_{2,1} Y_{1,t-1} + a_{2,2} Y_{2,t-1} + c_2 + \epsilon_{2,t} \qquad (13)$$
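As a concrete illustration of Equations 12 and 13, the following hedged sketch fits a VAR(1) on two toy series with statsmodels; the series names (sales_growth, profit_growth) are hypothetical and not taken from the paper's data.

```python
# Minimal sketch: a VAR(1) with k = 2 variables, mirroring Equations 12-13.
# The two toy growth series are illustrative only.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(1)
data = pd.DataFrame({
    "sales_growth": rng.normal(0, 1, 100),
    "profit_growth": rng.normal(0, 1, 100),
})

results = VAR(data).fit(1)                 # p = 1 lag
print(results.coefs)                       # A_1: the 2 x 2 coefficient matrix
print(results.forecast(data.values[-results.k_ar:], steps=4))  # 4-step forecast
```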
E LONG SHORT-TERM MEMORY NETWORKS (LSTMS)

Each LSTM cell has an input gate $i_t$, an output gate $o_t$, and a forget gate $f_t$ at each time step t. The output vector of the cell at time t, $h_t$, is based on the previous output $h_{t-1}$, the current input $x_t$, and the current cell state $c_t$. Using the sigmoid activation function $\sigma_g$ and the hyperbolic tangent activation functions $\sigma_c$ and $\sigma_h$, we can write the LSTM with a forget gate as:

$$f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f) \qquad (14)$$

$$i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i) \qquad (15)$$

$$o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o) \qquad (16)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \sigma_c(W_c x_t + U_c h_{t-1} + b_c) \qquad (17)$$

$$h_t = o_t \odot \sigma_h(c_t) \qquad (18)$$

where $W_f$, $W_i$, $W_o$, $W_c$, $U_f$, $U_i$, $U_o$, and $U_c$ are weight matrices, and $b_f$, $b_i$, $b_o$, and $b_c$ are biases.
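The gate equations can be executed directly. Below is a minimal PyTorch sketch of a single LSTM step that mirrors Equations 14-18; the dimensions and random weights are illustrative, and in practice torch.nn.LSTM implements the same computation in optimized form.

```python
# Minimal sketch of one LSTM step implementing Equations 14-18 directly.
import torch

input_size, hidden_size = 8, 16
x_t = torch.randn(1, input_size)       # current input x_t
h_prev = torch.zeros(1, hidden_size)   # previous output h_{t-1}
c_prev = torch.zeros(1, hidden_size)   # previous cell state c_{t-1}

# One (W, U, b) triple per gate/candidate: forget, input, output, cell
W = {g: torch.randn(input_size, hidden_size) * 0.1 for g in "fioc"}
U = {g: torch.randn(hidden_size, hidden_size) * 0.1 for g in "fioc"}
b = {g: torch.zeros(hidden_size) for g in "fioc"}

f_t = torch.sigmoid(x_t @ W["f"] + h_prev @ U["f"] + b["f"])                     # Eq. 14
i_t = torch.sigmoid(x_t @ W["i"] + h_prev @ U["i"] + b["i"])                     # Eq. 15
o_t = torch.sigmoid(x_t @ W["o"] + h_prev @ U["o"] + b["o"])                     # Eq. 16
c_t = f_t * c_prev + i_t * torch.tanh(x_t @ W["c"] + h_prev @ U["c"] + b["c"])  # Eq. 17
h_t = o_t * torch.tanh(c_t)                                                     # Eq. 18
print(h_t.shape)  # torch.Size([1, 16])
```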
F CASE STUDY: EXPEDIA GROUP INC.

In this example, we work with two target ratios, cfm (cash flow margin) and npm (net profit margin), and hence examine two SEA models in parallel. We choose these two ratios for this case study because they share some common impact factors, and we would like to see if and how our event attention model captures the semantic subtlety. Figure 15 and Figure 16 provide a bird's-eye view of the overall attention maps. The codes in parentheses below (e.g., 1a, 1b) map to the impact-factor hierarchy in Figure 17, discussed later.

Fig. 15. Events Attention Visualization Example: Expedia Encoded by cfm

Fig. 16. Events Attention Visualization Example: Expedia Encoded by npm

Table 12 shows pseudo-events in year 2015, month 1. We can see that pseudo-events #2 and #3 received greater attention from the model. Looking at these two sentences carefully, we found that they focus on the company's internal activities that lead to changes in cash flow and profit. For example, event #2 describes Expedia expanding its business via a partnership with Travelocity: under the partnership, Expedia provides customer service for Travelocity and supports Travelocity's websites. Both responsibilities imply more revenue, and hence more cash flow and net profit, for Expedia. Let's move to the other high-attention pseudo-event, #3. This sentence describes how Expedia updates its strategies, i.e., bringing down room prices and taking commissions from hotels, to generate more revenue. In other words, our event attention model pays closer attention to pseudo-events that discuss the firm's profit generation (1a) activities and internal cost control (1b) activities. Consistent with human intuition, those activities have larger impacts when forecasting a firm's future cash flow margin and net profit margin.

Table 12. Events Attention Visualization Example: Expedia Month 1

Event | cfm attention weight | npm attention weight | Text
1 | 0.18 | 0.19 | The company, which develops software for firms such as Barclays Plc and Expedia, is headquartered in the U.S. and has about a third of its 11,000 employees in Russia and Ukraine, ...
2 | 0.24 | 0.20 | Under that partnership, Expedia provided customer service for Travelocity and supported its websites in the U.S. and Canada. (1a)
3 | 0.23 | 0.24 | Online travel agents and intermediaries such as Priceline Group and Expedia are pushing down room prices and taking commissions from hotels, while enabling travelers to organize trips individually, poaching customers from tour operators. (1a, 1b)
4 | 0.19 | 0.19 | At least 30 people will be on hand to lead workshops and provide advice, including Di-Ann Eisnor, ...; John Malloy, ...; Jerry Engel, ...; and Sam Friend, ..., now owned by Expedia.
5 | 0.16 | 0.19 | You will now receive the Game Plan newsletter Last year, Google licensed hotel-booking software from Room 77, a startup backed by Expedia.

In Table 13, we can see that pseudo-event #3 talks about an acquisition activity (1c) of Expedia. Merger and acquisition activities impact a company's cash flow; therefore, this pseudo-event received greater attention. Pseudo-event #4 talks about an Expedia strategy: displacing travel agents by packaging and selling discounted airfares and hotel rooms. This strategy is likely to increase company sales and lead to higher cash flow and profits.

Table 13. Events Attention Visualization Example: Expedia Month 2

Event | cfm attention weight | npm attention weight | Text
1 | 0.20 | 0.18 | Expedia intends to keep the Orbitz brand intact, Khosrowshahi said on the call.
2 | 0.20 | 0.14 | Today, Expedia landed one for its shareholders.
3 | 0.23 | 0.27 | TripAdvisor surged 24 percent after Expedia agreed to acquire Orbitz Worldwide. (1c)
4 | 0.25 | 0.28 | The online booking company Expedia displaced travel agents by packaging and selling discounted airfares and hotel rooms. (1a, 1b)
5 | 0.11 | 0.13 | Tarran Vaillancourt, an Expedia spokeswoman, declined to comment.

In Table 14, we can see several external impact factors mentioned in month 4. Pseudo-event #1 talks about Google's search engine impact on Expedia and its competitors (2b1, 2b2); more visibility in search results on Google will likely generate more sales for Expedia. Pseudo-event #5 talks about a newcomer, the world's biggest online retailer, trying to compete with Expedia and gain some of its market share. This pseudo-event implies the impact applies not only to Expedia itself but also to the entire industry.

Table 14. Events Attention Visualization Example: Expedia Month 4

Event | cfm attention weight | npm attention weight | Text
1 | 0.26 | 0.26 | Microsoft, Expedia, publishers and others have asked the EU to examine complaints that Google favors its own services over competitors and hinders specialized search engines that compete with it. (2b1, 2b2)
2 | 0.18 | 0.21 | Expedia is joining U.S. issuers from Coca-Cola to Berkshire Hathaway, which sold notes in the shared currency this year as European Central Bank stimulus drives down funding costs in the region.
3 | 0.20 | 0.20 | With Google commanding almost all of the search market in some European countries, critics including Microsoft and Expedia are fed up with the company, which they say highlights its own Web services in query results at the expense of rivals.
4 | 0.10 | 0.12 | Expedia climbed after quarterly revenue exceeded estimates.
5 | 0.25 | 0.22 | The world's biggest online retailer by revenue will be competing with Priceline Group, Expedia, startup Airbnb and others for a piece of the online hotel booking market. (2b1, 2b2)

Table 15 shows that the models corresponding to different target variables, cfm and npm, place different attention weights on different pseudo-events. Pseudo-event #1 talks about another company that bought a large amount of Elong stock shares from Expedia; this market activity is likely to have a larger impact on Expedia's cash flow. Pseudo-event #2 talks about Goldman's announcement about Expedia. Pseudo-event #4 talks about Expedia's stock movement after the selloff in biotechnology and small-cap shares. The output variable cfm focuses more on pseudo-events #1 and #2, while npm focuses more on pseudo-events #2 and #4. Pseudo-events #1, #2, and #4 are all external impacts (2c) on Expedia, and they are all events in the stock market.

Table 15. Events Attention Visualization Example: Expedia Month 5

Event | cfm attention weight | npm attention weight | Text
1 | 0.20 | 0.15 | Ctrip, which operates the country's biggest travel-booking website, bought the Elong stake from U.S.-based Expedia, becoming its biggest shareholder. (2c)
2 | 0.30 | 0.27 | Goldman talks about here include Caterpillar, Coca Cola, Phillips 66, United Technologies, Automatic Data Processing, and Expedia. (2c)
3 | 0.17 | 0.21 | Stocks rose, with the Standard & Poor's 500 Index paring a weekly loss, as Gilead Sciences and Expedia rallied after Thursday's selloff in biotechnology and small-cap shares. (2c)
4 | 0.16 | 0.22 | Stocks pared a weekly loss on Friday, as Gilead Sciences and Expedia rallied after Thursday's selloff in biotechnology and small-cap shares. (2c)
5 | 0.17 | 0.15 | Expedia reached a record, rising 6.7 percent for the fifth straight gain and the longest streak since January.

In Table 16, we can see that the previously mentioned Elong stake sale (pseudo-event #2), which relates to the stock market (2c), receives more attention. Pseudo-event #5 receives greater attention due to its discussion of the business environment (2b1, 2b2).

Table 16. Events Attention Visualization Example: Expedia Month 7

Event | cfm attention weight | npm attention weight | Text
1 | 0.13 | 0.17 | Expedia jumped 13 percent and Amgen added 2.9 percent as results beat estimates.
2 | 0.26 | 0.24 | In May, Expedia sold its stake in ELong, a Chinese online travel company. (2c)
3 | 0.12 | 0.17 | Amgen rallied 2.9 percent and Expedia jumped 13 percent on better-than-estimated earnings.
4 | 0.16 | 0.14 | Expedia also topped a record, leading consumer-discretionary shares higher after second-quarter sales and profit topped analysts' estimates.
5 | 0.33 | 0.29 | Other homegrown technology companies include Zillow Group, Expedia and Zulily. (2b1, 2b2)
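As an aside for readers who wish to reproduce this style of event-attention map (Figures 15 and 16), the following illustrative matplotlib sketch (not the authors' plotting code) renders the month-1 weights from Table 12 as a heatmap; the layout choices are assumptions.

```python
# Illustrative sketch: per-event attention weights as a heatmap, in the
# spirit of Figures 15 and 16. Weights copied from Table 12 (month 1).
import matplotlib.pyplot as plt
import numpy as np

events = [f"event {i}" for i in range(1, 6)]
weights = np.array([
    [0.18, 0.24, 0.23, 0.19, 0.16],  # cfm model attention
    [0.19, 0.20, 0.24, 0.19, 0.19],  # npm model attention
])

fig, ax = plt.subplots(figsize=(6, 2))
im = ax.imshow(weights, cmap="Blues", vmin=0.0, vmax=weights.max())
ax.set_xticks(range(len(events)))
ax.set_xticklabels(events)
ax.set_yticks([0, 1])
ax.set_yticklabels(["cfm", "npm"])
fig.colorbar(im, ax=ax, label="attention weight")
plt.tight_layout()
plt.show()
```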
Table 17 is another example of different target variables allocating different attention weights to pseudo-events. The output variable cfm pays more attention to pseudo-events #1 and #2, while the output variable npm pays more attention to pseudo-events #1 and #3. Pseudo-events #1 and #2 talk about external impact factors on the entire business environment (2b1, 2b2), which in turn affect Expedia's cash flow. Meanwhile, pseudo-events #1 and #3 talk about competition and direct external impacts on Expedia, leading to an impact on Expedia's net profit. The attention models capture the important and related pseudo-events precisely.

Table 17. Events Attention Visualization Example: Expedia Month 8

Event | cfm attention weight | npm attention weight | Text
1 | 0.26 | 0.26 | With established tour operators facing competition from low-cost airlines and online booking sites such as Expedia.com, Thomas Cook and TUI are under pressure to invest in new offerings. (2b1, 2b2)
2 | 0.24 | 0.19 | Google's arguments were countered by Thomas Vinje, a lawyer with Clifford Chance who represents FairSearch Europe, whose members include Microsoft, Expedia and Nokia Oyj. (2b1, 2b2)
3 | 0.22 | 0.22 | Fighting Competition Priceline, whose sites include Booking.com and Kayak, has struck partnerships and made purchases to fend off competition from Google and Expedia. (2a)
4 | 0.12 | 0.13 | Expedia's shares rose to a record on July 31 after second-quarter sales and profit topped analysts' estimates.
5 | 0.17 | 0.20 | When asked whether Priceline has seen mounting competition from Google or Expedia, Huston said there has not been a tangible change.

In Table 18, the highlighted pseudo-events focus on external impact factors. Pseudo-events #2 and #3 talk about an external partnership (2a) of Expedia; the business partnership is likely to impact Expedia's revenue and cash flow. Pseudo-event #5 talks about the stock market movement (2c) for Expedia, which may impact Expedia's net profits.

Table 18. Events Attention Visualization Example: Expedia Month 11

Event | cfm attention weight | npm attention weight | Text
1 | 0.15 | 0.19 | Expedia lost 2.6 percent to $122.01, paring an earlier decline of 3.6 percent.
2 | 0.22 | 0.18 | Expedia has had a partnership with HomeAway for two years, Khosrowshahi said in Wednesday's statement. (2a)
3 | 0.23 | 0.21 | Bringing HomeAway into Expedia's portfolio of brands 'is a logical next step,' Khosrowshahi said. (2a)

Finally, what is extremely exciting about this somewhat lengthy case study is that we realize event attention maps in our models (and attention maps in general) can not only serve as sense-making tools but also facilitate concept, impact factor, and hypothesis generation. We illustrate an example in Figure 17.

Figure 17 illustrates a tentative concept hierarchy that emerged from our case study of Expedia, upon consolidating insights gained from examining various attention maps (the codes map to previously discussed pseudo-event examples). We can see two high-level impact factors that can affect a firm's cash flow margin and net profit margin: internal and external factors. Three kinds of firm internal activities can affect the firm's cfm and npm.
(Continuing Table 18:)

4 | 0.22 | 0.18 | Last week, travel-booking site Expedia agreed to acquire vacation-rental company HomeAway for $3.9 billion.
5 | 0.18 | 0.24 | Online travel company Expedia rebounded 1.9 percent after sliding 2.9 percent Tuesday, while Marriott International and Carnival each added 1.2 percent. (2c)

External factors include three facets as well. One facet is partnerships, which can integrate companies' strengths to gain more market share; a larger market share likely leads to greater profits. The second is change in the business environment, impacting the entire industry or the focal company alone. The third is the firm's stock market activities.

Fig. 17. Impact Factors for Cash Flow Margin (cfm) and Net Profit Margin (npm)

G OUTPERFORMING CASES DISTRIBUTION ON NEWS-ONLY MODEL COMPARED WITH RATIO-ONLY MODEL (PREDICTION WITHIN VS. OUTSIDE OF HISTORICAL VALUES RANGE)

After investigating the model results, we found that when the predicted ratio value fluctuates outside of the historical ratio range, we tend to see more accurately predicted (outperforming) cases from the news-only model than from the ratio-only model. In other words, when a firm experiences events that cause large business swings, the news information is much needed. In Table 19, we show that among the outperforming cases, when the predicted ratio value fluctuates outside the historical ratio range, the news-only model accounts for a larger share of outperforming cases than its ratio-only counterpart. For example, for the sales to net working capital ratio sale_nwc, when the predicted value fluctuates outside the historical values range, 66.75% of the outperforming cases come from the news-only model. Similarly, for the debt ratio de_ratio, 73.49% of the outperforming cases come from the news-only model when the predicted value fluctuates outside the historical values range. This phenomenon is general and holds across all ratios. More importantly, it shows that when a firm experiences dramatic changes, news article information is important for making better predictions.

Table 19. Outperforming Cases Distribution on News-only Model Compared with Ratio-only Model (Prediction within vs. outside of Historical Values Range)

Range | sale_nwc | pe_exi | de_ratio | cfm | npm | equity_invcap | cash_ratio
within historical values range | 33.25% | 44.34% | 26.51% | 33.98% | 33.49% | 26.99% | 36.39%
outside of historical values range | 66.75% | 55.66% | 73.49% | 66.02% | 66.51% | 73.01% | 63.61%

H INDUSTRY DISTRIBUTIONS ON THE DATASET AND OUTPERFORMING CASES DISTRIBUTION ON NEWS-ONLY MODEL COMPARED WITH RATIO-ONLY MODEL

We summarize the industry distributions when the news-only model provides more accurate predictions (outperforming cases) than the ratio-only model. Table 20 shows the industry distribution of the entire dataset and the outperforming-case percentages of the news-only model vs. the ratio-only model for each variable. In Table 20, we observe that the industry distribution of the outperforming cases is similar to that of the entire dataset. Furthermore, the health industry has a consistently higher-than-dataset share, while the consumer industry has a consistently lower one. Although not causal, this suggests our model can benefit industries with longer business cycles, such as the health industry. Meanwhile, it indicates our model learns universally across all companies; the outperforming cases therefore span all industries, which verifies the universal applicability of our model.

Table 20. Industry Distributions on the Dataset and Outperforming Cases of News-only Model Compared with Ratio-only Model

Industry | dataset | sale_nwc | pe_exi | de_ratio | cfm | npm | equity_invcap | cash_ratio
Consumer | 30.84% | 26.44% | 22.58% | 28.81% | 26.80% | 24.78% | 30.77% | 26.97%
Health | 7.47% | 9.20% | 12.26% | 8.47% | 8.25% | 9.73% | 7.69% | 12.36%
HiTech | 20.00% | 19.54% | 23.23% | 23.73% | 18.56% | 23.89% | 16.67% | 20.22%
Manufacturing | 27.23% | 24.14% | 29.68% | 23.73% | 29.90% | 23.89% | 32.05% | 26.97%
Others | 14.46% | 20.69% | 12.26% | 15.25% | 16.49% | 17.70% | 12.82% | 13.48%
I MEDIA COVERAGE DISCUSSION

We make three additional observations on media coverage. (Note: we use the terms "media coverage" and "the number of pseudo-events" interchangeably.)

(1) Companies with higher market value tend to have more media coverage. In Table 21, we categorize companies by their market value (at the end of the dataset period) and observe that higher market value companies have more media coverage on average. In Table 22, we show the average market value of each industry and its average number of pseudo-events per month. We observe that higher market value industries tend to have more media coverage, for example, the Health and HiTech industries.

Table 21. Average Number of Pseudo-events per Market Value Category

Market Value Category | Average Monthly Pseudo-events
> $500,000 M | 3.13
between $100,000 M and $500,000 M | 2.89
between $50,000 M and $100,000 M | 2.47
< $50,000 M | 0.82

Table 22. Average Number of Pseudo-events per Industry

Industry | Average Market Value | Average Monthly Pseudo-events
Consumer | $16,599 M | 0.86
Health | $55,922 M | 1.63
HiTech | $45,125 M | 1.31
Manufacturing | $14,240 M | 0.77
Others | $11,885 M | 1.15

(2) When the news-only model outperforms the ratio-only model, the volatility of the news coverage tends to be higher. We use volatility to evaluate the fluctuations in the number of pseudo-events for each company within each time window (such as monthly). We define the news coverage volatility as the standard deviation of the change in the number of pseudo-events, $\Delta_t$, over M periods:

$$\text{volatility} = \sqrt{\frac{1}{M-1}\sum_{t=1}^{M}\big(\Delta_t - \bar{\Delta}\big)^2} \qquad (19)$$

where, writing $n_t$ for the number of pseudo-events in period t,

$$\Delta_t = \begin{cases} \dfrac{n_t}{n_{t-1}} - 1, & \text{if } n_{t-1} > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (20)$$
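A minimal sketch of Equations 19 and 20, under the assumption that a firm's monthly pseudo-event counts are available as a simple sequence (the counts below are illustrative, not from the paper's data):

```python
# Minimal sketch of Equations 19-20: news-coverage volatility as the sample
# standard deviation of relative month-over-month pseudo-event changes.
import numpy as np

def coverage_volatility(counts):
    """Sample standard deviation (divisor M - 1) of the changes Delta_t."""
    counts = np.asarray(counts, dtype=float)
    prev, curr = counts[:-1], counts[1:]
    # Eq. 20: Delta_t = n_t / n_{t-1} - 1 when the previous count is positive, else 0
    delta = np.where(prev > 0, curr / np.maximum(prev, 1e-12) - 1.0, 0.0)
    return float(delta.std(ddof=1))  # Eq. 19

print(coverage_volatility([3, 4, 2, 2, 5, 0, 1]))  # illustrative monthly counts
```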
As shown in Table 23 and Table 24, the pseudo-event volatility of the outperforming-case group is higher than that of the underperforming group in the majority of cases. This suggests that outperforming companies receive regular media coverage instead of staying silent for long stretches; the continuous input helps the model extract helpful information from news consistently.

Table 23. Average Number of Pseudo-events Volatility of Outperforming Cases from the News-only Model

Industry | sale_nwc | pe_exi | de_ratio | cfm | npm | equity_invcap | cash_ratio
Consumer | 0.53 | 0.51 | 0.52 | 0.64 | 0.54 | 0.49 | 0.59
Health | 0.65 | 0.77 | 0.78 | 1.06 | 0.78 | 0.74 | 0.75
HiTech | 0.58 | 0.56 | 0.58 | 0.53 | 0.57 | 0.68 | 0.72
Manufacturing | 0.48 | 0.47 | 0.50 | 0.53 | 0.48 | 0.59 | 0.64
Others | 0.57 | 0.39 | 0.29 | 0.50 | 0.39 | 0.49 | 0.49

Table 24. Average Number of Pseudo-events Volatility of Underperforming Cases from the News-only Model

Industry | sale_nwc | pe_exi | de_ratio | cfm | npm | equity_invcap | cash_ratio
Consumer | 0.45 | 0.45 | 0.45 | 0.42 | 0.44 | 0.46 | 0.43
Health | 0.67 | 0.50 | 0.64 | 0.53 | 0.60 | 0.65 | 0.62
HiTech | 0.52 | 0.51 | 0.53 | 0.54 | 0.52 | 0.51 | 0.48
Manufacturing | 0.42 | 0.41 | 0.43 | 0.40 | 0.42 | 0.39 | 0.38
Others | 0.31 | 0.38 | 0.40 | 0.34 | 0.38 | 0.36 | 0.36

(3) The average number of pseudo-events in the outperforming group tends to be higher than in its counterpart. As shown in Table 25 and Table 26, the number of pseudo-events of the outperforming group is higher than that of its counterpart in the majority of cases.

Table 25. Average Number of Pseudo-events of Outperforming Cases from the News-only Model

Industry | sale_nwc | pe_exi | de_ratio | cfm | npm | equity_invcap | cash_ratio
Consumer | 1.17 | 1.09 | 1.05 | 1.17 | 1.03 | 0.82 | 1.08
Health | 1.13 | 1.64 | 2.07 | 2.34 | 2.05 | 1.83 | 1.82
HiTech | 1.48 | 1.33 | 1.29 | 1.28 | 1.36 | 1.74 | 1.75
Manufacturing | 0.93 | 0.82 | 1.14 | 0.75 | 0.89 | 1.28 | 1.13
Others | 1.81 | 1.25 | 1.12 | 1.38 | 1.10 | 1.28 | 1.47

Table 26. Average Number of Pseudo-events of Underperforming Cases from the News-only Model

Industry | sale_nwc | pe_exi | de_ratio | cfm | npm | equity_invcap | cash_ratio
Consumer | 0.80 | 0.78 | 0.84 | 0.79 | 0.82 | 0.88 | 0.82
Health | 1.73 | 1.47 | 1.48 | 1.30 | 1.31 | 1.51 | 1.44
HiTech | 1.28 | 1.31 | 1.33 | 1.33 | 1.30 | 1.24 | 1.20
Manufacturing | 0.76 | 0.77 | 0.74 | 0.80 | 0.76 | 0.65 | 0.70
Others | 0.84 | 1.08 | 1.13 | 1.04 | 1.15 | 1.10 | 1.05

The above observations are reflected in our data; we provide them to shed light on the practical implications of the model.
