1. Introduction
The stock market is an intricate and dynamic system characterized by non-linearity, non-stationarity, volatility, and significant noise. These properties make time series data such as stock prices highly uncertain and subject to multifaceted influences. Accurate forecasting of stock price movements can strengthen investor confidence and yield significant financial gains, making stock price prediction a key area of financial research.
Experts and scholars have explored various methods for forecasting stock prices. These methods are broadly categorized into statistical analysis methods, machine-learning methods, and deep learning methods.
1.1. Statistical Analysis Methods
Statistical methods, such as the Autoregressive Moving Average (ARMA), Autoregressive Integrated Moving Average (ARIMA), Autoregressive Conditional Heteroskedasticity (ARCH), and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models, have been applied extensively:
Jarrett et al. applied the ARIMA model to forecast the Chinese stock market.
Juan et al. utilized the GARCH model to examine Dubai International Airport’s activity volume and its impact on the UAE stock market.
While these methods are effective for linear data, they face limitations when dealing with the nonlinearity and excessive noise inherent in stock price data.
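For illustration, the following is a minimal sketch of fitting one of these statistical models (an ARIMA) to a closing-price series with the statsmodels package; the order (2, 1, 2) and the synthetic series are assumptions for demonstration, not values taken from the studies cited above.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic random-walk closing prices standing in for real market data.
rng = np.random.default_rng(0)
close = 100 + np.cumsum(rng.normal(0, 1, 500))

# Fit ARIMA(p, d, q) = (2, 1, 2) and forecast the next five closes.
model = ARIMA(close, order=(2, 1, 2)).fit()
print(model.forecast(steps=5))
```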
1.2. Machine-Learning Methods
Machine-learning techniques, such as Random Forest and Support Vector Machines (SVM), are widely used for stock price prediction due to their ability to identify complex patterns in time series data. Key studies include:
Lin et al. employed Principal Component Analysis (PCA) for stock prediction.
Patel et al. applied the naive Bayesian method for stock index prediction.
Nti et al. utilized the Random Forest method, enhanced with decision tree optimization, to predict stock indices.
Fu et al. and Xu et al. implemented SVM and ensemble learning classifiers to minimize prediction errors.
While machine-learning methods improve prediction speed and accuracy, their capacity to process large-scale, complex information remains limited.
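As a concrete reference point, the sketch below sets up the kind of supervised problem these studies describe: predicting the next close from a window of lagged closes with scikit-learn. The 10-day window, the synthetic data, and the model settings are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

def lagged_dataset(close, window=10):
    """Build (samples, window) features and next-day-close labels."""
    X = np.array([close[i:i + window] for i in range(len(close) - window)])
    return X, close[window:]

rng = np.random.default_rng(1)
close = 100 + np.cumsum(rng.normal(0, 1, 400))
X, y = lagged_dataset(close)
split = int(0.8 * len(X))  # chronological train/test split

for model in (RandomForestRegressor(n_estimators=200, random_state=0), SVR(C=10.0)):
    model.fit(X[:split], y[:split])
    print(type(model).__name__, model.score(X[split:], y[split:]))
```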
1.3. Deep Learning Methods
Deep learning offers superior capabilities for processing large-scale, nonlinear data with models such as Back Propagation (BP), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU). Notable contributions include:
Zhang demonstrated that deep learning neural networks outperform traditional ARIMA models for nonlinear data.
Yu et al. applied dimensionality reduction with locally linear embedding (LLE) and used a BP neural network to improve prediction accuracy.
Despite their advantages, deep learning methods require refinement, particularly in integrating comprehensive data sources and optimizing hyperparameters.
1.4. Candlestick Patterns in Stock Prediction
Candlestick charts, derived from high-frequency price data, effectively express trend signals and structural characteristics. Investors analyze these patterns to anticipate price fluctuations, as candlestick series reveal inherent associations with stock movements. Candlestick patterns, which consist of single or multiple candlesticks of varying lengths, serve as a critical tool for identifying trends. However, predicting stock prices using candlestick patterns remains a complex challenge.
1.5. Proposed Model: SSA-CPBiGRU
To address the limitations of existing methods, this paper proposes a Bidirectional Gated Recurrent Unit Model Integrating Candlestick Patterns and a Sparrow Search Algorithm (SSA-CPBiGRU) for stock price forecasting.
The key innovations and contributions of the proposed model include:
Integration of Candlestick Patterns: Combining structural features of candlestick charts with advanced deep learning.
Enhanced Forecasting Accuracy: Leveraging the Bidirectional GRU for improved learning of temporal relationships.
Optimization with Sparrow Search Algorithm (SSA): Automating hyperparameter selection to enhance prediction performance.
Currently, the primary information used to predict stock prices is basic market data, which lacks structural relationships and has limited capacity to express the overall state of the system. This model integrates candlestick patterns with stock market data as the input to the stock price prediction model, endowing the input data with structural characteristics and time series relationships. Furthermore, this paper uses a Bidirectional Gated Recurrent Unit (BiGRU) network to extract deeper feature relationships, thereby enhancing the learning ability of the network;
This paper applies a sparrow search algorithm (SSA) [14] to stock price forecasting, addressing the challenge of high randomness and difficulty in hyperparameter selection of the CPBiGRU network. Simultaneously, it enhances the accuracy of stock price forecasting;
Current research typically utilizes data from the same time window for forecasting. However, in actual trading decisions, investors often refer to stock price information from multiple trading days. Therefore, this paper explores the impact of extracting stock data from different time windows on prediction results.
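To make the third contribution concrete, the following is a minimal sketch of how samples can be extracted under different time windows: each sample holds `window` days of features and the next day's closing price as its label. The feature dimensionality and random data are assumptions for illustration.

```python
import numpy as np

def make_windows(features, close, window):
    """features: (days, n_features); returns X: (samples, window, n_features), y: (samples,)."""
    X = np.stack([features[i:i + window] for i in range(len(features) - window)])
    return X, close[window:]

# Compare sample shapes for several window lengths, e.g. 5, 10, and 20 trading days.
for window in (5, 10, 20):
    X, y = make_windows(np.random.rand(300, 8), np.random.rand(300), window)
    print(window, X.shape, y.shape)
```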
The rest of the paper is organized as follows. Section 2 covers related work. Section 3 explains the methodology. Section 4 presents the experiments and results, and Section 5 concludes the paper and outlines future directions.
2. Related Work
We discuss the published literature in two categories: candlestick pattern analysis, and deep learning models for stock prediction. The models pertaining to these domains are reviewed in detail and elaborated on.
2.1. Candlestick Patterns Analysis
Candlestick charts are a form of technical analysis visualization created by plotting the opening, highest, lowest, and closing prices of each analysis period [15]. Figure 1 shows an example of a candlestick chart. The box representing the difference between the open price and the close price is called the body of the candlestick. When the close price is higher than the open price, the body is colored red; otherwise, it is green. Research on candlestick patterns is multifaceted and can be categorized into three primary areas: sequential pattern mining and its applications, candlesticks and their applications, and stock time series forecasting.
Figure 1. Example of candlestick chart.
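A chart like Figure 1 can be reproduced from OHLC data; the sketch below uses the mplfinance package (an assumption, since the paper does not name its plotting tool) and overrides the default colors to match the red-up/green-down convention described above.

```python
import pandas as pd
import mplfinance as mpf

# Three days of illustrative OHLC data indexed by business day.
df = pd.DataFrame(
    {"Open": [10.0, 10.4, 10.2], "High": [10.6, 10.8, 10.5],
     "Low": [9.8, 10.1, 9.9], "Close": [10.4, 10.2, 10.3]},
    index=pd.date_range("2023-01-02", periods=3, freq="B"),
)
mc = mpf.make_marketcolors(up="red", down="green")
mpf.plot(df, type="candle", style=mpf.make_mpf_style(marketcolors=mc))
```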
Sequential pattern mining algorithms have mainly concentrated on discovering rules, generating sequences, combining resources [16], and predicting sequences. Aiming at accurate destination forecasting from a partial initial trip trajectory, Iqbal et al. [17] treated the task as a multi-class prediction problem and proposed an efficient, non-redundant contrast sequence miner algorithm to mine patterns that appear in only one class. Li et al. [18] proposed a one-pass algorithm that outputs a commonly used frequency for a given sequence, along with two advanced models that further improve the processing efficiency of voluminous sequences and streaming data. To address the shortcomings of traditional sequential pattern mining algorithms, Wang et al. [19] proposed a timeliness variable threshold and an incremental PrefixSpan algorithm, verifying the effectiveness of the algorithm. Sophisticated investors can analyze candlestick sequences in historical data and anticipate the patterns that will appear in the next period, enabling them to forecast future trends in stock markets.
Work on candlesticks and their applications mainly focuses on the image processing of candlesticks, identifying and interpreting particular candlestick patterns, and improving recognition accuracy. Birogul et al. [20] and Guo et al. [21] encoded candlestick data into 2D candlestick charts and learned the morphological features of the candlestick data through deep neural networks. Chen et al. [22] proposed a two-step approach for the automatic recognition of eight candlestick patterns, with an average accuracy surpassing that of an LSTM model. Fengqian et al. [23] used candlestick charts as a generalization of price data across a timeframe, serving as a denoising tool; they then employed cluster analysis and reinforcement learning to adaptively control parameters online in unfamiliar environments, ultimately enabling high-frequency trading strategies.
Stock time series forecasting is mainly divided into two categories: stock price forecasting and stock trend forecasting. Wang et al. [24] evaluated the effectiveness of several renowned candlestick patterns, employing recent data from 20 American stocks to estimate stock prices. Udagawa et al. [13] proposed a hybrid algorithm that merges candlesticks sharing a specific price range into a single candlestick, thereby eliminating noisy candlesticks. Madbouly et al. [25] combined a cloud model, fuzzy time series, and Heikin-Ashi candlesticks to forecast stock trends, thereby enhancing the accuracy of predictions. Wang et al. [26] proposed a quantification method for stock market candlestick charts based on the Hough transform, then used a graph structure embedding method and multiple attention graph neural networks to improve the prediction performance of stock prices.
2.2. Deep Learning Approaches in Stock Prediction
Currently, deep learning approaches have become the predominant focus of research both domestically and internationally, and numerous experts and scholars have explored this field extensively. Rather et al. [27] used an RNN with memory capability to predict stock returns. Minami [28] employed LSTM, a variant RNN model that effectively alleviates common issues in neural networks such as vanishing and exploding gradients, as well as long-range dependencies. Gupta et al. [29] increased prediction speed by adopting GRU, a network structure with fewer gating mechanisms than LSTM, as the main network structure of their prediction model. Chandar et al. [30] used a wavelet neural network model to predict stock price trends. The aforementioned scholars used historical stock price data at the input level of their models to conduct their prediction research. Additionally, deep learning techniques for stock prediction can incorporate a wider and more varied range of information sources, thereby enriching the factors influencing the prediction model. Cai et al. [31] integrated features of financial and stock market-related news posts into a hybrid model consisting of an LSTM and a Convolutional Neural Network (CNN) for forecasting, constructing the prediction model from a multi-layer recurrent neural network; however, although the model's complexity increased, its prediction accuracy left room for further improvement. Ho et al. [32] combined candlestick charts with social media data, proposing a multi-channel collaborative network for predicting stock trends.
Furthermore, existing network models still have certain limitations. The selection of hyperparameters in these models is often based on prior research or experience, which introduces a degree of subjectivity. Appropriate hyperparameters can improve the topology of the network model and enhance its generalization and fitting capabilities. Hence, eliminating the influence of human factors and finding the optimal network hyperparameters is a matter of concern for scholars. Hu [33] employed a Bayesian algorithm to optimize the learning rate, the number of hidden layers, and the number of neurons within an LSTM, with the goal of forecasting the prices of prominent stocks at specific stages of the Chinese stock market; the experimental results demonstrated that the optimized model possesses high universality and efficacy. Optimizing the important parameters of a network with a swarm intelligence optimization algorithm can address the high randomness, difficulty, and human influence involved in hyperparameter selection, thereby improving prediction accuracy [34,35]. Song et al. [36] utilized Particle Swarm Optimization (PSO) with an adaptive learning strategy to optimize the time step, batch size, number of iterations, and number of hidden layer neurons in an LSTM network. The SSA has stronger optimization ability than Grey Wolf Optimization (GWO), Ant Colony Optimization (ACO), and PSO [14]. Li et al. [37] conducted a comprehensive comparison of the Bat Algorithm (BA), GWO, Dragonfly Algorithm (DA), Whale Optimization Algorithm (WOA), Grasshopper Optimization Algorithm (GOA), and SSA, evaluating their convergence speed, accuracy, and stability on 22 standard CEC test functions; the results clearly demonstrated that the SSA surpassed the other five algorithms in all aspects. Liao [38] utilized the SSA to optimize the weights, biases, number of hidden layer neurons, and number of iterations within an LSTM network. This methodology improved the model's prediction accuracy and has been applied effectively to load forecasting with favorable results.
3. Methodology
The full system diagram is shown in Figure 2. First, the stock market data and candlestick patterns are preprocessed separately. Subsequently, the CPBiGRU optimized by the SSA is used to forecast the closing price of stocks. Finally, evaluation metrics are employed to assess the model's performance.
Figure 2. System diagram of the proposed method.
3.1. BiGRU Network
The GRU model proposed by Cho et al. [39] is a variant of LSTM. GRU merges the input gate and forget gate of LSTM into a single update gate, which controls how much of the input and the previous output is passed to the next cell, and adds a reset gate to regulate how much historical information is forgotten. It effectively avoids the long-range dependence and insufficient memory capacity problems of traditional recurrent neural networks. Compared with LSTM, GRU has a simpler architecture as well as faster training and fitting speeds [40]. The architecture of the GRU is shown in Figure 3, and its operation is defined by Equations (1)–(4):

$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z) \tag{1}$$
$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r) \tag{2}$$
$$\tilde{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t] + b_{\tilde{h}}) \tag{3}$$
$$h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t \tag{4}$$

where $x_t$ and $h_t$ represent the input and output of the GRU network at moment $t$, respectively; $h_{t-1}$ is the output at the previous moment; $\tilde{h}_t$ is the candidate memory cell value at moment $t$; $r_t$ is the reset gate; and $z_t$ is the update gate. $W_z$, $W_r$, and $W$ are the weight matrices of the update gate, reset gate, and candidate hidden state, respectively; $b_z$, $b_r$, and $b_{\tilde{h}}$ are the corresponding bias terms; and $\sigma$ is the sigmoid activation function.
Figure 3. GRU architecture block diagram.
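The following is a direct NumPy transcription of Equations (1)–(4), computing one GRU step; the input size, hidden size, and random weights are placeholders, and the weight matrices act on the concatenation [h_{t-1}, x_t] as in the equations.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wz, Wr, W, bz, br, bh):
    hx = np.concatenate([h_prev, x_t])                              # [h_{t-1}, x_t]
    z_t = sigmoid(Wz @ hx + bz)                                     # update gate, Eq. (1)
    r_t = sigmoid(Wr @ hx + br)                                     # reset gate, Eq. (2)
    h_cand = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]) + bh)  # candidate state, Eq. (3)
    return (1.0 - z_t) * h_prev + z_t * h_cand                      # new hidden state, Eq. (4)

n_x, n_h = 4, 8  # illustrative input and hidden sizes
rng = np.random.default_rng(0)
weights = [rng.normal(0, 0.1, (n_h, n_h + n_x)) for _ in range(3)]
biases = [np.zeros(n_h) for _ in range(3)]
h = gru_step(rng.normal(size=n_x), np.zeros(n_h), *weights, *biases)
print(h.shape)  # (8,)
```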
However, the GRU network only considers a unidirectional flow of information and disregards information flowing in the other direction. The resulting lack of influential factors and information characteristics at prediction points limits the predictive performance of the network. BiGRU is a neural network architecture that considers information flow in both the historical and future directions: building on the unidirectional GRU, it adds a layer of reverse GRU. This not only improves the handling of temporal dependencies in the data, but also expands the number of neural units, enabling more precise prediction results. We therefore choose BiGRU as the foundational model for our prediction methodology.
The output of a BiGRU unit combines the hidden-state outputs of the forward and backward directions. The final output of BiGRU is calculated as shown in Equation (5):

$$h_t = w_t \overrightarrow{h_t} + v_t \overleftarrow{h_t} + b_t \tag{5}$$

where $w_t$ and $v_t$ are the weights of the forward hidden state $\overrightarrow{h_t}$ and the backward hidden state $\overleftarrow{h_t}$ of the BiGRU at moment $t$, and $b_t$ is the bias of the hidden state at moment $t$. The structure diagram of BiGRU is shown in Figure 4.
Figure 4. BiGRU structure diagram.
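In practice, a stacked BiGRU like the one in Figure 4 is typically built with a deep learning framework; the Keras sketch below is one such construction. The layer widths and window length are placeholders, since in this paper those hyperparameters are selected by the SSA (Section 3.3).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

window, n_features = 10, 8  # illustrative input shape
model = models.Sequential([
    tf.keras.Input(shape=(window, n_features)),
    layers.Bidirectional(layers.GRU(64, return_sequences=True)),  # first BiGRU layer
    layers.Bidirectional(layers.GRU(32)),                         # second BiGRU layer
    layers.Dense(1),                                              # next-day closing price
])
model.compile(optimizer="adam", loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
model.summary()
```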
3.2. A Dual Port BiGRU Network Integrating Candlestick Patterns
Candlestick patterns reflect market trend and price information and can be roughly divided into two categories: reversal patterns and continuation patterns. TA-Lib is a Python quantitative indicator library offering functions that identify 61 candlestick patterns. These functions return one of three values: 0, 100, or −100, where 0 indicates that the candlestick pattern is not recognized, 100 indicates recognition of the pattern, and −100 indicates recognition of the pattern's inverse form. Since the return values of different patterns indicate different market trends, they cannot be incorporated directly into stock price prediction models. The experiment therefore defines three stock trend prediction values: −1, 0, and 1, where −1 indicates a downward trend, 1 indicates an upward trend, and 0 indicates that no candlestick pattern was detected. In this paper, the return values of the 61 candlestick patterns are transformed into the corresponding stock trend prediction values based on each pattern's definition. The resulting correlation table of stock trend prediction values is presented in Table 1. For example, the two crows candlestick pattern predicts a falling stock price: when this pattern is identified, the TA-Lib function that recognizes it returns 100, and when the inverse pattern is identified, the function returns −100. Therefore, in the correlation table of stock trend predictions, return values of 100 and −100 for the two crows pattern correspond to stock trend predictions of −1 and 1, respectively.
Table 1. The correlation table of stock trend prediction values.
If multiple candlestick patterns are recognized on a given day, the stock trend prediction for that day is the sum of the trend predictions of all identified patterns. For example, if the engulfing pattern and the belt-hold pattern are both detected on a certain day and the return values for both patterns are 100, we first look up the correlation table and find that both patterns correspond to a stock trend prediction value of 1; adding the two values gives a stock trend prediction value of 2 for that day. The stock trend prediction values and the stock market data are normalized separately and used as inputs to the CPBiGRU network. The flowchart of the CPBiGRU model is shown in Figure 5.
Figure 5. Flowchart for CPBiGRU.
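The pattern-to-trend mapping described above can be sketched with TA-Lib as follows; the two-entry sign table is a small excerpt standing in for the full 61-pattern correlation table (Table 1), with the sign of each entry taken from the pattern's definition.

```python
import numpy as np
import talib

def pattern_trend(o, h, l, c):
    """o, h, l, c: float64 arrays of open/high/low/close prices.
    Returns the summed daily stock trend prediction value."""
    # sign = -1: a return of +100 predicts a fall (e.g. two crows);
    # sign = +1: a return of +100 predicts a rise (e.g. belt-hold).
    table = [(talib.CDL2CROWS, -1), (talib.CDLBELTHOLD, +1)]
    trend = np.zeros(len(c))
    for fn, sign in table:
        trend += sign * fn(o, h, l, c) / 100.0  # maps {0, 100, -100} to {0, +1, -1}
    return trend  # per-day sum over all recognized patterns
```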
3.3. Principle of Sparrow Search Algorithm
The SSA is a novel swarm intelligence optimization technique. The algorithm simulates the foraging behavior of sparrow populations, dividing the population into discoverers and joiners. The SSA calculates the fitness of each sparrow through a constructed fitness function, enabling roles and positions to be exchanged between individual sparrows and effectively avoiding the tendency of traditional optimization algorithms to become stuck in local optima.
A population of $n$ sparrows can be expressed as follows:

$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\ \vdots & \vdots & & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,d} \end{bmatrix} \tag{6}$$

where $X$ is a randomly initialized sparrow population, $x$ is an individual sparrow, $d$ is the dimensionality of the search space, and $n$ is the number of sparrows.
The fitness values of all sparrows can be expressed as follows:

$$F_x = \begin{bmatrix} f([x_{1,1}\; x_{1,2}\; \cdots\; x_{1,d}]) \\ f([x_{2,1}\; x_{2,2}\; \cdots\; x_{2,d}]) \\ \vdots \\ f([x_{n,1}\; x_{n,2}\; \cdots\; x_{n,d}]) \end{bmatrix} \tag{7}$$

where $F_x$ is the fitness matrix and $f$ is the fitness value, expressed here as the Root Mean Squared Error (RMSE) between the predicted and real stock prices.
The discoverers search for food and provide the foraging direction for all joiners. During each iteration, the discoverers update their locations as follows:

$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp\left(\dfrac{-i}{\alpha \cdot iter_{max}}\right), & R_2 < ST \\ X_{i,j}^{t} + Q \cdot L, & R_2 \ge ST \end{cases} \tag{8}$$

where $t$ is the current iteration number; $j = 1, 2, 3, \ldots, d$; $iter_{max}$ is the maximum number of iterations and is a constant; $X_{i,j}$ is the position of the $i$-th sparrow in the $j$-th dimension; $\alpha$ is a random number in $[0, 1]$; $Q$ is a random number following a normal distribution; and $L$ is a $1 \times d$ matrix in which each element is 1. $R_2 \in [0, 1]$ and $ST \in [0.5, 1.0]$ are the alarm value and the safety value, respectively. When $R_2 < ST$, no predators are present in the environment and the discoverers may conduct extensive searches. When $R_2 \ge ST$, certain individuals within the population have detected predators and immediately emit an alarm signal, upon which the whole population must move to a secure location.
During the foraging process, joiners continuously update their positions to acquire food, while simultaneously monitoring the discoverers and competing with them for food. Equation (9) describes how joiners update their positions:

$$X_{i,j}^{t+1} = \begin{cases} Q \cdot \exp\left(\dfrac{X_{worst}^{t} - X_{i,j}^{t}}{i^2}\right), & i > \dfrac{n}{2} \\ X_{P}^{t+1} + \left|X_{i,j}^{t} - X_{P}^{t+1}\right| \cdot A^{+} \cdot L, & i \le \dfrac{n}{2} \end{cases} \tag{9}$$

where $X_P$ is the position of the optimal discoverer, $X_{worst}$ is the current global worst position, $n$ is the population size, $A$ is a $1 \times d$ matrix in which each element is randomly assigned 1 or −1, and $A^{+} = A^{T}(AA^{T})^{-1}$. When $i > n/2$, the $i$-th joiner has a low fitness value, has not obtained food, and must fly elsewhere to search for more.
The algorithm assumes that between 10% and 20% of the sparrows in the population become aware of danger and emit alarm signals when danger occurs. The initial positions of these sparrows are generated randomly within the population, and their positions are updated by Equation (10):

$$X_{i,j}^{t+1} = \begin{cases} X_{best}^{t} + \beta \cdot \left|X_{i,j}^{t} - X_{best}^{t}\right|, & f_i > f_g \\ X_{i,j}^{t} + K \cdot \left(\dfrac{\left|X_{i,j}^{t} - X_{worst}^{t}\right|}{(f_i - f_w) + \varepsilon}\right), & f_i = f_g \end{cases} \tag{10}$$

where $X_{best}$ is the current global optimal position; $\beta$ is a step-size control parameter drawn from a normal distribution with mean 0 and variance 1; $K$ is a random number in $[-1, 1]$ that indicates the direction of sparrow movement and also serves as a step-size control parameter; $f_i$ is the fitness value of the $i$-th sparrow; $f_g$ and $f_w$ are the current global best and worst fitness values, respectively; and $\varepsilon$ is an extremely small constant that avoids division by zero. When $f_i > f_g$, the sparrow is at the edge of the population and extremely vulnerable to predators. When $f_i = f_g$, a sparrow in the middle of the population has perceived the danger and must move closer to the other sparrows in time to reduce its risk of predation.
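For reference, the following is a compact NumPy sketch of the update rules in Equations (8)–(10); the 20% discoverer ratio, 10% scout ratio, and search bounds are common choices assumed here for illustration, and the simplification of $A^{+} \cdot L$ to $A^{T}/d$ follows from $A$ containing only ±1 entries.

```python
import numpy as np

def ssa(fitness, d, n=20, iter_max=50, ST=0.8, lb=-5.0, ub=5.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n, d))                  # Eq. (6): random population
    f = np.apply_along_axis(fitness, 1, X)           # Eq. (7): fitness values
    n_disc, n_scout = max(1, int(0.2 * n)), max(1, int(0.1 * n))
    for _ in range(iter_max):
        order = np.argsort(f)                        # best (lowest) fitness first
        X, f = X[order], f[order]
        best, worst = X[0].copy(), X[-1].copy()
        f_g, f_w = f[0], f[-1]
        R2 = rng.random()                            # alarm value
        for i in range(n_disc):                      # discoverers, Eq. (8)
            if R2 < ST:
                X[i] = X[i] * np.exp(-(i + 1) / ((rng.random() + 1e-12) * iter_max))
            else:
                X[i] = X[i] + rng.normal()           # Q * L
        xp = X[0].copy()                             # optimal discoverer position
        for i in range(n_disc, n):                   # joiners, Eq. (9)
            if i + 1 > n / 2:
                X[i] = rng.normal() * np.exp((worst - X[i]) / (i + 1) ** 2)
            else:
                A = rng.choice([-1.0, 1.0], size=d)
                X[i] = xp + np.dot(np.abs(X[i] - xp), A) / d   # |X - XP| * A+ * L
        for i in rng.choice(n, n_scout, replace=False):        # scouts, Eq. (10)
            if f[i] > f_g:
                X[i] = best + rng.normal() * np.abs(X[i] - best)
            else:
                K = rng.uniform(-1.0, 1.0)
                X[i] = X[i] + K * np.abs(X[i] - worst) / (f[i] - f_w + 1e-50)
        X = np.clip(X, lb, ub)
        f = np.apply_along_axis(fitness, 1, X)
    i_best = int(np.argmin(f))
    return X[i_best], f[i_best]

# Example: minimize the 4-dimensional sphere function.
x_best, f_best = ssa(lambda x: float(np.sum(x ** 2)), d=4)
print(x_best, f_best)
```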
The flowchart of the SSA-CPBiGRU model is shown in Figure 6. It can be divided into the following five steps:
Figure 6. Flowchart for SSA-CPBiGRU.
Initialization: We take the learning rate, the number of iterations, and the number of units in the two hidden layers of the CPBiGRU network as the hyperparameters to be optimized by the SSA. After setting the value ranges of these hyperparameters, the sparrow population size, the number of optimization iterations, and the initial safety threshold, the position information of the population and the related parameters are randomly initialized;
Fitness value: We use the RMSE between the predicted value of the network model and the real value as the fitness function for SSA and the loss function for CPBiGRU, to determine the fitness values of each sparrow;
Update: We update the sparrow position by Equations (8)–(10) and obtain the fitness value of the sparrow population. Simultaneously, we record the optimal individual position and global optimal position value in the population;
Iteration: We check whether the maximum number of update iterations has been reached. If so, we conclude the loop and output the optimal individual solution, which determines the optimal parameters of the network structure; otherwise, we return to step (3);
Optimization results output: The optimal hyperparameter values output by the SSA are used as the learning rate, the number of iterations, and the number of units in the two hidden layers of the CPBiGRU network. The network is then reconstructed, and we proceed with subsequent procedures such as inverse normalization and evaluation analysis.
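A minimal sketch of steps (1)–(5) follows, wiring the `ssa` sketch above to hyperparameter selection; `build_cpbigru` and the training arrays are hypothetical stand-ins for the paper's network and data, and the fitness is the RMSE on held-out data, matching step (2). Per-dimension bounds broadcast through the `ssa` sketch unchanged.

```python
import numpy as np

# Assumed search ranges: learning rate, epochs, units in the two hidden layers.
LB = np.array([1e-4, 10.0, 16.0, 16.0])
UB = np.array([1e-2, 100.0, 128.0, 128.0])

def fitness(x):
    lr, epochs, units1, units2 = x[0], int(x[1]), int(x[2]), int(x[3])
    model = build_cpbigru(units1, units2, lr)             # hypothetical model builder
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    pred = model.predict(X_val, verbose=0).ravel()
    return float(np.sqrt(np.mean((pred - y_val) ** 2)))   # RMSE fitness, step (2)

# best_x, best_rmse = ssa(fitness, d=4, lb=LB, ub=UB)
# Rebuild the network with best_x, then inverse-normalize and evaluate (step 5).
```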