Analyzing and identifying predictable time range for stress prediction based on chaos theory and deep learning

Let \(S = (s_1,s_2,\ldots , s_n)\) be a sequence of a user's daily stress levels, where \(s_i\) denotes the stress level on the i-th day (\(1 \le i \le n\)) and \(DOM(s_i)=\{0,1,2,3,4,5\}\), corresponding to stress level unknown, no stress at all, very little stress, some stress, a lot of stress, and a great deal of stress, respectively. Our stress prediction task aims to address the following two subtasks:

(1) identify a predictable time range m, within which reliable predictions can be made;

(2) predict the daily stress levels \((s_{n+1},s_{n+2},\ldots , s_{n+m})\) on the next m days.

We address the first subtask by conducting chaos analysis on the given stress sequence, and then leverage chaos theory and deep learning to address the second subtask. Before discussing the two subtasks, we briefly review chaos theory and deep learning.

A brief introduction to chaos theory and deep learning

Chaos theory

Chaos theory is a branch of mathematics and science that studies the behavior of nonlinear dynamical systems [35]. To study the chaotic characteristics of a nonlinear system, it is important to first recover the dynamics of the system from a single observed variable [40], a process known as phase space reconstruction. Phase space reconstruction can be achieved based on Takens' theorem [41], which guarantees that the reconstructed phase space is topologically equivalent to the original one as long as we choose a sufficiently large embedding dimension and an appropriate time delay. [42,43,44,45] provided methods for choosing an appropriate embedding dimension, and [46] presented a way of choosing an appropriate time delay.

In the reconstructed phase space, we can explore the complexity of the nonlinear system, and estimate the amount of chaos in the nonlinear system through the largest Lyapunov exponent. Here, the largest Lyapunov exponent is a quantity that characterizes the rate of separation of infinitesimally close trajectories in the dynamical system. It is a measure of the sensitivity to initial conditions or the predictability of the system [33]. Usually, if the value of the largest Lyapunov exponent is positive, the presence of chaos can be determined.

The time range for accurate prediction of a chaotic system can then be estimated as a function of the largest Lyapunov exponent [32]. [47,48,49,50,51] provided methods for calculating the largest Lyapunov exponent based on equations and definitions. A further improvement of the method was made in [52].

Deep learning

Deep learning is a sub-branch of machine learning that uses artificial neural networks with multiple layers to learn from data. It can handle unstructured data such as images, texts, audio, and video, and learn features from the data automatically without human intervention. The history of deep learning can be traced back to the 1940s, when Warren McCulloch and Walter Pitts explored the idea of artificial neural networks and proposed the McCulloch–Pitts neuron [53]. In 1986, Geoffrey Hinton popularized the back-propagation algorithm for training multi-layer neural networks [54], which triggered a resurgence of interest in neural networks.

Afterwards, a series of neural networks such as Convolutional Neural Networks (CNN) [55] and Recurrent Neural Networks (RNN) [56] were proposed. CNNs [55] use convolutional layers to extract features from images and are widely used for image classification, object detection, face recognition, etc. RNNs [56] employ recurrent layers to process sequential data such as text, speech, and video, and can capture dependencies and context in the data. Variants of the RNN such as the Long Short-Term Memory network (LSTM) [57] and the Gated Recurrent Unit (GRU) [58] have been developed to overcome the problem of vanishing gradients and learn long-range dependencies in sequential data; they are widely used for natural language processing, speech recognition, etc.

Identification of predictable time range m (subtask 1)

Phase space reconstruction

We project the user’s original stress sequence \(S=(s_1,s_2,\ldots , s_n)\) into a high-dimensional phase space:

$$X(\tau ,d) = (X_1,X_2,\ldots ,X_{n-(d-1)\tau })$$

(1)

where \(X_k=(s_k, s_{k+\tau },\ldots ,s_{k+(d-1)\tau }) \in \mathbb {R}^{d}\) for \(k=1,2,\ldots ,n-(d-1)\tau\), \(\tau\) is the time delay determining the distance between the two successive components, \(s_k\) and \(s_{k+\tau }\), in the phase space, and d is the embedding dimension of the phase space.
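For illustration, the delay embedding of Eq. (1) can be written as a short NumPy function (a minimal sketch; the function name and array layout are our own choices, and later sketches reuse it):

```python
import numpy as np

def delay_embed(s, tau, d):
    """Reconstruct the phase space of a scalar sequence s (Eq. (1)).

    Returns an array of shape (n - (d-1)*tau, d) whose k-th row is
    X_k = (s_k, s_{k+tau}, ..., s_{k+(d-1)tau}).
    """
    s = np.asarray(s, dtype=float)
    n = len(s)
    m = n - (d - 1) * tau              # number of phase-space points
    if m <= 0:
        raise ValueError("sequence too short for this (tau, d)")
    return np.stack([s[i * tau: i * tau + m] for i in range(d)], axis=1)
```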

The setting of \(\tau\) is to maximize the knowledge about \(s_{k+\tau }\) gained from \(s_k\) and minimize the redundancy between \(s_{k+\tau }\) and \(s_k\). [46] presented a way to set \(\tau\) by computing and minimizing the mutual information M between \((s_1,s_2,\ldots ,s_{n-\tau })\) and \((s_{1+\tau },s_{2+\tau },\ldots ,s_{n})\). The smaller the mutual information, the weaker the dependence between the two variables.

$$M = \sum _{i=1}^{n-\tau }~\sum _{j=1+\tau }^{n}P(s_i,s_j) \cdot \log _2 P(s_i,s_j) - \sum _{i=1}^{n-\tau }P(s_i) \cdot \log _2 P(s_i) - \sum _{j=1+\tau }^{n}P(s_j) \cdot \log _2 P(s_j)$$

(2)

where \(P(s_i)\) and \(P(s_j)\) are the probabilities of \(s_i\) and \(s_j\) in \((s_1,s_2,\ldots ,s_{n-\tau })\) and \((s_{1+\tau },s_{2+\tau },\ldots ,s_{n})\), respectively, and \(P(s_i,s_j)\) is the joint probability of \(s_i\) and \(s_j\). The value of \(\tau\) at which M first drops to a local minimum is taken as the optimal time delay.
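This selection procedure can be sketched as follows, assuming the stress levels are discrete so that all probabilities can be estimated by simple counting (the helper names and the search range max_tau are our assumptions; the compact form p·log2(p/(px·py)) used below is algebraically equal to Eq. (2)):

```python
import numpy as np
from collections import Counter

def mutual_information(s, tau):
    """Estimate M of Eq. (2) between (s_1..s_{n-tau}) and (s_{1+tau}..s_n)."""
    x, y = list(s[:-tau]), list(s[tau:])
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    m = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n                   # joint probability P(s_i, s_j)
        m += p_ab * np.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return m

def optimal_delay(s, max_tau=20):
    """Return the first tau at which M reaches a local minimum."""
    mi = [mutual_information(s, t) for t in range(1, max_tau + 1)]
    for t in range(1, len(mi) - 1):
        if mi[t] < mi[t - 1] and mi[t] <= mi[t + 1]:
            return t + 1               # taus are 1-indexed
    return int(np.argmin(mi)) + 1      # fall back to the global minimum
```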

We adopted the method proposed in [43] to determine the minimum embedding dimension d. The basic idea is that, since a chaotic sequence is the projection of a high-dimensional chaotic system onto a one-dimensional space, some non-adjacent points in the high-dimensional space become adjacent after the projection, forming false nearest neighbors. As the embedding dimension gradually increases, the false nearest neighbors gradually disappear; when their number drops to 0, a suitable embedding dimension has been found. Let \(X_k(d)=(s_k, s_{k+\tau },\ldots ,s_{k+(d-1)\tau })\) be a vector in the phase space \(X(\tau ,d)\). In this d-dimensional space, each vector \(X_k(d)\) has a nearest neighbor \(X_{n(k,d)}(d)\), where \(n(k,d)\in \{1,2,\ldots ,n-(d-1)\tau \}\) and \(n(k,d)\ne k\). The distance between \(X_k(d)\) and \(X_{n(k,d)}(d)\), denoted as \(R_k(d)\), can be computed as:

$$R_k(d) = ||~ X_k(d) - X_{n(k,d)}(d) ~|| = \sqrt{\sum _{i=0}^{d-1}|s_{k+i\tau }-s_{n(k,d)+i\tau }|^2}$$

(3)

where \(||\cdot ||\) denotes the Euclidean norm.

When the dimension of the phase space increases from d to \(d+1\), the distance between the two points changes to:

$$R_k(d+1) = ||~X_k(d+1)-X_{n(k,d)}(d+1)~||$$

(4)

Whenever \(R_k(d+1)\) is much larger than \(R_k(d)\), the two nearest neighbors can be regarded as false nearest neighbors. The distance ratio r(k, d) of the nearest neighbors in the d-dimensional space and (\(d+1\))-dimensional space can be computed as:

$$r(k,d) = \frac{R_k(d+1)}{R_k(d)} = \frac{||X_k(d+1)-X_{n(k,d)}(d+1)||}{||X_k(d)-X_{n(k,d)}(d)||}$$

(5)

When r(k, d) is larger than a threshold, \(X_k(d)\) and \(X_{n(k,d)}(d)\) are regarded as false nearest neighbors. We calculate the average distance ratio of all nearest-neighbor pairs in the d-dimensional and (\(d+1\))-dimensional spaces by:

$$E(d) = \frac{1}{n-d\tau }\sum \limits _{k=1}^{n-d\tau }r(k,d)$$

(6)

As d increases and the false nearest neighbors gradually disappear, the changes of r(k, d) and of E(d) tend to become stable. To measure the variation from d to \(d+1\), we define:

$$EV(d) = \frac{E(d+1)}{E(d)}$$

(7)

When EV(d) stops changing at a certain value \(d_0\), it means that the change of E(d) becomes stable, and the number of false nearest neighbors tends to 0. In this case, the result \(d_0\) is the minimum embedding dimension value we are looking for.
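Putting Eqs. (3)-(7) together, the minimum embedding dimension can be estimated roughly as follows (an illustrative sketch reusing delay_embed from above; the stabilization tolerance eps and the guard against zero distances are our assumptions):

```python
import numpy as np

def embedding_dimension(s, tau, d_max=10, eps=0.01):
    """Estimate the minimum embedding dimension d_0 via Eqs. (3)-(7)."""
    def E(d):                              # Eq. (6): mean distance ratio
        X = delay_embed(s, tau, d)         # points in the d-dim space
        X1 = delay_embed(s, tau, d + 1)    # the same points in d+1 dims
        m = len(X1)                        # only k that survive in d+1 dims
        ratios = []
        for k in range(m):
            dists = np.linalg.norm(X[:m] - X[k], axis=1)
            dists[k] = np.inf              # exclude the point itself
            nn = int(np.argmin(dists))     # nearest neighbor n(k, d)
            ratios.append(np.linalg.norm(X1[k] - X1[nn])
                          / max(dists[nn], 1e-12))   # Eq. (5)
        return float(np.mean(ratios))
    ev_prev = None
    for d in range(1, d_max):
        ev = E(d + 1) / E(d)               # Eq. (7): EV(d)
        if ev_prev is not None and abs(ev - ev_prev) < eps:
            return d                       # EV(d) has stopped changing
        ev_prev = ev
    return d_max
```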

Largest Lyapunov exponent

The largest Lyapunov exponent is an important factor for judging whether a sequence is chaotic. It indicates the average exponential divergence rate of adjacent trajectories in the phase space. A positive largest Lyapunov exponent means that no matter how small the distance between two trajectories in the initial state, the distance between them will increase exponentially over time. This is one of the most typical features of a chaotic system, so if the value of the largest Lyapunov exponent is positive, the presence of chaos can be determined.

According to [32], the predictable time range of a chaotic system can be defined as the length of time before small differences in the initial state of the system begin to grow exponentially, which is the reciprocal of the largest Lyapunov exponent \(l_{max}\) of the stress sequence. The predictable time range m can thus be inferred as:

$$m = \frac{1}{l_{max}} = \frac{1}{D'(q)}$$

(8)

where \(l_{max}\) is the slope of the function D(q), i.e., the derivative of D(q) [52]:

$$D(q) = \frac{1}{n-(d-1)\tau -q}\sum \limits _{k=1}^{n-(d-1)\tau -q}\ln ~||X_{k+q}-X_{n(k)+q}||$$

(9)

Here, \(X_{n(k)}\) is the nearest neighbor of \(X_k\).
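The computation of \(l_{max}\) and m can be sketched as follows, assuming unit (daily) time steps and a least-squares fit of the slope of D(q) over q = 1..q_max (the value of q_max is our assumption):

```python
import numpy as np

def largest_lyapunov(X, q_max=10):
    """Estimate l_max as the slope of D(q) (Eq. (9)) by least squares."""
    M = len(X)
    nn = []                                    # nearest neighbor n(k) of X_k
    for k in range(M):
        dists = np.linalg.norm(X - X[k], axis=1)
        dists[k] = np.inf
        nn.append(int(np.argmin(dists)))
    qs, D = [], []
    for q in range(1, q_max + 1):
        logs = []
        for k in range(M - q):
            if nn[k] + q < M:
                sep = np.linalg.norm(X[k + q] - X[nn[k] + q])
                if sep > 0:                    # skip coincident points
                    logs.append(np.log(sep))
        if logs:
            qs.append(q)
            D.append(np.mean(logs))            # Eq. (9): D(q)
    return np.polyfit(qs, D, 1)[0]             # slope of D(q), i.e. l_max

# Eq. (8): for a positive l_max, the predictable time range is
# m = int(1 / l_max)
```

A positive fitted slope indicates chaos, and its reciprocal bounds the horizon within which day-level predictions remain reliable.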

Prediction of stress levels on the next m days (subtask 2)

Fig. 1 Chaos and deep learning based stress prediction framework

Figure 1 shows our chaos and deep learning based stress prediction framework. The stress sequence \(S=(s_1,s_2,\ldots ,s_n)\) was first embedded, with its chaotic dynamic patterns, into a d-dimensional phase space \(X(\tau ,d)=(X_1,X_2,\ldots ,X_{n-(d-1)\tau })\), where \(X_k=(s_k,s_{k+\tau },\ldots ,s_{k+(d-1)\tau })\) for \(k=1,2,\ldots ,n-(d-1)\tau\). To ensure that our model can handle variable-length input sequences in long-term prediction, we applied two forms of zero-padding to the input sequence: we added zeros before and after the input sequence, respectively, to bring it to the maximum input length (44 in this paper), and fed both padded sequences into the model simultaneously for training. We enforced dimension attention upon each d-dimensional vector in \(X(\tau ,d)\), and then fed the dimension-attended \(X_1,X_2,\ldots ,X_{n-(d-1)\tau }\) into respective LSTMs chained with temporal attention to learn the stress sequence representation.
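The two padding variants can be produced by a trivial helper (our own illustration; the maximum length 44 comes from the text above):

```python
def pad_two_ways(s, max_len=44):
    """Return front- and back-zero-padded copies of s, each of length max_len."""
    assert len(s) <= max_len, "sequence already exceeds the maximum length"
    pad = [0] * (max_len - len(s))
    return pad + list(s), list(s) + pad
```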

Dimension attention. The dimension attention DA can be computed as:

$$DA = tanh((X(\tau ,d))^T\times W_{da}+b_{da})$$

(10)

where \(DA \in \mathbb {R}^{d\times 1}\), and \(W_{da}\in \mathbb {R}^{(n-(d-1)\tau )\times 1}, b_{da} \in \mathbb {R}^{d\times 1} \) are trainable parameters.

With dimension attention, we can get the dimension-attended \(X_1,X_2,\ldots ,X_{n-(d-1)\tau }\), denoted as \(X^{DA}(\tau ,d)\):

$$X^{DA}(\tau ,d) = X(\tau ,d) \times DA$$

(11)
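A PyTorch sketch of Eqs. (10)-(11), under our reading that DA holds one weight per embedding dimension and that the product in Eq. (11) applies these weights to every phase-space point (shapes and initialization are our assumptions):

```python
import torch
import torch.nn as nn

class DimensionAttention(nn.Module):
    """Eqs. (10)-(11): attend over the d embedding dimensions of X(tau, d)."""
    def __init__(self, n_points: int, d: int):
        super().__init__()
        self.W_da = nn.Parameter(torch.randn(n_points, 1) * 0.1)
        self.b_da = nn.Parameter(torch.zeros(d, 1))

    def forward(self, X):                   # X: (batch, n_points, d)
        # Eq. (10): DA = tanh(X^T W_da + b_da), one weight per dimension
        DA = torch.tanh(X.transpose(1, 2) @ self.W_da + self.b_da)  # (batch, d, 1)
        # Eq. (11): weight every point's dimensions by DA
        return X * DA.transpose(1, 2)       # (batch, n_points, d)
```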

Then \(X^{DA}(\tau ,d) = (X^{DA}_1,X^{DA}_2,\ldots ,X^{DA}_{n-(d-1)\tau })\) is fed into the LSTM:

$$h_k = LSTM(h_{k-1},X^{DA}_{k})$$

(12)

Temporal attention. The temporal attention TA can be computed as:

$$TA = Softmax(H\times W_{ta}+b_{ta})$$

(13)

where \(H=(h_1, h_2, \ldots , h_{n-(d-1)\tau })\), \(TA \in \mathbb {R}^{(n-(d-1)\tau )\times 1}\), and \(W_{ta}\in \mathbb {R}^{hidden\_size\times 1}, b_{ta} \in \mathbb {R}^{(n-(d-1)\tau )\times 1} \) are trainable parameters; \(hidden\_size\) is the size of the hidden state of the LSTM (in this study, \(hidden\_size=8\)).

With the temporal attention TA, we can get the overall information I of the input stress level sequence \(X(\tau ,d)\):

$$I = H\times TA$$

(14)

Through a final fully connected layer and Softmax, the prediction of the stress level on the (\(n+1\))-th day can be made:

$$\hat{s}_{n+1} = Softmax(I\times W_s + b_s)$$

(15)

where \(\hat{s}_{n+1}\) represents the probabilities of the user being under the different stress levels, \(W_s \in \mathbb {R}^{hidden\_size\times num\_class}\) and \(b_s \in \mathbb {R}^{num\_class}\) are trainable parameters, and \(num\_class\) is the number of stress levels (in this study, \(num\_class = 2\) or 3).
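Eqs. (12)-(15) can then be assembled into a minimal encoder-predictor sketch (reusing DimensionAttention from above; the batching details and the scalar bias in the temporal attention are simplifications of ours):

```python
class StressPredictor(nn.Module):
    """Eqs. (12)-(15): LSTM + temporal attention + softmax output."""
    def __init__(self, n_points: int, d: int, hidden_size: int = 8,
                 num_class: int = 3):
        super().__init__()
        self.dim_att = DimensionAttention(n_points, d)
        self.lstm = nn.LSTM(input_size=d, hidden_size=hidden_size,
                            batch_first=True)
        self.W_ta = nn.Linear(hidden_size, 1)         # Eq. (13)
        self.out = nn.Linear(hidden_size, num_class)  # Eq. (15)

    def forward(self, X):                      # X: (batch, n_points, d)
        X_da = self.dim_att(X)                 # Eqs. (10)-(11)
        H, _ = self.lstm(X_da)                 # Eq. (12): (batch, n_points, hidden)
        TA = torch.softmax(self.W_ta(H), dim=1)       # Eq. (13)
        I = (H * TA).sum(dim=1)                # Eq. (14): (batch, hidden)
        return torch.softmax(self.out(I), dim=-1)     # Eq. (15)
```

In practice one would train on the pre-softmax logits with a cross-entropy loss; the softmax is kept here only to mirror Eq. (15).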

With the predicted stress level \(\hat{s}_{n+1}\), we can then form a longer stress sequence \(S=(s_1,s_2,\ldots ,s_n,\hat{s}_{n+1})\) to predict the stress level \(\hat{s}_{n+2}\) on the (\(n+2\))-th day. This process repeats until the stress levels \((\hat{s}_{n+1},\hat{s}_{n+2},\ldots , \hat{s}_{n+m})\) on the next m days have been predicted.
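The iterative m-day forecast is then a loop that feeds each prediction back into the sequence, pads it to the fixed length, and re-embeds it (a sketch with our own naming, reusing delay_embed from above; front padding only, for brevity):

```python
def predict_m_days(model, s, tau, d, m, max_len=44):
    """Predict the next m stress levels by rolling the model forward."""
    s, preds = list(s), []
    model.eval()
    with torch.no_grad():
        for _ in range(m):
            padded = [0] * max(0, max_len - len(s)) + s[-max_len:]
            X = torch.tensor(delay_embed(padded, tau, d),
                             dtype=torch.float32).unsqueeze(0)  # (1, N, d)
            probs = model(X)                      # Eq. (15)
            s_next = int(probs.argmax(dim=-1))    # most likely stress level
            preds.append(s_next)
            s.append(s_next)                      # extend S with the prediction
    return preds
```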
