Analysis of Trends
Introduction
We will study trend models for the analysis of time series data. Trend models that we will cover include linear, quadratic and harmonic trend models and those account for seasonality. We will observe that while the trend models are very good at capturing the trend in time series data, their performance is poor on capturing serial correlation in time series data. We will have handson tasks to deepen your understanding of trend models and improve your skillsets for the implementation of time series analysis methods.
The trend in time series is closely related to the mean function of the series. Changes in mean over time create a trend in the series. In general, the mean function is an arbitrary function of time. We will consider relatively simple functions of time to model the trend in time series.
In this module, we will study
 the deterministic and stochastic trend,
 modeling deterministic trends,
 estimation of constant mean,
 regression approach to model the trend,
 analysis of residuals after modelling the trend.
learning objectives
This week will contribute to Course Learning Objectives:
 Present time series in an informative way, both graphically and with summary statistics
 Develop stationary and nonstationary, and seasonal and nonseasonal time series models
Deterministic Versus Stochastic Trends
One of the challenges in time series analysis is that the same time series may be viewed quite differently by different analysts. For example, one can foresee a trend in a simulated random walk with a constant mean for all time. The perceived trend is a result of the strong positive correlation between the series values at nearby time points and the increasing variance in the process as time goes by. Therefore, one can see different trends in the next simulations. This type of trend is called stochastic trend.
In the average monthly temperatures example of the first module, we got the following time series plot in Figure 1:
Here we have a cyclical or seasonal trend, but here the reason for the trend is clear that the Northern Hemisphere’s changing inclination toward the sun. We can model this trend by Yt=Xt+μt, where μt is a deterministic function that is periodic with period 12 and it should satisfy μt=μt−12 for all t. We can assume that Xt represent an unobserved variation around μt and has zero mean for all t. So, this model assumes that μt is the mean function for the observed series Yt. Because the mean function is determined beforehand and we can set the “functions form” of a trend, the trend considered here is a deterministic trend. It is possible to set a linear mean function such that μt=β0+β1t or a quadratic time trend such as μt=β0+β1t+β2t2.
Estimation of a Constant Mean
When we consider a constant mean over time, we set μt=μ, for all t. So, our model is written as
Our aim is to estimate the value of μ using the observed series Y1,Y2,…,Yn. The straightforward estimate of μ is the sample mean calculated as
Here sample mean is an unbiased estimator of the constant mean. To investigate its efficiency, we need to find the variance of the sample mean. Suppose that {Yt} is a stationary time series with autocorrelation function ρk. Then, the variance of the sample mean is obtained as
Note that if the series {Yt} is just white noise then ρk=0 for k>0; and hence, Var(Y¯ reduces to simply γ0/n, which is the population variance divided by the sample size.
Instead of constant mean, we can set a moving average model such that Yt=et−1/2et−1, which is also stationary. Then, we find that ρ1=−0.4, which means that we have a negative correlation at lag 1, and ρk=0 for k>1. In this case, we have
For a large n, the correction factor (n−1)/n will approach to 1. Thus, we get
So, the variance of the estimator of μ for the moving average model is less than that of for the constant mean model: 0.2(γ0/n)<γ0/n. The reason for getting a more efficient estimator with a moving average model is that in the moving average model, it is possible for the series to oscillate back and forth across the mean. On the other hand, if ρk≥0 for all k≥1, Var(Y¯) will be larger than γ0/n.
For many stationary processes, the autocorrelation function decays quickly enough with increasing lags. under this assumption and given a large sample, we obtain the following approximation:
Here, negative correlations and large sample size both increase the efficiency of the estimator.
We should note that the precision of the sample mean as an estimator of μ can be strikingly different for a nonstationary process with a constant mean. For example, for the random walk process defined in Module 1, we find the following:
Notice that in this special case the variance of our estimate of the mean actually increases as the sample size n increases. Because this is unacceptable, we need to consider other estimation techniques for nonstationary series.
Regression Approach
Classical regression analysis can be used to model nonconstant mean trend. We will consider linear, quadratic, seasonal means, and cosine trends.
Linear and Quadratic Trends in Time
The deterministic linear trend model is expressed as follows:
μt=β0+β1t
where β0 represents intercept and β1 corresponds to the slope of the linear trend. Suppose β^0 and β^1 are the classical least squares estimates of β0 and β1, respectively. Then, β^0 and β^1 are obtained as follows:
where t=(n+1)/2 is the average of integers 1,2,…,n.
Consider the simulated random walk process in Figure 2:




Estimates of slope and intercept are β^1=0.1341 and β^0=−1.008, respectively. Here slope is statistically significant at 5% significance level. The trend line is plotted over the time series in Figure 3:


Appropriateness of this linear trend model will be considered later.
The deterministic quadratic trend model is expressed as follows
μt=β0+β1t+β2t2
where β0 represents intercept, β1 corresponds to the linear trend, and β2 corresponds to quadratic trend in time.
The following code chunk fits a quadratic trend model to the random walk data:




Fitted quadratic trend is shown in Figure 4:


Cyclical or Seasonal Trends
Consider now modeling and estimating seasonal trends, such as for the average monthly temperature data in Figure 5.
Here we assume that the observed series can be represented as
Yt=μt+Xt
where E(Xt)=0 for all t. The most general assumption for μt with monthly seasonal data is that there are 12 parameters, β1,β2,…,β12, giving the expected average temperature for each of the 12 months. To represent seasonality, we may write a seasonal model such that
We need to set up indicator variables (sometimes called dummy variables) that indicate the month to which each of the data points pertains before going on with estimation of parameters. We can also include an intercept term β0 in the model.




All of the parameters corresponding to months are statistically significant at 5% level. We can include the intercept parameter as follows:




R omits the January coefficient in this case. Notice that when we have the intercept in the model, we interpret resulting parameters as the difference between the first month and the related one. Now the February coefficient is interpreted as the difference between February and January average temperatures, the March coefficient is the difference between March and January average temperatures, and so forth. In this model, all of the differences between January and the other months are statistically significant at 5% level in both models. Notice that the Intercept coefficient plus the February coefficient here equals the February coefficient the model with no intercept parameter.
Cosine Trends
In the seasonal means model, we separate the effect of each month. However, there is nothing about the shape of the seasonal trend in the seasonal means model. We can include the information on the shape of the seasonal trend in the model by assigning a cosine curve as the mean function μt:
μt=βcos(2πft+Φ)
Here, β(>0), f, and Φ are called the amplitude, frequency, and phase of the curve. As t varies, the curve oscillates within [−β,β] interval. Since the curve repeats itself exactly every 1/f time units, 1/f is called the period of the cosine wave. When we set f=1/12, a cosine wave will repeat itself every 12 months. So we say that the period is 12.
For the estimation purposes, we need to make the above cosine trend model linear in terms of its parameters. With the following misinterpretation, we get
βcos(2πft+Φ)=β1cos(2πft)+β2sin(2πft)
where
β=β21+β22−−−−−−√ and Φ=atan(−β2/β1)
and, conversely,
β1=βcos(Φ) and β2=βsin(Φ).
Consequently, we will use cos(2πft) and sin(2πft) to estimate β1 and β2, respectively. The simplest such model for the trend would be expressed as
μt=β0+β1cos(2πft)+β2sin(2πft)
Here the constant term β0 represents a cosine with frequency zero.
In any practical example, we must be careful how we measure time, as our choice of time measurement will affect the values of the frequencies of interest. For example, if we have monthly data but use 1,2,3,… as our time scale, then 1/12 would be the most interesting frequency, with a corresponding period of 12 months. However, if we measure time by year and fractional year, say 1980 for January, 1980.08333 for February of 1980, and so forth, then a frequency of 1 corresponds to an annual or 12month periodicity.
The following code chunk fits a cosine curve at the fundamental frequency to the average monthly temperature series.




The following code chunk plots the fitted curve along with the observed average monthly temperature series in Figure 6.


The cosine trend model fits the data quite well with the exception of most of the January values, where the observations are lower than the model would predict.
Interpreting Regression Output Estimates of regression parameters are obtained under some assumptions on the stochastic component {Xt} of linear trend model. So, some properties of regression output heavily depend on the assumption that Xt is white noise and some other parts depend on approximate normality of Xt.
When we have μt=β0+β1t as the mean function, the unobserved stochastic component Xt can be estimated (predicted) by Yt−μ^t. If Xt has a constant variance, we estimate the standard deviation of Xt, namely γ0−−√, by the residual standard deviation
s=1n−p∑t=1n(Yt−μ^t)2−−−−−−−−−−−−−−−−√
where p is the number of parameters estimated in μt and n−p is the socalled degrees of freedom for s.
The smaller the value of s, the better the fit.
Another measure of goodness of fit of the trend is the coefficient of determination, namely R2. One interpretation of R2 is that it is the square of the sample correlation coefficient between the observed series and the estimated trend. It is also the fraction of the variation in the series that is explained by the estimated trend.
High but not close to 1 values of R2 implies a satisfactory fit.
When we fit the straight line to the random walk data, we get the following output:




According to multiple R2, about 81% of the variation in the random walk series is explained by the linear time trend. The adjusted version of multiple R2 provides an approximately unbiased estimate of true R2.
The standard deviations of the coefficients labeled Std. Error on the output need to be interpreted carefully. They are appropriate only when the usual regression assumption that the stochastic component is white noise. This assumption rarely true for time series data!
If the stochastic component is normally distributed white noise, then the pvalues are given under “Pr(>t)” can be used to test the null hypothesis that the corresponding unknown regression coefficient is zero.
Residual Analysis
The estimator or predictor of unobserved stochastic component {Xt},
X^t=Yt−μ^t
is called residual corresponding to the tth observation.
An estimate is the guess of an unknown parameter and a prediction is an estimate of an unobserved random variable.
If the trend model is reasonably correct, then the residuals should behave roughly like the true stochastic component, and various assumptions about the stochastic component can be assessed by looking at the residuals.
If the stochastic component is white noise, then the residuals should behave roughly like independent (normal) random variables with zero mean and standard deviation of s. We can standardise residuals to make their mean zero.
After computation of residuals or standardised residual, we examine various residual plots. The first plot to examine is the plot of the residuals over time. If the series is seasonal, we can use labels while plotting to identify the seasonality better.
In the first example, We will use the monthly average temperature series which we fitted with seasonal means as our first example to illustrate some of the ideas of residual analysis. The following chunk generates a time series plot for the standardized residuals of the monthly temperature data fitted by seasonal means:


If the stochastic component is white noise and the trend is adequately modeled, we would expect such a plot to suggest a rectangular scatter with no discernible trends whatsoever.
There are striking departures from randomness seen in the plot in Figure 7.
The labels of months are added in Figure 8.


There is no apparent pattern relating to different months of the year in Figure 8.
Next, we look at the standardized residuals versus the corresponding trend estimate, or fitted value in Figure 9. The function rstudent() computes standardised residuals.


As anomaly with this plot small residuals would be associated with small fitted trend values and large residuals with large fitted trend values, or there would be less variation for residuals associated with certain sized fitted trend values or more variation with other fitted trend values. Although there is somewhat more variation for the March residuals and less for November, the plot does not indicate any dramatic patterns that would cause us to doubt the seasonal means model.
Normality of residuals can be checked with a histogram. Figure 10 displays a frequency histogram of the standardized residuals from the seasonal means model for the temperature series.


The plot is somewhat symmetric and tails off at both the high and low ends as a normal distribution does.
Another plot to check normality is the quantilequantile (QQ) plot. Such a plot displays the quantiles of the data versus the theoretical quantiles of a normal distribution.
With normally distributed data, the QQ plot looks approximately like a straight line.
Figure 11 shows the QQ scores (calculated under normal distribution) plot for the standardized residuals from the seasonal means model for the temperature series.


The straightline pattern here supports the assumption of a normally distributed stochastic component in this model.
In addition to visualisations, there are various hypothesis tests that can be used to check the normality assumption of the stochastic component. One of these tests is the ShapiroWilk test that calculates the correlation between the residuals and the corresponding normal quantiles. We apply the ShapiroWilk test to the residuals of temperature series using the following code chunk




We get the pvalue of 0.6954. So we conclude not to reject the null hypothesis that the stochastic component of this model is normally distributed.
Independence in the stochastic component is another assumption to check. The runs test can be applied over the residuals. The runs test applied over the residuals of temperature series leads to a pvalue of 0.216. Thus, we conclude not to reject the null hypothesis stating the independence of the stochastic component in this seasonal means model.
Sample Autocorrelation Function
Sample autocorrelation function (ACF) is a very useful and important tool in the analysis of time series data. We compute the sample correlation between the pairs k units apart in time. However, we modify this slightly, taking into account that we are assuming stationarity, which implies a common mean and variance for the series. With this in mind, we define the sample autocorrelation function, rk, at lag k as
for k=1,2,…. A plot of rk versus lag k is often called a correlogram.
Because we are interested in discovering possible dependence in the stochastic component, the sample autocorrelation function for the standardized residuals is of interest. Figure 12 displays the sample autocorrelation for the standardized residuals from the seasonal means model of the temperature series.


All values are within the horizontal dashed lines, which are placed at ±2/n−−√. According to the ACF plot none of the hypotheses ρk=0 can be rejected at the usual significance levels for k=1,2,…,21. Thus, we infer that the stochastic component of the series is white noise.
As a second example, a time series plot of the standardized residuals arising from fitting a straight line to the random walk time series is shown in Figure 13:


In Figure 13, the residuals “hang together” too much for the white noisethe plot is too smooth. Furthermore, there seems to be more variation in the last third of the series than in the first twothirds. When we plot standardised residuals versus fitted trend line values, we observe a similar effect with larger residuals associated with larger fitted values from Figure 14.


The sample ACF of the standardized residuals is given in Figure 15:


This ACF plot confirms the smoothness of the time series plot as we have correlation values higher than the confidence bound at several lags. This is not what we expect from a white noise process.
As another example, we return to the annual rainfall in Los Angeles for which we found no evidence of dependence in that series and check the normality assumption using the QQ plot in Figure 16.


Because we see a considerable amount of departure from the reference line, we conclude that the normality assumption does not hold for the annual rainfall series in Los Angeles. The ShapiroWilk test also confirms this inference with a pvalue less than 0.05.




Forecasting with regression models
After ensuring that the fitted model is suitable for prediction purposes, we use the model to find forecasts. For time series regression models, this task is simply based on the straightforward use of the fitted regression model. We apply the following steps to find h steps ahead forecasts:
Generate a sequence of time points of lengths h starting from the last observation point. For example, suppose we have a time series of length 10 and h=4. Then the new sequence becomes t=11,12,13,14.
Write each value of the new sequence generated in the previous step in place in the fitted model and calculate forecasts.
We can implement these steps using the predict() function with the fitted model object and the sequence created at step 1 as inputs.
To illustrate, let’s use the fitted linear model for the random walk data to find 5 steps ahead forecasts. The following code chunk does this task:




We can plot these forecasts next to the time series of interest by the following code chunk as in Figure 17:


As another example, the harmonic model fitted to the average monthly temperature series and find forecasts for 7 months ahead.




We plot the forecasts along with the original series with the following code chunk in Figure 18. The meaning of the colors is the same as Figure 17.


Forecasts from the harmonic model successfully follow the repeating pattern in the original series.
Summary
In this module, we focused on describing, modeling, and estimating deterministic trends in time series. The simplest deterministic “trend” is a constantmean function. Regression methods were then pursued to estimate trends that are linear or quadratic in time. Methods for modeling cyclical or seasonal trends came next, and the reliability and efficiency of all of these regression methods were investigated. Finally, we studied residual analysis to investigate the quality of the fitted model. We also introduced the important sample autocorrelation function, which is a very useful and important tool in the analysis of time series.