Time Series Analysis Basic Plots, Examples, and Fundamental Concepts
Introduction
You will study the basic context of time series and stochastic processes including basic visualisations for time series data.
Time series data is composed of observations collected sequentially over time. This type of data is frequently seen and analysed in data analytics applications.
In business, we observe
 weekly interest rates,
 daily closing stock prices,
 monthly price indices,
 yearly sales figures, and so forth.
In meteorology, we observe
 daily high and low temperatures,
 annual precipitation and drought indices, and
 hourly wind speeds.
In agriculture, we record
 annual figures for crop and livestock production,
 soil erosion, and export sales.
In the biological sciences, we observe the electrical activity of the heart at millisecond intervals.
In ecology, we record the abundance of an animal species.
As a recent example from time series research, Miwa Fukino, Yoshito Hirata, and Kazuyuki Aihara from University of Tokyo proposed a new method based on nonlinear time series analysis analyse music’s emotional effect.
In this module,
we will start with examples of time series and visualise time series data with the most frequently used plots. Introduce the modelbuilding strategy that we will follow through the semester. We will focus on the fundamental concepts in terms of Time Series and Stochastic Processes, Means, Variances, and Covariances, and Stationarity.
Examples of time series
In this section, we will see some examples of time series and basic visualisations to make inferences about the basic characteristics of time series data. In all of the examples, we use the R package TSA. Therefore you need to install the TSA packages calling


Annual Rainfall in Los Angeles
In this example, the data is composed of the annual rainfall amounts recorded in Los Angeles, California, over more than 100 years. The time series plot for this data is generated with the following code chunk. Here, the dataset larain contains the annual rainfall data. Please note that throughout the codes given in this module, you can also use win.graph() function to open a new graphics window with the settings determined by the arguments of this function.


In Figure 1, we observe considerable variation in rainfall amount over the years from this time series plot. The year 1883 was an exceptionally wet year for Los Angeles, while 1983 was quite dry. For the analysis, we are interested in whether or not consecutive years are related in some way. If so, we might be able to use one year’s rainfall value to help forecast next year’s rainfall amount. The following code chunk generates a scatter plot to investigate the relationship between pairs of consecutive rainfall values:


The plots show no “trends” and no general tendencies. In Figure 2, the point plotted near the lower righthand corner shows that the year of extremely high rainfall, 40 inches in 1883, was followed by an amount nearly 12 inches in 1884. The point near the top of the display shows that the 40inch year was preceded by a much more typical year of about 15 inches. We observe from this plot that there is little information about this year’s rainfall amount from last year’s amount. The correlation between last year’s rainfall amount and this year’s amount which can be calculated by the following code chunk:


An Industrial Chemical Process
The second example is on a a time series from an industrial chemical process. A color property from consecutive batches in the process is measured here and named color in the package. We plot the time series with the following code chunk:


From Figure 3, it seems that succeeding measurements are related to one another as the values that are neighbors in time tend to be similar in size. This is more clearly seen in the following scatter plot of neighboring pairs:


In Figure 4, we see a slight upward trend, which is apparent but not terribly strong. The correlation in this scatter plot is about 0.6:


Annual Abundance of Canadian Hare
The third example concerns the annual abundance of Canadian hare. The data consists of their abundance over about 30 years. The following is the time series plot of annual abundance of Canadian hare for 30 years:


In Figure 5, neighboring values here are very closely related. Large changes in abundance do not occur from one year to the next. The following plot shows the correlation between abundance versus the previous year’s abundance:


In Figure 6, we see an upward trend in the plotlow values tend to be followed by low values in the next year, middlesized values by middlesized values, and high values by high values. The amount of correlation between neighboring abundance values is 0.70:


Monthly Average Temperatures in Dubuque, Iowa
The average monthly temperatures (in degrees Fahrenheit) over a number of years recorded in Dubuque, Iowa, are shown in the following plot:


Figure 7 displays seasonality. Seasonality for monthly values occurs when observations twelve months apart are related in some manner or another. All Januarys and Februarys are quite cold but they are similar in value and different from the temperatures of the warmer months of June, July, and August, for example. There is still variation among the January values and variation among the June values. Models for such series must accommodate this variation while preserving the similarities. Here the reason for the seasonality is well understoodthe Northern Hemisphere’s changing inclination toward the sun.
The following shows the scatter plot of neighboring temperature measurements.


As expected, we observe a high correlation between temperatures of succeeding months from Figure 8. However, it is impossible to observe seasonality for this scatter plot.


Monthly Oil Filter Sales
The last example is on the monthly sales to dealers of a specialty oil filter for construction equipment manufactured by John Deere. According to the authors of our textbook, when these data were first presented to one of the authors, the manager said, “There is no reason to believe that these sales are seasonal.” Seasonality would be present if January values tended to be related to other January values, February values tended to be related to other February values, and so forth. The time series plot in Figure 9 is not designed to display seasonality especially well.


To exhibit the seasonality, we can build a more informative plot labeling the months with the first letters of their names. In this version, all January values are plotted with the character J, all Februarys with F, all Marches with M, and so forth. With these plotting symbols, it is much easier to see that sales for the winter months of January and February all tend to be high, while sales in September, October, November, and December are generally quite low. The seasonality in the data is much easier to see from the modified time series plot in Figure 10.


Here, the command time(oilfilters) puts continuous time points on each measurement such that
The command as.vector(season(oilfilters)) creates a vector puts months on each measurement such that
Then, the points() function writes the first letter of each month over the time series plot generated by plot() function.
Figure 11 shows the scatter plot of neighboring monthly sales of oil filters.


And the amount of correlation between sales of succeeding months is


Although we observe a seasonal effect, autocorrelation between monthly sales is slightly correlated. So, we should note that having seasonality in time series data does not imply a significant correlation.
A ModelBuilding Strategy
It is not a straightforward task to find appropriate models for time series data. There are several components that a data scientist should deal with. Throughout the course we will follow a multistage modelbuilding strategy with the following three steps:
 model specification, (or identification)
 model fitting, and
 model diagnostics.
In model specification (or identification), the classes of time series models are selected that may be appropriate for a given observed series. In choosing a model, we shall attempt to adhere to the principle of parsimony; that is, the model used should require the smallest number of parameters that will adequately represent the time series.
Albert Einstein is quoted in Parzen (1982, p. 68) as remarking that “everything should be made as simple as possible but not simpler.”
The model fitting consists of finding the best possible estimates of a number of parameters involved in the model using a statistical estimation method such as least squares, maximum likelihood estimation, or Bayesian estimation.
Model diagnostics is concerned with assessing the quality of the model that we have specified and estimated. We try to answer the questions:
How well does the model fit the data? Are the assumptions of the model reasonably well satisfied? If no inadequacies are found, the modeling may be assumed to be complete, and the model may be used, for example, to forecast future values. Otherwise, we choose another model in the light of the inadequacies found; that is, we return to the model specification step. In this way, we cycle through the three steps until an acceptable model is found.
Time Series and Stochastic Processes
The sequence of random variables \({Yt:t=0,±1,±2,±3,…}\) is called a stochastic process and serves as a model for an observed time series. It is known that the complete probabilistic structure of such a process is determined by the set of distributions of all finite collections of the Y’s. Much of the information in these joint distributions can be described in terms of means, variances, and covariances. So, we will focus on the measures mostly based on the first and second moments.
Means, Variances, and Covariances
For a stochastic process {<t=0,±1,±2,±3>,…}, the mean function is defined by
μt=E(Yt), for t=0,±1,±2,±3,…
Actually, μt is just the expected value of the process at time t.
The autocovariance function, γt,s is defined as
γt,s=Cov(Yt,Ys), for t,s=0,±1,±2,±3,…
where Cov(Yt,Ys)=E[(Yt−μt)(Ys−μs)]=E(YtYs)−μtμs.
The autocorrelation function, ρt,s, is given by
ρt,s=Corr(Yt,Ys), for t,s=0,±1,±2,±3,…
where
Corr(Yt,Ys)=Cov(Yt,Ys)Var(Yt)Var(Ys)−−−−−−−−−−−−−√=γt,sγt,tγs,s−−−−−√.
Please note that
γt,t=Cov(Yt,Yt)=E[(Yt−μt)(Yt−μt)]=E[(Yt−μt)2]=E(Y2t)−μ2t=Var(Yt)
and
γs,s=Cov(Ys,Ys)=E[(Ys−μs)(Ys−μs)]=E[(Ys−μs)2]=E(Y2s)−μ2s=Var(Ys).
Consequently, ρt,t=ρs,s=1.
Because γt,s=γs,t, we have ρt,s=ρs,t. And because γt,s≤γt,tγs,s−−−−−√, we have ρt,s≤1.
Values of ρt,s near ±1 indicate strong (linear) dependence, whereas values near zero indicate weak (linear) dependence. If ρt,s=0, we say that Yt and Ys are uncorrelated.
Because most of the context is based on the covariance and correlations, we will go a little bit further. To investigate the covariance properties of various time series models, the following result will be used repeatedly: If c1,c2,…,cm and d1,d2,…,dn are constants and t1,t2,…,tm and s1,s2,…,sn are time points, then the contrivance of two linear combinations of time series observations is calculated by
Cov[∑i=1mciYti,∑j=1ndjYsj]=∑i=1m∑j=1ncidjCov(Yti,Ysj).
As a special case, we obtain the variance of a linear combination of time series observations as
Var[∑i=1mciYti]=∑i=1nc2iVar(Yti)+2∑i<jcicjCov(Yti,Ytj).
The Random Walk
Let e1,e2,… be a sequence of independent, identically distributed random variables each with zero mean and variance σ2e. The observed time series, {<t=1,2>,…}, is constructed as follows:
Y1Y2Yt==⋮=e1e1+e2e1+e2+⋯+et
Apparently, this system of equations can be written as
Yt=Yt−1+et
with initial condition Y1=e1.
If the e’s are interpreted as the sizes of the “steps” taken (forward or backward) along a number line, then Yt is the position of the random walker at time t.
We obtain the mean and variance of the random walk process as follows:
μt=E(Yt)=E(e1+e2+⋯+et)=E(e1)+E(e2)+⋯+E(et)=0+0+⋯+0.
So, μt=0 for all t. As for the variance,
Var(Yt)=Var(e1+e2+⋯+et)=Var(e1)+Var(e2)+⋯+Var(et)=σ2e+σ2e+⋯+σ2e.
So we have a linearly increasing variance Var(Yt)=tσ2e with time.
For 1≤t≤s, we have the following autocorrelation function,
γt,s=tσ2e.
Autocorrelation function is straightforwardly obtained as follows:
ρt,s=ts−−√
for 1≤t≤s. For example, we have the following autocorrelation values ρ1,2=0.707, ρ8,9=0.943, ρ24,25=0.980, and ρ1,25=0.200. The neighboring time points are more correlated than those distant from each other.
Figure 12 shows a simulated random walk:


A Moving Average
Suppose that {Yt} is constructed as
Yt=et+et−12
where e’s are i.i.d. with zero mean and variance σ2e. Consequently, we have the following:
μt=0Var(Yt)=0.5σ2eCov(Yt,Yt−1)=0.25σ2e and Cov(Yt,Yt−k)=0 for k>1, and ρt,s=0.5 and ρt,t−k=0 for k>1.
Values of Y precisely onetime unit apart have exactly the same correlation no matter where they occur in time and, more generally, ρt,t−k is the same for all values of t. This leads us to the important concept of stationarity.
The following code chunk generates a random realisation of a moving average series in Figure 13.


Stationarity
The basic idea of stationarity is that the probability laws that govern the behavior of the process do not change over time. the assumption of stationarity allows us to make statistical inferences about the structure of a stochastic process on the basis of an observed record of that process. Specifically, a process {Yt} is said to be strictly stationary if the joint distribution of Yt1,Yt2,…,Ytn is the same as the joint distribution of Yt1−k,Yt2−k,…,Ytn−k for all choices of time points t1,t2,…,tn and all choices of time lag k.
It then follows that both the mean and variance functions are constant for all time: E(Yt)=E(Yt−k) and Var(Yt)=Var(Yt−k) for all t and k.
The covariance between Yt and Ys depends on time only through the time difference t−s and not otherwise on the actual times t and s.
Thus, for a stationary process, to simplify our notation, we can write
γk=Cov(Yt,Yt−k) and ρk=Corr(Yt,Yt−k)=γkγ0.
General properties of a stationary process are
γ0=Var(Yt)γk=γ−kγk≤γ0
and correspondingly,
ρ0=1ρk=ρ−kρk≤1.
If a process is strictly stationary and has a finite variance, then the covariance function must depend only on the time lag. A definition that is similar to that of strict stationarity but is mathematically weaker is the following: A stochastic process {Yt} is said to be weakly (or secondorder) stationary if
The mean function is constant over time, and γt,t−k=γ0,k for all time t and lag k. In this course, the term stationary when used alone will always refer to this weaker form of stationarity.
Figure 14 and 15 show a stationary and a nonstationary randomly generated series. Which one is for the nonstationary series?
White Noise
One of the important and mostly used stationary processes is called white noise process. A white noise process is actually a sequence of i.i.d. random variables {et}. many useful processes can be constructed from white noise.
For a white noise process, μt=E(et) is constant and γk=Var(et) for k=0 and γk=0 for k≠0. Thus, we can write ρk=1 for k=0 and ρk=0 for k≠0. We usually assume that the white noise process has mean zero and denote Var(et) by σ2e.
The moving average also defines a stationary process. For a moving average process, Yt=(et+et−1)/2, we have ρk=1 for k=0, ρk=0.5 for k=1, and ρk=0 for k≥2.
Figure 16 shows an example of a white noise series:


Random Cosine Wave
Let us define the following process:
Yt=cos[2π(t12+Φ)] for t=0,±1,±2,…
where Φ∼Uniform(0,1). Figure 17 displays the time series plot of this process:


Statistical properties of this process are
E(Yt)=∫10cos[2π(t12+ϕ)]f(ϕ)dϕ=∫10cos[2π(t12+ϕ)]⋅1dϕ=12πsin(2πt12+ϕ)∣∣∣10=12π[sin(2πt12+2π)−sin(2πt12)]=0
for all t. Also,
γt,s=12cos[2π(t−s12)].
So the process is stationary with the autocorrelation function
ρk=cos(2πk12) for t=0,±1,±2,…
With this example, we observe that it will be difficult to assess whether or not stationarity is a reasonable assumption for a given time series on the basis of the time sequence plot of the observed data.
As another example, the random walk constructed with Yt=e1+e2+⋯+et is not stationary due to nonconstant variance function Var(Yt=tσ2e also its covariance function γt,s=tσ2e does not depend only on time lag for 0≤t<s.
Instead of going on with Yt directly, we can consider the differences of successive Yvalues, denoted ΔYt. Then $Y_{t} = Y_{t}Y_{t−1}=e_{t}, so the differenced series, {ΔYt}, is stationary.
Many real time series cannot be reasonably modeled by stationary processes since they are not in statistical equilibrium but are evolving over time. However, we can frequently transform non stationary series into stationary series by simple techniques such as differencing.
Summary
In this module, we introduced a special data type called time series data. We use simple visualisations to plot time series data for descriptive purposes. Then we focused on the fundamentals of time series analysis such as mean function, autocovariance function, and autocorrelation function. We illustrated these concepts with the basic processes: the random walk, white noise, a simple moving average, and a random cosine wave. Finally, the fundamental concept of stationarity is introduced and the idea of differentiating time series to ensure stationarity is highlighted.