Time Series

In statistics, a “time series” could really be any sequence of values. The name arises because in the cases of interest the values are usually numbers arising from a sequence of successive measurements.
The theory of “time series analysis” then provides tools for expressing and testing various hypotheses about the nature of the data.

Statistics:
In general what we want to know is whether some statement or hypothesis about the source of the observed data is true, but statistics does not do that for us. Statistics just tells us whether the data we have observed should make us suspect that the statement is false, and just not having a good reason to disbelieve something is not really (by itself) a good reason to believe it.
(The hypothesis we want to test is often called the “null hypothesis”, I guess because it is the idea we are starting with, so not rejecting it means there is no need to change our views.)

Of course, as always in Statistics, there is the problem that when dealing with continuous random variables any particular value has probability zero. So when we want to express the idea that a result is “unlikely” to come from a certain model we are not referring to its actual probability in that model but to something else. In the case of a single real variable what we often look at is the probability in the model of having a value at least as far from the mean as the one observed (this probability is often called a “P-value”), or we just ask whether the P-value is less than some small number c (called the significance level). When the P-value for the observed value is less than c we say that the observation conflicts with the model at significance level c.
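As a minimal sketch of the tail-probability idea above, assume the model is a normal distribution with known mean and standard deviation. The function name, the threshold and the numbers are illustrative choices, not taken from any particular data set.

```python
import math

def two_sided_p_value(observed, mean, std_dev):
    """Probability, under a normal model, of a value at least as far
    from the mean as `observed` (in either direction)."""
    z = abs(observed - mean) / std_dev
    # For a standard normal Z, P(|Z| >= z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))

threshold = 0.05  # the "small number" the P-value is compared with (assumed)
p = two_sided_p_value(observed=12.5, mean=10.0, std_dev=1.0)
conflicts_with_model = p < threshold
print(p, conflicts_with_model)
```

An observation 2.5 standard deviations from the mean gives a P-value a little over 1%, so it conflicts with this model at the 5% threshold.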

Note that rejecting H0 (the null hypothesis) at the 5% level does NOT mean that we can say the probability of H0 being true is just 5%. What it says is just that if H0 is true then the probability of seeing something at least as far from the predicted mean as we did is no more than 5% (or that we will see something that extreme at most one time in twenty). By itself this says nothing about the probability of H0 being true, and in fact such a probability may be a meaningless concept if our situation has not been selected at random from a whole family of situations where some of them have H0 true and others have it false.
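A small simulation may make the meaning of the 5% concrete. This is only a sketch under assumed conditions: every data point is generated with H0 true (a standard normal model), and we count how often a 5% test would reject anyway. The function name, trial count and seed are arbitrary.

```python
import math
import random

def rejection_rate_under_h0(trials=20000, level=0.05, seed=0):
    """Fraction of trials in which a true H0 is rejected at the given level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        z = rng.gauss(0.0, 1.0)               # observation drawn with H0 true
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P-value under H0
        if p < level:
            rejections += 1
    return rejections / trials

print(rejection_rate_under_h0())
```

The printed rate should come out close to 0.05. That is a statement about how often extreme data appear when H0 holds; here H0 is true in every single trial, so the 5% clearly cannot be the probability that H0 is true.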

Of course, measuring how unlikely a value is by its distance from the mean only works for a unimodal distribution, but fortunately the averages of many results tend to be not just unimodal but close to a very specific distribution (called the normal distribution). In a more complex situation, especially if we are not averaging over a large sample of similarly generated values, it may be that a better approach would be to look instead at the model’s probability density in the neighborhood of the observed value. But we’ll see later if that is necessary.

Statistics normally deals with situations involving some uncertainty. In the simplest cases this may arise from measurement error or because our observations involve choosing random elements from a population, but it could also result from some uncontrolled external factors contributing to the quantity we are measuring.

In order to reflect this, typically the null hypothesis has deterministic and stochastic aspects – for example it may be that the

Probability:

When we are uncertain as to whether a proposition is true or false, the idea of how “likely” it is to actually be true is something we can sometimes attempt to quantify with the idea of “probability”. But if I just say “I am 90% certain” of something, that won’t mean anything to the listener unless we have a common understanding of how that number is arrived at. So we need a more precise non-subjective definition. One way of defining the concept of probability in a non-subjective way is to use it only when we agree that a certain finite set of mutually exclusive propositions are equally likely and the proposition of interest can be expressed as a combination of simple statements about which of the equally likely cases are true. (All of the possible scenarios of a card game can be assigned probabilities this way so long as we start with the basic agreement that when any set of cards is properly shuffled all possible cases for the top card are equally likely.)
Sometimes when we don’t have any finite set of equally likely cases we may still have good reason to agree on a starting point other than all cases being equally likely. For example, if a quantity is the result of combining many small independent effects with well defined probability distributions for each of them, then it can be shown that under fairly general assumptions the combined result will have a distribution very close to a standard type (which is often called the “normal” distribution for that very reason). Once we have a context in which there is a well-defined concept of probability, it can be shown that if we have many independent copies of the situation (either concurrently or sequentially in time) then the proportion of cases in which a proposition is true will have a very high probability of being close to the probability of that proposition in a single copy of the situation. This gives us a way of estimating the probability of a complex event by observation rather than calculation, but conceptually this is a bit circular since we can never be sure of any bound on how far off our estimate is – only that it has a low probability of being far from the “true” value.
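The claim that observed proportions tend to be close to the single-case probability can be illustrated with the shuffled-deck example. The sketch below assumes that Python's pseudo-random shuffle is an acceptable stand-in for a proper shuffle, and arbitrarily labels cards 0 to 3 as the aces.

```python
import random

def estimate_top_card_is_ace(trials=20000, seed=0):
    """Observed proportion of shuffles in which the top card is an ace."""
    rng = random.Random(seed)
    deck = list(range(52))      # cards 0..3 play the role of the four aces
    hits = 0
    for _ in range(trials):
        rng.shuffle(deck)
        if deck[0] < 4:
            hits += 1
    return hits / trials

exact = 4 / 52                  # from the "equally likely top card" agreement
print(estimate_top_card_is_ace(), exact)
```

The estimate should land near 1/13, about 0.077, but nothing guarantees how near. A different seed gives a different estimate, which is the circularity mentioned above.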

Regardless of whether one takes this latter “frequentist” view, or the approach based on starting with an agreed “sample space”, it is important to be very precise about what the proposition whose probability we are discussing really is. A good example of how the language can be abused is in the meteorologist’s familiar “probability of precipitation” numbers. There is only one tomorrow, and if the physics of weather is deterministic then an entity with sufficient knowledge and computing power should know now for sure whether it will or will not rain tomorrow, so there is no probability value involved (other than 0 or 1). What the weatherperson probably really means by giving a p.o.p. of 25% is that, among all the days on record for which whatever set of observations they have chosen match those of today, one in four had rain the following day. This is not actually “the” probability of rain tomorrow, as it depends on the choices of measurement variables and the precisions with which they are compared. But it does mean something of course. If the meteorologists have done their work properly it should be true that if every time they say p.o.p.=25%, you bet to win $3 if it rains and lose $1 if it doesn’t, then you can expect to break even “in the long run”. But if someone with a better meteorologist has the option of refusing your game except when they think the chances are lower than 25% then you will lose. Neither your meteorologist’s estimate nor the better one is actually “the” p.o.p., and as I said above, there really is no such quantity (except in the sense of it being either 1 or 0 and our not knowing which). In a deterministic world no actual event (in the normal sense of the word) has a probability other than 0 or 1, and other cases arise only by considering large classes of events (like all possible repetitions of a game or experiment).
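The break-even arithmetic for the $3 against $1 bet can be checked directly. In this sketch the 1/5 figure for the days a better-informed opponent would accept is a made-up number, used only to show the direction of the effect.

```python
from fractions import Fraction

def expected_gain(rain_frequency, win_if_rain=3, loss_if_dry=1):
    """Average gain per bet, in dollars, when rain follows with the given frequency."""
    return rain_frequency * win_if_rain - (1 - rain_frequency) * loss_if_dry

# Over all days labelled p.o.p. = 25%, rain follows one time in four.
fair_game = expected_gain(Fraction(1, 4))

# Hypothetical: the opponent only accepts on the subset of those days where,
# by a better forecast, rain follows only one time in five.
selected_game = expected_gain(Fraction(1, 5))

print(fair_game, selected_game)
```

The first value is exactly 0 and the second is negative, so the same 25% label supports a fair game only for the class of days it was computed from.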

Some people like to think of probability in cases like this as a property not of the event itself but of our state of knowledge, and in a sense that is valid. But put that baldly it makes it very tempting to assume wrongly that just because we don’t know which of several cases applies we can treat them as equally likely. One famous example of this is the “Monty Hall” problem, which embarrassingly confounded many people including an otherwise very good mathematician, Paul Erdős. Another is the so-called “Tuesday’s boy” problem, which asks whether the probability that someone with two children has a girl, given that one of the children is a boy, is changed by specifying that the boy was born on a Tuesday. In both of these the apparent “paradox” is resolved by carefully specifying the exact experiment or game to which the “probability” in question applies.
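For the “Tuesday’s boy” problem, one specific experiment can be enumerated exactly. The sketch assumes that all 196 combinations of sex and birth weekday for two children are equally likely, and that the family is picked at random from all families satisfying the stated condition. A different experiment (for example, a parent mentioning one child chosen at random) gives different answers. Day 2 standing for Tuesday is an arbitrary label.

```python
from fractions import Fraction
from itertools import product

SEXES = ("boy", "girl")
DAYS = range(7)      # day 2 stands for Tuesday (arbitrary label)

def prob_has_girl_given(condition):
    """P(family has a girl | condition), over equally likely two-child families."""
    children = list(product(SEXES, DAYS))
    families = list(product(children, repeat=2))   # 14 * 14 = 196 cases
    selected = [f for f in families if condition(f)]
    with_girl = [f for f in selected if any(sex == "girl" for sex, _ in f)]
    return Fraction(len(with_girl), len(selected))

given_a_boy = prob_has_girl_given(
    lambda f: any(sex == "boy" for sex, _ in f))
given_a_tuesday_boy = prob_has_girl_given(
    lambda f: any(sex == "boy" and day == 2 for sex, day in f))

print(given_a_boy, given_a_tuesday_boy)   # 2/3 14/27
```

Under this selection rule the extra detail does change the answer, from 2/3 to 14/27, because it changes which families are in the pool being sampled.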

Any competent professional who asserts a probability value should be able to quickly answer the question “what is the experiment or game to which that probability applies?” Failure to do so is a sure indicator that the person is either incompetent or trying to pull the wool over your eyes.

When we describe an experiment as “repeatable” we are referring just to whatever sequence of activities and recorded observations that we perform – which may include controlling for duplication of any number of ambient factors but not necessarily all of them. When an ambient factor that we had not previously held fixed is observed to affect the outcome we may revise the experiment to control for it too, but we will never capture all of them.
So whenever we try a repeatable experiment such as a coin toss there are factors influencing the outcome which we do not control. This means that the outcome may not always be the same. In fact for a coin toss there are so many factors that most of us are incapable of identifying and controlling that we have no control whatsoever over the outcome.
In terms of what we consciously do all such tosses are identical and any particular toss is like a random selection from the set of all possible tosses.
Probability theory provides a way of describing and dealing with the uncertainty of results that arise when we are given one out of a number of cases with no way of knowing in advance which case we are given.
The analysis is easiest when we have a finite number of cases all of which are “equally likely”. (I have put “equally likely” in quotes because it is really at this point an undefined term, and may well remain so, as attempts to define it in terms of ultimately equal frequencies or lack of knowledge or whatever are all unsatisfactory.) But the starting point can be taken more generally to be any family of cases with some assignment of assumed “probabilities” to its subsets (also undefined like “equally likely” but again supposed to have something to do with ultimate frequency), where the intuitive idea that the chance of having one or other of two mutually exclusive possibilities is the sum of the chances of each one separately is reflected in the fact that the probability values define a measure on the sample space with total measure 1, and so on.
The main point I want to make here about probability is that it applies to a repeatable experiment or game and not to an event itself without regard to the context in which we are considering it. Failure to understand this is at the root of many apparent “paradoxes” involving probability such as the Tuesday’s child, the magic envelope, and the Monty Hall problem.
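The Monty Hall problem shows how much the answer depends on the game being played. The sketch below compares two assumed host rules by exact enumeration: a host who knows where the car is and always opens a goat door, and a host who opens one of the other two doors at random, where we keep only the games in which a goat happened to be revealed. The rule names and the fixed first pick are illustrative choices.

```python
from fractions import Fraction

DOORS = (0, 1, 2)
PICK = 0    # contestant's first choice; by symmetry any door would do

def switch_win_probability(host_knows):
    """P(switching wins | a goat was revealed) under the given host rule."""
    win = Fraction(0)
    total = Fraction(0)
    others = [d for d in DOORS if d != PICK]
    for car in DOORS:                       # car is equally likely behind each door
        # A knowing host never opens the car door; an ignorant host may.
        openable = [d for d in others if d != car] if host_knows else others
        for opened in openable:
            weight = Fraction(1, 3) * Fraction(1, len(openable))
            if opened == car:
                continue                    # car revealed: not the game described
            total += weight
            remaining = next(d for d in DOORS if d not in (PICK, opened))
            if remaining == car:
                win += weight
    return win / total

print(switch_win_probability(True), switch_win_probability(False))   # 2/3 1/2
```

The contestant sees the same thing in both games (a goat behind an opened door), yet switching wins with probability 2/3 in one and 1/2 in the other.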

Time Series:
One natural question to ask about a time series is whether or not it is consistent with random variation about a fixed mean, and if that is rejected (I’d like to say “ruled out” but as noted above in statistics rejection is never absolute) then a next possible question is whether it is consistent with random variation about some non-constant deterministic function of index number (or “time”).
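One possible way to put the first question to a series is a permutation test. This is a sketch, not the only or standard method: it assumes that “random variation about a fixed mean” means the values are independent draws from one distribution, so that their order carries no information, and it uses a simple linear-trend statistic chosen here for illustration.

```python
import random

def trend_statistic(series):
    """Sum of (index deviation) * (value deviation); near zero with no linear trend."""
    n = len(series)
    mean_t = (n - 1) / 2
    mean_x = sum(series) / n
    return sum((t - mean_t) * (x - mean_x) for t, x in enumerate(series))

def permutation_p_value(series, n_perm=2000, seed=0):
    """Share of random reorderings whose trend is at least as strong as observed."""
    rng = random.Random(seed)
    observed = abs(trend_statistic(series))
    shuffled = list(series)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(trend_statistic(shuffled)) >= observed:
            at_least_as_extreme += 1
    # Count the observed ordering itself, so the result is never exactly 0.
    return (at_least_as_extreme + 1) / (n_perm + 1)

rng = random.Random(1)
flat = [rng.gauss(0.0, 1.0) for _ in range(50)]               # fixed mean
drifting = [0.2 * t + rng.gauss(0.0, 1.0) for t in range(50)]  # rising mean
print(permutation_p_value(flat), permutation_p_value(drifting))
```

A small value for the drifting series is a reason to reject the fixed-mean model. An unremarkable value for the flat series is only a failure to reject it, in keeping with the earlier remarks.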

Note: Any time series of data is consistent with any completely deterministic model which happens to exactly match its data points (e.g. any sequence of n values can be matched exactly with a polynomial of degree at most n−1), but the goal is generally to attribute as much of the variation as possible to randomness and have correspondingly as few deterministic parameters as possible.
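The exact-match remark can be demonstrated with Lagrange interpolation, which builds a polynomial of degree at most n−1 through any n values placed at index positions 0, 1, ..., n−1. The data values here are arbitrary, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def lagrange_interpolant(values):
    """Return p with p(i) == values[i] for i = 0..n-1, of degree at most n-1."""
    n = len(values)

    def p(x):
        x = Fraction(x)
        total = Fraction(0)
        for i, y in enumerate(values):
            term = Fraction(y)
            for j in range(n):
                if j != i:
                    # Basis factor: 1 at x == i, 0 at x == j.
                    term *= (x - j) / Fraction(i - j)
            total += term
        return total

    return p

data = [3, 1, 4, 1, 5]
p = lagrange_interpolant(data)
print([p(i) for i in range(len(data))])   # reproduces the data exactly
print(p(5))                               # the "deterministic prediction" for the next index
```

The fit is perfect and explains nothing: it uses as many parameters as there are data points and leaves no variation to attribute to randomness.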
