Time series with multiple lagged independent variables

How is a time series regression equation with multiple lagged independent variables/periods formed (say AR(2)), and how is the coefficient on the second lagged period determined, or how does it differ from the first? I guess I’m wondering: if the equation is based on all the same data (one time series), how are the coefficients different? Why would the slopes be different?

It’s easier to tell with a traditional multiple regression because you’re comparing one or more independent variables with the dependent variable. But in this case, aren’t you relying on all the same data to generate the regression equation?

If the problem says there are 40 observations (say 40 quarters), why does the equation not include 40 lagged periods, that is AR(40)?

One last somewhat related question: what if the lagged coefficient is greater than 1? The time series has a finite mean-reverting level if the absolute value of B1 is < 1. What if it’s greater than 1, or will it never be greater than 1?

Thanks for any help! Hope that’s clear.

A time series regression with multiple lagged values of the dependent variable can be understood in the traditional sense of a regression equation. Suppose we wanted to understand how GDP at time t is related to past values of itself:
GDP(t) = alpha + beta*GDP(t-1) + gamma*GDP(t-2) + error
Now, it is possible for GDP(t) NOT to be related to its one-period lag at all (i.e. beta is not different from 0 in a statistical sense) but to be highly dependent on its two-period lag (i.e. gamma is statistically different from 0). Hence it is quite intuitive that the slope coefficients of an AR(p) model may differ.
Other than reasons of parsimony (keeping a model as simple as possible without sacrificing rigour) and making sure the regression assumptions are in check, I guess there is no intrinsic reason why you shouldn’t use more lags. Keep in mind, though, that each extra lag costs you one usable observation and one degree of freedom, so with only 40 observations an AR(40) would leave nothing to estimate it from. You want to keep the model as small and as tractable as possible… if you can explain a lot of the variation in the dependent variable using only 2 or 3 lags, and the error term conforms to the classic assumptions, you should be good to go.
Regarding the last question… in time series regression the coefficients only make sense if the series we are examining is stationary, which is to say that the mean and variance of the series stay constant over time. In an AR(1) model:
1. If the absolute value of the coefficient is 1 then we have a unit root (non-stationary); the variance of this series increases without limit.
2. If the absolute value of the coefficient is <1 then we have a mean-reverting process (stationary); here we have some finite mean and variance for the series.
3. If the absolute value of the coefficient is >1 then we have an explosive series. No need to bother about this one… only the first two types of time series can be found in econ/finance.
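To make the GDP(t) equation above concrete, here is a rough Python sketch of how an AR(2) can be fitted as an ordinary regression. The parameter values, the simulated series, and all variable names are made up for illustration (nothing here comes from real GDP data), and it assumes NumPy is available. The point is only that the two regressors are the same series shifted by one and by two periods, so they are different columns and can get different slopes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" AR(2) parameters, chosen only for illustration.
alpha, beta, gamma = 1.0, 0.5, 0.3
n = 5000

# Simulate y(t) = alpha + beta*y(t-1) + gamma*y(t-2) + error.
y = np.zeros(n)
y[:2] = alpha / (1 - beta - gamma)  # start both initial values at the long-run mean
for t in range(2, n):
    y[t] = alpha + beta * y[t - 1] + gamma * y[t - 2] + rng.normal()

# The regressors are the SAME series, shifted by one and by two periods.
target = y[2:]    # y(t)
lag1 = y[1:-1]    # y(t-1)
lag2 = y[:-2]     # y(t-2)
X = np.column_stack([np.ones_like(target), lag1, lag2])

# Ordinary least squares, exactly as in a two-variable multiple regression.
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
alpha_hat, beta_hat, gamma_hat = coef
print(f"intercept={alpha_hat:.3f}  lag1={beta_hat:.3f}  lag2={gamma_hat:.3f}")
```

Note that n observations give n - 2 usable rows here, since the first two values have no complete set of lags.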

Regarding “3.if the absolute value of the coefficient is >1 then we have an explosive series no need to bother about this one …only the first two types of time series can be found in econ/finance”, do you have references? I was wondering about this case as well and why the CFAI book doesn’t mention it. Btw, happy new year!

Well, I suppose you could find some info on Google Books… for example, this page talks a little about unit roots, and the footnote talks about explosive roots with some further references. http://books.google.co.uk/books?id=OQzbF5-a_7UC&pg=PA404&lpg=PA404&dq=absolute+value+of+coefficient+greater+than+1+is+explosive+series&source=bl&ots=wqYdMRPCfU&sig=PSrlO7Xjqc7tXKel9QDT_JmD_tc&hl=en&sa=X&ei=CTMAT4SALMb98gOknOn9CQ&ved=0CB4Q6AEwAA#v=onepage&q=absolute%20value%20of%20coefficient%20greater%20than%201%20is%20explosive%20series&f=false
The simplest reference is real-world data… look at inflation series, GDP series, stock returns etc. and you will see that only the first two types are really found in the real world. If these series had an explosive root, the variance would increase exponentially with time, but in reality the variance of, let’s say, stock returns is high in some periods and never stays high forever after, which, if I understand correctly, would be the case if the series were explosive. Having said that, we could always transform a nonstationary series into a stationary one…
The CFAI book only scratches the surface of anything it touches, so don’t expect too much from it :) Happy new year all!

In practice, you can start out with 40 lags (or say 20), but you’ll quickly find that the ones further back are not statistically significant. Why? Because data tends to be cyclical: typically lags 1, 2 and 3, then 4 (if quarterly) or 12 (if monthly), are the most significant. The others influence the current data only through the more recent ones. The 8th lag works through the 4th: e.g., x(0) is indirectly f(x(8)) because x(0) = g(x(4)) and x(4) = g(x(8)).
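The "8th works through the 4th" idea can be checked with a small simulation. This is only a sketch under assumed values: the coefficient 0.7, the series length, and the variable names are hypothetical, and it assumes NumPy. The series is built so that only lag 4 matters directly; lag 8 is still correlated with the current value, but once lag 4 is in the regression the lag-8 slope should come out close to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
phi4 = 0.7  # hypothetical seasonal coefficient: x(t) depends directly on x(t-4) only

x = np.zeros(n)
for t in range(4, n):
    x[t] = phi4 * x[t - 4] + rng.normal()

target = x[8:]    # x(t)
lag4 = x[4:-4]    # x(t-4)
lag8 = x[:-8]     # x(t-8)

# On its own, lag 8 looks related to the current value...
corr_lag8 = np.corrcoef(target, lag8)[0, 1]

# ...but in a regression that also includes lag 4, its slope is near zero.
X = np.column_stack([np.ones_like(target), lag4, lag8])
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
b4, b8 = coef[1], coef[2]

print(f"corr(x(t), x(t-8)) = {corr_lag8:.2f}")
print(f"slope on lag 4 = {b4:.2f}, slope on lag 8 = {b8:.2f}")
```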

Re your first point: if you take it back to basics, all regressions are trying to explain the dependent variable by looking at the independent variables. AR models are no different; they just use lagged values of the dependent variable instead of other candidates. To use Alladin’s example, an AR model for GDP would try to explain GDP in the current quarter by looking at GDP in previous quarters. The coefficients are different because they’re measuring the relationship between different things. E.g. for GDP, the coefficient for t-1 is based on the relationship between current GDP and GDP in the previous quarter, whereas the coefficient for t-2 is based on the relationship between current GDP and GDP two quarters ago. Hope that makes sense.

I guess my confusion is even more basic. For a single regression or even a multiple regression, I can visually see the process by starting with a scatter plot of (x,y) pairs and determining the regression equation from the chart, but how does it work with a time series? I assume you plot the, say, 40 quarterly sales data points and run the regression to generate an equation based on a plot with 40 points, but how does it specifically tie to or single out the t-1 point from that? Isn’t the slope based on the entire plot and not just t-1? How do you determine the variation from t-1 (or t-4 for that matter) when the regression is based on all 40 prior points? And lastly, why would the regression not have to be AR(40), since you’re relying on 40 prior occurrences? Thanks for any help and sorry for the litany of questions, but I think they pretty much all tie back to the basic question of how the process is constructed. THANKS AGAIN!

In a sense, aren’t all regressions, and specifically time series based models, based on past or lagged values… so how is an AR model any different? Is the graph/scatter plot that an AR model is applied to any different from the graph of a typical trend model?

My understanding of the concept is that in AR models the previous value of the same variable is used to explain the current value. For this reason mean reversion is necessary. If the model is AR(1), it means the current value is described by the previous value (plus an error term). How it is tested is another area; let’s first understand the concept.
Say there are 4 values. If the 4th value is explained by the 3rd, which in turn is explained by the 2nd, which is explained by the 1st, then the model is AR(1). If, on the other hand, the 3rd value is explained by the 2nd, which is explained by the 1st, but the 4th value is not explained by the 3rd, then we could check whether every 4th value is left unexplained. If so, we could add another lagged variable further back. If that variable fits in the equation and the residual autocorrelation becomes insignificant, we can interpret it as the current value being described not only by the previous value but also by an earlier one. This model would be termed AR(2), as it contains two lagged variables.
If b < 1 (in absolute value) the value is mean reverting. Take x(t) = 50 + 0.5x(t-1), whose mean is 100.
Case 1: x > mean (100). Suppose x(t-1) = 104.
x(t) = 50 + 0.5(104) = 102
x(t+1) = 50 + 0.5(102) = 101
x(t+2) = 50 + 0.5(101) = 100.5
Case 2: x < mean (100). Suppose x(t-1) = 90.
x(t) = 50 + 0.5(90) = 50 + 45 = 95
x(t+1) = 50 + 0.5(95) = 50 + 47.5 = 97.5
You see that x decreases and reverts to its mean of 100 when x > 100, whereas it increases and reverts to 100 when x < 100.
When b = 1 (unit root), the current value is the best estimate of the next value, the error term is random, and the time series follows a random walk (as with an index).
When b > 1 the series is explosive rather than mean reverting, as 1.1 means increasing by 10%. Take x(t) = 10 + 1.5x(t-1) and suppose x(t-1) = 24.
x(t) = 10 + 1.5(24) = 10 + 36 = 46
x(t+1) = 10 + 1.5(46) = 10 + 69 = 79
The lagged term grows by 50% each period, so the increase is exponential. Hope it helps…
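The hand calculations above can be reproduced with a few lines of Python. This is just a sketch: the helper name `iterate_ar1` is made up for this example, and it ignores the error term, exactly as the arithmetic above does. The mean-reverting level is computed as b0 / (1 - b1), which is the standard result for an AR(1) with |b1| < 1.

```python
def iterate_ar1(b0, b1, start, steps):
    """Apply x(t) = b0 + b1 * x(t-1) repeatedly, ignoring the error term."""
    path = []
    x = start
    for _ in range(steps):
        x = b0 + b1 * x
        path.append(x)
    return path

# Mean-reverting level for x(t) = 50 + 0.5x(t-1).
mean_level = 50 / (1 - 0.5)

above = iterate_ar1(50, 0.5, 104, 3)      # start above the mean
below = iterate_ar1(50, 0.5, 90, 2)       # start below the mean
explosive = iterate_ar1(10, 1.5, 24, 2)   # b1 > 1: no reversion

print(mean_level)   # 100.0
print(above)        # [102.0, 101.0, 100.5]
print(below)        # [95.0, 97.5]
print(explosive)    # [46.0, 79.0]
```

Running more steps in the first two cases keeps closing in on 100, while the third case keeps moving away.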

If a question says an analyst uses 40 observations (say quarterly observations) and is applying an AR model, how can they use an AR(1) model instead of an AR(40) model ? How can you develop a slope or run a regression if you’re only using the most recent outcome to explain the current? What is on the axes (x,y) if you graphed the data?

I think you are overthinking this :P Maybe check out some vids on YouTube to get a more visual take on the issues: http://www.youtube.com/watch?v=kJ_Os5iP0IA&feature=related

Haha, I do tend to over think things. Thanks for the link!

I’m sorry, but the YouTube clips didn’t do much to address my questions regarding AR models. I guess I’m trying to visualize the process by thinking about the regression using, say, AR(1) against a time series of data. What would go on the x and y axes? Is the x-axis the lagged values and the y-axis the current? If, for example, the question says you have 40 observations, would these historical outcomes be graphed first and then the model applied against the data? If so, how can an AR(1) model be used if you’re really looking at 40 historical/lagged outcomes… wouldn’t you have to use AR(40)? How is a relationship (slope) established for any particular t-p? Would an AR(1) regression only compare the current and immediately preceding variable, with the slope based on that line? If so, do you even need the other 39 prior data points? But then again, you don’t know the current value… Is it like any other time series, except that the independent variable is a lagged value and not just time? Sorry guys! I feel like I know this but I refuse to accept it and am just confusing myself.

BMiller12, you are confused. Here is an example: say a series of 7 data points.
-0.3002 -1.2777 0.2443 1.2765 1.1984 1.7331 -2.1836
You want to check the autocorrelation and test AR(1), so you run this series of 6 data pairs:
Y        X
-0.3002  -1.2777
-1.2777   0.2443
 0.2443   1.2765
 1.2765   1.1984
 1.1984   1.7331
 1.7331  -2.1836
Just like you would with any X Y data series; e.g., this AR(1) pairing has a correlation of -0.03. You seem to be confused over the term: AR(1) means the series compared to itself with ONE time lag (here all 7 data points compared to the same series lagged one time period), NOT ONE data point compared with another data point (e.g., the last data point with the second to last). Therefore it is NOT called AR(7), as you seem to be calling it. Just to make it clearer, the lag-2 pairing that an AR(2) adds would be:
Y        X
-0.3002   0.2443
-1.2777   1.2765
 0.2443   1.1984
 1.2765   1.7331
 1.1984  -2.1836
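For anyone who wants to rebuild those pairs themselves, here is a short Python sketch using the seven values listed in the post above. The helper name `lag_pairs` is invented for this example and NumPy is assumed. It shifts the series against itself by the chosen lag and computes the correlation of the resulting pairs; since correlation is symmetric, it does not matter which column is called X and which Y.

```python
import numpy as np

# The values listed in the post above.
series = np.array([-0.3002, -1.2777, 0.2443, 1.2765, 1.1984, 1.7331, -2.1836])

def lag_pairs(values, lag):
    """Pair each value with the one `lag` steps later in the series."""
    return values[:-lag], values[lag:]

first1, second1 = lag_pairs(series, 1)   # 6 pairs for a one-period lag
first2, second2 = lag_pairs(series, 2)   # 5 pairs for a two-period lag

r1 = np.corrcoef(first1, second1)[0, 1]
r2 = np.corrcoef(first2, second2)[0, 1]

print(f"lag-1 pairs: {len(first1)}, correlation = {r1:.2f}")
print(f"lag-2 pairs: {len(first2)}, correlation = {r2:.2f}")
```

Each extra lag drops one pair from the sample, which is why a longer series is needed before higher lags can be estimated sensibly.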

elcfa, that is awesome. Great examples! I really do appreciate it. I was having trouble understanding how the data formed pairs (x,y), and I was further treating each observation as an independent variable or lag, when it’s the lag (time) between the data points that each independent variable is based on (hence my confusion in wanting to call a problem with 40 observations AR(40)). It seems so obvious and intuitive now, thanks again! I guess one quick last question: does each additional lagged independent variable (increasing the order) impact the slope and intercept of the preceding order, like when you add additional independent variables to a multiple regression model and they have a net effect on the preceding independent variable’s measurement? Many thanks!

Adding more lags normally does improve the accuracy, but not always. A case in point: for a perfectly seasonal data series with a repeating sales cycle (e.g., a sinusoidal sales curve with sales (Jan this yr) = sales (Jan last yr), sales (Feb this yr) = sales (Feb last yr)), the obvious perfect forecast would be AR(12), but if you go from AR(1) to AR(12), you’ll see the adjusted R2 vary significantly. In reality, you don’t add them indiscriminately. You would first see which coefficients are most significant, then check whether those coefficients make sense (based on the nature of the data), then run different combos and choose the model that both makes the most sense and has the highest mathematical accuracy.
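A rough way to see the seasonal point is to compare a regression on lag 1 alone with a regression on lag 12 alone. The sketch below uses a made-up monthly sales pattern repeated for ten years with a little noise added; the numbers, the helper name `r_squared_single_lag`, and the noise level are all assumptions for illustration, and NumPy is assumed. It is a simplification of the AR(1)-to-AR(12) comparison described above, since it uses one lag at a time rather than all lags up to 12.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical monthly sales pattern (Jan..Dec), repeated every year.
pattern = np.array([10, 12, 15, 20, 26, 30, 31, 29, 24, 18, 13, 10], dtype=float)
years = 10
sales = np.tile(pattern, years) + rng.normal(scale=0.5, size=12 * years)

def r_squared_single_lag(values, lag):
    """R-squared from regressing the series on one lag of itself (with intercept)."""
    target, lagged = values[lag:], values[:-lag]
    X = np.column_stack([np.ones_like(lagged), lagged])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return 1 - resid.var() / target.var()

r2_lag1 = r_squared_single_lag(sales, 1)
r2_lag12 = r_squared_single_lag(sales, 12)

print(f"R-squared using lag 1 only:  {r2_lag1:.2f}")
print(f"R-squared using lag 12 only: {r2_lag12:.2f}")
```

With this kind of data the same-month-last-year lag should explain noticeably more than the previous-month lag, which is the sense in which the choice of lags should follow the nature of the data.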