Hi,
Can someone explain what it is and how to solve questions related to it? It doesn't make much sense to me.
A dummy variable can take the value 0 or 1 so you can think of it as an ON/OFF switch.
Assume you have the model Sales = b0 + b1*(Closed/Open) + e
Presumably it explains the volume of sales depending on whether the shop is Open or Closed. (Smartest econometrics model ever specified)
x1 (Closed/Open) can take only these two values:
0 = Closed
1 = Open
You would obviously expect a super strong positive correlation here. Now, suppose our statistics program returns b0 = 0.2 and b1 = 10.
Somebody asks you “What is the volume of sales when the shop is closed?”
Well, for a closed shop, our variable takes the value 0, so -> Sales = 0.2 + 10*0 + e -> 0.2
What happens if the shop is open? Switch on: the value is 1, so -> Sales = 0.2 + 10*1 + e -> 10.2
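To check the arithmetic, here is a small Python sketch (assuming NumPy is available) that fits the Open/Closed model by ordinary least squares. The sales figures are made up, chosen so that closed days average 0.2 and open days average 10.2. With a single 0/1 regressor, OLS sets b0 to the mean of the 0 (Closed) group and b1 to the difference between the two group means, which is why the ON/OFF reading works.

```python
import numpy as np

# Hypothetical data: open_flag is the dummy (0 = Closed, 1 = Open)
open_flag = np.array([0, 0, 1, 1])
sales = np.array([0.1, 0.3, 10.0, 10.4])

# Design matrix: a column of ones for the intercept b0, plus the dummy for b1
X = np.column_stack([np.ones(len(open_flag)), open_flag])
(b0, b1), *_ = np.linalg.lstsq(X, sales, rcond=None)

def predicted_sales(is_open: int) -> float:
    """Fitted value b0 + b1 * dummy (the error term e is left out)."""
    return b0 + b1 * is_open

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"Closed: {predicted_sales(0):.2f}")  # switch OFF -> intercept only
print(f"Open:   {predicted_sales(1):.2f}")  # switch ON  -> intercept + b1
```

Only the names `open_flag`, `sales` and `predicted_sales` are invented here; `np.linalg.lstsq` is NumPy's standard least-squares solver.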
What if we have more than 2 states? Let’s say we need to specify sales as a function of the day of the week.
x1 = Monday (1 for YES, 0 for NO)
x2 = Tuesday (1 for YES, 0 for NO)
. . .
The logic is the same, with one catch: always use one fewer variable than the number of categories. So for 7 days, we need 6 variables. To see why, imagine we define x1 = Monday (coefficient b1) through x6 = Saturday (coefficient b6) and omit Sunday.
Now the model is, for example: Sales = b0 + b1x1 + b2x2 + … + b6x6 + e
If somebody asks what the sales are on Sunday: all the variables x1 to x6 are 0, so Sales = b0 (the intercept). Sunday is the base case.
Again with the ON/OFF concept, what are the sales on Tuesday? Switch off (0) every variable and leave on (1) only x2: Sales = b0 + b2*1 + e -> b0 + b2. So b2 measures how much Tuesday's sales differ from Sunday's.
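The day-of-week switches can be sketched in plain Python as below. The coefficient values are hypothetical, chosen only for illustration, and the helper names `encode_day` and `predict_sales` are made up for this sketch. Sunday gets no dummy, so an all-zeros row means Sunday and the prediction collapses to b0.

```python
# Sunday is the omitted base category, so it gets no dummy of its own
DAYS_WITH_DUMMIES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

def encode_day(day: str) -> list[int]:
    """Return [x1, ..., x6]: at most one switch is ON (1); all zeros means Sunday."""
    if day != "Sunday" and day not in DAYS_WITH_DUMMIES:
        raise ValueError(f"unknown day: {day}")
    return [1 if day == d else 0 for d in DAYS_WITH_DUMMIES]

def predict_sales(day: str, b0: float, b: list[float]) -> float:
    """Sales = b0 + b1*x1 + ... + b6*x6 (error term left out)."""
    x = encode_day(day)
    return b0 + sum(bi * xi for bi, xi in zip(b, x))

# Hypothetical coefficients, for illustration only
b0 = 5.0
b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # b1 (Monday) ... b6 (Saturday)

print(encode_day("Tuesday"))            # only x2 is switched on
print(predict_sales("Sunday", b0, b))   # b0 alone
print(predict_sales("Tuesday", b0, b))  # b0 + b2
```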
I hope this helps drive home the intuition…
A little elaboration on this: if the conditions for using dummy variables _are not_ both mutually exclusive and collectively exhaustive (recall your Level I Quant), then for n conditions you use _n_ dummy variables. If the conditions _are mutually exclusive and collectively exhaustive_, then for n conditions you use n – 1 dummy variables. For example:
In a model for the price of a house we have dummy variables for whether the house has a fireplace, has a swimming pool, and is in a neighborhood that has a local park with play equipment. These three conditions are not mutually exclusive, so we need three dummy variables.
In a model for the price of a house we can have a two-car garage, a three-car garage, or a four-car garage (there are no houses without garages, with only one-car garages, or with garages holding more than four cars). These three possibilities are mutually exclusive and collectively exhaustive, so we use _two_ dummy variables. (Note: which two we use doesn't matter; we can use one for the 2-car and another for the 3-car, one for the 2-car and another for the 4-car, or one for the 3-car and another for the 4-car.) If we mistakenly use three dummy variables in this situation, our model will suffer from perfect multicollinearity: for every house the three dummies add up to exactly 1, which duplicates the intercept.
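A quick way to see the garage problem is to check the rank of the design matrix. The sketch below (assuming NumPy, with five hypothetical houses) builds an intercept column plus dummies. With all three garage dummies, one column is an exact linear combination of the others, so the matrix is rank-deficient and OLS cannot separate the coefficients. Dropping one dummy (here the 4-car garage, which becomes the base case) restores full rank.

```python
import numpy as np

# Hypothetical sample of 5 houses; each has exactly one garage size
garage = ["2-car", "3-car", "4-car", "2-car", "3-car"]
two = np.array([g == "2-car" for g in garage], dtype=float)
three = np.array([g == "3-car" for g in garage], dtype=float)
four = np.array([g == "4-car" for g in garage], dtype=float)
ones = np.ones(len(garage))  # intercept column

# Mistake: intercept plus all three dummies. Because two + three + four == ones
# for every house, one column is an exact linear combination of the others.
X_trap = np.column_stack([ones, two, three, four])

# Correct: intercept plus n - 1 = 2 dummies (4-car garage is the base case)
X_ok = np.column_stack([ones, two, three])

print(np.linalg.matrix_rank(X_trap), "of", X_trap.shape[1], "columns")  # rank-deficient
print(np.linalg.matrix_rank(X_ok), "of", X_ok.shape[1], "columns")      # full rank
```

The fireplace, pool, and park dummies do not have this problem: they are not mutually exclusive, so they do not always sum to 1 and all three can stay in the model.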
I find this tough.
You're not alone. If it weren't tough, we'd have third-graders doing it. (And we'd find something else to do that _is_ tough.)
s2000 bro, but you explain it very well.