Dummy Variable and problems related to it

Hi,

Can someone explain what it is and how to solve questions related to it? It doesn’t make much sense to me.

A dummy variable can take the value 0 or 1 so you can think of it as an ON/OFF switch.

Assume you have the model Sales = b0 + b1*(Closed/Open) + e

Presumably it explains the volume of sales depending on whether the shop is Open or Closed. (Smartest econometrics model ever specified)

x1 (Closed/Open) can take only these two values:

0 = Closed
1 = Open

You would obviously expect a super strong positive correlation here. Now, suppose our statistics program returns b1 = 10 and b0 = 0.2.
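As a rough illustration of where numbers like these could come from, here is a minimal sketch in plain Python that fits the one-dummy model by ordinary least squares. The sales figures are made up for this example, and `fit_dummy_regression` is a hypothetical helper name, not something from a statistics package. With a single 0/1 regressor, the fitted intercept is just the average of the 0 group, and the slope is the difference between the two group averages.

```python
def fit_dummy_regression(x, y):
    """Ordinary least squares for y = b0 + b1*x with a single regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = (sum of cross-deviations) / (sum of squared deviations of x).
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Made-up data: 0 = Closed, 1 = Open.
is_open = [0, 0, 0, 1, 1, 1]
sales = [0.1, 0.3, 0.2, 10.0, 10.4, 10.2]

b0, b1 = fit_dummy_regression(is_open, sales)
print(round(b0, 6), round(b1, 6))
```

With this invented data the closed days average 0.2 and the open days average 10.2, so the fit should come out at roughly b0 = 0.2 and b1 = 10.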

Somebody asks you “What is the volume of sales when the shop is closed?”

Well, for a closed shop, our variable takes the value 0, so the expected sales (the error term e averages out to zero) are Sales = 0.2 + 10*0 = 0.2

What happens if the shop is open? Switch on, the value is 1, so the expected sales are Sales = 0.2 + 10*1 = 10.2
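The ON/OFF arithmetic above can be sketched in a few lines of Python. The function name `predict_sales` is made up for illustration; the coefficients are the b0 = 0.2 and b1 = 10 from the example, and the error term is left out because we only want the expected value.

```python
def predict_sales(is_open, b0=0.2, b1=10.0):
    """Expected sales from Sales = b0 + b1*x1, with the error term left out."""
    if is_open not in (0, 1):
        raise ValueError("a dummy variable must be 0 or 1")
    return b0 + b1 * is_open

closed_sales = predict_sales(0)  # switch OFF: only the intercept remains
open_sales = predict_sales(1)    # switch ON: intercept plus b1
print(closed_sales, open_sales)
```

So b1 is simply how much expected sales change when the switch goes from OFF to ON.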

What if we have more than 2 states? Let’s say we need to specify sales as a function of the day of the week.

x1 = Monday (1 for YES, 0 for NO)
x2 = Tuesday (1 for YES, 0 for NO)
. . .

The logic is the same with one catch: always use one fewer variable than the number of categories. So for 7 days, we need 6 variables. To see why, imagine we specify x1 = Monday through to x6 = Saturday and we omit Sunday.

Now the model is, for example: Sales = b0 + b1x1 + b2x2 + … + b6x6 + e

If somebody asks what the sales are for Sunday, well, all the variables x1 to x6 are 0, so Sales = b0 (the intercept). Sunday is the baseline, so it doesn't need a variable of its own.

Again with the ON/OFF concept, what are the sales for Tuesday? Well, switch off (0) every variable and leave on (1) only x2: Sales = b0 + b2x2 = b0 + b2. So b2 is the difference between Tuesday sales and Sunday sales.
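Here is a small Python sketch of the day-of-week setup, with Sunday as the omitted day. The coefficients are hypothetical, picked only so the arithmetic is easy to follow, and the names `encode_day` and `predict_daily_sales` are invented for this example.

```python
DAYS_WITH_DUMMY = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

def encode_day(day):
    """Return [x1, ..., x6]; Sunday, the omitted baseline, encodes as all zeros."""
    if day != "Sunday" and day not in DAYS_WITH_DUMMY:
        raise ValueError(f"unknown day: {day}")
    return [1 if day == d else 0 for d in DAYS_WITH_DUMMY]

def predict_daily_sales(day, b0, slopes):
    """Expected sales b0 + b1*x1 + ... + b6*x6, with the error term left out."""
    return b0 + sum(b * x for b, x in zip(slopes, encode_day(day)))

# Hypothetical coefficients, not estimated from any real data.
b0 = 5.0
slopes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # b1 (Monday) ... b6 (Saturday)

print(encode_day("Sunday"))    # every switch OFF
print(encode_day("Tuesday"))   # only x2 ON
print(predict_daily_sales("Sunday", b0, slopes))   # just b0
print(predict_daily_sales("Tuesday", b0, slopes))  # b0 + b2
```

Which day you omit is a free choice; it only changes which day the other coefficients are measured against.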

I hope this drives the intuition a bit…

A little elaboration on this: if the conditions for using dummy variables _are not_ both mutually exclusive and collectively exhaustive (recall your Level I Quant), then for n conditions you use _n_ dummy variables. If the conditions _are_ mutually exclusive and collectively exhaustive, then for n conditions you use n – 1 dummy variables. For example:

In a model for the price of a house we have dummy variables for whether the house has a fireplace, has a swimming pool, and is in a neighborhood that has a local park with play equipment. These three conditions are not mutually exclusive, so we need three dummy variables.

In a model for the price of a house we can have a two-car garage, a three-car garage, or a four-car garage (there are no houses without garages, with only one-car garages, nor with garages holding more than four cars). These three possibilities are mutually exclusive and collectively exhaustive, so we use _two_ dummy variables. (Note: which two we use doesn’t matter; we can use one for the 2-car and another for the 3-car, one for the 2-car and another for the 4-car, or one for the 3-car and another for the 4-car.) If we mistakenly use three dummy variables in this situation, our model will suffer from perfect multicollinearity: for every house the three dummies add up to exactly 1, which duplicates the intercept.
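To see the problem with three garage dummies concretely, here is a minimal Python sketch. It assumes the regression includes an intercept (a column of 1s), uses made-up garage sizes for five houses, and the helper `garage_dummies` is a hypothetical name for this example.

```python
GARAGE_SIZES = [2, 3, 4]

def garage_dummies(size, omit=None):
    """One 0/1 dummy per garage size, optionally dropping one size as the baseline."""
    if size not in GARAGE_SIZES:
        raise ValueError(f"unexpected garage size: {size}")
    return [1 if size == s else 0 for s in GARAGE_SIZES if s != omit]

houses = [2, 3, 4, 3, 2]  # made-up garage sizes for five houses

# Three dummies: every row sums to 1, exactly duplicating the intercept column.
three_dummy_rows = [garage_dummies(size) for size in houses]
row_sums_three = [sum(row) for row in three_dummy_rows]

# Two dummies (4-car omitted): the row sums vary, so nothing duplicates the intercept.
two_dummy_rows = [garage_dummies(size, omit=4) for size in houses]
row_sums_two = [sum(row) for row in two_dummy_rows]

print(row_sums_three)
print(row_sums_two)
```

When the dummies always sum to the intercept column, the regression cannot tell the intercept apart from the dummies, so the coefficients are not uniquely determined. Dropping any one dummy removes the redundancy.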

I find this tough.

You’re not alone. If it weren’t tough, we’d have third-graders doing it. (And we’d find something else to do that _is_ tough.)

s2000 bro, but you explain very well :)