RM | Unit 220 - Dummy coding with more than two categories
Book: Analysing data using linear models
Chapter 6: 6.6, 6.7
Chapter 6.6: Dummy coding for more than two groups
Take for instance a variable Country that has three different values in your data set, for example
Norway, Sweden and Finland, or Zimbabwe, Congo and South Africa. Let’s call these countries A, B and C.
We can code this Country variable with three categories into two dummy variables in the following way.
First, we create a variable countryA. This is a dummy variable, or indicator variable, that indicates
whether a person comes from country A or not. Persons who do are coded 1, and those who do not are
coded 0. Next, we create a dummy variable countryB that indicates whether or not a person comes from
country B. Again, persons who do are coded 1, and those who do not are coded 0.
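The coding of countryA and countryB can be sketched in a few lines of code. This is only an illustration in Python (the book itself works in R), and the six sample values in country are made up for the example.

```python
# Hypothetical sample: each entry is the country (A, B or C) of one person.
country = ["A", "B", "C", "A", "C", "B"]

# countryA is 1 for persons from country A and 0 for everyone else.
countryA = [1 if c == "A" else 0 for c in country]

# countryB is 1 for persons from country B and 0 for everyone else.
countryB = [1 if c == "B" else 0 for c in country]

# Show each person's country next to the two dummy codes.
for c, a, b in zip(country, countryA, countryB):
    print(c, a, b)
```

Persons from country C get a 0 on both dummy variables, so no separate variable is created for them here.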
Note that every value of Country (A, B, or C) now has a unique combination of values on the
variables countryA and countryB. All persons from country A have a 1 for countryA and a 0 for
countryB; all those from country B have a 0 for countryA and a 1 for countryB; and all those from
country C have a 0 for both countryA and countryB. Therefore a third dummy variable countryC is
not necessary (i.e., it is redundant): the two dummy variables give us all the country information we need.
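The redundancy of a third dummy can be checked directly. The sketch below is again Python rather than the book's R, and the names codes and countryC are chosen only for this illustration. It stores the pair (countryA, countryB) for each country and derives countryC from the other two.

```python
# Dummy codes per country as described in the text: (countryA, countryB).
codes = {"A": (1, 0), "B": (0, 1), "C": (0, 0)}

# Every country has a different pair, so the pair identifies the country.
all_unique = len(set(codes.values())) == len(codes)

# A third dummy adds no information: it follows from the other two,
# because countryC = 1 - countryA - countryB.
countryC = {c: 1 - a - b for c, (a, b) in codes.items()}

print(all_unique)
print(countryC)
```

Because countryC is fully determined by countryA and countryB, two dummy variables suffice for three categories.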
Chapter 6.7: Analysing categorical predictor variables in R
Check chapter in the book