Introduction to Statistics (MAT/SST 115.03 2008S)

A Chi-Square Analysis Example


A number of people asked for an example of chi-square analysis in which neither variable is binary. So, let's explore the relationship between where Grinnell ranked among each student's choices (variable CHOICE, column 3) and how important the student said it was that there was nothing better to do (variable REASON06, column 63).

We start by gathering our data.

> fys = read.csv("/home/rebelsky/Stats115/Data/first-year-survey-200x.csv")
> CR6 = fys[,c(3,63)]
> head(CR6)
  CHOICE REASON06
1      4        1
2      4        1
3      4        3
4      4        1
5      4        1
6      3        1
> table(CR6)
      REASON06
CHOICE   1   2   3
     1  16   2   1
     2  14   3   1
     3  54  11   4
     4 204  20   9
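
If you do not have the survey file, here is a minimal sketch for following along anyway. It types in the counts printed above as a matrix. The name counts is my own and is not part of the original session.

```r
# Counts copied from the table(CR6) output above.
# Rows are CHOICE 1-4; columns are REASON06 1-3.
# "counts" is a hypothetical name used only for this sketch.
counts <- matrix(
  c( 16,  2, 1,
     14,  3, 1,
     54, 11, 4,
    204, 20, 9),
  nrow = 4, byrow = TRUE,
  dimnames = list(CHOICE = c("1", "2", "3", "4"),
                  REASON06 = c("1", "2", "3"))
)

counts            # should match the table above
rowSums(counts)   # number of students at each level of CHOICE
colSums(counts)   # number of students at each level of REASON06
sum(counts)       # total number of students in the table
```

The row and column totals are worth a glance now, since the expected counts later in the example are built from them.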

We'd like to turn this into a data frame with readable labels. We are using REASON06 as the explanatory variable (columns) and CHOICE as the response (rows).

> CR6table = table(CR6)
> CR6frame = data.frame(row.names = c("Less Than Third", "Third", "Second", "First"),
+   NotImportant = CR6table[,1],
+   SomewhatImportant = CR6table[,2],
+   VeryImportant= CR6table[,3])
> CR6frame
                NotImportant SomewhatImportant VeryImportant
Less Than Third           16                 2             1
Third                     14                 3             1
Second                    54                11             4
First                    204                20             9
> chisq.test(CR6frame)
	Pearson's Chi-squared test
data:  CR6frame 
X-squared = 4.5718, df = 6, p-value = 0.5998
Warning message:
In chisq.test(CR6frame) : Chi-squared approximation may be incorrect

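Why do we get that warning message? Because some of the counts are too small. (Recall that the chi-square approximation is only reliable when every expected count is at least five; R gives this warning whenever some expected count falls below five.) We also note that, with a p-value of about 0.60, the chi-square test says that these results are not improbable if the two variables are unrelated. Nonetheless, we press on with the example.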
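
One way to see which cells are behind the warning is to ask chisq.test for its expected counts and flag the ones below five. This is only a sketch: it re-enters the counts from the table above as a matrix, and the names counts, result, and too_small are my own.

```r
# Counts copied from the table above (rows: CHOICE, columns: REASON06).
# "counts", "result", and "too_small" are hypothetical names for this sketch.
counts <- matrix(
  c( 16,  2, 1,
     14,  3, 1,
     54, 11, 4,
    204, 20, 9),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Less Than Third", "Third", "Second", "First"),
                  c("NotImportant", "SomewhatImportant", "VeryImportant"))
)

# Run the test quietly; we already know about the warning.
result <- suppressWarnings(chisq.test(counts))

# The object returned by chisq.test stores the expected counts.
result$expected

# TRUE marks a cell whose expected count is below five.
too_small <- result$expected < 5
too_small
sum(too_small)   # how many cells are too small
```

All of the flagged cells sit in the SomewhatImportant and VeryImportant columns.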

We find the matrix of expected counts. Each entry is its row total times its column total, divided by the grand total.

> CR6expected = rowSums(CR6frame) %o% colSums(CR6frame)/sum(CR6frame)
> CR6expected
                NotImportant SomewhatImportant VeryImportant
Less Than Third     16.14159          2.017699     0.8407080
Third               15.29204          1.911504     0.7964602
Second              58.61947          7.327434     3.0530973
First              197.94690         24.743363    10.3097345
> (CR6expected-CR6frame)^2/CR6expected
                NotImportant SomewhatImportant VeryImportant
Less Than Third  0.001242043      0.0001552554    0.03018165
Third            0.109165028      0.6198377581    0.05201573
Second           0.364034244      1.8407186525    0.29367706
First            0.185100080      0.9093141909    0.16638687

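Wasn't that fun? Each entry in that last table is one cell's contribution to the chi-square statistic. Added together, they give the X-squared value of 4.5718 that chisq.test reported.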
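
Here is a minimal sketch for double-checking the cell-by-cell values above: they should add up to the X-squared value from chisq.test. The sketch re-enters the counts as a matrix, and the names counts, expected, contributions, and result are my own.

```r
# Counts copied from the table above (rows: CHOICE, columns: REASON06).
# All variable names here are hypothetical, chosen for this sketch.
counts <- matrix(
  c( 16,  2, 1,
     14,  3, 1,
     54, 11, 4,
    204, 20, 9),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Less Than Third", "Third", "Second", "First"),
                  c("NotImportant", "SomewhatImportant", "VeryImportant"))
)

# Expected counts: (row total * column total) / grand total.
expected <- rowSums(counts) %o% colSums(counts) / sum(counts)

# Each cell's contribution: (observed - expected)^2 / expected.
contributions <- (counts - expected)^2 / expected

# The contributions should sum to the test statistic.
sum(contributions)

# chisq.test also stores Pearson residuals, (observed - expected) / sqrt(expected),
# so squaring them should give the same contributions.
result <- suppressWarnings(chisq.test(counts))
all.equal(unname(result$residuals^2), unname(contributions))
unname(result$statistic)
```

The largest contributions come from the SomewhatImportant column, which is where the observed counts differ most from the expected ones.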

Now, let's deal with the problem that some of the values are too small. We're going to need to combine the first two rows and the last two columns of the frame. Unfortunately, this is one of those problems for which I know no good solution, other than building it manually.

> newCR6frame = data.frame(row.names = c("Third or Less", "Second", "First"),
+ NotImportant =  c(30, 54, 204),
+ Important = c(7, 15, 29))
> newCR6frame
              NotImportant Important
Third or Less           30         7
Second                  54        15
First                  204        29
> chisq.test(newCR6frame)
	Pearson's Chi-squared test
data:  newCR6frame 
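X-squared = 4.0847, df = 2, p-value = 0.1297

This time R gives no warning, since the combined cells are large enough. The p-value of about 0.13 is still well above the usual 0.05 cutoff, so even after combining categories we do not have convincing evidence of a relationship between CHOICE and REASON06.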
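
The combined table above was typed in by hand. As one possible alternative, here is a sketch that uses base R's rowsum function to add together rows (and then columns) that share a group label. The counts are re-entered as a matrix, and the names counts, row_groups, col_groups, by_row, and collapsed are my own.

```r
# Counts copied from the original four-by-three table.
# All variable names here are hypothetical, chosen for this sketch.
counts <- matrix(
  c( 16,  2, 1,
     14,  3, 1,
     54, 11, 4,
    204, 20, 9),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Less Than Third", "Third", "Second", "First"),
                  c("NotImportant", "SomewhatImportant", "VeryImportant"))
)

# One label per original row and per original column.
# Rows or columns with the same label get added together.
row_groups <- c("Third or Less", "Third or Less", "Second", "First")
col_groups <- c("NotImportant", "Important", "Important")

# rowsum adds up rows by group; reorder = FALSE keeps the original order.
by_row <- rowsum(counts, group = row_groups, reorder = FALSE)

# rowsum only works on rows, so transpose, combine, and transpose back.
collapsed <- t(rowsum(t(by_row), group = col_groups, reorder = FALSE))
collapsed

chisq.test(collapsed)
```

The collapsed matrix should match the hand-built newCR6frame, so the test should report the same X-squared value of 4.0847.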

Samuel A. Rebelsky, rebelsky@grinnell.edu

Copyright (c) 2007-8 Samuel A. Rebelsky.

This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/2.5/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.