Introduction to Statistics (MAT/SST 115.03 2008S)
Primary: [Front Door] [Syllabus] [Current Outline] [R] - [Academic Honesty] [Instructions]
Groupings: [Applets] [Assignments] [Data] [Examples] [Handouts] [Labs] [Outlines] [Projects] [Readings] [Solutions]
External Links: [R Front Door] [SamR's Front Door]
A number of people asked for an example of chi-square analysis in which neither variable is binary. So, let's explore the relationship between where Grinnell ranked among each student's choices (variable CHOICE, column 3) and how strongly the student reported that there was nothing better to do (variable REASON06, column 63; rated not important, somewhat important, or very important).
We start by gathering our data.
> fys = read.csv("/home/rebelsky/Stats115/Data/first-year-survey-200x.csv")
> CR6 = fys[,c(3,63)]
> head(CR6)
  CHOICE REASON06
1      4        1
2      4        1
3      4        3
4      4        1
5      4        1
6      3        1
> table(CR6)
      REASON06
CHOICE   1   2   3
     1  16   2   1
     2  14   3   1
     3  54  11   4
     4 204  20   9
We'd like to turn this table into a data frame with descriptive labels. We treat REASON06 as the explanatory variable (columns) and CHOICE as the response (rows).
> CR6table = table(CR6)
> CR6frame = data.frame(row.names = c("Less Than Third", "Third", "Second", "First"),
+                       NotImportant = CR6table[,1],
+                       SomewhatImportant = CR6table[,2],
+                       VeryImportant = CR6table[,3])
> CR6frame
                NotImportant SomewhatImportant VeryImportant
Less Than Third           16                 2             1
Third                     14                 3             1
Second                    54                11             4
First                    204                20             9
> chisq.test(CR6frame)
        Pearson's Chi-squared test
data:  CR6frame
X-squared = 4.5718, df = 6, p-value = 0.5998
Warning message:
In chisq.test(CR6frame) : Chi-squared approximation may be incorrect
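Why do we get that warning message? Because some of the expected counts are too small. (Recall that to use a chi-square test, we want every expected count, not every observed count, to be at least five.) We also note that the chi-square test does not find these results improbable: a p-value of 0.5998 means that, if the two variables were independent, we would see a chi-square statistic at least this large about 60% of the time. Nonetheless, we press on with the example.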
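R's chisq.test issues this warning when any expected count falls below 5. A minimal sketch for seeing which cells are responsible: it re-enters the counts from table(CR6) by hand as a labeled matrix (so it runs without the survey file; the names counts, result, and small are ours, not part of the original session) and asks for the cells whose expected count is under 5.

```r
# Observed counts from table(CR6), re-entered by hand so this sketch runs
# without the survey file. Rows are CHOICE, columns are REASON06.
counts <- matrix(c( 16,  2, 1,
                    14,  3, 1,
                    54, 11, 4,
                   204, 20, 9),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(CHOICE = c("Less Than Third", "Third", "Second", "First"),
                                 REASON06 = c("NotImportant", "SomewhatImportant", "VeryImportant")))

# chisq.test() accepts a matrix as well as a data frame; suppressWarnings()
# silences the same "approximation may be incorrect" warning shown above
result <- suppressWarnings(chisq.test(counts))

# Expected counts, rounded for display
print(round(result$expected, 2))

# Row/column positions of the cells whose expected count is below 5
small <- which(result$expected < 5, arr.ind = TRUE)
print(small)
```

Five cells fall below 5: the SomewhatImportant cells in the two smallest rows, and the VeryImportant cells in every row except First.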
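We next compute the matrix of expected counts (each row total times each column total, divided by the grand total) and each cell's contribution to the chi-square statistic.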
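In symbols (notation ours), write O_{ij} for the observed count in row i and column j, R_i and C_j for the row and column totals, N for the grand total, and r and c for the numbers of rows and columns. The first cell is worked through using totals read off the table: the Less Than Third row totals 16 + 2 + 1 = 19, the NotImportant column totals 288, and the grand total is 339.

```latex
\begin{aligned}
E_{ij} &= \frac{R_i \, C_j}{N}, \qquad
X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (r-1)(c-1) \\
E_{11} &= \frac{19 \times 288}{339} \approx 16.14159, \qquad
\frac{(16 - 16.14159)^2}{16.14159} \approx 0.00124, \qquad
\mathrm{df} = (4-1)(3-1) = 6
\end{aligned}
```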
> CR6expected = rowSums(CR6frame) %o% colSums(CR6frame)/sum(CR6frame)
> CR6expected
                NotImportant SomewhatImportant VeryImportant
Less Than Third     16.14159          2.017699     0.8407080
Third               15.29204          1.911504     0.7964602
Second              58.61947          7.327434     3.0530973
First              197.94690         24.743363    10.3097345
> (CR6expected-CR6frame)^2/CR6expected
                NotImportant SomewhatImportant VeryImportant
Less Than Third  0.001242043      0.0001552554    0.03018165
Third            0.109165028      0.6198377581    0.05201573
Second           0.364034244      1.8407186525    0.29367706
First            0.185100080      0.9093141909    0.16638687
Wasn't that fun?
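The cell contributions above are the pieces of the chi-square statistic. A short sketch (counts re-entered by hand; the variable names are ours) adds them up and takes the upper tail of the chi-square distribution with (rows - 1)(columns - 1) degrees of freedom, which should reproduce what chisq.test reported.

```r
# Observed counts from table(CR6) (rows: CHOICE, columns: REASON06)
observed <- matrix(c( 16,  2, 1,
                      14,  3, 1,
                      54, 11, 4,
                     204, 20, 9),
                   nrow = 4, byrow = TRUE)

# Expected counts from the margins, computed the same way as CR6expected
expected <- rowSums(observed) %o% colSums(observed) / sum(observed)

# Per-cell contributions (O - E)^2 / E, and their total: the chi-square statistic
contributions <- (observed - expected)^2 / expected
x_squared <- sum(contributions)

# Degrees of freedom (rows - 1) * (columns - 1) and the upper-tail p-value
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(x_squared, df = dof, lower.tail = FALSE)

print(round(c(x_squared = x_squared, dof = dof, p_value = p_value), 4))
```

The printed values should agree with the chisq.test output above: X-squared = 4.5718, df = 6, p-value = 0.5998.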
Now, let's deal with the problem that some of the expected counts are too small. We'll combine the first two rows (Less Than Third and Third) and the last two columns (SomewhatImportant and VeryImportant) of the frame. Unfortunately, this is one of those problems for which I know of no good solution other than building the new frame manually.
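If retyping the merged counts feels error-prone, one possible alternative (a sketch of our own, not the approach used in this example) is to let R do the additions with base matrix arithmetic. The names counts, merged_rows, and merged are hypothetical; the counts are those from table(CR6).

```r
# Observed counts from table(CR6), with the labels used in CR6frame
counts <- matrix(c( 16,  2, 1,
                    14,  3, 1,
                    54, 11, 4,
                   204, 20, 9),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("Less Than Third", "Third", "Second", "First"),
                                 c("NotImportant", "SomewhatImportant", "VeryImportant")))

# Merge the first two rows (Less Than Third + Third); keep Second and First
merged_rows <- rbind("Third or Less" = counts[1, ] + counts[2, ],
                     counts[3:4, ])

# Merge the last two columns (SomewhatImportant + VeryImportant)
merged <- cbind(NotImportant = merged_rows[, "NotImportant"],
                Important = merged_rows[, "SomewhatImportant"] +
                            merged_rows[, "VeryImportant"])
print(merged)
```

The result has the same counts as the hand-built frame (30, 54, 204 and 7, 15, 29), and chisq.test accepts the matrix directly.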
> newCR6frame = data.frame(row.names = c("Third or Less", "Second", "First"),
+                          NotImportant = c(30, 54, 204),
+                          Important = c(7, 15, 29))
> newCR6frame
              NotImportant Important
Third or Less           30         7
Second                  54        15
First                  204        29
> chisq.test(newCR6frame)
        Pearson's Chi-squared test
data:  newCR6frame
X-squared = 4.0847, df = 2, p-value = 0.1297
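This time R prints no warning. A quick check (the names collapsed and result are ours) confirms why: the smallest expected count is now 37 × 51 / 339 ≈ 5.57, above the threshold of 5. With df = (3 - 1)(2 - 1) = 2 and a p-value of about 0.13, the collapsed table still gives no strong evidence of an association at a conventional level such as 0.05.

```r
# Collapsed counts, matching newCR6frame
collapsed <- matrix(c( 30,  7,
                       54, 15,
                      204, 29),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("Third or Less", "Second", "First"),
                                    c("NotImportant", "Important")))

result <- chisq.test(collapsed)

# Smallest expected count: at least 5, so no approximation warning
print(min(result$expected))

# Degrees of freedom and p-value, as in the output above
print(result$parameter)
print(result$p.value)
```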
Copyright (c) 2007-8 Samuel A. Rebelsky.
This work is licensed under a Creative Commons
Attribution-NonCommercial 2.5 License. To view a copy of this
license, visit http://creativecommons.org/licenses/by-nc/2.5/
or send a letter to Creative Commons, 543 Howard Street, 5th Floor,
San Francisco, California, 94105, USA.