1 00:00:00,000 --> 00:00:04,166 2 00:00:04,166 --> 00:00:10,466 Hello and welcome to a video on chi-squares using Excel. Today we're going to be talking about 3 00:00:10,466 --> 00:00:10,499 4 00:00:10,500 --> 00:00:16,600 how to perform a chi-square analysis. So first let's take a look at the formula for chi-square which is 5 00:00:16,600 --> 00:00:21,366 in the center of our screen here. This is the symbol for chi-square and 6 00:00:21,366 --> 00:00:23,499 7 00:00:23,500 --> 00:00:30,100 then we have equals the sum of in parentheses observed minus expected squared 8 00:00:30,100 --> 00:00:30,433 9 00:00:30,433 --> 00:00:36,533 all over expected. So we're going to find this value and then sum it up. 10 00:00:36,533 --> 00:00:39,099 11 00:00:39,100 --> 00:00:45,166 O stands for observed frequencies and east stands for expected frequencies. So what are our 12 00:00:45,166 --> 00:00:51,199 observed frequencies? Well that's typically the data that we already have in 13 00:00:51,200 --> 00:00:57,333 a contingency table. A contingency table is a two by 14 00:00:57,333 --> 00:00:57,366 15 00:00:57,366 --> 00:01:03,266 two or three by three or some other arrangement of numbers 16 00:01:03,266 --> 00:01:03,399 17 00:01:03,400 --> 00:01:09,633 in a table form such as this. Today we have a two by two contingency table that we'll 18 00:01:09,633 --> 00:01:09,666 19 00:01:09,666 --> 00:01:13,532 be using and it's two by two because it is two rows and two columns. 20 00:01:13,533 --> 00:01:16,933 21 00:01:16,933 --> 00:01:23,233 Now the first thing that we'll need to do now that we have our observed values 22 00:01:23,233 --> 00:01:27,599 is find out what exactly are our expected values. 23 00:01:27,600 --> 00:01:29,466 24 00:01:29,466 --> 00:01:36,232 So to do that we'll first 25 00:01:36,233 --> 00:01:40,933 26 00:01:40,933 --> 00:01:47,199 need our marginal totals. So to find the marginal total you just add the column 27 00:01:47,200 --> 00:01:48,666 28 00:01:48,666 --> 00:01:54,832 that you're looking at. So 10 is the marginal total for this column. I'm just going to click 29 00:01:54,833 --> 00:02:01,166 and drag this over. So the marginal total for this column is also 10. 30 00:02:01,166 --> 00:02:01,832 31 00:02:01,833 --> 00:02:07,266 All right. Now you 32 00:02:07,266 --> 00:02:12,466 33 00:02:12,466 --> 00:02:18,799 can see we've got the same numbers 34 00:02:18,800 --> 00:02:24,933 over here that we do over here. We're just going to do this exact problem step by step. 35 00:02:24,933 --> 00:02:27,566 36 00:02:27,566 --> 00:02:33,032 So now that we have our marginal totals we can find our expected values. 37 00:02:33,033 --> 00:02:34,766 38 00:02:34,766 --> 00:02:40,966 We find our expected values. By multiplying our marginal totals together and dividing 39 00:02:40,966 --> 00:02:47,199 them by the total of those marginal totals. So for example, 40 00:02:47,200 --> 00:02:53,333 for this first box, we'll look at the same box up here. So we're in low, low, we look at low, 41 00:02:53,333 --> 00:02:53,366 42 00:02:53,366 --> 00:02:59,899 low. And that's a seven. So the marginal 43 00:02:59,900 --> 00:03:05,666 total for seven, there are two, is one of them is 10. The other one 44 00:03:05,666 --> 00:03:08,399 45 00:03:08,400 --> 00:03:13,833 is 11. So we'll take 10 times 11 46 00:03:13,833 --> 00:03:14,933 47 00:03:14,933 --> 00:03:20,933 and divide it by 20. 20 is the total of all of our marginal 48 00:03:20,933 --> 00:03:27,133 totals. And we get 5.5. Let's do the same thing 49 00:03:27,133 --> 00:03:27,166 50 00:03:27,166 --> 00:03:33,499 again, but for the next one. So now we have low high. Low high is 4 up here. Our marginal 51 00:03:33,500 --> 00:03:33,866 52 00:03:33,866 --> 00:03:40,132 totals you're 10 and 11 once more. So we're going to multiply those together and divide 53 00:03:40,133 --> 00:03:42,233 54 00:03:42,233 --> 00:03:43,966 by 20. 55 00:03:43,966 --> 00:03:48,332 56 00:03:48,333 --> 00:03:54,566 Okay, let's do that again. Now we're on high 57 00:03:54,566 --> 00:03:54,599 58 00:03:54,600 --> 00:04:00,600 low. High low is 3. So we're going to 59 00:04:00,600 --> 00:04:00,633 60 00:04:00,633 --> 00:04:05,966 take 10 times 9 divided by 20. 61 00:04:05,966 --> 00:04:13,366 62 00:04:13,366 --> 00:04:19,266 Okay, now one more time 10 times 9 divided 63 00:04:19,266 --> 00:04:19,932 64 00:04:19,933 --> 00:04:25,999 by 20. Okay, now we have our expected values. We can see these are the same ones 65 00:04:26,000 --> 00:04:32,066 that we got before for this problem. So we're still still on track. The next thing we need to find 66 00:04:32,066 --> 00:04:38,132 is observed minus expected squared. This 67 00:04:38,133 --> 00:04:38,166 68 00:04:38,166 --> 00:04:44,199 top part of our problem right here. Okay, 69 00:04:44,200 --> 00:04:48,566 70 00:04:48,566 --> 00:04:54,632 so the observed value, let's 71 00:04:54,633 --> 00:04:54,666 72 00:04:54,666 --> 00:04:59,566 open up a formula equals. Our observed value, 73 00:04:59,566 --> 00:05:00,799 74 00:05:00,800 --> 00:05:06,733 oops, do this in parentheses, observed value, 75 00:05:06,733 --> 00:05:07,933 76 00:05:07,933 --> 00:05:14,199 minus expected, or sorry, we did that backwards. Observed value, 77 00:05:14,200 --> 00:05:14,566 78 00:05:14,566 --> 00:05:19,466 minus expected value, raised to the power of to hit 79 00:05:19,466 --> 00:05:21,732 80 00:05:21,733 --> 00:05:28,099 enter, get to 0.25, one more time, observed value, 81 00:05:28,100 --> 00:05:34,133 got to put that in parentheses. Observed value, minus expected value, raised 82 00:05:34,133 --> 00:05:39,999 to the power of two, one more time, 83 00:05:40,000 --> 00:05:40,166 84 00:05:40,166 --> 00:05:46,299 this time I'm going to remember the parentheses. Observed value, minus expected value, 85 00:05:46,300 --> 00:05:46,666 86 00:05:46,666 --> 00:05:51,699 raised to the power of two, okay, and now 87 00:05:51,700 --> 00:05:53,166 88 00:05:53,166 --> 00:05:59,699 our final one, observed, minus expected raised 89 00:05:59,700 --> 00:06:01,100 90 00:06:01,100 --> 00:06:03,900 to the power of two. 91 00:06:03,900 --> 00:06:09,700 92 00:06:09,700 --> 00:06:15,700 So now that we have this top part of our formula, we can divide it by expected. So 93 00:06:15,700 --> 00:06:21,833 now we're going to have observed minus expected, raised to the power of two, divided 94 00:06:21,833 --> 00:06:21,866 95 00:06:21,866 --> 00:06:27,032 by expected. To do that, all we have to do is 96 00:06:27,033 --> 00:06:28,299 97 00:06:28,300 --> 00:06:34,533 take this value and divide it by our expected value. Let's do that one more time 98 00:06:34,533 --> 00:06:35,633 99 00:06:35,633 --> 00:06:41,266 for each box, divided by expected. You're going to line up your 100 00:06:41,266 --> 00:06:41,799 101 00:06:41,800 --> 00:06:45,666 boxes this way, tables, 102 00:06:45,666 --> 00:06:50,166 103 00:06:50,166 --> 00:06:52,099 divided by expected. 104 00:06:52,100 --> 00:06:56,800 105 00:06:56,800 --> 00:07:01,833 Okay, so we've got our 106 00:07:01,833 --> 00:07:03,899 107 00:07:03,900 --> 00:07:10,066 observed minus expected values squared, and then divided by expected right here. 108 00:07:10,066 --> 00:07:11,132 109 00:07:11,133 --> 00:07:17,533 Now, to finish our chi-square, all we need to do is sum these 110 00:07:17,533 --> 00:07:23,633 numbers together. Okay, and we 111 00:07:23,633 --> 00:07:23,666 112 00:07:23,666 --> 00:07:29,666 got 1.8182, which is the same number we got over here. Now, if you wanted to 113 00:07:29,666 --> 00:07:35,799 know the significance of this number, there are two different ways you could do that. The first is you could consult 114 00:07:35,800 --> 00:07:35,833 115 00:07:35,833 --> 00:07:41,066 a table for chi-square and plug in your degrees of freedom and this number, 116 00:07:41,066 --> 00:07:41,866 117 00:07:41,866 --> 00:07:47,532 the other is that you could use Excel to find the p-value. 118 00:07:47,533 --> 00:07:50,699 119 00:07:50,700 --> 00:07:56,833 And it has two different functions that you can use for this. The first is chi-test, 120 00:07:56,833 --> 00:07:56,866 121 00:07:56,866 --> 00:08:03,799 which is an outdated function, but it does still work, and the second is chi-square.test. 122 00:08:03,800 --> 00:08:03,833 123 00:08:03,833 --> 00:08:09,899 So, if you have a newer version of Excel and it long no longer has chi-test, chi-square.test works 124 00:08:09,900 --> 00:08:09,933 125 00:08:09,933 --> 00:08:15,999 the exact same way. Now, to use chi-square.test, it acts as for our actual range, it 126 00:08:16,000 --> 00:08:22,066 means our observed range comma expected range. So, you 127 00:08:22,066 --> 00:08:22,099 128 00:08:22,100 --> 00:08:28,266 just select all of your expected values, and then you close up where at the C. And 129 00:08:28,266 --> 00:08:34,432 you get 0.17753. You can see we've got a couple different options for chi-square 130 00:08:34,433 --> 00:08:37,466 131 00:08:37,466 --> 00:08:42,366 testing here, just out of 132 00:08:42,366 --> 00:08:47,032 133 00:08:47,033 --> 00:08:52,866 curiosity. Let's see what they do. No, we're not 134 00:08:52,866 --> 00:08:56,399 135 00:08:56,400 --> 00:09:02,800 doing that. Okay. I was hoping maybe we would find something it could do a chi-square 136 00:09:02,800 --> 00:09:09,000 for us, but that does not seem to be the case. Okay, this is all that you need to know how to do in order to do the chi-square 137 00:09:09,000 --> 00:09:09,033 138 00:09:09,033 --> 00:09:15,299 for your homework once you've collapsed your variables. Please let us know if you run into any problems doing 139 00:09:15,300 --> 00:09:15,333 140 00:09:15,333 --> 00:09:21,799 the chi-square. Um oh one more thing before I let you go. Uh the p-value here. 141 00:09:21,800 --> 00:09:28,200 We interpret this in one of two ways. You either reject the null hypothesis, or you fail to reject 142 00:09:28,200 --> 00:09:28,233 143 00:09:28,233 --> 00:09:34,199 the null hypothesis. And you'll know from your homework and from the class that we 144 00:09:34,200 --> 00:09:34,733 145 00:09:34,733 --> 00:09:40,866 reject the null hypothesis, if that p-value is less than 0.05, if our alpha is 146 00:09:40,866 --> 00:09:47,566 0.05. And if our alpha is 0.05, and it's greater than 0.05, 147 00:09:47,566 --> 00:09:47,999 148 00:09:48,000 --> 00:09:54,266 than we fail to reject the null hypothesis. Okay, you should be all set. 149 00:09:54,266 --> 00:09:54,399 150 00:09:54,400 --> 00:09:59,433 Have fun learning Excel.