1
00:00:00,000 --> 00:00:00,400

2
00:00:00,400 --> 00:00:06,700
Hello, and welcome to a video on calculating odds and risk ratios in Python. Today, we won't be loading in any data.

3
00:00:06,700 --> 00:00:06,733

4
00:00:06,733 --> 00:00:12,866
So let's start with loading our libraries and get right to it. You can see from scipy.stats, we're importing fisher_exact

5
00:00:12,866 --> 00:00:19,066
and we're importing NumPy as np, so we'll go ahead and load those in. First, let's calculate

6
00:00:19,066 --> 00:00:19,099

7
00:00:19,100 --> 00:00:25,500
the odds and risk ratio manually. Let's use some variables from an example problem. Let's say we have a contingency

8
00:00:25,500 --> 00:00:25,533

9
00:00:25,533 --> 00:00:31,599
table like this, where A equals 40, B equals 60, C equals 30, and D equals

10
00:00:31,600 --> 00:00:38,033
70. You can see we're using a new notation to put the information into these variables this time.

11
00:00:38,033 --> 00:00:39,966

12
00:00:39,966 --> 00:00:46,166
Now first, let's say we want to calculate our odds ratio. We could do this by dividing

13
00:00:46,166 --> 00:00:52,166
A by B over C divided by D. And if you need a visual representation of what that looks like,

14
00:00:52,166 --> 00:00:52,199

15
00:00:52,200 --> 00:00:58,500
I have one here. So A divided

16
00:00:58,500 --> 00:01:04,800
by B over C divided by D. A, B, C, and

17
00:01:04,800 --> 00:01:05,000

18
00:01:05,000 --> 00:01:11,000
D are arranged in a two by two contingency table. And sometimes seeing this makes

19
00:01:11,000 --> 00:01:13,700
it easier to visualize what we're doing with the math.

20
00:01:13,700 --> 00:01:17,900

21
00:01:17,900 --> 00:01:23,900
We can also find our risk ratio from the risks, dividing one risk

22
00:01:23,900 --> 00:01:30,033
by the other, basically. We'll go back over this in a moment below. But for right now, let's

23
00:01:30,033 --> 00:01:30,066

24
00:01:30,066 --> 00:01:35,932
run this code.
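The on-screen code is not part of this transcript, so here is a minimal sketch of what the manual calculation might look like. It assumes the "new notation" mentioned above is Python's multiple assignment, and the names `odds_ratio` and `risk_ratio` are illustrative rather than taken from the video.

```python
# Assumption: the "new notation" is multiple assignment (tuple unpacking).
# Cells of the 2x2 contingency table: row 1 is (a, b), row 2 is (c, d).
a, b, c, d = 40, 60, 30, 70

# Odds ratio: the odds in row 1 (a / b) over the odds in row 2 (c / d).
odds_ratio = (a / b) / (c / d)

# Risk ratio: the risk in row 1, a / (a + b), over the risk in row 2, c / (c + d).
risk_ratio = (a / (a + b)) / (c / (c + d))

print(odds_ratio)  # roughly 1.56
print(risk_ratio)  # roughly 1.33
```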
And you can see we found our odds ratio, which is 1.5 repeating and

25
00:01:35,933 --> 00:01:36,499

26
00:01:36,500 --> 00:01:42,666
our risk ratio, which is 1.3 almost repeating. Now, this will

27
00:01:42,666 --> 00:01:48,799
work the same way if we define each variable separately, where 40, 60, 30, and 70 each get put

28
00:01:48,800 --> 00:01:54,933
into their own initialized variable. We can find the

29
00:01:54,933 --> 00:02:01,066
odds ratio in a different way this time, with a times d divided by b times c.

30
00:02:01,066 --> 00:02:01,332

31
00:02:01,333 --> 00:02:07,133
So a times d divided by b times c will give you the same result

32
00:02:07,133 --> 00:02:07,633

33
00:02:07,633 --> 00:02:13,666
as what we did above. We can also find our risk ratio

34
00:02:13,666 --> 00:02:19,699
by dividing two risks. So we'll take a divided by

35
00:02:19,700 --> 00:02:19,733

36
00:02:19,733 --> 00:02:26,066
a plus b, because a plus b is the marginal total

37
00:02:26,066 --> 00:02:27,566

38
00:02:27,566 --> 00:02:33,966
of this 40 and 60. C plus d would be the marginal total down here. When I click calculate

39
00:02:33,966 --> 00:02:39,132
here, you can see now we've got our marginal totals out here,

40
00:02:39,133 --> 00:02:40,766

41
00:02:40,766 --> 00:02:45,466
170 and 130. So this would be 40

42
00:02:45,466 --> 00:02:48,066

43
00:02:48,066 --> 00:02:54,432
divided by 100, which is the risk of success for employed people

44
00:02:54,433 --> 00:03:00,533
in this case. Let's say that this is a table for success in interviews if you're employed versus

45
00:03:00,533 --> 00:03:01,366
unemployed.

46
00:03:01,366 --> 00:03:09,966

47
00:03:09,966 --> 00:03:15,699
Okay, and then we have risk of C, which is the same thing, which is this

48
00:03:15,700 --> 00:03:16,066

49
00:03:16,066 --> 00:03:21,766
divided by, or yes, divided by 30 plus 70, which is 30

50
00:03:21,766 --> 00:03:22,166

51
00:03:22,166 --> 00:03:28,232
divided by 100.
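As a rough sketch of the steps just described, with each cell in its own variable: the names `odds_ratio_cross`, `risk_a`, and `risk_c` are made up for illustration, and the row labels in the comments assume that row 1 holds the employed group and row 2 the unemployed group.

```python
# Each cell of the 2x2 table gets its own variable.
a = 40  # row 1 (assumed: employed), success
b = 60  # row 1 (assumed: employed), no success
c = 30  # row 2 (assumed: unemployed), success
d = 70  # row 2 (assumed: unemployed), no success

# Cross-product form of the odds ratio: (a * d) / (b * c).
# Algebraically this equals (a / b) / (c / d).
odds_ratio_cross = (a * d) / (b * c)

# Risk in each row is the count of successes over that row's marginal total.
risk_a = a / (a + b)  # 40 / 100
risk_c = c / (c + d)  # 30 / 100

print(odds_ratio_cross)  # roughly 1.56
print(risk_a, risk_c)
```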
And then we

52
00:03:28,233 --> 00:03:34,499
can find the risk ratio of those two risks by dividing risk of A by risk of C. You can also

53
00:03:34,500 --> 00:03:34,533

54
00:03:34,533 --> 00:03:40,533
do this all in one step, which you can see is just combining these two into one line. And

55
00:03:40,533 --> 00:03:46,533
when I run that, we can see they both return exactly the same result.

56
00:03:46,533 --> 00:03:47,233

57
00:03:47,233 --> 00:03:53,433
And it is the same result we got above as well. In Python, there's

58
00:03:53,433 --> 00:03:53,466

59
00:03:53,466 --> 00:03:59,699
also a function from the scipy.stats package that will perform this automatically, given an array. An array

60
00:03:59,700 --> 00:03:59,733

61
00:03:59,733 --> 00:04:05,833
in Python is like a list, except it can only store values of the same data type. So if we put numbers into it, they

62
00:04:05,833 --> 00:04:11,966
have to be all numbers. Okay, now the same

63
00:04:11,966 --> 00:04:11,999

64
00:04:12,000 --> 00:04:17,833
as above, where we had a, b, and we gave it some information.

65
00:04:17,833 --> 00:04:18,866

66
00:04:18,866 --> 00:04:24,932
This time, we're saying odds ratio, comma, p value, is equal to fisher_exact

67
00:04:24,933 --> 00:04:31,133
of table. And our table is right here. The reason we're able

68
00:04:31,133 --> 00:04:36,766
to do this is because we know fisher_exact returns two items.

69
00:04:36,766 --> 00:04:43,132

70
00:04:43,133 --> 00:04:48,966
So the odds ratio from Fisher's test is 1.56. Yep, that's what we were expecting from above.

71
00:04:48,966 --> 00:04:49,299

72
00:04:49,300 --> 00:04:54,866
There is no quick way to do a risk ratio like this, so you will just have to do it by hand in Python, unfortunately.

73
00:04:54,866 --> 00:04:55,299

74
00:04:55,300 --> 00:05:01,466
But if you'd like to use this method to find your odds ratio, you are more than welcome to. All

75
00:05:01,466 --> 00:05:02,532
right, happy coding.
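The `fisher_exact` step described above can be sketched roughly as follows. The exact on-screen code is not in the transcript, so the name `table` and its layout (rows `[a, b]` and `[c, d]`) are assumptions. The risk ratio is still computed by hand in one step, as the video suggests.

```python
import numpy as np
from scipy.stats import fisher_exact

# Assumed layout: first row is [a, b], second row is [c, d].
table = np.array([[40, 60],
                  [30, 70]])

# fisher_exact returns two items, so they can be unpacked in one line.
odds_ratio, p_value = fisher_exact(table)

# Risk ratio by hand, in one step, from the same table.
(a, b), (c, d) = table
risk_ratio = (a / (a + b)) / (c / (c + d))

print(odds_ratio)  # roughly 1.56, matching the manual calculation
print(p_value)
print(risk_ratio)  # roughly 1.33
```

The odds ratio returned here is the plain sample odds ratio, (a * d) / (b * c), which is why it agrees with the manual result.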