1 00:00:00,200 --> 00:00:01,366 2 00:00:01,366 --> 00:00:07,499 Hello, and welcome to a video on frequency tables in Python. First, let's load in some data. And we'll do this just 3 00:00:07,500 --> 00:00:07,533 4 00:00:07,533 --> 00:00:13,533 as we usually do. You may notice I'm importing NumPy as np, and that's for an example that I'll be using below, but 5 00:00:13,533 --> 00:00:13,566 6 00:00:13,566 --> 00:00:18,766 it's not necessary to load in your data or make a frequency table. Let's go ahead and run this block of code. 7 00:00:18,766 --> 00:00:19,799 8 00:00:19,800 --> 00:00:26,066 And now we can see I've got IV1, IV2, and DV available to use. Next, 9 00:00:26,066 --> 00:00:26,099 10 00:00:26,100 --> 00:00:32,533 let's take a look at our data using a frequency table. Now is a great time to talk about methods versus functions. 11 00:00:32,533 --> 00:00:33,499 12 00:00:33,500 --> 00:00:39,366 Think of a function as a tool you use by calling its name and giving it some input inside parentheses. 13 00:00:39,366 --> 00:00:39,932 14 00:00:39,933 --> 00:00:46,466 For example, np.mean(df['IV2']) 15 00:00:46,466 --> 00:00:46,632 16 00:00:46,633 --> 00:00:52,899 is a function call that calculates the mean of the column IV2. The function is wrapped around the data it operates 17 00:00:52,900 --> 00:00:52,933 18 00:00:52,933 --> 00:00:59,066 on, and we can see this below. print() is also a function. 19 00:00:59,066 --> 00:01:01,999 20 00:01:02,000 --> 00:01:08,200 A method is like a function, but it's attached to a specific object, like a variable or data frame, 21 00:01:08,200 --> 00:01:08,333 22 00:01:08,333 --> 00:01:14,399 and is applied directly to it. For example, .value_counts() is a method that you apply to a 23 00:01:14,400 --> 00:01:14,433 24 00:01:14,433 --> 00:01:20,499 column, like df['IV2'], to count the occurrences of each unique value.
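To make the function versus method difference concrete, here is a minimal sketch. The data frame below is a made-up stand-in (the video's real dataset isn't shown in the transcript), and the names mean_iv2 and counts are just for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; only the column name IV2 comes from the video.
df = pd.DataFrame({"IV2": [96, 92, 96, 80, 92, 96]})

# Function: the name comes first and the data goes inside the parentheses.
mean_iv2 = np.mean(df["IV2"])
print(mean_iv2)

# Method: the data comes first, then a dot, then the method name.
counts = df["IV2"].value_counts()
print(counts)
```

Both lines do work on the same column; the only difference is where the column sits relative to the name being called.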
You write 25 00:01:20,500 --> 00:01:20,533 26 00:01:20,533 --> 00:01:26,699 a method after the variable, with a dot in between. So here 27 00:01:26,700 --> 00:01:32,433 we're going to make a frequency table. And we're going to use two methods to create our frequency table data frame. 28 00:01:32,433 --> 00:01:33,099 29 00:01:33,100 --> 00:01:39,200 First, we have .value_counts(), which does exactly what you think it's going to do. 30 00:01:39,200 --> 00:01:39,366 31 00:01:39,366 --> 00:01:45,499 It counts the values. Next, we have .reset_index(), which is a little more obscure in what it's doing. 32 00:01:45,500 --> 00:01:46,033 33 00:01:46,033 --> 00:01:50,799 It turns the counts we just made into a data frame, moving the values out of the index and into a regular column next to their counts. 34 00:01:50,800 --> 00:01:54,733 35 00:01:54,733 --> 00:02:00,999 Then we'll name our columns right here using the .columns 36 00:02:01,000 --> 00:02:01,900 attribute. 37 00:02:01,900 --> 00:02:10,366 38 00:02:10,366 --> 00:02:16,332 And there you go, you see we've got our frequencies. 96 appears 39 00:02:16,333 --> 00:02:16,566 40 00:02:16,566 --> 00:02:22,666 20 times in our data, 92 appears 16 times, 80 appears 11 times, and so 41 00:02:22,666 --> 00:02:22,699 42 00:02:22,700 --> 00:02:28,766 on and so forth. Great. So that's a frequency table. But what 43 00:02:28,766 --> 00:02:28,799 44 00:02:28,800 --> 00:02:34,800 if we wanted it to be in ascending order of value? Because by default this is sorted by frequency in descending order: 96 is 45 00:02:34,800 --> 00:02:34,833 46 00:02:34,833 --> 00:02:40,999 at the top and then it goes down all the way to 78.75. We'll add another method 47 00:02:41,000 --> 00:02:47,466 in between the two we already have, called .sort_index(). So now our code is just slightly different. 48 00:02:47,466 --> 00:02:47,766 49 00:02:47,766 --> 00:02:53,832 We have .value_counts(), same as before, followed by .sort_index(), followed by .reset_index().
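The two versions of the frequency table described above can be sketched roughly like this. The data is made up, and the names frequency_table, sorted_table, and the column labels "IV2" and "Frequency" are assumptions for illustration, not necessarily what appears on screen in the video.

```python
import pandas as pd

# Hypothetical stand-in data; only the column name IV2 comes from the video.
df = pd.DataFrame({"IV2": [96, 92, 96, 80, 92, 96, 50]})

# Default: .value_counts() puts the most frequent value first.
frequency_table = df["IV2"].value_counts().reset_index()
frequency_table.columns = ["IV2", "Frequency"]  # assumed column labels
print(frequency_table)

# Adding .sort_index() in between orders the rows by value, smallest first.
sorted_table = df["IV2"].value_counts().sort_index().reset_index()
sorted_table.columns = ["IV2", "Frequency"]
print(sorted_table)
```

Assigning both column names explicitly is a deliberate choice here: the default names that .reset_index() produces after .value_counts() differ between pandas versions, so naming them yourself keeps the result the same either way.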
50 00:02:53,833 --> 00:02:56,533 51 00:02:56,533 --> 00:03:02,199 We'll name our columns again using .columns and print our frequency table. 52 00:03:02,200 --> 00:03:03,800 53 00:03:03,800 --> 00:03:08,066 Okay, great. This time we can see it starts at 50 and it ends at 96. 54 00:03:08,066 --> 00:03:13,766 55 00:03:13,766 --> 00:03:19,966 Now it's in ascending order. Neat. But what if we want to add a column for cumulative percentage? While 56 00:03:19,966 --> 00:03:26,132 there's not a quick way to do this in Python, we can use a formula to help us create a new column in our data frame to do just that. 57 00:03:26,133 --> 00:03:27,133 58 00:03:27,133 --> 00:03:33,033 You'll notice we have two new methods in this snippet of code: .cumsum() and .sum(). 59 00:03:33,033 --> 00:03:34,799 60 00:03:34,800 --> 00:03:40,933 So first we'll make our frequency table, same as before, and then we'll add that cumulative 61 00:03:40,933 --> 00:03:40,966 62 00:03:40,966 --> 00:03:47,632 percentage. We're going to create a column in frequency table called cumulative percentage. 63 00:03:47,633 --> 00:03:50,533 64 00:03:50,533 --> 00:03:56,899 And then in frequency table, we're going to take frequency, the column that we named above right 65 00:03:56,900 --> 00:03:56,933 66 00:03:56,933 --> 00:04:02,699 here, apply the .cumsum() method to it, 67 00:04:02,700 --> 00:04:03,833 68 00:04:03,833 --> 00:04:10,366 divide it by the .sum() of that same frequency column, and multiply by 100. 69 00:04:10,366 --> 00:04:13,366 70 00:04:13,366 --> 00:04:19,032 And for your variables, the only thing that will change is 71 00:04:19,033 --> 00:04:19,399 72 00:04:19,400 --> 00:04:25,566 the variable that you put here. Everything else can stay the same. Now, let's print our frequency table. 73 00:04:25,566 --> 00:04:26,099 74 00:04:26,100 --> 00:04:29,533 And we can see we have a cumulative percentage and it's in ascending order.
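Here is a minimal sketch of the cumulative percentage step, assuming the same kind of made-up data as before. The names frequency_table, "Frequency", and "Cumulative Percentage" are guesses at how the spoken names would be written in code.

```python
import pandas as pd

# Hypothetical stand-in data; only the column name IV2 comes from the video.
df = pd.DataFrame({"IV2": [50, 80, 92, 92, 96, 96, 96, 96]})

# Build the frequency table in ascending order of value, same as before.
frequency_table = df["IV2"].value_counts().sort_index().reset_index()
frequency_table.columns = ["IV2", "Frequency"]  # assumed column labels

# Running total of the counts, divided by the grand total, times 100.
frequency_table["Cumulative Percentage"] = (
    frequency_table["Frequency"].cumsum()
    / frequency_table["Frequency"].sum()
    * 100
)
print(frequency_table)
```

Because the rows are sorted by value first, the last row should always reach 100 percent. To reuse this on another column, only the column name in the first step needs to change.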
75 00:04:29,533 --> 00:04:33,733 76 00:04:33,733 --> 00:04:37,266 And that's it. That should be enough to get you started on making your own frequency table.