1
00:00:00,200 --> 00:00:01,366


2
00:00:01,366 --> 00:00:07,499
Hello, and welcome to a video on frequency tables in Python. First, let's load in some data. And we'll do this just

3
00:00:07,500 --> 00:00:07,533


4
00:00:07,533 --> 00:00:13,533
as we usually do. You may notice I'm importing NumPy as np, and that's for an example that I'll be using below, but

5
00:00:13,533 --> 00:00:13,566


6
00:00:13,566 --> 00:00:18,766
it's not necessary to load in your data or make a frequency table. Let's go ahead and run this block of code.

7
00:00:18,766 --> 00:00:19,799


8
00:00:19,800 --> 00:00:26,066
And now we can see I've got IV1, IV2, and DV available to use. Next,

9
00:00:26,066 --> 00:00:26,099


10
00:00:26,100 --> 00:00:32,533
let's take a look at our data using a frequency table. Now is a great time to talk about methods versus functions.

11
00:00:32,533 --> 00:00:33,499


12
00:00:33,500 --> 00:00:39,366
Think of a function as a tool you use by calling its name and giving it some input inside parentheses.

13
00:00:39,366 --> 00:00:39,932


14
00:00:39,933 --> 00:00:46,466
For example, np.mean(df['IV2'])

15
00:00:46,466 --> 00:00:46,632


16
00:00:46,633 --> 00:00:52,899
is a function that calculates the mean of the column IV2. The function is wrapped around the data it operates

17
00:00:52,900 --> 00:00:52,933


18
00:00:52,933 --> 00:00:59,066
on, and we can see this below. Print is also a function.
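As a rough sketch of that function call, assuming a small made-up data frame in place of the data loaded in the video (only the column name IV2 comes from the video):
```python
import numpy as np
import pandas as pd
# Made-up stand-in data; the video's real dataset is not reproduced here.
df = pd.DataFrame({"IV2": [96, 92, 96, 80]})
# Function style: the function name comes first and the data goes inside the parentheses.
mean_iv2 = np.mean(df["IV2"])
# print is a function too, wrapped around the value it displays.
print(mean_iv2)
```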

19
00:00:59,066 --> 00:01:01,999


20
00:01:02,000 --> 00:01:08,200
A method is like a function, but it's attached to a specific object, like a variable or data frame,

21
00:01:08,200 --> 00:01:08,333


22
00:01:08,333 --> 00:01:14,399
and is applied directly to it. For example, .value_counts() is a method that you apply to a

23
00:01:14,400 --> 00:01:14,433


24
00:01:14,433 --> 00:01:20,499
column, like df['IV2'], to count the occurrences of each unique value. You write

25
00:01:20,500 --> 00:01:20,533


26
00:01:20,533 --> 00:01:26,699
it after the variable with a dot in between. So here

27
00:01:26,700 --> 00:01:32,433
we're going to make a frequency table. And we're going to use two methods to create our frequency table data frame.

28
00:01:32,433 --> 00:01:33,099


29
00:01:33,100 --> 00:01:39,200
First, we have .value_counts(), which does exactly what you think it's going to do.

30
00:01:39,200 --> 00:01:39,366


31
00:01:39,366 --> 00:01:45,499
It counts the values. Next, we have .reset_index(), which is a little more obscure in what it's doing.

32
00:01:45,500 --> 00:01:46,033


33
00:01:46,033 --> 00:01:50,799
It turns the value counts we just made into a data frame, moving the unique values out of the index and into a regular column.

34
00:01:50,800 --> 00:01:54,733


35
00:01:54,733 --> 00:02:00,999
Then we'll name our columns right here using the .columns

36
00:02:01,000 --> 00:02:01,900
attribute.
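Here is a minimal sketch of those steps, assuming made-up data and hypothetical names (freq_table, "Frequency") that may differ from the ones on screen:
```python
import pandas as pd
# Made-up stand-in data; only the column name IV2 comes from the video.
df = pd.DataFrame({"IV2": [96, 92, 96, 80, 92, 96]})
# Count each unique value, then turn the resulting Series into a data frame.
freq_table = df["IV2"].value_counts().reset_index()
# Rename both columns; these names are illustrative assumptions.
freq_table.columns = ["IV2", "Frequency"]
print(freq_table)
```
Assigning to .columns replaces whatever default names .reset_index() produced, so the sketch does not depend on those defaults.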

37
00:02:01,900 --> 00:02:10,366


38
00:02:10,366 --> 00:02:16,332
And there you go, you see we've got our frequencies. We have 96, which appears

39
00:02:16,333 --> 00:02:16,566


40
00:02:16,566 --> 00:02:22,666
20 times in our data, 92 appears 16 times, 80 appears 11 times, and so

41
00:02:22,666 --> 00:02:22,699


42
00:02:22,700 --> 00:02:28,766
on and so forth. Great. So that's a frequency table. But what

43
00:02:28,766 --> 00:02:28,799


44
00:02:28,800 --> 00:02:34,800
if we wanted it in ascending order of the values? By default it's sorted by frequency in descending order, so 96 is

45
00:02:34,800 --> 00:02:34,833


46
00:02:34,833 --> 00:02:40,999
at the top and then it goes down all the way to 78.75. We'll add another method

47
00:02:41,000 --> 00:02:47,466
in between the two we already have, called .sort_index(). So now our code is just slightly different.

48
00:02:47,466 --> 00:02:47,766


49
00:02:47,766 --> 00:02:53,832
We have .value_counts(), same as before, followed by .sort_index(), followed by .reset_index().

50
00:02:53,833 --> 00:02:56,533


51
00:02:56,533 --> 00:03:02,199
We'll name our columns again using .columns and print our frequency table.
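A sketch of the sorted version, again assuming made-up data and hypothetical names (freq_table, "Frequency"):
```python
import pandas as pd
# Made-up stand-in data; only the column name IV2 comes from the video.
df = pd.DataFrame({"IV2": [96, 92, 96, 80, 92, 96]})
# .sort_index() orders the counts by the unique values (ascending) before they become a column.
freq_table = df["IV2"].value_counts().sort_index().reset_index()
# Illustrative column names, not necessarily the ones used in the video.
freq_table.columns = ["IV2", "Frequency"]
print(freq_table)
```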

52
00:03:02,200 --> 00:03:03,800


53
00:03:03,800 --> 00:03:08,066
Okay, great. This time we can see it starts at 50 and it ends at 96.

54
00:03:08,066 --> 00:03:13,766


55
00:03:13,766 --> 00:03:19,966
Now it's in ascending order. Neat. But what if we want to add a column for cumulative percentage? While

56
00:03:19,966 --> 00:03:26,132
there's not a quick way to do this in Python, we can use a formula to help us create a new column in our data frame to do just that.

57
00:03:26,133 --> 00:03:27,133


58
00:03:27,133 --> 00:03:33,033
You'll notice we have two new methods in this snippet of code: .cumsum() and .sum().

59
00:03:33,033 --> 00:03:34,799


60
00:03:34,800 --> 00:03:40,933
So first we'll make our frequency table, same as before, and then we'll add that cumulative

61
00:03:40,933 --> 00:03:40,966


62
00:03:40,966 --> 00:03:47,632
percentage. We're going to create a column in frequency table called cumulative percentage.

63
00:03:47,633 --> 00:03:50,533


64
00:03:50,533 --> 00:03:56,899
And then in frequency table, we're going to take frequency, the variable that we defined above right

65
00:03:56,900 --> 00:03:56,933


66
00:03:56,933 --> 00:04:02,699
here, apply the .cumsum() method to it,

67
00:04:02,700 --> 00:04:03,833


68
00:04:03,833 --> 00:04:10,366
divide it by the total of the frequency column, which we get with .sum(), and multiply by 100.

69
00:04:10,366 --> 00:04:13,366


70
00:04:13,366 --> 00:04:19,032
And for your variables, the only thing that will change is

71
00:04:19,033 --> 00:04:19,399


72
00:04:19,400 --> 00:04:25,566
the variable that you put here. Everything else can stay the same. Now, let's print our frequency table.
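The formula can be sketched like this, assuming made-up data and hypothetical names (freq_table, "Frequency", "Cumulative Percentage"):
```python
import pandas as pd
# Made-up stand-in data; only the column name IV2 comes from the video.
df = pd.DataFrame({"IV2": [96, 80, 92, 96]})
# Build the ascending frequency table first.
freq_table = df["IV2"].value_counts().sort_index().reset_index()
freq_table.columns = ["IV2", "Frequency"]
# Running total of the counts, divided by the grand total, times 100.
freq_table["Cumulative Percentage"] = freq_table["Frequency"].cumsum() / freq_table["Frequency"].sum() * 100
print(freq_table)
```
In this sketch the last row always reaches 100, which is a quick check that the column was built correctly.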

73
00:04:25,566 --> 00:04:26,099


74
00:04:26,100 --> 00:04:29,533
And we can see we have a cumulative percentage, and it's in ascending order.

75
00:04:29,533 --> 00:04:33,733


76
00:04:33,733 --> 00:04:37,266
And that's it. That should be enough to get you started on making your own frequency table.