1 00:00:00,600 --> 00:00:02,466 2 00:00:02,466 --> 00:00:08,099 Hello and welcome to a video on Python. This assignment will ask you to create variables. 3 00:00:08,100 --> 00:00:08,600 4 00:00:08,600 --> 00:00:15,033 Variables in Python are like a container for information. For example, we could use a variable called 5 00:00:15,033 --> 00:00:15,066 6 00:00:15,066 --> 00:00:21,199 name to hold information about a name. So here I'm going to set a variable called 7 00:00:21,200 --> 00:00:27,566 name equal to Jane. Because we're using a text string, I'm going to enclose it in quotation marks. 8 00:00:27,566 --> 00:00:28,432 9 00:00:28,433 --> 00:00:34,499 Next we'll print name to our results. I'll click this button here to run the 10 00:00:34,500 --> 00:00:40,533 code. When you first open a notebook, it 11 00:00:40,533 --> 00:00:46,566 may take a moment for it to actually run your commands. But here you see below, 12 00:00:46,566 --> 00:00:46,599 13 00:00:46,600 --> 00:00:52,533 we've printed the name Jane. I could change this to something else, 14 00:00:52,533 --> 00:00:54,533 15 00:00:54,533 --> 00:00:59,666 and now it prints Smith. We'll change it back to Jane. This is a 16 00:00:59,666 --> 00:01:01,799 17 00:01:01,800 --> 00:01:08,000 statistics course, however. So we're usually more concerned with variables that contain numbers. 18 00:01:08,000 --> 00:01:08,833 19 00:01:08,833 --> 00:01:15,166 Here I've created a variable called var1, and I've set it equal to 57. We'll 20 00:01:15,166 --> 00:01:21,366 print var1 below. In Python, two stars, two asterisks, 21 00:01:21,366 --> 00:01:21,566 22 00:01:21,566 --> 00:01:28,566 means raised to the power of when using it as an operation. So if I say 23 asterisks 23 00:01:28,566 --> 00:01:28,966 24 00:01:28,966 --> 00:01:34,199 asterisks 2, it will raise it to the power of 2 and we'll square it. 
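The steps described so far can be sketched in a few lines. This is a minimal illustration and not the exact notebook from the video; the variable names `name`, `var1`, and `var2` are the ones the narration uses.

```python
# Text (a string) goes inside quotation marks.
name = "Jane"
print(name)

# Numbers are written without quotation marks.
var1 = 57
print(var1)

# Two asterisks mean "raised to the power of".
var2 = 23 ** 2
print(var2)  # 529
```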
25 00:01:34,200 --> 00:01:35,200 26 00:01:35,200 --> 00:01:41,633 We'll save that in a variable called var2 and we'll print it below. A division 27 00:01:41,633 --> 00:01:47,733 symbol means divide. So 23 forward slash 2 will 28 00:01:47,733 --> 00:01:48,699 29 00:01:48,700 --> 00:01:54,766 divide 23 by 2. We'll save that in a variable called var3 and print it below. 30 00:01:54,766 --> 00:01:57,432 31 00:01:57,433 --> 00:02:03,633 We can also use order of operations here too. So if we wanted to do them in a certain order, we could 32 00:02:03,633 --> 00:02:03,666 33 00:02:03,666 --> 00:02:10,132 do something like var4 equals parentheses 23 divided by 2, close parentheses, 34 00:02:10,133 --> 00:02:16,266 all raised to the power of 2, and that will print below as var4. Let's go ahead and 35 00:02:16,266 --> 00:02:16,299 36 00:02:16,300 --> 00:02:22,233 print everything out. So you can see we've set 57 to 57. There we go. 37 00:02:22,233 --> 00:02:23,099 38 00:02:23,100 --> 00:02:29,166 23 raised to the power of 2. 23 squared is 529. 23 divided by 2 39 00:02:29,166 --> 00:02:29,199 40 00:02:29,200 --> 00:02:35,966 is 11.5 and 23 divided by 2 raised to the power of 2. So 11.5 41 00:02:35,966 --> 00:02:36,766 42 00:02:36,766 --> 00:02:42,966 times 11.5 is 132.25. If you've done a little 43 00:02:42,966 --> 00:02:42,999 44 00:02:43,000 --> 00:02:49,233 programming before, you may have noticed I didn't have to tell Python that var1 is a number. It will figure out what 45 00:02:49,233 --> 00:02:49,266 46 00:02:49,266 --> 00:02:55,366 kind of variable it is based on the information you put in it. That's good. Maybe we want to enter 47 00:02:55,366 --> 00:03:02,199 in a string of numbers. We call this string of numbers a list. A list doesn't have to contain numbers. 48 00:03:02,200 --> 00:03:08,266 In fact, we can make a list of just about anything. In this case, we'll make a list called 49 00:03:08,266 --> 00:03:12,766 var. 
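A short sketch of the division and order-of-operations examples above. The names `var3` and `var4` come from the narration; `no_parens` is a hypothetical extra variable added here only to show what happens without the parentheses.

```python
var1 = 57

# A forward slash divides.
var3 = 23 / 2
print(var3)  # 11.5

# Parentheses force the division to happen before the exponent.
var4 = (23 / 2) ** 2
print(var4)  # 132.25

# Without parentheses the exponent is applied first: 23 / (2 ** 2).
no_parens = 23 / 2 ** 2
print(no_parens)  # 5.75

# Python works out the type from the value you assign.
print(type(var1))  # <class 'int'>
print(type(var3))  # <class 'float'>
```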
And I'm just going to put in some random numbers here. Let's print that out. 50 00:03:12,766 --> 00:03:14,566 51 00:03:14,566 --> 00:03:20,632 This might be pretty useful. What if we wanted to multiply every number in vars by var1? In other 52 00:03:20,633 --> 00:03:20,666 53 00:03:20,666 --> 00:03:26,732 words, we want to multiply 57 times 2, then 57 times 5, then by 2, 54 00:03:26,733 --> 00:03:26,766 55 00:03:26,766 --> 00:03:32,232 etc. If we try to simply multiply the two together, let's see what happens. 56 00:03:32,233 --> 00:03:33,066 57 00:03:33,066 --> 00:03:39,366 So vars_mult, meaning I'm multiplying vars together. And you can name your variables whatever 58 00:03:39,366 --> 00:03:39,399 59 00:03:39,400 --> 00:03:45,966 you want, so long as they're sensible and make sense to you and whoever's reading your code, equals 60 00:03:45,966 --> 00:03:51,499 vars times var1. The asterisk is the multiplication symbol in Python. 61 00:03:51,500 --> 00:03:53,500 62 00:03:53,500 --> 00:03:59,600 We'll print that out as vars_mult. Well, that's not what we wanted, right? 63 00:03:59,600 --> 00:04:00,000 64 00:04:00,000 --> 00:04:06,000 What Python has done instead is take this list and repeat it 57 times. Maybe this 65 00:04:06,000 --> 00:04:06,266 66 00:04:06,266 --> 00:04:12,599 would be useful in other circumstances, but not this time. This is where pandas, a package in Python, 67 00:04:12,600 --> 00:04:19,100 comes in. Let's try to use it to solve this problem. First, we'll need to import pandas. 68 00:04:19,100 --> 00:04:20,033 69 00:04:20,033 --> 00:04:26,166 You may need to install pandas first. Do that with the following line. I'll take the hashtag off 70 00:04:26,166 --> 00:04:32,032 of it. It's exclamation point pip, and that's what you use in front of any installation you're doing. 71 00:04:32,033 --> 00:04:32,399 72 00:04:32,400 --> 00:04:37,466 Install pandas. Once you've installed pandas, you don't need to do it again. 
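The list surprise described above can be reproduced with a short sketch. The list is named `numbers` here (a hypothetical name, chosen because `vars` is also the name of a built-in Python function), and only the first three values, 2, 5, 2, are taken from the narration.

```python
var1 = 57

# Hypothetical short list; the video's list is longer.
numbers = [2, 5, 2]

# Multiplying a list by an integer repeats the list;
# it does not multiply each element.
vars_mult = numbers * var1

print(len(vars_mult))  # 171, i.e. 3 items repeated 57 times
print(vars_mult[:6])   # [2, 5, 2, 2, 5, 2]
```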
73 00:04:37,466 --> 00:04:42,199 74 00:04:42,200 --> 00:04:48,400 Now, this next line lets Python know we want to use pandas and creates an abbreviation for us to use so we aren't 75 00:04:48,400 --> 00:04:54,633 typing pandas over and over. We'll import pandas as pd. So we're 76 00:04:54,633 --> 00:04:54,666 77 00:04:54,666 --> 00:04:59,632 telling Python to import this library so that I can use it in my code. 78 00:04:59,633 --> 00:05:01,033 79 00:05:01,033 --> 00:05:07,099 We'll create a new variable called df to stand for data frame. When we enter this data, 80 00:05:07,100 --> 00:05:14,166 the A part of this code is for the name of the column. So df equals 81 00:05:14,166 --> 00:05:14,199 82 00:05:14,200 --> 00:05:20,200 pd.DataFrame. So DataFrame here is a function from pandas that 83 00:05:20,200 --> 00:05:20,233 84 00:05:20,233 --> 00:05:26,566 we're using. These words will make more sense the longer you code and get used to hearing 85 00:05:26,566 --> 00:05:32,866 them and seeing them. We're opening a set of parentheses and then inside those parentheses, 86 00:05:32,866 --> 00:05:32,899 87 00:05:32,900 --> 00:05:38,900 we're opening a set of brackets. A is going to be the name of this column where 88 00:05:38,900 --> 00:05:43,900 we have these numbers attached to it. So let's print df. 89 00:05:43,900 --> 00:05:46,266 90 00:05:46,266 --> 00:05:52,532 Okay, so this first column you see here is our index. The index in Python always starts 91 00:05:52,533 --> 00:05:52,566 92 00:05:52,566 --> 00:05:58,832 at zero. So even though the last index is 8, we know that we have nine numbers here, not just eight. 93 00:05:58,833 --> 00:06:00,033 94 00:06:00,033 --> 00:06:06,233 The A column here is exactly what we named it above, and we could have named this anything. We could 95 00:06:06,233 --> 00:06:13,366 have changed this name to puppy or var1 or time in seconds. 96 00:06:13,366 --> 00:06:13,566 97 00:06:13,566 --> 00:06:19,766 Whatever it is we were trying to measure or capture. 
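A rough sketch of the data frame step, assuming the column is passed as a dictionary that maps the column name to a list. Only the first three values (2, 5, 2) come from the narration; the rest are hypothetical placeholders chosen to give nine rows.

```python
import pandas as pd

# Nine values: the first three are mentioned in the video,
# the remaining six are made up for illustration.
values = [2, 5, 2, 8, 1, 9, 4, 7, 3]

# The dictionary key "A" becomes the column name.
df = pd.DataFrame({"A": values})
print(df)

# The index starts at 0, so nine rows are numbered 0 through 8.
print(list(df.index))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```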
You want this name to reflect 98 00:06:19,766 --> 00:06:25,099 what that item is so that it makes more sense in your code. In this case, we're just going to call it A. 99 00:06:25,100 --> 00:06:27,133 100 00:06:27,133 --> 00:06:33,499 Let's create a new column called B by multiplying A by var1 that we created earlier. 101 00:06:33,500 --> 00:06:33,600 102 00:06:33,600 --> 00:06:39,900 So up here in our code, we've already run this section of code where we created var1. That 103 00:06:39,900 --> 00:06:46,200 means that it's available to us to use down here in this new section of code where we're calling 104 00:06:46,200 --> 00:06:52,400 var1. If we hadn't run that first section of code, we would get an error saying that 105 00:06:52,400 --> 00:06:58,533 that variable does not exist. So first, we're going to create our new column 106 00:06:58,533 --> 00:06:58,566 107 00:06:58,566 --> 00:07:04,599 and we're going to set the name as B. So df 108 00:07:04,600 --> 00:07:04,633 109 00:07:04,633 --> 00:07:10,666 ['B'] equals df['A'], 110 00:07:10,666 --> 00:07:16,599 and this is how we refer to different variables in a data frame, multiplied by var1. And then 111 00:07:16,600 --> 00:07:17,233 112 00:07:17,233 --> 00:07:23,233 let's print our new data frame. You can see we have A just as we did before, but now 113 00:07:23,233 --> 00:07:28,499 we have a new column called B where all of those values were multiplied by 57. 114 00:07:28,500 --> 00:07:31,433 115 00:07:31,433 --> 00:07:37,266 So that's more like it. We've already kind of talked about what a data frame is in pandas, but we 116 00:07:37,266 --> 00:07:37,432 117 00:07:37,433 --> 00:07:43,433 can do all sorts of useful things with data frames. Now what if we have a data frame already that 118 00:07:43,433 --> 00:07:49,533 we want to import into Colab to use with pandas? Let's go ahead and try that now. 
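The new-column step above might look like the sketch below. The three values in column A are the ones mentioned in the narration; the video's actual column is longer.

```python
import pandas as pd

var1 = 57

# Shortened, partly hypothetical version of the video's column A.
df = pd.DataFrame({"A": [2, 5, 2]})

# Square brackets with a column name select (or create) a column.
# Multiplying a column by a number multiplies every row.
df["B"] = df["A"] * var1
print(df)
```

Unlike a plain list, a pandas column applies the multiplication to each value, so B holds 114, 285, and 114.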
First, make 119 00:07:49,533 --> 00:07:49,566 120 00:07:49,566 --> 00:07:55,732 sure you have the data you want to use saved in your Google Drive. We'll use the classdataS20 file that 121 00:07:55,733 --> 00:07:55,766 122 00:07:55,766 --> 00:08:01,799 you'll use for your homework as an example. We've already imported pandas above. 123 00:08:01,800 --> 00:08:02,000 124 00:08:02,000 --> 00:08:08,000 So in order to do this, you would want to say import pandas as pd in 125 00:08:08,000 --> 00:08:14,000 addition to from google.colab import drive. You have to have both 126 00:08:14,000 --> 00:08:14,033 127 00:08:14,033 --> 00:08:17,766 imported for this to work. 128 00:08:17,766 --> 00:08:20,699 129 00:08:20,700 --> 00:08:26,700 Next we have a line where we're mounting our Google Drive, and this will be the same for everyone. 130 00:08:26,700 --> 00:08:27,133 131 00:08:27,133 --> 00:08:33,199 It's just drive.mount('/content/drive'). Make sure to create 132 00:08:33,200 --> 00:08:39,233 a folder called statlab, all lowercase, and move your copy of the class 133 00:08:39,233 --> 00:08:39,266 134 00:08:39,266 --> 00:08:45,166 data into it. That way your file path will look exactly the same as this. 135 00:08:45,166 --> 00:08:46,599 136 00:08:46,600 --> 00:08:52,500 So you'll set your file path and it will be equal to 137 00:08:52,500 --> 00:08:52,966 138 00:08:52,966 --> 00:09:01,599 "/content/drive/MyDrive/statlab/classdataS20.xlsx". 139 00:09:01,600 --> 00:09:04,033 140 00:09:04,033 --> 00:09:10,066 Next, we'll create a data frame called df. You can name your data frame anything, so if you wanted to name 141 00:09:10,066 --> 00:09:15,132 it class data or something else that just makes more sense to you, please feel free to do so. 142 00:09:15,133 --> 00:09:16,566 143 00:09:16,566 --> 00:09:22,899 And we'll set that equal to pd, so pandas, we're using another pandas function. 
Read 144 00:09:22,900 --> 00:09:22,933 145 00:09:22,933 --> 00:09:28,999 Excel, and then we're going to set that argument to file path right here. An argument is 146 00:09:29,000 --> 00:09:29,033 147 00:09:29,033 --> 00:09:36,466 something that goes inside the set of parentheses after your function. We'll talk more about arguments in a later assignment. 148 00:09:36,466 --> 00:09:37,666 149 00:09:37,666 --> 00:09:43,699 Next, we'll print df.head(), which is the top of 150 00:09:43,700 --> 00:09:43,733 151 00:09:43,733 --> 00:09:49,766 our data frame. This first time that I'm running 152 00:09:49,766 --> 00:09:56,032 this, it's going to ask if I want to give it permission to connect to Google Drive. I'm going to say connect to Google 153 00:09:56,033 --> 00:10:02,133 Drive. I'll select my email address (your UALR address is the one you want to 154 00:10:02,133 --> 00:10:07,166 select), hit continue, and then hit continue again, and this 155 00:10:07,166 --> 00:10:09,432 156 00:10:09,433 --> 00:10:11,366 will take just a moment to do. 157 00:10:11,366 --> 00:10:18,332 158 00:10:18,333 --> 00:10:24,033 There we go. So we can see we have three variables, IV1, IV2, and DV. 159 00:10:24,033 --> 00:10:27,199 160 00:10:27,200 --> 00:10:33,066 So that was a lot of steps. Basically, we told Python we'd like to access a file on our Google Drive and 161 00:10:33,066 --> 00:10:33,632 162 00:10:33,633 --> 00:10:39,799 we created a file path to the exact file we wanted to access. Finally, we used pandas to read 163 00:10:39,800 --> 00:10:45,933 the Excel file using that file path. Pretty good. Let's see if we can make some new variables now. 164 00:10:45,933 --> 00:10:46,133 165 00:10:46,133 --> 00:10:52,366 You'll notice these already have names. So instead of referring to them as A or B, we'll call 166 00:10:52,366 --> 00:10:58,499 them IV1, IV2, and DV. What if we want to multiply IV1 and 167 00:10:58,500 --> 00:11:04,000 IV2 together? Let's call this variable IV3. 
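The Drive-loading steps summarized above can be sketched roughly as follows. This assumes the notebook is running in Google Colab and that the file sits in a `statlab` folder as described. The `try`/`except` wrapper is an addition for illustration (it is not in the video) so the snippet does not crash when run outside Colab.

```python
import pandas as pd

# Path as given in the video; it only exists after Drive is mounted.
file_path = "/content/drive/MyDrive/statlab/classdataS20.xlsx"

try:
    # google.colab is only available inside a Colab notebook.
    from google.colab import drive

    # Colab asks for permission the first time this runs.
    drive.mount("/content/drive")

    # read_excel takes the file path as its argument.
    df = pd.read_excel(file_path)

    # head() shows the first few rows of the data frame.
    print(df.head())
except ImportError:
    print("Not running in Colab; skipping the Drive mount.")
```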
168 00:11:04,000 --> 00:11:04,700 169 00:11:04,700 --> 00:11:10,466 So df['IV3'] = df['IV1'] * df['IV2']. 170 00:11:10,466 --> 00:11:10,799 171 00:11:10,800 --> 00:11:15,500 And we'll go ahead and print the first couple of results from that data frame. 172 00:11:15,500 --> 00:11:17,633 173 00:11:17,633 --> 00:11:23,833 Okay, so as expected, these are some pretty big numbers. Here we're multiplying 80 times 174 00:11:23,833 --> 00:11:29,866 88.5 and getting 7,880. But you 175 00:11:29,866 --> 00:11:29,899 176 00:11:29,900 --> 00:11:36,133 can see that it multiplies each pair, row by row. So 78 times 93, 7254, 80 177 00:11:36,133 --> 00:11:42,333 times 80, 6400. And you can just sort of manually check these if you feel like you want to, to make sure 178 00:11:42,333 --> 00:11:48,433 that what you're doing is what you think you're doing. Usually if you're getting the wrong 179 00:11:48,433 --> 00:11:48,466 180 00:11:48,466 --> 00:11:54,566 results here, it is a person error and not a Python error. So something 181 00:11:54,566 --> 00:11:54,599 182 00:11:54,600 --> 00:11:59,866 you've done in the code is wrong or your calculations are off or something like that. 183 00:11:59,866 --> 00:12:01,799 184 00:12:01,800 --> 00:12:08,500 We could also use that variable we created earlier, var1. So remember, we've created that constant 185 00:12:08,500 --> 00:12:15,133 of 57. Let's divide IV3 by var1 and make a new variable IV4. 186 00:12:15,133 --> 00:12:16,699 187 00:12:16,700 --> 00:12:22,800 So df['IV4'] = df['IV3'] / var1, and we'll go ahead and 188 00:12:22,800 --> 00:12:26,033 print the first few results from that. 189 00:12:26,033 --> 00:12:30,099 190 00:12:30,100 --> 00:12:36,200 So now you can see we've taken IV3 and divided it by 57 and we've ended up with this 191 00:12:36,200 --> 00:12:36,233 192 00:12:36,233 --> 00:12:42,233 column of numbers here. Nice. 
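The two new columns can be tried without the class file using a small stand-in data frame. The two rows below use the IV1 and IV2 pairs read out in the narration (78 and 93, 80 and 80). The real file has more rows and a DV column, which are left out here.

```python
import pandas as pd

var1 = 57

# Stand-in for the class data: two rows taken from the narration.
df = pd.DataFrame({"IV1": [78, 80], "IV2": [93, 80]})

# Multiplying two columns multiplies them row by row.
df["IV3"] = df["IV1"] * df["IV2"]

# Dividing a column by a single number divides every row by it.
df["IV4"] = df["IV3"] / var1

print(df.head())
```

Checking a row by hand, as the video suggests, 78 times 93 is 7254 and 80 times 80 is 6400, which matches the IV3 column.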
That should be all you need to get started on your first 193 00:12:42,233 --> 00:12:48,066 homework assignment. Please let us know if you have any questions about this material and happy coding.