1 00:00:00,000 --> 00:00:01,100 2 00:00:01,100 --> 00:00:07,266 Hello, and welcome to a video on subsetting in Excel. Subsetting data is when we take our 3 00:00:07,266 --> 00:00:13,032 data frame and make a smaller, more manageable version of it by eliminating information 4 00:00:13,033 --> 00:00:13,266 5 00:00:13,266 --> 00:00:18,099 that we don't want to have from it. So for example, 6 00:00:18,100 --> 00:00:19,333 7 00:00:19,333 --> 00:00:25,233 this could work by sorting your data from largest 8 00:00:25,233 --> 00:00:25,433 9 00:00:25,433 --> 00:00:31,566 to smallest and getting rid of everything that doesn't meet a cutoff point, or you may only want zeros or 10 00:00:31,566 --> 00:00:31,599 11 00:00:31,600 --> 00:00:35,833 ones from your data frame. There are a lot of different ways that you might sort your data. 12 00:00:35,833 --> 00:00:38,166 13 00:00:38,166 --> 00:00:44,199 First, you want to make sure you have a copy of your data frame. You never want to work with the original data frame 14 00:00:44,200 --> 00:00:49,700 when you're subsetting in Excel, because you may lose your data by doing that, 15 00:00:49,700 --> 00:00:50,266 16 00:00:50,266 --> 00:00:56,399 and you may not be able to get it back. So here, I've made a copy several 17 00:00:56,400 --> 00:01:02,466 copies of my data frame to work with, so we're gonna go to sheet two, and I'm going 18 00:01:02,466 --> 00:01:02,499 19 00:01:02,500 --> 00:01:04,700 to select the entire data frame. 20 00:01:04,700 --> 00:01:09,833 21 00:01:09,833 --> 00:01:15,933 There we go. Now, I'll go to sort and filter, 22 00:01:15,933 --> 00:01:17,466 23 00:01:17,466 --> 00:01:21,399 sort smallest to largest, or you can do custom sort, 24 00:01:21,400 --> 00:01:23,633 25 00:01:23,633 --> 00:01:29,599 let's custom sort, because I'm interested in RDV in this instance, and I would like to 26 00:01:29,600 --> 00:01:30,100 27 00:01:30,100 --> 00:01:36,800 have only the instance of RDV that are over a thousand. So let's sort RDV, 28 00:01:36,800 --> 00:01:38,633 29 00:01:38,633 --> 00:01:43,499 smallest to largest. Actually, let's do largest to smallest and click OK. 30 00:01:43,500 --> 00:01:45,366 31 00:01:45,366 --> 00:01:51,532 Now, you can see all of my data has been sorted for me, and I can simply 32 00:01:51,533 --> 00:01:52,999 33 00:01:53,000 --> 00:01:59,266 delete anything that is not over a thousand, and because I'm not working with my original data frame, 34 00:01:59,266 --> 00:02:05,366 it's safe for me to do this. So you can see I've selected at the cutoff point 35 00:02:05,366 --> 00:02:10,066 of a thousand right here, and there we have it. Now I have just our 36 00:02:10,066 --> 00:02:12,566 37 00:02:12,566 --> 00:02:18,732 values over a thousand. There is another way to do this. Let's take a look 38 00:02:18,733 --> 00:02:24,333 at my third sheet here. Let's once again select the entire data frame. 39 00:02:24,333 --> 00:02:27,733 40 00:02:27,733 --> 00:02:33,899 And now we'll go to format as table. You can select any one of these options. They're just for aesthetics. 41 00:02:33,900 --> 00:02:35,900 42 00:02:35,900 --> 00:02:42,066 I have selected the labels in this table, meaning the IV1, IV2, DB, et cetera. So I'm going to click 43 00:02:42,066 --> 00:02:47,166 the my table has headers button, and then click OK. 44 00:02:47,166 --> 00:02:49,532 45 00:02:49,533 --> 00:02:54,033 Now we can see that we have these little arrows next to our 46 00:02:54,033 --> 00:02:56,199 47 00:02:56,200 --> 00:03:02,266 variable names. So I'll click one of these arrows, go to 48 00:03:02,266 --> 00:03:02,299 49 00:03:02,300 --> 00:03:05,966 number filters, and we'll say greater than 50 00:03:05,966 --> 00:03:08,832 51 00:03:08,833 --> 00:03:13,833 1,000. And click OK. 52 00:03:13,833 --> 00:03:15,666 53 00:03:15,666 --> 00:03:21,732 Now it's automatically filtered the data set, so that all of the values that are greater than 1,000 are present. 54 00:03:21,733 --> 00:03:23,466 55 00:03:23,466 --> 00:03:29,499 This would be the preferred way to filter your data in Excel, because it doesn't require the two to delete any of your 56 00:03:29,500 --> 00:03:29,533 57 00:03:29,533 --> 00:03:35,766 data, but still filters your data for you. But if for some reason you don't have the format as table, or if you can't figure 58 00:03:35,766 --> 00:03:41,799 out how to do it in your version of Excel, then the first method will work just fine. All right, have a great day and have fun! 59 00:03:41,800 --> 00:03:41,833 60 00:03:41,833 --> 00:03:42,866 fun learning Excel.