From bc5df74ceb78a237e4692469d191e6107061a8b3 Mon Sep 17 00:00:00 2001 From: Maxwell Millar-Blanchaer Date: Thu, 24 Mar 2022 20:28:09 -0700 Subject: [PATCH] Adding in the Bootcamp --- Bootcamp/Python_Bootcamp_Part1.ipynb | 853 +++++++++++++++++++ Bootcamp/Python_Bootcamp_Part2.ipynb | 1172 ++++++++++++++++++++++++++ Bootcamp/quantum_mechanics.grades | 65 ++ 3 files changed, 2090 insertions(+) create mode 100644 Bootcamp/Python_Bootcamp_Part1.ipynb create mode 100644 Bootcamp/Python_Bootcamp_Part2.ipynb create mode 100644 Bootcamp/quantum_mechanics.grades diff --git a/Bootcamp/Python_Bootcamp_Part1.ipynb b/Bootcamp/Python_Bootcamp_Part1.ipynb new file mode 100644 index 0000000..e116de5 --- /dev/null +++ b/Bootcamp/Python_Bootcamp_Part1.ipynb @@ -0,0 +1,853 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# It's day one and I know nothing — where do I start?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## A \"bootcamp\" tutorial for the absolute basics of Python in 2 parts\n", + "## PART 1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*Phys 134L Note: These tutorials are based on those by [Imad Pasha & Christopher Agostino](https://prappleizer.github.io/)*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Welcome. In this tutorial, we're going to work through the absolute basics of the Python programming language. I highly recommend reading Chapter 2 of the textbook either before or while working through this tutorial. It covers much of the same ground, but here you'll get to actively practice the techniques described. \n", + "\n", + "Let's start with the concept of a **declaration**. 
In Python, our primary goal is to perform calculations - you can think of it as an extremely powerful calculator, and indeed, much of what we build in Python amounts to pipelines that string simple mathematical computations together and perform them on data. \n", + "\n", + "In order to work with more complex data than a pocket calculator can handle (on a calculator, we type in the numbers to be computed directly), Python allows us to declare **variables** to store those values for later use. See below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "variable_1 = 5" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "variable_2 = 6" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "output = variable_1 + variable_2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(output)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All I did was store the numbers 5 and 6 in variables (which I lazily and poorly named 'variable_1' and 'variable_2'), and then compute their sum. **Note:** One additional takeaway here is the underscore in the variable names - you cannot have spaces in variable names, so programmers often use underscores instead. We'll talk later about best practices when naming variables. \n", + "\n", + "You might be asking, \"Why did you go through the work of declaring those variables and adding them, when you could've just done:\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "5+6" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this particular case, you are absolutely right. If I wanted to know the sum of 5 and 6, I could've just typed it in. 
Indeed, even if I needed to save that output somewhere, I could've done:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "output2 = 5+6" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And now I can look at it at any time:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(output2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Or, I can use it in further calculations:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "output3 = output2*5" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(output3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now might be a good time to go over the native mathematical operations available to you in Python (we've seen two so far: addition and multiplication). " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "addition = 5 + 5\n", + "subtraction = 5 - 5\n", + "multiplication = 5 * 5\n", + "division = 5 / 6 #be careful of Python version!!\n", + "exponentiation = 5**2\n", + "modulus = 5 % 3 #the remainder after division, here 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What did I mean by \"Be careful of Python Version\" in the comment above? (**Note:** you can use comments in lines of code, via the # symbol, to leave descriptions and instructions in your code that the interpreter ignores.) \n", + "\n", + "Let's do an experiment:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(division)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you're running this notebook in Python version 2.xx, the result will be 0. In Python 3, which this notebook's kernel uses, you'll see 0.8333333333333334 instead. 
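For comparison, here is a minimal sketch of Python 3's two division operators. It assumes a Python 3 interpreter, and the names true_div and floor_div are made up for this example. The / operator always returns a float, while // performs floor division.

```python
# In Python 3, / is true division and always returns a float
true_div = 5 / 6
print(true_div, type(true_div))    # 0.8333333333333334 <class 'float'>

# // is floor division: the result is rounded down to a whole number
floor_div = 5 // 6
print(floor_div, type(floor_div))  # 0 <class 'int'>

# Rounding down means toward negative infinity, not toward zero
print(-5 // 6)                     # -1
```

The last line is a common surprise. Floor division of a negative result gives -1, not 0.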
\n", + "\n", + "The Python 2 result comes down to **datatypes**. In Python 2.x, numbers typed without a decimal point, like the ones I've been using, are **integers**, and dividing one integer by another performs integer division: the result is rounded down to a whole number, so 5/6 comes out as 0. \n", + "\n", + "In Python 3, this was resolved by making the / operator always perform true division, returning a **floating point** number (one that keeps the decimal part) even when both inputs are integers. In Python 2, we can get the same behavior manually in a few ways:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "5.0 / 6.0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "float(5) / 6 " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In Python 2.x, the above illustrates 3 points. The first is that when typing in numbers directly, just adding a \".\" turns the integer into a float, so the division is done correctly. The second is that you can force any integer to be a float by **typecasting** it with float(), as shown above (the same goes in reverse: int(some_variable) converts a floating point number to an integer by dropping everything after the decimal point, rather than by rounding). The third is that only one number in a calculation needs to be a float for the whole calculation to be performed as a floating point operation - I could have chosen either the 5 or the 6 to make a float, but only needed to choose one. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In Python 3, you can do floor (integer) division using the // operator: " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "5.0//6.0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here we see that Python carries out the floor division, but the returned data type is still a float. This is because the input values were floats too. 
Note the subtle difference when the inputs are integers: " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "5//6" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exercise 1: A simple calculation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the cell below, create 3 variables which hold your age and the ages of both of your parents. Then, set a variable named \"age_average\" equal to the average of your three ages. Be careful of order of operations! You can group operations, just like in PEMDAS math, using parentheses \"()\"." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#your code here" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(age_average)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Data Types in Python" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "So far, we have been working entirely with numbers (integers and floats). You can check what data type a variable is at any time using the type() function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = 5\n", + "y = 6.0\n", + "print(type(x))\n", + "print(type(y))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "While your data will usually boil down to numbers like these, Python's power comes in when you start using its other data types, which are mostly designed to hold numbers (and other values) in an organized way. 
Here are the basic data types in Python:\n", + "\n", + " * Integers\n", + " * Floats\n", + " * Booleans (True or False)\n", + " * Lists (collections of items) \n", + " * Dictionaries (collections accessed via \"keys\") \n", + " * Strings (contained in quotes \"like this\")\n", + " * Tuples (like lists, but immutable, i.e. unchangeable) \n", + " \n", + "In the next few sections, we are going to learn about these data types (skipping integers and floats, which we've mostly covered)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Booleans" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Booleans have one of two states: \"True\" or \"False\". Try setting a variable equal to True or False in the cell below - you should see the notebook \"color\" the word, indicating that it is a special word (a keyword) with a specific meaning in Python." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "true = True\n", + "false = False" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "true" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "These come in handy when we use **conditional statements** (covered in Part 2), in which we tell the code, \"if some condition is True, do X; otherwise, if some other condition is True, do Y.\" \n", + "\n", + "We often use booleans without realizing it: a comparison such as 5 > 3 evaluates to the boolean True." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Lists" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Lists are the most forgiving container in Python. We can put whatever we want into a list, whether different data types or even lists within lists. 
Of course, non-uniform containers are of limited use - the advantage of storing lots of numbers in a list is that you can then perform the same operation on all of them without worrying that some won't work. \n", + "\n", + "Here's how we define a list:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list_1 = ['a string', 5, 6.0, True, [5,6,6]]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The list above is kind of a mess - you'd almost never want a list to contain such a variety of things, but I wanted to highlight that, in principle, Python doesn't care what you put inside a list. In addition to manually specifying what is in a list, we can use some built-in functions to generate lists for us when they have a regular form:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "count_to_10 = list(range(1,11))\n", + "print(count_to_10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What I've done above is run the range() function, which generates a \"range\" object: a sequence that produces its numbers only when asked. We won't go into the details of what that means here, but by **typecasting** it with list(), we get an ordinary list of numbers counting up. \n", + "\n", + "The form of the function is range(start, stop, step), where \"start\" is inclusive and \"stop\" is exclusive (in interval notation, [1, 11)). If I wanted 0-9, I would do:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(range(10))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "using the shortcut that if only 1 argument is given, it is taken as \"stop\", with \"start\" assumed to be 0 and \"step\" assumed to be 1 (all of the arguments must be integers). 
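Here is a short sketch of the step argument in action. The variable names are invented for this example. It uses a step of 3 and a negative step, and it shows that the stop value is left out even when a step would land exactly on it.

```python
# range(start, stop, step): start is included, stop is excluded
every_third = list(range(1, 10, 3))
print(every_third)            # [1, 4, 7]

# A negative step counts down; the stop value is still excluded
countdown = list(range(5, 0, -1))
print(countdown)              # [5, 4, 3, 2, 1]

# 10 is left out even though a step of 5 would land on it exactly
print(list(range(0, 10, 5)))  # [0, 5]
```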
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exercise 2: skip-count\n", + "Generate a list containing the numbers 2, 4, 6, 8, ..., 100 and save it in a variable called skip_count. Then print it to check that you did it right." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "skip_count = " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Part 2 of this bootcamp, which covers loops and conditionals, is a great place to jump to once you have the hang of what lists are and how they're defined - it shows how we use them.\n", + "\n", + "### Indexing\n", + "When we want to see what is in a list, we print it. But sometimes we want to \"pull out\" individual elements of a list and use them in calculations. For that, we need to **index** the list (or, for several elements at once, \"slice\" it). Lists are indexed such that the first element is assigned \"0\" (to remember this, I got into the habit of calling it the \"0th index\"). For example, using the list from above:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list_1[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Basically, we put square brackets at the end of the variable name and specify which index we want. We can also pull out several elements at once:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list_1[0:2]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice that the 2 is not inclusive (it pulls out the 0th and 1st elements). We can also specify a step:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list_1[0:5:2]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "So what about that list within a list? 
If we want a specific number from it, we can use double indexing. I'll also use this as a spot to show **negative indexing**, which lets you count backwards from the end of a list (if that happens to be easier):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list_1[-1][1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What I did was pull out the -1st element (the final element; the second to last would be -2, etc.), which was the inner list, and then I indexed *that* for its 1st element. \n", + "\n", + "**Try it out:** In the cell below, get it to output the \"s\" in \"a string\", the 0th element of the list (it works the same way). " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#your code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Dictionaries\n", + "Dictionaries are like lists, but rather than indexing them by element number as we were doing above, we index them by a special \"key\" that we assign to each \"value\". For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ages = {'Sam':5,'Sarah':6,'Kim':9,'Mukund':17}\n", + "ages['Sarah']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Dictionaries are accessed by key rather than by position - the order in which I define things within the dictionary doesn't matter for retrieving them, only that I know the key associated with each value (since Python 3.7, dictionaries do remember the order in which keys were added, but you never need that order to look up a value). For some applications, this has advantages over a list. Here, if I were storing the ages of students in a class and were only interested in things like the average and median age, a list would be fine. 
But if I needed to know *who* was *which* age throughout my analysis, a list would require me to impose that the order went \"Sam, Sarah, Kim, Mukund,\" to remember that order when indexing, and, if the order in the list ever changed, to keep track of that too. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Strings" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We've already seen strings used - they let you store things like words (or, later on, file paths) that otherwise don't have meaning as far as Python is concerned. Strings are very forgiving: you can put any text at all inside the quotes. If you have a data file with lots of different types of data in it, Python will often just read everything in as strings and let you work out how to convert the right parts into ints, floats, etc. Strings are also iterable - they can be indexed like lists, character by character. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Tuples" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We don't use them too often, so just check out this example and follow along!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "example_list = [1,2,5]\n", + "example_tuple = (1,2,5)\n", + "example_list[0] = 2\n", + "print(example_list)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "So I've successfully defined the same numbers as a list and as a tuple, and changed the 0th entry of the list. But what if I try that on the tuple?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "example_tuple[0] = 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This time, I get an error (a TypeError). So basically, if you make a list-like collection and want to make sure it can never be changed in your code, you can make it a tuple. 
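If you would rather see the error message without the cell failing, the sketch below catches it with try/except. This tutorial hasn't introduced that construct yet, so treat it as a preview. The sketch also shows the usual workaround: build a new tuple instead of changing the old one. The name new_tuple is just for illustration.

```python
example_tuple = (1, 2, 5)

# Assigning to an item of a tuple raises a TypeError
try:
    example_tuple[0] = 2
except TypeError as err:
    print('TypeError:', err)  # 'tuple' object does not support item assignment

# Instead, build a new tuple from pieces of the old one
new_tuple = (2,) + example_tuple[1:]
print(new_tuple)      # (2, 2, 5)
print(example_tuple)  # the original is unchanged: (1, 2, 5)
```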
\n", + "\n", + "For more examples of the basic manipulations you can make to these basic data types, check out Chapter 2 of the textbook, which lays some of them out!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exercise 3: Bringing it together (kind of)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's try a single, longer example that (tries to) bring in the things we've seen above. You'll find that part 2 of this bootcamp covers things like iterating/for loops and conditionals, which drastically increase what you can do with Python. Nevertheless, here's some practice with the basic operations. \n", + "\n", + "Let's say you're the teacher of your school's introductory Quantum Mechanics class. You've just graded their first midterm, and are shocked, (*shocked*) to see so many low scores (You thought the midterm was totally reasonable!) \n", + "\n", + "Before you post their individual scores, which might give some students a heart attack, you decide to calculate the distribution statistics of the exam first, so that each student can compare their score to the average, etc. \n", + "\n", + "The scores are (out of 120): 100, 68, 40, 78, 81, 65, 39, 118, 46, 78, 9, 37, 43, 87, 54, 29, 95, 87, 111, 65, 43, 53, 47, 16, 98, 82, 58, 5, 49, 67, 60, 76, 16, 111, 65, 61, 73, 63, 115, 72, 76, 48, 75, 101, 45, 46, 82, 57, 17, 88, 90, 53, 32, 28, 50, 91, 93, 7, 63, 88, 55, 37, 67, 0, 79.\n", + "\n", + "Your first step to analyzing these numbers should be to put them in a list (call it \"scores\"). Do that in a cell below (you can copy and paste from above, just add the list syntax). " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "scores = #your code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, the first thing we need to do is calculate the average score. 
Later on, we'll see that there are functions you can import into Python that will do this for you, but for now let's calculate it manually (it's easy enough, right?). \n", + "\n", + "Python has a built-in sum() function: you can run it on a list (so long as the list only contains numbers) and it will return the sum. The only other thing you'll need to calculate the average is the len() function, which returns the number of elements in a list or array. Using those two, define a variable below called \"average_score\" and calculate it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "average_score = #your code here\n", + "average_score" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Great, so we now know the average score on the test. Let's figure out what that is as a percentage. In the cell below, calculate the percentage value of the average score by dividing it by the number of points on the test and multiplying that by 100, all in the same line. Then run the cell - you'll see a nice sentence that reports the percentage. Take a look at the line I wrote that does this and see if you can work out how it works." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "avg_score_percent = #your code here\n", + "\n", + "shortened = str(avg_score_percent) #turn it into a string\n", + "statement = \"The average score on the test was a {}%\".format(shortened[0:5]) #use indexing on the string to keep only its first 5 characters (e.g. 12.3456 becomes 12.34)\n", + "print(statement)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Okay, so the other thing students are always interested in is the standard deviation of the scores - assuming you curve, this will basically tell them whether they get an A, B, C, D, or F on the test. 
Let's be nice and assume we are going to curve the average to a flat B (85%). The formula for the (sample) standard deviation is \n", + "$$ \n", + "s = \\sqrt{\\frac{\\sum_{i=1}^{N}(x_i - \\mu)^2}{N-1}}\n", + "$$\n", + "\n", + "where $x_i$ is the $i$-th score, $\\mu$ is the average, and $N$ is the total number of scores." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We already know how to get $N$, and we know what $\\mu$ is as well. So to calculate this, we need to calculate the quantity on the top of the fraction. This is actually kind of tricky with the tools we have so far, so I'm going to introduce a new concept: Numpy (numerical Python) arrays. I'll get into these in detail in Part 2 of the bootcamp, but for now, see the example below for why we're about to use them (the first cell is supposed to give an error):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "arr_version = np.array(scores)\n", + "print(scores-1)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Okay, so I can't subtract an integer from a list (Python raises a TypeError). What if I try the array version?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(arr_version-1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you look, you should see that each of those scores is the original score with one subtracted from it. Your spidey senses should be tingling: we can use this same behavior to calculate our STD. 
In the cell below, fill in the variable I'm calling \"top_frac\" to calculate this quantity:\n", + "$$\n", + "\\sum_{i=1}^N (x_i - \\mu)^2\n", + "$$\n", + "\n", + "Notice that you don't have to calculate this one score at a time: if we first compute a single array that holds each score with the mean subtracted off, squared, then we can finish off top_frac just by summing up that array, as we've done before. Feel free to use my variable \"arr_version\"." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "top_frac = #your code here\n", + "print(top_frac)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "With that done, we can easily apply the formula to get the final STD - **Hint:** the function np.sqrt() will be useful here." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "STD_scores = #your code here\n", + "print(STD_scores)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alright! If you've done everything correctly, you should have found that the average score is about 62/120, with a standard deviation of about 28. For fun, let's make a helpful plot to show the students their scores. Don't worry about how the plotting works just yet (we'll dive into it more in Part 2), but see if you can figure out what each part of the command is doing."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "%matplotlib inline\n", + "\n", + "plt.hist(scores,alpha=0.5)\n", + "plt.axvline(62,color='k',label=\"Mean\")\n", + "plt.axvline(90,ls='--',color='k',label=\"+1 STD\")\n", + "plt.axvline(34,ls='--',color='k',label=\"-1 STD\")\n", + "plt.xlabel('score (out of 120)')\n", + "plt.ylabel('Number of Students')\n", + "plt.legend()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Nice! It looks like our formula for standard deviation describes the original distribution of scores pretty well. Now, how to get them to do better on midterm 2..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Wrap up" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "I hope this super-basic introduction has given you a glimpse of some of the basic functionality of Python. Of course, Python is far more powerful than what has been shown here. I call this Part 1 because once you know the basic data types, how to define variables, and how to do some simple math on them, we need to jump into new concepts (for loops and conditional statements) and bring in new libraries (like numpy and matplotlib) to make further progress. If you're ready to do that, head on over to Part 2! 
" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/Bootcamp/Python_Bootcamp_Part2.ipynb b/Bootcamp/Python_Bootcamp_Part2.ipynb new file mode 100644 index 0000000..5629866 --- /dev/null +++ b/Bootcamp/Python_Bootcamp_Part2.ipynb @@ -0,0 +1,1172 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# It's Day 1.5 and I know (almost) nothing: Bootcamp Part II" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*Phys 134L Note: These tutorials are based on those by [Imad Pasha & Christopher Agostino](https://prappleizer.github.io/)*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Welcome to Part II of the bootcamp! Now that we've covered the basics, we can start diving into what makes programming useful and powerful. To begin, let's quickly review what we learned in Part I of this tutorial. *Note: You'll need to run all code cells of this tutorial in order.*\n", + "\n", + "### Variable Declarations\n", + "We learned that to store information (of any kind) in Python, we want to set a variable name equal to that information, and then use that name to perform calculations on it. \n", + "\n", + "### Data Types\n", + "We learned that Python has different rules for different kinds of data — it performs calculations differently on integers than on floats, treats lists differently than numpy arrays, etc. Figuring out what data type is the most efficient and effective way to work with your data is one of the key conceptual skills to learn when programming. 
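As a quick refresher on why the data type matters, here is a small sketch that applies the same operators to a list and to a numpy array. The names are made up for this example, and it assumes numpy is installed, as in Part I.

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array([1, 2, 3])

# For lists, * repeats the list and + joins two lists together
print(numbers_list * 2)    # [1, 2, 3, 1, 2, 3]
print(numbers_list + [4])  # [1, 2, 3, 4]

# For numpy arrays, the math is applied to every element
print(numbers_array * 2)   # [2 4 6]
print(numbers_array + 4)   # [5 6 7]
```

The same symbol does quite different things depending on the type. That is why it pays to check type() when a result looks strange.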
\n", + "\n", + "### Lists and Indexing\n", + "We learned that the \"default\" way to store simple data (say, a bunch of numbers) is in a **list**, which can then be **indexed** by element number (starting with zero) to extract values from the list. We learned that lists can be fed into certain functions, like sum(), to return the sum of all numbers in the list (assuming the list is, indeed, all numbers). \n", + "\n", + "### Debugging (barely)\n", + "You probably didn't notice, but we practiced a little bit of debugging as well — we printed out lists to make sure they were filled with the numbers we wanted after a calculation, a simple form of debugging! \n", + "\n", + "## What we will cover in Part II \n", + "By the end of this tutorial, I hope you will be able to handle the first task a professor might give you when starting to do research with them — loading up some data from a simple ASCII file, performing some calculations, and plotting it. To do all this, we will need to learn a bit of the Numpy library, some **conditional statements** and **loops**, and some basic plotting techniques. Without further ado, let's jump in! " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Numpy, Scipy, Matplotlib, and Beyond\n", + "In Part I of this tutorial, we had to resort to calling a special data type that was not native to Python — the Numpy Array. This was useful to us because of a special behavior: applying a math operation to an array performs that operation on each value in the array, which is useful for, say, subtracting the mean from every value and then squaring every value. \n", + "\n", + "But what *is* Numpy, actually? \n", + "\n", + "Believe it or not, from a mathematical perspective, what you saw in Part I was just about the limit of Python's built-in math operations. You can add, subtract, multiply, divide, exponentiate, and compute remainders (modulus). 
To do anything more complicated, like calculating a sine or cosine, we need to **import** libraries of functions that can accomplish these tasks. \n", + "\n", + "#### What's a function? \n", + "It's useful to take a second to make sure we're on the same page about functions. A function is something that takes one or more inputs and spits something out. When, in math class, you write y = sin(x), \"sin\" is the function you are using. The \"x\" you are plugging in is the *argument* of that function, and you are storing its *output* in the variable \"y\". If I use the range() function to create the numbers from 1 to 10, " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y = range(1,11)\n", + "print(y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then \"1\" and \"11\" are arguments to the range function, and its output is stored in \"y\". (Printing y shows range(1, 11) rather than the individual numbers; as in Part I, wrap it in list() to see them all.) Note that print() is also a function — it takes in the argument \"y\" and spits out its value onto the screen. \n", + "\n", + "Back to the task at hand. If I want to calculate the sine of a number, $x$, I can't do that with Python's built-in operators alone. But luckily, many clever people have written libraries of functions that can. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Okay, so to use these libraries of functions, I have to **import** them into my code, as I have above. Notice that I could just \"import numpy\" as well — but Python lets you give the library a \"nickname\" shorthand so that in your code, you don't have to type out \"numpy\" every time. 
You can pick whatever nickname you want, but by convention, numpy is imported as np and matplotlib.pyplot (the submodule of matplotlib with the plotting commands we'll be using) as plt. Don't worry about the \"inline\" — it just tells Jupyter to draw plots inside this notebook rather than in a separate window.\n", + "\n", + "Now, I can create my sine:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.linspace(0,10,100)\n", + "y = np.sin(x)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Whoa! New function alert! I also just used np.linspace(). Unlike range() (which has you pick a start and a stop and then advances by a fixed integer step, 1 by default), np.linspace() lets you pick a start, a stop, and a number of points, and then spaces them evenly, including both endpoints by default. Read line one as \"give me an array from 0 to 10 with 100 evenly spaced points.\" Read line two as \"give me an array that contains the sine of each value in the x array.\" \n", + "\n", + "Now let's whip out our plotting:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.plot(x,y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above is the most bare-bones plot you can make: $x$ against $y$. We'll get into fancier plotting techniques a bit later. \n", + "\n", + "## Loading Data from a File\n", + "It's day 1 of your new research assignment. You've just met with the professor or post-doc. They've probably given you like 10,000 papers to read (skim). They might also have given you a file or two of data, and told you to \"familiarize\" yourself with the format, get it into Python, and make some plots. \n", + "\n", + "This is what we are going to learn to do now. \n", + "\n", + "### Loadtxt() and Genfromtxt()\n", + "Astronomical data are stored in a huge variety of file formats and organizational schemes.
Let's start with the simplest and build up. In Ye Olde Days, basically all data were kept in plaintext ASCII files — in short, text. Things have changed since then: although plain data tables are often still the most efficient means of storage, they are now usually wrapped inside file formats like FITS and HDF5 to make them more portable and stable over time. At the end of the day, we are most interested in getting past those layers of protocols to the raw numbers underneath, which we want sitting around in arrays we can mess with. \n", + "\n", + "Let's start with that simplest of cases: the ASCII text file. \n", + "From the dependencies, you should have downloaded the \"quantum_mechanics.grades\" file from the website. *Note:* Often, when working with ASCII, programmers will set up their programs to output text files with extensions that indicate what they are (like data.spec and data.phot), rather than .txt — but rest assured, they are all text files and can be viewed in a text editor like Sublime. \n", + "\n", + "Make sure that the text file is in the same directory you are running this notebook in. Your first task: look up the **documentation** for Numpy's loadtxt() function. You can do so online (google the function) or by typing help(np.loadtxt) in a cell here in this notebook. Once you've done that, use the cell below to load the data into a variable called class_grades. We're going to continue with our dataset from Part I.\n", + "\n", + "The file you are loading contains 2 columns: the first is the name of the student, and the second is their grade on the exam. If you just try to run np.loadtxt() on the file, you'll get an error, because it cannot turn the names into floats. Make sure to look at the \"dtype\" option in the documentation, and try setting it to a string type. This will read every entry in as a string, but we can extract the numbers easily later."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class_grades =" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the cell below, once you get it to load without throwing an error, print the array to see what it looks like. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class_grades" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Take a closer look at the printed array. Notice how there are outer brackets surrounding a bunch of bracketed pairs? In the cell below, index the array for its 0th element and see what happens:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You should get the first ordered pair out — Sam, and his grade of 100. What happens if you index *that* mini-array for its 1st element? Do that below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You should get the string \"100\". In the cell below, try to use the \"typecasting\" function we learned in Part I to convert the string \"100\" into a float. No need to save it into a variable; just try the command and see that it returns 100.0 without the quotation marks:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "So now we are starting to see a methodology for extracting the numbers out of the strings.
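As a compact recap of those three steps, here is a minimal sketch on a tiny made-up array. The names and scores are hypothetical and not from the grades file. The array has the same shape as a two-column file read in with a string dtype:

```python
import numpy as np

# Hypothetical two-student table; every entry is a string, as with a string dtype
table = np.array([['Alice', '91'], ['Bob', '78']])

first_row = table[0]        # the first [name, score] pair
score_text = table[0][1]    # index the pair again to get the score string
score = float(score_text)   # typecast the string to a float

print(first_row, score_text, score)
```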
Our next step is going to be creating a dictionary containing the student names as \"keys\" and the student scores as \"values.\" To build it, we are going to use a **for-loop**. \n", + "\n", + "### For-Loops\n", + "There are two primary looping methods in Python: For-loops and While-loops. We'll focus on For-loops for now. \n", + "\n", + "A For-loop lets you step through what's known as an iterable (a container such as a list, an array, or a range of indices), so that you can run a block of code over and over again under slightly different circumstances. For example, what if we wanted to advance through our class_grades array and, on a new line each time, print the name of each student? I could do that with the following:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for i in class_grades:\n", + " print(i[0])\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "OK, so what just happened? By saying \"for i in class_grades,\" I was telling the computer that class_grades was a container with multiple \"things\" in it (those ordered pairs we saw earlier). I told it \"Hey, for every 'thing' in class_grades, print out that 'thing' indexed at 0.\" \n", + "\n", + "Notice that this worked because I, the programmer, knew not only that class_grades was something that could be advanced through, but also that the subparts of class_grades themselves had subparts to be indexed. If I had said print(i[2]) I would've gotten an error, because we know each mini-array in class_grades has only a 0th and 1st element. \n", + "\n", + "Let's see another example. Remember the range() function?
I can use that as an iterable as well:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for i in range(10):\n", + " print('My age is: {}'.format(i))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Remember, looping over range(10) gives you the values 0, 1, 2, ..., 9, just as if you had looped over the list [0,1,2,3,4,5,6,7,8,9]. To see it more clearly, you could write:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "thing_to_loop_over = range(10)\n", + "for thing in thing_to_loop_over:\n", + " print('My age is: {}'.format(thing))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "I'm also highlighting here that while \"i\" is a standard choice for an outer loop's iterating variable name (followed by \"j\" and \"k\" for nested loops), you can use whatever name you want as long as it's consistent in the loop. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "I can also use loops to fill an empty list with values, e.g." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "to_fill = []\n", + "for i in class_grades: # each i is one [name, score] row of class_grades\n", + " to_fill.append(float(i[1])) # convert the score string to a float and add it to the end of the list\n", + "print(to_fill)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What I've done here is make a list containing all the scores, as floats! For each item in class_grades, I first take the element at index 1 (the i[1] part), then I force it to be a float (this wouldn't work if the strings didn't contain numbers), and then I **append** that value to the to_fill list, which started out empty. Appending is easy because append() is a *method* of lists. So to add anything to the end of a list, you just write list_name.append(thing_to_add). 
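One common stumbling block is that append() changes the list in place and returns None. So you should not write `to_fill = to_fill.append(x)`, because that would replace your list with None. Here is a minimal sketch with made-up values:

```python
scores = []               # start with an empty list
scores.append(88.0)       # append modifies the list in place
scores.append(72.5)

result = scores.append(95.0)   # the method itself returns None

print(scores)   # [88.0, 72.5, 95.0]
print(result)   # None
```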
\n", + "\n", + "### Exercise 1: A dictionary of student names / scores\n", + "OK, it's time for you to dive in! Once you've gotten the hang of the above, and maybe played around a bit with it, try the following. \n", + "\n", + "In the cell block below, define an empty dictionary called class_dictionary. Then, write a for-loop that goes through the class_grades array and puts in each student name (as a string) as a key, with their score (as a float) as the value. \n", + "\n", + "Setting a new value in a dictionary is even easier than appending; simply use \n", + "\n", + "`dictionary_name['new_key'] = new_value`\n", + "\n", + "*Note:* Here all our keys are strings, and this is often the use-case for dictionaries, but it is not required. You can make dictionaries whose keys are, for example, integers. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Does it look like it worked? Try indexing for the grade of Peter in the cell below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Did you get 67.0? \n", + "\n", + "Hmmm... I wonder, who did the best and worst on the midterm? You might want to send one a congrats, and reach out to the other to find out why they are struggling. Of course, we could just look through our class_grades array, but not only is that no fun, it would be annoying in a class of 900, and impossible in a dataset of millions. \n", + "\n", + "Luckily, Python and Numpy both come with max() and min() functions we can use to do this (the subtlety of when to use max() versus np.max() can be brushed over for now — my general advice is to just default to using the numpy versions of functions).\n", + "\n", + "\n", + "Remember the \"to_fill\" list I made above?
That has the list, in the original order, of scores on the test. \n", + "In the cell below, set variables for \"max_score\" and \"min_score\" by using the np.max() and np.min() functions on that list, then print them out. \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Cool, so we know one person got a 0 and one got a 118/120. But *who* were they? It turns out numpy has handy functions that tell us, instead of the max and min, the *positions* of the max and min in the array. Try the same block of code below, but use np.argmax() and np.argmin() instead, and save the results into variables \"pos_max\" and \"pos_min\" (for position). " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Okay, so if we did that right, the minimum score is at index 63, and the maximum is at index 7. In the cell below, index our original class_grades array at 63 and 7 (and double index to pull just the name out). Who got the lowest and highest scores?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "(It should be Sean and Celeste.) Let's say we want to set a grade cutoff, and figure out all students with scores above and below that cutoff. It makes some sense to pick the mean for that. Last tutorial we calculated it manually, but here we can use np.mean().\n", + "\n", + "In the cell below, find the mean score by running np.mean() on the to_fill list. Save it to a variable called mean_score."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Looks like the mean score is around 62 / 120. How do we know if a student was above or below that score? That's, naturally, where conditional \"if-statements\" come in. \n", + "\n", + "## If Statements and other Conditionals\n", + "The problem I've posed, of figuring out whether a condition is true or not, is addressed in code via conditional statements. They run, logically, along the lines of \"IF something is TRUE, do THIS; IF something ELSE is TRUE, do THAT; OTHERWISE, do SOMETHING ELSE.\" \n", + "\n", + "Here's an example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "person_name = 'Finnaeus Fthrockbottom'\n", + "if len(person_name) < 10:\n", + " print('This is a short name')\n", + "else: \n", + " print('This is a long name')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "So here, I asked whether the statement \"the length of the string person_name is less than 10\" was true or not. If it had been true, the first statement would've printed, but since it was not, the second one did. \n", + "\n", + "You can also stack multiple if statements in a row:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if 'F' in person_name:\n", + " print('F Contained')\n", + "if 'n' in person_name:\n", + " print('n Contained')\n", + "if 'l' in person_name:\n", + " print('l Contained')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It prints that F and n are contained, but prints nothing for 'l', which is not in the name. Here, all three if-statements are independent.
But we can link them using elif statements, which are combinations of else and if:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if 'F' in person_name:\n", + " print('F Contained')\n", + "elif 'n' in person_name:\n", + " print('n Contained')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What happened here? We said \"IF 'F' is in the name, print something; ELSE, IF 'F' is NOT contained but 'n' is, print something.\" But the first condition WAS met, so the elif was never even checked. Thus, by mixing together ifs, elses, and elifs, you can check exactly the conditions you are interested in. You can also ask multiple conditions in the same line:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if (\"F\" in person_name) and ('n' in person_name):\n", + " print('Both present')\n", + "\n", + "if (\"F\" in person_name) or ('n' in person_name):\n", + " print('At least one present')\n", + "\n", + "if (\"l\" not in person_name) and ('y' not in person_name):\n", + " print('Neither present')\n", + "\n", + "if (\"l\" not in person_name) or ('F' not in person_name):\n", + " print('At least one not present')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Take a moment to parse the above and see how you can string together conditionals. You can ask if things are in, or not in, other things, or you can compare values by asking if things are equal (==), not equal (!=), greater than/less than (>, <), or greater than or equal to/less than or equal to (>=, <=). \n", + "\n", + "### Exercise: Sort the students\n", + "Below, write a for-loop that goes through the array class_grades and checks whether each student's score is above or below the mean score. If their score is above the mean, append that student's mini-array to an externally defined list called proficient_students, and if not, put it in one called struggling_students."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#code here" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#Rerun this block after building your own list to check what you get\n", + "print(proficient_students)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "So we've now seen how to get data from a text file into Python, where we can start interrogating it and performing analysis and calculations with it. For a look at how to pull in data from more complicated systems, like FITS, check out the first part of the SDSS tutorial, and for a look at how to do more analysis on text files you've read in using loops and conditionals, check out the Neon Spectrum Centroiding tutorial. \n", + "\n", + "Now, the other thing Python is great for is powerful visualizations. We saw in Part I how we could plot the histogram of scores and get an idea of where one standard deviation on either side of the mean was. Now we are going to do a bit more plotting with this dataset." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Plotting a histogram\n", + "To start, let's jump back to that histogram from Part I and see, line by line, how to make it. Don't feel discouraged if plotting commands seem like a whole new language on top of Python — they kind of are. It takes a lot of practice and experience to build up familiarity with what commands make plots look certain ways. For now, googling \"how to add to a plot\" is fine. \n", + "\n", + "The simplest kind of plot possible is plt.plot(), as for the sine wave we plotted above. But that doesn't really help us with the 1D data we have here (scores). We need a second dimension to plot anything, which is why counting how many people got each score (or range of scores) is a useful metric. That's a histogram.
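Under the hood, a histogram is just a count of how many values fall into each bin. As a minimal sketch, using made-up scores rather than the class data, np.histogram() computes those counts without drawing anything. These are the same kind of counts that plt.hist() turns into bars:

```python
import numpy as np

# Made-up scores out of 120, for illustration only
scores = [12, 35, 48, 51, 60, 62, 64, 70, 88, 115]

# Split the range 0 to 120 into 4 equal-width bins and count the values in each
counts, edges = np.histogram(scores, bins=4, range=(0, 120))

print(edges)    # bin edges: 0, 30, 60, 90, 120
print(counts)   # number of scores in each bin: 1, 3, 5, 1
```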
Matplotlib has a built-in function to plot these. We can start by specifying nothing but the values to histogram (I'll use the to_fill list from above):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.hist(to_fill)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "By default, matplotlib picks a color and creates ticks and labels, as shown. What if I want the blue to actually be red? And semi-transparent? We can do that:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.hist(to_fill,color=\"r\",alpha=0.5)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It's hard to see because there's nothing behind it, but we now have a semi-transparent red plot. By default, plt.hist() splits the data into 10 bins. We can use more or fewer:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.hist(to_fill,bins=6,color='r',alpha=0.5)\n", + "plt.hist(to_fill,bins=15,color='b',alpha=0.7)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can see that by decreasing the number of bins, more students appear in each bin. By increasing the number of bins, the opposite effect occurs. Often, we want to normalize histograms so that the total area under each one equals 1, which makes histograms with different numbers of bins directly comparable. In current versions of matplotlib this is done with density=True (older code uses normed=True, which has since been removed):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.hist(to_fill,bins=6,color='r',alpha=0.5,density=True)\n", + "plt.hist(to_fill,bins=15,color='b',alpha=0.5,density=True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "By normalizing, we can see the distributions overlaid on each other. It seems to me like the default of 10 was a decent number of bins.
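To make *normalize* concrete: with density=True, each bar height is the count divided by the total count times the bin width. The bar areas therefore sum to 1 no matter how many bins you use. Here is a minimal check with np.histogram(), which accepts the same density keyword, run on made-up scores:

```python
import numpy as np

# Made-up scores, for illustration only
scores = [12, 35, 48, 51, 60, 62, 64, 70, 88, 115]

areas = []
for n_bins in (4, 6, 15):
    heights, edges = np.histogram(scores, bins=n_bins, density=True)
    widths = np.diff(edges)                  # width of each bin
    areas.append(np.sum(heights * widths))   # total area under the bars

print(areas)   # each entry is 1.0, up to floating-point rounding
```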
Let's try 9, un-normalize, and then add some labels: " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.hist(to_fill,bins=9,alpha=0.5)\n", + "plt.xlabel('Score (out of 120)')\n", + "plt.ylabel('Number of Students')\n", + "plt.title('Distribution of Student Scores for Midterm 1')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Often we are interested in knowing the percentiles of a distribution; a common choice in statistics is the 16th, 50th, and 84th percentiles (for a Gaussian distribution, the 16th and 84th percentiles sit roughly one standard deviation below and above the median). You can calculate these easily with np.percentile(array_like, q), where q is the percentile you want, given as a number from 0 to 100. \n", + "\n", + "As a hint, the function plt.axvline(value,ls='--',color='k') will plot a vertical black dashed line at the x-axis position \"value\". In the block below, reproduce the plot in the cell above, but with the standard percentile spots demarcated by vertical dashed lines. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#code here -- should reproduce the histogram above, with the percentile lines added." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Great. That gives us a pretty easy way of assigning grades — we can go D, C, B, A in the four categories created, or we can set the whole middle region to B, A above the 84th percentile, and C below the 16th percentile (if we are nice). \n", + "\n", + "Let's get a little bit fancier. I want to write a function that will compare the scores of any number of students, given a list of names. It should take in the names as strings, and then create a horizontal bar plot showing their respective scores, with their names on the y axis. Why horizontal? Think about it — the score is always out of 120, and the width of our computer screen is fixed, while the number of names we enter is variable, and our computer can scroll to accommodate any reasonable height.
This way, our names won't get squished trying to fit everything in. \n", + "\n", + "**Step One:** A function that can take in different numbers of arguments. \n", + "Take a look at the following:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def a_function(arg1,arg2):\n", + " computation = arg1 + arg2\n", + " return computation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "I can run the above and feed it two numbers:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a_function(1,5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And you can see it did the computation and returned it. But what if I want to add three numbers?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a_function(1,5,6)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What we get here is a \"TypeError.\" It's raised because our function was specified to take exactly 2 arguments (arg1 and arg2), but we gave it three. Shoutout to Python's error message for actually being helpful. OK, so how do we fix this? \n", + "\n", + "Here's one way:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def new_func(array_like):\n", + " out_sum = 0\n", + " for i in array_like:\n", + " out_sum += i\n", + " return out_sum" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What I've done is require the user to pass in a single list of numbers, then iterate through it and add them all up. (Yes, I could've just run np.sum() on array_like, but what's the fun in that?) But that's just a workaround — sometimes, we need the function to take a truly variable number of inputs. \n", + "\n", + "That's where **args** and **kwargs** come in.
Check this out:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def sum_func(arg1,arg2,*args):\n", + " out_sum = arg1+arg2\n", + " for i in args:\n", + " out_sum +=i \n", + " return out_sum" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What's going on? Let's test the function a bit:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sum_func(1,2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sum_func(1,2,3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sum_func(1,2,3,4,5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "By specifying \\*args as the final input to the function, we told Python \"allow any extra positional arguments to be entered into this function, and store them in a tuple called args.\" Then, we calculated the first sum (of the two required arguments) and went through any extra numbers that might've been entered, adding them in as well. \n", + "\n", + "There is a slightly different version of this that applies to a \"dictionary\" style way of doing things. See below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def dict_funct(arg1,arg2,**kwargs):\n", + " output_dict = {}\n", + " output_dict[arg1] = arg2\n", + " for i in kwargs.keys():\n", + " output_dict[i] = kwargs[i]\n", + " return output_dict" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What I've done is make a function that takes 2 required arguments and puts them in a dictionary where the first argument is a key and the second is a value (just for illustration).
Watch:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "one = dict_funct('key1',5)\n", + "one" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "two = dict_funct('key1',5,key2=6)\n", + "two" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In short, \\*\\*kwargs tells Python \"allow the user to add extra keyword arguments to this function, which have to be of the form a=b, and store them in a dictionary where each a is a key and each b is a value.\" \n", + "\n", + "Sometimes, args and kwargs are most useful not because you want to use the extra optional arguments yourself, but because you want an intermediate function to accept anything passed into it and hand it all along to the next function in your program. \n", + "\n", + "OK. Back to our students. We want to compare a minimum of two students, with the ability to add in as many extras as we want. Our basic skeleton, then, will look something like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def compare_students(student1,student2,*args):\n", + " \"some code here\"\n", + " return" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, let's practice making the bar plot. We'll be using plt.barh(), which can take a list of strings (names) and a corresponding list of values (scores), and make a horizontal bar plot. See:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.barh([1,2,3],[4,3,6])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, we want the 1,2,3 to actually be the names of students.
So I can manually set the tick labels for the plot as follows: \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tick_labels = ('samantha','dave','malena')\n", + "plt.barh([1,2,3],[4,3,6])\n", + "plt.yticks([1,2,3],tick_labels)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Cool! We're basically ready to go here. Using what I've illustrated above, make a function that takes any number (at least two) of the students in our sample as strings and makes a plot of their scores. It's up to you which way you choose to index out the students' scores, but the fastest way will be by invoking the class_dictionary we made above. Throw a title and axis labels on there while you're at it. Then test it out on 2 students first, then 3." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#code here" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#Try running your function with this block to check that it makes the plot you expect.\n", + "compare_students('Sarah','Josiah')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "compare_students('Sarah','Josiah','Malena')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you check my solutions, you might find some unfamiliar syntax in the way I designed my function, though you should have been able to accomplish what was needed using for-loops and the other things we've learned so far. But to clue you in, I utilized two basic Python behaviors to accomplish my task in fewer lines: list addition and list comprehension.
\n", + "\n", + "List addition simply means that you can combine two lists into one by adding them with +:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[1,2,4] + [4,5,6]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Thus, if you have a collection of names plus two separate values (as you probably did, with the extra names in args but student1 and student2 floating around outside it), you can make a consolidated list by putting the two floating strings into their own list and adding it to the rest. One catch: args is actually a tuple, not a list, and Python won't add a tuple to a list directly, so convert it with list() first. In my example, \n", + "\n", + "\\[student1\\] + \\[student2\\] + list(args)\n", + "\n", + "has the same effect as\n", + "\n", + "\\[student1,student2\\] + list(args) would have." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The other thing I did was a list comprehension. Watch the following:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "empty_list = []\n", + "for i in range(10):\n", + " empty_list.append(i*2)\n", + "empty_list" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "I used a for-loop to fill an empty list with the values of range(10), each multiplied by two. I can also use the following:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "full_list = [i*2 for i in range(10)]\n", + "full_list" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Essentially, I compress the for-loop iteration into 1 line. Python knows I mean \"create a list with values that are i\\*2 for each i in range(10)\". I can do this in many situations, which saves space in my code and is often faster computationally as well. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Better Function Writing** \n", + "\n", + "Let's take a few steps to make my comparison function better.
The first thing we want to do is add *documentation*. This tells people how to use the function. Usually, documentation looks something like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def compare_students(student1,student2,*args):\n", + " '''\n", + " A function to produce a horizontal bar plot comparing the midterm scores of students\n", + " INPUTS:\n", + " student1 (string): the name of a student in the class_dictionary\n", + " student2 (string): the name of a student in the class_dictionary\n", + " *args (optional, string): any number of students from the class_dictionary\n", + " PRODUCES: \n", + " A bar plot \n", + " RETURNS: \n", + " NONE\n", + " '''\n", + " #Code goes here (not to spoil the above exercise!)\n", + " return" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, if someone is looking at our code, they can easily figure out that they need, for example, to have a dictionary called \"class_dictionary\" defined in their code for this function to work. Actually, the fact that my function requires that is bad; we'll get to that in a minute. If someone were using our code but not actually looking at the source file, they could type:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "help(compare_students)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Our documentation would then pop up in their terminal, making it easy for them to make sure they are using the function properly. \n", + "\n", + "Back to what I said about the class dictionary. Inside my function, I index class_dictionary to get the scores for individual students out. But what if class_dictionary wasn't defined in my code? My function couldn't run. If I copied and pasted my function into another file, it wouldn't run unless that file also defined a class_dictionary. In short, it's not **general**. 
It's best to make your code as general as is reasonable: it will help you reuse your own code later and catch bugs. I can make my function more general by requiring the user to *provide* a class_dictionary to the function. That truly isolates it: I can move it from file to file, and I know that my tests of it aren't picking up problems from elsewhere in my code. \n", + "\n", + "But what if I don't want to type \"class_dictionary\" every time I call the function, since (at least in this file) I only have one commonly used dictionary? Check this out:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def compare_students(student1,student2,*args,class_dict=class_dictionary):\n", + " '''\n", + " A function to produce a horizontal bar plot comparing the midterm scores of students\n", + " INPUTS:\n", + " student1 (string): the name of a student in the class_dictionary\n", + " student2 (string): the name of a student in the class_dictionary\n", + " *args (optional, string): any number of students from the class_dictionary\n", + " class_dict (optional, dict): keyword-only; the dictionary of names and scores to use (defaults to class_dictionary)\n", + " PRODUCES: \n", + " A bar plot \n", + " RETURNS: \n", + " NONE\n", + " '''\n", + " #Code goes here!\n", + " return" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What I've done is add a new optional argument to my function (I named it class_dict to avoid confusion with class_dictionary). In the function definition itself, I set the default value of class_dict to the class_dictionary I have sitting in my code. **Note:** pre-set or \"default\" arguments in functions must be defined *after* all the required, non-default ones (I couldn't put class_dict=class_dictionary before student1 and student2). I also placed class_dict *after* \\*args, which makes it a keyword-only argument (a Python 3 feature). If it came before \\*args, the third name I passed would be taken as the dictionary. This way, every extra name lands in args, and a different dictionary has to be passed by name, e.g. class_dict=my_other_dictionary. \n", + "\n", + "This is a reasonable compromise for my code: I don't have to type \n", + "compare_students('name','name2','other_name',class_dict=class_dictionary)\n", + "\n", + "every time; I can use my function as normal. 
BUT, if I move my function to another code file, it's clear that I need to define a dictionary named class_dictionary in that file before the function definition (or change the default), because Python evaluates a default value once, when the def statement runs. \n", + "\n", + "As a final addition, I'll update the documentation to describe the format expected of the input dictionary. I'll also make the function as general as possible (no default), and move class_dict to the front of the required arguments (purely for aesthetics: you give a dictionary, then as many names as you want, minimum 2, rather than 2 names, a dictionary, and then more names). " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def compare_students(class_dict,student1,student2,*args):\n", + " '''\n", + " A function to produce a horizontal bar plot comparing the midterm scores of students\n", + " INPUTS:\n", + " class_dict (dict): A dictionary containing student names and exam scores of the form {'name' (str): score (float)}\n", + " student1 (string): the name of a student in class_dict\n", + " student2 (string): the name of a student in class_dict\n", + " *args (optional, string): any number of students from class_dict\n", + " PRODUCES: \n", + " A bar plot \n", + " RETURNS: \n", + " NONE\n", + " '''\n", + " #Code here\n", + " return" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**ON YOUR OWN**\n", + "\n", + "Here are a few exercises for making this function EVEN MORE general, which you should be able to do with some quick googling. \n", + "\n", + "1. The string matching from argument to dictionary key is exact: the user can't enter 'malena' if the key is 'Malena'. The best way around this might be to convert all the keys in the dictionary to lower (or upper) case, and then convert the user's input to the same case before attempting to query the dictionary. 
Look up how to make strings upper or lower case, and implement that in your function. Rather than changing how we created the dictionary in the first place, update all the names in the dict *within* your function to meet this need. \n", + "2. What if someone enters a name into your function that (even after problem 1 is solved) isn't in the dictionary, because the student isn't in the class or the name is misspelled? As of now, your function will stop and raise a KeyError, reporting that the name is not in the dictionary. For the sake of the exercise, let's change that behavior: skip any name that isn't included, move on to all the other names, and still produce the plot. Update your function so that if a name isn't in the dictionary, it prints a warning \"Warning, ___ wasn't in the class dict, continuing...\" so the user knows, but then still plots the rest of the (valid) names. You could do this with an if-statement before actually querying the dict or, if you're adventurous, look up \"try and except statements\" online. \n", + "3. Look through plt.barh's documentation, and see if you can make your plot show the highest scorer's bar in a different color than the rest. Note: the easiest way might be to first plot all the bars in one color, then plot the top scorer's bar in the new color on top of it. \n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alright! That's it for this tutorial. As always, I hope it was helpful to you. If you have any questions about it (or find typos), or a question about your own code as you're getting started, feel free to email me! I welcome all kinds of feedback. I hope to have a few more very-entry-level tutorials up soon, so stay tuned!" 
+ ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python [anaconda2]", + "language": "python", + "name": "Python [anaconda2]" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.12" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/Bootcamp/quantum_mechanics.grades b/Bootcamp/quantum_mechanics.grades new file mode 100644 index 0000000..72d6943 --- /dev/null +++ b/Bootcamp/quantum_mechanics.grades @@ -0,0 +1,65 @@ +Sam 100 +Joe 68 +Priya 40 +Minh 78 +Sarah 81 +Shivani 65 +Alex 39 +Celeste 118 +Kaitlyn 46 +Andrew 78 +Dhruv 9 +Wren 37 +Melanie 43 +Caroline 87 +Roger 54 +Dick 29 +Mariah 95 +Josiah 87 +Malena 111 +Steven 65 +Stephanie 43 +Riya 53 +Jose 47 +Ahmed 16 +Mariska 98 +Tom 82 +Isaac 58 +Letty 5 +Emily 49 +Peter 67 +Arie 60 +Michael 76 +Bahar 16 +Jasmine 111 +Vega 65 +Samantha 61 +Nick 73 +Nikoo 63 +Leo 115 +Madeline 72 +Caragh 76 +Grace 48 +Arjun 75 +Sahana 101 +Nils 45 +Isabel 46 +Adam 82 +Pauline 57 +Paul 17 +Trevor 88 +Allyn 90 +Cedric 53 +Christine 32 +Derek 28 +Divya 50 +Gibson 91 +Justin 93 +Kelly 7 +Kaley 63 +Kiara 88 +Levi 55 +Luis 37 +Sanni 67 +Sean 0 +Nikoo 79 \ No newline at end of file