Reference
Last updated on 2023-08-19
Running and Quitting
- Python files have the `.py` extension.
- Python code can be written in a text file or a Jupyter Notebook.
  - Jupyter notebooks have the extension `.ipynb`.
  - Jupyter notebooks can be opened from Anaconda or through the command line by entering `$ jupyter notebook`.
  - Markdown and HTML are allowed in Markdown cells for documenting code.
Variables and Assignment
- Values are assigned to variables using `=`.
  - Strings are defined in quotations `'...'`.
  - Integers and floating point numbers are defined without quotations.
- Variable names can contain letters, digits, and underscores `_`.
  - They cannot start with a digit.
  - Names that start with underscores should be avoided.
- Use `print(...)` to display values as text.
- Can use indexing on strings.
  - Indexing starts at 0.
  - Position is given in square brackets `[position]` following the variable name.
  - Take a slice using `[start:stop]`. This makes a copy of part of the original string.
    - `start` is the index of the first element.
    - `stop` is the index of the element after the last desired element.
- Use `len(...)` to find the length of a string, whether given directly or stored in a variable.
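
The assignment, indexing, and slicing rules above can be tried in a minimal sketch like the following. The variable names `element` and `age` and their values are illustrative only.

```python
# Assign values to variables: strings use quotes, numbers do not.
element = 'helium'
age = 42

# Indexing starts at 0, so position 0 is the first character.
first_char = element[0]

# A slice [start:stop] copies characters from start up to, but not including, stop.
first_three = element[0:3]

# len(...) reports how many characters the string holds.
length = len(element)

print(first_char, first_three, length)
```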
Data Types and Type Conversion
- Each value has a type. This controls what can be done with it.
  - `int` represents an integer.
  - `float` represents a floating point number.
  - `str` represents a string.
- To determine a variable's type, use the built-in function `type(...)`, including the variable name in the parentheses.
- Modifying strings:
  - Use `+` to concatenate strings.
  - Use `*` to repeat a string.
  - Numbers and strings cannot be added to one another.
    - Convert string to integer: `int(...)`.
    - Convert integer to string: `str(...)`.
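
As a small sketch of the type rules above, the following checks a few types and converts between strings and integers. The names and values are made up for illustration.

```python
# type(...) reports the type of a value.
print(type(52), type(52.0), type('52'))

# + concatenates strings and * repeats them.
full_name = 'Ada' + ' ' + 'Lovelace'
separator = '=' * 10

# Numbers and strings cannot be added directly, so convert first.
total = 1 + int('2')          # string -> integer, then numeric addition
label = 'count: ' + str(3)    # integer -> string, then concatenation

print(full_name, separator, total, label)
```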
Built-in Functions and Help
- To add a comment, place `#` before the text you do not wish to be executed.
- Commonly used built-in functions:
  - `min()` finds the smallest value.
  - `max()` finds the largest value.
  - `round()` rounds off a floating point number.
  - `help()` displays documentation for the function in the parentheses.
    - Other ways to get help include holding down `shift` and pressing `tab` in Jupyter Notebooks.
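
The built-in functions listed above might be used as in this short sketch. The numbers are arbitrary examples.

```python
# Lines starting with # are comments and are not executed.
smallest = min(1, 2, 3)
largest = max(1, 2, 3)

# round() with one argument rounds to the nearest integer;
# an optional second argument gives the number of decimal places.
rounded = round(3.712)
rounded_one_place = round(3.712, 1)

# help() prints the documentation for the function passed to it.
help(round)
```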
Libraries
- Importing a library:
  - Use `import ...` to load a library.
  - Refer to this library by using `module_name.thing_name`.
    - `.` indicates 'part of'.
- To import a specific item from a library: `from ... import ...`
- To import a library using an alias: `import ... as ...`
- Importing the math library: `import math`
  - Example of referring to an item with the module's name: `math.cos(math.pi)`.
- Importing the plotting library as an alias: `import matplotlib as mpl`
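
The three import forms above can be compared side by side using the standard `math` library. The alias `m` is an arbitrary choice for this sketch.

```python
import math                  # load the whole library
from math import pi          # import one specific item
import math as m             # import the library under an alias

# module_name.thing_name: the dot means "part of".
cos_of_pi = math.cos(math.pi)

# The same items reached through the specific import and the alias.
cos_via_alias = m.cos(pi)

print(cos_of_pi, cos_via_alias)
```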
Reading Tabular Data into DataFrames
- Use the pandas library to do statistics on tabular data. Load with `import pandas as pd`.
  - To read in a csv: `pd.read_csv()`, including the path name in the parentheses.
    - To specify that a column's values should be used as row headings: `pd.read_csv('path', index_col='column name')`, where path and column name should be replaced with the relevant values.
- To get more information about a DataFrame, use `DataFrame.info()`, replacing `DataFrame` with the variable name of your DataFrame.
- Use `DataFrame.columns` to view the column names.
- Use `DataFrame.T` to transpose a DataFrame.
- Use `DataFrame.describe()` to get summary statistics about your data.
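
A minimal sketch of the calls above, assuming pandas is installed. To stay self-contained it reads a tiny made-up table from an in-memory string rather than a file on disk; the country names and column names are hypothetical.

```python
import io
import pandas as pd

# A tiny made-up table standing in for a CSV file on disk.
csv_text = io.StringIO(
    "country,gdp_1952,gdp_2007\n"
    "Alpha,1000.0,5000.0\n"
    "Beta,2000.0,12000.0\n"
)

# index_col makes the 'country' column the row headings.
data = pd.read_csv(csv_text, index_col='country')

data.info()                 # prints column types and non-null counts
print(data.columns)         # the column names
print(data.T)               # rows and columns swapped
print(data.describe())      # summary statistics per numeric column
```

With a real file, the first argument to `pd.read_csv` would be the path to that file instead of `csv_text`.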
Pandas DataFrames
- Select data using `[i,j]`
  - To select by entry position: `DataFrame.iloc[..., ...]`
    - This is inclusive of everything except the final index.
  - To select by entry label: `DataFrame.loc[..., ...]`
    - Can select multiple rows or columns by listing labels.
    - This is inclusive of both ends.
  - Use `:` to select all rows or columns.
- Can also select data based on values using `True` and `False`. This is a Boolean mask.
  - `mask = subset > 10000`
    - We can then use this to select values.
- To use a select-apply-combine operation we use `data.apply(lambda x: x > x.mean())` where `mean()` can be any operation the user would like to be applied to `x`.
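
The selection styles above can be sketched on a small DataFrame, assuming pandas is installed. The row labels, column names, and values are made up for illustration.

```python
import pandas as pd

# Made-up values for illustration only.
data = pd.DataFrame(
    {'gdp_1952': [1000.0, 2000.0, 30000.0],
     'gdp_2007': [5000.0, 12000.0, 40000.0]},
    index=['Alpha', 'Beta', 'Gamma'],
)

# Position-based: rows 0 and 1 (stop index 2 is excluded), first column.
by_position = data.iloc[0:2, 0]

# Label-based: both end labels are included; ':' selects all columns.
by_label = data.loc['Alpha':'Beta', :]

# Boolean mask: True where the value exceeds 10000.
mask = data > 10000
selected = data[mask]        # values where mask is False become NaN

# Apply an operation to every column: is each value above its column mean?
above_mean = data.apply(lambda x: x > x.mean())

print(by_position, by_label, selected, above_mean, sep='\n\n')
```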
Plotting
- The most widely used plotting library is `matplotlib`.
  - Usually imported using `import matplotlib.pyplot as plt`.
  - To plot we use the command `plt.plot(time, position)`.
  - To create a legend use `plt.legend(['label1', 'label2'], loc='upper left')`.
    - Can also define labels within the plot statements by using `plt.plot(time, position, label='label')`. To make the legend show up, use `plt.legend()`.
  - To label the x and y axes, use `plt.xlabel('label')` and `plt.ylabel('label')`.
- Pandas DataFrames can be used to plot by using `DataFrame.plot()`. Any operations that can be used on a DataFrame can be applied while plotting.
  - To plot a bar plot: `data.plot(kind='bar')`
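
A short sketch of the plotting calls above, assuming matplotlib is installed. The `time` and `position` values and the axis labels are made up, and the `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')        # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

# Made-up measurements for illustration.
time = [0, 1, 2, 3]
position = [0, 100, 200, 300]

# The label given here is picked up by plt.legend().
plt.plot(time, position, label='position')
plt.xlabel('Time (hr)')
plt.ylabel('Position (km)')
plt.legend(loc='upper left')

# In a Jupyter notebook the figure is displayed after the cell runs;
# in a script with an interactive backend, plt.show() would display it.
```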
Lists
- Defined within `[...]`, with values separated by `,`.
  - An empty list can be created by using `[]`.
- Can use `len(...)` to determine how many values are in a list.
- Can index just as done in previous lessons.
  - Indexing can be used to reassign values: `list_name[0] = newvalue`.
- To add an item to a list use `list_name.append()`, with the item to append in the parentheses.
- To combine two lists use `list_name_1.extend(list_name_2)`.
- To remove an item from a list use `del list_name[index]`.
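
The list operations above can be followed step by step in this sketch. The list name `pressures` and its values are illustrative.

```python
# Made-up values for illustration.
pressures = [0.273, 0.275, 0.277]
empty = []

# Indexing works as for strings, and can reassign a value in place.
pressures[0] = 0.265

# append adds one item to the end.
pressures.append(0.279)

# extend adds every item from another list.
pressures.extend([0.281, 0.283])

# del removes the item at the given index.
del pressures[1]

print(pressures, len(pressures), len(empty))
```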
For Loops
- Start a for loop with `for number in [1, 2, 3]:`, with the following lines indented.
  - `[1, 2, 3]` is considered the collection.
  - `number` is the loop variable.
  - The action following the collection is the body.
- To iterate over a sequence of numbers use `range(start, end)`, which counts from `start` up to, but not including, `end`.
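
A minimal sketch of both loop forms described above. The variable names `total` and `squares` are chosen for illustration.

```python
# number is the loop variable; [1, 2, 3] is the collection;
# the indented line is the body, run once per item.
total = 0
for number in [1, 2, 3]:
    total = total + number

# range(start, end) yields start, start + 1, ... and stops before end.
squares = []
for value in range(1, 4):
    squares.append(value * value)

print(total, squares)
```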
Conditionals
- Defined similarly to a loop, using `if variable conditional value:`.
  - For example, `if variable > 5:`.
- Use `elif condition:` for additional tests.
- Use `else:` for the case where none of the preceding conditions is true.
- Can combine more than one condition by using `and` or `or`.
- Often used in combination with for loops.
- Conditions that can be used:
  - `==` equal to.
  - `>=` greater than or equal to.
  - `<=` less than or equal to.
  - `>` greater than.
  - `<` less than.
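
The pieces above (`if`, `elif`, `else`, `and`, and a for loop) fit together as in this sketch. The masses and thresholds are made up for illustration.

```python
# Made-up masses for illustration.
masses = [3.54, 2.07, 9.22, 1.86]

labels = []
for mass in masses:
    if mass > 9.0:
        labels.append('HUGE')
    elif mass >= 3.0 and mass <= 9.0:
        # Two conditions combined with 'and': both must be true.
        labels.append('large')
    else:
        # Reached only when no earlier test was true.
        labels.append('small')

print(labels)
```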
Looping Over Data Sets
- Use a for loop: `for filename in [file1, file2]:`
- To find a set of files using a pattern use `glob.glob`.
  - Must import first using `import glob`.
  - `*` indicates "match zero or more characters".
  - `?` indicates "match exactly one character".
    - For example: `glob.glob('*.txt')` will find all files that end with `.txt` in the current directory. The pattern is passed as a quoted string.
- Combine these by writing a loop using: `for filename in glob.glob('*.txt'):`
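
A self-contained sketch of looping over files matched by a pattern. It first creates a temporary directory with made-up file names so that there is something to match; with real data, the pattern would point at existing files instead.

```python
import glob
import os
import tempfile

# Create a throwaway directory with made-up files so the sketch is self-contained.
with tempfile.TemporaryDirectory() as folder:
    for name in ['a.txt', 'b.txt', 'notes.csv']:
        with open(os.path.join(folder, name), 'w') as handle:
            handle.write('example\n')

    # '*' matches zero or more characters; the pattern must be a quoted string.
    pattern = os.path.join(folder, '*.txt')
    found = []
    for filename in sorted(glob.glob(pattern)):
        found.append(os.path.basename(filename))

print(found)
```

Sorting the result is a choice made here because `glob.glob` does not promise any particular order.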
Writing Functions
- Define a function using `def function_name(parameters):`. Replace `parameters` with the variables to use when the function is executed.
- Run by using `function_name(parameters)`.
- To return a result to the caller use `return ...` in the function.
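
As a brief sketch of defining, calling, and returning from a function, consider a temperature conversion. The function name `fahrenheit_to_celsius` is hypothetical and chosen only for this example.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from Fahrenheit to Celsius."""
    # return hands the result back to the caller.
    return (temp_f - 32) * 5 / 9

# Call the function by name, passing a value for its parameter.
freezing = fahrenheit_to_celsius(32)
boiling = fahrenheit_to_celsius(212)

print(freezing, boiling)
```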
Variable Scope
- A local variable is defined in a function and can only be seen and used within that function.
- A global variable is defined outside of a function and can be seen or used anywhere after definition.
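
The difference between local and global variables can be seen in this small sketch. The names `pressure`, `adjust`, and `temperature` and the numbers are illustrative.

```python
pressure = 103.9             # global: defined outside any function

def adjust(t):
    # temperature is local: it exists only while adjust is running.
    temperature = t * 1.43 / pressure   # the global pressure is visible here
    return temperature

result = adjust(0.9)

# temperature does not exist out here; referring to it raises NameError.
try:
    temperature
    local_visible = True
except NameError:
    local_visible = False

print(result, local_visible)
```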
Programming Style
- Document your code.
- Use clear and meaningful variable names.
- Follow the PEP8 style guide when setting up your code.
- Use assertions to check for internal errors.
- Use docstrings to provide help.
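
Several of these style points can be combined in one short function. This is only a sketch; the function name `average` and its check are illustrative.

```python
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    # An assertion catches an internal error early with a clear message.
    assert len(values) > 0, 'values must not be empty'
    return sum(values) / len(values)

mean_value = average([1, 2, 3])
print(mean_value)

# The docstring is what help() displays for the function.
help(average)
```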
Glossary
- Arguments
- Values passed to functions.
- Array
- A container holding elements of the same type.
- Boolean
- A value that is either `True` or `False`.
- DataFrame
- The way Pandas represents a table; a collection of series.
- Element
- An item in a list or an array. For a string, these are the individual characters.
- Function
- A block of code that can be called and re-used elsewhere.
- Global variable
- A variable defined outside of a function that can be used anywhere.
- Index
- The position of a given element.
- Jupyter Notebook
- Interactive coding environment allowing a combination of code and markdown.
- Library
- A collection of files containing functions used by other programs.
- Local Variable
- A variable defined inside of a function that can only be used inside of that function.
- Mask
- A boolean object used for selecting data from another object.
- Method
- An action tied to a particular object. Called by using `object.method()`.
- Modules
- The files within a library containing functions used by other programs.
- Parameters
- Variables used when executing a function.
- Series
- A Pandas data structure to represent a column.
- Substring
- A part of a string.
- Variables
- Names for values.