Last updated on 2023-07-24
- How can I create my own functions?
- Explain and identify the difference between function definition and function call.
- Write a function that takes a small, fixed number of arguments and produces a single result.
- Human beings can only keep a few items in working memory at a time.
- Understand larger/more complicated ideas by understanding and combining pieces.
- Components in a machine.
- Lemmas when proving theorems.
- Functions serve the same purpose in programs.
- Encapsulate complexity so that we can treat it as a single “thing”.
- Also enables re-use.
- Write one time, use many times.
- Begin the definition of a new function with `def`.
- Followed by the name of the function.
- Must obey the same rules as variable names.
- Then parameters in parentheses.
- Empty parentheses if the function doesn’t take any inputs.
- We will discuss this in detail in a moment.
- Then a colon.
- Then an indented block of code.
def print_greeting():
    print('Hello!')
    print('The weather is nice today.')
    print('Right?')
- Defining a function does not run it.
- Like assigning a value to a variable.
- Must call the function to execute the code it contains.
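A minimal sketch of the difference between definition and call, reusing `print_greeting` from above: running the `def` statement prints nothing, and the three lines appear only when the function is called.

```python
def print_greeting():
    print('Hello!')
    print('The weather is nice today.')
    print('Right?')

# Running the def above prints nothing; it only creates the function.
print_greeting()  # the call executes the body and prints three lines
```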
- Functions are most useful when they can operate on different data.
- Specify parameters when defining a function.
- These become variables when the function is executed.
- Are assigned the arguments in the call (i.e., the values passed to the function).
- If you don’t name the arguments when using them in the call, the arguments will be matched to parameters in the order the parameters are defined in the function.
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

print_date(1871, 3, 19)
Alternatively, we can name the arguments when we call the function. This lets us specify them in any order and adds clarity at the call site; otherwise, someone reading the code might forget whether the second argument is the month or the day.
print_date(month=3, day=19, year=1871)
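To check that positional and named arguments reach the same parameters, here is a sketch using a hypothetical `format_date` function (not part of the lesson) that returns the string instead of printing it, so the results can be compared directly.

```python
def format_date(year, month, day):
    # Hypothetical variant of print_date: returns the joined string instead of printing it.
    return str(year) + '/' + str(month) + '/' + str(day)

positional = format_date(1871, 3, 19)            # matched to parameters by position
named = format_date(month=3, day=19, year=1871)  # matched by name, so order does not matter
mixed = format_date(1871, day=19, month=3)       # positional arguments first, then named ones

print(positional, named, mixed)  # 1871/3/19 1871/3/19 1871/3/19
```

Python requires positional arguments to come before named ones in a call, so `format_date(year=1871, 3, 19)` is a `SyntaxError`.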
- Via Twitter: `()` contains the ingredients for the function while the body contains the recipe.
- Use `return ...` to give a value back to the caller.
- May occur anywhere in the function.
- But functions are easier to understand if `return` occurs:
- At the start to handle special cases.
- At the very end, with a final result.
def average(values):
    if len(values) == 0:
        return None
    return sum(values) / len(values)
a = average([1, 3, 4])
print('average of actual values:', a)
average of actual values: 2.6666666666666665
print('average of empty list:', average([]))
average of empty list: None
- Remember: every function returns something.
- A function that doesn’t explicitly `return` a value automatically returns `None`.
result = print_date(1871, 3, 19)
print('result of call is:', result)

1871/3/19
result of call is: None
- Read the code below and try to identify what the errors are without running it.
- Run the code and read the error message. Is it a `SyntaxError` or an `IndentationError`?
- Fix the error.
- Repeat steps 2 and 3 until you have fixed all the errors.
def another_function
    print("Syntax errors are annoying.")
    print("But at least python tells us about them!")
    print("So they are usually not too hard to fix.")
def another_function():
    print("Syntax errors are annoying.")
    print("But at least Python tells us about them!")
    print("So they are usually not too hard to fix.")
calling <function report at 0x7fd128ff1bf8> 22.5
A function call always needs parentheses; otherwise you get the memory address of the function object. So, if we wanted to call the function named `report` and give it the value 22.5 to report on, we could write our function call as follows:
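A sketch of that call. The body of `report` is an assumption (its definition is not shown in this section), chosen to match the output below; `end=' '` keeps the next output on the same line.

```python
def report(pressure):
    # Assumed definition of report, chosen to print 'pressure is 22.5'.
    print('pressure is', pressure)

print('calling', end=' ')  # print without a trailing newline
report(22.5)               # the parentheses make this a call, not a reference
```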
calling pressure is 22.5
- What’s wrong in this example?
result = print_time(11, 37, 59)

def print_time(hour, minute, second):
    time_string = str(hour) + ':' + str(minute) + ':' + str(second)
    print(time_string)
- After fixing the problem above, explain why running this example code:
result = print_time(11, 37, 59)
print('result of call is:', result)
gives this output:
11:37:59
result of call is: None
- Why is the result of the call `None`?
The problem with the example is that the function `print_time()` is defined after the call to the function is made. Python doesn’t know how to resolve the name `print_time` since it hasn’t been defined yet, and will raise a `NameError`:

NameError: name 'print_time' is not defined
The first line of output, `11:37:59`, is printed by the first line of code, `result = print_time(11, 37, 59)`, which binds the value returned by invoking `print_time` to the variable `result`. The second line is from the second print call, which prints the contents of the `result` variable. `print_time()` does not explicitly `return` a value, so it automatically returns `None`.
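If the caller should actually receive the formatted time, the function has to `return` it. A minimal sketch with a hypothetical `format_time` function (not part of the lesson):

```python
def format_time(hour, minute, second):
    # Hypothetical variant of print_time: returns the string instead of printing it.
    return str(hour) + ':' + str(minute) + ':' + str(second)

result = format_time(11, 37, 59)
print('result of call is:', result)  # result of call is: 11:37:59
```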
import pandas as pd

def min_in_data(filename):
    data = pd.read_csv(filename)
    return data.min()
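`pd.read_csv` accepts a file-like object as well as a path, so `min_in_data` can be tried without a data file. A sketch using a small made-up CSV held in an `io.StringIO` buffer:

```python
import io
import pandas as pd

def min_in_data(filename):
    data = pd.read_csv(filename)
    return data.min()

# Invented two-column CSV, for illustration only.
csv_text = "a,b\n3,7\n1,9\n2,8\n"
print(min_in_data(io.StringIO(csv_text)))  # one minimum per column: a -> 1, b -> 7
```

For a DataFrame, `data.min()` returns a Series holding the minimum of each column.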
Fill in the blanks to create a function that takes a list of numbers as an argument and returns the first negative value in the list. What does your function do if the list is empty? What if the list has no negative numbers?
def first_negative(values):
    for v in ____:
        if ____:
            return ____
def first_negative(values):
    for v in values:
        if v < 0:
            return v
If an empty list or a list with no negative values is passed to this function, it returns `None`:
my_list = []
print(first_negative(my_list))
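A quick check of the three cases, using the solution above:

```python
def first_negative(values):
    for v in values:
        if v < 0:
            return v  # stop at the first negative value found

print(first_negative([3, -1, -5]))  # -1
print(first_negative([1, 2, 3]))    # None: the loop finishes without returning
print(first_negative([]))           # None: the loop body never runs
```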
Earlier we saw this function:
def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)
We saw that we can call the function using named arguments, like this:
print_date(day=1, month=2, year=2003)
- What does `print_date(day=1, month=2, year=2003)` print?
- When have you seen a function call like this before?
- When and why is it useful to call functions this way?
- `print_date(day=1, month=2, year=2003)` prints 2003/2/1.
- We saw examples of using named arguments when working with the pandas library. For example, when reading in a dataset using `data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')`, the last argument, `index_col`, is a named argument.
- Using named arguments can make code more readable since one can see from the function call what name the different arguments have inside the function. It can also reduce the chances of passing arguments in the wrong order, since by using named arguments the order doesn’t matter.
The code below will run on a label-printer for chicken eggs. A digital scale will report a chicken egg mass (in grams) to the computer and then the computer will print a label.
import random
for i in range(10):

    # simulating the mass of a chicken egg
    # the (random) mass will be 70 +/- 20 grams
    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)

    print(mass)

    # egg sizing machinery prints a label
    if mass >= 85:
        print("jumbo")
    elif mass >= 70:
        print("large")
    elif mass < 70 and mass >= 55:
        print("medium")
    else:
        print("small")
The if-block that classifies the eggs might be useful in other
situations, so to avoid repeating it, we could fold it into a function,
get_egg_label(). Revising the program to use the function
would give us this:
# revised version
import random
for i in range(10):

    # simulating the mass of a chicken egg
    # the (random) mass will be 70 +/- 20 grams
    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)

    print(mass, get_egg_label(mass))
- Create a function definition for `get_egg_label()` that will work with the revised program above. Note that the `get_egg_label()` function’s return value will be important. Sample output from the above program would be the mass of each egg followed by its label.
- A dirty egg might have a mass of more than 90 grams, and a spoiled
or broken egg will probably have a mass that’s less than 50 grams.
Modify your `get_egg_label()` function to account for these error conditions. Sample output could be:
25 too light, probably spoiled.
def get_egg_label(mass):
    # return a label for an egg of the given mass (in grams)
    egg_label = "Unlabelled"
    if mass >= 90:
        egg_label = "warning: egg might be dirty"
    elif mass >= 85:
        egg_label = "jumbo"
    elif mass >= 70:
        egg_label = "large"
    elif mass < 70 and mass >= 55:
        egg_label = "medium"
    elif mass < 50:
        egg_label = "too light, probably spoiled"
    else:
        egg_label = "small"
    return egg_label
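A sketch that exercises each branch of the solution with fixed, made-up masses instead of random ones:

```python
def get_egg_label(mass):
    # Return a label for an egg of the given mass (in grams).
    egg_label = "Unlabelled"
    if mass >= 90:
        egg_label = "warning: egg might be dirty"
    elif mass >= 85:
        egg_label = "jumbo"
    elif mass >= 70:
        egg_label = "large"
    elif mass < 70 and mass >= 55:
        egg_label = "medium"
    elif mass < 50:
        egg_label = "too light, probably spoiled"
    else:
        egg_label = "small"
    return egg_label

# One invented mass per branch, so the output is the same on every run.
for mass in [95, 87, 72, 60, 52, 25]:
    print(mass, get_egg_label(mass))
```

Because the chain ends in an `else`, the initial value "Unlabelled" is always overwritten; masses from 50 up to (but not including) 55 grams fall through to "small".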
Assume that the following code has been executed:
import pandas as pd

data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col=0)
japan = data_asia.loc['Japan']
- Complete the statements below to obtain the average GDP for Japan across the years reported for the 1980s.
year = 1983
gdp_decade = 'gdpPercap_' + str(year // ____)
avg = (japan.loc[gdp_decade + ___] + japan.loc[gdp_decade + ___]) / 2
- Abstract the code above into a single function.
def avg_gdp_in_decade(country, continent, year):
    data_countries = pd.read_csv('data/gapminder_gdp_'+___+'.csv',delimiter=',',index_col=0)
    ____
    ____
    ____
    return avg
- How would you generalize this function if you did not know beforehand which specific years occurred as columns in the data? For instance, what if we also had data from years ending in 1 and 9 for each decade? (Hint: use the columns to filter out the ones that correspond to the decade, instead of enumerating them in the code.)
- The average GDP for Japan across the years reported for the 1980s is computed with:
year = 1983
gdp_decade = 'gdpPercap_' + str(year // 10)
avg = (japan.loc[gdp_decade + '2'] + japan.loc[gdp_decade + '7']) / 2
- That code as a function is:
def avg_gdp_in_decade(country, continent, year):
    data_countries = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
    c = data_countries.loc[country]
    gdp_decade = 'gdpPercap_' + str(year // 10)
    avg = (c.loc[gdp_decade + '2'] + c.loc[gdp_decade + '7'])/2
    return avg
- To obtain the average for the relevant years, we need to loop over them:
def avg_gdp_in_decade(country, continent, year):
    data_countries = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
    c = data_countries.loc[country]
    gdp_decade = 'gdpPercap_' + str(year // 10)
    total = 0.0
    num_years = 0
    for yr_header in c.index:  # c's index contains reported years
        if yr_header.startswith(gdp_decade):
            total = total + c.loc[yr_header]
            num_years = num_years + 1
    return total/num_years
The function can now be called by:

avg_gdp_in_decade('Japan', 'asia', 1983)
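The same filtering idea can be tried without the data files. This sketch selects index labels by the decade prefix instead of listing years, using a small hand-made Series with invented values in place of the Gapminder data; the helper name `avg_in_decade` is hypothetical:

```python
import pandas as pd

def avg_in_decade(series, year):
    # Hypothetical helper: average every entry whose label starts with the
    # decade prefix, e.g. 'gdpPercap_198' for any year in the 1980s.
    prefix = 'gdpPercap_' + str(year // 10)
    in_decade = [label for label in series.index if label.startswith(prefix)]
    return series.loc[in_decade].mean()

# Invented values, for illustration only (not Gapminder data).
c = pd.Series({'gdpPercap_1977': 100.0, 'gdpPercap_1981': 10.0,
               'gdpPercap_1982': 20.0, 'gdpPercap_1987': 30.0,
               'gdpPercap_1989': 40.0, 'gdpPercap_1992': 500.0})

print(avg_in_decade(c, 1983))  # 25.0, the mean of the four 1980s entries
```

Years ending in 1 or 9 are picked up automatically. If no label matches, the mean of the empty selection is NaN rather than a `ZeroDivisionError`.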
In mathematics, a dynamical system is a system in which a function describes the time dependence of a point in a geometrical space. A canonical example of a dynamical system is the logistic map, a growth model that computes a new population density (between 0 and 1) based on the current density. In the model, time takes discrete values 0, 1, 2, …
- Define a function called `logistic_map` that takes two inputs: `x`, representing the current population (at time `t`), and a parameter `r = 1`. This function should return a value representing the state of the system (population) at time `t + 1`, using the mapping function:

f(t+1) = r * f(t) * [1 - f(t)]
- Using a `for` or `while` loop, iterate the `logistic_map` function defined in part 1, starting from an initial population of 0.5, for a period of time `t_final = 10`. Store the intermediate results in a list so that after the loop terminates you have accumulated a sequence of values representing the state of the logistic map at times `t = [0,1,...,t_final]` (11 values in total). Print this list to see the evolution of the population.
- Encapsulate the logic of your loop into a function called `iterate` that takes the initial population as its first input, the parameter `t_final` as its second input and the parameter `r` as its third input. The function should return the list of values representing the state of the logistic map at times `t = [0,1,...,t_final]`. Run this function for periods `t_final = 100` and `1000` and print some of the values. Is the population trending toward a steady state?
def logistic_map(x, r):
    return r * x * (1 - x)
initial_population = 0.5
t_final = 10
r = 1.0

population = [initial_population]

for t in range(t_final):
    population.append(logistic_map(population[t], r))

print(population)
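Part 2 allows a `while` loop as well. A sketch of the equivalent `while` version, which has to advance `t` by hand:

```python
def logistic_map(x, r):
    return r * x * (1 - x)

initial_population = 0.5
t_final = 10
r = 1.0

population = [initial_population]
t = 0
while t < t_final:
    population.append(logistic_map(population[t], r))
    t = t + 1  # without this the loop would never end

print(len(population))  # 11 values: times 0, 1, ..., t_final
print(population[:3])   # [0.5, 0.25, 0.1875]
```

The `for` version is usually preferred here because `range` manages the counter.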
```python
def iterate(initial_population, t_final, r):
    population = [initial_population]
    for t in range(t_final):
        population.append(logistic_map(population[t], r))
    return population

for period in (10, 100, 1000):
    population = iterate(0.5, period, 1)
    print(population[-1])
```
0.06945089389714401
0.009395779870614648
0.0009913908614406382
The population seems to be approaching zero.
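A short fixed-point check is consistent with this. A steady state $x^*$ of the map must satisfy $x^* = r\,x^*(1 - x^*)$; rearranging gives

$$
x^* - r\,x^* + r\,(x^*)^2 = 0 \;\Longrightarrow\; x^*\left(r\,x^* - (r - 1)\right) = 0 \;\Longrightarrow\; x^* = 0 \;\text{ or }\; x^* = 1 - \frac{1}{r}.
$$

With $r = 1$ both roots coincide at $x^* = 0$, so zero is the only steady state in this case, which matches the trend in the printed values.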
Functions will often contain conditionals. Here is a short example that will indicate which quartile the argument is in based on hand-coded values for the quartile cut points.
def calculate_life_quartile(exp):
    if exp < 58.41:
        # This observation is in the first quartile
        return 1
    elif exp >= 58.41 and exp < 67.05:
        # This observation is in the second quartile
        return 2
    elif exp >= 67.05 and exp < 71.70:
        # This observation is in the third quartile
        return 3
    elif exp >= 71.70:
        # This observation is in the fourth quartile
        return 4
    else:
        # This observation has bad data
        return None

calculate_life_quartile(62.5)
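A sketch of calling the function directly. The final `else` is reachable for a missing value stored as NaN, because every comparison with NaN is False:

```python
def calculate_life_quartile(exp):
    if exp < 58.41:
        return 1
    elif exp >= 58.41 and exp < 67.05:
        return 2
    elif exp >= 67.05 and exp < 71.70:
        return 3
    elif exp >= 71.70:
        return 4
    else:
        # Only reached when no comparison above is True, e.g. for NaN.
        return None

print(calculate_life_quartile(62.5))          # 2
print(calculate_life_quartile(float('nan')))  # None
```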
That function would typically be used within a `for` loop, but Pandas has a different, more concise way of doing the same thing, and that is by applying a function to a dataframe or a portion of a dataframe. Here is an example, using the definition above.
data = pd.read_csv('data/gapminder_all.csv')
data['life_qrtl'] = data['lifeExp_1952'].apply(calculate_life_quartile)
There is a lot in that second line, so let’s take it piece by piece.
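On the right side of the `=` we start with `data['lifeExp_1952']`, which is the column labeled `lifeExp_1952` in the dataframe `data`. We then use `apply()` to do what its name says: apply `calculate_life_quartile` to the value of this column in every row of the dataframe. The resulting quartile numbers are assigned, on the left side, to a new column, `data['life_qrtl']`.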
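A self-contained sketch comparing the `for` loop approach with `apply()` on a tiny DataFrame. The life-expectancy values are invented for illustration, not taken from the Gapminder data:

```python
import pandas as pd

def calculate_life_quartile(exp):
    # Same hand-coded cut points as the definition above.
    if exp < 58.41:
        return 1
    elif exp >= 58.41 and exp < 67.05:
        return 2
    elif exp >= 67.05 and exp < 71.70:
        return 3
    elif exp >= 71.70:
        return 4
    else:
        return None

# Invented values, one per quartile.
data = pd.DataFrame({'country': ['A', 'B', 'C', 'D'],
                     'lifeExp_1952': [50.0, 60.0, 70.0, 75.0]})

# Loop version: call the function once per value and collect the results.
quartiles = []
for exp in data['lifeExp_1952']:
    quartiles.append(calculate_life_quartile(exp))

# apply() version: pandas calls the function once per value for us.
data['life_qrtl'] = data['lifeExp_1952'].apply(calculate_life_quartile)

print(data)
print(quartiles == list(data['life_qrtl']))  # True
```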
- Break programs down into functions to make them easier to understand.
- Define a function using `def` with a name, parameters, and a block of code.
- Defining a function does not run it.
- Arguments in a function call are matched to its defined parameters.
- Functions may return a result to their caller using `return`.