Content from Working With Variables

Last updated on 2023-04-26 | Edit this page

Estimated time: 65 minutes

## Overview

### Questions

- How can I store values and do simple calculations with them?

### Objectives

- Navigate among important sections of the MATLAB environment.
- Assign values to variables.
- Identify what type of data is stored in MATLAB arrays.
- Read tabular data from a file into a program.

## Introduction to the MATLAB GUI

Before we can start programming, we need to know a little about the MATLAB interface. Using the default setup, the MATLAB desktop contains several important sections:

- In the
**Command Window**we can execute commands. Commands are typed after the prompt`>>`

and are executed immediately after pressing`Enter`. - Alternatively, we can open the
**Editor**, write our code and run it all at once. The advantage of this is that we can save our code and run it again in the same way at a later stage. - The
**Workspace**contains all the variables which we have loaded into memory. - The
**Current Folder**window shows files in the current directory, and we can change the current folder using this window. -
**Search Documentation**on the top right of your screen lets you search for functions. Suggestions for functions that would do what you want to do will pop up. Clicking on them will open the documentation. Another way to access the documentation is via the`help`

command — we will return to this later.

## Working with variables

In this lesson we will learn how to manipulate the inflammation dataset with MATLAB. But before we discuss how to deal with many data points, we will show how to store a single value on the computer.

We can create a new variable by
assigning a value to it using `=`

:

At first glance nothing appears to have happened! We don’t get any output in the command window because we put a semi-colon after the variable assignment: this suppresses output, which is generally a good thing because it makes code run more quickly. Let’s run the command again without the semi-colon, and this time we have some output in the command window:

### OUTPUT

```
weight_kg =
55
```

A variable is just a name for a piece of data or *value*.
Variable names must begin with a letter, and are case sensitive. They
can contain also numbers or underscores. Examples of valid variable
names are `x`

, `f_0`

or
`current_temperature`

.

Once a variable has a value, we can print it using the
`disp`

function:

### OUTPUT

` 55`

or simply typing its name, followed by `Enter`

### OUTPUT

```
weight_kg =
55
```

Storing single values is fine, but how can we store multiple values in the same variable? We can create an array using square brackets, separating each value with a comma:

### OUTPUT

```
a =
1 2 3
```

In a similar way, we can create matrices using semi-colons to separate rows:

### OUTPUT

```
b =
1 2 3
4 5 6
```

Something to bear in mind about arrays and matrices is that all
values in an array must be of the same type e.g. all numbers or all
strings. It is however possible to convert between data types
e.g. `num2str`

which converts numbers to a string
representation.

So once we have a numeric value stored in a variable, we can do arithmetic with it:

### OUTPUT

`Weight in pounds: 121`

That last command combines several new concepts, so let’s break it down:

The `disp`

function takes a single argument — the value to
print. So if we want to print more than one value on a single line, we
can print an *array* of values (i.e. one argument), which we
create using square brackets, and recall that an array must contain
values all of the same type. In this case we convert the number to a
string so that we can print an array of characters.

We can change the value of a variable by assigning it a new one:

### OUTPUT

```
weight_kg =
57.5
```

Assigning a value to one variable does not change the values of other variables.

For example, we just changed the value of `weight_kg`

from
55 to 57.5, but `weight_lb`

hasn’t changed:

### OUTPUT

```
weight_lb =
121
```

Since `weight_lb`

doesn’t “remember” where its value came
from, it isn’t automatically updated when `weight_kg`

changes. This is important to remember, and different from the way
spreadsheets work.

Now that we know how to assign values to variables, let’s view a list of all the variables in our workspace:

### OUTPUT

```
Your variables are:
a b weight_kg weight_lb
```

To remove a variable from MATLAB, use the `clear`

command:

### OUTPUT

```
Your variables are:
a b weight_kg
```

Alternatively, we can look at the **Workspace**. The
workspace contains all variable names and assigned values that we
currently work with. As long as they pop up in the workspace, they are
universally available. It’s generally a good idea to keep the workspace
as clean as possible. To remove all variables from the workspace,
execute the command `clear`

on its own.

The first two lines assign the initial values to the variables, so
mass = 47.5 and age = 122. The next line evaluates
`mass * 2.0`

i.e. `47.5 * 2.0 = 95`

, *then*
assigns the result to the variable `mass`

. The last line
evaulates `age - 20`

i.e. `122 - 20`

,
*then* assigns the result to the variable `age`

. So
the final values are mass = 95, and age = 102.

The key point to understand here is that the expression to the right
of the `=`

sign is evaluated first, and the result is then
assigned to the variable specified to the left of the `=`

sign.

## Good practices for project organisation

Before we get started, let’s create some directories to help organise this project.

### Tip: Good Enough Practices for Scientific Computing

Good Enough Practices for Scientific Computing is a paper written by researchers involved with the Carpentries, which covers basic workflow skills for research computing. It recommends the following for project organization:

- Put each project in its own directory, which is named after the project.
- Put text documents associated with the project in the
`doc`

directory. - Put raw data and metadata in the
`data`

directory, and files generated during clean-up and analysis in a`results`

directory. - Put source code for the project in the
`src`

directory, and programs brought in from elsewhere or compiled locally in the`bin`

directory. - Name all files to reflect their content or function.

We already have a `data`

directory in our
`matlab-novice-inflammation`

project directory, so we only
need to create a `results`

directory for this project. You
can use your computer’s file browser to create this directory. We’ll
save all our scripts and function files in the main project
directory.

A final step is to set the *current folder* in MATLAB to our
project folder. Use the **Current Folder** window in the
MATLAB GUI to browse to your project folder
(`matlab-novice-inflammation`

).

In order to check the current directory, we can run `pwd`

(print working directory). A second check we can do is to run the
`ls`

(list) command in the Command Window to list the
contents of the working directory — we should get the following
output:

### OUTPUT

`data results`

Reading data from files and writing data to them are essential tasks in scientific computing, and admittedly, something that we’d rather not spend a lot of time thinking about. Fortunately, MATLAB comes with a number of high-level tools to do these things efficiently, sparing us the grisly detail.

If we know what our data looks like (in this case, we have a matrix
stored as comma-separated values) and we’re unsure about what command we
want to use, we can search the documentation. Type
`read matrix`

into the documentation toolbar. MATLAB suggests
using `readmatrix`

. If we have a closer look at the
documentation, MATLAB also tells us, which in- and output arguments this
function has.

To load the data from our CSV file into MATLAB, type the following
command into the MATLAB command window, and press `Enter`:

This loads the data and assigns it to a variable,
*patient_data*. This is a good example of when to use a
semi-colon to suppress output — try re-running the command without the
semi-colon to find out why. You should see a wall of numbers printed,
which is the data from the file.

The expression `readmatrix(...)`

is a function call. Functions
generally need arguments to run.
In the case of the `readmatrix`

function, we need to provide
a single argument: the name of the file we want to read data from. This
argument needs to be a character string or string, so we put it in quotes.

Now that our data is in memory, we can start doing things with it. First, let’s find out its size:

### OUTPUT

```
ans =
60 40
```

The output tells us that the variable `patient_data`

refers to a table of values that has 60 rows and 40 columns.

MATLAB stores *all* data in the form of multi-dimensional
arrays. For example:

- Numbers, or
*scalars*are represented as two dimensional arrays with only one row and one column, as are single characters. - Lists of numbers, or
*vectors*are two dimensional as well, but the value of one of the dimensions equals one. By default vectors are row vectors, meaning they have one row and as many columns as there are elements in the vector. - Tables of numbers, or
*matrices*are arrays with more than one column and more than one row. - Even character strings, like sentences, are stored as an “array of characters”.

Normally, MATLAB arrays can’t store elements of different data types.
For instance, a MATLAB array can’t store both a `float`

and a
`char`

. To do that, you have to use a Cell
Array.

We can use the `class`

function to find out what type of
data lives inside an array:

### OUTPUT

```
ans =
'double'
```

This output tells us that `patient_data`

contains double
precision floating-point numbers. This is the default numeric data type
in MATLAB. If you want to store other numeric data types, you need to
tell MATLAB explicitly. For example, the command,

assigns the value `325`

to the name `x`

,
storing it as a 16-bit signed integer.

Content from Arrays

Last updated on 2023-04-26 | Edit this page

Estimated time: 20 minutes

## Overview

### Questions

- How can I access subsets of data?

### Objectives

- Select individual values and subsections from data.

## Array indexing

Now that we understand what kind of data can be stored in an array,
we need to learn the proper syntax for working with arrays in MATLAB. To
do this we will begin by discussing array **indexing**,
which is the method by which we can select one or more different
elements of an array. A solid understanding of array indexing will
greatly assist our ability to organize our data.

Let’s start by creating an 8-by-8 “magic” Matrix:

### OUTPUT

```
ans =
64 2 3 61 60 6 7 57
9 55 54 12 13 51 50 16
17 47 46 20 21 43 42 24
40 26 27 37 36 30 31 33
32 34 35 29 28 38 39 25
41 23 22 44 45 19 18 48
49 15 14 52 53 11 10 56
8 58 59 5 4 62 63 1
```

We want to access a single value from the matrix:

To do that, we must provide its index in parentheses:

### OUTPUT

`ans = 38`

Indices are provided as (row, column). So the index
`(5, 6)`

selects the element on the fifth row and sixth
column.

An index like `(5, 6)`

selects a single element of an
array, but we can also access sections of the matrix, or slices. To access a row of values:

we can do:

### OUTPUT

```
ans =
32 34 35 29 28 38 39 25
```

Providing `:`

as the index for a dimension selects
*all* elements along that dimension. So, the index
`(5, :)`

selects the elements on row `5`

, and
*all* columns—effectively, the entire row. We can also select
multiple rows,

### OUTPUT

```
ans =
64 2 3 61 60 6 7 57
9 55 54 12 13 51 50 16
17 47 46 20 21 43 42 24
40 26 27 37 36 30 31 33
```

and columns:

### OUTPUT

```
ans =
6 7 57
51 50 16
43 42 24
30 31 33
38 39 25
19 18 48
11 10 56
62 63 1
```

To select a submatrix,

we have to take slices in both dimensions:

### OUTPUT

```
ans =
36 30 31
28 38 39
45 19 18
```

We don’t have to take all the values in the slice—if we provide a stride. Let’s say we want to start with
row `2`

, and subsequently select every third row:

### OUTPUT

```
ans =
9 55 54 12 13 51 50 16
32 34 35 29 28 38 39 25
8 58 59 5 4 62 63 1
```

And we can also select values in a “checkerboard”,

by taking appropriate strides in both dimensions:

### OUTPUT

```
ans =
2 61 6 57
26 37 30 33
15 52 11 56
```

### Slicing

A subsection of an array is called a slice. We can take slices of character strings as well:

### MATLAB

```
>> element = 'oxygen';
>> disp(['first three characters: ', element(1:3)])
>> disp(['last three characters: ', element(4:6)])
```

### OUTPUT

```
first three characters: oxy
last three characters: gen
```

What is the value of

`element(4:end)`

? What about`element(1:2:end)`

? Or`element(2:end - 1)`

?For any size array, MATLAB allows us to index with a single colon operator (

`:`

). This can have surprising effects. For instance, compare`element`

with`element(:)`

. What is`size(element)`

versus`size(element(:))`

? Finally, try using the single colon on the matrix`M`

above:`M(:)`

. What seems to be happening when we use the single colon operator for slicing?

- Exercises using slicing

### MATLAB

```
element(4:end) % Select all elements from 4th to last
ans =
'gen'
element(1:2:end) % Select every other element starting at first
ans =
'oye
element(2:end-1) % Select elements starting with 2nd, until last-but-one
ans =
'xyge'
```

- The colon operator ‘flattens’ a vector or matrix into a column
vector. The order of the elements in the resulting vector comes from
appending each column of the original array in turn. Have a look at the
order of the values in
`M(:)`

vs`M`

Content from Plotting data

Last updated on 2023-04-26 | Edit this page

Estimated time: 35 minutes

## Overview

### Questions

- How can I process and visualize my data?

### Objectives

- Perform operations on arrays of data.
- Display simple graphs.

## Analyzing patient data

Now that we know how to access data we want to compute with, we’re
ready to analyze `patient_data`

. MATLAB knows how to perform
common mathematical operations on arrays. If we want to find the average
inflammation for all patients on all days, we can just ask for the mean
of the array:

### OUTPUT

`ans = 6.1487`

We couldn’t just do `mean(patient_data)`

because, that
would compute the mean of *each column* in our table, and return
an array of mean values. The expression `patient_data(:)`

*flattens* the table into a one-dimensional array.

To get details about what a function, like `mean`

, does
and how to use it, we can search the documentation, or use MATLAB’s
`help`

command.

### OUTPUT

```
mean Average or mean value.
S = mean(X) is the mean value of the elements in X if X is a vector.
For matrices, S is a row vector containing the mean value of each
column.
For N-D arrays, S is the mean value of the elements along the first
array dimension whose size does not equal 1.
mean(X,DIM) takes the mean along the dimension DIM of X.
S = mean(...,TYPE) specifies the type in which the mean is performed,
and the type of S. Available options are:
'double' - S has class double for any input X
'native' - S has the same class as X
'default' - If X is floating point, that is double or single,
S has the same class as X. If X is not floating point,
S has class double.
S = mean(...,NANFLAG) specifies how NaN (Not-A-Number) values are
treated. The default is 'includenan':
'includenan' - the mean of a vector containing NaN values is also NaN.
'omitnan' - the mean of a vector containing NaN values is the mean
of all its non-NaN elements. If all elements are NaN,
the result is NaN.
Example:
X = [1 2 3; 3 3 6; 4 6 8; 4 7 7]
mean(X,1)
mean(X,2)
Class support for input X:
float: double, single
integer: uint8, int8, uint16, int16, uint32,
int32, uint64, int64
See also median, std, min, max, var, cov, mode.
```

We can also compute other statistics, like the maximum, minimum and standard deviation.

### MATLAB

```
>> disp('Maximum inflammation:')
>> disp(max(patient_data(:)))
>> disp('Minimum inflammation:')
>> disp(min(patient_data(:)))
>> disp('Standard deviation:')
>> disp(std(patient_data(:)))
```

### OUTPUT

```
Maximum inflammation:
20
Minimum inflammation:
0
Standard deviation:
4.6148
```

When analyzing data though, we often want to look at partial statistics, such as the maximum value per patient or the average value per day. One way to do this is to assign the data we want to a new temporary array, then ask it to do the calculation:

### MATLAB

```
>> patient_1 = patient_data(1, :)
>> disp('Maximum inflation for patient 1:')
>> disp(max(patient_1))
```

### OUTPUT

```
Maximum inflation for patient 1:
18
```

We don’t actually need to store the row in a variable of its own. Instead, we can combine the selection and the function call:

### OUTPUT

`ans = 18`

What if we need the maximum inflammation for *all* patients,
or the average for each day? As the diagram below shows, we want to
perform the operation across an axis:

To support this, MATLAB allows us to specify the *dimension*
we want to work on. If we ask for the average across the dimension 1,
we’re asking for one summary value per column, which is the average of
all the rows. In other words, we’re asking for the average inflammation
per day for all patients.

### OUTPUT

```
ans =
Columns 1 through 10
0 0.5833 0.9833 1.6667 2.5333 3.0667 3.4667 3.9000 5.2333 5.1833
Columns 11 through 20
6.2167 6.4000 7.2167 8.0500 8.8667 9.6333 10.1833 10.1500 10.4167 9.8667
Columns 21 through 30
12.2000 12.1000 11.1833 11.0333 10.0000 8.5167 8.3833 8.5167 8.1167 6.2500
Columns 31 through 40
5.7667 5.6000 5.1333 3.9000 3.7000 2.7833 2.5500 1.2833 0.9667 0.6000
```

If we average across axis 2, we’re asking for one summary value per row, which is the average of all the columns. In other words, we’re asking for the average inflammation per patient over all the days:

### OUTPUT

```
ans =
5.4500
5.4250
6.1000
5.9000
5.5500
6.2250
5.9750
6.6500
6.6250
6.5250
6.7750
5.8000
6.2250
5.7500
5.2250
6.3000
6.5500
5.7000
5.8500
6.5500
5.7750
5.8250
6.1750
6.1000
5.8000
6.4250
6.0500
6.0250
6.1750
6.5500
6.1750
6.3500
6.7250
6.1250
7.0750
5.7250
5.9250
6.1500
6.0750
5.7500
5.9750
5.7250
6.3000
5.9000
6.7500
5.9250
7.2250
6.1500
5.9500
6.2750
5.7000
6.1000
6.8250
5.9750
6.7250
5.7000
6.2500
6.4000
7.0500
5.9000
```

We can quickly check the size of this array:

### OUTPUT

```
ans =
60 1
```

The size tells us we have a 60-by-1 vector, confirming that this is the average inflammation per patient over all days in the trial.

## Plotting

The mathematician Richard Hamming once said, “The purpose of computing is insight, not numbers,” and the best way to develop insight is often to visualize data. Visualization deserves an entire lecture (or course) of its own, but we can explore a few features of MATLAB here.

Let’s display a heat map of our data:

### MATLAB

```
>> imagesc(patient_data)
>> title('Inflammation')
>> xlabel('Day of trial')
>> ylabel('Patient number')
```

The `imagesc`

function represents the matrix as a color
image. Every value in the matrix is *mapped* to a color. Blue
regions in this heat map are low values, while yellow shows high values.
As we can see, inflammation rises and falls over a 40 day period.

It’s good practice to give the figure a `title`

, and to
label the axes using `xlabel`

and `ylabel`

so that
other people can understand what it shows (including us if we return to
this plot 6 months from now).

Let’s take a look at the average inflammation over time:

### MATLAB

```
>> plot(mean(patient_data, 1))
>> title('Daily average inflammation')
>> xlabel('Day of trial')
>> ylabel('Inflammation')
```

Here, we have calculated the average per day across all patients then
used the `plot`

function to display a line graph of those
values. The result is roughly a linear rise and fall, which is
suspicious: based on other studies, we expect a sharper rise and slower
fall. Let’s have a look at two other statistics: the maximum and minimum
inflammation per day across all patients.

### MATLAB

```
>> plot(max(patient_data, [], 1))
>> title('Maximum inflammation per day')
>> ylabel('Inflammation')
>> xlabel('Day of trial')
```

### MATLAB

```
>> plot(min(patient_data, [], 1))
>> title('Minimum inflammation per day')
>> ylabel('Inflammation')
>> xlabel('Day of trial')
```

Like `mean()`

, the functions `max()`

and
`min()`

can also operate across a specified dimension of the
matrix. However, the syntax is slightly different. To see why, run a
`help`

on each of these functions.

From the figures, we see that the maximum value rises and falls perfectly smoothly, while the minimum seems to be a step function. Neither result seems particularly likely, so either there’s a mistake in our calculations or something is wrong with our data.

The x-values are the indices of the y-data, so the first y-value is plotted against index 1, the second y-value against 2 etc.

MATLAB interpolates between the points on a 2D line plot.

It is often convenient to combine multiple plots into one figure
using the `subplot`

command which plots our graphs in a grid
pattern. The first two parameters describe the grid we want to use,
while the third parameter indicates the placement on the grid.

### MATLAB

```
>> subplot(1, 2, 1)
>> plot(max(patient_data, [], 1))
>> ylabel('max')
>> xlabel('day')
>> subplot(1, 2, 2)
>> plot(min(patient_data, [], 1))
>> ylabel('min')
>> xlabel('day')
```

Our work so far has convinced us that something is wrong with our first data file. We would like to check the other 11 the same way, but typing in the same commands repeatedly is tedious and error-prone. Since computers don’t get bored (that we know of), we should create a way to do a complete analysis with a single command, and then figure out how to repeat that step once for each file. These operations are the subjects of the next two lessons.

Content from Writing MATLAB Scripts

Last updated on 2023-04-26 | Edit this page

Estimated time: 35 minutes

## Overview

### Questions

- How can I save and re-use my programs?

### Objectives

- Write and save MATLAB scripts.
- Save MATLAB plots to disk.

So far, we’ve typed in commands one-by-one on the command line to get MATLAB to do things for us. But what if we want to repeat our analysis? Sure, it’s only a handful of commands, and typing them in shouldn’t take us more than a few minutes. But if we forget a step or make a mistake, we’ll waste time rewriting commands. Also, we’ll quickly find ourselves doing more complex analyses, and we’ll need our results to be more easily reproducible.

In addition to running MATLAB commands one-by-one on the command
line, we can also write several commands in a *script*. A MATLAB
script is just a text file with a `.m`

extension. We’ve
written commands to load data from a `.csv`

file and display
some plots of statistics about that data. Let’s put those commands in a
script called `plot_patient1.m`

, which we’ll save in our
current directory,`matlab-novice-inflammation`

.

To create a new script in the current directory, we use

then we type the contents of the script:

### MATLAB

```
patient_data = readmatrix('data/inflammation-01.csv');
% Plot average inflammation per day
figure
plot(mean(patient_data, 1))
title('Daily average inflammation')
xlabel('Day of trial')
ylabel('Inflammation')
```

Note that we are explicitly creating a new figure window using the
`figure`

command.

Try this on the command line:

MATLAB’s plotting commands only create a new figure window if one doesn’t already exist: the default behaviour is to reuse the current figure window as we saw in the previous episode. Explicitly creating a new figure window in the script avoids any unexpected results from plotting on top of existing figures.

You can get MATLAB to run the commands in the script by typing in the
name of the script (without the `.m`

) in the MATLAB command
line:

### The MATLAB path

MATLAB knows about files in the current directory, but if we want to
run a script saved in a different location, we need to make sure that
this file is visible to MATLAB. We do this by adding directories to the
MATLAB **path**. The *path* is a list of directories
MATLAB will search through to locate files.

To add a directory to the MATLAB path, we go to the `Home`

tab, click on `Set Path`

, and then on
`Add with Subfolders...`

. We navigate to the directory and
add it to the path to tell MATLAB where to look for our files. When you
refer to a file (either code or data), MATLAB will search all the
directories in the path to find it. Alternatively, for data files, we
can provide the relative or absolute file path.

In this script, let’s save the figures to disk as image files using
the `print`

command. In order to maintain an organised
project we’ll save the images in the `results`

directory:

### MATLAB

```
% Plot average inflammation per day
figure
plot(mean(patient_data, 1))
title('Daily average inflammation')
xlabel('Day of trial')
ylabel('Inflammation')
% Save plot in 'results' folder as png image:
print('results/average','-dpng')
```

### Help text

You might have noticed that we described what we want our code to do
using the percent sign: `%`

. This is another plus of writing
scripts: you can comment your code to make it easier to understand when
you come back to it after a while.

A comment can appear on any line, but be aware that the first line or
block of comments in a script or function is used by MATLAB as the
**help text**. When we use the `help`

command,
MATLAB returns the *help text*. The first help text line (known
as the **H1 line**) typically includes the name of the
program, and a brief description. The `help`

command works in
just the same way for our own programs as for built-in MATLAB functions.
You should write help text for all of your own scripts and
functions.

Let’s write an H1 line at the top of our script:

We can then get help for our script by running

Let’s modify our `plot_patient1`

script so that it creates
and saves sub-plots, rather than individual plots. As before we’ll save
the images in the `results`

directory.

### MATLAB

```
%PLOT_PATIENT1 Save plots of inflammation statistics to disk.
patient_data = readmatrix('data/inflammation-01.csv');
% Plot inflammation stats for first patient
figure
subplot(1, 3, 1)
plot(mean(patient_data, 1))
title('Average')
ylabel('Inflammation')
xlabel('Day')
subplot(1, 3, 2)
plot(max(patient_data, [], 1))
title('Max')
ylabel('Inflammation')
xlabel('Day')
subplot(1, 3, 3)
plot(min(patient_data, [], 1))
title('Min')
ylabel('Inflammation')
xlabel('Day')
% Save plot in 'results' directory as png image.
print('results/inflammation-01','-dpng')
```

When saving plots to disk, it’s sometimes useful to turn off their visibility as MATLAB plots them. For example, we might not want to view (or spend time closing) the figures in MATLAB, and not displaying the figures could make the script run faster.

Let’s add a couple of lines of code to do this:

### MATLAB

```
%PLOT_PATIENT1 Save plots of inflammation statistics to disk.
patient_data = readmatrix('data/inflammation-01.csv');
% Plot inflammation stats for first patient
figure('visible', 'off')
subplot(1, 3, 1)
plot(mean(patient_data, 1))
title('Average')
ylabel('inflammation')
xlabel('Day')
subplot(1, 3, 2)
plot(max(patient_data, [], 1))
title('Max')
ylabel('Inflammation')
xlabel('Day')
subplot(1, 3, 3)
plot(min(patient_data, [], 1))
title('Min')
ylabel('Inflammation')
xlabel('Day')
% Save plot in 'results' directory as png image.
print('results/inflammation-01','-dpng')
close()
```

We can ask MATLAB to create an empty figure window without displaying
it by setting its `'visible'`

property to `'off'`

,
like so:

When we do this, we have to be careful to manually “close” the figure after we are doing plotting on it - the same as we would “close” an actual figure window if it were open:

Content from Repeating With Loops

Last updated on 2023-04-26 | Edit this page

Estimated time: 50 minutes

## Overview

### Questions

- How can I repeat the same operations on multiple values?

### Objectives

- Explain what a for loop does.
- Correctly write for loops that repeat simple commands.
- Trace changes to a loop variable as the loops runs.
- Use a for loop to process multiple files

Recall that we have to do this analysis for every one of our dozen
datasets, and we need a better way than typing out commands for each
one, because we’ll find ourselves writing a lot of duplicate code.
Remember, code that is repeated in two or more places will eventually be
wrong in at least one. Also, if we make changes in the way we analyze
our datasets, we have to introduce that change in every copy of our
code. To avoid all of this repetition, we have to teach MATLAB to repeat
our commands, and to do *that*, we have to learn how to write
*loops*.

Suppose we want to print each character in the word “lead” on a line
of its own. One way is to use four `disp`

statements:

### MATLAB

```
%LOOP_DEMO Demo script to explain loops
word = 'lead';
disp(word(1))
disp(word(2))
disp(word(3))
disp(word(4))
```

### OUTPUT

```
l
e
a
d
```

But this is a bad approach for two reasons:

It doesn’t scale: if we want to print the characters in a string that’s hundreds of letters long, we’d be better off typing them in.

It’s fragile: if we change

`word`

to a longer string, it only prints part of the data, and if we change it to a shorter one, it produces an error, because we’re asking for characters that don’t exist.

### MATLAB

```
%LOOP_DEMO Demo script to explain loops
word = 'tin';
disp(word(1))
disp(word(2))
disp(word(3))
disp(word(4))
```

### ERROR

`error: A(I): index out of bounds; value 4 out of bound 3`

There’s a better approach:

### MATLAB

```
%LOOP_DEMO Demo script to explain loops
word = 'lead';
for letter = 1:4
disp(word(letter))
end
```

### OUTPUT

```
l
e
a
d
```

This improved version uses a for loop to repeat an operation—in this case, printing to the screen—once for each element in an array.

The general form of a for loop is:

```
for variable = collection
do things with variable
end
```

The for loop executes the commands in the loop body for every value in the
array `collection`

. This value is called the loop variable, and we can call
it whatever we like. In our example, we gave it the name
`letter`

.

We have to terminate the loop body with the `end`

keyword,
and we can have as many commands as we like in the loop body. But, we
have to remember that they will all be repeated as many times as there
are values in `collection`

.

Our for loop has made our code more scalable, and less fragile. There’s still one little thing about it that should bother us. For our loop to deal appropriately with shorter or longer words, we have to change the first line of our loop by hand:

### MATLAB

```
%LOOP_DEMO Demo script to explain loops
word = 'tin';
for letter = 1:3
disp(word(letter))
end
```

### OUTPUT

```
t
i
n
```

Although this works, it’s not the best way to write our loop:

We might update

`word`

and forget to modify the loop to reflect that change.We might make a mistake while counting the number of letters in

`word`

.

Fortunately, MATLAB provides us with a convenient function to write a better loop:

### MATLAB

```
%LOOP_DEMO Demo script to explain loops
word = 'aluminum';
for letter = 1:length(word)
disp(word(letter))
end
```

### OUTPUT

```
a
l
u
m
i
n
u
m
```

This is much more robust code, as it can deal identically with words of arbitrary length. Loops are not only for working with strings, they allow us to do repetitive calculations regardless of data type. Here’s another loop that calculates the sum of all even numbers between 1 and 10:

### MATLAB

```
%LOOP_DEMO Demo script to explain loops
total = 0;
for even_number = 2 : 2 : 10
total = total + even_number;
end
disp('The sum of all even numbers between 1 and 10 is:')
disp(total)
```

It’s worth tracing the execution of this little program step by step.

### The debugger

We can use the MATLAB debugger to trace the execution of a program.

The first step is to set a **break point** by clicking
just to the right of a line number on the `-`

symbol. A red
circle will appear — this is the break point, and when we run the
script, MATLAB will pause execution at that line.

A green arrow appears, pointing to the next line to be run. To
continue running the program one line at a time, we use the
`step`

button.

We can then inspect variables in the workspace or by hovering the
cursor over where they appear in the code, or get MATLAB to evaluate
expressions in the command window (notice the prompt changes to
`K>>`

).

This process is useful to check your understanding of a program, in order to correct mistakes.

This process is illustrated below:

Since we want to sum only even numbers, the loop index
`even_number`

starts at 2 and increases by 2 with every
iteration. When we enter the loop, `total`

is zero - the
value assigned to it beforehand. The first time through, the loop body
adds the value of the first even number (2) to the old value of
`total`

(0), and updates `total`

to refer to that
new value. On the next loop iteration, `even_number`

is 4 and
the initial value of `total`

is 2, so the new value assigned
to `total`

is 6. After `even_number`

reaches the
final value (10), `total`

is 30; since this is the end of the
range for `even_number`

the loop finishes and the
`disp`

statements give us the final answer.

Note that a loop variable is just a variable that’s being used to record progress in a loop. It still exists after the loop is over, and we can re-use variables previously defined as loop variables as well:

### OUTPUT

`10`

### Performing Exponentiation

MATLAB uses the caret (`^`

) to perform exponentiation:

### OUTPUT

`125`

You can also use a loop to perform exponentiation. Remember that
`b^x`

is just `b*b*b*`

… `x`

times.

Let a variable `b`

be the base of the number and
`x`

the exponent. Write a loop to compute `b^x`

.
Check your result for `b = 4`

and `x = 5`

.

### Looping in Reverse

In MATLAB, the colon operator (`:`

) accepts a stride or skip argument between the
start and stop:

### OUTPUT

`1 4 7 10`

### OUTPUT

`11 8 5 2`

Using this, write a loop to print the letters of “aluminum” in reverse order, one letter per line.

### OUTPUT

```
m
u
n
i
m
u
l
a
```

## Analyzing patient data from multiple files

We now have almost everything we need to process multiple data files
using a loop and the plotting code in our `plot_patient1`

script.

We still need to generate a list of data files to process, and then we can use a loop to repeat the analysis for each file.

We can use the `dir`

command to return a **structure
array** containing the names of the files in the
`data`

directory. Each element in this *structure
array* is a **structure**, containing information about
a single file in the form of named **fields**.

### OUTPUT

```
files =
12×1 struct array with fields:
name
folder
date
bytes
isdir
datenum
```

To access the *name* field of the first file, we can use the
following syntax:

### OUTPUT

`inflammation-01.csv`

To get the modification date of the third file, we can do:

### OUTPUT

`26-Jul-2015 22:24:31`

A good first step towards processing multiple files is to write a
loop which prints the name of each of our files. Let’s write this in a
script `plot_all.m`

which we will then develop further:

### MATLAB

```
%PLOT_ALL Developing code to automate inflammation analysis
files = dir('data/inflammation-*.csv');
for i = 1:length(files)
file_name = files(i).name;
disp(file_name)
end
```

### OUTPUT

```
inflammation-01.csv
inflammation-02.csv
inflammation-03.csv
inflammation-04.csv
inflammation-05.csv
inflammation-06.csv
inflammation-07.csv
inflammation-08.csv
inflammation-09.csv
inflammation-10.csv
inflammation-11.csv
inflammation-12.csv
```

Another task is to generate the file names for the figures we’re
going to save. Let’s name the output file after the data file used to
generate the figure. So for the data set
`inflammation-01.csv`

we will call the figure
`inflammation-01.png`

. We can use the `replace`

command for this purpose.

The syntax for the `replace`

command is like this:

So for example if we have the string `big_shark`

and want
to get the string `little_shark`

, we can execute the
following command:

### OUTPUT

`little_shark`

Recall that we’re saving our figures to the `results`

directory. The best way to generate a path to a file in MATLAB is by
using the `fullfile`

command. This generates a file path with
the correct separators for the platform you’re using (i.e. forward slash
for Linux and macOS, and backslash for Windows). This makes your code
more portable which is great for collaboration.

Putting these concepts together, we can now generate the paths for the data files, and the image files we want to save:

### MATLAB

```
%PLOT_ALL Developing code to automate inflammation analysis
files = dir('data/inflammation-*.csv');
for i = 1:length(files)
file_name = files(i).name;
% Generate string for image name
img_name = replace(file_name, '.csv', '.png');
% Generate path to data file and image file
file_name = fullfile('data', file_name);
img_name = fullfile('results',img_name);
disp(file_name)
disp(img_name)
end
```

### OUTPUT

```
data/inflammation-01.csv
results/inflammation-01.png
data/inflammation-02.csv
results/inflammation-02.png
data/inflammation-03.csv
results/inflammation-03.png
data/inflammation-04.csv
results/inflammation-04.png
data/inflammation-05.csv
results/inflammation-05.png
data/inflammation-06.csv
results/inflammation-06.png
data/inflammation-07.csv
results/inflammation-07.png
data/inflammation-08.csv
results/inflammation-08.png
data/inflammation-09.csv
results/inflammation-09.png
data/inflammation-10.csv
results/inflammation-10.png
data/inflammation-11.csv
results/inflammation-11.png
data/inflammation-12.csv
results/inflammation-12.png
```

We’re now ready to modify `plot_all.m`

to actually process
multiple data files:

### MATLAB

```
%PLOT_ALL Print statistics for all patients.
% Save plots of statistics to disk.
files = dir('data/inflammation-*.csv');
% Process each file in turn
for i = 1:length(files)
file_name = files(i).name;
% Generate strings for image names:
img_name = replace(file_name, '.csv', '.png');
% Generate path to data file and image file
file_name = fullfile('data', file_name);
img_name = fullfile('results', img_name);
patient_data = readmatrix(file_name);
% Create figures
figure('visible', 'off')
subplot(2, 2, 1)
plot(mean(patient_data, 1))
title('Average')
ylabel('Inflammation')
xlabel('Day')
subplot(2, 2, 2)
plot(max(patient_data, [], 1))
title('Max')
ylabel('Inflammation')
xlabel('Day')
subplot(2, 2, 3)
plot(min(patient_data, [], 1))
title('Min')
ylabel('Inflammation')
xlabel('Day')
print(img_name, '-dpng')
close()
end
```

We run the modified script using its name in the Command Window:

The first three figures output to the `results`

directory
are as shown below:

Sure enough, the maxima of these data sets show exactly the same ramp as the first, and their minima show the same staircase structure.

We’ve now automated the analysis and have confirmed that all the data files we have looked at show the same artifact. This is what we set out to test, and now we can just call one script to do it. With minor modifications, this script could be re-used to check all our future data files.

Content from Making Choices

Last updated on 2023-04-26 | Edit this page

Estimated time: 40 minutes

## Overview

### Questions

- How can programs do different things for different data values?

### Objectives

- Construct a conditional statement using if, elseif, and else
- Test for equality within a conditional statement
- Combine conditional tests using AND and OR
- Build a nested loop

Our previous lessons have shown us how to manipulate data and repeat things. However, the programs we have written so far always do the same things, regardless of what data they’re given. We want programs to make choices based on the values they are manipulating.

The tool that MATLAB gives us for doing this is called a conditional statement, and it looks like this:

### OUTPUT

```
not greater
done
```

The second line of this code uses the keyword `if`

to tell
MATLAB that we want to make a choice. If the test that follows is true,
the body of the `if`

(i.e., the lines between `if`

and `else`

) are executed. If the test is false, the body of
the `else`

(i.e., the lines between `else`

and
`end`

) are executed instead. Only one or the other is ever
executed.

Conditional statements don’t have to have an `else`

block.
If there isn’t one, MATLAB simply doesn’t do anything if the test is
false:

### MATLAB

```
num = 53;
disp('before conditional...')
if num > 100
disp('53 is greater than 100')
end
disp('...after conditional')
```

### OUTPUT

```
before conditional...
...after conditional
```

We can also chain several tests together using `elseif`

.
This makes it simple to write a script that gives the sign of a
number:

### MATLAB

```
%CONDITIONAL_DEMO Demo script to illustrate use of conditionals
num = 53;
if num > 0
disp('num is positive')
elseif num == 0
disp('num is zero')
else
disp('num is negative')
end
```

One important thing to notice in the code above is that we use a
double equals sign `==`

to test for equality rather than a
single equals sign. This is because the latter is used to mean
assignment. In our test, we want to check for the equality of
`num`

and `0`

, not *assign* 0 to
`num`

. This convention was inherited from C, and it does take
a bit of getting used to…

During a conditional statement, if one of the conditions is true, this marks the end of the test: no subsequent conditions will be tested and execution jumps to the end of the conditional.

Let’s demonstrate this by adding another condition which is true.

### MATLAB

```
% Demo script to illustrate use of conditionals
num = 53;
if num > 0
disp('num is positive')
elseif num == 0
disp('num is zero')
elseif num > 50
% This block will never be executed
disp('num is greater than 50')
else
disp('num is negative')
end
```

We can also combine tests, using `&&`

(and) and
`||`

(or). `&&`

is true if both tests are
true:

### OUTPUT

`one part is not true`

`||`

is true if either test is true:

### OUTPUT

`at least one part is true`

In this case, “either” means “either or both”, not “either one or the other but not both”.

### True and False Statements

The conditions we have tested above evaluate to a logical value:
`true`

or `false`

. However these numerical
comparison tests aren’t the only values which are `true`

or
`false`

in MATLAB. For example, `1`

is considered
`true`

and `0`

is considered `false`

.
In fact, *any* value can be used in a conditional statement.

Run the code below in order to discover which values are considered
`true`

and which are considered `false`

.

### MATLAB

```
if ''
disp('empty string is true')
else
disp('empty string is false')
end
if 'foo'
disp('non empty string is true')
else
disp('non empty string is false')
end
if []
disp('empty array is true')
else
disp('empty array is false')
end
if [22.5, 1.0]
disp('non empty array is true')
else
disp('non empty array is false')
end
if [0, 0]
disp('array of zeros is true')
else
disp('array of zeros is false')
end
if true
disp('true is true')
else
disp('true is false')
end
```

Another thing to realize is that `if`

statements can also
be combined with loops. For example, if we want to sum the positive
numbers in a list, we can write this:

### MATLAB

```
numbers = [-5, 3, 2, -1, 9, 6];
total = 0;
for n = numbers
if n >= 0
total = total + n;
end
end
disp(['sum of positive values: ', num2str(total)])
```

### OUTPUT

`sum of positive values: 20`

With a little extra effort, we can calculate the positive and negative sums in a loop:

### MATLAB

```
pos_total = 0;
neg_total = 0;
for n = numbers
if n >= 0
pos_total = pos_total + n;
else
neg_total = neg_total + n;
end
end
disp(['sum of positive values: ', num2str(pos_total)])
disp(['sum of negative values: ', num2str(neg_total)])
```

### OUTPUT

```
sum of positive values: 26
sum of negative values: -6
```

We can even put one loop inside another:

### OUTPUT

```
1a
1b
2a
2b
3a
3b
```

Currently, our script `plot_all.m`

reads in data, analyzes
it, and saves plots of the results. If we would rather display the plots
interactively, we would have to remove (or *comment out*) the
following code:

And, we’d also have to change this line of code, from:

to:

This is not a lot of code to change every time, but it’s still work
that’s easily avoided using conditionals. Here’s our script re-written
to use *conditionals* to switch between saving plots as images
and plotting them interactively:

### MATLAB

```
%PLOT_ALL Save plots of statistics to disk.
% Use variable plot_switch to control interactive plotting
% vs saving images to disk.
% plot_switch = 0: show plots interactively
% plot_switch = 1: save plots to disk
plot_switch = 0;
files = dir('data/inflammation-*.csv');
% Process each file in turn
for i = 1:length(files)
file_name = files(i).name;
% Generate strings for image names:
img_name = replace(file_name, '.csv', '.png');
% Generate path to data file and image file
file_name = fullfile('data', filename);
img_name = fullfile('results', img_name);
patient_data = readmatrix(file_name);
% Create figures
if plot_switch == 1
figure('visible', 'off')
else
figure('visible', 'on')
end
subplot(2, 2, 1)
plot(mean(patient_data, 1))
title('Average')
ylabel('Inflammation')
xlabel('Day')
subplot(2, 2, 2)
plot(max(patient_data, [], 1))
title('Max')
ylabel('Inflammation')
xlabel('Day')
subplot(2, 2, 3)
plot(min(patient_data, [], 1))
title('Min')
ylabel('Inflammation')
xlabel('Day')
if plot_switch == 1
print(img_name, '-dpng')
close()
end
end
```

Content from Creating Functions

Last updated on 2023-04-26 | Edit this page

Estimated time: 65 minutes

## Overview

### Questions

- How can I teach MATLAB how to do new things?

### Objectives

- Compare and contrast MATLAB function files with MATLAB scripts.
- Define a function that takes arguments.
- Test a function.
- Recognize why we should divide programs into small, single-purpose functions.

If we only had one data set to analyze, it would probably be faster to load the file into a spreadsheet and use that to plot some simple statistics. But we have twelve files to check, and may have more in future. In this lesson, we’ll learn how to write a function so that we can repeat several operations with a single command.

Let’s start by defining a function `fahr_to_kelvin`

that
converts temperatures from Fahrenheit to Kelvin:

### MATLAB

```
function ktemp = fahr_to_kelvin(ftemp)
%FAHR_TO_KELVIN Convert Fahrenheit to Kelvin
ktemp = ((ftemp - 32) * (5/9)) + 273.15;
end
```

A MATLAB function *must* be saved in a text file with a
`.m`

extension. The name of that file must be the same as the
function defined inside it. The name must start with a letter and cannot
contain spaces. So, you will need to save the above code in a file
called `fahr_to_kelvin.m`

. Remember to save your m-files in
the current directory.

The first line of our function is called the *function
definition*, and it declares that we’re writing a function named
`fahr_to_kelvin`

, that has a single input
parameter,`ftemp`

, and a single output parameter,
`ktemp`

. Anything following the function definition line is
called the *body* of the function. The keyword `end`

marks the end of the function body, and the function won’t know about
any code after `end`

.

A function can have multiple input and output parameters if required, but isn’t required to have any of either. The general form of a function is shown in the pseudo-code below:

### MATLAB

```
function [out1, out2] = function_name(in1, in2)
%FUNCTION_NAME Function description
% This section below is called the body of the function
out1 = something calculated;
out2 = something else;
end
```

Just as we saw with scripts, functions must be *visible* to
MATLAB, i.e., a file containing a function has to be placed in a
directory that MATLAB knows about. The most convenient of those
directories is the current working directory.

We can call our function from the command line like any other MATLAB function:

### OUTPUT

`ans = 273.15`

When we pass a value, like `32`

, to the function, the
value is assigned to the variable `ftemp`

so that it can be
used inside the function. If we want to return a value from the
function, we must assign that value to a variable named
`ktemp`

-–in the first line of our function, we promised that
the output of our function would be named `ktemp`

.

Outside of the function, the variables `ftemp`

and
`ktemp`

aren’t visible; they are only used by the function
body to refer to the input and output values.

This is one of the major differences between scripts and functions: a script can be thought of as automating the command line, with full access to all variables in the base workspace, whereas a function can only read and write variables from the calling workspace if they are passed as arguments — i.e. a function has its own separate workspace.

Now that we’ve seen how to convert Fahrenheit to Kelvin, it’s easy to convert Kelvin to Celsius.

### MATLAB

```
function ctemp = kelvin_to_celsius(ktemp)
%KELVIN_TO_CELSIUS Convert from Kelvin to Celcius
ctemp = ktemp - 273.15;
end
```

Again, we can call this function like any other:

### OUTPUT

`ans = -273.15`

What about converting Fahrenheit to Celsius? We could write out the formula, but we don’t need to. Instead, we can compose the two functions we have already created:

### MATLAB

```
function ctemp = fahr_to_celsius(ftemp)
%FAHR_TO_CELSIUS Convert Fahrenheit to Celcius
ktemp = fahr_to_kelvin(ftemp);
ctemp = kelvin_to_celsius(ktemp);
end
```

Calling this function,

we get, as expected:

### OUTPUT

`ans = 0`

This is our first taste of how larger programs are built: we define basic operations, then combine them in ever-larger chunks to get the effect we want. Real-life functions will usually be larger than the ones shown here—typically half a dozen to a few dozen lines—but they shouldn’t ever be much longer than that, or the next person who reads it won’t be able to understand what’s going on.

### OUTPUT

```
259.8167
278.1500
273.1500
0
```

`ktemp`

is 0 because the function
`fahr_to_kelvin`

has no knowledge of the variable
`ktemp`

which exists outside of the function.

Once we start putting things in functions so that we can re-use them, we need to start testing that those functions are working correctly. To see how to do this, let’s write a function to center a dataset around a particular value:

We could test this on our actual data, but since we don’t know what the values ought to be, it will be hard to tell if the result was correct, Instead, let’s create a matrix of 0’s, and then center that around 3:

### OUTPUT

```
ans =
3 3
3 3
```

That looks right, so let’s try out `center`

function on
our real data:

It’s hard to tell from the default output whether the result is correct–this is often the case when working with fairly large arrays–but, there are a few simple tests that will reassure us.

Let’s calculate some simple statistics:

### OUTPUT

`0.00000 6.14875 20.00000`

And let’s do the same after applying our `center`

function
to the data:

### OUTPUT

` -6.1487 -0.0000 13.8513`

That seems almost right: the original mean was about 6.1, so the lower bound from zero is now about -6.1. The mean of the centered data isn’t quite zero–we’ll explore why not in the challenges–but it’s pretty close. We can even go further and check that the standard deviation hasn’t changed:

### OUTPUT

`5.3291e-15`

The difference is very small. It’s still possible that our function is wrong, but it seems unlikely enough that we should probably get back to doing our analysis. We have one more task first, though: we should write some documentation for our function to remind ourselves later what it’s for and how to use it.

### MATLAB

```
function out = center(data, desired)
%CENTER Center data around a desired value.
%
% center(DATA, DESIRED)
%
% Returns a new array containing the values in
% DATA centered around the value.
out = (data - mean(data(:))) + desired;
end
```

Comment lines immediately below the function definition line are
called “help text”. Typing `help function_name`

brings up the
help text for that function:

### OUTPUT

```
Center Center data around a desired value.
center(DATA, DESIRED)
Returns a new array containing the values in
DATA centered around the value.
```

### Testing a Function

Write a function called

`normalise`

that takes an array as input and returns an array of the same shape with its values scaled to lie in the range 0.0 to 1.0. (If L and H are the lowest and highest values in the input array, respectively, then the function should map a value v to (v - L)/(H - L).) Be sure to give the function a comment block explaining its use.Run

`help linspace`

to see how to use`linspace`

to generate regularly-spaced values. Use arrays like this to test your`normalise`

function.

```
```
function out = normalise(in)
%NORMALISE Return original array, normalised so that the
% new values lie in the range 0 to 1.
H = max(max(in));
L = min(min(in));
out = (in-L)/(H-L);
end
```
{: .language-matlab}
```

```
```
a = linspace(1, 10); % Create evenly-spaced vector
norm_a = normalise(a); % Normalise vector
plot(a, norm_a) % Visually check normalisation
```
{: .language-matlab}
```

### Convert a script into a function

Write a function called `plot_dataset`

which plots the
three summary graphs (max, min, std) for a given inflammation data
file.

The function should operate on a single data file, and should have
two parameters: `file_name`

and `plot_switch`

.
When called, the function should create the three graphs produced in the
previous lesson. Whether they are displayed or saved to the
`results`

directory should be controlled by the value of
`plot_switch`

i.e. `plot_dataset('data/inflammation-01.csv', 0)`

should
display the corresponding graphs for the first data set;
`plot_dataset('data/inflammation-02.csv', 1)`

should save the
figures for the second dataset to the `results`

directory.

You should mostly be reusing code from the `plot_all`

script.

Be sure to give your function help text.

### MATLAB

```
function plot_dataset(file_name, plot_switch)
%PLOT_DATASET Perform analysis for named data file.
% Create figures to show average, max and min inflammation.
% Display plots in GUI using plot_switch = 0,
% or save to disk using plot_switch = 1.
%
% Example:
% plot_dataset('data/inflammation-01.csv', 0)
% Generate string for image name:
img_name = replace(file_name, '.csv', '.png');
img_name = replace(img_name, 'data', 'results');
patient_data = readmatrix(file_name);
if plot_switch == 1
figure('visible', 'off')
else
figure('visible', 'on')
end
subplot(2, 2, 1)
plot(mean(patient_data, 1))
ylabel('average')
subplot(2, 2, 2)
plot(max(patient_data, [], 1))
ylabel('max')
subplot(2, 2, 3)
plot(min(patient_data, [], 1))
ylabel('min')
if plot_switch == 1
print(img_name, '-dpng')
close()
end
end
```

### MATLAB

```
%PLOT_ALL Analyse all inflammation datasets
% Create figures to show average, max and min inflammation.
% Save figures to 'results' directory.
files = dir('data/inflammation-*.csv');
for i = 1:length(files)
file_name = files(i).name;
file_name = fullfile('data', file_name);
% Process each data set, saving figures to disk.
plot_dataset(file_name, 1);
end
```

We have now solved our original problem: we can analyze any number of data files with a single command. More importantly, we have met two of the most important ideas in programming:

Use arrays to store related values, and loops to repeat operations on them.

Use functions to make code easier to re-use and easier to understand.

Content from Defensive Programming

Last updated on 2023-04-26 | Edit this page

Estimated time: 30 minutes

## Overview

### Questions

- How can I make sure my programs are correct?

### Objectives

- Explain what an assertion is.
- Add assertions to programs that correctly check the program’s state.
- Correctly add precondition and postcondition assertions to functions.
- Explain what test-driven development is, and use it when creating new functions.
- Explain why variables should be initialized using actual data values rather than arbitrary constants.
- Debug code containing an error systematically.

Our previous lessons have introduced the basic tools of programming:
variables and lists, file I/O, loops, conditionals, and functions. What
they *haven’t* done is show us how to tell whether a program is
getting the right answer, and how to tell if it’s *still* getting
the right answer as we make changes to it.

To achieve that, we need to:

- write programs that check their own operation,
- write and run tests for widely-used functions, and
- make sure we know what “correct” actually means.

The good news is, doing these things will speed up our programming, not slow it down.

The first step toward getting the right answers from our programs is
to assume that mistakes *will* happen and to guard against them.
This is called defensive
programming, and the most common way to do it is to add assertions to our code so that it
checks itself as it runs. An assertion is simply a statement that
something must be true at a certain point in a program. When MATLAB sees
one, it checks the assertion’s condition. If it’s true, MATLAB does
nothing, but if it’s false, MATLAB halts the program immediately and
prints an error message. For example, this piece of code halts as soon
as the loop encounters a value that isn’t positive:

### MATLAB

```
numbers = [1.5, 2.3, 0.7, -0.001, 4.4];
total = 0;
for n = numbers
assert(n >= 0)
total = total + n;
end
```

### OUTPUT

`error: assert (n >= 0) failed`

Programs like the Firefox browser are full of assertions: 10-20% of the code they contain are there to check that the other 80-90% are working correctly. Broadly speaking, assertions fall into three categories:

- A precondition is something that must be true at the start of a function in order for it to work correctly.
- A postcondition is something that the function guarantees is true when it finishes.
- An invariant is something that is always true at a particular point inside a piece of code.

For example, suppose we are representing rectangles using an array of
four coordinates `(x0, y0, x1, y1)`

, such that (x0,y0) are
the bottom left coordinates, and (x1,y1) are the top right coordinates.
In order to do some calculations, we need to normalize the rectangle so
that it is at the origin, measures 1.0 units on its longest axis, and is
oriented so the longest axis is the y axis. Here is a function that does
that, but checks that its input is correctly formatted and that its
result makes sense:

### MATLAB

```
function normalized = normalize_rectangle(rect)
%NORMALIZE_RECTANGLE
% Normalizes a rectangle so that it is at the origin
% measures 1.0 units on its longest axis
% and is oriented with the longest axis in the y direction:
assert(length(rect) == 4, 'Rectangle must contain 4 coordinates')
x0 = rect(1);
y0 = rect(2);
x1 = rect(3);
y1 = rect(4);
assert(x0 < x1, 'Invalid X coordinates')
assert(y0 < y1, 'Invalid Y coordinates')
dx = x1 - x0;
dy = y1 - y0;
if dx > dy
scaled = dx/dy;
upper_x = scaled;
upper_y = 1.0;
else
scaled = dx/dy;
upper_x = scaled;
upper_y = 1.0;
end
assert ((0 < upper_x) && (upper_x <= 1.0), 'Calculated upper X coordinate invalid');
assert ((0 < upper_y) && (upper_y <= 1.0), 'Calculated upper Y coordinate invalid');
normalized = [0, 0, upper_x, upper_y];
end
```

The three preconditions catch invalid inputs:

### ERROR

```
Error using normalize_rectangle (line 6)
Rectangle must contain 4 coordinates
```

### ERROR

`Error: Rectangle must contain 4 coordinates`

### ERROR

`error: Invalid X coordinates`

The post-conditions help us catch bugs by telling us when our calculations cannot have been correct. For example, if we normalize a rectangle that is taller than it is wide, everything seems OK:

### OUTPUT

`0.00000 0.00000 0.20000 1.00000`

but if we normalize one that’s wider than it is tall, the assertion is triggered:

### ERROR

`error: Calculated upper X coordinate invalid`

Re-reading our function, we realize that line 19 should divide
`dy`

by `dx`

. If we had left out the assertion at
the end of the function, we would have created and returned something
that had the right shape as a valid answer, but wasn’t. Detecting and
debugging *that* would almost certainly have taken more time in
the long run than writing the assertion.

But assertions aren’t just about catching errors: they also help people understand programs. Each assertion gives the person reading the program a chance to check (consciously or otherwise) that their understanding matches what the code is doing.

Most good programmers follow two rules when adding assertions to
their code. The first is, *fail early, fail often*. The greater
the distance between when and where an error occurs and when it’s
noticed, the harder the error will be to debug, so good code catches
mistakes as early as possible.

The second rule is, *turn bugs into assertions or tests*. If
you made a mistake in a piece of code, the odds are good that you have
made other mistakes nearby, or will make the same mistake (or a related
one) the next time you change it. Writing assertions to check that you
haven’t regressed (i.e., haven’t
re-introduced an old problem) can save a lot of time in the long run,
and helps to warn people who are reading the code (including your future
self) that this bit is tricky.

### Finding Conditions

Suppose you are writing a function called `average`

that
calculates the average of the numbers in a list. What pre-conditions and
post-conditions would you write for it? Compare your answer to your
neighbor’s: can you think of an input array that will pass your tests
but not hers or vice versa?

An assertion checks that something is true at a particular point in the program. The next step is to check the overall behavior of a piece of code, i.e., to make sure that it produces the right output when it’s given a particular input. For example, suppose we need to find where two or more time series overlap. The range of each time series is represented as a pair of numbers, which are the time the interval started and ended. The output is the largest range that they all include:

Most novice programmers would solve this problem like this:

- Write a function
`range_overlap`

. - Call it interactively on two or three different inputs.
- If it produces the wrong answer, fix the function and re-run that test.

This clearly works–after all, thousands of scientists are doing it right now–but there’s a better way:

- Write a short function for each test.
- Write a
`range_overlap`

function that should pass those tests. - If
`range_overlap`

produces any wrong answers, fix it and re-run the test functions.

Writing the tests *before* writing the function they exercise
is called test-driven
development (TDD). Its advocates believe it produces better code
faster because:

If people write tests after writing the thing to be tested, they are subject to confirmation bias, i.e., they subconsciously write tests to show that their code is correct, rather than to find errors.

Writing tests helps programmers figure out what the function is actually supposed to do.

Below are three test functions for `range_overlap`

, but
first we need a brief aside to explain some MATLAB behaviour.

- Argument format required for
`assert`

`assert`

requires a scalar logical input argument
(i.e. `true`

(1) or `false`

(0)).

- Testing for equality between matrices

The syntax `matA == matB`

returns a matrix of logical
values. If we just want a `true`

or `false`

answer
for the whole matrix (e.g. to use with `assert`

), we need to
use `isequal`

instead.

### MATLAB

```
%TEST_RANGE_OVERLAP Test script for range_overlap function.
% assert(isnan(range_overlap([0.0, 1.0], [5.0, 6.0])))
% assert(isnan(range_overlap([0.0, 1.0], [1.0, 2.0])))
assert(isequal(range_overlap([0, 1.0]),[0, 1.0]))
assert(isequal(range_overlap([2.0, 3.0], [2.0, 4.0]),[2.0, 3.0]))
assert(isequal(range_overlap([0.0, 1.0], [0.0, 2.0], [-1.0, 1.0]),[0.0, 1.0]))
```

### ERROR

```
Undefined function or variable 'range_overlap'.
Error in test_range_overlap (line 6)
assert(isequal(range_overlap([0, 1.0]),[0, 1.0]))
```

The error is actually reassuring: we haven’t written
`range_overlap`

yet, so if the tests passed, it would be a
sign that someone else had and that we were accidentally using their
function.

And as a bonus of writing these tests, we’ve implicitly defined what our inputs and output look like: we expect two or more input arguments, each of which is a vector with length = 2, and we return a single vector as output.

Given that `range_overlap`

will have to accept varying
numbers of input arguments, we need to learn how to deal with an unknown
number of input arguments before we can write
`range_overlap`

. Consider the example below from the MATLAB
documentation:

### MATLAB

```
function varlist(varargin)
%VARLIST Demonstrate variable number of input arguments
fprintf('Number of arguments: %d\n',nargin)
celldisp(varargin)
```

### OUTPUT

```
Number of arguments: 3
varargin{1} =
1 1 1
1 1 1
1 1 1
varargin{2} =
some text
varargin{3} =
3.1416
```

MATLAB has a special variable `varargin`

which can be used
as the last parameter in a function definition to represent a variable
number of inputs. Within the function `varargin`

is a
**cell array** containing the input arguments, and the
variable `nargin`

gives the number of input arguments used
during the function call. A *cell array* is a MATLAB data type
with indexed data containers called cells. Each cell can contain any
type of data, and is indexed using braces, or “curly brackets”
`{}`

.

### Variable number of input arguments

Using what we have just learned about `varargin`

and
`nargin`

fill in the blanks below to complete the first draft
of our `range_overlap`

function.

### MATLAB

```
function overlap = range_overlap(________)
%RANGE_OVERLAP Return common overlap among a set of [low, high] ranges.
lowest = -inf;
highest = +inf;
% Loop over input arguments
for i = 1:______
% Set range equal to each input argument
range = ________{i};
low = range(1);
high = range(2);
lowest = max(lowest, low);
highest = min(highest, high);
end
overlap = [lowest, highest];
```

### MATLAB

```
function overlap = range_overlap(varargin)
%RANGE_OVERLAP Return common overlap among a set of [low, high] ranges.
lowest = -inf;
highest = +inf;
% Loop over input arguments
for i = 1:nargin
% Set range equal to each input argument
range = varargin{i};
low = range(1);
high = range(2);
lowest = max(lowest, low);
highest = min(highest, high);
end
overlap = [lowest, highest];
```

Now that we have written the function `range_overlap`

,
when we run the tests:

we shouldn’t see an error.

Something important is missing, though. We don’t have any tests for the case where the ranges don’t overlap at all, or for the case where the ranges overlap at a point. We’ll leave this as a final assignment.

### Fixing Code Through Testing

Fix `range_overlap`

. Uncomment the two commented lines in
`test_range_overlap`

. You’ll see that our objective is to
return a special value: `NaN`

(Not a Number), for the
following cases:

- The ranges don’t overlap.
- The ranges overlap at their endpoints.

As you make change to the code, run `test_range_overlap`

regularly to make sure you aren’t breaking anything. Once you’re done,
running `test_range_overlap`

shouldn’t raise any errors.

Hint: read about the `isnan`

function in the help files to
make sure you understand what these first two lines are doing.

### MATLAB

```
function overlap = range_overlap(varargin)
%RANGE_OVERLAP Return common overlap among a set of [low, high] ranges.
lowest = -inf;
highest = +inf;
% Loop over input arguments
for i = 1:nargin
% Set range equal to each input argument
range = varargin{i};
low = range(1);
high = range(2);
lowest = max(lowest, low);
highest = min(highest, high);
% Catch non-overlapping ranges
if low >= highest || high<=lowest
output_range = NaN;
return
end
end
overlap = [lowest, highest];
```

## Debugging strategies

Once testing has uncovered problems, the next step is to fix them. Many novices do this by making more-or-less random changes to their code until it seems to produce the right answer, but that’s very inefficient (and the result is usually only correct for the one case they’re testing). The more experienced a programmer is, the more systematically they debug, and most follow some variation on the rules explained below.

The first step in debugging something is to *know what it’s
supposed to do*. “My program doesn’t work” isn’t good enough: in
order to diagnose and fix problems, we need to be able to tell correct
output from incorrect. If we can write a test case for the failing
case—i.e., if we can assert that with *these* inputs, the
function should produce *that* result—then we’re ready to start
debugging. If we can’t, then we need to figure out how we’re going to
know when we’ve fixed things.

But writing test cases for scientific software is frequently harder than writing test cases for commercial applications, because if we knew what the output of the scientific code was supposed to be, we wouldn’t be running the software: we’d be writing up our results and moving on to the next program. In practice, scientists tend to do the following:

*Test with simplified data.*Before doing statistics on a real data set, we should try calculating statistics for a single record, for two identical records, for two records whose values are one step apart, or for some other case where we can calculate the right answer by hand.*Test a simplified case.*If our program is supposed to simulate magnetic eddies in rapidly-rotating blobs of supercooled helium, our first test should be a blob of helium that isn’t rotating, and isn’t being subjected to any external electromagnetic fields. Similarly, if we’re looking at the effects of climate change on speciation, our first test should hold temperature, precipitation, and other factors constant.*Compare to an oracle.*A test oracle is something—experimental data, an older program whose results are trusted, or even a human expert—against which we can compare the results of our new program. If we have a test oracle, we should store its output for particular cases so that we can compare it with our new results as often as we like without re-running that program.*Check conservation laws.*Mass, energy, and other quantities are conserved in physical systems, so they should be in programs as well. Similarly, if we are analyzing patient data, the number of records should either stay the same or decrease as we move from one analysis to the next (since we might throw away outliers or records with missing values). If “new” patients start appearing out of nowhere as we move through our pipeline, it’s probably a sign that something is wrong.*Visualize.*Data analysts frequently use simple visualizations to check both the science they’re doing and the correctness of their code (just as we did in the opening lesson of this tutorial). This should only be used for debugging as a last resort, though, since it’s very hard to compare two visualizations automatically.

We can only debug something when it fails, so the second step is
always to find a test case that *makes it fail every time*. The
“every time” part is important because few things are more frustrating
than debugging an intermittent problem: if we have to call a function a
dozen times to get a single failure, the odds are good that we’ll scroll
past the failure when it actually occurs.

As part of this, it’s always important to check that our code is “plugged in”, i.e., that we’re actually exercising the problem that we think we are. Every programmer has spent hours chasing a bug, only to realize that they were actually calling their code on the wrong data set or with the wrong configuration settings, or are using the wrong version of the software entirely. Mistakes like these are particularly likely to happen when we’re tired, frustrated, and up against a deadline, which is one of the reasons late-night (or overnight) coding sessions are almost never worthwhile.

If it takes 20 minutes for the bug to surface, we can only do three
experiments an hour. That doesn’t mean we’ll get less data in more time:
we’re also more likely to be distracted by other things as we wait for
our program to fail, which means the time we *are* spending on
the problem is less focused. It’s therefore critical to *make it fail
fast*.

As well as making the program fail fast in time, we want to make it fail fast in space, i.e., we want to localize the failure to the smallest possible region of code:

The smaller the gap between cause and effect, the easier the connection is to find. Many programmers therefore use a divide and conquer strategy to find bugs, i.e., if the output of a function is wrong, they check whether things are OK in the middle, then concentrate on either the first or second half, and so on.

N things can interact in N

^{2}different ways, so every line of code that*isn’t*run as part of a test means more than one thing we don’t need to worry about.

Replacing random chunks of code is unlikely to do much good. (After
all, if you got it wrong the first time, you’ll probably get it wrong
the second and third as well.) Good programmers therefore *change one
thing at a time*, for a reason. They are either trying to gather
more information (“is the bug still there if we change the order of the
loops?”) or test a fix (“can we make the bug go away by sorting our data
before processing it?”).

Every time we make a change, however small, we should re-run our
tests immediately, because the more things we change at once, the harder
it is to know what’s responsible for what (those N^{2}
interactions again). And we should re-run *all* of our tests:
more than half of fixes made to code introduce (or re-introduce) bugs,
so re-running all of our tests tells us whether we have regressed.

Good scientists keep track of what they’ve done so that they can
reproduce their work, and so that they don’t waste time repeating the
same experiments or running ones whose results won’t be interesting.
Similarly, debugging works best when we *keep track of what we’ve
done* and how well it worked. If we find ourselves asking, “Did left
followed by right with an odd number of lines cause the crash? Or was it
right followed by left? Or was I using an even number of lines?” then
it’s time to step away from the computer, take a deep breath, and start
working more systematically.

Records are particularly useful when the time comes to ask for help. People are more likely to listen to us when we can explain clearly what we did, and we’re better able to give them the information they need to be useful.

Version control is often used to reset software to a known state
during debugging, and to explore recent changes to code that might be
responsible for bugs. In particular, most version control systems have a
`blame`

command that will show who last changed particular
lines of code…

And speaking of help: if we can’t find a bug in 10 minutes, we should
*be humble* and ask for help. Just explaining the problem aloud
is often useful, since hearing what we’re thinking helps us spot
inconsistencies and hidden assumptions.

Asking for help also helps alleviate confirmation bias. If we have just spent an hour writing a complicated program, we want it to work, so we’re likely to keep telling ourselves why it should, rather than searching for the reason it doesn’t. People who aren’t emotionally invested in the code can be more objective, which is why they’re often able to spot the simple mistakes we have overlooked.

Part of being humble is learning from our mistakes. Programmers tend to get the same things wrong over and over: either they don’t understand the language and libraries they’re working with, or their model of how things work is wrong. In either case, taking note of why the error occurred and checking for it next time quickly turns into not making the mistake at all.

And that is what makes us most productive in the long run. As the
saying goes, *A week of hard work can sometimes save you an hour of
thought*. If we train ourselves to avoid making some kinds of
mistakes, to break our code into modular, testable chunks, and to turn
every assumption (or mistake) into an assertion, it will actually take
us *less* time to produce working programs, not more.