Round 2.1: Filters and pipes

Nov 6, 2012 • Henry Chong

Filters are simple UNIX command-line utilities used to manipulate text files, and pipes are the shell mechanism that connects them.  Simple filter commands, such as head, tail, grep, or cut, extract selected rows or columns from a text file.  Commands such as cat and paste join files end to end or side by side.  These file manipulation commands lend themselves especially well to handling data stored in delimited text files (e.g., csv format).  A pipe sends the (standard) output of one command to the (standard) input of the next, so that filters can be chained.  Sophisticated tools for parsing, characterizing, and formatting data files can be constructed by cascading filter commands with pipes.
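
As a minimal sketch of how such filters chain together, the commands below build a small file and count how many rows each location contributes.  The filenames sample.csv and location_counts.txt are hypothetical, and sort and uniq are additional standard filters brought in only for this illustration.

```shell
# Build a tiny sample file (three data rows under a header row).
cat > sample.csv <<'EOF'
Date,Time,Location,Diameter,X,Y
01/01/2010,10:30:05.54,Front,4.551,0.2,-0.3
01/01/2010,10:30:07.66,TopFlange,14.37,-1.18,1.77
01/01/2010,10:30:10.87,Front,4.53,-0.18,0.143
EOF

# head: print only the first line (the header row).
head -n 1 sample.csv

# tail -n +2: start printing at line 2, i.e. skip the header.
# cut -d, -f3: keep only the third comma-separated field (Location).
# sort | uniq -c: group identical locations and count them.
tail -n +2 sample.csv | cut -d, -f3 | sort | uniq -c > location_counts.txt
cat location_counts.txt
```

Each stage reads only what the previous stage wrote, so any stage can be swapped out or inspected on its own by cutting the pipeline short.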

Example

We have a data file of real-time observations of beam diameters and beam positions at different locations in the beam path of a laser system.  The filename is observation_data.csv.  The columns of the file are dates, times, locations, beam diameters, and beam positions, separated by commas, e.g.,

Date,Time,Location,Diameter,X,Y
01/01/2010,10:30:05.54,Front,4.551,0.2,-0.3
01/01/2010,10:30:07.66,TopFlange,14.37,-1.18,1.77
01/01/2010,10:30:10.87,Front,4.53,-0.18,0.143

02/02/2010,16:35:14.8,BackMount,3.553,-2.43,3.14
02/02/2010,19:49:56.096,EntryPort,,6.133,1.56.18,2.676
02/02/2010,21:52:11.656,Flange,24.53,-0.123,10.234

The data file contains two months' worth of data taken roughly every five seconds at seven different locations in the laser system.

The grep command searches for a string or pattern in a text file, and sends all rows with the string to the standard output.  For example,

% grep '01/02/2010' observation_data.csv

will send all rows in the file observation_data.csv that contain the string '01/02/2010' to the standard output, effectively selecting all measurements made on January 2, 2010, at all locations in the laser system.  The rows will be printed to your screen.  To save the output to a file for later inspection, we can use the greater-than sign (>) to redirect the standard output from the screen to a file,

% grep '01/02/2010' observation_data.csv > data_from_20100102.csv
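
The sketch below illustrates redirection on a small stand-in file.  The filenames observation_sample.csv, sample_20100102.csv, and with_header.csv are hypothetical, and the rows are made-up values in the same column layout.  It also shows the related >> operator, which appends rather than overwrites; here it is used to keep the header row in the output.

```shell
# Made-up sample rows (illustrative values only).
cat > observation_sample.csv <<'EOF'
Date,Time,Location,Diameter,X,Y
01/01/2010,10:30:05.54,Front,4.551,0.2,-0.3
01/02/2010,08:15:00.10,Front,4.540,0.1,-0.2
01/02/2010,08:15:05.20,TopFlange,14.40,-1.10,1.70
EOF

# '>' creates the output file, or overwrites it if it already exists.
grep '01/02/2010' observation_sample.csv > sample_20100102.csv

# '>>' appends instead: write the header first, then add the matching rows.
head -n 1 observation_sample.csv > with_header.csv
grep '01/02/2010' observation_sample.csv >> with_header.csv

# Count the rows that ended up in each file.
wc -l < sample_20100102.csv
wc -l < with_header.csv
```

Because > silently replaces an existing file, it is worth checking the output filename before running a long extraction.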

To select only observations made on the Front mirror, on January 2, 2010, we can cascade grep calls,

% grep '01/02/2010' observation_data.csv | grep 'Front' > data_from_20100102_Front.csv
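
One caveat with cascaded grep calls is that grep matches the pattern anywhere in the row, not in a particular column.  The sketch below, using made-up rows and a hypothetical location name FrontAux, compares the plain match with an anchored one and with an equivalent awk selection by field number.  All filenames here are assumptions for illustration.

```shell
# Made-up sample rows; 'FrontAux' is a hypothetical location name.
cat > observation_sample.csv <<'EOF'
Date,Time,Location,Diameter,X,Y
01/02/2010,08:15:00.10,Front,4.540,0.1,-0.2
01/02/2010,08:15:05.20,FrontAux,9.100,0.0,0.0
01/03/2010,08:15:10.30,Front,4.550,0.2,-0.1
EOF

# Plain substring match: also keeps the 'FrontAux' row.
grep '01/02/2010' observation_sample.csv | grep 'Front' > loose.csv

# Anchored match: '^' pins the date to the start of the row, and the
# surrounding commas require the field to be exactly 'Front'.
grep '^01/02/2010,' observation_sample.csv | grep ',Front,' > strict.csv

# The same selection by field number with awk (-F, splits on commas).
awk -F, '$1 == "01/02/2010" && $3 == "Front"' observation_sample.csv > strict_awk.csv

wc -l < loose.csv
wc -l < strict.csv
```

If no location name contains another as a substring, the plain and anchored versions select the same rows.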

How could we extract the necessary data in the following cases?

1) On January 12, 2011, during an experimental run from 11PM to 4AM, the laser system performed optimally based on an experimental metric (e.g., optimized interaction or yield).  How would you extract the measured alignment of the laser, so that you might be able to design an adaptive optics system to stabilize laser alignment near this optimum?

2) You notice that the measured experimental interaction tends to fade between 7AM and 10AM, as the sun rises and shines on the TopFlange mirror mount.  You are concerned that the heat of the sun shining on the mount causes enough expansion to cause a minor misalignment.  How would you parse observation_data.csv to extract the data that may help confirm this?
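
One possible approach to both cases is sketched below on made-up rows; it is not the only way to do it.  It assumes that times are recorded on a 24-hour clock with zero-padded hours, and that the run in case 1 starts at 11PM on January 12 and ends at 4AM on January 13.  The filenames run_sample.csv, case1.csv, and case2.csv are hypothetical.

```shell
# Made-up sample rows (illustrative values only).
cat > run_sample.csv <<'EOF'
Date,Time,Location,Diameter,X,Y
01/12/2011,22:59:58.00,Front,4.5,0.1,0.1
01/12/2011,23:00:03.00,Front,4.5,0.1,0.1
01/13/2011,03:59:57.00,TopFlange,14.3,-1.1,1.7
01/13/2011,04:00:02.00,Front,4.5,0.1,0.1
01/13/2011,07:30:00.00,TopFlange,14.4,-1.2,1.8
01/13/2011,09:59:59.00,TopFlange,14.5,-1.3,1.9
01/13/2011,10:00:04.00,TopFlange,14.6,-1.4,2.0
01/13/2011,08:00:00.00,Front,4.5,0.1,0.1
EOF

# Case 1: the window crosses midnight, so match hour 23 on the first date
# OR hours 00 to 03 on the next date (-E enables '|' alternation).
grep -E '^01/12/2011,23:|^01/13/2011,0[0-3]:' run_sample.csv > case1.csv

# Case 2: keep TopFlange rows, then keep times whose hour is 07, 08, or 09.
# '^[^,]*,' skips the date field so the hour test applies to the Time field.
grep ',TopFlange,' run_sample.csv | grep -E '^[^,]*,0[7-9]:' > case2.csv

cat case1.csv
cat case2.csv
```

From either result, cut -d, -f1,2,5,6 would keep only the date, time, and X and Y position columns.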

Time spent: three hours.