Mastery Table: Data Management

Feb 23, 2013 • Kwasi Kwakwa

I’m probably at about an intermediate level here, but data management is something I’ve had to deal with a lot over the last couple of years. I’m also guessing it is one of those topics most people have to learn about at one point or another, usually after they need to retrieve something and have forgotten where it is or how they created it.

Beginner:

  • Dump all data files in the same directory, usually the same place the source is kept
  • Distinguish the files by names that make sense at the moment
  • Keep track of the newest files partly by timestamp
  • Keep track of the settings used to create the data with cryptic notes in a lab notebook

Intermediate:

  • Place files in directories labelled by date
  • Put all source scripts in a separate directory from the data
  • Use the subdirectory structure and filenames to represent the settings used to generate and/or process the data
  • Keep text files with extensive information on the creation and processing of the data files in the directory
  • Back up data manually onto an external hard drive
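The date-labelled directories, settings-encoded subdirectory names, and notes files above can be combined into one small helper. This is a minimal sketch, not the author’s own setup: the function name `make_run_dir`, the `YYYY-MM-DD/key=value_...` layout, and the `README.txt` notes file are all hypothetical choices made for illustration.

```python
import json
from datetime import date
from pathlib import Path


def make_run_dir(root, settings, day=None):
    """Create root/YYYY-MM-DD/<key=value_...>/ and write a notes file there.

    Hypothetical helper: `settings` is a dict of the parameters used to
    generate the data, and the layout/notes filename are illustrative only.
    """
    day = day or date.today()
    # Encode the settings in the subdirectory name; sort the keys so the
    # same settings always produce the same name.
    settings_name = "_".join(f"{k}={v}" for k, v in sorted(settings.items()))
    run_dir = Path(root) / day.isoformat() / settings_name
    run_dir.mkdir(parents=True, exist_ok=True)
    # Keep a plain-text record of how the data was created next to the data.
    notes = run_dir / "README.txt"
    with notes.open("w") as f:
        f.write(f"Created: {day.isoformat()}\n")
        f.write("Settings:\n")
        f.write(json.dumps(settings, indent=2, sort_keys=True) + "\n")
    return run_dir
```

For example, `make_run_dir("data", {"gain": 2, "exposure_ms": 50}, date(2013, 2, 23))` would create `data/2013-02-23/exposure_ms=50_gain=2/` with a `README.txt` inside. Writing the settings both into the path and into the notes file means the directory name is readable at a glance, while the notes file keeps the full record even if the name gets truncated or renamed later.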

Advanced:

  • Plan out directory structure to best represent the flow of data and analysis results
  • Be able to select the best possible way to represent the data (.csv file, custom binary, raw image file, etc.)
  • Automate data processing and archiving as much as possible with scripts
  • Automate data backups, either with scripts or with backup software
  • Place all scripts and metadata files in version control
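As one illustration of scripted archiving and backup, the sketch below zips a data directory, records a SHA-256 checksum next to the archive, copies both to a backup location, and checks that the copy hashes the same as the original. It is an assumption-laden example rather than a recommended backup system: `archive_and_back_up`, `sha256sum`, and the directory arguments are hypothetical names, and a real setup might instead use dedicated backup software as the list suggests.

```python
import hashlib
import shutil
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_and_back_up(data_dir, archive_dir, backup_dir):
    """Zip `data_dir`, record its checksum, and copy both to `backup_dir`.

    Hypothetical helper. `archive_dir` should live outside `data_dir` so the
    archive does not end up inside itself. Returns the backed-up archive path
    and raises OSError if the copy's checksum does not match.
    """
    data_dir = Path(data_dir)
    archive_dir = Path(archive_dir)
    backup_dir = Path(backup_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # make_archive appends ".zip" itself; base_dir keeps the directory name
    # as the top-level folder inside the archive.
    archive = Path(shutil.make_archive(
        str(archive_dir / data_dir.name), "zip",
        root_dir=data_dir.parent, base_dir=data_dir.name))
    digest = sha256sum(archive)
    checksum_file = archive.parent / (archive.name + ".sha256")
    checksum_file.write_text(f"{digest}  {archive.name}\n")

    # Copy archive and checksum to the backup location, then verify the copy.
    backup_copy = Path(shutil.copy2(archive, backup_dir / archive.name))
    shutil.copy2(checksum_file, backup_dir / checksum_file.name)
    if sha256sum(backup_copy) != digest:
        raise OSError(f"checksum mismatch for {backup_copy}")
    return backup_copy
```

A script like this could be run on a schedule (for example from cron) against each finished dated directory. The script itself and the checksum files are small text files, so they fit naturally alongside the other scripts and metadata kept under version control.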