Mastery: Katy, Justin K & Lynne (Atomic Tangerine)

Oct 4, 2012 • Justin Kitzes

Here are the six key questions that Atomic Tangerine (Katy, Lynne, Justin) settled on. Our examples of Novice, Intermediate, and Advanced skills under each question are a bit lengthy, and reflect our differing (but generally complementary) perspectives on these levels.

How do I prepare and analyze my data?

Novice: perform numeric calculations for each entry with pencil and paper; use Excel or other programs with a graphical interface, copying and pasting and saving the resulting files by hand; use someone else’s code without modification; modify data files by hand

Intermediate: mentally break the problem into parts that can be solved with existing functions; write small, simple scripts in MATLAB, R, bash, Python, or similar to call known functions; automate saving of results files

Advanced: automate file conversions with error checking; store data files as text; traverse data(base) dynamically with command line or viewer; run integration tests periodically; write custom, modular functions/libraries; choose the best approach/libraries from a known toolbox; combine multiple languages; use parallel processing
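
As a rough illustration of "automate file conversions with error checking", here is a minimal Python sketch that converts a comma-separated file to tab-separated text. The function name `csv_to_tsv` and its `n_columns` argument are made up for this example, and real data will probably need more checks than a field count.

```python
import csv
from pathlib import Path


def csv_to_tsv(src, dst, n_columns):
    """Convert a comma-separated file to tab-separated text, checking each row.

    Raises ValueError (naming the offending row) if any row does not have
    exactly n_columns fields, so a malformed file never converts silently.
    Returns the number of rows written, including any header row.
    """
    src, dst = Path(src), Path(dst)
    rows = []
    with src.open(newline="") as infile:
        for row_number, row in enumerate(csv.reader(infile), start=1):
            if len(row) != n_columns:
                raise ValueError(
                    f"{src.name}, row {row_number}: expected "
                    f"{n_columns} fields, found {len(row)}"
                )
            rows.append(row)
    # Write only after every row has passed the check.
    with dst.open("w", newline="") as outfile:
        writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
    return len(rows)
```

Called as `csv_to_tsv("counts.csv", "counts.tsv", n_columns=2)` (hypothetical file names), it either writes the whole output file or stops with an error before writing anything.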

How do I speed up my work and avoid duplication?

Novice: identify major tools/platforms used in my domain and open/use scripts generated by others; ask labmates for help/code; get undergrads!; collaborate with others by sending email attachments back and forth

Intermediate: buy commercial (or get free) software that is known to solve problems like mine (bonus points if it comes with customer service); write scripts to automate different parts of my workflow, but don’t necessarily chain them together well; keep an active listhost/issue tracker to coordinate work with others

Advanced: identify strengths/weaknesses of different tools used in my domain, and choose the appropriate one for each task based on needs, not my skill level; use and link to open source libraries, and contribute improvements back; use a master script to complete all tasks in one step; track issues with a bug/ticket system; follow a known and purposefully chosen development strategy (lean, agile, kanban, etc.)
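
The "master script" idea can be sketched in a few lines of Python. Everything here is hypothetical: the step functions `clean`, `summarize`, and `report` stand in for whatever the real workflow does, and `run_all` is just one simple way to chain them.

```python
def clean(values):
    """Drop missing entries (represented here as None)."""
    return [v for v in values if v is not None]


def summarize(values):
    """Reduce the cleaned data to a few summary numbers."""
    if not values:
        raise ValueError("no data left after cleaning")
    return {"n": len(values), "mean": sum(values) / len(values)}


def report(summary):
    """Format the summary as a line of text."""
    return f"n={summary['n']}, mean={summary['mean']:.2f}"


# The whole workflow, in order. Adding a step means adding one entry here.
STEPS = [clean, summarize, report]


def run_all(data, steps=STEPS):
    """Feed the output of each step into the next and return the final result."""
    result = data
    for step in steps:
        result = step(result)
    return result


if __name__ == "__main__":
    print(run_all([1.0, None, 2.0, 6.0]))
```

The point of the design is that the full analysis can be rerun with one command, rather than by remembering which scripts to run in which order.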

How do I keep track of what I’ve done?

Novice: don’t bother; just remember it; record steps (perhaps quite thoroughly) in lab notebook; name files like MyDataDay1, MyDataDay2, MyDataDay2a, etc.; email back and forth with colleagues, changing file names with each step

Intermediate: keep notes on manual calculations or file modifications, such as a Readme file in each folder; give some thought to file naming and directory structure, keeping relevant files together; annotate and comment code to track thought process; use shell merge or Word track changes (not for code!) to merge contributions

Advanced: use version control to record and track changes in code and documentation, with branches, tags, etc.; no manual modifications of files at any stage (all data modifications and analysis done by scripts); baseline versioning system for databases; large (binary) data files backed up but not in version control; tag output with metadata
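
One possible way to "tag output with metadata" is to wrap results in a record that says how they were produced. This is only a sketch: the function name `tag_output` and the choice of metadata fields are assumptions for illustration, not an established convention.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def tag_output(results, input_path, script_name):
    """Bundle results with a record of where they came from.

    The SHA-256 digest identifies the exact input file contents, so a
    later reader can tell whether the data changed since the results
    were produced.
    """
    with open(input_path, "rb") as infile:
        digest = hashlib.sha256(infile.read()).hexdigest()
    return {
        "results": results,
        "metadata": {
            "script": script_name,
            "input_file": str(input_path),
            "input_sha256": digest,
            "python_version": platform.python_version(),
            "created_utc": datetime.now(timezone.utc).isoformat(),
        },
    }


def save_tagged(tagged, output_path):
    """Write the tagged results as human-readable JSON text."""
    with open(output_path, "w") as outfile:
        json.dump(tagged, outfile, indent=2, sort_keys=True)
```

If the code is under version control, the current revision identifier would be a natural extra field to record.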

How do I verify and validate my actions and results?

Novice: visual debugging to recognize errors; if simulation results approximately match a known solution, assume the entire method is valid; compare results to figures and tables in other papers and books

Intermediate: keep clean, modular code base with good unit test coverage; ask an expert for help on corner solutions, likely logic errors; put basic error checking/raising statements in code

Advanced: keep a dashboard to build and test code on various platforms (nightly), automate blame in the event of failing commits; excellent unit test coverage plus integration tests; test-driven development; use of toy data sets with known (computed by hand) results, including boundary cases
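
Testing against a toy data set with hand-computed results, including boundary cases, might look like the following sketch. The function under test, `running_mean`, is a hypothetical example chosen because its answers are easy to work out on paper. It also shows the basic error checking/raising statements mentioned under Intermediate.

```python
import unittest


def running_mean(values, window):
    """Return the mean of each consecutive run of `window` values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if window > len(values):
        raise ValueError("window is longer than the data")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


class TestRunningMean(unittest.TestCase):
    def test_toy_data(self):
        # Computed by hand: (1+2)/2, (2+3)/2, (3+4)/2
        self.assertEqual(running_mean([1, 2, 3, 4], 2), [1.5, 2.5, 3.5])

    def test_window_of_one_returns_the_data(self):
        # Boundary case: smallest legal window.
        self.assertEqual(running_mean([1, 2, 3], 1), [1.0, 2.0, 3.0])

    def test_window_equal_to_length(self):
        # Boundary case: largest legal window gives a single overall mean.
        self.assertEqual(running_mean([1, 2, 3], 3), [2.0])

    def test_bad_windows_raise(self):
        with self.assertRaises(ValueError):
            running_mean([1, 2, 3], 0)
        with self.assertRaises(ValueError):
            running_mean([1, 2, 3], 4)
        with self.assertRaises(ValueError):
            running_mean([], 1)
```

Saved in a file whose name starts with `test`, this can be run with `python -m unittest`.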

How do I get help?

Novice: muck around with data/code until it works; ask labmates; read manual (maybe)

Intermediate: use man and apropos in the shell; Google; StackOverflow; read manual (probably); begin consulting key external resources, listservs, etc.

Advanced: consult key external resources (Software Carpentry!, Python core library docs, etc.); ask questions on listservs/IRC channels; use online issue and ticket boards for the code bases you work with

How do I prepare my work for sharing with others?

Novice: copy/paste results back into Excel / Word, make figures; draw graphs by hand; wait until all done then start to think about documentation and the manuscript; take “notes to self” files generated along the way and combine them at the end

Intermediate: use a graphical interface or console to manually adjust figures and save them to disk; use scripts to write data tables, then copy them into the manuscript; manually convert file formats where needed; document code with comment strings and Readme files; give files useful names and store them somewhere others can access them

Advanced: fully automated figure and table generation and saving using scripts and libraries like matplotlib; files in standardized hierarchy (src, doc, man, etc.); access all data in place; automated inclusion of results in manuscripts (Sweave, LaTeX includes); generate documentation with tools such as Doxygen, Javadoc, or Sphinx; put documentation on the internet (automated builds)
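
For the "LaTeX includes" approach, a script can write a table fragment that the manuscript pulls in, so the numbers are never retyped. The sketch below uses only the Python standard library. The function name `latex_table` and the column layout are assumptions for illustration, and it does not escape LaTeX special characters such as `&` or `%` in the cells.

```python
def latex_table(header, rows, float_format="{:.2f}"):
    """Build a LaTeX tabular environment from a header and rows of cells.

    Floats are formatted with float_format; everything else goes through
    str(). Assumes the cells contain no LaTeX special characters.
    """
    def fmt(cell):
        if isinstance(cell, float):
            return float_format.format(cell)
        return str(cell)

    # First column left-aligned (labels), the rest right-aligned (numbers).
    col_spec = "l" + "r" * (len(header) - 1)
    lines = [r"\begin{tabular}{" + col_spec + "}", r"\hline"]
    lines.append(" & ".join(header) + r" \\")
    lines.append(r"\hline")
    for row in rows:
        lines.append(" & ".join(fmt(cell) for cell in row) + r" \\")
    lines.append(r"\hline")
    lines.append(r"\end{tabular}")
    return "\n".join(lines) + "\n"


def write_table(path, header, rows):
    """Save the table so the manuscript can include it with \\input."""
    with open(path, "w") as outfile:
        outfile.write(latex_table(header, rows))
```

With the fragment written to, say, `results_table.tex` (a hypothetical file name), the manuscript only needs `\input{results_table}`, and rerunning the analysis updates the table.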