Mastery --- Aron, Becky, and Azalee (Electric Lime)

Oct 7, 2012 • Azalee Bostroem

We started by discussing who our primary audience was and what length of class we were going to teach. This helped us limit the scope of what we thought was important. We then started designing a curriculum rather than asking fundamental questions to be answered by our curriculum. This is reflected in Q5 especially. What you see below is the result of 2.5-3 hours of face-to-face discussion. With more time we would like to refine some of our questions to reflect the skills we want students to learn rather than specific subject matter. At the end you will find some material that we think should be included but for which we didn’t find a specific question.

Preliminaries

Preliminary Assumptions:

  • Assuming a 2-day workshop
  • Assuming that students volunteered to take the class
  • Assuming that students have done some basic programming in some language prior to the class
  • Assuming that students come in at a basic level, that we will teach them to operate at an intermediate level, and that we will show them a sneak peek of the expert level (this is what lies ahead should you wish to pursue it)

Our definitions of beginner, intermediate, and expert:

  • Beginner: What we assume students, or the majority of their cohort, are currently doing
  • Intermediate: The goal that we have set for them to reach by the end of the 2-day workshop
  • Expert: Topics we recognize as important for mature scientific software developers to be aware of, comprehend, and practice, but that we will not have time to properly expose students to during a course. In other words, these topics and principles are important, but out-of-scope for an introductory course.

Questions

Q1. What do I need to know to effectively use the command line?

Novice: What’s a command line?

Intermediate: less, cd, ls, mv, cp, grep, rm, cat, pwd; also git, python, ssh, sftp, scp

Expert: environment variables (paths), shell scripting, startup files, advanced tools (find, sed/awk, cut, echo, printenv/setenv, which), shell redirection and pipes, aliases

Q2. What are fundamental skills of working remotely?

Novice: How do I work remotely?

Intermediate: wget/curl, ssh, sshfs, ssh-keygen, ssh passphrases, ssh agents

Expert: expandra, sshfs

Q3. How should I collaborate with other researchers on code?

Novice: email, Dropbox

Intermediate: Basic version control: get a repository, commit changes, merge, restore to a basic state. Know which commands make changes at which level

Expert: GitHub, forking and branching, advanced merge strategies, rebasing, bisection

Q4. How do I test my software?

Novice: Smoke test — run your software without error messages

Intermediate: Create a regression test case for which you know the answer, and write a test that can be rerun automatically without intervention from the user. It might be useful

Expert: An automated test suite that is run daily, results automatically compared, and nagging emails sent to the developers when tests are failing
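
As a rough sketch of the intermediate level above: a regression test is a small function that checks a result worked out in advance and can be rerun with no user input. The function `mean` and the test names are hypothetical stand-ins for real scientific code, not something from our discussion.

```python
import math


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def test_mean_known_answer():
    # Regression case: the expected value 2.5 was worked out by hand.
    assert math.isclose(mean([1.0, 2.0, 3.0, 4.0]), 2.5)


def test_mean_single_value():
    # Edge case: the mean of one value is that value.
    assert mean([7]) == 7


if __name__ == "__main__":
    # Rerun every test without any intervention from the user.
    test_mean_known_answer()
    test_mean_single_value()
    print("all tests passed")
```

Functions named `test_*` like these can also be collected by a test runner such as pytest, which is one possible step toward the automated suite described at the expert level.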

Q5. What are the fundamentals of programming in Python?

Novice: Basic programming skills in any language

Intermediate: starting the Python interpreter, help/?, arithmetic, variable assignment, if statements, for/while loops

Expert: See Q6
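
To make the intermediate list for Q5 concrete, here is a minimal sketch touching arithmetic, variable assignment, an if statement, and both kinds of loop. The variable names and numbers are made up for illustration only.

```python
# Variable assignment and arithmetic: convert Fahrenheit to Celsius.
temperature_f = 98.6
temperature_c = (temperature_f - 32) * 5 / 9

# if statement: choose a label based on a threshold.
if temperature_c > 37.5:
    status = "fever"
else:
    status = "normal"

# for loop: collect the squares of 0, 1, 2, 3, 4.
squares = []
for n in range(5):
    squares.append(n ** 2)

# while loop: count how many halvings bring 100 below 1.
value = 100.0
halvings = 0
while value >= 1:
    value = value / 2
    halvings += 1

print(round(temperature_c, 1), status, squares, halvings)
```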

Q6. How can I create reusable code and use another person’s code? (assumes Q5)

Novice: Hasn’t used modules before

Intermediate: writing a function, docstrings, importing, dir/?/tab, setting a default argument, scope

Expert: attributes, keyword and ordered arguments, comprehensions, lambda functions, closures, generators
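
The following sketch illustrates several of the Q6 items in one place, assuming Python 3 (the `nonlocal` keyword does not exist in Python 2). All function names here are hypothetical examples.

```python
def scale(values, factor=2.0):
    """Return a new list with each value multiplied by factor.

    factor has a default value, so scale([1, 2]) doubles the input.
    """
    # List comprehension: build the result in one expression.
    return [v * factor for v in values]


def make_counter():
    """Return a function that remembers how often it was called (a closure)."""
    count = 0

    def counter():
        nonlocal count  # refers to the variable in the enclosing scope
        count += 1
        return count

    return counter


def countdown(n):
    """Yield n, n-1, ..., 1 one value at a time (a generator)."""
    while n > 0:
        yield n
        n -= 1


doubled = scale([1, 2, 3])              # uses the default argument
tripled = scale([1, 2, 3], factor=3)    # keyword argument
by_length = sorted(["ccc", "a", "bb"], key=lambda s: len(s))  # lambda function
```

Saved in a file, these functions could be reused elsewhere with an `import` statement, and `help(scale)` would display the docstring.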

Q7. When should I use different data structures?

Novice: lists for everything

Intermediate: how and when to use dictionaries and keys, tuples, lists, numpy arrays, sets

Expert: sparse matrices/vectors, tree structures, understanding big-Oh notation for various operations on stored data
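
One way to motivate Q7 is to compare the same operation on different structures. The sketch below times a membership test on a list and on a set, then shows a dictionary used for counting and a tuple for a fixed-size record. The sizes and sample data are arbitrary choices for illustration.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # worst case for the list: the last element

# Membership in a list scans elements one by one; a set uses hashing.
list_time = timeit.timeit(lambda: target in as_list, number=200)
set_time = timeit.timeit(lambda: target in as_set, number=200)
print("list: %.4f s, set: %.6f s" % (list_time, set_time))

# Dictionary: map each key (a DNA base) to a count.
counts = {}
for base in "GATTACA":
    counts[base] = counts.get(base, 0) + 1

# Tuple: a small, fixed-size, immutable record.
point = (3.0, 4.0)
```

The timing difference is a hands-on entry point to the big-Oh discussion listed at the expert level.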

Q8. How do I create and document code so that someone else could use it?

Novice: comment the code

Intermediate: write a docstring at the beginning, maybe write a README file

Expert: documenting command-line arguments with argparse, documentation generation tools like Sphinx
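
As an illustration of documenting command-line arguments with argparse, here is a minimal sketch. The script's purpose, the file name `data.txt`, and the `--column` option are all hypothetical; only the argparse calls themselves are standard.

```python
import argparse


def build_parser():
    """Build a parser whose help text doubles as user documentation."""
    parser = argparse.ArgumentParser(
        description="Compute the mean of a column in a text file.")
    parser.add_argument("filename", help="path to the input data file")
    parser.add_argument("-c", "--column", type=int, default=0,
                        help="zero-based column index (default: 0)")
    return parser


# Parse an explicit argument list here so the example runs anywhere;
# a real script would call parse_args() with no arguments to read sys.argv.
args = build_parser().parse_args(["data.txt", "--column", "2"])
print(args.filename, args.column)
```

Running such a script with `-h` prints the description and the per-argument help strings, so the documentation lives next to the code it describes.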

Q9. How and when do I optimize my code?

Novice: Premature optimization is the root of all evil (Knuth)

Intermediate: Measuring and profiling

Expert: Benchmarks, Cython, numba, parallel computing
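
For the intermediate level of Q9, a sketch of "measure before optimizing" using the standard library profiler might look like this. The two functions are invented for the example; the point is only that the profile report shows where the time goes.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Sum i*i for i in 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum_of_squares(n):
    """Same result from the closed form (n-1) * n * (2n-1) / 6."""
    return (n - 1) * n * (2 * n - 1) // 6


# Profile only the call we care about.
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum_of_squares(200_000)
profiler.disable()

# Write the report to a string, sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Once the profile identifies the hot spot, the faster version can be checked against the slow one before it replaces it.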

Q10. How does my computer work on a machine/operating system level?

Novice: I press the button and it turns on

Intermediate: Basics of file systems, interacting with the computer, hierarchies of data, pipelines, virtual machines

Expert: SIMD architectures, processes and threads, physical shared memory vs. distributed memory, global address space vs. message passing, performance models

Other things that came up during the discussion:

  • Package managers: fink, macports, brew, pip?
  • Use virtualenv so you never have to touch your Python path
  • Code that works in one environment can fail in a slightly different one: understanding environment dependencies
  • Test on a virtual machine
  • Datasets not on your physical machine — what do you do?
  • Should there be some discussion on the history of computing or scientific computing?
  • Aron recommends reading: Code Complete 2
  • We should highlight the importance of backing up your work
  • Do databases belong in a 2-day curriculum?