I think the purpose of this endeavor is to convey tools that support the tenets of science: reproducibility, peer review, documentation, data control, etc. I tried not to look at your questions before writing these, but they turned out very similar anyway. You phrased yours a little more clearly, perhaps. Here are my questions:
- How do I record multiple versions of my data and methods?
- How do I recreate the input from the output?
- How do I keep track of contributions by others?
- How do I analyze this data?
- How do I visualize this data?
- How do I leverage the work of others to avoid reinventing the wheel?
- How do I communicate my data and methods to peers?
- How do I collaborate on this work?
- How do I verify these methods?
- How do I automate most of it?
- How do I find help?
As I wrote out the behavior descriptors, I realized that we don’t always teach toward the Advanced developer behavior. I also recognize I probably don’t know (or do) what advanced people are doing.
- How do I record multiple versions of my data and methods?
- Novice : MyDataDay1, MyDataDay2, MyDataDay2a, etc.
- Intermediate : Well organized system of files and folders.
- Advanced : Version control software for the source files, and a baseline versioning system for databases: http://odetocode.com/Blogs/scott/archive/2008/02/01/versioning-databases-the-baseline.aspx
- How do I recreate the input from the output?
- Novice : Keep meticulously dated, detailed notes in a lab notebook.
- Intermediate : Keep readme.txt files in the folders for each simulation.
- Advanced : Automatically tag output with metadata about the input in a quickly reusable way.
- How do I keep track of contributions by others?
- Novice : Periodically ask your labmate to email you his last copy of the code. Add your changes by hand.
- Intermediate : Periodically ask your labmate to email you his last copy of the code. Use the shell’s merge command.
- Advanced : Version control. Even better : distributed version control.
- How do I analyze this data?
- Novice : Perform numeric calculations for each entry with pencil and paper.
- Intermediate : Perform numeric calculations for each entry with MATLAB. Copy and paste new data to repeat.
- Advanced : Traverse the database dynamically through the command line or a viewer. Script the calculations and make sure the script periodically produces a plot or two for consistency checking.
- How do I visualize this data?
- Novice : Draw the plot on graph paper. http://maxcho.com/2011/09/hand-drawn-science/
- Intermediate : Copy and paste the data into Excel. Press the plot button.
- Advanced : Use a plotting library like matplotlib and write a script. Access the data in place.
- How do I leverage the work of others to avoid reinventing the wheel?
- Novice : Ask your labmate if she’s invented this wheel already. If not, tough luck.
- Intermediate : Buy commercial off-the-shelf software. It’s not clear what algorithm it’s using on your data, but the expensive license comes with free customer service.
- Advanced : Utilize open source libraries that meet your needs. Make improvements where they miss the mark. Even better : contribute your improvements back to the open source project as you progress.
- How do I communicate my data and methods to peers?
- Novice : Write a paper when you’re done.
- Intermediate : Document the code in comment strings and readme.txt files. Interested parties can see your notes by sifting through the source code.
- Advanced : Doxygen, Javadoc, Sphinx. Put that documentation on the internet. Even better, automate nightly documentation builds and website updates.
- How do I collaborate on this work?
- Novice : Write a lot of emails, attaching new/offending code versions to each message.
- Intermediate : Keep an active developer listhost for answering questions and keeping track of solutions.
- Advanced : Bug/Ticket systems, fancy development workflows (lean, agile, kanban, etc.), gerrit, huboard, irc, etc.
- How do I verify these methods?
- Novice : Periodically conducting visual code review. If the simulation results are within the uncertainty range for a known solution, these methods must be valid everywhere.
- Intermediate : Keeping a clean, modular code base and having reasonable unit test coverage.
- Advanced : Keeping a dashboard that builds and tests the code on numerous platforms nightly and automates developer shaming in the event of failing commits.
- How do I automate most of it?
- Novice : Undergrads!
- Intermediate : Write a number of bash/perl scripts. Sometimes pipe them together, but sometimes you don’t need to run the sixth one, so you usually end up running either all eight scripts or 1,2,3,4,5,7, and 8 individually after each run.
- Advanced : Write a master script that allows flexible, general, scriptable tasks and completes your processing in just one step, accepting command line arguments or whatever. Python is good for this.
- How do I find help?
- Novice : If it’s not in the user manual, ask your officemate. (Though, recently she seems to often direct you to a strange site, lmgtfy.com.)
- Intermediate : The “man” and “apropos” shell commands. StackOverflow.
- Advanced : Online issue and ticket boards for the code bases you work with, developer listhosts and irc rooms for those code bases.
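
For "How do I recreate the input from the output?", the Advanced behavior of tagging output with metadata about the input could look something like the sketch below. It is only one possible approach, using the Python standard library. The function names, the `.meta.json` sidecar convention, and the metadata fields are all my own assumptions for illustration, not an established standard.

```python
import hashlib
import json
import sys
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_with_metadata(output_path, result_text, input_path, params):
    """Write a result file plus a JSON sidecar describing how it was made.

    The sidecar name (<output>.meta.json) and its fields are hypothetical
    choices for this sketch.
    """
    output_path = Path(output_path)
    output_path.write_text(result_text)

    metadata = {
        "input_file": str(input_path),
        "input_sha256": file_sha256(input_path),  # identifies the exact input
        "parameters": params,                     # settings used for this run
        "python_version": sys.version.split()[0],
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = output_path.with_name(output_path.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True))
    return sidecar
```

Because the sidecar is plain JSON, a later script can read it back and rerun the same input with the same parameters, which is the "quickly reusable" part.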
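
For "How do I verify these methods?", the Intermediate behavior of reasonable unit test coverage might start with something as small as this. It is a minimal sketch using Python's built-in `unittest` module. The `trapezoid` function is a made-up stand-in for whatever numerical method you actually need to verify. Note that it checks more than one known solution, unlike the Novice approach.

```python
import math
import unittest


def trapezoid(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] using n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


class TestTrapezoid(unittest.TestCase):
    def test_linear_function(self):
        # The trapezoid rule is exact (up to rounding) for straight lines.
        self.assertAlmostEqual(trapezoid(lambda x: 2 * x, 0.0, 1.0), 1.0)

    def test_sine_over_half_period(self):
        # The integral of sin(x) from 0 to pi is exactly 2.
        self.assertAlmostEqual(trapezoid(math.sin, 0.0, math.pi), 2.0, places=5)

    def test_zero_width_interval(self):
        # An interval of zero width should integrate to zero.
        self.assertEqual(trapezoid(math.exp, 1.0, 1.0), 0.0)
```

Saved in a file such as `test_trapezoid.py` (a hypothetical name), these can be run with `python -m unittest test_trapezoid`. A nightly dashboard is, roughly, this same command run automatically on several machines.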
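
For "How do I automate most of it?", the Advanced master script might be sketched as follows, assuming Python's standard `argparse` module. The three steps and their names are placeholders I invented. The point is only that one entry point runs every step in order and lets you skip any of them by name, instead of running scripts 1,2,3,4,5,7, and 8 by hand.

```python
import argparse


def drop_negatives(values):
    """Placeholder step: discard negative readings."""
    return [v for v in values if v >= 0]


def normalize(values):
    """Placeholder step: scale values so the largest becomes 1."""
    peak = max(values, default=0)
    return [v / peak for v in values] if peak else list(values)


def round_values(values):
    """Placeholder step: round to three decimal places."""
    return [round(v, 3) for v in values]


# Steps run in the order listed here (dicts preserve insertion order).
STEPS = {
    "drop_negatives": drop_negatives,
    "normalize": normalize,
    "round_values": round_values,
}


def run(values, skip=()):
    """Apply every step in order, except those named in skip."""
    for name, step in STEPS.items():
        if name not in skip:
            values = step(values)
    return values


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Run the whole pipeline.")
    parser.add_argument("values", nargs="*", type=float,
                        help="numbers to process")
    parser.add_argument("--skip", nargs="+", default=[], choices=list(STEPS),
                        help="names of steps to leave out")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    print(run(args.values, skip=args.skip))


if __name__ == "__main__":
    main()
```

If this were saved as `pipeline.py` (a hypothetical name), `python pipeline.py 2 4 8 --skip normalize` would run everything except the normalization step.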