I found this exercise helpful for thinking about what I need to learn, what I know how to do but don’t actually do, and what I’m actually OK at, in addition to the stated goal: to think about what new Software Carpentry learners want to learn. I know I continue to learn a lot, so it’s been interesting to see what each person thinks someone at each level should know! I think the questions I came up with agree well with what others have had to say so far, with maybe a few small additions. The answers are pretty general and lack detail, so feel free to chip in with comments!
- How can I keep track of what I’ve done?
- novice: save multiple dated or randomly named files in a folder, email them to yourself when you remember (We all probably know someone who has over 80 saved versions of the same paper!)
- intermediate: use some sort of version control or dated backup system (such as Packrat for Dropbox)
- expert: use a version control system and commit changes frequently
- How can I store/manage my data?
- novice: save data in multiple Excel files that may or may not be linked in a meaningful way, include lots of redundancies, score missing data as “0”
- intermediate: save data in a relational database, in a way that minimizes redundancies
- expert: save data in a relational database, where it can be quickly and easily queried. Use scripts to access, clean, and automatically update the data, and to flag potential problems.
- How can I clean up my (or someone else’s) messy data?
- novice: go through Excel files line by line, looking for data that doesn’t “look” right
- intermediate: use short scripts to flag questionable data, or data that may be at the limits of what should be allowed. Outline the data types, ranges, or other markers that may help identify what is OK and what is not.
- expert: write a series of tests that look for questionable data, alert the user to where it is located, and/or make the appropriate correction and determine the type of error. Have a clear understanding of how data is best organized and how to reorganize a poorly constructed dataset.
- How can I track down mistakes in my code?
- novice: go through it line by line and look for code or output that doesn’t “look” right. Use print statements to make sure things are running OK.
- intermediate: run code using some simple boundary case inputs to determine that each function is working correctly.
- expert: write formal tests; use a debugger.
- How can I look for help when I get stuck?
- novice: Go bug someone you know.
- intermediate: Google it. (Note: finding your own help online may depend on a certain level of knowledge that a novice may not always have.)
- expert: Google it, look on developer sites/blogs, know which websites are most helpful for a particular kind of problem. Ask for help in a detailed way and offer help in other online conversations.
- How can I become more confident in sharing/publishing my code?
- novice: What?! Someone else might see my code? I guess I’ll make sure it seems to work ok and has some annotations.
- intermediate: modularized code that runs from some intermediate stage, after the data has been cleaned up, and that reconstructs the analyses and provides the output.
- expert: modularized code that runs from start (raw data) to finished product (figures, tables, etc.). Shared online in a way that is easy for others to download, use, replicate, and understand.
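
To make the intermediate and expert answers to the data-cleaning and debugging questions a bit more concrete, here is a rough sketch in Python. It is only one way this could look: the column names (`year`, `mass_g`), the allowed ranges, and the sample data are all made up for illustration, and `flag_questionable` is a hypothetical helper, not something from an existing library. The script outlines the expected type and range for each column, flags cells that are missing, of the wrong type, or out of range, and then checks the function itself with a couple of simple boundary-case inputs.

```python
import csv
import io

# Hypothetical rules for illustration: column name -> (type, minimum, maximum).
RULES = {
    "mass_g": (float, 0.0, 500.0),
    "year": (int, 1970, 2030),
}


def flag_questionable(rows, rules=RULES):
    """Return (row_number, column, raw_value, reason) for each suspicious cell."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for column, (kind, low, high) in rules.items():
            raw = (row.get(column) or "").strip()
            if raw == "":
                # Report missing data explicitly instead of scoring it as 0.
                problems.append((row_number, column, raw, "missing"))
                continue
            try:
                value = kind(raw)
            except ValueError:
                problems.append((row_number, column, raw, "wrong type"))
                continue
            if not low <= value <= high:
                problems.append((row_number, column, raw, "out of range"))
    return problems


# Made-up sample data with one clean row and three different problems.
SAMPLE = """year,mass_g
1999,42.5
2001,
20011,40.0
1985,heavy
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
problems = flag_questionable(rows)
for problem in problems:
    print(problem)

# Boundary-case checks: values exactly at a limit pass, values just outside do not.
assert flag_questionable([{"year": "1970", "mass_g": "0"}]) == []
assert flag_questionable([{"year": "1969", "mass_g": "0"}]) == [
    (1, "year", "1969", "out of range")
]
```

The bare `assert` lines at the end are the "simple boundary case inputs" idea; moving them into a test file run by a test framework would be one step toward the formal tests in the expert answer.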