Data Management videos

Aug 1, 2013 • Shoaib Sufi

I watched the Data Management videos at http://software-carpentry.org/v4/data/

I felt the pace of the first video (http://software-carpentry.org/v4/data/mgmt.html) was a bit fast; the presenter speaking more slowly would have helped me follow what was being said.

Having said that, some interesting ways of organising data in a file system were shown, and the subtle problems of holding binary rather than textual data in a version control system were mentioned.

As a starter it was good and would help people who make the classic errors when organising their work, but a follow-up video is probably needed.

The information was applicable but a bit too dated to be considered modern best practice.

Systems such as IPython Notebook (http://ipython.org/notebook.html) and Galaxy (http://galaxyproject.org) attempt to roll pipelining, running the actual analysis, visualisation and provenance into one place, thus easing data management.

Efforts such as Research Objects (http://www.researchobjects.org) are thinking deeply about how the resources used in research should be aggregated to support the different shades of reproducible science; again, this is highly relevant to data management.

So a useful first lesson on the subject, but a more up-to-date view is needed; i.e. the video needs a refresh.

The second video (http://software-carpentry.org/v4/data/bein.html) was about Bein, a system that helps run command-line programs from Python and stores the resulting data files in a sensible place.

This was a short video and I got the flavour of what could be done, but 5 more minutes showing a Bein-based program running, what it does with shell outputs and what is inside the miniLIMS directory would have left me better equipped to try it out myself.
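To make the general idea concrete (running a command-line program from Python and filing its output somewhere predictable), here is a minimal sketch using only Python's standard library. It is not Bein's API and says nothing about how miniLIMS is laid out; the function name run_and_store and the file names stdout.txt, stderr.txt and record.json are my own assumptions for illustration.

```python
import json
import subprocess
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def run_and_store(cmd, store_dir):
    """Run cmd (a list of strings) and file its output under store_dir.

    Hypothetical helper for illustration only; this is not Bein's interface.
    """
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)

    # One fresh directory per run, so results never overwrite each other.
    run_dir = Path(tempfile.mkdtemp(prefix="run-", dir=store))

    # Capture what the program prints rather than letting it scroll past.
    result = subprocess.run(cmd, capture_output=True, text=True)
    (run_dir / "stdout.txt").write_text(result.stdout)
    (run_dir / "stderr.txt").write_text(result.stderr)

    # Keep a small record of what was run, for later provenance.
    record = {
        "command": cmd,
        "returncode": result.returncode,
        "finished": datetime.now(timezone.utc).isoformat(),
    }
    (run_dir / "record.json").write_text(json.dumps(record, indent=2))
    return run_dir
```

Calling run_and_store(["ls", "-l"], "results") would, under these assumptions, create a new run directory inside results/ holding the captured output and a small JSON record of the command.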

Neither presenter gave their name or affiliation. The first was clearly Greg Wilson, but who was the chap in the second video? Was he the person who developed Bein, or an enthusiastic user? I think it’s useful to know who is speaking and where they are based, as one can look them up afterwards and perhaps ask a follow-up question.

It was interesting to note that there were no comments on either page; was this due to the website change, or was it always like this?

I know SWC is about taking people from novice to competent, but I think some introduction to cloud computing, and some discussion about moving the compute to big data rather than the other way around, would have been interesting. Even just describing the use of a centralised data store like iRODS (https://www.irods.org), which is quite widely used in research, would have been natural in a section on data management, or could have been offered as further reading.

I am also surprised that there was no mention of data catalogues, e.g. from Astronomy, Bioinformatics or Biodiversity; knowing where to get data from for your domain and where to put data for publication is surely part of data management?

So a good section that needs updating and expanding; I guess the novice-to-competent aim will govern what should be on the syllabus.