Update: I’m pleased to say that I was awarded Imperial’s Bradley-Mason Prize for Open Chemistry — see Professor Rzepa’s blog post for more info.
From 1st May 2015, the EPSRC requires that all publications include a statement saying how the underlying research data can be accessed. Technically, you can simply include an email address to contact for the data, but I think that’s hardly in the spirit of open science. In this post, I want to describe how I used object-orientated programming (OOP) and figshare to meet this requirement for my latest paper in Lab on a Chip. You can download the data and MATLAB code to reproduce the graphs at figshare.
In OOP, you create classes that define objects and their properties. For example, if you had a class `Animal`, an instance of this class could be `dog`. For the `Animal` class the properties might be `legs` (an integer) or `dateofbirth` (a date). The class also defines methods, which are functions that operate on instances of the class. For example, `Animal.age()` might use the `dateofbirth` property to return the age of the animal.
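To make the `Animal` example concrete, here is a minimal Python sketch. The post's own code is MATLAB, but the idea carries over directly. The `age()` signature, with an optional `today` argument so the result is reproducible, is my own assumption for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Animal:
    """A minimal class: two properties (attributes) plus one method."""
    legs: int          # property: number of legs
    dateofbirth: date  # property: date of birth

    def age(self, today: Optional[date] = None) -> int:
        """Return the age in whole years, computed from the dateofbirth property."""
        today = today or date.today()
        years = today.year - self.dateofbirth.year
        # Subtract one if this year's birthday has not happened yet.
        if (today.month, today.day) < (self.dateofbirth.month, self.dateofbirth.day):
            years -= 1
        return years


dog = Animal(legs=4, dateofbirth=date(2012, 6, 1))  # an instance of Animal
print(dog.age(today=date(2015, 5, 1)))  # 2
```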
For my paper I defined a class called `sepexp` (short for separation experiment, the subject of the paper) with properties corresponding to the independent and dependent variables. My class definition also included two methods: `runall`, to run the experiments (thankfully automated; one of the joys of flow chemistry), and `plot`[^overloading], to plot the data.
To start an experiment, I would create an instance of my `sepexp` class; let’s call it `exp1`. When creating it, I specify the independent variables. Executing `exp1.runall()` runs all the experiments defined by those properties. The details aren’t relevant here (see the paper if you’re interested), but the key point is that it saves the results in the object’s own properties.
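The same pattern can be sketched in Python. This is not the paper's MATLAB code. The class name `SepExp`, the property names `flow_rates`, `volumes` and `times`, and the `measure` callable standing in for the automated flow rig are all hypothetical. The point is that the independent variables are set when the object is created, and `runall` stores the results on the object itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stand-in for the automated rig: given a flow rate, it returns
# (collected volume, collection time). Not the paper's actual interface.
Measure = Callable[[float], Tuple[float, float]]


@dataclass
class SepExp:
    """Sketch of a 'separation experiment' object (hypothetical property names)."""
    flow_rates: List[float]                             # independent variable, set at creation
    volumes: List[float] = field(default_factory=list)  # dependent variable, filled by runall
    times: List[float] = field(default_factory=list)    # dependent variable, filled by runall

    def runall(self, measure: Measure) -> None:
        """Run one experiment per flow rate and store the results on the object."""
        self.volumes.clear()
        self.times.clear()
        for q in self.flow_rates:
            volume, duration = measure(q)
            self.volumes.append(volume)
            self.times.append(duration)


def fake_measure(q: float) -> Tuple[float, float]:
    """Dummy rig for demonstration: returns half of q as the volume, over a fixed 60 s."""
    return 0.5 * q, 60.0


exp1 = SepExp(flow_rates=[10.0, 20.0, 40.0])  # independent variables set here
exp1.runall(fake_measure)
print(exp1.volumes)  # [5.0, 10.0, 20.0]
```

Passing the measurement function in as an argument keeps the sketch runnable without hardware; a real class would talk to the instruments directly.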
The next step is to plot the results, so I execute `exp1.plot()`, which does a straightforward calculation on the collected data to get the volumetric collection rate at the outlet and then plots it. I then repeat this for each experiment.
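As a rough sketch of the calculation step, here it is in Python. It assumes (hypothetically) that each run records a collected volume in µL and a collection time in s; the real `plot` method in the archive may compute the rate differently. Matplotlib is imported only inside `plot_rates`, so the calculation itself needs nothing beyond the standard library.

```python
from typing import List, Sequence


def collection_rates(volumes_ul: Sequence[float], times_s: Sequence[float]) -> List[float]:
    """Volumetric collection rate at the outlet, in uL/min, for each run.

    Assumes run i collected volumes_ul[i] microlitres in times_s[i] seconds.
    """
    if len(volumes_ul) != len(times_s):
        raise ValueError("volumes and times must have the same length")
    return [v / (t / 60.0) for v, t in zip(volumes_ul, times_s)]


def plot_rates(flow_rates: Sequence[float], volumes_ul: Sequence[float],
               times_s: Sequence[float]) -> None:
    """Plot collection rate against inlet flow rate (requires matplotlib)."""
    import matplotlib.pyplot as plt  # lazy import: the calculation works without it

    plt.plot(flow_rates, collection_rates(volumes_ul, times_s), "o-")
    plt.xlabel("Inlet flow rate")
    plt.ylabel("Collection rate (uL/min)")
    plt.show()


print(collection_rates([5.0, 10.0], [60.0, 30.0]))  # [5.0, 20.0]
```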
What does this approach give you? You end up with a class definition and a series of objects that contain the parameters of each experiment, how it was carried out, the results, and a means to reproduce the analyses. You can zip this up, upload it to figshare, and you’ve got a publicly accessible link to your data with a DOI.
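Bundling objects for upload could look like this in Python. This is a minimal sketch that assumes `pickle` for the objects (the Python counterpart of saving MATLAB objects to a `.mat` file), plus a plain-text CSV copy. The `Run` class and the file names are hypothetical. The upload to figshare is done through its website and is not shown.

```python
import csv
import io
import pickle
import tempfile
import zipfile
from dataclasses import asdict, dataclass, fields
from pathlib import Path
from typing import List


@dataclass
class Run:
    """Hypothetical minimal experiment record."""
    flow_rate: float
    volume: float
    time: float


def archive_experiments(runs: List[Run], zip_path: Path) -> None:
    """Write the objects (pickle) and a plain-text copy (CSV) into one zip file."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Binary copy: restores the full objects, but only for Python users.
        zf.writestr("experiments.pkl", pickle.dumps(runs))
        # Plain-text copy: readable by anyone, in any language.
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Run)])
        writer.writeheader()
        writer.writerows(asdict(r) for r in runs)
        zf.writestr("experiments.csv", buf.getvalue())


runs = [Run(10.0, 5.0, 60.0), Run(20.0, 10.0, 60.0)]
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "sepexp_data.zip"  # hypothetical archive name
    archive_experiments(runs, out)
    with zipfile.ZipFile(out) as zf:
        names = sorted(zf.namelist())
        restored = pickle.loads(zf.read("experiments.pkl"))
        header = zf.read("experiments.csv").decode("utf-8").splitlines()[0]

print(names)  # ['experiments.csv', 'experiments.pkl']
```

In practice you would also add the file containing the class definition to the zip (for example with `ZipFile.write`) so others can load the objects and rerun the analysis. Only unpickle archives from sources you trust.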
An OOP approach saves time when analysing data, because you define how the data is analysed once, in the class definition, and apply it repeatedly to every object/experiment. It’s easy to iterate through all your objects (see the scripts in the `/plotting_scripts` folder). Distributing the class definition and the objects together means others can reproduce your analysis. I think that’s pretty cool. If you’ve got MATLAB, download my archive and give it a go.
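Here is a Python sketch of the "define once, apply to every object" idea, using a hypothetical `SepExp` class and made-up illustrative numbers. The analysis lives in one method, and a short loop, much like a plotting script, applies it to every experiment.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SepExp:
    """Hypothetical minimal experiment object; the analysis lives on the class."""
    name: str
    volumes_ul: List[float]
    times_s: List[float]

    def collection_rates(self) -> List[float]:
        """Defined once, here; every instance reuses it. Returns uL/min."""
        return [v / (t / 60.0) for v, t in zip(self.volumes_ul, self.times_s)]


experiments = [
    SepExp("exp1", [5.0, 10.0], [60.0, 60.0]),
    SepExp("exp2", [3.0], [30.0]),
]

# Apply the same analysis to every object and collect the peak rate of each.
summary: Dict[str, float] = {}
for exp in experiments:
    summary[exp.name] = max(exp.collection_rates())

print(summary)  # {'exp1': 10.0, 'exp2': 6.0}
```

Changing the analysis means editing one method, and every experiment picks up the change the next time the loop runs.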
Or, even better, try it out with your next project. There are lots of online resources for learning OOP in your language of choice. The MATLAB OOP documentation is good (although I think MATLAB’s OOP syntax is horrible). Personally, I like books, and I first learnt about OOP in the excellent book Learning Python by Mark Lutz.
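[^overloading]: This is an example of function overloading: when called on a `sepexp` object, the class’s own `plot` method is used instead of MATLAB’s built-in `plot` function.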
[^binary]: The main disadvantage of these methods is that they save the data as binary objects. There are also security issues around opening pickle objects from untrusted sources. Therefore I recommend that when you come to publish your data, you also export it as ASCII, which is straightforward. See the `export_mat2csv.m` script in the figshare archive.