Open science with figshare and object-orientated programming

Update: I’m pleased to say that I was awarded Imperial’s Bradley-Mason Prize for Open Chemistry — see Professor Rzepa’s blog post for more info.

From 1st May 2015, the EPSRC requires that all publications include a statement saying how the underlying research data can be accessed. Technically, you can simply include an email address to contact for the data, but I think that’s hardly in the spirit of open science. In this post, I want to describe how I used object-orientated programming (OOP) and figshare to meet this requirement for my latest paper in Lab on a Chip. You can download the data and MATLAB code to reproduce the graphs at figshare.

Graphical abstract for my paper, Microscale extraction and phase separation using a porous capillary.

In OOP, you create classes that define objects and their properties. For example, if you had a class Animal, instances of this class could be cat and dog. For the Animal class the properties might be legs (an integer) or dateofbirth (a date). The class also defines methods, which are functions that operate on instances of a class. For example, Animal.age() might use the dateofbirth property to return the age of the animal.
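As a rough illustration, here is how the Animal example might look in Python. The birth dates are made up, and the optional `today` argument is my own addition so that the result doesn't depend on when you run it.

```python
from datetime import date


class Animal:
    """A minimal class with two properties and one method."""

    def __init__(self, legs, dateofbirth):
        self.legs = legs                # an integer
        self.dateofbirth = dateofbirth  # a datetime.date

    def age(self, today=None):
        """Return the age in whole years, computed from dateofbirth."""
        today = today or date.today()
        birthday = (self.dateofbirth.month, self.dateofbirth.day)
        # Subtract one year if this year's birthday has not happened yet.
        had_birthday = (today.month, today.day) >= birthday
        return today.year - self.dateofbirth.year - (0 if had_birthday else 1)


# cat and dog are two instances of the same class (made-up birth dates).
cat = Animal(legs=4, dateofbirth=date(2012, 6, 1))
dog = Animal(legs=4, dateofbirth=date(2009, 11, 23))
```

Calling `cat.age()` then works out the age from that particular instance's `dateofbirth`.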

For my paper I defined a class called sepexp (short for separation experiment, the subject of the paper) with properties corresponding to the independent and dependent variables. My class definition also included a method runall to run the experiments (which were, thankfully, automated—one of the joys of flow chemistry) and plot[^overloading] to plot the data.

To start an experiment, I create an instance of my sepexp class, say exp1, and specify the independent variables during its creation. Executing exp1.runall() runs all the experiments defined by those properties. The details aren’t relevant here (see the paper if you’re interested), but the key thing is that it saves the results in the properties mass_initial and mass_final.
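The real class is written in MATLAB and talks to lab hardware, so the following is only a Python sketch of the same shape. The names `SepExp`, `flow_rates`, `measure` and `fake_measure` are hypothetical, and the masses are invented. Only `runall`, `mass_initial` and `mass_final` come from the description above.

```python
class SepExp:
    """Sketch of a separation-experiment class (hypothetical analogue of sepexp)."""

    def __init__(self, flow_rates, measure):
        # Independent variable, one entry per run. The name is an assumption.
        self.flow_rates = list(flow_rates)
        # Stand-in for the automated rig: returns (mass_initial, mass_final).
        self._measure = measure
        # Dependent variables, filled in by runall().
        self.mass_initial = []
        self.mass_final = []

    def runall(self):
        """Run every experiment defined by the properties and store the results."""
        self.mass_initial, self.mass_final = [], []
        for rate in self.flow_rates:
            m0, m1 = self._measure(rate)
            self.mass_initial.append(m0)
            self.mass_final.append(m1)


def fake_measure(rate):
    """Placeholder for real hardware: made-up masses, for illustration only."""
    return 10.0, 10.0 + 2.0 * rate


exp1 = SepExp(flow_rates=[0.5, 1.0, 2.0], measure=fake_measure)
exp1.runall()
```

After `runall()`, the object carries both the settings and the results, which is what makes it worth saving as a single unit.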

Now that I’ve got an object that defines the experiment and contains the results, I can save it, e.g. using save in MATLAB or pickle in Python.[^binary]
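In Python, saving and reloading might look something like the sketch below, which also writes a plain-text CSV copy of the results. The `SepExp` stand-in, the file names and the values are all assumptions made for the example.

```python
import csv
import pickle
import tempfile
from pathlib import Path


class SepExp:
    """Tiny stand-in for an experiment object that already holds results."""

    def __init__(self, mass_initial, mass_final):
        self.mass_initial = mass_initial
        self.mass_final = mass_final


exp1 = SepExp(mass_initial=[10.0, 10.0], mass_final=[11.0, 12.0])  # made-up values

with tempfile.TemporaryDirectory() as tmp:
    # Binary save and reload of the whole object.
    pkl_path = Path(tmp) / "exp1.pkl"
    with open(pkl_path, "wb") as f:
        pickle.dump(exp1, f)
    with open(pkl_path, "rb") as f:
        restored = pickle.load(f)  # only unpickle files you trust

    # Plain-text export of the same results, one row per run.
    csv_path = Path(tmp) / "exp1.csv"
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["mass_initial", "mass_final"])
        writer.writerows(zip(exp1.mass_initial, exp1.mass_final))
    csv_text = csv_path.read_text()
```

The pickle file can only be read back by code that has the class definition to hand, whereas the CSV opens in anything.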

The next step is to plot it, so I execute exp1.plot(), which does a straightforward calculation on the data collected to get the volumetric collection rate at the outlet and plots it. I then repeat this for each experiment.
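To give a flavour of what such a method could look like, here is a Python sketch. The formula is my guess at a plausible calculation (mass gained divided by density and collection time), not necessarily the one used in the paper, and `flow_rates`, `density` and `duration` are hypothetical properties. The `plot` method needs matplotlib installed.

```python
class SepExp:
    """Stand-in experiment object with results already recorded."""

    def __init__(self, flow_rates, mass_initial, mass_final, density, duration):
        self.flow_rates = flow_rates      # hypothetical independent variable, mL/min
        self.mass_initial = mass_initial  # g
        self.mass_final = mass_final      # g
        self.density = density            # g/mL (assumed to be known)
        self.duration = duration          # min (assumed collection time per run)

    def collection_rate(self):
        """Volumetric collection rate per run in mL/min (assumed formula)."""
        return [
            (m1 - m0) / (self.density * self.duration)
            for m0, m1 in zip(self.mass_initial, self.mass_final)
        ]

    def plot(self):
        """Plot collection rate against flow rate; needs matplotlib installed."""
        import matplotlib.pyplot as plt  # imported lazily so the class works without it

        fig, ax = plt.subplots()
        ax.plot(self.flow_rates, self.collection_rate(), marker="o")
        ax.set_xlabel("Flow rate / mL per min")
        ax.set_ylabel("Collection rate / mL per min")
        return ax


# Made-up numbers, for illustration only.
exp1 = SepExp([0.5, 1.0], [10.0, 10.0], [12.0, 14.0], density=1.0, duration=2.0)
rates = exp1.collection_rate()
```

Keeping the calculation in its own method means the numbers behind the graph can be checked without drawing anything.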

What does this approach give you? You end up with a class definition and a series of objects that contain the parameters of each experiment, how it was carried out, the results, and a means to reproduce the analyses. You can zip this up, upload it to figshare, and you’ve got a publicly accessible link to your data with a DOI.

An OOP approach saves time when analysing data, because you define how the data is analysed once in the class definition, and apply it repeatedly to every object/experiment. It’s easy to iterate through all your objects (see the scripts in the /plotting_scripts folder). Distributing the class definition and the objects together means others can reproduce your analysis. I think that’s pretty cool. If you’ve got MATLAB, download my archive and give it a go.
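The "define once, apply to everything" idea can be sketched in a few lines of Python. This is not one of the scripts from the archive: the class, the `label` property, the `mass_collected` method and the data are all invented for the example.

```python
class SepExp:
    """Minimal stand-in: holds results and defines the analysis once."""

    def __init__(self, label, mass_initial, mass_final):
        self.label = label
        self.mass_initial = mass_initial
        self.mass_final = mass_final

    def mass_collected(self):
        """Mass gained at the outlet in each run."""
        return [m1 - m0 for m0, m1 in zip(self.mass_initial, self.mass_final)]


# Made-up data for two experiments; real objects would be loaded from saved files.
experiments = [
    SepExp("exp1", [10.0, 10.0], [11.0, 12.0]),
    SepExp("exp2", [5.0, 5.0], [5.5, 7.0]),
]

# The analysis is written once in the class and applied to every object.
summary = {exp.label: exp.mass_collected() for exp in experiments}
```

If the analysis changes, only the method changes, and the loop picks it up for every experiment.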

Or even better, try it out with your next project. There are lots of resources for learning OOP in your language of choice online. The MATLAB OOP documentation is good (although I think MATLAB’s OOP syntax is horrible). I personally like books and learnt about OOP for the first time in the excellent book Learning Python by Mark Lutz.

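[^overloading]: This is an example of function overloading: the class defines its own plot method, which MATLAB uses in place of the built-in plot function when it is called on a sepexp object.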

[^binary]: The main disadvantage of these methods is that they save the data as binary objects. There are also security issues around opening pickle objects from untrusted sources. Therefore I recommend that when you come to publishing your data you also export it as ASCII, which is straightforward. See the export_mat2csv.m script in the figshare archive.
