Open science with figshare and object-orientated programming

Update: I’m pleased to say that I was awarded Imperial’s Bradley-Mason Prize for Open Chemistry — see Professor Rzepa’s blog post for more info.

From 1st May 2015, the EPSRC requires that all publications include a statement saying how the underlying research data can be accessed. Technically, you can simply include an email address to contact for the data, but I think that’s hardly in the spirit of open science. In this post, I want to describe how I used object-orientated programming (OOP) and figshare to meet this requirement for my latest paper in Lab on a Chip. You can download the data and MATLAB code to reproduce the graphs at figshare.

Graphical abstract for Microscale extraction and phase separation using a porous capillary.
Graphical abstract for my paper.

In OOP, you create classes that define objects and their properties. For example, if you had a class Animal, instances of this class could be cat and dog. For the Animal class the properties might be legs (an integer) or dateofbirth (a date). The class also defines methods, which are functions that operate on instances of a class. For example, Animal.age() might use the dateofbirth property to return the age of the animal.
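To make that concrete, here is a minimal Python sketch of the Animal example. The property and method names follow the text; the optional today argument is my own addition so that the result doesn’t depend on the day you run it.

```python
from datetime import date


class Animal:
    """A toy class mirroring the Animal example in the text."""

    def __init__(self, legs, dateofbirth):
        self.legs = legs                # property: an integer
        self.dateofbirth = dateofbirth  # property: a date

    def age(self, today=None):
        """Return the age in whole years on `today` (defaults to the current date)."""
        today = today or date.today()
        # Take one year off if this year's birthday has not happened yet.
        had_birthday = (today.month, today.day) >= (
            self.dateofbirth.month,
            self.dateofbirth.day,
        )
        return today.year - self.dateofbirth.year - (0 if had_birthday else 1)


# Two instances of the same class, each with its own property values.
cat = Animal(legs=4, dateofbirth=date(2010, 6, 15))
dog = Animal(legs=4, dateofbirth=date(2012, 1, 3))

print(cat.age(today=date(2015, 5, 1)))  # prints 4
```

The method is written once in the class and works for every instance, which is the point of the approach.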

For my paper I defined a class called sepexp (short for separation experiment, the subject of the paper) with properties corresponding to the independent and dependent variables. My class definition also included a method runall to run the experiments (which were, thankfully, automated—one of the joys of flow chemistry) and plot to plot the data.

To start an experiment, I would create an instance of my sepexp class. For example, let’s call it exp1, and during its creation I specify the independent variables. Executing exp1.runall() runs all the experiments defined by my properties. The details aren’t relevant here—see the paper if you’re interested—but the key thing is that it saves the results in the properties mass_initial and mass_final.

Now that I’ve got an object that defines the experiment and contains the results, I can save it, e.g. using save in MATLAB or pickle in Python.

The next step is to plot it, so I execute exp1.plot(), which does a straightforward calculation on the data collected to get the volumetric collection rate at the outlet and plots it. I then repeated this for each experiment.
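The real class is MATLAB code and is in the figshare archive, but the workflow could look roughly like this in Python. Treat it as a sketch under assumptions rather than a translation: only runall, mass_initial and mass_final come from the description above, while flow_rates, density, duration, the weigh callback and the rate formula (mass gained, divided by density and collection time) are hypothetical stand-ins.

```python
class SepExp:
    """Hypothetical Python analogue of the sepexp class described above."""

    def __init__(self, flow_rates, density, duration):
        # Independent variables, fixed when the object is created (all assumed).
        self.flow_rates = list(flow_rates)  # mL/min
        self.density = density              # g/mL
        self.duration = duration            # minutes of collection per run
        # Dependent variables, filled in by runall().
        self.mass_initial = []
        self.mass_final = []

    def runall(self, weigh):
        """Run every experiment; `weigh` stands in for the automated rig."""
        for rate in self.flow_rates:
            before, after = weigh(rate, self.duration)
            self.mass_initial.append(before)
            self.mass_final.append(after)

    def collection_rates(self):
        """Assumed analysis: volumetric collection rate (mL/min) for each run."""
        return [
            (after - before) / self.density / self.duration
            for before, after in zip(self.mass_initial, self.mass_final)
        ]


def fake_balance(rate, duration):
    # Placeholder for real hardware: pretend everything pumped is collected
    # and that the liquid has a density of 1 g/mL.
    return 10.0, 10.0 + rate * duration


exp1 = SepExp(flow_rates=[0.5, 1.0, 2.0], density=1.0, duration=4.0)
exp1.runall(fake_balance)
print(exp1.collection_rates())  # prints [0.5, 1.0, 2.0]
```

A plot method would simply hand the output of collection_rates() to a plotting library. An object like exp1 can be archived with pickle.dump, provided the class definition is importable when it is loaded again, which is one reason to distribute the class and the objects together.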

What does this approach give you? You end up with a class definition and a series of objects that contain the parameters of each experiment, how it was carried out, the results, and a means to reproduce the analyses. You can zip this up, upload it to figshare, and you’ve got a publicly accessible link to your data with a DOI.

An OOP approach saves time when analysing data, because you define how the data is analysed once in the class definition, and apply it repeatedly to every object/experiment. It’s easy to iterate through all your objects (see the scripts in the /plotting_scripts folder). Distributing the class definition and the objects together means others can reproduce your analysis. I think that’s pretty cool. If you’ve got MATLAB, download my archive and give it a go.
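As a rough illustration of “define once, apply to everything”, the loop below runs one method over several objects. The Run class, its fields and the numbers are invented for the example; the scripts in the archive are MATLAB and will differ.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical stand-in for one saved experiment object."""
    label: str
    mass_initial: float  # g
    mass_final: float    # g
    duration: float      # min

    def mass_rate(self):
        """Mass collected per minute: the analysis, defined once."""
        return (self.mass_final - self.mass_initial) / self.duration


runs = [
    Run("exp1", 5.0, 8.0, 2.0),
    Run("exp2", 5.0, 9.0, 2.0),
    Run("exp3", 5.0, 11.0, 2.0),
]

# The same method is applied to every object, so there is no
# per-experiment analysis code to keep in sync.
summary = {run.label: run.mass_rate() for run in runs}
print(summary)  # prints {'exp1': 1.5, 'exp2': 2.0, 'exp3': 3.0}
```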

Or even better, try it out with your next project. There are lots of resources for learning OOP in your language of choice online. The MATLAB OOP documentation is good (although I think MATLAB’s OOP syntax is horrible). I personally like books, and first learnt about OOP from the excellent book Learning Python by Mark Lutz.

Light- and power-making things

Inspired by xkcd’s Up Goer Five comic, Theo Sanderson created the Up Goer Five Text Editor. It challenges you to explain a hard idea using only the “ten hundred” most commonly used words in the English language. Lots of scientists on Twitter have been using it to try and describe their work. It’s a lot harder than it sounds! Here’s my attempt:

Many years ago a few people were doing some work and, to their surprise, they managed to make light come out of something that had never had light come out of it before. People were very excited about it and now lots of groups of people spend their time trying to answer questions like “how does it work?” and “how can we make it work better?”. Everyone was interested because they thought it could be used to make new things like better TVs, very small computers and different kinds of lighting. But perhaps the most important thing it could maybe do was give us all a new way to turn light from the sun into power for not very much money.

At the moment only a few people get to see them because they are hard to make. They are hard to make for lots of reasons, but perhaps the biggest reason is that the parts you need are themselves hard to make. Everyone struggles to make enough of them exactly as they need them to be. If the parts aren’t good enough, sometimes not very much light comes out, or for only a little while, or the ones that turn light from the sun into power don’t do it very well. No one wants any of those.

It doesn’t help that the normal ways of making the parts are often only good enough for making a little at a time. If you try to make more in the same way it stops working so well. I’m part of a group of people trying to make the parts in a new way that can make lots and lots and it still be good enough. In fact, our stuff is usually better than the best stuff you can buy.

I try lots of different ways to make things. I look in books to read how other people did things to get new ideas that no one else has had before. Sometimes they don’t work, but sometimes they do and when that happens it makes me very excited and happy. Sometimes we tell everyone but sometimes we only tell a few people. We can use my new way to make the light-making and power-making things work better and for less money than ever before so everyone can have them.

What do you think?

Negative results and dodgy papers: keep quiet or publish?

Negative results are very rarely published in the literature. After all, the literature is bursting with new positive results and we don’t have enough time to read all of these, let alone papers describing what doesn’t work. Negative results are dull—who would want to read anything in the Journal of Negative Results?

Up until recently I haven’t had a problem with the status quo. I’m afraid the following discussion is a bit vague because I’m (still) not sure how much detail I can go into about my work, but please bear with me.

I came across a paper published this year which describes the effect of doing something quite specific in a synthesis on nanoparticle shape. Do the thing, get a particular nanoparticle shape (usually quite challenging to obtain); stop doing the thing, you get another shape (easy to obtain). I was quite excited because if it worked it would get around a major barrier to my desired nanoparticles.

I repeated the reaction exactly as the paper described, but it didn’t work.

I repeated the reaction in a flow reactor as it would make it easy to intensify the “thing”. According to the paper, this should definitely give the desired nanoparticles because the morphology selectivity/yield is directly proportional to the intensity of the “thing”. But it still didn’t work.

I’ve now given up on the reaction and moved on to something else. But the fact that my results will not be published means that someone else could also waste a lot of time and money (on equipment, reagents, electron microscopy) repeating the experiment.

What can I do? I think I have three options:

Option 1: Do nothing.

I’ve already made it clear that I don’t like this option. I’m fairly sure the paper is wrong. It bugs me that it exists without some kind of mark against it.

Option 2: Email the authors.

I’m not too keen on this either. I suspect that my email would be ignored. Plus, I would rather any discussion happened in the open, which brings me on to…

Option 3: Blog about it (and possibly email the authors telling them that I blogged about it).

I feel uneasy about this. Could it be perceived as confrontational? Would I get a reputation as a troublemaker? I feel like it is the proper, scientific and open thing to do, but in reality it is absolutely not the done thing. I suspect most researchers would go for option one and do nothing. I could be right and the paper is wrong, but I’d be very happy to be proven wrong and get the reaction working.

What do you think? Keep quiet, email or blog? Any other suggestions are welcome.

The death of my paper lab book?

Nature recently had a feature on the “paperless” lab which mostly focused on electronic laboratory notebooks (ELNs). As a computer nerd, I’ve been thinking about using one for a while.

ELNs have lots of advantages over paper notebooks. They’re searchable, easily backed up and can automatically incorporate data from instruments—no more cutting and pasting. Businesses like them as it’s easier to find out what an ex-employee did in an ELN than in loads of paper notebooks.

I’ve always used my department’s standard synthetic chemistry lab book, which has a risk assessment and reaction scheme on every left page and lines on every right. It works quite well. I number every reaction TWP001, TWP002, etc. and samples are labelled TWP001-A, TWP001-B, etc. Spectra follow a similar convention, e.g. TWP001-A_em_spec.txt or TWP001-A_abs_spec.txt, and all data and code used for data analysis is kept in a folder called TWP001_brief_description.
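A naming convention like this is easy for a script to pick apart. The Python sketch below assumes three-digit reaction numbers, a single capital letter per sample and a .txt extension, all inferred from the examples above; TWP001-B_em_spec.txt and lab_notes.txt are made-up file names.

```python
import re

# Reaction code, sample letter, then a free-form description of the spectrum.
PATTERN = re.compile(
    r"^(?P<reaction>TWP\d{3})-(?P<sample>[A-Z])_(?P<kind>\w+)\.txt$"
)


def parse_name(filename):
    """Split a spectrum file name into its parts, or return None if it doesn't fit."""
    match = PATTERN.match(filename)
    return match.groupdict() if match else None


files = [
    "TWP001-A_em_spec.txt",
    "TWP001-A_abs_spec.txt",
    "TWP001-B_em_spec.txt",  # hypothetical, follows the same convention
    "lab_notes.txt",         # hypothetical, does not follow it
]

# Group the kinds of spectra recorded for each sample.
by_sample = {}
for name in files:
    info = parse_name(name)
    if info is None:
        continue  # skip anything that isn't a spectrum file
    key = f"{info['reaction']}-{info['sample']}"
    by_sample.setdefault(key, []).append(info["kind"])

print(by_sample)
# prints {'TWP001-A': ['em_spec', 'abs_spec'], 'TWP001-B': ['em_spec']}
```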

But there are a few things that I really hate about paper lab books. Going back through my notes when writing up work is a real chore, especially with seemingly never-ending notes along the lines of “same as TWP050 except…”. Reaction TWP050 says: “same as TWP049 except…”. With an ELN you can just copy and paste.

The inherent linearity of a paper lab book is a pain. Entries are in chronological order and reactions are performed sequentially, one at a time, but I usually work on two or three reactions at a time. Leaving blank pages looks sloppy, but cramming notes into small gaps is messy.

The biggest problem is that paper notebooks have become incomplete records of research in the modern laboratory. A lab book should be a complete record of your thoughts, observations, measurements and results. However, with modern lab instrumentation it’s impractical or impossible to include all the data by printing, cutting and sticking it in. For example, a search on my computer (not a look in my lab book) reveals 510 UV-vis absorption, fluorescence and excitation spectra recorded since August 2010. There’s no way I could print that out (and even if I did, the data is useless in that format). Furthermore, a paper lab book can’t capture any of the data analysis on the computer. My MATLAB (and now Python) code is riddled with comments. With paper lab books, this information is highly fragmented.

Considering these problems, I’ve been looking at electronic alternatives for some time, but what I’ve disliked about them boils down to two things: inflexibility and how they handle data. They seem to try to fit everything into a particular template or form. With a paper lab book, I can write and draw whatever I want, which is important to me as I’m not a “normal” synthetic chemist: I work with flow reactors and I’m more interested in my residence time than yield.

I want to be able to access my plain text data files as plain text files and not have them converted into horrible proprietary binary formats subject to the whims of the ELN vendor. Think of the hassle caused when Microsoft switched from .doc to .docx; I don’t want this happening with my data. Plain text files from 30 years ago can still be read today and will be readable for longer than I’ll be alive. It also worries me that a web-based ELN could disappear and leave me with a load of horribly formatted files to wade through.

Researching online, I found advocates of open notebook science (the left-field practice of making your entire lab book and data available online as it is recorded) using blogs and wikis as ELNs. Cameron Neylon’s blog-like open lab book used the University of Southampton’s free LabTrove software. Lab book entries are like blog posts, with attachments for data, and you can organise posts using tags (e.g. “NMR”) or categories, perhaps to group posts related to a single reaction. Jean-Claude Bradley’s group notebook, called the UsefulChem Project, used a wiki. I really like Bradley’s wiki and there are lots of nice examples if you click about on the list of reactions. His group upload and link to spectra and photographs, giving a complete research record.

I did a bit more research into using a wiki for an ELN and they seem to be the perfect match. They’re flexible in terms of organising data however I want and pages are versioned so you can see what was written when. There are loads of different wiki applications available, so I narrowed the possibilities down with the following criteria:

  • active development
  • proven large scale deployment for stability and reliability
  • open source and free
  • page access control
  • supports attachments
  • self-hosted because I don’t trust anyone
  • written in a nice programming language
  • stores data nicely, i.e. not binary formats

This boiled down to MediaWiki (runs Wikipedia), FosWiki (used for loads of corporate intranets) and MoinMoin (large scale deployments are the Apache Software Foundation, Python and Ubuntu wikis).

MediaWiki doesn’t handle attachments very well for ELNs since attachments are available globally, i.e. across the whole wiki at the top level rather than being linked to individual pages. The latter makes more sense to me as spectra or photos (the attachment) are related to the experiment (the page) rather than the whole notebook (the wiki). MediaWiki is designed for open content, so it doesn’t do access control without dodgy extensions. It’s also written in PHP, which I have no intention of learning. So that’s MediaWiki struck off.

FosWiki is aimed at corporations, which I think you can tell from its look and feature list. It’s also written in Perl, which I really don’t want to learn. So that’s FosWiki gone.

Last is MoinMoin. Unlike MediaWiki, attachments are linked to pages. MoinMoin is written in Python, a really nice language I’ve started to use instead of MATLAB, so there’s the possibility of writing my own extensions. It’s currently at version 1.9.4, so it should be very stable, and version 2.0 is under active development. It’s very clean and tidy.

I spoke to my supervisor about an ELN and he was extremely keen, so I’ve decided to give MoinMoin a go. I’ve installed it on a Linode virtual server running Ubuntu Linux.[^VPS] It took about 6 hours to install the whole server from scratch, which is not bad having never administered a server before! Initially I was a little worried about security, with data being on an internet server, but I’ve locked down the server pretty tight and am going to make off-site backups to my office machine. If anyone is interested, I’ll write up how to set it up.

It would be cool to make MoinMoin chemically savvy, perhaps by pulling in data from ChemSpider or Wolfram Alpha, or COSHH info from Sigma-Aldrich. I think this could be done with a little Python scripting. I’ll open source anything good for others to use. I’m also planning on setting up an old scanner in the lab to upload paper drawings.

This could all prove to be an embarrassing experiment or even a complete nightmare, ending with me dusting off my most recent lab book and finding a pen. On the other hand, it could be great. We’ll have to wait and see!

[^VPS]: I could have installed it on a dedicated machine in the office, but we’re a bit short on machines and I didn’t want to have to deal with hardware.

MSci Project Part 1: Quantum Dots

I don’t start my PhD until October so I won’t be posting much about it for a couple of months. In the meantime, I thought it would be nice to talk about what I did for my final year research project as part of my MSci degree.

The aim was to synthesise (core-shell and ternary) quantum dots using microfluidic reactors. It sounds complicated, but really it’s quite straightforward! An explanation of it all in one post would be rather long so I’m going to break it down into two posts, starting with quantum dots and then moving on to microfluidic reactors.

What are Quantum Dots?

Quantum dots are nanoparticles—particles only a few billionths of a metre in size—made from semiconductors. Semiconductors are materials whose electrical conductivity is midway between that of insulators and conductors. They are the foundation of modern electronics and without them we wouldn’t have components like transistors and diodes which are essential building blocks of the technology we use every day.

All materials have particular physical properties—such as the melting point or density—that are independent of how much of the material you have. For example, if you measured the melting point of a material, cut it in half, then remeasured the melting point, the melting point would not change. Properties like these are called intensive properties.

Imagine you had a piece of semiconductor and repeatedly measured an intensive property, such as melting point, then cut it in half. You would expect intensive properties to stay the same, regardless of the amount of material. However, if you carried on doing this for quite some time, until your semiconductor was just a few billionths of a metre across, you would find that its properties would start to change: properties which were intensive become dependent on the size of the particle. Chemists can take advantage of this phenomenon to tune the properties of semiconductors for particular applications by controlling the particle size.

Making Quantum Dots

Rather than breaking down macro- or microscopic bits of semiconductor to make nanoparticles (“top-down”), chemists usually make quantum dots from individual atoms (“bottom-up”). This is most commonly achieved by injecting the appropriate reagents into a hot solvent. The quantum dots spontaneously form in the hot solvent and are left to grow to the desired size.

The photo below is of some cadmium selenide quantum dots that I made last year. I think it’s a wonderful example of their size-dependent properties.

CdSe Quantum Dots
CdSe quantum dots fluorescing under UV light.

Each vial contains quantum dots that were removed from the reaction vessel at regular intervals. The vial on the far left-hand side contains quantum dots grown for 30 seconds and the vial on the far right-hand side contains quantum dots grown for 3 hours. The mean sizes of the particles grown for 30 seconds and 3 hours were 2.8 nm and 4.2 nm respectively, so the nanoparticle size increases from left to right.

The colour arises from a process called fluorescence. The vials are sat on top of an ultraviolet lamp which causes the quantum dots to fluoresce and emit light, the wavelength of which is dependent on the size of the quantum dots.
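For anyone who wants to see the size dependence as a formula, a common textbook estimate is the Brus equation. I didn’t use it for this project; it comes from a simple effective-mass picture and is only a rough guide, especially for very small dots. The symbols are the usual ones: R is the particle radius, E_g is the band gap of the bulk semiconductor, m_e^* and m_h^* are the effective masses of the electron and hole, and \varepsilon_r is the relative permittivity of the material.

```latex
% Approximate energy of the lowest excited state of a dot of radius R
E(R) \approx E_g
  + \frac{\hbar^2 \pi^2}{2 R^2}\left(\frac{1}{m_e^*} + \frac{1}{m_h^*}\right)
  - \frac{1.8\, e^2}{4 \pi \varepsilon_0 \varepsilon_r R}

% Corresponding wavelength of light
\lambda = \frac{h c}{E(R)}
```

The second term grows as 1/R^2 as the particle shrinks, so smaller dots have a larger energy gap and emit light of a shorter wavelength (towards the blue), while larger dots emit at longer wavelengths (towards the red).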

These unique optical properties make quantum dots very attractive for use in solar cells, displays and even in medical imaging. The trouble is that high-quality quantum dots are quite tricky to make, especially on an industrial scale. In part 2, I’ll talk a bit more about the applications of quantum dots, what microfluidics is and why it’s great for making quantum dots. If anyone has any questions, please don’t hesitate to ask!