Category Archives: Science

Paper submitted to Recomputability 2014: “Share and Enjoy”: Publishing Useful and Usable Scientific Models

Last month, I submitted a paper with Ben Hall, Samin Ishtiaq and Kenji Takeda (all Microsoft Research) to Recomputability 2014, to be held in conjunction with the 7th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2014) in London in December. This workshop is an interdisciplinary forum for academic and industrial researchers, practitioners and developers to discuss challenges, ideas, policy and practical experience in reproducibility, recomputation, reusability and reliability across utility and cloud computing. It aims to provide an opportunity to share and showcase best practice, as well as offering a platform to further develop policy, initiatives and practical techniques for researchers in this domain.

In our paper, we discuss a number of issues in this space, proposing a new open platform for the sharing and reuse of scientific models and benchmarks. You can download our arXiv pre-print; the abstract is as follows:

The reproduction and replication of reported scientific results is a hot topic within the academic community. The retraction of numerous studies from a wide range of disciplines, from climate science to bioscience, has drawn the focus of many commentators, but there exists a wider socio-cultural problem that pervades the scientific community. Sharing data and models often requires extra effort, and this is currently seen as a significant overhead that may not be worth the time investment.

Automated systems, which allow easy reproduction of results, offer the potential to incentivise a culture change and drive the adoption of new techniques to improve the efficiency of scientific exploration. In this paper, we discuss the value of improved access and sharing of the two key types of results arising from work done in the computational sciences: models and algorithms. We propose the development of an integrated cloud-based system underpinning computational science, linking together software and data repositories, toolchains, workflows and outputs, providing a seamless automated infrastructure for the verification and validation of scientific models and in particular, performance benchmarks.

 
(see GitHub repo)


Paper submitted to WSSSPE2: “Can I Implement Your Algorithm?”: A Model for Reproducible Research Software

Yesterday, I submitted a paper with Ben Hall and Samin Ishtiaq (both Microsoft Research Cambridge) to WSSSPE2, the 2nd Workshop on Sustainable Software for Science: Practice and Experiences, to be held in conjunction with SC14 in New Orleans in November. As per the aims of the workshop: progress in scientific research is dependent on the quality and accessibility of software at all levels and it is critical to address challenges related to the development, deployment and maintenance of reusable software as well as education around software practices.

As discussed in our paper, we feel that these research software engineering problems are manifest not just in computer science, but also across the computational science and engineering domains (particularly with regard to benchmarking and availability of code). We highlight a number of recommendations to address these issues, and propose a new open platform for scientific software development. You can download our arXiv pre-print; the abstract is as follows:

The reproduction and replication of novel scientific results has become a major issue for a number of disciplines. In computer science and related disciplines such as systems biology, the issues closely revolve around the ability to implement novel algorithms and approaches. Taking an approach from the literature and applying it in a new codebase frequently requires local knowledge missing from the published manuscripts and project websites. Alongside this issue, benchmarking, and the development of fair, and widely available benchmark sets present another barrier. In this paper, we outline several suggestions to address these issues, driven by specific examples from a range of scientific domains. Finally, based on these suggestions, we propose a new open platform for scientific software development which effectively isolates specific dependencies from the individual researcher and their workstation and allows faster, more powerful sharing of the results of scientific software engineering.

 
(see GitHub repo)


Call for Papers: Recomputability 2014

I am co-chairing Recomputability 2014, the first workshop to focus explicitly on recomputability and reproducibility in the context of utility and cloud computing. It is open to all members of the cloud, big data, grid, cluster computing and open science communities. Recomputability 2014 is an affiliated workshop of the 7th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2014), to be held in London in December 2014.

Recomputability 2014 will provide an interdisciplinary forum for academic and industrial researchers, practitioners and developers to discuss challenges, ideas, policy and practical experience in reproducibility, recomputation, reusability and reliability across utility and cloud computing. It will provide an opportunity to share and showcase best practice, as well as to provide a platform to further develop policy, initiatives and practical techniques for researchers in this domain. Participation by early career researchers is strongly encouraged.

Proposed topics of interest include (but are not limited to):

  • infrastructure, tools and environments for recomputability and reproducibility in the cloud;
  • recomputability for virtual machines;
  • virtual machines as self-contained research objects or demonstrators;
  • describing and cataloging cloud setups;
  • the role of community/open access experimental frameworks and repositories for virtual machines and data, their operation and sustainability;
  • validation and verification of experimental results by the community;
  • sharing and publication issues;
  • recommending policy changes for recomputability and reproducibility;
  • improving education and training: best practice, novel uses, case studies;
  • encouraging industry’s role in recomputability and reproducibility.

Please see the full call for papers; the deadline for submissions (online via EasyChair) is 17 August 2014 (extended from 10 August 2014).


The Economic Significance of the UK Science Base

A new independent report for the Campaign for Science & Engineering (CaSE) published today shows that investing public money in science and engineering is good for the economy. The Economic Significance of the UK Science Base examines the economic impact of public investment in the UK science base.


The report looks in detail at the relationship between public funding of science and engineering and three levels of economic activity: total factor productivity growth in industries; ability of universities to attract external income; and interaction between individual researchers and the wider economy.

The report shows that, at the level of industries, universities and individual researchers, public investment in science and engineering leads to economic growth. CaSE is thus calling for current and future governments to recognise that public spending on science and engineering is an investment with significant benefits for the economy and society.

The report was written by Professor Jonathan Haskel (Imperial College Business School), Professor Alan Hughes and Dr Elif Bascavusoglu-Moreau (both University of Cambridge). It was funded by a consortium of six CaSE members: British Pharmacological Society, The Geological Society, The Institution of Engineering and Technology, Institute of Physics, Royal Society of Chemistry and Society of Biology.

Read the full report or the key messages from the two page briefing note.

(N.B. I sit on the board of directors of CaSE)


Inquiry into STEM skills in Wales

The National Assembly for Wales’ Enterprise and Business Committee is undertaking a follow-up Inquiry into STEM skills, after the publication of a report on the STEM agenda in Wales in January 2011. The terms of reference for this consultation are as follows:

  • What impact has the Welsh Government’s strategy Science for Wales and Delivery Plan had on STEM skills in Wales?
  • What progress has been made in addressing the issues identified in the Enterprise and Learning Committee’s 2011 inquiry into the STEM agenda, including:
    • The adequacy of provision of STEM skills in schools, further education colleges, higher education and work-based learning (including apprenticeships);
    • Value for money from the additional funding to support and promote STEM skills and whether the current supply of STEM skills is meeting the needs of the Welsh labour market;
    • The supply of education professionals able to teach STEM subjects and the impact of Initial Teacher Training Grants and the Graduate Teacher Programme on recruiting STEM teachers and education professionals;
    • The effectiveness of education and business links between education institutions and STEM employers.
  • Whether any progress has been made on addressing negative perceptions and gender stereotypes of STEM and promoting good practice to encourage women to acquire STEM skills and to follow STEM related careers.
  • What progress has been made on learning STEM skills through Welsh medium education and training?

See the full consultation; the Committee welcomes responses from both individuals and organisations, with a deadline of Friday 25 April 2014.


Embrace logic

Let him who is not come to logic be plagued with continuous and everlasting filth.

Metalogicon II (1159)
John of Salisbury (1120-1180)


Is the Universe a simulation?

From an article by Edward Frenkel in today’s New York Times:

Many mathematicians, when pressed, admit to being Platonists. The great logician Kurt Gödel argued that mathematical concepts and ideas “form an objective reality of their own, which we cannot create or change, but only perceive and describe”. But if this is true, how do humans manage to access this hidden reality?

We don’t know. But one fanciful possibility is that we live in a computer simulation based on the laws of mathematics — not in what we commonly take to be the real world. According to this theory, some highly advanced computer programmer of the future has devised this simulation, and we are unknowingly part of it. Thus when we discover a mathematical truth, we are simply discovering aspects of the code that the programmer used.

 
This hypothesis is by no means new; in Are You Living in a Computer Simulation?, Nick Bostrom argues that at least one of the following propositions is true:

  1. the human species is very likely to go extinct before reaching a “posthuman” stage;
  2. any posthuman civilisation is extremely unlikely to run a significant number of simulations of their evolutionary history (or variations thereof);
  3. we are almost certainly living in a computer simulation.

Also see: Constraints on the Universe as a Numerical Simulation.


This Article Should Not Be Rejected

In 1990, Spanish philosopher Jon Perez Laraudogoitia submitted an article to the journal Mind entitled “This Article Should Not Be Rejected by Mind”. In it, he argued:

  1. If statement 1 in this argument is trivially true, then this article should be accepted.
  2. If statement 1 were false, then its antecedent (“statement 1 in this argument is trivially true”) would be true, which means that statement 1 itself would be true, a contradiction. So statement 1 must be true.
  3. But that seems wrong, since Mind is a serious journal and shouldn’t publish trivial truths.
  4. That means statement 1 must be either false or a non-trivial truth. We know it can’t be false (#2), so it must be a non-trivial truth, and its antecedent (“statement 1 in this argument is trivially true”) is false.
  5. What then is the truth value of its consequent, “this article should be accepted”? If this were false then Mind shouldn’t publish the article; that can’t be right, since the article consists of a non-trivial truth and its justification.
  6. So the consequent must be true, and Mind should publish the article.

They published it. “This is, I believe, the first article in the whole history of philosophy the content of which is concerned exclusively with its own self, or, in other words, which is totally self-referential”, Laraudogoitia wrote. “The reason why it is published is because in it there is a proof that it should not be rejected and that is all”.

(reblogged from Futility Closet)


Grant applications, early 20th century style


Facsimile of a research proposal submitted by Otto Warburg to the Notgemeinschaft der Deutschen Wissenschaft (Emergency Association of German Science), c.1921.

The application, which consisted of a single sentence, “I require 10,000 marks”, was funded in full.

(read the full Nature Reviews Cancer article)


52 things to know about policy, science and the public

There has been a flurry of articles of late listing important things that scientists, politicians and the public should know about each other. I am collating them here because I enjoyed each of the pieces and think it likely that I (or others) will want to consult them in the future.

First to appear was a piece in Nature in November by William Sutherland, David Spiegelhalter and Mark Burgman — Policy: Twenty tips for interpreting scientific claims (which mutated into the Top 20 things politicians need to know about science when reported in the Guardian).

In reply a couple of weeks later Chris Tyler, the Director of the Parliamentary Office of Science and Technology, listed his Top 20 things scientists need to know about policy-making.

Just two days after Tyler’s post, Roland Jackson of Sciencewise, a programme devoted to fostering public discourse about science policy, sought to remind both scientists and policy makers about the general public by enumerating 12 things policy-makers and scientists should know about the public.

Whilst the three lists offer a range of perspectives, there is certainly overlap between the 52 points; I recommend each of them as a worthwhile read for anyone interested in the intersection of science, people and policy.


2014 Software Sustainability Institute Fellowship


I’m delighted to have been named today as one of the sixteen Software Sustainability Institute Fellows for 2014.

The Software Sustainability Institute (SSI) is an EPSRC-funded project based at the universities of Edinburgh, Manchester, Oxford and Southampton, and draws on a team of experts with a breadth of experience in software development, project and programme management, research facilitation, publicity and community engagement. It’s a national facility for cultivating world-class research through software, whose goal is to make it easier to rely on software as a foundation of research; see their manifesto. The SSI works with researchers, developers, funders and infrastructure providers to identify the key issues and best practice surrounding scientific software.

During my fellowship, I’m particularly keen to work closely with Software Carpentry and Mozilla Science Lab to highlight the importance of software skills across the STEM disciplines. I’m also interested in a broader open science/open computation agenda; see the Recomputation Manifesto and the recently established recomputation.org project.

More to follow in 2014!


Colloquial definitions of Big, Open and Personal Data

Here’s a useful (draft) set of colloquial definitions for Big, Open and Personal Data on GitHub from the Open Data Institute.

Why is this a worthwhile exercise? Well, Open Data gets conflated with Personal Data, everyone talks about Big Data (yet no-one is exactly sure what it is, but many have tried to define it)…and we all should be concerned about Personal Data.


1. Big Data is (i) data that you cannot handle with conventional tools or (ii) a term used as a vague metaphor for solving problems with data.

2. Open Data is data that anyone can use; without legal, technical or financial barriers.

3. Personal Data is data derived from people, where you can distinguish a person from other people in the group.

(also, can Big Open Personal (BOP) Data exist?)


Science rules of thumb


If an elderly but distinguished scientist says that something is possible he is almost certainly right, but if he says that it is impossible he is very probably wrong.

Arthur C. Clarke

 

When, however, the lay public rallies around an idea that is denounced by distinguished but elderly scientists and supports that idea with great fervor and emotion — the distinguished but elderly scientists are then, after all, probably right.

Isaac Asimov

(reblogged from Futility Closet)


Ten Simple Rules for Reproducible Computational Research

In a paper published last week in PLoS Computational Biology, Sandve, Nekrutenko, Taylor and Hovig highlight the issue of replication across the computational sciences. The dependence on software libraries, APIs and toolchains, coupled with massive amounts of data, interdisciplinary approaches and the increasing complexity of the questions being asked, is complicating replication efforts.

To address this, they present ten simple rules for reproducibility of computational research:
 

Rule 1: For Every Result, Keep Track of How It Was Produced

Rule 2: Avoid Manual Data Manipulation Steps

Rule 3: Archive the Exact Versions of All External Programs Used

Rule 4: Version Control All Custom Scripts

Rule 5: Record All Intermediate Results, When Possible in Standardized Formats

Rule 6: For Analyses That Include Randomness, Note Underlying Random Seeds

Rule 7: Always Store Raw Data behind Plots

Rule 8: Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected

Rule 9: Connect Textual Statements to Underlying Results

Rule 10: Provide Public Access to Scripts, Runs, and Results
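To make a few of these rules concrete, here is a minimal sketch of how Rules 1, 3 and 6 might look in practice for a small Python analysis. It is not taken from the paper: the function names `run_analysis` and `provenance_record`, the toy analysis itself and the choice of fields in the record are all assumptions made for illustration.

```python
import json
import platform
import random
import sys


def run_analysis(seed):
    """Toy stochastic analysis (hypothetical): mean of 1000 pseudo-random draws."""
    rng = random.Random(seed)  # Rule 6: use an explicit seed that can be noted down
    draws = [rng.random() for _ in range(1000)]
    return sum(draws) / len(draws)


def provenance_record(seed, result):
    """Gather details of how a result was produced (Rules 1 and 3)."""
    return {
        "seed": seed,
        "result": result,
        "python_version": platform.python_version(),  # exact interpreter version
        "platform": platform.platform(),              # operating system details
        "command": sys.argv,                          # how the script was invoked
    }


if __name__ == "__main__":
    seed = 2013  # arbitrary value chosen for this illustration
    result = run_analysis(seed)
    # Store the record next to the result so the run can be repeated later.
    print(json.dumps(provenance_record(seed, result), indent=2))
```

Re-running with the same seed on the same Python version should give the same result, and the JSON record could be kept under version control alongside the script (Rule 4). A real project would probably also want to record the versions of any third-party libraries used.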


The rationale underpinning these rules clearly resonates with the work of the Software Sustainability Institute: better science through superior software. Based at the universities of Edinburgh, Manchester, Oxford and Southampton, it is a national facility for cultivating world-class research through software (for example, Software Carpentry). An article that caught my eye in July was the Recomputation Manifesto: computational experiments should be recomputable for all time. In light of the wider open data and open science agenda, should we also be thinking about open software and open computation?


Winchester Science Festival 2013

Yesterday I spoke at the 2013 Winchester Science Festival, a fantastic weekend of science communication and science education with some excellent speakers. My talk was entitled “Computing: The Science of Nearly Everything” (slides), which attempted to reset the perception of computer science: highlighting the importance of computer science education (in particular the wide utility of programming) and how modern science and engineering increasingly leverages computation.

Précis: We have seen how computational techniques have moved on from assisting scientists in doing science, to transforming both how science is done and what science is done (also see this Royal Society report). Thus, perhaps we should value the increasingly cross-cutting and interdisciplinary field of computer science, as well as computational literacy from school through to postgraduate research skills training.

Dr Tom Crick opening slide


(you can also see other photos from the 2013 Winchester Science Festival, including me doing silly gestures)


Delivering a Digital Wales

Next week I will be speaking at Digital 2013, a headline Welsh Government event highlighting the importance of the ICT sector in Wales. In preparation for the event, I was interviewed to discuss the “Digital 2013 Opportunity”, especially with the ongoing ICT review in Wales, as well as broader science, technology and innovation policy.

 


All scientific knowledge

If, in some cataclysm, all scientific knowledge were to be destroyed, and only one sentence passed on to the next generation of creatures, what statement would contain the most information in the fewest words? I believe it is the atomic hypothesis (or atomic fact, or whatever you wish to call it) that all things are made of atoms — little particles that move around in perpetual motion, attracting each other when they are a little distance apart, but repelling upon being squeezed into one another. In that one sentence you will see an enormous amount of information about the world, if just a little imagination and thinking are applied.

The Feynman Lectures on Physics, Vol. I (1964)
Richard Feynman


This is not yet a scientific age

Is no one inspired by our present picture of the universe? This value of science remains unsung by singers, you are reduced to hearing not a song or poem, but an evening lecture about it.

This is not yet a scientific age.

What Do You Care What Other People Think? (1988)
Richard Feynman


Science, the knowledge of consequences

Science is the knowledge of consequences, and dependence of one fact upon another.

Leviathan or The Matter, Forme and Power of a Common Wealth Ecclesiasticall and Civil
Thomas Hobbes (1588-1679)
