Posts Tagged ‘Software’
In a paper published last week in PLoS Computational Biology, Sandve, Nekrutenko, Taylor and Hovig highlight the issue of replication across the computational sciences. Dependence on software libraries, APIs and toolchains, coupled with massive amounts of data, interdisciplinary approaches and the increasing complexity of the questions being asked, is complicating replication efforts.
To address this, they present ten simple rules for reproducibility of computational research:
Rule 1: For Every Result, Keep Track of How It Was Produced
Rule 2: Avoid Manual Data Manipulation Steps
Rule 3: Archive the Exact Versions of All External Programs Used
Rule 4: Version Control All Custom Scripts
Rule 5: Record All Intermediate Results, When Possible in Standardized Formats
Rule 6: For Analyses That Include Randomness, Note Underlying Random Seeds
Rule 7: Always Store Raw Data behind Plots
Rule 8: Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
Rule 9: Connect Textual Statements to Underlying Results
Rule 10: Provide Public Access to Scripts, Runs, and Results
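To make a few of these rules concrete, here is a minimal Python sketch of what Rules 1, 3, 6 and 7 might look like for a toy analysis. It is an illustration only and is not taken from the paper: the function names run_analysis and build_record, the record fields and the seed value are all hypothetical, and a real project would also record the versions of every external library and tool it uses, not just the interpreter.

```python
import json
import platform
import random
import sys
from datetime import datetime, timezone


def run_analysis(seed, n=5):
    """Toy stochastic analysis (hypothetical): draw n pseudo-random numbers."""
    rng = random.Random(seed)  # Rule 6: the seed is explicit, never implicit
    return [rng.random() for _ in range(n)]


def build_record(seed, n=5):
    """Bundle a result with the information needed to regenerate it."""
    values = run_analysis(seed, n)
    return {
        "created": datetime.now(timezone.utc).isoformat(),
        "command": list(sys.argv),                     # Rule 1: how it was produced
        "python_version": platform.python_version(),   # Rule 3: interpreter only here
        "seed": seed,                                  # Rule 6: note the random seed
        "n": n,
        "values": values,                              # Rule 7: raw data behind a plot
    }


if __name__ == "__main__":
    # Print the record as JSON; in practice it would be saved next to the figure.
    print(json.dumps(build_record(seed=12345), indent=2))
```

Because the seed is stored with the output, calling run_analysis again with the same seed, under the same Python version, should regenerate the same values, and the stored numbers remain available even if the plot itself is lost.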
The rationale underpinning these rules clearly resonates with the work of the Software Sustainability Institute: better science through superior software. Based at the universities of Edinburgh, Manchester, Oxford and Southampton, it is a national facility for cultivating world-class research through software (for example, Software Carpentry). An article that caught my eye in July was the Recomputation Manifesto: computational experiments should be recomputable for all time. In light of the wider open data and open science agenda, should we also be thinking about open software and open computation?
I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn’t resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years. In recent years, a number of program analysers like PREfix and PREfast in Microsoft have been used to check references, and give warnings if there is a risk they may be null. More recent programming languages like Spec# have introduced declarations for non-null references. This is the solution, which I rejected in 1965.
Tony Hoare (speaking at QCon London 2009)
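The idea of declaring which references may be null can be sketched in Python using type hints. This is only a loose analogue of the non-null declarations mentioned in the quote: Python does not enforce the annotations at runtime, so the checking depends on running a static type checker such as mypy, and the Account class and email_domain function below are hypothetical names invented for the example.

```python
from typing import Optional


class Account:
    """Hypothetical record with one required and one optional field."""

    def __init__(self, owner: str, email: Optional[str] = None) -> None:
        self.owner = owner  # annotated as str: a type checker rejects None here
        self.email = email  # annotated as Optional[str]: None is allowed


def email_domain(account: Account) -> str:
    """Return the e-mail domain, or a placeholder when no address is recorded."""
    if account.email is None:
        # The "null" case is handled explicitly. Without this check, a static
        # checker would flag the .split() call below as possibly applied to None.
        return "(no email)"
    return account.email.split("@")[-1]


if __name__ == "__main__":
    print(email_domain(Account("Ada", "ada@example.org")))
    print(email_domain(Account("Bob")))
```

The design point is that the possibility of a missing value is written into the declaration, so a tool can find the unchecked uses before the program runs.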
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.