## Come and work with me: KTP Associate in Big Social Data Analytics

Fancy working with me on a Knowledge Transfer Partnership (KTP) project in collaboration with Coup Media (funded by Innovate UK with support from the Welsh Government)?

A KTP Associate position is available to develop an adaptable social media analytics engine and associated framework for the film and media industry to capture consumer insight, marketing perceptions, sentiments, trends and rankings using big social media datasets. With the explosion of social networking, there is a clear correlation between box office takings and sentiments, opinions and perceptions expressed in the public domain on social media platforms. This project aims to leverage this by developing an extensible and adaptable social media sentiment engine using big social datasets (initially targeting Twitter) to rank movies by opinion, informing industry marketing decisions and providing commercially valuable insight into the public’s emerging movie tastes and selections.

This is an 11-month position, with a pro-rata salary of £21,000. For informal enquiries, please drop me an email: tcrick@cardiffmet.ac.uk; further information and how to apply can be found on jobs.ac.uk and the Cardiff Met website.

Deadline for applications: Friday 19 June.

## New paper: “Top Tips to Make Your Research Irreproducible”

It is an unfortunate convention of science that research should pretend to be reproducible; we have noticed (and contributed to) a number of manifestos, guides and top tips on how to make research reproducible, but we have seen very little published on how to make research irreproducible.

Irreproducibility is the default setting for all of science, and irreproducible research is particularly common across the computational sciences (for example, here and here). The study of making your work irreproducible without reviewers complaining is a much neglected area; we feel therefore that by encapsulating our top tips on irreproducibility, we will be filling a much-needed gap in the domain literature. By following our tips, you can ensure that if your work is wrong, nobody will be able to check it; if it is correct, you can make everyone else do disproportionately more work than you to build upon it. Our top tips will also help you salve the conscience of certain reviewers still bound by the fussy conventionality of reproducibility, enabling them to enthusiastically recommend acceptance of your irreproducible work. In either case you are the beneficiary.

1. Think “Big Picture”. People are interested in the science, not the experimental setup, so don’t describe it.
2. Be abstract. Pseudo-code is a great way of communicating ideas quickly and clearly while giving readers no chance to understand the subtle implementation details that actually make it work.
3. Short and sweet. Any limitations of your methods or proofs will be obvious to the careful reader, so there is no need to waste space on making them explicit.
4. The deficit model. You’re the expert in the domain, only you can define what algorithms and data to run experiments with.
5. Don’t share. Doing so only makes it easier for other people to scoop your research ideas, understand how your code actually works instead of why you say it does, or worst of all to understand that your code doesn’t work at all.

Read the full version of our high-impact paper on arXiv.

## 2015 EAPLS Board member elections

EAPLS, the European Association for Programming Languages and Systems, aims to stimulate research in the area of programming languages and systems. Formally inaugurated in 1996, it provides a forum for researchers across the domain, working with related organisations and industry to initiate scientific events and stimulate the exchange of ideas, as well as raising funds, organising conferences and divesting financial support.

I’m standing in the 2015 EAPLS Board elections (current Board members); I believe there is a significant opportunity to rejuvenate the activities of EAPLS and raise its profile: building networks for early career researchers, sponsoring new events/initiatives, engaging with the major conferences and journals in our field, encouraging improved knowledge transfer activities with industry, as well as raising the profile of the wider research areas in both UK and EU funding streams. We can also be more active in the policy space, by highlighting the educational and economic impact of the wider research areas of programming languages and systems.

You can view my full election statement; all EAPLS members (free to join) are eligible to vote, with the election open until 15 April 2015.


## Paper submitted to CAV 2015: “Dear CAV, We Need to Talk About Reproducibility”

Today, Ben Hall (Cambridge), Samin Ishtiaq (Microsoft Research) and I submitted a paper to CAV 2015, the 27th International Conference on Computer Aided Verification, to be held in San Francisco in July. CAV is dedicated to the advancement of the theory and practice of computer-aided formal analysis methods for hardware and software systems; the conference covers the spectrum from theoretical results to concrete applications, with an emphasis on practical verification tools and the algorithms and techniques that are needed for their implementation.

In this paper we build upon our recent work, highlighting a number of key issues relating to reproducibility and how they impact on the CAV (and wider computer science) research community, proposing a new model and workflow to encourage, enable and enforce reproducibility in future instances of CAV. We applaud the CAV Artifact Evaluation process, but we need to do more. You can download our arXiv pre-print; the abstract is as follows:

How many times have you tried to re-implement a past CAV tool paper, and failed?

Reliably reproducing published scientific discoveries has been acknowledged as a barrier to scientific progress for some time but there remains only a small subset of software available to support the specific needs of the research community (i.e. beyond generic tools such as source code repositories). In this paper we propose an infrastructure for enabling reproducibility in our community, by automating the build, unit testing and benchmarking of research software.

(also see: GitHub repo)

## The Art of Programming

The best programs are written so that computing machines can perform them quickly and so that human beings can understand them clearly. A programmer is ideally an essayist who works with traditional aesthetic and literary forms as well as mathematical concepts, to communicate the way that an algorithm works and to convince a reader that the results will be correct.

Selected Papers on Computer Science (1996)
Donald Knuth


## Solution: Create unhackable systems

Needless to say, this tweet prompted a number of subtle (and not so subtle) responses; it is just vague enough that you cannot be 100% sure he is actually joking (because the software verification problem is trivial, right?).

Did North Korea hack Sony? I doubt it; perhaps it was from an unexpected agent.

N.B. high-profile cosmologists appear to be quite happy to make bold statements to the media on issues well outside of their expertise…

## A set of books to read in 2015

Having been shamed, for the second year on the bounce, by Stephen Curry’s excellent posts on the books he read the previous year, I have decided to pick twelve books to read in 2015 that have been sitting unread in piles around my house.


## Best of 2014

Here are the most popular posts of 2014*; as always, a combination of research, CS education, programming, science policy and cathartic moaning. Most of my visitors came from the UK, with the US and Germany not far behind (168 countries in all). The busiest day of the year was 14 January, with lots of traffic to this post from last December.

Top five posts (70 this year, 271 overall):

The most common search term was once again: “feynman algorithm”; the most bizarre: “big boobs gpg image” (most likely referring to this post from 2012).

Thank you all for reading! See you back in 2015.

*also see best of: 2013, 2012, 2011


## Christmas computational complexity

While there are alternative explanations for how the naughty/nice list is generated, hashing is important: Santa could be using a Bloom filter, in which false positive matches are possible, but false negatives are not (i.e. a query returns either “possibly in set” or “definitely not in set”, thus it has a 100% recall rate).
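To make the “possibly in set” versus “definitely not in set” behaviour concrete, here is a minimal Bloom filter sketch in Python. The class name, the bit-array size, the number of hashes and the salted SHA-256 scheme are all my own illustrative choices (a real deployment would size these from the expected list length and acceptable false positive rate); nothing here is claimed to be Santa’s actual implementation.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: a bit array plus k hash functions (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # num_bits and num_hashes are arbitrary assumptions for this sketch.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k positions by salting a SHA-256 hash with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True means "possibly in set"; False means "definitely not in set".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical naughty list.
naughty = BloomFilter()
for name in ["Alice", "Bob", "Carol"]:
    naughty.add(name)

print("Alice" in naughty)  # always True: added items are never missed
print("Dave" in naughty)   # usually False, but a false positive is possible
```

Because bits are only ever set and never cleared, anything that was added will always be reported as present, which is where the 100% recall comes from; the price is that an unlucky collision can flag someone who was never added.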

And while we’re on this subject, remember Santa’s delivery route represents a nested Travelling Salesman Problem, compounded by the naughty/nice list changing every year…

(Merry Christmas…and watch out!)

## The many Rs of e-Research

The ~~6~~ ~~12~~ many Rs of e-Research…what else could/should we add to this (especially in the context of research objects and supporting reproducible research)?


## Reproducibility-as-a-service: can the cloud make it real?

Kenji Takeda, Solutions Architect and Technical Manager with Microsoft Research, has written a blog post on Recomputability 2014, as well as discussing some of the issues (and potential opportunities) for reproducibility in computational science we have outlined in our joint paper (including a quote from me):

This is an exciting area of research and one that could have a profound impact on the way that computational science is performed. By rethinking how we develop, use, benchmark, and share algorithms, software, and models, alongside the development of integrated and automated e-infrastructure to support recomputability and reproducibility, we will be able to improve the efficiency of scientific exploration as well as promoting open and verifiable scientific research.

Read Kenji’s full post on the Microsoft Research Connections Blog.

## It’s impossible to conduct research without software

No one knows how much software is used in research. Look around any lab and you’ll see software — both standard and bespoke — being used by all disciplines and seniorities of researchers. Software is clearly fundamental to research, but we can’t prove this without evidence. And this lack of evidence is the reason why we ran a survey of researchers at 15 Russell Group universities to find out about their software use and background.

The Software Sustainability Institute’s recent survey of researchers at research-intensive UK universities is out. Headline figures:

• 92% of academics use research software;
• 69% say that their research would not be practical without it;
• 56% develop their own software (worryingly, 21% have no training in software development);
• 70% of male researchers develop their own software, and only 30% of female researchers do.

For the full story, see the SSI blog post; the survey results described there are based on the responses of 417 researchers selected at random from 15 Russell Group universities, with good representation from across the disciplines, genders and career grades. It represents a statistically significant number of responses that can be used to represent, at the very least, the views of people in research-intensive universities in the UK (the data collected from the survey is available for download and is licensed under a Creative Commons by Attribution licence).
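As a rough sanity check on what 417 responses can support, here is a back-of-the-envelope margin-of-error calculation in Python. This is my own sketch, not part of the SSI analysis: it assumes a simple random sample, uses the normal approximation at the 95% level (z = 1.96), and ignores any finite population correction. I have left out the male/female split because the subgroup sizes are not given here.

```python
from math import sqrt


def margin_of_error(proportion, sample_size, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Assumes simple random sampling and the normal approximation.
    """
    return z * sqrt(proportion * (1 - proportion) / sample_size)


SAMPLE_SIZE = 417  # number of responses reported for the survey

# Headline proportions taken from the list above.
headlines = {
    "use research software": 0.92,
    "research not practical without it": 0.69,
    "develop their own software": 0.56,
}

for label, p in headlines.items():
    moe = margin_of_error(p, SAMPLE_SIZE)
    print(f"{label}: {p:.0%} +/- {moe:.1%}")
```

Under those assumptions, each headline figure is pinned down to within a few percentage points either way.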

(you may also like to sign this petition and join the UK Community of Research Software Engineers)

## Accepted papers and programme for Recomputability 2014

I am co-chairing Recomputability 2014 next week, an affiliated workshop of the 7th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2014). The final workshop programme is now available and it will take place on Thursday 11 December in the Hobart Room at the Hilton London Paddington hotel.

I will also be presenting our paper on sharing and publishing scientific models (arXiv), as well as chairing a panel session on the next steps for recomputability and reproducibility; I look forward to sharing some of the outcomes of this workshop over the next few weeks.

The workshop Twitter hashtag is #recomp14; you can also follow the workshop co-chairs: @DrTomCrick and @npch, as well as the main UCC account: @UCC2014_London.

## Warning: May Contain Advanced Math[s]

(read the full NYT review here)

## Toys for girls and boys

This week an image has been doing the rounds on Twitter, showing a letter to parents printed on the back of a pamphlet from a LEGO set:

Originally posted on reddit, it unsurprisingly went viral but with many questioning its authenticity. However, it has been confirmed as genuine by LEGO UK:

The text is from 1974 and was a part of a pamphlet showing a variety of Lego doll house products targeted girls aged 4 and up. It remains relevant to this day — our focus has always been, and remains to bring creative play experiences to all children in the world…ultimately enabling children to build and create whatever they can imagine.

Don’t forget, you can use this helpful guide to check to see if a toy is for boys or girls.


## Some very Pointless maths

If you enjoy mathematics as well as the BBC quiz series Pointless, hosted by Alexander Armstrong and Richard Osman, then you will most definitely enjoy the following blog post by Mathistopheles:

This article is based on a gloriously irrelevant mathematical sequence that is derived (rather appropriately) from the episodes of the television show Pointless. It is the sort of idea that has me scribbling calculations on the back of envelopes for hours on end, despite there being absolutely no hope of an outcome that could in any way justify this investment of effort. In this first part, I introduce the sequence and explain how it is related to some well-known mathematical objects called Markov chains. In the vain hope that I might convey my enthusiasm for this topic to others, I have tried to write this piece in a fairly accessible way. Almost no mathematical knowledge is assumed, beyond a rough idea of what probability is.

It provides a clear and accessible analysis of how the quiz show works by using directed graphs, matrices and Markov chains: read the full post here.

## Computing research

There is nothing to do with computers that merits a PhD.

Max Newman (1897-1984), as quoted in Alan Turing: The Enigma by Andrew Hodges

## Roots of integers

An integer is either a perfect square or its square root is irrational. Essentially: when you compute the square root of an integer, there are either no figures to the right of the decimal or there are an infinite number of figures to the right of the decimal and they don’t repeat. There’s no middle ground — you can’t hope, for example, that the decimal expansion might stop or repeat after a hundred or so terms.

The proof of this theorem is surprisingly simple, not much harder than the familiar proof that the square root of 2 is irrational.

Suppose $\tfrac{a}{b}$ is a fraction in lowest terms, i.e. $a$ and $b$ are co-prime (their gcd is 1), and $\tfrac{a}{b}$ is a solution to $x^n = c$ where $n > 0$ is an integer and $c$ is an integer. Then:

$\left(\dfrac{a}{b}\right)^n = \dfrac{a^n}{b^n} = c$

and so:

$\dfrac{a^n}{b} = c b^{n-1}$

Now the right side of the equation above is an integer, so the left side must be an integer as well. But $b$ is relatively prime to $a$, and so $b$ is relatively prime to $a^n$. The only way $\tfrac{a^n}{b}$ could be an integer is for $b$ to equal 1 or -1. And so $\tfrac{a}{b}$ must be an integer.

Another way to get the same result is to assume $\tfrac{a}{b}$ is an irreducible fraction and is not an integer (i.e. $b \neq \pm 1$), and consider $\left(\tfrac{a}{b}\right)^n$. Clearly $a^n$ and $b^n$ are co-prime and the denominator $b^n \neq \pm 1$, so $\tfrac{a^n}{b^n}$ is not an integer.

So what we said about square roots extends to cube roots and in fact to all integer roots (for example, the fifth root of an integer is either an integer or an irrational number). In other words: no (non-integer) fraction, when raised to a power, can produce an integer.

(reblogged from John D. Cook’s blog)
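If you would like to poke at the result numerically, here is a small Python sketch of my own (it is not from the original post). The function names and the search bounds are arbitrary choices for illustration: `integer_nth_root` finds an exact integer root by binary search, and `fractional_powers_hitting_integers` brute-forces small fractions in lowest terms to confirm that none of them, raised to the $n$-th power, lands on an integer.

```python
from fractions import Fraction
from math import gcd


def integer_nth_root(c, n):
    """Return the non-negative integer r with r**n == c, or None if none exists."""
    if c < 0 or n < 1:
        raise ValueError("expects c >= 0 and n >= 1")
    lo, hi = 0, max(1, c)
    while lo <= hi:
        mid = (lo + hi) // 2
        power = mid ** n
        if power == c:
            return mid
        if power < c:
            lo = mid + 1
        else:
            hi = mid - 1
    return None  # by the theorem, the real n-th root is then irrational


def fractional_powers_hitting_integers(n, max_denominator=20, max_numerator=200):
    """Search for non-integer fractions a/b in lowest terms with (a/b)**n an integer.

    The bounds are arbitrary; the theorem says the result is always empty.
    """
    hits = []
    for b in range(2, max_denominator + 1):
        for a in range(1, max_numerator + 1):
            if gcd(a, b) == 1 and (Fraction(a, b) ** n).denominator == 1:
                hits.append((a, b))
    return hits


print(integer_nth_root(32, 5))                # exact fifth root exists
print(integer_nth_root(2, 2))                 # no integer square root
print(fractional_powers_hitting_integers(3))  # no counterexamples for cubes
```

The brute-force search is only evidence, of course; the proof above is what rules out counterexamples beyond any bound you care to pick.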


## A rational animal

Man is a rational animal — so at least I have been told. Throughout a long life, I have looked diligently for evidence in favour of this statement, but so far I have not had the good fortune to come across it, though I have searched in many countries spread over three continents.

Unpopular Essays (1950)
Bertrand Russell (1872-1970)