This AI Researcher Is Trying To Ward Off a Reproducibility Crisis
Joelle Pineau doesn’t want science’s reproducibility crisis to come to artificial intelligence (AI).
Spurred by her frustration with difficulties recreating results from other research teams, Pineau, a machine-learning scientist at McGill University and Facebook in Montreal, Canada, is now spearheading a movement to get AI researchers to open up their methods and code to scrutiny.
Alongside Koustuv Sinha, a PhD student at McGill, Pineau holds one of two new roles dedicated to reproducibility on the organizing committee for the Conference on Neural Information Processing Systems (NeurIPS), a major meeting for AI that this year attracted some 13,000 researchers. Ahead of this year’s conference in Vancouver, Canada, from 8 to 14 December, the committee asked scientists to provide their code and fill in a checklist of methodological details for each paper submitted. They also ran a competition that challenged researchers to recreate each other’s work.
Pineau spoke to Nature about the measures and how they’ve gone down with the community.
It’s easy to imagine why scientific studies of the natural world might be hard to reproduce. But why are some algorithms irreproducible?
It’s true that with code, you press start and, for the most part, it should do the same thing every time. The challenge can be trying to reproduce a precise set of instructions in machine code from a paper. And then there’s the issue that papers don’t always give all the detail, or give misleading detail. Sometimes it’s unintentional and perhaps sometimes it’s done to make the results look more favourable. That’s a big issue.
What got you interested in reproducibility?
I fell into reproducibility by accident. Over and over again my students would say ‘I can’t get these results,’ or they found that, to get the results, they had to do things that I didn’t think were correct, methodologically. So for me it was important to stop it before it becomes the norm. It’s also very timely for the wider community because there are a lot of people flooding into the field and it’s important to establish what the methodological norms are.
What’s an example of such a practice?
In reinforcement learning, for example, if you do two runs of some algorithms with different initial random settings, you can get very different results. And if you do a lot of runs, you’re able to report only the best ones. Results from the people with more computing power to do more runs will look better. Papers don’t always say how many runs were performed. But it makes a big difference to the conclusions you draw.
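The effect Pineau describes can be illustrated with a small simulation. The Python sketch below is not taken from the interview or from any real reinforcement-learning experiment: `simulated_run` is a hypothetical stand-in for a training run, and its score distribution (mean 100, standard deviation 30) is an assumption chosen only for illustration. Each "run" differs only in its random seed, and the script compares the average over all runs with the average over the best five.

```python
import random
import statistics


def simulated_run(seed: int) -> float:
    """Hypothetical stand-in for one training run with a given random seed.

    The score distribution (mean 100, standard deviation 30) is invented
    for illustration and does not model any real algorithm.
    """
    rng = random.Random(seed)
    return rng.gauss(100.0, 30.0)


def summarize(num_runs: int, top_k: int) -> dict:
    """Run `num_runs` seeds and compare honest and cherry-picked summaries."""
    scores = [simulated_run(seed) for seed in range(num_runs)]
    best = sorted(scores, reverse=True)[:top_k]
    return {
        "num_runs": num_runs,
        "mean_all": statistics.mean(scores),    # average over every run
        "stdev_all": statistics.stdev(scores),  # spread across seeds
        "mean_top_k": statistics.mean(best),    # average over the best runs only
    }


if __name__ == "__main__":
    # More compute means more runs to choose the "best five" from.
    for n in (5, 50, 500):
        s = summarize(n, top_k=5)
        print(
            f"runs={s['num_runs']:>3}  "
            f"mean of all={s['mean_all']:.1f} (sd {s['stdev_all']:.1f})  "
            f"mean of best 5={s['mean_top_k']:.1f}"
        )
```

In this sketch the underlying "algorithm" never changes, yet the mean of the best five runs can only stay the same or rise as the number of runs grows, because a larger pool always contains the smaller one. Reporting the number of runs alongside the mean and spread over all of them is what lets a reader tell the difference.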
19 May 2020