This year the NIPS (Neural Information Processing Systems) conference organisers decided to run an experiment on the consistency of paper reviews. They selected 10% of papers to be reviewed twice. Different area chairs and a different set of 3 reviewers were chosen for those papers.
Luckily for me, my paper with Francis Bach and Simon Lacoste-Julien was one of that 10%. It was initially submitted as paper ID 867; the organisers essentially created a duplicate paper ID, #1860, which contained the second set of reviews.
This duplication of reviews turned out to be particularly interesting in my case: there was a very large discrepancy between the two sets of reviews. I won't know if this is representative of the consistency of other reviews until NIPS releases the statistics from their experiment.
For reference, the two sets of reviews gave the following scores, before rebuttal:
Set 1, review 1: Quality 9, impact 2 (high), confidence 4 (confident but not certain)
Set 1, review 2: Quality 6, impact 1 (incremental), confidence 3 (fairly confident)
Set 1, review 3: Quality 6, impact 1 (incremental), confidence 5 (absolutely certain)
--
Set 2, review 1: Quality 5, impact 1 (incremental), confidence 5 (absolutely certain)
Set 2, review 2: Quality 3, impact 1 (incremental), confidence 4 (confident)
Set 2, review 3: Quality 6, impact 1 (incremental), confidence 5 (absolutely certain)
--
Generally for NIPS a 9/6/6 in quality gives a high chance of acceptance, whereas a 5/3/6 is a certain rejection. So one set of reviews was a clear accept and the other a clear reject! The meta-reviews were as follows:
The paper introduces a new incremental gradient method that allows adaptation to the level of convexity in the input. The paper has a nice discussion of related methods, and it has a simpler proof that will be of interest to researchers. Recommendation: Accept.
--
Unfortunately, the scores are too low for acceptance to NIPS, and none of the reviewers were willing to argue for acceptance of the paper. The reviewers discussed the paper after the author rebuttal, and all reviewers ultimately felt that the paper could use some additional polish before publishing. Please do keep in mind the various criticisms of the reviewers when submitting to another venue.
The paper we submitted was fairly rough in its initial state, and the reviewers suggested many improvements, particularly set 2, review 1, which was the most in-depth review. I generally agree with the second meta-review in that the paper needed additional polish, which we have done for the camera-ready version.
In the end the paper was accepted. I suspect most papers with this kind of accept/reject split would be accepted, as it would just seem unfair otherwise.
The issue of consistency in paper reviews is clear to anybody who has ever resubmitted a rejected paper to a different venue. It feels like the luck of the draw to a degree. There are no easy solutions to this, so I'll be interested to see if NIPS changes their process in future years, and what changes they make.