Convenience encourages reproducible research!

This past year I volunteered as an artifact evaluator for ISCA’24 and ASPLOS’25. Despite the numerous lengthy guides that exist for authors and reviewers [1, 2, 3, 4, 5], I still witnessed atrocities in the evaluation process.

I hope this short note to the research community (and my future self) sheds some light on what is expected of authors when they submit an artifact and what you should expect as a reviewer. There are two main takeaways I want this note to communicate:

  1. Authors should expect very little manual intervention from the artifact evaluator.

  2. Reviewers should follow directions and stay blind to all else. Be a horse with blinkers.

For authors

Do not make the evaluator manually execute an excessive number of bash commands, e.g.:

  1. Run python my_latest_invention.py <some arg1>

  2. Then run python my_latest_invention.py <some arg2>

  3. Then run python my_latest_invention.py <some arg3>

  4. … many commands later

  5. Now in some_dir/ you can observe the plot from Figure 2a.

The above should be automated with a script, and the instructions should instead look like this:

  1. Run ./gen_figures.sh in the some_dir/ directory. This should reproduce Figures 1-5 in the figures/ directory. The data points in the figures are also logged by the bash script and should look similar to the log below (a sketch of such a script follows the example output):

    Running `my_latest_invention.py <some arg1>`
    Results: Metric of interest before: <value1>
    Results: Metric of interest after: <value2>
    …
    Running `my_latest_invention.py <some arg2>`
    Results: Metric of interest before: <value3>
    Results: Metric of interest after: <value4>
    …
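
For concreteness, here is a minimal sketch of what such a gen_figures.sh could look like. The loop arguments and the figures/run_log.txt path are placeholders of my own choosing, not part of any real artifact; the point is simply that one command runs every experiment and logs every number the figures are built from:

    #!/usr/bin/env bash
    # gen_figures.sh -- a minimal sketch, not a real artifact script.
    # Assumptions: my_latest_invention.py takes one argument per experiment and
    # prints the "Results:" lines shown in the example log above, and the
    # plotting code writes its output into figures/.
    set -euo pipefail

    mkdir -p figures

    # Placeholder arguments -- substitute the real configuration for each figure.
    for arg in "<some arg1>" "<some arg2>" "<some arg3>"; do
        echo "Running \`my_latest_invention.py ${arg}\`"
        # Log every data point so the reviewer can compare numbers, not just plots.
        python my_latest_invention.py "${arg}" | tee -a figures/run_log.txt
    done

    echo "Done: Figures 1-5 should now be in the figures/ directory."

The reviewer then runs a single command, and the log in figures/run_log.txt gives the exact numbers behind each plot to compare against the paper.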

For reviewers

Read the directions and follow them, but do not be afraid to ask for revisions. If an author provides manual, step-by-step instructions like the first example above, request a revision of the artifact. If for any reason you feel overwhelmed by an artifact during evaluation, communicate the problem clearly to the author. Artifacts exist to enable reproducibility, not to test the patience of evaluators.


Why is convenience so important for artifact evaluation? In short, time is valuable. As a busy student reviewer, you may have hundreds of other tasks at hand. With the short deadlines for artifact evaluation, it is very possible you will skimp on a thorough evaluation because of time-consuming manual intervention like the above. More importantly, the entire point of an artifact is to make experimental results reproducible so that the community can compare against them in future work. If evaluating an artifact feels as labor-intensive as replicating the entire study, it defeats this purpose entirely.

So, in conclusion: create artifacts that are easy to evaluate even for the lazy researcher and, more importantly, for your future self.

References