Bibliography extraction process Lassi Kortela (08 Mar 2020 17:27 UTC)

Here's the exact process I follow:

- Search for the title of the paper (e.g. "A few principles of macro
design") on <https://scholar.google.com/>. Almost always the first
search result is the right one. If not, the second one is, or you can
add author names to the search query.

- Click on the big quote mark below the search result, then click on
BibTeX in the popup window.

- Copy-paste the BibTeX code into an empty Emacs buffer.

- Follow the [PDF] link beside the search result to get the PDF. While
the PDF is loading, copy-paste its URL into the same Emacs buffer.

- Save an on-disk copy of the PDF file just in case.

- Copy-paste the abstract from the PDF into the buffer as well. If that
doesn't work, screenshot the abstract and use <https://newocr.com/> to
scan it. The OCR works extremely well on typeset papers, even dodgy ones
from the 1980s.

- Run `M-x shell-command-on-region tools/bibtex2lose.scm` in Emacs to
convert the Emacs buffer to LOSE format. bibtex2lose is a small Gauche
script that does almost everything automatically; at this point, only
very minor manual corrections are occasionally needed.
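
To give a rough idea of what the conversion step involves, here is a
minimal Python sketch. It is not bibtex2lose (which is written in Gauche
and not shown here). It assumes the flat `name={value}` entries that
Google Scholar exports, with no nested braces, and the S-expression
layout it prints is purely illustrative, not the actual LOSE format. The
sample entry and the function names are made up for the example.

```python
import re

# Matches the header of a BibTeX entry, e.g. "@article{key2020,".
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# Matches one "name={value}" field whose value has no nested braces.
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def parse_bibtex_entry(text):
    """Parse one flat BibTeX entry into (entry_type, cite_key, fields)."""
    header = ENTRY_RE.search(text)
    if header is None:
        raise ValueError("no BibTeX entry found")
    fields = {}
    for name, value in FIELD_RE.findall(text[header.end():]):
        # Collapse whitespace runs left over from copy-pasting.
        fields[name.lower()] = " ".join(value.split())
    return header.group(1).lower(), header.group(2), fields


def to_sexp(entry_type, cite_key, fields):
    """Render a generic S-expression (illustrative layout, not real LOSE)."""
    def quote(s):
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = ["(" + entry_type, "  (id " + quote(cite_key) + ")"]
    for name, value in fields.items():
        lines.append("  (" + name + " " + quote(value) + ")")
    return "\n".join(lines) + ")"


# Made-up sample entry, shaped like a Google Scholar BibTeX export.
SAMPLE = """@article{doe2020example,
  title={An example   title},
  author={Doe, Jane and Roe, Richard},
  year={2020}
}"""

entry_type, cite_key, fields = parse_bibtex_entry(SAMPLE)
print(to_sexp(entry_type, cite_key, fields))
```

A real converter also has to cope with nested braces, the pasted URL and
the abstract, which this sketch ignores.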

That does it for most papers.

Sometimes the PDF is hard to find; checking the authors' homepages is
often helpful, and can even turn up PDFs for papers that the ACM has
paywalled.