Bibliography extraction process
Lassi Kortela 08 Mar 2020 17:27 UTC
Here's the exact process I follow:
- Search for the title of the paper (e.g. "A few principles of macro
design") on <https://scholar.google.com/>. The first search result is
almost always the right one; if not, it's usually the second. If
neither matches, add the author names to the search query.
- Click the big quotation-mark icon below the search result, then click
"BibTeX" in the popup window.
- Copy-paste the BibTeX code into an empty Emacs buffer.
- Follow the [PDF] link beside the search result to get the PDF. While
the PDF is loading, copy-paste its URL into the same Emacs buffer.
- Save an on-disk copy of the PDF file just in case.
- Copy-paste the abstract from the PDF into the buffer as well. If that
doesn't work (e.g. the PDF is a scan without selectable text),
screenshot the abstract and run it through the OCR at
<https://newocr.com/>. The OCR works extremely well on typeset papers,
even dodgy scans from the 1980s.
- Select the whole buffer (e.g. with `C-x h`) and run
`M-x shell-command-on-region` with `tools/bibtex2lose.scm` as the
command to convert it to LOSE format. bibtex2lose is a small Gauche
script that does almost everything automatically; at most, a few minor
corrections are needed afterwards.
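The bibtex2lose script itself isn't reproduced here. To give a rough idea of what such a converter involves, here is a minimal Python sketch (not the actual Gauche script) of a filter that `shell-command-on-region` could run: it reads one BibTeX entry of the kind Google Scholar produces and prints it as a Scheme-style list. The script name `bib2sexp.py`, its `-` argument, and the output layout are assumptions made up for illustration; the layout is not the LOSE format. The sketch ignores the PDF URL and abstract pasted after the entry, and it does not handle `@string` macros or `#` concatenation.

```python
import re
import sys

# Entry header, e.g. "@article{doe2020example,".
HEADER = re.compile(r"\s*@(\w+)\s*\{\s*([^,\s]+)\s*,")
# A field name followed by "=", e.g. "  title = ".
FIELD_NAME = re.compile(r"\s*([A-Za-z][\w-]*)\s*=\s*")
# A bare (undelimited) value such as a year: 2020
BARE_VALUE = re.compile(r"[^,}\s]+")
# Optional whitespace followed by the comma between fields.
SEPARATOR = re.compile(r"\s*,")


def read_value(text, i):
    """Read the field value starting at text[i]; return (value, next_index)."""
    if i >= len(text):
        raise ValueError("missing field value")
    if text[i] == "{":
        # Brace-delimited: track nesting so "{An {Example} Title}" stays whole.
        depth = 0
        for j in range(i, len(text)):
            if text[j] == "{":
                depth += 1
            elif text[j] == "}":
                depth -= 1
                if depth == 0:
                    return text[i + 1:j], j + 1
        raise ValueError("unbalanced braces in field value")
    if text[i] == '"':
        # Quote-delimited: runs to the next double quote.
        end = text.index('"', i + 1)
        return text[i + 1:end], end + 1
    m = BARE_VALUE.match(text, i)
    if not m:
        raise ValueError("unrecognized field value")
    return m.group(0), m.end()


def parse_bibtex_entry(text):
    """Parse the BibTeX entry at the start of text into (type, cite_key, fields).

    Text after the entry's last field (such as a pasted URL) is ignored.
    """
    header = HEADER.match(text)
    if not header:
        raise ValueError("no BibTeX entry at start of input")
    fields = {}
    i = header.end()
    while True:
        m = FIELD_NAME.match(text, i)
        if not m:
            break  # no further "name = value" pair: end of the entry
        value, i = read_value(text, m.end())
        # Collapse line breaks and runs of spaces inside the value.
        fields[m.group(1).lower()] = " ".join(value.split())
        sep = SEPARATOR.match(text, i)
        if sep:
            i = sep.end()
    return header.group(1).lower(), header.group(2), fields


def scheme_string(s):
    """Quote a Python string as a Scheme string literal."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_sexp(entry_type, key, fields):
    """Render the entry as a Scheme-style list (hypothetical layout, not LOSE)."""
    lines = [f"({entry_type} {scheme_string(key)}"]
    for name, value in fields.items():
        lines.append(f"  ({name} {scheme_string(value)})")
    return "\n".join(lines) + ")"


if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    # "-" means: read the entry from stdin, which is where Emacs's
    # shell-command-on-region sends the selected region.
    print(to_sexp(*parse_bibtex_entry(sys.stdin.read())))
```

Saved as the hypothetical `bib2sexp.py`, it would be run by giving `python3 bib2sexp.py -` as the command to `M-x shell-command-on-region`; with a `C-u` prefix, Emacs replaces the region with the output instead of showing it in a separate buffer.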
That does it for most papers.
Sometimes the PDF is hard to find. The authors' homepages often help,
and can even turn up PDFs of papers that ACM has paywalled.