colrev pdf-prep
In the colrev pdf-prep operation, records transition from pdf_imported to pdf_prepared or pdf_needs_manual_preparation.
Depending on the settings, this operation may involve any of the following:
Check whether the PDF is machine readable and apply OCR if necessary
Identify and remove additional pages and decorations (may interfere with machine learning tools)
Validate whether the PDF matches the record metadata and whether the PDF is complete (matches the number of pages)
Create unique PDF identifiers (PDF hashes) that can be used for retrieval and validation (e.g., in crowdsourcing)
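As an illustration of the last step, a content-based identifier can be derived by hashing the bytes of a file. The sketch below uses SHA-256 from Python's standard library. The function name `pdf_content_hash` is hypothetical, and this is not CoLRev's implementation; the hashing scheme CoLRev actually uses is not described here and may differ.

```python
import hashlib
from pathlib import Path


def pdf_content_hash(pdf_path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file's bytes (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(pdf_path, "rb") as handle:
        # Read in chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two byte-identical files yield the same digest, while any modification (for example, removing a cover page) yields a different one, which is what makes a hash usable for retrieval and validation.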
By default, CoLRev keeps a backup of PDFs that are changed by the pdf-prep operation. To change this behavior, modify the keep_backup_of_pdfs option in the pdf_prep settings.
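A minimal sketch of changing the backup setting programmatically, assuming the project settings are stored as JSON with a `pdf_prep` section that contains `keep_backup_of_pdfs`. The file name `settings.json` and the helpers `set_keep_backup` and `update_settings_file` are assumptions for illustration, not part of the documented interface.

```python
import json
from pathlib import Path


def set_keep_backup(settings: dict, keep: bool) -> dict:
    """Return a copy of the settings with pdf_prep.keep_backup_of_pdfs set."""
    updated = dict(settings)
    # Copy the nested section so the caller's dict is left untouched.
    pdf_prep = dict(updated.get("pdf_prep", {}))
    pdf_prep["keep_backup_of_pdfs"] = keep
    updated["pdf_prep"] = pdf_prep
    return updated


def update_settings_file(path: Path, keep: bool) -> None:
    """Rewrite the JSON settings file at `path` (assumed location) in place."""
    settings = json.loads(path.read_text(encoding="utf-8"))
    path.write_text(
        json.dumps(set_keep_backup(settings, keep), indent=4), encoding="utf-8"
    )
```

Under these assumptions, `update_settings_file(Path("settings.json"), keep=False)` would disable the backups.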
colrev pdf-prep [options]
The following options for pdf-prep are available:
Identifier | Description | Status |
---|---|---|
colrev.grobid_tei | GROBID TEI (instructions) | |
colrev.ocrmypdf | OCRMyPDF (instructions) | |
colrev.remove_coverpage | Remove Cover Page (instructions) | |
colrev.remove_last_page | Remove Last Page (instructions) | |
The colrev pdf-prep-man operation provides an interactive convenience function for PDFs that cannot be prepared automatically, with records transitioning from pdf_needs_manual_preparation to pdf_prepared.
colrev pdf-prep-man [options]
The following options for pdf-prep-man are available:
Identifier | Description | Status |
---|---|---|
colrev.colrev_cli_pdf_prep_man | Prep PDFs manually (CLI) (instructions) | |
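The status transitions described on this page can be summarized in a small lookup table. The sketch below is illustrative only: the names `TRANSITIONS` and `next_statuses` are hypothetical and do not reflect CoLRev's internal API.

```python
# Allowed status transitions per operation, as described in this section.
TRANSITIONS = {
    "pdf-prep": {
        "pdf_imported": {"pdf_prepared", "pdf_needs_manual_preparation"},
    },
    "pdf-prep-man": {
        "pdf_needs_manual_preparation": {"pdf_prepared"},
    },
}


def next_statuses(operation: str, status: str) -> set:
    """Return the statuses a record may reach from `status` via `operation`."""
    # Unknown operations or statuses yield an empty set rather than an error.
    return set(TRANSITIONS.get(operation, {}).get(status, set()))
```

For example, `next_statuses("pdf-prep", "pdf_imported")` contains both pdf_prepared and pdf_needs_manual_preparation, while `next_statuses("pdf-prep-man", "pdf_imported")` is empty because pdf-prep-man only handles records that need manual preparation.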