colrev pdf-prep
In the colrev pdf-prep operation, records transition from pdf_imported to pdf_prepared or pdf_needs_manual_preparation.
Depending on the settings, this operation may involve any of the following:
- Check whether the PDF is machine-readable and apply OCR if necessary
- Identify and remove additional pages and decorations, which may interfere with machine learning tools
- Validate whether the PDF matches the record metadata and whether the PDF is complete (its page count matches the record)
- Create unique PDF identifiers (PDF hashes) that can be used for retrieval and validation (e.g., in crowdsourcing)
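
The completeness check in the list above can be illustrated with a minimal sketch. It is not CoLRev's implementation: the function names `expected_page_count` and `page_count_matches` are hypothetical, and it assumes the record stores its pages as a simple numeric range such as `123--145`.

```python
import re
from typing import Optional


def expected_page_count(pages: str) -> Optional[int]:
    """Return the page count implied by a range such as '123--145'.

    Returns None when the field is not a simple numeric range.
    (Hypothetical helper, not part of CoLRev.)
    """
    match = re.fullmatch(r"\s*(\d+)\s*[-–]{1,2}\s*(\d+)\s*", pages)
    if match is None:
        return None
    first, last = int(match.group(1)), int(match.group(2))
    if last < first:
        return None
    return last - first + 1


def page_count_matches(pages: str, pdf_page_count: int) -> bool:
    """Check whether a PDF has exactly the number of pages the record implies.

    Returns False when the expected count cannot be derived from the field.
    """
    expected = expected_page_count(pages)
    return expected is not None and pdf_page_count == expected
```

A PDF with an extra cover page would fail this check, which is one reason why removing additional pages and validating completeness are related steps.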
By default, CoLRev keeps a backup of PDFs that are changed by the pdf-prep operation. To change this behavior, modify the keep_backup_of_pdfs option of the pdf_prep settings.
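
As a rough sketch, the option could be toggled programmatically as shown here. This assumes the project settings are stored as JSON (for example in a `settings.json` file) with a `pdf_prep` object that holds `keep_backup_of_pdfs`. The file name, the layout, and the helper `set_keep_backup_of_pdfs` are assumptions for illustration, so check your project's settings file before relying on it.

```python
import json
from pathlib import Path


def set_keep_backup_of_pdfs(settings_path: Path, keep: bool) -> dict:
    """Set pdf_prep.keep_backup_of_pdfs in a JSON settings file.

    Assumes a top-level "pdf_prep" object (created if missing).
    All other settings are written back unchanged.
    """
    settings = json.loads(settings_path.read_text(encoding="utf-8"))
    settings.setdefault("pdf_prep", {})["keep_backup_of_pdfs"] = keep
    settings_path.write_text(
        json.dumps(settings, indent=4) + "\n", encoding="utf-8"
    )
    return settings
```

For example, `set_keep_backup_of_pdfs(Path("settings.json"), False)` would disable backups under these assumptions. Editing the settings file by hand achieves the same result.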
colrev pdf-prep [options]
The following options for pdf-prep are available:
| Identifier | Description | Status |
|---|---|---|
| colrev.grobid_tei | GROBID TEI (instructions) | |
| colrev.ocrmypdf | OCRMyPDF (instructions) | |
| colrev.remove_coverpage | Remove Cover Page (instructions) | |
| colrev.remove_last_page | Remove Last Page (instructions) | |
The colrev pdf-prep-man operation provides an interactive convenience function for PDFs that cannot be prepared automatically, with records transitioning from pdf_needs_manual_preparation to pdf_prepared.
colrev pdf-prep-man [options]
The following options for pdf-prep-man are available:
| Identifier | Description | Status |
|---|---|---|
| colrev.colrev_cli_pdf_prep_man | Prep PDFs manually (CLI) (instructions) | |
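
The record status transitions described for pdf-prep and pdf-prep-man can be summarized as a small lookup table. The sketch below is illustrative only: the state names come from the text above, while the `RecordState` enum, the `TRANSITIONS` mapping, and `is_valid_transition` are hypothetical names rather than CoLRev's API.

```python
from enum import Enum


class RecordState(Enum):
    """Record states mentioned for the PDF preparation operations."""

    pdf_imported = "pdf_imported"
    pdf_needs_manual_preparation = "pdf_needs_manual_preparation"
    pdf_prepared = "pdf_prepared"


# Operation -> source state -> set of possible target states
TRANSITIONS = {
    "pdf-prep": {
        RecordState.pdf_imported: {
            RecordState.pdf_prepared,
            RecordState.pdf_needs_manual_preparation,
        },
    },
    "pdf-prep-man": {
        RecordState.pdf_needs_manual_preparation: {RecordState.pdf_prepared},
    },
}


def is_valid_transition(
    operation: str, source: RecordState, target: RecordState
) -> bool:
    """Return True if the operation may move a record from source to target."""
    return target in TRANSITIONS.get(operation, {}).get(source, set())
```

Under this model, a record that pdf-prep cannot handle automatically ends up in pdf_needs_manual_preparation, and only pdf-prep-man moves it on to pdf_prepared.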