colrev prep

In the colrev prep operation, records with sufficient metadata quality transition from md_imported to md_prepared. Records with quality defects transition to md_needs_manual_preparation instead. Separating high- and low-quality metadata allows efforts to fix metadata to be targeted precisely. This matters because high-quality metadata is essential for duplicate identification and for accurate sample metadata and reference sections.

Quality rules:

  • Completeness of fields based on rules and external sources, e.g., a journal article requires author, title, year, journal, volume (and issue) fields

  • Completeness of field values, e.g., author fields should not end with “and others”, journal fields should not end with “…”

  • Consistency between fields, e.g., inproceedings records cannot have a journal field

  • Format consistency, e.g., field values should not be in all caps, author fields should be formatted correctly, DOIs should follow a predefined pattern

  • Consistency between metadata associated with identifiers, e.g., metadata associated with the DOI should be in line with the metadata displayed on the website (linked in the URL field)
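The first four quality rules can be illustrated with a simplified Python sketch that collects defect codes for a record and derives the resulting status. The required-field lists, the defect-code strings (such as missing:volume), the DOI pattern, and the function names find_defects and next_status are hypothetical and do not reproduce CoLRev's internal checks. The last rule is omitted because it requires querying external sources.

```python
import re

# Hypothetical required fields per entry type (simplified).
REQUIRED_FIELDS = {
    "article": ["author", "title", "year", "journal", "volume"],
    "inproceedings": ["author", "title", "year", "booktitle"],
}

# Hypothetical DOI pattern: "10.", a numeric registrant code, "/", and a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def find_defects(record: dict) -> list:
    """Return hypothetical defect codes for a BibTeX-like record (a dict of fields)."""
    defects = []
    entrytype = record.get("ENTRYTYPE", "")

    # Completeness of fields: required fields must be present and non-empty.
    for field in REQUIRED_FIELDS.get(entrytype, []):
        if not record.get(field):
            defects.append(f"missing:{field}")

    # Completeness of field values: no truncated author lists or journal names.
    if record.get("author", "").rstrip().endswith("and others"):
        defects.append("incomplete:author")
    if record.get("journal", "").rstrip().endswith(("...", "…")):
        defects.append("incomplete:journal")

    # Consistency between fields: inproceedings records must not have a journal.
    if entrytype == "inproceedings" and "journal" in record:
        defects.append("inconsistent:journal")

    # Format consistency: no all-caps titles; DOIs must match the pattern.
    if record.get("title", "").isupper():
        defects.append("all-caps:title")
    doi = record.get("doi")
    if doi and not DOI_PATTERN.match(doi):
        defects.append("format:doi")

    return defects


def next_status(record: dict) -> str:
    """Records without defects become md_prepared; others need manual preparation."""
    return "md_needs_manual_preparation" if find_defects(record) else "md_prepared"


record = {
    "ENTRYTYPE": "article",
    "author": "Smith, J. and others",
    "title": "A STUDY OF THINGS",
    "year": "2020",
    "journal": "MIS Quarterly",
    "doi": "10.1234/abc",
}
print(find_defects(record))  # ['missing:volume', 'incomplete:author', 'all-caps:title']
print(next_status(record))   # md_needs_manual_preparation
```

Returning a list of codes rather than a single boolean keeps the information a user needs when fixing a record by hand.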

Preparation procedures (the specific preparation depends on the specified settings; it typically consists of steps like the following):

  • General rules, such as resolving BibTeX cross-references, formatting DOI fields, and determining the language of records

  • SearchSource-specific rules to fix quality defects, such as incorrect use of field names (without affecting other SearchSources)

  • Linking and updating based on high-quality metadata sources, i.e., retrieving DOIs and metadata from online repositories (e.g., Crossref, Semantic Scholar, DBLP, Open Library)

  • Linking and updating based on CoLRev curations, which establishes a quality curation loop

  • Automated prescreen exclusion of retracted records, complementary materials (such as “About our authors” or “Editorial board”), or records using non-Latin alphabets
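A few of these steps can be sketched as functions applied to a record in sequence: one general rule (DOI formatting) and one automated prescreen exclusion (complementary materials and non-Latin alphabets). Everything here is an assumption for illustration: the function names, the list of complementary-material titles, the 50% threshold for non-Latin text, the choice to upper-case DOIs, and the status name rev_prescreen_excluded. In CoLRev, the steps that run are determined by the settings rather than hard-coded like this.

```python
import re
import unicodedata

# Hypothetical titles that mark complementary materials rather than papers.
COMPLEMENTARY_TITLES = {"about our authors", "editorial board"}

# Resolver URLs or "doi:" prefixes that should not be part of the DOI field.
DOI_PREFIX = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def format_doi(record: dict) -> dict:
    """General rule: strip resolver prefixes and upper-case the DOI (one possible convention)."""
    if record.get("doi"):
        record["doi"] = DOI_PREFIX.sub("", record["doi"].strip()).upper()
    return record


def uses_non_latin_alphabet(text: str, threshold: float = 0.5) -> bool:
    """True if more than `threshold` of the letters are not Latin (by Unicode name)."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    non_latin = sum(1 for c in letters if not unicodedata.name(c, "").startswith("LATIN"))
    return non_latin / len(letters) > threshold


def exclude_out_of_scope(record: dict) -> dict:
    """Prescreen-exclude complementary materials and records in non-Latin alphabets."""
    title = record.get("title", "")
    if title.strip().lower() in COMPLEMENTARY_TITLES or uses_non_latin_alphabet(title):
        record["colrev_status"] = "rev_prescreen_excluded"  # assumed status name
    return record


# Steps run in order; each one updates the record dict in place and returns it.
PREP_STEPS = [format_doi, exclude_out_of_scope]


def prep(record: dict) -> dict:
    for step in PREP_STEPS:
        record = step(record)
    return record


print(prep({"title": "Editorial Board", "doi": "https://doi.org/10.1234/abc.5"}))
# {'title': 'Editorial Board', 'doi': '10.1234/ABC.5', 'colrev_status': 'rev_prescreen_excluded'}
```

Keeping each step as a separate function mirrors the idea that preparation is assembled from configurable steps: adding or removing a rule only changes the PREP_STEPS list.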

Note. When records are linked and updated based on SearchSources in the prep operation, the corresponding metadata is stored in additional metadata SearchSources (with an md_* prefix). These metadata SearchSources are also updated in the search operation. They do not retrieve additional records, and they are excluded from statistics such as those displayed in colrev status or in PRISMA flow charts.
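Excluding metadata SearchSources from statistics can be sketched as follows. The sketch assumes (hypothetically) that each record lists its origins in a colrev_origin field as "<source filename>/<ID>" strings and that metadata SearchSources have file names starting with md_; the function name records_per_source is also hypothetical.

```python
from collections import Counter


def records_per_source(records: list) -> Counter:
    """Count records per SearchSource, skipping metadata SearchSources (md_* files)."""
    counts = Counter()
    for record in records:
        for origin in record.get("colrev_origin", []):
            source = origin.split("/", 1)[0]  # assumed "<source filename>/<ID>" format
            if source.startswith("md_"):
                continue  # metadata SearchSources add no records to the statistics
            counts[source] += 1
    return counts


records = [
    {"colrev_origin": ["scopus.bib/000001", "md_crossref.bib/000007"]},
    {"colrev_origin": ["wos.bib/000002"]},
    {"colrev_origin": ["scopus.bib/000003", "wos.bib/000004"]},
]
print(records_per_source(records))  # Counter({'scopus.bib': 2, 'wos.bib': 2})
```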

colrev prep [options]

Before starting the colrev prep-man operation, it is recommended to check the most common quality defects and to consider implementing preparation rules that fix them automatically when colrev prep is rerun.
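To find the most common quality defects, defect codes can be tallied across the records that need manual preparation. This sketch assumes a hypothetical defects field that maps each field name to a list of defect codes; CoLRev stores defect information in its own format, so the lookup would need to be adapted.

```python
from collections import Counter


def most_common_defects(records: list, n: int = 3) -> list:
    """Tally (field, defect code) pairs over records that need manual preparation."""
    tally = Counter()
    for record in records:
        if record.get("colrev_status") != "md_needs_manual_preparation":
            continue
        # Hypothetical layout: {"defects": {"title": ["all-caps"], ...}}
        for field, codes in record.get("defects", {}).items():
            for code in codes:
                tally[(field, code)] += 1
    return tally.most_common(n)


records = [
    {"colrev_status": "md_needs_manual_preparation",
     "defects": {"title": ["all-caps"], "volume": ["missing"]}},
    {"colrev_status": "md_needs_manual_preparation", "defects": {"title": ["all-caps"]}},
    {"colrev_status": "md_prepared", "defects": {"title": ["all-caps"]}},
]
print(most_common_defects(records))
# [(('title', 'all-caps'), 2), (('volume', 'missing'), 1)]
```

Counting (field, code) pairs rather than codes alone shows where a single automated rule would fix the most records.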

In addition, colrev prep-man provides convenience functions to prepare records manually (addressing the quality defects listed for each field).

Users can decide to set the colrev_status field to md_prepared and thereby override existing quality defect codes (which may be false positives). The colrev_status field is not changed in subsequent operations unless new quality defect codes are discovered and added (e.g., by colrev prep --polish).
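The rule that a manually set md_prepared status persists until new defect codes appear can be sketched as a small decision function. The overridden_defects field (holding the codes a user accepted as false positives) and the function name status_after_rerun are hypothetical.

```python
def status_after_rerun(record: dict, detected: set) -> str:
    """Decide the status after prep is rerun and `detected` defect codes are found."""
    # Hypothetical field holding the defect codes a user overrode by hand.
    overridden = set(record.get("overridden_defects", []))
    if record.get("colrev_status") == "md_prepared" and detected <= overridden:
        return "md_prepared"  # only known (overridden) defects: keep the manual status
    return "md_needs_manual_preparation" if detected else "md_prepared"


record = {"colrev_status": "md_prepared", "overridden_defects": ["all-caps:title"]}
print(status_after_rerun(record, {"all-caps:title"}))                    # md_prepared
print(status_after_rerun(record, {"all-caps:title", "missing:volume"}))  # md_needs_manual_preparation
```

The subset test (detected <= overridden) expresses "no new defect codes" directly: the manual decision stands until something the user has not yet seen is reported.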

colrev prep-man [options]

The following options for prep are available:

Identifier                              Preparation package              Status
--------------------------------------  -------------------------------  ------------
colrev.add_journal_ranking              Add Journal ranking information  EXPERIMENTAL
colrev.colrev_curation                  CoLRev Curations                 MATURING
colrev.crossref                         Crossref API                     MATURING
colrev.dblp                             DBLP API                         MATURING
colrev.europe_pmc                       Europe PMC                       MATURING
colrev.exclude_collections              Exclude collections              MATURING
colrev.exclude_complementary_materials  Exclude Complementary Materials  MATURING
colrev.exclude_languages                Exclude Languages                MATURING
colrev.exclude_non_latin_alphabets      Exclude Non-Latin Alphabets      MATURING
colrev.general_polish                   General Polish                   EXPERIMENTAL
colrev.get_doi_from_urls                Get DOI from URLs                EXPERIMENTAL
colrev.get_masterdata_from_citeas       Get Masterdata from CiteAs       EXPERIMENTAL
colrev.get_masterdata_from_doi          Get Masterdata from DOI          EXPERIMENTAL
colrev.get_year_from_vol_iss_jour       Year-Vol-Iss Prep                EXPERIMENTAL
colrev.github                           GitHub API                       MATURING
colrev.local_index                      LocalIndex                       MATURING
colrev.open_alex                        OpenAlex API                     EXPERIMENTAL
colrev.open_library                     OpenLibrary API                  EXPERIMENTAL
colrev.pubmed                           Pubmed                           MATURING
colrev.remove_broken_ids                Remove Broken IDs                MATURING
colrev.remove_urls_with_500_errors      Remove URLs with 500 Errors      MATURING
colrev.semanticscholar                  Semantic Scholar API             EXPERIMENTAL
colrev.source_specific_prep             Source-specific Prep             MATURING

The following options for prep-man are available:

Identifier                              Manual preparation package       Status
--------------------------------------  -------------------------------  ------------
colrev.export_man_prep                  Export Man Prep                  MATURING
colrev.prep_man_curation_jupyter        Prep-man Jupyter Notebook        EXPERIMENTAL