colrev prep
In the colrev prep operation, records with sufficient metadata quality transition from md_imported to md_prepared (otherwise, to md_needs_manual_preparation). Separating high- and low-quality metadata allows efforts to fix metadata to be allocated more precisely. This matters for duplicate identification and for ensuring high-quality metadata in the sample and in reference sections.
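The status transition can be sketched as follows. This is a minimal illustration, not CoLRev's implementation. It assumes that a record is a plain dict with a colrev_status field and that the quality checks have already produced a list of defect codes. The transition_status function and its defect_codes parameter are hypothetical.

```python
# Hypothetical sketch: statuses are plain strings and the quality checks are
# assumed to have produced a list of defect codes for the record.
MD_IMPORTED = "md_imported"
MD_PREPARED = "md_prepared"
MD_NEEDS_MANUAL_PREPARATION = "md_needs_manual_preparation"


def transition_status(record: dict, defect_codes: list[str]) -> dict:
    """Return a copy of the record with its status updated after preparation."""
    if record.get("colrev_status") != MD_IMPORTED:
        # Only records that are still md_imported are considered here.
        return dict(record)
    if defect_codes:
        new_status = MD_NEEDS_MANUAL_PREPARATION
    else:
        new_status = MD_PREPARED
    return {**record, "colrev_status": new_status}
```

Returning a copy keeps the original record untouched, which makes it easy to compare the status before and after a preparation round.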
Quality rules:
Completeness of fields based on rules and external sources, e.g., a journal article requires author, title, year, journal, volume (and issue) fields
Completeness of field values, e.g., author fields should not end with “and others”, journal fields should not end with “…”
Consistency between fields, e.g., inproceedings records cannot have a journal field
Format consistency, e.g., field values should not be in all caps, author fields should be formatted correctly, and DOIs should follow a predefined pattern
Consistency between metadata associated with ids, e.g., the metadata associated with the DOI should be in line with the metadata displayed on the website (linked in the URL field)
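Several of these quality rules can be expressed as simple checks on a record. The sketch below is hypothetical: the function find_defects, the defect labels (e.g., missing:volume), the single required-field rule, and the simplified DOI regular expression are illustrative assumptions, not CoLRev's actual defect codes or patterns. The id-based consistency rule is omitted because it requires querying external services.

```python
import re

# Hypothetical required-field rule for one entry type (from the rules above).
REQUIRED_FIELDS = {
    "article": ["author", "title", "year", "journal", "volume"],
}
# Simplified DOI pattern (assumption; the actual pattern may differ).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def find_defects(record: dict) -> list[str]:
    """Return hypothetical defect labels of the form '<defect>:<field>'."""
    defects = []
    entrytype = record.get("ENTRYTYPE", "")
    # Completeness of fields
    for field in REQUIRED_FIELDS.get(entrytype, []):
        if not record.get(field):
            defects.append(f"missing:{field}")
    # Completeness of field values
    if record.get("author", "").rstrip().endswith("and others"):
        defects.append("incomplete:author")
    if record.get("journal", "").rstrip().endswith(("...", "…")):
        defects.append("incomplete:journal")
    # Consistency between fields
    if entrytype == "inproceedings" and "journal" in record:
        defects.append("journal-on-inproceedings")
    # Format consistency: all-caps titles and malformed DOIs
    title = record.get("title", "")
    if title and title.isupper():
        defects.append("all-caps:title")
    doi = record.get("doi")
    if doi and not DOI_PATTERN.match(doi):
        defects.append("invalid-doi")
    return defects
```

For example, an article whose author field is "Doe, J. and others" and which lacks a volume yields both incomplete:author and missing:volume.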
Preparation procedures (the specific steps depend on the project settings; they typically include the following):
General rules, such as resolving BibTeX cross-references, formatting DOI fields, and determining the language of records
SearchSource-specific rules to fix quality defects, such as incorrect use of field names (without affecting other SearchSources)
Linking and updating based on high-quality metadata sources, i.e., retrieving DOIs and metadata from online repositories (e.g., Crossref, Semantic Scholar, DBLP, Open Library)
Linking and updating based on CoLRev curations, which establishes a quality curation loop
Automated prescreen exclusion of retracted records, complementary materials (such as “About our authors” or “Editorial board”), or records using non-Latin alphabets
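The idea of a configurable sequence of preparation steps can be sketched as a list of functions applied to each record in turn. All names here (format_doi, exclude_complementary_materials, exclude_non_latin, prepare, and the prescreen_exclusion field) are hypothetical. The non-Latin heuristic (more than half of the letters in the title lack "LATIN" in their Unicode character name) is an assumed stand-in for whatever the actual package does.

```python
import re
import unicodedata
from typing import Callable

Record = dict
PrepStep = Callable[[Record], Record]

# Hypothetical set of complementary-material titles (examples from the text).
COMPLEMENTARY_TITLES = {"about our authors", "editorial board"}


def format_doi(record: Record) -> Record:
    """General rule: strip resolver prefixes such as https://doi.org/ from the DOI."""
    doi = record.get("doi")
    if doi:
        doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi.strip(),
                     flags=re.IGNORECASE)
        record = {**record, "doi": doi}
    return record


def _mostly_non_latin(text: str) -> bool:
    """True if more than half of the letters are not Latin characters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    non_latin = sum("LATIN" not in unicodedata.name(ch, "") for ch in letters)
    return non_latin / len(letters) > 0.5


def exclude_non_latin(record: Record) -> Record:
    if _mostly_non_latin(record.get("title", "")):
        return {**record, "prescreen_exclusion": "non-latin-alphabet"}
    return record


def exclude_complementary_materials(record: Record) -> Record:
    if record.get("title", "").strip().lower() in COMPLEMENTARY_TITLES:
        return {**record, "prescreen_exclusion": "complementary-material"}
    return record


def prepare(record: Record, steps: list[PrepStep]) -> Record:
    """Apply the configured preparation steps in order."""
    for step in steps:
        record = step(record)
    return record
```

Because each step is a plain function, the step list plays the role of the settings: adding or removing a rule does not affect the others.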
Note. When records are linked and updated based on SearchSources in the prep operation, the corresponding metadata is stored in additional metadata SearchSources (with an md_* prefix).
Such metadata SearchSources are also updated in the search operation. They do not retrieve additional records, and they are excluded from statistics such as those displayed by colrev status or in PRISMA flow charts.
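How md_* sources can be kept out of record counts is sketched below. It assumes, hypothetically, that each record origin is a string of the form "<source filename>/<record id>" and that metadata SearchSources are recognized only by the md_ prefix of their filename; the function names are illustrative.

```python
from collections import Counter


def is_metadata_source(source_filename: str) -> bool:
    """Metadata SearchSources are identified by their md_* prefix."""
    return source_filename.startswith("md_")


def records_per_source(origins: list[str]) -> Counter:
    """Count retrieved records per SearchSource, skipping md_* sources.

    Each origin is assumed to look like "<source filename>/<record id>".
    """
    counts = Counter()
    for origin in origins:
        source, _, _ = origin.partition("/")
        if not is_metadata_source(source):
            counts[source] += 1
    return counts
```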
Before starting the colrev prep-man operation, it is recommended to check the most common quality defects and to consider implementing preparation rules that fix these defects automatically when prep is rerun.
colrev prep [options]
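To find the most common quality defects before deciding which preparation rules to implement, a simple frequency count helps. The sketch assumes a hypothetical record layout with a source field and a defects list. Counting (source, defect) pairs shows whether a defect is concentrated in one SearchSource, which suggests a SearchSource-specific rule rather than a general one.

```python
from collections import Counter


def most_common_defects(records: list[dict], n: int = 5) -> list[tuple[tuple[str, str], int]]:
    """Return the n most frequent (source, defect code) pairs.

    Assumes each record is a dict with a hypothetical "source" field and a
    hypothetical "defects" list of defect codes.
    """
    counter = Counter(
        (record.get("source", "unknown"), code)
        for record in records
        for code in record.get("defects", [])
    )
    return counter.most_common(n)
```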
In addition, colrev prep-man provides convenience functions to prepare records manually (addressing the quality defects listed for each field).
Users can decide to set the colrev_status field to md_prepared and override existing quality defect codes (which may be false positives). The colrev_status field is not changed in the following operations unless new quality defect codes are discovered and added (e.g., in colrev prep --polish).
colrev prep-man [options]
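The override behavior can be sketched as follows, under stated assumptions: the overridden_defects field and both function names are hypothetical, and the status a record falls back to when new defects appear (md_needs_manual_preparation) is an assumption, since the documentation only says that the status changes when new defect codes are added.

```python
# Hypothetical sketch of the manual override; field names other than
# colrev_status are illustrative assumptions.


def set_prepared_manually(record: dict) -> dict:
    """Set the record to md_prepared and remember the overridden defect codes."""
    return {
        **record,
        "colrev_status": "md_prepared",
        # Defects the user considers false positives (hypothetical field).
        "overridden_defects": sorted(set(record.get("defects", []))),
    }


def recheck(record: dict, current_defects: list[str]) -> dict:
    """Change the status only if checks find defect codes that were not overridden."""
    overridden = set(record.get("overridden_defects", []))
    new_defects = [code for code in current_defects if code not in overridden]
    if record.get("colrev_status") == "md_prepared" and new_defects:
        # Assumed fallback status; the documentation only says the status changes.
        return {
            **record,
            "colrev_status": "md_needs_manual_preparation",
            "defects": new_defects,
        }
    return record
```

Re-detecting an overridden defect leaves the record untouched, while a newly discovered defect sends it back for manual preparation.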
The following options for prep are available:
| Identifier | Preparation packages | Status |
|---|---|---|
| colrev.add_journal_ranking | Add Journal ranking information (instructions) | |
| colrev.colrev_curation | CoLRev Curations (instructions) | |
| colrev.crossref | Crossref API (instructions) | |
| colrev.dblp | DBLP API (instructions) | |
| colrev.europe_pmc | Europe PMC (instructions) | |
| colrev.exclude_collections | Exclude collections (instructions) | |
| colrev.exclude_complementary_materials | Exclude Complementary Materials (instructions) | |
| colrev.exclude_languages | Exclude Languages (instructions) | |
| colrev.exclude_non_latin_alphabets | Exclude Non-Latin Alphabets (instructions) | |
| colrev.general_polish | General Polish (instructions) | |
| colrev.get_doi_from_urls | Get DOI from URLs (instructions) | |
| colrev.get_masterdata_from_citeas | Get Masterdata from CiteAs (instructions) | |
| colrev.get_masterdata_from_doi | Get Masterdata from DOI (instructions) | |
| colrev.get_year_from_vol_iss_jour | Year-Vol-Iss Prep (instructions) | |
| colrev.github | GitHub API (instructions) | |
| colrev.local_index | LocalIndex (instructions) | |
| colrev.open_alex | OpenAlex API (instructions) | |
| colrev.open_library | OpenLibrary API (instructions) | |
| colrev.pubmed | Pubmed (instructions) | |
| colrev.remove_broken_ids | Remove Broken IDs (instructions) | |
| colrev.remove_urls_with_500_errors | Remove URLs with 500 Errors (instructions) | |
| colrev.semanticscholar | Semantic Scholar API (instructions) | |
| colrev.source_specific_prep | Source-specific Prep (instructions) | |
The following options for prep-man are available:
| Identifier | Manual preparation packages | Status |
|---|---|---|
| colrev.export_man_prep | Export Man Prep (instructions) | |
| colrev.prep_man_curation_jupyter | Prep-man Jupyter Notebook (instructions) | |