Metadata quality model

The quality model specifies the checks that are required before a record can transition to md_prepared. The functionality for fixing errors is organized in the prep package endpoints.

Similar to linters such as pylint, it should be possible to disable selected checks. Failed checks are made transparent by adding the corresponding codes (e.g., mostly-all-caps) to the colrev_masterdata_provenance (notes field).
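The linter-style behavior described above can be sketched as follows. The sketch assumes a record is a plain dictionary of field values and each check is a predicate that returns True when a field value fails. The names `run_checks`, `Check`, and the returned notes dictionary are hypothetical stand-ins for how defect codes end up in colrev_masterdata_provenance. They are not CoLRev's API.

```python
from typing import Callable

# Hypothetical check type: (fields checked, predicate returning True when a field value fails)
Check = tuple[list[str], Callable[[str], bool]]


def run_checks(
    record: dict[str, str], checks: dict[str, Check], disabled: set[str]
) -> dict[str, list[str]]:
    """Collect failed defect codes per field, skipping disabled checks."""
    notes: dict[str, list[str]] = {}
    for code, (fields, fails) in checks.items():
        if code in disabled:
            continue  # disabled like a pylint message: never evaluated, never recorded
        for field in fields:
            if field in record and fails(record[field]):
                notes.setdefault(field, []).append(code)
    return notes


# Deliberately simplified predicates, for illustration only
checks: dict[str, Check] = {
    "mostly-all-caps": (["title"], lambda value: value.isupper()),
    "incomplete-field": (["title"], lambda value: value.endswith("...")),
}
record = {"title": "AN EMPIRICAL STUDY ..."}
print(run_checks(record, checks, disabled=set()))
# {'title': ['mostly-all-caps', 'incomplete-field']}
print(run_checks(record, checks, disabled={"mostly-all-caps"}))
# {'title': ['incomplete-field']}
```

Passing a code in `disabled` mirrors disabling a pylint message: the check is skipped, so its code is never recorded for that record.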

Table of contents

Format

Completeness

Within-record consistency

Origin consistency

Common defects

Format

mostly-all-caps

Fields should not contain mostly upper case letters.

Problematic value

title = {AN EMPIRICAL STUDY OF PLATFORM EXIT}

Correct value

title = {An empirical study of platform exit}

Fields checked

author

title

editor

journal

booktitle
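A minimal sketch of this check, assuming that "mostly" means that more than 80% of the letters in a value are upper case. The threshold and the function name `is_mostly_all_caps` are assumptions, not taken from CoLRev.

```python
def is_mostly_all_caps(value: str, threshold: float = 0.8) -> bool:
    """Flag values in which the share of upper-case letters exceeds `threshold` (assumed 0.8)."""
    letters = [c for c in value if c.isalpha()]
    if not letters:
        return False  # digits or punctuation only: nothing to judge
    upper_share = sum(c.isupper() for c in letters) / len(letters)
    return upper_share > threshold


print(is_mostly_all_caps("AN EMPIRICAL STUDY OF PLATFORM EXIT"))  # True
print(is_mostly_all_caps("An empirical study of platform exit"))  # False
```

Counting only letters keeps digits and punctuation from diluting the ratio. Short acronyms such as "MISQ" would also be flagged, so a real check may need an additional minimum length.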


html-tags

Fields should not contain HTML tags.

Problematic value

title = {A commentary on <i>microsourcing</i>}

Correct value

title = {A commentary on microsourcing}

Note: abstracts are not checked and may contain HTML tags.

Fields checked

title

journal

booktitle

author

publisher

editor
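A sketch of tag detection and removal, assuming a tag is "<" or "</" followed by a letter and then anything up to ">". The pattern and helper names are illustrative assumptions. A bare comparison such as "a < b" is not matched.

```python
import re

# Matches tags such as <i>, </i>, <sub>, <span class="x">
HTML_TAG = re.compile(r"</?[a-zA-Z][^<>]*>")


def has_html_tags(value: str) -> bool:
    return HTML_TAG.search(value) is not None


def strip_html_tags(value: str) -> str:
    """Hypothetical repair step: remove the tags and keep their text content."""
    return HTML_TAG.sub("", value)


print(strip_html_tags("A commentary on <i>microsourcing</i>"))  # A commentary on microsourcing
```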


name-format-titles

Names should not contain titles, such as “MD”, “Dr”, “PhD”, “Prof”, or “Dipl Ing”.

Problematic value

@phdthesis{Smith2022,
    ...
    author = {Prof. Smith, M. PhD.},
    ...
}

Correct value

@phdthesis{Smith2022,
    ...
    author = {Smith, M.},
    ...
}

Fields checked

author

editor
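A sketch of the title check using the titles listed above. The regular expression is an assumption: it tolerates a trailing period and writes "Dipl Ing" with an optional period or hyphen, as in "Dipl.-Ing.". Word boundaries keep names such as "Drake" from matching "Dr".

```python
import re

# Titles from the list above; optional trailing period; "Dipl Ing" variants assumed
NAME_TITLE = re.compile(r"\b(?:MD|Dr|PhD|Prof|Dipl\.?[\s-]*Ing)\b\.?")


def has_name_title(value: str) -> bool:
    return NAME_TITLE.search(value) is not None


print(has_name_title("Prof. Smith, M. PhD."))  # True
print(has_name_title("Smith, M."))  # False
```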


name-format-separators

Names should be correctly separated.

Problematic value

author = {Smith, W.; Thompson, U.}

Correct value

author = {Smith, W. and Thompson, U.}

The following rules apply:

  • Author names are separated by " and ".

  • Each name must contain at least two capital letters and should consist of letters.

  • Last name and first name should be separated by a comma (,).

  • Each name must be longer than 5 characters.

Fields checked

author

editor
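The separator rules listed above can be sketched as a single predicate. This is not CoLRev's implementation. The set of punctuation tolerated besides letters (spaces, commas, periods, hyphens, apostrophes, and the braces used to protect particles) is an assumption.

```python
ALLOWED_PUNCTUATION = set(" ,.-'{}")  # assumption: characters tolerated besides letters


def has_separator_error(value: str) -> bool:
    """Return True if an author/editor field violates the separator rules."""
    if ";" in value:  # names must be joined with " and ", not semicolons
        return True
    for name in value.split(" and "):
        if "," not in name:  # expect "Last, First"
            return True
        if sum(c.isupper() for c in name) < 2:  # at least two capital letters
            return True
        if len(name) <= 5:  # must be longer than 5 characters
            return True
        if not all(c.isalpha() or c in ALLOWED_PUNCTUATION for c in name):
            return True
    return False


print(has_separator_error("Smith, W.; Thompson, U."))  # True
print(has_separator_error("Smith, W. and Thompson, U."))  # False
```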


name-particles

Name particles should be formatted correctly and protected.

Problematic value

author = {Brocke, Jan vom}

Correct value

author = {{vom Brocke}, Jan}

Fields checked

author

editor
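A sketch of a particle check, assuming a small, incomplete list of lower-case particles. It flags two cases: a particle that ended up after the comma (as in the problematic value), and a particle that leads the last name without brace protection. The particle list and function name are illustrative assumptions.

```python
# Illustrative, incomplete particle list (assumption)
PARTICLES = {"van", "von", "vom", "der", "den", "de", "da", "di", "du", "zu", "zum", "zur"}


def has_particle_error(value: str) -> bool:
    for name in value.split(" and "):
        last, _, first = name.partition(",")
        if any(token in PARTICLES for token in first.split()):
            return True  # particle misplaced after the comma, e.g. "Brocke, Jan vom"
        last_tokens = last.split()
        if last_tokens and last_tokens[0] in PARTICLES:
            return True  # unprotected particle, e.g. "vom Brocke, Jan"; "{vom" does not match
    return False


print(has_particle_error("Brocke, Jan vom"))  # True
print(has_particle_error("{vom Brocke}, Jan"))  # False
```

Protecting the particle with braces keeps BibTeX from treating "vom" as a separate name part when sorting or abbreviating.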



year-format

The year should be a full (four-digit) year.

Problematic value

year = {2023-01-03}

Correct value

year = {2023}

Fields checked

year
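A minimal sketch, assuming a full year is exactly four digits. The repair helper `extract_full_year` is hypothetical and only illustrates how a prep step might recover the year from a date.

```python
import re
from typing import Optional


def has_year_format_error(year: str) -> bool:
    """A full year is assumed to be exactly four digits."""
    return re.fullmatch(r"\d{4}", year.strip()) is None


def extract_full_year(value: str) -> Optional[str]:
    """Hypothetical repair: take the first standalone four-digit run, e.g. from an ISO date."""
    match = re.search(r"\b\d{4}\b", value)
    return match.group(0) if match else None


print(has_year_format_error("2023-01-03"), extract_full_year("2023-01-03"))  # True 2023
```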


doi-not-matching-pattern

The doi field should follow a predefined pattern: it should not start with http… and should be in upper case.

Problematic value

doi = {https://doi.org/10.1016/j.jsis. 2021.101694}

Correct value

doi = {10.1016/j.jsis.2021.101694}

Fields checked

doi
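A sketch of the pattern check, using a regular expression along the lines of the one Crossref has suggested for modern DOIs ("10.", a 4 to 9 digit registrant code, a slash, then a restricted character set). The exact pattern CoLRev uses may differ, and this sketch does not enforce the upper-case convention. `normalize_doi` is a hypothetical repair helper.

```python
import re

# "10." prefix, 4-9 digit registrant code, "/", then a suffix without whitespace
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def doi_matches_pattern(doi: str) -> bool:
    return DOI_PATTERN.match(doi) is not None


def normalize_doi(value: str) -> str:
    """Hypothetical repair: drop a resolver prefix and remove whitespace."""
    value = re.sub(r"^https?://(dx\.)?doi\.org/", "", value.strip(), flags=re.IGNORECASE)
    return re.sub(r"\s+", "", value)


print(normalize_doi("https://doi.org/10.1016/j.jsis. 2021.101694"))  # 10.1016/j.jsis.2021.101694
```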



isbn-not-matching-pattern

ISBN should be valid.

Problematic value

isbn = {978316}

Correct value

isbn = {978-3-16-148410-0}

TODO: ISBN-10/ISBN-13, how multiple ISBNs are stored

Fields checked

isbn
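Validity can be sketched with the standard ISBN check digits: ISBN-13 digits are weighted alternately 1 and 3 and the sum must be divisible by 10, while ISBN-10 digits are weighted 10 down to 1 (a final "X" counts as 10) and the sum must be divisible by 11. The sketch validates a single ISBN and ignores hyphens and spaces. How multiple ISBNs are stored is still open, as noted in the TODO above. The function name is an assumption.

```python
def is_valid_isbn(value: str) -> bool:
    """Validate one ISBN-10 or ISBN-13 via its check digit; hyphens and spaces are ignored."""
    isbn = value.replace("-", "").replace(" ", "").upper()
    if len(isbn) == 13 and isbn.isdecimal():
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(isbn))
        return total % 10 == 0
    if len(isbn) == 10 and isbn[:9].isdecimal() and (isbn[9].isdecimal() or isbn[9] == "X"):
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(isbn))
        return total % 11 == 0
    return False


print(is_valid_isbn("978316"), is_valid_isbn("978-3-16-148410-0"))  # False True
```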


pubmedid_not_matching_pattern

PubMed IDs should be formatted correctly (7 or 8 digits).

Problematic value

colrev.pubmed.pubmedid = {PMID: 1498274774},

Correct value

colrev.pubmed.pubmedid = {33044175},

Fields checked

colrev.pubmed.pubmedid
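A minimal sketch of the 7-or-8-digit rule stated above. The function name is an assumption. A "PMID:" prefix or a value of the wrong length fails the check.

```python
import re


def pubmed_id_matches_pattern(value: str) -> bool:
    """PubMed IDs are expected to consist of 7 or 8 digits only."""
    return re.fullmatch(r"\d{7,8}", value) is not None


print(pubmed_id_matches_pattern("PMID: 1498274774"))  # False
print(pubmed_id_matches_pattern("33044175"))  # True
```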


language-format-error

The ISO 639-3 language code should be valid.

Problematic value

language = {en}

Correct value

language = {eng}

Fields checked

language

See language_service.


language-unknown

Records should contain an ISO 639-3 language code.

Problematic value

language = {American English}

Correct value

language = {eng}

Fields checked

language

See language_service.
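The two language checks can be told apart with a small classifier, sketched here under two assumptions: a value of two or three letters that is not a valid ISO 639-3 code (such as the ISO 639-1 code "en") is a format error, and any other value (such as "American English") is unknown. The code set is a tiny illustrative subset. A real check would use the full ISO 639-3 table, for example via language_service, whose API is not shown here.

```python
import re
from typing import Optional

# Tiny illustrative subset of ISO 639-3 codes (assumption)
ISO_639_3 = {"eng", "deu", "fra", "spa", "por", "ita", "nld", "zho", "jpn"}


def language_defect(value: str) -> Optional[str]:
    """Return the defect code for a language field, or None if it holds a valid code."""
    if value in ISO_639_3:
        return None
    if re.fullmatch(r"[A-Za-z]{2,3}", value):
        return "language-format-error"  # looks like a code, but not a valid ISO 639-3 code
    return "language-unknown"  # free text that first needs to be mapped to a code


print(language_defect("en"), language_defect("American English"))
# language-format-error language-unknown
```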

Completeness

missing-field

Records should contain all required fields for the respective ENTRYTYPE.

Problematic value

@article{Webster2002,
    title = {Analyzing the past to prepare for the future: Writing a literature review},
    author = {Webster, Jane and Watson, Richard T},
    journal = {MIS quarterly},
}

Correct value

@article{Webster2002,
    title = {Analyzing the past to prepare for the future: Writing a literature review},
    author = {Webster, Jane and Watson, Richard T},
    journal = {MIS quarterly},
    year = {2002},
    volume = {26},
    number = {2},
    pages = {xiii-xxiii},
}

See also: inconsistent-with-entrytype

Required fields per ENTRYTYPE:

  • article: author, title, journal, year, volume, number

  • inproceedings: author, title, booktitle, year

  • incollection: author, title, booktitle, publisher, year

  • inbook: author, title, chapter, publisher, year

  • proceedings: booktitle, editor, year

  • conference: booktitle, editor, year

  • book: author, title, publisher, year

  • phdthesis: author, title, school, year

  • bachelorthesis: author, title, school, year

  • thesis: author, title, school, year

  • masterthesis: author, title, school, year

  • techreport: author, title, institution, year

  • unpublished: title, author, year

  • misc: author, title, year

  • software: author, title, url

  • online: author, title, url

  • other: author, title, year
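The required-fields table above can be transcribed into a dictionary and checked per record, as in this sketch. It assumes records are dictionaries with an "ENTRYTYPE" key. `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and unknown ENTRYTYPEs are simply not checked.

```python
# Transcription of the required-fields table
REQUIRED_FIELDS: dict[str, list[str]] = {
    "article": ["author", "title", "journal", "year", "volume", "number"],
    "inproceedings": ["author", "title", "booktitle", "year"],
    "incollection": ["author", "title", "booktitle", "publisher", "year"],
    "inbook": ["author", "title", "chapter", "publisher", "year"],
    "proceedings": ["booktitle", "editor", "year"],
    "conference": ["booktitle", "editor", "year"],
    "book": ["author", "title", "publisher", "year"],
    "phdthesis": ["author", "title", "school", "year"],
    "bachelorthesis": ["author", "title", "school", "year"],
    "thesis": ["author", "title", "school", "year"],
    "masterthesis": ["author", "title", "school", "year"],
    "techreport": ["author", "title", "institution", "year"],
    "unpublished": ["title", "author", "year"],
    "misc": ["author", "title", "year"],
    "software": ["author", "title", "url"],
    "online": ["author", "title", "url"],
    "other": ["author", "title", "year"],
}


def missing_fields(record: dict[str, str]) -> list[str]:
    """Return the required fields absent from the record, in table order."""
    required = REQUIRED_FIELDS.get(record.get("ENTRYTYPE", ""), [])
    return [field for field in required if field not in record]


webster = {"ENTRYTYPE": "article", "title": "Analyzing the past ...",
           "author": "Webster, Jane and Watson, Richard T", "journal": "MIS quarterly"}
print(missing_fields(webster))  # ['year', 'volume', 'number']
```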


incomplete-field

Fields should be complete. A field is considered incomplete (truncated) if it ends with “...”.

Problematic value

title = {A commentary on ...}

Correct value

title = {A commentary on microsourcing}

Fields checked

title

journal

booktitle

author

abstract
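A minimal sketch of the truncation rule over the fields listed above. Trailing whitespace is ignored before testing for "...". The Unicode ellipsis character is not covered, since the rule above only mentions "...". Names are assumptions.

```python
INCOMPLETE_CHECK_FIELDS = ["title", "journal", "booktitle", "author", "abstract"]


def incomplete_fields(record: dict[str, str]) -> list[str]:
    """Return the checked fields whose value ends with '...'."""
    return [
        field
        for field in INCOMPLETE_CHECK_FIELDS
        if record.get(field, "").rstrip().endswith("...")
    ]


print(incomplete_fields({"title": "A commentary on ..."}))  # ['title']
```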


container-title-abbreviated

Containers should not be abbreviated.

Problematic value

journal = {MISQ}

Correct value

journal = {MIS Quarterly}

A container title is considered abbreviated if it is shorter than 6 characters and entirely upper case.

Fields checked

journal

booktitle
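The abbreviation rule translates directly into a predicate. The function name is an assumption. Note that `str.isupper()` ignores digits and spaces but requires at least one cased letter.

```python
def is_abbreviated_container(value: str) -> bool:
    """Container titles shorter than 6 characters and entirely upper case count as abbreviated."""
    return len(value) < 6 and value.isupper()


print(is_abbreviated_container("MISQ"))  # True
print(is_abbreviated_container("MIS Quarterly"))  # False
```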


name-abbreviated

Names should not be abbreviated.

Problematic value

author = {Smith, W. et. al.}

Correct value

author = {Smith, W. and Thompson, U.}

Fields checked

author

editor
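A hypothetical heuristic for this check, assuming abbreviated name lists show up as "et al." variants (including the misspelled "et. al.") or a trailing "...". The pattern is an assumption, not CoLRev's rule. It requires a word boundary before "et" and after "al", so names such as "Bennett" or "Alice" are not matched.

```python
import re

# "et al", "et al.", "et. al." and similar variants (assumption)
ET_AL = re.compile(r"\bet\.?\s*al\b\.?")


def has_abbreviated_names(value: str) -> bool:
    return ET_AL.search(value) is not None or value.rstrip().endswith("...")


print(has_abbreviated_names("Smith, W. et. al."))  # True
print(has_abbreviated_names("Smith, W. and Thompson, U."))  # False
```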

Within-record consistency

inconsistent-with-entrytype

Some fields are inconsistent with the respective ENTRYTYPE.

Problematic value

@article{SmithParkerWeber2003,
    ...
    booktitle = {First Workshop on ...},
    ...
}

Correct value

@inproceedings{SmithParkerWeber2003,
    ...
    booktitle = {First Workshop on ...},
    ...
}

Inconsistent fields per ENTRYTYPE:

  • article: booktitle

  • inproceedings: issue, number, journal

  • incollection: (none)

  • inbook: journal

  • book: volume, issue, number, journal

  • phdthesis: volume, issue, number, journal, booktitle

  • masterthesis: volume, issue, number, journal, booktitle

  • techreport: volume, issue, number, journal, booktitle

  • unpublished: volume, issue, number, journal, booktitle

  • online: journal, booktitle

  • misc: journal, booktitle
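The inconsistent-fields table can be checked with a lookup, as in this sketch. Records are assumed to be dictionaries with an "ENTRYTYPE" key, and the names are hypothetical. incollection maps to an empty list because the table gives no inconsistent fields for it. That reading is an assumption.

```python
INCONSISTENT_FIELDS: dict[str, list[str]] = {
    "article": ["booktitle"],
    "inproceedings": ["issue", "number", "journal"],
    "incollection": [],  # no inconsistent fields listed (assumed reading of the table)
    "inbook": ["journal"],
    "book": ["volume", "issue", "number", "journal"],
    "phdthesis": ["volume", "issue", "number", "journal", "booktitle"],
    "masterthesis": ["volume", "issue", "number", "journal", "booktitle"],
    "techreport": ["volume", "issue", "number", "journal", "booktitle"],
    "unpublished": ["volume", "issue", "number", "journal", "booktitle"],
    "online": ["journal", "booktitle"],
    "misc": ["journal", "booktitle"],
}


def inconsistent_fields(record: dict[str, str]) -> list[str]:
    """Return the fields present in the record that do not fit its ENTRYTYPE."""
    candidates = INCONSISTENT_FIELDS.get(record.get("ENTRYTYPE", ""), [])
    return [field for field in candidates if field in record]


print(inconsistent_fields({"ENTRYTYPE": "article", "booktitle": "First Workshop on ..."}))
# ['booktitle']
```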


thesis-with-multiple-authors

Records with a thesis ENTRYTYPE should not contain multiple authors.

Problematic value

@phdthesis{SmithParkerWeber2003,
    ...
    author = {Smith, M. and Parker, S. and Weber, R.},
    ...
}

Correct value

@phdthesis{Smith2003,
    ...
    author = {Smith, M.},
    ...
}

Fields checked

author [if ENTRYTYPE in thesis|phdthesis|masterthesis]
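A minimal sketch, assuming authors are joined with " and " as in the separator rules and records are dictionaries with an "ENTRYTYPE" key. The set of thesis types follows the condition stated above. The function name is hypothetical.

```python
THESIS_TYPES = {"thesis", "phdthesis", "masterthesis"}


def has_multiple_thesis_authors(record: dict[str, str]) -> bool:
    """Flag thesis records whose author field lists more than one name."""
    if record.get("ENTRYTYPE") not in THESIS_TYPES:
        return False
    return len(record.get("author", "").split(" and ")) > 1


print(has_multiple_thesis_authors(
    {"ENTRYTYPE": "phdthesis", "author": "Smith, M. and Parker, S. and Weber, R."}
))  # True
```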


page-range

Page range should be valid, i.e., the first page should be lower than the last page if the pages are numerical.

Problematic value

pages = {11--9}

Correct value

pages = {11--19}

Fields checked

pages
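A sketch of the page-range rule, assuming ranges are written with one or two hyphens. Values that are not a numeric range, such as single pages, article numbers, or Roman numerals like xiii-xxiii, are skipped, matching the "if the pages are numerical" condition. As an assumption, only a first page greater than the last page is flagged, so "5--5" passes.

```python
import re


def has_page_range_error(pages: str) -> bool:
    """Flag numeric page ranges whose first page exceeds the last page."""
    match = re.fullmatch(r"\s*(\d+)\s*-{1,2}\s*(\d+)\s*", pages)
    if match is None:
        return False  # not a numeric range, so the rule does not apply
    first, last = int(match.group(1)), int(match.group(2))
    return first > last


print(has_page_range_error("11--9"), has_page_range_error("11--19"))  # True False
```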


identical-values-between-title-and-container

The title and the container (booktitle, journal) should not contain identical values.

Problematic value

title = {MIS Quarterly},
journal = {MIS Quarterly},

Correct value

title = {A commentary on microsourcing},
journal = {MIS Quarterly},
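A minimal sketch of this comparison. As an assumption, values are compared after trimming whitespace and ignoring case. An empty title is never flagged. The function name is hypothetical.

```python
def title_equals_container(record: dict[str, str]) -> bool:
    """Flag records whose title equals their journal or booktitle (case-insensitive)."""
    title = record.get("title", "").strip().casefold()
    if not title:
        return False
    return any(
        record.get(key, "").strip().casefold() == title for key in ("journal", "booktitle")
    )


print(title_equals_container({"title": "MIS Quarterly", "journal": "MIS Quarterly"}))  # True
```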

inconsistent-content

Fields should not contain inconsistent values:

  • The journal field should not refer to a conference or workshop.

  • The booktitle field should not refer to a journal.

Problematic value

journal = {Proceedings of the 32nd Conference on ...}

Correct value

booktitle = {Proceedings of the 32nd Conference on ...}

Fields checked and the values considered erroneous in them:

  • journal: conference, workshop

  • booktitle: journal
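The erroneous values listed above can be checked by substring matching, as in this sketch. Case-insensitive matching and the function name are assumptions.

```python
ERRONEOUS_VALUES: dict[str, list[str]] = {
    "journal": ["conference", "workshop"],
    "booktitle": ["journal"],
}


def inconsistent_content_fields(record: dict[str, str]) -> list[str]:
    """Return fields whose value contains a term pointing to the other container type."""
    return [
        field
        for field, terms in ERRONEOUS_VALUES.items()
        if any(term in record.get(field, "").lower() for term in terms)
    ]


print(inconsistent_content_fields({"journal": "Proceedings of the 32nd Conference on ..."}))
# ['journal']
```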

Origin consistency

inconsistent-with-doi-metadata

Record content should be consistent with the DOI metadata.

Problematic value

@article{wagner2021exploring,
    title = {Analyzing the past to prepare for the future: Writing a literature review},
    author = {Webster, Jane and Watson, Richard T},
    journal = {MIS quarterly},
    volume = {30},
    number = {4},
    pages = {101694},
    year = {2021},
    doi = {10.1016/j.jsis.2021.101694}
}

# metadata at Crossref:
# https://api.crossref.org/works/10.1016/j.jsis.2021.101694

@article{wagner2021exploring,
    title = {Exploring the boundaries and processes of digital platforms for knowledge work: A review of information systems research},
    author = {Wagner, Gerit and Prester, Julian and Paré, Guy},
    journal = {The Journal of Strategic Information Systems},
    volume = {30},
    number = {4},
    pages = {101694},
    year = {2021},
    doi = {10.1016/j.jsis.2021.101694}
}

Correct value

@article{wagner2021exploring,
    title = {Exploring the boundaries and processes of digital platforms for knowledge work: A review of information systems research},
    author = {Wagner, Gerit and Prester, Julian and Paré, Guy},
    journal = {The Journal of Strategic Information Systems},
    volume = {30},
    number = {4},
    pages = {101694},
    year = {2021},
    doi = {10.1016/j.jsis.2021.101694}
}

# metadata at Crossref:
# https://api.crossref.org/works/10.1016/j.jsis.2021.101694

@article{wagner2021exploring,
    title = {Exploring the boundaries and processes of digital platforms for knowledge work: A review of information systems research},
    author = {Wagner, Gerit and Prester, Julian and Paré, Guy},
    journal = {The Journal of Strategic Information Systems},
    volume = {30},
    number = {4},
    pages = {101694},
    year = {2021},
    doi = {10.1016/j.jsis.2021.101694}
}

Fields checked

title

journal

author
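The comparison can be sketched with string similarity from Python's difflib. The sketch assumes the DOI metadata has already been retrieved (for example from https://api.crossref.org/works/10.1016/j.jsis.2021.101694) and converted into the same field names. Retrieval is not shown. The 0.9 similarity threshold and the function names are assumptions.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.9  # assumption


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def inconsistent_with_doi_metadata(record: dict[str, str], doi_metadata: dict[str, str]) -> list[str]:
    """Return the checked fields whose values differ too much from the DOI metadata."""
    return [
        field
        for field in ("title", "journal", "author")
        if field in record
        and field in doi_metadata
        and similarity(record[field], doi_metadata[field]) < SIMILARITY_THRESHOLD
    ]


problem = {"title": "Analyzing the past to prepare for the future: Writing a literature review",
           "author": "Webster, Jane and Watson, Richard T", "journal": "MIS quarterly"}
crossref = {"title": "Exploring the boundaries and processes of digital platforms for knowledge "
                     "work: A review of information systems research",
            "author": "Wagner, Gerit and Prester, Julian and Paré, Guy",
            "journal": "The Journal of Strategic Information Systems"}
print(inconsistent_with_doi_metadata(problem, crossref))  # ['title', 'journal', 'author']
```

A similarity threshold rather than exact equality tolerates minor differences such as capitalization or accents ("Paré" vs. "Pare").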


inconsistent-with-url-metadata

The record metadata should be consistent with the metadata that Zotero generates for the url.

Problematic value

@article{wagner2021exploring,
    title = {Analyzing the past to prepare for the future: Writing a literature review},
    author = {Webster, Jane and Watson, Richard T},
    journal = {MIS quarterly},
    volume = {30},
    number = {4},
    pages = {101694},
    year = {2021},
    url = {https://www.sciencedirect.com/science/article/abs/pii/S096386872100041X}
}

# metadata from the url:

@article{wagner2021exploring,
    title = {Exploring the boundaries and processes of digital platforms for knowledge work: A review of information systems research},
    author = {Wagner, Gerit and Prester, Julian and Paré, Guy},
    journal = {The Journal of Strategic Information Systems},
    volume = {30},
    number = {4},
    pages = {101694},
    year = {2021},
    url = {https://www.sciencedirect.com/science/article/abs/pii/S096386872100041X}
}

Correct value

@article{wagner2021exploring,
    title = {Exploring the boundaries and processes of digital platforms for knowledge work: A review of information systems research},
    author = {Wagner, Gerit and Prester, Julian and Paré, Guy},
    journal = {The Journal of Strategic Information Systems},
    volume = {30},
    number = {4},
    pages = {101694},
    year = {2021},
    url = {https://www.sciencedirect.com/science/article/abs/pii/S096386872100041X}
}

# metadata from the url:

@article{wagner2021exploring,
    title = {Exploring the boundaries and processes of digital platforms for knowledge work: A review of information systems research},
    author = {Wagner, Gerit and Prester, Julian and Paré, Guy},
    journal = {The Journal of Strategic Information Systems},
    volume = {30},
    number = {4},
    pages = {101694},
    year = {2021},
    url = {https://www.sciencedirect.com/science/article/abs/pii/S096386872100041X}
}

Fields checked

author

title

year

journal

volume

number


record-not-in-toc

The record should be found in the relevant table-of-content (toc) if a toc is available.

Problematic value

@article{wagner2021exploring,
    title = {A breakthrough paper on microsourcing},
    author = {Wagner, Gerit},
    journal = {The Journal of Strategic Information Systems},
    volume = {30},
    number = {4},
    year = {2021},
}

# Table-of-contents (based on crossref):
# The Journal of Strategic Information Systems, 30-4

Gable, G. and Chan, Y. - Welcome to this 4th issue of Volume 30 of The Journal of Strategic Information Systems
Mamonov, S. and Peterson, R. - The role of IT in organizational innovation – A systematic literature review
Eismann, K. and Posegga, O. and Fischbach, K. - Opening organizational learning in crisis management: On the affordances of social media
Dhillon, G. and Smith, K. and Dissanayaka, I. - Information systems security research agenda: Exploring the gap between research and practice
Wagner, G. and Prester, J. and Pare, G. - Exploring the boundaries and processes of digital platforms for knowledge work: A review of information systems research
Hund, A. and Wagner, H. T. and Beimborn, D. and Weitzel, T. - Digital innovation: Review and novel perspective

Correct value

@article{wagner2021exploring,
    title = {Exploring the boundaries and processes of digital platforms for knowledge work: A review of information systems research},
    author = {Wagner, Gerit and Prester, Julian and Paré, Guy},
    journal = {The Journal of Strategic Information Systems},
    volume = {30},
    number = {4},
    pages = {101694},
    year = {2021},
}

# Table-of-contents (based on crossref):
# The Journal of Strategic Information Systems, 30-4

Gable, G. and Chan, Y. - Welcome to this 4th issue of Volume 30 of The Journal of Strategic Information Systems
Mamonov, S. and Peterson, R. - The role of IT in organizational innovation – A systematic literature review
Eismann, K. and Posegga, O. and Fischbach, K. - Opening organizational learning in crisis management: On the affordances of social media
Dhillon, G. and Smith, K. and Dissanayaka, I. - Information systems security research agenda: Exploring the gap between research and practice
Wagner, G. and Prester, J. and Pare, G. - Exploring the boundaries and processes of digital platforms for knowledge work: A review of information systems research
Hund, A. and Wagner, H. T. and Beimborn, D. and Weitzel, T. - Digital innovation: Review and novel perspective
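A sketch of the table-of-contents lookup, assuming the toc titles are already available as a list. Titles are compared after a hypothetical normalization that strips accents, punctuation, and case, so "Paré" and "Pare" compare equal. Matching on titles only, and the function names, are assumptions.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def record_not_in_toc(record: dict[str, str], toc_titles: list[str]) -> bool:
    """True if the record's title matches none of the toc titles."""
    title = normalize(record["title"])
    return all(normalize(toc_title) != title for toc_title in toc_titles)


toc = [
    "Digital innovation: Review and novel perspective",
    "Exploring the boundaries and processes of digital platforms for knowledge work: "
    "A review of information systems research",
]
print(record_not_in_toc({"title": "A breakthrough paper on microsourcing"}, toc))  # True
```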

Common defects

erroneous-symbol-in-field

Fields should not contain invalid symbols.

Problematic value

author = {M�ller, U.}

Correct value

author = {Müller, U.}

Symbols considered erroneous: “�”, “™”

Fields checked

author

title

editor

journal

booktitle
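A minimal sketch using the two symbols listed above. The Unicode replacement character U+FFFD is written as an escape so that it survives copying. Names are assumptions.

```python
ERRONEOUS_SYMBOLS = ("\ufffd", "\u2122")  # replacement character and trademark sign
FIELDS_CHECKED = ("author", "title", "editor", "journal", "booktitle")


def fields_with_erroneous_symbols(record: dict[str, str]) -> list[str]:
    """Return the checked fields that contain an erroneous symbol."""
    return [
        field
        for field in FIELDS_CHECKED
        if any(symbol in record.get(field, "") for symbol in ERRONEOUS_SYMBOLS)
    ]


print(fields_with_erroneous_symbols({"author": "M\ufffdller, U."}))  # ['author']
```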


erroneous-term-in-field

Fields should not contain any erroneous terms.

Problematic value

author = {Smith, F. orcid-0012393}

Correct value

author = {Smith, F.}

Erroneous terms per field:

  • author: http, University, orcid, student, Harvard, Conference, Mrs, Hochschule

  • title: research paper, completed research, research in progress, full research paper
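The term lists can be checked by substring matching, as in this sketch. Case-insensitive matching and the function name are assumptions. Plain substring matching can over-match, for example "Mrs" inside a longer surname, so a real check may need word boundaries.

```python
ERRONEOUS_TERMS: dict[str, list[str]] = {
    "author": ["http", "University", "orcid", "student", "Harvard", "Conference", "Mrs", "Hochschule"],
    "title": ["research paper", "completed research", "research in progress", "full research paper"],
}


def erroneous_term_fields(record: dict[str, str]) -> list[str]:
    """Return the fields that contain one of their erroneous terms."""
    return [
        field
        for field, terms in ERRONEOUS_TERMS.items()
        if any(term.lower() in record.get(field, "").lower() for term in terms)
    ]


print(erroneous_term_fields({"author": "Smith, F. orcid-0012393"}))  # ['author']
```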


erroneous-title-field

Title should not contain typical defects.

Problematic value

title = {A I S ssociation for nformation ystems}

Correct value

title = {An empirical study of platform exit}

Fields checked

title
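The section does not enumerate the "typical defects", so this hypothetical heuristic only targets the pattern in the example above: initial capitals split off from their words, leaving runs of single upper-case letters such as "A I S". The minimum run length of 3 and the function name are assumptions.

```python
def has_split_initials(title: str, min_run: int = 3) -> bool:
    """Flag runs of at least `min_run` consecutive single upper-case letter tokens."""
    run = 0
    for token in title.split():
        if len(token) == 1 and token.isupper():
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0  # any longer token breaks the run
    return False


print(has_split_initials("A I S ssociation for nformation ystems"))  # True
print(has_split_initials("An empirical study of platform exit"))  # False
```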