Package development

CoLRev packages are Python packages that extend CoLRev by relying on its shared data structure and standard process. Specifically, a CoLRev package can extend package base classes, such as ReviewTypePackageBaseClass or SearchSourcePackageBaseClass, to implement custom functionality for a specific task or data source. In addition, packages can provide complementary functionality (e.g., for ad-hoc data exploration and visualization) without extending a specific base class.

The following guide explains how to develop built-in packages, i.e., packages that reside in the packages directory. Built-in packages should also be registered as a dependency in the pyproject.toml.

Overview of package development

Init

To create a new CoLRev package, the following command sets up the necessary directories, files, and code skeleton:

colrev package --init

To check the package structure and metadata, use the following command:

colrev package --check

Install and use

To install a CoLRev package, you can use the following command (pip install <package_name> is also possible):

colrev install <package_name>

Once installed, packages that extend a base class can be used in the standard process by registering the package in the settings.json of a project (e.g., by running colrev search --add <package_name>).
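To illustrate what registration amounts to, the following minimal sketch lists the search source endpoints found in a project's settings. The settings layout used here (a top-level "sources" list whose entries carry an "endpoint" key) and the example entries are assumptions made for illustration, not a specification of the actual settings.json schema.

```python
import json

# Hypothetical excerpt of a project's settings.json (layout is an assumption).
SETTINGS_JSON = """
{
  "sources": [
    {"endpoint": "colrev.crossref", "search_type": "API"},
    {"endpoint": "colrev.abi_inform_proquest", "search_type": "DB"}
  ]
}
"""


def registered_search_sources(settings: dict) -> list[str]:
    """Return the endpoint names of all registered search sources."""
    return [source["endpoint"] for source in settings.get("sources", [])]


settings = json.loads(SETTINGS_JSON)
print(registered_search_sources(settings))
```

In a real project, the settings would be read from the settings.json file rather than from an inline string.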

Develop, test, document and check

The init command should set up the package structure and metadata. The following sections provide more details on how to develop, test, document, and check the package.

It is recommended to run the following check regularly:

colrev package --check

Package structure

A package contains the following files and directories:

├── pyproject.toml
├── README.md
├── src
│   ├── __init__.py
│   ├── package_functionality.py
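As a quick sanity check during development, the layout above can be verified with a few lines of Python. This is only a sketch based on the tree shown here; it is not how colrev package --check is implemented, and the function name is hypothetical.

```python
from pathlib import Path

# Files taken from the tree above; paths are relative to the package root.
EXPECTED_FILES = (
    "pyproject.toml",
    "README.md",
    "src/__init__.py",
    "src/package_functionality.py",
)


def missing_package_files(package_dir: Path) -> list[str]:
    """Return the expected files that do not exist under package_dir."""
    return [name for name in EXPECTED_FILES if not (package_dir / name).is_file()]
```

Calling missing_package_files(Path("path/to/my_package")) returns an empty list when the layout is complete (my_package is a placeholder name).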

The package metadata is stored in the pyproject.toml file. CoLRev uses the metadata to identify the package and its dependencies. The metadata should include the following fields:

[tool.poetry]
name = "colrev.abi_inform_proquest"
description = "CoLRev package for abi_inform_proquest"
version = "0.1.0"
authors = ["Gerit Wagner <gerit.wagner@uni-bamberg.de>"]
license = "MIT"
repository = "https://github.com/CoLRev-Environment/colrev/blob/main/colrev/packages/abi_inform_proquest"

[tool.colrev]
colrev_doc_description = "Package for abi_inform_proquest"
colrev_doc_link = "README.md"
search_types = ["API", "TOC", "MD"]

[tool.poetry.plugins.colrev]
search_source = "colrev.packages.abi_inform_proquest.src.package_functionality:ABIInformProQuestSearchSource"

The tool.poetry.plugins.colrev section specifies which base classes are extended. Each key names an endpoint (e.g., search_source), and each value contains the module path and the class name, separated by a colon. The module path is relative to the package directory.

Develop

Package development is done in the src directory. The package should extend the respective base class(es).
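The general pattern of extending a base class can be sketched as follows. The base class in this example is a stand-in defined locally so that the code runs without CoLRev; the real base classes (e.g., SearchSourcePackageBaseClass) are provided by CoLRev and define their own required methods. All class and method names below are hypothetical.

```python
from abc import ABC, abstractmethod


class SearchSourceBase(ABC):
    """Stand-in for a CoLRev base class (hypothetical, for illustration only)."""

    @abstractmethod
    def search(self, query: str) -> list[dict]:
        """Return the records that match the query."""


class InMemorySearchSource(SearchSourceBase):
    """Hypothetical package class that searches a fixed list of records."""

    def __init__(self, records: list[dict]) -> None:
        self.records = records

    def search(self, query: str) -> list[dict]:
        # Case-insensitive substring match on the title field.
        needle = query.lower()
        return [r for r in self.records if needle in r.get("title", "").lower()]


source = InMemorySearchSource(
    [{"title": "Literature reviews in IS"}, {"title": "Machine learning"}]
)
print(source.search("review"))
```

Because the base class declares its methods as abstract, a subclass that forgets to implement one of them cannot be instantiated, which surfaces incomplete implementations early.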

Best practices

  • Remember to install CoLRev in editable mode, so that changes are immediately available (run pip install -e /path/to/cloned/colrev)

  • Check the other package implementations to get an idea of how to proceed

  • Use the colrev constants

  • Get paths from review_manager

  • Use the logger and colrev_report_logger to help users examine and validate the process, including links to the docs where instructions for tracing and fixing errors are available.

  • Before committing, run the pre-commit checks

  • Use poetry for dependency management (run poetry add <package_name> to add a new dependency)

  • Once package development is complete, open a pull request to the CoLRev origin repository with a brief description of the package.

  • The add_endpoint method is only required for SearchSources; it is optional for other packages.
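The logging recommendation above can be sketched with the standard library. The two loggers below are stand-ins for logger and colrev_report_logger, which CoLRev provides itself; the logger names, the record identifier, and the documentation URL are hypothetical. The idea is that short status messages go to the user, while detailed messages, including a link to the relevant docs, go to the report.

```python
import io
import logging

# Hypothetical docs link; replace with the page that explains the fix.
DOCS_URL = "https://example.org/docs/fixing-missing-fields"


def make_logger(name: str, stream: io.StringIO, level: int) -> logging.Logger:
    """Create a logger that writes to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # keep messages out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


console, report = io.StringIO(), io.StringIO()
logger = make_logger("example.console", console, logging.INFO)
report_logger = make_logger("example.report", report, logging.DEBUG)

record_id = "Smith2020"  # hypothetical record identifier
logger.info("1 record needs manual preparation")
report_logger.debug("Record %s: missing field 'year'", record_id)
report_logger.info("See %s for instructions", DOCS_URL)
```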

Endpoints allow packages to implement functionality that can be called in the standard process if users register the package in the settings.json of a project.

To implement an endpoint, the tool.poetry.plugins.colrev section of pyproject.toml must provide a reference to the class that inherits from the respective base class. The reference is a string that contains the module path and the class name. The module path is relative to the package directory.
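A reference string of this kind can be resolved to a class with importlib. The following is a minimal sketch of the general technique, not CoLRev's own loading code; the function name load_class is hypothetical, and a standard-library class is used so that the example runs without CoLRev installed.

```python
import importlib


def load_class(reference: str) -> type:
    """Import and return the class named by a "module.path:ClassName" string."""
    module_path, separator, class_name = reference.partition(":")
    if not separator or not module_path or not class_name:
        raise ValueError(f"Expected 'module.path:ClassName', got {reference!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# A standard-library class stands in for a package class here.
ordered_dict_class = load_class("collections:OrderedDict")
print(ordered_dict_class.__name__)
```

Validating the reference before importing gives a clearer error message than the ImportError or AttributeError that a malformed string would otherwise cause.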

The following endpoint - abstract base class pairs are available:

Endpoint        Abstract base class
--------------  ----------------------------
review_type     ReviewTypePackageBaseClass
search_source   SearchSourcePackageBaseClass
prep            PrepPackageBaseClass
prep_man        PrepManPackageBaseClass
dedupe          DedupePackageBaseClass
prescreen       PrescreenPackageBaseClass
pdf_get         PDFGetPackageBaseClass
pdf_get_man     PDFGetManPackageBaseClass
pdf_prep        PDFPrepPackageBaseClass
pdf_prep_man    PDFPrepManPackageBaseClass
screen          ScreenPackageBaseClass
data            DataPackageBaseClass
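The pairs listed above can also be kept as a lookup table, for example to catch misspelled endpoint names in a plugin section before a package is submitted. This is a sketch only: the function name and the module references in the example are hypothetical.

```python
# Endpoint names and abstract base classes as listed above.
ENDPOINT_BASE_CLASSES = {
    "review_type": "ReviewTypePackageBaseClass",
    "search_source": "SearchSourcePackageBaseClass",
    "prep": "PrepPackageBaseClass",
    "prep_man": "PrepManPackageBaseClass",
    "dedupe": "DedupePackageBaseClass",
    "prescreen": "PrescreenPackageBaseClass",
    "pdf_get": "PDFGetPackageBaseClass",
    "pdf_get_man": "PDFGetManPackageBaseClass",
    "pdf_prep": "PDFPrepPackageBaseClass",
    "pdf_prep_man": "PDFPrepManPackageBaseClass",
    "screen": "ScreenPackageBaseClass",
    "data": "DataPackageBaseClass",
}


def unknown_endpoints(plugins: dict[str, str]) -> list[str]:
    """Return the keys of a plugin section that are not known endpoint names."""
    return sorted(set(plugins) - set(ENDPOINT_BASE_CLASSES))


# Hypothetical plugin section with one misspelled key ("search_sources").
example_plugins = {
    "search_sources": "my_package.src.package_functionality:MySearchSource",
    "prep": "my_package.src.package_functionality:MyPrep",
}
print(unknown_endpoints(example_plugins))
```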

Testing

  • Tests for built-in packages are currently part of the CoLRev test suite.

  • See tests/README.md for details.

Document

  • Link the documentation (README.md) in the pyproject.toml.

  • See docs/README.md for details on building the CoLRev docs.

  • CLI demonstrations can be recorded with asciinema.

Publish

  • Standalone CoLRev packages are published on PyPI.

  • Built-in packages are not published separately. They are automatically provided with every PyPI-release of CoLRev.

Register

To have a package registered as an official CoLRev package, create a pull request that adds it to packages.json.

To integrate the package documentation into the official CoLRev documentation, the CoLRev team

Package development resources