colrev.dataset.Dataset

class Dataset(*, review_manager)

Bases: object

The CoLRev dataset (records and their history in git)

Methods

add_changes

Add changed file to git

add_setting_changes

Add changes in settings to git

behind_remote

Check whether the repository is behind the remote

create_commit

Create a commit (including a commit report)

file_in_history

Check whether a file is in the git history

format_records_file

Format the records file (Entrypoint for pre-commit hooks)

get_commit_message

Get the commit message for a given commit number

get_committed_origin_state_dict

Get the committed origin_state_dict

get_last_commit_date

Get the last commit date for a file

get_last_commit_sha

Get the last commit sha

get_origin_state_dict

Get the origin_state_dict (to determine state transitions efficiently)

get_remote_url

Get the remote url

get_repo

Get the git repository object

get_tree_hash

Get the current tree hash

get_untracked_files

Get the files that are untracked by git

has_changes

Check whether the relative path (or the git repository) has changes

has_record_changes

Check whether the records have changes

has_untracked_search_records

Check whether there are untracked search records

load_records_dict

Load the records

load_records_from_history

Iterates through Git history, yielding records file contents as dictionaries.

propagated_id

Check whether an ID is propagated (i.e., its record's status is beyond md_processed)

pull_if_repo_clean

Pull project if repository is clean

read_next_record

Iterate over the records that match the given conditions

records_changed

Check whether the records were changed

remote_ahead

Check whether the remote is ahead

repo_initialized

Check whether the repository is initialized

reset_log_if_no_changes

Reset the report log file if there are no changes

save_records_dict

Save the records dict in RECORDS_FILE

save_records_dict_to_file

Save the records dict

set_ids

Set the IDs of records according to predefined formats or according to the LocalIndex

stash_unstaged_changes

Stash unstaged changes

update_gitignore

Update the gitignore file by adding or removing particular paths

add_changes(path, *, remove=False, ignore_missing=False)

Add changed file to git

Return type:

None

add_setting_changes()

Add changes in settings to git

Return type:

None

behind_remote()

Check whether the repository is behind the remote

Return type:

bool

create_commit(*, msg, manual_author=False, script_call='', saved_args=None, skip_status_yaml=False, skip_hooks=True)

Create a commit (including a commit report)

Return type:

bool

file_in_history(filepath)

Check whether a file is in the git history

Return type:

bool

format_records_file()

Format the records file (Entrypoint for pre-commit hooks)

Return type:

dict

get_commit_message(*, commit_nr)

Get the commit message for the commit identified by commit_nr

Return type:

str

get_committed_origin_state_dict()

Get the committed origin_state_dict

Return type:

dict

get_last_commit_date(filename)

Get the last commit date for a file

Return type:

str

get_last_commit_sha()

Get the last commit sha

Return type:

str

get_origin_state_dict(records_string='')

Get the origin_state_dict (to determine state transitions efficiently)

{'30_example_records.bib/Staehr2010': <RecordState.pdf_not_available: 10>,}

Return type:

dict
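The example mapping above has one entry per origin, keyed by the origin string and valued by the record's status. The following is a minimal standalone sketch of how such a mapping could be derived from a records dict of the shape documented for load_records_dict. It is not CoLRev's implementation: the RecordState enum is a stand-in containing only the two members visible on this page, and build_origin_state_dict is a hypothetical helper name.

```python
from enum import Enum


class RecordState(Enum):
    """Stand-in enum; only the two members shown on this page are included."""

    md_imported = 2
    pdf_not_available = 10


def build_origin_state_dict(records: dict) -> dict:
    """Map every origin of every record to that record's status (hypothetical helper)."""
    origin_states = {}
    for record in records.values():
        for origin in record["colrev_origin"]:
            origin_states[origin] = record["colrev_status"]
    return origin_states


# Sample data modelled on the examples in this page
records = {
    "Staehr2010": {
        "ID": "Staehr2010",
        "colrev_origin": ["30_example_records.bib/Staehr2010"],
        "colrev_status": RecordState.pdf_not_available,
    }
}
origin_state_dict = build_origin_state_dict(records)
print(origin_state_dict)
```

Keying by origin rather than by record ID presumably keeps state lookups stable when records are merged or their IDs change, although this page does not state that rationale.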

get_remote_url()

Get the remote url

Return type:

str

get_repo()

Get the git repository object

Return type:

Repo

get_tree_hash()

Get the current tree hash

Return type:

str

get_untracked_files()

Get the files that are untracked by git

Return type:

list

has_changes(relative_path, *, change_type='all')

Check whether the relative path (or the git repository) has changes

Return type:

bool

has_record_changes(*, change_type='all')

Check whether the records have changes

Return type:

bool

has_untracked_search_records()

Check whether there are untracked search records

Return type:

bool

load_records_dict(*, header_only=False)

Load the records

header_only:

{"Staehr2010": {'ID': 'Staehr2010', 'colrev_origin': ['30_example_records.bib/Staehr2010'], 'colrev_status': <RecordState.md_imported: 2>, 'screening_criteria': 'criterion1=in;criterion2=out', 'file': PosixPath('data/pdfs/Smith2000.pdf'), 'colrev_data_provenance': {Fields.AUTHOR: {"source": "…", "note": "…"}}}, }

Return type:

dict

load_records_from_history(commit_sha='')

Iterates through Git history, yielding records file contents as dictionaries.

Starts iteration from a provided commit SHA. Skips commits where the records file is unchanged. Useful for tracking dataset changes over time.

Return type:

Iterator[dict]

Parameters:

commit_sha (str, optional): Start iteration from this commit SHA. Defaults to beginning of Git history if not provided.

Yields:

dict: Records file contents at a specific Git history point, as a dictionary.
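The "skip commits where the records file is unchanged" behaviour can be illustrated without a Git repository. The sketch below is an assumption-laden stand-in, not the method's actual code: a plain list of dictionaries plays the role of the records file at successive commits, and iter_changed_snapshots is a hypothetical helper. The order in which the real method walks the history is not specified on this page.

```python
from typing import Iterator

_NO_SNAPSHOT = object()  # sentinel: nothing has been yielded yet


def iter_changed_snapshots(snapshots: list, start: int = 0) -> Iterator[dict]:
    """Yield each snapshot that differs from the previously yielded one."""
    previous = _NO_SNAPSHOT
    for snapshot in snapshots[start:]:
        if snapshot == previous:
            continue  # records unchanged at this point in history, skip it
        previous = snapshot
        yield snapshot


# Three "commits"; the second one did not touch the records
snapshots = [
    {"Staehr2010": "md_imported"},
    {"Staehr2010": "md_imported"},
    {"Staehr2010": "md_prepared"},
]
changed = list(iter_changed_snapshots(snapshots))
print(changed)
```

Because the function is a generator, callers can stop iterating early without loading the full history.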

propagated_id(*, record_id)

Check whether an ID is propagated (i.e., its record’s status is beyond md_processed)

Return type:

bool
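As a rough illustration of "beyond md_processed", the check can be thought of as an ordering test on the record's status. The sketch below is a guess at the logic, not CoLRev's code: RecordState is a stand-in IntEnum, the values 2 and 10 come from the examples on this page, the value 5 for md_processed is an assumption, and is_propagated is a hypothetical helper.

```python
from enum import IntEnum


class RecordState(IntEnum):
    """Stand-in enum with an assumed ordering of states."""

    md_imported = 2
    md_processed = 5  # assumed value, not shown on this page
    pdf_not_available = 10


def is_propagated(records: dict, record_id: str) -> bool:
    """Return True if the record exists and its status is beyond md_processed."""
    record = records.get(record_id)
    if record is None:
        return False
    return record["colrev_status"] > RecordState.md_processed


records = {
    "Staehr2010": {"ID": "Staehr2010", "colrev_status": RecordState.pdf_not_available},
    "Smith2000": {"ID": "Smith2000", "colrev_status": RecordState.md_imported},
}
print(is_propagated(records, "Staehr2010"))
print(is_propagated(records, "Smith2000"))
```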

pull_if_repo_clean()

Pull project if repository is clean

Return type:

None

read_next_record(*, conditions)

Iterate over the records that match the given conditions

Return type:

Iterator[dict]
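This page does not document the shape of conditions. The sketch below assumes, purely for illustration, that it is a list of {field: value} dictionaries and that a record is yielded when it satisfies at least one of them. iter_matching_records is a hypothetical helper operating on a plain records dict, not the method itself.

```python
from typing import Iterator


def iter_matching_records(records: dict, conditions: list) -> Iterator[dict]:
    """Yield records that satisfy at least one condition dict (assumed semantics)."""
    for record in records.values():
        if any(
            all(record.get(key) == value for key, value in condition.items())
            for condition in conditions
        ):
            yield record


records = {
    "Staehr2010": {"ID": "Staehr2010", "colrev_status": "md_imported"},
    "Smith2000": {"ID": "Smith2000", "colrev_status": "md_prepared"},
}
matching_ids = [
    record["ID"]
    for record in iter_matching_records(records, [{"colrev_status": "md_imported"}])
]
print(matching_ids)
```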

records_changed()

Check whether the records were changed

Return type:

bool

remote_ahead()

Check whether the remote is ahead

Return type:

bool

repo_initialized()

Check whether the repository is initialized

Return type:

bool

reset_log_if_no_changes()

Reset the report log file if there are no changes

Return type:

None

save_records_dict(records, *, partial=False)

Save the records dict in RECORDS_FILE

Return type:

None

save_records_dict_to_file(records)

Save the records dict

Return type:

None

set_ids(selected_ids=None)

Set the IDs of records according to predefined formats or according to the LocalIndex

Return type:

dict

stash_unstaged_changes()

Stash unstaged changes

Return type:

bool

update_gitignore(*, add=None, remove=None)

Update the gitignore file by adding or removing particular paths

Return type:

None
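The add/remove behaviour can be pictured as a line-level edit of the ignore file. The following standalone sketch works on a temporary file and uses a hypothetical helper, update_ignore_file. It assumes entries are matched as exact lines and passed as strings, which may differ from what the method actually does.

```python
import tempfile
from pathlib import Path
from typing import Optional


def update_ignore_file(
    ignore_file: Path, add: Optional[list] = None, remove: Optional[list] = None
) -> None:
    """Drop the lines listed in remove, then append entries from add that are missing."""
    add = add or []
    remove = remove or []
    lines = []
    if ignore_file.exists():
        lines = ignore_file.read_text(encoding="utf-8").splitlines()
    lines = [line for line in lines if line not in remove]
    for entry in add:
        if entry not in lines:  # avoid duplicate entries
            lines.append(entry)
    ignore_file.write_text("".join(line + "\n" for line in lines), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    ignore_file = Path(tmp) / ".gitignore"
    ignore_file.write_text("*.log\ndata/pdfs\n", encoding="utf-8")
    update_ignore_file(ignore_file, add=["venv"], remove=["data/pdfs"])
    result = ignore_file.read_text(encoding="utf-8")

print(result)
```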