colrev.dataset.Dataset
- class Dataset(*, review_manager)
Bases:
object
The CoLRev dataset (records and their history in git)
Methods
Add changed file to git
Add changes in settings to git
Check whether the repository is behind the remote
Create a commit (including a commit report)
Check whether a file is in the git history
Format the records file (Entrypoint for pre-commit hooks)
Get the commit message for commit #
Get the committed origin_state_dict
Get the last commit date for a file
Get the last commit sha
Get the origin_state_dict (to determine state transitions efficiently)
Get the remote url
Get the git repository object
Get the current tree hash
Get the files that are untracked by git
Check whether the relative path (or the git repository) has changes
Check whether the records have changes
Check whether there are untracked search records
Load the records
Iterates through Git history, yielding records file contents as dictionaries.
Check whether an ID is propagated (i.e., its record's status is beyond md_processed)
Pull project if repository is clean
Read records (Iterator) based on condition
Check whether the records were changed
Check whether the remote is ahead
Check whether the repository is initialized
Reset the report log file if there are no changes
Save the records dict in RECORDS_FILE
Save the records dict
Set the IDs of records according to predefined formats or according to the LocalIndex
Stash unstaged changes
Update the gitignore file by adding or removing particular paths
- add_changes(path, *, remove=False, ignore_missing=False)
Add changed file to git
- Return type:
None
- create_commit(*, msg, manual_author=False, script_call='', saved_args=None, skip_status_yaml=False, skip_hooks=True)
Create a commit (including a commit report)
- Return type:
bool
- format_records_file()
Format the records file (Entrypoint for pre-commit hooks)
- Return type:
dict
- get_origin_state_dict(records_string='')
Get the origin_state_dict (to determine state transitions efficiently)
{'30_example_records.bib/Staehr2010': <RecordState.pdf_not_available: 10>,}
- Return type:
dict
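The example mapping for get_origin_state_dict pairs each origin with the state of the record that carries it. The following is a minimal sketch of how such a mapping could be derived from a records dict of the shape shown for load_records_dict. It is not CoLRev's implementation: `RecordState` is a two-member stand-in enum (using only the values that appear in the examples on this page), and `build_origin_state_dict` is a hypothetical helper.

```python
from enum import Enum


class RecordState(Enum):
    """Stand-in for colrev's RecordState (assumption: only the two values shown on this page)."""

    md_imported = 2
    pdf_not_available = 10


def build_origin_state_dict(records: dict) -> dict:
    """Map every origin to the status of the record that lists it (hypothetical helper)."""
    origin_states = {}
    for record in records.values():
        # A record may list several origins; each one maps to the same status.
        for origin in record.get("colrev_origin", []):
            origin_states[origin] = record["colrev_status"]
    return origin_states


records = {
    "Staehr2010": {
        "ID": "Staehr2010",
        "colrev_origin": ["30_example_records.bib/Staehr2010"],
        "colrev_status": RecordState.pdf_not_available,
    }
}
origin_state_dict = build_origin_state_dict(records)
```

Keying by origin rather than by record ID means a state can be looked up for a source entry without first resolving which record it was merged into.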
- has_changes(relative_path, *, change_type='all')
Check whether the relative path (or the git repository) has changes
- Return type:
bool
- has_record_changes(*, change_type='all')
Check whether the records have changes
- Return type:
bool
- has_untracked_search_records()
Check whether there are untracked search records
- Return type:
bool
- load_records_dict(*, header_only=False)
Load the records
header_only:
{"Staehr2010": {'ID': 'Staehr2010', 'colrev_origin': ['30_example_records.bib/Staehr2010'], 'colrev_status': <RecordState.md_imported: 2>, 'screening_criteria': 'criterion1=in;criterion2=out', 'file': PosixPath('data/pdfs/Smith2000.pdf'), 'colrev_data_provenance': {Fields.AUTHOR:{"source":"…", "note":"…"}}}, }
- Return type:
dict
- load_records_from_history(commit_sha='')
Iterates through Git history, yielding records file contents as dictionaries.
Starts iteration from a provided commit SHA. Skips commits where the records file is unchanged. Useful for tracking dataset changes over time.
- Return type:
Iterator[dict]
- Parameters:
commit_sha (str, optional): Start iteration from this commit SHA. Defaults to beginning of Git history if not provided.
- Yields:
dict: Records file contents at a specific Git history point, as a dictionary.
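Because load_records_from_history yields one records dict per relevant commit, consecutive snapshots can be compared to see how record states moved. The sketch below works on any iterable of such dicts; `iter_status_changes` is a hypothetical helper, the statuses are plain strings for brevity, and the direction of the comparison depends on the order in which history is iterated, which is not specified here.

```python
from typing import Iterable, Iterator


def iter_status_changes(snapshots: Iterable[dict]) -> Iterator[tuple]:
    """Yield (record_id, status_in_previous_snapshot, status_in_current_snapshot).

    Only records present in the current snapshot are compared; a record that is
    missing from the previous snapshot is reported with None as its earlier status.
    """
    previous = None
    for current in snapshots:
        if previous is not None:
            for record_id, record in current.items():
                before = previous.get(record_id, {}).get("colrev_status")
                after = record.get("colrev_status")
                if before != after:
                    yield record_id, before, after
        previous = current


# Two hand-written snapshots standing in for the dicts yielded from history.
snapshots = [
    {"Staehr2010": {"colrev_status": "md_imported"}},
    {"Staehr2010": {"colrev_status": "pdf_not_available"}},
]
changes = list(iter_status_changes(snapshots))
```

Keeping only the previous snapshot in memory means the comparison stays cheap even for long histories.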
- propagated_id(*, record_id)
Check whether an ID is propagated (i.e., its record’s status is beyond md_processed)
- Return type:
bool
- read_next_record(*, conditions)
Read records (Iterator) based on condition
- Return type:
Iterator[dict]
- reset_log_if_no_changes()
Reset the report log file if there are no changes
- Return type:
None
- save_records_dict(records, *, partial=False)
Save the records dict in RECORDS_FILE
- Return type:
None
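A common pattern with load_records_dict and save_records_dict is a load, modify, save round trip. The sketch below assumes only the two signatures documented on this page; `set_record_field` is a hypothetical helper, and `dataset` can be any object exposing those two methods. Whether changes must be staged separately before committing is not covered here.

```python
def set_record_field(dataset, record_id: str, key: str, value) -> bool:
    """Load the records, update one field, and save them again.

    Returns True if the record exists and was updated, False otherwise.
    `dataset` is assumed to provide load_records_dict() and save_records_dict(records).
    """
    records = dataset.load_records_dict()
    if record_id not in records:
        # Nothing is written when the ID is unknown.
        return False
    records[record_id][key] = value
    dataset.save_records_dict(records)
    return True
```

Loading and saving the whole dict keeps the helper simple; for large projects the `partial` parameter of save_records_dict may be worth investigating, although its exact semantics are not described in this section.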