colrev.dataset.Dataset
- class colrev.dataset.Dataset(*, review_manager)
Bases: object
The CoLRev dataset (records and their history in git)
Methods
format_records_file
Format the records file (Entrypoint for pre-commit hooks)
get_committed_origin_state_dict
Get the committed origin_state_dict
get_origin_state_dict
Get the origin_state_dict (to determine state transitions efficiently)
load_records_dict
Load the records
load_records_from_history
Iterates through Git history, yielding records file contents as dictionaries.
propagated_id
Check whether an ID is propagated (i.e., its record's status is beyond md_processed)
read_next_record
Read records (Iterator) based on condition
reset_log_if_no_changes
Reset the report log file if there are no changes
save_records_dict
Save the records dict in RECORDS_FILE
save_records_dict_to_file
Save the records dict
set_ids
Set the IDs of records according to predefined formats or according to the LocalIndex
- format_records_file()
Format the records file (Entrypoint for pre-commit hooks)
- Return type:
dict
- get_origin_state_dict(records_string='')
Get the origin_state_dict (to determine state transitions efficiently)
{'30_example_records.bib/Staehr2010': <RecordState.pdf_not_available: 10>,}
- Return type:
dict
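To illustrate why a mapping from origin to state makes state transitions cheap to detect, here is a minimal sketch that compares two such dictionaries. It does not call CoLRev: the `RecordState` enum below is a stand-in containing only the two members shown on this page, and `changed_origins` is a hypothetical helper, not part of the library.

```python
from enum import Enum


class RecordState(Enum):
    # Stand-in enum: only the two members that appear on this page.
    md_imported = 2
    pdf_not_available = 10


def changed_origins(committed: dict, current: dict) -> dict:
    """Return {origin: (old_state, new_state)} for origins whose state differs.

    Hypothetical helper for illustration; origins missing from `committed`
    are reported with an old state of None.
    """
    return {
        origin: (committed.get(origin), state)
        for origin, state in current.items()
        if committed.get(origin) != state
    }


committed = {"30_example_records.bib/Staehr2010": RecordState.md_imported}
current = {"30_example_records.bib/Staehr2010": RecordState.pdf_not_available}
transitions = changed_origins(committed, current)
```

Because both dictionaries are keyed by origin, the comparison is a single pass over the current state without parsing full records.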
- load_records_dict(*, header_only=False)
Load the records
header_only:
{"Staehr2010": {'ID': 'Staehr2010', 'colrev_origin': ['30_example_records.bib/Staehr2010'], 'colrev_status': <RecordState.md_imported: 2>, 'screening_criteria': 'criterion1=in;criterion2=out', 'file': PosixPath('data/pdfs/Smith2000.pdf'), 'colrev_data_provenance': {Fields.AUTHOR:{"source":"…", "note":"…"}}}, }
- Return type:
dict[str, dict[str, Any]]
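The returned structure is a dictionary keyed by record ID, where each value is itself a dictionary of fields. As a rough sketch of how such a structure can be traversed, the example below builds a lookup from each origin to its record ID. The records literal is hand-written from the example above (with a plain string standing in for the `RecordState` member), and `origin_index` is a hypothetical helper, not a CoLRev function.

```python
def origin_index(records: dict) -> dict:
    """Map every entry in a record's 'colrev_origin' list to the record ID.

    Hypothetical helper for illustration only.
    """
    return {
        origin: record_id
        for record_id, record in records.items()
        for origin in record.get("colrev_origin", [])
    }


# Hand-written stand-in for the dictionary shape shown above.
records = {
    "Staehr2010": {
        "ID": "Staehr2010",
        "colrev_origin": ["30_example_records.bib/Staehr2010"],
        "colrev_status": "md_imported",  # plain string instead of RecordState
        "screening_criteria": "criterion1=in;criterion2=out",
    },
}

index = origin_index(records)
```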
- load_records_from_history(commit_sha='')
Iterates through Git history, yielding records file contents as dictionaries.
Starts iteration from a provided commit SHA. Skips commits where the records file is unchanged. Useful for tracking dataset changes over time.
- Return type:
Iterator[dict]
- Parameters:
commit_sha (str, optional): Start iteration from this commit SHA. Defaults to beginning of Git history if not provided.
- Yields:
dict: Records file contents at a specific Git history point, as a dictionary.
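The "skip unchanged commits" behaviour can be pictured as a generator that only yields a snapshot when it differs from the previous one. The sketch below is an illustration under simplifying assumptions: the Git history is replaced by a plain list of already-parsed snapshots, and `iter_changed_snapshots` is a hypothetical name. It is not how the library reads commits.

```python
from typing import Iterable, Iterator

_UNSET = object()  # sentinel so that an empty dict is still a valid snapshot


def iter_changed_snapshots(snapshots: Iterable[dict]) -> Iterator[dict]:
    """Yield each snapshot that differs from the one immediately before it."""
    previous = _UNSET
    for snapshot in snapshots:
        if snapshot == previous:
            continue  # records file unchanged in this "commit": skip it
        previous = snapshot
        yield snapshot


# Stand-in for a history of records-file contents, one dict per commit.
history = [
    {"Staehr2010": {"colrev_status": "md_prepared"}},
    {"Staehr2010": {"colrev_status": "md_prepared"}},  # unchanged, skipped
    {"Staehr2010": {"colrev_status": "md_imported"}},
]

changed = list(iter_changed_snapshots(history))
```

A generator fits this use case because each snapshot can be processed and discarded before the next one is loaded.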
- propagated_id(*, record_id)
Check whether an ID is propagated (i.e., its record’s status is beyond md_processed)
- Return type:
bool
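As a rough sketch of what "beyond md_processed" could mean, the example below compares enum values numerically. Several things here are assumptions: the value `5` for `md_processed` is not stated on this page, the idea that a larger value means a later state is a simplification, and `is_propagated` is a hypothetical function rather than the library's implementation.

```python
from enum import Enum


class RecordState(Enum):
    md_imported = 2         # value shown on this page
    md_processed = 5        # ASSUMED value, not confirmed by this page
    pdf_not_available = 10  # value shown on this page


def is_propagated(records: dict, record_id: str) -> bool:
    """Return True if the record exists and its status is past md_processed.

    Assumes larger enum values correspond to later states.
    """
    record = records.get(record_id)
    if record is None:
        return False
    return record["colrev_status"].value > RecordState.md_processed.value


records = {
    "Staehr2010": {"colrev_status": RecordState.md_imported},
    "Smith2000": {"colrev_status": RecordState.pdf_not_available},
}
```

Under these assumptions, `Staehr2010` is not propagated (still at `md_imported`), while `Smith2000` is.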
- read_next_record(*, conditions)
Read records (Iterator) based on condition
- Return type:
Iterator[dict]
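This page does not document the shape of `conditions`. Purely as an illustration of condition-based iteration, the sketch below assumes a list of `{field: value}` dictionaries and yields a record when any listed pair matches. Both that shape and the name `iter_matching` are assumptions, not the library's API.

```python
from typing import Iterator, Optional


def iter_matching(records: dict, conditions: Optional[list] = None) -> Iterator[dict]:
    """Yield records lazily; with no conditions, yield every record.

    ASSUMED condition format: [{"field": "value"}, ...], matched by
    comparing string representations; any single match is sufficient.
    """
    for record in records.values():
        if not conditions:
            yield record
            continue
        if any(
            key in record and str(record[key]) == str(value)
            for condition in conditions
            for key, value in condition.items()
        ):
            yield record


records = {
    "Staehr2010": {"ID": "Staehr2010", "colrev_status": "md_imported"},
    "Smith2000": {"ID": "Smith2000", "colrev_status": "pdf_not_available"},
}

matches = list(iter_matching(records, [{"colrev_status": "md_imported"}]))
```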
- reset_log_if_no_changes()
Reset the report log file if there are no changes
- Return type:
None