colrev.ops.dedupe.Dedupe
- class colrev.ops.dedupe.Dedupe(*, review_manager, notify_state_transition_operation=True)[source]
 Bases: Operation
 Deduplicate records (entity resolution)
Methods
apply_merges: Apply deduplication decisions
check_precondition: Check the operation precondition
conclude: Conclude the operation (stop Docker containers)
connected_components: Find the connected components in a graph.
decorate: Decorator for operations
fix_errors: Fix lists of errors
get_info: Get info on cuts (overlap of search sources) and same source merges
get_records_for_dedupe: Get (pre-processed) records for dedupe
main: Return type: Any
merge_based_on_global_ids: Merge records based on global IDs (e.g., doi)
merge_records: Merge records by ID sets
notify: Notify the review_manager about the next operation
unmerge_records: Unmerge duplicate decision of the records, as identified by their ids.
Attributes
DUPLICATES_TO_VALIDATE
NON_DUPLICATE_FILE_TXT
NON_DUPLICATE_FILE_XLSX
PREVENTED_SAME_SOURCE_MERGE_FILE
SAME_SOURCE_MERGE_FILE
debug
type
- apply_merges(*, id_sets, complete_dedupe=False, preferred_masterdata_sources=None)[source]
 Apply deduplication decisions
 id_sets: a list of ID lists, e.g. [[ID_1, ID_2, ID_3], …]
 complete_dedupe: when not all potential duplicates were considered, records cannot be set to md_processed for non-duplicate decisions
- Return type:
 None
- check_precondition()
 Check the operation precondition
- Return type:
 None
- conclude()
 Conclude the operation (stop Docker containers)
- Return type:
 None
- classmethod connected_components(id_sets)[source]
 Find the connected components in a graph.
- Return type:
 list
- Args:
 id_sets (list): A list of id sets.
- Returns:
 list: A list of connected components.
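To illustrate what finding connected components over id sets means, here is a minimal standalone sketch. It is not the colrev implementation; the function body, the depth-first traversal, and the sorted output are assumptions made for illustration. Each id set is treated as a group of linked nodes, and sets that share an ID are merged transitively.

```python
from collections import defaultdict


def connected_components(id_sets):
    """Group IDs that are linked, directly or transitively, by shared id sets.

    Illustrative sketch only; not taken from the colrev source.
    """
    # Build an undirected graph: every ID in a set is linked to the set's first ID.
    graph = defaultdict(set)
    for id_set in id_sets:
        ids = list(id_set)
        if not ids:
            continue
        first = ids[0]
        graph[first]  # register the node even if the set has a single ID
        for other in ids[1:]:
            graph[first].add(other)
            graph[other].add(first)

    # Depth-first traversal from every node that has not been visited yet.
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


if __name__ == "__main__":
    print(connected_components([["a", "b"], ["b", "c"], ["d", "e"]]))
```

In this sketch, the pairs ["a", "b"] and ["b", "c"] end up in one component because they share "b", which is the situation that arises when pairwise duplicate decisions overlap.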
- classmethod decorate()
 Decorator for operations
- Return type:
 Callable
- get_info()[source]
 Get info on cuts (overlap of search sources) and same source merges
- Return type:
 dict
- classmethod get_records_for_dedupe(*, records_df, verbosity_level=0)[source]
 Get (pre-processed) records for dedupe
- Return type:
 DataFrame
- merge_based_on_global_ids(*, apply=False)[source]
 Merge records based on global IDs (e.g., doi)
- Return type:
 None
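The idea of merging on a global identifier can be sketched as grouping records that share the same DOI. The helper below is hypothetical: its name, the plain-dict record layout, and the lowercase/strip normalization are assumptions, not colrev's actual behavior. It only produces id sets in the [[ID_1, ID_2], …] shape described for apply_merges.

```python
from collections import defaultdict


def id_sets_from_doi(records):
    """Return lists of record IDs that share a DOI.

    Hypothetical helper: `records` maps a record ID to a dict of fields.
    """
    by_doi = defaultdict(list)
    for record_id, fields in records.items():
        # Assumed normalization: DOIs are compared case-insensitively.
        doi = fields.get("doi", "").strip().lower()
        if doi:
            by_doi[doi].append(record_id)
    # Only DOIs seen on more than one record indicate a potential merge.
    return [ids for ids in by_doi.values() if len(ids) > 1]


if __name__ == "__main__":
    example = {
        "R1": {"doi": "10.1000/ABC"},
        "R2": {"doi": "10.1000/abc "},
        "R3": {"doi": "10.1000/xyz"},
        "R4": {},
    }
    print(id_sets_from_doi(example))
```

Records without a DOI are skipped rather than grouped together, since a missing identifier says nothing about whether two records are duplicates.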
- notify(*, state_transition=True)
 Notify the review_manager about the next operation
- Return type:
 None