colrev.env.tei_parser.TEIParser

class TEIParser(*, environment_manager, pdf_path=None, tei_path=None)

Bases: object

Environment service for TEI parsing

Creates or reads a TEI file. Modes of operation:

- pdf_path: create TEI and temporarily store it in self.data
- pdf_path and tei_path: create TEI and save it in tei_path
- tei_path: read TEI from file
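The three modes above depend only on which of the two path arguments is supplied. The following is a minimal sketch of that decision, not the colrev implementation: `tei_mode` is a hypothetical helper, and raising `ValueError` when neither path is given is an assumption, since this page does not say how that case is handled.

```python
from pathlib import Path
from typing import Optional


def tei_mode(pdf_path: Optional[Path] = None, tei_path: Optional[Path] = None) -> str:
    """Hypothetical helper: name the mode implied by the two path arguments."""
    if pdf_path is not None and tei_path is not None:
        return "create TEI from the PDF and save it in tei_path"
    if pdf_path is not None:
        return "create TEI from the PDF and keep it in memory"
    if tei_path is not None:
        return "read TEI from the file at tei_path"
    # Assumption: the behaviour without any path is not documented here.
    raise ValueError("provide pdf_path, tei_path, or both")


print(tei_mode(pdf_path=Path("paper.pdf")))
print(tei_mode(pdf_path=Path("paper.pdf"), tei_path=Path("paper.tei.xml")))
print(tei_mode(tei_path=Path("paper.tei.xml")))
```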

Methods

get_abstract

Get the abstract

get_author_details

Get the author details

get_citations_per_section

Get a dict of section-names and list-of-citations

get_grobid_version

Get the GROBID version used for TEI creation

get_metadata

Get the metadata of the PDF (title, author, ...) as a dict

get_paper_keywords

Get the keywords

get_references

Get the bibliography (references section) as a list of record dicts

get_tei_str

Get the TEI string

mark_references

Mark references with the additional record ID

Attributes

ns

nsmap

get_abstract()

Get the abstract

Return type:

str
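To illustrate what reading an abstract from a TEI document involves, here is a minimal sketch using only the standard library. It is not the colrev implementation: `extract_abstract`, `TEI_NS`, and the sample document are assumptions made for illustration. The namespace URI is the standard TEI namespace.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; the prefix "tei" is an arbitrary local choice.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written sample in the style of a TEI header (illustrative only).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <abstract><p>We study TEI parsing.</p></abstract>
    </profileDesc>
  </teiHeader>
</TEI>"""


def extract_abstract(tei_str: str) -> str:
    """Hypothetical helper: return the abstract text, or '' if there is none."""
    root = ET.fromstring(tei_str)
    abstract = root.find(".//tei:abstract", TEI_NS)
    if abstract is None:
        return ""
    # Join all nested text and collapse runs of whitespace.
    return " ".join("".join(abstract.itertext()).split())


print(extract_abstract(SAMPLE_TEI))
```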

get_author_details()

Get the author details

Return type:

list

get_citations_per_section()

Get a dict of section-names and list-of-citations

Return type:

dict
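One way to picture the returned mapping is a dict keyed by section heading, with the bibliographic references cited in that section as values. The sketch below builds such a dict from a small hand-written TEI body. It is an assumption-laden illustration rather than the colrev code: `citations_per_section` is a hypothetical helper, and the exact keys and value format of the real method are not specified on this page.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # standard TEI namespace in Clark notation

SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div><head>Introduction</head><p>Prior work <ref type="bibr" target="#b0">[1]</ref>
and <ref type="bibr" target="#b1">[2]</ref>.</p></div>
<div><head>Method</head><p>We follow <ref type="bibr" target="#b0">[1]</ref>;
see <ref type="figure" target="#fig_0">Fig. 1</ref>.</p></div>
</body></text></TEI>"""


def citations_per_section(tei_str: str) -> dict:
    """Hypothetical helper: map each section heading to its cited reference IDs."""
    root = ET.fromstring(tei_str)
    result = {}
    for div in root.iter(f"{TEI}div"):
        head = div.find(f"{TEI}head")
        if head is None or not head.text:
            continue  # skip sections without a heading
        # Keep only bibliographic references, not figure or table pointers.
        result[head.text.strip()] = [
            ref.get("target", "").lstrip("#")
            for ref in div.iter(f"{TEI}ref")
            if ref.get("type") == "bibr"
        ]
    return result


print(citations_per_section(SAMPLE_TEI))
```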

get_grobid_version()

Get the GROBID version used for TEI creation

Return type:

str
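GROBID-generated TEI typically records the tool version in an `application` element inside the header's `encodingDesc`. As a rough sketch of how that value could be read, assuming that layout (`grobid_version` and the sample document are hypothetical and not taken from colrev):

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # standard TEI namespace in Clark notation

# Hand-written sample; the version number is made up for the example.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <appInfo>
        <application version="0.7.3" ident="GROBID">
          <desc>GROBID</desc>
        </application>
      </appInfo>
    </encodingDesc>
  </teiHeader>
</TEI>"""


def grobid_version(tei_str: str) -> str:
    """Hypothetical helper: return the recorded GROBID version, or '' if absent."""
    root = ET.fromstring(tei_str)
    for app in root.iter(f"{TEI}application"):
        if app.get("ident") == "GROBID":
            return app.get("version", "")
    return ""


print(grobid_version(SAMPLE_TEI))
```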

get_metadata()

Get the metadata of the PDF (title, author, …) as a dict

Return type:

dict

get_paper_keywords()

Get the keywords

Return type:

list

get_references(*, add_intext_citation_count=False)

Get the bibliography (references section) as a list of record dicts

Return type:

list
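A list of record dicts can be pictured as one dict per `biblStruct` entry in the TEI bibliography. The following sketch shows the general idea with the standard library only. The helper `extract_references`, the dict keys (`ID`, `title`, `year`), and the sample entry are assumptions for illustration; the fields colrev actually returns are not listed on this page.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # standard TEI namespace in Clark notation
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # the xml:id attribute

# Hand-written sample bibliography with a single made-up entry.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><back>
<listBibl>
  <biblStruct xml:id="b0">
    <analytic><title level="a" type="main">A study of parsing</title></analytic>
    <monogr>
      <title level="j">Journal of Examples</title>
      <imprint><date type="published" when="2020-05"/></imprint>
    </monogr>
  </biblStruct>
</listBibl>
</back></text></TEI>"""


def extract_references(tei_str: str) -> list:
    """Hypothetical helper: one small dict per bibliography entry."""
    root = ET.fromstring(tei_str)
    records = []
    # Only entries inside listBibl, so header metadata is not picked up.
    for bibl in root.iterfind(f".//{TEI}listBibl/{TEI}biblStruct"):
        title = bibl.find(f".//{TEI}title")  # first title in document order
        date = bibl.find(f".//{TEI}date")
        records.append(
            {
                "ID": bibl.get(XML_ID, ""),
                "title": (title.text or "").strip() if title is not None else "",
                "year": date.get("when", "")[:4] if date is not None else "",
            }
        )
    return records


print(extract_references(SAMPLE_TEI))
```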

get_tei_str()

Get the TEI string

Return type:

str

mark_references(*, records)

Mark references with the additional record ID