Developing a parser
===================

.. image:: documentation.png
   :width: 800px

Development setup
-----------------

.. code-block:: shell
   :caption: Installation in editable mode with ``dev`` extras

   pip install -e ".[dev]"

Skeleton
--------

A code skeleton is available for the parser and its tests:

.. literalinclude:: parser_skeleton.py
   :language: python

.. literalinclude:: parser_skeleton_tests.py
   :language: python

To parse a search in list format, replace the references to numbered sub-queries with the corresponding sub-queries to create a single search string, which can then be parsed by the standard string parser. This avoids implementing the parsing logic twice.

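
The replacement step can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes that the sub-queries are available as a mapping from their number to their text and that references are written as ``#1``, ``#2``, and so on. The names ``REFERENCE`` and ``list_to_string`` are hypothetical. Each referenced sub-query is wrapped in parentheses so that operator precedence is preserved.

.. code-block:: python

   import re
   from typing import Dict

   # Hypothetical reference syntax: sub-queries are referenced as "#<number>"
   REFERENCE = re.compile(r"#(\d+)")


   def list_to_string(sub_queries: Dict[int, str], final: int) -> str:
       """Expand sub-query ``final`` into a single search string."""

       def expand(number: int, seen: frozenset) -> str:
           if number in seen:
               raise ValueError(f"Circular reference to #{number}")
           if number not in sub_queries:
               raise ValueError(f"Undefined reference #{number}")

           def replace(match) -> str:
               # Parenthesize the referenced sub-query to keep operator precedence
               return "(" + expand(int(match.group(1)), seen | {number}) + ")"

           return REFERENCE.sub(replace, sub_queries[number])

       return expand(final, frozenset())

For example, ``list_to_string({1: "digital[ti]", 2: "literacy[ti]", 3: "#1 AND #2"}, 3)`` returns ``"(digital[ti]) AND (literacy[ti])"``, which can be passed to the string parser unchanged.
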
Tokenization
------------

`Regex `_

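
Tokenization splits a search string into typed tokens, for instance with a single regular expression built from named groups. The following is a minimal sketch, assuming a simplified syntax with parentheses, the operators ``AND``, ``OR``, and ``NOT``, bracketed fields such as ``[tiab]``, quoted phrases, and bare terms. The token names and ``tokenize`` are hypothetical; a real parser will need platform-specific patterns. Characters that no pattern matches raise an error instead of being skipped silently.

.. code-block:: python

   import re
   from typing import List, Tuple

   # Hypothetical token types for a simplified search syntax; order matters,
   # because the first alternative that matches at a position wins.
   TOKEN_SPEC = [
       ("PARENTHESIS", r"[()]"),
       ("LOGIC_OPERATOR", r"\b(?:AND|OR|NOT)\b"),
       ("FIELD", r"\[[^\]]+\]"),
       ("TERM", r'"[^"]*"|[^\s()\[\]"]+'),
       ("WHITESPACE", r"\s+"),
   ]
   TOKEN_REGEX = re.compile(
       "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
   )


   def tokenize(search_string: str) -> List[Tuple[str, str, Tuple[int, int]]]:
       """Return (token type, value, (start, end)) tuples, skipping whitespace."""
       tokens = []
       position = 0
       for match in TOKEN_REGEX.finditer(search_string):
           if match.start() != position:
               # A gap means that no pattern matched the character at `position`
               raise ValueError(
                   f"Unexpected character {search_string[position]!r} at {position}"
               )
           position = match.end()
           if match.lastgroup != "WHITESPACE":
               tokens.append((match.lastgroup, match.group(), match.span()))
       if position != len(search_string):
           raise ValueError(
               f"Unexpected character {search_string[position]!r} at {position}"
           )
       return tokens

Keeping the span of each token makes it possible to report errors against positions in the original search string.
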
Translate search fields: Mapping Fields to Standard-Fields
----------------------------------------------------------

The search fields supported by a database (Platform-Fields, referred to as DB-Fields below) do not necessarily match the standard fields (Standard-Fields) defined in ``constants.Fields`` exactly.
We distinguish the following cases:

**1:1 matches**

If a DB-Field corresponds exactly to one Standard-Field, the pair is added to ``constants.SYNTAX_FIELD_MAP``.

**1:n matches**

If a DB-Field combines multiple Standard-Fields, it is added to ``constants.SYNTAX_COMBINED_FIELDS_MAP``. For example, PubMed offers a search for ``[tiab]``, which combines ``Fields.TITLE`` and ``Fields.ABSTRACT``.

When parsing a combined DB-Field, the standard syntax should consist of n nodes, each with the same search term and one atomic Standard-Field. For example, ``Literacy[tiab]`` should become ``(Literacy[ti] OR Literacy[ab])``. When serializing a database string, it is recommended to combine Standard-Fields into DB-Fields whenever possible.

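
Both directions can be sketched on plain strings. This assumes a simplified representation in which the field is written in brackets directly after the term; ``COMBINED_FIELDS``, ``expand_combined_field``, and ``combine_fields`` are hypothetical names, and the actual ``constants.SYNTAX_COMBINED_FIELDS_MAP`` and query nodes may be structured differently.

.. code-block:: python

   from typing import Dict, List

   # Hypothetical excerpt: combined DB-Field -> atomic fields in standard syntax
   COMBINED_FIELDS: Dict[str, List[str]] = {"[tiab]": ["[ti]", "[ab]"]}


   def expand_combined_field(term: str, field: str) -> str:
       """Parsing: one OR-node per atomic field, each with the same search term."""
       atomic_fields = COMBINED_FIELDS.get(field)
       if atomic_fields is None:
           return f"{term}{field}"
       return "(" + " OR ".join(f"{term}{atomic}" for atomic in atomic_fields) + ")"


   def combine_fields(term: str, fields: List[str]) -> str:
       """Serializing: merge atomic fields into a combined DB-Field where possible."""
       for combined, atomic_fields in COMBINED_FIELDS.items():
           if sorted(fields) == sorted(atomic_fields):
               return f"{term}{combined}"
       if len(fields) == 1:
           return f"{term}{fields[0]}"
       return "(" + " OR ".join(f"{term}{field}" for field in fields) + ")"

Here ``expand_combined_field("Literacy", "[tiab]")`` yields ``(Literacy[ti] OR Literacy[ab])``, and ``combine_fields("Literacy", ["[ti]", "[ab]"])`` turns it back into ``Literacy[tiab]``. Comparing sorted lists makes the merge independent of the order of the OR-nodes.
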
**n:1 matches**

If multiple DB-Fields correspond to the same Standard-Field, only the pair of the default DB-Field and the Standard-Field is added to ``constants.SYNTAX_FIELD_MAP``. The parser replaces non-default DB-Fields with the default. For example, the default for MeSH terms in PubMed is ``[mh]``, but the parser also supports ``[mesh]``.

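
The replacement of non-default DB-Fields can be sketched as a lookup that runs before the field mapping. The sketch assumes a hypothetical alias table ``FIELD_ALIASES`` that maps each non-default DB-Field to its default, fields written in square brackets, and case-insensitive field names; ``normalize_fields`` is likewise hypothetical.

.. code-block:: python

   import re
   from typing import Dict

   # Hypothetical alias table: non-default DB-Field -> default DB-Field
   FIELD_ALIASES: Dict[str, str] = {"[mesh]": "[mh]"}

   FIELD_PATTERN = re.compile(r"\[[^\]]+\]")


   def normalize_fields(search_string: str) -> str:
       """Replace non-default DB-Fields with their default (case-insensitive)."""

       def replace(match) -> str:
           field = match.group(0)
           # Unknown fields are left unchanged
           return FIELD_ALIASES.get(field.lower(), field)

       return FIELD_PATTERN.sub(replace, search_string)

Applying the same lookup to field tokens instead of the raw string avoids touching brackets inside quoted phrases.
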
Search Field Validation in Strict vs. Non-Strict Modes
------------------------------------------------------

.. list-table:: Search Field Validation in Strict vs. Non-Strict Modes
   :widths: 20 20 20 20 20
   :header-rows: 1

   * - **Search-Field required**
     - **Search String**
     - **Search-Field**
     - **Mode: Strict**
     - **Mode: Non-Strict**
   * - Yes
     - With Search-Field
     - Empty
     - ok
     - ok
   * - Yes
     - With Search-Field
     - Equal to Search-String
     - ok - search-field-redundant
     - ok
   * - Yes
     - With Search-Field
     - Different from Search-String
     - error: search-field-contradiction
     - ok - search-field-contradiction. By default, the parser uses the field from the Search-String
   * - Yes
     - Without Search-Field
     - Empty
     - error: search-field-missing
     - ok - search-field-missing. The parser adds ``title`` as the default
   * - Yes
     - Without Search-Field
     - Given
     - ok - search-field-extracted
     - ok
   * - No
     - With Search-Field
     - Empty
     - ok
     - ok
   * - No
     - With Search-Field
     - Equal to Search-String
     - ok - search-field-redundant
     - ok
   * - No
     - With Search-Field
     - Different from Search-String
     - error: search-field-contradiction
     - ok - search-field-contradiction. By default, the parser uses the field from the Search-String
   * - No
     - Without Search-Field
     - Empty
     - ok - search-field-not-specified
     - ok - the parser uses the database default
   * - No
     - Without Search-Field
     - Given
     - ok - search-field-extracted
     - ok

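
The rules in the table can be expressed as a small decision function, for example to drive tests. This is a sketch with hypothetical names (``validate_search_field`` and its parameters). It returns the outcome and the message code listed in the table and does not determine which field the parser finally applies.

.. code-block:: python

   from typing import Optional, Tuple


   def validate_search_field(
       field_required: bool,
       string_field: Optional[str],
       general_field: Optional[str],
       strict: bool,
   ) -> Tuple[str, Optional[str]]:
       """Return ("ok" or "error", message code or None) following the table."""
       if string_field is not None:
           # The search string contains a search field
           if not general_field:
               return "ok", None
           if general_field == string_field:
               return ("ok", "search-field-redundant") if strict else ("ok", None)
           # Contradiction: an error in strict mode, a message in non-strict mode
           return ("error" if strict else "ok", "search-field-contradiction")
       # The search string contains no search field
       if general_field:
           return ("ok", "search-field-extracted") if strict else ("ok", None)
       if field_required:
           return ("error" if strict else "ok", "search-field-missing")
       return ("ok", "search-field-not-specified") if strict else ("ok", None)

Only the case of an empty Search-Field combined with a search string without fields depends on whether a Search-Field is required; all other rows are identical for both values.
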
Tests
-----

- All test data should be stored in standard JSON.

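
As an illustration of JSON test data, the sketch below loads a list of test cases from a file and checks their keys. The file layout (a JSON list of objects with ``search_string`` and ``parsed`` keys) and the names ``REQUIRED_KEYS`` and ``load_test_cases`` are assumptions, not the project's actual format.

.. code-block:: python

   import json
   from pathlib import Path
   from typing import Any, Dict, List

   # Hypothetical keys that every test case must provide
   REQUIRED_KEYS = {"search_string", "parsed"}


   def load_test_cases(path: Path) -> List[Dict[str, Any]]:
       """Load test cases from a JSON file and check that each has the required keys."""
       with path.open(encoding="utf-8") as file:
           cases = json.load(file)
       if not isinstance(cases, list):
           raise ValueError(f"{path} must contain a JSON list of test cases")
       for index, case in enumerate(cases):
           if not isinstance(case, dict):
               raise ValueError(f"Test case {index} in {path} is not a JSON object")
           missing = REQUIRED_KEYS - set(case)
           if missing:
               raise ValueError(f"Test case {index} in {path} lacks {sorted(missing)}")
       return cases

With pytest, the returned list can be passed as ``argvalues`` to ``pytest.mark.parametrize`` so that each JSON entry runs as a separate test.
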
Resources
---------

- `Web of Science Errors `_