Search-Query Documentation
Search-query is a Python package for translating academic literature search queries (i.e., parsing and serializing them), as well as for validating, simplifying, and improving them. It implements various syntax validation checks (also known as linters) and prints instructive messages to inform users about potential issues. These checks help prevent errors, which matters because previous studies have found high error rates in search queries (Li & Rainer, 2023: 50%; Salvador-Oliván et al., 2019: 90%; Sampson & McGowan, 2006: 80%).
We currently support PubMed, EBSCOHost, and Web of Science, and plan to extend search-query to other databases. The package can be used programmatically or through the command line, has zero dependencies, and can therefore be integrated in a variety of environments. The parsers and linters are battle-tested on over 500 (TO UPDATE) peer-reviewed queries registered at searchRxiv.
A Jupyter Notebook demo (hosted on Binder) is available here:
Installation
To install search-query, run:
pip install search-query
Quickstart
Creating a query programmatically is simple:
from search_query import OrQuery, AndQuery
# Typical building-blocks approach
digital_synonyms = OrQuery(["digital", "virtual", "online"], search_field="Abstract")
work_synonyms = OrQuery(["work", "labor", "service"], search_field="Abstract")
query = AndQuery([digital_synonyms, work_synonyms], search_field="Author Keywords")
We can also parse a query from a string or a JSON search file (see the overview of platform identifiers (syntax)):
from search_query.parser import parse
query_string = '("digital health"[Title/Abstract]) AND ("privacy"[Title/Abstract])'
query = parse(query_string, syntax="pubmed")
Once we have created a query object, we can translate it for different databases. Note how the syntax is translated and how the search for Title/Abstract is split into two elements:
query.to_string(syntax="ebsco")
# Output:
# (TI("digital health") OR AB("digital health")) AND (TI("privacy") OR AB("privacy"))
query.to_string(syntax="wos")
# Output:
# (TI=("digital health") OR AB=("digital health")) AND (TI=("privacy") OR AB=("privacy"))
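The way a Title/Abstract search is split into a title part and an abstract part can be pictured with a small stand-alone sketch. It does not use search-query and is not how the package implements translation; the function names `expand_title_abstract` and `and_join` are hypothetical and only reproduce the EBSCOHost string shown above.

```python
# Illustrative only: reproduces the EBSCOHost output above without search-query.
# Both function names are hypothetical and not part of the package.

def expand_title_abstract(term: str) -> str:
    """Render one Title/Abstract term as an OR of a title and an abstract search."""
    quoted = f'"{term}"'
    return f"(TI({quoted}) OR AB({quoted}))"


def and_join(terms: list[str]) -> str:
    """Expand every term and combine the results with AND."""
    return " AND ".join(expand_title_abstract(term) for term in terms)


result = and_join(["digital health", "privacy"])
print(result)
# (TI("digital health") OR AB("digital health")) AND (TI("privacy") OR AB("privacy"))
```

A real translator also has to handle nesting, other search fields, and platform-specific quirks, which this sketch leaves out.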
Another useful feature of search-query is its validation (linter) functionality, which helps us to identify syntactical errors:
from search_query.parser import parse
query_string = '("digital health"[Title/Abstract]) AND ("privacy"[Title/Abstract]'
query = parse(query_string, syntax="pubmed")
# Output:
# Fatal: unbalanced-parentheses (F0002) at position 66:
# ("digital health"[Title/Abstract]) AND ("privacy"[Title/Abstract]
# ^^
Beyond the instructive error message, additional information on the specific messages is available here.
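To give a feel for what an unbalanced-parentheses check involves, here is a minimal stand-alone sketch based on a stack of open parentheses. It is an assumption about the general technique, not the linter shipped with search-query; the function name `find_unbalanced_parentheses` is hypothetical, and the 0-based positions it reports need not match the positions printed in the package's messages.

```python
# Illustrative only: a generic stack-based check, not the package's linter.

def find_unbalanced_parentheses(query: str) -> list[int]:
    """Return the 0-based positions of parentheses that have no partner."""
    open_positions = []  # stack of "(" positions still waiting for a ")"
    unmatched = []       # positions of ")" that had no preceding "("
    in_quotes = False    # parentheses inside quoted phrases are ignored
    for pos, char in enumerate(query):
        if char == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue
        elif char == "(":
            open_positions.append(pos)
        elif char == ")":
            if open_positions:
                open_positions.pop()
            else:
                unmatched.append(pos)
    # Whatever is left on the stack was never closed.
    return sorted(unmatched + open_positions)


query_string = '("digital health"[Title/Abstract]) AND ("privacy"[Title/Abstract]'
positions = find_unbalanced_parentheses(query_string)
print(positions)  # [39]: the second opening parenthesis is never closed
```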
JSON search files
Search-query can parse queries from strings and JSON files in the standard format (Haddaway et al. 2022). Example:
{
    "record_info": {},
    "authors": [{"name": "Wagner, G.", "ORCID": "0000-0000-0000-1111"}],
    "date": {"data_entry": "2019.07.01", "search_conducted": "2019.07.01"},
    "platform": "Web of Science",
    "database": ["SCI-EXPANDED", "SSCI", "A&HCI"],
    "search_string": "TS=(quantum AND dot AND spin)"
}
To load a JSON query file, run the parser:
from search_query.search_file import SearchFile
from search_query.parser import parse
search = SearchFile("search-file.json")
query = parse(search.search_string, syntax=search.platform)
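Because the search file is plain JSON, it can also be inspected with the standard library alone, for example before handing it to the parser. The sketch below is not part of search-query; the helper `load_search_record` and the `NEEDED_FIELDS` tuple are hypothetical and only check the two fields that the `parse` call above reads (`platform` and `search_string`).

```python
import json
import tempfile
from pathlib import Path

# Fields that the parse() call above relies on; this is an assumption for the
# sketch, not a statement of which fields the standard format requires.
NEEDED_FIELDS = ("platform", "search_string")


def load_search_record(path: Path) -> dict:
    """Read a JSON search file and check that the needed fields are present."""
    record = json.loads(path.read_text(encoding="utf-8"))
    missing = [name for name in NEEDED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"{path.name} is missing: {', '.join(missing)}")
    return record


# Demonstration: write the example record to a temporary file and read it back.
example = {
    "record_info": {},
    "authors": [{"name": "Wagner, G.", "ORCID": "0000-0000-0000-1111"}],
    "date": {"data_entry": "2019.07.01", "search_conducted": "2019.07.01"},
    "platform": "Web of Science",
    "database": ["SCI-EXPANDED", "SSCI", "A&HCI"],
    "search_string": "TS=(quantum AND dot AND spin)",
}
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = Path(tmp_dir) / "search-file.json"
    file_path.write_text(json.dumps(example, indent=4), encoding="utf-8")
    record = load_search_record(file_path)

print(record["platform"], "|", record["search_string"])
```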
To write a query to a JSON file, run the serializer:
from search_query import save_file
save_file(
    filename="search-file.json",
    query_str=query.to_string(syntax="wos"),
    syntax="wos",
    authors=[{"name": "Tom Brady"}],
    record_info={},
    date={},
)
CLI Use
Linters can be run on the CLI:
search-file-lint search-file.json
Pre-commit Hooks
Linters can be included as pre-commit hooks by adding the following to the .pre-commit-config.yaml:
repos:
  - repo: https://github.com/CoLRev-Environment/search-query
    rev: main  # or version of search-query
    hooks:
      - id: search-file-lint
For development and testing, use the following:
repos:
  - repo: local
    hooks:
      - id: search-file-lint
        name: Search-file linter
        entry: search-file-lint
        language: python
        files: \.json$
To activate and run:
pre-commit install
pre-commit run --all-files
Parser development
To develop a parser, see dev-parser docs.
References
Haddaway, N. R., Rethlefsen, M. L., Davies, M., Glanville, J., McGowan, B., Nyhan, K., & Young, S. (2022). A suggested data structure for transparent and repeatable reporting of bibliographic searching. Campbell Systematic Reviews, 18(4), e1288. doi: 10.1002/cl2.1288
Li, Z., & Rainer, A. (2023). Reproducible Searches in Systematic Reviews: An Evaluation and Guidelines. IEEE Access, 11, 84048–84060. doi: 10.1109/ACCESS.2023.3299211
Salvador-Oliván, J. A., Marco-Cuenca, G., & Arquero-Avilés, R. (2019). Errors in search strategies used in systematic reviews and their effects on information retrieval. Journal of the Medical Library Association: JMLA, 107(2), 210. doi: 10.5195/jmla.2019.567
Sampson, M., & McGowan, J. (2006). Errors in search strategies were identified by type and frequency. Journal of Clinical Epidemiology, 59(10), 1057–1063. doi: 10.1016/j.jclinepi.2006.01.007