Linter

Linters validate query strings or query lists before execution. They analyze token sequences, syntax, search fields, and operator usage to detect errors or ambiguities, and they report informative messages (documented in the messages section). Each platform implements its own linter, which inherits from a base class defined in linter_base.py. Linters are invoked from the parser methods.
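As an illustration of what a useful message carries, here is a minimal, self-contained sketch in which each finding records an error code, a character span in the query string, optional details, and a fatal flag. ErrorCode, LintMessage, and MessageCollector are hypothetical stand-ins invented for this example, not the library's classes; the real codes are defined in constants.QueryErrorCode.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class ErrorCode(Enum):
    """Hypothetical error codes (the real ones live in constants.QueryErrorCode)."""

    UNBALANCED_PARENTHESES = ("E001", "Unbalanced parentheses")
    INVALID_TOKEN_SEQUENCE = ("E002", "Invalid token sequence")

    def __init__(self, code: str, label: str) -> None:
        self.code = code
        self.label = label


@dataclass
class LintMessage:
    error: ErrorCode
    position: Tuple[int, int]  # (start, end) character offsets in the query string
    details: str = ""
    fatal: bool = False  # in this sketch, fatal marks errors that should block execution

    def format(self, query_str: str) -> str:
        start, end = self.position
        text = f"{self.error.code} {self.error.label}: '{query_str[start:end]}' {self.details}"
        return text.rstrip()


@dataclass
class MessageCollector:
    messages: List[LintMessage] = field(default_factory=list)

    def add_message(
        self,
        error: ErrorCode,
        position: Tuple[int, int],
        details: str = "",
        fatal: bool = False,
    ) -> None:
        self.messages.append(LintMessage(error, position, details, fatal))

    def has_fatal(self) -> bool:
        return any(m.fatal for m in self.messages)


collector = MessageCollector()
collector.add_message(
    ErrorCode.UNBALANCED_PARENTHESES, (0, 1), details="Missing ')'", fatal=True
)
print(collector.messages[0].format("(heart AND attack"))
# E001 Unbalanced parentheses: '(' Missing ')'
```

Keeping the span rather than a pre-rendered string lets the same message be shown next to the offending part of the query.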

Base Classes

Use the appropriate base class when developing a new linter:

QueryStringLinter: for single query strings
QueryListLinter: for list-based query formats

Each linter must override two methods: validate_tokens() and validate_query_tree(). validate_tokens() is called while the query string is parsed, and validate_query_tree() is called once the query tree exists, i.e., at the end of the parsing process and when a query is constructed programmatically.
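The reason for two hooks can be shown with a small, self-contained sketch. LinterSketch, ToyLinter, Token, and Node are hypothetical simplifications, not the classes in linter_base.py; the point is that a tree constructed in code never passes through validate_tokens(), so any check that must also apply to programmatic queries belongs in validate_query_tree().

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Token:
    value: str
    type: str
    position: Tuple[int, int]


@dataclass
class Node:
    """Hypothetical query-tree node: an operator with children, or a term leaf."""

    value: str
    operator: bool = False
    children: List["Node"] = field(default_factory=list)


class LinterSketch(ABC):
    def __init__(self) -> None:
        self.messages: List[str] = []

    @abstractmethod
    def validate_tokens(self, *, tokens: List[Token], query_str: str) -> List[Token]:
        """Phase 1: runs on the flat token list while a query string is parsed."""

    @abstractmethod
    def validate_query_tree(self, query: Node) -> None:
        """Phase 2: runs on the finished tree, whether parsed or built in code."""


class ToyLinter(LinterSketch):
    def validate_tokens(self, *, tokens: List[Token], query_str: str) -> List[Token]:
        for token in tokens:
            # Token-level check: operators should be written in upper case
            if token.type == "LOGIC_OPERATOR" and token.value != token.value.upper():
                self.messages.append(f"operator not capitalized: {token.value}")
        return tokens

    def validate_query_tree(self, query: Node) -> None:
        # Tree-level check: every operator needs at least two operands
        if query.operator and len(query.children) < 2:
            self.messages.append(f"{query.value} needs at least two operands")
        for child in query.children:
            self.validate_query_tree(child)


linter = ToyLinter()
tokens = [
    Token("a", "TERM", (0, 1)),
    Token("and", "LOGIC_OPERATOR", (2, 5)),
    Token("b", "TERM", (6, 7)),
]
linter.validate_tokens(tokens=tokens, query_str="a and b")
# A tree built in code skips phase 1, so only validate_query_tree() can catch this
linter.validate_query_tree(Node("OR", operator=True, children=[Node("x")]))
print(linter.messages)
# ['operator not capitalized: and', 'OR needs at least two operands']
```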

Best Practices

Use the standardized linter messages defined in constants.QueryErrorCode.
Add details to messages to guide users (e.g., an invalid format or missing logic).
Check token sequences against the VALID_TOKEN_SEQUENCES dictionary.
Consider using the utility methods provided by linter_base.py, for example:
  - check_unbalanced_parentheses()
  - check_unknown_token_types()
  - add_artificial_parentheses_for_operator_precedence()
  - check_invalid_characters_in_term(chars)
  - check_operator_capitalization()
For search field validation, use a corresponding field mapping and helper functions like map_to_standard().
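The token-sequence and parenthesis checks recommended above reduce to two simple passes over the token list. The sketch below is standalone and assumes a simplified token model (TT, Token, tok, VALID_NEXT, invalid_sequences, and unbalanced_parentheses are hypothetical names); the allowed-successor table only mimics the shape of VALID_TOKEN_SEQUENCES and is not any platform's actual grammar.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List, Set, Tuple


class TT(Enum):
    FIELD = auto()
    TERM = auto()
    LOGIC_OPERATOR = auto()
    PARENTHESIS_OPEN = auto()
    PARENTHESIS_CLOSED = auto()


@dataclass
class Token:
    value: str
    type: TT
    position: Tuple[int, int]


def tok(value: str, type_: TT, start: int) -> Token:
    """Build a token whose span starts at `start` in the query string."""
    return Token(value, type_, (start, start + len(value)))


# Illustrative table of token types allowed to follow each token type
VALID_NEXT: Dict[TT, Set[TT]] = {
    TT.FIELD: {TT.TERM, TT.PARENTHESIS_OPEN},
    TT.TERM: {TT.LOGIC_OPERATOR, TT.PARENTHESIS_CLOSED},
    TT.LOGIC_OPERATOR: {TT.TERM, TT.FIELD, TT.PARENTHESIS_OPEN},
    TT.PARENTHESIS_OPEN: {TT.TERM, TT.FIELD, TT.PARENTHESIS_OPEN},
    TT.PARENTHESIS_CLOSED: {TT.LOGIC_OPERATOR, TT.PARENTHESIS_CLOSED},
}


def invalid_sequences(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Return the positions of tokens that may not follow their predecessor."""
    problems = []
    for prev, nxt in zip(tokens, tokens[1:]):
        if nxt.type not in VALID_NEXT.get(prev.type, set()):
            problems.append(nxt.position)
    return problems


def unbalanced_parentheses(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Return positions of unmatched ')' and of '(' still open at the end."""
    stack: List[Token] = []
    problems = []
    for token in tokens:
        if token.type is TT.PARENTHESIS_OPEN:
            stack.append(token)
        elif token.type is TT.PARENTHESIS_CLOSED:
            if stack:
                stack.pop()
            else:
                problems.append(token.position)
    problems.extend(t.position for t in stack)
    return problems


# "a AND AND b": the second AND may not follow an operator
tokens = [tok("a", TT.TERM, 0), tok("AND", TT.LOGIC_OPERATOR, 2),
          tok("AND", TT.LOGIC_OPERATOR, 6), tok("b", TT.TERM, 10)]
print(invalid_sequences(tokens))  # [(6, 9)]

# "(a AND b": the opening parenthesis is never closed
tokens = [tok("(", TT.PARENTHESIS_OPEN, 0), tok("a", TT.TERM, 1),
          tok("AND", TT.LOGIC_OPERATOR, 3), tok("b", TT.TERM, 7)]
print(unbalanced_parentheses(tokens))  # [(0, 1)]
```

Like the pairwise loop in the skeleton, this sketch does not inspect the query boundaries, so a query that starts or ends with an operator would need a separate check.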

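The following skeleton outlines a linter for a placeholder platform "XY":

from __future__ import annotations  # annotations below are only resolved by type checkers

import typing

from search_query.constants import QueryErrorCode
from search_query.constants import TokenTypes
from search_query.linter_base import QueryStringLinter

if typing.TYPE_CHECKING:
    from search_query.constants import Token
    from search_query.query import Query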


class XYQueryStringLinter(QueryStringLinter):
    """Linter for XY query strings"""

    VALID_TOKEN_SEQUENCES = {
        TokenTypes.FIELD: [TokenTypes.TERM],
        TokenTypes.TERM: [
            TokenTypes.LOGIC_OPERATOR,
            TokenTypes.PARENTHESIS_CLOSED,
        ],
        TokenTypes.LOGIC_OPERATOR: [
            TokenTypes.TERM,
            TokenTypes.PARENTHESIS_OPEN,
        ],
        # ...
    }

    def validate_tokens(
        self,
        *,
        tokens: typing.List[Token],
        query_str: str,
        field_general: str = "",
    ) -> typing.List[Token]:
        """Main validation routine"""

        self.tokens = tokens
        self.query_str = query_str
        self.field_general = field_general

        self.check_unbalanced_parentheses()
        self.check_unknown_token_types()
        self.check_invalid_token_sequences()
        self.check_operator_capitalization()

        # custom validation

        return self.tokens

    def check_invalid_token_sequences(self) -> None:
        """Flag each token whose type may not follow the preceding token."""
        for i, token in enumerate(self.tokens[:-1]):
            # Token types allowed to follow the current token
            expected = self.VALID_TOKEN_SEQUENCES.get(token.type, [])
            next_token = self.tokens[i + 1]
            if next_token.type not in expected:
                self.add_message(
                    QueryErrorCode.INVALID_TOKEN_SEQUENCE,
                    position=next_token.position,
                    details=f"Unexpected token after {token.type}",
                    fatal=True,
                )

    def validate_query_tree(self, query: Query) -> None:
        """
        Validate the query tree.
        This method is called after the query tree has been built.
        """

        self.check_quoted_terms_query(query)
        self.check_operator_capitalization_query(query)
        self.check_invalid_characters_in_term_query(query, "@&%$^~\\<>{}()[]#")
        self.check_unsupported_fields_in_query(query)
        # term_field_query = self.get_query_with_fields_at_terms(query)
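The tree-level checks in validate_query_tree() can be sketched without the library: propagate each search field down to the terms it applies to (presumably what the commented-out get_query_with_fields_at_terms call is for), then inspect every term for invalid characters and unsupported fields. Node, FIELD_MAP, terms_with_fields, and check_terms are hypothetical, the map_to_standard function here is a stand-in rather than the library's helper, and the field syntax in FIELD_MAP is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical mapping from platform field syntax to standard field names
FIELD_MAP: Dict[str, str] = {
    "TI=": "title",
    "AB=": "abstract",
    "TS=": "title-abstract-keywords",
}
INVALID_CHARS = "@&%$^~\\<>{}()[]#"


@dataclass
class Node:
    """Hypothetical tree node: operators have children, terms are leaves."""

    value: str
    search_field: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def map_to_standard(syntax: str) -> Optional[str]:
    """Stand-in field mapping: returns None for unsupported fields."""
    return FIELD_MAP.get(syntax)


def terms_with_fields(
    node: Node, inherited: Optional[str] = None
) -> Iterator[Tuple[Node, Optional[str]]]:
    """Yield each leaf term with the field it effectively inherits."""
    current = node.search_field or inherited
    if not node.children:
        yield node, current
    for child in node.children:
        yield from terms_with_fields(child, current)


def check_terms(query: Node) -> List[str]:
    messages = []
    for term, term_field in terms_with_fields(query):
        bad = sorted({c for c in term.value if c in INVALID_CHARS})
        if bad:
            messages.append(f"invalid characters {''.join(bad)} in term '{term.value}'")
        if term_field is not None and map_to_standard(term_field) is None:
            messages.append(f"unsupported field {term_field} for term '{term.value}'")
    return messages


query = Node("AND", search_field="TS=", children=[
    Node("cancer"), Node("gene#1"), Node("risk", search_field="KW="),
])
for message in check_terms(query):
    print(message)
# invalid characters # in term 'gene#1'
# unsupported field KW= for term 'risk'
```

Resolving fields at the term level first means each term is checked against the field that actually applies to it, including fields set on a parent operator.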