Text Query Syntax (p-search User Manual)

4.5.5 Text Query Syntax

The query syntax for p-search differs from that of a typical grep search. The query string consists of terms separated by spaces, and each term is searched for individually, independently of the others.

For example, the query “New York City” contains three terms, and a separate search is performed for each one. To search for the whole string as a single unit, wrap it in quotation marks.

‘"term1 term2 ..."’

Search the terms as a whole, exactly as written.

‘#"term"’

Search term as a regular expression.
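The splitting rules described so far can be illustrated with a small Python sketch. This is not p-search's actual parser (which is written in Emacs Lisp); the function name `split_query` and the `(kind, text)` tuples are hypothetical, and only plain terms, quoted phrases, and ‘#"..."’ regular expressions are handled.

```python
import re

# Hypothetical tokenizer: a regexp term (#"..."), a quoted phrase ("..."),
# or a run of non-space characters.
TOKEN = re.compile(r'#"[^"]*"|"[^"]*"|\S+')

def split_query(query):
    """Return a list of (kind, text) pairs for each term in the query."""
    tokens = []
    for raw in TOKEN.findall(query):
        if raw.startswith('#"') and raw.endswith('"') and len(raw) >= 3:
            tokens.append(("regexp", raw[2:-1]))   # strip #" and "
        elif raw.startswith('"') and raw.endswith('"') and len(raw) >= 2:
            tokens.append(("phrase", raw[1:-1]))   # strip the quotes
        else:
            tokens.append(("term", raw))           # searched on its own
    return tokens

print(split_query('New York City'))
print(split_query('"New York City" #"piz+a"'))
```

In this sketch the unquoted query yields three independent terms, while the quoted form yields a single phrase.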

‘term^’, ‘term^3’

Boost the importance of term. A number can be provided to give a stronger boost.

Note that after the text query is run and scored, the score is normalized to be between 0.5 and 0.7. So, simply changing a text query from ‘foo’ to ‘foo^10’ won’t make the query outweigh the other priors. If you want to give more weight to the text query, use the importance setting instead.
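To see why a boost alone cannot make the text query outweigh other priors, consider the following Python sketch. The manual does not say how the normalization is performed; a linear min-max rescale into the range 0.5 to 0.7 is assumed here purely for illustration, and the name `normalize` and the file names are hypothetical.

```python
def normalize(scores, low=0.5, high=0.7):
    """Linearly rescale raw scores into [low, high].

    The min-max scheme is an assumption, not p-search's documented method.
    """
    smallest, largest = min(scores.values()), max(scores.values())
    if largest == smallest:
        # All documents scored the same: place them mid-range (assumption).
        return {doc: (low + high) / 2 for doc in scores}
    span = largest - smallest
    return {doc: low + (high - low) * (s - smallest) / span
            for doc, s in scores.items()}

raw = {"a.txt": 1.0, "b.txt": 3.0, "c.txt": 2.0}
boosted = {doc: s * 10 for doc, s in raw.items()}  # as if every hit were boosted

print(normalize(raw))
print(normalize(boosted))
```

Under this assumption both calls produce the same normalized scores, because multiplying every raw score by a constant does not change where each document falls within the range. A boost therefore only matters relative to other terms in the same query.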

‘(term1 term2 ...)~’

Search for ‘term1’ and ‘term2’ occurring near each other. To be considered near, the terms need to occur within the number of lines specified by p-search-default-near-line-length. Each term resets the line counter, so, for example, the query ‘(foo bar baz)~’ with a line-length setting of 3 will match the following example:

foo

bar

bar

baz
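The counter-reset behavior can be sketched in Python as follows. This is an illustration only, not p-search's implementation: the function name `near_match` is hypothetical, terms are assumed to match in any order, and the distance between two occurrences is assumed to be the difference of their line numbers.

```python
def near_match(lines, terms, line_length):
    """Return True if all terms occur in one chain of nearby lines.

    Every line containing any term resets the counter, so only the gap
    between consecutive hits is compared against line_length.
    """
    wanted = set(terms)
    seen = set()
    last_hit = None
    for lineno, line in enumerate(lines):
        hits = {t for t in wanted if t in line}
        if not hits:
            continue
        if last_hit is not None and lineno - last_hit > line_length:
            seen = set()          # gap too large: start a new chain
        seen |= hits
        last_hit = lineno         # any term resets the counter
        if seen == wanted:
            return True
    return False

text = ["foo", "", "bar", "", "bar", "", "baz"]
print(near_match(text, ["foo", "bar", "baz"], 3))
```

In the example text, ‘foo’ and ‘baz’ are six lines apart, yet the query still matches with a setting of 3 because the second ‘bar’ resets the counter before ‘baz’ is reached.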

‘fooBarBaz’, ‘foo-bar-baz’, ‘foo_bar_baz’

p-search automatically breaks compound terms and performs a number of searches related to them. While the particular rules may change, and customization options may be added, the following is what is currently done:

The query term is broken at non-word characters and at lowercase-to-uppercase transitions. For example, ‘fooBar-baz’ is broken up as ‘foo’, ‘bar’, and ‘baz’. The following queries are then formed:

The original query, case insensitive (e.g. ‘foobar-baz’).
The constituent terms joined without spacing (e.g. ‘foobarbaz’). This is given 0.7 times the weight of the original query.
The constituent terms joined with an underscore (e.g. ‘foo_bar_baz’). This is given 0.7 times the weight of the original query.
The constituent terms joined with a dash (e.g. ‘foo-bar-baz’). This is given 0.7 times the weight of the original query.
Each constituent term searched individually (e.g. ‘foo’, ‘bar’, ‘baz’). Each is given 0.3 times the weight of the original query.
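The expansion listed above can be sketched in Python. The function names `split_compound` and `expand_compound` are hypothetical, "non-word characters" is approximated as anything other than ASCII letters and digits, and the case-insensitive original query is represented by its lowercased form. Whether p-search merges duplicate queries is not stated, so this sketch leaves them in.

```python
import re

def split_compound(term):
    """Break a term at lowercase-to-uppercase transitions and non-word characters."""
    spaced = re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', term)   # fooBar -> foo Bar
    return [p.lower() for p in re.split(r'[^A-Za-z0-9]+', spaced) if p]

def expand_compound(term):
    """Return (query, relative weight) pairs following the rules listed above."""
    parts = split_compound(term)
    queries = [(term.lower(), 1.0)]                      # original, case insensitive
    if len(parts) > 1:
        queries += [(sep.join(parts), 0.7) for sep in ("", "_", "-")]
        queries += [(p, 0.3) for p in parts]             # each constituent term
    return queries

for query, weight in expand_compound("fooBar-baz"):
    print(query, weight)
```

For ‘fooBar-baz’ this yields ‘foobar-baz’ at full weight, the three joined variants at 0.7, and ‘foo’, ‘bar’, and ‘baz’ at 0.3 each.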

User Option: p-search-default-near-line-length

This user option controls the meaning of the “nearness” query operator ‘~’. It specifies the maximum number of lines that can occur between the constituent terms.

User Option: p-search-default-boost-amount

This user option controls the default boost amount when the boost operator (i.e. ‘^’) is used without an amount.
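The relationship between the boost operator and its default amount can be sketched in Python. The function name `parse_boost` is hypothetical, the amount is assumed to be a whole number as in ‘term^3’, and `default_boost` stands in for the value of p-search-default-boost-amount, whose actual default is not given here.

```python
import re

def parse_boost(term, default_boost):
    """Split 'term^N' into (term, boost); return (term, None) if not boosted.

    default_boost plays the role of p-search-default-boost-amount and is
    used when '^' appears without a number.
    """
    match = re.fullmatch(r'(.+)\^(\d*)', term)
    if match is None:
        return term, None                      # no boost operator present
    base, amount = match.groups()
    return base, int(amount) if amount else default_boost

# The value 2 below is a placeholder, not the package's real default.
print(parse_boost("foo", default_boost=2))
print(parse_boost("foo^", default_boost=2))
print(parse_boost("foo^3", default_boost=2))
```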