Text Queries (p-search User Manual)

4.5.4 Text Queries

Perhaps the most prominent prior type in p-search is the text query, and most p-search sessions will involve one. Text queries behave like any other prior: they return a score between 0 and 1 for every document. There is, however, a lot of machinery behind their implementation.

To create a text query prior, press P (p-search-add-prior) and then q. You will be prompted for a query string. The query string has a special syntax, which is covered in the next section. After entering the query string, you can configure the prior further. Once the prior is created, p-search starts the search processes and tallies the match counts; when every search process has completed, the final score is calculated.

p-search uses the BM-25F algorithm to score text searches. Without going into the details, BM-25F scores each document for each term based on two key components: term frequency and inverse document frequency. Term frequency measures how often a term occurs in a document, while inverse document frequency (IDF) measures how much information a term provides, based on the number of documents the term occurs in: the fewer documents contain the term, the higher its IDF.

As a quick example, suppose the user searches for “defun eggplant”. Here we have two terms, ‘defun’ and ‘eggplant’. Since the term ‘defun’ occurs in almost every Elisp file, its IDF is very low, so it contributes practically nothing to the final score. The term ‘eggplant’, on the other hand, is so rare that any document containing it has its score greatly increased.
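The effect described above can be made concrete with a small sketch. It uses the plain BM25 formula rather than the field-weighted BM-25F variant, and it is written in Python purely for illustration; it is not p-search's Elisp implementation. The corpus size, document counts, term frequencies, and the parameters k1 and b are all assumed values.

```python
import math

def idf(num_docs, docs_with_term):
    """BM25-style inverse document frequency; the +1 keeps it positive."""
    return math.log(1 + (num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))

def bm25_term_score(tf, doc_len, avg_doc_len, num_docs, docs_with_term,
                    k1=1.2, b=0.75):
    """Score a single term in a single document with plain BM25."""
    # Longer-than-average documents are penalized slightly.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # Term frequency saturates: the 40th occurrence adds far less than the 1st.
    saturation = tf * (k1 + 1) / (tf + k1 * length_norm)
    return idf(num_docs, docs_with_term) * saturation

# Assumed corpus: 1000 Elisp files, 'defun' in 990 of them, 'eggplant' in 2.
NUM_DOCS = 1000
defun_score = bm25_term_score(tf=40, doc_len=500, avg_doc_len=500,
                              num_docs=NUM_DOCS, docs_with_term=990)
eggplant_score = bm25_term_score(tf=1, doc_len=500, avg_doc_len=500,
                                 num_docs=NUM_DOCS, docs_with_term=2)

print(f"defun:    {defun_score:.3f}")
print(f"eggplant: {eggplant_score:.3f}")
```

Under these assumptions, a single occurrence of ‘eggplant’ outweighs forty occurrences of ‘defun’ by more than two orders of magnitude.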

Since BM-25F scores are not values between zero and one, the final score given to a matching document is its BM-25F score normalized to the range 0.5 to 0.7, with the highest-scoring document getting 0.7. Documents with no matches get a score of 0.3. Remember that in p-search, evidence in favor of a document means a score greater than 0.5. This is why the BM-25F score is not normalized between zero and one: a document with a weak match would otherwise end up below 0.5.
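A minimal sketch of such a mapping follows, written in Python for illustration. The manual does not state the exact normalization formula, so the linear scaling by the top score used here is an assumption, and the function name and document names are hypothetical.

```python
def normalize_scores(raw_scores, low=0.5, high=0.7, no_match=0.3):
    """Map raw relevance scores to prior scores.

    raw_scores maps a document name to its raw score; a score of 0
    means the document had no matches.  Linear scaling by the top
    score is an assumption, not p-search's documented formula.
    """
    top = max(raw_scores.values(), default=0.0)
    result = {}
    for doc, raw in raw_scores.items():
        if raw <= 0 or top <= 0:
            # No match: score below 0.5, i.e. evidence against.
            result[doc] = no_match
        else:
            # Match: somewhere in (low, high], the best document gets high.
            result[doc] = low + (high - low) * raw / top
    return result

# Hypothetical raw scores for three files.
scores = normalize_scores({"garden.el": 6.0, "recipes.el": 3.0, "init.el": 0.0})
for doc, score in scores.items():
    print(f"{doc}: {score:.2f}")
```

With this scheme every matching document stays above 0.5, however weak the match, while non-matching documents count as mild evidence against.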

The text search prior is in no way tied to specific search tools like rg or grep. Instead, it delegates the searching to the candidate generator. This way, the candidate generator can choose the best way to search the documents it generates. If the candidate generator does not specify a way to search, p-search falls back to searching with Elisp (which can be slow for a large number of documents).
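The delegation pattern can be sketched as follows. This is a Python illustration of the idea only: the class, the function names, and the `search_fn` slot are hypothetical and do not correspond to p-search's actual Elisp interface.

```python
import re

def count_matches_fallback(documents, term):
    """Slow path: scan the text of every document in-process."""
    pattern = re.compile(re.escape(term))
    return {name: len(pattern.findall(text)) for name, text in documents.items()}

class CandidateGenerator:
    """Hypothetical generator: its documents plus an optional search function."""
    def __init__(self, documents, search_fn=None):
        self.documents = documents
        self.search_fn = search_fn

def count_matches(generator, term):
    """Use the generator's own search if it has one, else the fallback."""
    if generator.search_fn is not None:
        return generator.search_fn(generator.documents, term)
    return count_matches_fallback(generator.documents, term)

docs = {"garden.el": "(defun eggplant ())", "init.el": "(defun setup ())"}

# No search function supplied, so the slow fallback is used.
plain = CandidateGenerator(docs)
fallback_counts = count_matches(plain, "eggplant")

# A generator that supplies its own (here trivial, stand-in) search function.
custom = CandidateGenerator(docs, search_fn=lambda d, t: {n: 0 for n in d})
custom_counts = count_matches(custom, "eggplant")
```

The prior only ever calls `count_matches`; which tool actually does the searching is the generator's concern.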

User Option: p-search-default-search-tool: This user option sets the default search tool selected when creating a text query prior. It will try to use rg if it is available.