Lucene Query Syntax Guide
This article is an overview of Lucene query syntax to help you get started with running custom queries in Dark Web search. For more information, see Apache's Lucene - Query Parser Syntax documentation.
Writing a Query
Lucene query syntax can be broken down into three parts: terms, fields, and operators or modifiers. You can use a combination of a term, field, and an operator or modifier to form a query string.
Terms
A query is broken up into terms and operators. There are two types of terms: Single Terms and Phrases.
- A Single Term is a single word such as test or hello
- A Phrase is a group of words surrounded by double quotes such as "hello dolly"
Surrounding a phrase with single quotes results in an error.
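The distinction between Single Terms and Phrases can be sketched in a few lines of Python. The helper name `as_term` is hypothetical and not part of Lucene or Dark Web search. It leaves a single word bare and wraps a group of words in double quotes. As a simplifying assumption, it rejects input that already contains a double quote rather than trying to escape it.

```python
def as_term(text: str) -> str:
    """Format text as a Lucene term: bare if one word, double-quoted if a phrase.

    Hypothetical helper for illustration only.
    """
    words = text.split()
    if not words:
        raise ValueError("term must not be empty")
    if '"' in text:
        # Assumption: reject embedded double quotes instead of escaping them.
        raise ValueError("term must not contain a double quote")
    if len(words) == 1:
        return words[0]
    # Phrases must be wrapped in double quotes, never single quotes.
    return '"' + " ".join(words) + '"'


print(as_term("hello"))        # single term, left as-is
print(as_term("hello dolly"))  # phrase, wrapped in double quotes
```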
Fields
Lucene supports fielded data. When performing a search, you can either specify a field or leave it out, in which case the search runs across all fields. The latter is the default search method.
You can search any field by typing the field name followed by a colon : and then the term you are looking for.
Available fields
| Field name | Example queries |
|---|---|
| "title" | title: "The quick brown fox" |
| "content" | content: "Lorem Ipsum is simply dummy text" |
| "author_name" | author_name: "author1" |
| "site_domain_name" (Forums only) | site_domain_name: "example.com" |
| "channel_name" (Telegram only) | channel_name: "Stealer Developers" |
| "tags" | tags: "DATABASE" |
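As a minimal sketch, a fielded query string can be assembled like this in Python. The function name `field_query` and the `FIELDS` set are hypothetical. The set simply mirrors the field names in the table, and the output follows the same `field: "value"` layout as the example queries.

```python
# Field names copied from the table of available fields.
FIELDS = {
    "title",
    "content",
    "author_name",
    "site_domain_name",  # Forums only
    "channel_name",      # Telegram only
    "tags",
}


def field_query(field: str, value: str) -> str:
    """Build a query of the form: field: "value" (hypothetical helper)."""
    if field not in FIELDS:
        raise ValueError(f"unknown field: {field}")
    if '"' in value:
        # Assumption: reject embedded double quotes instead of escaping them.
        raise ValueError("value must not contain a double quote")
    return f'{field}: "{value}"'


print(field_query("title", "The quick brown fox"))
print(field_query("tags", "DATABASE"))
```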
Operators and Modifiers
Boolean operators allow terms to be combined through logic operators. Lucene supports AND, "+", OR, NOT and "-" as Boolean operators (Note: Boolean operators must be ALL CAPS).
This list includes basic operators and modifiers. For a full list of operators and modifiers, see Apache's Lucene - Query Parser Syntax documentation.
OR
The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used. The OR operator links two terms and finds a matching document if either of the terms exists in a document. This is equivalent to a union using sets. The symbol || can be used in place of the word OR.
For example, to search for documents that contain either "jakarta apache" or just jakarta, use the query "jakarta apache" jakarta or the equivalent query "jakarta apache" OR jakarta.
AND
The AND operator matches documents where both terms exist anywhere in the text of a single document. The symbol && can be used in place of the word AND.
For example, to search for documents that contain "jakarta apache" and "Apache Lucene" use the query: "jakarta apache" AND "Apache Lucene".
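Because Boolean operators must be written in capitals, it can help to build combined queries programmatically. The sketch below is one possible approach. The function name `combine` is hypothetical, and it only handles the two binary operators AND and OR.

```python
def combine(operator: str, *clauses: str) -> str:
    """Join already-formatted clauses with an upper-case Boolean operator.

    Hypothetical helper; accepts "and"/"or" in any case and emits ALL CAPS.
    """
    op = operator.upper()
    if op not in {"AND", "OR"}:
        raise ValueError(f"unsupported operator: {operator}")
    if len(clauses) < 2:
        raise ValueError("at least two clauses are required")
    return f" {op} ".join(clauses)


print(combine("and", '"jakarta apache"', '"Apache Lucene"'))
print(combine("or", '"jakarta apache"', "jakarta"))
```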
NOT
The NOT operator excludes documents that contain the term after NOT. The symbol ! can be used in place of the word NOT.
For example, to search for documents that contain "jakarta apache" but not "Apache Lucene" use the query: "jakarta apache" NOT "Apache Lucene".
The NOT operator cannot be used with just one term. For example, the following search will return no results: NOT "jakarta apache"
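The set analogy used for OR extends to the other operators: AND behaves like an intersection and NOT like a difference. The Python sketch below illustrates this with a toy collection. The documents and the `matching` helper are made up for this example, and plain substring matching only approximates how Lucene matches phrases.

```python
# Toy document collection, invented for illustration.
docs = {
    1: "jakarta apache project",
    2: "Apache Lucene search library",
    3: "jakarta apache and Apache Lucene together",
    4: "unrelated text",
}


def matching(phrase: str) -> set:
    """Return the ids of documents containing the phrase (case-insensitive substring)."""
    return {doc_id for doc_id, text in docs.items() if phrase.lower() in text.lower()}


a = matching("jakarta apache")
b = matching("Apache Lucene")

either = a | b   # "jakarta apache" OR "Apache Lucene"   (union)
both = a & b     # "jakarta apache" AND "Apache Lucene"  (intersection)
only_a = a - b   # "jakarta apache" NOT "Apache Lucene"  (difference)

print(sorted(either), sorted(both), sorted(only_a))
```

In this model, NOT always subtracts from a set of documents that another term has already matched, which is one way to see why a query consisting only of a NOT clause returns nothing.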
* (asterisk)
You can use this wildcard placeholder for multiple characters. This placeholder can only be used with Single Terms.
For example, to search for need, needs, or needed, you can search for the following text: need*.
? (question mark)
You can use this wildcard placeholder for a single character. This placeholder looks for terms that match but have a single character replaced.
For example, to search for text or test you can use the search: te?t.
You cannot use a * or ? symbol as the first character of a search.
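To make the wildcard rules concrete, the sketch below translates a wildcard pattern into a Python regular expression and tests it against single words. The function names `wildcard_to_regex` and `matches` are hypothetical. This only imitates the behavior described above and is not how Lucene evaluates wildcards internally.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate * (any run of characters) and ? (one character) into a regex."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    if pattern[0] in "*?":
        # Mirrors the rule that a wildcard cannot start a search.
        raise ValueError("a wildcard cannot be the first character")
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(pattern: str, term: str) -> bool:
    """Return True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


for word in ("need", "needs", "needed", "text", "test"):
    print(word, matches("need*", word), matches("te?t", word))
```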
For more information, see Apache's Lucene - Query Parser Syntax documentation.