Searching The Lens can be done simply by using the search bar or the Structured Search page. After your initial search, you can refine your parameters using the faceted filters. For users who need more specificity, it is also possible to use native search syntax to create sophisticated searches. Our documentation for this syntax is still a work in progress, but the core information you need to use it can be found below.
Apache Lucene – Query Parser Syntax
The Lens uses a modified form of the Apache Lucene Query Parser Syntax. We highly recommend reading the comprehensive Apache Lucene guide to this syntax.
Some important notes about the syntax:
- To search for a value in an indexed field, type the name of the field followed by a colon and then the value you wish to search for. For example:
title:malaria
pub_date:[20070101 TO 20070631]
- To combine multiple terms or fields, use boolean operators, which must all be upper case. For example:
rice AND pesticide
malaria OR mosquito
printing NOT inkjet
- Lucene supports AND, OR, NOT as well as “must” + and “must not” - as boolean operators.
  - The default boolean operator in a Lens search is AND, e.g. blue green is equivalent to blue AND green
  - For more information on how boolean logic works, see this tutorial.
- Other operators available include:
  - Term grouping: Lucene supports using parentheses ( ) to group terms into sub-queries, e.g. (red AND yellow) OR (blue AND green)
  - Field grouping: Lucene supports using parentheses to group multiple clauses to a single field, e.g. title:(car OR truck)
  - * and ? for wildcard searches. Note that wildcard search terms are not stemmed and therefore may not work as expected for searches where “Stemming” (located in “Query Tools”) is turned on. This is because when stemming is turned on, search terms are matched against stemmed values in the index. For example, the terms valve and valves will both be stemmed to valv and match the stemmed value valv in the index. The term valve* will not be stemmed and therefore won’t match valv in the index. Read more about stemming on Wikipedia.
  - ~ for fuzzy/proximity searches. "foo bar"~4 searches for foo and bar within 4 words of each other. Exact matches are proximity zero and word transpositions (bar foo) are proximity 2.
  - TO for range searches
  - ^ to boost the relevance of a value in a search, which affects the result order, e.g. car abstract:coke^2
  - \ to escape the following special query syntax characters in a search term that are not inside quotes: + - && || ! ( ) { } [ ] ^ ~ * ? : \ /
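The escaping rule above can be sketched in Python. This is a minimal illustration and not part of The Lens: the function name escape_term is hypothetical, and the sketch assumes each listed character is escaped individually with a backslash (so && becomes \&\&).

```python
# Characters listed above as special query syntax; & and | cover && and ||.
SPECIAL_CHARS = set('+-&|!(){}[]^~*?:\\/')


def escape_term(term: str) -> str:
    """Backslash-escape special characters in an unquoted search term.

    Hypothetical helper for illustration; not a Lens or Lucene API.
    """
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


# Build a fielded query from an unquoted value that contains a slash.
print("classification_nat:" + escape_term("221/220"))
```

Terms placed inside double quotes would not go through this helper, since the rule applies only to unquoted terms.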
Example Queries
The following example queries illustrate how the syntax works:
transgenic rice
lens_id:022-382-024-804-703
022-382-024-804-703
inventor:"Jefferson Richard"~2
"method and system for direct recording of video information onto a disk medium"
pub_num:20150315267
pub_num:2015\/0315267
"US 2015\/0315267 A1"
+title:"一种循环煤泥水的高效澄清方法"
(abstract:β-glucuronidase) OR (title:β-glucuronidase)
+applicant:"Australian National University"
+filing_date:[20070101, 20070331] +pub_date:[20070101, 20070631]
owner:"Asgrow seed company" OR applicant:"Asgrow seed company"
classification_nat:221\/220 – forward slashes (/) in queries must be escaped by a backslash (\) if the query term is not inside quotes
classification_nat:"221/220" – the query term is inside quotes, so the forward slash (/) does not need to be escaped by a backslash
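The quoted form used in several of the examples above can also be generated programmatically. The sketch below is an illustration only: quoted_field and any_field are hypothetical helper names, and it assumes that double quotes and backslashes embedded in a quoted phrase are escaped with a backslash, as in standard Lucene syntax.

```python
from typing import Iterable


def quoted_field(field: str, phrase: str) -> str:
    """Return field:"phrase", escaping embedded backslashes and double quotes."""
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def any_field(fields: Iterable[str], phrase: str) -> str:
    """OR the same quoted phrase across several fields."""
    return " OR ".join(quoted_field(f, phrase) for f in fields)


# Reproduces the owner/applicant example query.
print(any_field(["owner", "applicant"], "Asgrow seed company"))
# The slash needs no escaping because the value is quoted.
print(quoted_field("classification_nat", "221/220"))
```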
Index Fields
The following are all of the fields indexed on The Lens.
- lens_id e.g. 186-488-232-022-055
- pub_key e.g. US_2013_0227762_A1
- jurisdiction e.g. US
- kind e.g. A1
- pub_num e.g. 2013/0227762
- pub_date e.g. 20170905 – yyyymmdd
- pub_year e.g. 2018
- filing_date e.g. 20000519
- earliest_priority_date e.g. 20000519
- title e.g. "Fidget Spinner"
- abstract e.g. "Super Conductor"
- applicant e.g. "Smith David"
- inventor e.g. Sally
- owner e.g. "Sony Ltd"
- has_full_text e.g. true
- full_text e.g. robot
- claims e.g. semiconductor
- classification_ipcr e.g. "H01L21/768"
- classification_nat e.g. "221/220" (US classifications)
- classification_cpc e.g. H01L2924\/*
- cites_patent_pub_key e.g. US_7128866_B1 – docs citing US_7128866_B1
- cites_patent_count e.g. 5
- cited_by_patent_count e.g. 10
- non_patent_citation e.g. (health OR medicine)
- citation_id e.g. 10.1038\/NATURE03090
- family_of_pub_key e.g. US_6408520_B1
- family_jurisdiction e.g. (US OR EP)
- family_size e.g. [4 TO 6]
- simple_family_of_pub_key e.g. US_6408520_B1
- simple_family_jurisdiction e.g. US
- simple_family_size e.g. 3
- sequence_count e.g. [2 TO 3]
- sequence_length e.g. [1 TO 100]
- sequence_type e.g. N – nucleotide, P – peptide
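Date fields such as pub_date and filing_date take values in yyyymmdd form, so a range query can be built from calendar dates. The following is a minimal sketch under that assumption; date_range is a hypothetical helper, not part of The Lens, and it uses the [start TO end] range form shown earlier.

```python
from datetime import date


def date_range(field: str, start: date, end: date) -> str:
    """Build an inclusive range query such as pub_date:[20070101 TO 20070630]."""
    if start > end:
        raise ValueError("start date must not be after end date")
    # The %Y%m%d format spec yields the yyyymmdd form used by the date fields.
    return f"{field}:[{start:%Y%m%d} TO {end:%Y%m%d}]"


print(date_range("pub_date", date(2007, 1, 1), date(2007, 6, 30)))
```

Using date objects also catches impossible calendar dates (June has only 30 days, for example) before the query is sent.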