Search Syntax

You can search The Lens using the search bar or the Structured Search page. After your initial search, you can refine the results using the faceted filters. For users who need more specific queries, it is also possible to use the native search syntax to build sophisticated searches. Our documentation for this syntax is still a work in progress, but the core information you need to use it can be found below.

Apache Lucene – Query Parser Syntax

The Lens uses a modified form of the Apache Lucene Query Parser Syntax. We highly recommend you read this comprehensive guide on this syntax here:

Some important notes about the syntax:

  • To search for a value in an indexed field you type the name of the field followed by a colon and then the value you wish to search. For example:
    • title:malaria
    • pub_date:[20070101 TO 20070630]
  • To combine multiple terms or fields, use boolean operators, which must all be upper case. For example:
    • rice AND pesticide
    • malaria OR mosquito
    • printing NOT inkjet
  • Lucene supports AND, OR and NOT, as well as + (“must”) and - (“must not”), as Boolean operators
  • For more information on how boolean logic works, see this tutorial.
  • Other operators available include:
    • Term grouping: Lucene supports using parentheses ( ) to group terms into sub-queries e.g. (red AND yellow) OR (blue AND green)
    • Field grouping: Lucene supports using parentheses to group multiple clauses to a single field e.g. title:(car OR truck)
    • * and ? for wildcard searches. Note that wildcard search terms are not stemmed and therefore may not work as expected when “Stemming” (located in “Query Tools”) is turned on. This is because, with stemming on, search terms are matched against stemmed values in the index. For example, the terms valve and valves are both stemmed to valv and so both match the stemmed value valv in the index. The term valve* is not stemmed and therefore won’t match valv in the index. Read more about stemming on Wikipedia.
    • ~ for fuzzy/proximity searches. "foo bar"~4 searches for foo and bar within 4 words of each other. Exact matches are proximity zero and word transpositions (bar foo) are proximity 2.
    • TO for range searches
    • ^ to boost the relevance of a value in a search, which affects the result order e.g. car abstract:coke^2
    • \ to escape the following special query syntax characters in a search term that are not inside quotes:
      (+ - && || ! ( ) { } [ ] ^ ~ * ? : \ /)
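
The escaping rule above can be applied programmatically when query terms come from user input or another system. The following is a minimal sketch in Python, not part of The Lens itself: the function name escape_term is hypothetical, and it assumes that escaping each & and | individually is an acceptable way to cover the two-character operators && and ||.

```python
import re

# Character class covering the special characters listed above.
# Assumption: escaping every single & and | also neutralises && and ||.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^~*?:\\/])')


def escape_term(term: str) -> str:
    """Put a backslash in front of every special character in an unquoted term."""
    return _SPECIAL.sub(r'\\\1', term)


print("pub_num:" + escape_term("2015/0315267"))  # pub_num:2015\/0315267
```

Only the value should be passed to the helper. Escaping a whole clause such as pub_num:2015/0315267 would also escape the colon that separates the field name from the value.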

Example Queries

The following example queries show how the syntax works in practice:

  • transgenic rice
  • lens_id:022-382-024-804-703
  • 022-382-024-804-703
  • inventor:"Jefferson Richard"~2
  • "method and system for direct recording of video information onto a disk medium"
  • pub_num:20150315267
  • pub_num:2015\/0315267
  • "US 2015\/0315267 A1"
  • +title:"一种循环煤泥水的高效澄清方法"
  • (abstract:β-glucuronidase) OR (title:β-glucuronidase)
  • +applicant:"Australian National University"
  • +filing_date:[20070101 TO 20070331] +pub_date:[20070101 TO 20070630]
  • owner:"Asgrow seed company" OR applicant:"Asgrow seed company"
  • classification_nat:221\/220 – forward slashes (/) in a query must be escaped with a backslash (\) if the query term is not inside quotes
  • classification_nat:"221/220" – the query term is inside quotes, so the forward slash (/) does not need to be escaped
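
Date-range clauses like those in the examples above are easy to get wrong by hand, for instance by typing a day that does not exist in the month. As a rough sketch, assuming the yyyymmdd format and the TO range operator described on this page, a clause can be generated from Python date values instead. The helper name date_range is hypothetical.

```python
from datetime import date


def date_range(field: str, start: date, end: date) -> str:
    """Build a range clause such as pub_date:[20070101 TO 20070630]."""
    if start > end:
        raise ValueError("start must not be after end")
    # Dates are formatted as yyyymmdd, the format used by the date fields.
    return f"{field}:[{start:%Y%m%d} TO {end:%Y%m%d}]"


# Require both ranges by prefixing each clause with the "must" operator (+).
query = " ".join("+" + clause for clause in (
    date_range("filing_date", date(2007, 1, 1), date(2007, 3, 31)),
    date_range("pub_date", date(2007, 1, 1), date(2007, 6, 30)),
))
print(query)
```

Because date(2007, 6, 31) raises a ValueError, an impossible date is caught before the query is ever sent.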

Index Fields

The following are all of the fields indexed on The Lens.

  • lens_id e.g. 186-488-232-022-055
  • pub_key e.g. US_2013_0227762_A1
  • jurisdiction e.g. US
  • kind e.g. A1
  • pub_num e.g. 2013/0227762
  • pub_date e.g. 20170905 – yyyymmdd
  • pub_year e.g. 2018
  • filing_date e.g. 20000519
  • earliest_priority_date e.g. 20000519
  • title e.g. "Fidget Spinner"
  • abstract e.g. "Super Conductor"
  • applicant e.g. "Smith David"
  • inventor e.g. Sally
  • owner e.g. "Sony Ltd"
  • has_full_text e.g. true
  • full_text e.g. robot
  • claims e.g. semiconductor
  • classification_ipcr e.g. "H01L21/768"
  • classification_nat e.g. "221/220" (US classifications)
  • classification_cpc e.g. H01L2924\/*
  • cites_patent_pub_key e.g. US_7128866_B1 – docs citing US_7128866_B1
  • cites_patent_count e.g. 5
  • cited_by_patent_count e.g. 10
  • non_patent_citation e.g. (health OR medicine)
  • citation_id e.g. 10.1038\/NATURE03090
  • family_of_pub_key e.g. US_6408520_B1
  • family_jurisdiction e.g. (US OR EP)
  • family_size e.g. [4 TO 6]
  • simple_family_of_pub_key e.g. US_6408520_B1
  • simple_family_jurisdiction e.g. US
  • simple_family_size e.g. 3
  • sequence_count e.g. [2 TO 3]
  • sequence_length e.g. [1 TO 100]
  • sequence_type e.g. N – nucleotide, P – peptide
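
The field names listed above can be combined with the field grouping syntax, e.g. title:(car OR truck). The sketch below shows one possible way to assemble such clauses in Python. The helper names quote and field_group are hypothetical, and the code assumes values contain no double quotes or other special characters that would need escaping.

```python
from typing import Sequence


def quote(value: str) -> str:
    """Wrap multi-word values in double quotes so they are searched as a phrase."""
    return f'"{value}"' if " " in value else value


def field_group(field: str, values: Sequence[str], operator: str = "OR") -> str:
    """Group several values under one field, e.g. title:(car OR truck)."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be upper-case AND or OR")
    if not values:
        raise ValueError("at least one value is required")
    if len(values) == 1:
        return f"{field}:{quote(values[0])}"
    joined = f" {operator} ".join(quote(v) for v in values)
    return f"{field}:({joined})"


print(field_group("family_jurisdiction", ["US", "EP"]))
print(field_group("owner", ["Sony Ltd"]))
```

The operator check reflects the rule that boolean operators must be upper case, so a lower-case "or" is rejected instead of being sent as an ordinary search term.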