How Lens Search Works

Searching on The Lens is as easy as typing a keyword into either the Patent Search or the Scholarly Search bar and clicking the “Search” button. Patent Search and Scholarly Search use the text in the search bar, even if it comes from an existing search (any filters are dropped). Alternatively, you can use “Structured Search”, which is effectively an assisted version of a normal search, except that it tries to parse and modify the query.

The Classification Search adds new classification filter values to your existing patent query. In other words, it is a support tool for adding classification filters to a fully-fledged search. It differs from the other searches in that it does not search a document corpus as such, it displays results in its own window, and, within a pre-existing patent search, it only affects the classification filters.

If you are interested in searching patent sequences and biological patents, you can access the PatSeq Facility to start your search with more constrained parameters. From there you have numerous options to sort and filter your search, refining your results towards a single set of documents or collections of documents.

When you search on The Lens, we run your query against our vast database of more than 100 million patent documents from over 95 jurisdictions around the world. Results are sorted by default by their generated rank. This rank is determined by algorithms that establish which documents are the most relevant given your search terms, parameters and filters. A patent’s rank is not an indicator of the document’s quality or importance, but a measure of how well the patent matches your search.

We also apply some modifications to the natural rank of these documents, boosting higher-quality documents over lower-quality ones so that the best documents appear at the top of the results. For an example, see this link.
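The idea of a relevance rank adjusted by a quality boost can be sketched as follows. This is only an illustration in Python: the `Hit` class, the `relevance` and `quality_boost` fields, and the multiplicative formula are all assumptions made for the example, not a description of how The Lens actually computes its ranking.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    relevance: float      # hypothetical score for how well the text matches the query
    quality_boost: float  # hypothetical multiplier; 1.0 means "no boost"


def rank(hits):
    """Sort hits by boosted score (relevance * quality_boost), highest first."""
    return sorted(hits, key=lambda h: h.relevance * h.quality_boost, reverse=True)


hits = [
    Hit("A", relevance=2.0, quality_boost=1.0),   # boosted score 2.0
    Hit("B", relevance=1.8, quality_boost=1.25),  # boosted score 2.25
    Hit("C", relevance=0.5, quality_boost=1.5),   # boosted score 0.75
]

ranked_ids = [h.doc_id for h in rank(hits)]
print(ranked_ids)  # ['B', 'A', 'C']
```

In this toy data, document B overtakes document A because of its boost, while document C stays last: a boost can reorder close matches but does not rescue a document that matches the query poorly.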

Limitations and Caveats to Patent Search

While we strive to produce the highest quality patent database, the user should be aware that there are limitations that may affect the outcome of any search. Some of these limitations are inherent in the data provided by the Patent Offices, while others result from the processing of these data. In the interest of full disclosure, below is a list of known issues with the data and their causes.

Data issues

Misspellings (typos)

  • Can be inherent in the original data, in which case they will appear in the PDF document (where available);
  • Can arise from OCR (optical character recognition) processing, which converts page images into a full-text searchable format. In these cases the correct spelling will appear in the PDF document. OCR errors arise in two ways:
    • (i) the OCR process is generally only 99% accurate;
    • (ii) words may be split over two lines by hyphenation in the original patent document. Currently, such words are indexed as two separate parts by the OCR process. For example, if the word “magnetism” is split over two lines as “magnet-ism”, then the OCR process indexes it as two separate words, “magnet” and “ism”. Where the error is noted, the affected documents will be re-processed to correct this problem.
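The hyphenation problem in (ii) can be reproduced with a small Python sketch. The `tokenize` and `rejoin_hyphenated` functions below are hypothetical helpers written for this illustration; they are not the indexing or re-processing code used by The Lens.

```python
import re


def tokenize(text):
    """Split text into lowercase alphabetic tokens (a much simplified indexer)."""
    return re.findall(r"[a-z]+", text.lower())


def rejoin_hyphenated(text):
    """Join a word that was split by a hyphen at a line end with its continuation."""
    return re.sub(r"([A-Za-z])-\s*\n\s*([a-z])", r"\1\2", text)


# OCR output in which "magnetism" was hyphenated across a line break.
page = "the study of magnet-\nism in metals"

print(tokenize(page))
# ['the', 'study', 'of', 'magnet', 'ism', 'in', 'metals']

print(tokenize(rejoin_hyphenated(page)))
# ['the', 'study', 'of', 'magnetism', 'in', 'metals']
```

A search for “magnetism” would miss the first token list and match the second. Note that this naive rejoining rule would also merge genuine hyphenated compounds that happen to break at a line end, so a real correction step would need additional checks.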
  1. Alternate spellings
    • many words in English can be spelled differently, depending on the preference of the writer (e.g., harbor/harbour; center/centre; labeled/labelled);
    • spelling is usually, but not always, consistent within a document;
    • in U.S. patent documents the spelling is mostly American, even if the writer is not from the U.S., while in EP patent documents the spelling is mostly British;
    • in WO documents, the spelling preference may depend upon the country of origin or the receiving office.
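One way to cope with alternate spellings is to search for all variants at once. The Python sketch below builds such a query string; the helper `any_of` is hypothetical, and the use of `OR` with parentheses for grouping is an assumption about the query syntax, so check the search help before relying on it.

```python
def any_of(variants):
    """Build a parenthesised OR group from a list of spelling variants."""
    return "(" + " OR ".join(variants) + ")"


# Cover both American and British spellings of each term.
query = any_of(["harbor", "harbour"]) + " AND " + any_of(["center", "centre"])
print(query)  # (harbor OR harbour) AND (center OR centre)
```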
  2. Names (inventors, assignees, etc.)
    • names in the inventor, applicant/assignee or agent fields are indexed just like any other word. The various collections format names in different ways, e.g. “John Smith” may appear in any of the following forms: “J. Smith”; “John Smith”; “Smith, John”; “Smith, J.”, etc.;
    • the best approach to searching for a particular person’s name is to use just the last name, surname or family name (e.g. “Smith”) and, if too many documents are returned from the search, to refine the search with one or more additional criteria, such as an organisation name (e.g. “university AND Cornell”).
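The reason the surname alone is the most reliable search term can be seen by breaking each name form into indexed words. The Python sketch below uses a hypothetical `name_tokens` helper as a stand-in for word-level indexing; it is not the actual indexing code.

```python
import re


def name_tokens(name):
    """Return the set of lowercase words a simple word-level index would store."""
    return set(re.findall(r"[a-z]+", name.lower()))


forms = ["J. Smith", "John Smith", "Smith, John", "Smith, J."]

# Words shared by every form of the name.
common = set.intersection(*(name_tokens(form) for form in forms))
print(common)  # {'smith'}
```

Only “smith” appears in every form, so a search on the first name or an initial would miss some of the variants.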
  3. Inconsistency of presentation among data sets (e.g. Greek letters, layouts, fields present, order of fields)
    • these inconsistencies will affect your search strategy and search results;
    • Greek letters: it is now possible to search for such characters by entering the Unicode character, e.g. beta = β. Please refer to the manual for your computer’s operating system for instructions on entering non-Roman characters;
    • layouts: unlike the other data sets, the U.S. patents generally have a fixed set of headings (e.g. Field of the Invention; Summary of the Invention);
    • fields: not all information on the front page is common among the datasets. For example, U.S. documents may contain fields (e.g. U.S. classification codes) not present in EP and WO documents;