This page provides a description of the PatSeq bulk download file formats offered and instructions for accessing and downloading the bulk data using the PatSeq Data* app or download API.
We provide the entire collection of biological sequences disclosed in patents for download in FASTA and Rich file formats. To request access, please register and/or sign in to your Lens account and go to the PatSeq tab on the API & Data page, select the option that best suits your needs and complete the PatSeq Bulk Download request form.
*Note: Only FASTA files are available for download via the PatSeq Data app.
File Formats
FASTA File Format
Sequence data files in FASTA format are provided for individual jurisdictions and for the entire sequence database. Each set consists of multiple files, grouped by:
- sequence type (nucleotides, peptides),
- document type (grants, applications), and
- sequence location in the patent document (in claims, all locations – please note: “in claims” is a subset of the “all” dataset). For example, “Grants: Nucleotides (all)” refers to nucleotide sequences disclosed in granted patent documents regardless of where they are referenced in the documents, whereas “Grants: Nucleotides (in Claims)” refers to nucleotide sequences referenced in the claims of the granted patent documents.
FASTA File Example
>gnl|patseq|US_7510834_B2-23062 Sequence 23062 from Patent US_7510834_B2
TCTCAAGTACTCAGTGATCCAGGAGAGCAAGGACATGTGAGGTCAATGGACCTCTATGTGAGGATATTGGCTGAGAAAACAAAACAAAACAAAACAAAACAAAACAAAACAAAACAAAAACTCCTATGAAGGATTTTCTCTTAACCGGCCTAATGCAGACATAAGCTATACAAACACATTGCACCAAGATTATTTGGGGCACAGGGCATGAAATAGTGAGATGGGAATAAGAAGGGCATAAAAATGATTCTTAAATACTCCATGTTTCAGTAACAGCTCCTAACA
Here you can download some sample data (1,000 sequences, FASTA format, 920KB).
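As a rough illustration of how the FASTA header in the example could be read programmatically, here is a minimal Python sketch. It assumes header lines start with ">" as in standard FASTA, and that the identifier always follows the `gnl|patseq|<patent key>-<sequence number>` layout seen in the single example above. The names `PatSeqRecord` and `parse_fasta` are hypothetical and not part of any Lens tooling.

```python
from typing import Iterator, List, NamedTuple


class PatSeqRecord(NamedTuple):
    patent_key: str    # e.g. "US_7510834_B2"
    sequence_no: int   # e.g. 23062
    description: str   # free text after the identifier
    sequence: str      # residues with line breaks removed


def _build(header: str, chunks: List[str]) -> PatSeqRecord:
    identifier, _, description = header.partition(" ")
    # Assumed layout (from the one example shown): gnl|patseq|<patent key>-<number>
    local_id = identifier.split("|")[-1]
    patent_key, _, seq_no = local_id.rpartition("-")
    return PatSeqRecord(patent_key, int(seq_no), description, "".join(chunks))


def parse_fasta(text: str) -> Iterator[PatSeqRecord]:
    """Yield one record per '>' header line in FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _build(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield _build(header, chunks)


# First 40 residues of the example sequence, wrapped over two lines.
sample = """>gnl|patseq|US_7510834_B2-23062 Sequence 23062 from Patent US_7510834_B2
TCTCAAGTACTCAGTGATCC
AGGAGAGCAAGGACATGTGA
"""

for rec in parse_fasta(sample):
    print(rec.patent_key, rec.sequence_no, len(rec.sequence))
```

Headers that do not match the assumed layout would raise a `ValueError` in `int(...)`; a production parser would probably want to handle that case explicitly.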
Rich File Format
Sequence data is also provided in a Rich data format, which is a custom annotated flat file format based on the European Molecular Biology Laboratory (EMBL) flat file format, but modified to accommodate rich patent metadata. Rich data files are also provided both for individual jurisdictions and the entire sequence database. Sequence and patent metadata fields available in the Rich data format include:
Field | Description | Example |
ID | Identifiers | US_2002_0040130_A1_17; 162025 BP |
AC | Accession Number | US_2002_0040130_A1_17 |
PN | Patent publication key | US_2002_0040130_A1 |
PD | Patent publication date | 04-APR-2002 |
PF | Patent filing number and date | US_83470001_A; 12-APR-2001 |
PS | Simple family size | 17 |
PX | Patent priority with date | US_21725100; 10-JUL-2000 |
PT | Patent title | Polymorphic kinase anchor proteins and nucleic acids encoding the same |
PL | Patent sequence document location (currently only claims are included as a location, with the claim number if available) | Claim 47 |
PB | Patent applicants | SEQUENOM INC |
PI | Patent inventors | BRAUN ANDREAS |
PO | Patent assignees/owners | SEQUENOM INC |
OS | Declared organism/species | Homo sapiens |
DR | Database references (Lens ID and patent office publication key) | lens.org; 158-517-731-452-986 US; US_2002_0040130_A1 |
SQ | Sequence | Sequence 162025 BP; GAATTCCTAT TTCAAAAGAA ACAAATGGGC CAAGTATGGT GGCTCATACC TGTAATCCCA 60 GCACTTTGGG AGGCCGAGGT GAGTGGGTCA CTTGAGGTCA GGAGTTCCAG GCCAGTCTGG 120 |
XX | Field terminator |
Rich File Example
ID US_7510834_B2_11947; DNA; 465 BP.
XX
AC US_7510834_B2_11947;
XX
PN US_7510834_B2
PD 31-MAR-2009
PF US_67412403_A; 26-SEP-2003
XX
PS 2
XX
PX JP_2000112699; 13-APR-2000.
PX JP_0007621; 30-OCT-2000.
PX JP_2002327516; 28-SEP-2002.
PX JP_2002383869; 09-DEC-2002.
PX US_25751103; 07-MAR-2003.
PX US_67412403; 26-SEP-2003.
XX
PT Gene mapping method using microsatellite genetic polymorphism markers
XX
PL Claim 1;
PL Claim 3;
PL Claim 5;
XX
PB INOKO HIDETOSHI
PB TAMIYA GEN
XX
PI INOKO HIDETOSHI
PI TAMIYA GEN
XX
PO INOKO HIDETOSHI
PO TOKAI UNIVERSITY
XX
OS Homo sapiens
XX
DR lens.org; 071-967-192-244-830.
DR US; US_7510834_B2.
XX
SQ Sequence 465 BP; 124 A; 118 C; 82 G; 141 T; 0 U; 0 other;
actgtagcca tgcactcaca taatgctaat attgcctaat catataatct taaagacttc 60
Here you can download some sample data (1,000 sequences, Rich format, 2.48MB).
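The field codes listed above lend themselves to a simple line-based reader. The following Python sketch parses a single Rich record into a mapping from field code to values. It assumes each two-letter field code starts its own line, as in the EMBL flat file format the Rich format is based on, and that everything after the SQ line is sequence data. The function name `parse_rich_record` is hypothetical. How consecutive records are separated in the bulk files is not described here, so the sketch treats an EMBL-style "//" terminator as an unconfirmed assumption.

```python
import re
from typing import Dict, List, Tuple

# A field line: two uppercase letters, optionally followed by a value.
FIELD_LINE = re.compile(r"^([A-Z]{2})(?:\s+(.*))?$")


def parse_rich_record(text: str) -> Tuple[Dict[str, List[str]], str]:
    """Return (fields, sequence) for the text of ONE Rich record."""
    fields: Dict[str, List[str]] = {}
    residues: List[str] = []
    in_sequence = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_sequence:
            if line == "//":  # assumption: EMBL-style record terminator
                break
            # Drop the spacing between residue blocks and the position counter.
            residues.append(re.sub(r"[\s\d]", "", line))
            continue
        match = FIELD_LINE.match(line)
        if match is None:
            continue  # unrecognised line; a stricter parser might raise here
        code, value = match.group(1), (match.group(2) or "").strip()
        if code == "XX":
            continue  # field terminator, carries no data
        fields.setdefault(code, []).append(value)  # repeated codes accumulate
        if code == "SQ":
            in_sequence = True
    return fields, "".join(residues)


# Abbreviated copy of the example record; the sequence is truncated to 60 residues.
sample_record = """ID US_7510834_B2_11947; DNA; 465 BP.
XX
AC US_7510834_B2_11947;
XX
PN US_7510834_B2
PD 31-MAR-2009
XX
PL Claim 1;
PL Claim 3;
PL Claim 5;
XX
OS Homo sapiens
XX
SQ Sequence 465 BP; 124 A; 118 C; 82 G; 141 T; 0 U; 0 other;
actgtagcca tgcactcaca taatgctaat attgcctaat catataatct taaagacttc 60
"""

fields, sequence = parse_rich_record(sample_record)
print(fields["PN"], fields["PL"], len(sequence))
```

Values are kept exactly as they appear, including trailing ";" or "." characters, so that no information is lost; callers can strip punctuation as needed.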
Note: Rich files are only available for download via the PatSeq Bulk Download API.
How to download
- Before you are able to download, you will need to register and/or sign in to request access to PatSeq Bulk Downloads.
- Once you are signed in, you can request access by going to the PatSeq Data tab on the API & Data page and selecting the option that best suits your needs.
- In the PatSeq Data tab, you can review the pricing and options. Access plans and pricing are structured based on the:
  - product type (FASTA Human Genome dataset, FASTA full dataset or Rich full dataset),
  - download frequency (monthly or one-off),
  - use type (academic vs commercial), and
  - commercial use tier, which is based on the organisation’s size and license requirements (e.g. whether the data will be used internally or as part of a commercial product).
  Please see the PatSeq Bulk Download Terms of Use for license details and definitions of academic and commercial use.
- Read the PatSeq Bulk Download Terms of Use and the general Lens Terms of Use.
- Complete and submit the request form. You will receive an email confirming your request.
- Once your request is approved, the Lens team will send you an invoice (if applicable). For payment details, please see the PatSeq Bulk Download Terms of Use.
- Once the bulk download access is granted, you will receive an email confirming your access and it will be enabled in your Lens account. You can then download the sequence files using your web browser and the PatSeq Data app, or programmatically using the download API.
Instructions for both download options are provided below.
PatSeq Data App
The PatSeq Data app allows users to download bulk data* through the user interface, with no programming required. Within the PatSeq Data app, you can use the provided “Sequence download” buttons to download data files for individual jurisdictions or the entire PatSeq database.
*Note: Only FASTA files are available for download via the PatSeq Data app.

The data is provided in multiple files, grouped by:
- document type (grants, applications),
- sequence type (nucleotides, peptides), and
- document location (in claims, all locations – please note: “in claims” is a subset of the “all” dataset). For example, “Grants: Nucleotides (all)” refers to nucleotide sequences disclosed in granted patent documents regardless of where they are referenced in the documents, whereas “Grants: Nucleotides (in Claims)” refers to nucleotide sequences referenced in the claims of the granted patent documents.
Download API
An API is also provided for downloading PatSeq bulk data programmatically (e.g. in automated scheduled scripts). To use the API, you will need to create an API access token to authenticate your application/client and access the download files.
Create your access token
You can create and manage your API access tokens in the Subscriptions tab of the API & Data page. Click on the “Create Token” button to create a new token. You can generate up to 5 tokens and label each one individually to tell them apart.
Note: When generating a new access token, ensure you copy it somewhere safe. For security reasons, it won’t be displayed again. To revoke an API access token, delete it.

Using the API
The API endpoint is https://www.lens.org/lens/bio/psd/api.
Technical documentation, including a Swagger interface with direct access to the API, is available at https://support.lens.org/knowledge-base/patseq-bulk-download-api-documentation/
The Bulk Download API provides two endpoints:
/files – to retrieve a list of available files
- Method: GET
- URL:
https://www.lens.org/lens/bio/psd/api/files
- Request Parameters:
  - access_token: (String) your API access token
/download – to download specific files
- Method: GET
- URL:
https://www.lens.org/lens/bio/psd/api/download
- Request Parameters:
  - access_token: (String) your API access token
  - type: (String) the file type of your access plan (hg_fasta, fasta or rich)
  - file: (String) relative file path as returned by the /files API call
Note: All request parameters are required. The access token can be submitted either in the HTTP Authorization header field or as the access_token URI request parameter. See RFC 6750 (Bearer Token Usage) for more details.
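The two ways of passing the access token can be illustrated with a small Python sketch that only builds the requests and sends nothing. The endpoint URLs and parameter names come from the description above; the helper name `build_request` and the placeholder token are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://www.lens.org/lens/bio/psd/api"


def build_request(endpoint: str, token: str, use_header: bool = True, **params: str) -> Request:
    """Build a GET request for /files or /download without sending it."""
    headers = {}
    query = {}
    if use_header:
        # Option 1: bearer token in the HTTP Authorization header (RFC 6750).
        headers["Authorization"] = f"Bearer {token}"
    else:
        # Option 2: token as the access_token URI request parameter.
        query["access_token"] = token
    query.update(params)
    url = f"{API_BASE}/{endpoint}"
    if query:
        # safe="/" keeps the slashes of the relative file path readable.
        url += "?" + urlencode(query, safe="/")
    return Request(url, headers=headers, method="GET")


# "YOUR_TOKEN" is a placeholder, not a real token.
via_header = build_request("files", "YOUR_TOKEN")
via_query = build_request(
    "download", "YOUR_TOKEN", use_header=False,
    type="fasta", file="us/grant/na-claims.fa.gz",
)
print(via_header.full_url, via_header.get_header("Authorization"))
print(via_query.full_url)
```

Sending the token in the header keeps it out of URLs, which tend to end up in logs and shell history; either option should be accepted according to the note above.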
API Examples
Get Files
To return a list of all the available download files, use the following URL, replacing {your_personal_token} with your API access token:
https://www.lens.org/lens/bio/psd/api/files?access_token={your_personal_token}
A list of download files will be returned in JSON format.
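A minimal Python sketch of the /files call is shown here, using only the standard library. The structure of the returned JSON is not reproduced on this page, so the sketch simply decodes it and returns it unchanged. The function name `list_files` and its `opener` parameter are hypothetical; `opener` exists only so the call can be tried without network access.

```python
import json
from urllib.request import Request, urlopen

FILES_URL = "https://www.lens.org/lens/bio/psd/api/files"


def list_files(token: str, opener=urlopen):
    """Fetch the list of available download files and decode the JSON body."""
    request = Request(FILES_URL, headers={"Authorization": f"Bearer {token}"})
    with opener(request) as response:
        # The JSON layout is treated as opaque here; inspect it before relying on it.
        return json.load(response)
```

Calling `list_files("YOUR_TOKEN")` with a valid token in place of the placeholder should return the decoded listing, whose relative file paths can then be passed to the /download endpoint.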

Get Download
To download a specific file, use the following URL and parameters:
https://www.lens.org/lens/bio/psd/api/download?access_token={your_personal_token}&type={file_type}&file={file}
where {your_personal_token} is your API access token, {file_type} is the file type for your access plan and {file} is the relative file path as returned by the /files API call. For example, to download all the latest nucleotide sequences extracted from US grants and referenced in the claims in FASTA file format, you can use the following command with wget:
wget "https://www.lens.org/lens/bio/psd/api/download?access_token={your_personal_token}&type=fasta&file=us/grant/na-claims.fa.gz" -O us-grants-na-claims.fa.gz
Or you can specify the access token in the HTTP Authorization header:
- HTTP Authorization header:
"Authorization: Bearer {your_personal_token}"
- Request URL:
"https://www.lens.org/lens/bio/psd/api/download?type=fasta&file=us/grant/na-claims.fa.gz"
Please note that whichever HTTP client you use must be able to follow HTTP 302 redirects.
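For scheduled scripts, the same download can be sketched in Python with the standard library, whose `urlopen` follows HTTP redirects by default. The function name `download_file` and its `opener` parameter are hypothetical; the URL, parameters and example file path are taken from the examples above.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import Request, urlopen

DOWNLOAD_URL = "https://www.lens.org/lens/bio/psd/api/download"


def download_file(token: str, file_type: str, file_path: str, dest: str, opener=urlopen) -> str:
    """Stream one bulk file to `dest` and return the destination path."""
    query = urlencode({"type": file_type, "file": file_path}, safe="/")
    request = Request(
        f"{DOWNLOAD_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    # urlopen's default handlers follow redirects; copyfileobj streams the
    # body in chunks so large files are not held in memory.
    with opener(request) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

For example, `download_file("YOUR_TOKEN", "fasta", "us/grant/na-claims.fa.gz", "us-grants-na-claims.fa.gz")` mirrors the wget command above, with a valid token in place of the placeholder. The resulting file can be read without unpacking it first via `gzip.open(path, "rt")`.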