unipept pept2prot
Returns the set of UniProt entries containing a given tryptic peptide.
The unipept pept2prot command takes one or more tryptic peptides as input and returns the set of UniProt entries containing those peptides. This information is fetched by sending API requests to the Unipept server.
Input
The unipept pept2prot command expects tryptic peptides as input. The source of this input can be command line arguments, a file, or standard input. If input is supplied through several sources at the same time, they are prioritized in that order: command line arguments first, then a file, then standard input.
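The priority rule can be summarized in a few lines of shell. This is only a sketch of the documented behavior, not the CLI's actual implementation; the function name input_source and its two parameters are hypothetical.

```shell
# Hypothetical helper: reports which input source would win, following the
# documented order (arguments, then file, then standard input).
#   $1 = number of peptides given as command line arguments
#   $2 = file passed via --input (empty string if none)
input_source() {
  if [ "$1" -gt 0 ]; then
    echo "arguments"
  elif [ -n "$2" ]; then
    echo "file"
  else
    echo "stdin"
  fi
}

input_source 2 input.txt   # arguments take priority over the file
input_source 0 input.txt   # no arguments, so the file is used
input_source 0 ""          # neither is given, so standard input is read
```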
Command line arguments
If input is supplied using command line arguments, the peptides must be separated by spaces.
Example
$ unipept pept2prot MDGTEYIIVK ISVAQGASK
peptide,uniprot_id,protein_name,taxon_id
MDGTEYIIVK,C6JD41,10 kDa chaperonin,457412
ISVAQGASK,Q9Y6R7,IgGFc-binding protein,9606
ISVAQGASK,A0A087WXI2,IgGFc-binding protein,9606
File input
Use the --input parameter to specify a file to use as input. If input is supplied using a file, a single peptide per line is expected.
Example
$ cat input.txt
MDGTEYIIVK
ISVAQGASK
$ unipept pept2prot --input input.txt
peptide,uniprot_id,protein_name,taxon_id
MDGTEYIIVK,C6JD41,10 kDa chaperonin,457412
ISVAQGASK,Q9Y6R7,IgGFc-binding protein,9606
ISVAQGASK,A0A087WXI2,IgGFc-binding protein,9606
Standard input
If the command is run without arguments and no file is specified, unipept pept2prot will read its input from standard input. When standard input is used, a single peptide per line is expected.
Example
$ cat input.txt
MDGTEYIIVK
ISVAQGASK
$ cat input.txt | unipept pept2prot
peptide,uniprot_id,protein_name,taxon_id
MDGTEYIIVK,C6JD41,10 kDa chaperonin,457412
ISVAQGASK,Q9Y6R7,IgGFc-binding protein,9606
ISVAQGASK,A0A087WXI2,IgGFc-binding protein,9606
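Because standard input is read one peptide per line, the command fits naturally at the end of a pipeline. The sketch below shows one possible preprocessing step; the helper name clean_peptides is hypothetical and not part of the Unipept CLI.

```shell
# Hypothetical helper: drop blank lines and duplicate peptides, keeping the
# first occurrence of each peptide in its original position.
clean_peptides() {
  awk 'NF && !seen[$1]++ { print $1 }'
}

# Demonstration on inline sample data.
printf 'MDGTEYIIVK\n\nISVAQGASK\nMDGTEYIIVK\n' | clean_peptides

# With the CLI installed, the result could be piped straight into it:
#   clean_peptides < input.txt | unipept pept2prot
```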
Output
The unipept pept2prot command outputs all UniProt entries that contain the given (tryptic) input peptides. By default, the accession number, the protein name, and the NCBI taxon id are returned for each matching UniProt entry. The --all parameter supplements this with the name of the associated taxon and several cross references, such as the associated EC numbers and GO terms. Consult the API documentation for a detailed list of output fields. A selection of output fields can be specified with the --select parameter. By default, output is generated in csv format; the --format parameter changes this to json or xml. The output can be written to a file or to standard output.
File output
Use the --output parameter to specify an output file. If the file already exists, the output will be appended to the end of the file.
Example
$ unipept pept2prot --output output.txt MDGTEYIIVK ISVAQGASK
$ cat output.txt
peptide,uniprot_id,protein_name,taxon_id
MDGTEYIIVK,C6JD41,10 kDa chaperonin,457412
ISVAQGASK,Q9Y6R7,IgGFc-binding protein,9606
ISVAQGASK,A0A087WXI2,IgGFc-binding protein,9606
Standard output
If no output file is specified, unipept pept2prot will write its output to standard output.
Example
$ unipept pept2prot MDGTEYIIVK ISVAQGASK
peptide,uniprot_id,protein_name,taxon_id
MDGTEYIIVK,C6JD41,10 kDa chaperonin,457412
ISVAQGASK,Q9Y6R7,IgGFc-binding protein,9606
ISVAQGASK,A0A087WXI2,IgGFc-binding protein,9606
$ unipept pept2prot MDGTEYIIVK ISVAQGASK > output.txt
$ cat output.txt
peptide,uniprot_id,protein_name,taxon_id
MDGTEYIIVK,C6JD41,10 kDa chaperonin,457412
ISVAQGASK,Q9Y6R7,IgGFc-binding protein,9606
ISVAQGASK,A0A087WXI2,IgGFc-binding protein,9606
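Output on standard output can be post-processed with standard tools. The sketch below counts the matching UniProt entries per peptide; the helper name count_entries is hypothetical, and the sample rows are adapted from the examples in this document. Protein names can contain commas inside quoted fields, so the sketch assumes the output was restricted to comma-free fields with --select peptide,uniprot_id before splitting on commas.

```shell
# Hypothetical helper: skip the header row, then count rows per peptide
# (the first CSV column). The final sort makes the order deterministic.
count_entries() {
  awk -F, 'NR > 1 { n[$1]++ } END { for (p in n) print p "," n[p] }' | sort
}

# Sample data; with the CLI installed this could instead be
#   unipept pept2prot --select peptide,uniprot_id MDGTEYIIVK ISVAQGASK | count_entries
count_entries <<'EOF'
peptide,uniprot_id
MDGTEYIIVK,C6JD41
ISVAQGASK,Q9Y6R7
ISVAQGASK,A0A087WXI2
EOF
```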
Fasta support
The unipept pept2prot command supports input (from any source) in a fasta-like format (for example generated by the prot2pept command). This format consists of a fasta header (a line starting with a >), followed by one or more lines containing one peptide each. When this format is detected, the output will automatically include an extra information field containing the corresponding fasta header.
Example
$ cat input.txt
> header 1
ISVAQGASK
MDGTEYIIVK
> header 2
ISVAQGASK
$ unipept pept2prot --input input.txt
fasta_header,peptide,uniprot_id,protein_name,taxon_id
> header 1,MDGTEYIIVK,C6JD41,10 kDa chaperonin,457412
> header 1,ISVAQGASK,Q9Y6R7,IgGFc-binding protein,9606
> header 1,ISVAQGASK,A0A087WXI2,IgGFc-binding protein,9606
> header 2,MDGTEYIIVK,C6JD41,10 kDa chaperonin,457412
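To make the structure of the fasta-like format concrete, the sketch below pairs every peptide with the header line that precedes it. It only illustrates how the input is organized; the helper name pair_with_header is hypothetical and the sketch does not reproduce what the CLI does internally.

```shell
# Hypothetical helper: remember the most recent header line (starting with >)
# and print it in front of each peptide that follows.
pair_with_header() {
  awk '/^>/ { header = $0; next } NF { print header "," $0 }'
}

pair_with_header <<'EOF'
> header 1
ISVAQGASK
MDGTEYIIVK
> header 2
ISVAQGASK
EOF
```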
Command-line options
--equate / -e Equate isoleucine and leucine
If the --equate flag is set, isoleucine (I) and leucine (L) are equated when matching tryptic peptides to UniProt entries. This is similar to checking the Equate I and L? checkbox when performing a search in the Unipept web interface.
Example
$ unipept pept2prot FEALLGDGSQYGLHLQYK
peptide,uniprot_id,protein_name,taxon_id
FEALLGDGSQYGLHLQYK,D1PLT2,Glucose-1-phosphate thymidylyltransferase,411471
FEALLGDGSQYGLHLQYK,K1TWG3,"Glucose-1-phosphate thymidylyltransferase, long form",408170
$ unipept pept2prot --equate FEALLGDGSQYGLHLQYK
peptide,uniprot_id,protein_name,taxon_id
FEALLGDGSQYGLHLQYK,D4K7A9,Glucose-1-phosphate thymidylyltransferase,657322
FEALLGDGSQYGLHLQYK,D4K112,Glucose-1-phosphate thymidylyltransferase,718252
FEALLGDGSQYGLHLQYK,D1PLT2,Glucose-1-phosphate thymidylyltransferase,411471
FEALLGDGSQYGLHLQYK,A8SH27,Glucose-1-phosphate thymidylyltransferase,411485
FEALLGDGSQYGLHLQYK,K1TWG3,"Glucose-1-phosphate thymidylyltransferase, long form",408170
FEALLGDGSQYGLHLQYK,E2ZLF5,Glucose-1-phosphate thymidylyltransferase,748224
FEALLGDGSQYGLHLQYK,R6Q2J6,Glucose-1-phosphate thymidylyltransferase,1262898
FEALLGDGSQYGLHLQYK,C7HAW8,Glucose-1-phosphate thymidylyltransferase,411483
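Equating I and L can be pictured as normalizing both residues to a single letter before comparing sequences, which is why the second query above matches more entries. The sketch below is purely conceptual: the helper name equate_il is hypothetical, the choice of L as the canonical letter is arbitrary, and nothing here is claimed about how the Unipept server implements the option.

```shell
# Conceptual sketch: map every I to L so that both residues compare equal.
equate_il() {
  tr 'I' 'L'
}

# Two peptides that differ only in I/L positions become identical.
a=$(printf 'FEALLGDGSQYGLHLQYK' | equate_il)
b=$(printf 'FEAILGDGSQYGIHLQYK' | equate_il)
if [ "$a" = "$b" ]; then
  echo "peptides are equal once I and L are equated"
fi
```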
--input / -i Specify an input file
All Unipept CLI commands can process input from three sources: command line arguments, a file, or standard input. The --input option lets you specify an input file. The file should contain a single peptide per line.
Example
$ cat input.txt
ISVAQGASK
OMGWTFBBQ
MDGTEYIIVK
$ unipept pept2prot --input input.txt
peptide,uniprot_id,protein_name,taxon_id
ISVAQGASK,Q9Y6R7,IgGFc-binding protein,9606
ISVAQGASK,A0A087WXI2,IgGFc-binding protein,9606
MDGTEYIIVK,C6JD41,10 kDa chaperonin,457412
--output / -o Specify an output file
By default, the unipept commands write their output to standard output. The --output option lets you specify a file to write the output to. If the file already exists, the output is appended to it; otherwise, a new file is created.
Example
$ unipept pept2prot --output output.txt ISVAQGASK MDGTEYIIVK
$ cat output.txt
peptide,uniprot_id,protein_name,taxon_id
ISVAQGASK,Q9Y6R7,IgGFc-binding protein,9606
ISVAQGASK,A0A087WXI2,IgGFc-binding protein,9606
MDGTEYIIVK,C6JD41,10 kDa chaperonin,457412
--select / -s Specify the output fields
By default, the Unipept CLI commands output all information fields received from the Unipept server. The --select option allows you to control which fields are returned. Fields can be specified as a comma-separated list or by using multiple --select options. A * can be used as a wildcard in field names. For example, --select peptide,taxon* returns the peptide field and all fields starting with taxon.
Example
$ unipept pept2prot --select peptide,uniprot_id MDGTEYIIVK
peptide,uniprot_id
MDGTEYIIVK,C6JD41
$ unipept pept2prot --select peptide --select *id MDGTEYIIVK
peptide,uniprot_id,taxon_id
MDGTEYIIVK,C6JD41,457412
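The wildcard behaves much like shell pattern matching, which the sketch below uses to show which of the default fields a set of patterns would keep. The helper name field_selected is hypothetical, and the sketch only mirrors the documented semantics, not the CLI's own code.

```shell
# Hypothetical helper: succeeds if field name $1 matches any of the
# remaining patterns, where * matches any run of characters.
field_selected() {
  field=$1
  shift
  for pattern in "$@"; do
    # An unquoted $pattern in a case label is interpreted as a pattern.
    case $field in
      $pattern) return 0 ;;
    esac
  done
  return 1
}

# Same selection as the second example above: peptide plus every *id field.
for f in peptide uniprot_id protein_name taxon_id; do
  if field_selected "$f" 'peptide' '*id'; then
    echo "$f"
  fi
done
```

When a pattern such as *id is typed on a real command line, quoting it ('*id') prevents the shell from expanding it against file names in the current directory before the command sees it.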
--format / -f Specify the output format
By default, the Unipept CLI commands return their output in csv format. The --format option allows you to select another format. Supported formats are csv, json, and xml.
Example
$ unipept pept2prot --format json ISVAQGASK MDGTEYIIVK
[{"peptide":"ISVAQGASK","uniprot_id":"Q9Y6R7","protein_name":"IgGFc-binding protein","taxon_id":9606},{"peptide":"ISVAQGASK","uniprot_id":"A0A087WXI2","protein_name":"IgGFc-binding protein","taxon_id":9606},{"peptide":"MDGTEYIIVK","uniprot_id":"C6JD41","protein_name":"10 kDa chaperonin","taxon_id":457412}]
$ unipept pept2prot --format xml ISVAQGASK MDGTEYIIVK
<results><result><peptide>ISVAQGASK</peptide><uniprot_id>Q9Y6R7</uniprot_id><protein_name>IgGFc-binding protein</protein_name><taxon_id>9606</taxon_id></result><result><peptide>ISVAQGASK</peptide><uniprot_id>A0A087WXI2</uniprot_id><protein_name>IgGFc-binding protein</protein_name><taxon_id>9606</taxon_id></result><result><peptide>MDGTEYIIVK</peptide><uniprot_id>C6JD41</uniprot_id><protein_name>10 kDa chaperonin</protein_name><taxon_id>457412</taxon_id></result></results>
--all / -a Request additional information
By default, the Unipept CLI commands only request basic information from the Unipept server. By using the --all flag, you can request additional information fields such as cross references of the returned UniProt entries. You can use the --select option to select which fields are included in the output.
Performance penalty
Setting --all incurs a performance penalty because of the additional database queries it requires. Do not use this option unless the extra information fields are strictly needed.
Example
$ unipept pept2prot --all --select peptide,uniprot_id,*name,go_references ISVAQGASK
peptide,uniprot_id,protein_name,taxon_name,go_references
ISVAQGASK,Q9Y6R7,IgGFc-binding protein,Homo sapiens,GO:0070062
ISVAQGASK,A0A087WXI2,IgGFc-binding protein,Homo sapiens,
--help / -h Display help
This flag displays the help.
Meganize
The unipept pept2prot command can be used in combination with Megan, for example to perform a functional analysis of the sample. This requires using the --meganize option that was added in version 1.2.0.
Example
$ unipept pept2prot --meganize MDGTEYIIVK ISVAQGASK
MDGTEYIIVK ref|WP_008705701.1| 100 10 0 0 0 10 0 10 1e-100 100
ISVAQGASK ref|NP_003881.2 XP_011547112.1 XP_011547113.1| 100 10 0 0 0 10 0 10 1e-100 100
The generated output can be saved to a file and imported into Megan using the following settings:
- Import from blast
- Select the file with the unipept pept2prot --meganize output
- Set format to blastTab and mode to blastp
- Remove the fasta mapping that was automatically added
- Enable the tabs you want (taxonomy/interpro2go/...), but always select the "use accession" option for this
- Go back to the first tab and make sure it says blasttab and blastp, because sometimes it changes back when you switch tabs
- Click apply
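Before importing, it can be useful to confirm that every line of the saved file has the 12 columns shown in the example output, which is also the column count of standard BLAST tabular output. The sketch below assumes the columns are separated by tabs (the subject column itself can contain spaces); the helper name check_blasttab is hypothetical.

```shell
# Hypothetical helper: exit status 0 if every input line has exactly
# 12 tab-separated columns, 1 otherwise.
check_blasttab() {
  awk -F'\t' 'NF != 12 { bad++ } END { exit (bad > 0 ? 1 : 0) }'
}

# Demonstration on one line modeled on the example output.
printf 'MDGTEYIIVK\tref|WP_008705701.1|\t100\t10\t0\t0\t0\t10\t0\t10\t1e-100\t100\n' |
  check_blasttab && echo "12 columns on every line"

# With a saved file (file name chosen for illustration):
#   check_blasttab < meganized.txt
```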