prot2pept
Splits proteins into peptides based on (trypsin) digest.
The prot2pept command takes one or more protein sequences as input, performs an in silico digest on them and returns the resulting peptides as output. By default, a trypsin digest is simulated, but other proteases can be specified with the --pattern parameter. This command runs entirely locally and doesn't connect to any server.
Input
The prot2pept command expects protein sequences via standard input, with a single protein sequence per line.
Example
$ cat input.txt
AALTERAALE
MDGTEKYIIVK
$ cat input.txt | prot2pept
AALTER
AALE
MDGTEK
YIIVK
Output
The prot2pept command outputs the split peptides to standard output. All peptides are separated by newlines.
Fasta support
The prot2pept command supports input in fasta format. This format consists of a fasta header (a line starting with a >), followed by one or more lines containing the protein sequence. When this format is detected, the command behaves slightly differently. The main difference is that newlines within a sequence are ignored: all lines between two fasta headers are treated as a single protein. In addition, the fasta headers are written to the output.
Example
$ cat input.txt
AALTE
AALRTER
$ cat input.txt | prot2pept
AALTE
AALR
TER
$ cat input.txt
> fasta header
AALTE
AALRTER
> other header
PEPTIDE
$ cat input.txt | prot2pept
> fasta header
AALTEAALR
TER
> other header
PEPTIDE
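The fasta behaviour described above can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the tool's own code: digest_fasta and TRYPSIN_SITE are hypothetical names, only the default trypsin rule is modelled, and any line starting with > is treated as a header.

```python
import re

# Zero-width split point: after K or R, unless the next residue is P.
# Assumption: this mirrors the default trypsin rule only.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?=[^P])")


def digest_fasta(lines):
    """Pass headers through; join the sequence lines of each record, then digest."""
    output = []
    sequence_parts = []

    def flush():
        # Digest the sequence lines collected since the last header, if any.
        if sequence_parts:
            protein = "".join(sequence_parts)
            output.extend(TRYPSIN_SITE.split(protein))
            sequence_parts.clear()

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            flush()
            output.append(line)  # headers are copied to the output
        else:
            sequence_parts.append(line)
    flush()
    return output


records = ["> fasta header", "AALTE", "AALRTER", "> other header", "PEPTIDE"]
print("\n".join(digest_fasta(records)))
```

Joining the lines before digesting is what makes AALTE and AALRTER come out as AALTEAALR and TER rather than being split at the line break. Splitting on an empty match requires Python 3.7 or later.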
Command-line options
--pattern / -p Specify cleavage pattern
By default, proteins are split by simulating a trypsin digest. This corresponds to splitting the input string with the regular expression ([KR])([^P]), that is, after every K or R that is not followed by a P. The --pattern option allows you to specify an alternative (Ruby-style) regular expression to split the sequences.
Example
$ echo "LGAARPLGAGLAKVIGAGIGIGK" | prot2pept
LGAARPLGAGLAK
VIGAGIGIGK
$ echo "LGAARPLGAGLAKVIGAGIGIGK" | prot2pept --pattern '([KR])([^V])'
LGAAR
PLGAGLAKVIGAGIGIGK
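To make the effect of a cleavage pattern concrete, the following Python sketch cuts a sequence between the two groups captured by the pattern. It illustrates the idea only and is not the implementation used by prot2pept: cleave is a hypothetical helper, Python's re module stands in for Ruby's regular expressions (the two agree for simple patterns like these), and the pattern is assumed to have two capture groups with a non-empty first group, as the default pattern does.

```python
import re


def cleave(protein, pattern=r"([KR])([^P])"):
    """Split a protein between capture group 1 and capture group 2 of the pattern."""
    regex = re.compile(pattern)
    peptides = []
    start = 0  # start of the peptide currently being built
    pos = 0    # position from which to search for the next cleavage site
    while True:
        match = regex.search(protein, pos)
        if match is None:
            break
        cut = match.end(1)  # cleave right after group 1
        peptides.append(protein[start:cut])
        start = cut
        # Resume at group 2 so that directly adjacent sites (e.g. "KK") are found.
        # Assumption: group 1 is non-empty, so cut always moves forward.
        pos = cut
    peptides.append(protein[start:])
    return peptides


sequence = "LGAARPLGAGLAKVIGAGIGIGK"
print(cleave(sequence))                    # default trypsin-like rule
print(cleave(sequence, r"([KR])([^V])"))   # alternative pattern from the example
```

With the default pattern the R followed by P is left intact and the cut happens after the K preceding V. With ([KR])([^V]) the situation is reversed, which matches the two outputs shown in the example.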
--help / -h Display help
This flag displays the help.