unipept taxonomy
Returns the taxonomic information for a given NCBI taxon identifier.
The unipept taxonomy command takes one or more NCBI taxon ids as input and returns taxonomic information about these taxa as output. All of this information is fetched by sending API requests to the Unipept server.
Input
The unipept taxonomy command expects NCBI taxon ids as input. The input can be supplied as command line arguments, in a file, or through standard input. If input is supplied through multiple sources at the same time, the sources take priority in that order: command line arguments first, then a file, then standard input.
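The priority rule can be pictured with a small shell function. This is a minimal sketch, not the Unipept CLI's own code: the function name collect_taxon_ids and its calling convention (an input file path, possibly empty, followed by any command line arguments) are hypothetical and only illustrate which source wins when several are present.

```shell
# Hypothetical helper illustrating the documented input priority;
# it is NOT part of the Unipept CLI.
# Usage: collect_taxon_ids FILE [ID...]   (pass "" as FILE when there is none)
collect_taxon_ids() {
  input_file=$1
  shift
  if [ "$#" -gt 0 ]; then
    # 1. command line arguments win: print one id per line
    printf '%s\n' "$@"
  elif [ -n "$input_file" ]; then
    # 2. otherwise read the file that would be given with --input
    cat -- "$input_file"
  else
    # 3. otherwise fall back to standard input
    cat
  fi
}
```

For example, collect_taxon_ids input.txt 817 prints only 817, because the arguments take precedence over the file.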
Command line arguments
If input is supplied using command line arguments, the taxon ids must be separated by spaces.
Example
$ unipept taxonomy 817 329854
taxon_id,taxon_name,taxon_rank
817,Bacteroides fragilis,species
329854,Bacteroides intestinalis,species
File input
Use the --input parameter to specify a file to use as input. If input is supplied using a file, a single taxon id per line is expected.
Example
$ cat input.txt
817
329854
$ unipept taxonomy --input input.txt
taxon_id,taxon_name,taxon_rank
817,Bacteroides fragilis,species
329854,Bacteroides intestinalis,species
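Because the file is expected to hold exactly one taxon id per line, it can help to check it before sending it to the server. The sketch below is a hypothetical pre-check built on grep (the name check_taxon_file is illustrative, not a CLI feature); it assumes a valid line consists of digits only, so blank lines, trailing spaces and Windows line endings are reported as well.

```shell
# Hypothetical pre-check (not part of the Unipept CLI): print every line of
# FILE that is not a single numeric taxon id, prefixed by its line number.
# Returns success only when all lines are valid.
check_taxon_file() {
  # -v selects non-matching lines, -n adds line numbers
  grep -n -v -E -e '^[0-9]+$' "$1"
  # grep exits with 1 when no line was selected, i.e. every line is valid;
  # 0 (bad lines found) and 2 (read error) both count as failure here
  [ "$?" -eq 1 ]
}
```

A typical use would be check_taxon_file input.txt && unipept taxonomy --input input.txt.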
Standard input
If the command is run without arguments and no file is specified, unipept taxonomy will read its input from standard input. When standard input is used, a single taxon id per line is expected.
Example
$ cat input.txt
817
329854
$ cat input.txt | unipept taxonomy
taxon_id,taxon_name,taxon_rank
817,Bacteroides fragilis,species
329854,Bacteroides intestinalis,species
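Since standard input is read one taxon id per line, a space-separated list (for example one copied from a command line) has to be split first. A minimal sketch with standard tools follows; the function name space_to_lines is hypothetical.

```shell
# Hypothetical helper: convert whitespace-separated taxon ids on standard
# input into the one-id-per-line layout that unipept taxonomy reads.
space_to_lines() {
  # turn every run of blanks into a single newline, then drop empty lines
  tr -s '[:blank:]' '\n' | sed '/^$/d'
}
```

With it, echo "817 329854" | space_to_lines | unipept taxonomy would feed the two ids in the expected layout.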
Output
The unipept taxonomy command outputs taxonomic information for a given list of taxon ids. By default, the taxon name and taxonomic rank of each taxon id are returned. By using the --all parameter, this can be supplemented with the full taxonomic lineage of each taxon. Consult the API documentation for a detailed list of output fields. A selection of output fields can be specified with the --select parameter. By default, output is generated in csv format. By using the --format parameter, the format can be changed to json or xml. The output can be written to a file or to standard output.
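The default csv output is easy to post-process with standard tools. As an illustration, the hypothetical function csv_column below prints a single named column; it assumes, as in the examples on this page, that no field value contains a comma or quote character, which may not hold for every taxon name.

```shell
# Hypothetical helper: print the column named COLUMN from csv output read on
# standard input (header line first). Assumes no quoted or comma-containing fields.
csv_column() {
  awk -F, -v col="$1" '
    NR == 1 {                          # locate the column in the header
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      next
    }
    idx { print $idx }                 # print it for every data row
  '
}
```

For example, unipept taxonomy 817 329854 | csv_column taxon_name would list only the two names. An unknown column name produces no output.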
File output
Use the --output parameter to specify an output file. If the file already exists, the output will be appended to the end of the file.
Example
$ unipept taxonomy --output output.txt 817 329854
$ cat output.txt
taxon_id,taxon_name,taxon_rank
817,Bacteroides fragilis,species
329854,Bacteroides intestinalis,species
Standard output
If no output file is specified, unipept taxonomy will write its output to standard output.
Example
$ unipept taxonomy 817 329854
taxon_id,taxon_name,taxon_rank
817,Bacteroides fragilis,species
329854,Bacteroides intestinalis,species
$ unipept taxonomy 817 329854 > output.txt
$ cat output.txt
taxon_id,taxon_name,taxon_rank
817,Bacteroides fragilis,species
329854,Bacteroides intestinalis,species
Fasta support
The unipept taxonomy command supports input (from any source) in a fasta-like format (for example generated by the prot2pept command). This format consists of a fasta header (a line starting with a >), followed by one or more lines containing one taxon id each. When this format is detected, the output will automatically include an extra information field containing the corresponding fasta header.
Example
$ cat input.txt
> header 1
817
329854
> header 2
817
$ unipept taxonomy --input input.txt
fasta_header,taxon_id,taxon_name,taxon_rank
> header 1,817,Bacteroides fragilis,species
> header 1,329854,Bacteroides intestinalis,species
> header 2,817,Bacteroides fragilis,species
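The way fasta headers are attached to taxon ids can be mimicked with awk: every id line is paired with the most recent header line. This is a sketch of the grouping logic only, under a hypothetical function name (fasta_pairs); it does not query the server or reproduce the other output fields, and it is not how the CLI itself is implemented.

```shell
# Hypothetical illustration (not part of the Unipept CLI): pair each taxon id
# in fasta-like input with the header that precedes it, as header,taxon_id.
fasta_pairs() {
  awk '
    /^>/ { header = $0; next }    # a new fasta header starts a group
    NF   { print header "," $0 }  # non-empty line: one taxon id
  '
}
```

Applied to the input.txt above, it yields the fasta_header and taxon_id columns of the example output.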
Command-line options
--input / -i Specify an input file
All Unipept CLI commands can process input from three sources: command line arguments, a file, or standard input. The optional --input option allows you to specify an input file. For unipept taxonomy, the file should contain a single taxon id per line.
Example
$ cat input.txt
817
329854
$ unipept taxonomy --input input.txt
taxon_id,taxon_name,taxon_rank
817,Bacteroides fragilis,species
329854,Bacteroides intestinalis,species
--output / -o Specify an output file
By default, the unipept commands write their output to standard output. The optional --output option allows you to specify a file to write the output to. If the file already exists, the output will be appended; if it doesn't, a new file will be created.
Example
$ unipept taxonomy --output output.txt 817 329854
$ cat output.txt
taxon_id,taxon_name,taxon_rank
817,Bacteroides fragilis,species
329854,Bacteroides intestinalis,species
--select / -s Specify the output fields
By default, the Unipept CLI commands output all information fields received from the Unipept server. The --select option allows you to control which fields are returned. Fields can be specified as a comma-separated list, or by using multiple --select options. A * can be used as a wildcard for field names. For example, --select peptide,taxon* will return the peptide field and all fields starting with taxon. Quote patterns that contain a * (for example '*rank') so that the shell does not expand them before they reach the command.
Example
$ unipept taxonomy --select taxon_id,taxon_name 817 329854
taxon_id,taxon_name
817,Bacteroides fragilis
329854,Bacteroides intestinalis
$ unipept taxonomy --select taxon_id --select '*rank' 817 329854
taxon_id,taxon_rank
817,species
329854,species
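To predict which columns a --select pattern keeps, the wildcard can be modelled with shell case patterns, assuming (as the examples suggest) that * matches any run of characters. The function select_fields below is a hypothetical emulation, not the CLI's implementation: it takes a csv header line and one or more --select values (each possibly comma-separated) and prints the matching field names in header order. The CLI's own column order is not guaranteed to follow this sketch.

```shell
# Hypothetical emulation of --select (not the Unipept CLI's code).
# Usage: select_fields HEADER PATTERN[,PATTERN...] [PATTERN...]
select_fields() (
  # the body runs in a subshell, so set -f and IFS stay local
  set -f                # no pathname expansion: patterns stay literal
  header=$1
  shift
  IFS=,
  patterns="$*"         # join all --select values with commas
  result=
  for field in $header; do
    for pattern in $patterns; do
      case $field in
        $pattern)       # unquoted, so * acts as a wildcard
          result=${result:+$result,}$field
          break
          ;;
      esac
    done
  done
  printf '%s\n' "$result"
)
```

For example, select_fields taxon_id,taxon_name,taxon_rank taxon_id '*rank' prints taxon_id,taxon_rank, matching the second example above.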
--format / -f Specify the output format
By default, the Unipept CLI commands return their output in csv format. The --format option allows you to select another format. Supported formats are csv, json, and xml.
Example
$ unipept taxonomy --format json 817 329854
[{"taxon_id":817,"taxon_name":"Bacteroides fragilis","taxon_rank":"species"},{"taxon_id":329854,"taxon_name":"Bacteroides intestinalis","taxon_rank":"species"}]
$ unipept taxonomy --format xml 817 329854
<results><result><taxon_id>817</taxon_id><taxon_name>Bacteroides fragilis</taxon_name><taxon_rank>species</taxon_rank></result><result><taxon_id>329854</taxon_id><taxon_name>Bacteroides intestinalis</taxon_name><taxon_rank>species</taxon_rank></result></results>
--all / -a Request additional information
By default, the Unipept CLI commands only request basic information from the Unipept server. By using the --all flag, you can request additional information fields such as the lineage of the returned taxa. You can use the --select option to select which fields are included in the output.
Performance penalty
Setting --all incurs a performance penalty because of additional database queries. Do not use this option unless the extra information fields are strictly needed.
Example
$ unipept taxonomy --all --select 'taxon_id,order*' 817 329854
taxon_id,order_id,order_name
817,171549,Bacteroidales
329854,171549,Bacteroidales
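Given the performance penalty, a script could request --all only when a selection refers to fields beyond the basic taxon_id, taxon_name and taxon_rank. The helper needs_all below is a hypothetical heuristic, not part of the CLI: it treats any --select pattern that matches none of the three basic fields as needing --all. A pattern such as * also matches the basic fields, so the heuristic does not add --all for it, even though --all would add more columns.

```shell
# Hypothetical heuristic (not part of the Unipept CLI): succeed when at least
# one --select pattern matches none of the basic fields, which suggests that
# the extra --all fields are needed.
needs_all() (
  # the body runs in a subshell, so set -f, IFS and exit stay local
  set -f                       # keep patterns such as order* literal
  IFS=,                        # accept comma-separated pattern lists
  for pattern in $*; do
    matched=no
    for field in taxon_id taxon_name taxon_rank; do
      case $field in
        $pattern) matched=yes; break ;;
      esac
    done
    # a pattern outside the basic fields means --all is required
    if [ "$matched" = no ]; then exit 0; fi
  done
  exit 1                       # only basic fields were selected
)
```

For example, needs_all 'taxon_id,order*' succeeds, consistent with the example above using --all, whereas needs_all taxon_id '*rank' fails, so that selection can be served without the extra queries.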
--help / -h Display help
This flag displays the help.