umgap taxonomy
Adds taxonomic information to a stream of NCBI taxon IDs.
The `umgap taxonomy` command takes one or more NCBI taxon IDs as input, looks them up in a taxonomy and outputs more information about them in TSV format.
Usage
The input is given on standard input and may be any sequence of FASTA headers and/or lines containing a single NCBI taxon ID. A TSV header is printed to standard output first. The FASTA headers (any line starting with a `>`) are copied over unchanged. Each taxon ID on the other lines is looked up in the taxonomy, and the ID, name and rank of the taxon are written out separated by tabs.
A taxonomy file must be passed as an argument.
```
$ cat input.fa
2026807
888268
186802
1598
1883
$ umgap taxonomy taxons.tsv < input.fa
taxon_id	taxon_name	taxon_rank
2026807	Zetaproteobacteria bacterium	species
888268	Dichanthelium oligosanthes	species
186802	Clostridiales	order
1598	Lactobacillus reuteri	species
1883	Streptomyces	genus
```
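To make the lookup concrete, the behaviour described above can be sketched as a small awk-based shell function. This is not how UMGAP implements the command: `taxonomy_lookup` is a hypothetical name, and the sketch assumes the taxonomy file is a TSV whose first three columns are the taxon ID, name and rank (an assumption about the file layout that this page does not specify). IDs that are missing from the taxonomy are simply skipped here; the real tool may handle them differently.

```shell
# Hypothetical sketch of what `umgap taxonomy` does; NOT UMGAP's implementation.
# ASSUMPTION: the taxonomy file is tab-separated, has no header line, and holds
# the taxon ID, name and rank in its first three columns.
taxonomy_lookup() {
    awk -F '\t' -v OFS='\t' '
        # Print the TSV header before any input is read.
        BEGIN { print "taxon_id", "taxon_name", "taxon_rank" }
        # First file (the taxonomy): remember the name and rank of every ID.
        NR == FNR { name[$1] = $2; rank[$1] = $3; next }
        # Standard input: FASTA headers are copied over unchanged.
        /^>/ { print; next }
        # Any other line holds one taxon ID; unknown IDs are skipped.
        ($1 in name) { print $1, name[$1], rank[$1] }
    ' "$1" -
}
```

It is called like the real command, for example `taxonomy_lookup taxons.tsv < input.fa`. Loading the whole taxonomy into an awk array first keeps each later lookup a constant-time hash access.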
The `-H` flag can be used to suppress the TSV header, for instance when dealing with FASTA input.
```
$ cat input2.fa
>header1
2026807
888268
186802
1598
1883
$ umgap taxonomy -H taxons.tsv < input2.fa
>header1
2026807	Zetaproteobacteria bacterium	species
888268	Dichanthelium oligosanthes	species
186802	Clostridiales	order
1598	Lactobacillus reuteri	species
1883	Streptomyces	genus
```
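Because the `-H` output interleaves FASTA headers with TSV rows, downstream processing usually has to track which record each row belongs to. As an illustration, the hypothetical shell function below (not part of UMGAP) collapses that stream into one line per record, joining the taxon names with `;`. It assumes every non-header line is a `taxon_id`, `taxon_name`, `taxon_rank` row, as in the example above.

```shell
# Hypothetical helper, not part of UMGAP: turns the interleaved output of
# `umgap taxonomy -H` into one line per FASTA record, of the form
#   >header<TAB>name1;name2;...
# Rows that appear before the first FASTA header are ignored.
collapse_records() {
    awk -F '\t' '
        /^>/ {
            # A new record starts: flush the previous one, if any.
            if (header != "") print header "\t" names
            header = $0
            names = ""
            next
        }
        # Append the taxon name (second column) of this row to the record.
        header != "" { names = (names == "" ? $2 : names ";" $2) }
        # Flush the last record.
        END { if (header != "") print header "\t" names }
    '
}
```

For instance, `umgap taxonomy -H taxons.tsv < input2.fa | collapse_records` would turn the output above into a single line: `>header1`, a tab, and the five taxon names joined by semicolons.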
The `-a` flag can be used to request a complete ranked lineage.
```
$ cat input3.fa
888268
$ umgap taxonomy -a taxons.tsv < input3.fa
taxon_id	taxon_name	taxon_rank	superkingdom_id	superkingdom_name	kingdom_id	kingdom_name	subkingdom_id	subkingdom_name	superphylum_id	superphylum_name	phylum_id	phylum_name	subphylum_id	subphylum_name	superclass_id	superclass_name	class_id	class_name	subclass_id	subclass_name	infraclass_id	infraclass_name	superorder_id	superorder_name	order_id	order_name	suborder_id	suborder_name	infraorder_id	infraorder_name	parvorder_id	parvorder_name	superfamily_id	superfamily_name	family_id	family_name	subfamily_id	subfamily_name	tribe_id	tribe_name	subtribe_id	subtribe_name	genus_id	genus_name	subgenus_id	subgenus_name	species_group_id	species_group_name	species_subgroup_id	species_subgroup_name	species_id	species_name	subspecies_id	subspecies_name	varietas_id	varietas_name	forma_id	forma_name
888268	Dichanthelium oligosanthes	species	2759	Eukaryota	33090	Viridiplantae					35493	Streptophyta	131221	Streptophytina			3398	Magnoliopsida	1437197	Petrosaviidae					38820	Poales									4479	Poaceae	147369	Panicoideae	147428	Paniceae	1648011	Dichantheliinae	161620	Dichanthelium							888268	Dichanthelium oligosanthes						
```
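With `-a` every row has 59 columns, many of them empty, so picking fields by position is error-prone. A small awk-based shell function can select columns by their header names instead. `select_columns` is a hypothetical helper written for illustration, not part of UMGAP, and it relies on the first input line being the TSV header, so it is not meant to be combined with `-H`.

```shell
# Hypothetical helper, not part of UMGAP: prints the named columns of a TSV
# stream (header line included), in the order given on the command line.
# Column names must not contain spaces; an unknown name aborts with status 1.
select_columns() {
    awk -F '\t' -v OFS='\t' -v wanted="$*" '
        NR == 1 {
            # Map each header name to its column position.
            for (i = 1; i <= NF; i++) pos[$i] = i
            n = split(wanted, names, " ")
            if (n == 0) { print "no columns requested" > "/dev/stderr"; exit 1 }
            for (j = 1; j <= n; j++) {
                if (!(names[j] in pos)) {
                    print "unknown column: " names[j] > "/dev/stderr"
                    exit 1
                }
                col[j] = pos[names[j]]
            }
        }
        # Emit the selected fields of every line; empty fields stay empty.
        {
            line = $(col[1])
            for (j = 2; j <= n; j++) line = line OFS $(col[j])
            print line
        }
    '
}
```

Used on the example above, `umgap taxonomy -a taxons.tsv < input3.fa | select_columns taxon_id order_name genus_name` would keep only those three columns. Splitting on a single tab (rather than on whitespace) matters here, because consecutive tabs mark ranks that are absent from the lineage.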
- `-a` / `--all`: Shows the full lineage of a taxon
- `-h` / `--help`: Prints help information
- `-H` / `--no-header`: Omits the TSV header
- `-V` / `--version`: Prints version information