The umgap taxa2freq command creates a frequency table from a list of taxa, aggregated at a given target rank (species by default).

Usage

The input is read from standard input, one taxon ID per line. Each taxon that is more specific than the target rank is counted towards its ancestor at the target rank. Each taxon that is less specific than the target rank is counted towards the root. The command outputs a TSV table with three columns: the count, the taxon ID and the taxon name.

The taxonomy to use is passed as an argument to the command (taxons.tsv in the examples below). It is a preprocessed version of the NCBI taxonomy.

$ cat input.txt
9606
9606
2759
9606
9606
9606
9606
9606
9606
9606
8287
$ umgap taxa2freq taxons.tsv < input.txt
2	1	root
9	9606	Homo sapiens
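
The counts above follow from the aggregation rule described earlier. The Python sketch below illustrates that rule only; it is not how umgap is implemented. It assumes a toy in-memory taxonomy with most intermediate ranks omitted, which is not the format of the taxonomy file umgap reads, and the names TOY_TAXONOMY, ancestor_at_rank and frequency_table are hypothetical. It also assumes that a taxon with no ancestor at the target rank is counted towards the root.

```python
from collections import Counter

ROOT = 1

# Toy taxonomy (assumption): taxon ID -> (parent ID, rank, name).
# Intermediate ranks are omitted for brevity.
TOY_TAXONOMY = {
    1: (1, "no rank", "root"),
    2759: (1, "superkingdom", "Eukaryota"),
    7711: (2759, "phylum", "Chordata"),
    8287: (7711, "superclass", "Sarcopterygii"),
    9606: (8287, "species", "Homo sapiens"),
}


def ancestor_at_rank(taxon, rank, taxonomy=TOY_TAXONOMY):
    """Return the taxon itself or its nearest ancestor at `rank`, else root."""
    current = taxon
    while current != ROOT:
        parent, current_rank, _name = taxonomy[current]
        if current_rank == rank:
            return current
        current = parent  # walk one step up the lineage
    return ROOT


def frequency_table(taxa, rank="species", taxonomy=TOY_TAXONOMY):
    """Count taxa per target-rank ancestor as (count, taxon ID, name) rows."""
    counts = Counter(ancestor_at_rank(t, rank, taxonomy) for t in taxa)
    return [(n, t, taxonomy[t][2]) for t, n in sorted(counts.items())]


taxa = [9606, 9606, 2759, 9606, 9606, 9606, 9606, 9606, 9606, 9606, 8287]
for count, taxon, name in frequency_table(taxa):
    print(count, taxon, name, sep="\t")
```

Here 2759 and 8287 are both less specific than species, so each adds one to the root row, while the nine occurrences of 9606 are already at the target rank.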

The -r option changes the target rank from the default (species) to any named rank.

$ umgap taxa2freq -r phylum taxons.tsv < input.txt
10	7711	Chordata

Options

-h / --help
    Prints help information
-V / --version
    Prints version information
-f / --frequency f
    The minimum frequency to be reported [default: 1]
-r / --rank r
    The rank to show [default: species]
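
Downstream scripts often need the frequency table as data rather than text. The sketch below is a minimal Python example, assuming the three tab-separated columns shown in the examples (count, taxon ID, name) and no header row. The function name parse_frequency_table and its min_count parameter are hypothetical; min_count mirrors what -f is described as doing, on the assumption that rows with a count of at least the threshold are kept.

```python
def parse_frequency_table(text, min_count=1):
    """Parse count<TAB>taxon ID<TAB>name rows, keeping counts >= min_count."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first two tabs only, so names stay intact.
        count, taxon_id, name = line.split("\t", 2)
        if int(count) >= min_count:
            rows.append((int(count), int(taxon_id), name))
    return rows


# Output copied from the species-level example.
output = "2\t1\troot\n9\t9606\tHomo sapiens\n"
print(parse_frequency_table(output, min_count=5))
# [(9, 9606, 'Homo sapiens')]
```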