The umgap taxa2freq command creates a frequency table of a list of taxa at a given target rank (species by default). When invoked with file arguments, it adds one count column per file.

Usage

Input is read from standard input or from one or more file arguments, with one taxon ID per line. Each taxon that is more specific than the target rank is counted towards its ancestor at the target rank; each taxon that is less specific than the target rank is counted towards the root. The command outputs a CSV table listing each taxon ID, its name and its count, with one count column per input.
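A minimal Python sketch of this counting rule follows. It assumes the rule can be implemented by walking up a taxon's lineage until a node at the target rank (or the root) is reached; this is an illustration, not umgap's own implementation. The lineage, the ranks and the names lift and taxa2freq are hypothetical, and the lineage is heavily simplified.

```python
from collections import Counter

# HYPOTHETICAL, heavily simplified taxonomy: the real NCBI lineage of
# Homo sapiens has many more intermediate nodes.
ROOT = 1
PARENT = {1: 1, 2759: 1, 7711: 2759, 8287: 7711, 9606: 8287}
RANK = {
    1: "no rank",
    2759: "superkingdom",
    7711: "phylum",
    8287: "superclass",
    9606: "species",
}


def lift(taxon: int, target_rank: str) -> int:
    """Return the ancestor of `taxon` at `target_rank`, or ROOT if there is none.

    A taxon at the target rank maps to itself; a taxon above the target rank
    never meets that rank on its way up, so it ends at ROOT.
    """
    node = taxon
    while node != ROOT:
        if RANK[node] == target_rank:
            return node
        node = PARENT[node]
    return ROOT


def taxa2freq(taxa, target_rank: str = "species") -> Counter:
    """Count every taxon towards the node returned by lift()."""
    return Counter(lift(taxon, target_rank) for taxon in taxa)


# The eleven IDs from input.txt in the example below
example = [9606, 9606, 2759] + [9606] * 7 + [8287]
print(sorted(taxa2freq(example).items()))            # [(1, 2), (9606, 9)]
print(sorted(taxa2freq(example, "phylum").items()))  # [(1, 1), (7711, 10)]
```

Walking up the lineage avoids having to order all rank names: a taxon above the target rank simply never encounters it and falls through to the root.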

The taxonomy to use is passed as the first argument to this command (taxons.tsv in the examples below). It is a preprocessed version of the NCBI taxonomy.
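The layout of the preprocessed taxonomy file is not described on this page. The sketch below assumes, purely for illustration, a tab-separated file whose first four fields are taxon ID, name, rank and parent ID, with any further fields ignored; check the actual file before relying on this. The function name load_taxonomy is hypothetical.

```python
def load_taxonomy(lines):
    """Parse taxonomy lines into three dicts keyed by taxon ID.

    ASSUMPTION: each line holds tab-separated fields in the order
    id, name, rank, parent id; any further fields are ignored.
    """
    names, ranks, parents = {}, {}, {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        taxon = int(fields[0])
        names[taxon] = fields[1]
        ranks[taxon] = fields[2]
        parents[taxon] = int(fields[3])
    return names, ranks, parents


# Hypothetical rows in the assumed layout; with a real file you would pass
# open("taxons.tsv", encoding="utf-8") instead of this list.
sample = [
    "1\troot\tno rank\t1\n",
    "9606\tHomo sapiens\tspecies\t9605\n",
]
names, ranks, parents = load_taxonomy(sample)
print(names[9606], ranks[9606], parents[9606])  # Homo sapiens species 9605
```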

$ cat input.txt
9606
9606
2759
9606
9606
9606
9606
9606
9606
9606
8287
$ umgap taxa2freq taxons.tsv < input.txt
taxon id,taxon name,stdin
1,root,2
9606,Homo sapiens,9
$ umgap taxa2freq taxons.tsv input.txt input.txt
taxon id,taxon name,input.txt,input.txt
1,root,2,2
9606,Homo sapiens,9,9
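The multi-column output can be sketched as follows, assuming the per-input counts have already been lifted to the target rank. The columns are passed as a list of (label, counts) pairs rather than a dict, so that the same file name can appear twice as in the example above. Writing 0 for a taxon missing from one input, and sorting rows by taxon ID, are assumptions inferred from the examples; frequency_table is a hypothetical name.

```python
import csv
import io
from collections import Counter


def frequency_table(columns, names):
    """Render (label, Counter) pairs as a CSV table with one count column each.

    Taxa missing from an input are written with count 0 (an assumption),
    and rows are sorted by taxon ID.
    """
    labels = [label for label, _ in columns]
    counts = [counter for _, counter in columns]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["taxon id", "taxon name", *labels])
    for taxon in sorted(set().union(*counts)):
        # Counter returns 0 for absent keys, so missing taxa get a 0 count.
        writer.writerow([taxon, names[taxon], *(c[taxon] for c in counts)])
    return out.getvalue()


names = {1: "root", 9606: "Homo sapiens"}
counts = Counter({1: 2, 9606: 9})  # counts for input.txt after lifting to species
print(frequency_table([("input.txt", counts), ("input.txt", counts)], names), end="")
```

Run as shown, this prints the same three lines as the second example.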

With the -r option, the target rank can be changed from the default (species) to any named rank.

$ umgap taxa2freq -r phylum taxons.tsv < input.txt
taxon id,taxon name,stdin
1,root,1
7711,Chordata,10

Options

-h / --help
Prints help information
-V / --version
Prints version information
-f / --frequency f
The minimum frequency a taxon must have to be reported [default: 1]
-r / --rank r
The target rank to count towards [default: species]
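The -f option is documented only as a minimum frequency. A minimal sketch for a single count column, assuming the threshold is inclusive (a count equal to f is still reported); how the threshold applies when there are several columns is not covered here. The name filter_min_frequency is hypothetical.

```python
from collections import Counter


def filter_min_frequency(counts: Counter, min_freq: int = 1) -> Counter:
    """Keep only taxa whose count reaches `min_freq` (assumed inclusive)."""
    return Counter({taxon: n for taxon, n in counts.items() if n >= min_freq})


counts = Counter({1: 2, 9606: 9})  # species-level counts from the first example
print(sorted(filter_min_frequency(counts).items()))     # [(1, 2), (9606, 9)]
print(sorted(filter_min_frequency(counts, 3).items()))  # [(9606, 9)]
```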