Usage

Basic Usage

As covered in Quickstart, the basic CAI() function is fast and easy to use: import it and get to your science. It also accepts Biopython Seq objects in place of plain strings:

>>> from CAI import CAI
>>> from Bio.Seq import Seq
>>> CAI(Seq("AAT"), reference=[Seq("AAC")])
0.5

The CLI is equally easy to use. For example, to find the CAI of the native GFP gene with respect to the highly expressed genes in E. coli, only one command is required:

$ CAI -r example_seqs/ecol.heg.fasta -s example_seqs/gfp.fasta
0.3753543123685772

Note

Both CAI and cai are valid commands.

More example sequences can be found in the example_seqs directory on GitHub.

Advanced Usage

If you have already computed the weights or RSCU values of the reference set, you can pass either one to CAI() in place of the reference sequences. Either must be a dictionary that contains a value for every codon.
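
Before handing over a hand-built dictionary, it can help to see which codons it lacks. The following is a minimal sketch, not part of the package: missing_codons() is a hypothetical helper, and the values in partial are made up. It only reports which of the 64 DNA triplets have no entry; whether stop codons need entries is not stated here, so the helper does not reject anything.

```python
from itertools import product

# All 64 DNA triplets, in alphabetical order.
ALL_CODONS = ["".join(triplet) for triplet in product("ACGT", repeat=3)]


def missing_codons(table):
    """Return the triplets that have no entry in `table` (hypothetical helper)."""
    return [codon for codon in ALL_CODONS if codon not in table]


# A deliberately incomplete dictionary with made-up values.
partial = {"AAA": 1.0, "AAC": 0.5}
print(len(missing_codons(partial)))  # 62
```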

To calculate RSCU without calculating CAI, you can use RSCU(). RSCU()’s only required argument is a list of sequences.

Similarly, to calculate the weights of reference sequences, you can use relative_adaptiveness(). relative_adaptiveness() takes either a list of sequences as the sequences parameter or a dictionary of RSCUs as the RSCUs parameter.
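
To make the relationship between RSCU values and weights concrete, here is a minimal pure-Python sketch of the usual definitions, restricted to a single synonymous family. It is an illustration only, not the package's implementation: toy_rscu(), toy_relative_adaptiveness(), and the one-entry SYNONYMOUS_FAMILIES table are hypothetical, and the pseudocount of 0.5 for unseen codons is an assumption chosen because it is consistent with the 0.5 result shown under Basic Usage.

```python
from collections import Counter

# Hypothetical one-family codon table: both codons encode asparagine.
SYNONYMOUS_FAMILIES = {"N": ["AAT", "AAC"]}


def toy_rscu(sequences, pseudocount=0.5):
    """RSCU = a codon's count divided by the mean count in its synonymous family."""
    counts = Counter()
    for seq in sequences:
        # Split each sequence into non-overlapping triplets, ignoring a trailing partial codon.
        counts.update(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    rscu = {}
    for codons in SYNONYMOUS_FAMILIES.values():
        # Assumption: unseen codons get a pseudocount so that no weight ends up zero.
        observed = {c: counts[c] or pseudocount for c in codons}
        mean = sum(observed.values()) / len(codons)
        for c in codons:
            rscu[c] = observed[c] / mean
    return rscu


def toy_relative_adaptiveness(rscu):
    """Weight = a codon's RSCU divided by the largest RSCU in its family."""
    weights = {}
    for codons in SYNONYMOUS_FAMILIES.values():
        best = max(rscu[c] for c in codons)
        for c in codons:
            weights[c] = rscu[c] / best
    return weights


rscu = toy_rscu(["AAC"])
weights = toy_relative_adaptiveness(rscu)
print(rscu)
print(weights)
```

Under these assumptions the weight of AAT comes out at 0.5 and the weight of AAC at 1.0. A one-codon sequence has a CAI equal to that codon's weight, which agrees with the CAI(Seq("AAT"), reference=[Seq("AAC")]) example in Basic Usage.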

Warning

If you are computing large numbers of CAIs with the same reference sequences, first calculate their weights with relative_adaptiveness() and then pass that to CAI() to eliminate redundant computation.

So, to modify the example in Quickstart:

>>> from CAI import CAI, relative_adaptiveness
>>> sequences = ["ATGTTT...", "ATGCGC...", ...]
>>> weights = relative_adaptiveness(sequences=sequences)
>>> CAI("ATG...", weights=weights)
0.24948128951724224

These are exactly equivalent:

>>> CAI("ATG...", weights=weights) == CAI("ATG...", reference=sequences)
True

except the former will be faster if you’re using the same weights repeatedly.
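
The reason precomputed weights save time can be sketched in pure Python. Assuming CAI is the geometric mean of the per-codon weights (the standard definition), scoring a sequence against an existing weights dictionary needs only lookups, while passing reference sequences means re-counting them on every call. toy_cai() and the two-codon weights dictionary below are hypothetical stand-ins, not the package's code.

```python
import math


def toy_cai(sequence, weights):
    """Geometric mean of the weights of the codons in `sequence`.

    Assumes len(sequence) is a multiple of 3 and every codon is in `weights`.
    """
    codons = [sequence[i:i + 3] for i in range(0, len(sequence), 3)]
    # Sum logs instead of multiplying to avoid underflow on long sequences.
    log_sum = sum(math.log(weights[c]) for c in codons)
    return math.exp(log_sum / len(codons))


# Hypothetical weights for one synonymous family, computed once up front.
weights = {"AAC": 1.0, "AAT": 0.5}

# Reuse the same dictionary for every query; no reference set is re-counted.
queries = ["AACAAC", "AATAAC", "AATAAT"]
scores = {seq: toy_cai(seq, weights) for seq in queries}
for seq, score in scores.items():
    print(seq, round(score, 4))
```

With the real package the same pattern applies: call relative_adaptiveness() once, then pass the result as weights to each CAI() call.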

Other Genetic Codes

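All functions in CAI support an optional genetic_code parameter, which defaults to 11. This is NCBI translation table 11, the bacterial, archaeal, and plant plastid code; its codon-to-amino-acid assignments match those of the standard code (table 1).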

In the CLI, there is an optional “-g” parameter that changes the genetic code:

$ CAI -s sequence.fasta -r reference_sequences.fasta -g 22
0.25135779681923687