Usage
Basic Usage
As covered in Quickstart, the basic CAI() function is fast and easy. Simply import it and get to your science. Note that it also plays nicely with Biopython Seq objects:
>>> from CAI import CAI
>>> from Bio.Seq import Seq
>>> CAI(Seq("AAT"), reference=[Seq("AAC")])
0.5
The CLI is equally easy to use. For example, to find the CAI of the native GFP gene with respect to the highly expressed genes in E. coli, only one command is required:
$ CAI -r example_seqs/ecol.heg.fasta -s example_seqs/gfp.fasta
0.3753543123685772
Note
Both CAI and cai are valid commands.
More example sequences can be found in the example_seqs directory on GitHub.
Advanced Usage
If you have already computed the weights or RSCU values of the reference set, you can pass either one to CAI() in place of the reference sequences. Each must be formatted as a dictionary and contain a value for every codon.
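Because a precomputed dictionary must cover every codon, it can be worth checking one before handing it to CAI(). The following is a minimal sketch, not part of the CAI package: the helper name missing_codons is hypothetical, and it assumes that "every codon" means all 64 DNA triplets written as uppercase strings.

```python
from itertools import product

# Assumption: "every codon" means all 64 uppercase DNA triplets.
ALL_CODONS = {"".join(bases) for bases in product("ACGT", repeat=3)}


def missing_codons(table):
    """Return the sorted list of codons that have no entry in `table`."""
    return sorted(ALL_CODONS - set(table))


# A dictionary with a value for every codon passes the check ...
complete = dict.fromkeys(ALL_CODONS, 1.0)
# ... while a partial dictionary reports what is absent.
partial = {"AAT": 0.5, "AAC": 1.0}

print(missing_codons(complete))      # no codons missing
print(len(missing_codons(partial)))  # number of codons still needed
```

An empty result means the dictionary has an entry for each triplet; anything else lists the codons that still need values.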
To calculate RSCU without calculating CAI, you can use RSCU(). Its only required argument is a list of sequences.
Similarly, to calculate the weights of reference sequences, you can use relative_adaptiveness(). It takes either a list of sequences as the sequences parameter or a dictionary of RSCUs as the RSCUs parameter.
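To make the relationship between the two quantities concrete, here is a minimal sketch of the conventional definitions: a codon's RSCU is its count divided by the mean count of its synonymous family, and its weight is its RSCU divided by the largest RSCU in that family. This is not the package's implementation. The names rscu_sketch and weights_sketch, the two-family codon table, and the pseudocount of 0.5 for unobserved codons are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical two-family codon table for illustration only; a real
# calculation covers every amino acid in the chosen genetic code.
SYNONYMOUS = {
    "N": ["AAT", "AAC"],  # asparagine
    "K": ["AAA", "AAG"],  # lysine
}


def split_codons(seq):
    """Split a sequence into consecutive three-letter codons."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def rscu_sketch(sequences, pseudocount=0.5):
    """Codon count divided by the mean count of its synonymous family."""
    counts = Counter(c for s in sequences for c in split_codons(s.upper()))
    result = {}
    for family in SYNONYMOUS.values():
        # Assumption: unobserved codons get a pseudocount to avoid zeros.
        observed = {c: counts[c] or pseudocount for c in family}
        mean = sum(observed.values()) / len(family)
        for codon in family:
            result[codon] = observed[codon] / mean
    return result


def weights_sketch(rscus):
    """Each RSCU divided by the largest RSCU in its synonymous family."""
    weights = {}
    for family in SYNONYMOUS.values():
        best = max(rscus[c] for c in family)
        for codon in family:
            weights[codon] = rscus[codon] / best
    return weights


rscus = rscu_sketch(["AAC"])
weights = weights_sketch(rscus)
print(weights["AAT"], weights["AAC"])
```

With the single reference sequence "AAC", the sketch gives AAT half the weight of AAC, which is consistent with the 0.5 returned in the Basic Usage example.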
Warning
If you are computing large numbers of CAIs with the same reference sequences, first calculate their weights with relative_adaptiveness() and then pass those weights to CAI() to eliminate redundant computation.
So, to modify the example in Quickstart:
>>> from CAI import CAI, relative_adaptiveness
>>> sequences=["ATGTTT...", "ATGCGC...",...]
>>> weights = relative_adaptiveness(sequences=sequences)
>>> CAI("ATG...", weights=weights)
0.24948128951724224
These are exactly equivalent:
>>> CAI("ATG...", weights=weights) == CAI("ATG...", reference=sequences)
True
except the former will be faster if you’re using the same weights repeatedly.
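The saving comes from the fact that, once the weights exist, scoring a sequence only requires looking up its codons. The sketch below illustrates this with the conventional definition of CAI as the geometric mean of codon weights. It is not the package's code: cai_from_weights is a hypothetical helper, the weights are made-up values, and skipping codons that are absent from the dictionary is an assumption.

```python
import math


def cai_from_weights(sequence, weights):
    """Geometric mean of the weights of the codons in `sequence`."""
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Assumption: codons without a weight are skipped instead of raising.
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical weights, computed once and reused for every query.
weights = {"AAT": 0.5, "AAC": 1.0}
queries = ["AAT", "AAC", "AATAAC"]
scores = [cai_from_weights(q, weights) for q in queries]
print(scores)
```

Each call here touches only the query sequence, so the cost of processing the reference set is paid once rather than once per query.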
Other Genetic Codes
All functions in CAI support an optional genetic_code parameter, which is set by default to 11 (NCBI translation table 11, the bacterial, archaeal, and plant plastid code).
In the CLI, there is an optional “-g” parameter that changes the genetic code:
$ CAI -s sequence.fasta -r reference_sequences.fasta -g 22
0.25135779681923687