API Reference

RSCU(sequences, genetic_code=11)

Calculates the relative synonymous codon usage (RSCU) for a set of sequences.

RSCU is “the observed frequency of [a] codon divided by the frequency expected under the assumption of equal usage of the synonymous codons for an amino acid” (page 1283).

In math terms, it is

\[\text{RSCU}_{ij} = \frac{X_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i}X_{ij}}\]

“where \(X_{ij}\) is the number of occurrences of the \(j\)th codon for the \(i\)th amino acid, and \(n_i\) is the number (from one to six) of alternative codons for the \(i\)th amino acid” (page 1283).
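The formula above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the `SYNONYMS` mapping is a hypothetical, hand-written subset of a translation table, `rscu_sketch` is an illustrative name, and zero counts are given no special treatment.

```python
from collections import Counter

# Hypothetical subset of synonymous codon families (assumption for this
# sketch); a real translation table covers every amino acid.
SYNONYMS = {
    "F": ["TTT", "TTC"],
    "K": ["AAA", "AAG"],
}


def rscu_sketch(sequences, synonyms=SYNONYMS):
    """Return {codon: RSCU} for the codon families listed in `synonyms`."""
    counts = Counter()
    for seq in sequences:
        # Split each sequence into non-overlapping triplets.
        counts.update(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    result = {}
    for codons in synonyms.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the reference set
        expected = total / len(codons)  # (1 / n_i) * sum_j X_ij
        for c in codons:
            result[c] = counts[c] / expected  # X_ij / expected
    return result


# Codons in this sequence: TTT, TTT, TTC, AAA
example = rscu_sketch(["TTTTTTTTCAAA"])
```

Within each family the RSCU values sum to the number of synonymous codons, so a value above 1 marks a codon used more often than expected under equal usage.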

Parameters:
  • sequences (list) – The reference set of sequences.
  • genetic_code (int, optional) – The translation table to use. Defaults to 11, the NCBI translation table for bacteria, archaea, and plant plastids (the standard genetic code is table 1).
Returns:

The relative synonymous codon usage.

Return type:

dict

Raises:

ValueError – When an invalid sequence is provided or when sequences is not a list.

relative_adaptiveness(sequences=None, RSCUs=None, genetic_code=11)

Calculates the relative adaptiveness/weight of codons.

The relative adaptiveness is “the frequency of use of that codon compared to the frequency of the optimal codon for that amino acid” (page 1283).

In math terms, \(w_{ij}\), the weight for the \(j\)th codon for the \(i\)th amino acid, is

\[w_{ij} = \frac{\text{RSCU}_{ij}}{\text{RSCU}_{imax}}\]

where “\(\text{RSCU}_{imax}\) [is] the RSCU… for the most frequently used codon for the \(i\)th amino acid” (page 1283).
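As a rough illustration of the weight formula, the sketch below divides each RSCU value by the largest RSCU in its synonymous family. The `SYNONYMS` mapping and the name `weights_sketch` are assumptions made for this example and are not part of the library.

```python
# Hypothetical subset of synonymous codon families (assumption for this sketch).
SYNONYMS = {
    "F": ["TTT", "TTC"],
    "K": ["AAA", "AAG"],
}


def weights_sketch(rscus, synonyms=SYNONYMS):
    """Return {codon: w_ij} where w_ij = RSCU_ij / RSCU_imax."""
    weights = {}
    for codons in synonyms.values():
        present = [c for c in codons if c in rscus]
        if not present:
            continue  # no RSCU values supplied for this amino acid
        rscu_max = max(rscus[c] for c in present)
        if rscu_max == 0:
            continue  # avoid dividing by zero on degenerate input
        for c in present:
            weights[c] = rscus[c] / rscu_max
    return weights


example_weights = weights_sketch(
    {"TTT": 4 / 3, "TTC": 2 / 3, "AAA": 2.0, "AAG": 0.0}
)
```

The most frequently used codon of each amino acid always receives a weight of 1, and every other codon in the family falls between 0 and 1.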

Parameters:
  • sequences (list, optional) – The reference set of sequences.
  • RSCUs (dict, optional) – The RSCU of the reference set.
  • genetic_code (int, optional) – The translation table to use. Defaults to 11, the NCBI translation table for bacteria, archaea, and plant plastids (the standard genetic code is table 1).

Note

Either sequences or RSCUs is required.

Returns:

A mapping between each codon and its weight/relative adaptiveness.

Return type:

dict

Raises:

CAI(sequence, weights=None, RSCUs=None, reference=None, genetic_code=11)

Calculates the codon adaptation index (CAI) of a DNA sequence.

CAI is “the geometric mean of the RSCU values… corresponding to each of the codons used in that gene, divided by the maximum possible CAI for a gene of the same amino acid composition” (page 1285).

In math terms, it is

\[\text{CAI} = \left(\prod_{k=1}^{L}w_k\right)^{\frac{1}{L}}\]

where \(w_k\) is the relative adaptiveness of the \(k\)th codon in the gene and \(L\) is the number of codons in the gene (page 1286).
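The geometric mean above can be sketched as follows. This is an illustrative stand-alone function, not the library's implementation: `cai_sketch` and the example weights are hypothetical, every codon is assumed to have a strictly positive weight, and codons without synonyms are not treated specially.

```python
import math


def cai_sketch(sequence, weights):
    """Geometric mean of the weights of the codons in `sequence`.

    Assumes every weight is strictly positive (math.log rejects zero).
    """
    # Split the sequence into non-overlapping triplets.
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    # Summing logarithms avoids underflow from multiplying many values
    # below 1. A codon missing from `weights` raises KeyError here.
    log_sum = sum(math.log(weights[c]) for c in codons)
    return math.exp(log_sum / len(codons))


# Hypothetical weights chosen for illustration only.
example_weights = {"TTT": 1.0, "TTC": 0.5, "AAA": 1.0, "AAG": 0.25}

# Codons: TTT, TTC, AAA, AAG
example_cai = cai_sketch("TTTTTCAAAAAG", example_weights)
```

Here the product of the four weights is 0.125, so the result is its fourth root. A sequence built only from codons with weight 1 would score exactly 1.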

Parameters:
  • sequence (str) – The DNA sequence to calculate the CAI for.
  • weights (dict, optional) – The relative adaptiveness of the codons in the reference set.
  • RSCUs (dict, optional) – The RSCU of the reference set.
  • reference (list, optional) – The reference set of sequences.
  • genetic_code (int, optional) – The translation table to use. Defaults to 11.

Note

Exactly one of weights, RSCUs, or reference is required.

Returns:

The CAI of the sequence.

Return type:

float

Raises:
  • TypeError – When anything other than exactly one of reference, RSCUs, or weights is provided.
  • ValueError – See RSCU() for details.
  • KeyError – When there is a missing weight for a codon.

Warning

Returns nan if the sequence contains only codons that have no synonyms (ATG and TGG under the default translation table).