What you will read about: our opinion on gene optimization in E. coli.
Redesigning coding sequences according to predetermined parameters improves protein expression and simplifies sub-cloning steps, which makes gene synthesis unavoidable in recombinant protein production projects.
The first generation of gene optimization tools dealt predominantly with single objectives such as codon usage optimization or unique restriction site incorporation. In recent years we have seen the emergence of sequence design tools that pursue additional objectives. Since there are apparently as many algorithms as providers, it is crucial to keep a critical eye on each suggested optimization.
So we thought we would offer a review of the most important criteria that, from our point of view, should be considered when designing genes for optimized protein expression.
When talking about optimizing a sequence, rare codons are clearly the first focus because they are associated with low heterologous expression levels due to ribosome pausing and mRNA cleavage, ribosomal frameshifting, or amino acid misincorporation. Several approaches have been developed to avoid rare codons and to adapt the codons of a synthetic gene to the host tRNA pools: CAI maximization, codon harmonization, codon sampling, etc. But the efficiency of these approaches is still unclear, as they have not been systematically compared against each other. Studies arguing both for and against their efficiency can be found in the literature. The best way of addressing this question is probably the one provided by DNA2.0 (https://www.dna20.com/services/gene-synthesis). They move beyond the codon frequency paradigm by considering that favorable codons are predominantly those read by the tRNAs that are most highly charged during amino acid starvation, and not the codons that are most abundant in highly expressed E. coli proteins (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0007002). That makes sense!
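To make the codon frequency paradigm concrete, here is a minimal Python sketch of a CAI-style score: each codon is weighted by how often it occurs in a reference set relative to the most used synonymous codon, and the score is the geometric mean of those weights. The function names, the 0.5 pseudocount for unseen codons and the toy reference sequences are our assumptions for illustration; they are not real E. coli usage data and this is not the charged-tRNA approach described above.

```python
from collections import Counter
from math import exp, log

# Standard genetic code, built from the usual compact TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split an in-frame DNA (or RNA) coding sequence into codons."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def relative_adaptiveness(reference_seqs):
    """Weight of each codon = its count / count of the best synonymous codon."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s))
    weights = {}
    for aa in set(AMINO):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        best = max(counts[c] for c in family)
        # Skip stops, single-codon amino acids (Met, Trp) and amino acids
        # that never occur in the reference set.
        if aa == "*" or len(family) == 1 or best == 0:
            continue
        for c in family:
            # Assumption: unseen codons get a 0.5 pseudocount to avoid log(0).
            weights[c] = max(counts[c], 0.5) / best
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of the scored codons in seq."""
    logs = [log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return exp(sum(logs) / len(logs))


# Toy reference set (hypothetical, NOT real E. coli data).
toy_weights = relative_adaptiveness(["CTGCTGCTGCTA", "AAAAAAAAG"])
print(cai("CTGAAA", toy_weights), cai("CTAAAG", toy_weights))
```

The `weights` dictionary is the only place where the notion of a "favorable codon" enters, so a different measure of codon preference could be substituted there without touching the scoring function.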
In addition to codon bias, other parameters are now considered when optimizing a gene sequence, such as codon context bias, local mRNA folding, GC content, hidden stop codons and motifs to avoid. But once again, no clear evidence has emerged on the importance of most of these parameters. From our point of view, the only parameter that should systematically be considered is mRNA secondary structure. Indeed, secondary structures in mRNA have been shown to reduce or block ribosome binding and transit, significantly affecting translation if they are found in the 5’UTR. But this aspect is neglected by most gene synthesis providers, who rarely ask for the 5’UTR sequence carried on the expression vector. This omission can have a significant impact on the expression of the protein of interest.
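As a rough illustration of why the vector's 5’UTR has to be part of the check, the sketch below scans the 5’UTR joined to the first codons of the gene for inverted repeats, i.e. two stretches that could base-pair into a hairpin stem. This is a crude proxy under our own assumptions (Watson-Crick pairs only, fixed stem length, no energy model); it is not a folding prediction, and the function names, parameters and sequences are hypothetical. A dedicated RNA folding tool would be needed for real free-energy estimates.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq):
    """Normalize a DNA or RNA string to upper-case RNA."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return rna.translate(COMPLEMENT)[::-1]


def find_stems(rna, stem_len=6, min_loop=3, max_loop=30):
    """Return (i, j) pairs where rna[i:i+stem_len] can pair with
    rna[j:j+stem_len], separated by a loop of min_loop..max_loop bases."""
    hits = []
    n = len(rna)
    for i in range(n - 2 * stem_len - min_loop + 1):
        target = reverse_complement(rna[i:i + stem_len])
        first_j = i + stem_len + min_loop
        last_j = min(i + stem_len + max_loop, n - stem_len)
        for j in range(first_j, last_j + 1):
            if rna[j:j + stem_len] == target:
                hits.append((i, j))
    return hits


def scan_translation_start(utr5, cds, n_codons=10):
    """Scan the 5'UTR plus the first n_codons codons of the gene."""
    region = to_rna(utr5 + cds[:3 * n_codons])
    return find_stems(region)


# Toy sequences (hypothetical): the start of this "gene" can pair with the
# "UTR", which only becomes visible when both are scanned together.
print(scan_translation_start("ACGTGCAAAA", "GCACGT"))
```

Scanning the coding sequence alone would miss any stem that forms between the vector-borne 5’UTR and the start of the gene, which is the point made above.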
In conclusion, it is difficult to get a clear picture from the literature of which elements really have a positive impact on translation throughput, and thus to define which parameters are critical and which are artefacts of others. We would recommend first staying focused on the fundamentals, namely codon bias (based on charged tRNA) and mRNA secondary structure, and then comparing, whenever possible, several synonymous optimized sequences ordered from different gene synthesis providers.
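Before comparing sequences from several providers at the bench, it is worth checking that they really are synonymous. The Python sketch below translates each candidate with the standard genetic code, refuses to continue if the proteins differ, and counts how many codons differ between each pair. The function names, provider labels and toy sequences are hypothetical and only serve the illustration.

```python
from itertools import combinations

# Standard genetic code, built from the usual compact TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split an in-frame coding sequence into codons."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a coding sequence; stop codons are written as '*'."""
    return "".join(CODON_TABLE[c] for c in split_codons(seq))


def compare_candidates(candidates):
    """candidates maps a label to a coding sequence.
    Returns the shared protein and the codon differences per pair."""
    proteins = {name: translate(seq) for name, seq in candidates.items()}
    if len(set(proteins.values())) != 1:
        raise ValueError(f"candidates are not synonymous: {proteins}")
    protein = next(iter(proteins.values()))
    diffs = {}
    for a, b in combinations(sorted(candidates), 2):
        pairs = zip(split_codons(candidates[a]), split_codons(candidates[b]))
        diffs[(a, b)] = sum(x != y for x, y in pairs)
    return protein, diffs


# Toy candidates (hypothetical sequences and provider labels).
protein, diffs = compare_candidates({
    "provider_a": "ATGCTGAAATAA",
    "provider_b": "ATGTTAAAGTAA",
})
print(protein, diffs)
```

The number of differing codons gives a quick sense of how far apart two designs are, which helps when interpreting any difference in expression between them.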
On our side, in order to ensure the gene design meets our criteria, we have developed CLAF’OUTIL, our own fully customizable gene optimization tool.