Figure 3 | Microbial Informatics and Experimentation

Figure 3

From: Large-scale experimental studies show unexpected amino acid effects on protein expression and solubility in vivo in E. coli

Statistical significance of primary sequence parameters in predicting outcome. Ordinal logistic regressions were performed to evaluate whether sequence parameters have a significant influence on the expression levels and solubility scores observed for proteins in the Analysis Dataset (Figure 1). Calculations based on E employed a training set of 7,733 proteins, while those based on S employed the 6,046 proteins in this set with E >0. The ordinate, labeled "Signed -log(p)", shows the negative of the base-10 logarithm of the p-value for the corresponding regression multiplied by the sign of that regression's slope. This scales monotonically with the parameter's "predictive value" (the product of the parameter's regression slope and standard deviation in the dataset). Parameters are arranged by the strength of their influence on E value after segregation of fractional amino acid content from compound sequence parameters. The dotted line shows a Bonferroni-corrected significance threshold of 0.0015. The naïvely counterintuitive negative correlations between net electrostatic charge and both E and S derive from two reinforcing sources. Negatively charged residues have a beneficial/positive influence on both E and S (Additional file 1, Figure S4), which makes the regression slopes negative due to the negative mathematical value of their charge. In the case of E, this effect is reinforced by the deleterious influence of positively charged residues, which makes the regression slope negative for this mathematically positive parameter. The deleterious influence of isoelectric point (pI) on E and S, which has been noted previously [82], is attributable to similar causes (Figure 1 & Additional file 1, Figure S4).
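The ordinate and the "predictive value" described in this caption can be illustrated with a minimal Python sketch. It assumes only the definitions given above: the plotted quantity is the negative base-10 logarithm of the regression p-value, carrying the sign of the regression slope, and the predictive value is the slope multiplied by the parameter's standard deviation. The function names and the numbers in the usage lines are hypothetical, and the strict "less than" comparison against the threshold is an assumption; this is not the authors' code.

```python
import math

# Bonferroni-corrected significance threshold quoted in the caption.
BONFERRONI_THRESHOLD = 0.0015


def signed_neg_log10_p(p_value: float, slope: float) -> float:
    """Return -log10(p) carrying the sign of the regression slope."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    magnitude = -math.log10(p_value)  # non-negative for p in (0, 1]
    return math.copysign(magnitude, slope)


def predictive_value(slope: float, std_dev: float) -> float:
    """Product of the regression slope and the parameter's standard deviation."""
    return slope * std_dev


def is_significant(p_value: float, threshold: float = BONFERRONI_THRESHOLD) -> bool:
    """True when the p-value falls below the corrected threshold (assumed strict)."""
    return p_value < threshold


# Hypothetical regression result, for illustration only.
example_slope = -0.8
example_p = 1e-6
print(signed_neg_log10_p(example_p, example_slope))  # negative: deleterious parameter
print(is_significant(example_p))
```

On this scale the dotted threshold line would sit at a magnitude of -log10(0.0015), roughly 2.82, so bars extending beyond that distance from zero in either direction correspond to p-values below the corrected threshold.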
