Figure 4 | Microbial Informatics and Experimentation

Figure 4

From: Large-scale experimental studies show unexpected amino acid effects on protein expression and solubility in vivo in E. coli

Statistical analyses of alternative datasets. Single ordinal logistic regressions equivalent to those performed on the Analysis Dataset in Figure 3 were performed on six alternative datasets. Predictive values (defined in Figure 3) are shown for expression level (panel A) and solubility score (panel B). The dotted lines indicate the Bonferroni-corrected threshold for significance in the Analysis Dataset (because significance depends on dataset size). The "Blind Average Dataset" contains all NESG protein constructs from the second phase of the Protein Structure Initiative, including multiple constructs for many targets, with scores averaged from all replicate expression trials (19,746 constructs for 13,342 targets). The "Max ExS Dataset" contains exclusively eubacterial proteins from the Analysis Dataset, with the E and S scores taken from the expression trial with the highest E*S value (7,113 constructs for 5,218 targets). The "Max ExS; 1 Construct Dataset" includes exactly one construct per target in the Max ExS Dataset (5,218 constructs and targets); the construct with the highest value of E*S was retained for targets with multiple constructs. The "Blind Av.; No Memb./Secr. Dataset" contains proteins from the Blind Average Dataset excluding those predicted by a wider range of metrics to have a transmembrane α-helix or an N-terminal signal peptide or lipopeptide directing secretion out of the cytoplasm (16,888 constructs for 11,698 targets). The "100% Consistent; No Memb./Secr. Dataset" includes only eubacterial proteins with identical E and S scores in all small-scale expression trials and not predicted by any algorithm to have a transmembrane α-helix, N-terminal signal peptide, or lipopeptide (3,633 constructs for 2,583 targets). The "Human Dataset" includes all human proteins from the Blind Average Dataset (3,350 constructs for 1,534 targets). Results from the Analysis Dataset (Figure 3) are shown for comparison.
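
The two "Max ExS" selection rules described in the caption can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the `Trial` record, its field names, the function names, and the example scores are all hypothetical, and the tie-breaking behavior (keep the first record seen) is an assumption, since the caption does not say how ties in E*S were resolved.

```python
from typing import Iterable, List, NamedTuple


class Trial(NamedTuple):
    """One expression trial (hypothetical record layout)."""
    target: str     # target identifier (assumed field)
    construct: str  # construct identifier (assumed field)
    e: float        # expression score E
    s: float        # solubility score S


def max_exs_per_construct(trials: Iterable[Trial]) -> List[Trial]:
    """For each construct, keep the trial with the highest E*S.

    Ties keep the first trial encountered (an assumption).
    """
    best = {}
    for t in trials:
        key = (t.target, t.construct)
        if key not in best or t.e * t.s > best[key].e * best[key].s:
            best[key] = t
    return list(best.values())


def one_construct_per_target(constructs: Iterable[Trial]) -> List[Trial]:
    """For each target, keep the single construct with the highest E*S.

    Ties keep the first construct encountered (an assumption).
    """
    best = {}
    for c in constructs:
        if c.target not in best or c.e * c.s > best[c.target].e * best[c.target].s:
            best[c.target] = c
    return list(best.values())


# Illustrative, made-up scores: two trials of one construct, a second
# construct for the same target, and a second target.
example_trials = [
    Trial("T1", "c1", 3, 2),
    Trial("T1", "c1", 4, 4),
    Trial("T1", "c2", 5, 1),
    Trial("T2", "c3", 2, 2),
]
max_exs = max_exs_per_construct(example_trials)      # one row per construct
one_per_target = one_construct_per_target(max_exs)   # one row per target
```

Applying the two steps in sequence mirrors the relationship stated in the caption: the second dataset is a subset of the first, with as many constructs as targets.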
