Figure 1 | Microbial Informatics and Experimentation

Figure 1

From: Large-scale experimental studies show unexpected amino acid effects on protein expression and solubility in vivo in E. coli

Distribution of proteins by expression level and solubility score. The principal dataset analyzed in this paper contains 9,644 target proteins that went through small-scale E. coli expression trials in the NESG protein-production pipeline. This dataset was randomly divided into an Analysis Dataset used for regression analyses and model development (7,733 proteins) and a Test Dataset used for model validation (the remaining 1,911 proteins). Each protein was assigned independent integer values from 0 to 5 for its expression level (E) and solubility score (S), as assessed by visual inspection of a Coomassie-Blue-stained SDS-PAGE gel containing the total cell extract and corresponding soluble fraction. (A) Distribution of E scores in the combined Analysis and Test Datasets. (B) Distribution of S scores for proteins with non-zero expression in the combined Datasets. (C) Bubble plot showing the relative number of proteins in bins segregated simultaneously by both expression level and solubility score. The area of each bubble is proportional to the number of proteins with that exact combination of E and S values. Each column in the plot therefore represents all proteins with the corresponding E score, while each row represents all proteins with the corresponding S score. For example, the upper-right-most bubble shows that 1,653 constructs had an S score of 5 among the 3,957 proteins with an E score of 5, which are represented by the total area of all bubbles in the right-most column. In this dataset, 3,880 proteins were considered usable for purification and biophysical characterization, as defined by having E*S > 11.
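The binning and the usability cutoff described in the caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`is_usable`, `joint_counts`, `column_total`, `bubble_radius`) and the toy records are hypothetical and exist only for illustration. Only the 0 to 5 integer scale, the area-proportional bubbles, and the E*S > 11 criterion come from the caption.

```python
from collections import Counter
from itertools import product
from math import sqrt

SCORES = range(6)  # integer scores 0..5, as stated in the caption


def is_usable(e, s, threshold=11):
    """Caption's criterion: a protein is usable when E*S > 11."""
    return e * s > threshold


def joint_counts(records):
    """Count proteins per exact (E, S) pair; each pair is one bubble in panel C."""
    return Counter(records)


def column_total(counts, e):
    """Total proteins with expression score e (one column of the bubble plot)."""
    return sum(n for (ce, _), n in counts.items() if ce == e)


def bubble_radius(count, scale=1.0):
    """Bubble area is proportional to count, so radius scales with sqrt(count)."""
    return scale * sqrt(count)


# All (E, S) bins that satisfy the cutoff on the 0..5 scale.
passing_bins = sorted(
    (e, s) for e, s in product(SCORES, repeat=2) if is_usable(e, s)
)

# Hypothetical toy records (E, S); NOT data from the paper.
toy_records = [(5, 5), (5, 5), (5, 3), (4, 2), (0, 0), (3, 4)]
toy_counts = joint_counts(toy_records)
toy_usable = sum(n for (e, s), n in toy_counts.items() if is_usable(e, s))

print(passing_bins)
print(toy_usable, column_total(toy_counts, 5))
```

On the integer scale, the cutoff admits only bins where both scores are at least 3 and at least one of them is 4 or higher, so a protein scoring 3 on both axes (E*S = 9) falls short of it.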
