Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions

McElroy, Kerensa; Thomas, Torsten; Luciani, Fabio

doi:10.1186/2042-5783-4-1

Microbial Informatics and Experimentation

Table 1 Representative examples of deep sequencing applied to viral populations

Pathogen	Design	Technology	Ref. Seq.	Filter	Align	SNV	Hap.	Application	Reference
HIV	RT-PCR, nested PCR of pol fragment	Roche-454 GS-FLX amplicon sequencing	Sanger-sequenced pol gene	In-house software: removes reads with ambiguous bases, < 80% similarity to reference, or outside region of interest	GS amplicon software (Roche, Penzberg, Germany), Needleman-Wunsch	In-house scripts, manual inspection: remove gaps, remove reads with frameshift indels or stop codons, remove variants only contained in reads in one direction, positional variant cut-off values based on control sequences	Individual reads (40 bp region of interest)	Longitudinal emergence of drug resistance during treatment failure	[10]
HIV	RT, PCR amplification of 4 fragments (3.5 kb each). Full genome analysis	Roche-454 GS-FLX Titanium	De novo assembled reference using AssembleViral454 v1.0	NS	Mosaik	RC454 / V-Phaser	V-Phaser (one read length only)	Longitudinal emergence of CD8+ T cell escape variants, viral adaptation	[11]
HCV	RT, PCR amplification of HVR-1, nested PCR using sequencing adapters	Roche-454 GS-FLX Titanium amplicon sequencing	358 HCV HVR-1 representative sequences from Los Alamos National Laboratory HCV	Flow clustering as implemented in QIIME, only reads covering entire region of interest	MAFFT (multiple sequence alignment)	NA	Individual reads	Identification of a transmission event	[12]
HCV	Whole-genome library prep direct from RNA isolated from human serum, using mRNA-seq sample prep kit (Illumina, San Diego, CA)	Illumina GA IIx 76 bp single end reads	970 reference HCV sequences registered at the Hepatitis Virus Database server	Primer stripping using CLC Genomics Workbench (4.6), remove reads aligning to human genome, removal of duplicate reads	BWA 0.5.9-r16	Samtools (0.1.16)	NA	PCR-free whole genome HCV sequencing from human serum; variant comparison between treatment naïve and treatment experienced patients	[13]
HCV	RT-PCR using genotype-specific primers, nested PCR of full genome, followed by random shearing and library preparation	Roche-454 GS-FLX Titanium	Sanger-sequenced consensus	In-house software (discard reads with Phred quality scores below 20 or length < 55 nt)	Mosaik	ShoRAH, manual cleaning	ShoRAH (up to 1600 bp reconstructions)	Within-host evolution/genetic bottleneck	[14]
HRV	Duplicate whole-genome RT-PCR of overlapping primer pairs, nebulisation of pooled fragments and library prep	Illumina GA IIx 76 bp single end reads	Sanger-sequenced consensus	Illumina software: RTA SCS.2.6 and CASAVA 1.6	MAQ v0.7.1	In-house scripts; cut-off based on statistical analyses of base frequencies along reference. Comparison between replicates.	NA	Within-host evolution during immunosuppression	[15]
Dengue	RT, PCR amplification of four different fragments, random shearing and adapter ligation	Roche-454 GS-FLX Titanium	De novo assembled using AV454 with manual finishing	NS	Mosaik	RC454 / V-Phaser. Manual removal of variants in primer binding sites or only in ends of reads	NA	Intra-host viral diversity	[16]
Poliovirus	RT-PCR and nested PCR of target amplicon, followed by random shearing and library preparation	Roche-454 FLX Titanium and Illumina GA IIx 76 bp single end reads	Known amplicon sequences	Proprietary Roche/Illumina software. In-house software (discard reads with Phred quality scores below 20).	NS	Custom-made scripts – disregard variants with strand bias, as well as insertions and deletions adjacent to homopolymers for Roche-454 data.	NA	Detection of emerging resistant variants in a vaccine stock	[17]

Details of the experimental design and analysis pipeline for various applications of deep sequencing to different viruses are given. ‘Design’ describes the types of samples used and any sample processing up to library preparation. ‘Technology’ indicates the type of sequencing employed. ‘Ref. Seq.’ describes what kinds of reference sequences were used for read alignment. ‘Filter’ details any pre-alignment read processing steps, while ‘Align’ gives the actual alignment software used. ‘SNV’ and ‘Hap.’ indicate software used for SNV detection and haplotype reconstruction respectively. ‘Application’ describes the biological motivation for the study. ‘NS’ indicates the method was not specified in the cited publication, while ‘NA’ means not attempted.
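
As an illustration of the kind of pre-alignment step listed under ‘Filter’, the following is a minimal Python sketch of a read filter combining three criteria that appear in the table: discarding reads with ambiguous bases, reads below a Phred quality threshold of 20, and reads shorter than 55 nt. It is not the in-house software used in any of the cited studies. The names `Read`, `mean_phred`, `passes_filter` and `filter_reads` are hypothetical, the Phred+33 quality encoding is assumed, and interpreting "Phred quality scores below 20" as a threshold on the mean score of a read (rather than on any single base) is also an assumption.

```python
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    """One sequencing read (hypothetical container, not from the cited studies)."""
    name: str
    seq: str
    qual: str  # FASTQ quality string, assumed to be Phred+33 encoded


def mean_phred(qual: str, offset: int = 33) -> float:
    """Return the mean Phred score of a quality string (0.0 if it is empty)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def passes_filter(read: Read, min_mean_q: float = 20.0, min_len: int = 55) -> bool:
    """Keep a read only if it is long enough, unambiguous, and of adequate quality."""
    if len(read.seq) < min_len:
        return False  # too short
    if "N" in read.seq.upper():
        return False  # contains an ambiguous base call
    # Assumption: the quality threshold is applied to the per-read mean score.
    return mean_phred(read.qual) >= min_mean_q


def filter_reads(reads: Iterable[Read], **thresholds) -> Iterator[Read]:
    """Lazily yield the reads that pass all filter criteria."""
    return (r for r in reads if passes_filter(r, **thresholds))
```

The thresholds are keyword arguments so that a call such as `filter_reads(reads, min_len=0)` can mimic a pipeline that filters on quality only, as described for the poliovirus study.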
