Sequence Improvement

Computer-based Sequence Improvement

Sequence finishing improves an assembled draft sequence until it meets established quality standards. After the initial shotgun (or draft) sequence data have been assembled, the assembly may not accurately represent the original DNA sequence. Gaps may exist because of under-representation or difficult-to-sequence structures, and the existing sequence may contain poor-quality or ambiguous data.

The draft data are first directed to Prefinishing, where an automated pipeline uses software tools to select directed sequencing reactions for gap closure and data quality improvement. These reactions are run in a production-style sequencing process using custom oligonucleotides and specialized sequencing chemistries, and the resulting reads are assembled with the draft data. This effort can be tailored to project goals to fit various timelines and budgets. After this automated sequence improvement is complete, projects are passed to the Finishing group for manual efforts. Finishers design custom directed efforts to bring the project to a quality standard of fewer than 1 error per 10,000 bp, although this rate can be adjusted to the requirements of the project. These efforts include custom primer walks, transposon bombs, PCR reactions, and special subclone library construction. The assembly is confirmed by comparing the restriction enzyme fragments predicted from the assembled sequence with the actual DNA fingerprints from the clones. A Quality Assessment group within Finishing verifies that the data meet the established quality standard prior to submission to the Analysis group.
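The fingerprint check described above can be illustrated with a minimal sketch. It assumes a linear sequence, a complete digest by a single enzyme, and a simple relative size tolerance; the function names and the tolerance value are hypothetical and are not taken from the GSC pipeline. The example uses EcoRI, which recognizes GAATTC and cuts after the first G.

```python
import re


def digest_fragment_lengths(sequence, site, cut_offset):
    """Return sorted fragment lengths from a complete digest of a linear sequence.

    site is the recognition sequence; cut_offset is the number of bases from
    the start of the site to the cut position on the top strand.
    """
    seq = sequence.upper()
    site = site.upper()
    # A lookahead pattern finds every occurrence, including overlapping ones.
    pattern = "(?=" + re.escape(site) + ")"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]) if b > a)


def fragments_match(predicted, observed, tolerance=0.05):
    """Compare size-sorted fragment lists (assumed tolerance: 5% of observed size)."""
    if len(predicted) != len(observed):
        return False
    pairs = zip(sorted(predicted), sorted(observed))
    return all(abs(p - o) <= tolerance * o for p, o in pairs)


if __name__ == "__main__":
    assembled = "AAAA" + "GAATTC" + "CCCCC" + "GAATTC" + "TT"
    predicted = digest_fragment_lengths(assembled, "GAATTC", 1)
    print(predicted)
    print(fragments_match(predicted, [5, 7, 11]))
```

A mismatch in fragment count or size suggests a possible misassembly in the corresponding region. Real clones are typically carried in circular vectors and gel-based size estimates are noisy, so a production comparison would need to account for both.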

The GSC Finishing group has contributed to the finishing of selected regions of numerous genomes, including human, mouse, Saccharomyces cerevisiae, Caenorhabditis elegans, Caenorhabditis briggsae, Gallus gallus, Histoplasma capsulatum, Pan troglodytes, Arabidopsis thaliana, Ornithorhynchus anatinus, and several bacterial genomes. The group is a leader in United States finishing efforts, having completed human chromosomes Y, 7, 4, and 2, work that has resulted in many significant publications. The GSC has completed over 1.5 billion bases of genomic sequence from the above-mentioned organisms to an error rate of fewer than 1 error per 10,000 bp.

Finishing continues to be an important activity at the GSC. Draft assemblies often give up to 95% of the genomic sequence, but many gaps and low-quality regions remain in the data. In addition, assembly algorithms are often unable to assemble repetitive regions accurately. The primary benefit of finishing is the resolution of missing, misassembled, or poor-quality data, especially in gene-coding or regulatory regions. These directed efforts can provide varying levels of improvement to the assembly based on scientific value or cost. The automated data improvement of prefinishing is a cost-effective means of improving genomes, although it is not as thorough as manual finishing. Manual improvements such as determining the order and orientation of contigs and resolving misassemblies can provide a better-quality product for the end user. These varying levels of effort can be tailored to specific regions or genomes of interest and adjusted to the timelines, budgets, and quality requirements of each project.
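As a rough illustration of how remaining gaps and the error-rate standard might be measured, the sketch below assumes that per-base quality values are Phred-scaled (Q = -10 log10 P, so Q40 corresponds to 1 expected error per 10,000 bases) and that gaps are represented as runs of N. Both are common conventions rather than details stated here, and the function names are hypothetical.

```python
import re


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality value to an error probability."""
    return 10 ** (-q / 10)


def expected_errors_per_10kb(quals):
    """Estimate the expected number of errors per 10,000 bases from quality values."""
    if not quals:
        return 0.0
    mean_error = sum(phred_to_error_prob(q) for q in quals) / len(quals)
    return mean_error * 10_000


def find_gaps(sequence):
    """Return (start, end) half-open intervals for each run of N in the sequence."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]


if __name__ == "__main__":
    print(find_gaps("ACGTNNNNACGTnnAC"))
    print(expected_errors_per_10kb([40] * 100))
```

Regions whose estimated error rate exceeds the project standard, together with the gap intervals, would be the natural candidates for directed finishing work.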

 
Sequence Improvement Information
GSC Mouse Finishing Rules
 
Sequence Improvement Links
FINS at CSHL
Standard Finishing Practices and Annotation of Problem Regions for the Human Genome Project
Sequence Improvement Contact
Robert Fulton
Group Leader of Sequencing Improvement

Washington University School of Medicine
The Genome Center
4444 Forest Park Ave
St. Louis, Missouri 63108
USA