Third-Generation Sequencing Article

March 3, 2015 | The 2015 Advances in Genome Biology & Technology conference wrapped up over the weekend in Marco Island, Florida, after four days of presentations from the front lines of genome analysis. With less than the usual amount of razzle-dazzle on display in this year’s product launches, the event was stolen by some outstanding scientific achievements pulled off with existing platforms. Pacific Biosciences, this year’s gold sponsor, highlighted several of these in a star-studded workshop Friday afternoon to show off the feats that can be accomplished with its SMRT (single molecule real time) sequencers, the instruments of choice for recovering long-range structural information on the genome. Speakers included J. Craig Venter, who runs the world’s largest genome sequencing center at his company Human Longevity, Inc., and is best known for competing with the Human Genome Project to produce the first whole human genome sequence; Deanna Church, who has helped shape improvements to the human reference genome in her work with the Genome Reference Consortium; and Gene Myers, one of the world’s premier bioinformaticians and co-author of the foundational genome analysis tool BLAST.

In a piece this January looking back on last year’s milestones in genomics, Bio-IT World wrote that “2014 could be looked at as the Year of PacBio, when the [midsize] company proved there was room in the market for a pricier instrument that won’t flinch at high GC coverage, large indels, or de novo assembly.” The present moment might eventually come to be seen as the peak of PacBio’s powers, a window in which the company was truly producing the most comprehensive, highest-quality genomes money could buy.

PacBio’s commercial future is murky: companies like 10X Genomics are toying with more affordable ways to get reliable long-range genomic information, and if Oxford Nanopore gets a handle on its error rates and releases the production-scale PromethION, it is likely to undercut PacBio on price while delivering the same top-of-the-line features. But whatever its market prospects, scientifically PacBio is driving some of the most innovative sequencing projects going on today. Among other accomplishments, the PacBio workshop at AGBT presented multiple users’ de novo assemblies of whole human genomes. Until very recently this was a vanishingly rare type of project, because no high-throughput instrument could deliver the type of data needed to put together a whole human genome without aligning reads to a reference genome.

De Novo Assemblies as a Commodity?

Today, the very presence of SMRT sequencers on the market has encouraged bioinformaticians to build a whole suite of analytical tools to deal with multi-kilobase reads. As the AGBT workshop made clear, PacBio users now have something like a standard pipeline for going all the way from raw reads to a whole genome. A typical workflow might use Gene Myers’ DALIGNER to find local alignments between reads, FALCON for assembly, and Quiver for consensus polishing and variant calling. As PacBio CEO Mike Hunkapiller announced in his opening remarks, DNAnexus recently used this DALIGNER-FALCON pipeline to create a new diploid assembly of J. Craig Venter’s genome, following a sequencing effort that took less than a month to generate all the required raw data on SMRT instruments.
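To give a feel for the first step of such a pipeline, finding reads that plausibly overlap, here is a toy Python sketch. It is not DALIGNER's algorithm or interface: the function name `candidate_overlaps` and the parameters `k` and `min_shared` are made up for illustration, and the sketch ignores sequencing errors and reverse-complement strands entirely.

```python
from collections import defaultdict
from itertools import combinations


def candidate_overlaps(reads, k=8, min_shared=3):
    """Return {(i, j): count} for read pairs sharing >= min_shared distinct k-mers.

    Toy illustration only: exact k-mer matching on one strand, no alignment.
    """
    # Map every distinct k-mer to the set of reads that contain it.
    kmer_to_reads = defaultdict(set)
    for idx, seq in enumerate(reads):
        for pos in range(len(seq) - k + 1):
            kmer_to_reads[seq[pos:pos + k]].add(idx)

    # Count, for each pair of reads, how many distinct k-mers they share.
    shared = defaultdict(int)
    for read_ids in kmer_to_reads.values():
        for i, j in combinations(sorted(read_ids), 2):
            shared[(i, j)] += 1

    # Keep only pairs with enough shared k-mers to be worth aligning properly.
    return {pair: n for pair, n in shared.items() if n >= min_shared}


if __name__ == "__main__":
    genome = "ACGTTGCATGCCGATAGCTAGGCTTAACGGATCCGATTACAGG"  # made-up sequence
    reads = [genome[0:25], genome[15:40], "T" * 25]
    print(candidate_overlaps(reads))
```

A real overlapper would follow up each candidate pair with an actual local alignment that tolerates insertions, deletions, and mismatches; the k-mer filter here only narrows down which pairs are worth that effort.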

Diploid assembly, correctly distinguishing between the maternal and paternal copies of each chromosome, is the gold standard for a full genome sequence. This ability sets FALCON assemblies apart from even the human reference genome — which, as Deanna Church memorably pointed out in her own presentation, has historically included “Franken-alleles” stitched together from different copies of the same chromosomes.

DNAnexus also appears to have set a world record for the fastest human genome assembly last week, patching together the genome of a peculiar breast cancer cell line, SK-BR-3, in less than 21 hours. The process wrapped up at 10:30 on Friday morning, just in time for a shout-out at the workshop from W. Richard McCombie of the Cold Spring Harbor Laboratory. DNAnexus will now be making this workflow available to all customers through its cloud-based informatics service, offering rapid assembly to any labs with the sequencing capacity to drive through enough PacBio reads.

All this is starting to make de novo assembly look less like a titanic enterprise, and a little more like a commodity. Venter, giving the first talk at the workshop, revealed plans to produce an extraordinary 30 new reference genomes at Human Longevity, Inc., combining two SMRT sequencers with his bank of 20 ultra-high-throughput Illumina HiSeq X instruments. “I’m delighted with the focus I’m hearing here, on getting back to assembled genomes,” said Venter. “If we’re going to understand each of our genomes, we need to do de novo assembly.”

The collection of new reference-grade assemblies at Human Longevity isn’t just a matter of showing off; getting new reference genomes from donors with diverse ethnic and geographical backgrounds will help with all future interpretation of large structural variants, which differ widely between human populations and are difficult to square with a single reference assembly. (Sadly unmentioned was whether and when Human Longevity might share its reference genomes with the wider scientific community.)

Venter, of course, has a knack for thinking big. His 30 reference assemblies will represent just a small fraction of the one million whole genomes he intends to sequence by 2020. In his presentation, Venter even spoke glibly about the pace at which he hopes to see his massively expensive bank of sequencers (an investment in excess of $21 million) become obsolete, based on the historical trend toward ever-cheaper sequencing. “We’re counting on $30 genomes in three or four years, and hopefully we can truck away to the dumpster all the machines we have [now],” Venter said.

Many of our readers should also be interested to hear that Venter casually mentioned looking to hire around 200 new bioinformaticians for his company in 2015.

The second speaker, Gene Myers, was also keenly interested in the possibilities PacBio has opened up for relatively straightforward de novo assembly. Myers spent many years in the 2000s more or less out of the limelight, reportedly because he was dissatisfied with the industry’s trend toward using short-read sequencers and reference alignment for most applications. However, he reemerged at AGBT last year, after a conversation with Hunkapiller in which Myers learned that SMRT sequencers deliver long reads with both random sampling of the genome and random, unbiased errors at any point in the genome.

“As a mathematician, when Mike used this word ‘random’ in those two places I got incredibly excited,” said Myers at this year’s workshop. “Because I understood, from theory alone, that what that meant was immediately that perfect assembly was back on the table.”
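Why randomness matters here can be shown with a back-of-the-envelope calculation. The Python sketch below is an illustration under strong assumptions, not Myers' own argument or method: it treats each read's base call at a position as wrong independently with the same probability, and it uses a 15% per-base error rate as an assumed stand-in for raw long-read data. It computes the chance that at least half of the reads covering a position are wrong, which is a pessimistic bound on how often a simple majority vote would fail. The function name `majority_error_bound` is hypothetical.

```python
from math import comb


def majority_error_bound(coverage, error_rate):
    """Probability that at least half of `coverage` independent reads are wrong.

    Assumes every read errs independently with probability `error_rate`.
    Ties are counted as failures, so this is a conservative bound on the
    error of a majority-vote consensus at one position.
    """
    first_failing = (coverage + 1) // 2  # smallest count of wrong reads with wrong >= right
    return sum(
        comb(coverage, k) * error_rate**k * (1 - error_rate) ** (coverage - k)
        for k in range(first_failing, coverage + 1)
    )


if __name__ == "__main__":
    for depth in (1, 3, 11, 31):
        print(depth, majority_error_bound(depth, 0.15))
```

Under these assumptions the bound shrinks rapidly as coverage grows. If errors were instead systematic, recurring at the same sites in every read, adding coverage would not help, which is why unbiased errors are the property that puts a near-perfect consensus within reach.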

Since then, Myers has been hard at work making perfect assembly a reality. In addition to building DALIGNER, he has also started work on a new tool called DAscrub, which was a major focus of his workshop presentation. The purpose of DAscrub is to clean up raw PacBio reads, which are error-prone and vulnerable to sequencing artifacts, without sacrificing valuable data. Myers presented an E. coli assembly produced with 30x coverage of the sample that produced a complete circular genome without requiring any correction steps between running DALIGNER and performing full assembly, except for using DAscrub to clear out artifacts.
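For a sense of scale, the arithmetic behind a figure like 30x coverage can be sketched in a few lines of Python. The numbers below are assumptions chosen for illustration, not values from Myers' talk: an E. coli genome of roughly 4.6 megabases and a mean read length of 10 kilobases. The uncovered-fraction estimate uses the classic Poisson (Lander-Waterman style) approximation, which presumes reads are sampled uniformly at random and ignores read-length effects.

```python
import math


def reads_for_coverage(genome_size, mean_read_length, coverage):
    """Number of reads needed so that total bases = coverage * genome_size."""
    return math.ceil(coverage * genome_size / mean_read_length)


def expected_uncovered_fraction(coverage):
    """Poisson approximation: chance a given base is hit by zero reads."""
    return math.exp(-coverage)


if __name__ == "__main__":
    genome_size = 4_600_000     # assumed approximate E. coli genome size, in bases
    mean_read_length = 10_000   # assumed mean read length, in bases
    print(reads_for_coverage(genome_size, mean_read_length, 30))
    print(expected_uncovered_fraction(30))
```

Under the random-sampling assumption, the expected fraction of bases left uncovered at 30x is negligible, so gaps in an assembly at that depth are more likely to come from artifacts and repeats than from missing data.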

Key Genomes

None of these advances in de novo assembly will do much to advance science if we don’t choose samples that truly have something to teach us. The last three speakers at PacBio’s AGBT workshop rounded out the afternoon with some compelling applications for this burgeoning technology.

Deanna Church, formerly of the National Center for Biotechnology Information and now Senior Director of Genomics and Content at genetic diagnostics company Personalis, shared her thoughts on using long-read data to update the human reference genome, and in particular to deal with regions of high structural complexity and large differences between human haplotypes. This is a subject Church has spoken about with Bio-IT World before; in fact, in his opening remarks Hunkapiller quoted an interview we ran with Church in April 2013, in which she said that “if we are truly going to be successful in having genomics affect clinical medicine and we want to understand variation within individuals, we have to have de novo assembly.”

At AGBT, Church noted that the reference genome is essential even when working with de novo assemblies, both as a resource for calling variants, and as a coordinate system for describing those variants. That means missing or confounded sequence in the reference can cause problems for interpretation no matter how scrupulous a new genome may be.

Church touted the addition of many alternate loci in the latest update to the human reference genome, which allow geneticists to consider multiple “paths” through variable regions. She also urged bioinformaticians to update their tools to take these alternate loci into account, something that few groups have done to date. “In aggregate, these alt loci contribute an additional 3.6 megabases of novel sequence that contain 153 unique genes,” said Church. “So if you are not using these sequences in your analyses, you are missing part of the exome, and you are missing some important sequence.”

At the same time, Church acknowledged that the patchwork of alternate loci, in the long term, is not the most efficient way to represent large structural variants across the genome. In a question-and-answer session, she mentioned the Global Alliance for Genomics and Health, which is working on an alternative way to represent chromosomal positions as a branching graph that spans an entire chromosome. “I think this movement to this graph-based representation is really the way we have to go,” she said, “because it allows us to represent this complexity in a much more natural way.” While Church expects it to take some time before this structure is ready to be as widely adopted as the current standards for representing genetic variation, she did say that the alternate loci provide a “graph-lite” approach in the current human reference assembly.

The fourth speaker, Jeong-sun Seo of Seoul National University and Macrogen, presented on a critical new resource for genomics, a diploid assembly of a whole Asian genome. “We have to consider seriously ethnic differences for personalized medicine,” Seo reminded the audience. Ultimately, Seo’s work on this new assembly, of a genome donated by an Altaic Korean individual, is meant to support an Asian Genome Project recruiting 10,000 patient volunteers for whole genome sequencing across South Korea, Japan, China, and Mongolia.

Like Human Longevity, Macrogen has a bank of HiSeq X instruments and has been using a cross-platform approach to generating new reference assemblies. Interestingly, Seo mentioned that his team is also using an Irys device from BioNano, which uses fluorescent markers to map out very large structural variation on the order of hundreds of kilobases. In an interview with Bio-IT World, BioNano CEO Erik Holmlin recently told us that the Irys has been paired with SMRT sequencing but declined to reveal more details; Seo’s presentation offers at least one example of both techniques for getting long-range genomic information being used in parallel.

Highlighting the magnitude of difference between the Korean assembly his group performed and the standard reference genome, Seo noted that on chromosome 20 alone, he was able to pinpoint nearly 500 structural variants, totaling over 210 kilobases inserted or deleted relative to the reference. He also shared one example of a phenotypic difference that appears to be traceable to one of these structural variants, an 8-kilobase insertion in the NINL gene related to pigmentation. “NINL is the most significantly differentially expressed gene between Asians and Caucasians,” Seo observed, a fact that can likely be attributed to this large insertion. Other structural variants that differ widely between ethnic groups are likely to have direct relevance to health and disease risks.

The final speaker was W. Richard McCombie, whose own assembly of interest was the previously mentioned SK-BR-3 cell line, collected from a Her2-positive case of breast cancer. The SK-BR-3 genome is profoundly disordered, so much so that Hunkapiller, introducing McCombie’s talk, said that looking at this genome, “you wonder how in the heck was this thing alive?”

McCombie, much like Myers, believes that short-read sequencing has been a mixed blessing for the genomics community, offering more data than ever before but at the cost of distracting researchers from profoundly important sources of variation. He quoted Evan Eichler’s term “the seduction of next-gen sequencing,” which he called “very appropriate. You can get really good SNP data from a very large number of individual genomes… but you do miss… a lot of the structural variants.”

Turning to the SK-BR-3 genome, McCombie showed detailed data, derived from SMRT sequencing, on complex translocations between chromosomes 8 and 17, which occurred across multiple sites on both chromosomes. With better information on exactly how these regions are arranged, which translocations have undergone inversion, and the complete sequence of gene fusions, McCombie’s team is now trying to reconstruct the history of the structural events that have produced the SK-BR-3 chromosome 17, particularly at the locus where the Her2 gene resides. Happily, McCombie announced that all his data on this genome is publicly available online, and that he will soon be releasing methylation data as well, something that can be recovered routinely off SMRT sequencers.

PacBio is still very much a niche player in sequencing, and with a notably lower throughput and higher costs than its competitors, that’s unlikely to change any time soon. Nonetheless, the company has done a remarkable job drawing attention to features like haplotypes and structural variants that cannot be captured by short-read sequencing. While the genomics community never really forgot about these factors, they have been shortchanged in the drive for more and cheaper data in the next-generation sequencing era.

Today, it seems possible that projects like those presented at PacBio’s AGBT workshop are just the leading edge of a cultural shift in genomics toward full representations of genomic variation and more routine use of de novo assembly. The full force of that shift will have to wait for technology that brings long-read data in reach of the average user. But whether that comes from future PacBio instruments, a new contender like Oxford Nanopore, a parallel platform like 10X Genomics, or a combination of all three, this year’s AGBT demonstrated that the groundwork has been laid to make the best use of this data once we have it.

Time: 2024-10-29 04:50:10
