Detecting Genomic Elements of Extreme Conservation in Higher Eukaryotes by Integration of Hash Mapping and Cache-oblivious In-memory Computing

Author: Andi Dhroso
Total Pages: 51
Release: 2015
OCLC: 970211718

Book Synopsis by Andi Dhroso

This book was written by Andi Dhroso and released in 2015, with a total of 51 pages.

Book excerpt: Genomics is one of the first life science disciplines to enter the era of Big Data, facing challenges in all three dimensions: volume, variety, and velocity. Yet, in spite of a plethora of sequencing data, we are still far from creating a complete encyclopedia of the functional and structural elements of the genome. An example of this knowledge gap emerged in 2004, when Bejerano and Haussler discovered 481 DNA elements in the syntenic positions of the human, mouse, and rat genomes that were 100% identical, called ultra-conserved elements (UCEs). Recently, an advanced alignment-free data-mining approach showed that this phenomenon exists beyond the animal kingdom and outside the regions of synteny (conservation of blocks of order within two sets of chromosomes that are being compared with each other).

Our ultimate goal is to provide a comprehensive atlas of the regions of extreme conservation in higher eukaryotes, providing insights into the structural organization, function, and evolution of these elements. However, an all-against-all comparison of dozens, if not hundreds, of eukaryotic genomes may not be feasible using current approaches. For instance, the original discovery of syntenic-only UCEs relied on a whole-genome alignment of three mammalian genomes and took one day on a 24-node cluster. A comprehensive alignment-free algorithm that guaranteed finding all syntenic and non-syntenic long identical multi-species elements (LIMEs) took three days on a 48-CPU cluster to compare two assembled genomes.

Here, we present a new hybrid approach that integrates the ideas of hash mapping and cache-oblivious in-memory computing. Our algorithm leverages the concept of help-me-help-you, where the data structures are tailored to maximize cache hits while minimizing cache misses. As a result, our hybrid algorithm is approximately 800 times faster than the current state-of-the-art method and scales to unassembled genomes. The new hybrid approach has been applied to detect the earliest evidence of extreme conservation by including the recently sequenced genomes of coelacanth and lamprey in the large-scale analysis. The integration of efficient software with hardware-optimized approaches has proven to be a promising direction in comparative genomics, allowing scientists to gain even deeper insights into the function and evolution of eukaryotic genomes.
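
As a rough illustration of the hash-mapping idea mentioned in the excerpt, the following Python sketch indexes every k-mer of one sequence in a hash table and extends each seed hit found in a second sequence into a maximal identical run. It is a minimal sketch under stated assumptions, not the algorithm from the book: the function names `build_kmer_index` and `shared_identical_runs`, the `min_len` parameter, and the toy sequences are all hypothetical, and the cache-oblivious data layout that the excerpt describes is not modeled here.

```python
from collections import defaultdict


def build_kmer_index(seq, k):
    """Map every k-mer of seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def shared_identical_runs(a, b, min_len):
    """Return (pos_in_a, pos_in_b, length) for each maximal run of
    identical characters of at least min_len shared by a and b.

    Hypothetical helper for illustration only: seeds are exact
    min_len-mers looked up in a hash table built over a.
    """
    index = build_kmer_index(a, min_len)
    hits = []
    for j in range(len(b) - min_len + 1):
        for i in index.get(b[j:j + min_len], ()):
            # Skip seeds that can be extended to the left: the same run
            # was already reported from an earlier seed position.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the seed to the right while the sequences agree.
            length = min_len
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            hits.append((i, j, length))
    return hits


if __name__ == "__main__":
    # Toy sequences sharing the run "ACGTACGGA" at offset 2 in both.
    print(shared_identical_runs("TTACGTACGGATTC", "GGACGTACGGAAA", 6))
```

With `min_len=6`, the toy call reports a single hit, `(2, 2, 9)`: a nine-character identical run starting at offset 2 in both sequences. A real genome-scale tool would need a far more compact index than a Python dictionary of strings, which is where the memory-layout concerns raised in the excerpt come in.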


Related Books

Detecting Genomic Elements of Extreme Conservation in Higher Eukaryotes by Integration of Hash Mapping and Cache-oblivious In-memory Computing
Language: en
Pages: 51
Authors: Andi Dhroso
Type: BOOK - Published: 2015

Introduction to Bioinformatics
Language: en
Authors: Arthur M. Lesk
Type: BOOK - Published: 2019

Bioinformatics and Phylogenetics
Language: en
Pages: 410
Authors: Tandy Warnow
Categories: Computers
Type: BOOK - Published: 2019-04-08 - Publisher: Springer

This volume presents a compelling collection of state-of-the-art work in algorithmic computational biology, honoring the legacy of Professor Bernard M.E. Moret

Finding Groups in Data
Language: en
Pages: 376
Authors: Leonard Kaufman
Categories: Mathematics
Type: BOOK - Published: 1990-03-22 - Publisher: Wiley-Interscience

Partitioning around medoids (Program PAM). Clustering large applications (Program CLARA). Fuzzy analysis (Program FANNY). Agglomerative Nesting (Program AGNES).

Bioinformatics Algorithms
Language: en
Pages: 528
Authors: Ion Mandoiu
Categories: Computers
Type: BOOK - Published: 2008-02-25 - Publisher: John Wiley & Sons

Presents algorithmic techniques for solving problems in bioinformatics, including applications that shed new light on molecular biology. This book introduces alg