Tech Vedika was approached by a leading multinational healthcare company to help its Global Customer Support team analyze telemetry run data from next-generation sequencers. The process involves complex analysis of logged instrument data, and each analysis run can generate hundreds of gigabytes of data.
Tech Vedika’s Data Analytics Lab built a novel system on top of the Hadoop framework for scalable computing and storage. The system ingests large data sets from transcript expression analysis, along with their metadata, and builds a data model for fast, efficient analysis.
The following are a few salient features of our system:
- Web services and queues for integration with other downstream systems.
- A SQL engine on top of the distributed system for ad-hoc analysis.
- Automated management of complex workflows.
Our solution ingests hundreds of sequencing runs and processes hundreds of gigabytes daily, enabling the global support team to investigate failed runs in a timely manner. Our data model crunches large data sets and uses machine learning to surface actionable information. The visualization dashboards make it easy to navigate that information and understand the root causes of failures.
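To make the ad-hoc analysis idea concrete, here is a minimal sketch of the kind of query a support engineer might run to group failed runs by error type. It is only an illustration under stated assumptions: the `runs` table, its columns, the error codes, and the sample rows are all hypothetical, and Python's built-in `sqlite3` module stands in for the SQL engine that sits on top of the distributed system.

```python
import sqlite3

# In-memory database used as a stand-in for the distributed SQL engine.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per sequencing run, summarised from instrument logs.
conn.execute(
    """
    CREATE TABLE runs (
        run_id        TEXT PRIMARY KEY,
        instrument_id TEXT,
        status        TEXT,   -- 'completed' or 'failed'
        error_code    TEXT,   -- NULL when the run completed
        data_gb       REAL
    )
    """
)

# Made-up sample rows, for illustration only.
rows = [
    ("run-001", "seq-A", "completed", None, 210.0),
    ("run-002", "seq-A", "failed", "E_FLUIDICS", 35.5),
    ("run-003", "seq-B", "failed", "E_TEMP", 12.0),
    ("run-004", "seq-B", "failed", "E_FLUIDICS", 48.2),
    ("run-005", "seq-C", "completed", None, 190.3),
]
conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?, ?)", rows)


def failures_by_error(connection):
    """Return (error_code, failed_run_count) pairs, most frequent first."""
    query = """
        SELECT error_code, COUNT(*) AS failed_runs
        FROM runs
        WHERE status = 'failed'
        GROUP BY error_code
        ORDER BY failed_runs DESC, error_code ASC
    """
    return connection.execute(query).fetchall()


failure_summary = failures_by_error(conn)
for error_code, failed_runs in failure_summary:
    print(error_code, failed_runs)
```

In a real deployment the same style of query would run against the engine's own tables rather than SQLite, and the result would typically feed a dashboard rather than a print statement.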
Big Data and Genomics
It is often said that if the 20th century belonged to silicon, the 21st century will belong to biology. In recent years, discoveries in genomics and molecular biology have been emerging at an unmatched pace, driven by the deeper data we can now derive from cells. Every day, we are unearthing new knowledge about cancer, genetic disease, and emerging areas, thanks to next-generation sequencing systems that are accessible to enterprises of all sizes.
The human genome contains roughly 20,000 protein-coding genes and about 3 billion base pairs. Sequencing a single human genome can generate on the order of a hundred gigabytes or more of raw data, and large sequencing programs accumulate petabytes. Analyzing data sets of this size requires inference at scale, for example protein structure prediction, cancer classification based on microarray data, and clustering of gene expression data. The first Human Genome Project took more than a decade to complete at a cost of about $3 billion. Now, with the help of big data science, the whole process can be completed in days at a cost of only thousands of dollars.