Tech Vedika was approached by a leading multinational healthcare company to help its Global Customer Support team analyze telemetry data from next-generation sequencer runs. The process involves a complex analysis of the logged instrument data, and each analysis run can generate hundreds of gigabytes of data.

Tech Vedika’s Data Analytics Lab built a novel system on top of the scalable Hadoop computing and storage framework. The system ingests large data sets from transcript expression analysis, along with their metadata, and builds a data model for fast, efficient analysis.

The following are a few salient features of our system:
  • Exposes web services and queues for integration with downstream systems
  • Provides a SQL engine on top of the distributed system for ad-hoc analysis
  • Automates the management of complex workflows
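
To give a feel for the kind of ad-hoc question a SQL engine makes easy, here is a minimal sketch that counts failed runs per error code. It is only an illustration: Python's built-in sqlite3 module stands in for the distributed SQL engine, and the `runs` table, its columns, and the sample rows are all hypothetical, not taken from the actual system.

```python
import sqlite3

# Hypothetical run metadata: (run_id, instrument_id, status, error_code, data_gb).
# These rows are made up for illustration only.
RUNS = [
    ("run-001", "seq-A", "completed", None, 120.5),
    ("run-002", "seq-A", "failed", "E_FLUIDICS", 80.0),
    ("run-003", "seq-B", "failed", "E_FLUIDICS", 95.2),
    ("run-004", "seq-B", "failed", "E_TEMP", 60.1),
    ("run-005", "seq-C", "completed", None, 110.0),
]


def failed_runs_by_error(rows):
    """Return (error_code, failure_count) pairs, most frequent first."""
    # In-memory SQLite is a stand-in for the distributed SQL engine.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE runs ("
            "run_id TEXT PRIMARY KEY, instrument_id TEXT, "
            "status TEXT, error_code TEXT, data_gb REAL)"
        )
        conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?, ?)", rows)
        cursor = conn.execute(
            """
            SELECT error_code, COUNT(*) AS failures
            FROM runs
            WHERE status = 'failed'
            GROUP BY error_code
            ORDER BY failures DESC, error_code
            """
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for error_code, failures in failed_runs_by_error(RUNS):
        print(error_code, failures)
```

The same query shape (filter on run status, group by a failure attribute) would carry over to a SQL-on-Hadoop engine, with only the connection and table definitions changing.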

Our solution ingests hundreds of sequencing runs and processes hundreds of gigabytes of data daily, enabling the global support team to investigate failed runs in a timely manner. Our data model applies machine learning to large data sets to surface actionable information. The visualization dashboards make it easy to navigate this information and understand the root cause of failures.

Big Data and Genomics

It is often said that if the 20th century belonged to silicon, the 21st century will belong to biology. New discoveries are being made in genomics and molecular biology at an unmatched pace because of the depth of data that can now be derived from cells. Every day, we are uncovering new knowledge about cancer, genetic diseases, and emerging areas, thanks to next-generation sequencing systems that are accessible to enterprises of all sizes.

The human genome contains roughly 20,000 protein-coding genes and about 3 billion base pairs. Sequencing human genomes at scale can generate petabytes of data. Analyzing such large data sets requires inference at scale, for example protein structure prediction, cancer classification based on microarray data, and clustering of gene expression data. The Human Genome Project, the first effort to sequence a human genome, took more than a decade to complete at a cost of about $3 billion. Now, with the help of big data science, the whole process can be completed in days at a cost of only thousands of dollars.