Next-generation sequencing (NGS) is a technology for determining DNA/RNA sequences across whole genomes or specific regions of interest at a far lower cost than traditional sequencing, and it has been a major disruptor in the biotech space. With NGS data having a major impact on genomics research, it was great to sit down and discuss the topic in detail with Joseph Pearson, Global Product Manager, OmicSoft at QIAGEN, on a recent episode of Data in Biotech.
Joseph came to his current position from a largely academic background, focusing on bioinformatics and gene regulation research and completing his PhD in Biology at UC San Diego. Following his time in academia, he joined OmicSoft, then a small bioinformatics company, as part of its tech support team, working with customers to answer scientific questions. After QIAGEN acquired OmicSoft, he progressed into product management roles, where he and his team focus on making data more useful and valuable for scientists.
Joseph gave us a detailed walkthrough of how the OmicSoft solution works, the pain points its customers face, and how access to this type of data and the relevant analytics tools is helping biotech organizations push their development forward. If you have a spare 30 minutes and want to listen to Joseph’s interview in full, you can find it here; these are the highlights.
Introduction to OmicSoft and QIAGEN (3:50): OmicSoft is a dedicated division within QIAGEN that covers the full spectrum of bioinformatics needs. The platform provides a full suite of tools and resources: developing and accessing bioinformatics pipelines; cleaned, prepared, and normalized data; and data analysis, interpretation, and integration, supporting both basic research and drug discovery. OmicSoft focuses on curated examples of NGS analysis, especially RNA sequencing, and on the organization, management, and unification of NGS datasets so that scientists can make full use of the data.
The Importance of Data Quality Control (09:32): Joseph notes that bioinformatics questions become harder as they require more datasets to be integrated. A key roadblock is assessing and controlling the quality of data submitted to public repositories. Inconsistent spelling in sample labels, or treatment and time-point fields that have been swapped, will give the wrong answer if taken at face value. There is a real need to clean up the data ahead of analysis, and that curation is a key part of what OmicSoft offers its customers, alongside the analysis tools.
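To make the curation problem concrete, here is a minimal sketch of the kind of metadata harmonization involved, written in Python with pandas. This is not OmicSoft’s actual pipeline; the synonym table and sample annotations are invented for illustration:

```python
# A minimal, hypothetical sketch of metadata harmonization ahead of analysis.
# This is not OmicSoft's pipeline; the synonym table and sample annotations
# below are invented to illustrate the kind of cleanup curation involves.

import pandas as pd

# Controlled vocabulary: map free-text variants to one canonical term.
TREATMENT_SYNONYMS = {
    "untreated": "control",
    "ctrl": "control",
    "vehicle": "control",
    "tnf-alpha": "TNF-a",
    "tnf alpha": "TNF-a",
    "tnfa": "TNF-a",
}

def normalize_treatment(label: str) -> str:
    """Trim, lower-case, and map a free-text treatment label to a canonical term."""
    key = label.strip().lower()
    return TREATMENT_SYNONYMS.get(key, key)

# Sample annotations as they might arrive from a public repository.
samples = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3", "S4"],
    "treatment": ["Untreated", "TNF-alpha ", "ctrl", "tnfa"],
    "time": ["24", "24h", "48", "48 hr"],
})

samples["treatment"] = samples["treatment"].map(normalize_treatment)
# Strip unit suffixes so time points compare numerically, not as strings.
samples["time_hr"] = samples["time"].str.extract(r"(\d+)", expand=False).astype(int)

print(samples[["sample_id", "treatment", "time_hr"]])
```

Real-world curation goes far beyond this, mapping free text to controlled vocabularies at scale, but even a toy example shows why misspelled or swapped labels break downstream comparisons if taken at face value.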
Adding Value with APIs (17:25): We discussed how OmicSoft has created a flexible API that allows customers to query and analyze data. One of the surprising things Joseph noted was that customers who had had access to the full flat files for years engaged differently with the data once given access via APIs. Although they could have answered their questions by downloading the data as tabular files, many opted to use the APIs to query OmicSoft’s hosted data. The API let them focus on scripting rather than database management and derive value in new ways.
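As a rough illustration of that pattern, a query against a hosted API might look like the sketch below. The endpoint, parameters, and response shape here are hypothetical, not the actual OmicSoft API:

```python
# A hypothetical illustration of querying hosted data via an API instead of
# managing flat files locally. The endpoint, parameters, and response shape
# are invented for this sketch and are not the actual OmicSoft API.

import requests

BASE_URL = "https://example.com/api/v1"  # placeholder host

def get_expression(gene: str, disease: str) -> list:
    """Ask the hosted, curated store one question: a gene's expression in a disease context."""
    resp = requests.get(
        f"{BASE_URL}/expression",
        params={"gene": gene, "disease": disease},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["records"]

# The flat-file alternative means first downloading, storing, and indexing
# the full matrix yourself before asking the same question, e.g.:
#   df = pd.read_parquet("expression_matrix.parquet")
#   df[(df.gene == "TP53") & (df.disease == "melanoma")]
records = get_expression("TP53", "melanoma")
for record in records[:5]:
    print(record)
```

The contrast Joseph described is visible here: the API version simply asks a question of the hosted store, while the flat-file route first takes on downloading, storing, and refreshing the full dataset.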
Build vs. Buy (22:44): Joseph explains that there isn’t a simple answer to which approach is right; it depends on both the customer and the team they have in place. If they have been burned by trying to self-build in the past, they may come straight to a provider like QIAGEN rather than developing a custom solution. Other customers may look for a turnkey solution at first and then realize they need more customization, and that their in-house data team needs the raw data to integrate with assets being developed internally. OmicSoft strives to provide the flexibility to meet customers where they are in their bioinformatics journey, from enabling smaller ad hoc analyses that answer business-critical questions to integrating OmicSoft data into a customer’s own data warehouse.
Vision for the Future (33:46): When asked about his vision for the future, Joseph focused on removing as many barriers as possible that stand between scientists and good data. It aligns with the mission of OmicSoft: make it easy for people to get real insights out of massive data repositories and use those insights to advance research. That means removing the friction that frequently prevents teams from getting value from the data: cleaning and normalizing the metadata associated with each dataset and exposing it through a consistent interface.
Further Reading: For anyone interested in diving a little deeper into the QIAGEN solutions Joseph discussed on the podcast, he recommends the website as a starting point for additional resources, insights, and webinars that go into more detail than we had time to discuss on the episode.
One of the key topics of the discussion with Joseph was the need for data quality and curated data, and we looked in detail at how OmicSoft cleans up public and proprietary datasets so bioinformaticians can get the most out of them.
As we continue the conversation this week, we want to turn the spotlight onto in-house data and exactly how an organization can improve its data competency and ensure the valuable data it has already gathered is put to use. At CorrDyn, we specialize in helping our clients formulate and implement a data strategy that turns disparate data into useful insights.
Identify your primary strategic objective(s): Whatever steps you take with your data should focus on the highest-value opportunities it can unlock. Do you have the data you need, but it is inadequately cleaned, prepared, and integrated? Do you still need to acquire the data in the correct format? Or do you need to democratize access to, and analysis of, the data assets you already have?
Identify skills gap(s): Does your in-house team have the skills to undertake the required data initiative(s)? We often find that although teams have a wealth of scientific knowledge in-house, and possibly even the development expertise to accomplish all of their goals given infinite time and resources, they don’t necessarily have the time or expertise to build the data pipelines and workflows needed to be effective on the timeline required. Acknowledging this and finding a partner with the right experience is a game changer when it comes to achieving data competency.
Where to start: You know your strategic objectives, but breaking them down into an achievable roadmap is not always straightforward. Identify the quick wins: where will better data, or better management of data, have the biggest immediate impact? How can you build momentum toward strategic objectives by delivering value as soon as possible? Defining this roadmap from the outset helps keep your data transformation on track.
Selecting the right tools: There are dozens, if not hundreds, of ways to approach data pipelines and workflows, but the key is selecting the right option for your organization. Will it fulfill all of your strategic and functional requirements? Does it work within your budget constraints? Does it give you the flexibility to accommodate changing requirements and the organization’s potential development paths? At CorrDyn, we help our customers select the tools that deliver the greatest ROI and the shortest time to value, based on the use cases they are trying to address.
This is by no means an exhaustive guide to getting a data competency project off the ground, but it is a good place to start. Having the right people in place, with the expertise to ensure success, is critical, which is why it is often sensible to work with a specialist data partner.
With over 50% of the delivery team at CorrDyn coming from a science background, many with advanced studies in physics, chemistry, and biology, our team excels at understanding the data within your organization and helping you make the most of it.
If you’d like to speak to CorrDyn about how to make the most of your in-house data and use it to support your business goals, get in touch for a free SWOT analysis.
Want to listen to the full podcast? Listen here: