Clearing the Technology Path to Clinical Data Science

The Society for Clinical Data Management (SCDM) recently published “The Evolution of Clinical Data Management into Clinical Data Science,” which defines the term “clinical data science” and provides guidance on creating a clinical data science organization.

They define clinical data science as follows (abridged):

Clinical Data Science can be defined as the strategic discipline enabling the execution of complex protocol designs in a patient-centric, data-driven, and risk-based approach, ensuring subject protection as well as “the reliability and credibility of trial results”.

Clinical Data Science broadens [the field of Clinical Data Management] by adding the data risk, data meaning, and value dimensions for achieving data quality (i.e., data is credible and reliable). Clinical Data Science also expands the scope of Clinical Data Management beyond the study construct by requiring the ability to generate knowledge and insights from clinical data to support other clinical research activities which require different expertise, approaches, and technologies.

While we wholeheartedly agree with this terminology and the growing movement of clinical data management toward clinical data science, we noticed that there was a key component missing from consideration in this paper: what technology is necessary to facilitate this transition from clinical data management to clinical data science? How can an organization reliably “generate knowledge and insights” with today’s solutions? If this were a simple thing, it would have happened a long time ago. However, technology that facilitates this important transition has only recently become available.

Understanding the roadblocks to clinical data science

Clinical data management has always been about minimizing data risk and maximizing data quality, but these tasks have always required significant effort and time. Typically, in the lead-up to database lock, data managers would consolidate data from multiple source systems to prepare it for analysis. However, the tools for this were typically not built for the industry, so data from a growing number of sources was not easily brought into contextual alignment. The connections used to aggregate the data were fragile, and manual intervention was often required to translate millions of data points into a unified structure for analysis.

In a nutshell:

  1. There has been an explosion of data sources as trial complexity has increased. Consequently, there is a significantly greater volume of disparate data that must be consolidated and cleaned before analysis can begin.
  2. The tools that existed did not make this easy: each data source defined data differently, and the databases intended to consolidate the data lacked the contextual understanding needed to align it.
  3. Connections between these systems were often custom-built, meaning that they were also fragile. If incoming data arrived in an unexpected format, they frequently broke and manual intervention was required.
  4. Putting together this patchwork of systems and connectors was complex and time-consuming, meaning data managers typically had little spare time to analyze the resulting dataset before moving on to manage data for the next study.
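
To make points 2 and 3 concrete, here is a minimal Python sketch of a custom-built connector of the kind described. It is purely illustrative: the source names, field names, and the `harmonize` function are hypothetical and do not describe any particular system. Each source names the same concepts differently, so a hand-written mapping is needed per source, and any field the mapping does not anticipate stops the load.

```python
# Hypothetical per-source mappings from source field names to one unified structure.
FIELD_MAPS = {
    "edc": {"SUBJID": "subject_id", "VISIT": "visit", "SYSBP": "systolic_bp"},
    "lab": {"patient": "subject_id", "visit_name": "visit", "glucose_mg_dl": "glucose"},
}


def harmonize(source, row):
    """Rename one source row's fields into the unified structure.

    Raises ValueError when the row contains a field the mapping does not know,
    which is the point where a brittle connector needs manual intervention.
    """
    mapping = FIELD_MAPS[source]
    unknown = set(row) - set(mapping)
    if unknown:
        raise ValueError(f"unexpected fields from {source}: {sorted(unknown)}")
    return {mapping[name]: value for name, value in row.items()}


# Two sources describing the same subject and visit with different field names.
edc_row = harmonize("edc", {"SUBJID": "S001", "VISIT": "V1", "SYSBP": 128})
lab_row = harmonize("lab", {"patient": "S001", "visit_name": "V1", "glucose_mg_dl": 92})
```

Every new source, and every format change in an existing one, means editing a mapping like this by hand, which is where the time goes.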

Clearing the path to clinical data science

Each of these problems needs an answer if we are ever going to evolve clinical data management toward clinical data science. To do so, we need to make data managers’ lives easier and free up time for data analysis.

We had these problems in mind when we began developing Veeva Clinical Database (CDB), a data management platform that aggregates all internal and third-party clinical data sources to enable targeted reviews and automated cleaning across all study data. Veeva CDB is industry-built to understand the types of data coming into the system and provides unique and comprehensive tools to make the process of consolidating and cleaning the data easier.

Among the capabilities already available with Veeva CDB:

  1. Robust data ingestion and aggregation capabilities. With our Veeva CDB Data Providers Partner Program, we are able to bring in data from various sources with confidence because Veeva has simplified and standardized data exchange with our Partner Providers, increasing reliability.
  2. Automated data checking that runs whenever data comes into Veeva CDB. Similar in concept to EDC data validation checks, CDB can automatically generate and close queries, saving data managers time and manual effort.
  3. Automatic detection of data changes, prompting review only when data is new or has been updated.
  4. An industry-specific database and syntax called clinical query language (CQL). Similar to structured query language (SQL), it understands the context of your clinical data and allows you to efficiently retrieve and format your data across multiple data sources.
  5. A data workbench that allows you to record and manage discrepancies across all study data with ease.
  6. A centralized place for data providers to log in and address discrepancies directly, without having to pull queries into an external spreadsheet outside of the system.
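
The ideas behind items 2 and 3 can be sketched generically in Python. This is a minimal illustration under stated assumptions, not how Veeva CDB is implemented: the record layout, the `check_systolic_bp` rule, and the function names are all hypothetical. Each incoming record is fingerprinted so that unchanged data is skipped, and checks run only on new or updated records, opening a query when a check fails and closing it once corrected data passes.

```python
import hashlib
import json


def fingerprint(record):
    """Return a stable hash of a record's content, used to detect changes."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def check_systolic_bp(record):
    """Hypothetical check: systolic blood pressure must be 60-250 mmHg."""
    value = record.get("systolic_bp")
    if value is None or not 60 <= value <= 250:
        return "systolic_bp missing or out of range"
    return None


def process_batch(batch, seen, open_queries, checks):
    """Check new or changed records and return the keys that need review.

    seen maps record key -> fingerprint of the last loaded version.
    open_queries maps record key -> list of open query messages.
    """
    needs_review = []
    for key, record in batch.items():
        digest = fingerprint(record)
        if seen.get(key) == digest:
            continue  # unchanged since the last load: no re-review needed
        seen[key] = digest
        needs_review.append(key)
        messages = [m for m in (check(record) for check in checks) if m]
        if messages:
            open_queries[key] = messages  # open (or refresh) a query
        else:
            open_queries.pop(key, None)  # close any query the new data resolves
    return needs_review


seen, queries = {}, {}
first = {"S001-V1": {"systolic_bp": 300}, "S002-V1": {"systolic_bp": 120}}
print(process_batch(first, seen, queries, [check_systolic_bp]))   # ['S001-V1', 'S002-V1']
second = {"S001-V1": {"systolic_bp": 130}, "S002-V1": {"systolic_bp": 120}}
print(process_batch(second, seen, queries, [check_systolic_bp]))  # ['S001-V1']
print(queries)                                                    # {}
```

On the second load only the corrected record is flagged for review, and its query closes automatically because the new value passes the check.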

These capabilities create significant possibilities for data managers. For one, many of the most onerous manual processes will be automated: data consolidation and harmonization, automated data checking that focuses scrutiny only where it is needed, and data cleaning and validation. In practical terms, the data can be ready for lock (that is, ready for analysis) at any point during the study.

Data managers will be able to spend less time checking the data for accuracy and more time exploring what the data can reveal. Until now, this simply wasn’t possible because the tools did not exist to give data managers enough free time. But as tools like Veeva CDB become the industry standard, clinical data management will truly begin to evolve toward clinical data science, and the entire industry will benefit.

Learn more about how Veeva CDB can help enable your data science organization.
