{"id":74825,"date":"2023-12-08T23:03:03","date_gmt":"2023-12-08T22:03:03","guid":{"rendered":"https:\/\/www.veeva.com\/eu\/?p=74825"},"modified":"2026-01-22T11:13:07","modified_gmt":"2026-01-22T10:13:07","slug":"clinical-data-needs-smart-automation-now-for-a-better-ai-future","status":"publish","type":"post","link":"https:\/\/www.veeva.com\/eu\/blog\/clinical-data-needs-smart-automation-now-for-a-better-ai-future\/","title":{"rendered":"Clinical Data Needs Smart Automation Now for a Better AI Future"},"content":{"rendered":"<p>Our industry has successfully used artificial intelligence (AI) and machine learning (ML) in <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC10302550\/\" rel=\"noopener noreferrer\" target=\"_blank\">drug discovery<\/a> for target identification, molecular simulations, predictions of drug properties, and other applications. These incorporate petabytes of molecular, rather than patient, data to draw out patterns and find \u201cneedles in haystacks.\u201d Companies like Novo Nordisk utilize AI methods to <a href=\"https:\/\/sloanreview.mit.edu\/audio\/from-data-to-wisdom-novo-nordisks-tonia-sideri\/\" rel=\"noopener noreferrer\" target=\"_blank\">bring the data to life<\/a>, and improve quality control and commercialization.<\/p>\n<p>In clinical study execution, however, the practical uses of AI and ML are still very nascent. 
Unlike drug discovery data, clinical data (with up to <a href=\"https:\/\/www.globenewswire.com\/news-release\/2021\/01\/12\/2157143\/0\/en\/Rising-Protocol-Design-Complexity-Is-Driving-Rapid-Growth-in-Clinical-Trial-Data-Volume-According-to-Tufts-Center-for-the-Study-of-Drug-Development.html\" rel=\"noopener noreferrer\" target=\"_blank\">10M data points<\/a> in a typical phase III trial) doesn\u2019t usually represent \u201cbig data\u201d or reach the levels needed to train complex machine learning models.<\/p>\n<p>We can radically improve our Clinical Data Management (CDM) processes now, but AI isn\u2019t the proven answer (yet). A wave of recent advancements in automation and augmentation has reduced cycle times and cost while improving first-time quality. Amid the AI hype, however, we under-utilize these smart automation use cases that save human effort, time, and money.<\/p>\n<p>Currently, knowledge of AI lags far behind the <a href=\"https:\/\/www.gartner.com\/en\/articles\/what-s-new-in-artificial-intelligence-from-the-2023-gartner-hype-cycle\" rel=\"noopener noreferrer\" target=\"_blank\">peak hype<\/a> and experimentation that surround it. Business and IT leaders stress the importance of early investment in AI\/ML and naturally want to show forward-thinking approaches within their organizations. Vendors leverage the market\u2019s desire for AI\/ML by messaging it everywhere. Meanwhile, a recent <a href=\"https:\/\/learning-scdm.org\/courses\/60816\" rel=\"noopener noreferrer\" target=\"_blank\">SCDM webinar<\/a> poll on Smart Automation revealed that 54% of clinical data attendees had a low understanding of AI\/ML terms. 
I hope to dispel some confusion around this.<\/p>\n<p>There is <a href=\"https:\/\/eclinicalforum.org\/downloads\/white-paper-released-by-the-artificial-intelligence-and-machine-learning-working-group\" rel=\"noopener noreferrer\" target=\"_blank\">future AI\/ML benefit for CDM<\/a>, especially in use cases that assist humans and augment decision making. Magda Jaskowska, Global Oncology Data Management at GSK, said during the <a href=\"https:\/\/learning-scdm.org\/courses\/60816\" rel=\"noopener noreferrer\" target=\"_blank\">SCDM webinar<\/a>: \u201cIs AI\/ML hype or not? I\u2019m on the realistic side. There are limitations, but there are also long term opportunities, especially with use cases that keep the \u2018human in the loop.\u2019 The person needs to be the final decision maker and is responsible for it.\u201d<\/p>\n<p>AI\/ML in CDM should be purposeful since it adds cost, complication, change management, and risk. We need to carefully consider the ethical questions, such as who is responsible for errors when the study software is learning. Additionally, since AI\/ML is unregulated today, we must work with regulatory bodies to confirm fit-for-purpose use cases.<\/p>\n<p>It is an expensive and urgent race to get clean and analysis-ready data. We can\u2019t afford to become overly distracted by AI theory. <em>I hope we find a better balance of optimizing ROI and cleaning data through automation, while also preparing for practical AI use cases<\/em>.<\/p>\n<h2>Clean data as a fundamental requirement<\/h2>\n<p>When Forbes asked Vas Narasimhan, CEO of Novartis, about AI and ML in 2019, he said their team \u201chad to spend most of the time just cleaning the data sets before you can even run the algorithm. That\u2019s taken us years just to clean the datasets. 
I think people underestimate how little clean data there is out there, and how hard it is to clean and link the data.\u201d While numbers vary greatly, time spent on data prep is commonly cited as <a href=\"https:\/\/www.mckinsey.com\/capabilities\/quantumblack\/our-insights\/rethinking-ai-talent-strategy-as-automated-machine-learning-comes-of-age\" rel=\"noopener noreferrer\" target=\"_blank\">60-80%<\/a> of a data scientist&#8217;s time when developing models.<\/p>\n<p>This challenge of clean data has only been exacerbated by the complexity of study protocols and data sources. Even today, it is far too manual and resource-intensive for many biopharmas. <em>Cleaning data is not a fit-for-purpose use case for AI\/ML, yet it is a requirement for future AI\/ML use cases<\/em>.<\/p>\n<h2>Distinguish between AI and automation: putting automation first<\/h2>\n<p>There is no denying that our industry is at a crossroads. We can no longer manage the volume of data and processes without automation. Automation reduces manual effort \u2013 including data cleaning effort \u2013 and delivers intelligence today. We saw a few of these examples at SCDM, such as automated data quality checks that raise discrepancies across all trial data sources in bulk. More innovative examples will emerge.<\/p>\n<p>However, some organizations call these smart use cases \u201cAI\u201d instead of \u201cautomation.\u201d In an SCDM session, a speaker presented \u201cAI reconciliation\u201d and, when pressed, suggested that this could be accomplished with automation. So, why label real automation use cases with AI? These types of claims are inaccurate and generate confusion.<\/p>\n<h2>The need for a common language<\/h2>\n<p>Next, consider AI and ML, which are so often combined in the term AI\/ML. It is important to define them separately. Artificial Intelligence (AI) is the ability of technology to mimic aspects of human intelligence. 
Underneath the AI umbrella, familiar terms relate to what AI delivers (e.g., Natural Language Processing) and how AI delivers them (e.g., Machine Learning).<\/p>\n<p>When distinguishing AI from automation, think about correlation vs. causation. Automation is the \u201cgear\u201d in the system that reliably evokes an effect (action) from a cause (input). In contrast, AI is the \u201cbrain\u201d that finds correlation and learns patterns. But AI does not know the cause, nor does it necessarily produce a repeatable effect.<\/p>\n<div style=\"display: flex; justify-content: center; align-items: flex-start; flex-wrap: wrap; row-gap: 30px;\">\n\t<img decoding=\"async\" class=\"img-left img-responsive\" src=\"\/eu\/wp-content\/uploads\/2023\/12\/smart-automation-1.png\" alt=\"\"><\/p>\n<p>\t<img decoding=\"async\" class=\"img-right img-responsive\" src=\"\/eu\/wp-content\/uploads\/2023\/12\/smart-automation-2.png\" alt=\"\">\n<\/div>\n<p>It may be useful to develop a common language around a few established terms. Magda and I discussed these terms, along with relevant use cases, in this SCDM <a href=\"https:\/\/learning-scdm.org\/courses\/60816\" rel=\"noopener noreferrer\" target=\"_blank\">webinar<\/a>. I will delve further into these terms in future blogs.<\/p>\n<h2>Five Key Terms<\/h2>\n<h3>Rule-based Automation (not AI)<\/h3>\n<p>Currently, most automations that users encounter will have been implemented via classic logical, \u201cif\/then\u201d rule-based algorithms. These are written by a human in a programming language and range from simple rules to an optimized combination of smart rules that automate process flows. Being rule-based, these automations yield the same result every time.<\/p>\n<p>Clear business problems (usually involving fewer than 100 rules) are solved fastest and most reliably with rule-based automation.<\/p>\n<p>The clinical research industry operates with great respect for rules. 
We execute with a robust framework of SOPs and work instructions, and our systems are designed with strict adherence to logical workflows and statuses.<\/p>\n<h3>Robotic Process Automation (not AI)<\/h3>\n<p>Robotic Process Automation (RPA) repeats tasks that require little critical thinking, thus saving time. Automation software, or \u201cbots,\u201d emulates the actions of humans by clicking buttons and entering data into fields to carry out error-free tasks at high volume and speed.<\/p>\n<p>RPA is able to record tasks performed by a human on their computer, then perform those same tasks without human intervention. It is trained to emulate specific user actions, but it does not \u201clearn\u201d using mathematical modeling, so this is not an example of ML. Confusing matters is the fact that RPA processes are sometimes combined with AI methodologies to increase their utility, a combination termed \u201cIntelligent Automation.\u201d<\/p>\n<h3>Machine Learning<\/h3>\n<p>Machine learning (ML) uses mathematical models to develop algorithms from data whilst improving those models via either supervised or unsupervised processes. ML is typically used where development of similar algorithms by human programmers would be cost-prohibitive. One example is a computer system that has contextual understanding of the English language, such as ChatGPT.<\/p>\n<h3>Natural Language Processing (NLP) &#038; Large Language Models (LLMs)<\/h3>\n<p>A Large Language Model (LLM) is the most familiar method by which Natural Language Processing (NLP) is delivered. NLP is the &#8220;what&#8221;: the ability of computers to understand text and spoken words, as in voice recognition. LLMs are the &#8220;how&#8221;: deep learning algorithms that are trained to process and generate text, as in document generation for data review plans. 
In the CDM space, NLP and LLMs can be used to support natural language interaction of a Data Manager with their <a href=\"https:\/\/globalforum.diaglobal.org\/issue\/june-2023\/seven-best-practices-for-adopting-a-clinical-data-workbench\/\" rel=\"noopener noreferrer\" target=\"_blank\">Clinical Data Workbench<\/a> system, reducing the technical barriers required to interrogate and manipulate the data within the Workbench. LLMs can also be used as part of an automated document generation process, e.g., for Data Review Plans.<\/p>\n<h3>Generative AI<\/h3>\n<p>Generative AI is a specific type of AI functionality that is capable of generating text, images or other media using generative models. ChatGPT is an example of generative AI applied to Natural Language Processing. Generative AI is most commonly delivered using artificial neural networks, which are themselves a sub-category of ML, since they learn. Generative AI not only draws conclusions from data but can also predict outcomes and prescribe solutions based on given criteria.<\/p>\n<h3>We have near-term opportunity with smart automation<\/h3>\n<p>Smart automation describes the application of any technology that leverages a deep understanding of both physical processes and volumes of data to automate traditionally laborious human activity. As study designs and data sources increase in scale and complexity, clinical data managers need more smart automation to ensure quality and efficiency.<\/p>\n<h2>Conclusion<\/h2>\n<p>There are no silver bullets that will lift us to where we want to be in the future. Meanwhile, we must be smart about our investments today. I hope to normalize conversations about smart automation use cases that are available \u2013 but under-utilized \u2013 to save us effort, time, and money with less risk. 
We must focus our collective knowledge to maximize the value of available technology and processes, while in parallel, ruthlessly prioritize attention toward future solutions. In the next few years, I expect that rule-based automation will contribute the most towards cleaning data, while also feeding that quality data to AI models that show early signs of value.<\/p>\n<p>As a CDM industry, we must balance the \u201cnow\u201d that demands efficiency and value with an AI future that requires better data. To watch the free on-demand recording of a webinar on this topic, visit <a href=\"https:\/\/learning-scdm.org\/courses\/60816\" rel=\"noopener noreferrer\" target=\"_blank\">SCDM\u2019s learning portal<\/a> (if you are not a member of SCDM, simply create an account via the \u2018Friend of SCDM\u2019 option).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI promises to improve clinical trial efficiency, but it can only work from a foundation based on clean data and smarter use of automation.<\/p>\n","protected":false},"author":346,"featured_media":72672,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"product":[982,983],"area":[970],"coauthors":[1565],"class_list":["post-74825","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","product-clinical-database-cdb","product-edc","area-clinical-data","blog-area-rd","blog-product-data","blog-html-content-yes"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.veeva.com\/eu\/wp-json\/wp\/v2\/posts\/74825"}],"collection":[{"href":"https:\/\/www.veeva.com\/eu\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.veeva.com\/eu\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.veeva.com\/eu\/wp-json\/wp\/v2\/users\/346"}],"replies":[{"embeddable":true,"href":"https:\/\/www.veeva.com\/eu\/wp-json\/wp\/v2\/comments?post=7
4825"}],"version-history":[{"count":5,"href":"https:\/\/www.veeva.com\/eu\/wp-json\/wp\/v2\/posts\/74825\/revisions"}],"predecessor-version":[{"id":92951,"href":"https:\/\/www.veeva.com\/eu\/wp-json\/wp\/v2\/posts\/74825\/revisions\/92951"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.veeva.com\/eu\/wp-json\/wp\/v2\/media\/72672"}],"wp:attachment":[{"href":"https:\/\/www.veeva.com\/eu\/wp-json\/wp\/v2\/media?parent=74825"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.veeva.com\/eu\/wp-json\/wp\/v2\/categories?post=74825"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.veeva.com\/eu\/wp-json\/wp\/v2\/tags?post=74825"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/www.veeva.com\/eu\/wp-json\/wp\/v2\/product?post=74825"},{"taxonomy":"area","embeddable":true,"href":"https:\/\/www.veeva.com\/eu\/wp-json\/wp\/v2\/area?post=74825"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.veeva.com\/eu\/wp-json\/wp\/v2\/coauthors?post=74825"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}