Using High-Quality Real World Data for Precision Medicine

When planning the post-approval strategy for your drug or medical device, real world data (RWD) is often the best source of evidence, whether you are monitoring for safety signals, pursuing indication expansion or supporting health technology assessment more generally. Working with data from multiple sources, such as EHRs, claims databases and pharmacy prescription data, will enable you to address your questions, but it also poses a problem: this data was collected, often over many years, to treat individual patients, so it is rarely directly usable for answering your specific questions.

While data quality begins at the source, i.e. with the people who collect the data from patients and in trials in the first place, much can be done afterwards to ensure that the data sources provide a wealth of machine-readable information that can be combined with other data sets to increase volume and ensure interoperability.

Common standards, such as SNOMED CT, ICD-10, LOINC and RxNorm, already go some way towards ensuring that data can be interpreted and combined in a consistent manner. However, they do not guarantee that analytical projects in statistics, machine learning and/or AI can interpret the content without difficulty. Additional data standardization, such as adopting a common data model like OMOP, usually helps in overcoming these hurdles.

Making data compatible is especially important because the number of patients matching your search criteria can be small. To reach scientifically appropriate volumes, CROs often gather their data from several sources, each with its own formats and standards, which makes compatibility challenging.
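To make the compatibility problem concrete, here is a minimal Python sketch of bringing two differently formatted sources into one common record layout. Everything in it is an assumption made for illustration: the field names, the date formats and the `Diagnosis` record are hypothetical, and the layout is far simpler than a real common data model such as OMOP.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Diagnosis:
    """Hypothetical common record; not an actual OMOP table."""
    patient_id: str
    icd10_code: str      # stored without the dot, in upper case
    diagnosed_on: date
    source: str


def normalize_icd10(code: str) -> str:
    # "I48.0", "i480" and " I480 " all become "I480".
    return code.replace(".", "").strip().upper()


def from_ehr(row: dict) -> Diagnosis:
    # Assumed EHR export: ISO dates (YYYY-MM-DD) and dotted codes.
    return Diagnosis(
        patient_id=str(row["patient"]),
        icd10_code=normalize_icd10(row["dx"]),
        diagnosed_on=date.fromisoformat(row["date"]),
        source="ehr",
    )


def from_claims(row: dict) -> Diagnosis:
    # Assumed claims export: DD/MM/YYYY dates and undotted codes.
    return Diagnosis(
        patient_id=str(row["member_id"]),
        icd10_code=normalize_icd10(row["diag_code"]),
        diagnosed_on=datetime.strptime(row["service_date"], "%d/%m/%Y").date(),
        source="claims",
    )


def harmonize(ehr_rows, claims_rows):
    """Map both sources to the common record and sort by patient and date."""
    records = [from_ehr(r) for r in ehr_rows] + [from_claims(r) for r in claims_rows]
    return sorted(records, key=lambda r: (r.patient_id, r.diagnosed_on))


if __name__ == "__main__":
    ehr = [{"patient": 1, "dx": "I48.0", "date": "2021-03-05"}]
    claims = [{"member_id": "1", "diag_code": "i480", "service_date": "01/02/2021"}]
    for record in harmonize(ehr, claims):
        print(record)
```

Once both sources share one layout, the same patient and the same diagnosis code can be matched across them. In practice the mapping rules for each source are where most of the effort and most of the risk of error sit.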

Bringing the required data sets together is tedious, time-consuming and prone to human error. MediSapiens’ Accurate© can, however, significantly automate and speed up this process, helping you to complete your projects faster. In addition, Accurate© adds a layer of quality assurance across the data, carefully scanning for possible anomalies that the human eye does not see.

Free text, often found in patient files, is an additional challenge when bringing vital information about drug efficacy, treatment plans and other events into the verification process. Technologies such as natural language processing help here, searching the free text for specific words, terms and sentences and converting these into structured, standardized content.
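As a rough illustration of turning free text into structured content, the Python sketch below searches a note for a small list of terms and returns one structured row per match. It is a plain keyword search, not a deep learning model and not a description of any MediSapiens product. The term list and the output fields are hypothetical, and the sketch does not handle negation ("no palpitations"), misspellings or context.

```python
import re

# Hypothetical term list: surface form in the note -> standardized label.
TERMS = {
    "atrial fibrillation": "atrial_fibrillation",
    "afib": "atrial_fibrillation",
    "palpitations": "palpitations",
    "beta blocker": "beta_blocker",
}

# Longest terms first, so a longer phrase wins over a shorter one it contains.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in sorted(TERMS, key=len, reverse=True)) + r")\b",
    flags=re.IGNORECASE,
)


def extract_terms(note_id: str, text: str) -> list[dict]:
    """Return one structured row per term found in the free text."""
    rows = []
    for match in _PATTERN.finditer(text):
        rows.append({
            "note_id": note_id,
            "label": TERMS[match.group(1).lower()],
            "matched_text": match.group(1),
            "start": match.start(1),
        })
    return rows


if __name__ == "__main__":
    note = "Patient reports palpitations. Known AFib, started beta blocker."
    for row in extract_terms("note-001", note):
        print(row)
```

Each row keeps the matched text and its position, so a reviewer can trace a structured value back to the sentence it came from.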

An example of the applied use of the above processes, standards and technologies is a recent case with a remote cardiac analytics provider. This provider has access both to structured data directly from the application and to a large number of medical notes that contain valuable information for the CRO that is testing the efficiency of the application. However, the notes are unstructured and, covering 54,000 different patients, not easy for the CRO staff to analyze. MediSapiens has developed novel, specialized deep learning algorithms to analyze this unstructured medical text and bring relevant pieces of information into a structured, harmonized and curated format. This information is then computationally accessible to the CRO, providing a wealth of new information without the painstaking effort previously needed to unlock it.

The growing collection of (electronic) data sources that is becoming available, combined with MediSapiens’ extensive experience in data cleaning and quality assurance, is bringing the next step in data science. MediSapiens enables precision medicine to benefit from a high volume of topic-relevant content that can (dis)prove hypotheses according to regulatory standards and supports consistent replication and health technology assessments. Data that was previously used for a single purpose is now much easier to repurpose for new research. Even more importantly, potentially valuable information in unstructured free text can now be unlocked in much higher volumes and without the effort previously associated with it.

For more information contact us at
