What Are the Essential Features of a Powerful Data Curation Tool?



28 Mar 2018

The challenge of creating a curation and ontology mapping application is not only to make it work robustly and with high precision, but also to make it easy to use. Ease of use is especially important because the people curating clinical and phenotypic data range from students recruited for occasional project work to seasoned bioscientists whose research depends on consistent, homogeneous data for repeatable analysis. For an application used in day-to-day work, user experience is key.

When we set out to develop Accurate, we already had several key elements in hand. One was our years of experience curating and applying ontologies to large data sets for multiple clients, as well as the work done for our own databases, such as IST Online. This experience gave us insight into all the steps and pain points that can arise during these processes. In addition, we had put substantial effort into creating algorithms that significantly speed up this normally tedious and slow work and that deliver reliable ontology suggestions, unlike some of the tools we had used in the past. What we needed now was a great interface that would combine all of this in an easy-to-use application: one that lets you survey large amounts of information and then perform curation tasks and apply the ontologies of your choice at the touch of a button.

With this in mind, our data team, supported by our developers, carefully reviewed the requirements, interviewed several of our strategic partners about their needs and wishes, and then started implementing. We paid close attention to process flows, ease of use and collaborative work.

Accurate lets you create project folders that can be shared with collaborators. Once you have created a project, you upload your files and immediately get a highly visual overview of the data sets' content, which helps you decide quickly where to start. Anyone who has worked with spreadsheet data sets knows how hard it can be to get an overall view of the data; this visual presentation makes it much easier to see what needs to be done.

Each column comes with a histogram that gives you a visual summary of that column's content. When you assign a data type to a column, the histogram immediately highlights possible outliers, whether invalid dates (such as February 30th) or text entries in, for instance, an integer column. Incorrect and out-of-range values are thus easy to spot and can be given the extra attention they need.
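Flagging outliers once a column has a data type essentially means checking every value against that type, and optionally against a plausible range. The minimal Python sketch below illustrates that idea; it is not Accurate's implementation, and the function names, the ISO `YYYY-MM-DD` date format and the optional `in_range` check are assumptions chosen for illustration.

```python
from datetime import datetime
from typing import Callable, Iterable, Optional

def parse_date(value: str) -> datetime:
    # strptime rejects impossible calendar dates such as 2018-02-30
    return datetime.strptime(value, "%Y-%m-%d")

# Hypothetical mapping from an assigned column type to a strict parser;
# each parser raises ValueError when a value does not fit the type.
PARSERS: dict[str, Callable[[str], object]] = {
    "date": parse_date,
    "integer": int,
}

def find_outliers(
    values: Iterable[str],
    data_type: str,
    in_range: Optional[Callable[[object], bool]] = None,
) -> list[tuple[int, str]]:
    """Return (row_index, raw_value) for every entry failing the type or range check."""
    parse = PARSERS[data_type]
    outliers = []
    for row, raw in enumerate(values):
        try:
            parsed = parse(raw.strip())
        except ValueError:
            outliers.append((row, raw))  # wrong type, e.g. text in an integer column
            continue
        if in_range is not None and not in_range(parsed):
            outliers.append((row, raw))  # right type, but outside the plausible range
    return outliers

print(find_outliers(["2018-02-28", "2018-02-30", "n/a"], "date"))
# → [(1, '2018-02-30'), (2, 'n/a')]
print(find_outliers(["34", "-2", "abc", "250"], "integer", in_range=lambda age: 0 <= age <= 120))
# → [(1, '-2'), (2, 'abc'), (3, '250')]
```

Separating the type check from the range check mirrors the distinction in the text between incorrect entries and out-of-range ones, so each kind of outlier can be reviewed on its own.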

In addition to the many assistive tools for curating, enriching and harmonizing your data, the application offers AI-driven automated ontology mapping. Ontologies ensure that all terminology is consistent, machine-readable and standardized. Because different columns may require different ontologies, you can choose your preferred ontology for each column, including ones you have developed yourself. So while you might use ChEBI for drug names, for instance, you can apply SNOMED CT to diseases. In the blink of an eye, your columns are mapped to the best match. If an automatic suggestion needs to be changed for any reason, you can easily pick an alternative from the top five recommendations or search the ontology tree. And thanks to the built-in smart memory and machine learning capabilities, subsequent data sets in your project benefit from your earlier choices, which are applied automatically, saving you more time with every data set and ensuring consistency across your projects.
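The workflow described here, ranking the top five candidate terms and then remembering the curator's confirmed choice for later data sets, can be sketched in a few lines. Accurate's matching is described as AI-driven; this sketch deliberately swaps in plain string similarity from Python's standard `difflib` module just to show the workflow. The `OntologyMapper` class, its methods and the small term list are hypothetical and do not come from any real ontology release.

```python
from difflib import SequenceMatcher

class OntologyMapper:
    """Toy mapper: difflib string similarity stands in for AI-driven matching."""

    def __init__(self, terms: list[str], top_k: int = 5):
        self.terms = terms
        self.top_k = top_k
        self.memory: dict[str, str] = {}  # normalized raw value -> confirmed term

    @staticmethod
    def _normalize(value: str) -> str:
        # Lower-case and collapse whitespace so "ASA" and " asa " share one key
        return " ".join(value.lower().split())

    def suggest(self, raw: str) -> list[str]:
        key = self._normalize(raw)
        if key in self.memory:
            # A previously confirmed choice is reused instead of re-ranking
            return [self.memory[key]]
        ranked = sorted(
            self.terms,
            key=lambda term: SequenceMatcher(None, key, term.lower()).ratio(),
            reverse=True,
        )
        return ranked[: self.top_k]

    def confirm(self, raw: str, term: str) -> None:
        # Record the curator's choice so later data sets map the same way
        self.memory[self._normalize(raw)] = term

mapper = OntologyMapper(
    ["aspirin", "acetylsalicylic acid", "paracetamol", "ibuprofen", "caffeine", "codeine"]
)
print(mapper.suggest("Aspirine")[0])  # → aspirin
mapper.confirm("ASA", "acetylsalicylic acid")
print(mapper.suggest(" asa "))        # → ['acetylsalicylic acid']
```

Checking the memory before ranking is what makes a curator's correction stick: once "ASA" has been confirmed, every later occurrence maps the same way without another round of suggestions.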

All of these steps are tracked in a simple yet effective history panel, which ensures data provenance and a complete audit trail. You can not only review each step you have taken but also undo specific actions if needed. Your curation and ontology mapping process is thus fully tracked, documented and compliant with your quality assurance processes.
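A history that provides both a complete audit trail and selective undo can be modeled as an append-only log in which an undo is recorded as a new entry rather than erasing the old one. The Python sketch below assumes a table stored as a dictionary of cells; `CuratedTable`, `Entry` and the rule that refuses to undo a cell changed by a later action are hypothetical choices for illustration, not Accurate's design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Entry:
    action_id: int
    description: str
    cell: tuple[int, str]  # (row, column)
    old: object            # None if the cell did not exist before
    new: object
    timestamp: datetime

@dataclass
class CuratedTable:
    cells: dict[tuple[int, str], object]
    history: list[Entry] = field(default_factory=list)

    def set_cell(self, row: int, column: str, value: object, description: str) -> int:
        cell = (row, column)
        entry = Entry(len(self.history), description, cell, self.cells.get(cell),
                      value, datetime.now(timezone.utc))
        self.cells[cell] = value
        self.history.append(entry)  # entries are only ever appended
        return entry.action_id

    def undo(self, action_id: int) -> int:
        original = self.history[action_id]
        # Refuse to silently overwrite a later change to the same cell
        if self.cells.get(original.cell) != original.new:
            raise ValueError("cell was changed by a later action; undo that first")
        # The undo is itself logged, so the audit trail is never rewritten
        return self.set_cell(*original.cell, original.old,
                             f"undo #{action_id}: {original.description}")

table = CuratedTable({(0, "diagnosis"): "diabetis"})
fix = table.set_cell(0, "diagnosis", "diabetes", "fix typo")
table.undo(fix)
print(table.cells[(0, "diagnosis")])           # → diabetis
print([e.description for e in table.history])  # → ['fix typo', 'undo #0: fix typo']
```

Because nothing is ever deleted from `history`, the log doubles as provenance: it records both the original correction and the decision to reverse it.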

Curious to see Accurate in action? Ask us for a demo, followed by a trial.


Writer: Hans Garritzen, Sales Director

hans.garritzen@medisapiens.com