How to Preserve Data Quality in Healthcare When Working with Big Data Sets
A clinical registry is only as valuable and impactful as its data is trusted and usable.
High-quality data is the foundation of any clinical data registry, yet it does not develop on its own. In the world of healthcare, we have millions of unique data elements of different sizes, formats, and characteristics.
Before we get into the process of achieving high-quality data, let’s start with a brief definition.
What Is High-Quality Data in Healthcare?
High-quality data is precise, validated, and comprehensive. Data with these characteristics is a valuable asset for meeting the evolving requirements of varied healthcare stakeholders.
Acquiring data from various sources is the first step. The necessary and important next step is to take all the data blocks and build something grand, nimble, and valuable.
Data Integrity Process in Healthcare
In order to preserve and extract data value, you need a solution that is centered around a multi-layered data integrity process. This is of paramount importance for registry activities.
Multi-layered data integrity processes enable comprehensive, persistent, and widespread data integrity protocols that protect registry data and reporting in serving the needs of diverse stakeholders.
There are five methods for perpetually protecting data integrity:
Data Review & Validation
Whenever connecting to a new data source or EHR system, or modifying an existing source, registries should engage in a thorough analysis to assess data integrity across several domains:
- Completeness: Is the data element available and populated?
- Concordance: Do the values of the data element agree with those of related variables?
- Plausibility: Do the values of the data element make clinical sense?
- Currency: How recent is the available data?
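As a rough illustration, the four domains above could be checked for a single record as sketched below. The field names (`systolic_bp`, `diastolic_bp`, `last_updated`), the numeric ranges, and the one-year currency window are assumptions made for this example only; a real registry would define its own elements and clinically reviewed thresholds.

```python
from datetime import date


def assess_blood_pressure(record, today, max_age_days=365):
    """Return pass/fail flags for the four data integrity domains.

    Field names and thresholds are illustrative assumptions,
    not clinical guidance.
    """
    systolic = record.get("systolic_bp")
    diastolic = record.get("diastolic_bp")
    last_updated = record.get("last_updated")

    # Completeness: is the data element available and populated?
    complete = systolic is not None and diastolic is not None

    # Concordance: do related variables agree? Systolic should exceed diastolic.
    concordant = complete and systolic > diastolic

    # Plausibility: do the values make clinical sense? (placeholder ranges)
    plausible = complete and 50 <= systolic <= 300 and 20 <= diastolic <= 200

    # Currency: how recent is the available data?
    current = (
        last_updated is not None
        and (today - last_updated).days <= max_age_days
    )

    return {
        "completeness": complete,
        "concordance": concordant,
        "plausibility": plausible,
        "currency": current,
    }


result = assess_blood_pressure(
    {"systolic_bp": 120, "diastolic_bp": 80, "last_updated": date(2024, 1, 1)},
    today=date(2024, 6, 1),
)
```

Returning one flag per domain, rather than a single pass/fail value, keeps the result usable for reporting each dimension separately.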
Comprehensive Validation
It is critical to preserve data quality throughout the life of a registry. The best way to accomplish this is through real-time and ongoing data validation checks that deliver the following components.
- Validation Rules: Each data integration interface should be configured to include robust and comprehensive validation rules that ensure only valid records enter the registry.
- Data Quality Reporting: Dedicated registry dashboards should provide registry participants with real-time information on their data quality across dimensions such as completeness, concordance, plausibility, and currency.
- Systems Monitoring and Alerting: Real-time monitoring and alerting tools should report any unexpected data events during data processing. Alerts should go to the right stakeholders for immediate action and resolution. For example, a practice might receive an alert that its most recent data included data elements outside of normal ranges.
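A minimal sketch of how validation rules and alerting might fit together at the point of ingestion is shown below. The rule names, the record fields (`patient_id`, `age`), and the 10% alert threshold are hypothetical choices for illustration, not a description of any particular registry platform.

```python
def validate_batch(records, rules, alert_threshold=0.10):
    """Split a batch into accepted and rejected records.

    `rules` is a list of (rule_name, check_function) pairs. A record is
    accepted only if every check passes. An alert flag is raised when the
    share of rejected records exceeds `alert_threshold` (an assumed value).
    """
    accepted, rejected = [], []
    for record in records:
        failures = [name for name, check in rules if not check(record)]
        if failures:
            rejected.append((record, failures))
        else:
            accepted.append(record)

    rejection_rate = len(rejected) / len(records) if records else 0.0
    alert = rejection_rate > alert_threshold
    return accepted, rejected, alert


# Hypothetical rules for illustration only.
RULES = [
    ("patient_id present", lambda r: bool(r.get("patient_id"))),
    (
        "age in range",
        lambda r: isinstance(r.get("age"), (int, float)) and 0 <= r["age"] <= 120,
    ),
]

batch = [
    {"patient_id": "p1", "age": 54},
    {"patient_id": "p2", "age": 430},  # fails the range rule
    {"patient_id": "p3", "age": 71},
]
accepted, rejected, alert = validate_batch(batch, RULES)
```

Keeping the failed rule names alongside each rejected record gives the submitting practice something specific to act on when an alert is sent.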
Using this multi-layered strategy ensures that all registry data is efficiently ingested, transformed, persisted, and analyzed while maintaining complete fidelity to the original clinical record.
Documentation and Transparency
A registry is only as valuable as it is trusted. Registry stakeholders need to feel confident interacting with their data and interpreting its results. Thus, all data transformations need to be fully documented and approved.
This includes documentation of data processing rules, data validation logic, and measure calculations. This information provides full transparency into how data is used in the registry and helps to increase trust in the clinical insights generated from the registry.
Change Management
Registries will evolve and grow as new priorities emerge, from new data elements being added to changes in measure calculations. These updates increase the value of the registry, but they must not disrupt existing data protections in ways that affect registry results or confuse users.
Any data change should go through a systematic change control process in which the update is transparently documented, tested, and approved, so that its impact is known and widely shared prior to implementation.
Data Logging and Auditing
Health data is sensitive and requires data protections to ensure it is responsibly handled. Furthermore, registry engagement will lead to numerous questions from registry stakeholders on how data was processed and handled. All data entering the registry must be fully traced from the receipt of data to when it is displayed on reports.
This includes auditable logs of all activity related to data submission, data manipulation, and registry interactions. Each data element underlying the registry is tied to information on the date and time of receipt, data submission protocols, data quality results, data transformation steps, and the captured raw source data for comprehensive auditing and tracking.
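One way to picture this kind of traceability is an append-only trail of events keyed by data element, with a fingerprint of the raw source payload stored at receipt. The sketch below is an assumption-laden illustration: the class names, step labels, and the choice of SHA-256 are ours, not a documented registry design.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    """One immutable entry in the trail (hypothetical structure)."""
    element_id: str
    step: str          # e.g. "received", "validated", "transformed"
    detail: str
    timestamp: datetime


@dataclass
class AuditTrail:
    events: list = field(default_factory=list)

    def log(self, element_id, step, detail, timestamp=None):
        """Append an event; defaults to the current UTC time."""
        ts = timestamp or datetime.now(timezone.utc)
        event = AuditEvent(element_id, step, detail, ts)
        self.events.append(event)
        return event

    def history(self, element_id):
        """Return every event for one data element, in logged order."""
        return [e for e in self.events if e.element_id == element_id]


def fingerprint(raw_bytes):
    """SHA-256 digest of the raw source payload, for later comparison."""
    return hashlib.sha256(raw_bytes).hexdigest()


trail = AuditTrail()
raw = b'{"systolic_bp": 120}'
trail.log("elem-1", "received", "sha256=" + fingerprint(raw))
trail.log("elem-1", "validated", "all rules passed")
trail.log("elem-2", "received", "sha256=" + fingerprint(b"{}"))
trail.log("elem-1", "transformed", "mapped to registry schema")
```

With a structure like this, a stakeholder question about a single reported value can be answered by replaying `trail.history(...)` for the underlying element.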
Dedicated Team
Data integrity is best protected when the above processes are controlled by data experts who are intimately familiar with the registry data, goals, and stakeholders. Projects should have a dedicated team of data scientists, data engineers, and data integration specialists who provide long-term, end-to-end registry support through their extensive knowledge of all aspects of the project.
Achieve High Data Quality with Your Registry
Data quality is essential to registries because it establishes trust and value. When your clinicians, researchers, and other stakeholders are confident in the data, they can use it to advance research and improve outcomes.
Understanding Clinical Registry Data Types & Data Sources
Building a successful clinical registry is a three-step process that involves acquiring data, transforming it with advanced analytics that identify what matters, and informing clinicians and other stakeholders what and how to improve.
Let’s focus on the first step: acquiring data. For a high-achieving registry, you need to collect a large amount of quality data.
Data for your registry can come from a variety of sources, such as medical records and health insurance claims. Regardless of the source, you need to be sure that the data you’re collecting will both enhance your registry and help you meet your registry goals.
With so much data available, it’s easy to feel overwhelmed. As long as you keep your focus on your goals, you’ll be able to compile a healthy dataset that will serve your organization’s purposes successfully.
Let’s explore the various data sources you can use to build your registry.
In this post we will cover:
- Clinical Data
- Patient-Generated Data
- Cost & Utilization Data
- Public Health Data
For each, we’ll define what it is, where and how to acquire it, and address any common challenges or limitations of the data source.
Clinical Data
Clinical data is at the core of many registries. It has a powerful ability to summarize clinical experiences and capture information needed to inform healthcare progress.
What Is Clinical Data?
Clinical data helps establish who a patient is – their demographics, family history, comorbidities, procedure and treatment history, and outcomes. The breadth and depth of clinical data opens up opportunities to advance quality improvement initiatives, research, registry-based studies and virtual trials, and other stakeholder activities.
Sources of Clinical Data
The key source of clinical data resides in a patient’s medical record, which collects nearly all care delivery and treatment decision details, regardless of treatment setting. The record resides in outpatient and inpatient facilities and contains information from various ancillary services, such as chemistry labs, pathology or radiology departments, and pharmacies.
Because a patient’s medical record aims to show as complete a picture of the patient as possible, the data abstracted from those records is often more comprehensive than data from other sources. Additionally, medical records follow patients over time, allowing registries to collect and analyze longitudinal information. Medical record data is therefore extremely valuable to budding registries.
Collecting Electronic Health Record Data for Registries
Medical record data can be highly useful, but gaining effective access to that data isn’t always simple.
For clinical registries, successful medical record data acquisition relies on the combined effectiveness of trusted relationships with the record stewards and technologies used to transmit data, such as electronic health record (EHR) systems.
The vast majority of hospitals in the United States use electronic health records (EHRs). There are many different EHR vendors, which can cause difficulties with data sharing. Given the variety of EHR solutions and the varying policies at different sites, having a flexible data integration approach for your registry is key.
Here at ArborMetrix, we integrate with hospital and clinic EHRs and other data systems with minimal impact to IT and clinical resources. Whether healthcare organizations use cloud-based EHR systems, custom-defined layouts, or anything in between, our registry data platform seamlessly collects the necessary data.
We support industry standards for interoperability, as well as custom data formats. Industry standards such as HL7 FHIR® and C-CDA serve to bridge documentation from disparate EHRs in ways that facilitate data aggregation across populations.
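To make the role of a standard like FHIR concrete, here is a small sketch that reads a FHIR Patient resource in JSON and flattens it into a registry-style row. The sample patient is invented, and the output column names (`source_id`, `family_name`, and so on) are hypothetical; only the input field names (`resourceType`, `id`, `gender`, `birthDate`, `name`, `family`, `given`) come from the FHIR Patient resource itself.

```python
import json

# Invented sample data in the shape of a FHIR Patient resource.
SAMPLE_PATIENT = """
{
  "resourceType": "Patient",
  "id": "example-001",
  "gender": "female",
  "birthDate": "1980-04-12",
  "name": [{"family": "Doe", "given": ["Jane", "Q"]}]
}
"""


def patient_to_registry_row(resource_json):
    """Flatten a FHIR Patient resource into a flat dict.

    The output keys are illustrative assumptions, not a standard schema.
    """
    resource = json.loads(resource_json)
    if resource.get("resourceType") != "Patient":
        raise ValueError("expected a FHIR Patient resource")

    # `name` is a list in FHIR; this sketch simply takes the first entry.
    names = resource.get("name") or [{}]
    first_name_entry = names[0]

    return {
        "source_id": resource.get("id"),
        "family_name": first_name_entry.get("family"),
        "given_names": " ".join(first_name_entry.get("given", [])),
        "gender": resource.get("gender"),
        "birth_date": resource.get("birthDate"),
    }


row = patient_to_registry_row(SAMPLE_PATIENT)
```

Because every conforming EHR export uses the same element names, one mapping function like this can serve many sites, which is the practical benefit of supporting the standard.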
Limitations of EHR Data for Registries
Sometimes, data collected in EHRs lacks the clinical specificity needed for quality improvement and research done through registries.
Many EHRs are not configured to collect detailed data on resource-intensive conditions. This leaves gaps in the data that prevent it from being used to make impactful changes in care delivery. Consequently, registries need to supplement EHR data with the specialized, targeted data required for real clinical improvement.
Some registries use technology-enhanced clinical abstraction to address this and create a more complete dataset. For example, our electronic case report forms (eCRFs) allow registry participants to enter cases on-site or remotely. Once all the required data have been entered into the system, data analytics and dashboard reports are available immediately.
Leveraging eCRFs to fill any gaps in EHR data creates a more complete dataset.
Patient-Generated Data
Driving high-value healthcare often requires data that goes beyond an exam room or treatment facility. Your registry can generate many healthcare insights through patient-generated data.
What Is Patient-Generated Health Data?
Patient-generated health data (PGHD) “are health-related data created, recorded, or gathered by or from patients (or family members or other caregivers) to help address a health concern,” according to the National Learning Consortium at HealthIT.gov. [1]
A few examples of patient-generated data include a person’s health and treatment history, symptoms, biometric data, and patient-reported outcome measures.
Sources of Patient-Generated Data
A core source of patient-generated data comes from patient-reported outcomes (PRO). PRO are health data sourced directly from patients, without any interpretation or alteration by a clinician or other individual.
Inpatient documentation in the hospital medical record stops when patients are discharged or transferred, and outpatient documentation occurs longitudinally but is sporadic and only captures new data clustered around patient encounters. Adequately capturing patient data outside the walls of a hospital or outpatient clinic therefore requires PRO.
Collecting PRO Data
A user-friendly PRO solution encourages patients and caregivers to share details of their experiences, quality of life, and outcomes in ways that extend the period of data acquisition following inpatient care. It also fills the gap in patient communication between outpatient encounters.
Important pieces of information that come from PRO include:
- Type, frequency, and severity of symptoms
- Nature and severity of disability
- Impact of condition on daily life
- Patients’ perceptions and feelings about their conditions and/or treatments
Including PRO in their registries can help healthcare organizations track outcomes, support shared decision-making, develop guidelines, inform best practices, and calculate predictive analytics.
Usually, organizations acquire PRO data through surveys and other technology-enabled digital platforms. Other sources of patient data include wearables, population health measures, and patient registries.
Challenges of Collecting PRO Data
A question that sometimes comes up with PRO data is whether it is accurate. Fortunately, research shows a high correlation between patient-reported data and clinically documented, chart-abstracted data. [2] This suggests that data collected from patients can be accurate and trustworthy.
A second potential challenge in collecting PRO data involves patient engagement. Patients who are engaged in their care can share information and communicate more easily with their caregivers, making PRO acquisition much more effective.
Organizations that have successfully engaged patients have employed a few helpful tactics, including:
- Treating patients like consumers, by making surveys and feedback platforms responsive and user-friendly.
- Recognizing the role of technology in meeting patients where they are with a simple and effective survey experience.
- Delivering a tailored experience, with surveys that can be customized to different patient populations and that are flexible and responsive to their unique needs.
- Being creative and compelling by giving patients and their families access to other resources, such as a shared decision-making platform.
Collecting PRO Data from Patient Devices
Other patient-generated data sources include passive data collection from patient devices, such as wearables or smart healthcare electronics. These systems often have data connectivity tools, such as APIs, to securely transfer patient data points to a registry platform. Real-time, streaming data can further augment information gathered from PRO surveys and clinical data systems, without burdening patients or providers.
Cost & Utilization Data
Many healthcare and clinical registry pursuits are driven by value-based care initiatives — improving outcomes while controlling for costs. These programs require measuring health outcomes against the cost of delivering those outcomes.
What Are Cost & Utilization Data in Healthcare?
Cost and utilization data helps to assess care value for preventing and treating health problems.
Healthcare cost has a specific meaning, but its “interpretation often depends on whose perspective is being considered.” According to the AMA Journal of Ethics, cost means:
- “To providers: the expense incurred to deliver health care services to patients.
- To payers: the amount they pay to providers for services rendered.
- To patients: the amount they pay out-of-pocket for health care services.” [3]
Healthcare utilization “refers to the use of healthcare services. People use healthcare for many reasons including preventing and curing health problems, promoting maintenance of health and well-being, or obtaining information about their health status and prognosis,” according to the Encyclopedia of Behavioral Medicine. [4]
Sources of Cost & Utilization Data
Key sources of cost and utilization data are collected by health insurers, governmental organizations, and public payers. They include claims data regarding patient treatments, as well as public datasets from organizations like the Centers for Medicare and Medicaid Services (CMS) and the Agency for Healthcare Research and Quality (AHRQ).
Payer and regulatory data are especially useful for registries examining the economic aspect of healthcare. Specifically, by including these types of data, registries can more easily perform cost and quality analyses due to the clear records of a patient’s treatment history. Also, especially for government data, the dataset may be larger-scale and include information from a broader patient population.
These data can be acquired from private and public health insurers, and many regulatory datasets are available for public use directly from the sponsoring governmental agencies. For example, we use the Healthcare Cost and Utilization Project (HCUP) dataset from AHRQ and the Limited Data Set from CMS to calibrate propensity-score-based risk adjustment models and to provide external benchmarks for our registry clients such as the Michigan Value Collaborative.
Challenges with Healthcare Cost & Utilization Data
While cost and utilization datasets can be extremely valuable, they also have their limitations.
Claims data, whether from commercial or public payers, are subject to reversals and adjustments, completion lags, and coding variation and changes. All of these can present considerable challenges when incorporating such data into a new registry.
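As one simplified illustration of the reversal-and-adjustment problem, the sketch below nets all payment lines for each claim and drops claims that were fully reversed. The field names (`claim_id`, `paid_amount`) and the convention that reversals arrive as negative amounts are assumptions for this example; real payer files vary, and this is not a description of any specific vendor's method.

```python
from collections import defaultdict


def net_paid_by_claim(claim_lines):
    """Sum payment lines per claim and drop claims that net to zero.

    Assumes reversals are recorded as negative `paid_amount` values on the
    same `claim_id` (a hypothetical convention for this sketch).
    """
    totals = defaultdict(float)
    for line in claim_lines:
        totals[line["claim_id"]] += line["paid_amount"]

    # Round to cents so float noise does not keep a reversed claim alive.
    return {
        claim_id: round(total, 2)
        for claim_id, total in totals.items()
        if round(total, 2) != 0
    }


lines = [
    {"claim_id": "A", "paid_amount": 100.0},   # original payment
    {"claim_id": "A", "paid_amount": -100.0},  # reversal
    {"claim_id": "A", "paid_amount": 85.0},    # adjusted repayment
    {"claim_id": "B", "paid_amount": 40.0},
    {"claim_id": "C", "paid_amount": 25.0},
    {"claim_id": "C", "paid_amount": -25.0},   # fully reversed
]
net = net_paid_by_claim(lines)
```

Without a step like this, claim A would appear to cost 185 rather than 85, and claim C would be counted as care that was never ultimately paid for.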
Overcoming these challenges requires registries to use sophisticated analytic tools and have deep expertise with payer and regulatory data.
For example, we developed and use an elastic episode-based model that lends itself to more appropriate assessments of longitudinal cost. Our model also allows us to more fully assess the outcomes of medical treatments by looking at longitudinal episode cost across a variety of conditions, settings, and care types. This approach captures episodes of care, episodes of illness, and health maintenance episodes to illustrate cost effectiveness across a variety of components. Through this model, we can track care in systematic ways to make comparisons on cost and outcomes.
Public Health Data
A variety of factors significantly influence healthcare outcomes, with clinical care and patient health behavior representing only a portion of the influences. Up to 50% of health outcomes are attributed to community and population health factors. [5]
Consequently, efforts to advance health care require a full view of all factors that influence health outcomes. This view is critical to understanding the pathways to advance the direction of health care.
Registries can be powerful tools for understanding the multitude of pathways influencing patient health and for using that data to support the development of evidence-based clinical interventions, community guidelines, and even advocacy for policies and regulations.
Sources of Public Health Data
There are numerous public health data assets for evaluating community and population health factors. Government data sources are easily available and can layer valuable information on the prevalence and severity of health conditions across different populations in a community. Some sources of public health data include:
- Behavioral Risk Factor Surveillance System (BRFSS): Includes data at the county and census tract-level related to health outcomes, clinical care, and health behavior.
- National Center for Chronic Disease Prevention and Health Promotion’s (NCCDPHP) Chronic Disease and Health Promotion Data and Indicators Open Data Portal: Includes numerous data access tools across a wide range of chronic disease data, risk factor indicators, and policy measures.
- National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention – NCHHSTP AtlasPlus: Includes more than 15 years of CDC’s surveillance data on HIV, viral hepatitis, STD, and TB. AtlasPlus also provides access to indicators on social determinants of health (SDOH) allowing users to view social and economic data in conjunction with surveillance data for each disease.
- National Environmental Public Health Tracking Network (Tracking Network): Brings together health data and environment data from national, state, and city sources.
- CDC National Vital Statistics System (NVSS): Includes health outcome data, such as mortality rates by cause of death at the county level.
- CDC Wide-ranging Online Data for Epidemiologic Research (CDC WONDER): Includes nearly 20 collections of public-use data for U.S. births, deaths, cancer diagnoses, tuberculosis cases, vaccinations, environmental exposure, and population estimates, among other topics.
- USDA Economic Research Service (ERS) – Food Access Research Atlas: Offers census-tract-level data on food access that can be downloaded for community planning or research purposes.
Public health data sources add critical information to registry systems, enabling registry stakeholders to best serve their populations. Public health and clinical information can be blended together within a single registry to provide a comprehensive, multi-factorial view of health. This creates a platform that can best fulfill the meaningful and diverse healthcare objectives registries pursue.
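Blending of this kind often comes down to joining patient records to community-level indicators on a shared geographic key. The sketch below assumes each patient record carries a county FIPS code and that indicators have already been loaded into a lookup keyed by that code. The field names and all values are placeholders invented for illustration, not real statistics.

```python
def attach_community_factors(patients, county_indicators):
    """Return new patient rows enriched with county-level indicators.

    Patients whose county has no indicators are kept unchanged, so the
    join never drops registry records. Field names are hypothetical.
    """
    enriched = []
    for patient in patients:
        indicators = county_indicators.get(patient["county_fips"], {})
        # Patient fields come last so they win if a key name collides.
        enriched.append({**indicators, **patient})
    return enriched


# Placeholder values only; not real public health figures.
county_indicators = {
    "00001": {"low_food_access": True, "smoking_rate_pct": 12.5},
    "00002": {"low_food_access": False, "smoking_rate_pct": 20.0},
}
patients = [
    {"patient_id": "p1", "county_fips": "00001"},
    {"patient_id": "p2", "county_fips": "99999"},  # no indicators available
]
blended = attach_community_factors(patients, county_indicators)
```

Keeping unmatched patients in the output is a deliberate choice here: missing community data is itself a completeness signal worth reporting rather than a reason to exclude a record.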
Set Your Registry Up for Success
Whether you’re building a condition-specific registry for quality improvement or expanding your registry to address new research priorities, you need to be thoughtful about what data you include. Further, you need the tools, expertise, and technology to ensure your registry can achieve your goals.