Answering calls for rigorous health equity research: a cross-sectional study leveraging electronic health records for data disaggregation in Latinos
  1. John Heintzman1,
  2. Dang Dinh1,
  3. Jennifer A Lucas1,
  4. Elena Byhoff2,
  5. Danielle M Crookes3,
  6. Ayana April-Sanders4,
  7. Jorge Kaufmann1,
  8. Dave Boston5,
  9. Audree Hsu6,
  10. Sophia Giebultowicz5 and
  11. Miguel Marino1
  1. Family Medicine, Oregon Health & Science University, Portland, Oregon, USA
  2. Department of Medicine, University of Massachusetts, Boston, Massachusetts, USA
  3. College of Social Sciences and Humanities, Northeastern University, Boston, Massachusetts, USA
  4. Rutgers School of Public Health, Piscataway, New Jersey, USA
  5. OCHIN, Portland, Oregon, USA
  6. California University of Science and Medicine, Colton, California, USA
  Correspondence to Dr John Heintzman; heintzma{at}


Introduction Country of birth/nativity information may be crucial to understanding health equity in Latino populations and is routinely called for in health services literature assessing cardiovascular disease and risk, but is not thought to co-occur with longitudinal, objective health information such as that found in electronic health records (EHRs).

Methods We used a multistate network of community health centres to describe the extent to which country of birth is recorded in EHRs in Latinos, and to describe demographic features and cardiovascular risk profiles by country of birth. We compared geographical/demographic/clinical characteristics, from 2012 to 2020 (9 years of data), of 914 495 Latinos recorded as US-born, non-US-born and without a country of birth recorded. We also described the state in which these data were collected.

Results Country of birth was collected for 127 138 Latinos in 782 clinics in 22 states. Compared with those with a country of birth recorded, Latinos without this record were more often uninsured and less often preferred Spanish. While covariate-adjusted prevalences of heart disease and risk factors were similar between the three groups, when results were disaggregated to five specific Latin countries (Mexico, Guatemala, Dominican Republic, Cuba, El Salvador), significant variation was observed, especially in diabetes, hypertension and hyperlipidaemia.

Conclusions In a multistate network, thousands of non-US-born patients, US-born patients and patients without a country of birth recorded had differing demographic characteristics, but clinical variation was not observed until data were disaggregated into specific country of origin. State policies that enhance the safety of immigrant populations may enhance the collection of health equity related data. Rigorous and effective health equity research using Latino country of birth information paired with longitudinal healthcare information found in EHRs might have significant potential for aiding clinical and public health practice, but it depends on increased, widespread and accurate availability of this information, co-occurring with other robust demographic and clinical data.

  • Health Policy
  • Health Equity

Data availability statement

Data may be obtained from a third party and are not publicly available.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


  • Survey-based research and national recommendations suggest that the birth country of Latino patients (the largest ethnic minority) in the USA should be collected and is associated with health outcomes.

  • Few if any real-world healthcare datasets contain this place of birth information. Therefore, real-world research, especially research over time verifying differential healthcare outcomes and/or utilisation, is lacking.


  • Thousands of Latino patients had place of birth collected in a large network of hundreds of federally funded clinics.


  • Clinics and health systems should prioritise the further understanding of disaggregated data in Latinos to better the health outcomes of the largest ethnic minority in the USA.


The Latino population is the largest ethnic minority1 in the USA and is heterogeneous, especially with respect to birth country.2 Country of birth is sometimes studied as an important health and healthcare factor in Latino patients in the USA,3 4 and may be associated with differences in disease prevalence, social disadvantage, immigration status and/or acculturation.5 6 Along these lines, there have been numerous calls in the health services literature for ‘data disaggregation’ when studying minority populations,7–9 including Latinos,10 in order to better understand health and healthcare inequity. Disaggregating country of birth among Latino patients is complex, however. Rigorous and effective health services research using Latino country of birth information might have significant potential for aiding clinical and public health practice, but it depends on widespread and accurate availability of this information, co-occurring with other robust demographic and clinical data.7 11 Many commonly used sources of demographic information used in population-based research do not have robust race/ethnicity data,12–14 let alone country of birth information in Latinos,15 16 making evaluation of country of birth and its impact on healthcare impossible in many circumstances. Long-term cohort studies on Hispanic/Latino health (such as the Hispanic Community Health Study/Study of Latinos)17 18 may have country of origin data, but not detailed health system data.

Electronic health records (EHRs) are a burgeoning data source in public health and health services research in Latino populations, including in numerous Latino subpopulations19 20 and evaluations of quality of care in Latinos.20 21 To date, there are few examples of Latino country of birth reported in EHRs. Community health centres (CHCs) disproportionately care for low-income Latino patients,22 and therefore, are an important setting in which to study the availability and utility of country of birth information in a clinical dataset. The OCHIN network of CHCs is one of the largest hosted linked EHR-based networks in the USA.23 Country of birth is reported in a proportion of Latino patients in the OCHIN network. Our objective was to describe the extent to which place of birth is recorded in EHRs for Latinos across a multistate network of CHCs. Furthermore, to better understand the clinical and public health utility of place-of-birth information, we aimed to describe the characteristics of Latino patients with and without a recorded place of birth as well as the clinics that collect place of birth information by comparing US-born, non-US-born and place-of-birth not recorded Latino patients in this network.

Cardiovascular disease is the biggest cause of mortality in the USA and continues to be the main driver of mortality in Latinos. Previous studies suggest that Latino individuals experience more frequent and more poorly controlled risk factors for cardiac events,24 and evidence suggests that cardiovascular risks and care may not be uniform across Latinos of varying national origins.25 Therefore, we examined cardiovascular disease risk factors as a test case for the comparison of demographic, disease risk and care measures in US-born and non-US-born Latinos, and among Latinos where country of birth was not recorded.


Data sources

We used EHRs from the OCHIN network, a hosted, linked multistate EHR network of CHCs across the USA. All study data, including country of birth, are collected in the routine course of clinical care in structured, pre-existing fields in OCHIN Epic (EHR software), and were not collected specifically for this study.


Our study population consisted of 914 495 Latino patients (ages 9–79 years) who had at least one ambulatory or telemedicine visit between 1 January 2012 and 31 January 2020, within an OCHIN clinic. All Latinos were included regardless of having country of birth information recorded in the EHR.

Main independent variables

Our primary independent variable was a mutually exclusive three-category grouping of the combination of ethnicity and nativity: Latinos with a recorded birthplace of the USA (ie, US-born), Latinos with a recorded birthplace outside of the USA (ie, non-US-born) and Latinos without a place of birth recorded in the EHR. This birthplace information was self-reported by patients and was extracted from a discrete variable field used/built for the purpose of recording country of birth. While we use ‘Latino’ in the majority of our discussion because it is often preferred in our study population, the actual ethnicity information collected by clinics is Hispanic and non-Hispanic; ‘Hispanic’ and ‘Latino’ differ slightly in their definitions.26
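As an illustration only, the three-category grouping could be derived from a single country-of-birth field along the following lines. This is a minimal sketch, not the study's extraction code: the function name, the group labels and the accepted spellings of a US birthplace are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical spellings of a US birthplace; the real EHR field values are not given in the text.
US_VALUES = {"USA", "US", "UNITED STATES"}

def nativity_group(country_of_birth: Optional[str]) -> str:
    """Assign one of three mutually exclusive ethnicity/nativity groups."""
    # A missing or blank field means no place of birth was recorded.
    if country_of_birth is None or not country_of_birth.strip():
        return "not recorded"
    if country_of_birth.strip().upper() in US_VALUES:
        return "US-born"
    # Any other recorded country is treated as a birthplace outside the USA.
    return "non-US-born"
```

Because every patient falls into exactly one branch, the groups are mutually exclusive by construction.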

Cardiovascular disease prevention outcomes

We described our three ethnicity/nativity groups by basic clinical features relevant to cardiovascular disease prevention (visits per year, smoking status, obesity, hyperlipidaemia diagnosis and ever receipt of a lipid test, diabetes diagnosis and ever receipt of a haemoglobin A1c test, hypertension diagnosis, heart disease diagnosis). All diagnosis data come from the ‘problem list’ in the EHR; laboratory data were results returned to the clinic. A patient’s most frequently visited clinic was used as the index clinic in this portion of the analysis.
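The index clinic rule can be sketched as follows. This is a hedged example rather than the study's code: the function name is hypothetical, and the tie-breaking rule (the tied clinic that appears first in the visit history) is an assumption, since the text does not say how ties were handled.

```python
from collections import Counter

def index_clinic(visit_clinic_ids):
    """Return the clinic a patient visited most often.

    Ties go to the tied clinic that appears first in the visit list
    (an assumed rule; the source does not specify one).
    """
    if not visit_clinic_ids:
        return None
    counts = Counter(visit_clinic_ids)
    top = max(counts.values())
    # Walk visits in order so that ties resolve to the earliest-seen clinic.
    for clinic in visit_clinic_ids:
        if counts[clinic] == top:
            return clinic
```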

Potential confounders

In descriptive and modelling analyses, we considered common patient-level demographics including sex, age at first visit, insurance status, income, race, preferred language, annual ambulatory visit rates and smoking status. While we account for clustering by patient clinic (see ‘Statistical analysis’), we describe these potential confounders by clinic characteristics as well.

Statistical analysis

We described our patient characteristics by ethnicity/nativity groups using means, SD, frequencies and percentages. We also described patient characteristics by specific country of birth among non-US-born Latino patients. Next, we visually described through a histogram the number and per cent of Latino patients with a country of birth recorded by state. In addition, we described select characteristics of clinics (eg, total number of patients, total number of clinics, patient-panel characteristics) that collect this information and those that do not collect this information.

We then reported unadjusted and covariate-adjusted prevalences of International Classification of Diseases (ICD)-9/10 coded heart disease, obesity, diabetes, hypertension and hyperlipidaemia by our three ethnicity/nativity groups. We also described the EHR-reported demographic factors and ICD-9/10 coded disease prevalence of patients with specific countries of birth among the non-US-born Latino patients in our sample. To estimate covariate-adjusted prevalences for heart disease, obesity, diabetes, hypertension and hyperlipidaemia by three ethnicity/nativity groups, we used generalised estimating equations (GEE) logistic regression that included indicators for ethnicity/nativity groups and the potential confounders listed above. We accounted for clustering on the patient’s most frequented clinic through the use of a robust sandwich variance estimator with exchangeable correlation structure.
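To make the clustering adjustment concrete, the following is a minimal sketch (not the authors' code, which was run in Stata and R) of a cluster-robust sandwich standard error for an unadjusted prevalence. It is deliberately simplified: an intercept-only model with an independence working correlation, rather than the covariate-adjusted GEE with exchangeable correlation described above. The function name and the toy clinic data are hypothetical.

```python
import math

def clustered_prevalence(outcomes_by_clinic):
    """Return (prevalence, cluster-robust SE) for binary outcomes grouped by clinic.

    Simplifying assumptions: no covariates and an independence working
    correlation, so the point estimate is the overall proportion.
    """
    n = sum(len(ys) for ys in outcomes_by_clinic.values())
    if n == 0:
        raise ValueError("no observations")
    p = sum(sum(ys) for ys in outcomes_by_clinic.values()) / n
    # "Meat" of the sandwich: residuals are summed within each clinic before
    # squaring, so correlated outcomes within a clinic are not treated as independent.
    meat = sum((sum(ys) - len(ys) * p) ** 2 for ys in outcomes_by_clinic.values())
    return p, math.sqrt(meat) / n

# Toy data: 1 = diagnosis on the problem list, 0 = no diagnosis.
toy = {"clinic_a": [1, 1, 1, 0], "clinic_b": [0, 0, 0, 1], "clinic_c": [1, 0]}
prevalence, robust_se = clustered_prevalence(toy)
```

The design point is that the variance is built from clinic-level residual totals, which is what allows patients within the same clinic to be correlated without invalidating the standard error.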

Lastly, among a subset of non-US-born patients within five Latin countries with large sample sizes (Mexico, El Salvador, Guatemala, Dominican Republic, Cuba), we estimated unadjusted and adjusted prevalences of heart disease, obesity, diabetes, hypertension and hyperlipidaemia diagnoses using a similar GEE logistic model, replacing the indicators for ethnicity/nativity groups with indicators for country of birth. We performed this analysis on the entire sample and in those age 18 and over. Analyses were performed in Stata V.15 and R V.1.1, using two-sided tests with the type I error set at 5%.

Patient and public involvement

Patients or the public were not involved in the design, or conduct, or reporting, or dissemination plans of our research.


Frequency of country of birth reported in the EHR

Descriptive statistics of our patient sample (N=914 495) are in table 1. Of the 914 495 Latino patients who met study criteria, 127 138 (13.9%) had a reported place of birth in the EHR. Of the 127 138 Latinos with a country of birth, 81 427 (64.0%) were non-US-born and 45 711 (36.0%) were US-born.
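The percentages in this paragraph follow directly from the reported counts. The short check below recomputes them; the variable names are illustrative, and the share without a recorded place of birth is derived by subtraction rather than quoted from the text.

```python
total = 914_495            # Latino patients meeting study criteria
with_birthplace = 127_138  # patients with a recorded place of birth
non_us_born = 81_427
us_born = 45_711

# The two nativity groups should sum to everyone with a recorded birthplace.
assert non_us_born + us_born == with_birthplace

def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

shares = {
    "recorded": pct(with_birthplace, total),               # 13.9
    "not recorded": pct(total - with_birthplace, total),   # 86.1
    "non-US-born of recorded": pct(non_us_born, with_birthplace),  # 64.0
    "US-born of recorded": pct(us_born, with_birthplace),          # 36.0
}
```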

Table 1

Characteristics of Latino patients with at least one ambulatory visit at an OCHIN clinic in 2012–2020, by reported country of birth, N (column %)

Comparison of Latinos with and without a place of birth reported

Latino patients with a record of their place of birth were less often uninsured, and more likely to prefer Spanish than patients without a country of birth reported (table 1). Fewer identified as black compared with those with a place of birth reported. Latino patients with place of birth documentation had similar diagnosis prevalences (of diabetes, heart disease, hyperlipidaemia and hypertension) to Latinos without a country of birth recorded, but had a lower prevalence of having a lipid test or haemoglobin A1c test in the study period.

Patient characteristics of US-born and non-US-born Latinos

Of Latinos with a country of birth recorded, non-US-born Latinos were more often always under 138% of the federal poverty level, more often uninsured, and more often preferred Spanish than US-born Latinos (table 1). Non-US-born Latinos had a higher rate of clinic visits per year as well. They more frequently had documented diagnoses of diabetes, hyperlipidaemia and hypertension, and more often had at least one HbA1c and lipid test in the study period. Of note, the prevalence of these measures (eg, insurance, language, diabetes, hypertension, hyperlipidaemia, HbA1c, lipid screen) in the US-born and non-US-born Latino groups differed from the more general groupings (country of birth recorded or not). More non-US-born Latinos spoke Spanish, had diabetes, hypertension and hyperlipidaemia diagnoses, and had ever had a haemoglobin A1c or lipid test. Latinos without a place of birth recorded were more often uninsured.

The specific countries of birth of the non-US-born Latino patient group, with demographic information and disease prevalence, are described in online supplemental appendix tables 1–6. Nineteen countries of birth outside of the USA were represented among non-US-born Latino patients with a country of birth recorded. The most frequently represented countries in these data were Mexico (44% of non-US-born Latinos), Guatemala (17%), El Salvador (13%) and the Dominican Republic (12%). Across several patient demographics (eg, sex, income), we observed varying distributions of those demographics by country of birth.

Clinic characteristics

While Latino patients in the OCHIN network spanned 1811 clinics in 26 states, only 782 (43%) clinics across 22 states recorded at least 1 patient’s country of birth. Despite country of birth being collected in clinics in 22 states, greater than 98% of Latinos with a country of birth reported were seen at clinics in California, Massachusetts, Texas, Minnesota, Oregon, New Jersey, Wisconsin or Washington (figure 1).

Figure 1

Number and per cent of Latino patients with country of birth recorded by state (2012–2020). The denominator for each state was all Latinos who visited an OCHIN clinic in that state during the study period, regardless of whether country of birth was recorded. AK, Alaska; CA, California; CO, Colorado; CT, Connecticut; FL, Florida; GA, Georgia; ID, Idaho; IN, Indiana; MA, Massachusetts; MN, Minnesota; MT, Montana; NC, North Carolina; NJ, New Jersey; NM, New Mexico; NV, Nevada; NY, New York; OH, Ohio; OR, Oregon; TX, Texas; UT, Utah; WA, Washington; WI, Wisconsin.

Characteristics of clinics collecting or not collecting country of birth information

In comparing clinics in the OCHIN network that did or did not collect place of birth information, there were numerous demographic and basic utilisation similarities in Latino patients in each category of clinic (table 2). A few differences were that more Latino patients in clinics that did not collect place of birth information were uninsured, fewer preferred Spanish, more had >5 visits per year and fewer had one visit per year. Fewer Latino patients overall visited these clinics.

Table 2

Clinic characteristics of clinics that did and did not record a country of birth in the OCHIN electronic health record

Covariate-adjusted prevalences of heart disease and cardiovascular disease risk factor diagnoses among the three major ethnicity/nativity groups are displayed in figure 2 and online supplemental appendix table 7. Of note, non-US-born Latinos had a lower adjusted prevalence of obesity than the other groups, but diagnosis prevalences of the other conditions were similar.

Figure 2

Covariate-adjusted prevalence of heart disease and cardiovascular disease risk factor diagnosis, by ethnicity/nativity category. EHR, electronic health record.

Covariate-adjusted disaggregated prevalences of heart disease and its risk factors are shown in figure 3 and online supplemental appendix table 8 for the five Latin countries with the largest sample sizes. Of note, when these data are disaggregated into specific country of origin, there is increased variability in findings between countries, especially in hyperlipidaemia, hypertension and diabetes. In sensitivity analyses for the above outcomes including only adults (age 18 and over), the results did not appreciably differ between the entire sample and the adult sample.

Figure 3

Covariate-adjusted prevalence of heart disease and cardiovascular disease risk factor diagnoses, by Latin country of birth. EHR, electronic health record.


We endeavoured to describe, in one of the largest multistate data networks serving low-income patients in the USA, the frequency and characteristics of Latino patients with country of birth information recorded, as compared with those without such information recorded, using cardiovascular disease risk factors/care as a clinical test case. Our goal was to more fully understand the potential clinical, policy and research implications of using EHRs to address calls for data disaggregation in health equity research.7–10

There were several evident themes in this analysis. First, place of birth was reported across hundreds of health systems and clinics in a network of CHCs. This reinforces the widely distributed capacity to collect this information across CHCs caring for thousands of Latino patients. CHCs are an environment uniquely suited to collect this information; previous work has shown that immigrant Latino populations engage care at CHCs and are comfortable disclosing sensitive information in this environment.27–29 This environment, coupled with a robust, linked EHR may allow the pairing of country of birth data with longitudinal healthcare information that can be missing from national surveys to further improve health equity research. This expands on prior long-term cohort studies of risk factor prevalence30 31 by instead examining the real-world documentation of these risk factors in a multistate network.

While country of birth information is collected across numerous and varied health systems and clinics, the collection of this information is concentrated in several states. This increased collection may be associated with states with a high non-US-born population,32 although not in all cases. The collection of nativity data in Latino patients may not always be perceived to be safe, for fear that disclosure of nativity among undocumented immigrants may lead to discovery by federal authorities and eventual deportation. Specific state policies may make this disclosure of nativity (and therefore, possibly immigration status) more or less risky.33 34 Indeed, the eight states where most country of birth data were collected rank in the top 15 of immigrant-friendly policy states in some evaluations.35 Our analysis was not designed to formally evaluate this association, and unadjusted confounders (eg, number of clinics) between states make comparisons difficult. Further research should investigate this relationship. Because of safety concerns, one could hypothesise that clinics that never collect place of birth information refrain because their Latino patients are more likely to be immigrants, and disclosure of this information may involve risk of discovery by authorities and deportation for those who may be undocumented. While clinics not collecting place of birth information did see more Latino patients who were never insured (which might be a proxy for immigration status),19 a substantial percentage of Latino patients who were never insured were seen at clinics that collected birth country information more frequently. Clinics never collecting place of birth information were less likely to see Spanish-preferring patients, which also has been an oft-used proxy for immigration status or acculturation level. Therefore, these findings do not suggest a clear association between birthplace collection (or lack of it) and proxies for recent immigration.

Our description and analysis of individual patient characteristics by US-born, non-US-born and not-recorded groups, compared with our disaggregated results, demonstrate why it should be a health equity research priority to understand these data more fully. Our analysis of heart disease and cardiovascular risk factor diagnosis in our three broad categories reveals very similar prevalences of diagnoses. However, when disaggregated, we observed wide variation in some conditions. Hypertension prevalence ranged from 15% to 23%, hyperlipidaemia from 15% to 27% and diabetes from 5% to 18%. This suggests that Latino populations may vary in their basic cardiovascular risk and/or care by specific country of origin, and combining them into a single broad category (or even smaller broad categories, like ‘non-US-born’) inappropriately combines a heterogeneous population that may have very different needs for prevention and treatment of heart disease and cardiovascular disease risk factors. Despite the challenges, health systems and researchers should continue to pursue safe ways to collect nativity information with co-occurring longitudinal, multilevel clinical data in order to better prevent and/or mitigate cardiovascular disease inequities.


Country of birth information was present for a small subset of our total Latino population, which could produce greater variation in results. These findings and their implications could change with a larger sample. We were not able to consider time in the USA, generation after immigration or subgroups of US-born patients (eg, US-born Mexican Americans or US-born Cuban Americans), which may play a significant role in the healthcare utilisation of this population. We did not analyse local or state policies or environments that may affect clinic workflow and practice in the collection of country of birth. CHCs are likely a unique environment for collecting sensitive information,28 and may not represent other clinical environments.


In a multistate network of CHCs with a linked EHR, country of birth was collected in numerous Latino patients in many clinics and health systems, but not equally across all states in our network. Non-US-born, US-born and Latino patients without a place of birth reported had differing demographic and clinical characteristics. State policies that enhance the safety of immigrant populations may enhance the collection of health equity related data. CHCs should strongly consider the collection of country of birth information, as the utilisation and risk factors of Latinos from specific countries of origin may differ, and population differences may not be observable when broader categories than specific country of origin are used. Rigorous and effective health equity research using Latino country of birth information paired with longitudinal healthcare information such as that found in EHRs might have significant potential for aiding clinical and public health practice. However, this depends on increased, widespread and accurate availability of this information, co-occurring with other robust demographic and clinical data.

Ethics statements

Patient consent for publication

Ethics approval

The study was formally approved by the Oregon Health & Science University Institutional Review Board.


We acknowledge Roopradha Datta MPH for her assistance in manuscript completion and submission. We also acknowledge the patients, staff and clinicians of the OCHIN Practice Based Research Network. This work was conducted with the Accelerating Data Value Across a National Community Health Centre Network (ADVANCE) Clinical Research Network (CRN). ADVANCE is led by OCHIN in partnership with Health Choice Network, Fenway Health, Oregon Health & Science University, and the Robert Graham Centre HealthLandscape. ADVANCE is funded through the Patient-Centred Outcomes Research Institute (PCORI), contract number RI-OCHIN-01-MC.



  • Contributors This was an extensive collaboration between five organisations, necessitating eleven authors, all of whom contributed substantially to the work. JH was lead in funding acquisition, conceptualisation, writing and revising the original draft, and is acting as guarantor of the manuscript. MM was equal in methodology, conceptualisation and revising, and supporting in visualisation and formal analysis. DD was lead in methodology and formal analysis, visualisation and supporting in editing/revising. SG was lead in data curation, and supporting in editing and revising. JAL was equal in conceptualisation, data curation and supporting in revising/editing. EB, DB, JK, DMC and AA-S were supporting in conceptualisation and revising/editing. AH contributed to writing the original draft and revising/editing the draft. All authors read and approved the final manuscript.

  • Funding This work was funded by the NIH National Institute for Minority Health and Health Disparities (grant number R01MD014120 awarded to JH; grant number K23MD015267 awarded to EB). DMC was in part supported by the Robert Wood Johnson Foundation.

  • Disclaimer The views expressed here do not necessarily reflect the views of the Foundation.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.