Geospatial analysis of salmonellosis and its association with socioeconomic status in Texas
  1. Anand Gourishankar
  1. Division of Pediatric Hospital Medicine, Children's National Hospital, Washington, District of Columbia, USA
  1. Correspondence to Dr Anand Gourishankar; agourishan{at}


Objective The study’s objective was to find the association between salmonellosis and socioeconomic status (SES) in hot spot areas and statewide counties.

Design A retrospective cohort study.

Setting Salmonellosis data for 2017 were obtained from the Texas surveillance database. The analysis included hot spot analysis and the association of SES with salmonellosis at the county level.

Participants Patients with salmonellosis of all age groups in Texas.

Results There were 5113 salmonellosis cases across 254 counties, with an unadjusted crude rate of 18 per 100 000 person-years. Seven SES risk factors distinguished the hot spot counties: low values of severe housing problem, unemployment and African American population, and high values of social association rate, fast food/full-service restaurant use, Hispanic population and Hispanic senior low access-to-store (p<0.05). A 12% difference existed in the presence of local health departments between hot (25%) and cold spot (37%) counties (χ2 (1, n=108)=0.5, p=0.81).

Statewide independent risk factors were severe housing problem (incidence rate ratio (IRR)=1.1; 95% CI: 1.05 to 1.14), social association rate (IRR=0.89; 95% CI: 0.87 to 0.92), college education (IRR=1.05; 95% CI: 1.04 to 1.07) and non-Hispanic senior low access-to-store (IRR=1.98; 95% CI: 1.26 to 3.11). The severe housing problem predicted zero occurrences of infection in a county (OR=0.51; 95% CI: 0.28 to 0.95).

Conclusions Disparity exists in salmonellosis and SES. Attention to unmet needs will decrease salmonellosis. Severe housing problem is a notable risk.

  • public health
  • infectious disease medicine
  • healthcare disparities
  • primary health care

Data availability statement

No data are available.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Key points


  • This study explored the geographic variation of salmonellosis and its association with socioeconomic status in Texas.


  • Salmonellosis clustered in the north-central part of Texas.

  • High cluster counties had disparities in ethnicity, unemployment and low access to the store.

  • Severe housing problem, social association rate, college education and low access to the store were associated with salmonellosis globally across Texas.


  • This study highlights the importance of location for patients with salmonellosis, emphasising location-based issues such as socioeconomic status, access to local health departments and various social structures.


Salmonellosis is one of the leading foodborne illnesses in the USA. Salmonellosis (non-typhoidal salmonellosis (NTS)) is defined as a clinical illness that results from infection by serotypes of Salmonella other than typhi or paratyphi (A/B/C), which cause typhoid fever.1

In the USA, among 31 major pathogens of foodborne illness, non-typhoidal Salmonella spp was the leading cause of hospitalisation (35%) and death (28%).1 Salmonellosis accounted for 31% of all foodborne illnesses.2 Salmonellosis risk varies by serotype; for example, Salmonella enteritidis infection rates were higher with lower socioeconomic status (SES), minority populations and a high average number of children per family.3 4 SES is known to influence the risk of exposure to a specific pathogen.5 6 A systematic review of four foodborne-illness pathogens, including Salmonella, showed that higher SES, higher education, unemployment, poverty, urban area and lower deprivation were associated with higher incidence.5

Geographic Information System (GIS) and spatial analysis have expanded into the science of spatial epidemiology.7–9 Area-based SES measures, such as those at the county, census tract or block group level, may provide information about transmission and the opportunity to prevent infections.10 Population-based ecological studies are limited in number and not generalisable, as described further here. Using forward sortation area-specific SES and disease rate, a study in Canada found that salmonellosis was associated with age <5 years, non-white race and poverty.11 A study that used block group-level data found that decreasing years of education were associated with reduced Salmonella infection.12 A previous census tract-level study in Connecticut found an association between high SES and NTS in persons ≥5 years of age and for some serotypes.13 A county-level non-spatial analysis of SES found that unemployment and black, Hispanic and Latino ethnicities were associated with NTS.14 Currently, there is no spatial analysis of area-based (county-level) SES and NTS in Texas.

A county-level assessment will provide new insights for community interventions to reduce the infection rate and community health service disparities. The objective of the current study was to analyse the relationship between non-typhoidal Salmonella infections and county-level SES in the hot spot areas and across all counties within the state of Texas.


The estimated population for each Texas county was derived from the US Bureau of Census. County-level SES was derived from the County Health Rankings & Roadmaps programme and Food Environment Atlas data (table 1). For this study, the county-level NTS counts (2017) across Texas were retrieved from the Texas Department of Health. The study focused on the yearly incidence, which included the cumulative outbreak strains of Salmonella in Texas. All the data were joined to the administrative boundary shapefile of counties obtained from the TIGER/Line database using ArcGIS Pro 2.5 (ESRI Inc, Redlands, California, USA). There were no missing data. The data were derived from the following resources available in the public domain:

  1. Texas Department of Health;

  2. RWJ Foundation Program. County Health Rankings and Roadmaps 2020; https://www.countyhealthrankings.org/.

  3. Food environment atlas data; https://www.ers.usda.gov/data-products/food-environment-atlas.

Table 1

Explanatory variables: definitions and sources

Statistical analysis

Analysis of the total population and NTS counts included descriptive statistics. The mean (SD) and median (IQR) were calculated for continuous SES variables. The distribution of each SES variable determined the inferential test used (Student’s t-test or Wilcoxon test). The χ2 statistic compared the proportions of local health departments in Texas. The spatial empirical Bayes (EB) smoothing method reduced the random variation of infection rates and accounted for unstable incidence rates in areas with small populations (second-order queen contiguity weights; R software V.3.6, Vienna, Austria).15 16 This study did not involve human participants, and no private health information was used. A p value of 0.05 or lower was considered statistically significant.
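The shrinkage idea behind EB smoothing can be illustrated with a minimal sketch. It assumes a global (non-spatial) method-of-moments estimator, which is simpler than the spatial EB smoother with queen contiguity weights used in this study; the function name and the example counts are hypothetical and are not the study data.

```python
def eb_smooth(cases, pops):
    """Global empirical Bayes smoothing of rates (method of moments).

    Assumption: every county is shrunk towards the statewide rate.
    The study itself used a spatial EB smoother, which shrinks each
    county towards the rate of its neighbourhood instead.
    """
    total_pop = sum(pops)
    prior_mean = sum(cases) / total_pop          # statewide rate
    avg_pop = total_pop / len(pops)
    raw = [c / p for c, p in zip(cases, pops)]
    # Population-weighted variance of the raw rates around the prior mean
    s2 = sum(p * (r - prior_mean) ** 2 for r, p in zip(raw, pops)) / total_pop
    # Prior variance, floored at zero when sampling noise dominates
    prior_var = max(s2 - prior_mean / avg_pop, 0.0)
    smoothed = []
    for r, p in zip(raw, pops):
        # Weight on the raw rate grows with the county population
        w = prior_var / (prior_var + prior_mean / p) if prior_var > 0 else 0.0
        smoothed.append(w * r + (1.0 - w) * prior_mean)
    return smoothed


# Hypothetical counties: one very small county with zero cases
example_cases = [0, 5, 200, 900]
example_pops = [80, 10_000, 1_000_000, 4_000_000]
for raw_c, pop, s in zip(example_cases, example_pops,
                         eb_smooth(example_cases, example_pops)):
    print(f"pop={pop:>9}  crude={1e5 * raw_c / pop:7.1f}  EB={1e5 * s:7.1f}")
```

Rates from small populations are pulled strongly towards the overall rate, whereas rates from large populations stay close to their crude values.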

Spatial analysis

A choropleth map was created with the Texas county base map and the Jenks optimisation classification method in ArcGIS Pro. The choropleth shows the large-scale variation of the salmonellosis incidence rate per 100 000 population (normalised) in each county. Moran’s index represented global clustering (spatial autocorrelation) of counties with high or low salmonellosis across Texas. Moran’s index measures proximity by inverse distance weights (IDWs) and combines two measures, attribute similarity and location proximity, into a single index.11 For continuous data, the IDW conceptualisation of spatial relationships, in which every feature is a neighbour of every other feature, is appropriate. The data had 254 observations, and hence a ‘Threshold distance’ was calculated. With incremental spatial autocorrelation, the distance band of 200 000 metres had the highest z-score. Moran’s index is considered statistically significant with a high z-score and p≤0.05. Next, local clustering of the disease was examined to investigate spatial variations and spatial associations. Optimised hot spot analysis (an extension of the Getis-Ord Gi* statistic) identifies hot and cold spots by comparing individual high or low values (NTS counts) with surrounding high or low values, respectively. The optimised hot spot analysis automatically selects an appropriate scale and adjusts results for multiple testing and spatial dependence. A large positive z-score that is statistically significant (p≤0.05) detects hot spots. Similarly, cold spots have a large negative and statistically significant z-score. The choropleth map can display the hot/cold spot z-scores with confidence levels in red and blue colours.
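As an illustration of the global clustering measure described above, the following is a minimal sketch of Moran's index with inverse distance weights and a threshold distance. It is not the ArcGIS Pro implementation: the weights are not row-standardised, no z-score or p value is computed, and the coordinates and values are hypothetical.

```python
import math


def inverse_distance_weights(coords, threshold):
    """w[i][j] = 1/d for pairs within the threshold distance, else 0."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            if 0 < d <= threshold:
                w[i][j] = 1.0 / d
    return w


def morans_i(values, w):
    """Global Moran's I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)              # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(v * v for v in z)


# Six hypothetical locations on a line, one distance unit apart
coords = [(float(x), 0.0) for x in range(6)]
w = inverse_distance_weights(coords, threshold=1.5)
print(round(morans_i([10, 10, 10, 1, 1, 1], w), 3))   # clustered: 0.6
print(round(morans_i([10, 1, 10, 1, 10, 1], w), 3))   # alternating: -1.0
```

Positive values indicate that similar rates sit near each other (clustering), and negative values indicate a dispersed, checkerboard-like pattern.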

Modelling approach

Multiple linear regression showed multicollinearity and residual autocorrelation, and the NTS counts were skewed. The dependent variable was the number of salmonellosis cases in each Texas county in 2017. Overdispersion and an excess of zeros (32 counties with zero infections) limited the fit of the Poisson regression model. Therefore, we applied the zero-inflated negative binomial regression model (Zinb). The county-level population at risk obtained from the census was the ‘offset’ in the model. Forward selection was carried out using p values to include variables until the final model was reached. The χ2 test on the log-likelihood difference showed that the final model fitted the data significantly better than the null model.

The Vuong test showed that the final Zinb model performed better than the negative binomial model. The Zinb model fit was further confirmed by the absence of clustering in the model residuals in ArcGIS Pro (no spatial autocorrelation of residuals). The univariable screening included the Zinb regression of the 16 variables (table 1).
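The structure of the Zinb model can be made concrete with a short sketch of its probability mass function. The parameterisation is an assumption (NB2, with variance mu + alpha × mu²), since the article does not state the one used; the function names and numeric values are hypothetical.

```python
import math


def nb_log_pmf(y, mu, alpha):
    """Log pmf of a negative binomial count with mean mu and dispersion alpha."""
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def zinb_log_pmf(y, mu, alpha, pi):
    """Log pmf of the zero-inflated negative binomial.

    pi is the probability of a structural ('excess') zero; with
    probability 1 - pi the count comes from the negative binomial part.
    """
    if y == 0:
        return math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, alpha)))
    return math.log(1.0 - pi) + nb_log_pmf(y, mu, alpha)


def expected_count(population, intercept, coefs, xs):
    """Count-part mean with the population as offset: mu = pop * exp(x'b)."""
    eta = intercept + sum(b * x for b, x in zip(coefs, xs))
    return population * math.exp(eta)


# Hypothetical county: 100 000 residents, baseline rate 18 per 100 000
mu = expected_count(100_000, math.log(18 / 100_000), [], [])
print(math.exp(nb_log_pmf(0, mu, alpha=0.5)))           # P(0) without inflation
print(math.exp(zinb_log_pmf(0, mu, alpha=0.5, pi=0.2))) # P(0) with inflation
```

The zero-inflation part raises the probability of observing no cases above what the count model alone would predict, which is why it suits data with many zero-count counties.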


In 2017, there were 5113 cases in 254 Texas counties. The average unadjusted county-level crude incidence rate (IR) of salmonellosis in Texas was 18 per 100 000 person-years. The median IR (per 100 000) was 20 (IQR: 11–36). Fourteen per cent (35/254) of counties had zero salmonellosis cases. The annual county population for the year 2017 ranged from 80 to 4 629 700 (median=18 934; IQR: 6975–52 020). The choropleth map illustrated the county-level comparison of the crude IR and the EB smoothed rate. Despite EB smoothing of unstable rates, visual inspection showed that the large-scale variation across Texas was similar in both maps (figure 1).

Figure 1

Salmonellosis choropleth map showing crude rate and EB rate. EB, empirical Bayes.

Geographic clustering of high IR was seen in central Texas (red), compared with low rates (blue) in east Texas (figure 2A). The hot spot corresponds to public health regions 1 and 2/3 (figure 2B). The global spatial clustering measure, Moran’s index, was 0.1 (p<0.001), indicating that the salmonellosis cluster pattern (hot spot) was unlikely to be due to random chance (table 2).

Table 2

Comparison of variables between counties identified by hot spot analysis

When comparing hot and cold spot counties, the following SES indicators differed significantly (p<0.05): low access-to-store in senior Hispanics, and the proportions of Hispanic and African American residents. Hispanics made up about 28% of the population in hot spot counties, compared with 3% for African Americans. By contrast, the proportion of white residents was higher than that of other races in both hot and cold spot counties, without statistical significance. The ‘social association rate’ (high means healthy) was high in the hot spot, and the ‘severe housing problem’ (low means healthy) was high in the cold spot. The study found high proportions of fast-food and full-service restaurant availability and expenditures in hot spot counties. Median income did not differ between the groups, but unemployment was higher in cold spot counties. There were no differences in the proportions with high-school or college education. Furthermore, local and city health departments were disproportionately distributed among Texas’ 11 public health regions. Higher values of salmonellosis (hot spot) were noted in the regions with a lower number of health departments (hot spot=25%, cold spot=37%, χ2 (1, n=108)=0.5, p=0.81, figure 2B).
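A comparison of two proportions of this kind can be sketched as a Pearson χ2 test on a 2×2 table. The version below assumes no continuity correction, and the counts are hypothetical; the article does not report the underlying cell counts.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction).

    Table layout (hypothetical labels):
                      with dept   without dept
        hot spot          a            b
        cold spot         c            d
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 df, chi2 is a squared standard normal: p = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p = chi_square_2x2(20, 10, 10, 20)   # hypothetical counts
print(f"chi2={chi2:.2f}, p={p:.4f}")
```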

Figure 2

Hot spot analysis (A) showing clustering of higher rates (red) of salmonellosis corresponding to public health regions (B).

In the global multivariable model, the final model included only 4 of the 16 independent variables. These variables were ‘severe housing problem’ (%), ‘social association rate’, college education (%) and percentage of ‘low access-to-stores in non-Hispanic Asian seniors’ (table 3). Estimates (incidence rate ratio or OR) correspond to a one-unit increase in the given variable, with the other variables held constant in the model. As such, for ‘low access-to-store of non-Hispanic Asian seniors’, ‘severe housing problem’ and college education, the rate of salmonellosis would be expected to increase by a factor of 1.98, 1.1 and 1.05, respectively. By contrast, if a county had an increase in ‘social association rate’ by one unit, the salmonellosis rate would be expected to change by a factor of 0.89 (an 11% decrease). The ‘severe housing problem’ predicted zero occurrences of salmonellosis. If the percentage of ‘severe housing problem’ increased by one unit, the odds that a county would have ‘zero’ salmonellosis were multiplied by 0.51. Thus, the higher the housing problem, the less likely the county was to have a zero infection rate (zero-inflation model, table 3).
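The ratio estimates above can be read as percentage changes per one-unit increase. The short sketch below converts the reported point estimates; the helper names are hypothetical, and the five-unit example is only an illustration of how the ratios compound.

```python
def percent_change(ratio):
    """Percent change in the rate (or odds) per one-unit increase."""
    return (ratio - 1.0) * 100.0


def ratio_for_increase(ratio, units):
    """Multiplicative effect of an increase of `units`, other variables fixed."""
    return ratio ** units


# Point estimates reported in the text (table 3)
reported = {
    "low access-to-store, non-Hispanic Asian seniors (IRR)": 1.98,
    "severe housing problem (IRR)": 1.10,
    "college education (IRR)": 1.05,
    "social association rate (IRR)": 0.89,
    "severe housing problem, zero-inflation part (OR)": 0.51,
}
for name, ratio in reported.items():
    print(f"{name}: {percent_change(ratio):+.0f}% per unit")

# Illustration: a five-unit rise in 'severe housing problem'
print(round(ratio_for_increase(1.10, 5), 2))
```

An OR of 0.51 in the zero-inflation part therefore corresponds to about 49% lower odds of a county being a 'zero' county per one-unit increase in severe housing problem.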

Table 3

Multiple regression of salmonellosis rate with different SES factors


Salmonellosis in Texas in 2017 showed disparities in regional clustering of cases, public health regions and SES indicators at the county level. The optimised hot spot and global Moran’s index methods identified 21 counties in the north-central part of Texas with a high IR. The higher incidence of cases in the central part of Texas can be attributed to differences in the availability of local health departments and SES indicators. Central Texas (Public Health Regions 1 and 2/3) presented high disease clusters (statistically significant hot spots, figure 2A). Although not statistically significant, a gap of 12% in the presence of local health departments between cold and hot spot regions is meaningful. Expansion of health services and expenditure may reduce Salmonella incidence.17 18 In bivariate analyses, there were statistically significant differences in seven SES indicators between the hot and cold spot counties. Based on this study, a combination of four SES indicators, ‘severe housing problem’ (%), social association rate, college education (%) and ‘low access to store’ in non-Hispanic Asian seniors (%), could explain the variability in the occurrence of salmonellosis in Texas.

This study is the first county-level GIS analysis to demonstrate hot spots and SES associations for salmonellosis (broad, not species specific). Previous studies have analysed SES at different area-based levels; GIS analysis at the census tract level showed the association of age and high SES with certain Salmonella species.13 A block-level analysis demonstrated the association of decreasing years of education with a decrease in Salmonella infection.12 Both these studies showed a general trend without hot spot analysis. The current study did not analyse individual serotypes, and it remains unclear why specific serotypes affect high SES. Varga et al found S. enteritidis area-level hot spot clustering with SES indicators (income, visible minority, number of children/family).11 The hot spot analysis method differed between the current study and Varga et al. Understanding these differences includes the interplay of physical, biological, behavioural, cultural, health services utilisation, SES and environmental factors. The current study results were consistent with some of the results from these studies. However, future work could expand on them by analysing serotypes at the county level.

A higher percentage of college education was associated with high clusters of salmonellosis in this study. Most, but not all, studies have shown associations between high educational attainment and infection.12 14 19 It is postulated that higher education increases awareness of food safety labels.20 By contrast, decreased incidence was reported in low SES (low education and income) due to better hand hygiene practices, less risky food and better food storage.21–23 Other explanations for the association between higher education and infection are greater access to healthcare, health-seeking behaviour, pet ownership and eating fresh produce, raw or uncooked food.24 25 Unemployment has a protective effect on salmonellosis. This factor by itself or concurrent with a lower education level can explain this observation. An Italian study by Borgnolo et al found higher non-typhi Salmonella infection rates in children whose fathers were either unemployed or working in non-blue-collar jobs.26 Thus, SES indicators, economic status (income) and higher educational attainment are intertwined and manifest a differential effect of SES on salmonellosis. By contrast, this study did not show an association of salmonellosis with median household income, consistent with previous research.19 Hence, the current study underscores the several explanations that interplay between education and economic status.

In a study by Lay et al,27 African Americans had a higher incidence of salmonellosis, whereas this study found a higher percentage of African Americans in non-hot spot counties. Younus et al12 found no association between ethnicity and salmonellosis, but our study found that the proportion of Hispanics was higher in hot spot counties, consistent with Arshad et al.28 Ethnicity may be a function of individual risk factors and may be pathogen specific (ecological effect of serotypes and SES).13 Although disparities exist, the association between foodborne pathogens and ethnicities is unclear.29 The disparity in salmonellosis among ethnicities can arise from gentrification and housing segregation, thus exposing that segment of the population to any number of the high-risk SES indicators noted in this study. Although Hispanic ethnicity emerged as significant, demographic, behavioural or cultural factors and other individual risk factors may affect these associations.

The other highlight of this study was that full-service restaurant utilisation among seniors in the county was associated with salmonellosis (seniors are a known high-risk age group). Darcey and Quinlan and Signs et al have found differences by SES in health code violations and food safety in retail settings, respectively.30 31 Appling et al reported the risk of Salmonella infection and violations in restaurants.32 The differences between hot and cold spot counties in restaurant use in this study were small but statistically significant, and further studies can strengthen this finding. Furthermore, low SES communities are more likely to visit fast-food restaurants.33 34 Fast-food and full-service restaurant availability and expenditures can be associated with economic disadvantages such as poverty, unemployment or low educational attainment. Thus, despite food safety measures by agencies and food education, SES indicators are significant determinants of salmonellosis. The ‘low access to stores for seniors’ had divergent results for Hispanic compared with non-Hispanic seniors. There may be a bias due to a higher percentage of Hispanics in the hot spot counties.

The ‘social association rate’ is a powerful predictor of health status (positive perception and health behaviour).35 Although the ‘social association’ is a ‘rate,’ limited by self-reporting of local entities, it measures vital health-related memberships, such as fitness centres, sports organisations, religious organisations, civic and business organisations.36 In addition, social networking and community improvement (social capital) support the belief that if individuals are not isolated and have strong social networks, they make healthy choices.

Based on this study, ‘severe housing problem’ (a measurement of the percentage of lack of kitchen or plumbing facilities, overcrowding and severely cost-burdened) demonstrated substantial influence in salmonellosis. It is the only indicator that predicted the ‘zero’ occurrence of salmonellosis infection. Although it appears protective, it underscores the magnitude and strength of adverse societal problems.37 ‘Severe housing problem’ also reflects reduced food access, poverty, positive food storage and likely under-reporting and less access to healthcare.10 17 38 Adequate housing (a proxy of high SES) prevents harmful exposures and provides a sense of safety, contributing to health. The current study supports allocating resources and services to home environment assessments, indoor pest management, grants for community development, housing and inclusionary zoning and housing policies.39 With growing awareness of geomedicine in primary practice, physicians can expand on the knowledge of where the patients live. Therefore, one may explore their location-based access to local health resources and social, economic and environmental conditions.

SES is challenging to measure as it lacks a single metric or index. Also, measurements of a single variable vary among studies. Jouve et al described the inherent issue of a complex interaction between SES and the outcome of interest, a function of differential exposure and differential vulnerability.40 The analysis of Salmonella as a homogenous group may underestimate the association between various serotypes and SES. Under-reporting (decreased case ascertainment) due to passive surveillance of salmonellosis underestimates the true incidence. Hence, the established associations will need cautious interpretation. Finally, ecological analyses do not assess confounding, and ‘ecological fallacy’ is inevitable. Although multiple regression addresses confounders, the final model is susceptible to mis-specification. The study’s strengths include group-level analysis accounting for both individual-level and community-level SES, rigorous hot spot analysis, no missing data and modelling at the local and global levels of SES. Data included in the county-level analysis can miss individual-level variation, but they provide information for directing policy and resources to the community.


The regional disparity in salmonellosis demands better research, improved capacity and an effort for surveillance to identify actual infection rates and the allocation of resources. However, the weight of the evidence suggests improving SES indicators and access to health services can reduce salmonellosis in Texas and across the USA.

Ethics statements

Patient consent for publication


I thank the following individuals, Swati Jain, PhD, and Himanshu Batra, PhD, for their expertise and assistance in writing, technical editing and proofreading the manuscript.



  • Twitter @a_gourishankar

  • Contributors AG conceptualised, analysed and wrote the entire manuscript.

  • Funding The author has not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Map disclaimer The depiction of boundaries on this map does not imply the expression of any opinion whatsoever on the part of BMJ (or any member of its group) concerning the legal status of any country, territory, jurisdiction or area or of its authorities. This map is provided without any warranty of any kind, either express or implied.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.