Data acquisition and matching
Data sources
The primary data source for the Community Benefit Insight tool is IRS Form 990. Additional data sources are used to confirm nonprofit hospital status and provide contextual information about each hospital or health system. This supplemental information also allows for hospital comparisons. Data sources include:
- IRS Form 990
- Schedule H
- CMS Cost Report
- CMS Providers of Service
- Kaiser Family Foundation
- Hilltop Institute
Data acquisition and matching
The steps for acquiring and matching relevant hospital data are as follows:
1. Electronic IRS Form 990 data (including all submitted schedules) are extracted, by employer identification number (EIN, or tax ID), from the Amazon Web Services (AWS) hosting site. Broad selection criteria are used to capture nonprofit hospital EINs from these sources:
   a. IRS Exempt Organization Annual Extract of Financial Data (where Schedule H submission is indicated)
   b. IRS Exempt Organization Business Master File Extract (where the foundation or NTEE code indicates a hospital)
   c. List of nonprofit hospital EINs collected in research performed by Dr. Gregory Tung and colleagues at the University of Colorado Denver, Department of Health Systems, Management, and Policy
   d. List of nonprofit hospital EINs collected by Northeastern University in development of the Community Benefit Web Tool prototype, a precursor to CBI
2. Nonprofit hospitals, along with their CMS certification numbers (CCNs), are identified from the following sources and then restricted to the hospital types listed after them:
   a. CMS Cost Report: where PRVDR_CTRL_TYPE_CD is voluntary nonprofit, church, or other
   b. CMS Providers of Service (POS): where GNRL_CNTL_TYPE_CD is church, private (not for profit), or other
   c. Only hospitals of these types are retained:
      - Short-Term/Acute
      - Children's
      - Critical Access (CAH)
3. Name and address information is extracted from the Form 990 and Schedule H data of (1) and from the nonprofit CCNs of (2).
4. Addresses are standardized by running them through Google's Geocoding API.
5. Form 990 and Schedule H data (1) are matched to CCNs (2) by name and standardized address. The results create the EIN-to-CCN crosswalk, which contains exact matches based on:
   a. Standardized address match
   b. Hospital name plus city and state match (including city and state ensures that facilities with the same name in different locations do not produce a false match)
6. Nonprofit hospitals with no exact match, or with a questionable match, are output for further examination. This occurs in approximately 20% of cases in any given year. These cases fall into three groups:
   a. Exact matches where the CCN's EIN has changed from other years. The change may be valid, but further examination is warranted to confirm that an erroneous match did not occur.
   b. Partial matches, where some components of the name and/or address are similar.
   c. No valid match. This category includes CCNs that submitted paper returns and are therefore not found in the electronic data extracted in (1).
7. After further analysis is performed on the cases noted above (6a-c) and correct EIN-to-CCN crosswalk information is obtained:
   a. Records are added to the list of exact matches in (5) and carried forward to (8) to build the Community Benefit Insight database.
   b. If the EIN is determined to be correct but no electronic form is available, the return is sent to GuideStar for extraction of IRS data from the paper form (approximately 4% of nonprofit hospitals per year).
8. The EIN-to-CCN crosswalk (5 and 7) is used to build the Community Benefit Insight database from these sources:
   a. Electronic Form 990 (Parts I and III) and Schedule H information
   b. CMS POS: hospital county, bed count, and medical school, church, and teaching affiliations
   c. Area Health Resource File (AHRF): county-level information for the hospital's county, such as per capita income, median income, percentage in poverty, percentage under age 65 without health insurance, and unemployment rate
   d. Kaiser Family Foundation website: ACA Medicaid expansion and enrollment indicators
   e. Hilltop Institute: state community benefits reporting requirement indicator
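To make the exact-match rules and the review queue described above more concrete, here is a minimal Python sketch. It is not the CBI team's implementation. The `Record` type, the `norm` helper, and the `build_crosswalk` function are hypothetical names introduced for illustration, and the simple case-folding in `norm` only stands in for real address standardization, which the steps above perform with a geocoding service. The sketch accepts a CCN only when exactly one EIN matches on standardized address or, failing that, on name plus city and state. Everything else is set aside for manual review.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    ident: str    # EIN for a Form 990 filer, CCN for a CMS facility
    name: str
    address: str  # assumed to be standardized already (hypothetical input)
    city: str
    state: str


def norm(text):
    """Case-fold and collapse whitespace; a stand-in for real standardization."""
    return " ".join(text.casefold().split())


def name_key(record):
    """Name plus city and state, so same-named hospitals elsewhere do not match."""
    return norm(record.name), norm(record.city), norm(record.state)


def build_crosswalk(filers, facilities):
    """Return ({ccn: ein} exact matches, [ccn, ...] needing manual review)."""
    by_address = defaultdict(set)
    by_name = defaultdict(set)
    for f in filers:
        by_address[norm(f.address)].add(f.ident)
        by_name[name_key(f)].add(f.ident)

    crosswalk, review = {}, []
    for h in facilities:
        # Rule 1: standardized address. Rule 2: name plus city and state.
        eins = by_address.get(norm(h.address)) or by_name.get(name_key(h)) or set()
        if len(eins) == 1:
            crosswalk[h.ident] = next(iter(eins))
        else:
            review.append(h.ident)  # no match, or more than one candidate EIN
    return crosswalk, review


# Made-up sample data; these EINs, CCNs, and addresses are not real records.
filers = [
    Record("11-1111111", "St. Mary Hospital", "100 Main St, Springfield, IL 62701", "Springfield", "IL"),
    Record("22-2222222", "Mercy Hospital", "5 Oak Ave, Portland, OR 97201", "Portland", "OR"),
]
facilities = [
    Record("140001", "Saint Mary Hosp", "100 main st,  springfield, il 62701", "Springfield", "IL"),
    Record("380002", "Mercy Hospital", "PO Box 9", "Portland", "OR"),
    Record("200003", "Mercy Hospital", "1 Pine Rd", "Portland", "ME"),
]
crosswalk, review = build_crosswalk(filers, facilities)
print(crosswalk)
print(review)
```

In the sample data, the first facility matches on address even though its name is abbreviated, the second matches on name plus city and state, and the third is sent to review because a hospital with the same name in a different state is not treated as a match. The sketch does not check for EINs that changed between years or for partial matches. Both would need additional logic and, per the steps above, manual examination.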
Data release schedule
NOTE: The CBI dataset has traditionally been updated in the fall and spring of each year with records from previous tax years. However, due to recent IRS delays in releasing Form 990 data, the timing and content of these data updates have become less predictable. Therefore, it is unclear when we will be able to add new data to CBI. The most recent data update occurred in November 2023, with the addition of 2020 and 2021 records. For news on data releases, please check back on the site, or contact the CBI team at SupportCommunityBenefitInsight@rti.org.
Data availability
Tax Year | Number of Tax Reporting Entities |
---|---|
2010 | 2,327 |
2011 | 2,306 |
2012 | 2,304 |
2013 | 2,285 |
2014 | 2,229 |
2015 | 2,206 |
2016 | 2,164 |
2017 | 2,174 |
2018 | 2,127 |
2019 | 2,071 |
2020 | 2,110 |
2021 | 2,083 |