Publicly disclosed data from federal housing surveys must strike a delicate balance: specific enough to render an accurate “big picture” of a region, abstract enough to protect the anonymity of individual respondents. Working with the Department of Housing and Urban Development, our research team conducted a series of experiments to assess whether current reporting practices pose a significant risk to respondent confidentiality.

Problem

The American Housing Survey (AHS) is a biennial audit of housing units conducted by the Census Bureau on behalf of the Department of Housing and Urban Development (HUD). Provided anonymously by respondents around the country, AHS data is used across the public, private, and academic sectors, supporting effective policy development and providing key agencies with a comprehensive snapshot of the American housing ecosystem.

Some of the information disclosed by the AHS may be sufficient for outside parties to identify individual respondents in public use microdata files. Publicly available real estate tax assessment records, for example, might be matched against variables in the AHS’s Public Use Files. Working with HUD, our research team conducted a series of experiments to assess the scope and seriousness of the risks faced by federal survey agencies in today’s data environment.

Methods

The SDAL team set out to develop a dynamic tool capable of re-creating the distinctive qualities of a region from publicly available data. Our initial experiment combined information on geographic features, such as floodplains and green spaces, with county-level tax data to produce a "pseudo universe" of housing units with features mimicking those of Washington, D.C., Arlington County, and Fairfax County.

Using this pseudo universe, our researchers were able to assess the risk that each resident could be identified from their housing information. To ensure a rigorous representation of real-world conditions, our researchers conducted a thorough comparison between the information contained in our simulated universe and the data provided in the 2013 American Housing Survey.
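
One simple way to picture this kind of risk assessment is sketched below in Python. It is an illustration under stated assumptions, not the tool our team built: the variable names (`county`, `year_built`, `bedrooms`), the toy records, and the `match_risk` function are all hypothetical. The sketch treats the risk for a survey record as one divided by the number of pseudo-universe units that share its values on a chosen set of variables, so a record that matches only one unit is the most exposed.

```python
from collections import Counter

# Hypothetical pseudo universe: each dict stands for one simulated housing unit.
# The variables and values are made up for illustration only.
PSEUDO_UNIVERSE = [
    {"county": "Arlington", "year_built": 1950, "bedrooms": 3},
    {"county": "Arlington", "year_built": 1950, "bedrooms": 3},
    {"county": "Arlington", "year_built": 1985, "bedrooms": 2},
    {"county": "Fairfax", "year_built": 1985, "bedrooms": 4},
    {"county": "Fairfax", "year_built": 1985, "bedrooms": 4},
    {"county": "Fairfax", "year_built": 1985, "bedrooms": 4},
]


def match_risk(record, universe, keys):
    """Return 1 / (number of universe units matching `record` on `keys`).

    Returns 0.0 when no unit in the universe matches at all.
    """
    counts = Counter(tuple(unit[k] for k in keys) for unit in universe)
    matches = counts[tuple(record[k] for k in keys)]
    return 1.0 / matches if matches else 0.0


KEYS = ("county", "year_built", "bedrooms")

# A survey record that matches exactly one simulated unit.
risky = {"county": "Arlington", "year_built": 1985, "bedrooms": 2}
# A survey record that matches three simulated units.
safer = {"county": "Fairfax", "year_built": 1985, "bedrooms": 4}

risky_score = match_risk(risky, PSEUDO_UNIVERSE, KEYS)
safer_score = match_risk(safer, PSEUDO_UNIVERSE, KEYS)
```

In this toy setup the first record scores higher than the second because fewer simulated units look like it. A real assessment would also need to account for sampling and for measurement differences between the survey and the public records.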

Impact

The preliminary results of our simulation demonstrate that no single variable makes a particular record identifiable, but some variables contribute more to identifiability than others, and certain combinations of variables can make a record considerably more identifiable.
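
The idea that combinations matter more than any single variable can be illustrated with a small Python sketch. The records, variable names, and helper functions (`unique_share`, `rank_combinations`) are hypothetical and chosen only to make the point; they are not drawn from the AHS or from our simulation. The sketch measures, for each set of variables, the share of records whose combination of values appears exactly once.

```python
from collections import Counter
from itertools import combinations

# Made-up records for illustration; not real survey data.
RECORDS = [
    {"tenure": "own", "bedrooms": 3, "year_built": 1950},
    {"tenure": "own", "bedrooms": 3, "year_built": 1950},
    {"tenure": "own", "bedrooms": 2, "year_built": 1985},
    {"tenure": "rent", "bedrooms": 2, "year_built": 1985},
    {"tenure": "rent", "bedrooms": 3, "year_built": 1985},
    {"tenure": "rent", "bedrooms": 3, "year_built": 1950},
]
VARIABLES = ("tenure", "bedrooms", "year_built")


def unique_share(records, keys):
    """Share of records whose values on `keys` occur exactly once."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if counts[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


def rank_combinations(records, variables, size):
    """Rank every `size`-variable combination by its share of unique records."""
    scored = [(combo, unique_share(records, combo))
              for combo in combinations(variables, size)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


single_shares = [unique_share(RECORDS, (v,)) for v in VARIABLES]
pair_ranking = rank_combinations(RECORDS, VARIABLES, 2)
all_three_share = unique_share(RECORDS, VARIABLES)
```

In this toy data no variable singles out a record on its own, some pairs isolate more records than others, and using all three variables isolates the most.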

Looking forward, we see a broad range of applications for this “pseudo universe” approach to risk assessment. The ability to re-create both the physical and informational qualities of a region could be a valuable tool for predicting the efficacy of intervention strategies for reducing the risk of disclosure in future federal surveys. This work also offers insight into whether certain questions could be removed from the AHS, since that information could be “looked up” using publicly available data sources.
