Data Processing

 

Base File Processing

The geographic base files provided by DMTI Spatial include a point file showing the centroids of six-character postal codes, or "local delivery units" (e.g. V5A 1S6), and a detailed street network file including civic addresses in the proper form for geocoding. Initial surveys of the data revealed that many of the addresses had been entered in ambiguous or poorly organized forms. Patients were therefore geocoded by postal code rather than by street address. Postal codes provide sufficient locational precision and make it easy to link tables, because every postal code follows a standard six-character form and each one is unique to a specific local delivery unit (LDU).

Postal codes (or local delivery units) were provided by DMTI Spatial as point data. The attributes of this dataset include a field indicating the enumeration area (EA) that contains each postal code. The 1996 Canadian Census enumeration area shapefiles were matched to the postal code points using the EA as the common identifier.
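The match on the EA identifier can be sketched as a simple keyed lookup. This is only an illustration in Python, not the procedure used in the project: the field names (`postal_code`, `ea_id`, `population`), the coordinates, and the attribute values are all hypothetical.

```python
def join_postal_codes_to_eas(postal_points, ea_records):
    """Attach EA attributes to each postal code point via the shared EA id.

    postal_points: list of dicts, each with a hypothetical "ea_id" field.
    ea_records: dict mapping EA id -> dict of EA attributes.
    Returns (matched, unmatched), where unmatched lists the postal codes
    whose EA id has no record.
    """
    matched, unmatched = [], []
    for point in postal_points:
        ea = ea_records.get(point["ea_id"])
        if ea is None:
            unmatched.append(point["postal_code"])
        else:
            # Merge point fields and EA fields into one record.
            matched.append({**point, **ea})
    return matched, unmatched


# Made-up example values, for illustration only.
example_points = [
    {"postal_code": "V5A 1S6", "x": 0.0, "y": 0.0, "ea_id": "59029106"},
    {"postal_code": "V5A 1S7", "x": 50.0, "y": 25.0, "ea_id": "00000000"},
]
example_eas = {"59029106": {"population": 600}}

example_matched, example_unmatched = join_postal_codes_to_eas(
    example_points, example_eas
)
```

Keeping the unmatched postal codes in a separate list makes it easy to see which points would need to be checked by hand.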

Matching the TB Dataset to Postal Codes

Joining the postal codes of all of the patients to the DMTI Spatial postal code shapefile made visualization and spatial analysis possible. Because 38 postal codes each contained more than one patient, a "count" field holding the number of patients per postal code was added. This provided an indication of the density of patients in any given postal code. Once the postal codes were joined to the spatial shapefiles, we were able to compare individuals' addresses to their postal codes.

Several EAs containing patients had no socio-economic data in the dataset, so socio-demographic and economic data had to be assigned manually. Initially, a point-in-polygon search indicated 14 tuberculosis patients residing in 'blank' EAs. Rather than discount these patients, we assigned values to these EAs based on averages of the surrounding EAs. For example, for EA 59029105, we chose the nearest neighbours, 59029106 and 59029107.
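Both steps, counting patients per postal code and filling a 'blank' EA from its neighbours, are small enough to sketch in Python. This is a minimal illustration under stated assumptions: the averaging is taken to be a simple unweighted mean (the text does not say whether any weighting was used), and the variable names `median_income` and `pct_renters` and their values are hypothetical.

```python
from collections import Counter
from statistics import mean


def normalize_postal_code(raw):
    """Uppercase and remove whitespace so 'v5a 1s6' and 'V5A1S6' match."""
    return "".join(raw.split()).upper()


def count_patients(patient_postal_codes):
    """Return a Counter of patients per normalized postal code."""
    return Counter(normalize_postal_code(pc) for pc in patient_postal_codes)


def average_of_neighbours(ea_values, neighbour_ids):
    """Unweighted mean of each variable over neighbours that have data.

    ea_values maps EA id -> dict of variables, or None for a 'blank' EA.
    """
    neighbours = [ea_values[n] for n in neighbour_ids if ea_values.get(n)]
    if not neighbours:
        raise ValueError("no neighbouring EA has data")
    return {
        var: mean(n[var] for n in neighbours) for var in neighbours[0]
    }


# Made-up values, for illustration only.
example_counts = count_patients(["V5A 1S6", "v5a1s6", "V6B 2K4"])

example_eas = {
    "59029105": None,  # 'blank' EA
    "59029106": {"median_income": 30000.0, "pct_renters": 40.0},
    "59029107": {"median_income": 34000.0, "pct_renters": 50.0},
}
example_eas["59029105"] = average_of_neighbours(
    example_eas, ["59029106", "59029107"]
)
```

Normalizing the postal codes before counting guards against the same code being entered with different spacing or case.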

Data Scrambling

To protect the privacy of patients, especially patients with potentially controversial diseases such as tuberculosis, direct point mapping was avoided, as it could have facilitated the identification or location of individual patients.

For the purposes of visualization only, a scrambling program was developed in Avenue (ESRI's ArcView GIS 3.2 object-oriented programming language). By developing a script based on the Number.MakeRandom operator, a random 'scrambler' was added to the actual X and Y coordinates of each point, effectively moving the points to within a 200-metre square of their initial location. This level of scrambling is equivalent to one to two city blocks and was deemed sufficient in areas with high population density, such as the Downtown Eastside or parts of Mount Pleasant or Central Burnaby. The number of possible addresses for each "dot on the map" is quite high in these areas, high enough to ensure a sufficient level of privacy.

In outlying areas, however, scrambling the locations by 200 metres is far from sufficient to protect the privacy of patients, because far fewer addresses fall within that distance of each point. As a result, for views of outlying areas, points were scrambled by 400 metres.
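The scrambling idea can be illustrated with a short Python sketch. It is not the original Avenue script. It assumes that "within a 200-metre square" means a square centred on the true location, so each coordinate moves by at most half the square's side, and that coordinates are in a projected system measured in metres. The function names and the `seed` parameter are hypothetical.

```python
import random


def scramble_point(x, y, square_size_m, rng):
    """Move a point to a random location inside a square centred on it."""
    half = square_size_m / 2.0
    return (x + rng.uniform(-half, half), y + rng.uniform(-half, half))


def scramble_points(points, square_size_m, seed=None):
    """Scramble a list of (x, y) tuples; pass a seed only for testing."""
    rng = random.Random(seed)
    return [scramble_point(x, y, square_size_m, rng) for x, y in points]


# Made-up coordinates, for illustration only.
example_points = [(1000.0, 2000.0), (1500.0, 2500.0)]
dense_view = scramble_points(example_points, 200, seed=1)     # dense areas
outlying_view = scramble_points(example_points, 400, seed=1)  # outlying areas
```

The offsets should be generated once and stored for display only. Redrawing the map with fresh random offsets each time would let a viewer average the scrambled positions and recover the true location.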


