Simon Fraser University, Geography 455, Dr. Nadine Schuurman and Blake Walker
Matthew Bakker, Kelvin Co, Clayton Crawford, James Mack

 

 
Abstract
   
 
 
 
 


      Our model is based on spatial network cluster analysis using an agglomerative hierarchical clustering method. We apply this approach along networked space, using the SANET toolbox extension in ArcGIS 10, to capture the underlying real-world processes more accurately. In this way we address some of the limitations of current methods and provide VIHA with a more effective means of carrying out home support service scheduling.


Point (Patient Address) Cluster Method
      As described above, no real data were used for this study. Accordingly, having created our fake addresses with the random point generator in SANET, our first step was to geocode the addresses from a Microsoft Excel spreadsheet using an address locator and the geocode tool in ArcMap (Figure 2). Following this, we clipped the geocoded points to each of the 14 administrative subregions (LHAs) of VIHA. Because the home support service of each LHA is administered separately, dividing the points into these groups aligned with administrative practice. Additionally, reducing the number of points to the LHA scale increased computational speed, which has been a problem with large networks (Sugihara, Okabe, and Satoh 2011).
      Next, we used the SANET software program’s “Point Clustering Tool” to carry out a spatial network cluster analysis on the geocoded addresses and generate a hierarchy of nested clusters. The result of this process was an R file readable by the R software program. We then added a line of code to the R file so that it would generate output in comma-separated values (CSV) format, readable by both Microsoft Excel and ArcMap. Importantly, the hierarchical clustering technique used in SANET does not allow us to predetermine the number of clusters to be created. Instead, the number of clusters is selected from the hierarchy based on the total number of points in an LHA and VIHA’s criterion that a cluster should contain no more than 30 addresses. For example, an LHA with 330 addresses would have 330 / 30 = 11 clusters.


Figure 2. Flowchart of the GIS Optimization Model for VIHA.


Driving time, represented by weighted distance, is the other criterion for cluster identification. It can be applied by specifying the distance between clusters at which the clusters merge. However, the two criteria cannot be used in conjunction because of the R file data format.
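      The two ways of cutting the hierarchy described above can be sketched as follows. This is a minimal illustration in Python, not the SANET implementation: the function names, the use of average linkage, and the toy distance matrix are all assumptions made for the example. The sketch accepts exactly one stopping criterion at a time only to mirror the workflow described here.

```python
import math

def clusters_needed(n_addresses, max_per_cluster=30):
    """Smallest number of clusters that can respect the size cap."""
    return math.ceil(n_addresses / max_per_cluster)

def agglomerate(dist, n_clusters=None, max_merge_dist=None):
    """Agglomerative clustering on a symmetric distance matrix.

    Average linkage is an assumption of this sketch. Merging stops when
    n_clusters remain, or when the closest pair of clusters is farther
    apart than max_merge_dist. Exactly one criterion must be given.
    """
    if (n_clusters is None) == (max_merge_dist is None):
        raise ValueError("give exactly one of n_clusters or max_merge_dist")
    clusters = [[i] for i in range(len(dist))]

    def avg_dist(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        if n_clusters is not None and len(clusters) <= n_clusters:
            break
        # Find the closest pair of clusters.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = avg_dist(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if max_merge_dist is not None and d > max_merge_dist:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]

# Toy network: five addresses along a single road (hypothetical positions).
positions = [0, 1, 2, 10, 11]
dist = [[abs(a - b) for b in positions] for a in positions]

k = clusters_needed(330)                             # 11
by_count = agglomerate(dist, n_clusters=2)           # [[0, 1, 2], [3, 4]]
by_distance = agglomerate(dist, max_merge_dist=1.2)  # [[0, 1], [2], [3, 4]]
```

      Note that cutting a hierarchy at a given number of clusters does not by itself guarantee that every cluster stays under the size cap, so the count from clusters_needed is best read as a minimum.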
      With the clusters identified and the output readable by ArcMap, the clustered addresses were joined to the original attribute table and visualized in ArcMap. Finally, once joined, the attribute table of the addresses for a given LHA can be exported as a CSV file with the cluster membership included. In this format the attribute table can be viewed and edited in an Excel spreadsheet for scheduling purposes.
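      Outside ArcMap, the same join and export can be illustrated with a short Python sketch. The field names (point_id, address, cluster) and the sample rows are hypothetical and are not taken from the project data.

```python
import csv
import io

def join_cluster_membership(address_rows, membership_rows, key="point_id"):
    """Attach a 'cluster' field to each address row by matching on key."""
    lookup = {row[key]: row["cluster"] for row in membership_rows}
    joined = []
    for row in address_rows:
        out = dict(row)
        out["cluster"] = lookup.get(row[key], "")  # blank if unmatched
        joined.append(out)
    return joined

def to_csv(rows):
    """Serialize joined rows as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical attribute table and clustering output.
addresses = [
    {"point_id": "1", "address": "100 Example St"},
    {"point_id": "2", "address": "200 Sample Ave"},
]
membership = [
    {"point_id": "2", "cluster": "B"},
    {"point_id": "1", "cluster": "A"},
]

joined = join_cluster_membership(addresses, membership)
csv_text = to_csv(joined)
```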


Within Cluster Distance
      Important to VIHA is the ability to determine the driving time between patients within the same group. In the SANET software, within-cluster distance is calculated as the distance from every patient address to every other patient address in the cluster. Averaging these distances produces a single number, which is of more use to VIHA schedulers and staff. We used the same road network and the same weights as in the initial spatial network cluster analysis.
      To accomplish this, we first created a point shapefile for every cluster found within an LHA. We then used the “Shortest Path Distance between Points in a Set of Points” tool in the SANET software program. The results were exported as a CSV file, readable with Microsoft Excel, from which the average distance was taken.
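      The calculation behind this step can be sketched in Python as the mean shortest-path distance over all pairs of addresses in a cluster. This is an illustration only: the small weighted graph, its node names, and the assumption that every road can be travelled in both directions are hypothetical, and Dijkstra's algorithm stands in for whatever routine SANET uses internally.

```python
import heapq
from itertools import combinations

def shortest_path_lengths(graph, source):
    """Dijkstra's algorithm: weighted distance from source to each reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for neighbour, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist

def mean_within_cluster_distance(graph, members):
    """Average shortest-path distance over all unordered pairs of members."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0
    dist_from = {m: shortest_path_lengths(graph, m) for m in members}
    return sum(dist_from[a][b] for a, b in pairs) / len(pairs)

# Hypothetical weighted road network (weights could be driving minutes).
graph = {
    "A": {"B": 2, "C": 10},
    "B": {"A": 2, "C": 3},
    "C": {"A": 10, "B": 3, "D": 1},
    "D": {"C": 1},
}

# Addresses A, C and D form one cluster; B is only an intersection.
average = mean_within_cluster_distance(graph, ["A", "C", "D"])  # 4.0
```

      Here the A to C distance is 5 (through B rather than the direct edge of weight 10), A to D is 6, and C to D is 1, giving an average of 4.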


Cluster to Cluster Distance
      In addition to driving time within patient groups, driving time between patient groups was also of interest to VIHA schedulers and staff. As with the within-cluster average distance, the distance between two clusters was determined by averaging the distance from every patient address in one cluster to every patient address in the second cluster along a weighted network. Additionally, the distance between a home support worker's origin and a cluster can be calculated in the same way, so long as the home support worker's address is classified as a cluster unto itself.
      To begin, we created shapefiles of the clusters found within each LHA. We then ran the “Shortest Path Distance from A Points to B Points” tool found in the SANET software. The resulting output was saved as a CSV file, which included the average distance between clusters.
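      As a rough illustration of the averaging described above, the sketch below computes the mean distance over every pairing of a point in one cluster with a point in another. The point names and distance values are invented for the example, and the distances are assumed to be network distances already computed elsewhere.

```python
def mean_between_cluster_distance(dist, cluster_a, cluster_b):
    """Average of dist[a][b] over every a in cluster_a and b in cluster_b."""
    total = sum(dist[a][b] for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

# Hypothetical network distances (for example, weighted driving time).
dist = {
    "p1": {"p3": 4.0, "p4": 6.0},
    "p2": {"p3": 5.0, "p4": 7.0},
    "w": {"p1": 3.0, "p2": 5.0},
}

cluster_1 = ["p1", "p2"]
cluster_2 = ["p3", "p4"]
worker = ["w"]  # worker origin treated as a one-point cluster

between = mean_between_cluster_distance(dist, cluster_1, cluster_2)       # 5.5
worker_to_cluster = mean_between_cluster_distance(dist, worker, cluster_1)  # 4.0
```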


Comparison with Euclidean Cluster Analysis
      To confirm the advantages of cluster analysis along a spatial network over traditional Euclidean approaches (Lu 2005), we carried out a Euclidean-based cluster analysis using S PLUS software. This second method depends on the X and Y coordinates of each point, which were generated when we created the random points with the SANET Random Point Generator tool. These coordinates were used as the basis for the cluster analysis.
      To run the comparison, the attribute table of the points of an LHA was exported as a Database File and imported into the S PLUS statistical software. A k-means cluster analysis was then run on the X, Y coordinates of the points. K-means analysis is a non-hierarchical cluster analysis that requires the number of clusters to be predetermined (Okabe and Sugihara 2012). It optimizes the clustering so that similarity within each cluster is maximized while similarity between clusters is minimized (Okabe and Sugihara 2012). Importantly, within S PLUS this process takes place in Cartesian space using Euclidean distance. For our purposes, the predetermined number of clusters for an LHA was the same as that in the spatial network cluster analysis. The analysis was run for 1,000 iterations to achieve the best possible cluster arrangement.
      The result was a cluster membership assigned to each patient address in the dataset. This file with cluster membership was then exported as a new Database File. The new Database File was added into ArcMap and, through the join tool, connected with the original point file so that the results could be visualized on a map and compared.
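      For readers without S PLUS, the Euclidean comparison can be approximated with a plain k-means routine. The sketch below is a minimal version of Lloyd's algorithm in Python under stated assumptions: the function name, the random initialization, the seed, and the sample coordinates are all invented for illustration, and the S PLUS implementation may differ in its details. The cap of 1,000 iterations mirrors the setting described above, although this sketch stops early once assignments no longer change.

```python
import math
import random

def kmeans(points, k, max_iter=1000, seed=0):
    """Lloyd's k-means on (x, y) tuples using Euclidean distance."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centres picked from the data
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged, no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids

# Hypothetical X, Y coordinates forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
```

      The returned labels play the role of the cluster membership column that was joined back to the point file for mapping.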


 

© 2013 Bakker, Co, Crawford, Mack. All rights reserved.