My goal was to create an algorithm that determines, based on historical data, which tree species have the best survival record in specific New York City locations. In the process, I developed an innovative way of matching trees across NYC tree censuses.
How I Started
City trees survive for only a small fraction of the lifespan they would reach in the wild. Because the City of New York spends a lot of effort and money every year to maintain its urban forest, my app advises on the trees with the best survival record at any NYC address entered by the user.
How I Built This
Plan(t)wise was my personal project during my data science fellowship at Insight Data Science, New York. I worked on it for approximately two months, starting in mid-June 2016, soon after the most recent NYC Street Tree census dataset was released. Data preprocessing and analysis were done using Python and R. The web app is hosted on AWS and is powered by Flask. The code I used for the project is open source and can be found in my personal GitHub repositories: (1) https://github.com/DrRadan/NYCtrees_munging_model (preprocessing and analysis), (2) https://github.com/DrRadan/Plan_t_wise (application). A slide deck with details about this project can be found at the “About” link in the upper-right corner of my web app page (www.plan-t-wise.com).
The goal of this project was to create a predictive algorithm for tree mortality in NYC using historical NYC Street Tree census datasets. The first problem I had to overcome, though, was matching the trees across census releases, because the trees in the census are not longitudinally tagged with unique identifiers. To achieve this, I took advantage of the latitude and longitude information at four-decimal precision (approximately 11 x 11 meter squares), as well as the tree species/variety information that accompanied the 2015 and 2005 datasets.
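To give a feel for what four-decimal coordinates mean on the ground, here is a minimal Python sketch. It is an illustration only, not the code from my repositories: the function names `cell_size_m` and `grid_key` are hypothetical, and the Earth is treated as a sphere. One detail it makes visible is that a 0.0001-degree step is about 11 meters north-south everywhere, but somewhat less east-west at NYC's latitude.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)


def cell_size_m(lat_deg, decimals=4):
    """Return the (north-south, east-west) size in meters of one coordinate
    step at the given rounding precision and latitude."""
    step_rad = math.radians(10 ** -decimals)
    north_south = EARTH_RADIUS_M * step_rad
    # Meridians converge toward the poles, so the east-west step shrinks.
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


def grid_key(lat, lon, species, decimals=4):
    """Hypothetical lookup key: rounded coordinates plus normalized species."""
    return (round(lat, decimals), round(lon, decimals), species.strip().lower())


ns, ew = cell_size_m(40.7)  # 40.7 degrees N is roughly NYC's latitude
print(f"cell is about {ns:.1f} m north-south by {ew:.1f} m east-west")
print(grid_key(40.71234, -74.00561, " Quercus rubra "))
```

Two census entries that produce the same key would be candidates for being the same tree; entries that fall in neighboring cells need a distance-based comparison instead.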
I devised an experimental scheme (uploaded as ProjectImage1) in which, inspired by the contingency tables used in clinical trials, I evaluated at which distance between the 2005 and 2015 tree entries I get the most consistent results with regard to tree species/variety identity. That distance turned out to be 20 meters. I confirmed it by checking that the trunk diameter of the “matched trees” increased in all but a very few outliers: trees that are correctly identified as being at the same location in 2005 and 2015 are expected to grow, and definitely not “shrink”.
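The distance evaluation can be sketched in Python as follows. This is a simplified stand-in for the scheme in ProjectImage1, not the project code: it pairs each 2005 entry with its nearest 2015 entry (without enforcing one-to-one matches), and the record layout, the `dbh` field name for trunk diameter, and the three sample trees are all made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # spherical approximation (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def evaluate_threshold(trees_2005, trees_2015, max_dist_m):
    """Pair each 2005 tree with its nearest 2015 tree if within max_dist_m,
    then report species agreement and the share of pairs that did not shrink."""
    pairs = []
    for old in trees_2005:
        nearest = min(
            trees_2015,
            key=lambda new: haversine_m(old["lat"], old["lon"], new["lat"], new["lon"]),
        )
        dist = haversine_m(old["lat"], old["lon"], nearest["lat"], nearest["lon"])
        if dist <= max_dist_m:
            pairs.append((old, nearest))
    if not pairs:
        return {"pairs": 0, "species_agreement": None, "non_shrinking": None}
    same = sum(o["species"] == n["species"] for o, n in pairs)
    grew = sum(n["dbh"] >= o["dbh"] for o, n in pairs)
    return {
        "pairs": len(pairs),
        "species_agreement": same / len(pairs),
        "non_shrinking": grew / len(pairs),
    }


# Made-up sample records, for illustration only.
trees_2005 = [
    {"lat": 40.7000, "lon": -74.0, "species": "oak", "dbh": 10},
    {"lat": 40.7010, "lon": -74.0, "species": "maple", "dbh": 8},
    {"lat": 40.7020, "lon": -74.0, "species": "ginkgo", "dbh": 5},
]
trees_2015 = [
    {"lat": 40.70005, "lon": -74.0, "species": "oak", "dbh": 14},
    {"lat": 40.70115, "lon": -74.0, "species": "maple", "dbh": 11},
    {"lat": 40.70235, "lon": -74.0, "species": "linden", "dbh": 3},
]

for threshold in (10, 20, 50):
    print(threshold, evaluate_threshold(trees_2005, trees_2015, threshold))
```

In this toy data, widening the threshold from 20 to 50 meters adds a pair whose species differs and whose trunk "shrank", which is the kind of signal that suggests the threshold has become too permissive.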
Having matched the trees between the two datasets, I modeled the features included in the two census datasets using a Random Forest model. The model performed reasonably well (AUC of 0.72), with the longitude and latitude parameters, as well as the tree’s size (trunk diameter) in 2005, being the best predictors of its survival. In other words, this model gave me the insight that not all New York neighborhoods fare equally in terms of tree survival, and that, in general, trees that had managed to survive up to a certain size had better chances of making it for another ten years. But the model gave me very little information on how to plan so that even neighborhoods with low survival rates can have the best possible outcomes.
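For readers unfamiliar with the AUC metric, the short Python sketch below shows what the number measures: the probability that a randomly chosen surviving tree receives a higher predicted score than a randomly chosen tree that died. The function `roc_auc` is a plain pairwise implementation written for illustration, and the labels and scores are invented; none of this is the Random Forest model itself.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: share of (survived, died) pairs where the survivor
    has the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: 1 = tree survived, 0 = tree died.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, so a value of 0.72 sits between the two.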
Armed with the knowledge that tree location is the most important factor in tree survival, I used k-means clustering to divide NYC into 50 “areas” that were determined by tree characteristics and not human divisions (e.g., boroughs or neighborhoods). Within each area, I determined the total number of trees, the different tree species and varieties that were present, and their proportions, and calculated the 10-year survival rate for each of them.
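The per-area bookkeeping could look roughly like the Python sketch below. It assumes each matched tree already carries an area label from the clustering step; the function name `summarize_areas`, the tuple layout, and the sample records are hypothetical and only serve to show the counting.

```python
from collections import defaultdict


def summarize_areas(records):
    """records: iterable of (area_id, species, survived) tuples.
    Returns, per area, the tree total plus each species' count,
    proportion of the area's trees, and 10-year survival rate."""
    # area -> species -> [total, survived]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for area, species, survived in records:
        tally = counts[area][species]
        tally[0] += 1
        tally[1] += int(survived)

    summary = {}
    for area, by_species in counts.items():
        area_total = sum(total for total, _ in by_species.values())
        summary[area] = {
            "total_trees": area_total,
            "species": {
                sp: {
                    "count": total,
                    "proportion": total / area_total,
                    "survival_rate": alive / total,
                }
                for sp, (total, alive) in by_species.items()
            },
        }
    return summary


# Invented records, for illustration only.
records = [
    (0, "oak", True),
    (0, "oak", False),
    (0, "maple", True),
    (1, "ginkgo", True),
]
summary = summarize_areas(records)
print(summary[0]["total_trees"], summary[0]["species"]["oak"])
```

Because these summaries depend only on the census data, they can be computed once and stored, rather than recalculated for every query.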
In my web app, the user enters any NYC address through a simple, intuitive interface. In the background, the Google Maps API translates the address to longitude and latitude coordinates, which are matched to one of the 50 tree areas. The app then looks up the data that have been calculated beforehand for that area and directs the user to a “results” page (screenshot uploaded as ProjectImage2). On the results page, the user can verify the address they entered and see the top recommended tree species/varieties for the area, along with their survival likelihoods. An interactive chart also shows which trees are currently (2015 census) present in that area and at what frequency, and how each of those species/varieties has historically fared in terms of 10-year survival. The total number of trees used in the calculations is displayed at the top of the chart, which allows the actual number of trees to be extrapolated. Why don’t you give the app a go with an NYC address of your choice at www.plan-t-wise.com?
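The lookup step that follows geocoding can be sketched in Python as below. This is an illustration under stated assumptions, not the app's code: it assumes an address has already been turned into coordinates, it assigns the point to the area with the nearest centroid, and the names `nearest_area` and `recommend`, the two centroids, and the survival rates are all invented.

```python
import math


def nearest_area(lat, lon, centroids):
    """Return the id of the area whose centroid is closest to (lat, lon).
    Uses a flat-Earth approximation, adequate at city scale."""
    cos_lat = math.cos(math.radians(lat))

    def squared_distance(area_id):
        c_lat, c_lon = centroids[area_id]
        d_lat = c_lat - lat
        d_lon = (c_lon - lon) * cos_lat  # shrink longitude differences
        return d_lat * d_lat + d_lon * d_lon

    return min(centroids, key=squared_distance)


def recommend(lat, lon, centroids, survival_by_area, n=3):
    """Look up the precomputed survival rates for the matching area and
    return the n species with the highest rates."""
    area = nearest_area(lat, lon, centroids)
    ranked = sorted(survival_by_area[area].items(), key=lambda kv: kv[1], reverse=True)
    return area, ranked[:n]


# Invented centroids and survival rates, for illustration only.
CENTROIDS = {"A": (40.71, -74.00), "B": (40.85, -73.90)}
SURVIVAL = {
    "A": {"oak": 0.82, "maple": 0.64, "ginkgo": 0.91},
    "B": {"oak": 0.70, "linden": 0.75},
}

print(recommend(40.7128, -74.0060, CENTROIDS, SURVIVAL, n=2))
```

Keeping the per-area results in a precomputed table means a request only needs one nearest-centroid search and one dictionary lookup.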