scout is a new way to browse New York City’s open data portal. Developed by Two Sigma Data Clinic in partnership with the NYC Mayor’s Office of Data Analytics, scout improves data discoverability and collaboration by evaluating thematic similarity and joinability between datasets, and by making it easier to create curated dataset collections.
How I Started
NYC's Open Data Portal has nearly 4,000 datasets. As the number of datasets continues to grow, so does the challenge of discovering those you need, and those you didn’t know you needed for richer context. scout addresses this by providing easy-to-use filters for the initial search, and by suggesting thematically similar and joinable datasets to enrich the search results.
Used Data
How I Built This
Inspired by input from the Mayor’s Office of Data Analytics and Open Data Coordinators, scout is a web app powered by Socrata’s API, which provides metadata on all NYC datasets. A key focus of the tool is using data and machine learning techniques to surface datasets similar to a user’s search. We identify datasets that share common columns and evaluate how “joinable” they are by using the data dictionaries and samples of the data. To evaluate thematic similarity, we identify key topics from the dataset descriptions with the help of natural language processing tools, which provide more nuanced results than keyword matching.
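To make the idea of "joinability" more concrete, here is a minimal sketch in Python of one way two datasets could be compared. It is not scout's actual implementation. The function names, the sample column names, and the two measures (overlap of column names, and the share of sampled values from one column that also appear in another) are assumptions chosen for illustration.

```python
def column_overlap(cols_a, cols_b):
    """Jaccard similarity of two lists of column names (hypothetical measure).

    Names are lowercased and stripped so that "BBL" and "bbl " count as equal.
    """
    a = {c.strip().lower() for c in cols_a}
    b = {c.strip().lower() for c in cols_b}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def value_containment(sample_a, sample_b):
    """Fraction of distinct values in sample_a that also appear in sample_b.

    A high score suggests that joining on this column pair would match
    many rows, at least within the sampled data.
    """
    a, b = set(sample_a), set(sample_b)
    if not a:
        return 0.0
    return len(a & b) / len(a)


# Illustrative inputs only: these column names and values are made up.
overlap = column_overlap(
    ["BBL", "Borough", "Address"],
    ["bbl", "borough", "Year Built"],
)
containment = value_containment(
    ["1000010001", "1000010002", "1000010003", "1000010004"],
    ["1000010001", "1000010002", "2000010001"],
)
print(overlap, containment)
```

In this sketch, matching column names only flag a candidate join key. The containment score on sampled values is what indicates whether the join would actually match rows. A real system would also need to handle differences in data types and formatting, which this example ignores.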
