scout

Created by BetaNYC
Tags: Data Discovery, Data Management

How I Started

NYC's Open Data Portal has nearly 4,000 datasets. As that number grows, so does the challenge of discovering the datasets you need, and those you didn’t know you needed for richer context. scout addresses this by providing user-friendly filters for the initial search and by suggesting thematically similar and joinable datasets to enrich the search results.

How I Built This

Inspired by input from the Mayor’s Office of Data Analytics and Open Data Coordinators, scout is a web app powered by Socrata’s API, which provides metadata on all NYC datasets. A key focus of this tool is using data and machine learning techniques to surface datasets similar to a user’s search. We identify datasets that share common columns and evaluate how “joinable” they are using their data dictionaries and samples of the data. To evaluate thematic similarity, we identify key topics in the dataset descriptions with natural language processing tools, which give more nuanced results than keyword matching.
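
To make the two ideas above concrete, here is a minimal sketch of how they could be approximated. It is not scout's actual implementation: the function names, the sample data, and the scoring choices (Jaccard overlap of sampled values for joinability, bag-of-words cosine similarity for descriptions) are all assumptions made for illustration. The cosine measure in particular is a keyword-level stand-in, simpler than the topic-based natural language processing described above.

```python
import math
import re
from collections import Counter


def normalize(name):
    """Lowercase a column name and collapse non-alphanumerics to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def joinability(sample_a, sample_b):
    """Score each shared column by the Jaccard overlap of its sampled values.

    Each sample is a dict mapping column name -> list of sampled values
    (a hypothetical shape, not a Socrata response format).
    Returns {normalized_column: overlap in [0, 1]} for columns found in both.
    """
    a = {normalize(k): set(map(str, v)) for k, v in sample_a.items()}
    b = {normalize(k): set(map(str, v)) for k, v in sample_b.items()}
    scores = {}
    for col in a.keys() & b.keys():
        union = a[col] | b[col]
        scores[col] = len(a[col] & b[col]) / len(union) if union else 0.0
    return scores


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two descriptions."""
    va = Counter(re.findall(r"[a-z]+", text_a.lower()))
    vb = Counter(re.findall(r"[a-z]+", text_b.lower()))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up samples standing in for rows drawn from two datasets.
trees = {"Borough": ["Bronx", "Queens", "Brooklyn"],
         "Species": ["oak", "maple", "elm"]}
complaints = {"borough": ["Queens", "Brooklyn", "Manhattan"],
              "Complaint Type": ["noise", "heat", "noise"]}

print(joinability(trees, complaints))
print(cosine_similarity("street tree census", "tree planting requests"))
```

A shared column name alone says little about whether a join will work, which is why the sketch also compares sampled values: two "borough" columns that never share a value would score 0 even though their names match.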


BetaNYC

BetaNYC is a civic organization dedicated to improving lives in New York through civic design, technology, and data.

We envision an informed and empowered public that can leverage civic design, technology, and data to hold government accountable and improve their economic opportunity.