For those familiar with the study of cities, there is nothing new about mixing data and statistics to address urban public policy. It is, however, important to understand that data, statistics and public policy exist in a kind of equilibrium: when a major stride is made in one area, the other two must adapt to the changed landscape. Take the various voting rights court rulings and legislation of the 1960s: better data and methods were needed to identify the race and location of individuals in order to understand and implement new policies around political districting. Today, the social science community is reacting to the vast amount of new (“big”) data becoming available. How should methodological approaches adapt? What new public policy advances can be made with this new data? This is the context within which Alex Singleton (University of Liverpool), Seth Spielman (University of Colorado Boulder) and I wrote a new textbook entitled Urban Analytics.
So-called “big data” is substantively different from the traditional data produced by government entities like the Census Bureau and Bureau of Labor Statistics. Two areas where social scientists need to be mindful are representativeness and quality. First, big data is not collected by statisticians in a systematic way with the goal of accurately representing the population. The data is collected by for-profit firms trying to sell more products to individuals. There is nothing wrong with the motivations of Silicon Valley firms, but their data collection approach simply represents their users and not the population at large. Second, big data tends to come to us in its raw form, warts and all. It mirrors in some respects the initial raw data government agencies collect directly from individuals and firms. The difference is that government agencies have statistical models and follow-up approaches to deal with unanswered questions or illogical responses (e.g., a married three-year-old or a 67-year-old enrolled in elementary school).
These two factors mean that our statistical training needs to adapt to the new data environment. The emergence of big data indirectly shines a spotlight on the important work done by governmental statistical agencies. By focusing on collection, cleaning and aggregation of raw data, these agencies allow public policy analysts with basic statistical training to simply use the government data with a high degree of confidence. In contrast, big data pushes those data management and statistical tasks down to the analyst. The data providers in effect are saying, “here ya go, see if you can find a use for this big mess of stuff.” This means that big data requires more advanced statistical and computational skills to handle the large messy datasets. To be clear, with this additional burden comes additional public policy opportunities.
From this milieu has emerged the “smart city.” A smart city leverages detailed data collection to better understand the city and improve services. The smart city draws on a wide array of sensors; for example, GPS devices on buses provide real-time location information, and CCTV cameras help officials understand pedestrian flows and criminal activity. Residents and visitors can also be “sensors” through smartphone apps such as Tallahassee’s DigiTally, which allows users to easily report graffiti, street light outages, etc. The list of innovative apps ranges widely; examples include odor mapping in Pittsburgh and pothole detection in Boston. Ideally, the smart city provides the collected data back to the community in the form of “dashboards” that summarize the information (for example, an overview of Los Angeles statistics and detailed data on Chicago building permits) and makes the raw data available for download by researchers. These efforts support an “open government” that is more accountable to its citizens and encourages researchers outside government to contribute back to their communities by studying this data.
I see data-driven public policy as critical to successful urban governance. We wrote Urban Analytics to help train the next generation of urban researchers who want to use and contribute to this new data-rich environment. Declining computer costs and (relatively) user-friendly free open source statistical software have opened the door to analysts outside the traditional disciplines of computer science and statistics. This is why we structured the textbook for social scientists. While most textbooks focus on either methods or policy, we strove to meld the two into one book as an introduction for urban data scientists. I am optimistic that this approach to teaching will prove attractive to other instructors and beneficial to students. From a broader perspective, I expect the demand for data-science-trained social scientists will only grow for the foreseeable future.
David C. Folch is an Assistant Professor in the Department of Geography.