Using data to solve problems
The explosion in the availability of new sources of data and the emergence of new data science technologies for making use of such data are expected to have a significant impact on public institutions and on how they solve problems and make decisions. Whether the goal is improved outcomes and equity, reduced cost and inefficiency, more evidence about what works, or the identification of new operational solutions, the ability to use data is becoming essential to governing well in the 21st century.
“Solving Public Problems with Data” examines how data can be used to improve decision-making and problem solving in the public sector. Through real-world examples and case studies, we discuss the fundamental principles of data science to help foster a data-analytic mindset. The goal is to enable you to define and leverage the value of data to achieve your public mission. No prior experience with computer science or statistics is necessary or assumed.
“The world’s national, state and local governments don’t have the right digital skills in the right quantities to meet the challenges of the coming century,” declared mySociety founder Tom Steinberg in his Manifesto for Public Technology. With “Solving Public Problems with Data,” public entrepreneurs have a chance to make up lost ground. This and other data primers and lecture series can also spark a movement. We can lament the dearth of data scientists in government, or we can refocus our efforts on arming public entrepreneurs with the tools they need to build a data-driven public sector.
If you invest in things that do not work rather than those that do, real people’s lives may be affected in dramatic ways. But how do you know what works? Through real-world examples and contemporary debates (e.g., the expansion of health insurance, or crime and violence prevention), you will understand the basic principles of evidence-based decision making, how it relates to “counterfactual” reasoning, and frameworks for applying rigorous, scientifically based methods to solve problems in ways that are both effective and cost-effective.
Read:
- “Introduction to Evaluations” by JPAL – an overarching summary of “impact evaluations”: their history, how to randomize effectively in experiments, why randomization is important, and common concerns in study design.
- “Promoting Policies That Work: Six Steps for the Commission on Evidence-Based Policymaking” by Quentin Palfrey – a list of six concrete steps policymakers “can take to institutionalize the use of administrative data to support policy-relevant research and evidence-informed policymaking.”
- “Incentives for Immunization” by JPAL – a write-up of a study referenced in Quentin Palfrey’s lecture, on how using costly incentives in an evaluation may actually decrease the marginal cost of participation in a social program.
Watch: In “Social experiments to fight poverty,” Esther Duflo, Professor of Economics at MIT, explains in further detail why social experiments may help policymakers with poverty alleviation, and why experimental designs provide compelling and unforeseen insights where other study approaches may fail.
How-To: Use the Methods Guides, a set of online experimental-design resources written by Evidence in Governance and Politics (EGAP). These guides provide both technical and non-technical discussions of challenges commonly faced in causal inference, explain why randomization is important to experimental design and causal attribution, and cover cases in which non-experimental designs can still yield causal insights. Example code, written in the R programming language, is available online, along with example vignettes.
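The core logic of randomized evaluation can be sketched in a few lines of code. The EGAP guides work in R; the sketch below uses Python, and every number in it (sample size, outcomes, the assumed true effect of +2.0) is invented purely for illustration. The point is the counterfactual reasoning: random assignment makes the control group a valid stand-in for what would have happened to the treated group, so a simple difference in means estimates the average treatment effect.

```python
import random
import statistics

random.seed(42)

# Hypothetical data: 1,000 units with noisy baseline outcomes and an
# assumed true treatment effect of +2.0 (invented for illustration).
TRUE_EFFECT = 2.0
baseline = [random.gauss(10, 3) for _ in range(1000)]

# Randomize: each unit is assigned to treatment with probability 0.5.
assignment = [random.random() < 0.5 for _ in baseline]
outcomes = [y + TRUE_EFFECT if treated else y
            for y, treated in zip(baseline, assignment)]

# Because assignment is random, the control group is a valid
# counterfactual for the treated group, so the difference in means
# estimates the average treatment effect.
treated = [y for y, t in zip(outcomes, assignment) if t]
control = [y for y, t in zip(outcomes, assignment) if not t]
estimate = statistics.mean(treated) - statistics.mean(control)
print(f"estimated average treatment effect: {estimate:.2f}")
```

With a sample this size the estimate lands close to the assumed effect; with a non-random comparison group, no such guarantee exists.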
During this session you will learn about quantitative data—including so-called “big data”—and some of the statistical techniques researchers and policy officials use to derive value from it. The lecture emphasizes the broad structure of, and steps needed in, any data-related inquiry: define a research question, formulate a testable hypothesis, and think about what data are available, what data are missing, measurement error, and other applied concerns such as how to link datasets. At the end, we discuss some of the statistical paradigms that can be used to draw inferences, as well as key ideas for addressing privacy and confidentiality issues.
Read: Big Data and Social Science: A Practical Guide to Methods and Tools by Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, and Julia Lane

How-To: For in-depth training, check out the Applied Data Analytics Training Program, a first-of-its-kind program that gives working professionals the opportunity to develop the computer science and data science skills needed to harness the wealth of newly available data and to creatively address real civic problems.
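One applied concern mentioned above, linking datasets, can be made concrete with a toy example. The identifiers, field names, and values below are all hypothetical. The sketch joins two administrative datasets on a shared person identifier and surfaces the records that fail to link—a source of missing data that, if systematic, biases any downstream analysis.

```python
# Hypothetical administrative records (all names and values invented).
school_records = {
    "p001": {"attendance": 0.95},
    "p002": {"attendance": 0.80},
    "p003": {"attendance": 0.60},
}
health_records = {
    "p001": {"er_visits": 0},
    "p003": {"er_visits": 2},
    "p004": {"er_visits": 1},  # no matching school record
}

# Link the two datasets on the shared person identifier.
linked = {
    pid: {**school_records[pid], **health_records[pid]}
    for pid in school_records.keys() & health_records.keys()
}

# Records that fail to link become missing data: if the failures are
# systematic (e.g. the most mobile families), estimates will be biased.
unlinked = (school_records.keys() | health_records.keys()) - linked.keys()
print(sorted(linked))    # ['p001', 'p003']
print(sorted(unlinked))  # ['p002', 'p004']
```

Real linkage work is harder—identifiers are often noisy or absent, pushing analysts toward probabilistic matching—but the representativeness question raised by the unlinked records is the same.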
Often a business or organization needs to perform the same task over and over again and has a great deal of data at its disposal. In such settings, machine learning algorithms may be a useful way to solve problems that would be too burdensome for humans to handle by hand each time. The lecture discusses differences between prediction systems and recommendation tasks, supplemented by examples from industry that include e-commerce applications, language modeling, and image analysis. It also discusses the challenges machine learning algorithms face when the underlying data are “biased,” i.e., not representative of your target population or problem.
Read:
- A visual introduction to machine learning by R2D3 – machine learning explained through interactive visualizations.
- An executive’s guide to machine learning by McKinsey – how traditional industries are using machine learning to gather fresh business insights.
- Automation Beyond the Physical: AI in the Public Sector by Government Technology – 26 ways artificial intelligence is helping, or could help, government do its job.
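The data-bias point can be demonstrated with a deliberately simple model. Everything here is invented for illustration: a screening task in which cases with a risk score above 5.0 truly need follow-up, and a naive learner that sets its decision threshold at the midpoint between class means. The same learning rule performs well on a representative training sample and noticeably worse when trained only on data from part of the target population.

```python
import random
import statistics

random.seed(1)

# Hypothetical screening task (all numbers invented): cases with a
# risk score above 5.0 truly need follow-up.
def label(score):
    return score > 5.0

def fit_threshold(sample):
    """Naive learner: threshold at the midpoint between class means."""
    pos = [x for x in sample if label(x)]
    neg = [x for x in sample if not label(x)]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2

def accuracy(threshold, data):
    return sum((x > threshold) == label(x) for x in data) / len(data)

population = [random.uniform(0, 10) for _ in range(5000)]

# A representative training sample vs. a biased one that only observed
# cases scoring below 5.5 (e.g. data from a single service site).
representative = random.sample(population, 500)
biased = [x for x in population if x < 5.5][:500]

test_set = [random.uniform(0, 10) for _ in range(5000)]
acc_repr = accuracy(fit_threshold(representative), test_set)
acc_biased = accuracy(fit_threshold(biased), test_set)
print(f"representative: {acc_repr:.2f}, biased: {acc_biased:.2f}")
```

The biased sample barely contains the high-scoring cases, so the learned threshold drifts far below 5.0 and the model over-flags moderate-risk cases—an algorithmic echo of who happened to be in the data.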
This session will focus on strategies for identifying, finding, and getting data: where to locate information; when and how you can collect your own data internally and externally (e.g., through FOIA requests or crowdsourcing); and techniques to verify and validate your data sources. We also discuss questions of data quality, and how to identify the skills you need access to in order to make use of the data.
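A first pass at verifying and validating a newly acquired dataset can be automated with a handful of sanity checks. The records, field names, and plausibility ranges below are hypothetical; the pattern—scan for duplicates, missing values, and out-of-range values before doing any analysis—is the general one.

```python
# Hypothetical records from a newly acquired dataset (all values invented).
records = [
    {"id": "a1", "year": 2015, "amount": 1200.0},
    {"id": "a2", "year": 2015, "amount": -50.0},   # negative amount
    {"id": "a3", "year": 1890, "amount": 300.0},   # implausible year
    {"id": "a3", "year": 2016, "amount": None},    # duplicate id, missing value
]

def validate(recs):
    """Return a list of (record id, problem) pairs found by basic checks."""
    problems = []
    seen = set()
    for r in recs:
        if r["id"] in seen:
            problems.append((r["id"], "duplicate id"))
        seen.add(r["id"])
        if r["amount"] is None:
            problems.append((r["id"], "missing amount"))
        elif r["amount"] < 0:
            problems.append((r["id"], "negative amount"))
        if not 1990 <= r["year"] <= 2025:
            problems.append((r["id"], "year out of expected range"))
    return problems

for rid, issue in validate(records):
    print(rid, issue)
```

Checks like these will not catch every error—a value can be plausible and still wrong—but they quickly reveal whether a source is clean enough to be worth the investment of deeper analysis.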