Eric Talley Spearheads the Use of Machine Learning to Study Law

Columbia Law School professor and Data Science Institute (DSI) member Eric Talley was the first to apply machine learning to the study of constitutional discourse. The research found that, rather than transcending political divides, constitutional arguments in the U.S. Congress tend to magnify the divide between Republicans and Democrats.

Talley published the findings in the Cornell Law Review in an article titled “A Computational Analysis of Constitutional Polarization.” He and co-author David Pozen, another Columbia Law professor, used machine learning to mine a dataset of constitutional remarks made in the U.S. House and Senate across 143 years, from 1873 to 2016. They also studied more recent newspaper editorials on constitutional questions.

“We wanted to see if congressional rhetoric that references the Constitution was more or less polarized than other forms of speech,” Talley said. “Scholars have tended to view the Constitution as a mediating influence that restrains discord and tempers partisan passions. But we discovered the opposite. Since around 1980, constitutional discussion has outdistanced other forms of congressional speech in terms of polarization.”

In particular, the authors found that constitutional discourse over the past four decades has grown increasingly polarized, with conservative-leaning speakers driving the widening gap. Moreover, members of Congress whose party does not control the presidency or their own chamber are more likely to invoke the Constitution. According to Talley, the pair also found that contemporary conservative legislators have developed a constitutional vocabulary through which they “own” terms conventionally associated with the Constitution.

To conduct their research, Talley and Pozen applied a range of machine learning and text analysis techniques to a newly available dataset comprising all remarks made on the floors of the U.S. House and Senate during that period. They first identified hundreds of thousands of constitutional remarks, then trained a machine learning algorithm to predict, based solely on the semantic content of each remark, whether a conservative or a liberal member was speaking.

They used the algorithm’s predictive accuracy as a proxy measure of polarization. If the algorithm could not easily distinguish Republicans from Democrats, the parties were apt to use constitutional rhetoric in similar or overlapping ways. Conversely, if the algorithm predicted party affiliation with high accuracy, the parties were largely talking past each other.

Over the past four decades, they found, it has become fairly easy for an algorithm to predict whether a constitutional remark was uttered by a Republican or a Democrat. Members of each party have come to follow a kind of constitutional script, the authors said. The details change depending on the issue, but the broad themes and ideological fault lines remain the same. Instead of transcending political divides, arguments framed in constitutional terms tend to mirror or reinforce them.
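The article does not say which algorithms the authors used, so the following is only a rough illustration of the general idea of using classifier accuracy as a polarization proxy. It swaps in a simple multinomial naive Bayes classifier written in plain Python: the model is trained on remarks labeled by party, and its accuracy on held-out remarks serves as the proxy. The function names (`train`, `predict`, `polarization_proxy`) and the toy remarks are hypothetical, invented for this sketch, and are not taken from the study or its data.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real study would do far more preprocessing.
    return text.lower().split()


def train(examples):
    """Fit a multinomial naive Bayes model (add-one smoothing) on (text, party) pairs."""
    class_counts = Counter(party for _, party in examples)
    word_counts = {party: Counter() for party in class_counts}
    for text, party in examples:
        word_counts[party].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the party with the highest log-posterior.

    Ties go to the alphabetically first party, because parties are visited
    in sorted order and only a strictly higher score replaces the leader.
    """
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_party, best_score = None, -math.inf
    for party in sorted(class_counts):
        n_words = sum(word_counts[party].values())
        score = math.log(class_counts[party] / n_docs)  # log prior
        for word in tokenize(text):
            # Smoothed log-likelihood of each word given the party.
            score += math.log((word_counts[party][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_party, best_score = party, score
    return best_party


def polarization_proxy(train_set, test_set):
    """Held-out accuracy of predicting party from text alone.

    On balanced data, about 0.5 means the parties' language overlaps;
    close to 1.0 means the parties are easy to tell apart.
    """
    model = train(train_set)
    correct = sum(predict(model, text) == party for text, party in test_set)
    return correct / len(test_set)


# Hypothetical toy remarks, invented for illustration only.
separated_train = [
    ("tenth amendment states rights", "Republican"),
    ("states rights liberty", "Republican"),
    ("equal protection voting access", "Democrat"),
    ("voting access commerce", "Democrat"),
]
separated_test = [
    ("liberty tenth amendment", "Republican"),
    ("commerce equal protection", "Democrat"),
]

# Both parties use identical language here.
overlap_train = [
    ("constitution protects liberty", "Republican"),
    ("due process of law", "Republican"),
    ("constitution protects liberty", "Democrat"),
    ("due process of law", "Democrat"),
]
overlap_test = [
    ("constitution protects liberty", "Republican"),
    ("due process of law", "Democrat"),
]

print(polarization_proxy(separated_train, separated_test))  # prints 1.0
print(polarization_proxy(overlap_train, overlap_test))      # prints 0.5
```

With balanced classes, an accuracy of 0.5 means the model does no better than guessing, which corresponds to the overlapping case, while an accuracy near 1.0 means the parties’ vocabularies barely overlap. A real analysis would need far more data, careful preprocessing, and controls for topic, all of which this sketch omits.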

This is the sixth published paper in which Talley has used machine learning to inform his research. Lawyers produce reams of unstructured data, he said, and “machine learning is a Rosetta Stone for identifying patterns and making predictions about legal outcomes.”

Talley became interested in data science by way of economics. Along with a law degree, he holds a doctorate in economics from Stanford, where he studied statistics and statistical modeling, a forerunner of machine learning. Later, he taught himself techniques such as machine learning and natural language processing.

With six law professors using computational methods in their research, Columbia Law has become a national leader in using data science to inform its research and teaching. Talley credits DSI with helping the law school, and the broader University community, conduct more data-intensive research.

“At the law school, Dave Pozen, Joshua Mitts, Justin McCrary, Talia Gillis, Ben Liebman, and I all use machine learning in our research,” Talley said. “Most law schools would be lucky to have one or two professors who use machine learning and other data techniques.”

Talley is an active member of DSI’s Center for Cybersecurity and is part of a team that received a DSI Seed Fund grant to study the engineering and legal aspects of self-driving cars. He belongs to a working group on computational social science and is forming a working group on data science education. He co-teaches a machine learning course at Columbia Law with Tamar Mitts, who is also a DSI member.

Talley said DSI provides a valuable service to the University by encouraging researchers to collaborate across Columbia. They come from different disciplines but are united by the need to use advanced data science techniques to solve pressing research problems.

“Over the last century, most professors became more and more specialized, with a tendency to remain siloed from other scholars,” Talley said. “The Data Science Institute is helping to change that by drawing Columbia researchers to truly work together on interdisciplinary research. Data science is becoming the new lingua franca of the academy, and I’m delighted to be a member of DSI.”

— Robert Florida, Data Science Institute