‘Habanero’ Spices Up High-Performance Computing at Columbia

Deep in the subbasement of the Jerome L. Greene Science Center, the first building on Columbia’s new Manhattanville campus, there is enough computing power to perform 269 trillion mathematical calculations per second.

The high-performance computing cluster, nicknamed Habanero (like the chili pepper), is the latest system to help Columbia researchers in any discipline solve increasingly complex problems.

“Computation has become basically a part of everyone’s research. It doesn’t matter if you’re doing work in a biology lab or analyzing records for history research; you’re collecting data,” said Chris Marianetti, chair of the faculty committee that oversees Habanero and other shared research computing and an associate professor in the departments of materials science, applied physics and applied mathematics. “Today, you need an on-campus resource for the rapid development of data-heavy research.”

The Mortimer B. Zuckerman Mind Brain Behavior Institute is the newest addition to the more than 30 research groups and departments that have bought nodes, or shares, in the computer cluster.

The idea for a shared computing cluster originated in 2007, when researchers in several disciplines needed a more powerful machine than those available at Columbia. The Office of Research Initiatives proposed a shared computing system for which researchers would pool resources to meet the considerable costs.

“It’s a very social arrangement — quite unusual and fabulously effective,” said Michael Purdy, the University’s executive vice president for research.

The first cluster was installed on the Morningside Heights campus in 2009, in the basement of Uris Hall. The astronomy and statistics departments shared it with the Stockwell Laboratory, whose principal investigator is Brent Stockwell, a professor of biological sciences and chemistry, who researches cell processes.

At the time, nobody knew how widespread data-intensive research would become. “I wish I could say I was prescient,” Victoria Hamilton, who directs the Office of Research Initiatives, said. “Now we have faculty recruits coming in and asking us, ‘So tell me about your high-performance computer.’”

In 2013, a speedier cluster, called Yeti, was phased in. Yeti’s 24 users included the Columbia School of Journalism, the department of psychology and the Social Science Computing Committee.

Habanero, which went live in November, offers double the processing power of its predecessor, with 5,528 processing cores, and it stores more than 400 terabytes of data. A standard MacBook Air, by contrast, has two processing cores and stores 500 gigabytes.

The cluster is governed by faculty through the cross-disciplinary Shared Research Computing Policy Advisory Committee, which “oversees everything, from the hardware procurement process to the policy on cluster-sharing,” said Gaspare LoDuca, vice president and chief information officer for Columbia University Information Technology (CUIT). “This ensures we’re providing a resource that will benefit a broad spectrum of the Columbia research community.”

There have been other recent enhancements to research computing, such as establishing a relationship with Amazon for access to the cloud and producing a handbook for faculty and students who use the computing cluster for educational purposes. Marianetti, the committee chair since July, succeeded Kathryn Johnston, professor and chair of the department of astronomy.

Habanero allows researchers like Tian Zheng, associate professor of statistics, and her students to analyze data sets quickly on a local resource. This helps them be confident in their numbers before approaching research partners like the National Science Foundation.

“It encourages researchers to be more creative and it accelerates the work,” said Zheng, whose work relies on running simulations with data sets many times over. Sharing is also efficient, she added. “Nobody uses a server 24 hours a day, seven days a week.”

Students and faculty unaffiliated with Habanero can use it for classes and research through a free educational tier.

Such a powerful computer cluster is a strong selling point for attracting new faculty. For a few thousand dollars from their research budgets, junior faculty have access to state-of-the-art computing without worrying about technological problems, because staff from CUIT perform the upkeep.

“It’s very difficult to get grants these days; everyone is looking for an edge,” said Purdy. “If one of our professors can say in their proposal that they can run a model in 43 minutes, while for someone at another university it would take several days, who is going to get funded? We are.”

—Acacia O’Connor, Columbia News
