Computing Community Consortium Blog

The goal of the Computing Community Consortium (CCC) is to catalyze the computing research community to debate longer range, more audacious research challenges; to build consensus around research visions; to evolve the most promising visions toward clearly defined initiatives; and to work with the funding organizations to move challenges and visions toward funding initiatives. The purpose of this blog is to provide a more immediate, online mechanism for dissemination of visioning concepts and community discussion/debate about them.


“Digging Into Data Challenge” Winners Announced

January 4th, 2012 / in big science, research horizons, Research News / by Erwin Gianchandani

Eight international funding agencies award $4.8 million for humanities and social sciences research [image courtesy Digging Into Data Challenge via NSF].

Last March, we noted that the National Science Foundation (NSF), together with seven other international funders, was launching the second round of an international grant competition designed to spur cutting-edge research in the humanities and social sciences. Called Digging Into Data, the challenge specifically sought to promote large-scale, international and interdisciplinary analysis of large data sets in these fields. Yesterday, 14 winners representing the U.S., Canada, the Netherlands, and the U.K. were announced, and together they will receive nearly $5 million in grants “to investigate how data processing, analysis, and transmission techniques can be applied to ‘big data’ to change the nature of humanities and social sciences research.”

According to the NSF press release, the winning projects span a variety of topics, including the use of information retrieval techniques to investigate changes in Western music; high-resolution medical imaging to study Egyptian mummies; data mining to shed light on the impact of economic opportunity and spatial mobility on social structure; and natural language processing to analyze large bodies of textual materials in the study of human rights abuses.

Take a look at all the projects and their PIs below:

Cascades, Islands, or Streams? Time, Topic, and Scholarly Activities in Humanities and Social Science Research

 

(Principal Investigators: Cassidy R. Sugimoto, Ying Ding, Staša Milojević, Indiana University, Bloomington, NSF; Mike Thelwall, University of Wolverhampton, AHRC/ESRC/JISC; Vincent Larivière, Université de Montréal, SSHRC.)

 

This project will examine topic lifecycles across heterogeneous corpora, including not only scholarly and scientific literature, but also social networks, blogs and other materials. While the growth of large-scale datasets has enabled examination within scientific datasets, there is little research that looks across datasets. The team will analyze the importance of various scholarly activities for creating, sustaining and propelling new knowledge; compare and triangulate the results of topic analysis methods; and develop transparent and accessible tools. This work should identify which scholarly activities are indicative of emerging areas and identify datasets that should no longer be marginalized, but built into understandings and measurements of scholarship.

 

ChartEx

 

(Principal Investigators: Robert C. Stacey, University of Washington, IMLS; Arno Knobbe, Leiden University, NWO; Sarah Rees Jones, University of York, AHRC/ESRC/JISC; Michael Gervers, University of Toronto, SSHRC. Additional participating institutions: University of Brighton, Columbia University.)

 

This project will develop new ways of exploring the full-text content of digital historical records. The project will demonstrate its approach using medieval charters, which survive in abundance from the 12th to the 16th centuries and are one of the richest sources for studying the lives of people in the past.

 

Digging Into Connected Repositories (DiggiCORE)

 

(Principal Investigators: Andreas Juffinger, The European Library Office, NWO; Zdenek Zdrahal, The Open University, AHRC/ESRC/JISC.)

 

This project will analyze a vast set of Open Access research publications using Natural Language Processing and social network analysis methods to identify patterns in the behavior of research communities, to recognize trends in research disciplines, to gain new insights into the citation behaviors of researchers and to discover features that distinguish papers with high impact. This will enable the development of better methods for exploratory search and browsing in digital collections, as well as new ways of evaluating the impact of research or of a researcher.

 

Digging by Debating

 

(Principal Investigators: Colin Allen and Katy Börner, Indiana University, Bloomington, NEH; Andrew Ravenscroft, University of East London, Chris Reed, University of Dundee, and David Bourget, University of London, AHRC/ESRC/JISC.)

 

A project to develop and implement a multi-scale workbench, called “InterDebates”, with the goal of digging into data provided by hundreds of thousands, eventually millions, of digitized books, bibliographic databases of journal articles and comprehensive reference works written by experts. The team’s hypotheses are: that detailed and identifiable arguments drive many aspects of research in the sciences and the humanities; that argumentative structures can be extracted from large datasets using a mixture of automated and social computing techniques; and, that the availability of such analyses will enable innovative interdisciplinary research, and may also play a role in supporting better-informed critical debates among students and the general public.

 

Digging Into Human Rights Violations: Anaphora Resolution and Emergent Witnesses

 

(Principal Investigators: Ben Miller, Georgia State University, NSF; Lu Xiao, University of Western Ontario, SSHRC. Additional participating institutions: University of North Florida.)

 

This project will develop an automated reader for large text archives of human rights abuses that will reconstruct stories from fragments scattered across a collection, and an interface for navigating those stories. By improving on anaphora resolution techniques in Natural Language Processing for the connection of pronouns to specific nouns, this system will help researchers and courts reveal witnesses and patterns contained in their own collections.

 

Digging Into Metadata: Enhancing Social Science and Humanities Research

 

(Principal Investigators: Mick Khoo, Drexel University, IMLS; Diana Massam, University of Manchester, AHRC/ESRC/JISC. Additional participating institutions: University of Glamorgan.)

 

The project will automatically generate new forms of metadata tags from existing metadata records and associated resources that will support discovery across multiple repositories. The project will utilize four repositories that vary in size, domain, metadata creation method and workflow, and quality. PERTAINS, a tool developed by one of the partner schools, will be used to analyze the metadata records in each repository and then to generate Dewey Decimal Classification-based tags. Clustering algorithms will be used to generate an index of similarity and match between resources in different repositories. After conducting a search, the user will retrieve a list of resources from the different collections that have been tagged in similar ways. Visualization techniques will be used to display the results in ways that enhance the research process.

 

Electronic Locator of Vertical Interval Successions (ELVIS): The First Large Data-Driven Research Project on Musical Style

 

(Principal Investigators: Michael Scott Cuthbert, Massachusetts Institute of Technology, NEH; Frauke Jürgensen, University of Aberdeen, AHRC/ESRC/JISC; Julie E. Cumming, McGill University, SSHRC. Additional participating institutions: Yale University.)

 

A project to study changes in Western musical style from 1300 to 1900, using the digitized collections of several large music repositories. The team notes that in order to understand style change in Western polyphonic music we need to be able to describe acceptable vertical sonorities (chords) and melodic motions in each period, and how they change over time. The project aims to do this for European polyphony from 1300 to 1900, using advanced music information retrieval techniques to study highly contrasting kinds of music that are nevertheless unified by common concepts of tonality, consonance vs. dissonance, and voice leading.

 

An Epidemiology of Information: Data Mining the 1918 Influenza Pandemic

 

(Principal Investigators: Edward T. Ewing, Bernice L. Hausman, Bruce Pencek, and Narendran Ramakrishnan, Virginia Polytechnic Institute & State University, NEH; Gunther Eysenbach, University of Toronto, SSHRC.)

 

This project seeks to combine the power of data mining techniques with the interpretive analytics of the humanities and social sciences to understand how newspapers shaped public opinion and represented authoritative knowledge during the deadly 1918 influenza pandemic. This project makes use of the more than 100 newspaper titles for 1918 available from Chronicling America at the United States Library of Congress and the Peel’s Prairie Provinces collection at the University of Alberta Library. The application of algorithmic techniques enables the domain expert to systematically explore a broad repository of data and identify qualitative features of the pandemic at the small scale as well as the genealogy of information flow at the large scale. This research can provide methods for understanding the spread of information and the flow of disease in other societies facing the threat of pandemics.

 

Imagery Lenses for Visualizing Text Corpora

 

(Principal Investigators: Katharine Coles, University of Utah, NEH; Min Chen, University of Oxford, AHRC/ESRC/JISC.)

 

A project to explore new visualization techniques for use with large-scale linguistic and literary corpora, using the collections of the British National Corpus and various smaller archives of poetry. The team will investigate whether or not advanced visualization techniques can provide an interface that enables humanities researchers to use their domain knowledge dynamically, while using the computational capability of computers. In particular, can data visualization help users make new observations and generate new hypotheses? The aim of this project is to answer this methodological research question, and to create a set of new visualization tools for future scholarly research.

 

IMPACT Radiological Mummy Database

 

(Principal Investigators: Randall Thompson, Saint Luke’s Mid America Heart Institute, NEH; Andrew Nelson, University of Western Ontario, SSHRC. Additional participating institutions: Al Azhar Medical School, Cairo, Quinnipiac University, Canadian Museum of Civilization, University of Southern California, University of California, San Diego, Mount Sinai School of Medicine, South Coast Radiological Medical Group, Newport Diagnostic Center, University of California, Irvine, Wisconsin Heart Hospital.)

 

This project is designed to provide mummy and medical researchers with a large-scale comparative database of medical imaging of mummified human remains. This departure from a case-study model for mummy studies will drive the field towards a large-scale comparative and epidemiological paradigm. The Canadian team will be investigating the evisceration and excerebration components of the Egyptian mummification tradition, and the US teams will apply the database to a greatly expanded study of atherosclerosis in ancient Egyptian mummies, as part of the IMPACT Ancient Health Research Group, and to the refinement of a novel system of diagnosis by consensus for mummified remains.

 

Integrated Social History Environment for Research (ISHER) – Digging Into Social Unrest

 

(Principal Investigators: Dan Roth, University of Illinois, Urbana-Champaign, NSF; Antal van den Bosch, Tilburg University, NWO; Sophia Ananiadou, The University of Manchester, AHRC/ESRC/JISC. Additional participating institutions: International Institute of Social History.)

 

This project will develop an integrated environment using sophisticated text mining tools to facilitate knowledge discovery in social history research. It will provide social historians and social scientists with the means to detect and associate events, trends, people, organizations and other entities of specific interest to social historians.

 

Integrating Data Mining and Data Management Technologies for Scholarly Inquiry

 

(Principal Investigators: Ray R. Larson, University of California, Berkeley and Richard Marciano, University of North Carolina at Chapel Hill, IMLS; Paul B. Watry, University of Liverpool, AHRC/ESRC/JISC. Additional participating institutions: Internet Archive, JSTOR.)

 

This project will integrate large-scale collections, including JSTOR and the books collections of the Internet Archive, stored and managed in a distributed preservation environment. It will also incorporate text mining and Natural Language Processing software capable of generating dynamic links to related resources discussing the same persons, places, and events. The 17-month project will go beyond basic analysis by delivering a prototype system that offers expert-system support to scholars in their work.

 

Mining Microdata: Economic Opportunity and Spatial Mobility in Britain, Canada and the United States, 1850-1911

 

(Principal Investigators: Evan Roberts, University of Minnesota, NSF; Kevin Schürer, University of Leicester, AHRC/ESRC/JISC; Kris E. Inwood, University of Guelph, SSHRC. Additional participating institutions: University of Alberta, Université de Montréal, University of Essex.)

 

This project will make use of novel data-mining technology to exploit one of the largest population databases in the world, a vast collection of harmonized 19th and early 20th century census microdata from Britain, Canada, and the United States originally digitized for genealogical research. The goal is to shed light on the impact of economic opportunity and spatial mobility on social structure in Europe and North America.

 

Trading Consequences

 

(Principal Investigators: Ewan Klein, University of Edinburgh, AHRC/ESRC/JISC; Colin M. Coates, York University, SSHRC. Additional participating institutions: University of St Andrews.)

 

This project will examine the economic and environmental consequences of commodity trading during the nineteenth century. The project team will be using information extraction techniques to study large corpora of digitized documents from the nineteenth century. This innovative digital resource will allow historians to discover novel patterns and to explore new hypotheses, both through structured query and through a variety of visualization tools.

In addition to NSF, the funders included the Arts & Humanities Research Council (AHRC), United Kingdom; the Economic & Social Research Council (ESRC), United Kingdom; the Institute of Museum and Library Services (IMLS), Washington, DC; the Joint Information Systems Committee (JISC), United Kingdom; the National Endowment for the Humanities (NEH), Washington, DC; the Netherlands Organization for Scientific Research (NWO); and the Social Sciences and Humanities Research Council (SSHRC), Canada.

To learn more about the Digging Into Data Challenge, visit the competition’s official website.

Interestingly, the Digging Into Data Challenge dovetails nicely with key themes in the Mosaic Report that NSF published in early December, which emphasizes “an interdisciplinary, data-intensive, and collaborative vision for the future of SBE [social, behavioral, and economic sciences] research” that necessitates new partnerships and synergies between social scientists and computer scientists.

(Contributed by Erwin Gianchandani, CCC Director)
