Computing Community Consortium Blog

The goal of the Computing Community Consortium (CCC) is to catalyze the computing research community to debate longer-range, more audacious research challenges; to build consensus around research visions; to evolve the most promising visions toward clearly defined initiatives; and to work with funding organizations to move challenges and visions toward funding initiatives. The purpose of this blog is to provide a more immediate, online mechanism for disseminating visioning concepts and for community discussion and debate about them.


LSST Science Requirements

June 17th, 2008 / in big science / by Peter Lee

NSF maintains an account for Major Research Equipment and Facilities Construction (MREFC) to support the development of very large research instruments. Typically, the goal of these instruments, which may cost hundreds of millions of dollars to build and tens of millions of dollars annually to operate, is to answer some of the most fundamental questions in science today.

For example, LIGO (Laser Interferometer Gravitational-wave Observatory) is designed to detect ripples in space-time caused by changes in very large masses (e.g., a star exploding). Such observations, if made successfully, would finally confirm Einstein’s prediction of the existence of gravitational waves. LIGO has a construction cost of about $300M and annual operating costs of more than $30M. Justifying the construction of such an instrument requires an exceptionally compelling science case and a disciplined approach to construction management. Just as important is the need for the scientific research community (astrophysics, in the case of LIGO) to show deep understanding and broad support for the investment.

Is the computing research community in need of such large-scale instrumentation? For the CCC, this question has been a major topic of discussion, instigated initially by the development of the GENI (Global Environment for Network Innovations) concept. The current concept of GENI involves a global experimental network that would support Internet-scale experimentation with new transport technologies, networking protocols, and security mechanisms. GENI, if successful, would not only answer fundamental scientific questions about the behavior of global-scale networks, but also provide design guidance for the future Internet.

Time will tell whether GENI or other computing research concepts will develop into viable MREFC candidate projects. But in the meantime, it is instructive to study how other research communities develop the broad support needed to make a case to the NSF and the National Science Board.

One of the cornerstones of the whole process is a document that lays out the science case, or science requirements, for the instrument. It is instructive, then, to take some time to study such documents. One of the most recent successful MREFC proposals is for the LSST (Large Synoptic Survey Telescope), a new telescope projected to have a construction cost between $250M and $350M and scheduled to become operational in 2014. In my view, it is well worth the time for anyone in the computing research community to study the LSST Science Requirements Document. It provides a window into the kind of audacious yet focused investigation that is used to justify such huge science investments. LSST also involves significant data management and computing problems that may be of strong relevance to computing research.

A brief excerpt from one of the founding papers on LSST explains the goal of the instrument as follows:

We describe the most ambitious survey currently planned in the visible band, the Large Synoptic Survey Telescope (LSST). The LSST design is driven by four main science themes: probing dark energy and dark matter, taking an inventory of the Solar System, exploring the transient optical sky, and mapping the Milky Way. LSST will be a large, wide-field ground-based system designed to obtain multiple images covering the sky that is visible from Cerro Pachon in Northern Chile. The current baseline design, with an 8.4m (6.5m effective) primary mirror, a 9.6 sq. deg. field of view, and a 3.2 Gigapixel camera, will allow about 10,000 sq.deg. of sky to be covered using pairs of 15-second exposures in two photometric bands every three nights on average, with typical 5-sigma depth for point sources of r=24.5. The system is designed to yield high image quality as well as superb astrometric and photometric accuracy. … These data will result in databases including 10 billion galaxies and a similar number of stars, and will serve the majority of science programs.

The document lays out the science case and from it derives the requirements on the instrument. The science case is built around four themes. The first theme, on dark matter and dark energy, directly addresses what the National Academies recently identified as one of the “most important scientific questions of our time.” Two other themes, to “explore the transient optical sky” and to “map the Milky Way”, speak to general facilities needs of the astronomy research community. And the fourth theme, on “taking an inventory of the Solar System”, addresses the practical problem of keeping track of asteroids that might “ultimately strike the Earth’s surface.”

The small number of themes and their clear, concise explanation (each is described in about a page of text) make it possible to derive a clear set of requirements for the telescope. Importantly, this also gives a clear basis for disseminating the concepts to the research community, thereby encouraging more informed debate and consensus-building.

In the LSST Science Requirements Document, the high-level requirements are given in terms of a series of design specifications, with both minimum and stretch goals in each case. The level of specificity, particularly in a rather short (about 30 pages) document, is impressive.

Interestingly, one area where the document falls short is the final section on “Data Processing and Management Requirements”. This is left essentially as a stub, deferring to a separate, yet-to-be-published document. The LSST is projected to produce about 100TB of image data per week, and the design requirement is for “snapshots” of the data to be fixed and published annually to support repeatability of experiments. Yet to be specified is how up-to-the-minute data will be disseminated, organized, and accessed. These are certainly issues that computer scientists will be interested in and are likely to be well equipped to answer. We should all look forward to contributing to this part of the LSST effort.
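
As a rough sanity check, the camera specifications quoted in the excerpt above are enough to see where a figure on the order of 100TB/week comes from. The short Python sketch below simply multiplies out those numbers; the bytes-per-pixel and visits-per-night values are my own assumptions for illustration, not figures from the LSST documents:

    # Back-of-envelope estimate of LSST's raw image data rate, using the
    # camera specs quoted in the excerpt above. The bytes-per-pixel and
    # visits-per-night values are assumptions for illustration only.

    PIXELS_PER_EXPOSURE = 3.2e9   # 3.2 Gigapixel camera (from the excerpt)
    BYTES_PER_PIXEL     = 2       # assumed: 16-bit raw pixels
    EXPOSURES_PER_VISIT = 2       # "pairs of 15-second exposures" (from the excerpt)
    VISITS_PER_NIGHT    = 1000    # assumed: ~10-hour night at ~40 s per visit

    bytes_per_night = (PIXELS_PER_EXPOSURE * BYTES_PER_PIXEL
                       * EXPOSURES_PER_VISIT * VISITS_PER_NIGHT)
    tb_per_night = bytes_per_night / 1e12
    tb_per_week  = 7 * tb_per_night

    print(f"~{tb_per_night:.0f} TB/night, ~{tb_per_week:.0f} TB/week of raw images")
    # -> ~13 TB/night, ~90 TB/week: the same order of magnitude as the
    #    ~100TB/week figure cited above

That the quoted instrument parameters determine the data-management burden this directly is itself a good illustration of how the science case drives the design requirements.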

So what does this all say about efforts such as GENI or other future computing-related instrumentation proposals? For one thing, it is probably important to have a set of crisply stated science questions. What is GENI’s analogue to “constraining dark energy and dark matter”? Writing the science case with a level of focus and simplicity that the entire computing research community can understand and accept is also crucial. And, finally, it must be possible to derive, fairly directly, at least a high-level set of design requirements from the statement of the science case.

Computing research is both important and wonderful because it combines fundamental science, hard-core engineering, and practically useful technology to an extent that is unique in academic research today. Whether we will find compelling needs for MREFC-scale instrumentation is still an open question, but I have no doubt that, if and when we do, a successful case can be made to fund it.

3 comments

  1. Ed Lazowska says:

    The database community is actively working with LSST scientists on a new approach to databases for science. Mike Stonebraker and David DeWitt are coordinating this effort from the database side; folks at SLAC (Jacek Becla) and others are coordinating from the LSST side. The Computing Community Consortium is likely to play a role.

    Information on the first workshop is here:

    http://www-conf.slac.stanford.edu/xldb07/

    Information on the second workshop is here:

    http://xldb.slac.stanford.edu/display/XLDB/xldb2