LSST Data in the Clouds

The Large Synoptic Survey Telescope (LSST) is an upcoming 10-year sky survey from which we hope to answer questions about dark matter, dark energy, hazardous asteroids, and the formation and structure of the Milky Way. To find these answers, LSST will image the entire night sky every three nights. It is estimated that over its 10 years of operations LSST will deliver 500 petabytes (PB) of data, the largest astronomical dataset released to date.
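To give a sense of scale, the figures above can be turned into average rates. This is only a back-of-envelope sketch: it assumes the 500 PB is spread evenly over the 10 years and uses decimal units (1 PB = 1000 TB). Neither assumption comes from LSST itself, and the function name is purely illustrative.

```python
# Back-of-envelope averages from the figures quoted in the text.
TOTAL_PB = 500          # total data delivered over the survey (from the text)
SURVEY_YEARS = 10       # length of operations (from the text)
DAYS_PER_YEAR = 365.25  # assumption: mean calendar year
TB_PER_PB = 1000        # assumption: decimal (SI) units


def average_rates(total_pb=TOTAL_PB, years=SURVEY_YEARS):
    """Return (PB per year, TB per day), assuming uniform delivery."""
    pb_per_year = total_pb / years
    tb_per_day = pb_per_year * TB_PER_PB / DAYS_PER_YEAR
    return pb_per_year, tb_per_day


pb_per_year, tb_per_day = average_rates()
print(f"{pb_per_year:.0f} PB/year, about {tb_per_day:.0f} TB/day on average")
# prints "50 PB/year, about 137 TB/day on average"
```

In practice the data will not arrive at a constant rate, so these numbers are only an order-of-magnitude guide.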

Science catalogs, on which most of the science will be performed, are produced by image reduction pipelines that are part of LSST’s code base, called the Science Pipelines. The Science Pipelines adopt a set of image processing algorithms and metrics that cover as many science goals as possible, and LSST will set aside 10% of its compute power to be shared by collaboration members. Even so, enabling scientists to process the underlying pixel data themselves remains a very challenging problem. The largest obstacle to widespread data processing is the sheer volume of data that LSST will produce, which requires a large compute infrastructure. If pixel data processing and reprocessing were accessible to more astronomers, it would improve repeatability and reproducibility and would, in general, increase the type and quantity of science that can be done with the data.

The tech industry, which in many cases handles data volumes significantly larger than LSST’s, has adopted cloud-based solutions because they can scale up and down depending on the size and complexity of the data. LSST Data Management (DM) commissioned an Amazon Web Services (AWS) Proof of Concept (PoC) group to determine whether a cloud deployment of the LSST codebase is feasible, to measure its performance, and to determine the cost of cloud-native options.

The first results of this work were presented at the Petabytes to Science conference in Boston, where Dino Bektesevic and colleagues from LSST and Amazon demonstrated how the LSST Science Pipelines can be run on Amazon’s cloud, scaling up to thousands of compute cores. The preliminary tests indicate that the cloud has the potential for significant scaling while remaining affordable.
