
O10.2: Zecevic, Petar
Petar Zecevic (University of Washington DIRAC Institute, University of Zagreb)
Colin T. Slater (University of Washington)
Mario Juric (University of Washington)
Sven Loncaric (University of Zagreb)



Time: Tue 09.45 - 10.00
Theme: Databases and Archives: Challenges and Solutions in the Big Data Era
Title: AXS: Making end-user petascale analyses possible, scalable, and usable

We introduce AXS (Astronomy eXtensions for Spark), a scalable open-source astronomical data analysis framework built on Apache Spark, a state-of-the-art, industry-standard engine for big data processing. In an age when the most challenging questions of the day demand repeated, complex processing of large, information-rich tabular datasets, scalable and stable tools that are easy for domain practitioners to use are crucial. Building on capabilities present in Spark, AXS enables querying and analyzing almost arbitrarily large astronomical catalogs using familiar Python/AstroPy concepts, DataFrame APIs, and SQL statements. AXS supports complex analysis workflows with astronomy-specific operations such as spatial selection and online cross-matching. Special attention has been given to usability, from conda packaging to ready-to-use cloud deployments. AXS is regularly used within the University of Washington's DIRAC Institute, enabling the analysis of ZTF (Zwicky Transient Facility) and other datasets. As an example, AXS is able to cross-match Gaia DR2 (1.8 billion rows) and SDSS (800 million rows) in 2 minutes, with the data of interest (photometry) being passed to Python routines for further processing. Here, we will present current AXS capabilities, give an overview of future plans, and discuss some implications for the analysis of LSST and similarly sized datasets. The long-term goal of AXS is to enable petascale catalog and stream analyses by individual researchers and groups.
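
For readers unfamiliar with the term, positional cross-matching pairs sources from two catalogs whose sky coordinates agree to within a small angular radius. The sketch below illustrates the idea in plain Python on toy data. It is not the AXS API and says nothing about how AXS implements the operation on Spark; the function names, the tuple layout (id, RA, Dec in degrees), and the 1 arcsecond default radius are assumptions made for illustration, and the brute-force loop is only suitable for tiny inputs.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def crossmatch(left, right, radius_arcsec=1.0):
    """Pair each left source with its nearest right source within the radius.

    left, right: lists of (id, ra_deg, dec_deg) tuples (hypothetical layout).
    Returns a list of (left_id, right_id, separation_arcsec).
    Brute force, O(len(left) * len(right)): illustrative only.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for lid, lra, ldec in left:
        best = None
        for rid, rra, rdec in right:
            sep = angular_separation_deg(lra, ldec, rra, rdec)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (rid, sep)
        if best is not None:
            matches.append((lid, best[0], best[1] * 3600.0))
    return matches


# Toy catalogs (made-up coordinates, not real Gaia or SDSS rows).
catalog_a = [("a1", 10.0, 20.0), ("a2", 50.0, -5.0)]
catalog_b = [("b1", 10.0001, 20.0), ("b2", 120.0, 30.0)]
matches = crossmatch(catalog_a, catalog_b)
```

Here only `a1` and `b1` lie within the matching radius of each other, so `matches` holds a single pair. At the catalog sizes quoted above, an all-pairs comparison like this is infeasible, which is why a distributed framework is needed for the real task.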

Link to PDF (may not be available yet): O10-2.pdf