Conference Event Invited Contributed Focus Demo/Tutorial Lightning/BOF Break/Posters
All Invited and Contributed Talks are in Salon AB. Tutorials are in Salons A and EFG. BOFs are in Salons AB and G. Lunch is in Salon CDE.
12.00–13.00 | Tutorials Check-in |
13.00–15.00 | Sebastien Derriere All-sky Astronomy with HiPS and MOCs |
See Tutorials Page
13.00–15.00 | Michael Young Creating Astronomical Web Applications From Scratch: Introduction to modern full-stack MEAN development |
See Tutorials Page
15.00–15.30 | Break |
15.30–17.30 | Hendrik Heinl A comprehensive use case scenario of VO standards and protocols |
See Tutorials Page
15.30–17.30 | Ivelina Momcheva Working with the Hubble Space Telescope Public Data on Amazon Web Services |
See Tutorials Page
16.00–18.00 | Registration & Demo Booth/Poster Setup |
18.00–20.00 | Opening Reception |
08.00–08.30 | Registration & Demo Booth/Poster Setup |
07.00–08.30 | Buffet Breakfast |
08.30–08.40 | Opening Remarks |
08.40–09.05 | Amitabh Varshney Astronomy-inspired Visual Computing |
Astronomy has inspired mankind for generations to develop theories, tools, and technologies that have dramatically advanced our knowledge and practice of science. In this talk, I will give an overview of how the field of astronomy has helped advance the field of visual computing. My talk will span the advances in line integral convolutions, ambient occlusion, geometry-aware lighting, and much more. These techniques, with their origins and motivations in astronomy, are changing fields ranging from aircraft design and rational drug design to computer-generated games and movies.
09.05–09.35 | Brian Kent 3D Data Visualization in Astrophysics |
We present unique methods for rendering astronomical data - 3D galaxy catalogs, planetary maps, data cubes, and simulations. Using tools and languages including Blender, Python, and Google Spatial Media, a user can render their own science results, allowing for further analysis of their data phase space. We aim to put these tools and methods in the hands of students and researchers so that they can bring their own data visualizations to life on different computing platforms.
09.35–09.50 | Christopher Zapart An introduction to FITSWebQL |
The JVO ALMA WebQL web service - available through the JVO ALMA FITS archive - has been upgraded to include legacy data from other telescopes, for example Nobeyama NRO45M in Japan. The updated server software has been renamed FITSWebQL. In addition, a standalone desktop version supporting Linux, macOS and Windows 10 Linux Subsystem (Bash on Windows) is also available for download from http://jvo.nao.ac.jp/~chris/ . The FITSWebQL server enables viewing of even 100GB-large FITS files in a web browser running on a PC with a limited amount of RAM. Users can interactively zoom in to selected areas of interest with the corresponding frequency spectrum being calculated on the server in near real-time. The client (a browser) is a JavaScript application built on AJAX, WebSockets, HTML5, WebGL and SVG. There are many challenges when providing a web browser-based real-time FITS data cube preview service over high-latency low-bandwidth network connections. The upgraded version tries to overcome the latency issue by predicting user mouse movements with a Kalman Filter in order to speculatively deliver the real-time spectrum data for the point the user is likely to look at next. The new version also allows one to view multiple FITS files simultaneously in an RGB composite mode (NRO45M FUGIN only), where each dataset is assigned one RGB channel to form a colour image. Spectra from multiple FITS cubes are shown together too. The talk gives a brief tour of the FITSWebQL main features. We also touch on some of the recent developments, such as an experimental switch from C/C++ to Rust (see https://www.rust-lang.org/) for improved stability, better memory management and fearless concurrency, or attempts to display FITS data cubes in the form of interactive on-demand video streams in a web browser.
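The mouse-movement prediction mentioned in the abstract can be illustrated with a minimal sketch. It assumes a constant-velocity motion model with one independent filter per screen axis and a simplified diagonal process noise; the class name, function name, and noise values are hypothetical and are not taken from FITSWebQL.

```python
class ConstantVelocityKalman1D:
    """Tracks one screen coordinate with state [position, velocity].

    Hypothetical illustration only; not the FITSWebQL implementation.
    """

    def __init__(self, process_var=1.0, measurement_var=4.0):
        self.pos = 0.0
        self.vel = 0.0
        # Symmetric 2x2 covariance [[p00, p01], [p01, p11]], large at start
        # because nothing is known about the cursor yet.
        self.p00, self.p01, self.p11 = 1e3, 0.0, 1e3
        self.q = process_var        # assumed process noise (added to the diagonal)
        self.r = measurement_var    # assumed measurement noise

    def predict(self, dt):
        # State transition F = [[1, dt], [0, 1]]; covariance P = F P F^T + Q.
        self.pos += self.vel * dt
        p00 = self.p00 + dt * (2.0 * self.p01 + dt * self.p11) + self.q
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p11 = p00, p01, p11

    def update(self, measured_pos):
        # Only the position is observed: H = [1, 0].
        innovation = measured_pos - self.pos
        s = self.p00 + self.r
        k0, k1 = self.p00 / s, self.p01 / s
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        # P = (I - K H) P, written out element by element.
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p11 = p00, p01, p11

    def lookahead(self, dt):
        # Extrapolate without modifying the filter state.
        return self.pos + self.vel * dt


def predict_cursor(samples, dt, lookahead):
    """Filter a list of (x, y) cursor samples and extrapolate ahead in time."""
    fx, fy = ConstantVelocityKalman1D(), ConstantVelocityKalman1D()
    for x, y in samples:
        fx.predict(dt)
        fx.update(x)
        fy.predict(dt)
        fy.update(y)
    return fx.lookahead(lookahead), fy.lookahead(lookahead)


if __name__ == "__main__":
    # Synthetic cursor track moving at constant velocity.
    track = [(3.0 * t, 100.0 - 2.0 * t) for t in range(20)]
    print(predict_cursor(track, dt=1.0, lookahead=1.0))
```

A server could use the extrapolated position to start computing the spectrum before the cursor arrives, at the cost of wasted work whenever the prediction is wrong.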
09.50–10.05 | Angus Comrie An HDF5 Schema for SKA Scale Image Cube Visualization |
In this paper, we describe work that has been performed to create an HDF5 schema to support the efficient visualization of image data cubes that will result from SKA Phase 1 and precursor observations. The schema has been developed in parallel to a prototype client-server visualization system, intended to serve as a testbed for ideas that will be implemented in replacements for the existing CyberSKA and CASA viewers. Most astronomy image files are currently packaged using the FITS standard; however, this has a number of shortcomings for very large images. The HDF5 technology suite provides a data model, file format, API, library, and tools, which are all open and distributed without charge. This enables structured schemas to be created for different applications. We will show how these can be beneficial to packaging radio astronomy (RA) data. In particular, our interest is in supporting fast interactive visualization of data cubes that will be produced by the SKA telescope. Existing HDF5 schemas developed for RA data were unable to meet our requirements. The LOFAR HDF5 schema did not meet performance requirements, due to the approach of storing each 2D image plane in a separate group. The HDFITS schema serves as a starting point for an HDF5 schema that maintains round-trip compatibility with the FITS format, but lacks the additional structures required for pre-calculated and cached datasets. Therefore, we have created a new schema designed to suit our application, though this may be advantageous for other processing and analysis applications. The schema is similar to that of HDFITS, but extensions have been added to support a number of features required for efficient visualization of large data sets. We will discuss these extensions and provide details on performance improvements with commonly used access patterns. We will also describe real-world performance when used with our prototype visualization system.
10.05–10.20 | Emanuel Ramirez Analysis of Astronomical Data using VR: the Gaia catalogue in 3D |
Since 2016, the ESAC Science Data Centre has been working on a number of Virtual Reality projects to visualise Gaia data in 3D. The Gaia mission is providing unprecedented astrometric measurements of more than 1 billion stars. Using these measurements, we can estimate the distance to these stars and therefore project their 3D positions in the Galaxy. A new application to analyse Gaia DR2 data will be publicly released for Virtual Reality devices during 2018. In this presentation we will give a demo of the latest version of the Oculus Rift application and will show specific use cases to analyse Gaia DR2 data as well as a demonstration of how Virtual Reality can be integrated into a data analysis workflow. We will also show how new input techniques such as hand-tracking can bring new levels of freedom in how we interact with data.
10.20–11.15 | Break/ Poster Session |
10.45–11.15 | Alexandar Mechev Building LOFAR As A Service: Processing Petabytes with just a click |
The LOFAR Radio Telescope produces PetaBytes of data each year. Processing such volumes is not possible at clusters provided by academic institutions, and thus needs to be launched and managed at a High Throughput Cluster. With increasing complexity of LOFAR workflows, building and maintaining new scientific workflows on a distributed architecture becomes prohibitively time consuming. To make pipeline development and deployment easy and data processing fast, we integrate cluster middleware and LOFAR software with a leading workflow orchestration software, Airflow. The result is a flexible application that can launch and manage LOFAR processing. With Airflow, we can easily create a service for LOFAR users to process their data transparently.
11.15–11.45 | James Bosch An Overview of the LSST Image Processing Pipelines |
In this talk, I'll walk through LSST's Data Release and Alert Production pipelines, highlighting how we produce various important datasets. I'll also call attention to the algorithms where LSST will need to push the state of the art or operate qualitatively differently from previous surveys.
11.45–12.00 | Amanda Kepley Auto-multithresh: A General Purpose Automated Masking Algorithm for Clean |
Generating images from radio interferometer data requires deconvolving the point spread function of the array from the initial image. This process is commonly done via the clean algorithm, which iteratively models the observed emission. Because this algorithm has many degrees of freedom, producing an optimal science image typically requires the scientist to manually mask regions of real emission while cleaning. This process is a major hurdle for the creation of the automated imaging pipelines necessary to process the high data rates produced by current and future interferometers like ALMA, the JVLA, and the ngVLA. In this talk, we present a general purpose masking algorithm called ‘auto-multithresh’ that automatically masks emission during the cleaning process. This algorithm was initially implemented within the tclean task in CASA 5.1. The tclean implementation received significant performance improvements in CASA 5.3. The ‘auto-multithresh’ algorithm is in production as part of the ALMA Cycle 5 and 6 imaging pipelines. It has also been shown to work with data from telescopes like the VLA and ATCA. We describe how this algorithm works, provide a variety of examples demonstrating the success of the algorithm, and discuss the performance of the algorithm. Finally, we close with some future directions for producing science ready data products that build on this algorithm.
12.00–12.15 | Daniele Tavagnacco Performance-related aspects in the Big Data Astronomy Era: architects in software optimization |
In the last decades the amount of data collected by Astronomical Instruments and the evolution of computational demands have grown exponentially. Today it is not possible to obtain scientific results without prodigious amounts of computation. For this reason, software performance plays a key role in modern Astronomy data analysis. Scientists tend to write code with the only goal of implementing the algorithm in order to achieve a solution. Code modifications to gain better performance always come later. However, as computing architectures evolve to match the performance that is demanded, the coding task has to encompass the exploitation of the architecture design, the single-processor performance and parallelization. To facilitate this task, programming languages are progressing and introducing new features to fully make use of the hardware architecture. Designing software that meets performance, memory efficiency, maintainability, and scalability requirements is a complex task that should be addressed by the software architect. The complexity stems from the existence of multiple alternative solutions for the same requirements, which make tradeoffs inevitable. In this contribution we will present part of the software performance optimization activity carried out at the Italian Science Data Center for ESA’s cosmological space mission Euclid. In particular, considering the programming languages selected for the development of the Euclid scientific pipelines, we will present some C++ and Python examples focusing on the main aspects of human contribution in the optimization of the code from the performance, memory efficiency and maintainability point of view.
12.15–12.30 | Lightning Talks |
12.30–14.00 | Buffet Lunch |
14.00–14.15 | Nadia Dencheva GWCS - A General Approach to Astronomical World Coordinates |
GWCS is a package for managing the World Coordinate System (WCS) of astronomical data. It takes a general approach to the problem of expressing arbitrary transformations by supporting a data model which includes the entire transformation pipeline from input coordinates (detector by default) to world coordinates (standard celestial coordinates or physical quantities). Transformations from the detector to a standard coordinate system are combined in a way which allows for easy manipulation of individual components. The framework handles discontinuous models (e.g. IFU data) and allows quantities that affect transforms to be treated as input coordinates (e.g. spectral order). It provides flexibility by allowing access to intermediate coordinate frames. The WCS object is serialized to a file using the language independent Advanced Scientific Data Format (ASDF). Alternatively, the ASDF object can be encapsulated in a FITS extension. The package is written in Python and is based on astropy. It is easy to extend by adding new models and coordinate systems.
14.15–14.30 | Cheuk Yin Lam Data-Driven Pixelisation with Voronoi Tessellation |
In modern Astrophysics, Voronoi Tessellation is rarely used as a pixelisation scheme. Where it is used, it is almost exclusively for signal enhancement and simulations. In Observational Astronomy, with Gaia, ZTF, DES etc. data becoming available and LSST and Euclid coming online in the next decade, this branch of science is becoming more and more data-driven. HEALPix, HTM and Q3C offer excellent ways to pixelise the celestial sphere, but their implementations completely separate the background information from the signal. There are excellent use cases for having them independent from each other, but there are also cases when this becomes a computational burden, because we have to process more pixels than necessary or perform post-hoc calculations to group pixels at different resolution levels into larger segments. Voronoi Tessellation can generate a one-to-one mapping of data points to Voronoi cells, where any position inside a cell is closer to its “governing” data point than to any other. We illustrate the application of Voronoi Tessellation to a set of magnitude- and proper-motion-limited data, and show how it can simplify the survey properties of the 3pi Steradian Survey from Pan-STARRS 1, where the footprint area is imaged by the 60 CCDs at ~10^5 pointings over 3.5 years.
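The defining property quoted above (every position in a cell is closest to that cell's governing data point) means cell membership can be tested with a nearest-neighbour lookup, without constructing the cell boundaries. The sketch below assumes flat 2D coordinates and a brute-force search; the function names are hypothetical, and real sky data would need angular distances on the sphere and a spatial index.

```python
def nearest_generator(point, generators):
    """Return the index of the generator (data point) closest to `point`.

    By definition of a Voronoi tessellation, this is the index of the
    cell that contains `point`. Uses squared Euclidean distance in a
    flat plane, which is an assumption made for this illustration.
    """
    px, py = point
    best_index, best_dist2 = -1, float("inf")
    for index, (gx, gy) in enumerate(generators):
        dist2 = (px - gx) ** 2 + (py - gy) ** 2
        if dist2 < best_dist2:
            best_index, best_dist2 = index, dist2
    return best_index


def assign_to_cells(points, generators):
    """Group query points by the Voronoi cell they fall in."""
    cells = {index: [] for index in range(len(generators))}
    for point in points:
        cells[nearest_generator(point, generators)].append(point)
    return cells


if __name__ == "__main__":
    # Three data points act as cell generators; the queries are arbitrary.
    stars = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    queries = [(1.0, 1.0), (9.0, 1.0), (2.0, 8.0)]
    print(assign_to_cells(queries, stars))
```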
14.30–15.00 | Raymond Plante The BagIt Packaging Standard for Interoperability and Preservation |
BagIt is a simple, self-describing format for packaging related data files together which is gaining traction across many research data systems and research fields. One of its great advantages is in how it allows a community to define and document a profile on that standard--that is, additional requirements on top of the BagIt standard that speak to the needs of that community. In this presentation, I will summarize the key features of the standard, highlight some important profiles that have been defined by communities, and talk about how this standard is being used as part of the NIST Public Data Repository. I will compare and contrast the use of BagIt for enabling interoperability (e.g. for transferring data between two systems) and its use for preservation. I will then give an overview of the NIST BagIt profile for preservation as well as introduce a general-purpose MultiBag profile which addresses issues of evolving data and scaling to large datasets.
15.00–16.00 | Break/ Poster Session |
15.30–16.00 | Michael Raddick SciServer: Collaborative data-driven science |
The SciServer team is pleased to announce its final production system of SciServer, offered free to the scientific community for collaborative scientific research and education. SciServer is an online environment for working with scientific big data, and specifically datasets hosted within our ecosystem. Researchers, educators, students, and citizen scientists can create a free account to get 10 GB of file storage space and access to a virtual machine computing environment. Users can run Python, R, or Matlab scripts in that environment through Jupyter notebooks. Scripts can be run either in Interactive mode, which displays results within the notebook, or in Batch mode, which writes results to the user’s personal database and/or filesystem. SciServer hosts a number of datasets from various science domains; within astronomy, it features all data releases of the Sloan Digital Sky Survey (SDSS), as well as other datasets from GALEX, Gaia, and other projects. The SciServer system also incorporates the popular SkyServer, CasJobs, SciDrive, and SkyQuery astronomy research websites, meaning that SciServer Compute offers APIs to read and write from these resources. All these features ensure that computation stays close to data, resulting in faster computation therefore faster science. SciServer also allows users to create and manage scientific collaborations around data and analysis resources. You can create groups and invite collaborators from around the world. You and your collaborators can use these groups as workspaces to share datasets, scripts, and plots, leading to more efficient collaboration. We will present a highly interactive demo of SciServer, highlighting the latest features and with an emphasis on science use cases. Please bring your questions – let us know what you have not yet been able to do with SciServer, and we will help you do it. We are actively looking for new collaborations, feature requests, and scientific use cases. 
Please let us know how we can help you do your science!
16.00–16.15 | Rosa Diaz Adding Science Validation to the JWST Calibration Pipeline. |
The JWST Calibration Pipeline is a set of steps separated into three main stages, designed to provide the best calibration for all JWST instruments, observing modes, and a wide range of science cases. Careful scientific validation and verification are necessary to determine consistency and quality of the data produced by the calibration pipeline. With this goal in mind, the scientists at STScI have supported validation testing for most of the major builds. Our experience with HST and the realization of the effort it would take to consistently and reliably test after each build, even after launch, motivated us to think about streamlining the process. We started building unit tests to verify that the calibration pipeline produced the expected results. However, the need for a more in-depth scientific validation of the wide range of science cases that will be observed by JWST requires a different strategy: one that not only validates the accuracy of the data but also provides reliable metrics for all science cases. We are working on defining a more complete set of science validation test cases and simulated data that can be integrated within an automated building and testing framework, allowing full science verification and validation of the calibration pipeline in short time scales as well as quality assurance of the calibration products. Achieving this goal has been an arduous task, limited not only by the state of development of the software and the availability of accurate data for testing, but also by the diversity of ideas coming from a large group of scientists from different teams, resources, and conflicting schedules. In this talk, I will present the integration of the science validation testing framework within the build process. I will also discuss the challenges we faced to make this possible, the steps we took, and how this work will help us support the development of the JWST Calibration Pipeline after launch.
16.15–16.30 | Gilles Landais Quality assurance in the ingestion of data into the CDS VizieR catalogue and data services |
VizieR is a reference service provided by the CDS for astronomical catalogues and tables published in academic journals, and also for associated data. Quality assurance is a key factor that guides the operations, development and maintenance of the data ingestion procedures. The catalogue ingestion pipeline involves a number of validation steps, which must be implemented with high efficiency to process the ~1200 catalogues per year from the major astronomy journals. These processes involve integrated teams of software engineers, specialised data librarians (documentalists) and astronomers, and various levels of interaction with the original authors and data providers. Procedures for the ingestion of associated data have recently been improved with semi-automatic mapping of metadata into the IVOA ObsCore standard, with an interactive tool to help authors submit their data (images, spectra, time series etc.). We present an overview of the quality assurance procedures in place for the operation of the VizieR pipelines, and identify the future challenges of increasing volumes and complexity of data. We highlight the lessons learned from implementing the FITS metadata mapping tools for authors and data providers. We show how quality assurance is an essential part of making the VizieR data comply with FAIR (Findable, Accessible, Interoperable and Re-useable) principles, and the necessity of quality assurance for the operational aspects of supporting more than 300,000 VizieR queries per day through multiple interactive and programmatic interfaces.
16.30–16.45 | François Bonnarel ProvTAP: A TAP service for providing IVOA provenance metadata |
In the astronomical Virtual Observatory, provenance metadata provide information on the processing history of the data. This is important to assert quality and truthfulness of the data, and to be potentially able to replay some of the processing steps. The ProvTAP specification is a recently proposed IVOA Working Draft defining how to serve IVOA provenance metadata via TAP, the Table Access Protocol, which allows querying table and catalog services via the Astronomical Data Query Language (ADQL). ProvTAP services should allow finding out all activities, entities, or agents that fulfil certain conditions. Several implementations and developments will be presented. The CDS ProvTAP service describes provenance metadata for HiPS generation. The CTA ProvTAP service will provide access to metadata describing the processing of CTA event lists. GAVO prototyped specialised query functions that could facilitate accomplishing the goals of ProvTAP users.
16.45–17.00 | Lightning Talks |
17.15–18.15 | Jessica Mink Data Formats |
See BoF Page
17.15–18.15 | Eric Tollerud Open Source/Development Software Projects and Large Organizations/Missions: Recommendations and Challenges |
See BoF Page
18.15–19.15 | August Muench Data Citation: from Archives to Science Platforms |
See BoF Page
18.15–19.15 | Kai Polsterer Beginners Guide to Machine Learning in Astronomy |
See BoF Page
07.00–08.30 | Buffet Breakfast |
08.55–09.00 | Morning Announcements |
09.00–09.30 | Felix Stoehr Astronomical archives: Serving up the Universe |
This talk first briefly reviews some of the current context of storing and making astronomical data discoverable and available. We then discuss the challenges ahead and look at the future data-landscape when the next generation of large telescopes will be online, at the next frontier in science archives where also the content of the observations will be described, at the role machine-learning can play as well as at some general aspects of the user-experience for astronomers.
09.30–09.45 | Clara Brasseur AstroCut: A cutout service for TESS full-frame image sets |
The Transiting Exoplanet Survey Satellite (TESS) launched this past March and will have its first data release near the end of this year. Like that of the Kepler mission, the TESS data pipeline will return a variety of data products, from light curves and target pixel files (TPFs) to large full frame images (FFIs). Unlike Kepler, which took FFIs relatively infrequently, TESS will be taking FFIs every half hour, making them a large and incredibly valuable scientific dataset. As part of the Mikulski Archive for Space Telescopes' (MAST) mission to provide high quality access to astronomical datasets, MAST is building an image cutout service for TESS FFI images. Users can request image cutouts in the form of TESS pipeline compatible TPFs without needing to download the entire set of images (750 GB). For users who wish to have more direct control or who want to cut out every single star in the sky, the cutout software (Python package) is publicly available and installable for local use. In this talk we will present the use and design of this software, in particular how we were able to optimize the cutout step. The main barrier in writing performant TESS FFI cutout software is the number of files that must be opened and read from. To streamline the cutout process we performed a certain amount of one-time work up front, which allows individual cutouts to proceed much more efficiently. The one-time data manipulation work takes an entire sector of FFIs and builds one large (~45 GB) cube file for each camera chip, so that the cutout software need not access several thousand FFIs individually. Additionally, we transpose the image cube, putting time on the short axis, thus minimizing the number of seeks per cutout. By creating these data cubes up front we achieved a significant increase in performance. We will show examples of this tool using the currently available simulated TESS data, and discuss use cases for the first data release.
We will finish by discussing future directions for this software, such as generalizing it beyond the TESS mission.
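The effect of transposing the cube can be illustrated with a little index arithmetic. The sketch below assumes that "putting time on the short axis" means making time the fastest-varying axis in the file, and it counts a seek as one contiguous run of elements; the layout names, function names, and cube dimensions are hypothetical and do not describe the AstroCut file format.

```python
def cutout_offsets(layout, shape, y_range, x_range):
    """Flat element offsets touched by a cutout over all time steps.

    `shape` is (nt, ny, nx). Two row-major layouts are compared:
    "time_major"  stores elements in (t, y, x) order, like a stack of images;
    "pixel_major" stores elements in (y, x, t) order, the transposed cube.
    """
    nt, ny, nx = shape
    offsets = []
    for t in range(nt):
        for y in range(*y_range):
            for x in range(*x_range):
                if layout == "time_major":
                    offsets.append((t * ny + y) * nx + x)
                elif layout == "pixel_major":
                    offsets.append((y * nx + x) * nt + t)
                else:
                    raise ValueError("unknown layout: " + layout)
    return offsets


def count_seeks(offsets):
    """Number of contiguous runs, i.e. separate reads, needed for the offsets."""
    ordered = sorted(offsets)
    if not ordered:
        return 0
    runs = 1
    for previous, current in zip(ordered, ordered[1:]):
        if current != previous + 1:
            runs += 1
    return runs


if __name__ == "__main__":
    shape = (100, 50, 50)            # 100 time steps of 50x50 pixel images
    rows, cols = (20, 25), (10, 15)  # a 5x5 pixel cutout
    for layout in ("time_major", "pixel_major"):
        seeks = count_seeks(cutout_offsets(layout, shape, rows, cols))
        print(layout, seeks)
```

In the image-stack layout every row of every time step is a separate read, so the count grows with the number of time steps; in the transposed layout each cutout row is one contiguous block regardless of how many time steps the cube holds.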
09.45–10.00 | Petar Zecevic AXS: Making end-user petascale analyses possible, scalable, and usable |
We introduce AXS (Astronomy eXtensions for Spark), a scalable open-source astronomical data analysis framework built on Apache Spark, a state-of-the-art industry-standard engine for big data processing. In the age when the most challenging questions of the day demand repeated, complex processing of large information-rich tabular datasets, scalable and stable tools that are easy to use by domain practitioners are crucial. Building on capabilities present in Spark, AXS enables querying and analyzing almost arbitrarily large astronomical catalogs using familiar Python/AstroPy concepts, DataFrame APIs, and SQL statements. AXS supports complex analysis workflows with astronomy-specific operations such as spatial selection or on-line cross-matching. Special attention has been given to usability, from conda packaging to enabling ready-to-use cloud deployments. AXS is regularly used within the University of Washington's DIRAC Institute, enabling the analysis of ZTF (Zwicky Transient Facility) and other datasets. As an example, AXS is able to cross-match Gaia DR2 (1.8 billion rows) and SDSS (800 million rows) in 2 minutes, with the data of interest (photometry) being passed to Python routines for further processing. Here, we will present current AXS capabilities, give an overview of future plans, and discuss some implications to analysis of LSST and similarly sized datasets. The long-term goal of AXS is to enable petascale catalog and stream analyses by individual researchers and groups.
10.00–10.15 | Nicolas Buchschacher No-SQL databases: An efficient way to store and query heterogeneous astronomical data in DACE. |
Data production is growing every day in all domains. Astronomy is particularly affected, given its recent instruments. While SQL databases have proven their performance for decades and still perform well in many cases, it is sometimes difficult to store, analyse and combine data produced by different instruments which do not necessarily use the same data model. This is where No-SQL databases can help to meet our requirements: how to efficiently store heterogeneous data in a common infrastructure? SQL database management systems can do a lot of powerful operations like filtering, relations between tables, sub-queries etc. The storage is vertically scalable by adding more rows in the tables but the schema has to be very well defined. In contrast, No-SQL databases are not restrictive. The scalability is horizontal by adding more shards (nodes) and the different storage engines have been designed to easily modify the structure. This is why they are well suited to the big data era. DACE (Data and Analysis Center for Exoplanets) is a web platform which facilitates data analysis and visualisation for the exoplanet research domain. We are collecting a lot of data from different instruments and we regularly need to adapt our database to accept new data sets with different models. We recently decided to make a major change in our infrastructure, moving from PostgreSQL to Cassandra for the storage and Apache Solr as an indexer to do sophisticated queries among a huge number of parameters. This recent change accelerated our queries and we are now ready to accept new data sets from future instruments and combine them with older data to do better science. DACE is funded by the Swiss National Centre of Competence in Research (NCCR) PlanetS, federating the Swiss expertise in exoplanet research.
10.15–11.15 | Break/ Poster Session |
10.45–11.15 | Emmanuel Joliet Visualization in IRSA Services using Firefly |
NASA/IPAC Infrared Science Archive (IRSA) curates the science products of NASA's infrared and submillimeter missions, including many large-area and all-sky surveys. IRSA offers access to digital archives through powerful query engines (including VO-compliant interfaces) and offers unique data analysis and visualization tools. IRSA exploits a re-useable architecture to deploy cost-effective archives, including 2MASS, Spitzer, WISE, Planck, and a large number of highly-used contributed data products from a diverse set of astrophysics projects. Firefly is IPAC's Advanced Astronomy WEB UI Framework. It was open sourced in 2015 and is hosted on GitHub. Firefly is designed for building a web-based front end to access science archives with advanced data visualization capabilities. The visualization provides users with an integrated experience with brushing and linking capabilities among images, catalogs, and plots. Firefly has been used in many IPAC IRSA applications, in the LSST Science Platform Portal, and in NED’s newly released interface. In this focus demo, we will showcase many data access interfaces and services provided by IRSA based on Firefly. It will demonstrate the reusability of Firefly in query, data display, and its visualization capabilities, including the newly released features of HiPS image display, MOC overlay, and the interactions between all those visualization components.
11.15–11.45 | Kirk Borne Massive Data Exploration in Astronomy: What Does Cognitive Have To Do With It? |
There has been a tendency for astronomers to avoid unsupervised data exploration, due to the characterization of this approach as a non-scientific fishing expedition. But a cognitive approach to massive data exploration has the potential to amplify hypothesis formulation and question generation for greater astronomical discovery. The incorporation of contextual data from other wavelengths and other surveys provides the basis for seeing interestingness in the multi-dimensional properties of sources that might otherwise appear uninteresting in a single survey database. Some suggested methods for cognitive exploration will be presented, including computer vision algorithms that are used in robotics to see patterns in the world, but which can also be used to see emergent patterns in the multi-dimensional parameter space of astronomical data.
11.45–12.00 | Ignacio Toledo Data Science =! Software Engineering. Exploring a workflow for ALMA operations. |
In the last few years data science has emerged as a discipline of its own to address problems where data is usually heterogeneous, complex and abundant. In a nutshell, data science provides answers in situations where a hypothesis can be formulated and later either confirmed or rejected following standard scientific methodology, using data as raw material. Data science has been called differently depending on the domain (business intelligence, operational management, astroinformatics) and it has recently been at the center of hype related to artificial intelligence and machine learning. It has been quickly adopted by the digital industry as the tool to distill information from massive operational data sets. Among the many tools data science requires (mathematics, statistics, domain knowledge of the data sets, …), IT infrastructure and software are by far the most visible and there is at present a whole ecosystem available as open source projects. The downside of this is that data science is commonly confused with IT and software development, which creates conflicts between engineering and scientific mindsets, and leads to wrongly applying software development methodologies to it, neglecting the experimental nature of the problem. In summary, creating the data lab becomes more important than answering questions with it. In the domain of ALMA operations, there are many instances that can be identified and described as data science cases or projects, ranging from monitoring array elements to understand performance and predict faults for engineering operations to routine monitoring of calibrators for science operations purposes. We have already identified around 30 different initial questions (or data science cases) and found that several of them have been addressed through individual efforts.
In parallel, several enabling platforms or frameworks have appear in the ecosystem that provides data scientists with both the “laboratory equipment” to conduct their “experiments” as well as enabling tools for collaboration, versioning control, and deploying results in production with a quick turnaround. This talk aims to summarize the results of our exploration to apply data science workflows to resolve ALMA operations issues, identify suitable platforms that are already in use by the industry, share our experience in addressing specific ALMA operations data cases, and discuss the technical and sociological challenges we encountered along the way.
12.00–12.15 | Takeshi Nakazato New Synthesis Imaging Tool for ALMA based on the Sparse Modeling |
A new imaging tool for radio interferometry has been developed based on the sparse modeling approach. It has been implemented as a Python module operating on Common Astronomy Software Applications (CASA) so that the tool is able to process the data taken by the Atacama Large Millimeter/submillimeter Array (ALMA). To handle the large data volumes of ALMA, the Fast Fourier Transform has been implemented with a gridding process. The concept of the sparse modeling for the image reconstruction has been realized with two regularization terms: an L1 norm term for the sparsity and a Total Squared Variation (TSV) term for the smoothness of the resulting image. Since it is important to adjust the size of the regularization terms appropriately, a cross-validation routine, which is a standard method in statistics, has been implemented. This imaging tool runs even on a standard laptop PC and processes ALMA data within a reasonable time. The interface of the tool is comprehensible to CASA users and the usage is so simple that it consists mainly of three steps to obtain the result: an initialization, a configuration, and a processing. A remarkable feature of the tool is that it produces the solution without human intervention. Furthermore, the solution is robust in the sense that it is less affected by the processing parameters. For the verification of the imaging tool, we have tested it with two extreme examples from ALMA Science Verification Data: the protoplanetary disk HL Tau as a typical smooth and filled image, and the lensed galaxy SDP.81 as a sparse image. In our presentation, these results will be presented with some performance information. The comparison between our results and those of the traditional CLEAN method will also be provided. Finally, our future improvement and enhancement plan to make our tool competitive with CLEAN will be shown.
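The two regularization terms named in this abstract can be written out as a sketch. The notation is an assumption made for illustration and does not come from the abstract: V denotes the observed (gridded) visibilities, F the Fourier operator, I the image with pixel values I_{i,j}, and Λ_ℓ and Λ_t the weights of the L1 and TSV terms, which would be the quantities tuned by cross-validation.

```latex
\hat{I} = \arg\min_{I} \Big(
    \lVert V - F I \rVert_2^2
    + \Lambda_\ell \sum_{i,j} \lvert I_{i,j} \rvert
    + \Lambda_t \sum_{i,j} \big[ (I_{i+1,j} - I_{i,j})^2 + (I_{i,j+1} - I_{i,j})^2 \big]
\Big)
```

The first term measures the misfit to the data, the second favours images with few non-zero pixels, and the third penalizes large differences between neighbouring pixels. The exact form used in the tool may differ.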
12.15–12.30 | Lightning Talks |
12.45–14.00 | Buffet Lunch |
14.00–14.30 | Maggie Lieu Deep learning of astronomical features with big data. |
In Astronomy, there is a tendency to build machine learning codes for very specific object detection in images. The classification of asteroids and non-asteroids should be no different than the classification of asteroids, stars, galaxies, cosmic rays, ghosts or any other artefact found in astronomical data. In computer science, it is not uncommon for machine learning to train on hundreds of thousands of object categories, so why are we not there yet? I will talk about image classification with deep learning and how we can make use of existing tools such as the ESA science archive, ESAsky and citizen science to help realise the full potential of object detection and image classification in Astronomy.
14.30–14.45 | Megan Ansdell Automatic Classification of Planet Candidates using Deep Learning |
We present results from a NASA Frontier Development Lab (FDL) project to automatically classify candidate transit signals identified by the Kepler mission and the Transiting Exoplanet Survey Satellite (TESS) using deep learning techniques applied with compute resources provided by the Google Cloud Platform. NASA FDL is an applied artificial intelligence research accelerator aimed at implementing cutting-edge machine learning techniques to challenges in the space sciences. The Kepler and TESS missions produce large datasets that need to be analyzed efficiently and systematically in order to yield accurate exoplanet statistics as well as reliably identify small, Earth-sized planets at the edge of detectability. Thus we have developed a deep neural network classification system to rapidly and reliably identify real planet transits and flag false positives. We build on the recent work of Shallue & Vanderburg (2018) by adding "scientific domain knowledge" to their deep learning model architecture and input representations to significantly increase model performance on Kepler data, in particular for the lowest signal-to-noise transits that can represent the most interesting cases of rocky planets in the habitable zone. These improvements also allowed us to drastically reduce the size of the deep learning model, while still maintaining improved performance; smaller models are better for generalization, for example from Kepler to TESS data. This classification tool will be especially useful for the next generation of space-based photometry missions focused on finding small planets, such as TESS and PLATO.
14.45–15.00 | Bojan Nikolic Acceleration of Non-Linear Minimisation with PyTorch |
Minimisation (or, equivalently, maximisation) of non-linear functions is a widespread tool in astronomy, e.g., maximum likelihood or maximum a-posteriori estimates of model parameters. Training of machine learning models can also be expressed as a minimisation problem (although with some idiosyncrasies). This similarity opens the possibility of re-purposing machine learning software for general minimisation problems in science. I show that PyTorch, a software framework intended primarily for training of neural networks, can easily be applied to general function minimisation in science. I demonstrate this with an example inverse problem, the Out-of-Focus Holography technique for measuring telescope surfaces, where an improvement in time-to-solution of around 300 times is achieved with respect to a conventional NumPy implementation. The software engineering effort needed to achieve this speed is modest, and readability and maintainability are largely unaffected.
15.00–16.00 | Break/ Poster Session |
15.30–16.00 | John Good Image Processing in Python With Montage |
The Montage image mosaic engine (http://montage.ipac.caltech.edu; https://github.com/Caltech-IPAC/Montage) has found wide applicability in astronomy research, integration into processing environments, and is an exemplar application for the development of advanced cyber-infrastructure. It is written in C to provide performance and portability. Linking C/C++ libraries to the Python kernel at run time as binary extensions allows them to run under Python at compiled speeds and enables users to take advantage of all the functionality in Python. We have built Python binary extensions of the 59 ANSI-C modules that make up version 5 of the Montage toolkit. This has involved turning the code into a C library, with driver code fully separated to reproduce the calling sequence of the command-line tools; and then adding Python and C linkage code with the Cython library, which acts as a bridge between general C libraries and the Python interface. We will demonstrate how to use these Python binary extensions to perform image processing, including reprojecting and resampling images, rectifying background emission to a common level, creation of image mosaics that preserve the calibration and astrometric fidelity of the input images, creating visualizations with an adaptive stretch algorithm, processing HEALPix images, and analyzing and managing image metadata. The material presented here will be made freely available as a set of Jupyter notebooks posted on the Montage GitHub page. Montage is funded by the U. S. National Science Foundation (NSF) under Grant Number ACI-1642453.
16.00–16.15 | Sankalp Gilda Importance of Feature Selection in ML models |
An ever-looming threat to astronomical applications of ML, and especially DL, is the danger of overfitting data. In particular, we refer to the problem of stellar parameterization from low- to mid-resolution spectra. The preferred method to deal with this issue is to develop and use spectral indices, which requires careful measurements of equivalent widths of blended spectral lines. This is prone to user error, and often does not give very accurate results with respect to the output parameters. In this work, we tackle this problem using an iterative ML algorithm to sequentially prune redundant features (wavelength points) to arrive at an optimal set of features with the strongest correlation with each of the output variables (stellar parameters): T_eff, log(g) and [Fe/H]. We find that even at high resolution with tens of thousands of pixels (wavelength values), most of them are not only redundant, but their removal actually decreases the mean absolute errors (MAEs) of the model output with respect to the true values of the parameters. Our results are particularly significant in this era of exploding astronomical observational capabilities, when we will undoubtedly be faced with the 'curse of dimensionality'. We illustrate the importance of feature selection to reduce noise, improve model predictions, and best utilize limited computational and hardware resources on various downsampled and degraded synthetic PHOENIX spectra, by convolving the raw high-resolution (500,000) sources to low and mid resolution (2,000 - 15,000).
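As a rough illustration of pruning redundant wavelength points, the sketch below uses a simple greedy filter: features are ranked by their absolute Pearson correlation with the target parameter, and a feature is kept only if it is not almost perfectly correlated with one already kept. This is a swapped-in, much simpler technique than the iterative ML algorithm of the abstract, whose details are not given here. The function names and the `max_redundancy` threshold are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / math.sqrt(var_x * var_y)


def select_features(rows, target, max_redundancy=0.95):
    """Greedy relevance/redundancy filter (illustrative only).

    rows   -- list of spectra, one flux value per wavelength point
    target -- one stellar parameter value per spectrum
    Returns the indices of the retained wavelength points, most relevant first.
    """
    n_features = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_features)]
    relevance = [abs(pearson(col, target)) for col in columns]
    # Visit features from most to least correlated with the target.
    order = sorted(range(n_features), key=lambda j: relevance[j], reverse=True)
    kept = []
    for j in order:
        if relevance[j] == 0.0:
            continue  # uninformative feature
        # Drop the feature if it nearly duplicates one already kept.
        if all(abs(pearson(columns[j], columns[k])) < max_redundancy for k in kept):
            kept.append(j)
    return kept
```

With a toy data set in which one column is a linear copy of another and one column is constant, only the original column and the independent column are retained.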
16.15–16.30 | Cong Dai A method to detect radio frequency interference based on convolutional neural networks |
Along with the rapid development of telecommunication, radio frequency interference (RFI) generated from diverse human-produced sources like electronic equipment, cell phones, GPS and so on can contaminate the weak radio band data. Therefore, RFI is an important challenge for radio astronomy. RFI detection can be regarded as a special task of image segmentation. RFI signals appear in the form of points or vertical or horizontal lines. However, most existing convolutional neural networks (CNNs) perform classification tasks, where the output is the single classification label of an image. The U-Net enables classification of each pixel within the image, which is suitable and competitive for image segmentation. Thus, in this paper, we implement a 14-layer U-Net with the Keras framework to detect RFI signals. The U-Net can perform the classification task of clean signal and RFI. Also, the U-Net is a kind of extended CNN with symmetric architecture, which consists of a contracting path to capture context information and extract features and an expanding path to get precise localization. It extracts the features of RFI for learning the RFI distribution pattern and then calculates the probability value of RFI for each pixel. Then we set a threshold to get the results flagged as RFI. We train the parameters of the U-Net with “Tianlai” data (a radio telescope array; the observing time is from 20:15:45 to 24:18:45 on 27th of September 2016, the frequency is from 744 MHz to 756 MHz and the number of baselines is 18528). The experimental results show that, compared with the traditional RFI flagging method, this approach performs better with satisfying accuracy and takes into account the relationship between different baselines, which contributes to flagging RFI correctly and effectively.
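The final thresholding step described above can be sketched in a few lines. The code assumes that the network has already produced a 2D map of per-pixel RFI probabilities (time by frequency). The function names and the default threshold of 0.5 are hypothetical and not taken from the abstract.

```python
def flag_rfi(prob_map, threshold=0.5):
    """Turn per-pixel RFI probabilities into a boolean flag mask.

    prob_map -- list of rows, each a list of probabilities in [0, 1]
    Returns a mask of the same shape; True marks a pixel flagged as RFI.
    """
    return [[p >= threshold for p in row] for row in prob_map]


def flagged_fraction(mask):
    """Fraction of pixels in the mask that are flagged as RFI."""
    total = sum(len(row) for row in mask)
    flagged = sum(1 for row in mask for flag in row if flag)
    return flagged / total
```

Raising the threshold flags fewer pixels, which lowers false positives at the cost of missing weak RFI. The appropriate value depends on the data.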
16.30–17.00 | Erik Kuulkers Coordinating observations among ground and space-based telescopes in the multi-messenger era |
The emergence of time-domain multi-messenger (astro)physics asks for new and more efficient ways of interchanging information, as well as collaboration. Many space- and ground-based observatories have web pages dedicated to showing information about the complete observations and planned observation schedule. The aim would be to standardise the exchange of information about observational schedules and set-ups between facilities and in addition, to standardise the automation of visibility checking for multiple facilities. To reach this, we propose to use the VO protocols (ObsTAP-like) to write services to expose these data to potential client applications and to develop cross facilities visibility servers.
17.15–18.15 | Simon O'Toole How do you get the most out of your teams? |
See BoF Page
17.15–18.15 | Peter Shawhan Data analysis challenges for multi-messenger astrophysics |
See BoF Page
19.00–22.00 | Conference Banquet |
07.00–08.30 | Buffet Breakfast |
08.55–09.00 | Morning Announcements |
09.00–09.15 | Beatriz Martinez Data-driven Space Science at ESA Science Data Centre |
For many scientists nowadays, the first step in doing science is exploring the data computationally. New approaches to data-driven science are needed due to the large increase in the volume, heterogeneity, velocity and complexity of space science mission data. This applies to ESA space science missions, whose archives are hosted at the ESA Science Data Centre (ESDC). Some examples are the Gaia archive (whose size is estimated to grow up to 1PB and 6000 billion objects), the Solar Orbiter archive (which is expected to handle several time series with more than 500 million records) and the Euclid archive, which shall be able to handle up to 10PB of data. The ESDC aims, as a major objective, to maximize the scientific exploitation of the archived data. Challenges are not limited to managing the large volume of data, but also include allowing collaboration between scientists, providing tools for exploring and mining the data, integrating data (the value of data explodes when it can be linked with other data), and managing data in context (tracking provenance, handling uncertainty and error). ESDC is exploring solutions for handling those challenges in different areas. Specifically: storage of big catalogues through distributed databases (e.g. Greenplum, Postgres-XL, …); storage of long time series in high resolution via time series oriented databases (TimeScaleDB); fulfilling data analysis requirements via Elasticsearch or Spark/Hadoop; and enabling scientific collaboration and closer access to data via JupyterLab, Python client libraries and integration with pipelines using containers. In this presentation we are going to take a tour of these approaches.
09.15–09.30 | Simon O'Toole Bringing together the Australian sky - coordination and interoperability challenges of the All-Sky Virtual Observatory |
The Australian All-Sky Virtual Observatory (ASVO) consists currently of 5 nodes. There are 2 nodes with optical astronomical data; Data Central (MQ) and Skymapper (ANU). There are 2 nodes with radio data; Murchison Wide Field Array (MWA, Curtin) and CSIRO ASKAP Science Data Archive (CASDA, CSIRO). The last node is the Theoretical Astrophysical Observatory (TAO, Swin). These 5 nodes work together under the unified ASVO. The Australian astronomical user community is driving multi-node and multi-wavelength use cases, for example, querying Data Central spectroscopic data with Skymapper imaging data. Meeting the user requirements of the community comes with complexities and challenges. Some of the challenges we are facing include a single sign-on (unified authorisation/authentication) and the querying and representation of very different remote data, such as, overlaying GaLactic and Extragalactic All-sky MWA Survey (GLEAM) data stored in Western Australia with imaging data stored in Eastern Australian states. This presentation will discuss the challenges and successes in both co-ordinating the Australian ASVO and providing interoperability across the 5 nodes.
09.30–09.45 | Juan Gonzalez-Nuñez Driving Gaia Science from the ESA Archive: DR2 to DR3 |
Released on 25th April, Gaia DR2, hosted in the ESA Gaia archive, is leading a paradigm shift in the way astronomers access and process astronomical data in ESA archives. An unprecedentedly active community of thousands of scientists is making use of the latest IVOA protocols and services (TAP, DataLink) in this archive, benefiting from remote execution and persistent, authenticated, server-side services to speed up data exploration and analysis. The availability of a dedicated Python library for this purpose is connecting the archive data to new data processing workflows. The infrastructure serving this data has been upgraded from DR1, now making use of replication, clustering, high performance hardware and scalable data distribution systems in new ways for ESA astronomical archives. VO orientation of the archive has been strengthened by the provision of Time Series in DR2 through use of a VO-aware format and protocol. In order to cover the overwhelming data volume of DR3, new services will be offered to the general astronomical community. Remote execution of code, with notebook services and access to data mining infrastructure as a service, are the topics under development. This talk will describe how the current archive enables more effective analysis of Gaia data and how this is changing data analysis workflows. The infrastructure created for this purpose will be described, along with the architecture and plans under implementation for DR3.
09.45–10.00 | Alice Allen Receiving Credit for Research Software |
Though computational methods are widely used in many disciplines, those who author these methods have not always received credit for their work. This presentation will cover recent changes in astronomy, and indeed, in many other disciplines, that include new journals, policy changes for existing journals, community resources, changes to infrastructure, and availability of new workflows that make recognizing the contributions of software authors easier. This talk will include steps coders can take to increase the probability of having their software cited correctly and steps researchers can take to improve their articles by including citations for the computational methods that enabled their research.
10.00–10.30 | Conference Photo |
10.30–11.15 | Break/ Poster Session |
10.45–11.15 | Karan Vahi Workflows Management using Pegasus |
Workflows are a key technology for enabling complex scientific applications. They capture the interdependencies between processing steps in data analysis and simulation pipelines, as well as the mechanisms to execute those steps reliably and efficiently in a distributed computing environment. They also enable scientists to capture complex processes to promote method sharing and reuse and provide provenance information necessary for the verification of scientific results and scientific reproducibility. Application containers such as Docker and Singularity are increasingly becoming a preferred way for bundling user application code with complex dependencies, to be used during workflow execution. The use of application containers ensures the user scientific code is executed in a homogeneous environment tailored for the application, even when executing on nodes with widely varying architecture, operating systems and system libraries. This demo will focus on how to model scientific analysis as a workflow and execute it on distributed resources using the Pegasus Workflow Management System (http://pegasus.isi.edu). Pegasus is being used in a number of scientific domains doing production grade science. In 2016 the LIGO gravitational wave experiment used Pegasus to analyze instrumental data and confirm the first ever detection of a gravitational wave. The Southern California Earthquake Center (SCEC) based at USC, uses a Pegasus managed workflow infrastructure called Cybershake to generate hazard maps for the Southern California region. In March 2017, SCEC conducted a CyberShake study on DOE systems ORNL Titan and NCSA BlueWaters to generate the latest maps for the Southern California region. Overall, the study required 450,000 node-hours of computation across the two systems. Pegasus is also being used in astronomy, bioinformatics, civil engineering, climate modeling, earthquake science, molecular dynamics and other complex analyses.
Pegasus allows users to design workflows at a high level of abstraction that is independent of the resources available to execute them and the location of data and executables. It compiles these abstract workflows to executable workflows that can be deployed onto distributed and high-performance computing resources such as DOE LCFs like NERSC, XSEDE, local clusters, and clouds. During the compilation process, Pegasus WMS does data discovery, locating input data files and executables. Data transfer tasks are automatically added to the executable workflow. They are responsible for staging in the input files to the cluster, and for transferring the generated output files back to a user-specified location. In addition to the data transfer tasks, data cleanup tasks (which remove data that is no longer required) and data registration tasks (which catalog the output files) are added to the pipeline. For managing user’s data, Pegasus interfaces with a wide variety of backend storage systems (with different protocols). It also has a variety of built-in reliability mechanisms, ranging from automatic job retries and workflow checkpointing to data reuse. Pegasus also performs performance optimization as needed. Pegasus provides both a suite of command line tools and a web-based dashboard for users to monitor and debug their computations. Over the years, Pegasus has also been integrated into higher level domain specific and workflow composition tools such as Portals, HUBzero and Wings. We have also recently added support for Jupyter notebooks, which allows users to compose and monitor workflows in a Jupyter notebook.
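The idea of compiling an abstract workflow into an executable one can be illustrated with a small sketch. It does not use the Pegasus API. The `plan` function, the job dictionary layout and the `stage_in:` / `stage_out:` task names are all hypothetical, and only the automatic addition of data transfer tasks is modelled (no cleanup, registration, retries or site selection).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan(jobs):
    """Add transfer tasks to an abstract workflow and return a run order.

    jobs -- {job_name: {"inputs": [files], "outputs": [files]}}
    Files that no job produces get a stage-in task; files that no job
    consumes get a stage-out task. Returns a list of task names in an
    order that respects every dependency.
    """
    producer = {f: name for name, job in jobs.items() for f in job["outputs"]}
    consumed = {f for job in jobs.values() for f in job["inputs"]}

    # Map each task to the set of tasks that must finish before it.
    deps = {name: set() for name in jobs}
    for name, job in jobs.items():
        for f in job["inputs"]:
            if f in producer:
                deps[name].add(producer[f])      # job-to-job dependency
            else:
                stage_in = f"stage_in:{f}"       # external input file
                deps.setdefault(stage_in, set())
                deps[name].add(stage_in)
        for f in job["outputs"]:
            if f not in consumed:
                deps[f"stage_out:{f}"] = {name}  # final product
    return list(TopologicalSorter(deps).static_order())
```

For a two-step chain that reprojects `raw.fits` and then builds a mosaic, the planned order is the stage-in task, the two jobs, and finally the stage-out task for the mosaic.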
11.15–11.30 | Hackathon Prizewinner TBD |
TBD
11.30–12.00 | Ada Nebot Data challenges of the VO in Time Domain Astronomy |
Surveys specifically designed to monitor the transient sky have opened the window for discovery and exploration through the time domain. Source classification and transmission of the alerts for further follow-up, as well as analysing possible periodicity in variable sources, pose a challenge with the huge amounts of data synoptic missions are providing. We will review some of the challenges of Time Domain data and we will share some of the tools and services that are being built within the Virtual Observatory to discover, access, visualise and analyse Time Domain data, focusing in particular on Time Series.
12.00–12.15 | Mario Juric The ZTF Alert Stream: Lessons from the first six months of operating an LSST precursor |
The Zwicky Transient Facility (ZTF) is an optical time-domain survey that is currently generating about one million alerts each night for transient, variable, and moving objects. The ZTF Alert Distribution System (ZADS; Patterson et al.) packages these alerts, distributes them to the ZTF Partnership members and community brokers, and allows for filtering of the alerts to objects of interest, all in near-real time. This system builds on industry-standard real-time stream processing tools: the Apache Avro binary serialization format and the Apache Kafka distributed streaming platform. It leverages concepts and tools being developed for LSST (Python client libraries), with the source code publicly available on GitHub. This talk will give an overview of the ZTF alert distribution system. We will examine lessons learned from ~six months of operating an LSST precursor alert stream (both from the operator and end-user perspective), discuss opportunities for standardization, and implications for the LSST.
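Filtering a stream of alerts down to objects of interest can be sketched with a plain Python generator. The example makes no use of Avro or Kafka. It assumes each alert has already been deserialized into a dictionary, and the field names (`id`, `magnitude`, `score`) and cut values are hypothetical rather than the actual ZTF alert schema.

```python
def filter_alerts(alerts, predicate):
    """Lazily yield only the alerts for which predicate(alert) is true.

    alerts    -- any iterable of alert dictionaries (e.g. a stream consumer)
    predicate -- function taking an alert and returning True or False
    """
    for alert in alerts:
        if predicate(alert):
            yield alert


def is_bright_candidate(alert, max_magnitude=18.0, min_score=0.8):
    """Example filter: bright detections with a high quality score."""
    return alert["magnitude"] <= max_magnitude and alert["score"] >= min_score
```

Because the filter is a generator, alerts are processed one at a time as they arrive, which suits a near-real-time stream where the full night of alerts is never held in memory.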
12.15–12.30 | Lightning Talks |
12.30–14.00 | Buffet Lunch |
14.00–14.30 | Matthew Holman The Minor Planet Center Data Processing System |
The Minor Planet Center (MPC) is the international clearing house for all ground-based and space-based astrometric and radar observations of asteroids, comets, trans-Neptunian objects, and outer satellites of the giant planets. The MPC assigns designations, provides up-to-date ephemerides, and coordinates follow-up observations for these objects. To meet the needs of the community, the MPC currently receives and processes over two million observations per month and maintains a catalog of orbits of more than 700K objects. Although the MPC processes observations of all minor solar system bodies, its focus is near-Earth objects (NEOs). All MPC operations are organized around this central function. The MPC is required to warn of NEOs approaching within 6 Earth Radii within the coming 6 months. Thus, the main components of the MPC's data processing system enable real-time identification of candidate NEOs, with possible impact trajectories, within a much larger volume of asteroids and other solar system objects. A few such alerts are issued each year, including that for ZLAF9B2/2018 LA. In addition, the MPC facilitates follow-up observations and the coordination of observing assets for efficient recovery searches for NEOs. We anticipate that the data volume will increase by a factor of 10 to 100 over the next decade as surveys such as LSST and NEOCam come online, augmenting the already-large volume from programs such as Pan-STARRS, the Catalina Sky Survey, NEOWISE, and ZTF. Thus, we are in the process of building and testing a new MPC data processing system. The goals are to maximize accuracy, data accessibility, automation, and uptime while minimizing latency and maintaining dependable archives of all data received. In this talk I will highlight the challenges faced by the MPC, demonstrate the key components of our data processing system, and describe a number of algorithmic advances that support a much more efficient and reliable system.
The MPC operates at the Smithsonian Astrophysical Observatory, part of the Harvard-Smithsonian Center for Astrophysics (CfA), under the auspices of the International Astronomical Union (IAU). The MPC is 100% funded by NASA as a functional sub-node of the Small Bodies Node (SBN) of the NASA Planetary Data System at U. Maryland.
14.30–14.45 | Elena Racero ESASky: A New Window for Solar System Data Exploration |
Allowing the solar system community fast and easy access to the astronomical data archives is a long-standing issue. Moreover, the ever-increasing amount of archival data coming from a variety of facilities, both from ground-based telescopes and space missions, leads to the need for single points of entry for exploration purposes. Efforts to tackle this issue are already in place, such as the ‘Solar System Object Image Search by the Canadian Astronomy Data Centre’ (CADC), plus a number of ephemeris services, such as Horizons (NASA-JPL), Miriade (IMCCE) or the Minor Planet & Comet Ephemeris Service (MPC). Within this context, the ESAC Science Data Centre (ESDC), located at the European Space Astronomy Centre (ESAC), has developed ESASky (http://sky.esa.int), a science-driven discovery portal to explore the multi-wavelength sky, providing fast and intuitive access to all ESA astronomy archive holdings. Released in May 2016, ESASky is a new web application that sits on top of ESAC-hosted archives, with the goal of serving as an interface to all high-level science products generated by ESA astronomy missions. The data spans from radio to X-ray and gamma-ray regimes, with Planck, Herschel, ISO, HST, XMM-Newton and Integral missions. We present here the first integration of the search mechanism for solar system objects through ESASky. Based on IMCCE Eproc software for ephemeris precomputation, it allows fast discovery of photometry observations from ESA missions that potentially contain those objects within their Field Of View. In this first integration, the user is able to input a target name and retrieve on-the-fly the results for all the observations from the above-mentioned missions that match the input provided, that is, that contain within the exposure time frame the ephemerides of such objects. Finally, we will also discuss current developments and future plans in strong collaboration with some of the main actors in the field.
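The matching criterion described above (an object's ephemeris falling inside an observation's field of view during its exposure) can be sketched as follows. This is not the ESASky or Eproc implementation. It assumes a circular field of view and an ephemeris sampled at discrete times, and all names and dictionary keys are hypothetical.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def matching_observations(ephemeris, observations):
    """Return ids of observations that may contain the moving object.

    ephemeris    -- list of (time, ra_deg, dec_deg) samples for one object
    observations -- list of dicts with keys id, t_start, t_end,
                    ra, dec (pointing, degrees) and fov_radius_deg
    An observation matches if any ephemeris sample falls inside its
    exposure window and within its field-of-view radius.
    """
    matches = []
    for obs in observations:
        for t, ra, dec in ephemeris:
            in_time = obs["t_start"] <= t <= obs["t_end"]
            if in_time and angular_separation_deg(
                    ra, dec, obs["ra"], obs["dec"]) <= obs["fov_radius_deg"]:
                matches.append(obs["id"])
                break  # one hit is enough for this observation
    return matches
```

A production service would more likely precompute the cross-match and use the true instrument footprints rather than a circle, but the time-window plus position test is the core of the criterion.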
14.45–15.00 | Anne Raugh The PDS Approach to Science Data Quality Assurance |
The Planetary Data System (PDS) has been mandated by NASA not merely to preserve the bytes returned by its planetary spacecraft, but to ensure those data are usable through generations - 50-100 years into the future. When PDS accepts data for archiving, it must be complete, thoroughly documented, and as far as possible autonomous within the archive (that is, everything needed to understand and use the data must be in the archive as well). Two pillars support the PDS mission: The PDS4 Information Model, and the mandatory External Peer Review. The PDS4 Information Model codifies metadata not just for structure, but for provenance, interpretation, and analysis as well. The XML document structures defined for the current implementation of the model and its various constituent namespaces define minimum requirements and present best practices for describing all these aspects of the archival data. The schematic enforcement of these requirements provides a simple, automated approach to ensuring the metadata are present and well-formed. The PDS External Peer Review is required for all data prior to acceptance for archiving. Equivalent to the refereeing process for journal articles, the PDS External Peer Review presents the candidate data to discipline experts unaffiliated with the creation of the data. These reviewers exercise the data in its archival form by reproducing published results, doing comparative analysis between the candidate data and similar or correlated results, and so on, using only the archival resources. These reviewers then determine if the data are of archival quality, and where needed, formulate a list of corrections and additions required prior to archiving.
Together, the Information Model guides data preparers toward producing well-formatted, well-documented data products while the External Peer Review ensures the archive submission is complete, usable, and of sufficient quality to merit preservation - and support - as part of the Planetary Data System archives.
15.00–16.00 | Break/ Poster Session |
16.00–16.30 | Ivelina Momcheva Hubble in the Cloud: A Prototype of a Science Platform at STScI |
The availability of high-quality, highly-usable data analysis tools is of critical importance to all astronomers as is easy access to data from our archives. In this talk I will describe the approach to developing the prototype of a new cloud-based data management environment for astronomical data reduction and analysis at STScI. I will examine the decisions we made, I will demonstrate the prototype and I will discuss what new areas of scientific exploration and discovery are opened by this platform.
16.30–16.45 | Michael Fitzpatrick The NOAO Data Lab: Design, Capabilities and Community Development |
We describe the NOAO Data Lab, a new science platform to efficiently utilize catalog, image and spectral data from large surveys in the era of LSST. Data Lab provides access (through multiple interfaces) to many current NOAO, public survey and external datasets to efficiently combine traditional telescope image/spectral data with external archives, share results and workflows with collaborators, experiment with analysis toolkits and publish science-ready results for community use. The architecture, science use-case approach to designing the system, its current capabilities and plans for community-based development of analysis tools and services are presented. Lessons learned in building and operating a science platform, challenges to interoperability with emerging platforms, and scalability issues for Big Data science are also discussed.
16.45–17.00 | Tom Donaldson Astropy and the Virtual Observatory |
The International Virtual Observatory Alliance (IVOA) has been defining standards for interoperable astronomical data exchange since 2002. Many of these standards are being used successfully and extensively by archives and end user tools to enable data discovery and access. Nevertheless a skepticism persists in parts of the community about the utility and even relevance of these standards, as well as the processes by which they were written. By contrast, the Astropy Project, with its very different processes (and somewhat different goals), has been widely embraced by the community for the usefulness and usability of its interoperable Python packages. In this talk I will discuss what these projects might learn from each other, and how more collaboration might benefit both projects and the community in general.
17.15–18.15 | Alice Allen Unconference Session: I want to talk about... |
See BoF Page
07.00–08.30 | Buffet Breakfast |
08.55–09.00 | Morning Announcements |
09.00–09.30 | Rocio Guerra Noguero DevOps: the perfect ally for Science Operations for a large and distributed astronomy project. |
The Gaia Science Operations Centre (SOC) is an integral part of a large consortium responsible for Gaia data processing. Serving terabytes of processed data on a daily basis to other Processing Centres across Europe makes unique demands on the processes, procedures, as well as the team itself. In this talk I will show how we have embraced the DevOps principles to achieve our goals on performance, reliability and teamwork.
09.30–09.45 | Marcel Loose Agile and DevOps from the trenches at ASTRON |
A few years ago the software development teams at ASTRON decided to adopt the Agile/Scrum software development method. We are building instruments and software that push technological boundaries. Requirements often lack sufficient detail and are subject to constant change, whilst the first data from a new instrument or early prototype become available. The unknown unknowns largely outnumber the known unknowns. Agile/Scrum has proven to be successful in situations like these. We stumbled and fell, but gained a lot of experience in how Agile development techniques can be used in the scientific arena. We learned what works, and what does not work. We became more and more convinced that Agile/Scrum can be very effective in the area of scientific software development. In this presentation I would like to take you by the hand and revisit the journey we have made, in the hope that you will learn from the mistakes that we have made, and the lessons that we have learned.
09.45–10.00 | Jeffrey Smith Lilith: A Versatile Instrument and All-Sky Simulator for use with Space-Based Astrophysics Observatories |
To help facilitate the development of the Transiting Exoplanet Survey Satellite (TESS) data analysis pipeline, it was necessary to produce simulated flight data with sufficient fidelity and volume to exercise all the capabilities of the pipeline in an integrated way. Lilith, a generator of simulated flight data, was developed for this purpose. We describe the capabilities of the Lilith software package, with particular attention to the interaction between the implemented features and the pipeline capabilities that it exercises. Using a physics-based TESS instrument and sky model, Lilith creates a set of raw TESS data which includes models for the CCDs, readout electronics, camera optics, behavior of the attitude control system (ACS), spacecraft orbit, spacecraft jitter and the sky, including zodiacal light, and the TESS Input Catalog. The model also incorporates realistic instances of stellar astrophysics, including stellar variability, eclipsing binaries, background eclipsing binaries, transiting planets and diffuse light. This simulated data is then processed through the TESS pipeline, generating full archivable data products. Full instrumental and astrophysics ground truth is available and can be used as a training set for TESS data analysis software, such as when training a machine learning classifier for planet candidates. Our intention is to continue to tune Lilith as real TESS flight data becomes available, allowing for an up-to-date simulated set of data products to complement the mission flight data products, thereby aiding researchers as they continue to adapt their tools to the TESS data streams. We discuss the execution performance of the resulting package, and offer some suggestions for improvements for instrument and sky simulators to be developed for other missions.
10.00–10.15 | Malik Olivier Boussejra aflak: Pluggable Visual Programming Environment with Quick Feedback Loop Tuned for Multi-Spectral Astrophysical Observations |
In the age of big data and data science, some may think that artificial intelligence will bring an analytical solution to every problem. However, we argue that there is still ample room left for human insight and exploration thanks to visualization technologies. New discoveries are not made by AI (yet!). This is true in all scientific domains, including astrophysics. With the improvement of telescopes and the proliferation of sky surveys, there is always more data to analyze, but not so many astronomers. We present aflak, a visualization environment to open astronomical datasets and analyze them. This paper’s contribution lies in that we leverage visual programming techniques to conduct fine-grained astronomical transformations, filtering and visual analyses on multi-spectral datasets, with the possibility for astronomers to interactively fine-tune all the interacting parameters. By visualizing the computed results in real time as the visual program is designed, aflak puts the astronomer in the loop, while managing data provenance at the same time.
10.15–11.15 | Break/Poster Session |
11.15–11.45 | Michael Wise Establishing the SKA Regional Centre Network: Mesh Management and Culture Change |
The Square Kilometre Array (SKA) is an ambitious project to construct the world’s most powerful radio telescope and enable transformational scientific discoveries across a wide range of topics in physics and astronomy. With two telescope sites located in the deserts of South Africa and Western Australia, an operational headquarters based in the UK, and 12 different member countries contributing to the design and construction, the SKA is truly a global endeavor. Once operational, the SKA is expected to produce an archive of science data products with an impressive growth rate on the order of 700 petabytes per year. Hosting the resulting SKA archive and subsequent science extraction by users will require a global research infrastructure providing additional capacity in networking, storage, computing, and support. This research infrastructure is currently foreseen to take the form of a federated, global network of SKA Regional Centres (SRCs). These SRCs will be the primary interface for researchers in extracting scientific results from SKA data and, as such, are essential to the ultimate success of the telescope. The unprecedented scale of the expected SKA data stream, however, requires a fundamental change in the way radio astronomers approach extracting their science. Efforts are already underway in various countries around the world to define and deploy the seeds of what will grow into a community-provided research infrastructure that can deliver SKA science. In this talk, I will give an update on these initial efforts as well as the various technological, management, and sociological challenges associated with establishing the SKA Regional Centre network.
11.45–12.00 | Anastasia Alexov Hit the Ground Running: Data Management for JWST |
As the launch of the James Webb Space Telescope (JWST) approaches, a team of engineers and scientists is hard at work developing the Data Management Subsystem (DMS) for JWST with its cadre of complex imaging and spectral instruments. DMS will perform receipt of science and engineering telemetry data; will perform reformatting, quality checking, calibration, data processing; will archive the data; will have tools for retrieving the data; will have the capacity for reprocessing the data; will have external/public calibration tools; will provide user notification, search, and access tools for JWST science and engineering data; will distribute data to the end user; will provide extensive user analysis/visualization tools; and will provide support for contributed data products from the community. We will give an overview of the software components, the hardware they run on, the programming languages/systems used, the complexity of the tested end-to-end science data flow, the current functionality of the system and what's to come for the JWST Data Management Subsystem in preparation for launch.
12.00–12.15 | Maurizio Tomasi Towards new solutions for scientific computing: the case of Julia |
This year marks the consolidation of Julia (https://julialang.org/), a programming language designed for scientific computing, as the last version before 1.0 (version 0.7) has just been released. Among its main features, expressiveness and high execution speed are the most prominent: the performance of Julia code is similar to that of statically compiled languages, yet Julia provides a nice interactive shell and fully supports Jupyter; moreover, it can transparently call external codes written in C, Fortran, and even Python without the need for wrappers. The usage of Julia in the astronomical community is growing, and a GitHub organization named JuliaAstro takes care of coordinating the development of packages. In this talk, we will provide an overview of Julia and JuliaAstro. We will also provide a real-life example by discussing the implementation of a Julia-only simulation pipeline for a large-scale CMB experiment.
12.15–12.30 | BOF Lightning Summaries |
12.30–12.45 | Final Remarks |
12.45–14.00 | Box Lunch |