Towards HDF5: Encapsulation of Large and/or Complex Astronomical Data

Wise, Michael

The size and complexity of astronomical data are growing relentlessly. This increase is especially apparent in the radio community, as evidenced by the data challenges faced by many of the SKA pathfinders and other major new radio telescopes such as LOFAR, EVLA, ALMA, ASKAP, MeerKAT, MWA, LWA, and e-MERLIN. Enormous data rates are also becoming a challenge for large optical projects that are currently ramping up, including Pan-STARRS and LSST. As a step towards meeting these challenges, ASTRON and the LOFAR project are exploring the use of the Hierarchical Data Format (HDF5) for LOFAR radio data encapsulation. HDF5 is a data model, library, and file format for storing and managing data. From its inception, it has been designed for flexible and efficient I/O of high volumes of complex data. The HDF5 technology suite draws on a 20-year history of tools and applications for managing, manipulating, viewing, and analyzing data. Most of LOFAR's standard data products will be stored in the HDF5 format, including Radio Sky Images, Beam-Formed Time Series Data, Transient Buffer Board Data, and Dynamic Spectra.

In this session, we hope to bring together scientists and developers struggling with large and complex datasets, as well as groups currently exploring HDF5 implementations. Topics of discussion will include: 1) input from the community on their experiences with HDF5; 2) data on I/O performance and storage structure efficiency; 3) toolsets that exist and/or can be adapted for HDF5; and 4) general lessons learned from working with large and complex data. The organizers welcome contributions to the BoF agenda on HDF5 and on other file formats that can accommodate streaming data encapsulation on the order of 25 TB/hr. Ultimately, we hope these efforts will pave the way towards a modern standard of astronomical data encapsulation for future ground- and space-based projects.
