Efficient and scalable cross-matching of (very) large catalogues
Cross-catalogue matching is widely used in astronomy, whether for building multi-wavelength datasets from independent surveys, studying changes in objects' luminosity, or detecting moving objects (stellar proper motions, asteroids). The need for efficient, reliable and scalable cross-catalogue matching is becoming even more pressing with forthcoming projects that will produce huge catalogues in which astronomers will search for rare objects, perform statistical analysis and classification, or detect transients in real time.
We have developed a formalism and a corresponding technical framework to address the challenge of fast cross-catalogue matching. Our formalism supports more than simple nearest-neighbour search and handles elliptical positional errors. Scalability is improved by partitioning the sky with the HEALPix scheme and processing each sky cell independently. Multi-threaded two-dimensional kd-trees, adapted to equatorial coordinates, enable efficient neighbour search.
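To make the two ingredients above concrete, here is a minimal single-threaded sketch in Python. It is not the framework described in this abstract. As a simplification it indexes 3D unit vectors in a plain kd-tree rather than using a two-dimensional kd-tree adapted to equatorial coordinates, and it omits HEALPix partitioning and multi-threading. All names (`Source`, `cross_match`, `max_d2`) and the threshold value are hypothetical. Candidates found within a search radius are then filtered with a Mahalanobis-type distance built from the summed error ellipses, which is one common way to account for elliptical positional errors.

```python
import math
from typing import NamedTuple

class Source(NamedTuple):   # hypothetical record layout
    ra: float    # right ascension, degrees
    dec: float   # declination, degrees
    a: float     # error-ellipse semi-major axis, arcsec
    b: float     # error-ellipse semi-minor axis, arcsec
    pa: float    # position angle of the major axis, degrees east of north

def unit_vector(ra_deg, dec_deg):
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def build_kdtree(items, depth=0):
    """items: list of (unit_vector, index). Returns a nested dict, or None if empty."""
    if not items:
        return None
    axis = depth % 3
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return {"item": items[mid], "axis": axis,
            "left": build_kdtree(items[:mid], depth + 1),
            "right": build_kdtree(items[mid + 1:], depth + 1)}

def query_radius(node, target, chord, found):
    """Append to `found` the indices of all points within chord length `chord` of target."""
    if node is None:
        return
    vec, idx = node["item"]
    if math.dist(vec, target) <= chord:
        found.append(idx)
    diff = target[node["axis"]] - vec[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    query_radius(near, target, chord, found)
    if abs(diff) <= chord:          # the far half-space can still hold neighbours
        query_radius(far, target, chord, found)

def ellipse_covariance(a, b, pa_deg):
    """Covariance terms (xx, yy, xy) in local (east, north) axes, in arcsec^2."""
    s, c = math.sin(math.radians(pa_deg)), math.cos(math.radians(pa_deg))
    return (a * a * s * s + b * b * c * c,
            a * a * c * c + b * b * s * s,
            (a * a - b * b) * s * c)

def mahalanobis_sq(p, q):
    """Squared offset between p and q, normalised by their summed error ellipses."""
    dra = ((q.ra - p.ra + 180.0) % 360.0) - 180.0      # wrap across RA = 0
    dx = dra * math.cos(math.radians(0.5 * (p.dec + q.dec))) * 3600.0
    dy = (q.dec - p.dec) * 3600.0                      # small-angle approximation
    pxx, pyy, pxy = ellipse_covariance(p.a, p.b, p.pa)
    qxx, qyy, qxy = ellipse_covariance(q.a, q.b, q.pa)
    sxx, syy, sxy = pxx + qxx, pyy + qyy, pxy + qxy
    det = sxx * syy - sxy * sxy
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def cross_match(cat_a, cat_b, radius_arcsec, max_d2=9.0):
    """Return sorted (i, j, d2) for pairs within the radius whose d2 <= max_d2 (assumed cut)."""
    tree = build_kdtree([(unit_vector(s.ra, s.dec), j) for j, s in enumerate(cat_b)])
    chord = 2.0 * math.sin(0.5 * math.radians(radius_arcsec / 3600.0))
    matches = []
    for i, src in enumerate(cat_a):
        found = []
        query_radius(tree, unit_vector(src.ra, src.dec), chord, found)
        for j in found:
            d2 = mahalanobis_sq(src, cat_b[j])
            if d2 <= max_d2:
                matches.append((i, j, d2))
    return sorted(matches)

# Toy catalogues: one pair offset by 1 arcsec in declination, one pair straddling RA = 0.
cat_a = [Source(10.0, 20.0, 1.0, 1.0, 0.0), Source(359.9999, -5.0, 0.5, 0.5, 0.0)]
cat_b = [Source(10.0, 20.0 + 1.0 / 3600.0, 1.0, 1.0, 0.0),
         Source(0.0001, -5.0, 0.5, 0.5, 0.0),
         Source(200.0, 45.0, 1.0, 1.0, 0.0)]
matches = cross_match(cat_a, cat_b, radius_arcsec=2.0)
print(matches)
```

Working on unit vectors avoids special handling of the poles and of the RA wrap-around during the tree search, at the cost of a third coordinate. In a partitioned setting, one such tree would be built per sky cell, with sources near cell borders also tested against neighbouring cells.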
The whole process can run on a single computer, but could also use clusters of machines to cross-match future very large surveys such as Gaia or LSST in a reasonable time. We can already match 2MASS (~470M sources) against SDSS DR7 (~350M sources) on a single machine in less than 25 minutes.
We aim to provide astronomers with an online catalogue cross-matching service that leverages the catalogues available in the VizieR database. The service will allow users both to access pre-computed cross-matches between some very large catalogues and to run customized cross-matching operations. It will also support VO protocols for synchronous or asynchronous queries.