Two Years Before the Mast: Fermi LAT Computing Two Years After Launch - Eight to Go!

Dubois, Richard

The Fermi Observatory was launched on June 11, 2008, and the Large Area Telescope (LAT) was activated on June 25. Some 13 GB of data is downlinked daily, spread over approximately 8 contacts per day, and expands to 500 GB in the event reconstruction process. Each data run is farmed out to several hundred computing cores, and the results are merged back together in our processing pipeline. The pipeline is designed to execute complex processing trees defined in XML and to handle multiple tasks simultaneously, including prompt data processing, simulations, and data reprocessing. At the core of the system, a pair of Oracle servers maintains all the state and dataset bookkeeping. Batch processing is centrally dispatched to the SLAC LSF and Lyon (France) batch farms, with more than 5000 shared cores. The xrootd cluster filesystem is used for high throughput and for management of large disk pools. Nagios and Ganglia are used for problem alerts and for tracking resource usage. The HEP-like instrument event reconstruction lives in a ROOT world, while high-level science is done in FITS. LAT Collaboration users access the data via web query engines that slice and dice the data to their needs; the queries themselves are also executed in the processing pipeline. The data is now public, so we face the tension between new development and stability for an outside user base. Two years later, we are dealing with the issues of long-term support: how to keep a complex operation alive and vital for 10 years, and how to deal with dependencies on external packages whose support is out of our control.
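
The "farm out, then merge" pattern described above can be illustrated with a minimal Python sketch. This is not the LAT pipeline code: the function names (`split_run`, `reconstruct`, `process_run`), the use of a local thread pool in place of a batch farm, and the dictionary record format are all assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def split_run(events, n_chunks):
    """Split a run's events into contiguous, near-equal chunks (hypothetical helper)."""
    n_chunks = max(1, min(n_chunks, len(events)))
    size, extra = divmod(len(events), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional event each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(events[start:end])
        start = end
    return chunks


def reconstruct(chunk):
    """Stand-in for event reconstruction; the real step is far more involved."""
    return [{"event_id": e, "reconstructed": True} for e in chunk]


def process_run(events, n_workers=4):
    """Scatter chunks to workers, then merge the results back in input order."""
    chunks = split_run(events, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields results in the order the chunks were submitted,
        # so the merged output preserves the original event order.
        parts = list(pool.map(reconstruct, chunks))
    return [record for part in parts for record in part]
```

In this sketch the merge is a simple ordered concatenation; a production system dispatching to a batch farm would also need the state tracking and bookkeeping that the abstract assigns to the database servers.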
