Data Services Newsletter

Volume 2 : No 3 : September 2000

Crop Rotation in the FARM

Although IRIS’ mission is to protect the entire archive of continuous data, event-related data holdings are the most valuable and the most frequently accessed. To support this high level of event-related activity, the DMC has routinely extracted the relevant portions of the continuous holdings, assembled them into event-focused volumes in SEED format, and placed them on DMC computing systems where the general community can access them easily. We call this event-related data collection the FARM, and it is one of the most actively used parts of the data holdings at the IRIS DMC. But, like many things, certain aspects of the FARM have grown stale and difficult to manage. It is time for a crop rotation of the FARM products.

As the following figure shows, of the 70,000 data shipments that we anticipate making this year, only 20,000 will actually come out of the large mass storage systems at the IRIS DMC. Roughly 3,800 will come from SPYDER® SEED volumes built in near-real time, and nearly 46,000 will come straight from the FARM (by physical transfer on tape, by ftp, over the internet through WILBER, or through the WEED program). Numbers like these show us the value of the FARM products.

Figure 1: Data shipments for 2000
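As a quick check on the shipment numbers quoted above, the share of each source can be worked out directly. This is only an illustrative sketch: the counts are the rounded figures from the text, and the variable names are ours.

```python
# Rounded shipment counts quoted in the text for the year 2000.
total = 70_000
sources = {"mass storage": 20_000, "SPYDER": 3_800, "FARM": 46_000}

# Percentage of all anticipated shipments served by each source.
shares = {name: round(100 * count / total, 1) for name, count in sources.items()}

# The quoted counts are approximate, so they need not sum exactly to the total.
unaccounted = total - sum(sources.values())

for name, share in shares.items():
    print(f"{name}: {share}%")
print(f"not covered by the rounded counts: {unaccounted}")
```

On these rounded figures the FARM serves roughly two thirds of all shipments, more than twice the number drawn from mass storage.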

The figure below shows both the total size of the FARM and the number of event volumes. The current FARM consists of only the GDSN and GSN network data from the years 1977 through May 2000. It comprises 5,204 full SEED volumes with a total size of 104 gigabytes. Because full SEED volumes carry their header information along with the data, it is difficult to ensure that the headers stay current once a volume has been built, and in many instances we know the headers are out of date.

Figure 2: Total size of the FARM and the number of event volumes

We are currently revamping the way in which the FARM is created. Some of the main changes are as follows:

  1. The new FARM will have data from all available networks. The current FARM includes data only from the GDSN and GSN networks, and yet the DMC has data from nearly 100 networks (when one includes PASSCAL deployments). Mini-SEED volumes for each network will be stored separately in a Pool of Network Data (POND).
  2. The waveform data will be stored as mini-SEED (data only). The products will not have headers attached to them until the products are requested. In this manner, products will always have the most current metadata available and will never grow stale.
  3. There will be three types of FARM products: SPYDER®-FARM volumes (or simply SPYDER®), FARM volumes, and UV-FARM volumes.
    • The SPYDER®-FARM will be built from real-time, near-real-time, and quality-controlled data sources as they arrive at the DMC. The hypocenters for these data will come from the same NEIC source that currently triggers SPYDER®. Data will continue to be added to a SPYDER®-FARM product until a FARM product for the event exists.
    • The FARM products will be built from data that pass through the existing quality control system of the DMS. These are the primary products in the FARM. The NEIC Weekly PDE will be the catalog used to build the FARM (using the Harvard Moment Magnitudes).
    • The UV-FARM will contain all the ultra-long-period and very-long-period data channels for a given network. Each UV-FARM product will contain data for a two-week period.
  4. All FARM products will be dynamically updated as new data arrive. (Newly arrived data will flow into the appropriate FARM POND.) Therefore, FARM products will always contain all available data.
  5. The data in the SPYDER®-FARM and the FARM products will be coordinated. As quality-controlled FARM products are built, the corresponding data will be removed from the SPYDER®-FARM volumes. If everything works as planned, a SPYDER®-FARM volume should eventually disappear. However, if some data never reach us through the quality-controlled path, they will remain forever in the SPYDER®-FARM volume.
  6. WILBER and WEED will be modified to take full advantage of the new organization of the FARM. These access tools will have to be updated to accommodate the addition of multiple network volumes (or PONDs).
  7. The basic algorithm for the FARM will change only slightly. We will attempt to go down to smaller events (Mw >= 5.0), and we will lengthen the pre-event window for the long-period channels.
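The changes listed above can be pictured with a small sketch. It is not the DMC's implementation: the class and function names (Segment, Pond, add_segment, build_product, qualifies), the "realtime" and "qc" labels, and the placeholder metadata strings are all hypothetical, chosen only to illustrate three ideas from the list: headerless data stored per network in a POND, headers attached only when a product is requested, and SPYDER®-FARM data giving way to quality-controlled data. One simplification is that superseded real-time data are filtered out when a product is built rather than physically removed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """A headerless waveform segment, standing in for mini-SEED data."""
    network: str
    station: str
    channel: str
    source: str  # "realtime" (SPYDER path) or "qc" (quality-controlled path)

    @property
    def key(self):
        return (self.network, self.station, self.channel)


@dataclass
class Pond:
    """Pool of Network Data: all headerless segments for one network."""
    network: str
    segments: list = field(default_factory=list)


def add_segment(ponds, segment):
    """Route newly arrived data into the POND for its network."""
    ponds.setdefault(segment.network, Pond(segment.network)).segments.append(segment)


def qualifies(mw, threshold=5.0):
    """Event selection as in item 7: Mw >= 5.0."""
    return mw >= threshold


def build_product(ponds, metadata, kind):
    """Assemble a product on request, attaching the metadata current right now.

    kind is "FARM" (quality-controlled data) or "SPYDER" (real-time data
    that has no quality-controlled counterpart yet).
    """
    everything = [s for pond in ponds.values() for s in pond.segments]
    qc_keys = {s.key for s in everything if s.source == "qc"}
    if kind == "FARM":
        chosen = [s for s in everything if s.source == "qc"]
    else:
        chosen = [s for s in everything
                  if s.source == "realtime" and s.key not in qc_keys]
    return [(metadata[(s.network, s.station)], s) for s in chosen]


# Illustrative data only: two stations, placeholder metadata strings.
ponds = {}
metadata = {("IU", "ANMO"): "response v1", ("II", "PFO"): "response v1"}
add_segment(ponds, Segment("IU", "ANMO", "BHZ", "realtime"))
add_segment(ponds, Segment("II", "PFO", "BHZ", "realtime"))
add_segment(ponds, Segment("IU", "ANMO", "BHZ", "qc"))   # QC data arrive later
metadata[("IU", "ANMO")] = "response v2"                  # metadata corrected later

farm = build_product(ponds, metadata, "FARM")
spyder = build_product(ponds, metadata, "SPYDER")
```

Because headers are looked up when build_product runs, the FARM product in this sketch picks up the corrected metadata without any stored volume being rebuilt, and the SPYDER product keeps only the station whose quality-controlled data have not yet arrived.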

Please be patient. Several million files must be produced for this project, so it will take a few months to get the new FARM system in place. In the intervening period we will continue to support the present FARM building process.

by Tim Ahern (IRIS Data Management Center)
