Thread: includeavailability

Started: 2014-11-26 21:48:00
Last activity: 2014-12-02 22:29:29
Topics: Web Services
Joachim Saul
2014-11-26 21:48:00
Hello,

How frequently is the availability information updated on the IRIS
server side? If I make an inventory request now, e.g.
http://service.iris.edu/fdsnws/station/1/query?network=IU&station=ANMO&channel=BHZ&location=10&level=channel&includeavailability=TRUE&start=2014-11-26T12:00:00
then I get the following availability time span for that stream:

<DataAvailability><Extent start="1998-10-26T20:35:58"
end="2014-11-25T12:00:00"/></DataAvailability>

which ends about a day ago. However, the data are there (of course) and
I can retrieve them using
http://service.iris.edu/fdsnws/dataselect/1/query?network=IU&station=ANMO&channel=BHZ&location=10&start=2014-11-26T12:00:00&end=2014-11-26T12:10:00

In other words the availability information is not in sync with the
actual data holdings. Not just for IU.ANMO but for many more
stations/networks. Is this intentional? It currently prevents me from
making use of the otherwise very useful availability info when
requesting data from a few hours ago.
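To make the mismatch concrete, here is a minimal sketch that parses the availability fragment quoted above and computes how far the reported extent end lags behind the requested start time. The function names are illustrative only, and the fragment is parsed without XML namespaces, which a full StationXML response would normally carry.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Availability fragment exactly as quoted in the message above.
FRAGMENT = (
    '<DataAvailability><Extent start="1998-10-26T20:35:58" '
    'end="2014-11-25T12:00:00"/></DataAvailability>'
)


def extent_end(fragment: str) -> datetime:
    """Return the end time of the first Extent element in the fragment."""
    root = ET.fromstring(fragment)
    extent = root.find("Extent")
    return datetime.fromisoformat(extent.get("end"))


def reporting_gap_hours(fragment: str, requested_start: str) -> float:
    """Hours between the reported extent end and the requested start time."""
    gap = datetime.fromisoformat(requested_start) - extent_end(fragment)
    return gap.total_seconds() / 3600.0


# Requested start taken from the dataselect URL in the message.
print(reporting_gap_hours(FRAGMENT, "2014-11-26T12:00:00"))  # 24.0
```

A positive gap means the requested window starts after the reported extent ends, even though the data can actually be retrieved.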

Regards
Joachim

  • Chad Trabant
    2014-11-26 20:25:57

    Hi Joachim,

    The data availability sub-system at the DMC is updated every 12 hours for archive data. Data is added to the archive (i.e. from the real-time collection system) on a regular basis and is usually all in place ~18 hours after arriving, often sooner, but it varies depending on system load. The data availability you see from our fdsnws-station service is determined by the combination of those activities.

    To report the near real-time data as available when setting matchtimeseries=TRUE, our service assumes that data that has been archived within the last 36 hours is still flowing into the data center. This is documented in the Data Availability section of the Help for the service: http://service.iris.edu/fdsnws/station/docs/1/help/
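A client that reads the extents directly could approximate the 36 hour rule described above. The following is a hypothetical client-side sketch, not the service's actual implementation; the function name and the exact comparison are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Grace period taken from the 36 hour rule described above.
GRACE = timedelta(hours=36)


def probably_available(extent_end, window_start, now, grace=GRACE):
    """Guess whether data for a window starting at window_start exists.

    Assumes window_start is not earlier than the start of the extent.
    """
    if window_start <= extent_end:
        # The window begins inside the reported extent.
        return True
    # The extent ended recently, so assume the stream is still flowing.
    return (now - extent_end) <= grace


end = datetime(2014, 11, 25, 12, 0, 0)     # extent end from the example
start = datetime(2014, 11, 26, 12, 0, 0)   # requested window start
print(probably_available(end, start, now=datetime(2014, 11, 26, 21, 48)))
```

With the values from the original example, the extent ended less than 36 hours before the request, so the sketch treats the window as probably available.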

    So yes, the data availability information is not in perfect sync with the actual availability, and of course it never will be: in the extreme, by the time you receive and parse the response from our service, even the real-time information is likely out of date. So it's really a question of how good is good enough. Frankly, we have found that the 'matchtimeseries' parameter with the 36 hour rule for near real-time data is sufficient for most needs.

    If you just want to know what data was available a couple of hours ago, consider using matchtimeseries with the correct time window, and then assume any returned metadata is for channels that have data available. It'll be pretty darn close to correct.
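The suggestion above could be sketched as a small URL builder. This assumes the station service accepts the same `start`/`end` parameter names that appear in the URLs earlier in the thread; the helper name is hypothetical, and nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Base URL of the fdsnws-station service, as used earlier in the thread.
STATION_URL = "http://service.iris.edu/fdsnws/station/1/query"


def station_query_url(network, station, location, channel, start, end):
    """Build a channel-level query restricted to channels with matching data."""
    params = {
        "network": network,
        "station": station,
        "location": location,
        "channel": channel,
        "level": "channel",
        "matchtimeseries": "TRUE",
        "start": start,
        "end": end,
    }
    # Keep ':' unescaped so the time stamps stay readable.
    return STATION_URL + "?" + urlencode(params, safe=":")


print(station_query_url("IU", "ANMO", "10", "BHZ",
                        "2014-11-26T12:00:00", "2014-11-26T12:10:00"))
```

Any channel returned for such a request can then be assumed to have data in the requested window, subject to the 36 hour rule.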

    We would consider putting effort into improving the near real-time data availability information if there is enough need, so I'd be happy to hear from folks that need such information. Possibly, we could find alternatives, like checking our real-time SeedLink export system, for many use cases.

    Chad



    • Joachim Saul
      2014-11-28 00:43:53
      Hi Chad

      Chad Trabant wrote on 11/26/14 21:25:
      The data availability sub-system at the DMC is updated every 12 hours
      for archive data. Data is added to the archive (i.e. from the real time
      collection system) on a regular basis and is usually all in place ~18
      hours after arriving, often sooner, but it varies depending on system
      load. The data availability you see from our fdsnws-station service is
      determined by the combination of those activities.

      To report the near real-time data as available when setting
      matchtimeseries=TRUE, our service assumes that data that has been
      archived within the last 36 hours is still flowing into the data center.
      This is documented in the Data Availability section of the Help for
      the service: http://service.iris.edu/fdsnws/station/docs/1/help/

      Indeed, thanks for pointing that out. It is now clear that this
      behaviour is not a mistake, so I can work around it in my client.

      On the other hand, quoting from the documentation:

      "Extents are not modified in real-time. The archive will likely be out
      of sync by up to a day, meaning:

      The service assumes that if channel data was archived within the
      last 36 hours, then data for the last 36 hours is available."

      Based on these assumptions and considering that any end time within the
      last 36 hours is equivalent to "this stream is probably producing
      near-real-time data now", wouldn't it be safe to set the end time of the
      availability time span to something very far in the future? I do
      understand that it may look odd to claim availability of not yet
      recorded data... but a similar trick is already used for the end times
      in other contexts, like e.g. "2500-12-31T23:59:59" as the "end time" for
      the IU network. One might also leave the end time unset to indicate
      "open end".
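If extents were reported with a far-future or missing end time as proposed above, a client could normalize both cases to "open end". The sketch below is purely illustrative: the threshold and the function names are assumptions, and the service does not currently report extents this way.

```python
from datetime import datetime
from typing import Optional

# Hypothetical threshold: any end time at or beyond this counts as open-ended.
FAR_FUTURE = datetime(2500, 1, 1)


def normalize_end(end: Optional[str]) -> Optional[datetime]:
    """Return None for an open-ended extent, otherwise the parsed end time."""
    if not end:
        return None  # end time left unset
    parsed = datetime.fromisoformat(end)
    return None if parsed >= FAR_FUTURE else parsed


def covers(end: Optional[str], when: datetime) -> bool:
    """True if an extent with the given end time reaches the time `when`."""
    normalized = normalize_end(end)
    return normalized is None or when <= normalized


print(covers("2500-12-31T23:59:59", datetime(2014, 11, 26, 12, 0, 0)))  # True
print(covers("2014-11-25T12:00:00", datetime(2014, 11, 26, 12, 0, 0)))  # False
```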

      This would eliminate the need for additional logic in the client code.
      Implementing such logic requires some knowledge of the operational
      internals of the particular server, which may also be subject to
      change.

      As an additional benefit the future end time would only require an
      update in the case of a prolonged data outage.

      Just an idea.

      Cheers
      Joachim


      • Chad Trabant
        2014-12-02 22:29:29

        Hi Joachim,

        On Nov 27, 2014, at 7:43 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:

        Hi Chad

        Chad Trabant wrote on 11/26/14 21:25:
        The data availability sub-system at the DMC is updated every 12 hours
        for archive data. Data is added to the archive (i.e. from the real time
        collection system) on a regular basis and is usually all in place ~18
        hours after arriving, often sooner, but it varies depending on system
        load. The data availability you see from our fdsnws-station service is
        determined by the combination of those activities.

        To report the near real-time data as available when setting
        matchtimeseries=TRUE, our service assumes that data that has been
        archived within the last 36 hours is still flowing into the data center.
        This is documented in the Data Availability section of the Help for
        the service: http://service.iris.edu/fdsnws/station/docs/1/help/

        Indeed, thanks for pointing that out. It is now clear that this behaviour is not a mistake, so I can work around it in my client.

        On the other hand, quoting from the documentation:

        "Extents are not modified in real-time. The archive will likely be out of sync by up to a day, meaning:

        The service assumes that if channel data was archived within the last 36 hours, then data for the last 36 hours is available."

        Based on these assumptions and considering that any end time within the last 36 hours is equivalent to "this stream is probably producing near-real-time data now", wouldn't it be safe to set the end time of the availability time span to something very far in the future? I do understand that it may look odd to claim availability of not yet recorded data... but a similar trick is already used for the end times in other contexts, like e.g. "2500-12-31T23:59:59" as the "end time" for the IU network. One might also leave the end time unset to indicate "open end".

        Not a bad idea. But then we could not communicate what we are certain to be a valid latest time, i.e. the archive holdings.

        This would eliminate the need for additional logic in the client code. Implementing such logic requires some knowledge of the operational internals of the particular server, which may also be subject to change.

        If the station service is used by a client to prepare for a time series request, which I believe is very common, the 'matchtimeseries' option solves the issue. In effect, our fdsnws-station service already does the time window matching when that option is used, so the client does not need to do it again. In fact, the client does not even need to request the availability information: just set matchtimeseries=true and assume whatever comes back intersects with data availability (this is exactly what our internal data extraction routine is doing).
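The second half of that workflow, turning the channels returned by a matchtimeseries query into time series requests, might look like the sketch below. The helper name and the hard-coded channel list are hypothetical; in practice the list would come from parsing the station service response.

```python
from urllib.parse import urlencode

# Base URL of the fdsnws-dataselect service, as used earlier in the thread.
DATASELECT_URL = "http://service.iris.edu/fdsnws/dataselect/1/query"


def dataselect_urls(channels, start, end):
    """Build one dataselect URL per (network, station, location, channel)."""
    urls = []
    for network, station, location, channel in channels:
        params = {
            "network": network,
            "station": station,
            "channel": channel,
            "location": location,
            "start": start,
            "end": end,
        }
        # Keep ':' unescaped so the time stamps stay readable.
        urls.append(DATASELECT_URL + "?" + urlencode(params, safe=":"))
    return urls


# Stand-in for the channels returned by a matchtimeseries=true query.
matched = [("IU", "ANMO", "10", "BHZ")]
for url in dataselect_urls(matched, "2014-11-26T12:00:00", "2014-11-26T12:10:00"):
    print(url)
```

No availability extents are inspected on the client side here, which is the point of the approach: the service has already done the window matching.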

        Of course there are other use cases for data availability where you might need to know the actual date-times. In my opinion, your proposed change would be handier for some but at the cost of others. If you describe what you are doing, we can keep it in mind for consideration.

        Chad



