[sac-dev] Subroutine interface to SAC XML datasets
George Helffrich
george at gly.bris.ac.uk
Thu Jan 31 01:59:10 PST 2008
Dear All -
The key idea here, which is a good one, is subgroupings of the
information in the header: 1) station information; 2) event
information; 3) data characteristics. A fourth item, not well-served
by the present SAC file structure, is more complete response
information.
Whether you express header information by <stel>500</stel> or <h
name="stel">500</h> is a stylistic choice. The DTD description is more
concise in the latter case, since a single <h> element declaration
covers every header variable.
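
To illustrate why the choice is only stylistic, here is a minimal sketch in
Python (not part of MacSAC). The two fragments and the read_header helper
are made up for illustration and are not taken from the actual SAC XML DTD;
the point is just that a reader can recover the same name/value pairs from
either spelling.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: the layout is illustrative, not the real SAC XML DTD.
ELEMENT_STYLE = "<header><stel>500</stel><stla>40</stla></header>"
ATTRIBUTE_STYLE = '<header><h name="stel">500</h><h name="stla">40</h></header>'


def read_header(xml_text):
    """Return {header variable name: text value} for either naming style."""
    fields = {}
    for elem in ET.fromstring(xml_text):
        # Attribute style keeps the variable name in name="...";
        # element style keeps it in the tag itself.
        key = elem.get("name", elem.tag)
        fields[key.lower()] = (elem.text or "").strip()
    return fields


print(read_header(ELEMENT_STYLE))
print(read_header(ATTRIBUTE_STYLE))
```

Both calls should yield the same dictionary, so the difference between the
two spellings shows up in the DTD and in readability rather than in what a
program can extract.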
On 31 Jan 2008, at 09:34, James Wookey wrote:
> Hi Rob, George;
>
> I can see the point that the data format currently proposed is a terse
> one, basically a minimalist description of a set of SAC traces. This
> does have some significant advantages: it is efficient in file size,
> and it provides a direct connection to the header variables which,
> after all, SAC users (as well as programmers) still have to refer to
> by their short name. I don't think we should go the KML route, which
> as George says, makes my eyes water with all the detail that is
> required. As a 'consultation' format for SACML it is well designed, as
> it is simple to understand and is conceptually very close to the
> binary SAC format, and George has done sterling work implementing it.
>
> However, in the longer term, I can also see the value in a limited
> expansion of the structure of SACML, if it is going to represent a
> large step forward in the SAC file format. If we are going to pay the
> price of adopting a verbose format like XML (and I think we should) we
> might as well try to reap some of the rewards, and also build enough
> flexibility into the format to allow incorporation of future things
> (even if they are ignored by the current input routines -
> the ability to do that is one of the advantages of XML). It seems to
> me that one thing worth considering is structuring the header. So one
> possible format might look like:
>
> <sacdataset>
> <trace>
> <header>
> <station>
> <kstnm>TEST</kstnm>
> <stla>40</stla>
> ...
> </station>
> <event>
> <evla>-20</evla>
> ...
> </event>
> <trace_info>
> <delta>0.05</delta>
> ...
> </trace_info>
> </header>
> <data>
> ...
> </data>
> </trace>
> </sacdataset>
>
> This has the advantage of still being easy to 'one-sweep' read with
> event-driven parsers like SAX (because you simply ignore the container
> elements), plus providing a more object-oriented format for use with
> parsers like DOM or XPath. We might also want to include/allow a
> subgrouping of traces within the file: a <tracegroup> container
> element for example.
>
> Cheers,
>
> James
>
> On 31 Jan 2008, at 08:38, George Helffrich wrote:
>
>> Dear Rob -
>>
>> The XML DTD is versioned, and one could imagine defining a new DTD
>> with alternative element groupings that would reflect the data
>> structure. One can obsess in describing data details, and one view
>> won't necessarily coincide with another person's or community's view.
>> Google Earth's KML comes to mind -- unbelievably baroque for putting
>> points on a map for a seismologist, but probably glibly expressive
>> for GISers.
>>
>> Another view to take of the data is programming semantics, however.
>> Programmers see 1) header variables that are peeked and poked at; 2)
>> data. That was the view I took of the present DTD definition.
>>
>> On 31 Jan 2008, at 00:12, Robert Casey wrote:
>>
>>>
>>> Hi George-
>>>
>>> An interesting effort with SAC XML and you've made a lot of
>>> progress from the looks of it. I hate to comment too harshly on
>>> something that may already be an established standard, so my
>>> comments are only meant as an observation:
>>>
>>> It seems to me that the SAC XML format only half-divorces itself
>>> from fixed-format files due to the naming scheme for the header
>>> elements you've provided. Essentially, you've got just two entity
>>> names inside of <trace> that have no semantic quality to them: 'h'
>>> and 'd'.
>>>
>>> The nature of XML is such that the entities tend to be more grouped
>>> and self-descriptive as far as their names go. So instead of having
>>> <h name="STEL">, why could it not instead be <STEL> ? To indicate
>>> this as a subcomponent of a SAC header of a SAC trace, you'd form a
>>> hierarchy:
>>>
>>> <sacdataset>
>>> <trace>
>>> <header>
>>> <stel>
>>>
>>> Even better would be to just call it <elevation>, maybe with a
>>> reference to its SAC field abbreviation as an attribute: <elevation
>>> id="STEL">.
>>>
>>> The reason for the comment is not just for human readability, but
>>> for the notion that many XML parsers will treat such elements as
>>> objects, carrying the element name with them. Having your fields
>>> broken down into meaningful names means that your objects will be
>>> more independent and have stronger encapsulation properties.
>>>
>>> If your example format is already set in stone, then please
>>> continue with what works. I just felt the floor was open to address
>>> some naming aesthetics for consideration. I can imagine that
>>> writing a parsing engine for XML in Fortran is headache enough, so I
>>> don't want to cause you a migraine on top of it.
>>>
>>> Cheers,
>>>
>>> -Rob
>>>
>>> On Jan 30, 2008, at 5:05 AM, George Helffrich wrote:
>>>
>>>> Dear All -
>>>>
>>>> I designed and implemented a subroutine interface to SAC XML
>>>> datasets in the latest release of MacSAC. This message is to make
>>>> you aware of the design ideas for architectural comment. I think
>>>> that it shows a way forward for SAC to move away from a purely
>>>> binary data format to one that embraces current practice in
>>>> structuring and delivering data.
>>>>
>>>> A test program illustrates the concepts. Here is Fortran source
>>>> code of an actual program used for testing during development:
>>>>
>> George Helffrich
>> george at geology.bristol.ac.uk
>>
>>
George Helffrich
george at geology.bristol.ac.uk