[sac-dev] Subroutine interface to SAC XML datasets
Brian Savage
savage at uri.edu
Thu Jan 31 06:46:08 PST 2008
George
I think moving away from the original header names might be a good idea.
The header names such as stlo, stla and kstnm can be cryptic upon
first encounter.
I would suggest more readable names
<station latitude="" longitude="" elevation="" name="" network=""></station>
or
<station>
<latitude></latitude>
<name></name>
....
</station>
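
A minimal sketch (Python, standard library only) of why the choice between the two layouts need not matter much to a reader: both can be mapped onto the same dictionary. The field names follow the examples above; the sample values are placeholders made up for illustration, and none of this is part of any agreed DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples of the two layouts; the values are placeholders.
ATTRIBUTE_STYLE = (
    '<station latitude="34.1" longitude="-118.2" elevation="500" '
    'name="PAS" network="CI"></station>'
)
ELEMENT_STYLE = """
<station>
  <latitude>34.1</latitude>
  <longitude>-118.2</longitude>
  <elevation>500</elevation>
  <name>PAS</name>
  <network>CI</network>
</station>
"""


def read_station(xml_text):
    """Return the station fields as a dict of strings, whichever layout is used."""
    station = ET.fromstring(xml_text)
    fields = dict(station.attrib)        # attribute layout
    for child in station:                # child-element layout
        fields[child.tag] = (child.text or "").strip()
    return fields


if __name__ == "__main__":
    print(read_station(ATTRIBUTE_STYLE))
    print(read_station(ELEMENT_STYLE))
```

Both calls give the same dictionary, so the decision is mostly about readability and how concise the DTD is.
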
Also, would it be possible to include references to the SAC binary
files, as they are now, in the XML format?
<sacdataset>
<trace file="PAS.CI.BHZ.SAC" />
<trace file="PAS.CI.BHE.SAC" />
<trace file="PAS.CI.BHN.SAC" />
<trace id="HRV.IU.BHZ.SAC">
<header >
...
</header>
<trace>
...
</trace>
</trace>
</sacdataset>
This would allow the data to be stored either in the current (binary)
format or in the XML file itself.
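
A rough sketch of how a reader might tell the two kinds of <trace> apart, again in Python with the standard library only. It assumes that a "file" attribute marks an external binary file and that inline samples sit in a <data> child as whitespace-separated numbers; both are assumptions for illustration, not part of the current DTD. It only reports the file reference and does not read the binary SAC file.

```python
import xml.etree.ElementTree as ET

# Hypothetical dataset mixing an external reference and an inline trace.
SAMPLE = """
<sacdataset>
  <trace file="PAS.CI.BHZ.SAC" />
  <trace id="HRV.IU.BHZ.SAC">
    <header><delta>0.05</delta></header>
    <data>0.0 1.5 -2.0</data>
  </trace>
</sacdataset>
"""


def list_traces(xml_text):
    """Return (kind, name, samples) tuples; samples is None for external files."""
    root = ET.fromstring(xml_text)
    traces = []
    for trace in root.findall("trace"):
        if "file" in trace.attrib:
            # Data stays in the existing binary file; keep only the reference.
            traces.append(("external", trace.get("file"), None))
        else:
            # Data is stored in the XML itself (assumed <data> element).
            text = trace.findtext("data") or ""
            samples = [float(value) for value in text.split()]
            traces.append(("inline", trace.get("id"), samples))
    return traces


if __name__ == "__main__":
    for kind, name, samples in list_traces(SAMPLE):
        print(kind, name, samples)
```
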
Cheers,
Brian
On Jan 31, 2008, at 4:59 AM, George Helffrich wrote:
> Dear All -
>
> The key idea here, which is a good one, is subgroupings of the
> information in the header: 1) station information; 2) event
> information; 3) data characteristics. A fourth item, not well-
> served by the present SAC file structure, is more complete response
> information.
>
> Whether you express header information by <stel>500</stel> or
> <h name="stel">500</h> is a stylistic choice. The DTD description is
> more concise in the latter case.
>
> On 31 Jan 2008, at 09:34, James Wookey wrote:
>
>> Hi Rob, George;
>>
>> I can see the point that the data format currently proposed is a
>> terse one, basically a minimalist description of a set of SAC
>> traces. This does have some significant advantages: it is
>> efficient in file size, and it provides a direct connection to the
>> header variables which, after all, SAC users (as well as
>> programmers) still have to refer to by their short name. I don't
>> think we should go the KML route, which as George says, makes my
>> eyes water with all the detail that is required. As a
>> 'consultation' format for SACML it is well designed, as it is
>> simple to understand and is conceptually very close to the binary
>> SAC format, and George has done sterling work implementing it.
>>
>> However, in the longer term, I can also see the value in a limited
>> expansion of the structure of SACML, if it is going to represent a
>> large step forward in the SAC file format. If we are going to pay
>> the price of adopting a verbose format like XML (and I think we
>> should) we might as well try to reap some of the rewards, and also
>> build enough flexibility into the format to allow incorporation of
>> future things (even if they are currently ignored by the current
>> input routines - the ability to do that is one of the advantages
>> of XML). It seems to me that one thing worth considering is
>> structuring the header. So one possible format might look like:
>>
>> <sacdataset>
>> <trace>
>> <header>
>> <station>
>> <kstnm>TEST</kstnm>
>> <stla>40</stla>
>> ...
>> </station>
>> <event>
>> <evla>-20</evla>
>> ...
>> </event>
>> <trace_info>
>> <delta>0.05</delta>
>> ...
>> </trace_info>
>> </header>
>> <data>
>> ...
>> </data>
>> </trace>
>> </sacdataset>
>>
>> This has the advantage of still being easy to 'one-sweep' read
>> with event-driven parsers like SAX (because you simply ignore the
>> container elements), plus providing a more object-oriented format
>> for use with parsers like DOM or xpath. We might also want to
>> include/allow a subgrouping of traces within the file: a
>> <tracegroup> container element for example.
>>
>> Cheers,
>>
>> James
>>
>> On 31 Jan 2008, at 08:38, George Helffrich wrote:
>>
>>> Dear Rob -
>>>
>>> The XML DTD is versioned, and one could imagine defining a new
>>> DTD with alternative element groupings that would reflect the
>>> data structure. One can obsess in describing data details, and
>>> one view won't necessarily coincide with another person or
>>> community's view. Google Earth's KML comes to mind --
>>> unbelievably baroque for putting points on a map for a
>>> seismologist, but probably glibly expressive for GISers.
>>>
>>> Another view to take of the data is programming semantics,
>>> however. Programmers see 1) header variables that are peeked and
>>> poked at; 2) data. That was the view I took of the present DTD
>>> definition.
>>>
>>> On 31 Jan 2008, at 00:12, Robert Casey wrote:
>>>
>>>>
>>>> Hi George-
>>>>
>>>> An interesting effort with SAC XML and you've made a lot of
>>>> progress from the looks of it. I hate to comment too harshly on
>>>> something that may already be an established standard, so my
>>>> comments are only meant as an observation:
>>>>
>>>> It seems to me that the SAC XML format only half-divorces
>>>> itself from fixed-format files due to the naming scheme for the
>>>> header elements you've provided. Essentially, you've got just
>>>> two entity names inside of <trace> that have no semantic quality
>>>> to them: 'h' and 'd'.
>>>>
>>>> The nature of XML is such that the entities tend to be more
>>>> grouped and self-descriptive as far as their names go. So
>>>> instead of having <h name="STEL">, why could it not instead be
>>>> <STEL> ? To indicate this as a subcomponent of a SAC header of
>>>> a SAC trace, you'd form a hierarchy:
>>>>
>>>> <sacdataset>
>>>> <trace>
>>>> <header>
>>>> <stel>
>>>>
>>>> Even better would be to just call it <elevation>, maybe with a
>>>> reference to its SAC field abbreviation as an attribute:
>>>> <elevation id="STEL">.
>>>>
>>>> The reason for the comment is not just for human readability,
>>>> but for the notion that many XML parsers will treat such
>>>> elements as objects, carrying the element name with them.
>>>> Having your fields broken down into meaningful names means that
>>>> your objects will be more independent and have stronger
>>>> encapsulation properties.
>>>>
>>>> If your example format is already set in stone, then please
>>>> continue with what works. I just felt the floor was open to
>>>> address some naming aesthetics for consideration. I can imagine
>>>> that writing a parsing engine for XML in Fortran is headache
>>>> enough, so I don't want to cause you a migraine on top of it.
>>>>
>>>> Cheers,
>>>>
>>>> -Rob
>>>>
>>>> On Jan 30, 2008, at 5:05 AM, George Helffrich wrote:
>>>>
>>>>> Dear All -
>>>>>
>>>>> I designed and implemented a subroutine interface to SAC XML
>>>>> datasets in the latest release of MacSAC. This message is to
>>>>> make you aware of the design ideas for architectural comment.
>>>>> I think that it shows the way forward to how SAC can move away
>>>>> from a purely binary data format to one that embraces current
>>>>> practice in structuring and delivering data.
>>>>>
>>>>> A test program illustrates the concepts. Here is Fortran
>>>>> source code of
>>>>> an actual program used for testing during development:
>>>>>
>>> George Helffrich
>>> george at geology.bristol.ac.uk
>>>
>>>
> George Helffrich
> george at geology.bristol.ac.uk
>
> _______________________________________________
> sac-dev mailing list
> sac-dev at iris.washington.edu
> http://www.iris.washington.edu/mailman/listinfo/sac-dev
>