[sac-dev] Subroutine interface to SAC XML datasets

George Helffrich george at gly.bris.ac.uk
Thu Jan 31 07:04:48 PST 2008


Dear Brian -

	Indeed, the header field names are cryptic and could be improved.  
With a clearer idea of who the intended users and/or viewers of the 
elements are, the naming scheme should become clearer.

	I'm not sure about <trace file="PAS.CI.BHN.SAC" />, but it is an 
interesting idea.  It is analogous to a Unix symbolic link, with 
similar semantic confusion (stat/lstat).  Here it raises the issues of 
1) whether a dataset is entirely self-contained; and 2) what it means 
to "write" a dataset that contains links as well as trace data.
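
To make issue 1) concrete, here is a minimal Python sketch (not part of the MacSAC implementation) that separates linked from inline traces and reports whether a dataset is self-contained. It assumes the layout Brian proposes in the quoted message below; the names classify_traces and is_self_contained are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical dataset mixing linked and inline traces, following the
# layout proposed in the quoted message.
DATASET = """
<sacdataset>
  <trace file="PAS.CI.BHZ.SAC" />
  <trace file="PAS.CI.BHN.SAC" />
  <trace id="HRV.IU.BHZ.SAC">
    <header></header>
    <trace></trace>
  </trace>
</sacdataset>
"""

def classify_traces(xml_text):
    """Split the top-level traces into linked file names and inline ids."""
    root = ET.fromstring(xml_text)
    linked, inline = [], []
    # findall("trace") matches direct children only, so the nested
    # <trace> data element inside an inline trace is not counted.
    for trace in root.findall("trace"):
        if "file" in trace.attrib:
            linked.append(trace.get("file"))
        else:
            inline.append(trace.get("id"))
    return linked, inline

def is_self_contained(xml_text):
    """A dataset is self-contained when none of its traces is a link."""
    linked, _ = classify_traces(xml_text)
    return not linked

linked, inline = classify_traces(DATASET)
print("linked:", linked)
print("inline:", inline)
print("self-contained:", is_self_contained(DATASET))
```

Issue 2) is then a policy choice for the writer: either follow each link and embed the data (the stat view) or copy the link element through unchanged (the lstat view). The sketch takes no position on which is right.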

On 31 Jan 2008, at 14:46, Brian Savage wrote:

> George
>
> I think moving away from the original header names might be a good 
> idea.
> The header names such as stlo, stla and kstnm can be cryptic upon 
> first encounter.
> I would suggest more readable names
> <station latitude="" longitude="" elevation="" name="" 
> network=""></station>
> or
> <station>
> 	<latitude></latitude>
> 	<name></name>
> 	....
> </station>
>
> Also, would it be possible to include references to the sac binary 
> files, as they are now, in the xml format?
> <sacdataset>
> 	 <trace file="PAS.CI.BHZ.SAC" />
> 	 <trace file="PAS.CI.BHE.SAC" />
> 	 <trace file="PAS.CI.BHN.SAC" />
> 	 <trace id="HRV.IU.BHZ.SAC">
> 		<header >
> 			...
> 		</header>
> 		<trace>
> 			...
> 		</trace>
> 	</trace>
> </sacdataset>
> Which would allow for storing the data in either the current format 
> (binary) or in the xml file.
>
>
> Cheers,
> Brian
>
> On Jan 31, 2008, at  4:59 AM , George Helffrich wrote:
>
>> Dear All -
>>
>> 	The key idea here, which is a good one, is subgroupings of the 
>> information in the header:  1) station  information; 2) event 
>> information; 3) data characteristics.  A fourth item, not well-served 
>> by the present SAC file structure, is more complete response 
>> information.
>>
>> 	Whether you express header information by <stel>500</stel> or <h 
>> name="stel">500</h> is a stylistic choice.  The DTD description is 
>> more concise in the latter case.
>>
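
A small Python sketch of why the two spellings are interchangeable for a reader: both reduce to the same name/value table. The sample strings and the read_header function are hypothetical, and the fields are placed directly under <trace> for brevity. On the DTD side, the <h name="..."> form needs only one element declaration instead of one per header field.

```python
import xml.etree.ElementTree as ET

# The same two header fields written in each style (illustrative values).
ELEMENT_STYLE = "<trace><stel>500</stel><stla>40</stla></trace>"
ATTRIBUTE_STYLE = '<trace><h name="stel">500</h><h name="stla">40</h></trace>'

def read_header(xml_text):
    """Return header fields as a dict, accepting either naming style."""
    fields = {}
    for child in ET.fromstring(xml_text):
        # <h name="stel"> carries the field name in an attribute;
        # <stel> carries it in the tag itself.
        key = child.get("name") if child.tag == "h" else child.tag
        fields[key] = child.text
    return fields

assert read_header(ELEMENT_STYLE) == read_header(ATTRIBUTE_STYLE)
print(read_header(ATTRIBUTE_STYLE))
```
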
>> On 31 Jan 2008, at 09:34, James Wookey wrote:
>>
>>> Hi Rob, George;
>>>
>>> I can see the point that the data format currently proposed is a 
>>> terse one, basically a minimalist description of a set of SAC 
>>> traces. This does have some significant advantages: it is efficient 
>>> in file size, and it provides a direct connection to the header 
>>> variables which, after all, SAC users (as well as programmers) still 
>>> have to refer to by their short name. I don't think we should go the 
>>> KML route, which, as George says, makes my eyes water with all the 
>>> detail that is required. As a 'consultation' format for SACML it is 
>>> well designed, as it is simple to understand and is conceptually 
>>> very close to the binary SAC format, and George has done sterling 
>>> work implementing it.
>>>
>>> However, in the longer term, I can also see the value in a limited 
>>> expansion of the structure of SACML, if it is going to represent a 
>>> large step forward in the SAC file format. If we are going to pay 
>>> the price of adopting a verbose format like XML (and I think we 
>>> should) we might as well try to reap some of the rewards, and also 
>>> build enough flexibility into the format to allow incorporation of 
>>> future things (even if they are currently ignored by the current 
>>> input routines - the ability to do that is one of the advantages of 
>>> XML). It seems to me that one thing worth considering is structuring 
>>> the header. So one possible format might look like:
>>>
>>> <sacdataset>
>>>    <trace>
>>>       <header>
>>>          <station>
>>>            <kstnm>TEST</kstnm>
>>>            <stla>40</stla>
>>> 	   ...
>>>          </station>
>>> 	 <event>
>>>             <evla>-20</evla>
>>>             ...
>>>          </event>
>>>          <trace_info>
>>>             <delta>0.05</delta>	
>>>             ...
>>>          </trace_info>
>>>       </header>
>>>       <data>
>>>          ...
>>>       </data>
>>>    </trace>
>>> </sacdataset>
>>>
>>> This has the advantage of still being easy to 'one-sweep' read with 
>>> event-driven parsers like SAX (because you simply ignore the 
>>> container elements), plus providing a more object-oriented format 
>>> for use with parsers like DOM or xpath. We might also want to 
>>> include/allow a subgrouping of traces within the file: a 
>>> <tracegroup> container element for example.
>>>
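
The 'one-sweep' reading described above can be sketched in Python with the standard xml.sax module. This is an illustration under assumptions, not the MacSAC reader: the sample document follows the structure proposed above with the <data> element left out, and the HeaderHandler class and CONTAINERS set are hypothetical names.

```python
import xml.sax

SAMPLE = b"""<sacdataset>
  <trace>
    <header>
      <station><kstnm>TEST</kstnm><stla>40</stla></station>
      <event><evla>-20</evla></event>
      <trace_info><delta>0.05</delta></trace_info>
    </header>
  </trace>
</sacdataset>"""

# Grouping elements that the handler simply skips over.
CONTAINERS = {"sacdataset", "trace", "header", "station", "event", "trace_info"}

class HeaderHandler(xml.sax.ContentHandler):
    """Collect header fields in a single pass, ignoring container elements."""

    def __init__(self):
        super().__init__()
        self.headers = []   # one dict of header fields per <trace>
        self._text = []     # character data seen since the last tag

    def startElement(self, name, attrs):
        if name == "trace":
            self.headers.append({})
        self._text = []

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        # Only leaf elements such as <kstnm> carry a header value.
        if name not in CONTAINERS:
            self.headers[-1][name] = "".join(self._text).strip()
        self._text = []

handler = HeaderHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.headers)
```

Because the grouping elements are skipped, the same handler would read a flat header with no <station>, <event> or <trace_info> wrappers just as well.
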
>>> Cheers,
>>>
>>> James
>>>
>>> On 31 Jan 2008, at 08:38, George Helffrich wrote:
>>>
>>>> Dear Rob -
>>>>
>>>> 	The XML DTD is versioned, and one could imagine defining a new DTD 
>>>> with alternative element groupings that would reflect the data 
>>>> structure.  One can obsess in describing data details, and one view 
>>>> won't necessarily coincide with another person or community's view. 
>>>>  Google Earth's KML comes to mind -- unbelievably baroque for 
>>>> putting points on a map for a seismologist, but probably glibly 
>>>> expressive for GISers.
>>>>
>>>> 	Another view to take of the data is programming semantics, 
>>>> however.  Programmers see 1) header variables that are peeked and 
>>>> poked at; 2) data.  That was the view I took of the present DTD 
>>>> definition.
>>>>
>>>> On 31 Jan 2008, at 00:12, Robert Casey wrote:
>>>>
>>>>>
>>>>> 	Hi George-
>>>>>
>>>>> 	An interesting effort with SAC XML and you've made a lot of 
>>>>> progress from the looks of it.  I hate to comment too harshly on 
>>>>> something that may already be an established standard, so my 
>>>>> comments are only meant as an observation:
>>>>>
>>>>> 	It seems to me that the SAC XML format only half-divorces itself 
>>>>> from fixed-format files due to the naming scheme for the header 
>>>>> elements you've provided.  Essentially, you've got just two entity 
>>>>> names inside of <trace> that have no semantic quality to them: 'h' 
>>>>> and 'd'.
>>>>>
>>>>> 	The nature of XML is such that the entities tend to be more 
>>>>> grouped and self-descriptive as far as their names go.  So instead 
>>>>> of having <h name="STEL">, why could it not instead be <STEL>  ?  
>>>>> To indicate this as a subcomponent of a SAC header of a SAC trace, 
>>>>> you'd form a hierarchy:
>>>>>
>>>>> <sacdataset>
>>>>> 	<trace>
>>>>> 		<header>
>>>>> 			<stel>
>>>>>
>>>>> 	Even better would be to just call it <elevation>, maybe with a 
>>>>> reference to its SAC field abbreviation as an attribute:  
>>>>> <elevation id="STEL">.
>>>>>
>>>>> 	The reason for the comment is not just for human readability, but 
>>>>> for the notion that many XML parsers will treat such elements as 
>>>>> objects, carrying the element name with them.  Having your fields 
>>>>> broken down into meaningful names means that your objects will be 
>>>>> more independent and have stronger encapsulation properties.
>>>>>
>>>>> 	If your example format is already set in stone, then please 
>>>>> continue with what works.  I just felt the floor was open to 
>>>>> address some naming aesthetics for consideration.  I can imagine 
>>>>> that writing a parsing engine for XML in Fortran is headache 
>>>>> enough, so I don't want to cause you a migraine on top of it.
>>>>>
>>>>> 	Cheers,
>>>>>
>>>>> 	-Rob
>>>>>
>>>>> On Jan 30, 2008, at 5:05 AM, George Helffrich wrote:
>>>>>
>>>>>> Dear All -
>>>>>>
>>>>>> 	I designed and implemented a subroutine interface to SAC XML 
>>>>>> datasets in the latest release of MacSAC.  This message is to 
>>>>>> make you aware of the design ideas for architectural comment.  I 
>>>>>> think that it shows the way forward to how SAC can move away from 
>>>>>> a purely binary data format to one that embraces current practice 
>>>>>> in structuring and delivering data.
>>>>>>
>>>>>> 	A test program illustrates the concepts.  Here is Fortran source 
>>>>>> code of an actual program used for testing during development:
>>>>>>
>>>> 								George Helffrich
>>>> 								george at geology.bristol.ac.uk
>>>>
>>>>
>> 								George Helffrich
>> 								george at geology.bristol.ac.uk
>>
>> _______________________________________________
>> sac-dev mailing list
>> sac-dev at iris.washington.edu
>> http://www.iris.washington.edu/mailman/listinfo/sac-dev
>>
>
								George Helffrich
								george at geology.bristol.ac.uk


