[webservices] A question of location ID, how to represent empty IDs in XML?

Chad Trabant chad at iris.washington.edu
Tue Aug 12 17:45:22 PDT 2014


On Aug 12, 2014, at 4:12 PM, Doug Neuhauser <doug at seismo.berkeley.edu> wrote:

> On 08/12/2014 02:31 PM, Chad Trabant wrote:
>> 
>> Hi Doug,
>> 
>> Thanks for your 2 cents.
>> 
>> Regarding only certain software being the problem with blank
>> location, I guess you did not like any of the others pointed out
>> here?
>> http://www.iris.washington.edu/pipermail/webservices/2014-July/000583.html
> 
> Most of these arguments are not related directly to stationxml, but to the
> empty location code.  However, those that are related to empty location code
> appear to be the inability to distinguish between an attribute that is not
> supplied vs an empty string attribute.  If you make the LocationCode optional,
> it seems like you are in the same boat.  If it is not specified, what do
> you use for location code?  blank-blank?  Then that is the same logic you
> use if your query does not return a location code.

I completely agree, mostly the same boat.  The points were: a) empty IDs have challenges that are not limited to any esoteric software and b) if a value is unset, why represent it with anything at all?  Playing devil's advocate: why is location special?
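Doug's point about "not supplied" versus "empty string" shows up directly in parser APIs. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The three `<Channel>` snippets are illustrative and do not come from any data center's output. `get()` returns `None` for an absent attribute, so a reader still has to choose a default for that case.

```python
import xml.etree.ElementTree as ET

# Illustrative <Channel> elements (not real data-center output):
# attribute absent, set to an empty string, and set to two spaces.
SNIPPETS = {
    "absent": '<Channel code="BHZ"/>',
    "empty": '<Channel locationCode="" code="BHZ"/>',
    "padded": '<Channel locationCode="  " code="BHZ"/>',
}

def raw_location(xml_text):
    """Return locationCode exactly as parsed, or None when the attribute is absent."""
    return ET.fromstring(xml_text).get("locationCode")

for label, text in SNIPPETS.items():
    print(label, repr(raw_location(text)))
# absent None
# empty ''
# padded '  '
```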

> Your example of:
> 	%{net}{sta}{loc}{chan} = "some lvalue"
> is not a good one, because there is no separation between components.
> How do you distinguish between
> 	net = G, sta = ABCD
> and	net = GA, sta = BCD?

Those are completely distinct values in a nested hash (they are not a concatenated string).  {G}{ABCD} is a different path than {GA}{BCD}.
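Here is a short Python illustration of the nested-hash point. Nested dictionaries stand in for the Perl hash, and each identifier gets its own key level, so {G}{ABCD} and {GA}{BCD} never collide. Joining the codes with no separator is what would collide. The `put` helper and the sample values are made up for the example.

```python
# Nested dict as a stand-in for the nested Perl hash in the example above.
index = {}

def put(net, sta, loc, chan, value):
    """Store value under its own net -> sta -> loc -> chan path."""
    index.setdefault(net, {}).setdefault(sta, {}).setdefault(loc, {})[chan] = value

put("G", "ABCD", "00", "BHZ", "first")
put("GA", "BCD", "00", "BHZ", "second")

# Distinct paths, distinct entries.
print(index["G"]["ABCD"]["00"]["BHZ"])   # first
print(index["GA"]["BCD"]["00"]["BHZ"])   # second

# Concatenating the codes without a separator is what would collide.
print("G" + "ABCD" == "GA" + "BCD")      # True
```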

>> If you want a non-Oracle database example, the 'ltree' data type in
>> Postgres is a natural fit for N.S.L.C hierarchical data and it cannot
>> take a blank identifier either.  I do not see how the number of pain
>> points with empty identifiers will not grow over time.
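The ltree constraint mentioned above can be mimicked outside the database. This minimal Python sketch assumes the label rule from the PostgreSQL ltree documentation: labels are non-empty and made of letters, digits and underscores. Newer releases may accept more characters, so check the rule for your version. `to_ltree_path` is a hypothetical helper, not part of any library. It shows that neither an empty nor a two-space location ID can become a label without some mapping first.

```python
import re

# Assumed ltree label rule: a non-empty run of letters, digits or underscores.
# (Check the PostgreSQL ltree docs for your version; newer releases may allow more.)
LABEL = re.compile(r"[A-Za-z0-9_]+")

def to_ltree_path(net, sta, loc, chan):
    """Hypothetical helper: join N.S.L.C into an ltree-style dotted path."""
    parts = [net, sta, loc, chan]
    for part in parts:
        if not LABEL.fullmatch(part):
            raise ValueError(f"invalid ltree label: {part!r}")
    return ".".join(parts)

print(to_ltree_path("IU", "ANMO", "00", "BHZ"))   # IU.ANMO.00.BHZ

for loc in ("", "  "):
    try:
        to_ltree_path("IU", "ANMO", loc, "BHZ")
    except ValueError as err:
        print(err)
# invalid ltree label: ''
# invalid ltree label: '  '
```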
>> 
>> The proposal for a "--" location ID was to change SEED by starting
>> with StationXML as a transition. The first step could be done without
>> changing all the miniSEED in all the archives, the next step could be
>> done with a future revision in miniSEED. This would require mapping,
>> which we are already doing for requests and will continue to do
>> indefinitely. For sure this would be non-trivial change over time,
>> the question is whether it is worth it or not.
> 
> I don't see anything in your original proposal about changing SEED.
> I only see a proposal to change the SEED representation in StationXML.

Indeed, the 2nd step was not explicitly described; it would be another proposal.

>> If we are going to continue to shoot ourselves in the foot with unset
>> location IDs, let's do so with clear eyes; the problems are not
>> limited to esoteric software or use cases. Also, a blank string is
>> not the only choice, more on that next.
> 
> I agree with the above statement.  However, trying to address the issue
> just within StationXML I think is just another bandaid, and I don't see
> why the StationXML needs this bandaid.

Well, the "bandaid" would be a first step away from empty location IDs.  You agree they are problematic but the solution is not radical enough?  You prefer all the changes proposed at once, fair enough.

> Since StationXML does not appear to need this bandaid, I don't understand
> the need.
> 
> IF YOU WANT TO CHANGE SEED, THEN PROPOSE TO CHANGE SEED.

It depends on what you mean by SEED.  StationXML IS SEED for most intents and purposes.  Changing all aspects of SEED at once is a much larger can of worms, and this would be an opportune time to change just the StationXML representation of SEED.  Over the next couple of months and years as folks convert from dataless SEED to StationXML, an opportunity exists to make such low level changes, which will get much harder once the adoption is farther along.

Chad

> - Doug N
> 
>> Chad
>> 
>> PS. The TA started 10 years ago and followed common conventions at the time; that network now has many non-blank IDs.  The GSN has converted to using few or no blank IDs and, ironically?, the BK network appears to use many non-blank location IDs too.  Not sure how important it is, but it does show the trend towards an increased use of non-blank location IDs.
>> 
>> On Aug 12, 2014, at 10:53 AM, Doug Neuhauser <doug at seismo.berkeley.edu> wrote:
>> 
>>> I've been following this thread, and thought it was time to chime in.
>>> 
>>> IMHO, the FDSN web services should follow the SEED convention.
>>> The SEED convention states that station, network, channel, and location
>>> are all blank-padded fields of fixed lengths.
>>> To me, this means that we should either use the full blank-padded
>>> fields for ALL of these identifiers, or for none of them.
>>> 
>>> eg:
>>> 
>>> <Network code="G " >
>>> <Station code="KIP  ">
>>> <Channel locationCode="  " code="BHZ">
>>> 
>>> or
>>> 
>>> <Network code="G" >
>>> <Station code="KIP">
>>> <Channel locationCode="" code="BHZ">
>>> 
>>> Personally I think the latter (blank trimmed) is better.
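Doug's two styles differ only in padding. Below is a minimal Python sketch using the standard library's ElementTree. The snippets are the ones above, closed so they parse. It shows that the two documents name different channels unless the reader trims every code. The all-or-none convention would make that trimming rule uniform.

```python
import xml.etree.ElementTree as ET

# The two styles from the example above, made well-formed for parsing.
PADDED = ('<Network code="G "><Station code="KIP  ">'
          '<Channel locationCode="  " code="BHZ"/></Station></Network>')
TRIMMED = ('<Network code="G"><Station code="KIP">'
           '<Channel locationCode="" code="BHZ"/></Station></Network>')

def channel_ids(xml_text, trim):
    """Return (net, sta, loc, chan) tuples, optionally stripping blank padding."""
    def clean(code):
        return code.strip() if trim else code
    net = ET.fromstring(xml_text)
    return [
        (clean(net.get("code")), clean(sta.get("code")),
         clean(cha.get("locationCode")), clean(cha.get("code")))
        for sta in net.iter("Station")
        for cha in sta.iter("Channel")
    ]

print(channel_ids(PADDED, trim=False) == channel_ids(TRIMMED, trim=False))  # False
print(channel_ids(TRIMMED, trim=False))  # [('G', 'KIP', '', 'BHZ')]
print(channel_ids(PADDED, trim=True) == channel_ids(TRIMMED, trim=False))   # True
```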
>>> 
>>> I agree that the blank location code is a pain when dealing with
>>> Oracle, white-space delimited fields such as command lines, etc,
>>> but unless we change the SEED convention, I don't see that making
>>> an alias of "-" or "--" in FDSN station XML improves the situation.
>>> 
>>> AFAIK, the ONLY reason that we struggle with the two-blank issue is
>>> that certain software (eg Oracle) cannot distinguish between
>>> the empty string (string of length 0) and NULL.  Therefore, the DMC,
>>> NCEDC, AQMS, etc have been forced to not use a blank-trimmed string
>>> for the location code.
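The Oracle behavior can be modeled in a couple of lines. In this minimal Python sketch, `store` is a hypothetical function that mimics a database treating a zero-length string as NULL. Oracle documents this behavior for character values such as VARCHAR2. The sketch shows why a blank-trimmed location code cannot round-trip, and why two blanks became the stored form.

```python
def store(value):
    """Hypothetical stand-in for a column type that, like Oracle's VARCHAR2,
    stores a zero-length string as NULL (None here)."""
    return None if value == "" else value

# A trimmed empty location and an unset location become indistinguishable.
print(store("") is None, store(None) is None)   # True True
# The workaround described above: store two blanks so the value is never empty.
print(repr(store("  ")))                         # '  '
```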
>>> 
>>> Unless we propose to change the SEED standard, all of our data in
>>> our archives, and all of our current acquisition systems, I think
>>> that we have to live with "empty" location codes.
>>> 
>>> I have not seen any compelling argument for representing a blank (empty)
>>> location code in FDSN station XML as anything but the empty string.
>>> 
>>> If you want to have "" and "  " be equivalent in FDSN station XML,
>>> you can simply change the schema definition of the field to be a "token"
>>> rather than a "string", in which case any representation with blanks will
>>> be reduced to the empty string.  Problem solved?
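Here is what the token type would do. This minimal Python sketch implements the XML Schema `whiteSpace="collapse"` rule, which the built-in `xs:token` type applies. It models only the normalization. A schema-aware processor applies the rule to the typed value. A plain, non-validating parser such as Python's ElementTree still returns the raw attribute text, so readers that skip validation would see "  " unchanged.

```python
import re

def collapse_whitespace(value):
    """XML Schema whiteSpace="collapse" (the facet xs:token uses)."""
    value = re.sub(r"[\t\n\r]", " ", value)  # replace tab/LF/CR with a space
    value = re.sub(r" {2,}", " ", value)      # collapse runs of spaces
    return value.strip(" ")                   # drop leading/trailing spaces

print(repr(collapse_whitespace("  ")))   # ''
print(repr(collapse_whitespace("")))     # ''
print(repr(collapse_whitespace(" 00")))  # '00'
```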
>>> 
>>> I note that the NCEDC implementation currently uses 1 blank " "
>>> for empty location code.  I have no problem changing this if we can
>>> agree on a convention.
>>> 
>>> I also note ironically that the TA network run by IRIS is one of the
>>> largest networks in terms of stations, and uses blank location codes.
>>> 
>>> My 2 cents...
>>> 
>>> - Doug N
>>> 
>>> On 07/23/2014 10:30 AM, Chad Trabant wrote:
>>>> 
>>>> Hello WS users and developers,
>>>> 
>>>> A recent discussion between FDSN data centers is centered on
>>>> representation of empty location IDs in StationXML, the default
>>>> format returned by the fdsnws-station web service. The DMC may be
>>>> changing how it represents location ID in XML and text formats based
>>>> on these discussions. We are asking for input as any such change will
>>>> affect users of our metadata service.
>>>> 
>>>> Some background: In the SEED channel naming scheme there is a
>>>> hierarchy of network, station, location and channel identifiers. Of
>>>> these, it is only the location ID that is commonly accepted to be
>>>> empty. In the SEED format the location ID is a two-character field,
>>>> where the value is left justified and padded with spaces if needed.
>>>> When the value is empty the field is simply two spaces of padding.
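This minimal Python sketch shows the fixed-width rule; the function names are illustrative. Padding an empty location ID yields two spaces, and trimming those two spaces yields the empty string. That is exactly the pair of spellings at issue.

```python
LOC_WIDTH = 2  # width of the SEED location identifier field

def to_seed_field(loc):
    """Left-justify a location ID and pad it with spaces to the field width."""
    if len(loc) > LOC_WIDTH:
        raise ValueError(f"location ID longer than {LOC_WIDTH}: {loc!r}")
    return loc.ljust(LOC_WIDTH)

def from_seed_field(field):
    """Strip the padding, as a strict reading of SEED does."""
    return field.rstrip(" ")

print(repr(to_seed_field("00")))    # '00'
print(repr(to_seed_field("")))      # '  '
print(repr(from_seed_field("  ")))  # ''
```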
>>>> 
>>>> Historically, and presumably to avoid having an empty location ID,
>>>> the DMC has represented “empty” location IDs as a string of two
>>>> spaces. Following this practice, we express this in StationXML by
>>>> setting the locationCode attribute to a string of two spaces. We have
>>>> done this so long we sometimes forget that it is not compliant with a
>>>> strict reading of SEED; at best it falls into the vagaries of SEED.
>>>> On the other hand, we have been doing it for years with no apparent
>>>> problems (in fact it has helpfully avoided an empty core
>>>> identifier).
>>>> 
>>>> There now exists another fdsnws-station implementation that returns
>>>> StationXML with the locationCode attribute set to an empty string
>>>> when the SEED value is empty. The justification is that this follows
>>>> the SEED rules of trimming the padding spaces from the values.
>>>> 
>>>> Unfortunately this means there are now flavors of StationXML that are
>>>> incompatible in the core channel name identifiers. In other words,
>>>> two StationXML documents for the same SEED channel appear, without
>>>> extra field translation, to be different channels.
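The "extra field translation" amounts to something like the following minimal Python sketch. Two choices are assumptions for illustration. The first is the set of spellings: two spaces, the empty string, the single blank the NCEDC was using, and "--". The second is the use of "" as the canonical form. Any one spelling would do, as long as every reader applies the same rule.

```python
# Spellings of an empty location ID mentioned in this thread (assumed list).
EMPTY_SPELLINGS = {"", " ", "  ", "--"}

def canonical_location(loc):
    """Map any empty-location spelling to one canonical form (here: "")."""
    if loc is None:
        raise ValueError("locationCode absent: caller must decide what that means")
    return "" if loc in EMPTY_SPELLINGS else loc

def same_channel(a, b):
    """Compare (net, sta, loc, chan) tuples after canonicalizing the location."""
    return (a[:2] + (canonical_location(a[2]),) + a[3:]
            == b[:2] + (canonical_location(b[2]),) + b[3:])

dmc = ("IU", "ANMO", "  ", "BHZ")    # two-space flavor
other = ("IU", "ANMO", "", "BHZ")    # empty-string flavor
print(dmc == other)                  # False
print(same_channel(dmc, other))      # True
```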
>>>> 
>>>> As most of you are users of SEED and StationXML metadata (at some
>>>> level) and some of you have written code to parse these formats and
>>>> manage the data returned by the DMC and other FDSN data centers, we
>>>> are asking for your input regarding the potential solutions.
>>>> 
>>>> Here are the options being considered for mapping an empty location
>>>> ID in SEED to StationXML:
>>>> 
>>>> 1) Set locationCode to two spaces. While the DMC and users have been
>>>> using this for a long while, it is not precisely the SEED value (but
>>>> the mapping could be formalized). Also, whitespace in attributes does
>>>> have some theoretical challenges: the wonky rules for XML attributes
>>>> related to whitespace handling require removal of spaces in some
>>>> cases (we have never heard of problems though).
>>>> 
>>>> 2) Set locationCode to an empty string. This would match the strict
>>>> value present in SEED, an empty identifier.
>>>> 
>>>> 3) Set locationCode to “--” (two dashes). This avoids issues with
>>>> whitespace in XML attribute values and avoids issues with an empty
>>>> identifier. Also, this matches the request mechanisms where “--” is
>>>> accepted as a synonym for an empty location ID.
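This minimal Python sketch covers the three mappings; the enum and function names are illustrative. It also shows the whitespace-delimited problem raised elsewhere in the thread. When the identifiers are joined with spaces and split again, only the dashed form keeps four fields. `decode_request_location` handles a single value only and leaves out the lists and wildcards that the request parameter also accepts.

```python
from enum import Enum

class EmptyLocStyle(Enum):
    """Candidate StationXML spellings for an empty SEED location ID."""
    TWO_SPACES = "  "  # option 1
    EMPTY = ""         # option 2
    DASHES = "--"      # option 3

def encode_location(seed_loc, style):
    """Map a trimmed SEED location ID to a locationCode attribute value."""
    return style.value if seed_loc == "" else seed_loc

def decode_request_location(param):
    """Single request value only: treat "--" as the empty location ID."""
    return "" if param == "--" else param

print(repr(decode_request_location("--")))  # ''

# How each option survives a whitespace-delimited line, e.g. a command line.
for style in EmptyLocStyle:
    line = " ".join(["IU", "ANMO", encode_location("", style), "BHZ"])
    print(style.name, len(line.split()))
# TWO_SPACES 3
# EMPTY 3
# DASHES 4
```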
>>>> 
>>>> All of these solutions are viable in that we can make them work in
>>>> code, it is a matter of choosing one for future FDSN metadata, pick
>>>> your poison so to speak.
>>>> 
>>>> In my personal opinion, an empty location ID is an unfortunate quirk
>>>> of SEED that we should rectify in StationXML. An empty identifier can
>>>> be confused for “unknown” if the programmer is not careful, which is
>>>> semantically very different than “set to empty”. The two-space
>>>> strings that the DMC is currently using are also not ideal; they are
>>>> hard for humans to read and potentially weird with XML rules. The
>>>> dashed location ID avoids these issues but requires the most change.
>>>> I also think requiring all readers of StationXML to translate (e.g.
>>>> remove padding) is a bad idea; the values in SEED should be uniquely
>>>> mapped to values in StationXML.
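This minimal Python sketch shows the "unknown" versus "set to empty" trap; both function names are made up. A truthiness check is a common idiom, but it silently treats an empty location ID as missing. An explicit `None` test keeps the two cases apart.

```python
def describe_careless(loc):
    # A truthiness test lumps "" (set to empty) together with None (unknown).
    return "unknown" if not loc else loc

def describe(loc):
    """Keep 'unknown' (None) distinct from 'set to empty' ("")."""
    if loc is None:
        return "unknown"
    if loc == "":
        return "empty"
    return loc

print(describe_careless(""), describe_careless(None))  # unknown unknown
print(describe(""), describe(None))                    # empty unknown
```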
>>>> 
>>>> Thanks for reading this far.  Your opinion and input is appreciated.
>>>> 
>>>> regards,
>>>> Chad
>>>> 
>>>> 
>>>> _______________________________________________
>>>> webservices mailing list
>>>> webservices at iris.washington.edu
>>>> http://www.iris.washington.edu/mailman/listinfo/webservices
>>>> 
>>> 
>>> --
>>> ------------------------------------------------------------------------
>>> Doug Neuhauser                  University of California, Berkeley
>>> doug at seismo.berkeley.edu        Berkeley Seismological Laboratory
>>> Office: 510-642-0931            215 McCone Hall # 4760
>>> Fax:    510-643-5811            Berkeley, CA  94720-4760
>>> Remote: 530-752-5615 (Wed,Fri)
>>> 
