Thread: Re: A question of location ID, how to represent empty IDs in XML? (Chad Trabant)

Started: 2014-07-28 20:35:16
Last activity: 2014-07-28 20:35:16
Topics: Web Services
Anthony, I agree entirely

It seems to me that with modern languages a string that is empty or has
1-N spaces is the same thing.....A null string is not the same. So an empty
or blank string is the same, valid location code, and null is undefined or
uninitialized location code.

With regards to the "--" pseudo for the location code, is this not needed
because sometimes it is not possible or difficult to represent an empty
string or even a string?...

Just my 0.02EUR = $0.0268

Whatever may be used to display the location code to the user (' ',
'__', '--') seems kind of irrelevant. The empty location code is just
that, an empty string, and no program, or programmer, should need to change
an empty string into something else. "--" is not an empty string and has a
different value from an empty string.

I can't personally see what the problem is with empty, as opposed to NULL.

Best regards
David

-----Original Message-----
From: webservices-bounces<at>iris.washington.edu
[webservices-bounces<at>iris.washington.edu] On Behalf Of
webservices-request<at>iris.washington.edu
Sent: 28 July 2014 13:06
To: webservices<at>iris.washington.edu
Subject: webservices Digest, Vol 40, Issue 14

Today's Topics:

1. Re: A question of location ID, how to represent empty IDs in
XML? (Anthony Lomax)
2. Re: A question of location ID, how to represent empty IDs in
XML? (Joachim Saul)
3. Re: A question of location ID, how to represent empty IDs in
XML? (Joachim Saul)


----------------------------------------------------------------------

Message: 1
Date: Mon, 28 Jul 2014 09:59:18 +0200
From: Anthony Lomax <alomax<at>free.fr>
To: IRIS Web Services List <webservices<at>iris.washington.edu>
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?
Message-ID: <53D602D6.7060604<at>free.fr>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

Hello all,

Can someone give a concise statement of the original problem being
discussed? Is it only or primarily a concern about XML?

It seems to me that with modern languages a string that is empty or has 1-N
spaces is the same thing - there is often an implicit or explicit
trim() function hiding in a processing pipeline. A null string is not the
same. So an empty or blank string is the same, valid location code, and
null is an undefined or uninitialized location code.

With regard to the "--" pseudo-value for the location code, is this not needed
because it is sometimes impossible or difficult to represent an empty
string, or even a string? For example on the command line or in a RESTful WS
URI? (Or a URI on the command line!) So it may be that the use of "--" for
intermediate processing and requests could be tolerated and somehow made
official, while empty or blank-only strings remain the official form for
persistent data.
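
A minimal sketch of this idea, assuming that "--" is only ever a request-side stand-in and that stored data uses the empty string. The function names and the query parameters are hypothetical, for illustration only; null (None) is kept separate as "undefined".

```python
from typing import Optional
from urllib.parse import urlencode

PLACEHOLDER = "--"  # request-side stand-in for an empty location code


def normalize_location(code: Optional[str]) -> Optional[str]:
    """Map "", blank-only strings and "--" to ""; keep None as None (undefined)."""
    if code is None:
        return None
    stripped = code.strip()
    return "" if stripped == PLACEHOLDER else stripped


def location_for_request(code: str) -> str:
    """Render a stored location code for a URI or a command line."""
    return normalize_location(code) or PLACEHOLDER


# Hypothetical query parameters; only the "loc" handling matters here.
query = urlencode(
    {"net": "XX", "sta": "ABC", "loc": location_for_request("  "), "cha": "BHZ"}
)
print(query)
```

With this split, "--" never reaches persistent data: it is produced only when a request is built and is mapped back to the empty string as soon as a value is read.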

Just my 0.02EUR = $0.0268

Best regards to all,

Anthony


On 27/07/2014 04:52, Chad Trabant wrote:
Hi Marcelo,

Thanks for your thoughts as well. Something that you and Joachim are not
addressing is the concern about an empty ID that has been brought up by
more than one person. The answer that empty strings are technically
possible and it all works in Python/SeisComP is less than satisfying. The
observations from Python, ObsPy and SeisComP are a few of many that need to
be taken into account.

I agree that there is a long tail consideration for the "--" location ID
solution. If you understand that some folks find an empty ID to be problematic
regardless of whether it is XML, SEED, text, whatever, then you might see
where this proposal comes from. Yes, we would need to treat empty location
IDs and "--" as synonyms for a very long time. Empty strings in XML mean
you will need to map empty IDs to empty strings, NULL and whatever an XML
parser might or might not produce for a long time as well (think beyond
Python and SeisComP). Either is possible, only one of them is a unique
mapping.

If the main consideration is the least amount of disruption, then the
answer is obvious to me: the FDSN can sanction that the two-space string is
the XML synonym for the empty SEED location ID and we adjust the schema to
make sure a string of whitespace is preserved. Then SeisComP can change
its relatively new StationXML implementation and ALL existing clients will
be compatible with all metadata and, most importantly, we would have
consistent metadata.

If the empty string ID representation is adopted it would, in
effect, mean that the DMC would need to change its metadata service and
(more importantly) all users of the DMC's metadata service would need to
transition to a new metadata channel naming scheme. This is certainly not
out of the question, but it is not something we would do without careful
consideration. I do not find the two-space strings all that great, but they
are here and something the DMC and users of the DMC have dealt with. Issues
have been identified with empty location IDs by us and our users. If DMC is
going to change, and push the change on all users of the DMC's StationXML,
it would be much more compelling to have a solution that addresses the low
level issues.

regards,
Chad


----- Original Message -----
From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
Sent: Friday, July 25, 2014 7:38:17 PM
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?

Hi Philip and All,

I totally agree with Joachim; I was planning to answer but he was much
faster. What you guys are proposing is not a solution. StationXML
supports the empty string nicely, and it is not null. There is a type
difference here in Python and in any other language, and it can be
handled nicely internally.

Also, the location ID is not just a string; it is a key that links
miniSEED to metadata, and making an exception at this level just
because a user interface cannot properly render it without ambiguity
does not sound like a proper approach. I am not in favor of
creating an exception that will have to be carried along for
decades to come. Alternative solutions for this issue should be
sought in the end-user interface.

with my best regards,

Marcelo Bianchi
--


2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:
It sounds like you are saying "change is hard, so we shouldn't do it".
I would argue that change is hard and so if we don't do it now it
will never happen. StationXML is new enough that there is already a
disruption, we should seize the chance. If we do not do something now
about null loc ids, it will be a decade or two before we get another
chance.

It is time to drive the stake through the heart of null location ids.
Kill the evil while we have a chance.

Philip


On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de>
wrote:
Hello Rob,

Rob Newman wrote on 24.07.2014 18:51:
For what it's worth, I would also vote for the "--" standard. To
quote from the Zen of Python
<http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html>
(my language of choice):


"Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced."

I'd add "Compatible is better than incompatible." :)


Number 2 is especially relevant here:
"Explicit is better than implicit."

My favorite would be:

"Special cases aren't special enough to break the rules."

Quoted whitespace and nulls are painful. Code what you mean, and
mean what you code. It's easier for everyone.

But what if we simply *mean* "empty string"?

The issue is not about beauty, pain or ease. It's about standard
conformance. We already have a channel naming standard. If a new
data format cannot accommodate existing channel naming, then the new
format is flawed.
But that's not even the case here...

An XML document that contains

<Channel locationCode="" ...

is not malformed. There's an attribute that *explicitly* contains an
empty string and a parser has to produce it as such. Not as null,
nil or none, but as an empty string. Otherwise the parser is broken
and needs to be fixed, not the data!
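
As one illustration of this point, here is a small sketch using Python's standard xml.etree.ElementTree. The XML fragment is hypothetical and heavily abbreviated (real StationXML uses namespaces and more attributes); it only shows that an attribute that is present but empty is distinguishable from an attribute that is absent.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated fragment: one channel with an explicitly empty
# locationCode and one channel where the attribute is missing altogether.
doc = (
    '<Station code="ABC">'
    '<Channel code="BHZ" locationCode=""/>'
    '<Channel code="BHN"/>'
    "</Station>"
)

channels = ET.fromstring(doc).findall("Channel")
present_but_empty = channels[0].get("locationCode")
absent = channels[1].get("locationCode")

print(repr(present_but_empty))  # '' (an empty string)
print(repr(absent))  # None (the attribute does not exist)
```

Other parsers and language bindings may behave differently; this shows only one parser that keeps the two cases apart.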

Again: It's not about beauty. We all agree that current channel
naming is not particularly beautiful and has limitations. But our
business is not to try to solve that issue now and here.

Cheers
Joachim

_______________________________________________
webservices mailing list
webservices<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/webservices


--
Sent from my iClayTablet

------------------------------------------------------------------------

*Anthony Lomax*
*161 Allée du Micocoulier, 06370 Mouans-Sartoux, France*
*tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net
<anthony<at>alomax.net> web: http://www.alomax.net
http://www.alomax.net/ *

*Twitter: * *@ALomaxNet http://twitter.com/ALomaxNet* *Science & Special
Topics: * *http://www.alomax.net/science*
*Software: * *http://www.alomax.net/software* *- updates: *
*https://twitter.com/ALomaxNet*
------------------------------------------------------------------------

------------------------------

Message: 2
Date: Mon, 28 Jul 2014 13:51:27 +0200
From: Joachim Saul <saul<at>gfz-potsdam.de>
To: IRIS Web Services List <webservices<at>iris.washington.edu>
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?
Message-ID: <53D6393F.1050005<at>gfz-potsdam.de>
Content-Type: text/plain; charset=windows-1252; format=flowed

Hi Chad

Chad Trabant wrote on 27.07.2014 04:52:
The answer that empty strings are technically possible and it all
works in Python/SeisComP is less than satisfying. The observations
from Python, ObsPy and SeisComP are a few of many that need to be
taken into account.

Please name a few. Not abstract claims or hearsay. Point us to client code
that cannot parse an empty location code; only then can someone take a
closer look at the matter and quite possibly provide help.

Yes, we would need to treat empty location IDs and "--" as synonyms
for a very long time. Empty strings in XML mean you will need to map
empty IDs to empty strings, NULL and whatever an XML parser might or
might not produce for a long time as well (think beyond Python and
SeisComP). Either is possible, only one of them is a unique mapping.

I don't accept the parser issues unless you provide examples; see above.

In general mappings are not the problem and are widely used anyway. Can you
name a single piece of software that, when reading (Mini)SEED, does *not* map
the location code from "  " to ""? Even libmseed does!

So why not be consistent and do the same when parsing XML? It would solve
the current issues. You can then keep your two spaces as long as you like.
;)
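
A minimal sketch of such a mapping, assuming a fixed-width, space-padded two-character location field as in (Mini)SEED. The function names are hypothetical and this is not libmseed's code; it only illustrates trimming on read and padding on write.

```python
def location_from_seed_field(raw: bytes) -> str:
    """Decode a fixed-width, space-padded location field into a trimmed string."""
    return raw.decode("ascii").strip()


def location_to_seed_field(code: str, width: int = 2) -> bytes:
    """Pad a location code back to the fixed field width for writing."""
    if len(code) > width:
        raise ValueError(f"location code {code!r} is longer than {width} characters")
    return code.ljust(width).encode("ascii")


# The two-space field and the empty string round-trip into each other.
print(repr(location_from_seed_field(b"  ")))
print(repr(location_to_seed_field("")))
```

The same trimming step could be applied to a locationCode value read from XML, so that "  " and "" end up as the same key internally.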

If the main consideration is the least amount of disruption, then
the answer is obvious to me: the FDSN can sanction that the two-space
string is the XML synonym for the empty SEED location ID and we adjust
the schema to make sure a string of whitespace is preserved.
Then SeisComP can change its relatively new StationXML implementation
and ALL existing clients will be compatible with all metadata and,
most importantly, we would have consistent metadata.

Chad, this whole discussion started back in early January with your
complaint about the SeisComP fdsnws server implementation. You were alleging
that 'The resulting StationXML includes empty location IDs
(locationCode=""), this is not allowed in SEED and therefore not allowed in
StationXML.' If the SeisComP server were indeed producing wrong XML it would
have been corrected long ago. But that's not the case! It's actually
SeisComP that produces the more correct FDSN StationXML compared to IRIS
XML, not only w.r.t. locationCode.

Don't you think it is now time to roll up the sleeves and make your client
codes work with standard-compliant FDSN StationXML rather than doctoring an
FDSN standard?

If the empty string ID representation is adopted it would, in
effect, mean that the DMC would need to change its metadata service
and (more importantly) all users of the DMC's metadata service would
need to transition to a new metadata channel naming scheme. This is
certainly not out of the question, but it is not something we would do
without careful consideration. I do not find the two-space strings
all that great, but they are here and something the DMC and users of
the DMC have dealt with. Issues have been identified with empty
location IDs by us and our users. If DMC is going to change, and push
the change on all users of the DMC's StationXML, it would be much more
compelling to have a solution that addresses the low level issues.

Did you read my email of Thursday, 18:43 UTC? Following the ideas I outlined
there, you are technically *not* required to change any of your servers.
Only a few client codes are actually affected and even I was able to make
the changes in one of those in 10 minutes. Of course, in total it will take
longer, but if specific problematic cases related to parsing are identified
and discussed, I am sure solutions can be found quickly. We have this list,
we have skilled and enthusiastic people working on this, so why not use this
as a platform even for more technical discussions? Or how about creating a
"developer's corner", webservices-devel or so?

Cheers
Joachim


------------------------------

Message: 3
Date: Mon, 28 Jul 2014 14:05:30 +0200
From: Joachim Saul <saul<at>gfz-potsdam.de>
To: IRIS Web Services List <webservices<at>iris.washington.edu>
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?
Message-ID: <53D63C8A.8090609<at>gfz-potsdam.de>
Content-Type: text/plain; charset=windows-1252; format=flowed

Philip Crotwell [07/25/14 15:35]:
It sounds like you are saying "change is hard, so we shouldn't do it".

That depends very much on the kind of change, I would say. The change that is
currently being discussed is a hack that might help XML parser developers,
with hefty repercussions otherwise.

If that is the change, it indeed shouldn't be done.

What I would highly welcome and support is a mature, future-proof channel
naming concept (involving network codes, too!) with a clear implementation
roadmap. There have been attempts in this direction, led by the USGS and the
ISC, but they are not reflected in current FDSN StationXML.

Cheers
Joachim



