Hi, Philip.<div><br></div><div>Thanks for all of the info. I'm working on a set of rules for handling such updates and would like your thoughts on them when I'm done. It seems clear that there will always be exceptions, so I think EMERALD should include a way to automatically disseminate corrections when needed. </div>
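<div><br></div><div>As a rough illustration of the kind of rule I have in mind, here is a minimal sketch that follows your suggestion: permanent networks are matched on network code alone, temporary networks on network code plus begin year, and each match maps to a locally assigned integer id. The class and function names are hypothetical and not EMERALD's actual code, and the test for "temporary" (codes starting with X, Y, Z, or a digit) is my assumption about the usual FDSN convention. It does not handle exceptions such as the AF split, which is why corrections would still be needed.</div>
<pre>
```python
import itertools

# Assumption: temporary network codes start with X, Y, Z, or a digit.
TEMPORARY_PREFIXES = "XYZ0123456789"


def natural_key(code, start_year):
    """Return the key used to match an upstream record to a local row.

    Permanent networks are keyed on code alone, because their begin dates
    can move; temporary networks are keyed on code plus begin year.
    """
    if code[0] in TEMPORARY_PREFIXES:
        return (code, start_year)
    return (code, None)


class NetworkCache:
    """Hypothetical local cache with surrogate integer ids."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._id_by_key = {}  # natural key -> surrogate id
        self.rows = {}        # surrogate id -> (code, start_year, end_year)

    def upsert(self, code, start_year, end_year):
        """Store one upstream record and report how it was classified."""
        key = natural_key(code, start_year)
        net_id = self._id_by_key.get(key)
        if net_id is None:
            net_id = next(self._next_id)
            self._id_by_key[key] = net_id
            status = "new"
        elif self.rows[net_id] == (code, start_year, end_year):
            status = "unchanged"
        else:
            status = "modified"
        self.rows[net_id] = (code, start_year, end_year)
        return net_id, status
```
</pre>
<div>With this rule, extending XY 2005-2006 to end in 2007 would update the existing row, while XY 2007-2009 would get a new id. A permanent network such as BK would keep its id even if its begin date moved backwards.</div>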
<div><br></div><div>Incidentally, I'm a big believer in numeric surrogate primary keys on database tables and use them throughout EMERALD.</div><div><br></div><div>Thanks!</div><div><br clear="all"> -- John<br>
<br><br><div class="gmail_quote">On Tue, Jun 14, 2011 at 6:08 PM, Philip Crotwell <span dir="ltr"><<a href="mailto:crotwell@seis.sc.edu">crotwell@seis.sc.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Hi John<br>
<br>
I have had more than a few headaches along the lines of what you are<br>
describing. There is good news and bad news from my experiences. The<br>
good news is that mostly you can use the network code alone for<br>
permanent networks and network code and begin year for temporary<br>
networks, e.g. BK and XA2007 are mostly unique and fixed. The bad news<br>
is that even this is only "mostly" a unique identifier. In general I<br>
think the permanent network codes are single and unique, and temporary<br>
network codes are issued for a given begin year. While they may be<br>
extended, i.e. the end date changes, it would be really weird for the begin<br>
date to change.<br>
<br>
You should NOT use the begin date as part of the key for permanent<br>
networks as those have changed over the years. At some point in the<br>
past the begin time for permanent networks was dynamically determined<br>
from the earliest data at the DMC, not sure if that is still the case.<br>
So some networks were in the database with some data and then later<br>
they sent in additional "old" data, causing the begin times to move<br>
backwards. For example BK used to start in the 80s I think, but now<br>
starts in the 30s?<br>
<br>
More bad news is that the AF network (I think I am remembering<br>
correctly), a single permanent network, at some point split into two<br>
networks due to issues related to some data being restricted and some<br>
not. So my software started having real problems because it was coded<br>
to assume that the 2 char network code was unique for permanent<br>
networks and suddenly there were 2 distinct networks (at least at the<br>
software level) with the same code. I think there is work at the DMC<br>
to redo the notion of restricted data so that this bifurcation of that<br>
network will no longer be an issue in the future, but just pointing it<br>
out as an example of how limited the options are for creating a unique<br>
ID based on any of the data "in" a network. Basically all fields are<br>
subject to change, meaning nothing can be assured to be a unique id.<br>
Big :(<br>
<br>
I think this is the argument given way back when people were creating<br>
database normalization theories and arguing for meaningless integer<br>
database ids, because any ID based on real world data is subject to<br>
change and so can not be counted on for a good id.<br>
<br>
One more piece of bad news: the same problems that exist at the<br>
network level also exist at the station and channel level, except that<br>
they are even more likely to change.<br>
<br>
I should also say that this is not a fault of the DMC, they don't<br>
control when or how networks make changes to their metadata. But it is<br>
a problem nonetheless, as we simply do not have a globally unique,<br>
non-changing identifier for any of our metadata. You do the best you<br>
can and try to put code in to catch when things change. I have had<br>
very limited success and grumble with regularity about how hard it is<br>
to keep a metadata database in sync with the upstream one. It is just<br>
a really really hard problem with no good solutions as far as I can<br>
see. If you come up with a good answer please, please let me know.<br>
<br>
Good luck...<br>
<font color="#888888">Philip<br>
</font><div><div></div><div class="h5"><br>
On Tue, Jun 14, 2011 at 8:45 PM, Chad Trabant <<a href="mailto:chad@iris.washington.edu">chad@iris.washington.edu</a>> wrote:<br>
><br>
> Got it. The network start/end dates don't change often but on occasion they<br>
> do. I think the most common case is when a temporary network code is<br>
> extended to match an extended experiment time window. The only other useful<br>
> identifier that I can think of is the network description contained in the<br>
> &lt;Description&gt; tags, although that is subject to change as well but also<br>
> doesn't change often. Perhaps by checking the description you can figure<br>
> out when it's the same network versus something new more often than not.<br>
> Chad<br>
> On Jun 14, 2011, at 5:04 PM, John D. West wrote:<br>
><br>
> That was what I assumed from the output of the web service. The question is:<br>
> can a start date or end date EVER change? If an incorrect date is entered<br>
> and then later corrected, I end up with overlapping networks because network<br>
> code + start date + end date combine to form the unique identifier.<br>
> -- John<br>
><br>
><br>
> On Tue, Jun 14, 2011 at 4:58 PM, Chad Trabant <<a href="mailto:chad@iris.washington.edu">chad@iris.washington.edu</a>><br>
> wrote:<br>
>><br>
>> Hello.<br>
>><br>
>> In general, networks, like stations and channels, have the notion of a<br>
>> start time and an end time. For permanent networks there are normally not<br>
>> breaks in the continuity. For temporary networks there are often blocks of<br>
>> years allocated for specific experiments, for example XY 2005-2006, XY<br>
>> 2007-2009 and XY 2010-2010. We would not consider those temporary networks<br>
>> to be modifications of an existing network, but instead to be logically<br>
>> different networks. Essentially the network code combined with the start<br>
>> and end time uniquely identifies a "network"; when the dates change and the<br>
>> network code is recycled it should be considered a new network. Not sure I<br>
>> understood your question, did that help at all?<br>
>><br>
>> Chad<br>
>><br>
>> On Jun 14, 2011, at 2:00 PM, John D. West wrote:<br>
>><br>
>> > Hello.<br>
>> ><br>
>> > I'm using the station webservice in EMERALD to maintain a local cache of<br>
>> > network, station, and component metadata. At the Network level, reuse of<br>
>> > network codes makes it difficult to differentiate between new and modified<br>
>> > networks, e.g., if a network EndDate changes, my system registers it as a<br>
>> > new usage of the network code instead of modification of an existing<br>
>> > network.<br>
>> ><br>
>> > Is there some unique identifier for each network which can be included<br>
>> > in the web service?<br>
>> ><br>
>> > Thanks!<br>
>> ><br>
>> > -- John<br>
>> > _______________________________________________<br>
>> > webservices mailing list<br>
>> > <a href="mailto:webservices@iris.washington.edu">webservices@iris.washington.edu</a><br>
>> > <a href="http://www.iris.washington.edu/mailman/listinfo/webservices" target="_blank">http://www.iris.washington.edu/mailman/listinfo/webservices</a><br>
>><br>
><br>
><br>
><br>
> _______________________________________________<br>
> webservices mailing list<br>
> <a href="mailto:webservices@iris.washington.edu">webservices@iris.washington.edu</a><br>
> <a href="http://www.iris.washington.edu/mailman/listinfo/webservices" target="_blank">http://www.iris.washington.edu/mailman/listinfo/webservices</a><br>
><br>
><br>
</div></div></blockquote></div><br></div>