| RELEASE NOTES FOR NETWORKED DATA CENTERS (NetDC)

Robert Casey

INTRODUCTION

This is perhaps the first major revision of NetDC since it was first put into circulation. During the initial stages of beta testing, the automatic route table updating system was dropped because of bugs, and participating sites worked on code to interface their information systems to NetDC. Since that time, testing of the initial system has demonstrated NetDC's potential to be fast and stable, while exposing shortcomings in the capabilities offered by the initial implementation. Version updates such as this one are meant to remedy these shortcomings and to increase NetDC's utility.

This document describes the changes made in this version of NetDC and serves as a guide for upgrading code at an installation site. Those who are newly installing NetDC with the latest version will get a more complete picture from the updated Technical Manual, which can be requested free of charge from IRIS DMC by phone (USA 206-547-0393) or email (webmaster@iris.washington.edu). There is also online documentation for administrators and users alike, which can currently be found at www.iris.washington.edu by following the Manuals link in the left sidebar. The documentation is labeled with the version it refers to, should the latest update not be immediately available.

LIST OF CHANGES
From Initial Release to v1.0.3
Data request lines should now interpret the dash character - as explicitly meaning the SPACE (ASCII 32) character in a channel's location identifier. SPACE-SPACE (two spaces) is the default location identifier for a channel when no specific location identifier is listed. One dash and two dashes are interpreted as meaning the same thing, since a trailing space is assumed for single-character location identifiers. This change is mostly reflected in the process_*.c source code, which means that all installing sites need to implement this convention in their own code. The IRIS Data Management Center and the PDCC (Portable Data Collection Center) interface are both implementing this interpretation. There are currently a couple of issues that still need to be worked out at IRIS DMC with regard to this:
The size in kilobytes of each data product shipped from a given site is recorded in netdc_activity.log.
To prevent NetDC from falling into an email cascade, the code checks whether the sending user put a NetDC email address in the .EMAIL line. This check initially defaulted to "netdc@". However, some sites have special account names for NetDC operations, so email blocking was expanded to a customizable, space-separated list of email names that the software checks against. If a match comes up, an error is reported and processing is halted. This feature is configured through the MAIL_BLOCK variable in the Makefile.
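The blocking check can be pictured with a short sketch. This is written in Python for brevity and is not the NetDC C code. The function name is hypothetical, and comparing the full account name before the @ sign is an assumption about how the names in MAIL_BLOCK are matched.

```python
def is_blocked_sender(email_address, mail_block="netdc"):
    """Return True if the account name of email_address is in the block list.

    mail_block mirrors the space-separated MAIL_BLOCK value from the Makefile.
    Exact, case-insensitive matching of the account name is an assumption.
    """
    blocked = {name.lower() for name in mail_block.split()}
    account = email_address.strip().split("@", 1)[0].lower()
    return account in blocked


# A request whose .EMAIL line points back at a NetDC account is refused.
if is_blocked_sender("netdc@example.org", "netdc netdc_ops"):
    print("error: .EMAIL names a NetDC account, request halted")
```

The addresses and the second account name above are made up for the example.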
It was decided that too much information was coming back when information on data centers was requested. Therefore, the format of the output was condensed to include just the following fields: DC_NAME EMAIL DATA_CENTER_NAME ADDRESS CONTACT PHONE CONTACT_EMAIL

Redundant data center output is now filtered out, so users should get just a single entry for each data center when requesting a list of data centers. A routine called dclist.c was created to report the data center information back, replacing what was formerly in process_inv.c. process_inv.c now just makes a call to the function in dclist.c to display the data center information. An example can be found in the code provided in the new release.
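As an illustration of the filtering described here, the following Python sketch drops repeated entries and prints each remaining data center once. It is not the code in dclist.c. Reading the field list as seven separate names, the row layout, and the double-quoted, space-separated output are all assumptions made for the example.

```python
# Assumed reading of the condensed field list as seven field names.
FIELDS = ("DC_NAME", "EMAIL", "DATA_CENTER_NAME", "ADDRESS",
          "CONTACT", "PHONE", "CONTACT_EMAIL")


def list_data_centers(rows):
    """Return one output line per distinct data center, in first-seen order.

    rows is a list of dicts keyed by FIELDS (a hypothetical stand-in for
    routing table entries, which may repeat the same contact details).
    """
    seen = set()
    lines = []
    for row in rows:
        key = tuple(row[name] for name in FIELDS)
        if key in seen:
            continue  # redundant entry for a data center already listed
        seen.add(key)
        lines.append(" ".join('"%s"' % value for value in key))
    return lines
```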
Request routing now follows these rules:
This is primarily an IRIS DMC concern and represents changes to the IRIS DMC and PDCC iterations of the process_*.c code. When inventory data was requested from IRIS DMC from midnight of day one to midnight of day two, the user did not get a waveform listing for just that day, but also got listings for the previous day and the following day. This happened because IRIS DMC tends to break its data listings from midnight to midnight, where the semantics are an open interval (the times are strictly between start and end). The NetDC code was doing comparisons as if it were a closed interval, so it matched the end time boundary of the previous day's listing to the very second, as well as the start time boundary of the following day's listing. This has since been corrected so that a requested time span Rs to Re matches a data time span Ds to De when:

Rs < De AND Re > Ds
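The corrected comparison can be checked with a small Python sketch (illustrative only; the NetDC code is C, and the function name and dates are hypothetical). It requests exactly one day against three midnight-to-midnight listings and confirms that only the middle listing matches.

```python
from datetime import datetime


def spans_match(req_start, req_end, data_start, data_end):
    """Open-interval overlap test: Rs < De AND Re > Ds."""
    return req_start < data_end and req_end > data_start


# Three consecutive midnight-to-midnight listings (hypothetical dates).
midnights = [datetime(1999, 7, day) for day in (14, 15, 16, 17)]
listings = list(zip(midnights[:-1], midnights[1:]))

# Request the middle day, midnight to midnight.
matches = [span for span in listings
           if spans_match(midnights[1], midnights[2], *span)]
```

With a closed-interval test (<= and >=) all three listings would match, which is the behavior the fix removes.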
In numerous places, redundancies in the code were eliminated in the interest of maintainability and extensibility. Some of the code was pulled out of one source file and placed in its own source file. The static structure used for holding and passing request information between functions was replaced by a more flexible hash list. An important bug was removed just before code release, in which NetDC would fail to process very large requests. The code was not closing all of the files it had opened inside the processing loop, so the program would halt because too many files were open. External file pointers were also introduced in places to minimize open and close cycles during request processing.
In favor of predictable filename tags, NetDC will now create tar files for DATA requests only when data merging is requested. If data merging is not requested, data files will arrive from each data center individually, and not as a tar file.
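The packaging rule can be sketched as follows. This is a simplified Python illustration, not the NetDC implementation. The function name, the in-memory file layout, and the way the shipment label forms the file name are assumptions.

```python
import io
import tarfile


def package_shipment(label, files, merge):
    """Return a dict of shipment file name -> bytes.

    files maps data file names to their contents. Without merging, the
    files are passed through individually. With merging, they are bundled
    into one tar file, whatever the number of files.
    """
    if not merge:
        return dict(files)
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    # The .tar tag marks the shipment as a merged bundle.
    return {label + ".tar": buffer.getvalue()}
```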
The general rule for ASCII data such as from Inventory and Response requests was to always send the data by email, and to fall back to FTP only if the output was too large. However, if the user requested that FTP data be PUSHed to them, NetDC did not comply but instead fell back to PULL mode. The code has been changed to PUSH large ASCII shipments when specified in the DISPOSITION of the request.
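A minimal Python sketch of this delivery rule follows. The size limit, the function name, and the returned labels are hypothetical; the release notes do not state the actual threshold at which output is considered too large for email.

```python
def ascii_delivery(size_bytes, disposition=None, email_limit=100000):
    """Choose how an ASCII product (inventory or response) is delivered.

    email_limit is a made-up threshold used only for illustration.
    """
    if size_bytes <= email_limit:
        return "EMAIL"      # the general rule for ASCII output
    if disposition == "PUSH":
        return "FTP PUSH"   # large output is now pushed when requested
    return "FTP PULL"       # otherwise the user retrieves it by FTP
```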
To make it easier to monitor NetDC activity, the default is for NetDC to log activity with local time. A macro has been included in the netdc_request.h file to allow the installer to choose between local time and UTC time in the log messages.
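The effect of the time zone choice can be illustrated in Python. This is only a sketch: the USE_UTC flag stands in for the macro in netdc_request.h, whose actual name is not given here, and the timestamp layout is an assumption rather than the real log format.

```python
import time

USE_UTC = False  # hypothetical stand-in for the macro in netdc_request.h


def log_timestamp(epoch_seconds, use_utc=USE_UTC):
    """Format a log timestamp in local time (the default) or in UTC."""
    if use_utc:
        parts = time.gmtime(epoch_seconds)
        zone = "UTC"
    else:
        parts = time.localtime(epoch_seconds)
        zone = "local"
    return time.strftime("%Y/%m/%d %H:%M:%S", parts) + " " + zone
```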
NetDC logs merge activity in netdc_activity.log as it searches for files from multiple sites.
Email sent to users by NetDC contains the version of NetDC being run at that site. This will help future tracking of release versions installed at different sites.

CORRECTIONS TO TECHNICAL MANUAL
For Release Edition dated July 15, 1999

The following are changes for the v1.0.3 Edition.

Starting at the bottom of page 13, change to:

NETWORK is the FDSN network code for the data requested, consisting of one or two characters, which may contain wildcards. This field may also contain multiple network code identifiers, which are space-separated and enclosed in double-quotes. An example multiple network string would be: "IU II G?" Network code examples can be found at http://www.iris.washington.edu/stations/networks.txt.

STATION is a station name up to five characters in length. This name refers to a geographic location and is equivalent to the station identifier in SEED format. Like the network code field, the station identifiers may be wildcarded, and multiple station names may be requested, space-separated and within quotes.

LOCATION is a field that allows users to request data from specific data streams on the instrument identified by network and station. This is in the form of a one or two character string, referring to the location identifier in SEED format. This field may also contain multiple specifiers and each may be wildcarded. The default location identifier, when explicit location identifiers are not used, is two SPACE characters. Since spaces are treated as separators in NetDC requests, the convention has been adopted to use a dash - to represent the default location identifier.

CHANNEL is a string describing the channel to be retrieved from a given station. Channel names are three characters in length and follow SEED channel-naming conventions. The channel specifiers may be wildcarded and contain multiple entries when space-separated within quotes.
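The field conventions above can be illustrated with a short Python sketch. It is not the NetDC parser. The sample line holds only the four fields described here rather than a complete request line, the function names are hypothetical, and Python's fnmatch rules are used as a stand-in for NetDC's wildcard matching.

```python
import shlex
from fnmatch import fnmatchcase


def parse_fields(text):
    """Split on spaces, keeping quoted groups together, then split each
    field into its list of specifiers."""
    return [field.split() for field in shlex.split(text)]


def normalize_location(spec):
    """Turn dashes into spaces and pad single characters with a trailing
    space, so '-' and '--' both become the default two-space identifier.
    Specifiers containing '*' are left unpadded (an assumption)."""
    spec = spec.replace("-", " ")
    return spec if "*" in spec else spec.ljust(2)


def field_matches(specifiers, value):
    """True if value matches any specifier in a possibly wildcarded field."""
    return any(fnmatchcase(value, spec) for spec in specifiers)


# NETWORK, STATION, LOCATION, CHANNEL (station and channel are examples).
network, station, location, channel = parse_fields('"IU II G?" ANMO - BHZ')
```

Here network holds the three specifiers IU, II and G?, and normalize_location applied to the single location specifier yields two spaces.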
On page 19, third paragraph, change to:

There is usually just one PRIMARY site for a given network code. However, there can potentially be many PRIMARY sites in the routing table in cases where it is uncertain which site can best process data for a certain network. Any and all matching PRIMARY sites will receive a copy of the request line for processing. Should there be no PRIMARY sites listed, then any and all SECONDARY sites that match will receive a copy of the request line. The concept is that the PRIMARY sites are considered best-suited to process the request, and the SECONDARY sites represent a fallback with lower probability of successful fulfillment.
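The PRIMARY and SECONDARY rule can be sketched in Python as follows. The routing table layout, the site names, and the function name are hypothetical, and matching is reduced to an exact comparison of network codes for clarity.

```python
def select_sites(routes, network):
    """Return the sites that should receive a copy of the request line.

    routes is a list of (site, network_code, priority) tuples, a made-up
    stand-in for the routing table. All matching PRIMARY sites are used;
    matching SECONDARY sites are used only when no PRIMARY site matches.
    """
    matching = [route for route in routes if route[1] == network]
    primary = [site for site, _, priority in matching
               if priority == "PRIMARY"]
    if primary:
        return primary
    return [site for site, _, priority in matching
            if priority == "SECONDARY"]


routes = [
    ("SITE_A", "IU", "PRIMARY"),
    ("SITE_B", "IU", "PRIMARY"),
    ("SITE_C", "IU", "SECONDARY"),
    ("SITE_C", "G", "SECONDARY"),
]
```

With this table, a request for network IU goes to SITE_A and SITE_B only, while a request for network G falls back to SITE_C.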
On the bottom of page 26 and the top of page 27, change to:

In this first example, a query for available data centers comes back with a report with lines extracted from the local routing table. A routine called dclist.c produces the output and is called from process_inv.c. The query is made by placing a wildcard in the DC_NAME field, with no other request fields specified. The output consists of a list of participating data centers with their vital contact information, and the data fields are contained within double-quotes and are space-separated. The field names are indicated just below the title. The <cr> pertains to where a carriage return would be.

.INV *<cr>

(Note: the other examples should have the reported .INV line followed by two carriage returns as well.)

At the top of page 28, the sentence should read:

It should be noted that a user wishing to see channel information needs to fill both the location identifier and channel fields in the data request line, as shown in the example below.

After the last paragraph on page 29, append the following:

If the user has requested .DISPOSITION PUSH, the inventory data in FTP will be pushed to the user as requested.

After figure 8.1 on page 33, add the following:

Note that tar bundling will only occur if the user has requested that shipment data be merged. If merging has not been requested, then the individual files the user receives will not be in tar format. If merging has been requested, then the return file will be in tar format, regardless of the number of data files merged. The tar file can be identified by the .tar tag attached to the end of the shipment file.

For the paragraph on page 34, change to:

If any information is needed on the particulars of SEED volumes, please contact the IRIS Data Management Center at (206) 547-0393 or consult the IRIS DMC web site at www.iris.washington.edu.
For the paragraph on page 37 that starts with "If a label ", change to:

If a label was not provided by the user, the NetDC system generates a random tag and uses that as a label for the shipment. The labeling feature is meant to add a more personal and less cryptic feel to the files that the user receives. Note that if the file is in tar format, containing merged data, the filename pattern will have a .tar tag appended to it. Once the user's requested data has been created in FTP, the contents are either sent to the user through email or the user is notified that the data can be retrieved or has been pushed through FTP.

Add a new subchapter after Chapter 10.0:

10.5 PORTABLE DATA COLLECTION CENTERS

This subchapter falls under the umbrella of NetDC installation and describes the pre-constructed interface code written for IRIS's Portable Data Collection Centers (PDCC) software. PDCC was designed to make it easy for a site to set up a data distribution center from a prepackaged set of database management, analysis, and data product construction tools. NetDC is provided with a set of source code that interfaces directly with the current version of PDCC, effectively eliminating the need for installers to write their own interface code to provide data through NetDC. Included in the netdc_req source directory are files specially designed for PDCC. They include:

Makefile_pdcc
pdcc.h
process_data_pdcc.c
process_inv_pdcc.c
process_resp_pdcc.c
process_supp_pdcc.c

Installation is performed much like before, with the following changes:
make -f Makefile_pdcc

After you populate the PDCC database with data, you can begin testing your NetDC installation to see that all features are working properly. Additional information on PDCC can be found at IRIS DMC's web site (www.iris.washington.edu).

NOTES ON UPGRADING NetDC TO v1.0.3

The tar file containing the NetDC code will contain the complete installation, but administrators should be careful not to overwrite customized code when they copy the new version in. The safest approach is to save all source code from the older version in a separate directory before installing the new code. Another safe approach is to un-tar the new code into a directory separate from the current installation. The files most at risk of being lost from the current installation are the process_*.c files and the Makefile, although other source files may have been modified for special needs as well. KEEP A BACKUP OF YOUR CURRENT WORK.

In general, all files should be replaced when adding in the new code, since a lot of code gets modified from version to version. However, new features that are added to process_inv.c, process_resp.c, and process_data.c can be harder to incorporate, and almost always have to be patched into the current code manually. An easy way to see what has changed is to perform a diff on corresponding source files and look for changes in technique. As long as NetDC has to interface with different information systems, there is no way around this. Future versions should hopefully address this specialization problem to make upgrading much easier and to minimize the custom work that needs to be done. |