[webservices] default in bulkdataselect

Philip Crotwell crotwell at seis.sc.edu
Tue Sep 4 07:52:51 PDT 2012


Hi Bruce and Chad

Turns out there was a bug in my code, and my tests of the
bulkdataselect were actually using regular old dataselect, hence the
very similar numbers. :(

However, now that I really am using bulkdataselect, I am finding that
it is slower, by about a factor of 1.5 to 2.5. This seems surprising
given your information that bulkdataselect avoided hitting a disk on
your end. For example here are some numbers asking for IU ANMO 00 BHZ
starting at 2012-09-01T00:00:00:

           dataselect                   bulkdataselect
window     time (sec)   rate (kb/s)     time (sec)   rate (kb/s)
10 min     0.3761        36.75618       0.6533        21.160263
1 hour     0.5165       153.64957       1.1879001     66.80697
10 hour    1.0785       732.03894       1.8532999    425.99905

Each of these is the average of 10 runs, with requests separated by 2 seconds.
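
A minimal sketch of a timing loop of this kind (10 runs, 2 seconds apart, averaged). This is not the code that produced the numbers above; the function name `benchmark`, its parameters, and the choice of 1 kb = 1024 bytes are assumptions made for illustration.

```python
import time
from statistics import mean


def benchmark(fetch, runs=10, pause=2.0, clock=time.perf_counter, sleep=time.sleep):
    """Call fetch() `runs` times and return (mean seconds, mean kb/s).

    fetch must return the response body as bytes. The clock and sleep
    arguments are injectable so the loop can be exercised without a network.
    """
    seconds = []
    rates = []
    for i in range(runs):
        start = clock()
        payload = fetch()
        elapsed = clock() - start
        seconds.append(elapsed)
        # Assumption: "kb" means 1024 bytes.
        rates.append(len(payload) / 1024 / elapsed)
        if i < runs - 1:
            sleep(pause)  # spacing between requests; not included in the timing
    return mean(seconds), mean(rates)
```

For a real measurement, `fetch` could be something like `lambda: urllib.request.urlopen(url).read()`, with `url` set to whichever query is being tested.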

Any thoughts on why this would be the case? Basically I am trying to
decide which of these two to use by default, and from what I can see
dataselect is the winner.
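
As a quick check on the "factor of 1.5 to 2.5" estimate, the slowdown can be computed directly from the timings in the table above. The dictionary layout and the function name `slowdown` are only illustrative.

```python
# (dataselect seconds, bulkdataselect seconds), copied from the table above
timings = {
    "10 min": (0.3761, 0.6533),
    "1 hour": (0.5165, 1.1879001),
    "10 hour": (1.0785, 1.8532999),
}


def slowdown(timings):
    """Return bulkdataselect time divided by dataselect time for each window."""
    return {window: bulk / single for window, (single, bulk) in timings.items()}


for window, factor in slowdown(timings).items():
    print(f"{window}: bulkdataselect took {factor:.2f}x as long")
```

All three ratios fall between about 1.7 and 2.3, inside the quoted range.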

Of course these are only cases where it is a single channel, so maybe
bulkdataselect becomes a winner when there are many channels and time
windows.
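
For the many-channel case, a bulk request would carry several selections at once. The sketch below builds such a request body, assuming the service accepts one selection per line in the form `NET STA LOC CHA START END` and that a blank location code is written as `--`. Both are assumptions to check against the bulkdataselect documentation, and the second channel is only an illustrative example.

```python
def bulk_request_body(selections):
    """Build one line per (net, sta, loc, cha, start, end) selection.

    Assumed line format: "NET STA LOC CHA START END", space separated.
    """
    lines = []
    for net, sta, loc, cha, start, end in selections:
        loc = loc or "--"  # assumption: blank location codes are sent as "--"
        lines.append(" ".join([net, sta, loc, cha, start, end]))
    return "\n".join(lines) + "\n"


body = bulk_request_body([
    ("IU", "ANMO", "00", "BHZ", "2012-09-01T00:00:00", "2012-09-01T00:10:00"),
    # Illustrative second channel, not taken from the tests above.
    ("IU", "ANMO", "00", "BH1", "2012-09-01T00:00:00", "2012-09-01T00:10:00"),
])
print(body, end="")
```

Timing one such request against the equivalent series of single-channel dataselect requests would show whether bulkdataselect pulls ahead as the number of channels and time windows grows.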

thanks
Philip

On Thu, Aug 30, 2012 at 5:43 PM, Bruce Weertman
<bruce at iris.washington.edu> wrote:
> Philip:
>
> The reason bulkdataselect is faster the second time around probably has to do with the underlying NFS filesystem.
>
> You would see the same behavior if you were on one of our internal machines and you just cat-ed one of the
> archive files. The first time you did the cat it would take much longer than the second time. The NFS filesystem
> reads the data into a buffer and holds it there for some period of time.
>
> -Bruce
>
> On Aug 30, 2012, at 1:57 PM, Philip Crotwell wrote:
>
>> Hi Chad
>>
>> That is really useful information, thanks.
>>
>> I am finding one interesting thing. If I request the same data twice,
>> the first time takes about twice as long as the second. I assume there
>> is some caching going on somewhere in your systems.
>>
>> I have done some experiments here and am finding no difference between
>> bulk and dataselect. The biggest difference is with request time
>> window, as is to be expected. For 10 minutes of data I get 40 kb/s, for 1
>> hour 240 kb/s and for 10 hours 960 kb/s. Makes sense as these data
>> will be contiguous on your system and there is socket and other
>> overhead.
>>
>> Might do some more playing, but seems a wash from the outside
>> perspective for at least single channel single time window requests.
>>
>> thanks
>> Philip
>>
>> On Wed, Aug 29, 2012 at 11:51 AM, Chad Trabant <chad at iris.washington.edu> wrote:
>>>
>>> Hi,
>>>
>>> Yes, the default values are as you guessed. By default there is no limitation based on segment length; the documentation has been updated.
>>>
>>> Regarding ws-dataselect versus ws-bulkdataselect performance for large miniSEED requests:  early tests indicated there is a performance difference for the user, but I haven't tested for a while and it's dependent on a number of factors.  Also, there is a difference for the DMC internally.  In short, if you just want raw miniSEED data ws-bulkdataselect is the preferred interface.
>>>
>>> A bit more explanation:
>>> For the ws-dataselect requests the data are placed into an internal cache for use, for example, by other services.  The user needs to wait for the data to be extracted and cached.  For ws-bulkdataselect requests the data are not cached, but are effectively streamed back to the user from the storage system directly.  The extraction and caching can be really fast compared to the network connection to the user, so this difference is not always obvious but it would likely add up.  For large requests of raw data ws-bulkdataselect is preferred as it uses fewer resources at the DMC.
>>>
>>> Chad
>>>
>>> On Aug 29, 2012, at 5:40 AM, Philip Crotwell wrote:
>>>
>>>> Hi
>>>>
>>>> What are the default values for minimumlength and longestonly? I am
>>>> guessing 0 and false, but the docs don't say.
>>>> http://www.iris.edu/ws/bulkdataselect/
>>>>
>>>> Also, have you found a performance increase with bulkdataselect over
>>>> dataselect for large miniseed downloads?
>>>>
>>>> thanks
>>>> Philip
>>>> _______________________________________________
>>>> webservices mailing list
>>>> webservices at iris.washington.edu
>>>> http://www.iris.washington.edu/mailman/listinfo/webservices


