NetworkFasterThanDisk


     

     

  • Keep the source data on the network.
  • Do the processing locally, reading directly from the network rather than copying to the local drive first.
  • Leave the machine alone while it works: even reading email and surfing the net while numbers crunch in the background can slow things down markedly.
  • Write the finished data back to the network as part of the process (i.e. don't copy it afterwards).
  • For example, we want to take 120 CDED .dem files and mosaick them into a single compressed raster. The old way would be: a) copy to local workspace, b) mosaick, c) compress, d) copy final output back to server. The new and improved method is:

     

    mosaick \\server\data\cded\*.dem  d:\giswork\cded\bigassfile.tif
    compress d:\giswork\cded\bigassfile.tif \\server\data\mosaicks\bigassfile.tif
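The `mosaick` and `compress` commands above are placeholders. A minimal sketch of how the same two steps might look with GDAL's command-line tools follows, assuming `gdal_merge.py` for the mosaic (the tool used later in this thread) and `gdal_translate` with the GeoTIFF `COMPRESS=DEFLATE` creation option for the compression. The codec, the tile names, and the `build_commands` helper are illustrative assumptions, not something the original workflow specifies. The sketch only builds the command lines; it does not run them.

```python
def build_commands(src_dems, local_tif, server_tif, nodata=-9999):
    """Build the two command lines as argument lists (nothing is executed)."""
    # Step 1: mosaic straight from the network share to the local work disk.
    merge = ["gdal_merge.py", "-n", str(nodata), "-init", str(nodata),
             "-o", local_tif, *src_dems]
    # Step 2: compress while writing the result back to the server.
    # COMPRESS=DEFLATE is an assumption; the source does not name a codec.
    compress = ["gdal_translate", "-co", "COMPRESS=DEFLATE",
                local_tif, server_tif]
    return merge, compress


if __name__ == "__main__":
    # Hypothetical tile names; the real list would hold every .dem on the share.
    merge, compress = build_commands(
        [r"\\server\data\cded\tile_a.dem", r"\\server\data\cded\tile_b.dem"],
        r"d:\giswork\cded\bigassfile.tif",
        r"\\server\data\mosaicks\bigassfile.tif",
    )
    print(" ".join(merge))
    print(" ".join(compress))
```

To actually run the steps, each list could be handed to `subprocess.run(cmd, check=True)`. The point of the layout is that every read and every write crosses between the network and the local disk, so neither step reads and writes on the same disk.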
    

    Still don't believe me? I don't blame you, I didn't either. What follows is both the theory and the proof in the form of an edited email dialogue between myself and Ed McNierney of http://TopoZone.com/ on the gdal-dev mailing list.

    -- MattWilkie - 27 Oct 2005

    The opening remark. Note the fine display of ignorance and delusion.

    Hello,

    In case anyone wonders about the relative performance of gdal on Windows vs Linux:

    Test case: gdal_merge of 154 dems in ArcInfo GRID format totalling 1gb.

    Windows, from local disk to local disk: 1hr 15min
    Linux, from network to local disk: 15 min

    Linux machine is 1GHz (originally misreported as 450MHz) & 768MB RAM, Windows is 3GHz & 2GB RAM, both dual Xeons

    cheers,

    -- matt wilkie

    Matt -

    I guess you've proven the benefit of translating from network to local disk! There is no particular reason (given the limited info you gave) to think that the OS makes much if any difference.

    Was the Windows copy from one local disk to the same physical local disk? And was that an IDE disk?

    - Ed

    yes same disk, same directory for that matter, and the disk is 15k scsi (on both machines). -M

     

    Matt -

    But you said the Linux test was network to local disk, while the Windows test was local disk to local disk. That's a HUGE difference. If you're reading and writing from the same disk, you're constantly doing track-to-track seeks, jumping back and forth between the input and the output. In the network scenario you have two disk subsystems working simultaneously; not only is one writing while the other is reading, but each can linearly read (or write) the data with no intervening seeks. Since track-to-track seeks are the slowest things a disk can do, that's a very significant difference, particularly for a test like this where the data is read and written in small bites.

    - Ed

    ...and flies in the face of everything I know, or perhaps I should say think I know, about the relative performance of networks vs disks. This isn't the first time I've been surprised by a network process beating a disk-based one. I had always thought there was something wrong with our machine configurations, in spite of my never being able to find anything out of whack.

    Our standard operating procedure on big projects is, and has been for many years, 1) copy to local, 2) Do It, 3) copy to server. Maybe it's time to take a closer look at that. :)

    thanks for your thoughts, --M

     

    Matt -

    No, those are not good assumptions. Here's some food for thought, based on the specs for a Seagate Cheetah 15K SCSI disk drive. You may have a different model but all 15K SCSI drives are about the same.

    Your Ultra320 SCSI interface can move 320Mbytes/sec, but let's call that 200Mbytes/sec as a real-world guess. The drive has an average read/write seek time of 3.5msec/4.0msec with a track-to-track read/write seek time of 0.2msec/0.4msec. Here's a hypothetical use case using that scenario. Your mileage may vary.

    The usage scenario is reading a 500 MB data file in 1 MB chunks. The file is being copied/warped/translated/etc. and you're reading the source file one chunk at a time, processing the chunk, then writing out a chunk of the same size. When you're done you'll have a 500 MB output file. The input and output files are the only two files on the disk, and there's no other disk activity. We're going to assume your CPU processes the data in RAM instantaneously.

    That's 500 read/write pairs. Each pair consists of a read seek to the input file, a 1MB read, a write seek to the output file, and a 1MB write. Since the start of the output file is 500MB from the start of the input file, they're (relatively) long-distance seeks. A read seek takes 3.5msec, the 1MB read takes 1/200th of a second (5msec), the write seek takes 4.0msec, then another 5msec to write the data - a total of 17.5msec. Doing that 500 times will take 8.75 seconds.
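The arithmetic for the single-disk case can be checked with a few lines of Python. This is only a sketch of Ed's back-of-envelope model, using his figures as defaults; the function and parameter names are made up for illustration.

```python
def single_disk_seconds(file_mb=500, chunk_mb=1, transfer_mb_s=200.0,
                        read_seek_ms=3.5, write_seek_ms=4.0):
    """Elapsed time when input and output files share one disk."""
    chunks = file_mb // chunk_mb
    # Time to move one chunk across the disk interface, in milliseconds.
    transfer_ms = chunk_mb * 1000.0 / transfer_mb_s
    # One pair = seek to input, read chunk, seek to output, write chunk.
    per_pair_ms = read_seek_ms + transfer_ms + write_seek_ms + transfer_ms
    return chunks * per_pair_ms / 1000.0


if __name__ == "__main__":
    print(single_disk_seconds())  # 8.75
```

Note that the seeks account for 7.5 of the 17.5 msec in each pair, so over 40% of the elapsed time in this model is spent moving the heads rather than moving data.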

    Now let's add the network scenario. If you're running a Gigabit Ethernet network (you didn't say, but you seem to have good gear) you can move 125Mbytes/sec over it in theory - let's call it 100Mbytes/sec. That's half the speed of your internal disk drive. Both systems have the same disks.

    But now you have two disks running in parallel, and each has an easier job to do. The source disk does a track-to-track seek every once in a while but let's assume it happens on every read (it doesn't). So the source disk does a track-to-track read seek (0.2msec), then a 1MB read (5msec), then sends 1MB over the network (10msec), then repeats. The destination disk is receiving the data in 10msec, then does a track-to-track write seek (0.4msec) and a 1MB write (5msec). At that point it's done and it's waiting about 4.6msec for the next batch to arrive. So the network becomes the bottleneck, and the total elapsed time is essentially the time it takes to move the data over the network. That means the total elapsed time is 500 MB moving at 100 MB/sec - five seconds.
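The network case can be sketched the same way. The model below assumes the three stages (source disk, network, destination disk) overlap fully, so throughput is set by the slowest stage, and it ignores the short time needed to fill the pipeline. Those assumptions and all names are illustrative; the figures are the ones quoted above.

```python
def pipelined_network(file_mb=500, chunk_mb=1, disk_mb_s=200.0,
                      net_mb_s=100.0, read_seek_ms=0.2, write_seek_ms=0.4):
    """Return (bottleneck stage, total seconds, destination idle ms per chunk)."""
    chunks = file_mb // chunk_mb
    disk_ms = chunk_mb * 1000.0 / disk_mb_s
    # Per-chunk cost of each stage, in milliseconds.
    stages = {
        "source disk": read_seek_ms + disk_ms,
        "network": chunk_mb * 1000.0 / net_mb_s,
        "destination disk": write_seek_ms + disk_ms,
    }
    # With the stages overlapped, the slowest one sets the pace.
    bottleneck = max(stages, key=stages.get)
    total_s = chunks * stages[bottleneck] / 1000.0
    idle_ms = stages["network"] - stages["destination disk"]
    return bottleneck, total_s, idle_ms


if __name__ == "__main__":
    name, total_s, idle_ms = pipelined_network()
    print(name, total_s, round(idle_ms, 1))  # network 5.0 4.6
```

Under these assumptions each disk needs only a little over 5 msec per chunk, about half of the 10 msec network stage, which is why the network sets the total time.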

    That nasty network bottleneck is about 1.75 times faster than the local disk scenario (5 seconds versus 8.75)! Obviously a real-world test would be different, but you get the point. You are getting the benefit of (a) two disks running in parallel, each only doing half the job, and (b) each job turning into a linear read or write of the file, rather than constant jumping back and forth across the disk (the most expensive thing your disk can do). The cost is that you are limiting your I/O transfer speed to the speed of the network rather than the speed of the internal disk, but if you use a fast network that's a small cost.

    If you're not using a Gigabit Ethernet connection, go buy one - they're very cheap!

    MY standard operating procedure is to have the source and destination disks be separate disks, and if that means they're across a (gigabit) network, that's fine.

    It would be most informative if you could re-run your test so you are using both operating systems in the same mode (disk to disk or disk to network). And while I've continued to reply to your off-list messages, it would be helpful to get this discussion back on the list, perhaps when you have more real-world data rather than my smoke-and-mirrors numbers!

    - Ed

    Hi Ed, here are my more extensive test results


    The Test ---
    source data:    114 CDED .dem files totalling 1.12gb
    finished file:  615mb
    command line:   gdal_merge -n -9999 -init -9999 -o dem.tif  %indems%

    Windows
    network    local
    09m 18s    16m 12s
    03m 45s    08m 05s
    03m 31s    07m 01s
    04m 12s    05m 39s

    Linux
    network    local      notes
    22m 43s    15m 26s
    07m 44s    15m 37s
    14m 35s    16m 02s
    15m 33s    n/a
    08m 09s    08m 19s    no vnc from here on
    n/a        05m 38s    local disk1 to disk2
    n/a        06m 26s    local disk1 to disk2
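To compare the Windows runs at a glance, the timings can be converted to seconds and expressed as a local/network ratio. This is a small sketch using only the four Windows rows above; the `to_seconds` helper is an illustrative name, and pairing the two columns row by row is an assumption about how the trials were run.

```python
import re


def to_seconds(stamp):
    """Convert a string such as '09m 18s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)m (\d+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time: {stamp!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


# (network, local) pairs copied from the Windows results above.
windows = [
    ("09m 18s", "16m 12s"),
    ("03m 45s", "08m 05s"),
    ("03m 31s", "07m 01s"),
    ("04m 12s", "05m 39s"),
]

# A ratio above 1.0 means the local-disk run took longer than the network run.
ratios = [to_seconds(local) / to_seconds(net) for net, local in windows]

if __name__ == "__main__":
    for (net, local), ratio in zip(windows, ratios):
        print(f"network {net}  local {local}  local/network = {ratio:.2f}")
```

The ratio is above 1.0 in every Windows row, i.e. the local-disk run was slower each time.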

    Environment ---

    Server: Win 2003 server, dual 2.8ghz, 3gb ram, ultra320 perc raid

    Windows: dual 3ghz Xeon, 2gb ram, 70gb 15k scsi seagate cheetah on ultra320 scsi adapter

    Linux: dual 1Ghz Xeon (originally misreported as 450MHz), 768mb ram, 18gb 15k scsi seagate cheetah on ultra160 scsi adapter

    gigabit ethernet all around.

     


    Conclusions ---

    Ed's theory is sound for the Windows machine: having the source data on the network is consistently faster than on the local disk. How much faster, though, is greatly dependent on what else the computer is doing. The fastest local time was when I stepped away from the computer and didn't do anything else while the test was running. I did leave all my programs open though. At no time was I doing things which required heavy lifting; all I did was read mail and web pages.

    It's much harder to see what was happening with the Linux computer. The results are all over the place. The computer was not doing anything besides the test (no reading or surfing). It is, however, set up as a desktop: Gnome-desktop is up and running along with two Nautilus browser windows and two terminals. For all but the last trial, a VNC connection was open. From the nearly 50% speed improvement there it's pretty clear that VNC is an expensive monitoring device. Also interesting is that once VNC is not running there is virtually no difference between local and network speeds.

    Having CPUs which are six times faster is also clearly an improvement, but not linearly (the 600% faster CPU did not yield 600% time savings).

    --matt

    Matt -

    Thanks for the complete set of data! Unfortunately, it's not a very rigorous test. I was surprised you were doing other things on the Windows machine. Remember that the kiss of death is a big disk seek. Reading mail and web pages causes you to constantly write and read small files, and those files aren't going to be where your GDAL output file is, so you're making the local machine do a lot of extra, expensive seeks (even if it's not doing much I/O). That's exactly why you saw such a difference when only doing a "little" activity.

    The CPU portion of the job is relatively small, since the entire process is largely dominated by I/O. If a given machine takes 10 seconds to do a test and 25% of that time is compute-bound, then even an infinitely fast CPU will only speed up the overall process by 25%.
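Ed's point here is essentially Amdahl's law: only the compute-bound share of the run shrinks when the CPU gets faster. A minimal sketch, using his 10-second, 25% example (the function name is illustrative):

```python
def overall_seconds(total_s, compute_fraction, cpu_speedup):
    """Run time after speeding up only the compute-bound part of a job."""
    io_s = total_s * (1.0 - compute_fraction)            # unaffected by the CPU
    cpu_s = total_s * compute_fraction / cpu_speedup     # shrinks with a faster CPU
    return io_s + cpu_s


if __name__ == "__main__":
    # Infinitely fast CPU: the 7.5 s of I/O still remains.
    print(overall_seconds(10.0, 0.25, float("inf")))  # 7.5
    # A CPU six times faster saves only about 2 of the 10 seconds.
    print(round(overall_seconds(10.0, 0.25, 6.0), 2))  # 7.92
```

With 75% of the run spent on I/O, no CPU upgrade can take the job below 7.5 seconds, which is why the faster machine's advantage is far from proportional to its clock speed.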

    Looks good - although it's not clean-room benchmark stuff, I think the numbers are informative when accompanied by all the detail you've added. Thanks!

    - Ed

    I know it's bad practice, and the results clearly show why, but the fact is when I finish doing testing, then it's time to actually go to work. At which time I will be doing other things on my computers. We don't have enough of them to devote to single tasks even if it does make the long run a longer haul. :) --matt

     

     


    fixed major mistake in computer vitals, the linux computer is 1Ghz, not 450Mhz as originally stated (that was two workstations ago, not one!) -- MattWilkie - 01 Nov 2005

     

     

     

     

     

     

     

     

     

     

     

     
