Monday, February 9, 2009

Benefits of 64-bit Architecture in Geospatial Imaging

I asked Hammad Khan, an ERDAS IMAGINE Software Engineer, to write a discussion of 32- and 64-bit processing in ERDAS IMAGINE. Thank you, Hammad.

Recently, we have noticed a lot of discussion in our community centered on the benefits of 64-bit computing. Having discussed 64-bit processor architecture with customers and product managers here at ERDAS, I felt it may help to share some thoughts on how 64-bit computing impacts our work in the geospatial imaging world.

As opposed to their more common 32-bit counterparts, 64-bit processors have two major advantages: the ability to address more memory, and an increased number of General Purpose Registers (GPRs). Memory is used by programs to store data needed for calculations. Registers, by contrast, are used to store specific values (a single number, for example) for very quick access during calculations. Data is moved from memory into registers before calculations are performed. Another advantage of 64-bit processors worth mentioning is the introduction of additional SSE/SSE2 registers, which are used for highly optimized, highly repetitive calculations.
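To put the memory advantage in concrete terms, here is the arithmetic behind the two address-space limits (illustrative only, not ERDAS code):

```python
def addressable_bytes(pointer_bits):
    """Maximum bytes a flat address space of the given width can reach."""
    return 2 ** pointer_bits

GIB = 2 ** 30  # one gibibyte

# A 32-bit pointer tops out at 4 GiB...
limit_32 = addressable_bytes(32) // GIB
# ...while a 64-bit pointer can in principle reach 16 EiB,
# far beyond any imagery dataset we see today.
limit_64 = addressable_bytes(64) // GIB

print(limit_32, "GiB vs", limit_64, "GiB")
```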

What does this mean for software designed for geospatial analysis? By nature, geospatial analysis algorithms must run on imagery: datasets that are large and getting larger. As such, these algorithms rarely hold entire datasets in memory, and are almost always written using techniques such as tiled data access to allow the program to run to completion.
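The tiled-access pattern can be sketched in a few lines. This is a minimal illustration, not the IMAGINE implementation; `read_tile`, `write_tile`, and `process` are hypothetical callables standing in for a real raster I/O layer:

```python
def process_image_tiled(read_tile, write_tile, n_tiles, process):
    """Run `process` over an image one tile at a time, so the
    whole dataset never has to fit in memory."""
    for i in range(n_tiles):
        tile = read_tile(i)         # load only this tile from disk
        result = process(tile)      # e.g. a per-pixel classification
        write_tile(i, result)       # flush the output, free the tile
```

However large the image, peak memory use stays roughly one tile's worth.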

For example, an ADS-40 sensor will frequently generate data that is hundreds of gigabytes in size. Further, we are beginning to see multiple-terabyte images pop up, such as the data used by the Oregon Imagery Explorer project at Oregon State University. Processing data of this magnitude will require tiled access or another clever data-access technique; regardless of how many address bits are available, the entire dataset has a good chance of not fitting in memory.

To illustrate, most traditional classification algorithms may be performed by loading a tile of the image into memory, processing the tile, writing the output, and then loading the next tile. This approach may be (and, in ERDAS IMAGINE, is) optimized by loading the next tile to be worked on while the current tile is being processed. Notable exceptions to this tile-based data-access approach include terrain generation, where generation of a TIN cannot be neatly broken up: features in one tile may, and frequently do, influence features in adjacent tiles. This limitation can be circumvented by using a spatial index (say, using a quad-tree) to intelligently access appropriate tiles; regardless, loading the entire image into memory is not a recommended option.
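A quad-tree spatial index of the kind mentioned above can be sketched briefly. This is a hedged illustration (tile bounds as `(x, y, w, h)` rectangles, a hypothetical payload per tile), not how IMAGINE indexes terrain; the point is that a query over a region of interest pulls in only the intersecting tiles:

```python
class QuadTree:
    """A minimal quad-tree over tile bounding boxes."""

    def __init__(self, bounds, depth=0, max_depth=4):
        self.bounds = bounds        # (x, y, w, h)
        self.depth = depth
        self.max_depth = max_depth
        self.tiles = []             # (tile_bounds, payload) pairs
        self.children = None

    def _subdivide(self):
        x, y, w, h = self.bounds
        hw, hh = w / 2, h / 2
        self.children = [
            QuadTree((x,      y,      hw, hh), self.depth + 1, self.max_depth),
            QuadTree((x + hw, y,      hw, hh), self.depth + 1, self.max_depth),
            QuadTree((x,      y + hh, hw, hh), self.depth + 1, self.max_depth),
            QuadTree((x + hw, y + hh, hw, hh), self.depth + 1, self.max_depth),
        ]

    def insert(self, tile_bounds, payload):
        # Push the tile into the smallest quadrant that fully contains it.
        if self.children is None and self.depth < self.max_depth:
            self._subdivide()
        if self.children is not None:
            for child in self.children:
                if _contains(child.bounds, tile_bounds):
                    child.insert(tile_bounds, payload)
                    return
        self.tiles.append((tile_bounds, payload))

    def query(self, region):
        """Return payloads of all tiles intersecting `region`,
        visiting only quadrants that overlap it."""
        hits = []
        if not _intersects(self.bounds, region):
            return hits
        for tile_bounds, payload in self.tiles:
            if _intersects(tile_bounds, region):
                hits.append(payload)
        if self.children is not None:
            for child in self.children:
                hits.extend(child.query(region))
        return hits


def _intersects(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def _contains(outer, inner):
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh
```

With such an index, a terrain algorithm working near a tile boundary can fetch just the handful of neighboring tiles that might influence it, rather than loading the whole image.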

As computation seems to be our focus above, does this imply that geospatial analysis can benefit from the computational speedup offered by the additional registers available on 64-bit processors? It is true that having more data directly ready for access will speed up calculations. However, the processor is frequently not the bottleneck in executing these algorithms. Disk access speeds, and data transport speeds over the associated buses, frequently are. Note that the limiting speed here is primarily that of getting data read from the disk into memory, not of moving data from memory into processor registers.

A number of techniques are available to speed up algorithms in the 32-bit world. Examples include pipelining data read and processing, where the next tile is accessed while the current one is being processed; multi-threading processing, where the algorithm is split to be performed in parallel; and using multiple processes wherever possible (say, simultaneously processing multiple independent frames of an RPF dataset) to take advantage of dual- and quad-core architectures. These represent just some of the many tools available to optimize geospatial algorithms. Moving to a 64-bit architecture is another available tool.
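The first of those techniques, pipelining data read and processing, can be sketched with a reader thread and a bounded queue. Again this is an illustration under assumed names (`read_tile` and `process` are hypothetical), not IMAGINE's internals:

```python
import queue
import threading

def pipelined_process(read_tile, n_tiles, process):
    """Overlap disk reads with computation: a reader thread stays
    one tile ahead of the consumer."""
    buf = queue.Queue(maxsize=2)       # small buffer = prefetch depth

    def reader():
        for i in range(n_tiles):
            buf.put(read_tile(i))      # blocks if the consumer lags
        buf.put(None)                  # sentinel: no more tiles

    threading.Thread(target=reader, daemon=True).start()

    results = []
    while True:
        tile = buf.get()
        if tile is None:
            break
        results.append(process(tile))
    return results
```

While the consumer is busy with tile *n*, the reader is already pulling tile *n+1* off the disk, hiding much of the I/O latency behind the computation.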

This is not to say that moving to a 64-bit architecture will have no benefit whatsoever. Optimization of an algorithm must take a holistic view of the algorithm. The appropriate tools for optimization must then be selected to address bottlenecks specific to the process in question. As such, the benefits of a 64-bit architecture must be kept in perspective.

There is no doubt that in the near future we will move to a 64-bit processing environment, and personally, I look forward to the day when the 32-bit architecture is the primary limitation on our processes. For now, however, we have found that our bottlenecks lie elsewhere. As such, certainly within our problem space, other optimization techniques offer more bang for the proverbial buck.

We continue to work to make ERDAS IMAGINE the strongest package available for working with geospatial imagery. We aggressively remove bottlenecks as we find them; in addition to the significant performance improvements seen over the previous release, scalability across computing resources will be especially notable in the upcoming release. In fact, of all the releases I have been involved with, I am particularly excited about this one. We hope to demo some of the new features at ASPRS in March, and we hope you find it as exciting as we do.


Anonymous said...

I agree. The most limiting factor in ERDAS Imagine usage is hard drive performance. A while ago I had an extremely good discussion about this on the erdas-l email list, and we agreed that you need some decent hardware-driven disk management to make things really fast.

This issue will probably remain in a 64-bit OS as well. Even a 64-bit OS cannot hold terabyte imagery entirely in RAM, and you still need hard drives.

However, 64-bit is coming anyway - it is important and it is needed. It may not be the final solution to everything, but it will still matter a great deal in many cases.


Paul said...

At some point in the next 3 years we will see solid-state hard disks become stable, cheap and very fast, and at that point we will see large jumps in performance. Nevertheless, we still need to touch the hard disk once for each pixel when reading and once when writing. Let's make sure we do that efficiently now, so when solid-state hard disks are ready, we will be ready to fly...

Jarlath O'Neil-Dunne said...

Thanks so much for posting this Paul, very informative. We just finished a rather extensive round of testing all of the major software packages for viewing raster data and IMAGINE was the clear winner in terms of speed and performance. Not to say that we don’t want more :). I don’t recall ever receiving a memory error in IMAGINE so I bet there is not a whole lot to be gained in a move to 64-bit. When performing vector geoprocessing in ArcGIS I continually run into memory issues, but last I heard from ESRI they were not planning on moving to 64-bit. Visualizing LiDAR point clouds is one area where 64-bit support has been a real game changer, so much so that it appears that graphics card memory and not system memory will be the limiting factor.

I am guessing the Apple fans will want you to provide support for the upcoming Snow Leopard OS. It is reported to support 16TB of RAM. In such a situation one could load everything into memory.