Fast Adaptive Storage and Retrieval

Numerical simulation is a valuable tool for modeling diverse physical phenomena, but like "wet lab" experimentation, it does not solve the problem of how to interpret the data. Manual analysis, accompanied by visual aids, is a time-consuming, error-prone process due to the elaborate time-dependent structures appearing in numerical simulations. We have developed a technique that enables the user to clearly identify features of interest in numerical simulations that employ regular, block-structured grids. If the application can use an on-line algorithm to identify "interesting" data, and the user is willing to part with the remaining background data, then storage for the background data can be elided. The savings depend on the application, but we have been able to reduce storage requirements by an order of magnitude.
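A minimal sketch of the on-line identification step follows. It assumes a 2-D scalar field held as nested Python lists and a hypothetical threshold predicate standing in for the application-specific test (for example, a vorticity magnitude above some cutoff). The names `flag_interesting`, `retained_fraction`, and `THRESHOLD` are illustrative only, not part of FASTR.

```python
from typing import Callable, List, Set, Tuple

Cell = Tuple[int, int]  # (i, j) index of a cell in a 2-D block


def flag_interesting(field: List[List[float]],
                     is_interesting: Callable[[float], bool]) -> Set[Cell]:
    """Single pass over the field; keep the indices the predicate accepts."""
    return {(i, j)
            for i, row in enumerate(field)
            for j, value in enumerate(row)
            if is_interesting(value)}


def retained_fraction(field: List[List[float]], flagged: Set[Cell]) -> float:
    """Fraction of cells that must still be stored once background data is elided."""
    total = sum(len(row) for row in field)
    return len(flagged) / total if total else 0.0


# Hypothetical application-specific test: a simple magnitude threshold.
THRESHOLD = 0.5
field = [[0.0, 0.9, 0.0, 0.0],
         [0.0, 0.8, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
flagged = flag_interesting(field, lambda v: abs(v) > THRESHOLD)
print(sorted(flagged), retained_fraction(field, flagged))
```

Here only 2 of 16 cells survive. The storage saved in practice depends entirely on how selective the application's predicate is.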

We are developing a library called FASTR (Fast Adaptive Storage and Retrieval) that enables the user to realize some of the benefits of lossless compression in cases where it is possible to elide background data. Our data sets employ KeLP meta data that enable users to preview data sets remotely without having to access the actual data. These meta data offer compression ratios of several orders of magnitude and can be used to select features for further scrutiny. To generate the meta data, our technique employs the grid generation technology used in Berger-Oliger-Colella structured adaptive mesh refinement.
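To illustrate how grid generation turns flagged cells into compact meta data, here is a simplified clustering sketch in the spirit of the Berger-Rigoutsos signature algorithm commonly used in structured AMR. It is an assumption for illustration, not FASTR's or KeLP's actual grid generator: it splits at a hole in the row/column signature when one exists, otherwise bisects the longer side, and stops once a box's fill ratio reaches the hypothetical `min_efficiency` parameter.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (ilo, jlo, ihi, jhi), bounds inclusive


def bounding_box(cells: Set[Cell]) -> Box:
    rows = [i for i, _ in cells]
    cols = [j for _, j in cells]
    return (min(rows), min(cols), max(rows), max(cols))


def volume(box: Box) -> int:
    ilo, jlo, ihi, jhi = box
    return (ihi - ilo + 1) * (jhi - jlo + 1)


def cluster(cells: Set[Cell], min_efficiency: float = 0.7) -> List[Box]:
    """Cover flagged cells with boxes whose fill ratio is at least min_efficiency."""
    if not cells:
        return []
    box = bounding_box(cells)
    if len(cells) / volume(box) >= min_efficiency:
        return [box]
    ilo, jlo, ihi, jhi = box
    # Signatures: flagged-cell counts per row (axis 0) and per column (axis 1).
    sig = ([0] * (ihi - ilo + 1), [0] * (jhi - jlo + 1))
    for i, j in cells:
        sig[0][i - ilo] += 1
        sig[1][j - jlo] += 1
    lows = (ilo, jlo)
    for axis in (0, 1):
        if 0 in sig[axis]:            # a hole: cut there, no flagged cells are split
            cut = lows[axis] + sig[axis].index(0)
            break
    else:                             # no hole: bisect the longer side
        axis = 0 if ihi - ilo >= jhi - jlo else 1
        lo, hi = (ilo, ihi) if axis == 0 else (jlo, jhi)
        cut = (lo + hi) // 2 + 1
    lower = {c for c in cells if c[axis] < cut}
    upper = cells - lower
    return cluster(lower, min_efficiency) + cluster(upper, min_efficiency)


# Two separated 2x2 blobs inside a 7x7 region.
cells = {(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)}
print(cluster(cells))
```

Each box costs four integers of meta data no matter how many cells it covers, which is why a box list can be orders of magnitude smaller than the field it describes.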

The advantage of our technique is that the amount of storage required to archive large data sets is proportional to the amount of interesting data, rather than the full bulk of raw simulation output. Moreover, in the case of remote queries, the amount of communication and disk bandwidth required is proportional to the amount of data requested by the user.
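Both proportionality claims can be illustrated with simple box arithmetic. The sketch below assumes, hypothetically, that the meta data is just a list of boxes covering the stored data; a query is answered from the meta data alone by intersecting the requested region with those boxes, so only the overlapping pieces need to be read and sent. `plan_query` and the other names are illustrative, not FASTR's interface.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (ilo, jlo, ihi, jhi), bounds inclusive


def volume(box: Box) -> int:
    ilo, jlo, ihi, jhi = box
    return (ihi - ilo + 1) * (jhi - jlo + 1)


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Overlap of two boxes, or None if they are disjoint."""
    ilo, jlo = max(a[0], b[0]), max(a[1], b[1])
    ihi, jhi = min(a[2], b[2]), min(a[3], b[3])
    if ilo > ihi or jlo > jhi:
        return None
    return (ilo, jlo, ihi, jhi)


def plan_query(stored: List[Box], query: Box) -> List[Box]:
    """Pieces of stored boxes that overlap the query; only these are read and sent."""
    pieces = []
    for box in stored:
        piece = intersect(box, query)
        if piece is not None:
            pieces.append(piece)
    return pieces


stored = [(0, 0, 1, 1), (5, 5, 6, 6)]      # meta data: boxes of interesting data
domain = (0, 0, 6, 6)                      # the full 7x7 grid
archived = sum(volume(b) for b in stored)  # cells actually kept on disk
pieces = plan_query(stored, (0, 0, 3, 3))
transferred = sum(volume(p) for p in pieces)
print(archived, volume(domain), pieces, transferred)
```

In this toy case 8 of 49 cells are archived, and a query over a 4x4 corner moves only the 4 stored cells it actually touches.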

Using our techniques, our colleagues in Mechanical Engineering have obtained a better understanding of how turbulence and turbulent mixing evolve in a stably stratified flow under the influence of a background shear. This phenomenon is of paramount importance to the dynamics of ocean and lake thermoclines, and was studied with the aid of a volume tracking module based on Silver's technique, developed at Rutgers. Our automated tracking technique differs from Silver's in that it incorporates application-specific knowledge into the identification process. Such knowledge is vital for filtering out spurious information that would otherwise interfere with data analysis.
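The overlap-matching idea behind volume tracking can be sketched as below. This is a simplified illustration, not Silver's published algorithm and not our module: it assumes features have already been extracted as sets of grid cells, matches them between consecutive time steps purely by cell overlap, and uses a hypothetical `min_cells` size filter as a stand-in for application-specific knowledge that discards spurious features.

```python
from typing import Dict, FrozenSet, List, Tuple

Cell = Tuple[int, int]
Feature = FrozenSet[Cell]


def track(prev: List[Feature], curr: List[Feature],
          min_cells: int = 1) -> Dict[str, list]:
    """Match features in consecutive time steps by spatial overlap.

    Indices in the result refer to the lists after the size filter.
    """
    # Hypothetical application-specific filter: discard small, spurious features.
    prev = [f for f in prev if len(f) >= min_cells]
    curr = [f for f in curr if len(f) >= min_cells]
    fwd = {p: [c for c, g in enumerate(curr) if f & g] for p, f in enumerate(prev)}
    bwd = {c: [p for p in fwd if c in fwd[p]] for c in range(len(curr))}
    events: Dict[str, list] = {"continuation": [], "bifurcation": [],
                               "amalgamation": [], "birth": [], "death": []}
    for p, cs in fwd.items():
        if not cs:
            events["death"].append(p)
        elif len(cs) > 1:
            events["bifurcation"].append((p, cs))
        elif len(bwd[cs[0]]) == 1:
            events["continuation"].append((p, cs[0]))
    for c, ps in bwd.items():
        if not ps:
            events["birth"].append(c)
        elif len(ps) > 1:
            events["amalgamation"].append((ps, c))
    return events


prev = [frozenset({(0, 0), (0, 1)}), frozenset({(5, 5)})]
curr = [frozenset({(0, 1), (0, 2)}), frozenset({(9, 9)})]
print(track(prev, curr))               # the one-cell features appear and vanish
print(track(prev, curr, min_cells=2))  # the filter suppresses them as noise
```

With the size filter in place, the isolated one-cell blobs no longer register as births and deaths, which is the kind of spurious event that application knowledge is meant to suppress.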

We are currently applying FASTR to another CFD application in collaboration with colleagues in the UCSD MAE Dept.: vortex pairs in a stratified turbulent environment. This work is relevant to air traffic control and seeks to improve the understanding of aircraft wake vortices.

With increasing computational power to conduct large-scale simulations of complex physical phenomena, it is vital to manage both the amount of data produced and the time required to access the data. Users are beginning to accept that they can no longer archive large data sets, but must instead maintain them on-line (see, for example, DataCutter at Ohio State and U. Maryland). Our methodology supports this changing technological climate and offers the possibility of adaptive storage and retrieval of large scientific data sets.

This work involves CSE graduate student William Kerney, as well as Greg Balls. The application work involves Prof. Keiko Nomura (UCSD MAE Dept.), Dr. Peter Diamessis (USC Dept. of Aerospace and Mechanical Engineering), and Daniel Mahoney (UCSD MAE Dept.).

If you are interested in this capability, please send email to kelp@cs.ucsd.edu.

This work is currently supported by the National Partnership for Advanced Computational Infrastructure (NPACI) under NSF contract ACI9619020, and was supported in part by UC MICRO program award number 99-007 and by Sun Microsystems.




Last updated 03/26/2002 11:25:36 AM