A Virtual Sea

Submarine Topography -- Database Development


Most Recent Bathymetry Dataset For The Sea Of Cortez And Surrounding Pacific Ocean
Including GEODAS Data, Satellite Altimetry Below 1000 meters, and DBDB-V Interpolated Data

Introduction

I am attempting to develop a master database of bathymetric and terrestrial heightfield data for the Sea of Cortez and surrounding waters. This page contains notes on this work. The picture at the top of this page is a current visualization of the bathymetric and terrestrial database. The current geographical boundaries for the database are 34N 117.5W, 19N 104W. This region includes terrestrial watersheds that are important to the Gulf of California.

Notes

Master Database Design

  • Fundamental Considerations
    • Since marine life is ultimately keyed to space and time, the master database must use space and time in its fundamental structure.
    • The database must be dimensionally expandable to accommodate data for any number of biological and physical processes.
    • The database must be useable by both internally developed visualization software and certain commercial products.
      • GIS products
      • IDL
      • DEM viewers like Dlgv32.
    • The database must be useable over the Internet.
  • Basic Structure
    • The use of a geometric grid to contain spatial data is an efficient and compact data structure
      • The geographic location of a data element is represented by its position on the grid.
      • Therefore the actual geographic position does not have to be stored.
        • Only the starting position and the grid resolution need be provided (usually in a header).
    • The grid structure can be used to store heightfield (bathymetric and terrestrial) data.
      • Each grid cell represents a given latitude and longitude.
      • Each grid cell can either supply data to visualization software or the entire grid can become a bitmap.
    • The grid structure can also be used to store data on biological and physical processes.
      • One grid is used for each data type (e.g. biomass, water temperature)
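As a minimal sketch of the implicit positioning described under Basic Structure, the Python below derives a cell's latitude and longitude from a header's starting position and grid resolution, and does the reverse lookup. The `GridHeader` fields and the function names are hypothetical, chosen for illustration only. The example values in the usage note assume the 34N 117.5W corner and a 30 arc second cell.

```python
from dataclasses import dataclass


@dataclass
class GridHeader:
    """Hypothetical header: everything needed to locate any cell."""
    ul_lat: float    # latitude of the center of the upper-left cell (degrees north)
    ul_lon: float    # longitude of the center of the upper-left cell (degrees east, west negative)
    cell_deg: float  # cell size in degrees (30 arc seconds = 1/120 degree)
    nrows: int
    ncols: int


def cell_to_latlon(h, row, col):
    """Return the (lat, lon) of a cell center; row 0 is the northernmost row."""
    return h.ul_lat - row * h.cell_deg, h.ul_lon + col * h.cell_deg


def latlon_to_cell(h, lat, lon):
    """Return the (row, col) of the cell nearest to a geographic position."""
    row = round((h.ul_lat - lat) / h.cell_deg)
    col = round((lon - h.ul_lon) / h.cell_deg)
    if not (0 <= row < h.nrows and 0 <= col < h.ncols):
        raise ValueError("position lies outside the grid")
    return row, col


def flat_index(h, row, col):
    """Offset of a cell in a row-major array holding the whole grid."""
    return row * h.ncols + col
```

For example, with `GridHeader(34.0, -117.5, 1/120, 1800, 1620)`, calling `latlon_to_cell(h, 33.0, -115.5)` gives the row and column one degree south and two degrees east of the corner. No per-cell coordinates are stored anywhere.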
  • Macro Structure
    • The use of a spatial grid structure for any one data element permits "stacking" of grids into an expandable and multidimensional database.
    • Only one DEM style header needs to be used for all data grids because the header defines spatial information for all grids.
    • A metadata header is also required to describe the data contained in each grid plane.
  • Utilization
    • Since arrays are a fundamental construct of most programming languages, a grid structured database is easy to program without using a lot of sophisticated and/or proprietary database libraries.
    • Common Digital Elevation Model (DEM) datasets use the grid structure approach
      • A combined marine and terrestrial grid structured database can be used by many common DEM viewers.
    • The DEM grid model can be incorporated into the HDF format.
      • The HDF format can be used by some commercial visualization software including IDL.
      • Since the HDF model has metadata capabilities, it can be used over the Internet.
    • The grid model, which is a raster database, can be accessed by commercial GIS software.
    • Quadcode mapping can be accomplished with the grid model
      • Territories can be represented at different resolutions in different areas.
      • Important in marine research because data come from a wide variety of sources with different spatial resolutions.
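The "stacking" idea under Macro Structure can be sketched as a set of named planes that share one spatial header, with a small metadata record per plane. This is only an illustration under assumptions: the class name `GridStack`, its methods, and the metadata fields are hypothetical, and the planes are held as 16-bit integer arrays to match the heightfield plane.

```python
from array import array


class GridStack:
    """Named data planes that all share one grid shape (one spatial header)."""

    def __init__(self, nrows, ncols, nodata=32767):
        self.nrows = nrows
        self.ncols = ncols
        self.nodata = nodata
        self.planes = {}    # plane name -> row-major array of signed 16-bit values
        self.metadata = {}  # plane name -> description of what the plane holds

    def add_plane(self, name, units, description):
        """Add a plane filled with the no-data value, plus its metadata."""
        self.planes[name] = array("h", [self.nodata]) * (self.nrows * self.ncols)
        self.metadata[name] = {"units": units, "description": description}

    def set(self, name, row, col, value):
        self.planes[name][row * self.ncols + col] = value

    def get(self, name, row, col):
        return self.planes[name][row * self.ncols + col]

    def cell(self, row, col):
        """Return every plane's value at one location (a vertical 'core' through the stack)."""
        idx = row * self.ncols + col
        return {name: plane[idx] for name, plane in self.planes.items()}
```

Adding a new data type is then a single `add_plane` call. The spatial header does not change, which is what makes the stack dimensionally expandable.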

Bathymetric and Terrestrial Heightfield Data (Fundamental data plane)

  • Basic Database Design
    • Fundamental grid design is based on the GTOPO30 format
      • 30 arc second (about 900 meter) resolution grid.
      • Height data are signed short integers (2 bytes)
      • Row major ordering.
      • Binary data file (*.dem).
        • Data are stored in Motorola byte format ("big endian") which stores the most significant byte first.
        • DEC Alpha and most PCs use the Intel byte order ("little endian") which stores the least significant byte first.
        • We'll continue using the big-endian binary format for storage
          • Makes our data available to Macs and Unix machines.
          • Most DEM viewers (e.g. Dlgv32) require big-endian.
          • Can easily convert the data as it's brought into our own viewers.
      • Header file (*.hdr)
        • Holds information on the fundamental data format and layout
          • Byte order
          • Number of rows and columns
          • NaN value
            • GTOPO30 uses -9999 as a numerical indicator for an unknown (NaN) value
            • Since some depths might be deeper than -9999 meters, the NaN value for extracted datasets will be 32767.
          • Latitude and longitude of the center of the left uppermost cell.
          • X and Y dimension (precision) of a cell.
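The storage layout above (row-major, signed 2-byte integers, big-endian, 32767 for unknown values) can be sketched with the Python standard library as follows. The function names are hypothetical, and the row and column counts are assumed to have been read from the *.hdr file already. The sketch also assumes the platform's `array("h")` item is 2 bytes, which holds on common systems.

```python
import struct
import sys
from array import array

NODATA = 32767  # unknown-value marker used for the extracted datasets


def write_dem(path, rows):
    """Write a list of rows of integer heights as big-endian 16-bit values."""
    with open(path, "wb") as f:
        for row in rows:
            # ">" forces big-endian regardless of the machine's byte order
            f.write(struct.pack(f">{len(row)}h", *row))


def read_dem(path, nrows, ncols):
    """Read a big-endian 16-bit DEM file into a list of rows."""
    data = array("h")
    with open(path, "rb") as f:
        data.fromfile(f, nrows * ncols)
    if sys.byteorder == "little":
        data.byteswap()  # convert to native order on Intel-style machines
    return [data[r * ncols:(r + 1) * ncols].tolist() for r in range(nrows)]
```

The byte swap happens once, as the data are brought into memory, so the file on disk stays in the big-endian form that DEM viewers expect.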
  • Bathymetric and Terrain Data Sources
    • Digital Bathymetric Database - Variable (DBDB-V) From The Naval Oceanographic Office (NAVOCEANO)
      • A digital bathymetric database that provides ocean depths at various gridded resolutions. This online database may be queried by specifying point location, an arc of a great circle, or a bounding rectangle. Information and specifications about the database are available from NAVOCEANO.
      • I extracted their bathymetry data for our geographic range. Their data are derived by digitizing bathymetric contours of hard copy maps. Their grid resolution (1, 2, and 5 arc minutes) varies by geographic area. Missing values are filled in using a multi-stage minimum curvature spline algorithm which interpolates the digitized values to derive a depth value for each node. The Web site permits one to receive interpolated data down to 0.5 arc minute spacing. Shoreline discrepancies are resolved by creating a land mask using the World Vector Shoreline, or higher resolution shorelines.
      • After downloading the data at a 30 arc-second interpolated resolution, I created a GTOPO30 DEM of their data and used this DEM as the basis for building our database
        • There appear to be some terrain inconsistencies in the land mask. Since I can resolve these inconsistencies by merging the standard GTOPO30 terrain data into the DEM file, I used DBDB-V to build my first DEM into which the other data sources (terrain and bathymetric) would be merged.
    • GTOPO30 Database
      • This format has world-wide terrestrial coverage but no bathymetric data.
      • There are good terrestrial data for areas that surround the Gulf of California.
      • We need surrounding terrestrial data to portray watersheds and coastal wetlands.
      • For the Sea of Cortez, the GTOPO30 DEM file to use is the W140N40 tile.
      • The data for our geographic area were extracted from this tile and merged into the master GTOPO30 DEM.
    • The GEODAS bathymetry data set comes from the Marine Trackline Geophysics database CD (Version 4.0)
      • Data are a collection of 30 years of single and multibeam echo soundings from various world-wide institutions.
      • A coastline data set is also on the CD. These data were merged into the master database to set sea level points for use in interpolation.
      • Software supplied on the GEODAS CD permits the extraction of data (in ASCII format) for a selected geographical area.
      • CD costs $250.00.
    • Bathymetric Estimates Based On Gravimetric Information -- Sandwell and Smith at Scripps and NOAA.
      • 2 minute resolution.
        • Longitude cell size is 2 minutes
        • Latitude cell size is 2 minutes * cos(latitude).
      • Combines a corrected version of the GEODAS bathymetry data set with estimates generated from gravity data derived from satellite altimetry. According to Smith "The depth data were obtained by screening 6905 surveys from the NGDC (Marine Trackline Geophysics CD-ROM version 3.2), the Scripps Institute of Oceanography and Lamont - Doherty Earth Observatory databanks, and other data, using quality control procedures based on those of Smith [J. Geophys. Res. 98, 9591-9603, 1993] The satellite gravity field combines all data from the ERS-1 and GEOSAT satellites including the data declassified in 1995".
      • In email correspondence with David Sandwell of Scripps, I was cautioned not to use any gravimetric data at depths of less than 1000 meters. David says that "The gravity data provide almost no information in the shallow ocean." He also stated that "We did the margins just to make it look nice." Based on David's comments, I'm using only the gravimetric bathymetry estimates from his dataset and only those values at 1000 meters or deeper.
      • Data are considered "estimates" because derivative processes are used.
        • Because these derivative processes use Fourier techniques, I suspect that the results are realistically smoothed just as in the fractal terrain forgeries that use Fourier methods.
      • Smith provides overviews and detailed descriptions of his process. He also provides a very clear readme file which nicely defines his file format.
      • Scripps provides the full dataset for download. This worldwide file is huge (136 MB). The download is free but there will apparently be a CD available soon. Cost unknown.
      • While he did his work using GMT (the Generic Mapping Tools) on Unix, Smith provides nice C code snippets that can be used to extract portions of his data. Ergo, we don't have to "reinvent the wheel" for the NT extraction code.
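The cell-size rule quoted above for the gravimetric grid (2 minutes of longitude, 2 minutes * cos(latitude) of latitude) is easy to evaluate for the latitudes covered by this database. The helper below is a hypothetical illustration of that formula only. It does not read the Smith file.

```python
import math


def latitude_cell_minutes(lat_deg, lon_cell_minutes=2.0):
    """Latitude extent of one cell, in arc minutes, at the given latitude."""
    return lon_cell_minutes * math.cos(math.radians(lat_deg))


# Latitude cell sizes at the southern and northern limits of the database
south = latitude_cell_minutes(19.0)
north = latitude_cell_minutes(34.0)
```

At both limits the cell is somewhat under 2 minutes tall, which is still several times coarser than the 30 arc second (0.5 minute) master grid. Each gravimetric estimate therefore leaves several master grid cells to be estimated around it.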
  • Current Data Reliability Issues
    • The database still has numerous unknown data points at a 30 arc second resolution. Most of these unknown points are in wetland areas.
    • In addition, the DBDB-V interpolated data do not match well with the GEODAS trackline bathymetry below 29 degrees north. This is probably because the DBDB-V database below 29 degrees north uses a 5 minute grid basis for its interpolations instead of the 1 minute grid basis that it uses at and above 29 degrees.
  • Data Extraction Process
    • From data extracted from W140N40, create a GTOPO30 file for the designated geographic area.
    • Extract GEODAS ship track bathymetry data for the designated geographic area and place them in the new GTOPO30 file.
    • Extract GEODAS coastline data and place them in the new GTOPO30 file.
    • Extract satellite generated bathymetry estimates from the Smith file and place them in the new GTOPO30 file.
    • Use the ordinary kriging methods described below to estimate the remaining unknown points at a 30 arc second resolution.
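The merge steps above can be sketched as repeated passes that copy values from a source grid into the master grid. Two things here are assumptions rather than facts from these notes: that a later source only fills cells that are still unknown (earlier sources win), and that depths are stored as negative heights in meters. The function names are hypothetical. The `deep_only` filter reflects the rule of using gravimetric estimates only at 1000 meters or deeper.

```python
NODATA = 32767  # unknown-value marker in the extracted grids


def deep_only(value):
    """Accept a gravimetric estimate only at 1000 meters depth or deeper."""
    return value <= -1000


def merge_into(master, source, accept=lambda value: True):
    """Fill unknown cells of master from source; return how many were filled.

    Both grids are lists of rows with identical shape. Cells that already
    hold a value in master are left alone (assumed policy).
    """
    filled = 0
    for r, row in enumerate(source):
        for c, value in enumerate(row):
            if master[r][c] == NODATA and value != NODATA and accept(value):
                master[r][c] = value
                filled += 1
    return filled
```

A run would call `merge_into(master, ship_tracks)`, then `merge_into(master, coastline)`, then `merge_into(master, gravimetric, accept=deep_only)`, leaving the cells that are still `NODATA` for the kriging step.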
  • Geospatial Interpolation Methods To Improve The Current Dataset
    • There are a number of techniques for interpolating geospatial data that are described in lecture notes and outlines from various institutions.
    • Unless I find something better, ordinary kriging has been chosen to estimate the remaining unknowns.
      • Most other interpolation methods are either too linear (e.g. triangulation) or handle end points very poorly (e.g. splines).
      • Advantages of kriging
        • Estimate is based on the distance the unknown point is from known points. The farther a known point is from the unknown, the less the influence it has on estimation of height. This makes a lot of sense in terrain heightfield estimations because nearby terrain has a much stronger influence on the height estimate for the unknown.
        • Estimate can be obtained using a non-linear random function base model. And, one can choose the base model that is used. Our world is rarely linear.
        • The range of influence can be pre-defined. Kriging permits the exclusion of known data if it exceeds a certain range of influence. This makes sense because known height data 50 km away probably has little influence on a local heightfield.
      • Disadvantages of kriging
        • Like any other stochastic process, its relationship to the real world is subject to the model that is chosen.
        • It is a mathematically intensive process that can bring a computer to creep speed.
          • There are lots of square roots as distances are being computed.
          • Matrix inversion and multiplication are required.
        • An effective kriging process requires careful thought about the search strategy that is used to find and use known values.
          • A large search window may bring in too many known local data points and drastically slow the computer down as distances are calculated.
          • A poorly defined strategy for moving the search window will result in some unknowns remaining as unknowns.
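To make the costs and choices listed above concrete, here is a minimal ordinary kriging sketch for a single unknown point, written in plain Python. It is not the code used for this database. The exponential variogram and its `sill` and `rng` values are placeholder assumptions standing in for "the base model that is chosen", and the function names are hypothetical. The sketch solves the kriging system directly by Gaussian elimination rather than forming an explicit matrix inverse.

```python
import math


def variogram(h, sill=1.0, rng=10.0):
    """Exponential variogram model (an assumed choice of base model)."""
    return sill * (1.0 - math.exp(-3.0 * h / rng))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular kriging system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ordinary_krige(knowns, target, max_points=10, max_range=None):
    """Estimate the height at target=(x, y) from knowns=[(x, y, z), ...]."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Nearest knowns first; optionally drop those beyond the range of influence
    pts = sorted(knowns, key=lambda p: dist(p, target))
    if max_range is not None:
        pts = [p for p in pts if dist(p, target) <= max_range]
    pts = pts[:max_points]
    if not pts:
        raise ValueError("no known points within range")
    n = len(pts)
    # Kriging system: variogram between knowns, bordered by the
    # Lagrange row and column that force the weights to sum to 1
    a = [[variogram(dist(pts[i], pts[j])) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(dist(p, target)) for p in pts] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * p[2] for w, p in zip(weights, pts))
```

The `max_points` and `max_range` arguments correspond to limiting the number of knowns and the range of influence. The size of the linear system, and so the solve time, grows quickly with the number of knowns admitted.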
      • My current kriging process
        • Methods to speed up the process
          • Each search/computation window is limited to a 9x9 cell matrix. This limits the number of data points and the search distance to reasonable values.
          • Precompute a 20x20 distance matrix on the first kriging operation. This process eliminates the need for any further distance calculations.
          • Limit the number of known values in the kriging computation to the 10 closest to the unknown.
        • Search and computation strategy (all using a 9x9 cell window - 8.1km square)
          • Start in the lower left (SW) corner.
          • Within the window, search out and list all known points. If there are fewer than 8 knowns, or if all the knowns are zero (the beach), shift the window block one cell to the right. (Fewer than 8 knowns might only happen with the first computation set because subsequent window sets include some of the previously calculated data.)
          • Using a predetermined primary computation order for 9 cells (four corner cells, then middle edge cells between the corners, etc.)
            • Sort the list of knowns in order of their distance from the unknown.
            • Using no more than 10 of these knowns, perform a kriging operation to estimate the unknown.
            • Store the result.
            • If a cell defined in the primary computation order contains a known value, don't do the above computation.
            • This computational pass is based on bathymetric data only.
          • Using a predetermined secondary computation order for 16 cells (in between the previously computed primary cells)
            • Sort the list of known bathymetric data and the estimates calculated in the primary process in order of their distance from the unknown.
            • Using no more than 10 of these points, perform a kriging operation to estimate the unknown.
            • Store the result.
            • If a cell defined in the secondary computation order contains a known value, don't do the above computation.
          • Using similar procedures, compute estimates for the remaining cells.
          • After finishing the solution process in a window, shift the cell matrix to the right by 8. In shifting by 8, the left column contains data from the previous window.
          • If at the max right cell matrix, reset cell matrix indices to far left and shift up 8 cells. In shifting up by 8, the lower row contains data from the previous window.
          • Continue the process until all areas are completed.
          • Redo the first window if there are missing data
  • Interpolation Of The Dataset At Precisions Greater Than 30 Arcseconds Using Fractal Interpolation
    • At 30 arcseconds resolution, the resulting 3D rendering provides only a general idea of the shape of the submarine terrain. It would be interesting to find a way to estimate more detail as the viewer's eye gets closer.
    • One possible estimation technique is to:
      • Supply heightfield anchor points from the GTOPO30 grid.
      • Create a second GTOPO30 grid that has georeferenced data on the substrate roughness -- which is usually known. This roughness can be expressed as fractal dimension.
      • With the anchor points and fractal dimension, a fractal heightfield can be generated within a grid cell. Using RTIN techniques, this finer resolution could be rendered within the known dataset.
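One common way to generate fractal detail between anchor points is random midpoint displacement. The sketch below applies it to a one-dimensional profile between two anchor heights as a simplified illustration of the idea. It is a swapped-in textbook technique, not a method specified in these notes, and the function name and the `scale` parameter (standard deviation of the first displacement, in meters) are hypothetical. For a profile, the Hurst exponent is taken as H = 2 - D, where D is the fractal dimension.

```python
import random


def fractal_profile(z_left, z_right, fractal_dim, levels, scale, seed=None):
    """Subdivide the span between two anchor heights `levels` times.

    Returns 2**levels + 1 heights. The anchors are kept exactly, and each
    new midpoint is the average of its neighbors plus a random offset
    whose size shrinks by 2**-H per level.
    """
    if not 1.0 <= fractal_dim <= 2.0:
        raise ValueError("profile fractal dimension must lie between 1 and 2")
    hurst = 2.0 - fractal_dim
    rng = random.Random(seed)
    pts = [float(z_left), float(z_right)]
    amplitude = scale
    for _ in range(levels):
        refined = []
        for a, b in zip(pts, pts[1:]):
            refined.append(a)
            refined.append((a + b) / 2.0 + rng.gauss(0.0, amplitude))
        refined.append(pts[-1])
        pts = refined
        amplitude *= 0.5 ** hurst  # rougher substrate (higher D) decays more slowly
    return pts
```

A higher fractal dimension keeps the displacements large at fine scales and gives a rougher profile. A dimension near 1 approaches straight-line interpolation between the anchors. Extending this to a two-dimensional cell would need a surface variant such as diamond-square, which is not shown here.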