[gdal-dev] Re: Fastest vector format for combining shapefiles

Peter J Halls P.Halls at york.ac.uk
Wed Oct 7 04:18:13 EDT 2009


Perhaps the new ESRI File Geodatabase might be something to explore, although it 
is not yet supported by GDAL because publication of the format is not due until 
2010.  It is limited in size only by any operating system restrictions and 
appears set to replace the shapefile in due course.  However, I would worry 
about storing the volumes of data you describe in this sort of format, because 
files of that size will always introduce manipulation issues.

If you need to be able to 'dump' chunks of your data holdings for customers, do 
these have to be on physical media?  For example, in Oracle (and probably 
PostGIS) you could create a view specific to the customer and then use a web 
service to deliver the data to the customer.  Maybe that is something to ponder.
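[Editor's note: the "view per customer" idea can be sketched with the Python standard library's sqlite3 module; the CREATE VIEW statement is essentially the same in Oracle or PostGIS. The table, column, and customer names here are invented for illustration.]

```python
import sqlite3

# Toy illustration of a per-customer view: each customer sees only
# their own rows, and an export tool can read straight from the view.
# Table and data are invented; a real setup would use Oracle/PostGIS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (id INTEGER, owner TEXT, wkt TEXT)")
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?)",
    [(1, "acme", "POINT(1 1)"), (2, "acme", "POINT(2 2)"), (3, "other", "POINT(3 3)")],
)
# One view per customer; the customer never queries the base table.
conn.execute("CREATE VIEW acme_parcels AS SELECT * FROM parcels WHERE owner = 'acme'")
rows = conn.execute("SELECT id FROM acme_parcels ORDER BY id").fetchall()
print(rows)  # [(1,), (2,)]
```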

Best wishes,

Peter

Rahkonen Jukka wrote:
> Hi,
> 
> Maybe I need to clarify my aim a bit. This is a huge dataset with hundreds of layers, and we do not actively use all of them.  The layers we need we will insert into an Oracle database, but that is a managed, hosted production system and far too expensive to use as a backyard storage shed.  What I need is a handy store from which I can easily take out the layers I need. The possibility to use a spatial window for the excerpt would be a nice benefit. Up to a 4-6 gigabyte file size, Spatialite seems to be about the optimal solution.  It is a real database that supports queries, but all the data is still stored in one transferable file. It is much more complicated with, say, Oracle or PostGIS: you can't just write the database to a CD or DVD and send it to your customer.
> 
> Most probably I will split the big layer into two or three chunks and write them to separate Spatialite files.
> 
> -Jukka-
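[Editor's note: the splitting step above can be sketched as a greedy grouping of shapefiles by cumulative size, so that each group stays under a chosen cap, e.g. the 3-4 GB range where Spatialite appends are still fast. File names and sizes below are invented.]

```python
# Hypothetical sketch: group input shapefiles into chunks whose
# cumulative size stays under a cap; each chunk would then be loaded
# into its own Spatialite file. Sizes are in bytes.
def chunk_by_size(files, cap):
    """files: list of (name, size) pairs; returns a list of name-lists."""
    chunks, current, total = [], [], 0
    for name, size in files:
        # Start a new chunk when adding this file would exceed the cap.
        if current and total + size > cap:
            chunks.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        chunks.append(current)
    return chunks

# Ten 12 MB files with a 50 MB cap split into groups of 4, 4 and 2.
files = [("sheet_%03d.shp" % i, 12 * 1024**2) for i in range(10)]
print([len(c) for c in chunk_by_size(files, 50 * 1024**2)])  # [4, 4, 2]
```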
> 
> Guillaume Sueur wrote:
> 
>> Hi,
>>
>> Interesting topic!
>> The most efficient way is the one which fits your needs best.
>> Forget the one-big-shapefile idea, but I think you can either take a
>> database approach (either PostGIS or Oracle) or a file
>> approach (your thousands of shapefiles).
>> It depends on what you have to do with the data and how you will
>> retrieve it. If you plan to do attribute queries,
>> classifications and filtering, go for a database, as
>> extracting data quickly is a database's job.
>> If you display/draw/use all the content of your data at once,
>> the file approach will be best. Note that you can
>> optimize shapefiles by creating a spatial index on them
>> (the shptree command) and a global index of your set (with
>> ogrtindex; see
>> http://mapserver.org/optimization/tileindex.html). The data will
>> be much easier to handle with such a global file pointing to
>> your various files.
>> You can even index that one with shptree too.
>> But it can be hard to manage when your data gets updated...
>>
>> My 2 cents,
>>
>> Guillaume
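[Editor's note: the tile-index idea behind ogrtindex can be illustrated with a toy sketch: a small table maps each shapefile to its bounding box, so a spatial-window query only needs to open the files whose boxes intersect the window. Names and boxes below are invented.]

```python
# Toy tile index: shapefile name -> bounding box (xmin, ymin, xmax, ymax).
# A real tile index (ogrtindex) stores the same mapping as features.
index = {
    "sheet_001.shp": (0, 0, 10, 10),
    "sheet_002.shp": (10, 0, 20, 10),
    "sheet_003.shp": (0, 10, 10, 20),
}

def candidates(window, index):
    """Return the files whose bounding boxes intersect the query window."""
    wx0, wy0, wx1, wy1 = window
    return sorted(
        name
        for name, (x0, y0, x1, y1) in index.items()
        if x0 <= wx1 and x1 >= wx0 and y0 <= wy1 and y1 >= wy0
    )

print(candidates((8, 8, 12, 12), index))   # all three sheets touch this window
print(candidates((15, 2, 16, 3), index))   # only sheet_002
```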
>>
>> On Wednesday, 07 October 2009 at 09:48 +0300, Rahkonen Jukka wrote:
>>> Jukka Rahkonen writes:
>>>
>>>> Hi,
>>>>
>>>> I am combining some GIS data where each layer is divided into around a
>>>> thousand separate shapefiles by mapsheets. Now I would like to store
>>>> all the 35000 shapefiles in something that is easier to handle. At
>>>> first, pushing each layer into its own Spatialite database felt
>>>> perfect, but I have problems with one layer which has rather a lot of
>>>> data. Appending shapefiles one by one to a Spatialite database gets
>>>> too slow after the database file has reached a size of around 6
>>>> gigabytes. Up to a 3-4 gigabyte file size, appending data to
>>>> Spatialite is pretty fast, and because it is a database I guess I
>>>> will use it for small layers.  But what might be the fastest vector
>>>> format that OGR supports for collecting the big layer (a thousand
>>>> shapefiles with a total size of about 10 gigabytes) together?  I
>>>> would prefer some file-based format because the data goes to
>>>> long-term storage, but I can use Oracle or PostGIS in between if it
>>>> is faster to do the conversion in two steps.  What is recommended?
>>>> Shapefiles, MapInfo TAB, Oracle, PostGIS or something else?
>>> I can tell now that the shapefile format is not suitable at all. The
>>> .shp part obviously cannot go over the 2 GB limit, because after that
>>> ogr2ogr throws these error messages:
>>> ERROR 1: Error in psSHP->sHooks.FSeek() or fwrite() writing object to
>>> .shp file.
>>>
>>> The .dbf part seems not to have such a size limit; it grew to 32
>>> gigabytes.
>>> I will try MapInfo TAB before I conclude that it is best just to keep
>>> the 1000 shapefiles, or to upload them all to PostGIS.
>>>
>>> -Jukka-
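[Editor's note: one likely contributor to the slowdown when appending shapefiles one by one is transaction overhead, since each autocommitted INSERT pays for its own commit. The sketch below uses the stdlib sqlite3 module as a stand-in for Spatialite to show the batched pattern: one transaction around many inserts. ogr2ogr's -gt option similarly groups features per transaction.]

```python
import sqlite3

# Stand-in for a Spatialite layer: insert many rows inside a single
# transaction instead of one commit per row. Schema and data invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (id INTEGER, wkt TEXT)")
rows = [(i, "POINT(%d %d)" % (i, i)) for i in range(10000)]
with conn:  # one transaction wraps the whole batch
    conn.executemany("INSERT INTO features VALUES (?, ?)", rows)
n = conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]
print(n)  # 10000
```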
>>> _______________________________________________
>>> gdal-dev mailing list
>>> gdal-dev at lists.osgeo.org
>>> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>>>
>>

-- 
--------------------------------------------------------------------------------
Peter J Halls, GIS Advisor, University of York
Telephone: 01904 433806     Fax: 01904 433740
Snail mail: Computing Service, University of York, Heslington, York YO10 5DD
This message has the status of a private and personal communication
--------------------------------------------------------------------------------

