[postgis-users] Best Practices for Configuring Large Data Sets

Fri May 28 08:35:25 PDT 2010

Bill,

   Is your PLSS data lines (yikes!) or areas (hopefully)?   And the shapefiles 
have the same attributes (or can be made that way prior to loading into 
PostGIS)?  Are the mining claims represented by points or areas?  If they are 
areas, I guess you could use the centroid in the query.

   I expect the query to fast enough as long as there is a spatial index on the 
PLSS table, but perhaps some more experienced PostGIS people could comment on 
that...

Best Regards,
Brent Fraser

Bill Thoen wrote:
> I've got a database of Public Land Survey System (PLSS) data  for most  
> of the contiguous United States, and I need to make it accessible to 
> spatial queries using PostgreSQL and PostGIS. My basic question is how 
> to physically configure it so that queries over any area are as 
> efficient as possible.
> 
> Right now it is all stored in Shapefiles,  physically separated by state 
> and within each state it's divided into three layers: township 
> boundaries, section boundaries and quarter-section boundaries. To give 
> you some idea of the scale of the entities, townships are square-ish 
> polygons 6 miles on a side and contain 36 sections. Sections are approx 
> 1 sq mi in size. The files take up some space too  (for example, the 
> data for Colorado is 862Mb and Wyoming is 763Mb.)
> 
> I need to use these data to perform queries on other national data sets 
> to provide results for requests like: "Produce a count of all active 
> mining claims west of the Mississippi (west of longitude 96W is good 
> enough) by PLSS section that intersect any of the Inventoried Roadless 
> Areas (IRAs)."
> 
> I think I know how to do queries like this, but it would be nice to not 
> have to do them for each state. OTOH, if I combine the data so that it 
> doesn't break at the state borders, every query is going to involve 
> whopper-sized tables and the system might be too slow. I've thought 
> about writing the queries starting out by filtering on an "Area of 
> Interest" rectangle first, taking advantage of spatial indexing, but I 
> have no feel yet for whether that will quickly enough reduce the load so 
> that the queries don't take months to execute.
> 
> So if you have experience with this sort of thing, could I get your 
> advice on how to balance files sizes to optimize performance and 
> convenience?  Also if you know of any "red flag" conditions that I 
> should watch out for where things become unstable or performance goes to 
> pot (like MS Access' 2Gb limit. You DON'T want to get near that!) I'd 
> appreciate knowing about them before stepping into them.
> 
> TIA,
> - Bill Thoen
>