[postgis-users] Enormous file geodatabase feature class

Webb Sprague webb.sprague at gmail.com
Fri Mar 7 11:07:06 PST 2008


Hi Dana,

Very interesting -- thanks for sharing.  I am not going to respond
item by item, but I wanted to mention that (1) I don't see how you can
avoid writing your own tools for merging (pretty specialized stuff),
(2) if you know Perl, it has plenty o' stuff to script this, and ...

(3) One pattern that might be interesting for merging all the 100's of
shapefiles (which may turn into 100,000s if you are successful and get
more funding, right?) is a general script that calls county-specific
filters to standardize the input on its way to the big table; then
when you get a new county's worth of data you just write a new filter
specification (I can imagine a config-like syntax that specifies input
column, output column, and a function to convert if necessary, or some
such) and call it in the big script.  You can store the filter specs
systematically in the database too, similar to the way PostGIS stores
proj command lines in the spatial_ref_sys table.
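A minimal sketch of that filter-spec idea in Python (all column and
county names here are hypothetical, just to show the shape of the
config; the real specs could equally be rows in a table, like the proj
strings in spatial_ref_sys):

```python
# Config-driven filter: each county gets a spec mapping its input
# columns to the standardized output columns, with an optional
# conversion function per column. All names are made up.

# One spec per county: output column -> (input column, converter)
COUNTY_SPECS = {
    "buncombe": {
        "pin":   ("PIN2", str.strip),
        "owner": ("OWNNAME", str.strip),
        "state": ("ST", lambda v: v.strip().upper()),
    },
    "watauga": {
        "pin":   ("ID", str.strip),
        "owner": ("OWNER_NM", str.strip),
        "state": ("STATE", lambda v: v.strip().upper()),
    },
}

def standardize(county, record):
    """Apply a county's filter spec to one raw attribute record,
    returning a record with the standardized column names."""
    spec = COUNTY_SPECS[county]
    return {out_col: convert(record[in_col])
            for out_col, (in_col, convert) in spec.items()}
```

When a new county's data arrives, you add one spec instead of touching
the big script.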

PostGIS doesn't play nice with rasters, so you may have to vectorize
them somehow if you are going to do SQL with such data.  Somebody who
knows more than me should chime in here.

My guess is that 1G shapefiles won't be a problem at all if there is
enough hardware, especially since you aren't displaying anything in
PostGIS.
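For the actual load, a small driver that generates shp2pgsql pipelines
per county might look like this (a sketch only: the SRID, file, table,
and database names are made up; shp2pgsql's -c/-a/-s flags are the
standard create/append/SRID options):

```python
# Sketch: build "shp2pgsql | psql" commands for a batch of county
# shapefiles, creating the big parcels table from the first file and
# appending the rest. Paths and names are hypothetical.

def load_command(shapefile, srid=2264, table="parcels", db="erma",
                 append=False):
    """Build one 'shp2pgsql | psql' pipeline for a county shapefile."""
    mode = "-a" if append else "-c"   # -a append, -c create table
    return ("shp2pgsql %s -s %d %s %s | psql -d %s"
            % (mode, srid, shapefile, table, db))

counties = ["buncombe.shp", "watauga.shp", "ashe.shp"]
commands = [load_command(shp, append=(i > 0))
            for i, shp in enumerate(counties)]
```

You could run the standardizing filters first and feed shp2pgsql the
cleaned files, or load raw and fix columns with SQL afterwards; either
order works.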

If you put up a wiki or anything, let the list know; and if you do
something big, Refractions would probably be interested in a case
study.

-W

On Fri, Mar 7, 2008 at 2:53 AM, dnrg <dananrg at yahoo.com> wrote:
> Hi Webb,
>
>  Thanks for responding.
>
>  > Could you describe what you mean by "standardize"
>  > with an example? And do you mean "standardized
>  > against each other" or "standardized
>  > against a third specification"?
>
>  Sure.
>
>  *Input*: 14 counties of parcel data; PIN number may be
>  named PIN2, ID, etc, etc. And each separate data set
>  has a varying number of attribute columns (some up to
>  30+ columns).
>
>  *Desired output*: a merged parcels data set in
>  PostgreSQL/PostGIS with only ~8 selected attribute
>  columns, with names of my choice. I guess
>  that would be a third specification. Something like:
>  PIN, COUNTY, OWNER, ADDR, ADDR2, CITY, STATE (US
>  data), ZIP. Just enough to identify parcel owners, and
>  for parcel owners to identify their own parcels.
>  Obviously, data types need to jibe; and it will be fun
>  to free attribute columns from wasteful types like 255
>  chars for STATE, or whatever; I've noticed that many
>  of these county parcel data shapefiles are enormous in
>  part  because the creators seem to accept character
>  data type default lengths of 255.
>
>  Once in PostgreSQL/PostGIS, I can perform the analysis
>  I need, adding additional columns for the projected
>  wind, solar, and microhydro energy potential of each
>  parcel.
>
>  Incidentally, if anyone wants to help, the project is
>  called ERMA / NC ERMA - the renewable Energy Resource
>  Mapping Application for North Carolina. No project web
>  site for it yet.
>
>  Helena Mitasova at NC State will be advising us on the
>  solar module (using GRASS--evidently there is already
>  a good module / model for this, and it has been used
>  in Europe for assessing solar energy potential). Tobin
>  Bradley from Charlotte, NC has offered to help in some
>  capacity. I'm trying to put together a list of
>  volunteers / advisors.
>
>  And we have wind class rasters for all of North
>  Carolina (the data is a bit old, coming from TrueWind
>  LLC pre-LIDAR). My guess is that they used weather
>  station data + 10 meter DEMs, but who knows. Would
>  love to generate some new wind raster data using the
>  latest LIDAR, but the TrueWind model is proprietary.
>
>  We don't yet have a good algorithm for microhydro
>  potential; although we do have some of the best LIDAR
>  data in the US.
>
>  > I bet you will have to write it yourself. It sounds
>  > like a big project. I could be wrong.
>
>  May end up having to do the merge manually. Not so
>  terrible for 14 counties. Will be painful for all 100
>  counties, and for future updates--obviously parcel
>  data isn't static, and can change weekly.
>
>  ERMA Phase I is for residential wind energy potential,
>  and the greatest wind energy potential is in the
>  Appalachian mountains (~14 Western NC counties).
>  Probably going to use MapServer as a platform to let
>  citizens discover the energy resources of their
>  parcels. Unless someone has a better suggestion.
>
>  > Are you getting shapefiles or geodatabases as source
>  > data?  I see no reason to import a shapefile into a
>  > geodb/ access thing (yuck!) as an
>  > intermediate step.
>
>  All shapefiles. Huge, honkin' shapefiles. One of them
>  is over 1G. This is the one QGIS choked on (but that
>  was QGIS on Windows--I still need to buy a new,
>  dual-core laptop with ~3-4G of RAM and put CentOS on
>  it).
>
>  > Postgresql is very scriptable, and would be my
>  > platform of choice for any big integration of 100's
>  > county tables.  Or is there something I am missing?
>
>  Nope, sounds great.
>
>  > [ ESRI's ] stored procedures and triggers it must
>  > use to maintain the consistency of the data.
>
>  > Is there a description of these somewhere? It would
>  > be nice if someone developed a standard set of
>  > postgis triggers to maintain topology (at least not
>  > allow inserts of malformed data), etc.
>
>  Nice idea. I personally wouldn't touch it, as this
>  would probably be violating some license agreement;
>  but the source for triggers and stored procedures are
>  probably viewable in SDE on Oracle through the
>  DBA_SOURCES view and other methods.
>
>  > Watch out: with open source whenever you say
>  > "someone could do X", the obvious retort is "why
>  > don't you do it and post the code?" :)
>
>  I'm slowly getting that. Probably takes me at least 3
>  times of reading / hearing something before it sinks
>  in. I know only a small amount of Python (having been
>  a Perl scripter in a previous life as a unix/nt
>  sysadmin), but hope to learn more. Don't think I'll be
>  casually picking up Java or C# any time soon (although
>  that's a goal I have for the distant future).
>
>  If I come up with code to programmatically do what I
>  need on parcels, I will gladly share it--and document
>  it. One of the drivers of ERMA is that we want it to
>  be successful, and built 100% on FOSS GIS, so that it
>  can be adopted by other states / countries / etc. It
>  could be done with ESRI technology, but not every
>  state / country has an ELA with ESRI (the state of
>  North Carolina has a statewide ELA with ESRI; but that
>  could evaporate in 5 years--who knows).
>
>  Another personal driver for me with ERMA is to learn
>  enough about GRASS / QGIS / PostgreSQL/PostGIS /
>  MapServer to contribute to the doc set and write
>  tutorials.
>
>  According to Karl Fogel in his fabulous book Producing
>  Open Source Software (thanks to whoever on this list
>  recommended it to me--I'd recommend it to other noobs
>  as well), documentation--especially examples and
>  tutorials--is particularly weak. If I develop
>  expertise with FOSS GIS, this is one area where
>  (hopefully) I can help. I see ERMA as having some
>  small potential for teaching others about FOSS GIS.
>
>  Karl Fogel's book:
>
>    http://producingoss.com/
>
>  Need to re-read it, now that I've given it a
>  once-over.
>
>  > ESRI doesn't care about making open source better,
>  > or helping people get away from their products and
>  > fees. Period. Why should they?
>
>  Not directly I suppose. They have or had an open
>  source initiative called "52 degrees" (I think). Don't
>  recall what the objective is or was.
>
>  ESRI does at least benefit from open source, don't
>  they? Peter Schweitzer's MP (metadata parser) is
>  embedded in ArcCatalog--didn't realize that until I
>  had lunch with Peter one day and he told me (we were
>  both upset with the new ISO geospatial metadata
>  standard being copyrighted material; having to pay to
>  get documentation about the standard). And ESRI uses
>  GDAL, right?
>
>  That's not contributing in a direct way to reducing
>  its customer base (what for-profit corporation would
>  explicitly set out to do that?), but it does lend
>  additional credibility to what FOSS GIS people are
>  doing.
>
>  We can, and have, and will again, argue about what
>  level of support ESRI is giving to PostgreSQL (and
>  PostGIS). But what is inarguable is that, by doing
>  anything at all with PostgreSQL, they, I think, will
>  open their customers' minds to FOSS; at least a
>  teensy-weensy bit. Don't you think? An unintended
>  consequence perhaps, but a consequence of some note.
>
>  Dana
>
>
>
>
>
>
>
>  _______________________________________________
>  postgis-users mailing list
>  postgis-users at postgis.refractions.net
>  http://postgis.refractions.net/mailman/listinfo/postgis-users
>
