[postgis-users] Large point table

Kevin Neufeld kneufeld at refractions.net
Fri Sep 28 09:56:48 PDT 2007


Hi Regina,

My only point there was simply this...
If you have a table that is 10GB in physical size due primarily to a 
geometry column, and you add another geometry column to equal in size to 
the first, you will double your table size.  So the question is, what is 
faster to perform a sequential or index scan through? A 10GB table or a 
20GB table. 

The same principle applies which data partitioning.  I had a 10GB table 
that I was often querying against, but I was only ever interested in 
geometries with a certain attribute (which accounted for about 1/2 of 
the table).  Using table inheritance and partitioning, I split my table 
into two, most frequently used and least frequently used geometries, and 
cut my query times in half.  It's takes less time to query a 5GB table 
than a 10GB table.

-- Kevin

-------------
Kevin Neufeld
Software Developer
Refractions Research Inc.
300-1207 Douglas St.
Victoria, B.C., V8W 2E7

Phone: (250) 383-3022
Email: kneufeld at refractions.net



Obe, Regina wrote:
> Actually this is a side comment to the part the part below.
>
> I have always stored the most commonly used transforms in my table as a
> secondary column, but I was wondering how much it slows down a query if
> you don't select the fields you don't need.  I imagine there must be
> some slow down.  Otherwise column-oriented databases wouldn't be quite
> as useful compared to row-oriented databases as they are -
> http://www.databasecolumn.com/2007/09/stonebraker-comment-response.html
> ..  
>
> Has anyone done any benchmarks on using a view with index on transform
> vs. index plus realized transform?
>
>
> "If you want to squeeze a little bit more performance out of your query,
>
> you can cache the transform as a new column, so it's already 
> precomputed.... but I would not store this in the same table.    If the 
> goal here is speed, then duplicating points in your table will double 
> the table size, slowing your query time.  Create a new table for this.  
> The problem with this approach is that you now have duplicate data and 
> you may have to write triggers to deal with modifications done to your 
> original points. "
>
>
> Thanks,
> Regina
> -----Original Message-----
> From: postgis-users-bounces at postgis.refractions.net
> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
> Kevin Neufeld
> Sent: Friday, September 28, 2007 11:39 AM
> To: PostGIS Users Discussion
> Subject: Re: [postgis-users] Large point table
>
> Unfortunately Shane, I totally disagree.  If one want to do transforming
>
> or grouping in a grid, it's far, far easier to do this in right in
> PostGIS.
>
> Steve,
> First, aggregating the points into a grid is a great approach.  I've 
> been working with a table of ~ 170 million points and using this 
> technique, I've speed up my search queries significantly by grouping 
> these points into a multipoints of about 150-200 points.  As you've 
> probably discovered, this can be done by creating a new table, and 
> simply grouping your points using the snaptogrid function.
> CREATE TABLE new_pt_tbl AS
> SELECT collect(my_pt) AS my_multi_pt FROM pt_tbl
> GROUP BY ST_SnapToGrid(my_pt, <grid_size>);
>
> Second, a view is the same as a query.  A view in postgresql is just a 
> stored query.  If you don't want to always type "ST_Transform...." in 
> your query, then using a view is the way to do this. PostGIS is pretty 
> good in doing transforms on the fly, so using a view is a good 
> approach.  Keep in mind that you may have to create a functional index 
> on your multipoint geometry column if you want to use an index in
> Albers.
> CREATE INDEX new_pt_tbl_idx ON new_pt_tbl USING GIST ON 
> (ST_Transform(my_multi_pt, <Albers SRID>));
>
> If you want to squeeze a little bit more performance out of your query, 
> you can cache the transform as a new column, so it's already 
> precomputed.... but I would not store this in the same table.    If the 
> goal here is speed, then duplicating points in your table will double 
> the table size, slowing your query time.  Create a new table for this.  
> The problem with this approach is that you now have duplicate data and 
> you may have to write triggers to deal with modifications done to your 
> original points.
>
> Hope this helps.
> Kevin
>
> -------------
> Kevin Neufeld
> Software Developer
> Refractions Research Inc.
> 300-1207 Douglas St.
> Victoria, B.C., V8W 2E7
>
> Phone: (250) 383-3022
> Email: kneufeld at refractions.net
>
>
>
> Shane Spencer wrote:
>   
>> stupid mouse..
>>
>> As I was saying....
>>
>> You would be better off pre-transforming them with ogr2ogr.. dump them
>>     
>
>   
>> to a shapefile, write a quicky script to move the points around to 
>> match your grid (if that is what you are indeed doing) then insert 
>> them into the database as the right projection.
>>
>> It sounds like you will have a lot of duplicate points.  I recommend 
>> you only insert distinct points into a table, then make a reference 
>> table with smaller data types and a primary key referencing the point 
>> table.  Python could handle that kind of insert situation pretty
>>     
> easily.
>   
>> On 9/25/07, *Shane Spencer* <shane at bogomip.com 
>> <mailto:shane at bogomip.com>> wrote:
>>
>>     you would be better off pre-transforming them with ogr2ogr.. dump
>>     them to a shapefile, write a q
>>
>>
>>     On 9/25/07, * Stephen Crawford* < src176 at psu.edu
>>     <mailto:src176 at psu.edu>> wrote:
>>
>>         All,
>>
>>         I have a table of about 30 million point observations.  The
>>         geometries are
>>         stored in lon/lat., with a gist index on the column.  For
>>         display and
>>         analysis I want to use an albers projection....mostly I'm
>>         aggregating the
>>         points to a grid.  Currently I just do the transform(geom) in
>>         my queries.
>>         Would it be better for me to perhaps create a view with the
>>         tranformation
>>         instead of in my query....or should I instead create another
>>         geometry column
>>         in the original table, with the tranformation and do another
>>         gist index on
>>         the new column?  What's the best approach?
>>
>>         Thanks,
>>         Steve
>>
>>
>>         Stephen Crawford
>>         Center for Environmental Informatics
>>         The Pennsylvania State University
>>
>>         _______________________________________________
>>         postgis-users mailing list
>>         postgis-users at postgis.refractions.net
>>         <mailto:postgis-users at postgis.refractions.net>
>>         http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>>     
> <http://postgis.refractions.net/mailman/listinfo/postgis-users>
>   
>>
>>
>>     
> ------------------------------------------------------------------------
>   
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>   
>>     
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
> -----------------------------------------
> The substance of this message, including any attachments, may be
> confidential, legally privileged and/or exempt from disclosure
> pursuant to Massachusetts law. It is intended
> solely for the addressee. If you received this in error, please
> contact the sender and delete the material from any computer.
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>   



More information about the postgis-users mailing list