[postgis-users] Large point table
Kevin Neufeld
kneufeld at refractions.net
Fri Sep 28 09:56:48 PDT 2007
Hi Regina,
My only point there was simply this...
If you have a table that is 10GB in physical size due primarily to a
geometry column, and you add another geometry column to equal in size to
the first, you will double your table size. So the question is, what is
faster to perform a sequential or index scan through? A 10GB table or a
20GB table.
The same principle applies which data partitioning. I had a 10GB table
that I was often querying against, but I was only ever interested in
geometries with a certain attribute (which accounted for about 1/2 of
the table). Using table inheritance and partitioning, I split my table
into two, most frequently used and least frequently used geometries, and
cut my query times in half. It's takes less time to query a 5GB table
than a 10GB table.
-- Kevin
-------------
Kevin Neufeld
Software Developer
Refractions Research Inc.
300-1207 Douglas St.
Victoria, B.C., V8W 2E7
Phone: (250) 383-3022
Email: kneufeld at refractions.net
Obe, Regina wrote:
> Actually this is a side comment to the part the part below.
>
> I have always stored the most commonly used transforms in my table as a
> secondary column, but I was wondering how much it slows down a query if
> you don't select the fields you don't need. I imagine there must be
> some slow down. Otherwise column-oriented databases wouldn't be quite
> as useful compared to row-oriented databases as they are -
> http://www.databasecolumn.com/2007/09/stonebraker-comment-response.html
> ..
>
> Has anyone done any benchmarks on using a view with index on transform
> vs. index plus realized transform?
>
>
> "If you want to squeeze a little bit more performance out of your query,
>
> you can cache the transform as a new column, so it's already
> precomputed.... but I would not store this in the same table. If the
> goal here is speed, then duplicating points in your table will double
> the table size, slowing your query time. Create a new table for this.
> The problem with this approach is that you now have duplicate data and
> you may have to write triggers to deal with modifications done to your
> original points. "
>
>
> Thanks,
> Regina
> -----Original Message-----
> From: postgis-users-bounces at postgis.refractions.net
> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
> Kevin Neufeld
> Sent: Friday, September 28, 2007 11:39 AM
> To: PostGIS Users Discussion
> Subject: Re: [postgis-users] Large point table
>
> Unfortunately Shane, I totally disagree. If one want to do transforming
>
> or grouping in a grid, it's far, far easier to do this in right in
> PostGIS.
>
> Steve,
> First, aggregating the points into a grid is a great approach. I've
> been working with a table of ~ 170 million points and using this
> technique, I've speed up my search queries significantly by grouping
> these points into a multipoints of about 150-200 points. As you've
> probably discovered, this can be done by creating a new table, and
> simply grouping your points using the snaptogrid function.
> CREATE TABLE new_pt_tbl AS
> SELECT collect(my_pt) AS my_multi_pt FROM pt_tbl
> GROUP BY ST_SnapToGrid(my_pt, <grid_size>);
>
> Second, a view is the same as a query. A view in postgresql is just a
> stored query. If you don't want to always type "ST_Transform...." in
> your query, then using a view is the way to do this. PostGIS is pretty
> good in doing transforms on the fly, so using a view is a good
> approach. Keep in mind that you may have to create a functional index
> on your multipoint geometry column if you want to use an index in
> Albers.
> CREATE INDEX new_pt_tbl_idx ON new_pt_tbl USING GIST ON
> (ST_Transform(my_multi_pt, <Albers SRID>));
>
> If you want to squeeze a little bit more performance out of your query,
> you can cache the transform as a new column, so it's already
> precomputed.... but I would not store this in the same table. If the
> goal here is speed, then duplicating points in your table will double
> the table size, slowing your query time. Create a new table for this.
> The problem with this approach is that you now have duplicate data and
> you may have to write triggers to deal with modifications done to your
> original points.
>
> Hope this helps.
> Kevin
>
> -------------
> Kevin Neufeld
> Software Developer
> Refractions Research Inc.
> 300-1207 Douglas St.
> Victoria, B.C., V8W 2E7
>
> Phone: (250) 383-3022
> Email: kneufeld at refractions.net
>
>
>
> Shane Spencer wrote:
>
>> stupid mouse..
>>
>> As I was saying....
>>
>> You would be better off pre-transforming them with ogr2ogr.. dump them
>>
>
>
>> to a shapefile, write a quicky script to move the points around to
>> match your grid (if that is what you are indeed doing) then insert
>> them into the database as the right projection.
>>
>> It sounds like you will have a lot of duplicate points. I recommend
>> you only insert distinct points into a table, then make a reference
>> table with smaller data types and a primary key referencing the point
>> table. Python could handle that kind of insert situation pretty
>>
> easily.
>
>> On 9/25/07, *Shane Spencer* <shane at bogomip.com
>> <mailto:shane at bogomip.com>> wrote:
>>
>> you would be better off pre-transforming them with ogr2ogr.. dump
>> them to a shapefile, write a q
>>
>>
>> On 9/25/07, * Stephen Crawford* < src176 at psu.edu
>> <mailto:src176 at psu.edu>> wrote:
>>
>> All,
>>
>> I have a table of about 30 million point observations. The
>> geometries are
>> stored in lon/lat., with a gist index on the column. For
>> display and
>> analysis I want to use an albers projection....mostly I'm
>> aggregating the
>> points to a grid. Currently I just do the transform(geom) in
>> my queries.
>> Would it be better for me to perhaps create a view with the
>> tranformation
>> instead of in my query....or should I instead create another
>> geometry column
>> in the original table, with the tranformation and do another
>> gist index on
>> the new column? What's the best approach?
>>
>> Thanks,
>> Steve
>>
>>
>> Stephen Crawford
>> Center for Environmental Informatics
>> The Pennsylvania State University
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> <mailto:postgis-users at postgis.refractions.net>
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>>
> <http://postgis.refractions.net/mailman/listinfo/postgis-users>
>
>>
>>
>>
> ------------------------------------------------------------------------
>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
> -----------------------------------------
> The substance of this message, including any attachments, may be
> confidential, legally privileged and/or exempt from disclosure
> pursuant to Massachusetts law. It is intended
> solely for the addressee. If you received this in error, please
> contact the sender and delete the material from any computer.
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
More information about the postgis-users
mailing list