[postgis-users] Can I build an index on combinednon-spatial&spatial columns?
M.A. (Thijs) van den Berg
thijs at sitmo.com
Wed Mar 18 06:12:34 PDT 2009
lr,
Thanks for your response.
The dimensions of the problem are relevantly different, something like
this:
* number of building_types: 10.000
* number of buildings: 5 billion
* average number rows returned by the query (function of distance,
building type): 100
(the problems something other that buildings, it's stock exchange
messages & events, but the analogy is 1-on-1 )
I'm currently working of a small sample of that dataset, so other
issues might pop up.
.. so the child table approach (on the large 5 billion rows table) is
probably something I need to do anyway, although my current plan was
to segment on a different key...
The way I wanted it to work was: have a compound index that would
first filter on building type -resulting in spatial datasets of size
500K-, and *then* do distance queries on that resulting in approx 100
rows.
Another idea I just has is to define regions (like city) and use a
combined index on "building_type, city", and then do a filtering on
distance/combining adjacent cities using "building_types per city"
queries
On Mar 18, 2009, at 1:31 PM, Paragon Corporation wrote:
> M.A.,
>
> My gut feeling tells me having 2 partial indexes on the same field
> as you
> have described is just bad. How would the planner know which to
> choose
> unless you always have the filter in there. If you only have 2 sets
> of data
> then breaking it out is even more silly. If you later add more
> building
> types, then you would never do an indexed searches on those.
>
> You'd be better off having a single gist index on the geometry and
> possibly
> a btree index on the building_type. Though if you only have 2
> building
> types, that index would never need to be used. PostgreSQL is quite
> efficient at using bit map indexes. To my knowledge you can't really
> combine a gist and btree type in a single compound index and you
> wouldn't
> want to anyway.
>
> The child table approach is generally the preferred way of doing
> this, but
> even that is not necessary if you have less than a million records
> or if you
> wan that benefit of quickly reloading all buildings or all schools
> (by being
> able to do a TRUNCATE TABLE).
>
> Regardig your query, your query as it stands will not use any
> indexes. You
> should write it using the ST_Dwithin function which does use indexes.
>
> SELECT the_geom, building_name
> FROM buildings
> WHERE building_type = 'School' AND ST_DWithin(the_geom, 'POINT(100312
> 102312)', 5000)
>
>
> Leo
>
> -----Original Message-----
> From: postgis-users-bounces at postgis.refractions.net
> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
> M.A.
> (Thijs) van den Berg
> Sent: Wednesday, March 18, 2009 8:08 AM
> To: PostGIS Users Discussion
> Subject: Re: [postgis-users] Can I build an index on
> combinednon-spatial&spatial columns?
>
> Thanks, that's true. I think the performance benefit with be good
> because I
> have approx 10.000 building_type's (although, that's a lot of
> indices to
> choose between). A typical distance query for a specific
> building_type will
> result in 10-100 matches. Not using partial indexing will thus
> result in
> appox 500.000 rows for each query (all building types), which I
> would then
> have to filter on building_type, reducing the resultset with a
> factor 10.000
>
> Giving it some more thought, I think splitting the building table into
> individual building_type tables (and attaching spatial indices to
> those) might be the best solution.
>
> From a technical point, I think the postgresql CREATE INDEX should
> support
> multiple USING to allow for mixed patial / non-spatial indices,
> something
> like this:
>
> CREATE INDEX blabla ON buildings (USING HASH(building_type), USING
> GIST(the_geom))
>
> internally that could be rewritten as
> CREATE INDEX blabla_hash_value1 ON buildings USING GIST(the_geom)
> WHERE
> hash(building_type)=value1 CREATE INDEX blabla_hash_value2 ON
> buildings
> USING GIST(the_geom) WHERE hash(building_type)=value2 ..
>
>
> On Mar 18, 2009, at 12:43 PM, Kemal Bayram wrote:
>
>> Well according to EXPLAIN a partial index is taken into account, how
>> much of a performance benefit you gain ultimately depends on how many
>> records lie within your average distance.
>>
>> If your data set is relatively static you may also want to consider
>> clustering your table too.
>>
>>> -----Original Message-----
>>> From: postgis-users-bounces at postgis.refractions.net
>>> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
>>> M.A. (Thijs) van den Berg
>>> Sent: 18 March 2009 13:00
>>> To: PostGIS Users Discussion
>>> Subject: Re: [postgis-users] Can I build an index on combined
>>> non-spatial&spatial columns?
>>>
>>> Thanks :)
>>>
>>> Does PostGIS support partial indices?
>>> http://www.postgresql.org/docs/8.3/interactive/indexes-partial.html
>>>
>>> If so.. I could do something like this:
>>>
>>> CREATE INDEX building_type_spindx_school ON buildings USING
>>> GIST(the_geom) WHERE building_type='School';
>>>
>>> CREATE INDEX building_type_spindx_bar ON buildings USING
>>> GIST(the_geom) WHERE building_type='Bar';
>>>
>>> ...
>>> Yet another option would be to partition the buildings table into
>>> child tables per building_type (ie have child table containing all
>>> the schools, another child table having all the bars), and have
>>> individua spatial indices on the child tables. Is that an option?
>>>
>>>
>>>
>>>
>>> On Mar 18, 2009, at 11:36 AM, Kemal Bayram wrote:
>>>
>>>> I don't know about combining indexes but you want to
>>> include a spatial
>>>> operator such as && in addition to Distance so that you can take
>>>> advantage of an index on "the_geom".
>>>>
>>>>> -----Original Message-----
>>>>> From: postgis-users-bounces at postgis.refractions.net
>>>>> [mailto:postgis-users-bounces at postgis.refractions.net] On
>>> Behalf Of
>>>>> M.A. (Thijs) van den Berg
>>>>> Sent: 18 March 2009 11:57
>>>>> To: postgis-users at postgis.refractions.net
>>>>> Subject: [postgis-users] Can I build an index on combined
>>> non-spatial
>>>>> &spatial columns?
>>>>>
>>>>> Suppose I want to speedup the following type of queries with an
>>>>> index:
>>>>>
>>>>> SELECT the_geom, building_name
>>>>> FROM buildings
>>>>> WHERE Distance(the_geom, 'POINT(100312 102312)') < 5000 AND
>>>>> building_type = 'School'
>>>>>
>>>>> Is that possible, and if not, why not? My current option
>>> is to write
>>>>> my own index storage in C++, using a hashmap to filter
>>> building_type,
>>>>> and build separate spatial indices for each building type.
>>> Can I do
>>>>> something similar in postgresql?
>>>>> Would my only option be to split the building_types into separate
>>>>> tables?
>>>>> _______________________________________________
>>>>> postgis-users mailing list
>>>>> postgis-users at postgis.refractions.net
>>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users at postgis.refractions.net
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
More information about the postgis-users
mailing list