[postgis-users] Can I build an index on combinednon-spatial&spatial columns?

Wed Mar 18 06:12:34 PDT 2009

lr,
Thanks for your response.
The dimensions of the problem are relevantly different, something like  
this:

* number of building_types: 10.000
* number of buildings: 5 billion
* average number rows returned by the query (function of distance,  
building type): 100
(the problems something other that buildings, it's stock exchange  
messages & events, but the analogy is 1-on-1 )
I'm currently working of a small sample of that dataset, so other  
issues might pop up.

.. so the child table approach (on the large  5 billion rows table) is  
probably something I need to do anyway, although my current plan was  
to segment on a different key...

The way I wanted it to work was: have a compound index that would  
first filter on building type -resulting in spatial datasets of size  
500K-, and *then* do distance queries on that resulting in approx 100  
rows.
Another idea I just has is to define regions (like city) and use a  
combined index on "building_type, city", and then do a filtering on  
distance/combining adjacent cities using "building_types per city"  
queries

On Mar 18, 2009, at 1:31 PM, Paragon Corporation wrote:

> M.A.,
>
> My gut feeling tells me having 2 partial indexes on the same field  
> as you
> have described is just bad.  How would the planner know which to  
> choose
> unless you always have the filter in there.  If you only have 2 sets  
> of data
> then breaking it out is even more silly.  If you later add more  
> building
> types, then you would never do an indexed searches on those.
>
> You'd be better off having a single gist index on the geometry and  
> possibly
> a btree index on the building_type.  Though if you only have 2  
> building
> types, that index would never need to be used.  PostgreSQL is quite
> efficient at using bit map indexes.  To my knowledge you can't really
> combine a gist and btree type in a single compound index and you  
> wouldn't
> want to anyway.
>
> The child table approach is generally the preferred way of doing  
> this, but
> even that is not necessary if you have less than a million records  
> or if you
> wan that benefit of quickly reloading all buildings or all schools  
> (by being
> able to do a TRUNCATE TABLE).
>
> Regardig your query, your query as it stands will not use any  
> indexes.  You
> should write it using the ST_Dwithin function which does use indexes.
>
> SELECT the_geom, building_name
> FROM buildings
> WHERE  building_type = 'School' AND ST_DWithin(the_geom, 'POINT(100312
> 102312)', 5000)
>
>
> Leo
>
> -----Original Message-----
> From: postgis-users-bounces at postgis.refractions.net
> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of  
> M.A.
> (Thijs) van den Berg
> Sent: Wednesday, March 18, 2009 8:08 AM
> To: PostGIS Users Discussion
> Subject: Re: [postgis-users] Can I build an index on
> combinednon-spatial&spatial columns?
>
> Thanks, that's true. I think the performance benefit with be good  
> because I
> have approx 10.000 building_type's (although, that's a lot of  
> indices to
> choose between). A typical distance query for a specific  
> building_type will
> result in 10-100 matches. Not using partial indexing will thus  
> result in
> appox 500.000  rows for each query (all building types), which I  
> would then
> have to filter on building_type, reducing the resultset with a  
> factor 10.000
>
> Giving it some more thought, I think splitting the building table into
> individual building_type tables (and attaching spatial indices to
> those) might be the best solution.
>
> From a technical point, I think the postgresql CREATE INDEX should  
> support
> multiple USING to allow for mixed patial / non-spatial indices,  
> something
> like this:
>
> CREATE INDEX blabla ON buildings (USING HASH(building_type), USING
> GIST(the_geom))
>
> internally that could be rewritten as
> CREATE INDEX blabla_hash_value1 ON buildings USING GIST(the_geom)  
> WHERE
> hash(building_type)=value1 CREATE INDEX blabla_hash_value2 ON  
> buildings
> USING GIST(the_geom) WHERE hash(building_type)=value2 ..
>
>
> On Mar 18, 2009, at 12:43 PM, Kemal Bayram wrote:
>
>> Well according to EXPLAIN a partial index is taken into account, how
>> much of a performance benefit you gain ultimately depends on how many
>> records lie within your average distance.
>>
>> If your data set is relatively static you may also want to consider
>> clustering your table too.
>>
>>> -----Original Message-----
>>> From: postgis-users-bounces at postgis.refractions.net
>>> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
>>> M.A. (Thijs) van den Berg
>>> Sent: 18 March 2009 13:00
>>> To: PostGIS Users Discussion
>>> Subject: Re: [postgis-users] Can I build an index on combined
>>> non-spatial&spatial columns?
>>>
>>> Thanks :)
>>>
>>> Does PostGIS support partial indices?
>>> http://www.postgresql.org/docs/8.3/interactive/indexes-partial.html
>>>
>>> If so.. I could do something like this:
>>>
>>> CREATE INDEX building_type_spindx_school ON buildings USING
>>> GIST(the_geom) WHERE building_type='School';
>>>
>>> CREATE INDEX building_type_spindx_bar ON buildings USING
>>> GIST(the_geom) WHERE building_type='Bar';
>>>
>>> ...
>>> Yet another option would be to partition the buildings table into
>>> child tables per building_type (ie have child table containing all
>>> the schools, another child table having all the bars), and have
>>> individua spatial indices on the child tables. Is that an option?
>>>
>>>
>>>
>>>
>>> On Mar 18, 2009, at 11:36 AM, Kemal Bayram wrote:
>>>
>>>> I don't know about combining indexes but you want to
>>> include a spatial
>>>> operator such as && in addition to Distance so that you can take
>>>> advantage of an index on "the_geom".
>>>>
>>>>> -----Original Message-----
>>>>> From: postgis-users-bounces at postgis.refractions.net
>>>>> [mailto:postgis-users-bounces at postgis.refractions.net] On
>>> Behalf Of
>>>>> M.A. (Thijs) van den Berg
>>>>> Sent: 18 March 2009 11:57
>>>>> To: postgis-users at postgis.refractions.net
>>>>> Subject: [postgis-users] Can I build an index on combined
>>> non-spatial
>>>>> &spatial columns?
>>>>>
>>>>> Suppose I want to speedup the following  type of queries with an
>>>>> index:
>>>>>
>>>>> SELECT the_geom, building_name
>>>>> FROM buildings
>>>>> WHERE Distance(the_geom, 'POINT(100312 102312)') < 5000  AND
>>>>> building_type = 'School'
>>>>>
>>>>> Is that possible, and if not, why not? My current option
>>> is to write
>>>>> my own index storage in C++, using a hashmap to filter
>>> building_type,
>>>>> and build separate spatial indices for each building type.
>>> Can I do
>>>>> something similar in postgresql?
>>>>> Would my only option be to split the building_types into separate
>>>>> tables?
>>>>> _______________________________________________
>>>>> postgis-users mailing list
>>>>> postgis-users at postgis.refractions.net
>>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users at postgis.refractions.net
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users