[postgis-users] Search Engine

Paragon Corporation lr at pcorp.us
Thu Aug 5 14:47:22 PDT 2010


Uli,

There are built in trigger functions for tsearch.  We have some articles on
the topic here - look at the cheat sheet for some examples of trigger use.

http://www.postgresonline.com/journal/categories/30-tsearch

There are a lot of ways to do this.  You could get by with just a functional
index on the relevant text tables and not need to store the tvector and just
have a View for the tsvector.  The savings there is that updates will be
less taxing, but searches may be slightly slower depending on which index
you opt for gin/gist since the false positives would have to be reinspected
(similar to how geometry gist works)

You could use table inheritance as well and just make sure the fulltext
field is named the same across all your tables.  That would prevent you the
need of having to create a side table. And closer to what I think Simon was
talking about.

Leo and Regina,
http://www.postgis.us



-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of Simon
Greener
Sent: Thursday, August 05, 2010 4:02 AM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] Search Engine

Uli,

> Clearly, if you want to search across different tables you need a way 
> to combine data in a common place. You should not do anything like 
> search the tables one after the other and combine the results.

There may be a software limitation in PostgreSQL that supports this as a
technical decision but it is foolish to suggest that "you should not do
anything like search tables one after another". Hell, I did just this in SQL
Server 2008 with 800 tables and the execution of the FULLTEXT SQL is very,
very fast. That is < 10 seconds.
I filter the tables by MBR first against an enhanced Geometry_Columns table
before using FULLTEXT to do the search. But it may mean that you mean
selecting against the tables using ordinary SQL with all searchable columns
in the where clause as predicates. Yes, this would be very slow and
inflexible in terms of search capability.
That is why specialist search structures and functionalty have been created
for most databases.

> Even if you merge all tables into one, you will need some explicit 
> mechanism (trigger!) to keep your tsvector up to date.

Correct. This is one of the reasons why this approach is unsustainable.

Also, what if someone didn't have the ability to modify the schema at will?
It might be that a lot of the use of PostGIS is as a shapefile replacement,
but real application databases do not afford that sort of happy go lucky
approach to data structuring.

> A system using triggers to build the tsvector is not so hard to 
> maintain. Once you have written the triggers it simply runs and runs.
> Any time data in any relevant table changes, a trigger updates the 
> tsvector, some key (gid or whatever) and maybe other data like 
> bounding boxes in the one and only table that will be searched. Some 
> challenge could arise, if it takes too long to rebuild your index on the
tsvector.
> But normally this is not critical.
>
> We use the trigger approach for our search engine on 
> http://mapmatters.org . The hardest thing there was and still is to 
> optimize the way how data are combined and weighted for the tsvector ( 
> so how you feed the "to_tsvector" function).

I do not know much about tsvector but if you can create it external to the
data and populate it by triggers without otherwise changing the data
structures of the database then this would be a good solution.

In SQL Server 2008 FULLTEXT's indexing of tables as they change can be
"declaratively"
organised. But if this is not possible with tsvector then the use of
triggers may be the way to go.

regards
Simon
--
SpatialDB Advice and Design, Solutions Architecture and Programming, Oracle
Database 10g Administrator Certified Associate; Oracle Database 10g SQL
Certified Professional Oracle Spatial, SQL Server, PostGIS, MySQL, ArcSDE,
Manifold GIS, FME, Radius Topology and Studio Specialist.
39 Cliff View Drive, Allens Rivulet, 7150, Tasmania, Australia.
Website: www.spatialdbadvisor.com
   Email: simon at spatialdbadvisor.com
   Voice: +61 362 396397
Mobile: +61 418 396391
Skype: sggreener
Longitude: 147.20515 (147° 12' 18" E)
Latitude: -43.01530 (43° 00' 55" S)
GeoHash: r22em9r98wg
NAC:W80CK 7SWP3
_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users





More information about the postgis-users mailing list