[postgis-users] Search Engine

uli mueller uli.mueller at gmx.ch
Thu Aug 5 04:57:30 PDT 2010


Simon,

we should not drift too far into a discussion about how you would do it
with SQL Server. This was not Ricardos question. But I admit I have to
specify some of my comments to get clearer.

> decision but it is foolish to suggest that "you should not do anything like
> search tables one after another". Hell, I did just this in SQL Server

The point ist not only speed. The main reason is that you loose the
ability of PostgreSQL's full text search capabilitites to rank your
results. With separate tables you have to pick the n best matches of
each table and combine them afterwards to display the overall n best
matches.

Perhaps another way would be to join the tables in a view and do the
search on the view. But then you can't fully benefit from the index.

I don't know if SQL Server does not face these limitations? Anyway, "<
10 seconds" is not really really fast - at least for a PostgreSQL
database ;-)

>> http://mapmatters.org . The hardest thing there was and still is to
>> optimize the way how data are combined and weighted for the tsvector

I didn't mean anything technical but the problem, how to produce the
most meaningful results and ranking of the results (how to weight values
out of different fields, which words are relevant or only stop words...).

Uli

-- 
geOps
www.geOps.de
D-79098 Freiburg

Am 05.08.2010 10:01, schrieb Simon Greener:
> Uli,
> 
>> Clearly, if you want to search across different tables you need a way to
>> combine data in a common place. You should not do anything like search
>> the tables one after the other and combine the results.
> 
> There may be a software limitation in PostgreSQL that supports this as
> a  technical
> decision but it is foolish to suggest that "you should not do anything like
> search tables one after another". Hell, I did just this in SQL Server
> 2008 with
> 800 tables and the execution of the FULLTEXT SQL is very, very fast.
> That is < 10 seconds.
> I filter the tables by MBR first against an enhanced Geometry_Columns
> table before
> using FULLTEXT to do the search. But it may mean that you mean selecting
> against
> the tables using ordinary SQL with all searchable columns in the where
> clause as
> predicates. Yes, this would be very slow and inflexible in terms of
> search capability.
> That is why specialist search structures and functionalty have been
> created for most
> databases.
> 
>> Even if you merge all tables into one, you will need some explicit
>> mechanism (trigger!) to keep your tsvector up to date.
> 
> Correct. This is one of the reasons why this approach is unsustainable.
> 
> Also, what if someone didn't have the ability to modify the schema at will?
> It might be that a lot of the use of PostGIS is as a shapefile replacement,
> but real application databases do not afford that sort of happy go lucky
> approach
> to data structuring.
> 
>> A system using triggers to build the tsvector is not so hard to
>> maintain. Once you have written the triggers it simply runs and runs.
>> Any time data in any relevant table changes, a trigger updates the
>> tsvector, some key (gid or whatever) and maybe other data like bounding
>> boxes in the one and only table that will be searched. Some challenge
>> could arise, if it takes too long to rebuild your index on the tsvector.
>> But normally this is not critical.
>>
>> We use the trigger approach for our search engine on
>> http://mapmatters.org . The hardest thing there was and still is to
>> optimize the way how data are combined and weighted for the tsvector (
>> so how you feed the "to_tsvector" function).
> 
> I do not know much about tsvector but if you can create it external to
> the data
> and populate it by triggers without otherwise changing the data
> structures of
> the database then this would be a good solution.
> 
> In SQL Server 2008 FULLTEXT's indexing of tables as they change can be
> "declaratively"
> organised. But if this is not possible with tsvector then the use of
> triggers may be the way to go.
> 
> regards
> Simon




More information about the postgis-users mailing list