[gdal-dev] PgSQL and OLCFastFeatureCount

Even Rouault even.rouault at spatialys.com
Sat Jan 17 13:24:53 PST 2015


On Saturday 17 January 2015 22:10:23, Paul Ramsey wrote:
> How accurate is the fast feature count supposed to be? An estimate
> could be pulled from the system stats pretty much instantly, but it
> wouldn't be exact.

From the documentation:

/**

 \fn int OGRLayer::GetFeatureCount( int bForce = TRUE );

 \brief Fetch the feature count in this layer. 

 Returns the number of features in the layer.  For dynamic databases the
 count may not be exact.  If bForce is FALSE, and it would be expensive
 to establish the feature count, a value of -1 may be returned indicating
 that the count isn't known.  If bForce is TRUE some implementations will
 actually scan the entire layer once to count objects. 

 The returned count takes the spatial filter into account. 

 Note that some implementations of this method may alter the read cursor
 of the layer.

 This method is the same as the C function OGR_L_GetFeatureCount().

 @param bForce Flag indicating whether the count should be computed even
 if it is expensive.

 @return feature count, -1 if count not known. 

*/
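To make the bForce contract concrete, here is a minimal, self-contained C++ sketch of the caller side: ask for a cheap count first, treat -1 as "not known", and only force a full count when the caller allows it. `LayerLike`, `CountOrUnknown` and `ScanOnlyLayer` are hypothetical stand-ins for illustration, not GDAL classes; with the real library one would call `OGRLayer::GetFeatureCount()` (or `OGR_L_GetFeatureCount()`) on an actual layer.

```cpp
#include <cstdint>

// Hypothetical stand-in for an OGR-style layer (NOT the real GDAL class):
// it only models the documented bForce contract of GetFeatureCount().
class LayerLike {
public:
    virtual ~LayerLike() = default;
    // With bForce == false an implementation may return -1 when counting
    // would be expensive; with bForce == true it may scan the whole layer.
    virtual int64_t GetFeatureCount(bool bForce) = 0;
};

// Caller-side pattern: try the cheap count first and only force a full
// count when the caller explicitly allows an expensive operation.
int64_t CountOrUnknown(LayerLike& layer, bool allowExpensive) {
    const int64_t cheap = layer.GetFeatureCount(false);
    if (cheap >= 0 || !allowExpensive)
        return cheap;                     // may still be -1 ("not known")
    return layer.GetFeatureCount(true);   // may scan every feature
}

// Hypothetical layer that behaves like the generic OGRLayer implementation
// mentioned in the reply: no cheap count, full scan when forced.
class ScanOnlyLayer : public LayerLike {
public:
    explicit ScanOnlyLayer(int64_t n) : n_(n) {}
    int64_t GetFeatureCount(bool bForce) override {
        if (!bForce) return -1;  // counting would be expensive: not known
        ++scans;                 // pretend every feature was iterated
        return n_;
    }
    int scans = 0;               // number of simulated full scans
private:
    int64_t n_;
};
```

Note that a caller which always passes `allowExpensive = true` gets no protection at all, which is essentially the situation described in this thread when a driver that advertises OLCFastFeatureCount still runs a full count.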

So we could in theory return -1 if bForce = FALSE (the generic implementation 
in OGRLayer does that). However, I'm afraid it would break some applications, 
and it would not help with your use case either. You would likely need a 
GetApproxFeatureCount() that could default to GetFeatureCount(FALSE) (or 
perhaps just return -1) for the generic implementation, and use PostgreSQL 
stats for the PG driver.
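A minimal sketch of what such a GetApproxFeatureCount() could look like, assuming it is added as a virtual method whose generic default is GetFeatureCount(FALSE). The method, the `Layer` and `PgLikeLayer` classes and the injected statistic are all hypothetical; only `pg_class.reltuples` itself is a real PostgreSQL catalog column (the planner's row estimate, refreshed by ANALYZE and VACUUM).

```cpp
#include <cstdint>

// Hypothetical API sketch: GetApproxFeatureCount() is only a proposal in
// this thread, not an existing OGRLayer method.
class Layer {
public:
    virtual ~Layer() = default;
    virtual int64_t GetFeatureCount(bool bForce) = 0;
    // Generic default: the non-forced count, which may be -1 ("not known").
    // The alternative floated in the thread is to simply return -1 here.
    virtual int64_t GetApproxFeatureCount() { return GetFeatureCount(false); }
};

// Hypothetical PostgreSQL-backed layer. A real driver might read the
// planner statistic with something like
//   SELECT reltuples::bigint FROM pg_class WHERE oid = 'myschema.mytable'::regclass
// Here the value is injected through the constructor instead.
class PgLikeLayer : public Layer {
public:
    PgLikeLayer(int64_t exactRows, double reltuples)
        : exact_(exactRows), reltuples_(reltuples) {}
    int64_t GetFeatureCount(bool /*bForce*/) override {
        return exact_;  // stands in for SELECT count(*): exact but slow
    }
    int64_t GetApproxFeatureCount() override {
        // Negative reltuples means no statistics yet (PostgreSQL 14+ uses -1
        // for a table that has never been vacuumed or analyzed).
        if (reltuples_ < 0) return -1;
        return static_cast<int64_t>(reltuples_);
    }
private:
    int64_t exact_;
    double reltuples_;
};
```

Such an estimate is table-wide and ignores any spatial or attribute filter set on the layer, so a real driver would presumably fall back to the generic behaviour (or apply some selectivity guess) when a filter is active.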

> 
> I come to this question via the FDW work, where the first step PgSQL
> does is to try and plan the query and get selectivity estimates from
> all the nodes, including the FDW node. That means the FDW node is
> expected to return a guess of how many records the query will return,
> so I end up calling the OGR feature count method. But if it's really
> slow (as SELECT Count(*) can be) then things will fall apart pretty
> quickly.
> 
> P.
> 
> On Fri, Jan 16, 2015 at 12:39 PM, Even Rouault
> <even.rouault at spatialys.com> wrote:
> > Selon Paul Ramsey <pramsey at cleverelephant.ca>:
> >> The PgSQL driver is returning TRUE for OLCFastFeatureCount and then
> >> running "SELECT Count(*)" to fulfill the request. Since that is
> >> actually going to apply a full table scan, it's not *really* a fast
> >> feature count in my estimation, but perhaps GDAL has a different
> >> standard? What's the standard for a fast feature count? Basically
> >> instant (the record count resides in header metadata or something
> >> similar)? Or "fast enough for small things"?
> > 
> > Paul,
> > 
> > The standard for "fast" is not well defined, I think. Instant would be
> > ideal, but a number of drivers advertise FastFeatureCount when they have
> > a specialized implementation that is faster than the generic one. In
> > this instance the request is run entirely on the server side, so it is
> > much faster than the default implementation. It is a bit surprising that
> > PostgreSQL cannot maintain the feature count without a full table scan,
> > but I guess concurrent updates make that technically difficult.
> > 
> > Even
> > 
> > --
> > Spatialys - Geospatial professional services
> > http://www.spatialys.com


