[postgis-users] Optimizer choosing slower strategy (Bitmap Heap vs Index)?

Dan Lyke danlyke at flutterby.com
Wed Jul 26 10:15:26 PDT 2017


Lest it get lost further down: THANK YOU! This gives me some things to
continue searching on:


On Wed, 26 Jul 2017 12:31:07 +0200
Giuseppe Broccolo <g.broccolo.7 at gmail.com> wrote:
> On old dataset, the index is scanned through the bitmap: there are two
> phases, in the first the index is used to build a bitmap, then the
> query bases to find matching entries through the bitmap itself. This
> strategy is generally chosen if the index scan would involve several
> accesses on same blocks of the indexes.
[snip]

Huh. So a possibility is that somehow the data ended up in disk
proximity in the first database, and is widely spread out in the
second? This assumption based on the query mostly being about:

  geometry &&
  ST_SetSRID(ST_MakeBox2D(ST_MakePoint(...),ST_MakePoint(...)), 4326)

(Basically, we're looking for a set of line segments within a bounding
region).

> Could you provide further details in what could be changed between
> old and new datasets? Is the content significantly changed (bounds of
> geometries, etc.)?

The datasets are just two subsequent quarters of a combination of HERE
and OpenStreetMap data. This particular table is just HERE road
segments. The import process in both cases should be the same.

One weirdness that I'm trying to figure out how to quantify: This
particular table in the quarter's data is 33GB, the new quarter's
data is 70GB[1], and I'm trying to figure out why. The geometry as
ST_AsText is 14G in the old quarter, new one is 16G, so not a huge
change there. Similar difference in number of rows (< 20%, not
enough to account for >2x data size...

[1] SELECT relname, relpages, reltuples::numeric,
           pg_size_pretty(pg_table_size(oid))
    FROM pg_class WHERE oid='...'::regclass;

> Furthermore, it could be useful also to have attached the output of
> the EXPLAIN ANALYSE command, in order to compare what the planner
> estimates with what it actually do.

Here's EXPLAIN ANALYZE, old one :

 Sort  (cost=6802.93..6806.30 rows=1346 width=128) (actual time=132.710..133.295 rows=477 loops=1)
   Sort Key: (st_distance_sphere('...'::geometry, st_geometryn(way, 1))), osm_id
   Sort Method: quicksort  Memory: 62kB
   ->  Bitmap Heap Scan on line  (cost=50.85..6732.98 rows=1346 width=128) (actual time=0.976..131.077 rows=477 loops=1)  
         Recheck Cond: (way && '...'::geometry)
         Heap Blocks: exact=261
         ->  Bitmap Index Scan on line_index  (cost=0.00..50.52 rows=1346 width=0) (actual time=0.418..0.418 rows=477 loops=1)  
               Index Cond: (way && '...'::geometry)
 Planning time: 15.924 ms
 Execution time: 133.925 ms
(10 rows)


New one:

 Sort  (cost=2595.88..2597.16 rows=509 width=127) (actual time=220.955..221.449 rows=505 loops=1)
   Sort Key: (st_distance_sphere('...'::geometry, st_geometryn(way, 1))), osm_id
   Sort Method: quicksort  Memory: 64kB
   ->  Index Scan using line_index on line  (cost=0.55..2573.00 rows=509 width=127) (actual time=3.407..219.551 rows=505 loops=1)  
         Index Cond: (way && '...'::geometry)
 Planning time: 0.469 ms
 Execution time: 222.014 ms
(7 rows)


More information about the postgis-users mailing list