<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">We see significant performance boost
when clustering a large PostGIS table on the spatial index. This
will rewrite the table in spatial index order.<br>
</div>
<div class="moz-cite-prefix">CLUSTER <table> USING
<spatial_index></div>
<div class="moz-cite-prefix">My understanding is that you need to
periodically recluster the table when updated, which is an atomic
exclusive rewrite of the complete table i.e. will take time.</div>
<div class="moz-cite-prefix">You may ask the Postgre team about
this, e.g. Magnus Hagander.</div>
<div class="moz-cite-prefix">You may ask Sandro Santilli (PostGIS
topology) about topological aspects.<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<blockquote type="cite"
cite="mid:CANhDX=bDnrMnb92A8XJ9Z3fbD1tehVCzJ---Jgxw8C7Eh8FbmA@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div>It would probably be possible to get substantial gains
on existing formats that have spatial indexing by making
them more optimal/balanced however I see the following
major issues:</div>
<div><br>
</div>
<div>* Not sure how by convention deal with non-updatable
indexes (packing/sorting on hilbert or sort-tile-recursive
seem inherently static) on formats that are assumed to be
mutable</div>
<div>* GeoPackage cannot benefit from clustering on spatial
index (so it will not be possible to "cloud" optimize)</div>
<div>* Shapefile has several other drawbacks as a format
perhaps making it less interesting to invest into</div>
<div><br>
</div>
<div>I have wondered several times about these things in
relation to PostgreSQL and PostGIS... first thought is why
there is no way to mark a table static/immutable in
PostgreSQL, it seems to be there would be several general
optimization possibilities if it was known that a table
cannot be changed, not the least that it would make it
possible to construct similar packed R-trees as discussed
above. Second thought is how the R-tree is
constructed/implemented via gist index (I have not had the
brain power/interest to dive into the source to understand
it yet) - unless it can be static it should probably
optimally be a R*-tree with topological split but I've got
indications that it is not.</div>
<div><br>
</div>
<div>/Björn</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">Den mån 25 feb. 2019 kl
11:06 skrev Darafei "Komяpa" Praliaskouski <<a
href="mailto:me@komzpa.net" moz-do-not-send="true">me@komzpa.net</a>>:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">Hi Björn,
<div><br>
</div>
<div>Would it make sense to implement it not as a
separate format, but as a special convention on
top of any of already existing ones?<br>
What happens if you sort by Hilbert curve a
shapefile or geopackage, and tweak the internals
to build indexes in flattish manner?</div>
<div><br>
</div>
<div>For example, here is a branch of Postgres
where GiST index is being built in flat manner.
\</div>
<div>It does not require any change on user's
side, but gets you all the benefits by plainly
building 10x faster.</div>
<div><a
href="https://github.com/glukhovn/postgres/tree/gist_btree_build"
target="_blank" moz-do-not-send="true">https://github.com/glukhovn/postgres/tree/gist_btree_build</a> <br>
<br>
</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sun, Feb 24, 2019
at 4:59 PM Björn Harrtell <<a
href="mailto:bjorn.harrtell@gmail.com"
target="_blank" moz-do-not-send="true">bjorn.harrtell@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div>I've now implemented so
that the traversal of the
R-tree reads nodes on demand
and as suspected it gives a
significant boost:</div>
<div><br>
</div>
<div>> time for i in
{1..10}; do ogrinfo -qq
-spat 10 56 10.2 56.2
gis_osm_roads_free_1.fgb
gis_osm_roads_free_1; done<br>
</div>
<div>real<span style="white-space:pre-wrap"> </span>0m0,378s
(~ 20% improvement)<br>
</div>
<div><br>
</div>
<div>> time for i in
{1..10}; do ogrinfo -qq
-spat 10 56 10.2 56.2
gis_osm_buildings_a_free_1_single.fgb
gis_osm_buildings_a_free_1_single; done</div>
<div dir="ltr">real<span style="white-space:pre-wrap"> </span>0m0,790s
(~ 30% improvement)<br>
</div>
<div dir="ltr"><br>
</div>
<div>I'm now also faster on
the reduced window (56 10.1
56.1) beating gpkg:</div>
<div><br>
</div>
<div>> time for i in
{1..10}; do ogrinfo -qq
-spat 10 56 10.1 56.1
gis_osm_buildings_a_free_1_single.gpkg
gis_osm_buildings_a_free_1_single; done</div>
<div>real<span style="white-space:pre-wrap"> </span>0m0,305s<br>
</div>
<div><br>
</div>
<div>> time for i in
{1..10}; do ogrinfo -qq
-spat 10 56 10.1 56.1
gis_osm_buildings_a_free_1_single.fgb
gis_osm_buildings_a_free_1_single; done</div>
<div dir="ltr">real<span style="white-space:pre-wrap"> </span>0m0,242s<br>
</div>
<div dir="ltr"><br>
</div>
<div>It still remains a TODO
to further optimize I/O to
read larger blocks if
possible (applies both to
R-tree nodes and the feature
data itself) and I think
that could prove a very
significant performance
boost potentially leaving
any competition in the dust.
:D</div>
<div><br>
</div>
<div>/Björn</div>
<br>
<div class="gmail_quote">
<div dir="ltr"
class="gmail_attr">Den lör
9 feb. 2019 kl 00:08 skrev
Björn Harrtell <<a
href="mailto:bjorn.harrtell@gmail.com"
target="_blank"
moz-do-not-send="true">bjorn.harrtell@gmail.com</a>>:<br>
</div>
<blockquote
class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div>I've now
implemented
spatial
filtering (at
GDAL fork <a
href="https://github.com/bjornharrtell/gdal/tree/flatgeobuf"
target="_blank"
moz-do-not-send="true">https://github.com/bjornharrtell/gdal/tree/flatgeobuf</a>) and
results are
promising, for
two test cases
about 3,5-4
times faster
than shapefile
with the same
content. I've
used these
command lines
to measure
total time
over 10
iterations for
each format
and dataset:</div>
<div><br>
</div>
<div>
<div>> time
for i in
{1..10}; do
ogrinfo -qq
-spat 10 56
10.2 56.2
gis_osm_roads_free_1.shp
gis_osm_roads_free_1; done</div>
<div>real<span style="white-space:pre-wrap"> </span>0m1,940s</div>
</div>
<div><br>
</div>
<div>
<div>> time
for i in
{1..10}; do
ogrinfo -qq
-spat 10 56
10.2 56.2
gis_osm_roads_free_1.gpkg
gis_osm_roads_free_1; done</div>
<div>real<span style="white-space:pre-wrap"> </span>0m0,750s</div>
</div>
<div><br>
</div>
<div>> time
for i in
{1..10}; do
ogrinfo -qq
-spat 10 56
10.2 56.2
gis_osm_roads_free_1.fgb
gis_osm_roads_free_1; done</div>
<div>real<span style="white-space:pre-wrap"> </span>0m0,461s</div>
<div><br>
</div>
<div>
<div>> time
for i in
{1..10}; do
ogrinfo -qq
-spat 10 56
10.2 56.2
gis_osm_buildings_a_free_1_single.shp
gis_osm_buildings_a_free_1_single; done</div>
<div>real<span style="white-space:pre-wrap"> </span>0m4,381s</div>
</div>
<div><br>
</div>
<div>
<div>> time
for i in
{1..10}; do
ogrinfo -qq
-spat 10 56
10.2 56.2
gis_osm_buildings_a_free_1_single.gpkg
gis_osm_buildings_a_free_1_single; done</div>
<div>real<span style="white-space:pre-wrap"> </span>0m1,624s</div>
</div>
<div><br>
</div>
<div>
<div>> time
for i in
{1..10}; do
ogrinfo -qq
-spat 10 56
10.2 56.2
gis_osm_buildings_a_free_1_single.fgb
gis_osm_buildings_a_free_1_single; done</div>
<div>real<span style="white-space:pre-wrap"> </span>0m1,113s</div>
</div>
<div><br>
</div>
<div>The
bounds 10 56
10.2 56.2
contains 21525
out of the
812547 total
features for
gis_osm_roads_free_1
and 57699 out
of 2027890
features for
gis_osm_buildings_a_free_1_single.</div>
<div><br>
</div>
<div>And there
is definitely
room for
optimization.
The
implementation
is naive and
reads the
whole R-tree
in memory and
does not yet
read
consecutive
features
(common case
as the R-tree
is balanced)
in larger I/O
blocks which I
think can be
significant.
If I reduce
the extent to
10 56 10.1
56.1 GPKG is
actually fair
bit faster
which I think
is due to it
being able to
read the index
partially.</div>
<div><br>
</div>
<div>/Björn</div>
<div><br>
</div>
<div
class="gmail_quote">
<div dir="ltr"
class="gmail_attr">Den tis 22 jan. 2019 kl 18:43 skrev Björn Harrtell
<<a
href="mailto:bjorn.harrtell@gmail.com"
target="_blank" moz-do-not-send="true">bjorn.harrtell@gmail.com</a>>:<br>
</div>
<blockquote
class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div>After
posting about
my
experimental
format I
realized that
I lack numbers
on the
potential
performance,
so I tried to
make some more
or less
scientific
measuring. The
results was
disappointing,
reaching
similar
performance as
shapefile for
full
sequential
reads and so I
lost interest
for a while.
But recently I
found out that
my method of
measuring was
flawed - I
measured the
time to read
into a new
shapefile
using ogr2ogr,
but it turns
out that can
be quite
unfair due to
the string
field length
dynamic resize
that ogr2ogr
will do if the
strings from
input feature
attributes are
of variable
length. I've
now made some
new
measurements
using ogrinfo
and the
undocumented
flag -qq to
get a full
sequential
read.</div>
<div><br>
</div>
<div>I've used
the shapefiles
"gis_osm_buildings_a_free_1" (exploded to single poly) and
"gis_osm_roads_free_1"
from Denmark
at <a
href="http://download.geofabrik.de/europe.html"
target="_blank" moz-do-not-send="true">http://download.geofabrik.de/europe.html</a>
as the source,
converted to
respective
format, then
measured the
time it takes
to run ogrinfo
-qq on them.
Results
(average over
three runs):</div>
<div><br>
</div>
<div>##
gis_osm_buildings_a_free_1
(2027890
Polygons)</div>
<div>* SHP:
3800 ms</div>
<div>* GPKG:
2700 ms</div>
<div>*
FlatGeobuf:
2200 ms</div>
<div>
<div><br>
</div>
<div>##
gis_osm_roads_free_1
(812547
LineStrings)</div>
<div>* SHP:
1600 ms</div>
<div>* GPKG:
1350 ms</div>
<div>*
FlatGeobuf:
800 ms</div>
</div>
<div><br>
</div>
<div>With that
my hope and
interest is
rekindled. I
believe I can
fare even
better against
the
competition
with with
spatially
filtered
searches, will
hopefully soon
have some
results on
that too.</div>
<div><br>
</div>
<div>/Björn</div>
<div dir="ltr"><br>
<div
class="gmail_quote">
<div dir="ltr">Den
sön 9 dec.
2018 kl 20:36
skrev Björn
Harrtell <<a
href="mailto:bjorn.harrtell@gmail.com" target="_blank"
moz-do-not-send="true">bjorn.harrtell@gmail.com</a>>:<br>
</div>
<blockquote
class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>Hi
GDAL/OGR
folks,</div>
<div><br>
</div>
<div>In my
spare time
I've been
working on a
vector file
format called
FlatGeobuf
(tentatively).
The main
reason,
besides
curiosity and
learning, I'm
putting time
into it is
that I think
shapefile
still rules
the read/query
static data
performance
game, which is
kind of sad,
and probably
one of the
reasons it is
still so
widely
popular. Also,
the main
competitor
(GeoPackage)
isn't suitable
for streaming
access (AFAIK)
which
shapefiles
also handles
surprisingly
(?) well.</div>
<div><br>
</div>
<div>By using
a performance
focused write
once binary
encoding
(flatbuffers),
schema
constraint and
focusing on
read/query
performance by
clustering on
an optimal
spatial index
(Packed
Hilbert
R-Tree) I hope
to be able to
beat shapefile
performance
and at the
same time be
optimal for
streaming/cloud
access.</div>
<div><br>
</div>
<div>I think
I'm starting
to get
somewhere,
more details
and source is
at <a
href="https://github.com/bjornharrtell/flatgeobuf"
target="_blank" moz-do-not-send="true">https://github.com/bjornharrtell/flatgeobuf</a> and
I have an
early proof of
concept driver
implementation
at <a
href="https://github.com/bjornharrtell/gdal/tree/flatgeobuf"
target="_blank" moz-do-not-send="true">https://github.com/bjornharrtell/gdal/tree/flatgeobuf</a> and
results are
already quite
promising - it
can do
roundtrip
read/write and
is already
quite a bit
faster than
shapefile. I
also have
implemented
naive read
only QGIS
provider for
experimental
purposes.</div>
<div><br>
</div>
<div>Basically
I'm fishing
for interest
in this
effort, hoping
that others
will see
potential in
it even if
it's "yet
another
format" and
far from
final/stable
yet. Any
feedback is
welcome. As I
see it, GDAL
is a good
place for a
format
specification
and reference
implementation
to incubate.<br>
</div>
<div><br>
</div>
<div>Some
additional
food for
thought that
I've had
during the
experimentation:</div>
<div><br>
</div>
<div>1. The
main in memory
representations of geometry/coordinates seem to be either separate
arrays per
dimension
(GDAL
(partially?)
and QGIS) or a
single array
with dimension
as stride.
I've chosen
the latter as
of yet but I'm
still a bit
undecided.
There is of
course a
substantial
involved in
transforming
between the
two
representations
so the
situation with
two competing
models is a
bit
unfortunate.
I'm also not
sure about
which of these
models are
objectively
"better" than
the other?</div>
<div><br>
</div>
<div>2. One
could get
better space
efficiency
with protobuf
instead of
flatbuffers,
but it has a
performance
cost. I have
not gone far
into
investigating
how much
though and one
could even
reason about
supporting
both these
encodings in a
single format
specification
but it's
taking it a
bit far I
think.</div>
<div><br>
</div>
<div>3. Is
there a reason
to support
different
packing
strategies for
the R-Tree or
is Packed
Hilbert a good
compromise
(besides it
being
beautifully
simple to
implement)?</div>
<div><br>
</div>
<div>4.
FlatGeobuf is
perhaps a too
technical
name, not
catchy enough
and has a
bad/boring
abbreviation.
Other
candidates
I've
considered are
OpenGeoFile,
OpenVectorFile
or
OpenSpatialFile
but I'm
undecided. Any
ideas? :)</div>
<div><br>
</div>
<div>Regards,<br>
<br>
Björn Harrtell</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
_______________________________________________<br>
gdal-dev mailing list<br>
<a href="mailto:gdal-dev@lists.osgeo.org"
target="_blank" moz-do-not-send="true">gdal-dev@lists.osgeo.org</a><br>
<a
href="https://lists.osgeo.org/mailman/listinfo/gdal-dev"
rel="noreferrer" target="_blank"
moz-do-not-send="true">https://lists.osgeo.org/mailman/listinfo/gdal-dev</a></blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr"
class="gmail-m_-5423527897428860809gmail-m_-3064281804612992525gmail_signature">
<div dir="ltr">
<div>
<div>Darafei Praliaskouski</div>
<div>Support me: <a
href="http://patreon.com/komzpa"
target="_blank" moz-do-not-send="true">http://patreon.com/komzpa</a></div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<pre class="moz-quote-pre" wrap="">_______________________________________________
gdal-dev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:gdal-dev@lists.osgeo.org">gdal-dev@lists.osgeo.org</a>
<a class="moz-txt-link-freetext" href="https://lists.osgeo.org/mailman/listinfo/gdal-dev">https://lists.osgeo.org/mailman/listinfo/gdal-dev</a></pre>
</blockquote>
<p><br>
</p>
<pre class="moz-signature" cols="72">--
Hälsningar
Andreas Oxenstierna
T-Kartor Geospatial AB
Olof Mohlins väg 12 Kristianstad
mobile: +46 733 206831
mailto: <a class="moz-txt-link-abbreviated" href="mailto:ao@t-kartor.se">ao@t-kartor.se</a>
<a class="moz-txt-link-freetext" href="http://www.t-kartor.com">http://www.t-kartor.com</a></pre>
</body>
</html>