[pdal] Oracle PDAL queries not scaling
Oscar Martinez Rubi
o.rubi at esciencecenter.nl
Tue Aug 11 07:26:36 PDT 2015
Hi,
I have investigated a bit more this issue.
I wanted to see what data does the OCI reader actually reads, so I
executed the pre-selection query out of PDAL, in python. So I run the
attached script (freshly after loading the data).
The script runs the same exact query as done in PDAL twice and prints
for each run the number of returned blocks, time spent in the query and
time spent to fetch the results.
the exact query is:
SELECT
l."OBJ_ID", l."BLK_ID", l."BLK_EXTENT", l."BLK_DOMAIN",
l."PCBLK_MIN_RES", l."PCBLK_MAX_RES", l."NUM_POINTS",
l."NUM_UNSORTED_POINTS", l."PT_SORT_DIM", l."POINTS", b.pc
FROM
AHN_BLCK l, AHN_BASE b, QUERY_POLYGONS g
WHERE
l.obj_id = b.id
AND
SDO_FILTER(l.blk_extent,g.geom) = 'TRUE' AND g.id = 1;
The results I get are:
20M run 1:
#blocks: 12
query time[s]: 0.113833904266
fetch time[s]: *0.571593046188*
20M run 2:
#blocks: 12
query time[s]: 0.000102996826172
fetch time[s]: *0.500910997391*
210M run 1:
#blocks: 13
query time[s]: 0.0586049556732
fetch time[s]: *5.09832000732*
210M run 2:
#blocks: 13
query time[s]: 0.000245094299316
fetch time[s]: *5.05038785934*
2201M run 1:
#blocks: 13
query time[s]: 0.070690870285
fetch time[s]: *52.4960811138*
2201M run 2:
#blocks: 13
query time[s]: 0.000225067138672
fetch time[s]: *53.1006689072*
So, even though the query times and the number of returned blocks are
similar the fetch times are the not. We can see the scaling issue there.
Somehow the fetching is much more expensive (10x) when points are 10x.
I also noticed that after a while doing queries the times get much
better and scalable even when I do new queries with other polygons. So,
the first queries suffer of scaling issues but later it gets better.
Any idea why?
Regards,
O.
On 10-08-15 18:40, Albert Godfrind wrote:
> Like many tools that access an Oracle database, we lack the ability to
> see what actually happens in the database at a detailed level, i.e.
> which actual queries are sent, and how the database executes them in
> terms of CPU use, logical and physical I/Os, network throughput and
> latency.
>
> So I think it is important to add some debugging / tracing facility to
> let me see what happens:
>
> 1) An option to make PDAL (actually the OCI driver here) log each SQL
> statement it executes, together with the elapsed time and the number
> of rows (blocks) fetched. Obviously we have that statement in the
> input XML file, but a trace would put everything in a single log and
> include proper measurements.
>
> 2) More important: an option to make the OCI driver enable SQL tracing
> at the database side. This is simple to do by just issuing an “ALTER
> SESSION …” statement before running the queries. The resulting trace
> will show all details about execution times as well as resource
> consumption (CPU and IO) and wait times. That could be added as an
> option in the XML file. Or maybe extend the XML file with the option
> to specify a SQ statement to be performed before each query (we could
> then use that to manually add the ALTER SESSION statement.
>
> The resulting trace can help isolate the bottleneck as one of:
>
> 1) the I/Os in the database, to fetch the blocks from disk (mostly I/O)
> 2) the network time to pass the blocks to the PDAL client (network
> throughput and latency)
> 3) the time to process the blocks in the PDAL client (mostly CPU)
>
> Albert
>
>> On 5-Aug-2015, at 12:26, Oscar Martinez Rubi
>> <o.rubi at esciencecenter.nl <mailto:o.rubi at esciencecenter.nl>> wrote:
>>
>> Hi,
>>
>> I did a test to see how good Oracle with PDAL scale with bigger data
>> sets. I had 3 datasets that are self-contained with 20M, 210M and
>> 2201M points. I loaded them in different Oracle DBs with PDAL and
>> laz-perf. And, for each of them I ran 7 queries (via a pdal pipeline
>> that preselects blocks, applies a crop and then write to a LAS file)
>>
>> The results are in the attached file.
>>
>> Regarding the loading, for the 20M I only used one core (it is only
>> one file) while for the others I used 16 cores, i.e. 16 simult. PDAL
>> instances loading data to Oracle. I opened an issue in GitHub because
>> I noticed that in some of the runs the size that I got was too large,
>> and I do not know what caused that. The attached numbers are when
>> everything seemed to work and the sizes were as expected.
>>
>> This message, though, is about the queries. Each query is run twice
>> in each DB. As you can see in the results file, for 10x more points
>> in the data set the queries are 10x slower, at least for the first
>> run (with the 2201M the second run is much faster but this does not
>> happen with the 210M).
>>
>> Find also attached one of the XML that i used for the queries
>> (example is for query1). Note that the geometry is previously
>> inserted in oracle so I can use to pre-filter blocks with the query
>> option in oci reader
>>
>> First I though that maybe the query option in the oci reader in the
>> XML was ignored and that all the blocks of the dataset were being
>> processed by PDAL (that would explain 10x more points 10x slower
>> queries) but I ran a pdal pipeline for query1 with verbose and I saw
>> that the crop filter "only" processed 120000 points which makes sense
>> taking into account that region of query 1 only has 74818 points. Or
>> maybe the crop still process all the blocks extents but only opens
>> and decompress the points of the overlapping ones?
>>
>> Any idea what is happening?
>>
>> Regards,
>>
>> O.
>> <results.txt><query1.xml>
>
> --
> ORACLE <http://www.oracle.com>
> Albert Godfrind | Geospatial technologies | Tel: +33 4 93 00 80 67|
> Mobile: +33 6 09 97 27 23| Skype:albert-godfrind <skype:albert-godfrind>
> OracleServer Technologies
> 400 Av. Roumanille,BP 309 |06906 Sophia Antipolis cedex|France
> Everything you ever wanted to know about Oracle Spatial
> <http://www.apress.com/9781590598993>
>
> <http://www.locationintelligence.net/dc/>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/pdal/attachments/20150811/263a71ba/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.py
Type: text/x-python
Size: 1907 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/pdal/attachments/20150811/263a71ba/attachment-0001.py>
More information about the pdal
mailing list