[gdal-dev] Are these WFS issues worth filing tickets?
Jukka.Rahkonen at mmmtike.fi
Mon Jan 9 16:04:16 EST 2012
Even Rouault wrote:
>> I am speaking as one who runs a WFS server. It does not make me
>> happy that users are able to set MAXFEATURES when, by default, it is not used.
>> I have more than 50 feature types, several of them containing a few hundred
>> thousand features, and the biggest one has 1.2 million polygons. I do not
>> use maxFeatures on the server side because unrestricted GetFeature is
>> reasonable for a certain point layer with 800000 features. Still, my tiny
>> server spends nearly 15 minutes sending it out.
>> Doing "ogrinfo -al -so" for my server means reading approximately 500
>> megabytes of data through the web and the ogrinfo result is still broken
>> because some feature types get truncated due to a timeout after 15 minutes.
>> I know that I must be ready to take this because I have decided to run an
>> open WFS service and anybody can make such requests if they wish. However,
>> I am not sure that this should be the default behaviour of GDAL. I would
>> rather that users had to take some deliberate action before they could hurt
>> my service and its other users with GDAL.
>> How about capping MAXFEATURES at, say, 5000 by default?
>> By comparison, GDAL WFS driver itself seems to use only 100 features as a
>> default page size if PagingAllowed is selected.
> I don't know. Someone that pastes the bare GetFeature request without a
> MAXFEATURES in his web browser will also make your server suffer. It is both a
> legitimate use case in some situations and a potential situation of denial of
> service in others. I also guess that if you add your server as a WFS
> datasource into QGIS and you add a layer in your map, it will try to fetch all
> the features.
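For readers less familiar with the WFS key-value encoding, the client-side cap under discussion amounts to one extra parameter on the GetFeature URL. The Python sketch below assumes WFS 1.1.0 KVP requests; `getfeature_url` and `DEFAULT_MAX_FEATURES` are hypothetical names, and the 5000 value is only the figure proposed above, not anything GDAL actually does.

```python
from urllib.parse import urlencode

# Hypothetical client-side default; 5000 is the figure proposed in the thread.
DEFAULT_MAX_FEATURES = 5000


def getfeature_url(endpoint, typename, max_features=DEFAULT_MAX_FEATURES):
    """Build a WFS 1.1.0 KVP GetFeature URL.

    max_features=None omits MAXFEATURES entirely, i.e. the current
    "fetch everything the server is willing to send" behaviour.
    """
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.1.0",
        "REQUEST": "GetFeature",
        "TYPENAME": typename,
    }
    if max_features is not None:
        params["MAXFEATURES"] = str(max_features)
    return endpoint + "?" + urlencode(params)


capped = getfeature_url("http://hip.latuviitta.org/cgi-bin/tinyows", "municipalities")
uncapped = getfeature_url("http://hip.latuviitta.org/cgi-bin/tinyows", "municipalities", None)
```

Passing `max_features=None` reproduces the behaviour Even describes for a bare request pasted into a browser: the server alone decides how much to send.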
I have not said anything against ogr2ogr, which I understand as the equivalent of
a GIS client adding a layer from WFS into a map project. Of course there must
not be any automatically set additional limits then, because the user really wants to get
the features home. But I have been thinking that ogrinfo is intended for getting
some descriptive info about the dataset, and it is often used prior to ogr2ogr.
But I have also learned that ogrinfo can do much more than show info about
> I don't want to deny there is a problem when issuing ogrinfo, but I'm not sure
> that setting a default MAXFEATURES on client side is appropriate. The issue is
> more with GetExtent() and the fact that the extent in GetCapabilities
> is reported in WGS84 and not in the default SRS, or that the values are
> sometimes junk.
Sure, they are often junk, but doing a GetFeature is an expensive way of getting the
correct information. Also, this result may be unreliable if it is interpreted as
describing the whole WFS feature type when the GetFeature request hits the server-side
maxFeatures limit. In that case ogrinfo can only report the extent of the first
[maxFeatures] features of the feature type.
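The truncation problem is easy to see in a toy example: an extent computed from a GetFeature response is just the min/max over the features that actually came back. In this Python sketch geometries are simplified to plain lists of (x, y) vertices and `server_max_features` stands in for a hypothetical server-side limit; neither reflects how OGR actually stores geometries or computes extents.

```python
def extent(features):
    """Return (minx, miny, maxx, maxy) over all vertices of all features."""
    xs = [x for feature in features for x, _ in feature]
    ys = [y for feature in features for _, y in feature]
    return (min(xs), min(ys), max(xs), max(ys))


# Three toy polygons as vertex lists; the third lies far to the north-east.
layer = [
    [(0, 0), (1, 0), (1, 1)],
    [(2, 2), (3, 2), (3, 3)],
    [(100, 50), (101, 50), (101, 51)],
]

server_max_features = 2  # hypothetical server-side maxFeatures limit

full = extent(layer)                               # (0, 0, 101, 51)
truncated = extent(layer[:server_max_features])    # (0, 0, 3, 3)
```

Nothing in the truncated response tells the client that the third feature exists, so the smaller extent looks just as authoritative as the correct one.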
How about having an option -TRUST_GETCAPABILITIES=TRUE and using it as
the default value? Then the number of features could be obtained with resulttype=hits,
and the extents could be taken as they are in the GetCapabilities. Or perhaps they
could be reported as unknown if reprojecting the lat/lon bounding box feels
bad or is inaccurate/impossible? Extents are not always that interesting.
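Under this proposal (the option name -TRUST_GETCAPABILITIES is a suggestion made here, not an existing GDAL switch), both numbers would come from cheap metadata requests. A rough Python sketch, assuming WFS 1.1.0, where a resultType=hits response carries a numberOfFeatures attribute and GetCapabilities lists an ows:WGS84BoundingBox per feature type. The XML samples are trimmed, invented examples and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

WFS_NS = "http://www.opengis.net/wfs"
OWS_NS = "http://www.opengis.net/ows"

# Trimmed, invented sample responses (not taken from any real server).
HITS_RESPONSE = (
    '<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs" '
    'numberOfFeatures="342"/>'
)
CAPABILITIES = """<wfs:WFS_Capabilities xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:ows="http://www.opengis.net/ows">
  <wfs:FeatureTypeList>
    <wfs:FeatureType>
      <wfs:Name>municipalities</wfs:Name>
      <ows:WGS84BoundingBox>
        <ows:LowerCorner>19.0 59.5</ows:LowerCorner>
        <ows:UpperCorner>31.6 70.1</ows:UpperCorner>
      </ows:WGS84BoundingBox>
    </wfs:FeatureType>
  </wfs:FeatureTypeList>
</wfs:WFS_Capabilities>"""


def count_from_hits(xml_text):
    """Feature count from a WFS 1.1.0 resultType=hits response."""
    root = ET.fromstring(xml_text)
    return int(root.attrib["numberOfFeatures"])


def extent_from_capabilities(xml_text, typename):
    """(min_lon, min_lat, max_lon, max_lat) as advertised in GetCapabilities.

    Returns None ("unknown") when nothing is advertised, instead of
    falling back to a full GetFeature.
    """
    root = ET.fromstring(xml_text)
    for ft in root.iter(f"{{{WFS_NS}}}FeatureType"):
        # Real servers often add a namespace prefix to the name, e.g. "ns:municipalities".
        if ft.findtext(f"{{{WFS_NS}}}Name") != typename:
            continue
        bbox = ft.find(f"{{{OWS_NS}}}WGS84BoundingBox")
        if bbox is None:
            return None
        lon0, lat0 = map(float, bbox.findtext(f"{{{OWS_NS}}}LowerCorner").split())
        lon1, lat1 = map(float, bbox.findtext(f"{{{OWS_NS}}}UpperCorner").split())
        return (lon0, lat0, lon1, lat1)
    return None
```

The box is left in lon/lat here; reprojecting it to the layer's default SRS is exactly the step that may be inaccurate, which is why returning "unknown" is offered as the alternative.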
>> For those who do not know, ogrinfo wants to read the whole feature type so
>> it can calculate the exact bounds of the dataset. Data are not stored, and
>> if the user chooses to fetch the data for local use, another GetFeature must be
> Not true. The result of the GetFeature is stored in RAM (if you must download
> 500 MB then I hope you have enough RAM) and reused for all the next calls to
> the OGR API, until it gets invalidated by a change/setting of
> spatial/attribute filter. I've verified that GetFeature was emitted only once
> with an ogrinfo -al on a layer.
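The caching described above (download once, reuse the copy in RAM until a spatial or attribute filter is changed or set) can be modelled roughly as follows. This is a hypothetical Python sketch of the described behaviour, not GDAL's C++ implementation; the class name, the `fetch` callback and the filter representation are all invented for illustration.

```python
class CachedWFSLayer:
    """Hypothetical model of the described caching (not GDAL code)."""

    def __init__(self, fetch):
        # fetch(spatial_filter, attribute_filter) stands in for one GetFeature request.
        self._fetch = fetch
        self._spatial_filter = None
        self._attribute_filter = None
        self._cached = None   # None means the next read must issue a GetFeature
        self.requests = 0     # how many GetFeature requests were emitted

    def set_spatial_filter(self, bbox):
        self._spatial_filter = bbox
        self._cached = None   # setting a filter invalidates the cached result

    def set_attribute_filter(self, where):
        self._attribute_filter = where
        self._cached = None

    def features(self):
        if self._cached is None:
            self.requests += 1
            self._cached = self._fetch(self._spatial_filter, self._attribute_filter)
        return self._cached


# ogrinfo -al style usage: several passes over the layer, a single download.
layer = CachedWFSLayer(lambda spatial, attribute: ["f1", "f2", "f3"])
count = len(layer.features())    # first pass triggers the GetFeature
names = list(layer.features())   # second pass reuses the copy in RAM
```

The flip side, as noted above, is memory: whatever the GetFeature returns, possibly hundreds of megabytes, is held in RAM for the lifetime of the cache.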
What do you get with these two requests (check layer info; download and convert
to shapefile) against this small WFS feature type?
ogrinfo WFS:http://hip.latuviitta.org/cgi-bin/tinyows municipalities
ogr2ogr -f "ESRI Shapefile" test.shp WFS:http://hip.latuviitta.org/cgi-bin
On my Windows laptop I can see that two sets of requests are sent.
I am ready to surrender, though, with one tiny remark: is it necessary to do
the first GetFeature with resulttype=hits at all if the next request is always a full
GetFeature? GDAL can count the number of features returned by that second
query, and the resulttype=hits one can also be rather expensive for the WFS server.
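Counting features from the full response, as suggested, only needs a pass over the GML that has already been downloaded. A minimal Python sketch, assuming a GML 3 FeatureCollection where features appear either one per gml:featureMember or together inside gml:featureMembers; the two sample responses and the `count_features` name are invented for illustration.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# Two invented, trimmed GetFeature responses using the two GML 3 encodings.
ONE_PER_MEMBER = """<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:gml="http://www.opengis.net/gml" xmlns:ex="http://example.com/ex">
  <gml:featureMember><ex:municipality/></gml:featureMember>
  <gml:featureMember><ex:municipality/></gml:featureMember>
</wfs:FeatureCollection>"""
WRAPPED_MEMBERS = """<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:gml="http://www.opengis.net/gml" xmlns:ex="http://example.com/ex">
  <gml:featureMembers>
    <ex:municipality/><ex:municipality/><ex:municipality/>
  </gml:featureMembers>
</wfs:FeatureCollection>"""


def count_features(getfeature_xml):
    """Count features in an already-downloaded GetFeature response."""
    root = ET.fromstring(getfeature_xml)
    # Each gml:featureMember wraps exactly one feature.
    n = len(root.findall(f"{{{GML_NS}}}featureMember"))
    # A gml:featureMembers element wraps any number of features.
    for members in root.findall(f"{{{GML_NS}}}featureMembers"):
        n += len(members)
    return n
```

The trade-off is that this count is only known after the full download, and if the server applies its own maxFeatures limit it is the number of features returned rather than the total in the feature type.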