<div dir="ltr"><div>>
The effect will at least be to ignore any rows for which this message
was raised - the function is unconditionally exited after the error is
raised, before a new feature is added to the current layer. <br></div><div><br></div><div>So do I understand correctly that, for files containing more than roughly 100,000 lines, rows that contain more columns of data than the detected headers are not read?</div><div>If that is the case, I will have to patch my GDAL version so that it does not skip these lines.</div><div><br></div><div>Regards,</div><div>Dirk<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Mar 29, 2022 at 8:09 PM Daniel Evans <<a href="mailto:daniel.fred.evans@gmail.com">daniel.fred.evans@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>>
does the error impact the returned data? <br></div><div><br></div><div>The effect will at least be to ignore any rows for which this message was raised - the function is unconditionally exited after the error is raised, before a new feature is added to the current layer.<br></div><div><br></div><div>>
Is there a way to suppress this error without disabling the GDAL log
handling? My logs are flooded with these messages, and modifying the xlsx
files is not an option because there are many of them, they are supplied by
clients and regularly updated. <br></div><div><br></div><div>I suspect the only way is to provide GDAL with a custom error handler that ignores this specific message and otherwise delegates back to CPLDefaultErrorHandler() (or prints to stderr itself).<br></div><div><br></div><div>Regards,</div><div>Daniel<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 29 Mar 2022 at 09:20, Dirk Vanden Boer <<a href="mailto:dirk.vdb@gmail.com" target="_blank">dirk.vdb@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Scanning through the file, it turns out 2 lines actually have a value in the eighth column; that's why the column is present, even though there is no header for it.</div><div><br></div><div>So I have 2 questions:</div><div>- does the error impact the returned data?<br></div><div>- Is there a way to suppress this error without disabling the GDAL log handling? My logs are flooded with these messages, and modifying the xlsx files is not an option because there are many of them, they are supplied by clients and regularly updated.</div><div><br></div><div>Regards,</div><div>Dirk<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Mar 29, 2022 at 10:06 AM Daniel Evans <<a href="mailto:daniel.fred.evans@gmail.com" target="_blank">daniel.fred.evans@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi Dirk,</div><div><br></div><div>>
I do notice when I open the file in Excel and select everything, the eighth column in the file is empty but also gets selected. <br></div><div><br></div><div>It looks like that's the key here.</div><div><br></div><div>The code you identified gets hit if GDAL encounters a row with more populated columns than the layer currently has fields, and if the product of (number of rows already read) x (number of columns to be added) is too high (>100,000), GDAL gives the error you're getting. That functionality was added in commit 4f3f1fa [1], in response to an OSS-Fuzz vulnerability report noting that GDAL becomes very slow if an Excel file adds many extra columns after many rows have already been read (presumably because it has to modify every feature already read). I think this is where Even would start pointing out that there are downsides to such automated security scanners, as the distinction between "it's just slow for large files" (>25s in the report) and "an actual DoS attack" is awkward when dealing with typical GIS data volumes.<br></div><div><br></div><div>Are you sure the 8th column contains no data at all? Even if it is empty, my experience is that Excel can be pretty stubborn about saving empty columns that have contained data at some point in the file's history. 
From memory, selecting the whole column, deleting it, and saving again usually convinces Excel to stop saving it.</div><div><br></div><div>Regards,</div><div>Daniel<br></div><div><br></div><div>[1] <a href="https://github.com/OSGeo/gdal/commit/4f3f1facc5da0eeac71f6b1ba946b7618386ee7d" target="_blank">https://github.com/OSGeo/gdal/commit/4f3f1facc5da0eeac71f6b1ba946b7618386ee7d</a></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 29 Mar 2022 at 08:41, Dirk Vanden Boer <<a href="mailto:dirk.vdb@gmail.com" target="_blank">dirk.vdb@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi,</div><div><br></div><div>
<div>When reading xlsx files that contain a lot of lines, GDAL reports the following error multiple times:</div><div>| Adding too many columns to too many existing features</div><div><br></div><div>It comes from the xlsx driver:</div><pre>GIntBig nFeatureCount = poCurLayer-&gt;GetFeatureCount(false);
if( nFeatureCount &gt; 0 &amp;&amp;
    static_cast&lt;size_t&gt;(apoCurLineValues.size() -
        poCurLayer-&gt;GetLayerDefn()-&gt;GetFieldCount()) &gt;
    static_cast&lt;size_t&gt;(100000 / nFeatureCount) )
{
    CPLError(CE_Failure, CPLE_NotSupported,
             "Adding too many columns to too many "
             "existing features");
    return;
}</pre><div>The featureCount in my case is 128741</div><div>apoCurLineValues.size() = 8</div><div>fieldCount = 7<br></div><div><br></div><div>Why is this error reported? Does it impact the actual read data?</div><div>I do notice when I open the file in Excel and select everything, the eighth column in the file is empty but also gets selected.<br></div><div><br></div><div>Kind regards,</div><div>Dirk</div>
</div></div>
_______________________________________________<br>
gdal-dev mailing list<br>
<a href="mailto:gdal-dev@lists.osgeo.org" target="_blank">gdal-dev@lists.osgeo.org</a><br>
<a href="https://lists.osgeo.org/mailman/listinfo/gdal-dev" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/gdal-dev</a><br>
</blockquote></div>
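<div>A minimal sketch of the driver check quoted in this thread, pulled out into a standalone C++ function so it can be evaluated with the numbers Dirk reported (128741 features, 8 values in the row, 7 fields). The function name <code>ExceedsColumnGrowthLimit</code> is hypothetical and not part of GDAL; the condition is copied from the driver excerpt, with <code>GIntBig</code> replaced by <code>std::int64_t</code>. The sketch assumes it is only asked about rows that have more values than the layer has fields.</div>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Standalone copy of the condition from the XLSX driver excerpt.
// nFeatureCount: features already added to the layer (GIntBig in GDAL).
// nLineValues:   number of values in the row being read.
// nFieldCount:   number of fields the layer currently has.
// Returns true when the driver would raise "Adding too many columns to too
// many existing features" and return without adding the row.
bool ExceedsColumnGrowthLimit(std::int64_t nFeatureCount,
                              std::size_t nLineValues,
                              int nFieldCount)
{
    // Assumption of this sketch: only rows with more values than fields.
    assert(nFieldCount >= 0 &&
           nLineValues > static_cast<std::size_t>(nFieldCount));
    if (nFeatureCount <= 0)
        return false;  // the quoted check requires nFeatureCount > 0
    const std::size_t nNewColumns =
        nLineValues - static_cast<std::size_t>(nFieldCount);
    // Integer division, as in the driver excerpt.
    return nNewColumns > static_cast<std::size_t>(100000 / nFeatureCount);
}
```

<div>With those numbers, 100000 / 128741 is 0 in integer division, so even a single extra column satisfies the condition. For an integer column count k, k &gt; floor(100000 / n) holds exactly when n x k &gt; 100,000, so the check matches the "rows already read x columns to be added" description precisely, not just approximately.</div>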
</blockquote></div>
</blockquote></div>
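<div>A minimal sketch of the custom error handler approach suggested in this thread, in C++. It assumes GDAL's error API from <code>cpl_error.h</code> (<code>CPLPushErrorHandler()</code>, <code>CPLPopErrorHandler()</code>, <code>CPLDefaultErrorHandler()</code>). The helper <code>IsTooManyColumnsMessage()</code>, the handler name and the <code>HAVE_GDAL</code> guard macro are hypothetical, chosen here so the message filter can be compiled and checked without GDAL. Matching on the message text is also an assumption: it only keeps working while the driver uses this exact wording.</div>

```cpp
#include <cstring>

// Text of the XLSX driver message reported in this thread.
static const char kTooManyColumnsMsg[] =
    "Adding too many columns to too many existing features";

// True only for the XLSX message we want to silence.
bool IsTooManyColumnsMessage(const char* pszMsg)
{
    return pszMsg != nullptr &&
           std::strstr(pszMsg, kTooManyColumnsMsg) != nullptr;
}

#ifdef HAVE_GDAL  // hypothetical macro: define it when building against GDAL
#include "cpl_error.h"

// Drops the XLSX message and forwards everything else to GDAL's default
// handler, so other errors and warnings are still reported.
static void CPL_STDCALL FilterTooManyColumnsMessage(CPLErr eErrClass,
                                                    CPLErrorNum nErrNum,
                                                    const char* pszMsg)
{
    if (IsTooManyColumnsMessage(pszMsg))
        return;
    CPLDefaultErrorHandler(eErrClass, nErrNum, pszMsg);
}

// Typical use around the XLSX read:
//   CPLPushErrorHandler(FilterTooManyColumnsMessage);
//   /* ... open the file and read its features ... */
//   CPLPopErrorHandler();
#endif
```

<div>CPLPushErrorHandler() installs the handler for the calling thread only. If the application already has its own process-wide handler, an alternative is to install the filter with CPLSetErrorHandler(), which returns the previously installed handler, and forward non-matching messages to that handler instead of CPLDefaultErrorHandler().</div>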
</blockquote></div>
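<div>On the question of which rows are affected: working only from the driver excerpt quoted in this thread, the check fires when (features already read) x (new columns) &gt; 100,000, so what matters is how many features precede the first row that introduces the extra column, not the total length of the file. A minimal C++ sketch (the function name <code>FirstRejectedFeatureCount</code> is hypothetical) computing the smallest number of already-read features at which a row adding k new columns is rejected:</div>

```cpp
#include <cassert>
#include <cstdint>

// Smallest number of already-read features n for which a row adding
// nExtraColumns (k) new columns fails the driver check  k > 100000 / n.
// For integer k, k > floor(100000 / n) holds exactly when n * k > 100000,
// i.e. when n > 100000 / k, so the first such n is floor(100000 / k) + 1.
std::int64_t FirstRejectedFeatureCount(std::int64_t nExtraColumns)
{
    assert(nExtraColumns > 0);
    return 100000 / nExtraColumns + 1;
}
```

<div>For a single extra column this gives 100001. According to the quoted check, a row that introduces the eighth column within roughly the first 100,000 features presumably adds the field as usual, after which later rows with 8 values no longer trigger the check (their column difference is 0). Only when the first such row appears later is it dropped, and so is every subsequent row that still has more values than the layer has fields.</div>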