<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 29/03/2022 at 20:29, Dirk Vanden Boer
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CADuK5a3EHbot+pzLG_oRf3FD7TSpRoB64VJRBisfyhpEH4wLAw@mail.gmail.com">
      <div dir="ltr">
        <div>> The effect will at least be to ignore any rows for
          which this message was raised - the function is
          unconditionally exited after the error is raised, before a new
          feature is added to the current layer. <br>
        </div>
        <div><br>
        </div>
        <div>So do I understand correctly that for files containing
          roughly more than 100000 lines, rows that contain more columns
          of data than the detected headers are not readable?</div>
        <div>Because if that is the case I will be required to patch my
          gdal version to not skip these lines.</div>
      </div>
    </blockquote>
    Please file an issue about that at
    <a class="moz-txt-link-freetext" href="https://github.com/OSGeo/gdal/issues">https://github.com/OSGeo/gdal/issues</a><br>
    <blockquote type="cite"
cite="mid:CADuK5a3EHbot+pzLG_oRf3FD7TSpRoB64VJRBisfyhpEH4wLAw@mail.gmail.com">
      <div dir="ltr">
        <div><br>
        </div>
        <div>Regards,</div>
        <div>Dirk<br>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Tue, Mar 29, 2022 at 8:09
          PM Daniel Evans &lt;<a
            href="mailto:daniel.fred.evans@gmail.com"
            moz-do-not-send="true" class="moz-txt-link-freetext">daniel.fred.evans@gmail.com</a>&gt;
          wrote:<br>
        </div>
        <blockquote class="gmail_quote">
          <div dir="ltr">
            <div>> does the error impact the returned data? <br>
            </div>
            <div><br>
            </div>
            <div>The effect will at least be to ignore any rows for
              which this message was raised - the function is
              unconditionally exited after the error is raised, before a
              new feature is added to the current layer.<br>
            </div>
            <div><br>
            </div>
            <div>> Is there a way to suppress this error without
              disabling the gdal log handling? My logs are flooded with
              these messages; modifying the xlsx files is not an option
              because there are many and they are supplied by clients
              and regularly updated. <br>
            </div>
            <div><br>
            </div>
            <div>I suspect the only way is by providing GDAL with a
              custom error handler, which ignores this specific message
              and otherwise delegates back to CPLDefaultErrorHandler()
              (or prints to stderr itself).<br>
            </div>
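            <div><br>
            </div>
            <div>A minimal sketch of that idea, not a tested drop-in. The
              predicate IsTooManyColumnsMessage() is a hypothetical helper
              (it is not part of GDAL) that matches the message text quoted
              in this thread. The trailing comment shows how it could be
              wired into a handler installed with CPLSetErrorHandler() that
              otherwise delegates to CPLDefaultErrorHandler(); that part
              assumes GDAL's cpl_error.h and is kept as a comment so the
              block compiles without GDAL.</div>

```cpp
#include <cstring>

// Hypothetical helper (not a GDAL function): returns true when the message
// is the XLSX driver error quoted in this thread.
bool IsTooManyColumnsMessage(const char* pszMsg)
{
    static const char* const kNeedle =
        "Adding too many columns to too many existing features";
    return pszMsg != nullptr && std::strstr(pszMsg, kNeedle) != nullptr;
}

// Assumed wiring when GDAL is available (requires #include "cpl_error.h"):
//
//   static void CPL_STDCALL FilteringErrorHandler(CPLErr eErrClass,
//                                                 CPLErrorNum nErrNo,
//                                                 const char* pszMsg)
//   {
//       if (IsTooManyColumnsMessage(pszMsg))
//           return;  // drop only this one message
//       CPLDefaultErrorHandler(eErrClass, nErrNo, pszMsg);  // everything else
//   }
//
//   CPLSetErrorHandler(FilteringErrorHandler);  // e.g. once at startup
```

            <div>Matching on the message text is brittle if the wording ever
              changes, and it only silences the log output: the rows for
              which the error is raised are still not added to the layer.</div>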
            <div><br>
            </div>
            <div>Regards,</div>
            <div>Daniel<br>
            </div>
          </div>
          <br>
          <div class="gmail_quote">
            <div dir="ltr" class="gmail_attr">On Tue, 29 Mar 2022 at
              09:20, Dirk Vanden Boer &lt;<a
                href="mailto:dirk.vdb@gmail.com" target="_blank"
                moz-do-not-send="true" class="moz-txt-link-freetext">dirk.vdb@gmail.com</a>&gt;
              wrote:<br>
            </div>
            <blockquote class="gmail_quote">
              <div dir="ltr">
                <div>Scanning through the file, it turns out that 2 lines
                  actually have a value in the eighth column, which is why
                  the column is present; there is no header for that
                  column, however.</div>
                <div><br>
                </div>
                <div>So I have 2 questions:</div>
                <div>- does the error impact the returned data?<br>
                </div>
                <div>- Is there a way to suppress this error without
                  disabling the gdal log handling? My logs are flooded
                  with these messages; modifying the xlsx files is not
                  an option because there are many and they are supplied
                  by clients and regularly updated.</div>
                <div><br>
                </div>
                <div>Regards,</div>
                <div>Dirk<br>
                </div>
              </div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Tue, Mar 29, 2022
                  at 10:06 AM Daniel Evans &lt;<a
                    href="mailto:daniel.fred.evans@gmail.com"
                    target="_blank" moz-do-not-send="true"
                    class="moz-txt-link-freetext">daniel.fred.evans@gmail.com</a>&gt;
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote">
                  <div dir="ltr">
                    <div>Hi Dirk,</div>
                    <div><br>
                    </div>
                    <div>> I do notice that when I open the file in Excel
                      and select everything, the eighth column in the
                      file is empty but also gets selected. <br>
                    </div>
                    <div><br>
                    </div>
                    <div>It looks like that's the key here.</div>
                    <div><br>
                    </div>
                    <div>The code you identified gets hit if GDAL
                      encounters a row with more populated columns than
                      the layer currently has fields. If the product of
                      (number of rows already read) x (number of columns
                      to be added) is too high (>100,000), GDAL gives the
                      error you're getting. That functionality was added
                      in commit 4f3f1fa [1], in response to an OSSFuzz
                      vulnerability report noting that GDAL becomes very
                      slow if an Excel file adds many extra columns
                      after reading many rows already (presumably as it
                      has to modify every feature already read). I think
                      this is where Even would start pointing out that
                      there are downsides to such automated security
                      scanners, as the distinction between "it's just
                      slow for large files" (>25s in the report) and
                      "an actual DoS attack" is awkward when dealing
                      with typical GIS data volumes.<br>
                    </div>
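                    <div><br>
                    </div>
                    <div>As a rough illustration of that check, here is a
                      standalone sketch that mirrors the condition from the
                      snippet you quoted. WouldRejectRow() and its parameter
                      names are made up for this example, and the early
                      return for rows that add no columns is an assumption
                      about the surrounding driver code. The point is the
                      integer division: with the 128741 features from your
                      original message, 100000 / 128741 is 0, so a single
                      extra column (8 values against 7 fields) already
                      exceeds the limit.</div>

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative re-statement of the quoted condition; not GDAL code.
// featureCount: features already in the layer
// valuesInRow:  number of values found in the current row
// fieldCount:   number of fields currently defined on the layer
bool WouldRejectRow(std::int64_t featureCount,
                    std::size_t valuesInRow,
                    std::size_t fieldCount)
{
    // Assumption: the check only matters when the row adds new columns
    // to a layer that already holds features.
    if (featureCount <= 0 || valuesInRow <= fieldCount)
        return false;

    const std::size_t extraColumns = valuesInRow - fieldCount;
    // Integer division: this becomes 0 once featureCount exceeds 100000.
    const std::size_t allowedExtra =
        static_cast<std::size_t>(100000 / featureCount);
    return extraColumns > allowedExtra;
}
```

                    <div>Under these assumptions, WouldRejectRow(128741, 8,
                      7) is true, while the same row in a layer of exactly
                      100000 features would still be accepted, because
                      100000 / 100000 leaves room for one extra column.</div>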
                    <div><br>
                    </div>
                    <div>Are you sure the 8th column contains no data at
                      all? Even if it is empty, my experience is that
                      Excel can be pretty stubborn about saving empty
                      columns that have contained data at some point in
                      the file's history. From memory, selecting the
                      whole column, deleting it, and saving again
                      usually convinces Excel to no longer save it.</div>
                    <div><br>
                    </div>
                    <div>Regards,</div>
                    <div>Daniel<br>
                    </div>
                    <div><br>
                    </div>
                    <div>[1] <a
href="https://github.com/OSGeo/gdal/commit/4f3f1facc5da0eeac71f6b1ba946b7618386ee7d"
                        target="_blank" moz-do-not-send="true"
                        class="moz-txt-link-freetext">https://github.com/OSGeo/gdal/commit/4f3f1facc5da0eeac71f6b1ba946b7618386ee7d</a></div>
                  </div>
                  <br>
                  <div class="gmail_quote">
                    <div dir="ltr" class="gmail_attr">On Tue, 29 Mar
                      2022 at 08:41, Dirk Vanden Boer &lt;<a
                        href="mailto:dirk.vdb@gmail.com" target="_blank"
                        moz-do-not-send="true"
                        class="moz-txt-link-freetext">dirk.vdb@gmail.com</a>&gt;
                      wrote:<br>
                    </div>
                    <blockquote class="gmail_quote">
                      <div dir="ltr">
                        <div>Hi,</div>
                        <div><br>
                        </div>
                        <div>
                          <div>When reading xlsx files that contain a
                            lot of lines, gdal reports the following
                            error multiple times:</div>
                          <div>| Adding too many columns to too many
                            existing features</div>
                          <div><br>
                          </div>
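                          <div>It comes from the xlsx driver:</div>
                          <div>GIntBig nFeatureCount =
                            poCurLayer-&gt;GetFeatureCount(false);<br>
                            if( nFeatureCount &gt; 0 &amp;&amp;<br>
                            &nbsp;&nbsp;&nbsp;
                            static_cast&lt;size_t&gt;(apoCurLineValues.size()
                            -<br>
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
                            poCurLayer-&gt;GetLayerDefn()-&gt;GetFieldCount())
                            &gt;<br>
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; static_cast&lt;size_t&gt;(100000
                            / nFeatureCount) )<br>
                            {<br>
                            &nbsp;&nbsp;&nbsp; CPLError(CE_Failure, CPLE_NotSupported,<br>
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; "Adding too many columns to
                            too many "<br>
                            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; "existing features");<br>
                            &nbsp;&nbsp;&nbsp; return;<br>
                            }</div>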
                          <div><br>
                          </div>
                          <div>The featureCount in my case is 128741</div>
                          <div>apoCurLineValues.size() = 8</div>
                          <div>fieldCount = 7<br>
                          </div>
                          <div><br>
                          </div>
                          <div>Why is this error reported? Does it
                            impact the actual read data?</div>
                          <div>I do notice that when I open the file in Excel
                            and select everything, the eighth column in
                            the file is empty but also gets selected.<br>
                          </div>
                          <div><br>
                          </div>
                          <div>Kind regards,</div>
                          <div>Dirk</div>
                        </div>
                      </div>
                      _______________________________________________<br>
                      gdal-dev mailing list<br>
                      <a href="mailto:gdal-dev@lists.osgeo.org"
                        target="_blank" moz-do-not-send="true"
                        class="moz-txt-link-freetext">gdal-dev@lists.osgeo.org</a><br>
                      <a
                        href="https://lists.osgeo.org/mailman/listinfo/gdal-dev"
                        rel="noreferrer" target="_blank"
                        moz-do-not-send="true"
                        class="moz-txt-link-freetext">https://lists.osgeo.org/mailman/listinfo/gdal-dev</a><br>
                    </blockquote>
                  </div>
                </blockquote>
              </div>
            </blockquote>
          </div>
        </blockquote>
      </div>
      <br>
    </blockquote>
    <pre class="moz-signature" cols="72">-- 
<a class="moz-txt-link-freetext" href="http://www.spatialys.com">http://www.spatialys.com</a>
My software is free, but my time generally not.</pre>
  </body>
</html>