[GRASS-SVN] r66521 - in grass-addons/tools/addons: . test test/data

svn_grass at osgeo.org
Fri Oct 16 14:11:47 PDT 2015


Author: wenzeslaus
Date: 2015-10-16 14:11:47 -0700 (Fri, 16 Oct 2015)
New Revision: 66521

Added:
   grass-addons/tools/addons/test/data/g.almost.empty.html
Modified:
   grass-addons/tools/addons/get_page_description.py
   grass-addons/tools/addons/test/test_description_extraction.sh
Log:
End the search for description text for the addons index page sooner

This fixes r66518, which ran past the end of the DESCRIPTION section
and included the following sections in the description (this happens
for pages with short descriptions that contain no dot).
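
A simplified sketch of the behavior described above, not the code from
get_page_description.py itself. The two regular expressions are copied
from the diff; the function name collect_description_lines and its
max_lines parameter are hypothetical and exist only for illustration.

```python
import re

# Patterns copied from the diff; the constant names are hypothetical.
DESC_SECTION_START = re.compile(r'<h2.*>DESCRIPTION.*/h.>', flags=re.IGNORECASE)
DESC_SECTION_END = re.compile(r'<h2.*>.*/h.>', flags=re.IGNORECASE)


def collect_description_lines(lines, max_lines=3):
    """Collect the lines that follow the DESCRIPTION heading.

    Collection stops at the next h2 heading, or once more than
    max_lines lines have been stored (the same comparison as in the
    diff, so up to max_lines + 1 lines can be returned).
    """
    in_section = False
    collected = []
    for line in lines:
        # Test for the end first so the closing heading is never stored.
        if in_section and DESC_SECTION_END.search(line):
            break
        if in_section:
            collected.append(line)
            if len(collected) > max_lines:
                break
        # Test for the start last so the DESCRIPTION heading is not stored.
        if not in_section and DESC_SECTION_START.search(line):
            in_section = True
    return collected


# Lines taken from the g.almost.empty.html test page added in this commit.
page = [
    '<h2>NAME</h2>',
    '<em><b>r.broken.example</b></em> <h2>KEYWORDS</h2>',
    '',
    '<h2>DESCRIPTION</h2>',
    "This doesn't have much description or punctuation<br>",
    'However, it has a some bad HTML',
    '<h2>SEE ALSO</h2>',
    '',
    '<em>',
]
description_lines = collect_description_lines(page)
```

With the end check in place, the SEE ALSO heading and everything after
it stay out of the collected text even though the description has no
dot to stop at.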


Modified: grass-addons/tools/addons/get_page_description.py
===================================================================
--- grass-addons/tools/addons/get_page_description.py	2015-10-16 20:44:55 UTC (rev 66520)
+++ grass-addons/tools/addons/get_page_description.py	2015-10-16 21:11:47 UTC (rev 66521)
@@ -81,8 +81,9 @@
     # is not the sentence
     text = re.split(r"\.(\s|$)", text, 1)[0]
     text = remove_unwanted_tags(text)
-    # strip spaces at the beginning and add the tripped dot back
-    return text.lstrip() + '.'
+    # strip spaces from the ends and add the stripped dot back
+    # TODO: unify the behavior with dot, some modules have it, some don't
+    return text.strip() + '.'
 
 
 def main(filename):
@@ -92,12 +93,17 @@
         in_desc_section = False
         desc_section = ''
         desc_section_num_lines = 0
+        # one empty line after heading and then a longer sentence over two lines
+        desc_section_max_lines = 3
+        # we expect h2 level
         desc_block_start = re.compile(r'<h2.*>NAME.*/h.>', flags=re.IGNORECASE)
         # the incomplete manual pages have NAME followed by DESCRIPTION
         desc_block_end = re.compile(r'<h2.*>(KEYWORDS|DESCRIPTION).*/h.>',
                                     flags=re.IGNORECASE)
         desc_section_start = re.compile(r'<h2.*>DESCRIPTION.*/h.>',
                                         flags=re.IGNORECASE)
+        #desc_section_end = re.compile(r'<h2.*>.*<.*/h.>', flags=re.IGNORECASE)
+        desc_section_end = re.compile(r'<h2.*>.*/h.>', flags=re.IGNORECASE)
         desc_line = re.compile(r' - ')
         comment_meta_desc_line = re.compile(r'<!-- meta page description:.*-->')
         for line in page_file:
@@ -117,10 +123,14 @@
             # if there was nothing else, last thing to try is get the first
             # sentence from the description section (which is also last
             # item in the file from all things we are trying)
+            if in_desc_section and desc_section_end.search(line):
+                in_desc_section = False
+            # we need to store line after we matched for start
+            # and not store the line matched for end
             if in_desc_section:
                 desc_section += line + "\n"
                 desc_section_num_lines += 1
-                if desc_section_num_lines > 4:
+                if desc_section_num_lines > desc_section_max_lines:
                     in_desc_section = False
             if not desc and desc_section_start.search(line):
                 in_desc_section = True

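The first hunk above switches from lstrip() to strip() before the dot
is added back. A minimal sketch of that step follows, under stated
assumptions: the function name first_sentence is hypothetical, the
tag-removing regular expression is only a stand-in for
remove_unwanted_tags() (whose body is not part of this diff), and the
split pattern uses a non-capturing group, which yields the same first
element as the pattern in the diff.

```python
import re


def first_sentence(text):
    """Return the first sentence of text, ending with exactly one dot."""
    # Split at the first dot that is followed by whitespace or end of text.
    sentence = re.split(r"\.(?:\s|$)", text, maxsplit=1)[0]
    # Assumption: stand-in for remove_unwanted_tags(), drops anything tag-like.
    sentence = re.sub(r"<[^>]*>", "", sentence)
    # Strip both ends so a trailing newline does not end up before the dot.
    return sentence.strip() + "."


with_dot = first_sentence("  Computes slope. Also computes aspect.")
without_dot = first_sentence("Short text without punctuation<br>\n")
```

With lstrip() only, the second call would keep the trailing newline and
place the dot on a line of its own.
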
Copied: grass-addons/tools/addons/test/data/g.almost.empty.html (from rev 66518, grass-addons/tools/addons/test/data/g.broken.example.html)
===================================================================
--- grass-addons/tools/addons/test/data/g.almost.empty.html	                        (rev 0)
+++ grass-addons/tools/addons/test/data/g.almost.empty.html	2015-10-16 21:11:47 UTC (rev 66521)
@@ -0,0 +1,39 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+<head>
+<title>GRASS GIS Manual (test page): r.broken.example</title>
+<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
+</head>
+<body>
+
+<hr class="header">
+
+<h2>NAME</h2>
+<em><b>r.broken.example</b></em> <h2>KEYWORDS</h2>
+
+<h2>DESCRIPTION</h2>
+This doesn't have much description or punctuation<br>
+However, it has a some bad HTML
+<h2>SEE ALSO</h2>
+
+<em>
+<a href="wxGUI.components.html">wxGUI components</a><br>
+</em>
+
+<h2>AUTHORS</h2>
+
+Random Author
+
+<p>
+<i>Data placeholder: 2015-09-06 (Sun, 06 Sep 2015)</i><hr class="header">
+<p>
+<a href="index.html">Main index</a>
+<p>
+© 2003-2015
+<a href="http://grass.osgeo.org">GRASS Development Team</a>,
+GRASS GIS x.x Reference Manual (test page)
+</p>
+
+</div>
+</body>
+</html>

Modified: grass-addons/tools/addons/test/test_description_extraction.sh
===================================================================
--- grass-addons/tools/addons/test/test_description_extraction.sh	2015-10-16 20:44:55 UTC (rev 66520)
+++ grass-addons/tools/addons/test/test_description_extraction.sh	2015-10-16 21:11:47 UTC (rev 66521)
@@ -10,3 +10,4 @@
 ../get_page_description.py data/wxGUI.example.html
 ../get_page_description.py data/g.broken.example.html
 ../get_page_description.py data/g.no.keywords.html
+../get_page_description.py data/g.almost.empty.html


