Beginning Python (2nd edition), Chapter 20, Project 1: Instant Markup

simple_markup.py

import sys, re
from util import *

print('<html><head><title>...</title></head><body>')

title = True
for block in blocks(sys.stdin):
    # Wrap *emphasized* text in <em> tags
    block = re.sub(r'\*(.+?)\*', r'<em>\1</em>', block)
    if title:
        # The first block is treated as the document title
        print('<h1>')
        print(block)
        print('</h1>')
        title = False
    else:
        # Every other block becomes a paragraph
        print('<p>')
        print(block)
        print('</p>')

print('</body></html>')
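
The emphasis substitution in simple_markup.py can be tried on its own. The following is a minimal sketch; the helper names emphasize and emphasize_greedy are made up for illustration and are not part of the project. It also shows why the pattern uses the non-greedy `.+?` rather than `.+`.

```python
import re

# Non-greedy: each pair of asterisks is matched separately.
EMPHASIS = re.compile(r'\*(.+?)\*')
# Greedy variant, shown only for contrast.
EMPHASIS_GREEDY = re.compile(r'\*(.+)\*')


def emphasize(text):
    """Replace *word* with <em>word</em> (hypothetical helper)."""
    return EMPHASIS.sub(r'<em>\1</em>', text)


def emphasize_greedy(text):
    """Same substitution with a greedy pattern (hypothetical helper)."""
    return EMPHASIS_GREEDY.sub(r'<em>\1</em>', text)


sample = 'a *b* c *d*'
print(emphasize(sample))         # a <em>b</em> c <em>d</em>
print(emphasize_greedy(sample))  # a <em>b* c *d</em>
```

With the greedy pattern, everything between the first and the last asterisk on the line ends up inside a single <em> element.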

handler_first.py

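class Handler:
    """
    An object that handles method calls from the Parser.

    The Parser calls the start() and end() methods at the beginning and
    end of each block, with the proper block name as parameter. The sub()
    method is used in regular expression substitution. When called with a
    name such as 'emphasis', it returns a proper substitution function.
    """
    def callback(self, prefix, name, *args):
        # Look up a method such as start_paragraph; do nothing if it is missing
        method = getattr(self, prefix + name, None)
        if callable(method):
            return method(*args)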
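
To see what callback() does, here is a small sketch. DemoHandler and its start_paragraph method are invented for this example; the point is only that getattr with a default of None lets unknown names be ignored silently.

```python
class Handler:
    def callback(self, prefix, name, *args):
        # Build the method name (for example 'start_' + 'paragraph')
        # and call it only if such a method exists.
        method = getattr(self, prefix + name, None)
        if callable(method):
            return method(*args)


class DemoHandler(Handler):
    """Hypothetical subclass used only for this demonstration."""
    def start_paragraph(self):
        return '<p>'


h = DemoHandler()
known = h.callback('start_', 'paragraph')
unknown = h.callback('start_', 'table')
print(known)    # <p>
print(unknown)  # None: there is no start_table method, so nothing happens
```

Because missing methods are skipped rather than raising AttributeError, a handler only has to implement the block types it cares about.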

handlers.py

class Handler:
    """
    An object that handles method calls from the Parser.

    The Parser calls the start() and end() methods at the beginning and
    end of each block, with the proper block name as parameter. The sub()
    method is used in regular expression substitution. When called with a
    name such as 'emphasis', it returns a proper substitution function.
    """
    def callback(self, prefix, name, *args):
        method = getattr(self, prefix + name, None)
        if callable(method):
            return method(*args)

    def start(self, name):
        self.callback('start_', name)

    def end(self, name):
        self.callback('end_', name)

    def sub(self, name):
        def substitution(match):
            result = self.callback('sub_', name, match)
            # No matching sub_ method: leave the matched text unchanged
            if result is None:
                result = match.group(0)
            return result
        return substitution


class HTMLRenderer(Handler):
    """
    A specific handler used for rendering HTML.

    The methods in HTMLRenderer are accessed through the superclass
    Handler's start(), end(), and sub() methods. They implement the
    basic tags used in HTML documents.
    """
    def start_document(self):
        print('<html><head><title>...</title></head><body>')
    def end_document(self):
        print('</body></html>')
    def start_paragraph(self):
        print('<p>')
    def end_paragraph(self):
        print('</p>')
    def start_heading(self):
        print('<h2>')
    def end_heading(self):
        print('</h2>')
    def start_list(self):
        print('<ul>')
    def end_list(self):
        print('</ul>')
    def start_listitem(self):
        print('<li>')
    def end_listitem(self):
        print('</li>')
    def start_title(self):
        print('<h1>')
    def end_title(self):
        print('</h1>')
    def sub_emphasis(self, match):
        return '<em>%s</em>' % match.group(1)
    def sub_url(self, match):
        return '<a href="%s">%s</a>' % (match.group(1), match.group(1))
    # Named sub_mail so that it matches the 'mail' filter registered in markup.py
    def sub_mail(self, match):
        return '<a href="mailto:%s">%s</a>' % (match.group(1), match.group(1))
    def feed(self, data):
        print(data)
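
The function returned by sub() is meant to be passed to re.sub as the replacement. The sketch below uses MiniHandler, a trimmed-down stand-in written for this example (it is not the project's Handler), to show both the normal case and the fallback when no sub_ method exists.

```python
import re


class MiniHandler:
    """Trimmed-down stand-in for Handler/HTMLRenderer (illustration only)."""

    def sub(self, name):
        def substitution(match):
            method = getattr(self, 'sub_' + name, None)
            result = method(match) if callable(method) else None
            # Fall back to the matched text when no sub_ method exists.
            return match.group(0) if result is None else result
        return substitution

    def sub_emphasis(self, match):
        return '<em>%s</em>' % match.group(1)


handler = MiniHandler()
text = 'Try *this* out'
with_em = re.sub(r'\*(.+?)\*', handler.sub('emphasis'), text)
unchanged = re.sub(r'\*(.+?)\*', handler.sub('unknown'), text)
print(with_em)    # Try <em>this</em> out
print(unchanged)  # Try *this* out
```

The fallback explains why a filter whose name has no matching sub_ method does not raise an error: the text simply passes through unmodified.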

markup.py

import sys, re
from handlers import *
from util import *
from rules import *

class Parser:
    """
    A Parser reads a text file, applying rules and controlling a handler.
    """

    # Initialize the members: handler, rules, and filters
    def __init__(self, handler):
        self.handler = handler
        self.rules = []
        self.filters = []

    # Add a rule; this keeps the parser easy to extend
    def addRule(self, rule):
        self.rules.append(rule)

    # Add a filter; this keeps the parser easy to extend
    def addFilter(self, pattern, name):
        def filter(block, handler):
            return re.sub(pattern, handler.sub(name), block)

        self.filters.append(filter)

    # parse() reads the text (through blocks(file) from util.py), splits it
    # into blocks, and loops over the filters and the rules to process each block
    def parse(self, file):
        self.handler.start('document')
        for block in blocks(file):
            for filter in self.filters:
                block = filter(block, self.handler)
            for rule in self.rules:
                if rule.condition(block):
                    # A rule that returns True has finished the block
                    if rule.action(block, self.handler):
                        break

        self.handler.end('document')

# A concrete class derived from Parser (by adding specific rules and
# filters); it is used to add HTML markup

class BasicTextParser(Parser):
    """
    A specific Parser that adds rules and filters in its constructor.
    """
    def __init__(self, handler):
        Parser.__init__(self, handler)
        self.addRule(ListRule())
        self.addRule(ListItemRule())
        self.addRule(TitleRule())
        self.addRule(HeadingRule())
        self.addRule(ParagraphRule())

        self.addFilter(r'\*(.+?)\*', 'emphasis')
        self.addFilter(r'(http://[\.a-zA-Z/]+)', 'url')
        # The source page obfuscated the middle of this pattern as an email
        # address; '+@' is the presumed original text
        self.addFilter(r'([\.a-zA-Z]+@[\.a-zA-Z]+[a-zA-Z]+)', 'mail')

# Main program: create a handler, create a parser that uses it, and call
# the parser's parse() method to process the text
handler = HTMLRenderer()
parser = BasicTextParser(handler)

parser.parse(sys.stdin)
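
The inner loop of parse() relies on the return value of action(): True means the block is finished, False means the remaining rules still get a chance. Here is a minimal sketch of that control flow. All class names, apply_rules, and the use of a plain list as the "handler" are assumptions made for this example only.

```python
class LoggingRule:
    """Hypothetical rule that records every block and lets later rules run."""
    def __init__(self, log):
        self.log = log

    def condition(self, block):
        return True

    def action(self, block, handler):
        self.log.append(block)
        return False  # not finished: keep trying the remaining rules


class UpperRule:
    """Hypothetical rule that only applies to all-uppercase blocks."""
    def condition(self, block):
        return block.isupper()

    def action(self, block, handler):
        handler.append('SHOUT:' + block)
        return True  # finished: stop processing this block


class DefaultRule:
    """Hypothetical catch-all rule, playing the role of ParagraphRule."""
    def condition(self, block):
        return True

    def action(self, block, handler):
        handler.append('text:' + block)
        return True


def apply_rules(blocks, rules, handler):
    # Same shape as the rule loop in Parser.parse().
    for block in blocks:
        for rule in rules:
            if rule.condition(block):
                if rule.action(block, handler):
                    break


log, out = [], []
apply_rules(['HELLO', 'world'], [LoggingRule(log), UpperRule(), DefaultRule()], out)
print(log)  # ['HELLO', 'world']
print(out)  # ['SHOUT:HELLO', 'text:world']
```

This is why the order of the addRule calls matters: ListRule comes first and returns False, while ParagraphRule comes last as the fallback.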

rules.py

class Rule:
    """
    Base class for all rules.
    """

    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block)
        handler.end(self.type)
        return True

class HeadingRule(Rule):
    """
    A heading is a single line that is at most 70 characters and
    that doesn't end with a colon.
    """
    type = 'heading'

    def condition(self, block):
        return '\n' not in block and len(block) <= 70 and block[-1] != ':'

class TitleRule(HeadingRule):
    """
    The title is the first block in the document, provided that
    it is a heading.
    """
    type = 'title'
    first = True

    def condition(self, block):
        if not self.first:
            return False
        self.first = False
        return HeadingRule.condition(self, block)

class ListItemRule(Rule):
    """
    A list item is a paragraph that begins with a hyphen. As part of the
    formatting, the hyphen is removed.
    """
    type = 'listitem'

    def condition(self, block):
        return block[0] == '-'

    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block[1:].strip())
        handler.end(self.type)
        return True

class ListRule(ListItemRule):
    """
    A list begins between a block that is not a list item and a
    subsequent list item. It ends after the last consecutive list item.
    """
    type = 'list'
    inside = False

    def condition(self, block):
        return True

    def action(self, block, handler):
        if not self.inside and ListItemRule.condition(self, block):
            handler.start(self.type)
            self.inside = True
        elif self.inside and not ListItemRule.condition(self, block):
            handler.end(self.type)
            self.inside = False
        # Always return False so that the other rules still process the block
        return False

class ParagraphRule(Rule):
    """
    A paragraph is simply a block that isn't covered by any of the other rules.
    """
    type = 'paragraph'

    def condition(self, block):
        return True
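
ListRule is the only rule that keeps state between blocks. The sketch below reproduces that small state machine as a standalone function; list_events and the event strings are invented for illustration and do not appear in the project.

```python
def list_events(blocks):
    """Return markup events, opening and closing a list around runs of '-' items."""
    events = []
    inside = False
    for block in blocks:
        is_item = block.startswith('-')
        if is_item and not inside:
            events.append('<ul>')   # first item after a non-item block
            inside = True
        elif inside and not is_item:
            events.append('</ul>')  # first non-item block after a list
            inside = False
        if is_item:
            events.append('<li>' + block[1:].strip() + '</li>')
        else:
            events.append('<p>' + block + '</p>')
    return events


sample = ['Links:', '- first', '- second', 'The end']
events = list_events(sample)
for event in events:
    print(event)
```

As in the ListRule listing, the list is closed only when a later non-item block arrives, so a document that ends with a list item would leave the list open in this sketch.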

util.py

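def lines(file):
    # Wrap the file in a line generator that simply appends one blank
    # line after the last line of the file
    for line in file:
        yield line
    yield '\n'

def blocks(file):
    block = []
    for line in lines(file):
        # Collect every non-blank line
        if line.strip():
            block.append(line)
        # On a blank line, yield the collected lines joined into one string.
        # Blank lines themselves are never collected. Without the "elif block"
        # check, leading or repeated blank lines would yield empty blocks.
        # A blank line ends the collection, which is why lines() appends one:
        # without it, a file that does not end in a blank line would never
        # yield its last block.
        elif block:
            yield ''.join(block).strip()
            block = []  # start a new empty list and repeat the steps above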
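
The two generators can be exercised without a real file by wrapping a string in io.StringIO. This is a small sketch under that assumption; the sample text is made up.

```python
import io


def lines(file):
    # Yield every line, then one extra blank line as a sentinel.
    for line in file:
        yield line
    yield '\n'


def blocks(file):
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []


# No trailing newline and a double blank line, to exercise both edge cases.
text = "Title\n\nFirst line\nsecond line\n\n\n- item"
result = list(blocks(io.StringIO(text)))
print(result)
# ['Title', 'First line\nsecond line', '- item']
```

The last block is returned even though the text does not end with a blank line, and the two consecutive blank lines do not produce an empty block.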

test_input.txt

Welcome to World Wide Spam, Inc.

These are the corporate web pages of *World Wide Spam*, Inc. We hope
you find your stay enjoyable, and that you will sample many of our
products.

A short history of the company

World Wide Spam was started in the summer of 2000. The business
concept was to ride the dot-com wave and to make money both through
bulk email and by selling canned meat online.

After receiving several complaints from customers who weren't
satisfied by their bulk email, World Wide Spam altered their profile,
and focused 100% on canned goods. Today, they rank as the world's
13,892nd online supplier of SPAM.

Destinations

From this page you may visit several of our interesting web pages:

   - What is SPAM? (http://wwspam.fu/whatisspam)

   - How do they make it? (http://wwspam.fu/howtomakeit)

   - Why should I eat it? (http://wwspam.fu/whyeatit)

How to get in touch with us

You can get in touch with us in *many* ways: By phone(555-1234), by
email ([email protected]) or by visiting our customer feedback page
(http://wwspam.fu/feedback).

test_output.html

<html><head><title>...</title></head><body>
<h1>
Welcome to World Wide Spam, Inc.
</h1>
<p>
These are the corporate web pages of <em>World Wide Spam</em>, Inc. We hope
you find your stay enjoyable, and that you will sample many of our
products.
</p>
<h2>
A short history of the company
</h2>
<p>
World Wide Spam was started in the summer of 2000. The business
concept was to ride the dot-com wave and to make money both through
bulk email and by selling canned meat online.
</p>
<p>
After receiving several complaints from customers who weren't
satisfied by their bulk email, World Wide Spam altered their profile,
and focused 100% on canned goods. Today, they rank as the world's
13,892nd online supplier of SPAM.
</p>
<h2>
Destinations
</h2>
<p>
From this page you may visit several of our interesting web pages:
</p>
<ul>
<li>
What is SPAM? (<a href="http://wwspam.fu/whatisspam">http://wwspam.fu/whatisspam</a>)
</li>
<li>
How do they make it? (<a href="http://wwspam.fu/howtomakeit">http://wwspam.fu/howtomakeit</a>)
</li>
<li>
Why should I eat it? (<a href="http://wwspam.fu/whyeatit">http://wwspam.fu/whyeatit</a>)
</li>
</ul>
<h2>
How to get in touch with us
</h2>
<p>
You can get in touch with us in <em>many</em> ways: By phone(555-1234), by
email ([email protected]) or by visiting our customer feedback page
(<a href="http://wwspam.fu/feedback">http://wwspam.fu/feedback</a>).
</p>
</body></html>

readme.txt

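20.1 What's the problem?
    Have a program convert plain text to HTML automatically.

    The job is basically to classify the various text elements, such as headings and emphasized text, and then mark them clearly.
    To do that, HTML markup is added to the text so that the document can be displayed and used as a web page in a browser.

    Write a prototype

    Define the goals

    The input should not contain artificial codes or tags
    It should be possible to handle different kinds of blocks
20.2 Useful tools
    You need to read from and write to files, or at least read from standard input (sys.stdin) and write output with print
    You need to iterate over the lines of the input
    You need a few string methods
    You need one or two generators
    You may need the re module
20.3 Preparations
    Use a test suite to gauge progress: test_input.txt
20.4 First implementation
    The first step is to split the text into paragraphs, which are separated by one or more blank lines. A more accurate word than paragraph is block.
    20.4.1 Finding blocks of text
        A simple way to find a block is to collect the lines you encounter until you reach a blank line, and then return the lines collected so far. Those returned lines form one block.
        This is done with an if ... elif branch.
    20.4.2 Adding some markup

Summary:
    Main program (markup.py)
        1. The processing is wrapped in the parse method, which is the processing flow of the whole program.
        2. The rule conditions are abstracted into rule classes, which avoids a large number of if statements.
        3. self.addRule adds rule objects. The methods and attributes of the rule objects do the actual work on each block.
        4. An HTML handler object (HTMLRenderer) is created and passed to BasicTextParser to create the parser object; the parser calls the rule
        objects, which receive the handler object and process each block.
        5. Most important is how the functionality is abstracted into concrete classes, and what each class is responsible for: high cohesion, low coupling.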

fun_study.py

# def count():
#     '''
#     Create an empty list and append the function defined on each pass
#     through the loop; after three passes the list is [f, f, f].
#     Every f refers to the same variable i in the enclosing scope, and
#     i is 3 once the loop has finished, so f1() returns 3*3 = 9
#     (and so do f2() and f3()).
#     '''
#     fs = []
#     for i in range(1, 4):
#         def f():
#              return i*i
#         fs.append(f)
#     print(i)
#     return fs


def count():
    def f(j):
        def g():
            return j*j
        return g
    fs = []
    for i in range(1, 4):
        fs.append(f(i))
    return fs

# Demonstrates closures: each function in fs keeps the scope of its own
# call f(1), f(2), f(3), so each one remembers a different j.
f1, f2, f3 = count()
print(f1())
print(f2())
print(f3())
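
The two versions of count() can be compared side by side. The sketch below also shows one common alternative to the nested-function fix: binding the loop variable through a default argument. The names count_late and count_default are made up for this comparison.

```python
def count_late():
    fs = []
    for i in range(1, 4):
        def f():
            return i * i  # i is looked up when f is called, not when it is defined
        fs.append(f)
    return fs


def count_default():
    fs = []
    for i in range(1, 4):
        def f(i=i):  # the default value is evaluated now, freezing the current i
            return i * i
        fs.append(f)
    return fs


late = [f() for f in count_late()]
bound = [f() for f in count_default()]
print(late)   # [9, 9, 9]
print(bound)  # [1, 4, 9]
```

The nested f(j)/g() version in fun_study.py achieves the same result as count_default by giving every closure its own enclosing scope.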

Original article: https://www.cnblogs.com/landerhu/p/11673067.html

Time: 2024-10-08 08:22:48
