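The defining feature of Perl, and the main reason it became the language of choice for CGI, is its pattern-matching operators. Perl's powerful text processing rests on its built-in support for pattern matching, where patterns are written as regular expressions. Perl's regular expressions stand out in two ways: they are built into the language rather than provided through a library or function calls, which makes them more convenient to use; and they are more powerful than the regular-expression facilities of most other tools.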
Overview of the pattern-matching operators
Operator | Meaning | Example
=~ | Matches (contains) |
!~ | Does not match (does not contain) |
m// | Match | $haystack =~ m/needle/ or $haystack =~ /needle/
s/// | Substitute | $italiano =~ s/butter/olive oil/
tr/// (y///) | Transliterate | $rotate13 =~ tr/a-zA-Z/n-za-mN-ZA-M/
qr// | Regular expression |
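Usage notes:
- Do not confuse Perl's binding operator (=~) with AWK's match operator (~). The negated match operator is the same in both languages (!~).
- Without a binding operator, the operation is applied to $_ by default:
    /new life/ and /new civilizations/    # two searches against $_
    s/sugar/aspartame/                    # substitution on $_
    tr/ATCG/TAGC/                         # transliteration on $_
- The leading m of m// may be omitted when the delimiters are slashes, but writing it out is more readable and is recommended.
- With the binding operator =~, a right-hand side that is not written as m// is still treated as a pattern to match:
    print "matches" if $somestring =~ $somepattern;        # is equivalent to
    print "matches" if $somestring =~ m/$somepattern/;
- m//, s///, tr/// and qr// are quote-like operators, so you may choose your own delimiters (as with q//, qq// and qw//):
    $path =~ s#/tmp#/var/tmp/scratch#;
    if ($dir =~ m[/bin]) {
        print "No binary directories please.\n";
    }
- One bracket pair may be combined with a different bracket pair, and the two parts may be separated by whitespace:
    s(egg)<larva>;
    s(larva){pupa};
    s[pupa]/imago/;
    s (egg) <larva>;
- After a successful match, $`, $& and $' are set to the text to the left of the match, the matched text, and the text to the right of the match:
    "hot cross buns" =~ /cross/;
    print "Matched: <$`> $& <$'>\n";    # Matched: <hot > cross < buns>
- The special variables set after a pattern match are: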
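Variable | Meaning
$` | Text before the matched text
$& | The matched text
$' | Text after the matched text
$1, $2, $3 | Text matched by the 1st, 2nd and 3rd capture groups
$+ | Text matched by the highest-numbered capture group that took part in the match
$^N | Text matched by the most recently closed capture group
@- | Array of start offsets, within the target text, of the match and of each capture group
@+ | Array of end offsets, within the target text, of the match and of each capture group
$^R | Result of the most recently executed embedded code; not set when the embedded code is used as the "if" part of a conditional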
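The match variables in the table above can be inspected with a short script. This is a minimal sketch; the sample string and the variable names ($before, $group1 and so on) are chosen only for illustration.

```perl
use strict;
use warnings;

my $text = "hot cross buns";
my ($before, $matched, $after, $group1, $last_group, $start, $end);

if ($text =~ /(cr)(oss)/) {
    ($before, $matched, $after) = ($`, $&, $');
    $group1     = $1;       # text of the first capture group
    $last_group = $+;       # text of the highest-numbered group that matched
    $start      = $-[0];    # offset where the whole match begins
    $end        = $+[0];    # offset just past the end of the whole match
}

print "before=<$before> match=<$matched> after=<$after>\n";
print "group1=<$group1> last=<$last_group> span=$start..$end\n";
```

Copying the values into ordinary variables right after the match is a common precaution, because the next successful match overwrites them.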
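m//, s/// and qr// all accept the following modifiers:
Modifier | Meaning
/i | Ignore case when matching
/s | Single-line mode: let . match a newline and ignore the obsolete $* variable ("dot matches all" mode)
/m | Multi-line mode: let ^ and $ match just after and just before an embedded newline (\n). It has no effect if the target string contains no "\n" or the pattern contains neither ^ nor $ (enhanced line-anchor mode)
/x | Free-spacing and comment mode: ignore whitespace in the pattern (except escaped whitespace) and allow comments
/o | Compile the pattern only once, preventing recompilation at run time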
For example:
    m/\w+:(\s+\w+)\s*\d+/;          # A word, colon, space, word, space, digits
    m/\w+: (\s+ \w+) \s* \d+/x;     # A word, colon, space, word, space, digits
    m{
        \w+:        # Match a word and a colon.
        (           # (begin group)
            \s+     # Match one or more spaces.
            \w+     # Match another word.
        )           # (end group)
        \s*         # Match zero or more spaces.
        \d+         # Match some digits.
    }x;
    $/ = "";        # "paragrep" mode
    while (<>) {
        while ( m{
                    \b          # start at a word boundary
                    (\w\S+)     # find a wordish chunk
                    (
                        \s+     # separated by some whitespace
                        \1      # and that chunk again
                    ) +         # repeat ad lib
                    \b          # until another word boundary
                }xig
              ) {
            print "dup word '$1' at paragraph $.\n";
        }
    }
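The effect of the /i, /s and /m modifiers is easiest to see by running the same pattern with and without each one. The following is a small sketch with a made-up two-line string; each variable holds 1 when the match succeeds and 0 otherwise.

```perl
use strict;
use warnings;

my $text = "Perl\nperl rocks\n";

# /i: count matches regardless of case (the "() =" idiom forces list context).
my $count_i = () = $text =~ /perl/gi;

# /s: without it "." refuses to match the newline between the two lines.
my $dot_plain = $text =~ /Perl.perl/  ? 1 : 0;
my $dot_s     = $text =~ /Perl.perl/s ? 1 : 0;

# /m: without it "^" matches only at the very start of the string.
my $caret_plain = $text =~ /^perl/  ? 1 : 0;
my $caret_m     = $text =~ /^perl/m ? 1 : 0;

print "count_i=$count_i dot=$dot_plain dot_s=$dot_s ",
      "caret=$caret_plain caret_m=$caret_m\n";
```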
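The pattern-matching operators in detail
7.3.1 The m// operator (matching)
    EXPR =~ m/PATTERN/cgimosx
    EXPR =~ /PATTERN/cgimosx
    EXPR =~ ?PATTERN?cgimosx
    m/PATTERN/cgimosx
    /PATTERN/cgimosx
    ?PATTERN?cgimosx
Notes:
- If PATTERN is the empty string, the last successfully matched regular expression is used instead.
- Modifiers specific to m//:
Modifier | Meaning
/g | Find all matches
/cg | Keep the current search position when a /g match fails, so that searching can continue from there
- In list context, m//g returns all the matches:
    if (@perls = $paragraph =~ /perl/gi) {
        printf "Perl mentioned %d times.\n", scalar @perls;
    }
- The ?? delimiters request a match-once search, and the '' delimiters suppress variable interpolation and the six translation escapes (\U and the like). Since Perl 5.22 the match-once form must be written with an explicit m, as m?PATTERN?:
    open DICT, "/usr/share/dict/words" or die "Cannot open words: $!\n";
    while (<DICT>) {
        $first = $1 if m?(^love.*)?;    # matches only the first time
        $last  = $1 if /(^love.*)/;     # matches every time, so the last one wins
    }
    print $first, "\n";
    print $last, "\n";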
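The difference between list context, scalar context and the /c modifier can be sketched as follows. The sample strings and the WORD/NUM token labels are invented for the illustration; pos() and the \G assertion are standard Perl.

```perl
use strict;
use warnings;

my $paragraph = "Perl, perl and PERL";

# List context: m//g returns every match at once.
my @perls = $paragraph =~ /perl/gi;

# Scalar context: each evaluation finds the next match and pos() records
# the offset just past it.
my @positions;
while ($paragraph =~ /perl/gi) {
    push @positions, pos($paragraph);
}

# /gc with \G: a failed match keeps the position, so several patterns can
# take turns at the same spot (a tiny tokenizer).
my $input = "abc123";
my @tokens;
while (1) {
    if    ($input =~ /\G([a-z]+)/gc) { push @tokens, "WORD:$1" }
    elsif ($input =~ /\G(\d+)/gc)    { push @tokens, "NUM:$1" }
    else                             { last }
}

print scalar(@perls), " matches ending at @positions; tokens: @tokens\n";
```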
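7.3.2 The s/// operator (substitution)
    LVALUE =~ s/PATTERN/REPLACEMENT/egimosx
    s/PATTERN/REPLACEMENT/egimosx
Notes:
- The operator searches a string for PATTERN and, if it is found, replaces the matched substring with REPLACEMENT. The return value is the number of substitutions made (which can exceed 1 with the /g modifier). On failure it returns false (the empty string "", which is 0 in numeric context).
    if ($lotr =~ s/Bilbo/Frodo/) { print "Successfully wrote sequel. " }
    $change_count = $lotr =~ s/Bilbo/Frodo/g;
- The replacement part is treated as a double-quoted string, so it can use the variables set by the match ($`, $&, $', $1, $2 and so on):
    s/revision|version|release/\u$&/g;
    s/version ([0-9.]+)/the $Names{$1} release/g;
- If PATTERN is the empty string, the last successfully matched regular expression is used instead. Both PATTERN and REPLACEMENT undergo variable interpolation, but PATTERN is interpolated once, when the s/// operator as a whole is evaluated, whereas REPLACEMENT is interpolated every time the pattern matches.
- Modifiers specific to s///:
Modifier | Meaning
/g | Replace all matches
/e | Evaluate the right-hand side as a Perl expression (code) rather than as a string
Examples of the /e modifier:
    s/[0-9]+/sprintf("%#x", $&)/ge;
    s{
        version
        \s+
        (
            [0-9.]+
        )
    }{
        $Names{$1}
            ? "the $Names{$1} release"
            : $&
    }xge;
- Substituting without changing the original string:
    $lotr = $hobbit;
    $lotr =~ s/Bilbo/Frodo/g;
    # or, in a single statement:
    ($lotr = $hobbit) =~ s/Bilbo/Frodo/g;
- Substituting in every element of an array:
    for (@chapters) { s/Bilbo/Frodo/g }
    s/Bilbo/Frodo/g for @chapters;
- Applying several substitutions to one string:
    for ($string) {
        s/^\s+//;       # discard leading whitespace
        s/\s+$//;       # discard trailing whitespace
        s/\s+/ /g;      # collapse internal whitespace
    }
    for ($newshow = $oldshow) {
        s/Fred/Homer/g;
        s/Wilma/Marge/g;
        s/Pebbles/Lisa/g;
        s/Dino/Bart/g;
    }
- When a single global substitution is not enough:
    # put commas in the right places in an integer
    1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/;
    # expand tabs to 8-column spacing
    1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
    # remove (nested (even deeply nested (like this))) remarks
    1 while s/\([^()]*\)//g;
    # remove duplicate words (and triplicate (and quadruplicate...))
    1 while s/\b(\w+) \1\b/$1/gi;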
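Two of the idioms above can be wrapped in small subroutines to show how they behave on real input. This is a sketch only; the names commify and squeeze_ws are hypothetical helpers, not functions from any library.

```perl
use strict;
use warnings;

# Repeat the substitution until it fails: each pass inserts one comma,
# working from the right-hand end of the digit string.
sub commify {
    my ($number) = @_;
    1 while $number =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/;
    return $number;
}

# Several substitutions applied to one string by aliasing it to $_.
sub squeeze_ws {
    my ($string) = @_;
    for ($string) {
        s/^\s+//;     # trim leading whitespace
        s/\s+$//;     # trim trailing whitespace
        s/\s+/ /g;    # collapse runs of whitespace
    }
    return $string;
}

print commify("1234567"), "\n";
print "[", squeeze_ws("  new   life \n"), "]\n";
```

Both helpers work on a copy of their argument, so the caller's string is left untouched.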
7.3.3 The tr/// operator (transliteration)
    LVALUE =~ tr/SEARCHLIST/REPLACELIST/cds
    tr/SEARCHLIST/REPLACELIST/cds
Usage notes:
- Modifiers for tr///:
Modifier | Meaning
/c | Complement the SEARCHLIST
/d | Delete characters that are found but not replaced (present in SEARCHLIST with no counterpart in REPLACELIST)
/s | Squeeze runs of the same replaced character into a single character
- With the /d modifier, REPLACELIST is always interpreted exactly as written. Otherwise, if REPLACELIST is shorter than SEARCHLIST, its last character is repeated until it is long enough, and if REPLACELIST is empty, SEARCHLIST is used in its place. The latter is useful for counting characters without changing them, and for squeezing runs of characters with /s.
    tr/aeiou/!/;                    # change any vowel into !
    tr{/\\\r\n\b\f. }{_};           # change strange chars into an underscore
    tr/A-Z/a-z/ for @ARGV;          # canonicalize to lowercase ASCII
    $count = ($para =~ tr/\n//);    # count the newlines in $para
    $count = tr/0-9//;              # count the digits in $_
    $word =~ tr/a-zA-Z//s;          # bookkeeper -> bokeper
    tr/@$%*//d;                     # delete any of those
    tr#A-Za-z0-9+/##cd;             # remove non-base64 chars
    # change en passant
    ($HOST = $host) =~ tr/a-z/A-Z/;
    $pathname =~ tr/a-zA-Z/_/cs;    # change non-(ASCII) alphas to single underbar
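The tr/// variants above can be tried on a few throwaway strings. This is a sketch with invented sample data; the ($copy = $original) =~ tr/// form is used so that the originals stay unchanged.

```perl
use strict;
use warnings;

my $dna = "ATCGGA";

# Plain transliteration on a copy: each base becomes its complement.
(my $complement = $dna) =~ tr/ATCG/TAGC/;

# Empty REPLACELIST: nothing changes, the return value is the count.
my $g_count = ($dna =~ tr/G//);

# /s squeezes runs of the same character.
(my $squeezed = "bookkeeper") =~ tr/a-zA-Z//s;

# /c and /d together: delete everything that is NOT a digit.
(my $digits = "tel: 555-1234") =~ tr/0-9//cd;

# Ranges map position by position, which gives ROT13.
(my $rot13 = "Hello") =~ tr/a-zA-Z/n-za-mN-ZA-M/;

print "$complement $g_count $squeezed $digits $rot13\n";
```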
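Metacharacters
Perl's metacharacters are:
    \ | ( ) [ { ^ $ * + ? .
The regular-expression metacharacters have the following meanings:
Symbol | Atomic | Meaning
\... | Varies | Escape
...|... | No | Alternation
(...) | Yes | Grouping (treat as a unit)
[...] | Yes | Character class
^ | No | Start of string
. | Yes | Match one character (normally any character except newline)
$ | No | End of string (or before the newline at the end)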
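*, + and ? are quantifiers. Perl's quantifier metacharacters have the following meanings:
Quantifier | Atomic | Meaning
* | No | Match 0 or more times (maximal), same as {0,}
+ | No | Match 1 or more times (maximal), same as {1,}
? | No | Match 0 or 1 times (maximal), same as {0,1}
{COUNT} | No | Match exactly COUNT times
{MIN,} | No | Match at least MIN times (maximal)
{MIN,MAX} | No | Match at least MIN but not more than MAX times (maximal)
*? | No | Match 0 or more times (minimal)
+? | No | Match 1 or more times (minimal)
?? | No | Match 0 or 1 times (minimal)
{MIN,}? | No | Match at least MIN times (minimal)
{MIN,MAX}? | No | Match at least MIN but not more than MAX times (minimal)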
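The difference between maximal (greedy) and minimal (non-greedy) quantifiers shows up as soon as more than one ending is possible. A minimal sketch, using made-up sample strings:

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# Maximal: .+ grabs as much as it can, then backs off to the LAST ">".
my ($greedy) = $html =~ /<(.+)>/;

# Minimal: .+? grabs as little as it can, stopping at the FIRST ">".
my ($lazy) = $html =~ /<(.+?)>/;

# The same distinction applies to counted quantifiers.
my ($max_run) = "aaaa" =~ /(a{2,3})/;
my ($min_run) = "aaaa" =~ /(a{2,3}?)/;

print "greedy=<$greedy>\nlazy=<$lazy>\nmax=$max_run min=$min_run\n";
```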
The extended regular-expression sequences are:
Extension | Atomic | Meaning
(?#...) | No | Comment, discard.
(?:...) | Yes | Cluster-only parentheses, no capturing.
(?imsx-imsx) | No | Enable/disable pattern modifiers.
(?imsx-imsx:...) | Yes | Cluster-only parentheses plus modifiers.
(?=...) | No | True if lookahead assertion succeeds.
(?!...) | No | True if lookahead assertion fails.
(?<=...) | No | True if lookbehind assertion succeeds.
(?<!...) | No | True if lookbehind assertion fails.
(?>...) | Yes | Match nonbacktracking subpattern.
(?{...}) | No | Execute embedded Perl code.
(??{...}) | Yes | Match regex from embedded Perl code.
(?(...)...|...) | Yes | Match with if-then-else pattern.
(?(...)...) | Yes | Match with if-then pattern.
Note: the table above covers lookahead (?=PATTERN), negative lookahead (?!PATTERN), lookbehind (?<=PATTERN), negative lookbehind (?<!PATTERN), conditionals and other advanced matching features. Consult the references when you need them.
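A few of the extensions listed above are used often enough to be worth a quick illustration. The sketch below covers lookbehind, negative lookahead and non-capturing groups; the sample strings are invented.

```perl
use strict;
use warnings;

my $prices = 'USD100 EUR200 USD300';

# Lookbehind: digits that are preceded by "USD"; the prefix itself is
# not part of the match.
my @usd = $prices =~ /(?<=USD)\d+/g;

# Negative lookahead: words starting with "foo" that do not continue "bar".
my @not_bar = "foobar foobaz" =~ /foo(?!bar)\w+/g;

# (?:...) groups without capturing, so only the three numbers are returned.
my @ymd = "2024-01-15" =~ m{^(\d{4})(?:-|/)(\d\d)(?:-|/)(\d\d)$};

print "usd=@usd not_bar=@not_bar ymd=@ymd\n";
```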
Alphanumeric metasymbols and their meanings:
Symbol | Atomic | Meaning
\0 | Yes | Match the null character (ASCII NUL).
\NNN | Yes | Match the character given in octal, up to \377.
\n | Yes | Match nth previously captured string (n is a decimal number, as in \1).
\a | Yes | Match the alarm character (BEL).
\A | No | True at the beginning of a string.
\b | Yes | Match the backspace character (BS) (inside a character class only).
\b | No | True at word boundary.
\B | No | True when not at word boundary.
\cX | Yes | Match the control character Control-X (\cZ, \c[, etc.).
\C | Yes | Match one byte (C char) even in utf8 (dangerous).
\d | Yes | Match any digit character.
\D | Yes | Match any nondigit character.
\e | Yes | Match the escape character (ASCII ESC, not backslash).
\E | -- | End case (\L, \U) or metaquote (\Q) translation.
\f | Yes | Match the form feed character (FF).
\G | No | True at end-of-match position of prior m//g.
\l | -- | Lowercase the next character only.
\L | -- | Lowercase till \E.
\n | Yes | Match the newline character (usually NL, but CR on Macs).
\N{NAME} | Yes | Match the named char (\N{greek:Sigma}).
\p{PROP} | Yes | Match any character with the named property.
\P{PROP} | Yes | Match any character without the named property.
\Q | -- | Quote (de-meta) metacharacters till \E.
\r | Yes | Match the return character (usually CR, but NL on Macs).
\s | Yes | Match any whitespace character.
\S | Yes | Match any nonwhitespace character.
\t | Yes | Match the tab character (HT).
\u | -- | Titlecase next character only.
\U | -- | Uppercase (not titlecase) till \E.
\w | Yes | Match any "word" character (alphanumerics plus "_").
\W | Yes | Match any nonword character.
\x{abcd} | Yes | Match the character given in hexadecimal.
\X | Yes | Match Unicode "combining character sequence" string.
\z | No | True at end of string only.
\Z | No | True at end of string or before optional newline.
(The tables above are taken directly from Programming Perl, as are the other English-language tables below.)
Pay particular attention to the following classic character classes:
Symbol | Meaning | As Bytes | As utf8
\d | Digit | [0-9] | \p{IsDigit}
\D | Nondigit | [^0-9] | \P{IsDigit}
\s | Whitespace | [ \t\n\r\f] | \p{IsSpace}
\S | Nonwhitespace | [^ \t\n\r\f] | \P{IsSpace}
\w | Word character | [a-zA-Z0-9_] | \p{IsWord}
\W | Non-(word character) | [^a-zA-Z0-9_] | \P{IsWord}
The POSIX-style character classes are:
Class | Meaning
alnum | Any alphanumeric, that is, an alpha or a digit.
alpha | Any letter. (That's a lot more letters than you think, unless you're thinking Unicode, in which case it's still a lot.)
ascii | Any character with an ordinal value between 0 and 127.
cntrl | Any control character. Usually characters that don't produce output as such, but instead control the terminal somehow; for example, newline, form feed, and backspace are all control characters. Characters with an ord value less than 32 are most often classified as control characters.
digit | A character representing a decimal digit, such as 0 to 9. (Includes other characters under Unicode.) Equivalent to \d.
graph | Any alphanumeric or punctuation character.
lower | A lowercase letter.
print | Any alphanumeric or punctuation character or space.
punct | Any punctuation character.
space | Any space character. Includes tab, newline, form feed, and carriage return (and a lot more under Unicode.) Equivalent to \s.
upper | Any uppercase (or titlecase) letter.
word | Any identifier character, either an alnum or underline.
xdigit | Any hexadecimal digit. Though this may seem silly ([0-9a-fA-F] works just fine), it is included for completeness.
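Note how POSIX-style character classes must be written:
    42 =~ /^[[:digit:]]+$/    (correct)
    42 =~ /^[:digit:]$/       (wrong)
The correct pattern opens the class with [[ and closes it with ]]. The POSIX class itself is [:digit:]: the outer [] defines a bracketed character class, and the inner [] is part of the POSIX class name.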
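The following sketch shows why the second form is wrong: without the outer brackets, [:digit:] is an ordinary character class containing the characters ":", "d", "i", "g" and "t". Perl warns about this at compile time, so the warning is switched off around the deliberately wrong pattern.

```perl
use strict;
use warnings;

# Correct: POSIX class inside a bracketed character class.
my $right = "42" =~ /^[[:digit:]]+$/ ? 1 : 0;

my ($wrong_on_digits, $wrong_on_d);
{
    no warnings 'regexp';    # silence "POSIX syntax [: :] belongs inside character classes"
    # Wrong: this is just the set of characters : d i g t.
    $wrong_on_digits = "42" =~ /^[:digit:]$/ ? 1 : 0;
    $wrong_on_d      = "d"  =~ /^[:digit:]$/ ? 1 : 0;
}

# Several POSIX classes can share one bracketed class with other characters.
my $identifier = "var_1" =~ /^[[:alpha:][:digit:]_]+$/ ? 1 : 0;

print "right=$right wrong_on_digits=$wrong_on_digits ",
      "wrong_on_d=$wrong_on_d identifier=$identifier\n";
```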
Regex solutions to common problems
IP address:
    (((\d{1,2})|(1\d{2})|(2[0-4]\d)|(25[0-5]))\.){3}((\d{1,2})|(1\d{2})|(2[0-4]\d)|(25[0-5]))
Email address:
    (\w+\.)*\w+@(\w+\.)+[A-Za-z]+
(This email regex is not strict, but it matches the great majority of ordinary email addresses.)
HTTP URL:
    m{^http://([^/:]+)(:(\d+))?(/.*)?$}i
    https?://(\w*:\w*@)?[-\w.]+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?
C comments:
/
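The IP address pattern above can be exercised as follows. This is a sketch under a few assumptions: the octet alternatives are the ones from the pattern above, but the groups are made non-capturing, the pattern is built with qr//, and ^ and $ anchors are added so that the whole string has to be an address. The name is_ipv4 is a hypothetical helper.

```perl
use strict;
use warnings;

# One octet: 0-99, 100-199, 200-249 or 250-255.
my $octet = qr/(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])/;

# Three "octet." groups followed by a final octet, anchored at both ends.
my $ip_re = qr/^(?:$octet\.){3}$octet$/;

sub is_ipv4 {
    my ($candidate) = @_;
    return $candidate =~ $ip_re ? 1 : 0;
}

for my $candidate (qw(192.168.1.1 10.0.0.255 256.1.1.1 1.2.3)) {
    printf "%-12s %s\n", $candidate, is_ipv4($candidate) ? "valid" : "invalid";
}
```

Without the anchors the original pattern also succeeds on a string such as 256.1.1.1, because it can match the substring 56.1.1.1.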
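In Perl, classes, packages and modules are closely related: a module is just a package stored in a file of the same name (with a .pm suffix); a class is just a package; an object is a reference; a method is just a subroutine. Only the simplest usage is described here.
Using modules
The following shows how a module (Bestiary.pm) is written. It can serve as a template for writing ordinary modules.
    package Bestiary;
    require Exporter;
    our @ISA       = qw(Exporter);
    our @EXPORT    = qw(camel);      # Symbols to be exported by default
    our @EXPORT_OK = qw($weight);    # Symbols to be exported on request
    our $VERSION   = 1.00;           # Version number
    ### Include your variables and functions here
    sub camel { print "One-hump dromedary" }
    $weight = 1024;
    1;
(Quoted from Programming Perl.)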
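To show what the export lists do without creating a separate Bestiary.pm file, the sketch below defines the package inline inside a BEGIN block and calls import by hand. That layout is an assumption made for the demonstration; with a real file on disk the two BEGIN blocks collapse into use Bestiary qw(camel $weight);.

```perl
use strict;
use warnings;

BEGIN {
    package Bestiary;
    require Exporter;
    our @ISA       = qw(Exporter);
    our @EXPORT    = qw(camel);      # exported by default
    our @EXPORT_OK = qw($weight);    # exported only on request
    our $weight    = 1024;
    sub camel { print "One-hump dromedary\n" }
}

# Equivalent to the import half of: use Bestiary qw(camel $weight);
BEGIN { Bestiary->import(qw(camel $weight)) }

camel();                         # imported subroutine, callable unqualified
print "weight = $weight\n";      # imported variable, allowed under strict
```

Symbols listed in @EXPORT_OK, such as $weight here, arrive in the caller's namespace only when they are named in the import list.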
Using objects
The following example builds an Ipregion object whose get_area_isp_id method looks up the region and ISP of an IP address. It can serve as a template for writing ordinary classes.
    package Ipregion;
    use strict;
    my ($DEFAULT_AREA_ID, $DEFAULT_ISP_ID) = (999999, 9);
    my ($START_IP, $END_IP, $AREA_ID, $ISP_ID) = (0 .. 3);
    sub new {
        my $invocant       = shift;
        my $ip_region_file = shift;
        my $class = ref($invocant) || $invocant;
        my $self  = [ ];    # $self is a reference to an array of arrays
        # Read the IP region data from the file
        open my $fh_ip_region, '<', $ip_region_file
            or die "Cannot open $ip_region_file to load ip region data $!";
        my $i = 0;
        while (<$fh_ip_region>) {
            chomp;
            my ($start_ip, $end_ip, $area_id, $isp_id) = split;
            $self->[$i++] = [ $start_ip, $end_ip, $area_id, $isp_id ];
        }
        close $fh_ip_region;
        bless($self, $class);
        return $self;
    }
    sub get_area_isp_id {
        my $self = shift;
        my $ip   = shift;
        my $area_id = $DEFAULT_AREA_ID;
        my $isp_id  = $DEFAULT_ISP_ID;
        # Check whether an IP address is in the table using binary search.
        my $left  = 0;
        my $right = @$self - 1;    # Get max array index
        my $middle;
        while ($left <= $right) {
            $middle = int( ($left + $right) / 2 );
            if ( ($self->[$middle][$START_IP] <= $ip) && ($ip <= $self->[$middle][$END_IP]) ) {
                $area_id = $self->[$middle][$AREA_ID];
                $isp_id  = $self->[$middle][$ISP_ID];
                last;
            }
            elsif ($ip < $self->[$middle][$START_IP]) {
                $right = $middle - 1;
            }
            else {
                $left = $middle + 1;
            }
        }
        return ($area_id, $isp_id);
    }
    1;    # a module file must return a true value
The object is used like this:
    use Ipregion;
    my $ip_region = Ipregion->new("new_ip_region.dat");
    my @search_result = $ip_region->get_area_isp_id(974173694);
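The lookup above takes the IP address as a 32-bit integer (974173694 in the example) and compares it numerically with the range boundaries. Assuming the data file stores its ranges in the same form, a dotted-quad address has to be converted first. The helpers ip_to_int and int_to_ip below are hypothetical and not part of the Ipregion class; they are a minimal sketch built on pack and unpack.

```perl
use strict;
use warnings;

# Pack the four octets as unsigned bytes, then read them back as one
# 32-bit big-endian integer.
sub ip_to_int {
    my ($dotted) = @_;
    my ($number) = unpack "N", pack "C4", split /\./, $dotted;
    return $number;
}

# The reverse conversion, useful when printing results.
sub int_to_ip {
    my ($number) = @_;
    return join ".", unpack "C4", pack "N", $number;
}

print ip_to_int("58.16.181.254"), "\n";
print int_to_ip(974173694), "\n";
```

With these helpers a call would look like $ip_region->get_area_isp_id(ip_to_int("58.16.181.254")).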
Perl special variables
Variable (name) | Meaning
$a | Used by the sort function to hold the first value to be compared
$b | Used by the sort function to hold the second value to be compared
$_ ($ARG) | The default input and pattern-search space
@_ (@ARG) | Holds the arguments passed to a subroutine
ARGV | The special filehandle that iterates over command-line filenames in @ARGV
$ARGV | Contains the name of the current file when reading from the ARGV filehandle
@ARGV | The array containing the command-line arguments intended for the script
$^T ($BASETIME) | The time at which the script began running, in seconds since the epoch
$? ($CHILD_ERROR) | The status returned by the last pipe close, backtick (``) command, or wait, waitpid, or system function
DATA | This special filehandle refers to anything following the __END__ or the __DATA__ token in the current file
$) ($EGID, $EFFECTIVE_GROUP_ID) | The effective GID of this process
$> ($EUID, $EFFECTIVE_USER_ID) | The effective UID of this process as returned by the geteuid(2) syscall
%ENV | The hash containing your current environment variables
$@ ($EVAL_ERROR) | The currently raised exception or the Perl syntax error message from the last eval operation
@EXPORT | Used by the import method of the Exporter module
@EXPORT_OK | Used by the import method of the Exporter module
%EXPORT_TAGS | Used by the import method of the Exporter module
%INC | The hash containing entries for the filename of each Perl file loaded via do FILE, require or use
@INC | The array containing the list of directories where Perl modules may be found by do FILE, require or use
$. ($NR, $INPUT_LINE_NUMBER) | The current record number (usually line number) for the last filehandle you read from
$/ ($RS, $INPUT_RECORD_SEPARATOR) | The input record separator, newline by default, which is consulted by the readline function, the <FH> operator, and the chomp function. $/ = "" treats blank lines as the record separator (paragraph mode), which is not the same as "\n\n"; undef $/ makes the next read return the whole rest of the file; $/ = \$number reads $number bytes at a time
@ISA | This array contains names of other packages to look through when a method call cannot be found in the current package
@+ @- $` $' $& $1 $2 $3 | Match-related variables
$^ $~ $| | Filehandle-related variables
$" ($LIST_SEPARATOR) | When an array or slice is interpolated into a double-quoted string, this variable specifies the string to put between individual elements. Default is space
$^O ($OSNAME) | The name of the platform
$! ($ERRNO, $OS_ERROR) | In numeric context, the current errno value left by the most recent failed system call; in string context, the corresponding system error message
$, ($OFS, $OUTPUT_FIELD_SEPARATOR) | The field separator for print (empty by default)
$\ ($ORS, $OUTPUT_RECORD_SEPARATOR) | The record separator for print (empty by default; setting it to "\n" is often a good choice)
$$ ($PID) | The process number
$0 ($PROGRAM_NAME) | The program name
$( ($GID, $REAL_GROUP_ID) | The real GID of this process
$< ($UID, $REAL_USER_ID) | The real UID of this process
%SIG | The hash used to set signal handlers for various signals
STDERR | The standard error filehandle
STDIN | The standard input filehandle
STDOUT | The standard output filehandle
$; ($SUBSEP, $SUBSCRIPT_SEPARATOR) | The subscript separator for multidimensional hash emulation: $foo{$a,$b,$c} is the same as $foo{join($;, $a, $b, $c)}
Note: to use the long variable names, you must say use English;
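Several of these variables change how ordinary operations behave, so they are usually set with local inside a small block. The sketch below reads from an in-memory string rather than a real file; the sample data is invented.

```perl
use strict;
use warnings;

my @fields = qw(a b c);

# $" is placed between array elements interpolated into a string.
my $spaced = "@fields";
my $dashed;
{
    local $" = "-";
    $dashed = "@fields";
}

my $data = "p1 line1\np1 line2\n\n\np2 line1\n";

# $/ = "" switches to paragraph mode: records end at runs of blank lines.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @paragraphs;
{
    local $/ = "";
    @paragraphs = <$fh>;
}
close $fh;

# An undefined $/ slurps everything that is left in one read.
open $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $whole = do { local $/; <$fh> };
close $fh;

print "spaced=$spaced dashed=$dashed paragraphs=", scalar(@paragraphs),
      " bytes=", length($whole), "\n";
```

Because local restores the previous value when the block ends, the rest of the program keeps the default separators.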
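Program documentation (POD)
POD (Plain Old Documentation) is a simple, easy-to-use markup language for writing documentation in Perl programs and modules. POD has three kinds of paragraphs: ordinary paragraphs, verbatim paragraphs and command paragraphs. Telling them apart is easy: a paragraph that begins with a directive such as =pod, =head1, =cut or =over is a command paragraph; a paragraph that begins with indentation (a space or a tab) is a verbatim paragraph; everything else is an ordinary paragraph. POD also has its own formatting codes for bold, italics, hyperlinks and so on.
POD makes it easy to document Perl code, because the documentation lives alongside the program source. POD can be rendered with the following translators: pod2text, pod2man (pod2man File.pm | nroff -man | more), pod2html and pod2latex.
It is generally recommended that the POD for a piece of source code include the following sections: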
=head1 NAME
The name of your program or module.
=head1 SYNOPSIS
A one-line description of what your program or module does (purportedly).
=head1 DESCRIPTION
The bulk of your documentation. (Bulk is good in this context.)
=head1 AUTHOR
Who you are. (Or an alias, if you are ashamed of your program.)
=head1 BUGS
What you did wrong (and why it wasn't really your fault).
=head1 SEE ALSO
Where people can find related information (so they can work around your bugs).
=head1 COPYRIGHT
The copyright statement. If you wish to assert an explicit copyright, you should say something like:
Copyright 2013, Randy Waterhouse. All Rights Reserved.
Many modules also add:
This program is free software. You may copy or
redistribute it under the same terms as Perl itself.
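Programming style
To make programs easy to read, understand and maintain, the following style is recommended (all of these suggestions come from Larry Wall's perlstyle document, and many of them apply to other languages as well):
- The closing brace of a multi-line BLOCK should line up with the keyword that started the construct.
- 4-column indent.
- Opening brace on the same line as the keyword if possible, otherwise line it up with the keyword.
- Space before the opening brace of a multi-line BLOCK.
- A one-line BLOCK may be put on one line, including the braces.
- No space before the semicolon.
- Semicolon omitted in a short one-line BLOCK.
- Space around most operators.
- Space around a complex subscript (inside the brackets).
- Blank lines between chunks of code that do different things.
- Uncuddled elses.
- No space between a function name and its opening parenthesis.
- Space after each comma.
- Long lines broken after an operator (except and and or).
- Space after the last parenthesis matching on the current line.
- Line up corresponding items vertically.
- Omit redundant punctuation as long as clarity does not suffer.
- Just because you can do something a particular way does not mean that you should. Perl is designed to give you several ways to do anything, so pick the most readable one. For example,
    open(FOO, $foo) || die "Can't open $foo: $!";
  is better than
    die "Can't open $foo: $!" unless open(FOO, $foo);
- Do not be afraid to use loop labels. They improve readability and allow multilevel loop exits.
- Choose mnemonic identifiers. If you cannot remember what the mnemonic means, you have a problem.
- Use underscores to separate words in long identifiers.
- Use /x with complex regular expressions.
- Use here documents instead of repeated print() statements.
- Line up corresponding things vertically, especially when they are too long to fit on one line.
- Always check the return values of system calls. Good error messages should go to standard error and should say which program caused the problem.
- Line up your transliterations:
    tr [abc]
       [xyz];
- Think about reusability: generalize your code, and consider writing a module or a class.
- Document your code with POD.
- Be consistent.
- Be nice.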
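References
[1] Larry Wall, Tom Christiansen and Jon Orwant. Programming Perl, Third Edition (the Camel Book).
[2] Randal L. Schwartz, Tom Phoenix and brian d foy. Learning Perl (the Llama Book).
[3] Damian Conway. Perl Best Practices (reprint edition). O'Reilly / Southeast University Press, April 2006.
[4] Ben Forta, translated by Yang Tao et al. 正则表达式必知必会 (Chinese edition of Sams Teach Yourself Regular Expressions in 10 Minutes). Beijing: Posts & Telecom Press, December 2007.
[5] Jeffrey E. F. Friedl, translated by Yu Sheng. 精通正则表达式 (Chinese edition of Mastering Regular Expressions, 3rd edition). Beijing: Publishing House of Electronics Industry, July 2007.
[6] The perldoc main page.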