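Chapter 17 Specialized Library Facilities

17.1 The tuple Type

A tuple is a template similar to pair. Each pair type can have different member types, but every pair has exactly two members. A tuple type can have any number of members. (Defined in the tuple header.)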

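tuple<T1,T2,...,Tn> t;	t is a tuple with n members; the ith member has type Ti. All members are value initialized.
tuple<T1,T2,...,Tn> t(v1,v2,...,vn);	t is a tuple with member types T1,T2,...,Tn; each member is initialized from the corresponding initializer vi. This constructor is explicit.
make_tuple(v1,v2,...,vn);	Returns a tuple initialized from the given initializers. The type of the tuple is inferred from the types of the initializers.
t1 == t2 t1 != t2	Two tuples are equal if they have the same number of members and each pair of corresponding members is equal. These operations use each member's == operator. Once a pair of members is found to be unequal, the remaining members are not compared.
t1 relop t2	Relational operations on tuples use dictionary (lexicographic) ordering. The two tuples must have the same number of members. Members of t1 are compared with the corresponding members of t2 using the < operator.
get<i>(t)	Returns a reference to the ith member of t; if t is an lvalue, the result is an lvalue reference; otherwise, it is an rvalue reference. All members of a tuple are public.
tuple_size<tupleType>::value	A class template that can be instantiated by a tuple type. It has a public constexpr static data member named value, of type size_t, that is the number of members in the given tuple type.
tuple_element<i,tupleType>::type	A class template that can be instantiated by an integral constant and a tuple type. It has a public member named type that is the type of the specified member in the given tuple type.

To compare two tuples, they must have the same number of members, and each pair of corresponding members must be comparable with the operator in use (== for the equality operators, < for the relational operators).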
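
As a small illustration of the lexicographic comparison described above, the sketch below compares two version triples. The alias Version, the helper version_less, and the (major, minor, tag) layout are made up for this example.

```cpp
#include <string>
#include <tuple>

// Hypothetical version triple: (major, minor, tag).
using Version = std::tuple<int, int, std::string>;

// Tuples compare member by member from left to right, so the first
// unequal pair of members decides the result.
bool version_less(const Version& a, const Version& b) {
    return a < b;
}
```

With this helper, (1, 2, "b") sorts before (1, 3, "a") because the second members already differ; the third members are never examined.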

Accessing members

tuple<int, string, double> item = make_tuple(8, "success", 3.1415926);
cout << get<2>(item) << endl;
// number of members in the tuple
typedef decltype(item) trans;
size_t sz = tuple_size<trans>::value;
// type of a member
tuple_element<1, trans>::type second_value = get<1>(item);

17.1.2 Using a tuple to Return Multiple Values

A common use of tuple is to return multiple values from a function.

typedef tuple<int, string, double> result;
result Get_result()
{
    return  make_tuple(8, string("success"), 3.1415926);
}
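
On the calling side, one way to consume such a return value is std::tie, which unpacks the members into existing variables. The sketch below is only an illustration: make_result mirrors Get_result above, and status_of_result is a hypothetical caller.

```cpp
#include <string>
#include <tuple>

using Result = std::tuple<int, std::string, double>;

// Hypothetical function that mirrors Get_result above.
Result make_result() {
    return std::make_tuple(8, std::string("success"), 3.1415926);
}

// Unpack the returned tuple into separate variables with std::tie;
// std::ignore skips the member we do not need.
std::string status_of_result() {
    int code = 0;
    std::string status;
    std::tie(code, status, std::ignore) = make_result();
    return status + ":" + std::to_string(code);
}
```

Calling get<i> on the returned tuple works equally well; std::tie is mainly convenient when several members are needed at once.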

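17.2 The bitset Type

A bitset treats data as a collection of binary bits. It can hold any number of bits, fixed at compile time. It is defined in the bitset header.

bitset<n> b;	b has n bits, each of which is 0
bitset<n> b(u);	b is a copy of the n low-order bits of the unsigned long long value u; if n is larger than the size of unsigned long long, the remaining high-order bits are 0
bitset<n> b(s);	b is a copy of the bit string held in the string s
bitset<n> b(s, pos, m);	b is a copy of the m characters of s starting at position pos
bitset<n> b(s, pos, m, zero, one);	Same as above; zero and one are the characters in s that represent 0 and 1 (by default '0' and '1')
bitset<n> b(cp, m, zero, one);	Same as above, but copies up to m characters from the character array that cp points to (this constructor takes no pos argument); if m is omitted, cp must point to a null-terminated string

When a bitset is initialized from a numeric value, the value is first converted to unsigned long long and then used as the bit pattern.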

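// the double 5.1 is converted to the unsigned value 5, which is 101 in binary
bitset<6> bt = 5.1;
// prints 000101
cout << bt << endl;

When a bitset is initialized from a string or a character array, the characters 0 and 1 are treated as binary digits.

// only the first 6 characters are used
bitset<6> bt("11100101");
// prints 111001
cout << bt << endl;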
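
The (s, pos, m) form of the constructor can be sketched in the same spirit. The helper name bits_from_substring and the fixed width of 8 bits are assumptions made for this example only.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// Hypothetical helper: build an 8-bit bitset from the 4 characters of s
// that start at position pos. The characters fill the low-order bits;
// the remaining high-order bits are 0.
std::bitset<8> bits_from_substring(const std::string& s, std::size_t pos) {
    return std::bitset<8>(s, pos, 4);
}
```

For the string "1111010" and pos 3, the four characters used are "1010", so the result holds the value 10. The constructor throws if pos is past the end of s or if it meets a character other than 0 or 1.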

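17.2.2 Operations on bitsets

b.any()	Is any bit in b on?
b.all()	Are all the bits in b on?
b.none()	Are no bits in b on?
b.count()	Number of bits in b that are on
b.size()	Number of bits in b
b.test(pos)	Is the bit at position pos in b on?
b.set(pos)	Turns on the bit at position pos in b
b.set()	Turns on all the bits in b
b.reset(pos)	Turns off the bit at position pos in b
b.reset()	Turns off all the bits in b
b.flip(pos)	Inverts the bit at position pos in b
b.flip()	Inverts every bit in b
b[pos]	Accesses the bit at position pos in b; if b is const, the result is a bool
b.to_ulong()	Returns an unsigned long with the same bits as b
b.to_ullong()	Returns an unsigned long long with the same bits as b
b.to_string(zero, one)	Returns a string representing the bit pattern in b; zero and one default to '0' and '1'
os << b	Prints the bits in b as the characters 0 and 1 to the stream os
is >> b	Reads characters from the stream is into b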
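
A few of these operations combined, as a minimal sketch. The scenario (tracking which of 8 students passed a quiz) and the name quiz_summary are invented for illustration.

```cpp
#include <bitset>
#include <string>

// Hypothetical helper: record which of 8 students passed a quiz,
// then summarise the result as "<bits> count=<number of 1 bits>".
std::string quiz_summary() {
    std::bitset<8> passed;   // all bits start out as 0
    passed.set(1);           // student 1 passed
    passed.set(5);           // student 5 passed
    passed.flip(0);          // bit 0 goes from 0 to 1
    passed.reset(5);         // student 5 turns out to have failed
    // passed now holds 00000011
    return passed.to_string() + " count=" + std::to_string(passed.count());
}
```

Bit 0 is the low-order bit, so it appears last in the string that to_string returns.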

17.3 Regular Expressions

A regular expression describes a sequence of characters. The RE library is defined in the regex header, and its components are listed below.

regex	Class that represents a regular expression
regex_match	Matches a sequence of characters against a regular expression
regex_search	Finds the first subsequence that matches the regular expression
regex_replace	Replaces a regular expression using a given format
sregex_iterator	Iterator adaptor that calls regex_search to iterate through the matches in a string
smatch	Container class that holds the results of searching a string
ssub_match	Results for a matched subexpression in a string

17.3.1 Using Regular Expressions to Match Substrings

// use regex_search to get the first matching substring
string pattern("\\d+");
regex r(pattern);
smatch results;
string test_str = "123 456 789";
if (regex_search(test_str, results, r))
    cout << results.str() << endl;
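
The difference between regex_match and regex_search is easy to overlook, so here is a minimal sketch of both. The helper names is_all_digits and first_number are hypothetical.

```cpp
#include <regex>
#include <string>

// regex_match succeeds only if the whole input matches the pattern.
bool is_all_digits(const std::string& s) {
    static const std::regex digits("\\d+");   // compiled once, then reused
    return std::regex_match(s, digits);
}

// regex_search succeeds if any substring matches. Return the first
// match, or an empty string when there is none.
std::string first_number(const std::string& s) {
    static const std::regex digits("\\d+");
    std::smatch m;
    return std::regex_search(s, m, digits) ? m.str() : std::string();
}
```

For the input "123 456", is_all_digits returns false because of the space, while first_number returns "123".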

Specifying options for a regex object

When we define a regex or call assign on a regex to give it a new value, we can specify one or more flags that affect how the regex operates. By default, the flag is ECMAScript, which is the regular expression language that many Web browsers use.

regex options

regex r(re) regex r(re,f)	Define a regex from re with the flags f. f defaults to ECMAScript
r1 = re	Replace the regular expression in r1 with re
r.assign(re,f)	Same effect as using the assignment operator (=)
r.mark_count()	Number of subexpressions in r (e.g. 1 for "abc(\d+)")
r.flags()	Returns the flags set for r

Flags specified when a regex is defined

icase	Ignore case
nosubs	Don't store subexpression matches
optimize	Favor speed of execution over speed of construction
ECMAScript	Use the grammar specified by ECMA-262
basic	Use POSIX basic regular-expression grammar
extended	Use POSIX extended regular-expression grammar
awk, grep, egrep	Use the grammar of the POSIX awk, grep, or egrep language

For example:

string pattern("\\d+");
regex r(pattern,regex::icase);

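Regular expressions are compiled at run time, so a malformed pattern produces a run-time error. The resulting exception can be caught as follows:

try
{
    // error: missing close bracket after alnum; the constructor will throw
    regex r("[[:alnum:]+\\.(cpp|cxx|cc)$", regex::icase);
}
catch (const regex_error& e) // catch by const reference to avoid copying the exception
{
    cout << e.what() << "\ncode: " << e.code() << endl;
}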
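
When patterns come from user input or configuration, it can be handy to wrap that try block in a predicate. This is only a sketch; is_valid_pattern is a hypothetical helper, and it assumes the default ECMAScript grammar.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: report whether a pattern compiles instead of
// letting regex_error propagate to the caller.
bool is_valid_pattern(const std::string& pattern) {
    try {
        std::regex r(pattern);   // compilation happens here, at run time
        return true;
    } catch (const std::regex_error&) {
        return false;            // e.g. mismatched parentheses or brackets
    }
}
```

A pattern such as "(abc", with an unclosed parenthesis, is rejected, while "\\d+" is accepted.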

Regular expression error conditions

flag	error
error_collate	The expression contained an invalid collating element name.
error_ctype	The expression contained an invalid character class name.
error_escape	The expression contained an invalid escaped character, or a trailing escape.
error_backref	The expression contained an invalid back reference.
error_brack	The expression contained mismatched brackets ([ and ]).
error_paren	The expression contained mismatched parentheses (( and )).
error_brace	The expression contained mismatched braces ({ and }).
error_badbrace	The expression contained an invalid range between braces ({ and }).
error_range	The expression contained an invalid character range.
error_space	There was insufficient memory to convert the expression into a finite state machine.
error_badrepeat	The expression contained a repeat specifier (one of *?+{) that was not preceded by a valid regular expression.
error_complexity	The complexity of an attempted match against a regular expression exceeded a pre-set level.
error_stack	There was insufficient memory to determine whether the regular expression could match the specified character sequence.

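Because a regular expression is compiled at run time and compilation is slow, avoid constructing regex objects inside loops; build each regex once and reuse it.

The input sequence can have several different types, and each type has its own set of library classes:

Regular expression library classes

string	regex, smatch, ssub_match, sregex_iterator
const char*	regex, cmatch, csub_match, cregex_iterator
wstring	wregex, wsmatch, wssub_match, wsregex_iterator
const wchar_t*	wregex, wcmatch, wcsub_match, wcregex_iterator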

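17.3.2 The Match and Regex Iterator Types

sregex_iterator operations

sregex_iterator it(b, e, r);	it is an sregex_iterator that iterates through the string denoted by the iterators b and e. It calls regex_search(b, e, m, r) to position it on the first match in the input.
sregex_iterator end;	The off-the-end iterator for sregex_iterator.
*it it->	Returns a reference to the smatch object, or a pointer to it, from the most recent call to regex_search.
++it it++	Calls regex_search on the input sequence, starting just after the current match. The prefix version returns a reference to the incremented iterator; the postfix version returns the old value.
it1 == it2 it1 != it2	Two sregex_iterators are equal if both are the off-the-end iterator. Two non-end iterators are equal if they were constructed from the same input sequence and regex object.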

Using a regex iterator

// match one or more digits
string pattern("\\d+");
// we want the whole word in which our pattern appears
pattern = "[[:alpha:]]*" + pattern + "[[:alpha:]]*";
regex r(pattern, regex::icase); // we'll ignore case in doing the match
string file = "123 456 789";
// it will repeatedly call regex_search to find all matches in file
for (sregex_iterator it(file.begin(), file.end(), r), end_it;
     it != end_it; ++it)
    cout << it->str() << endl; // matched word

The smatch object that an sregex_iterator refers to holds detailed information about the match. It supports the following operations:

smatch operations

m.ready()	True if m has been set by a call to regex_search or regex_match, false otherwise. Operations on m are undefined if ready returns false
m.size()	Zero if the match failed; otherwise, one plus the number of subexpressions in the most recently matched regular expression
m.empty()	True if m.size() is zero
m.prefix()	An ssub_match representing the sequence before the match
m.suffix()	An ssub_match representing the part after the end of the match

Operations on the subexpressions of an smatch (index 0 represents the overall match; n defaults to 0)

m.length(n)	Size of the nth matched subexpression
m.position(n)	Distance of the nth subexpression from the start of the sequence
m.str(n)	The matched string for the nth subexpression
m[n]	The ssub_match object corresponding to the nth subexpression
m.begin(), m.end() m.cbegin(), m.cend()	Iterators across the ssub_match elements in m
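
To make prefix, suffix, and position concrete, the sketch below reports the text around the first run of digits in a string. The helper name describe_first_number and its output format are invented for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: describe the first run of digits in s as
// "prefix|match|suffix@position", or return "" when nothing matches.
std::string describe_first_number(const std::string& s) {
    static const std::regex digits("\\d+");   // compiled once, then reused
    std::smatch m;
    if (!std::regex_search(s, m, digits))
        return "";
    return m.prefix().str() + "|" + m.str() + "|" + m.suffix().str()
           + "@" + std::to_string(m.position(0));
}
```

For the input "id=42;" the match is "42", the prefix is "id=", the suffix is ";", and the match starts at offset 3.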

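17.3.3 Using Subexpressions

An ssub_match, the result for a single subexpression, supports the following operations:

Submatch operations

matched	A public bool data member that indicates whether this ssub_match was matched
first, second	Public data members that are iterators to the start and one past the end of the matched sequence; if there was no match, first and second are equal
length()	The size of this match; 0 if matched is false
str()	A string containing the matched portion of the input; the empty string if matched is false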

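// assumes r contains at least two subexpressions
for (sregex_iterator it(file.begin(), file.end(), r), end_it;
     it != end_it; ++it)
{
    cout << it->str() << endl;          // whole match
    cout << it->length(1) << endl;      // length of the first subexpression
    cout << it->position(2) << endl;    // offset of the second subexpression
    cout << (*it)[2].matched << endl;   // whether the second subexpression matched
    cout << (*it)[1].length() << endl;  // same length, via the ssub_match itself
}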
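
A self-contained sketch of subexpressions in use: the pattern below, adapted from the file-name pattern shown earlier, captures the stem and the extension separately. The helper name split_filename is hypothetical.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: split a C++ source file name into (stem, extension).
// Returns two empty strings when the name does not match.
std::pair<std::string, std::string> split_filename(const std::string& name) {
    // subexpression 1: the stem; subexpression 2: the extension
    static const std::regex r("([[:alnum:]]+)\\.(cpp|cxx|cc)$",
                              std::regex::icase);
    std::smatch m;
    if (!std::regex_search(name, m, r))
        return {};
    return {m.str(1), m.str(2)};
}
```

Because of icase, "main.CPP" matches and yields ("main", "CPP"); the captured text keeps its original case.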

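17.3.4 Using regex_replace

To replace the parts of a string that match a regular expression, use the following operations.

Regular expression replace operations

m.format(dest, fmt, mft) m.format(fmt, mft)	Produces formatted output using the format string fmt, the match in m, and the optional flags in mft, which default to format_default. The first version writes to the output iterator dest; the second returns a string holding the output. In fmt, $n refers to the nth subexpression, e.g. "$2-$5-$7".
regex_replace(dest, seq, r, fmt, mft) regex_replace(seq, r, fmt, mft)	Iterates through seq, using regex_search to find successive matches to the regex r, and formats each match with fmt and the optional flags in mft, as above. The first version writes to the output iterator dest and takes a pair of iterators to denote seq; the second returns a string holding the output, and seq can be a string or a pointer to a null-terminated character array.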

Example

// seven alternating groups of letters and digits
string pattern("([[:alpha:]]+)(\\d+)([[:alpha:]]+)(\\d+)([[:alpha:]]+)(\\d+)([[:alpha:]]+)");
regex r(pattern, regex::icase); // we'll ignore case in doing the match
string file = "ffffff123aaa456bbb789ccc";
cout << regex_replace(file, r, "$1-$2-$3") << endl;

smatch results;
regex_search(file, results, r);
cout << results.format("$1-$2-$3")<< endl;

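Flags that control matching and formatting (mft)

These flags are defined in the std::regex_constants namespace, so a using declaration is needed:

using std::regex_constants::format_no_copy;

or

using namespace std::regex_constants;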

Match flags

flag	effect	notes
match_default	Default	Default matching behavior. This constant has a value of zero.
match_not_bol	Not Beginning-Of-Line	The first character is not considered a beginning of line ("^" does not match).
match_not_eol	Not End-Of-Line	The last character is not considered an end of line ("$" does not match).
match_not_bow	Not Beginning-Of-Word	The escape sequence "\b" does not match as a beginning-of-word.
match_not_eow	Not End-Of-Word	The escape sequence "\b" does not match as an end-of-word.
match_any	Any match	Any match is acceptable if more than one match is possible.
match_not_null	Not null	Empty sequences do not match.
match_continuous	Continuous	The expression must match a sub-sequence that begins at the first character.
match_prev_avail	Previous Available	One or more characters exist before the first one (match_not_bol and match_not_bow are ignored).
format_default	Default	The replacement string uses ECMAScript rules. This constant has a value of zero.
format_sed	sed rules	The replacement string uses POSIX sed rules.
format_no_copy	No copy	The unmatched parts of the input are not copied to the output.
format_first_only	First only	Only the first occurrence is replaced.
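
The effect of the format flags is easiest to see side by side. The three helpers below are hypothetical names for a minimal sketch; each applies the same pattern, a run of digits, with a different flag.

```cpp
#include <regex>
#include <string>

// Replace every run of digits with '#' (default flags).
std::string mask_all(const std::string& s) {
    static const std::regex digits("\\d+");
    return std::regex_replace(s, digits, "#");
}

// Replace only the first run of digits.
std::string mask_first(const std::string& s) {
    static const std::regex digits("\\d+");
    return std::regex_replace(s, digits, "#",
                              std::regex_constants::format_first_only);
}

// Keep only the matches: the unmatched text is not copied to the output.
// In the format string, $& stands for the whole match.
std::string only_numbers(const std::string& s) {
    static const std::regex digits("\\d+");
    return std::regex_replace(s, digits, "[$&]",
                              std::regex_constants::format_no_copy);
}
```

For the input "a1b22c333" these return "a#b#c#", "a#b22c333", and "[1][22][333]" respectively.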

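17.4 Random Numbers

Under the earlier standard, programs relied on the C library function rand, which produces uniformly distributed pseudorandom integers in the range from 0 to a system-dependent maximum value that is at least 32767.

The random-number library, defined in the random header, addresses the limitations of rand (such as needing a different range, or random floating-point numbers) through a set of cooperating classes: random-number engines and random-number distributions.

A random-number engine generates a sequence of random unsigned integers.

A random-number distribution uses an engine to return random numbers that follow a specified distribution.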

17.4.1 Random-Number Engines and Distributions

Reproduced from http://www.cplusplus.com/reference/random/

Generator

Pseudo-random number engines (templates)

Generators that use an algorithm to generate pseudo-random numbers based on an initial seed:

linear_congruential_engine	Linear congruential random number engine (class template)
mersenne_twister_engine	Mersenne twister random number engine (class template)
subtract_with_carry_engine	Subtract-with-carry random number engine (class template)

Engine adaptors

They adapt an engine, modifying the way numbers are generated with it:

discard_block_engine	Discard-block random number engine adaptor (class template)
independent_bits_engine	Independent-bits random number engine adaptor (class template)
shuffle_order_engine	Shuffle-order random number engine adaptor (class template)

Pseudo-random number engines (instantiations)

Particular instantiations of generator engines and adaptors:

default_random_engine	Default random engine (class)
minstd_rand	Minimal Standard minstd_rand generator (class)
minstd_rand0	Minimal Standard minstd_rand0 generator (class)
mt19937	Mersenne Twister 19937 generator (class)
mt19937_64	Mersenne Twister 19937 generator (64 bit) (class)
ranlux24_base	Ranlux 24 base generator (class)
ranlux48_base	Ranlux 48 base generator (class)
ranlux24	Ranlux 24 generator (class)
ranlux48	Ranlux 48 generator (class)
knuth_b	Knuth-B generator (class)

Random number generators

Non-deterministic random number generator:

random_device	True random number generator (class)

Distributions

Uniform:

uniform_int_distribution	Uniform discrete distribution (class template)
uniform_real_distribution	Uniform real distribution (class template)

Related to Bernoulli (yes/no) trials:

bernoulli_distribution	Bernoulli distribution (class)
binomial_distribution	Binomial distribution (class template)
geometric_distribution	Geometric distribution (class template)
negative_binomial_distribution	Negative binomial distribution (class template)

Rate-based distributions:

poisson_distribution	Poisson distribution (class template)
exponential_distribution	Exponential distribution (class template)
gamma_distribution	Gamma distribution (class template)
weibull_distribution	Weibull distribution (class template)
extreme_value_distribution	Extreme Value distribution (class template)

Related to Normal distribution:

normal_distribution	Normal distribution (class template)
lognormal_distribution	Lognormal distribution (class template)
chi_squared_distribution	Chi-squared distribution (class template)
cauchy_distribution	Cauchy distribution (class template)
fisher_f_distribution	Fisher F-distribution (class template)
student_t_distribution	Student T-Distribution (class template)

Piecewise distributions:

discrete_distribution	Discrete distribution (class template)
piecewise_constant_distribution	Piecewise constant distribution (class template)
piecewise_linear_distribution	Piecewise linear distribution (class template)

Other

seed_seq	Seed sequence (class)
generate_canonical	Generate canonical numbers (function template)

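Random-number engine operations

Engine e;	Default constructor; uses the default seed for the engine type
Engine e(s);	Uses the integral value s as the seed
e.seed(s)	Resets the state of the engine using the seed s
e.min(), e.max()	The smallest and largest numbers this generator will generate
Engine::result_type	The unsigned integral type this engine generates
e.discard(u)	Advances the engine by u steps; u has type unsigned long long

Using a random-number engine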

default_random_engine e;
for (size_t i = 0; i < 10; ++i)
    cout << e() << "\t";

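Tip: an engine constructed with the same seed always produces the same sequence. For a function to return different numbers on each call, declare the engine (and its distribution) static so that their state is kept between calls. To get a different sequence on each run of the program, seed the engine, for example with the current time from <ctime>: default_random_engine e(time(0))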
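
Both points in the tip can be sketched as follows. The function names roll_die and first_value are hypothetical, and the die range of 1 to 6 is chosen only for illustration.

```cpp
#include <random>

// Keeping the engine and the distribution static preserves their state
// between calls, so successive calls continue the same sequence instead
// of restarting it.
unsigned roll_die() {
    static std::default_random_engine e;   // default seed: same sequence on every run
    static std::uniform_int_distribution<unsigned> d(1, 6);
    return d(e);
}

// A fresh engine built from the same seed always repeats the same
// first value.
unsigned long first_value(unsigned seed) {
    std::default_random_engine e(seed);
    return e();
}
```

If roll_die declared e and d as ordinary local variables, every call would return the same number, because each call would start from a freshly constructed engine.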

17.4.2 Other Kinds of Distributions

Using a distribution type: specifying the range of the random numbers

uniform_int_distribution<unsigned> u(0, 9);
default_random_engine e;
for (size_t i = 0; i < 10; ++i)
    cout << u(e) << "\t";

Using a distribution type: floating-point numbers

uniform_real_distribution<double> u(0, 6);
default_random_engine e;
for (size_t i = 0; i < 10; ++i)
    cout << u(e) << "\t";

Using a distribution type: normal distribution

// mean 4, standard deviation 1.5
normal_distribution<double> u(4,1.5);
default_random_engine e;
for (size_t i = 0; i < 10; ++i)
    cout << u(e) << "\t";

Using a distribution type: bool values

// the probability of true is 0.7
bernoulli_distribution u(0.7);
default_random_engine e;
for (size_t i = 0; i < 10; ++i)
    cout << u(e) << "\t";

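Distribution operations

Dist d;	Default constructor; makes d ready to use. Other constructors depend on the type of Dist. A distribution takes no seed
d(e)	Successive calls with the same engine e produce a sequence of random numbers according to the distribution type of d
d.min(), d.max()	The smallest and largest numbers d(e) will generate
d.reset()	Reestablishes the state of d so that subsequent uses of d do not depend on values d has already generated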
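
Putting an engine and a distribution together, a minimal sketch of drawing integers from a closed range might look like this. The helper name draw_ints and its parameters are assumptions for the example.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: n draws from [lo, hi] (both ends included), using
// an engine seeded with seed so that the result is reproducible.
std::vector<int> draw_ints(int lo, int hi, std::size_t n, unsigned seed) {
    std::default_random_engine e(seed);
    std::uniform_int_distribution<int> u(lo, hi);
    std::vector<int> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(u(e));   // the distribution maps engine output into [lo, hi]
    return out;
}
```

Note that the distribution is called with the engine object itself, u(e), not with a value produced by it, u(e()).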

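17.5 The IO Library Revisited

In addition to its condition state, each iostream object maintains a format state that controls the details of how IO is formatted, such as the notational base used for integral values, the precision of floating-point values, and the width of an output element. To modify the format state, the library defines a set of manipulators.

Manipulators are used for two broad categories of output control: controlling the presentation of numeric values, and controlling the amount and placement of padding. Most manipulators that change the format state come in set/unset pairs.

E.g.: cout<<boolalpha<<true<<endl;

Control the format of bool values

boolalpha, noboolalpha

Specify the base for integral values (by default no 0x or 0 prefix is shown)

hex, oct, dec: hexadecimal, octal, and back to decimal

Show the base prefix 0x or 0

showbase, noshowbase

Print the X/x prefix and the E/e exponent in uppercase or lowercase

uppercase, nouppercase

Control the format of floating-point values

Get the current precision: cout.precision()

Set the precision: setprecision(n), or cout.precision(n) on the IO object

Always show the decimal point: showpoint; noshowpoint shows it only when the value has a fractional part

Show + for nonnegative numbers: showpos, noshowpos

Alignment: left, right

Pad between the sign and the value: internal

Fixed-point decimal notation: fixed

Scientific notation: scientific

Hexadecimal floating point: hexfloat; reset to the default floating-point format: defaultfloat

Flush the buffer after every output operation: unitbuf, nounitbuf

Input operators skip whitespace: skipws, noskipws

Flush the buffer: flush

Insert a null character: ends

Insert a newline, then flush: endl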
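
Several of these manipulators combined, as a minimal sketch that writes to a string stream so the result can be inspected. The function name format_demo is hypothetical.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format a few values with manipulators and return
// the text that was produced.
std::string format_demo() {
    std::ostringstream out;
    out << std::boolalpha << true << ' '                       // true
        << std::showbase << std::hex << 255 << ' '             // 0xff
        << std::dec << std::noshowbase                         // back to decimal
        << std::fixed << std::setprecision(2) << 3.14159 << ' '   // 3.14
        << std::setw(5) << std::setfill('*') << std::right << 42; // ***42
    return out.str();
}
```

Most manipulators stay in effect until they are changed again, which is why dec and noshowbase are needed after printing 255. setw is the exception: it applies only to the next output item.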

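17.5.2 Unformatted Input/Output Operations

Unformatted IO operations let us deal with a stream as a sequence of uninterpreted bytes.

Single-byte operations

is.get(ch)	Put the next byte from is in ch; return is
os.put(ch)	Put the character ch onto os; return os
is.get()	Return the next byte from is as an int
is.putback(ch)	Put the character ch back on is; return is
is.unget()	Move is back one byte; return is
is.peek()	Return the next byte as an int, but don't remove it

Multi-byte operations

is.get(sink, size, delim)	Reads up to size bytes from is into the character array sink. Reading stops at delim, at end-of-file, or after size bytes; delim is left on the stream and not read
is.getline(sink, size, delim)	Same as above, but reads and discards delim
is.read(sink, size)	Reads up to size bytes into the character array sink; returns is
is.gcount()	Returns the number of bytes read from is by the last unformatted read operation
os.write(source, size)	Writes size bytes from the character array source to os; returns os
is.ignore(size, delim)	Reads and ignores at most size characters, up to and including delim; by default size is 1 and delim is end-of-file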
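
A minimal sketch that exercises several of these operations on a string stream. The function name demo_unformatted, the key=value input shape, and the 16-byte buffer are assumptions made for the example.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: read the key in front of '=' and then the second
// character of the value, using only unformatted operations.
std::string demo_unformatted(const std::string& text) {
    std::istringstream in(text);
    char key[16];
    in.getline(key, sizeof key, '=');   // reads the key and discards '='
    std::streamsize n = in.gcount();    // bytes extracted, delimiter included
    int next = in.peek();               // look at the next byte without consuming it
    in.ignore(1);                       // now skip that byte
    char ch = '\0';
    in.get(ch);                         // read the byte after it
    return std::string(key) + ":" + std::to_string(n) + ":"
           + static_cast<char>(next) + ch;
}
```

For the input "name=Bob", getline stores "name", gcount reports 5 (four characters plus the discarded delimiter), peek sees 'B', and get reads 'o'. gcount must be called before any other unformatted operation, because peek, unget, and putback reset it.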

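17.5.3 Random Access to a Stream

The library defines seek and tell functions, but they are not meaningful for every stream: whether they work depends on the device to which the stream is bound. For example, the cin, cout, cerr, and clog streams do not support random access.

seek and tell functions (g means get, the read position; p means put, the write position)

tellg() tellp()	Return the current position of the marker in an input stream (tellg) or an output stream (tellp)
seekg(pos) seekp(pos)	Reposition the marker to the given absolute address; pos is usually a value returned by a previous call to tellg or tellp
seekg(off, from) seekp(off, from)	Reposition the marker off characters ahead of or behind from. from can be beg (beginning of the stream), cur (current position), or end (end of the stream); off can be positive or negative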
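
String streams support random access, which makes them convenient for trying these functions out. The sketch below is an illustration only; the function name reread_demo and the sample text are made up.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: read a word, rewind to a remembered position to
// read it again, then jump to an offset measured from the end.
std::string reread_demo() {
    std::stringstream s("hello world");
    auto mark = s.tellg();            // remember the current read position (the start)
    std::string first;
    s >> first;                       // reads "hello"
    s.seekg(mark);                    // absolute seek: back to the remembered position
    std::string again;
    s >> again;                       // reads "hello" a second time
    s.seekg(-5, std::ios::end);       // relative seek: 5 bytes before the end
    std::string last;
    s >> last;                        // reads "world"
    return again + "/" + last;
}
```

The first seek uses a position saved from tellg; the second uses an offset relative to end, which is the usual way to reach the tail of a stream without knowing its length.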

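Reading and writing the same file

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    // open the file for input and output, positioned at the end of the file
    fstream inOut("copyOut", fstream::ate | fstream::in | fstream::out);
    if (!inOut)
    {
        cerr << "Unable to open file!" << endl;
        return EXIT_FAILURE; // report failure
    }
    // inOut was opened in ate mode, so it starts out positioned at the end
    auto end_mark = inOut.tellg(); // remember the original end-of-file position
    // move to the beginning of the file
    inOut.seekg(0, fstream::beg);
    size_t cnt = 0;  // accumulator for the byte count
    string line;
    // for each line, append the running byte count to the end of the file
    while (inOut && inOut.tellg() != end_mark && getline(inOut, line))
    {
        cnt += line.size() + 1; // add 1 to account for the newline
        auto mark = inOut.tellg();  // remember the read position
        inOut.seekp(0, fstream::end); // set the write marker to the end
        inOut << cnt;  // write the accumulated length
        // print a separator if this is not the last line
        if (mark != end_mark) inOut << " ";
        inOut.seekg(mark);  // restore the read position
    }
    inOut.seekp(0, fstream::end);  // seek to the end
    inOut << "\n";    // write a newline at end-of-file
    return 0;
}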

Time: 2024-10-04 06:27:23