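Implementing binary_to_term in Erlang

Erlang provides two functions for converting between Erlang terms and binaries: term_to_binary encodes an Erlang term as a binary, and binary_to_term decodes such a binary back into the original term. Both are implemented in C and are fast; they can be used to serialize and deserialize data, and can serve directly as a network packet protocol. This article looks at how this binary format is organized and implements binary_to_term in Erlang.

Preface

If you have worked with Erlang, you may have had these questions, or heard others raise them:

1. Erlang integers can be arbitrarily long, far beyond the range of INT64. How are they represented?

2. Atoms should not be too long, because that affects packet size?

3. And do floats in older Erlang versions take up too much space?

Wonder no more: all of these are answered in this article.

Here is a quick demonstration of the two functions: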

1> term_to_binary(a).
<<131,100,0,1,97>>
2> term_to_binary(1).
<<131,97,1>>
3> term_to_binary({a,b,c}).
<<131,104,3,100,0,1,97,100,0,1,98,100,0,1,99>>
4> binary_to_term(v(1)).
a
5> binary_to_term(v(2)).
1
6> binary_to_term(v(3)).
{a,b,c}

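As the output shows, the binary produced by term_to_binary/1 is quite regular: the first byte is always 131, and the rest follows a fixed encoding format, which is what allows the data to be decoded. Once you know how this binary is laid out, you can decode Erlang data in any language.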
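
To make the layout concrete, here is a minimal sketch that checks the leading 131 and returns the tag byte that follows it. The module name etf_peek and the function peek/1 are hypothetical names chosen for this illustration.

```erlang
-module(etf_peek).
-export([peek/1]).

%% Hypothetical helper: return the tag byte that follows the leading 131,
%% or 'error' if the binary does not start with 131.
peek(<<131, Tag:8, _Rest/binary>>) ->
    {ok, Tag};
peek(_) ->
    error.
```

With the binaries from the session above, etf_peek:peek(<<131,97,1>>) returns {ok,97} (a small integer) and the encoded tuple {a,b,c} returns {ok,104}.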

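A first look at the binary data

The focus here is on the encoding rules for big integers, atoms, floats, and lists; for other data types, see the documentation: External Term Format

Erlang big integers

First, Erlang big integers (Erlang calls them Bignums):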


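10.18 SMALL_BIG_EXT

Bytes:  1    1  1     n
Field:  110  n  Sign  d(0) ... d(n-1)

Table 10.24:

Bignums are stored in unary form with a Sign byte that is 0 if the bignum is positive and 1 if it is negative. The digits are stored with the least significant byte first. To calculate the integer, the following formula can be used:
B = 256
(d0*B^0 + d1*B^1 + d2*B^2 + ... + d(n-1)*B^(n-1))

In other words, an Erlang big integer is split into n segments of 8 bits each, and its absolute value is d0*B^0 + d1*B^1 + d2*B^2 + ... + d(n-1)*B^(n-1), where B = 256.

So in theory an Erlang integer can be arbitrarily long. In SMALL_BIG_EXT the count n is a single byte, so it covers at most 255 digit bytes; larger values use LARGE_BIG_EXT (tag 111), which has the same layout with a 4-byte n.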
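
The formula can be tried out on the bytes that term_to_binary(11111111111111) produces. The following is a minimal sketch, not the runtime's own decoder; the module big_demo and the function decode_small_big/1 are hypothetical names, and it relies on binary:decode_unsigned/2 from the standard binary module to read the digits least significant byte first.

```erlang
-module(big_demo).
-export([decode_small_big/1]).

%% Hypothetical helper: decode one SMALL_BIG_EXT (tag 110) that starts right
%% after the version byte, returning {Integer, RemainingBytes}.
decode_small_big(<<110, N:8, Sign:8, Digits:N/binary, Rest/binary>>) ->
    %% d(0) is the least significant byte, so the digits are little-endian.
    Abs = binary:decode_unsigned(Digits, little),
    Value = case Sign of
                0 -> Abs;      % positive
                1 -> -Abs      % negative
            end,
    {Value, Rest}.
```

For example, big_demo:decode_small_big(<<110,6,0,199,177,212,1,27,10>>) evaluates 199 + 177*256 + 212*256^2 + 1*256^3 + 27*256^4 + 10*256^5 and returns {11111111111111,<<>>}.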

Erlang atoms


10.7  ATOM_EXT

Bytes:  1    2    Len
Field:  100  Len  AtomName

Table 10.13:

An atom is stored with a 2 byte unsigned length in big-endian order, followed by Len numbers of 8 bit Latin1 characters that form the AtomName. Note: The maximum allowed value for Len is 255.

An Erlang atom is encoded as the string of its name, so a long atom name makes the binary packet larger.
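
The cost of an atom can be read straight off the layout: 1 tag byte, 2 length bytes, and Len name bytes. Here is a small sketch that decodes an ATOM_EXT and reports how many bytes it occupied; the module atom_demo and the function decode_atom/1 are hypothetical names used only for illustration.

```erlang
-module(atom_demo).
-export([decode_atom/1]).

%% Hypothetical helper: decode one ATOM_EXT (tag 100) and return
%% {Atom, BytesConsumed, RemainingBytes}.
decode_atom(<<100, Len:16, Name:Len/binary, Rest/binary>>) ->
    %% 1 tag byte + 2 length bytes + Len bytes of Latin-1 name.
    Consumed = 1 + 2 + Len,
    {binary_to_atom(Name, latin1), Consumed, Rest}.
```

The atom a from the earlier session, <<100,0,1,97>>, decodes to {a,4,<<>>}, while a five-character name such as hello already takes 8 bytes.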

Erlang floats


10.6  FLOAT_EXT

Bytes:  1   31
Field:  99  Float string

Table 10.12:

A float is stored in string format. The format used in sprintf to format the float is "%.20e" (there are more bytes allocated than necessary). To unpack the float, use sscanf with format "%lf".

The above is how older Erlang versions represent a float: a single float takes 31 bytes, which is quite wasteful.


10.26  NEW_FLOAT_EXT

Bytes:  1   8
Field:  70  IEEE float

Table 10.32:

A float is stored as 8 bytes in big-endian IEEE format.

This term is used in minor version 1 of the external format.

The above is the newer float representation, which uses 8 bytes per float and is far more compact.

Eshell V5.9.1 (abort with ^G)
1> term_to_binary(19.2).
<<131,99,49,46,57,49,57,57,57,57,57,57,57,57,57,57,57,57,
  57,57,48,48,48,48,101,43,48,48,49,...>>
Eshell V6.2 (abort with ^G)
1> term_to_binary(19.2).
<<131,70,64,51,51,51,51,51,51,51>>
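
Both encodings can be decoded with a few lines of pattern matching. This is a sketch under two assumptions: the 31-byte text field is padded with zero bytes after the digits, and erlang:binary_to_float/1 is available. The module float_demo and the function decode_float/1 are hypothetical names.

```erlang
-module(float_demo).
-export([decode_float/1]).

%% Hypothetical helper: decode either float encoding and return
%% {Float, RemainingBytes}.

%% NEW_FLOAT_EXT (tag 70): 8 bytes, big-endian IEEE 754 double.
decode_float(<<70, F:64/float, Rest/binary>>) ->
    {F, Rest};
%% FLOAT_EXT (tag 99): 31-byte text field produced with "%.20e".
decode_float(<<99, Text:31/binary, Rest/binary>>) ->
    %% Assumption: unused bytes after the digits are zero, so keep only
    %% the part before the first 0 byte.
    [Digits | _] = binary:split(Text, <<0>>),
    {binary_to_float(Digits), Rest}.
```

Applied to the V6.2 output above without the leading 131, float_demo:decode_float(<<70,64,51,51,51,51,51,51,51>>) returns {19.2,<<>>}.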

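Erlang lists


10.16  LIST_EXT

Bytes:  1    4
Field:  108  Length  Elements  Tail

Table 10.22:

Length is the number of elements that follows in the Elements section. Tail is the final tail of the list; it is NIL_EXT for a proper list, but may be any type if the list is improper (for instance [a|b]).

A list consists of three parts: the length, the elements, and the tail marker. The tail deserves attention: for a proper list it is NIL_EXT, and an empty list is encoded as NIL_EXT alone. Tuples are encoded much like lists, but without the tail marker.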
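
The role of the tail is easiest to see by feeding hand-built binaries to the built-in binary_to_term/1. The sketch below also includes a tiny helper that separates the LIST_EXT header from what follows; the module list_demo and the function list_header/1 are hypothetical names for this illustration.

```erlang
-module(list_demo).
-export([list_header/1, examples/0]).

%% Hypothetical helper: split a LIST_EXT (tag 108) into its element count
%% and the bytes holding the elements plus the tail.
list_header(<<108, Length:32, ElementsAndTail/binary>>) ->
    {Length, ElementsAndTail}.

%% Three hand-built encodings, decoded with the built-in binary_to_term/1.
examples() ->
    Proper   = binary_to_term(<<131, 108, 0,0,0,2, 97,1, 97,2, 106>>), % tail is NIL_EXT
    Improper = binary_to_term(<<131, 108, 0,0,0,1, 97,1, 97,2>>),      % tail is the integer 2
    Empty    = binary_to_term(<<131, 106>>),                           % NIL_EXT alone
    {Proper, Improper, Empty}.
```

list_demo:examples() returns {[1,2],[1|2],[]}: the same element bytes give a proper or an improper list depending only on what sits in the tail position.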

That is enough theory. Here is an example that simply recognizes the binary data:

-module(test).
-compile(export_all).

term_to_data(Term) ->
	Bin = term_to_binary(Term),
	binary_to_data(Bin).

binary_to_data(<<131, Bin/binary>>) ->
	binary_to_data1(Bin);
binary_to_data(_) ->
	error.

-define(NEW_FLOAT_EXT, 70).
-define(SMALL_INTEGER_EXT, 97).
-define(INTEGER_EXT, 98).
-define(FLOAT_EXT, 99).
-define(ATOM_EXT, 100).
-define(SMALL_TUPLE_EXT, 104).
-define(LARGE_TUPLE_EXT, 105).
-define(NIL_EXT, 106).
-define(STRING_EXT, 107).
-define(LIST_EXT, 108).
-define(BINARY_EXT, 109).
-define(SMALL_BIG_EXT, 110).
-define(LARGE_BIG_EXT, 111).
-define(SMALL_ATOM_EXT, 115).

binary_to_data1(<<?LARGE_TUPLE_EXT, _ElemtSize:32, Bin/binary>>) ->
	binary_to_data1(Bin);
binary_to_data1(<<?SMALL_TUPLE_EXT, _ElemtSize:8, Bin/binary>>) ->
	binary_to_data1(Bin);
binary_to_data1(<<?SMALL_INTEGER_EXT, Int:8, Bin/binary>>) ->
	msg(int, Int),
	binary_to_data1(Bin);
%% INTEGER_EXT carries a signed 32-bit big-endian integer.
binary_to_data1(<<?INTEGER_EXT, Int:32/signed, Bin/binary>>) ->
	msg(int2, Int),
	binary_to_data1(Bin);
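binary_to_data1(<<?FLOAT_EXT, Float:31/binary, Bin/binary>>) ->
	%% The 31-byte field is zero-padded, so keep only the text before the first 0 byte.
	[Digits|_] = binary:split(Float, <<0>>),
	F=erlang:binary_to_float(Digits),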
	msg(float, F),
	binary_to_data1(Bin);
binary_to_data1(<<?NEW_FLOAT_EXT, Float:64/unsigned-big-float, Bin/binary>>) ->
	msg(float2, Float),
	binary_to_data1(Bin);
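binary_to_data1(<<?SMALL_BIG_EXT, N:8, Sign:8, Bin:N/binary, Rest/binary>>) ->
	{N, Abs} = gen_small_big(Bin),
	%% Sign is 0 for a positive bignum and 1 for a negative one.
	Big = case Sign of 0 -> Abs; _ -> -Abs end,
	msg(big, Big),
	binary_to_data1(Rest);
binary_to_data1(<<?LARGE_BIG_EXT, N:32, Sign:8, Bin:N/binary, Rest/binary>>) ->
	{N, Abs} = gen_small_big(Bin),
	%% Same layout as SMALL_BIG_EXT, only the digit count is 4 bytes wide.
	Big = case Sign of 0 -> Abs; _ -> -Abs end,
	msg(big2, Big),
	binary_to_data1(Rest);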
binary_to_data1(<<?ATOM_EXT, Len:16, Bin:Len/binary, Rest/binary>>) ->
	msg(atom, erlang:binary_to_atom(Bin, latin1)),
	binary_to_data1(Rest);
binary_to_data1(<<?SMALL_ATOM_EXT, Len:8, Bin:Len/binary, Rest/binary>>) ->
	msg(atom2, erlang:binary_to_atom(Bin, latin1)),
	binary_to_data1(Rest);
binary_to_data1(<<?STRING_EXT, Len:16, Bin:Len/binary, Rest/binary>>) ->
	msg(string, Bin),
	binary_to_data1(Rest);
binary_to_data1(<<?BINARY_EXT, Len:32, Bin:Len/binary, Rest/binary>>) ->
	msg(binary, Bin),
	binary_to_data1(Rest);
binary_to_data1(<<?LIST_EXT, _ElemtSize:32, Bin/binary>>) ->
	%%msg(list, Bin),
	binary_to_data1(Bin);
binary_to_data1(<<?NIL_EXT, Rest/binary>>) ->
	%%msg(list_nil, []),
	binary_to_data1(Rest);
binary_to_data1(<<>>) ->
	ok;
binary_to_data1(Bin) ->
	msg(unknown, Bin).

gen_small_big(<<Bin/binary>>) ->
	gen_small_big(Bin, 0, 0).
gen_small_big(<<>>, Number, Index) ->
	{Index, Number};
gen_small_big(<<Num:8, Rest/binary>>, Number, Index) ->
	gen_small_big(Rest, Number + Num * (1 bsl (Index * 8)), Index+1).
msg(Type, Data) ->
	io:format("~w ~w~n", [Type,Data]),
	ok.

Save it as test.erl; running it gives the following:

7> c(test).
{ok,test}
8> test:term_to_data({a,1,"abc"}).
atom a
int 1
string <<97,98,99>>
ok
9> term_to_binary(11111111111111).
<<131,110,6,0,199,177,212,1,27,10>>
10> test:binary_to_data(term_to_binary(11111111111111)).
big 11111111111111
ok

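Implementing binary_to_term

Now let's rework the example above to implement binary_to_term in Erlang. Besides recognizing the binary data, it also has to rebuild the original Erlang terms from it.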

-module(test).
-compile(export_all).

term_to_data(Term) ->
	Bin = term_to_binary(Term),
	binary_to_data(Bin).

binary_to_data(<<131, Bin/binary>>) ->
	binary_to_data1(Bin, [], []);
binary_to_data(_) ->
	error.

-define(NEW_FLOAT_EXT, 70).
-define(SMALL_INTEGER_EXT, 97).
-define(INTEGER_EXT, 98).
-define(FLOAT_EXT, 99).
-define(ATOM_EXT, 100).
-define(SMALL_TUPLE_EXT, 104).
-define(LARGE_TUPLE_EXT, 105).
-define(NIL_EXT, 106).
-define(STRING_EXT, 107).
-define(LIST_EXT, 108).
-define(BINARY_EXT, 109).
-define(SMALL_BIG_EXT, 110).
-define(LARGE_BIG_EXT, 111).
-define(SMALL_ATOM_EXT, 115).

binary_to_data2( DataList, SizeList, Data) ->
	DataList1 = case Data of
		undefined -> DataList;
		_ -> [Data|DataList]
	end,
	case SizeList of
		[{Type, Size, Index}|R] ->
			Index1 = Index +1,
			case Size=:=Index1 of
				true ->
					{List, DataList2} = split_list(Type, DataList1, Size, []),
					DataList3 = gen_data_block(Type, List, DataList2),
					case R of
						[_|_] ->
							binary_to_data2( DataList3, R, undefined);
						_ ->
							{DataList3, R}
					end;
				_ ->
					{DataList1, [{Type, Size, Index1}|R]}
			end;
		_ ->
			{DataList1, SizeList}
	end.

split_list(list, [[]|TailList], Size, List) ->
	split_list(list, TailList, Size-1, List);
split_list(_Type, [], _Size, List) ->
	{List,[]};
split_list(_Type, TailList, 0, List) ->
	{List, TailList};
split_list(Type, [Data|TailList], Size, List) ->
	split_list(Type, TailList, Size-1, [Data|List]).

gen_data_block(tuple, List, DataList) ->
	[list_to_tuple(List)|DataList];
gen_data_block(list, List, DataList) ->
	[List|DataList].

binary_to_data1(<<?LARGE_TUPLE_EXT, ElemtSize:32, Bin/binary>>, DataList, SizeList) ->
	binary_to_data1(Bin, DataList, [{tuple, ElemtSize, 0}|SizeList]);
binary_to_data1(<<?SMALL_TUPLE_EXT, ElemtSize:8, Bin/binary>>, DataList, SizeList) ->
	binary_to_data1(Bin, DataList, [{tuple, ElemtSize, 0}|SizeList]);
binary_to_data1(<<?SMALL_INTEGER_EXT, Int:8, Bin/binary>>, DataList, SizeList) ->
	%%msg(int, Int),
	{DataList1, SizeList1} = binary_to_data2( DataList, SizeList, Int),
	binary_to_data1(Bin, DataList1, SizeList1);
%% INTEGER_EXT carries a signed 32-bit big-endian integer.
binary_to_data1(<<?INTEGER_EXT, Int:32/signed, Bin/binary>>, DataList, SizeList) ->
	%%msg(int2, Int),
	{DataList1, SizeList1} = binary_to_data2( DataList, SizeList, Int),
	binary_to_data1(Bin, DataList1, SizeList1);
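binary_to_data1(<<?FLOAT_EXT, F:31/binary, Bin/binary>>, DataList, SizeList) ->
	%% The 31-byte field is zero-padded, so keep only the text before the first 0 byte.
	[Digits|_] = binary:split(F, <<0>>),
	Float = erlang:binary_to_float(Digits),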
	%%msg(float, Float),
	{DataList1, SizeList1} = binary_to_data2( DataList, SizeList, Float),
	binary_to_data1(Bin, DataList1, SizeList1);
binary_to_data1(<<?NEW_FLOAT_EXT, Float:64/unsigned-big-float, Bin/binary>>, DataList, SizeList) ->
	%%msg(float2, Float),
	{DataList1, SizeList1} = binary_to_data2( DataList, SizeList, Float),
	binary_to_data1(Bin, DataList1, SizeList1);
binary_to_data1(<<?SMALL_BIG_EXT, N:8, Sign:8, Bin:N/binary, Rest/binary>>, DataList, SizeList) ->
	{N, Abs} = gen_small_big(Bin),
	%% Sign is 0 for a positive bignum and 1 for a negative one.
	Big = case Sign of 0 -> Abs; _ -> -Abs end,
	%%msg(big, Big),
	{DataList1, SizeList1} = binary_to_data2( DataList, SizeList, Big),
	binary_to_data1(Rest, DataList1, SizeList1);
binary_to_data1(<<?LARGE_BIG_EXT, N:32, Sign:8, Bin:N/binary, Rest/binary>>, DataList, SizeList) ->
	{N, Abs} = gen_small_big(Bin),
	%% Same layout as SMALL_BIG_EXT, only the digit count is 4 bytes wide.
	Big = case Sign of 0 -> Abs; _ -> -Abs end,
	%%msg(big2, Big),
	{DataList1, SizeList1} = binary_to_data2( DataList, SizeList, Big),
	binary_to_data1(Rest, DataList1, SizeList1);
binary_to_data1(<<?ATOM_EXT, Len:16, Bin:Len/binary, Rest/binary>>, DataList, SizeList) ->
	Atom = erlang:binary_to_atom(Bin, latin1),
	%%msg(atom, Atom),
	{DataList1, SizeList1} = binary_to_data2( DataList, SizeList, Atom),
	binary_to_data1(Rest, DataList1, SizeList1);
binary_to_data1(<<?SMALL_ATOM_EXT, Len:8, Bin:Len/binary, Rest/binary>>, DataList, SizeList) ->
	Atom = erlang:binary_to_atom(Bin, latin1),
	%%msg(atom2, Atom),
	{DataList1, SizeList1} = binary_to_data2( DataList, SizeList, Atom),
	binary_to_data1(Rest, DataList1, SizeList1);
binary_to_data1(<<?STRING_EXT, Len:16, Bin:Len/binary, Rest/binary>>, DataList, SizeList) ->
	String = binary_to_list(Bin),
	%%msg(string, String),
	{DataList1, SizeList1} = binary_to_data2( DataList, SizeList, String),
	binary_to_data1(Rest, DataList1, SizeList1);
binary_to_data1(<<?BINARY_EXT, Len:32, Bin:Len/binary, Rest/binary>>, DataList, SizeList) ->
	%%msg(binary, Bin),
	{DataList1, SizeList1} = binary_to_data2( DataList, SizeList, Bin),
	binary_to_data1(Rest, DataList1, SizeList1);
binary_to_data1(<<?LIST_EXT, ElemtSize:32, Bin/binary>>, DataList, SizeList) ->
	%%msg(list, Bin),
	binary_to_data1(Bin, DataList, [{list, ElemtSize+1, 0}|SizeList]);
binary_to_data1(<<?NIL_EXT, Rest/binary>>, DataList, SizeList) ->
	%%msg(list_nil, []),
	{DataList1, SizeList1} = binary_to_data2( DataList, SizeList, []),
	binary_to_data1(Rest, DataList1, SizeList1);
binary_to_data1(<<>>, DataList, _SizeList) ->
	%%msg(final, DataList),
	%% A single decoded term is returned as is; no term at all is an error.
	case lists:reverse(DataList) of
		[Data] -> Data;
		[] -> error;
		Data -> Data
	end;
binary_to_data1(Bin, _DataList, _SizeList) ->
	msg(unknown, Bin),
	error.

gen_small_big(<<Bin/binary>>) ->
	gen_small_big(Bin, 0, 0).
gen_small_big(<<>>, Number, Index) ->
	{Index, Number};
gen_small_big(<<Num:8, Rest/binary>>, Number, Index) ->
	gen_small_big(Rest, Number + Num * (1 bsl (Index * 8)), Index+1).

msg(Type, Data) ->
	io:format("~w ~w~n", [Type,Data]),
	ok.

Save it as test.erl; running it gives the following:

16> c(test).
{ok,test}
17> test:term_to_data({a,b,c}).
{a,b,c}
18> test:term_to_data([]).
[]
19> test:term_to_data({a,b,{c,{d,{e,"TTT"}},f,[g,h,i],[j]}}).
{a,b,{c,{d,{e,"TTT"}},f,[g,h,i],[j]}}

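The example above has one small problem: Erlang R15 and earlier do not provide the binary_to_float function, so decoding floats will fail on those versions. If you are interested, try implementing binary_to_float yourself. Note also that split_list/4 drops every [] it meets while collecting the elements of a list, so a list that itself contains empty lists (for example [a,[]]) does not round-trip through this code.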
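
One possible way to do without binary_to_float is to go through list_to_float/1, which older releases also provide. This is only a sketch: the module float_compat and the function bin_to_float/1 are hypothetical names, and it assumes the input is the 31-byte FLOAT_EXT text field with zero bytes as padding.

```erlang
-module(float_compat).
-export([bin_to_float/1]).

%% Hypothetical replacement for erlang:binary_to_float/1 on the FLOAT_EXT
%% text field: drop the zero padding, then parse the digits as a string.
bin_to_float(Field) when is_binary(Field) ->
    Digits = [C || C <- binary_to_list(Field), C =/= 0],
    list_to_float(Digits).
```

For instance, float_compat:bin_to_float(<<"1.92000000000000000000e+01",0,0,0,0,0>>) returns 19.2.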

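The code is fairly long, so it is also available for download here: http://download.csdn.net/detail/cwqcwk1/8342895

Closing remarks

This article described how the binary protocol data is organized and implemented binary_to_term in Erlang; when I find the time I will implement term_to_binary as well. I mentioned this to a friend today, and he asked whether it could be done in C. Of course I said yes. It looks like I will have to sit down and implement both functions in C after all, otherwise that was just big talk.

Reference: http://blog.csdn.net/mycwq/article/details/42460033

Time: 2024-10-14 14:09:58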
