15 Linux Split and Join Command Examples to Manage Large Files--reference

by HIMANSHU ARORA on OCTOBER 16, 2012

http://www.thegeekstuff.com/2012/10/15-linux-split-and-join-command-examples-to-manage-large-files/

The Linux split and join commands are very helpful when you are manipulating
large files: split breaks a file into smaller pieces, and join merges the lines
of two files that share a common field. This article explains how to use both
commands with descriptive examples.

Join and split command syntax:

join [OPTION]… FILE1 FILE2
split [OPTION]… [INPUT [PREFIX]]


Linux Split Command Examples


1. Basic Split Example

Here is a basic example of split command.

$ split split.zip

$ ls
split.zip xab xad xaf xah xaj xal xan xap xar xat xav xax xaz xbb xbd xbf xbh xbj xbl xbn
xaa xac xae xag xai xak xam xao xaq xas xau xaw xay xba xbc xbe xbg xbi xbk xbm xbo


So we see that the file split.zip was split into smaller files named x**,
where ** is the two-character suffix that is added by default. Also, by default
each x** file contains 1000 lines.

$ wc -l *
40947 split.zip
1000 xaa
1000 xab
1000 xac
1000 xad
1000 xae
1000 xaf
1000 xag
1000 xah
1000 xai
...
...
...

So the output above confirms that by default each x** file contains 1000
lines.
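
The default behaviour is easy to try on a throwaway file. Here is a minimal sketch, assuming GNU coreutils; the file name sample.txt and its 2500 lines are made up for illustration. It also shows that concatenating the pieces in name order gives back the original file.

```shell
# Work in a scratch directory so the x* pieces do not clutter anything else.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# sample.txt is a hypothetical 2500-line input file.
seq 1 2500 > sample.txt

# No options: pieces of 1000 lines each, named xaa, xab, xac.
split sample.txt

# The first piece is full, the last one holds the remaining 500 lines.
first_piece=$(wc -l < xaa)
last_piece=$(wc -l < xac)
echo "xaa: $first_piece lines, xac: $last_piece lines"

# The shell expands x* in sorted order, so cat rebuilds the original.
cat x* > rejoined.txt
cmp sample.txt rejoined.txt && echo "rejoined file matches the original"
```

Note that putting the pieces back together is a job for cat, not for the join command covered later in this article.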

2. Change the Suffix Length using -a option

As discussed in example 1 above, the default suffix length is 2. But this can
be changed by using -a option.

As you see in the following example, the split files now carry a suffix of
length 5.

$ split -a5 split.zip
$ ls
split.zip xaaaac xaaaaf xaaaai xaaaal xaaaao xaaaar xaaaau xaaaax xaaaba xaaabd xaaabg xaaabj xaaabm
xaaaaa xaaaad xaaaag xaaaaj xaaaam xaaaap xaaaas xaaaav xaaaay xaaabb xaaabe xaaabh xaaabk xaaabn
xaaaab xaaaae xaaaah xaaaak xaaaan xaaaaq xaaaat xaaaaw xaaaaz xaaabc xaaabf xaaabi xaaabl xaaabo
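
The -a option combines naturally with the optional PREFIX argument from the syntax line at the top. The following is a small sketch, assuming GNU coreutils; sample.txt and the prefix part_ are hypothetical names, and -l 10 is used only to keep the demo small.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 30-line input file.
seq 1 30 > sample.txt

# -a 4 asks for four-letter suffixes; -l 10 puts 10 lines in each piece.
# The trailing "part_" replaces the default prefix "x".
split -a 4 -l 10 sample.txt part_

# List the pieces, one per line: part_aaaa, part_aaab, part_aaac.
pieces=$(printf '%s\n' part_*)
echo "$pieces"
```

A longer suffix matters when the number of pieces would otherwise exceed what two letters can name.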

Note: Earlier we also discussed other file manipulation utilities: tac, rev
and paste.

3. Customize Split File Size using -b option

Size of each output split file can be controlled using -b option.

In this example, the split files were created with a size of 200000
bytes.

$ split -b200000 split.zip

$ ls -lart
total 21084
drwxrwxr-x 3 himanshu himanshu 4096 Sep 26 21:20 ..
-rw-rw-r-- 1 himanshu himanshu 10767315 Sep 26 21:21 split.zip
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xad
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xac
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xab
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xaa
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xah
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xag
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xaf
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xae
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xar
...
...
...
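
Splitting by size is typically done so the pieces can be moved around and reassembled later. Here is a hedged sketch of that round trip, assuming GNU coreutils; blob.bin, restored.bin and the prefix chunk_ are made-up names, and the sizes are chosen only for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# blob.bin is a hypothetical 5000-byte binary file filled with random data.
head -c 5000 /dev/urandom > blob.bin

# Pieces of 2000 bytes: chunk_aa and chunk_ab are full, chunk_ac holds the last 1000.
split -b 2000 blob.bin chunk_

size_first=$(wc -c < chunk_aa)
size_last=$(wc -c < chunk_ac)
echo "first piece: $size_first bytes, last piece: $size_last bytes"

# Reassemble the pieces in name order and compare byte for byte.
cat chunk_* > restored.bin
cmp blob.bin restored.bin && echo "restored.bin is identical to blob.bin"
```

GNU split also accepts size suffixes with -b, for example -b 10M for pieces of 10 mebibytes.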


4. Create Split Files with Numeric Suffix using -d option


As seen in the examples above, the output files are named x**, where ** are
letters. You can change the suffix to digits using the -d option.

Here is an example. This has numeric suffix on the split files.

$ split -d split.zip
$ ls
split.zip x01 x03 x05 x07 x09 x11 x13 x15 x17 x19 x21 x23 x25 x27 x29 x31 x33 x35 x37 x39
x00 x02 x04 x06 x08 x10 x12 x14 x16 x18 x20 x22 x24 x26 x28 x30 x32 x34 x36 x38 x40
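
Numeric suffixes are handy when the pieces are processed by a script that counts from zero. A minimal sketch, assuming GNU coreutils, with sample.txt and part_ as hypothetical names:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 250-line input file.
seq 1 250 > sample.txt

# -d switches the suffix from letters to digits; -l 100 gives three pieces.
split -d -l 100 sample.txt part_

# The pieces are numbered from 00: part_00, part_01, part_02.
pieces=$(printf '%s\n' part_*)
echo "$pieces"
```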

5. Customize the Number of Split Chunks using -n option

To get control over the number of chunks, use the -n option.

This example splits the file into 50 chunks.

$ split -n50 split.zip
$ ls
split.zip xac xaf xai xal xao xar xau xax xba xbd xbg xbj xbm xbp xbs xbv
xaa xad xag xaj xam xap xas xav xay xbb xbe xbh xbk xbn xbq xbt xbw
xab xae xah xak xan xaq xat xaw xaz xbc xbf xbi xbl xbo xbr xbu xbx
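
With a plain number, -n divides the file by bytes, so a chunk of a text file can end in the middle of a line. GNU split also accepts the form -n l/N, which produces N chunks without breaking lines. The sketch below compares the two, assuming a GNU coreutils version that supports -n; the file name, the prefixes and the helper function ends_on_newline are made up for this example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical text file with 1000 short lines.
seq 1 1000 > sample.txt

split -n 4 sample.txt bytes_      # 4 chunks divided by byte count
split -n l/4 sample.txt lines_    # 4 chunks, never cutting a line in two

# Hypothetical helper: count the pieces whose last byte is a newline.
ends_on_newline() {
    count=0
    for piece in "$@"; do
        # $(...) strips a trailing newline, so empty output means the piece ends with one.
        if [ -z "$(tail -c 1 "$piece")" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

byte_clean=$(ends_on_newline bytes_*)
line_clean=$(ends_on_newline lines_*)
echo "byte mode: $byte_clean of 4 chunks end on a line boundary"
echo "line mode: $line_clean of 4 chunks end on a line boundary"
```

For binary files such as split.zip the byte form is the right choice; the l/N form is mainly useful for text that will be processed line by line.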

6. Avoid Zero Sized Chunks using -e option

While splitting a relatively small file into a large number of chunks, it's
good to avoid zero-sized chunks as they do not add any value. This can be done
using the -e option.

Here is an example:

$ split -n50 testfile

$ ls -lart x*
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xag
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xaf
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xae
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xad
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xac
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xab
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xaa
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbx
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbw
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbv
...
...
...


So we see that lots of zero-sized chunks were produced in the above output.
Now, let's use the -e option and see the results:

$ split -n50 -e testfile
$ ls
split.zip testfile xaa xab xac xad xae xaf

$ ls -lart x*
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xaf
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xae
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xad
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xac
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xab
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xaa


So we see that no zero sized chunk was produced in the above output.
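
The same comparison can be scripted. Here is a small sketch, assuming GNU coreutils; tiny.txt and the prefixes all_ and kept_ are hypothetical names.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 3-byte file, split into 5 chunks: some chunks must be empty.
printf 'abc' > tiny.txt

split -n 5 tiny.txt all_        # creates all 5 files, empty ones included
split -n 5 -e tiny.txt kept_    # -e suppresses the empty ones

all_count=$(printf '%s\n' all_* | wc -l)
kept_count=$(printf '%s\n' kept_* | wc -l)
echo "without -e: $all_count files, with -e: $kept_count files"
```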

7. Customize Number of Lines using -l option

Number of lines per output split file can be customized using the -l
option.

As seen in the example below, split files are created with 20000 lines each;
the last file holds the remaining lines.

$ split -l20000 split.zip

$ ls
split.zip testfile xaa xab xac

$ wc -l x*
20000 xaa
20000 xab
947 xac
40947 total
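
The number of pieces follows from the line count: it is the total number of lines divided by the lines per piece, rounded up (40947 / 20000 gives 3 above). The following sketch checks this arithmetic on a smaller, made-up file, assuming GNU coreutils; all names are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 45-line input file, split into pieces of 20 lines.
seq 1 45 > sample.txt
lines_per_piece=20

# Ceiling division: (45 + 20 - 1) / 20 = 3.
total_lines=$(wc -l < sample.txt)
expected=$(( (total_lines + lines_per_piece - 1) / lines_per_piece ))

split -l "$lines_per_piece" sample.txt part_
actual=$(printf '%s\n' part_* | wc -l)

echo "expected $expected pieces, got $actual"
```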


Get Detailed Information using --verbose option


To get a diagnostic message each time a new split file is opened, use the
--verbose option as shown below.

$ split -l20000 --verbose split.zip
creating file `xaa'
creating file `xab'
creating file `xac'

Linux Join Command Examples


8. Basic Join Example

By default, the join command compares the first field of each line in the two
input files and prints one combined line for every pair of lines whose first
fields match.

Here is an example:

$ cat testfile1
1 India
2 US
3 Ireland
4 UK
5 Canada

$ cat testfile2
1 NewDelhi
2 Washington
3 Dublin
4 London
5 Toronto

$ join testfile1 testfile2
1 India NewDelhi
2 US Washington
3 Ireland Dublin
4 UK London
5 Canada Toronto


So we see that a file containing countries was joined with another file
containing capitals on the basis of first field.
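
Lines whose key appears in only one of the files are left out by default. The sketch below illustrates this with made-up data, assuming GNU coreutils; countries.txt and capitals.txt are hypothetical file names, and LC_ALL=C is set only to make the sort order predictable.

```shell
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Key 3 exists only in countries.txt, key 4 only in capitals.txt.
printf '%s\n' '1 India' '2 US' '3 Ireland' > countries.txt
printf '%s\n' '1 NewDelhi' '2 Washington' '4 London' > capitals.txt

# Only keys present in both files (1 and 2) appear in the result.
joined=$(join countries.txt capitals.txt)
echo "$joined"
```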

9. Join works on Sorted List

If either of the two files supplied to the join command is not sorted on the
join field, join prints a warning and the out-of-order entry is not joined.

In this example, since testfile1 is not sorted, join displays a warning/error
message.

$ cat testfile1
1 India
2 US
3 Ireland
5 Canada
4 UK

$ cat testfile2
1 NewDelhi
2 Washington
3 Dublin
4 London
5 Toronto

$ join testfile1 testfile2
1 India NewDelhi
2 US Washington
3 Ireland Dublin
join: testfile1:5: is not sorted: 4 UK
5 Canada Toronto
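
The usual fix is to sort the file on the join field before joining. Here is a minimal sketch using the data from this example, assuming GNU coreutils; the name testfile1.sorted is made up, and LC_ALL=C is set so that sort and join agree on the ordering.

```shell
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# testfile1 has its last two lines out of order, as in the example above.
printf '%s\n' '1 India' '2 US' '3 Ireland' '5 Canada' '4 UK' > testfile1
printf '%s\n' '1 NewDelhi' '2 Washington' '3 Dublin' '4 London' '5 Toronto' > testfile2

# Sort on the first field only, then join the sorted copy.
sort -k1,1 testfile1 > testfile1.sorted
joined=$(join testfile1.sorted testfile2)
echo "$joined"
```

Note that join compares keys as text, so the sort must be a plain (non-numeric) one: with multi-digit keys, 10 is expected before 2.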


10. Ignore Case using -i option


When comparing fields, the difference in case can be ignored using -i option
as shown below.

$ cat testfile1
a India
b US
c Ireland
d UK
e Canada

$ cat testfile2
a NewDelhi
B Washington
c Dublin
d London
e Toronto

$ join testfile1 testfile2
a India NewDelhi
c Ireland Dublin
d UK London
e Canada Toronto

$ join -i testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto


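11. Verify that Input is Sorted using --check-order option


The --check-order option makes join verify that the input is correctly sorted
and report an error if it is not. Here is an example: since the last two lines
of testfile1 are out of order, an error is produced in the output.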

$ cat testfile1
a India
b US
c Ireland
d UK
f Australia
e Canada

$ cat testfile2
a NewDelhi
b Washington
c Dublin
d London
e Toronto

$ join --check-order testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
join: testfile1:6: is not sorted: e Canada


12. Do not Check the Sort Order using --nocheck-order option


This is the opposite of the previous example. The sort order of the input is
not checked in this example, and no error message is displayed.

$ join --nocheck-order testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
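
The sort order of a file can also be tested before join is run at all, using the -c option of sort. This is a sketch under the assumption that GNU coreutils is available; the variable names and testfile1.sorted are made up for illustration.

```shell
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Same unsorted testfile1 as in example 11.
printf '%s\n' 'a India' 'b US' 'c Ireland' 'd UK' 'f Australia' 'e Canada' > testfile1

# sort -c only checks the order: it exits with a non-zero status if the file is unsorted.
if sort -c -k1,1 testfile1 2>/dev/null; then
    before=sorted
else
    before=unsorted
fi

# After sorting on the first field the same check passes.
sort -k1,1 testfile1 > testfile1.sorted
if sort -c -k1,1 testfile1.sorted 2>/dev/null; then
    after=sorted
else
    after=unsorted
fi

echo "testfile1: $before, testfile1.sorted: $after"
```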

13. Print Unpairable Lines using -a option

If the lines of the two input files cannot all be paired one to one, the
-a[FILENUM] option also prints the lines from one file that have no match in
the other. FILENUM is the file number (1 or 2).

In the following example, we see that using -a1 produced the last line of
testfile1 (f Australia), which had no pair in testfile2.

$ cat testfile1
a India
b US
c Ireland
d UK
e Canada
f Australia

$ cat testfile2
a NewDelhi
b Washington
c Dublin
d London
e Toronto

$ join testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto

$ join -a1 testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto
f Australia
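
The -a option can be given for both files at once, and the -e and -o options of join can then fill in a placeholder for the missing field. The sketch below uses made-up data (the line g Tokyo has no partner country) and assumes GNU coreutils; the placeholder '-' is an arbitrary choice.

```shell
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 'f Australia' has no capital, 'g Tokyo' has no country.
printf '%s\n' 'a India' 'b US' 'f Australia' > testfile1
printf '%s\n' 'a NewDelhi' 'b Washington' 'g Tokyo' > testfile2

# -a 1 -a 2 keeps unpairable lines from both files.
# -o 0,1.2,2.2 prints the key, then field 2 of each file; -e '-' fills missing fields.
outer=$(join -a 1 -a 2 -e '-' -o 0,1.2,2.2 testfile1 testfile2)
echo "$outer"
```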


14. Print Only Unpaired Lines using -v option


In the above example both paired and unpaired lines were produced in the
output. But, if only unpaired output is desired then use -v option as shown
below.

$ join -v1 testfile1 testfile2
f Australia
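
Like -a, the -v option takes a file number and can be repeated. Here is a short sketch with made-up data, assuming GNU coreutils, that lists the lines of either file that have no partner in the other.

```shell
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical data: one unpaired line in each file.
printf '%s\n' 'a India' 'b US' 'f Australia' > testfile1
printf '%s\n' 'a NewDelhi' 'b Washington' 'g Tokyo' > testfile2

only_in_1=$(join -v 1 testfile1 testfile2)        # unpaired lines of testfile1
only_in_2=$(join -v 2 testfile1 testfile2)        # unpaired lines of testfile2
in_either=$(join -v 1 -v 2 testfile1 testfile2)   # unpaired lines of both files

echo "only in testfile1: $only_in_1"
echo "only in testfile2: $only_in_2"
```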

15. Join Based on Different Columns from Both Files using -1 and -2 options

By default, the first column of each file is used for comparison before
joining. You can change this behavior using the -1 and -2 options, which select
the join column in the first and second file respectively.

In the following example, the first column of testfile1 was compared with the
second column of testfile2 to produce the join command output.

$ cat testfile1
a India
b US
c Ireland
d UK
e Canada

$ cat testfile2
NewDelhi a
Washington b
Dublin c
London d
Toronto e

$ join -1 1 -2 2 testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland Dublin
d UK London
e Canada Toronto
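
When the join column is not the first one, the file has to be sorted on that column, not on the start of the line. The following sketch shows this with made-up data, assuming GNU coreutils; testfile2.bykey is a hypothetical file name.

```shell
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'a India' 'b US' 'c Ireland' > testfile1

# testfile2 is ordered by capital name, so its key column (column 2) is out of order.
printf '%s\n' 'Dublin c' 'NewDelhi a' 'Washington b' > testfile2

# Sort testfile2 on its second column, the one used for joining.
sort -k2,2 testfile2 > testfile2.bykey

# Join column 1 of testfile1 with column 2 of the sorted copy.
joined=$(join -1 1 -2 2 testfile1 testfile2.bykey)
echo "$joined"
```

The join field is printed first, followed by the remaining fields of the first file and then those of the second.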
