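I recently had a text-analysis requirement. The analysis system is written in Perl, and Perl has few good packages for Chinese text analysis, so I call R to process the text data.
Why not Python
Python has comprehensive open-source NLP package support, but the reason is simple: I have not worked with Python much yet and would rather not show off skills I do not have, so Python can wait. For now all I need is a quick prototype. If production data grows sharply, the core algorithms will later need to be rewritten in C++ (incidentally, I will not rewrite the HMM; it is a thankless job).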
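Getting started
1. Install R and add the R installation path to the PATH environment variable.
2. Test running scripts in batch mode from the command line.
In cmd, run Rscript --arch x64 --help to check that it works; x64 is the architecture of the version I currently have installed.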
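3. Test calling R from a program. The output matches Rgui. Calling plot does not open a graphics window; by default the plots are written to a PDF file (Rplots.pdf) in the working directory.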
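Before calling R from Perl, it can help to confirm that step 1 took effect, that is, that Rscript is reachable through PATH. The following is a minimal sketch only: find_on_path is a hypothetical helper name, not part of R or of any Perl distribution, and the list of Windows extensions is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of the first file called $name
# found in the PATH directories, or nothing (undef) if there is none.
sub find_on_path
{
    my ($name) = @_;
    # On Windows the executable usually carries an extension (assumed list).
    my @exts = $^O eq 'MSWin32' ? ('.exe', '.bat', '.cmd', '') : ('');
    for my $dir (File::Spec->path()) {
        for my $ext (@exts) {
            my $candidate = File::Spec->catfile($dir, $name . $ext);
            return $candidate if -f $candidate;
        }
    }
    return;
}

my $rscript = find_on_path('Rscript');
print defined $rscript
    ? "Rscript found: $rscript\n"
    : "Rscript not found; check the PATH environment variable\n";
```

If the script reports that Rscript was not found, the installation path from step 1 is probably missing from PATH, or the cmd window was opened before PATH was changed.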
Test
#!/usr/bin/perl
# Run R Script By Call R Program
# Liangwl
# 2015/9/19 19:43:14
# Todo: Get the value from R runtime. Each parameter should be defined in Perl.
use strict;
use warnings;

# Write R scripts here
sub Rscripts
{
    # Single-quoted heredoc, so Perl does not interpolate anything in the R code
    my $r = <<'EndOfScript';
#R Scripts Begin
#Description: Test R Script
Args <- commandArgs();
cat("Args[1]=",Args[1],"\n");
cat("Args[2]=",Args[2],"\n");
cat("Args[3]=",Args[3],"\n");
cat("Args[4]=",Args[4],"\n");
cat("Args[5]=",Args[5],"\n");
cat("Args[6]=",Args[6],"\n");
cat("Args[7]=",Args[7],"\n");
a <- c(1:10);
b <- c(10,5);
c = a + b;
d <- c(11:20);
c;
d;
x <- rbinom(1000, 10, 0.25);
y <- rbinom(1000, 10, 0.25);
plot(x, y);
plot(jitter(x),jitter(y));
pairs(iris[,1:4]);
q();
#R Scripts End
EndOfScript
    return $r;
}

# Use a pipe to call R and execute the script
sub callR
{
    my ($script, $TX_DATE) = @_;
    # --args makes R hand $TX_DATE to commandArgs() instead of ignoring it
    my $r_fh;
    unless (open($r_fh, '|-', "R --no-save --args $TX_DATE")) {
        print "Could not invoke R command: $!\n";
        return -1;
    }
    print $r_fh $script;
    close($r_fh);       # waits for R to finish
    return $? >> 8;     # exit status of R
}

sub main
{
    my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time());
    my $current = sprintf("%04d-%02d-%02d %02d:%02d:%02d",$year+1900,$mon + 1,$mday,$hour,$min,$sec);
    print "$current\nPID:$$ \n------------------------------------------------------------\n";

    # There are two ways to execute an R script

    # 1. Execute the R script in batch mode
    # The parameter which follows 'Rscript' should be a *.r file
    # The *.r file should be encoded as ANSI/ASCII with UNIX (LF) line endings.
    my $path = "C:\\Users\\LiangWenLong\\Desktop\\test.r";
    my $rc_batch = `Rscript "$path" 123456`;
    die "Rscript failed (status $?)\n" if $? != 0;
    print $rc_batch;
    print "------------------------------------------------------------\n";

    # 2. Use a pipe to call the R program and execute the script
    my $TX_DATE = '20150920';
    my $rc_pipe = callR(Rscripts(), $TX_DATE);

    # Return a numeric status so that exit() receives a number, not R's output
    return $rc_pipe;
}
my $ret = main();
exit($ret);
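The Todo in the header comment asks for a way to get values back from the R runtime. One possible approach, sketched here as an assumption and not as the author's method, is to have R print name=value lines with cat() and then parse the captured standard output in Perl. parse_r_output is a hypothetical helper, and the sample text stands in for real Rscript output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect "name= value" lines, as printed in R by
# cat("name=", value, "\n"), into a hash. Lines without "=" are ignored.
sub parse_r_output
{
    my ($text) = @_;
    my %values;
    for my $line (split /\r?\n/, $text) {
        # Split on the first "=", trimming whitespace around name and value.
        next unless $line =~ /^\s*([^=]+?)\s*=\s*(.*?)\s*$/;
        $values{$1} = $2;
    }
    return %values;
}

# Made-up sample standing in for text captured with backticks from Rscript.
my $sample = "TX_DATE= 20150920 \n[1] 11 7 13\nmean= 2.5 \n";
my %r = parse_r_output($sample);
print "$_ => $r{$_}\n" for sort keys %r;
```

In the test script, the same helper could be applied to $rc_batch, since backticks already capture everything Rscript writes to standard output. Values arrive as strings, so any numeric conversion is left to the Perl side.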
Output
Use cases
Word segmentation, word frequency, text mining, sentiment analysis, semantic analysis
Time: 2024-10-12 00:10:04