sed and awk programming
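I. Concepts
sed is a non-interactive editor. It edits text files and standard input, where standard input may come from the keyboard, a redirected file, a string, a variable, or text from a pipe. Unlike vi, sed applies all of its editing commands in a single pass, which makes it very efficient. sed is typically used in three situations:
① editing files that are too large for an interactive editor;
② editing with commands that are too complex to type comfortably in an interactive editor;
③ scanning a file once while applying several editing functions.
sed can be invoked in three ways:
sed [options] 'sed command' input-file
sed [options] -f sed-script-file input-file
./sed-script-file input-file
The third form requires the script file to be executable and to begin with a #!/bin/sed -f line.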
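The three invocation forms can be compared side by side. This is a minimal sketch, assuming a POSIX shell with sed on the PATH; the file names third.sed and third_exec.sed are made up for illustration, and the sed path in the interpreter line is looked up because it differs between systems.

```shell
cd "$(mktemp -d)"                      # work in a scratch directory
printf '%s\n' 'Tish is Cretificate Request file:' \
  'It should be send to renyunjun!' \
  'eMail:yunjun.ren@163.com;' \
  'company:tct' > input

# Form 1: the sed command is given on the command line.
form1=$(sed -n '3p' input)

# Form 2: the same command is read from a script file with -f.
printf '3p\n' > third.sed
form2=$(sed -n -f third.sed input)

# Form 3: the script is executable and its first line names the interpreter.
printf '#!%s -nf\n3p\n' "$(command -v sed)" > third_exec.sed
chmod +x third_exec.sed
form3=$(./third_exec.sed input)

printf '%s\n' "$form1" "$form2" "$form3"   # the same line three times
```

All three variables should hold the third line of the sample file.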
II. sed command options
Contents of the file input:
Tish is Cretificate Request file:
It should be send to renyunjun!
eMail:yunjun.ren@163.com;
company:tct
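1. -n suppresses the default printing of every input line;
Example: sed -n "3p" input prints only the third line of input;
sed "3p" input prints every line of the file, and the third line appears twice;
sed -n "2,3p" input prints lines 2 and 3 of input;
sed -n '/yunjun/p' input prints every line that matches yunjun;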
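The effect of -n is easy to check by counting output lines. A small sketch, assuming the four-line sample file input from above is recreated in a scratch directory:

```shell
cd "$(mktemp -d)"
printf '%s\n' 'Tish is Cretificate Request file:' \
  'It should be send to renyunjun!' \
  'eMail:yunjun.ren@163.com;' \
  'company:tct' > input

# With -n only the line selected by p is printed.
with_n=$(( $(sed -n '3p' input | wc -l) ))

# Without -n every line is printed once, and line 3 a second time.
without_n=$(( $(sed '3p' input | wc -l) ))

echo "with -n: $with_n line(s), without -n: $without_n line(s)"
```

The counts should be 1 and 5 for the four-line file.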
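2. -e is only needed when several commands are passed to sed; note that sed patterns are case-sensitive;
sed -n -e '/yunjun/=' -e "/yunjun/p" input
Output (= prints the line number of each matching line):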
2
It should be send to renyunjun!
3
eMail:yunjun.ren@163.com;
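3. -f is only used to run a sed script file, which may append, insert, change, delete, or substitute text; a single command can also be given inline:
sed '/yunjun/aThis is sed append msg!' input
This one-line form of a is a GNU sed extension; it appends the text after every matching line: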
Tish is Cretificate Request file:
It should be send to renyunjun!
This is sed append msg!
eMail:yunjun.ren@163.com;
This is sed append msg!
company:tct
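(1) Appending text at a matched location
Create a script file append.sed with the following content. a\ starts the append and the text begins on the next line; a trailing \ on a text line continues the text onto another line:
#!/bin/sed -f
/yunjun/a\
This is sed append msg!.\
Append another new line!.
Make the script executable and run it:
chmod +x append.sed
./append.sed input
Output: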
Tish is Cretificate Request file:
It should be send to renyunjun!
This is sed append msg!.
Append another new line!.
eMail:yunjun.ren@163.com;
This is sed append msg!.
Append another new line!.
company:tct
② When locating text with sed, a metacharacter that should match literally must be escaped with a backslash "\";
Example: sed -n '/\./p' input
Output: eMail:yunjun.ren@163.com;
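The difference between an escaped and an unescaped dot can be seen directly. A short sketch, assuming the sample file input is recreated in a scratch directory:

```shell
cd "$(mktemp -d)"
printf '%s\n' 'Tish is Cretificate Request file:' \
  'It should be send to renyunjun!' \
  'eMail:yunjun.ren@163.com;' \
  'company:tct' > input

# An unescaped "." matches any single character, so every non-empty line matches.
any_char=$(( $(sed -n '/./p' input | wc -l) ))

# "\." matches only a literal dot, which occurs on the e-mail line alone.
literal_dot=$(sed -n '/\./p' input)

echo "$any_char lines match /./"
echo "$literal_dot"
```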
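③ "$" means end of line inside a regular expression, but as a sed address it means the last line of the file:
Example: sed -n '$p' input
Output: company:tct
Example: sed -n '/.*com/p' input
prints every line that contains com.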
④ "!" negates an address; x,y! selects the lines outside the range x to y;
Example: sed -n '2,4!p' input prints every line of input outside lines 2 to 4;
Output: Tish is Cretificate Request file:
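Negating a range and printing it gives the same lines as deleting the range and letting sed print the rest. The sketch below compares the two, assuming the sample file input; d is sed's delete command.

```shell
cd "$(mktemp -d)"
printf '%s\n' 'Tish is Cretificate Request file:' \
  'It should be send to renyunjun!' \
  'eMail:yunjun.ren@163.com;' \
  'company:tct' > input

negated=$(sed -n '2,4!p' input)   # print only the lines outside 2..4
deleted=$(sed '2,4d' input)       # delete lines 2..4, default-print the rest

echo "$negated"
echo "$deleted"
```

Both commands should leave only the first line of the file.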
⑤ Print from the line matching "should" through the last line:
sed -n '/should/,$p' input
Output:
It should be send to renyunjun!
eMail:yunjun.ren@163.com;
company:tct
⑥ sed -n '/should/,3p' input prints from the line matching should through line 3;
Output:
It should be send to renyunjun!
eMail:yunjun.ren@163.com;
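When a range starts at a pattern and ends at a line number, the position of the match matters: if the match is already at or past the end line, only the matching line is selected. A sketch of both cases, assuming the sample file input:

```shell
cd "$(mktemp -d)"
printf '%s\n' 'Tish is Cretificate Request file:' \
  'It should be send to renyunjun!' \
  'eMail:yunjun.ren@163.com;' \
  'company:tct' > input

# The match is on line 2, so the range runs from line 2 to line 3.
forward=$(sed -n '/should/,3p' input)

# The match is on line 3, after the end line 2: only the matching line is printed.
backward=$(sed -n '/eMail/,2p' input)

printf '%s\n' "$forward" '--' "$backward"
```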
(2) Inserting, changing, and substituting text
① Insert
#!/bin/sed -f
/yunjun/i\
insert an new Line!.
Output:
Tish is Cretificate Request file:
insert an new Line!.
It should be send to renyunjun!
insert an new Line!.
eMail:yunjun.ren@163.com;
company:tct
② Change
#!/bin/sed -f
/yunjun/c\
find yunjun modify here!.
Output:
Tish is Cretificate Request file:
find yunjun modify here!.
find yunjun modify here!.
company:tct
③ Substitute
sed -n 's/yunjun/YUNJUN/p' input
Output:
It should be send to renYUNJUN!
eMail:YUNJUN.ren@163.com;
Write the substituted lines to the file output instead:
sed -n 's/yunjun/YUNJUN/w output' input
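By default s replaces only the first match on each line; the g flag replaces every match. A short sketch with a made-up line that contains the pattern twice:

```shell
line='yunjun wrote to yunjun'   # hypothetical test line, not from the sample file

# Without a flag only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/yunjun/YUNJUN/')

# With g every match on the line is replaced.
every_match=$(printf '%s\n' "$line" | sed 's/yunjun/YUNJUN/g')

echo "$first_only"
echo "$every_match"
```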
(3) Writing to a file: "w"
sed -n "1,4 w output" input
The file output then contains:
Tish is Cretificate Request file:
It should be send to renyunjun!
eMail:yunjun.ren@163.com;
company:tct
(4) Reading text from a file: "r"
Contents of readfile:
learn dan start
day day up
come on
Contents of input:
Tish is Cretificate Request file:
It should be send to renyunjun!
eMail:yunjun.ren@163.com;
company:tct
sed '/on/r input' readfile reads the contents of input and places them after each line of readfile that matches on;
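The result of the r command can be checked by recreating both files. A sketch, assuming the file contents listed above:

```shell
cd "$(mktemp -d)"
printf '%s\n' 'learn dan start' 'day day up' 'come on' > readfile
printf '%s\n' 'Tish is Cretificate Request file:' \
  'It should be send to renyunjun!' \
  'eMail:yunjun.ren@163.com;' \
  'company:tct' > input

# Only "come on" matches /on/, so input is appended after that line.
result=$(sed '/on/r input' readfile)
echo "$result"
```

The output should be the three lines of readfile followed by the four lines of input.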
(5) The quit command: "q"
sed '3 q' input
Output:
Tish is Cretificate Request file:
It should be send to renyunjun!
eMail:yunjun.ren@163.com;
(6) Running a group of commands on the addressed lines: {x;y}
sed -n -e '/yunjun/p' -e '/yunjun/=' input
sed -n '/yunjun/{p;=}' input
Output (the same for both commands):
It should be send to renyunjun!
2
eMail:yunjun.ren@163.com;
3
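III. awk command options
1. Pattern matching
Find the blank lines in a file and print "This is a blank line!" once for each of them:
#!/usr/bin/awk -f
/^$/{printf("This is a blank line!\n")}
2. Records and fields
awk treats each input line as a record and each string within the line as a field. Fields are separated by spaces, tabs, or other characters; the separating character is called the field separator.
awk uses the field operator "$" to select the field an action works on; "$" is followed by a number or a variable that gives the field's position. Fields in each record are numbered from 1: $1 is the first field, $2 is the second, and $0 is the entire record.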
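The relationship between a record and its fields can be shown on a single line. A minimal sketch using one record in the style of the examples that follow; NF is awk's built-in field count, so $NF is the last field:

```shell
fields=$(printf 'Li Hao njue 025-9998768\n' |
  awk '{ print NF; print $1; print $NF; print $0 }')

# Prints the field count, the first field, the last field, and the whole record.
echo "$fields"
```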
(1) Printing fields:
awk '{print $1,$2,$3,$4}' stu_recored
Output:
Li Hao njue 025-9998768
wang hao dsf 035-8627898
zhang san sds 035-7654456
Li Si dfdfg 025-5364555
(3) The field operator "$" followed by a variable or an expression
awk 'BEGIN {one = 1;two = 2}{print $(one+two)}' stu_recored
Prints the third field of every record:
Output:
njue
dsf
sds
dfdfg
(4) By default awk separates fields on whitespace (spaces and tabs); -F changes the separator, here to the Tab character only
awk -F"\t" '{print $3}' stu_recored
Output:
025-9998768
035-8627898
035-7654456
025-5364555
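So the separator can be changed, and different separators split a record into different fields;
awk offers another way to change the separator: the built-in variable FS. Change the source file stu_recored to the following: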
Li Hao,njue,025-9998768
wang hao,dsf,035-8627898
zhang san,sds,035-7654456
Li Si,dfdfg,025-5364555
awk 'BEGIN{FS = ","} {print $0}' stu_recored
Output ($0 is the whole record, so it is printed unchanged):
Li Hao,njue,025-9998768
wang hao,dsf,035-8627898
zhang san,sds,035-7654456
Li Si,dfdfg,025-5364555
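Printing $0 does not show how the record was split, so the sketch below prints NF and $1 for one comma-separated record under three settings: the default separator, FS set in BEGIN, and the equivalent -F option.

```shell
record='Li Hao,njue,025-9998768'

# Default separator: whitespace, so the record splits at the single space.
by_space=$(printf '%s\n' "$record" | awk '{ print NF ": " $1 }')

# FS set in BEGIN: the record splits at commas.
by_comma=$(printf '%s\n' "$record" | awk 'BEGIN { FS = "," } { print NF ": " $1 }')

# -F on the command line has the same effect as setting FS in BEGIN.
by_flag=$(printf '%s\n' "$record" | awk -F, '{ print NF ": " $1 }')

printf '%s\n' "$by_space" "$by_comma" "$by_flag"
```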
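(5) Relational operators in awk
1. ~ matches a regular expression; !~ does not match
awk 'BEGIN{FS = ":"} $1~/root/' /etc/passwd
splits each line of passwd on ":" and prints the lines whose first field matches root;
2. awk has three forms of conditional: if, if/else, and if/else if/else
awk 'BEGIN {FS = ":"}{if($3>$4) print $0}' /etc/passwd
awk 'BEGIN {FS = ":"}{if(($3==117)||($4==15)) print $0}' /etc/passwd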
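The if/else if/else form is not shown above, so here is a small sketch. The thresholds and the labels high, pass, and fail are made up for illustration, and the input is three sample numbers rather than /etc/passwd so that the result does not depend on the system.

```shell
graded=$(printf '95\n72\n40\n' | awk '{
  if ($1 >= 90)      { print $1, "high" }   # first branch that is true wins
  else if ($1 >= 60) { print $1, "pass" }
  else               { print $1, "fail" }
}')

echo "$graded"
```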
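3. Expressions
Defining and assigning awk variables:
x = 1
y = "Good"
z = "very" "good"
assigns 1 to x, Good to y, and verygood to z (adjacent strings are concatenated with nothing in between)
Example: number the blank lines in input; the last number printed is the total
awk '/^$/{print x+=1}' input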
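To print only the total instead of a running count, the counter can be reported once in an END block. A sketch on a small made-up input with three blank lines; the variable name n is arbitrary:

```shell
# Input lines: a, (blank), b, (blank), (blank), c
blank_total=$(printf 'a\n\nb\n\n\nc\n' |
  awk '/^$/ { n++ } END { print n + 0 }')   # n + 0 prints 0 when nothing matched

echo "$blank_total"
```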
Example: computing average scores
Student score data:
Li Hao,njue,025-9998768,89,67,98,78
wang hao,dsf,035-8627898,78,98,99,97
zhang san,sds,035-7654456,56,67,89,79
Li Si,dfdfg,025-5364555 ,78,89,83,92
The script:
#!/usr/bin/awk -f
BEGIN {FS = ","}
{total = $4+$5+$6+$7
arv = total/4
print $1,arv}
Running the script prints:
Li Hao 83
wang hao 93
zhang san 72.75
Li Si 85.5
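(6) System variables
awk defines many built-in variables that control its environment; they are called system variables;
awk 'BEGIN {FS=","} {print NF,NR,$0} END {print FILENAME}' stu_recored
prints, for each record of stu_recored, its field count (NF) and record number (NR), and prints the file name (FILENAME) at the end;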
7 1 Li Hao,njue,025-9998768,89,67,98,78
7 2 wang hao,dsf,035-8627898,78,98,99,97
7 3 zhang san,sds,035-7654456,56,67,89,79
7 4 Li Si,dfdfg,025-5364555 ,78,89,83,92
stu_recored
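(7) Formatted output
One major use of awk is producing reports, which need a fixed layout; awk provides the printf statement for specifying the output format;
Example: the arguments are a list of variables
awk 'BEGIN{FS = ","}{printf("%s,%d\n",$1,$6)}' stu_recored
Output: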
Li Hao,98
wang hao,99
zhang san,89
Li Si,83
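Converting a number to its ASCII character:
awk 'BEGIN{printf("%c\n",65)}'
Output: A
Converting to a floating-point number:
awk 'BEGIN{printf("%f\n",2016)}'
Output: 2016.000000
Adding a header line to formatted output:
awk 'BEGIN{FS = ",";print "name\t\tTelphon"}{printf("%-15s\t%s\n",$1,$3)}' stu_recored
Output (as displayed with 8-column tab stops):
name            Telphon
Li Hao          025-9998768
wang hao        035-8627898
zhang san       035-7654456
Li Si           025-5364555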
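Width and precision in a printf format control the column layout. A minimal sketch with literal values; the "|" characters are only there to make the padding visible:

```shell
# %-10s left-justifies in 10 columns, %5d right-justifies in 5, %.2f keeps 2 decimals.
row=$(awk 'BEGIN { printf("%-10s|%5d|%.2f\n", "Li Hao", 98, 72.75) }')

echo "$row"
```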
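(8) Built-in string functions
awk provides powerful built-in string functions for searching, replacing, and splitting text;
① Replace zhang san with wang er. gsub(r,x) replaces every match of r in $0 with x; gsub(r,x,w) replaces every match of r in w with x;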
Li Hao,njue,025-9998768,89,67,98,78
wang hao,dsf,035-8627898,78,98,99,97
zhang san,sds,035-7654456,56,67,89,79
Li Si,dfdfg,025-5364555 ,78,89,83,92
Run: awk 'BEGIN{FS = ",";OFS = ","}{gsub(/zhang san/,"wang er",$1)}{print $0}' stu_recored
Li Hao,njue,025-9998768,89,67,98,78
wang hao,dsf,035-8627898,78,98,99,97
wang er,sds,035-7654456,56,67,89,79
Li Si,dfdfg,025-5364555 ,78,89,83,92
② index(r,x) returns the position of the first occurrence of x in the string r, or 0 if x does not occur:
awk 'BEGIN{FS = ","}{print index($1,"hao")}' stu_recored
Output:
0
6
0
0
③ length(s) returns the length of the string s
awk 'BEGIN{FS = ","}{print length($1)}' stu_recored
Output:
6
8
9
5
④ match(s,t) tests whether s contains a substring matching t and returns its position, or 0 if there is none;
awk 'BEGIN{print match("xhang san",/san/)}'
Output: 7
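Besides its return value, match also sets the built-in variables RSTART and RLENGTH to the position and length of the match. A sketch with the same test string; the patterns are chosen for illustration:

```shell
# A successful match: "san" starts at position 7 and is 3 characters long.
found=$(awk 'BEGIN { if (match("xhang san", /s[a-z]+/)) print RSTART, RLENGTH }')

# A failed match returns 0 and leaves RSTART at 0 and RLENGTH at -1.
missing=$(awk 'BEGIN { if (!match("xhang san", /zzz/)) print RSTART, RLENGTH }')

echo "$found"
echo "$missing"
```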
⑤ sub(r,x,w) replaces the first match of r in w with x and returns the number of replacements;
awk 'BEGIN{str = "hello world!";print sub(/hel/,"HEL",str);printf("%s\n",str)}'
Output:
1
HELlo world!
awk 'BEGIN{FS = ","} $1~/wang/{sub(/27/,"99",$0)} {print $0}' stu_recored
Output:
Li Hao,njue,025-9998768,89,67,98,78
wang hao,dsf,035-8699898,78,98,99,97
zhang san,sds,035-7654456,56,67,89,79
Li Si,dfdfg,025-5364555 ,78,89,83,92
⑥ substr(r,s,t) returns the substring of r that starts at position s and is t characters long;
awk 'BEGIN{str = "helloworld!";print substr(str,6,6)}'
Output: world!
(9) Passing parameters to an awk script;
The script pass.awk:
#!/usr/bin/awk -f
NF != MAX {printf("The line " NR " does not have " MAX " fields!\n")}
Run:
./pass.awk MAX=7 FS="," stu_recored
Every record in stu_recored has exactly 7 fields, so this run prints nothing. With MAX=6 every line is reported:
./pass.awk MAX=6 FS="," stu_recored
Output:
The line 1 does not have 6 fields!
The line 2 does not have 6 fields!
The line 3 does not have 6 fields!
The line 4 does not have 6 fields!
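Assignments such as MAX=7 that are placed among the file arguments take effect only when awk reaches them, which is after the BEGIN block has run; the -v option assigns before BEGIN. A small sketch of the difference, using a made-up one-line file named rec:

```shell
cd "$(mktemp -d)"
printf 'a,b,c\n' > rec   # hypothetical sample record with three fields

# Assignment among the operands: MAX is still empty inside BEGIN.
late=$(awk 'BEGIN { print "[" MAX "]" } { print NF }' MAX=3 FS=, rec)

# -v (and -F for the separator): the values are already set inside BEGIN.
early=$(awk -v MAX=3 -F, 'BEGIN { print "[" MAX "]" } { print NF }' rec)

printf '%s\n' "$late" '--' "$early"
```

In both runs the record is split into three fields; only the value seen in BEGIN differs.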
Prefix every record of stu_recored with its line number and print everything:
awk 'BEGIN{FS=","}{print NR,$0}' OFS="." stu_recored
Output:
1.Li Hao,njue,025-9998768,89,67,98,78
2.wang hao,dsf,035-8627898,78,98,99,97
3.zhang san,sds,035-7654456,56,67,89,79
4.Li Si,dfdfg,025-5364555 ,78,89,83,92
OFS="." sets the output field separator, which print places between NR and $0;