What is awk?
“AWK is a language for processing files of text. A file is treated as a sequence of records, and by default each line is a record. Each line is broken up into a sequence of fields, so we can think of the first word in a line as the first field, the second word as the second field, and so on. An AWK program is a sequence of pattern-action statements. AWK reads the input a line at a time. A line is scanned for each pattern in the program, and for each pattern that matches, the associated action is executed.” – Alfred V. Aho
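A minimal sketch of the record, field and pattern-action model described in the quote. The two sample lines and the `/^d/` pattern are made up for illustration.

```shell
# Two records (lines); awk splits each one on whitespace into fields $1, $2, ...
sample='alpha beta gamma
delta epsilon zeta'

# Pattern /^d/ selects records starting with "d"; the action prints fields 1 and 2.
matched=$(printf '%s\n' "$sample" | awk '/^d/ { print $1, $2 }')

# With no pattern the action runs for every record.
# NR is the record number, NF the number of fields in that record.
counts=$(printf '%s\n' "$sample" | awk '{ print NR, NF }')

echo "$matched"
echo "$counts"
```

Only the second line matches the pattern, so only its first two fields are printed; the second command prints one line per record.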
Why awk?
1. AWK is simpler to use than most conventional programming languages.
2. It is fast.
3. It has string manipulation functions, so it can search for particular strings and modify the output.
4. A version of the AWK language is a standard feature of nearly every modern Unix-like operating system available today.
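To give a flavour of the string functions mentioned in point 3, here is a small sketch using `gsub()`, `toupper()` and `length()`, all of which are standard awk functions. The input line is hypothetical.

```shell
# Hypothetical input: a sequence name followed by a short sequence.
line='contig_12 ACGTNNACGT'

# gsub() replaces every match in place and returns the number of replacements;
# toupper() upper-cases a string; length() returns its length.
result=$(printf '%s\n' "$line" |
  awk '{ n = gsub(/N/, "-", $2); print toupper($1), $2, length($2), n }')

echo "$result"
```

The output line holds the upper-cased name, the modified sequence, its length and the number of substitutions made.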
Simple examples of how to use AWK:
Seb
http://www.pement.org/awk/awk1line.txt
Greg_B
# - SAM files - #
# Count number of reads aligning to each contig/chromosome and print total and as a percent
awk '{c[$3]++}END{for(j in c) print j,c[j],(c[j]/NR*100),"%"}' Aligned.sam

# - Blast files - #
# Not the prettiest (piping awk into awk) but gets the job done quickly, improvements welcome!
# count hits
wc blast_all_vs_all/trin_vs_trin.tab
3028796
# no self hits
awk '$1!=$2' blast_all_vs_all/trin_vs_trin.tab > blast_all_vs_all/no_self
wc blast_all_vs_all/no_self
2958010
# how many matches are 200bp +?
awk '$4>200' blast_all_vs_all/no_self | wc
17867
# of those how many have 80% ID?
awk '$4>200' blast_all_vs_all/no_self | awk '$3>80' | wc
14151
# over 90%
awk '$4>200' blast_all_vs_all/no_self | awk '$3>90' | wc
143
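One possible improvement on the piped version: the three BLAST filters can be combined with `&&` in a single awk program, which also does the counting so that `wc` is not needed. This is only a sketch. The five-row table below is made up, and the variable names `min_len` and `min_id` are illustrative.

```shell
# Hypothetical sample: query, subject, percent identity, alignment length.
tab=$(mktemp)
printf '%s\n' \
  'a a 100.0 500' \
  'a b 95.5 300' \
  'b c 85.0 250' \
  'c d 99.0 150' \
  'd e 70.0 400' > "$tab"

# One awk process: drop self hits, keep length > min_len and identity > min_id,
# then print the number of rows that passed (n + 0 prints 0 if none did).
over80=$(awk -v min_len=200 -v min_id=80 \
  '$1 != $2 && $4 > min_len && $3 > min_id { n++ } END { print n + 0 }' "$tab")
over90=$(awk -v min_len=200 -v min_id=90 \
  '$1 != $2 && $4 > min_len && $3 > min_id { n++ } END { print n + 0 }' "$tab")

echo "$over80 $over90"
rm -f "$tab"
```

Passing the thresholds with `-v` keeps the program text identical between runs, and reading the file once avoids writing the intermediate `no_self` file.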
Thanks Seb.
I think that as you and others come up with one-liners to post they should probably be added to the body of this post rather than in the comments here or in a separate post.
Dan.
“I want to get the number of lines and file name from a directory of sam files
but I only want to count lines in which column six does not equal ‘*’ and do not match ‘@SQ’.”
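awk '$6 != "*" && $1 !~ /@SQ/ {sum++} ENDFILE {print sum+0, FILENAME; sum=0}' sam/* > sam_counts
There are a few cool things to note here:
sum++ #we increment the count as we go through the file
ENDFILE #we issue a command at the end of each file (GNU awk only; a plain END runs just once, after the last file)
sum=0 #we reset the count before the next file starts
FILENAME #this is a super handy built-in awk variable holding the name of the current file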
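In POSIX awk a plain END block runs only once, after the last input file. One portable way to get a count per file is to print and reset the counter whenever FNR goes back to 1. A minimal sketch, using two tiny made-up SAM-like files in a temporary directory (`prev` is an illustrative variable name):

```shell
# Two tiny made-up SAM-like files; only columns 1 and 6 matter for this filter.
dir=$(mktemp -d)
printf '%s\n' '@SQ SN:chr1 LN:100' 'r1 0 chr1 1 60 50M' 'r2 4 * 0 0 *' > "$dir/a.sam"
printf '%s\n' '@SQ SN:chr1 LN:100' 'r3 0 chr1 5 60 50M' 'r4 0 chr1 9 60 50M' > "$dir/b.sam"

counts=$(awk '
  FNR == 1 && NR > 1 { print sum + 0, prev; sum = 0 }   # a new file has just started
  { prev = FILENAME }                                   # remember the current file name
  $6 != "*" && $1 !~ /@SQ/ { sum++ }                    # same filter as the one-liner
  END { if (NR > 0) print sum + 0, prev }               # report the last file
' "$dir"/*.sam)

echo "$counts"
rm -rf "$dir"
```

Each output line is a count followed by the file it belongs to. Empty files are never seen by awk's main loop, so they are not reported by this approach.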