cat query.fasta | parallel --block 100k --recstart '>' --pipe \
blastp -evalue 0.01 -outfmt 6 -db db.fa -query - > result.tsv
This splits the FASTA file into chunks of about 100 kilobytes each, while making sure that every chunk is cut at a record boundary (i.e. starts with a ">"), so no record is split across two chunks.
2 comments:
Nice feature, but I can't get this syntax to occupy more than 100% of a CPU (looking at "top"). Normally I see GNU parallel occupy up to 1200% of a CPU (12 cores).
Perhaps it has something to do with the size of the FASTA file? The command line I posted takes 100k chunks of the file, so if you have a smaller file, it won't split it. You could try something like "wc" instead of blastp to see how many calls are made.
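A minimal sketch of that check, assuming the `parallel` on your PATH is GNU parallel. The demo file and the variable names (`fasta`, `records`, `bytes`) are made up for illustration; point `fasta` at your own query file instead. If the byte count is below the block size, expect a single chunk and therefore a single blastp process.

```shell
# Hypothetical demo input; replace with your own query file.
fasta=$(mktemp)
printf '>seq1\nMKV\n>seq2\nMLA\n' > "$fasta"

# How many records and bytes are there? A file smaller than the
# --block size cannot be split into more than one chunk.
records=$(grep -c '^>' "$fasta")
bytes=$(( $(wc -c < "$fasta") ))
echo "records=$records bytes=$bytes"

# Same split as in the post, but with wc in place of blastp.
# Each output line corresponds to one chunk, i.e. one job.
# Skipped if no parallel is installed; assumes GNU parallel.
if command -v parallel >/dev/null 2>&1; then
    cat "$fasta" | parallel --block 100k --recstart '>' --pipe wc -l || true
fi

rm -f "$fasta"
```

If the last command prints only one line for your real file, lowering `--block` should produce more chunks and let more jobs run at once.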