本文介绍如何从细菌/古菌中预测前噬菌体。
软件选择
软件安装
安装主程序
推荐使用conda
1
2conda install phispy
conda install perl-bioperl
下载模型
1 | wget https://ftp.ncbi.nlm.nih.gov/pub/kristensen/pVOGs/downloads/All/AllvogHMMprofiles.tar.gz |
输入文件
- 需要GenBank格式的文件,可通过RAST server或PROKKA获得。
运行软件
1 | PhiSpy.py genbank_file -o output_directory --phage_genes 1 --color --threads 6 --phmms pVOGs.hmm --min_contig_size 5000 --output_choice 1 |
- genbank file: 输入文件
- output directory: 输出目录
- --phage_genes: 区域内含有的被鉴定为噬菌体基因的最小数量。默认采用严格模式,即在每个前噬菌体区域必须鉴定得到两个或更多个phage基因。调高该参数的值将会减少预测到的前噬菌体的数量,反之,减小参数值,将会得到更多的移动元件。如果该参数设置为0,将会预测plasmids, integrons, and pathogenicity islands. Somewhat unexpectedly, it will also identify the ribosomal RNA operons as likely being mobile since they are unlike the host’s backbone!
- --color: 根据CDs的功能对其着色,使用Artemis查看
- --threads: 线程数
- --min_contig_size: 低于阈值的序列将被忽略,不对其进行预测
- --output_choice: 控制输出的文件类型,详见官网
报错处理
由于序列ID引发的错误
- 错误信息如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21Traceback (most recent call last):
File "/home/liu/miniconda3/envs/component/bin/PhiSpy.py", line 10, in <module>
sys.exit(run())
File "/home/liu/miniconda3/envs/component/lib/python3.7/site-packages/PhiSpyModules/main.py", line 122, in run
main(sys.argv)
File "/home/liu/miniconda3/envs/component/lib/python3.7/site-packages/PhiSpyModules/main.py", line 44, in main
args_parser.record = PhiSpyModules.SeqioFilter(filter(lambda x: len(x.seq) > args_parser.min_contig_size, SeqIO.parse(handle, "genbank")))
File "/home/liu/miniconda3/envs/component/lib/python3.7/site-packages/PhiSpyModules/seqio_filter.py", line 33, in __init__
for n, item in enumerate(content):
File "/home/liu/miniconda3/envs/component/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 74, in __next__
return next(self.records)
File "/home/liu/miniconda3/envs/component/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 516, in parse_records
record = self.parse(handle, do_features)
File "/home/liu/miniconda3/envs/component/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 499, in parse
if self.feed(handle, consumer, do_features):
File "/home/liu/miniconda3/envs/component/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 465, in feed
self._feed_first_line(consumer, self.line)
File "/home/liu/miniconda3/envs/component/lib/python3.7/site-packages/Bio/GenBank/Scanner.py", line 1571, in _feed_first_line
raise ValueError("Did not recognise the LOCUS line layout:\n" + line)
ValueError: Did not recognise the LOCUS line layout:
LOCUS NODE_52_length_15591_cov_14.37480715591 bp- 解决方案
- 将sequence ID缩短
批处理并整合结果
撰写脚本“run_PhiSpy.pl”。
1 | #!/usr/bin/perl |
将“run_PhiSpy.pl”和后缀名为“.gbk”的GenBank格式的文件,以及“pVOGs.hmm”放在同一目录下,运行如下命令:
1 | perl run_PhiSpy.pl |
得到两个汇总文件:
- All.prophages.txt: 包含prophage信息的文件
- All.prophages.seq: 包含prophage序列的文件
脚本获取
关注公众号“生信之巅”,聊天窗口回复“53f4”获取下载链接。
敬告:使用文中脚本请引用本文网址,请尊重本人的劳动成果,谢谢!