氮(N)循环是地球生态系统中重要的生物地球化学途径的集合,在生态学和环境研究中得到了广泛的关注。目前,鸟枪法宏基因组测序已被广泛应用于探索负责 N 循环过程的基因家族。NCycDB是一个手动管理的综合数据库,用于从鸟枪法宏基因组测序数据中快速准确地分析 N 循环基因(亚)家族。 NCycDB 总共包含 68 个基因(亚)家族,涵盖 8 个 N 循环过程,分别具有 95% 和 100% 一致性阈值的 84 759 和 219 146 个代表性序列。数据库中还包含了1958 个同源直系同源组的序列,以避免由于“小数据库”问题导致的假阳性分配。
数据库及脚本
下载
通过Git
1
git clone https://github.com/liaochenlanruo/NCyc.git
通过wget
1
wget https://github.com/liaochenlanruo/NCyc/archive/refs/heads/master.zip
配置
- 通过
Git
下载的不需要解压,通过Wget
下载的需要先解压。 - 修改
NCycProfilter.PL
文件的第8-13行中4个依赖软件的安装路径。 - 将
data
目录下的NCyc_100_2019Jul.7z
解压,将解压得到的NCyc_100_2019Jul
重命名为NCyc_100.faa
并移动至data
目录下。
依赖
- Blast
- diamond
- usearch
- Perl
输入文件
序列文件
宏基因组测序得到的Reads文件、组装后的序列文件以及通过基因预测后得到的氨基酸序列文件均可。序列文件可以是压缩的,也可以是解压的。基因组-序列数对应文件
提供一份文本文档,共包含两列,第一列为样本名称
(即序列文件的名字,不带文件后缀名),第二列为样本包含的序列数量
。
预测
1 | perl NCycProfiler.PL -d ./ -m diamond -f faa -s prot -si SI.txt -o Ncycle.out.txt |
:::info
参数解析:
:::
-d 指定工作目录,即序列文件所在目录。
-m 指定用哪个软件进行序列比对,可选diamond
,blast
,usearch
。
-f 指定序列文件的后缀名,不需要带.
。
-s 指定序列类型,氨基酸为prot
,核苷酸为nucl
。
-si 基因组-序列数对应文件
-rs 随机取样大小,如果不指定,将取包含序列最少的样本的序列数
-o 指定输出的文件名称
结果解析
得到的结果文件是一个表格,第一行为随机取样大小。第一列为参与N循环的基因名,其他列为各样本含有的对应基因的数量。
可视化
可参考本站另一篇文章R语言绘制气泡图Bubb_Plot进行数据可视化。
- 不带分组
1 | setwd("E:/Researches/Xiaqian/NGS/CleanData/宏基因组数据/Result/NCyc") |
依据下表获得基因的Pathways
和Annotation
,随后依据Pathways
进行分组并绘图。
Pathways | Gene (sub) families | Annotation |
---|---|---|
Nitrification | amoA_A | Ammonia monooxygenase subunit A (archaea) |
^^ | amoB_A | Ammonia monooxygenase subunit B (archaea) |
^^ | amoC_A | Ammonia monooxygenase subunit C (archaea) |
^^ | amoA_B | Ammonia monooxygenase subunit A (bacteria) |
^^ | amoB_B | Ammonia monooxygenase subunit B (bacteria) |
^^ | amoC_B | Ammonia monooxygenase subunit C (bacteria) |
^^ | hao | Hydroxylamine dehydrogenase |
^^ | nxrA | Nitrite oxidoreductase, alpha subunit |
^^ | nxrB | Nitrite oxidoreductase, beta subunit |
Denitrification | napA | Periplasmic nitrate reductase NapA |
^^ | napB | Cytochrome c-type protein NapB |
^^ | napC | Cytochrome c-type protein NapC |
^^ | narG | Nitrate reductase |
^^ | narH | Nitrate reductase |
^^ | narJ | Nitrate reductase molybdenum cofactor assembly chaperone |
^^ | narI | Nitrate reductase gamma subunit |
^^ | nirK | Nitrite reductase (NO-forming) |
^^ | nirS | Nitrite reductase (NO-forming) |
^^ | norB | Nitric oxide reductase subunit B |
^^ | norC | Nitric oxide reductase subunit C |
^^ | nosZ | Nitrous-oxide reductase |
^^ | narZ | Nitrate reductase 2, alpha subunit |
^^ | narY | Nitrate reductase 2, beta subunit |
^^ | narV | Nitrate reductase 2, gamma subunit |
^^ | narW | Nitrate reductase 2, delta subunit |
Assimilatory nitrate reduction | nasA | Assimilatory nitrate reductase catalytic subunit |
^^ | nasB | Assimilatory nitrate reductase electron transfer subunit |
^^ | nirA | Ferredoxin-nitrite reductase |
^^ | NR | Nitrate reductase (NAD(P)H) |
^^ | narB | Assimilatory nitrate reductase |
^^ | narC | Cytochrome b-561 |
Dissimilatory nitrate reduction | napA | Periplasmic nitrate reductase NapA |
^^ | napB | Cytochrome c-type protein NapB |
^^ | napC | Cytochrome c-type protein NapC |
^^ | narG | Nitrate reductase |
^^ | narH | Nitrate reductase |
^^ | narJ | Nitrate reductase molybdenum cofactor assembly chaperone |
^^ | narI | Nitrate reductase gamma subunit |
^^ | narZ | Nitrate reductase 2, alpha subunit |
^^ | narY | Nitrate reductase 2, beta subunit |
^^ | narV | Nitrate reductase 2, gamma subunit |
^^ | narW | Nitrate reductase 2, delta subunit |
^^ | nirB | Nitrite reductase (NADH) large subunit |
^^ | nirD | Nitrite reductase (NADH) small subunit |
^^ | nrfA | Nitrite reductase (cytochrome c-552) |
^^ | nrfB | Cytochrome c-type protein NrfB |
^^ | nrfC | Protein NrfC |
^^ | nrfD | Protein NrfD |
Nitrogen fixation | anfG | Nitrogenase delta subunit |
^^ | nifD | Nitrogenase molybdenum-iron protein alpha chain |
^^ | nifH | Nitrogenase iron protein NifH |
^^ | nifK | Nitrogenase molybdenum-iron protein beta chain |
^^ | nifW | Nitrogenase-stabilizing/protective protein |
Anammox | hzo | Hydrazine oxidoreductase |
^^ | hzsA | Hydrazine synthase subunit A |
^^ | hzsB | Hydrazine synthase subunit B |
^^ | hzsC | Hydrazine synthase subunit C |
^^ | hdh | Hydrazine dehydrogenase |
Organic degradation and synthesis | ureA | Urease subunit gamma |
^^ | ureB | Urease subunit beta |
^^ | ureC | Urease subunit alpha |
^^ | nao | Nitroalkane oxidase |
^^ | nmo | Nitronate monooxygenase |
^^ | gdh_K00260 | Glutamate dehydrogenase |
^^ | gdh_K00261 | Glutamate dehydrogenase (NAD(P)+) |
^^ | gdh_K00262 | Glutamate dehydrogenase (NADP+) |
^^ | gdh_K15371 | Glutamate dehydrogenase |
^^ | gs_K00264 | Glutamate synthase (NADPH/NADH) |
^^ | gs_K00265 | Glutamate synthase (NADPH/NADH) large chain |
^^ | gs_K00266 | Glutamate synthase (NADPH/NADH) small chain |
^^ | gs_K00284 | Glutamate synthase (ferredoxin) |
^^ | glsA | Glutaminase |
^^ | glnA | Glutamine synthetase |
^^ | asnB | Asparagine synthase (glutamine-hydrolysing) |
^^ | ansB | Glutamin-(asparagin-)ase |
Others | hcp | Hydroxylamine reductase |
^^ | pmoA | Particulate methane monooxygenase subunit A |
^^ | pmoB | Particulate methane monooxygenase subunit B |
^^ | pmoC | Particulate methane monooxygenase subunit C |
- 带分组
1 | setwd("E:/Researches/Xiaqian/NGS/CleanData/宏基因组数据/Result/NCyc") |
参考
- NCycDB: a curated integrative database for fast and accurate metagenomic profiling of nitrogen cycling genes
- GitHub
脚本获取
关注公众号“生信之巅”,聊天窗口回复“18ea”获取下载链接。
敬告:使用文中脚本请引用本文网址,请尊重本人的劳动成果,谢谢!