氮(N)循环是地球生态系统中重要的生物地球化学途径的集合,在生态学和环境研究中得到了广泛的关注。目前,鸟枪法宏基因组测序已被广泛应用于探索负责 N 循环过程的基因家族。NCycDB是一个手动管理的综合数据库,用于从鸟枪法宏基因组测序数据中快速准确地分析 N 循环基因(亚)家族。 NCycDB 总共包含 68 个基因(亚)家族,涵盖 8 个 N 循环过程,分别具有 95% 和 100% 一致性阈值的 84 759 和 219 146 个代表性序列。数据库中还包含了1958 个同源直系同源组的序列,以避免由于“小数据库”问题导致的假阳性分配。
数据库及脚本
下载
通过Git
1
git clone https://github.com/liaochenlanruo/NCyc.git
通过wget
1
wget https://github.com/liaochenlanruo/NCyc/archive/refs/heads/master.zip
配置
- 通过
Git下载的不需要解压,通过Wget下载的需要先解压。 - 修改
NCycProfilter.PL文件的第8-13行中4个依赖软件的安装路径。 - 将
data目录下的NCyc_100_2019Jul.7z解压,将解压得到的NCyc_100_2019Jul重命名为NCyc_100.faa并移动至data目录下。
依赖
- Blast
- diamond
- usearch
- Perl
输入文件
序列文件
宏基因组测序得到的Reads文件、组装后的序列文件以及通过基因预测后得到的氨基酸序列文件均可。序列文件可以是压缩的,也可以是解压的。基因组-序列数对应文件
提供一份文本文档,共包含两列,第一列为样本名称(即序列文件的名字,不带文件后缀名),第二列为样本包含的序列数量。
预测
1 | perl NCycProfiler.PL -d ./ -m diamond -f faa -s prot -si SI.txt -o Ncycle.out.txt |
:::info
参数解析:
:::
-d 指定工作目录,即序列文件所在目录。
-m 指定用哪个软件进行序列比对,可选diamond,blast,usearch。
-f 指定序列文件的后缀名,不需要带.。
-s 指定序列类型,氨基酸为prot,核苷酸为nucl。
-si 基因组-序列数对应文件
-rs 随机取样大小,如果不指定,将取包含序列最少的样本的序列数
-o 指定输出的文件名称
结果解析
得到的结果文件是一个表格,第一行为随机取样大小。第一列为参与N循环的基因名,其他列为各样本含有的对应基因的数量。
可视化
可参考本站另一篇文章R语言绘制气泡图Bubb_Plot进行数据可视化。
- 不带分组
1 | setwd("E:/Researches/Xiaqian/NGS/CleanData/宏基因组数据/Result/NCyc") |
依据下表获得基因的Pathways和Annotation,随后依据Pathways进行分组并绘图。
| Pathways | Gene (sub) families | Annotation |
|---|---|---|
| Nitrification | amoA_A | Ammonia monooxygenase subunit A (archaea) |
| ^^ | amoB_A | Ammonia monooxygenase subunit B (archaea) |
| ^^ | amoC_A | Ammonia monooxygenase subunit C (archaea) |
| ^^ | amoA_B | Ammonia monooxygenase subunit A (bacteria) |
| ^^ | amoB_B | Ammonia monooxygenase subunit B (bacteria) |
| ^^ | amoC_B | Ammonia monooxygenase subunit C (bacteria) |
| ^^ | hao | Hydroxylamine dehydrogenase |
| ^^ | nxrA | Nitrite oxidoreductase, alpha subunit |
| ^^ | nxrB | Nitrite oxidoreductase, beta subunit |
| Denitrification | napA | Periplasmic nitrate reductase NapA |
| ^^ | napB | Cytochrome c-type protein NapB |
| ^^ | napC | Cytochrome c-type protein NapC |
| ^^ | narG | Nitrate reductase |
| ^^ | narH | Nitrate reductase |
| ^^ | narJ | Nitrate reductase molybdenum cofactor assembly chaperone |
| ^^ | narI | Nitrate reductase gamma subunit |
| ^^ | nirK | Nitrite reductase (NO-forming) |
| ^^ | nirS | Nitrite reductase (NO-forming) |
| ^^ | norB | Nitric oxide reductase subunit B |
| ^^ | norC | Nitric oxide reductase subunit C |
| ^^ | nosZ | Nitrous-oxide reductase |
| ^^ | narZ | Nitrate reductase 2, alpha subunit |
| ^^ | narY | Nitrate reductase 2, beta subunit |
| ^^ | narV | Nitrate reductase 2, gamma subunit |
| ^^ | narW | Nitrate reductase 2, delta subunit |
| Assimilatory nitrate reduction | nasA | Assimilatory nitrate reductase catalytic subunit |
| ^^ | nasB | Assimilatory nitrate reductase electron transfer subunit |
| ^^ | nirA | Ferredoxin-nitrite reductase |
| ^^ | NR | Nitrate reductase (NAD(P)H) |
| ^^ | narB | Assimilatory nitrate reductase |
| ^^ | narC | Cytochrome b-561 |
| Dissimilatory nitrate reduction | napA | Periplasmic nitrate reductase NapA |
| ^^ | napB | Cytochrome c-type protein NapB |
| ^^ | napC | Cytochrome c-type protein NapC |
| ^^ | narG | Nitrate reductase |
| ^^ | narH | Nitrate reductase |
| ^^ | narJ | Nitrate reductase molybdenum cofactor assembly chaperone |
| ^^ | narI | Nitrate reductase gamma subunit |
| ^^ | narZ | Nitrate reductase 2, alpha subunit |
| ^^ | narY | Nitrate reductase 2, beta subunit |
| ^^ | narV | Nitrate reductase 2, gamma subunit |
| ^^ | narW | Nitrate reductase 2, delta subunit |
| ^^ | nirB | Nitrite reductase (NADH) large subunit |
| ^^ | nirD | Nitrite reductase (NADH) small subunit |
| ^^ | nrfA | Nitrite reductase (cytochrome c-552) |
| ^^ | nrfB | Cytochrome c-type protein NrfB |
| ^^ | nrfC | Protein NrfC |
| ^^ | nrfD | Protein NrfD |
| Nitrogen fixation | anfG | Nitrogenase delta subunit |
| ^^ | nifD | Nitrogenase molybdenum-iron protein alpha chain |
| ^^ | nifH | Nitrogenase iron protein NifH |
| ^^ | nifK | Nitrogenase molybdenum-iron protein beta chain |
| ^^ | nifW | Nitrogenase-stabilizing/protective protein |
| Anammox | hzo | Hydrazine oxidoreductase |
| ^^ | hzsA | Hydrazine synthase subunit A |
| ^^ | hzsB | Hydrazine synthase subunit B |
| ^^ | hzsC | Hydrazine synthase subunit C |
| ^^ | hdh | Hydrazine dehydrogenase |
| Organic degradation and synthesis | ureA | Urease subunit gamma |
| ^^ | ureB | Urease subunit beta |
| ^^ | ureC | Urease subunit alpha |
| ^^ | nao | Nitroalkane oxidase |
| ^^ | nmo | Nitronate monooxygenase |
| ^^ | gdh_K00260 | Glutamate dehydrogenase |
| ^^ | gdh_K00261 | Glutamate dehydrogenase (NAD(P)+) |
| ^^ | gdh_K00262 | Glutamate dehydrogenase (NADP+) |
| ^^ | gdh_K15371 | Glutamate dehydrogenase |
| ^^ | gs_K00264 | Glutamate synthase (NADPH/NADH) |
| ^^ | gs_K00265 | Glutamate synthase (NADPH/NADH) large chain |
| ^^ | gs_K00266 | Glutamate synthase (NADPH/NADH) small chain |
| ^^ | gs_K00284 | Glutamate synthase (ferredoxin) |
| ^^ | glsA | Glutaminase |
| ^^ | glnA | Glutamine synthetase |
| ^^ | asnB | Asparagine synthase (glutamine-hydrolysing) |
| ^^ | ansB | Glutamin-(asparagin-)ase |
| Others | hcp | Hydroxylamine reductase |
| ^^ | pmoA | Particulate methane monooxygenase subunit A |
| ^^ | pmoB | Particulate methane monooxygenase subunit B |
| ^^ | pmoC | Particulate methane monooxygenase subunit C |
- 带分组
1 | setwd("E:/Researches/Xiaqian/NGS/CleanData/宏基因组数据/Result/NCyc") |
参考
- NCycDB: a curated integrative database for fast and accurate metagenomic profiling of nitrogen cycling genes
- GitHub
脚本获取
关注公众号“生信之巅”,聊天窗口回复“18ea”获取下载链接。
![]() |
![]() |
敬告:使用文中脚本请引用本文网址,请尊重本人的劳动成果,谢谢!

