氮(N)循环是地球生态系统中重要的生物地球化学途径的集合,在生态学和环境研究中得到了广泛的关注。目前,鸟枪法宏基因组测序已被广泛应用于探索负责 N 循环过程的基因家族。NCycDB 是一个手动管理的综合数据库,用于从鸟枪法宏基因组测序数据中快速准确地分析 N 循环基因(亚)家族。 NCycDB 总共包含 68 个基因(亚)家族,涵盖 8 个 N 循环过程,分别具有 95% 和 100% 一致性阈值的 84 759 和 219 146 个代表性序列。数据库中还包含了 1958 个同源直系同源组的序列,以避免由于 “小数据库” 问题导致的假阳性分配。
数据库及脚本
下载
通过 Git
git clone https://github.com/liaochenlanruo/NCyc.git
通过 wget
wget https://github.com/liaochenlanruo/NCyc/archive/refs/heads/master.zip
配置
- 通过
Git
下载的不需要解压,通过Wget
下载的需要先解压。 - 修改
NCycProfilter.PL
文件的第 8-13 行中 4 个依赖软件的安装路径。 - 将
data
目录下的NCyc_100_2019Jul.7z
解压,将解压得到的NCyc_100_2019Jul
重命名为NCyc_100.faa
并移动至data
目录下。
依赖
- Blast
- diamond
- usearch
- Perl
输入文件
序列文件
宏基因组测序得到的 Reads 文件、组装后的序列文件以及通过基因预测后得到的氨基酸序列文件均可。序列文件可以是压缩的,也可以是解压的。基因组 - 序列数对应文件
提供一份文本文档,共包含两列,第一列为样本名称
(即序列文件的名字,不带文件后缀名),第二列为样本包含的序列数量
。
预测
perl NCycProfiler.PL -d ./ -m diamond -f faa -s prot -si SI.txt -o Ncycle.out.txt
参数解析:
-d 指定工作目录,即序列文件所在目录。
-m 指定用哪个软件进行序列比对,可选 diamond
, blast
, usearch
。
-f 指定序列文件的后缀名,不需要带 .
。
-s 指定序列类型,氨基酸为 prot
,核苷酸为 nucl
。
-si 基因组 - 序列数对应文件
- rs 随机取样大小,如果不指定,将取包含序列最少的样本的序列数
- o 指定输出的文件名称
结果解析
得到的结果文件是一个表格,第一行为随机取样大小。第一列为参与 N 循环的基因名,其他列为各样本含有的对应基因的数量。
可视化
可参考本站另一篇文章 R 语言绘制气泡图 Bubb_Plot 进行数据可视化。
- 不带分组
setwd("E:/Researches/Xiaqian/NGS/CleanData/宏基因组数据/Result/NCyc")
data <- read.table("Ncycle.out.txt",header = TRUE, sep = "\t")
library(ggplot2)
library(reshape)
data_melt <- melt(data)
names(data_melt) = c("Genes", "Samples", "Abundances")
data_melt <-as.data.frame(data_melt)
# 做主图
bubble <- ggplot(data_melt[which(data_melt$Abundances>0),], aes(x = Samples, y = Genes, size = Abundances, color = Samples)) + geom_point()
# 修改细节 — 图注,点大小,点shape
bubble_style <- bubble + theme_classic()+
labs(
x = "Sediment layers",
y = "N cycling genes",
color="Samples", # 颜色图注名
size="Abundances")+ # 大小图注名
scale_size(range = c(0.1, 10), breaks = seq(0.1, 0.6, 0.2)) + #等比修改圆圈大小
theme(plot.title=element_text(family="Arial",size=8,
color="red",face="italic",
hjust=0.5,lineheight=0.5),
plot.subtitle = element_text(hjust = 0.5)) +
theme(axis.text.x = element_text(angle = 0, hjust = 0.5))
bubble_style
依据下表获得基因的 Pathways
和 Annotation
,随后依据 Pathways
进行分组并绘图。
Pathways | Gene (sub) families | Annotation |
---|---|---|
Nitrification | amoA_A | Ammonia monooxygenase subunit A (archaea) |
amoB_A | Ammonia monooxygenase subunit B (archaea) | |
amoC_A | Ammonia monooxygenase subunit C (archaea) | |
amoA_B | Ammonia monooxygenase subunit A (bacteria) | |
amoB_B | Ammonia monooxygenase subunit B (bacteria) | |
amoC_B | Ammonia monooxygenase subunit C (bacteria) | |
hao | Hydroxylamine dehydrogenase | |
nxrA | Nitrite oxidoreductase, alpha subunit | |
nxrB | Nitrite oxidoreductase, beta subunit | |
Denitrification | napA | Periplasmic nitrate reductase NapA |
napB | Cytochrome c-type protein NapB | |
napC | Cytochrome c-type protein NapC | |
narG | Nitrate reductase | |
narH | Nitrate reductase | |
narJ | Nitrate reductase molybdenum cofactor assembly chaperone | |
narI | Nitrate reductase gamma subunit | |
nirK | Nitrite reductase (NO-forming) | |
nirS | Nitrite reductase (NO-forming) | |
norB | Nitric oxide reductase subunit B | |
norC | Nitric oxide reductase subunit C | |
nosZ | Nitrous-oxide reductase | |
narZ | Nitrate reductase 2, alpha subunit | |
narY | Nitrate reductase 2, beta subunit | |
narV | Nitrate reductase 2, gamma subunit | |
narW | Nitrate reductase 2, delta subunit | |
Assimilatory nitrate reduction | nasA | Assimilatory nitrate reductase catalytic subunit |
nasB | Assimilatory nitrate reductase electron transfer subunit | |
nirA | Ferredoxin-nitrite reductase | |
NR | Nitrate reductase (NAD(P)H) | |
narB | Assimilatory nitrate reductase | |
narC | Cytochrome b-561 | |
Dissimilatory nitrate reduction | napA | Periplasmic nitrate reductase NapA |
napB | Cytochrome c-type protein NapB | |
napC | Cytochrome c-type protein NapC | |
narG | Nitrate reductase | |
narH | Nitrate reductase | |
narJ | Nitrate reductase molybdenum cofactor assembly chaperone | |
narI | Nitrate reductase gamma subunit | |
narZ | Nitrate reductase 2, alpha subunit | |
narY | Nitrate reductase 2, beta subunit | |
narV | Nitrate reductase 2, gamma subunit | |
narW | Nitrate reductase 2, delta subunit | |
nirB | Nitrite reductase (NADH) large subunit | |
nirD | Nitrite reductase (NADH) small subunit | |
nrfA | Nitrite reductase (cytochrome c-552) | |
nrfB | Cytochrome c-type protein NrfB | |
nrfC | Protein NrfC | |
nrfD | Protein NrfD | |
Nitrogen fixation | anfG | Nitrogenase delta subunit |
nifD | Nitrogenase molybdenum-iron protein alpha chain | |
nifH | Nitrogenase iron protein NifH | |
nifK | Nitrogenase molybdenum-iron protein beta chain | |
nifW | Nitrogenase-stabilizing/protective protein | |
Anammox | hzo | Hydrazine oxidoreductase |
hzsA | Hydrazine synthase subunit A | |
hzsB | Hydrazine synthase subunit B | |
hzsC | Hydrazine synthase subunit C | |
hdh | Hydrazine dehydrogenase | |
Organic degradation and synthesis | ureA | Urease subunit gamma |
ureB | Urease subunit beta | |
ureC | Urease subunit alpha | |
nao | Nitroalkane oxidase | |
nmo | Nitronate monooxygenase | |
gdh_K00260 | Glutamate dehydrogenase | |
gdh_K00261 | Glutamate dehydrogenase (NAD(P)+) | |
gdh_K00262 | Glutamate dehydrogenase (NADP+) | |
gdh_K15371 | Glutamate dehydrogenase | |
gs_K00264 | Glutamate synthase (NADPH/NADH) | |
gs_K00265 | Glutamate synthase (NADPH/NADH) large chain | |
gs_K00266 | Glutamate synthase (NADPH/NADH) small chain | |
gs_K00284 | Glutamate synthase (ferredoxin) | |
glsA | Glutaminase | |
glnA | Glutamine synthetase | |
asnB | Asparagine synthase (glutamine-hydrolysing) | |
ansB | Glutamin-(asparagin-)ase | |
Others | hcp | Hydroxylamine reductase |
pmoA | Particulate methane monooxygenase subunit A | |
pmoB | Particulate methane monooxygenase subunit B | |
pmoC | Particulate methane monooxygenase subunit C |
- 带分组
setwd("E:/Researches/Xiaqian/NGS/CleanData/宏基因组数据/Result/NCyc")
data <- read.table("Ncycle.out.txt",header = TRUE, sep = "\t")
library(ggplot2)
library(reshape)
data_melt <- melt(data)
names(data_melt) = c("Genes", "Annotation", "Pathways", "Samples", "Abundances")
data_melt <-as.data.frame(data_melt)
bubble <- ggplot(data_melt[which(data_melt$Abundances>0),], aes(x = Samples, y = Genes, size = Abundances, color = Samples)) + theme_bw()+ labs(x = "Sediment layers", y = "N cycling genes")+ theme(axis.text.x = element_text(angle = 0, colour = "black", vjust = 1, hjust = 1, size = 10), axis.text.y = element_text(size = 10)) +
theme(panel.grid = element_blank(), panel.border = element_blank()) +
theme(panel.spacing = unit(.1, "lines")) +
theme(plot.margin=unit(c(1, 0, 0, 1), "cm"))+ geom_point()+ facet_grid(Pathways ~ ., drop=TRUE, scale="free",space="free", switch = "y") +
theme(strip.background = element_rect(fill = "grey95", colour = "white"), strip.text.y.left = element_text(angle=360), strip.text=element_text(size=10))
参考
- NCycDB: a curated integrative database for fast and accurate metagenomic profiling of nitrogen cycling genes
- GitHub
脚本获取
关注公众号 “生信之巅”,聊天窗口回复 “18ea” 获取下载链接。
![]() | ![]() |
敬告:使用文中脚本请引用本文网址,请尊重本人的劳动成果,谢谢!