氮（N）循环是地球生态系统中重要的生物地球化学途径的集合，在生态学和环境研究中得到了广泛的关注。目前，鸟枪法宏基因组测序Shotgun metagenome sequencing已被广泛应用于探索负责 N 循环过程的基因家族。NCycDB是一个手动管理的综合数据库，用于从鸟枪法宏基因组测序数据中快速准确地分析 N 循环基因（亚）家族。 NCycDB 总共包含 68 个基因（亚）家族，涵盖 8 个 N 循环过程，分别具有 95% 和 100% 一致性阈值的 84 759 和 219 146 个代表性序列。数据库中还包含了1958 个同源直系同源组Homologous orthology groups的序列，以避免由于“小数据库”问题导致的假阳性分配。

数据库及脚本

下载

通过Git

1	git clone https://github.com/liaochenlanruo/NCyc.git

通过wget

1	wget https://github.com/liaochenlanruo/NCyc/archive/refs/heads/master.zip

配置

通过Git下载的不需要解压，通过Wget下载的需要先解压。
修改NCycProfilter.PL文件的第8-13行中4个依赖软件的安装路径。
将data目录下的NCyc_100_2019Jul.7z解压，将解压得到的NCyc_100_2019Jul重命名为NCyc_100.faa并移动至data目录下。

依赖

Blast
diamond
usearch
Perl

输入文件

序列文件
宏基因组测序得到的Reads文件、组装后的序列文件以及通过基因预测后得到的氨基酸序列文件均可。序列文件可以是压缩的，也可以是解压的。
基因组-序列数对应文件
提供一份文本文档，共包含两列，第一列为样本名称(即序列文件的名字，不带文件后缀名)，第二列为样本包含的序列数量。

预测

1	perl NCycProfiler.PL -d ./ -m diamond -f faa -s prot -si SI.txt -o Ncycle.out.txt

:::info
参数解析：
:::

-d 指定工作目录，即序列文件所在目录。
-m 指定用哪个软件进行序列比对，可选diamond，blast，usearch。
-f 指定序列文件的后缀名，不需要带.。
-s 指定序列类型，氨基酸为prot，核苷酸为nucl。
-si 基因组-序列数对应文件
-rs 随机取样大小，如果不指定，将取包含序列最少的样本的序列数
-o 指定输出的文件名称

结果解析

得到的结果文件是一个表格，第一行为随机取样大小。第一列为参与N循环的基因名，其他列为各样本含有的对应基因的数量。

可视化

可参考本站另一篇文章R语言绘制气泡图Bubb_Plot进行数据可视化。

不带分组

setwd("E:/Researches/Xiaqian/NGS/CleanData/宏基因组数据/Result/NCyc")
data <- read.table("Ncycle.out.txt",header = TRUE, sep = "\t")

library(ggplot2)
library(reshape)
data_melt <- melt(data)
names(data_melt) = c("Genes", "Samples", "Abundances")
data_melt <-as.data.frame(data_melt)

# 做主图
bubble <- ggplot(data_melt[which(data_melt$Abundances>0),], aes(x = Samples, y = Genes, size = Abundances, color = Samples)) + geom_point()

# 修改细节 — 图注，点大小，点shape
bubble_style <- bubble + theme_classic()+
    labs(
        x = "Sediment layers",
        y = "N cycling genes",
        color="Samples", # 颜色图注名
        size="Abundances")+    # 大小图注名
    scale_size(range = c(0.1, 10), breaks = seq(0.1, 0.6, 0.2)) +  #等比修改圆圈大小
    theme(plot.title=element_text(family="Arial",size=8,
                                  color="red",face="italic",
                                  hjust=0.5,lineheight=0.5),
          plot.subtitle = element_text(hjust = 0.5)) + 
    theme(axis.text.x = element_text(angle = 0, hjust = 0.5))

bubble_style

依据下表获得基因的Pathways和Annotation，随后依据Pathways进行分组并绘图。

Pathways	Gene (sub) families	Annotation
Nitrification	amoA_A	Ammonia monooxygenase subunit A (archaea)
^^	amoB_A	Ammonia monooxygenase subunit B (archaea)
^^	amoC_A	Ammonia monooxygenase subunit C (archaea)
^^	amoA_B	Ammonia monooxygenase subunit A (bacteria)
^^	amoB_B	Ammonia monooxygenase subunit B (bacteria)
^^	amoC_B	Ammonia monooxygenase subunit C (bacteria)
^^	hao	Hydroxylamine dehydrogenase
^^	nxrA	Nitrite oxidoreductase, alpha subunit
^^	nxrB	Nitrite oxidoreductase, beta subunit
Denitrification	napA	Periplasmic nitrate reductase NapA
^^	napB	Cytochrome c-type protein NapB
^^	napC	Cytochrome c-type protein NapC
^^	narG	Nitrate reductase
^^	narH	Nitrate reductase
^^	narJ	Nitrate reductase molybdenum cofactor assembly chaperone
^^	narI	Nitrate reductase gamma subunit
^^	nirK	Nitrite reductase (NO-forming)
^^	nirS	Nitrite reductase (NO-forming)
^^	norB	Nitric oxide reductase subunit B
^^	norC	Nitric oxide reductase subunit C
^^	nosZ	Nitrous-oxide reductase
^^	narZ	Nitrate reductase 2, alpha subunit
^^	narY	Nitrate reductase 2, beta subunit
^^	narV	Nitrate reductase 2, gamma subunit
^^	narW	Nitrate reductase 2, delta subunit
Assimilatory nitrate reduction	nasA	Assimilatory nitrate reductase catalytic subunit
^^	nasB	Assimilatory nitrate reductase electron transfer subunit
^^	nirA	Ferredoxin-nitrite reductase
^^	NR	Nitrate reductase (NAD(P)H)
^^	narB	Assimilatory nitrate reductase
^^	narC	Cytochrome b-561
Dissimilatory nitrate reduction	napA	Periplasmic nitrate reductase NapA
^^	napB	Cytochrome c-type protein NapB
^^	napC	Cytochrome c-type protein NapC
^^	narG	Nitrate reductase
^^	narH	Nitrate reductase
^^	narJ	Nitrate reductase molybdenum cofactor assembly chaperone
^^	narI	Nitrate reductase gamma subunit
^^	narZ	Nitrate reductase 2, alpha subunit
^^	narY	Nitrate reductase 2, beta subunit
^^	narV	Nitrate reductase 2, gamma subunit
^^	narW	Nitrate reductase 2, delta subunit
^^	nirB	Nitrite reductase (NADH) large subunit
^^	nirD	Nitrite reductase (NADH) small subunit
^^	nrfA	Nitrite reductase (cytochrome c-552)
^^	nrfB	Cytochrome c-type protein NrfB
^^	nrfC	Protein NrfC
^^	nrfD	Protein NrfD
Nitrogen fixation	anfG	Nitrogenase delta subunit
^^	nifD	Nitrogenase molybdenum-iron protein alpha chain
^^	nifH	Nitrogenase iron protein NifH
^^	nifK	Nitrogenase molybdenum-iron protein beta chain
^^	nifW	Nitrogenase-stabilizing/protective protein
Anammox	hzo	Hydrazine oxidoreductase
^^	hzsA	Hydrazine synthase subunit A
^^	hzsB	Hydrazine synthase subunit B
^^	hzsC	Hydrazine synthase subunit C
^^	hdh	Hydrazine dehydrogenase
Organic degradation and synthesis	ureA	Urease subunit gamma
^^	ureB	Urease subunit beta
^^	ureC	Urease subunit alpha
^^	nao	Nitroalkane oxidase
^^	nmo	Nitronate monooxygenase
^^	gdh_K00260	Glutamate dehydrogenase
^^	gdh_K00261	Glutamate dehydrogenase (NAD(P)+)
^^	gdh_K00262	Glutamate dehydrogenase (NADP+)
^^	gdh_K15371	Glutamate dehydrogenase
^^	gs_K00264	Glutamate synthase (NADPH/NADH)
^^	gs_K00265	Glutamate synthase (NADPH/NADH) large chain
^^	gs_K00266	Glutamate synthase (NADPH/NADH) small chain
^^	gs_K00284	Glutamate synthase (ferredoxin)
^^	glsA	Glutaminase
^^	glnA	Glutamine synthetase
^^	asnB	Asparagine synthase (glutamine-hydrolysing)
^^	ansB	Glutamin-(asparagin-)ase
Others	hcp	Hydroxylamine reductase
^^	pmoA	Particulate methane monooxygenase subunit A
^^	pmoB	Particulate methane monooxygenase subunit B
^^	pmoC	Particulate methane monooxygenase subunit C

带分组

setwd("E:/Researches/Xiaqian/NGS/CleanData/宏基因组数据/Result/NCyc")

data <- read.table("Ncycle.out.txt",header = TRUE, sep = "\t")

library(ggplot2)
library(reshape)
data_melt <- melt(data)
names(data_melt) = c("Genes", "Annotation", "Pathways", "Samples", "Abundances")
data_melt <-as.data.frame(data_melt)
bubble <- ggplot(data_melt[which(data_melt$Abundances>0),], aes(x = Samples, y = Genes, size = Abundances, color = Samples)) + theme_bw()+ labs(x = "Sediment layers", y = "N cycling genes")+ theme(axis.text.x = element_text(angle = 0, colour = "black", vjust = 1, hjust = 1, size = 10), axis.text.y = element_text(size = 10)) +
    theme(panel.grid = element_blank(), panel.border = element_blank()) +
    theme(panel.spacing = unit(.1, "lines")) + 
    theme(plot.margin=unit(c(1, 0, 0, 1), "cm"))+ geom_point()+ facet_grid(Pathways ~ ., drop=TRUE, scale="free",space="free", switch = "y") +
    theme(strip.background = element_rect(fill = "grey95", colour = "white"), strip.text.y.left = element_text(angle=360), strip.text=element_text(size=10))

参考

脚本获取

关注公众号“生信之巅”，聊天窗口回复“18ea”获取下载链接。

敬告：使用文中脚本请引用本文网址，请尊重本人的劳动成果，谢谢！Notice: When you use the scripts in this article, please cite the link of this webpage. Thank you!

了尘兰若的小坑

分享微生物生物信息学分析方法，欢迎加入QQ群交流945751012，不接受群内广告！

数据库及脚本

下载

配置

依赖

输入文件

预测

结果解析

可视化

参考

脚本获取

你的赏识是我前进的动力