Uploading Gene Clusters to NCBI: Using table2asn
Published: 2024-08-28 | Category: Bioinformatics

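Submitting a gene cluster to NCBI requires a file in sqn format. In an earlier article, "Uploading Gene Clusters to NCBI: Using tbl2asn", I described how to generate the sqn file with tbl2asn. Unfortunately, NCBI no longer offers tbl2asn for download; its replacement is table2asn. This article explains how to use table2asn.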

Software installation

Downloading table2asn

# This downloads the Linux build; for Windows and macOS, download from https://ftp.ncbi.nlm.nih.gov/asn1-converters/by_program/table2asn/
wget https://ftp.ncbi.nlm.nih.gov/asn1-converters/by_program/table2asn/linux64.table2asn.gz

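Installation

Decompress the downloaded file to obtain the executable, rename it to table2asn, and add its directory to the PATH environment variable. How to set environment variables depends on your shell and system, so consult the relevant documentation.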
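The installation steps above can be sketched as a small shell helper. This is only an illustration: the function name install_table2asn and the install directory $HOME/bin are assumptions, not part of the NCBI distribution.

```shell
# Hypothetical helper: unpack a downloaded table2asn archive into a bin directory.
install_table2asn() {
    archive="$1"    # e.g. linux64.table2asn.gz
    bindir="$2"     # e.g. "$HOME/bin" (assumed install location)
    mkdir -p "$bindir" || return 1
    # gunzip -c writes the decompressed program to stdout and keeps the archive
    gunzip -c "$archive" > "$bindir/table2asn" || return 1
    # mark the program as executable
    chmod +x "$bindir/table2asn"
}

# Usage (paths are assumptions):
#   install_table2asn linux64.table2asn.gz "$HOME/bin"
#   export PATH="$HOME/bin:$PATH"   # put this line in ~/.bashrc to make it permanent
```

Writing the decompressed file straight to its final name covers the "decompress and rename" step in one command.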

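File preparation

table2asn needs three files to generate the sqn file:

  • File 1: the genome sequence in FASTA format. table2asn expects the suffix .fsa by default; this example uses Toyoncin.fas, which is passed explicitly with -i.

Note that the header (definition line) must include the bracketed source modifiers, followed by the descriptive title.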

>Toyoncin_biosynthesis_gene_cluster [organism=Bacillus toyonensis] [strain=XIN-YC13] [topology=linear] [moltype=DNA] [tech=wgs] [gcode=11] [country=China] Bacillus toyonensis strain XIN-YC13 Toyoncin biosynthesis gene cluster, complete sequence
ttaaaa taatttaata
gggaagtttt ttagttgttt tggactcttc ccaaacactg ctttaagtgt tggattaaca
tcatccctat tccccgaaaa cataatgtga ggatttatga ataatgcata tgctctaaca
ttattatcat caacaccact ctctgaacga gccataatac ccttatcaat taattttcta
accaatggac taactttagt ttcatgtctt ccaatttttt tagctaattc tcgctgagtt
aatgggattt gttctttcga attaatatca ttaactaaac aattacttaa aaaacctaca
cacattgaaa tatctactaa aaataccttc tcagcatttg ttaaataatc aatttcgaac
aaatactgga tattttgttg aataatctga acaaacttcg ctttattttt cactttacgc
tcaggaacta atttcattcc tcttgaacga gcttttgatt gaagtttatt tgctaaatac
atctcttctt cagacaagac ctttaaatcc tcaatatctc tcaatcttga atttttttca
gcttgttcta agttgataaa ctttgacata ttctttttgc tcctcttttc taagattttc
aactagagaa ggaaaaaatt ttatgttatg attcctgtag aatttacaat tcaatatgta
caaaagaact ccccttttct aattgatagt ttggtcgctt tcaattataa tacaagggga
ttttttacat cttaaaattt ttcatttttg aatcaatccc tgaaaatata aagaacacat
cacataaatt attcttaata ttttataatc gaaaaaataa taggaataaa gaaaaatact
gcaataaata tattcatctg tttcttactc aaaccggcca ctatatttaa tcccattcct
ataataatta attcccaaat tgaaaacact tcaaatttac tacaaattat atatagtaat
gtacctggtt caaatattga acccaaacta gtatacgtta ctatttctcc tcctataaat
agtgttaata atgtattaat taatttacct aaaatagaaa ttacactagc aaatattgta
atagatacta actttttata agaaacatct ttactcatca gcatcattac aatctttaaa
ataatccccc aaataaaagg tgtaattaaa gcaatgaaaa tcgatgcaaa acctcctaac
atcatttggg aaacaagggg tatttccata tctgcaaata cttctttttg aattttaacc
aattctggat tgctatgtct tgcatataca gataaaatcc ctattattgc ttgtataact
gataaataca taagaggaaa ccatatcgga ctaattattt tcatacgctc gaattcagaa
ataggagatg taatcataaa aattagagat ggtttttcat aattgttttt ttctttattc
actactaaac tattatccat atattaacac cttctttttt tattcataac gtaatgcttc
aattggatct aattttgcag ccttattggc tggaatcaat ccaaatataa taccaagcga
catcgaaaat aatacgccac ccacaacaac ttcccatgaa acaagaggcg gccattttgc
aaatgtggac acaatgtacg ctccacaata accaagtcca atcccaatca atccaccaag
aagtgtcaac ataattgctt caattaaaaa ttgcaacaaa attttaccac gcgttgctcc
aagtgcttta cgtaccccaa tctcacgtgt acgctctgtt acagaaacaa gcatgatatt
cataactcca attccgccta caactaaaga aatacttgca atacctgcaa taatcattgt
cataatatta gtaactttag aaataccttt ttggatttct tctaaattta caatttcata
tttcccttta aactcttcag attgtctatc atttaataat tttactccct tttttccagc
tgtttgtaat tgatcaaccc ctattgcttg aattgtaata gattgttgag agttatcatc
tccatataat attggccata ttgaaagtgg tattaaaatt tctgacattc caaaaccaag
ctcttcatct cctgaactga atagaccaat aatttgaagt ggctgacctt taatttctat
aattttacca atgactgatt catgctcatt aggaaataac tctttcacta atgtttgatt
aaccattatt acattattac cttgcatcaa atcatcttca ttaagagaac gacctttctc
tattttcatt ttagtcatat taaaatattc ttttgtaata ccatttatat tagttacaac
ctttttatca tcaccaatta atgtctctgt actagagttt tgaacaatta catttttaat
ttcttttatc ttttttaact caaaaagatc ttcttcactt acagatggtt ttttgtcatt
catagatcct gttgttaata actcattaat atcttcttta tatgtaatcg gaatagtgtt
attgccagaa gcggtaaatt gtgatttaag cattgcttct ccacctttac caatggctac
aacagtaata atagaaccta caccaataat aattccaagc atcgtaagag ctgagcgcag
tttatgagct aaaatagaag ataaggcaat ttttatacta tctaataaac tcataccata
caccttctat cttctgtaat tttcccatct cgcaatatga tgcgacgtga agaataagct
gctacctctt cttcatgtgt aaccataacg attgtcgtac cttctgcatt taacttcgta
aagatatcca taacttgtgc accagacttc gtatcaagcg caccagttgg ctcatcagcc
ataataaacg ttggattatt cgcaatcgat cttgcaatag caacacgctg cttctgtcca
cctgacagct cactaggtaa atgatgtact ctatccgcta atccaacttt cccaagcgct
tcgagcgctc tttgacgacg ctctgctttc ttcactccac cataaatcag tggtaattca
acgttttcca ctgcggaaag gcgcggcaat aaattaaaat gctggaacac aaaaccgata
tattcattac gaattaaagc aagttttgac tcatctgctg ttaaaatatt cacatcattc
agcatatatt cgccttctgt tggacgatct aaacaaccga taatattcat aagagttgat
ttaccagaac cagacggtcc cataattgaa acaaattcac caccttgaat agttaaacta
ataccgtgca aaataggaac cgccattttt ccttgataat acgttttagc aatattattt
aacgtaatca tttctctttc acttccattc cgtcatatac gttgtcggaa ggatttttaa
ccaccttttg ccccactgtt gcgccctcta caatctctgt ccaatctcca tcagtagcac
cttttttcac attttgttta cgaagcttac ctttctcttc gatatataca aatgcatcat
cgcctttttc aacaatactc ttacttggaa cagcaatcat tcttttattc tctaaattta
cttgtaacga aacatgataa cctggagata aaccatcttg actatcaaga cttgctttat
atgtatattg agacatattt tgagtcactt cccccatgcc atcagcttga gccatttcta
cacttgttgg gaactcactt acctctgtaa tcttccctgt ccactttttc ttactatttg
ctttcgcagt tacagtaaac gtttgatcct tttgaatttg cgacttctga agctcagtta
atgttccttg aatttggaat ggatctttag aagcaacttg taaaaaggct ttcccttgac
cacctaacgc ttgtgatgaa ctttgtgctg catctttatc taacttttga acaacaccag
caaaattgct ataaatcgta agttcgttct gctttttatt taactcttct ttttgtaact
tccctttctc tttctcaagg tctgttgtct tttgcgctat ttctaattca cttacttgct
cttccatcgg atctattact tctttcccag ctccgctatc tttcgccttc ttaatttctt
tcttcaacga atcaatcttc tttttccctt ggtcataacg catatctgcc atcttttgat
caagcacagc ttgcttcatt tgcaaattaa tttcttcatt atcgtaagaa aacaatttcg
ttcccttttc tatttcttgt ccttctttca cttcaatatc tttcactttt cctttagtca
gatccgcgta gaaactttca atattccccg gcttcacttg accagaaatt aactttgtat
tattaagatt gcgctctgtg actttttcaa aactaacagt atctattttt gttaccgctt
tcttcttact ttgcactacg aaaatattaa taaatgtaac aataacaatt aacgcaataa
ctccaataat agctcctttc tttttatttt taaaaataaa aagttctttt ttaatcacaa
caatcttctc cttattcata tctaaaattt aaacttttaa attttacata aaaatttaaa
acttctaaaa tataacatgt ataatttacc atagatgatt tattttgtat aatataaaaa
tatctatata aataatgcta attttcaaac aatggggtgg aagatactaa tgttagaaaa
aaaagataga ctaacagaaa tagaggaaca aattatatac ttaatttcaa aggaattagg
aaataaagaa atagcggaaa aattaaatta ttcacaacgt agcatcggtt acaaaataaa
taatattttt aaaaaattaa atgttaattc aagaatcgga ctgattatag aagctgtaaa
aaaaaatata atttaaatat aagaatgctt tcatgttaat attttataga aactaaatat
agaggtgatt aaaatgcaaa aattttttga agctattagt gctataggta tagtaggtta
ctttttaggt aaattcacaa gtattccttt aatagacaaa tatacattgt atttcggcgt
aatgttgatg attggggtta ttggaagatt tattataaaa gtaattaact cagaagaaga
gacacatgat tcaaacaaat aaaatactct aataaaaatg gaagaagatt gcacttaagt
gcaatcttct tccattttta ttgaaaattg attaaataat gttaatattg caattgtgtg
gtgcagatta gggtgattat gtaatagggg gaaattaaaa atgatcaata cagcttggaa
aattattaaa gcactacaaa aatacggtac aaaagcatac aatgttatca aaaaaggcgg
ccaagcaatg tacgacagct tcatggcagc taaagctaaa ggttggacac atgcagcttg
gtggctagta gaacatggtt caactttagg aacattctat gatttattaa aagctgctgg
attaatcgac taattacagc aactaaacaa ctaaacaact aaacaactta aaaatacaaa
ttaccctaaa ctgtacccct attacatatt aactaattat tttaaaggtt ggatgataat
atgtcaaata acatcatatc tgtaaaaaat ttaattaaaa gcttcgataa caaaatagta
ttagataaat taaatttcga aatgaaagaa aactccactg ttgtaataat aggtaaaaac
ggtgcaggta aaagtgtctt tctaaattgt ttacttggat ttattcatta caaccaaggt
tcaatactaa tagatggaca acctgtagaa aatcgattac atctccgcaa gattacatcg
ttaatttctt cagaccatca agaacatcta aatttattaa cccccaatga atatttttct
tttttacaag atatttacca actaaaaagt aataataaag acaaaattca aaattactca
gaagatctat atgttactaa agaactcaat actgtatttt catcactttc ttttggaaca
aaaaagaaaa tacaattaat tggtagccta ttatattctc ctaaattatt gatttgcgac
gaaatatttg aagggcttga tacagactca gtaaaatggg ttaaaaactt atttcaacaa
agaaaacaag aaaatctttc tactttattt acaactcata ttactgaaca tataacagat
ataacagaaa aaaattacat acttgaaaat ggaaaattaa ttgtgtaagt ttaaccactt
atatttaaag ctaaaattaa ggagcttaaa atatgaattt taatatatat aagagactat
atgataaatc aacagaagaa aaaagcaaaa caataacaaa acaaatatta tttggaatta
taaatagttc tatattaata ggtatactac tcacatgttt ggagattttc aactttaaaa
tttcaactgt aatgtatggt tatttcacta tatatataat actagaactt ttactattat
tctctgcaaa tcaactatat gaaagtacag aattcataat aaaattcctt aaatatacac
caataaccat aaataaacta tatttctcac attttctaag ttctaaatat tcattttcca
atctttttga aataataact ctcacatcaa ttttattaat atataatgtc gatatcttat
attcatttat tttcataatt agcttacaaa ttattagctt aataagaaca tatttagaat
ttttactatt atattctcaa aaaaaacagg ttaaaatttt tactctaacc cattttgttt
tcataatatc tatggttttt tatattattg ttaaaacaaa atcgatagat ttagtattct
ttgaaaacac aaatatgtta attatatctg ttcttctcat aacattcttg atatcacttt
taacatataa acatattata gaatacttaa tgaaaaataa tgaaattgta tataatgcta
tttttatcaa gttaactttt aacacagcta atttaattag taaattattt aaatttaata
catcaattgc atctttaata aaaatacata taatacgatt attacgtaat caagactata
taagtagatt actaaaaata ggaatattac tatttatttt ttcttctata agctttctat
ttttcgataa atcatcaaca aacaatgaaa tgagtgatat actttacttt tcatttttta
tttccttatt tagtttttct aacatacgat tagactataa cttagtttct aaattaagct
tagaggatta tccaataaca aaattacaat caagattaag cattgatata gcacatggaa
ttttactatt tatactatct ttatttcttt tattaacaca atacttattg aatccaacaa
atattctaac tctaattgat ggtttattat catttatttg tttttatttt ctaagtcttg
gtatagaaaa agcagatatt ataataacac caaaaacaaa atggaaaatg tatccattat
tttttgtgat gggattaata attgaagcaa tatttctatt aaaattcaaa atatggataa
aattaataac tttattcctt tgtatactgt ggtcatattt acgtgtttat tggaaattaa
aaaaacaata aacacaatta aaaagttccc ttcatatttt ttgaagggaa cttttatttt
aaacaaaaat tacaaacaag caaagttatt taaaagtaaa cttttaaaat tattgaatta
ataacaatta gtctaagata tatcagccaa atttaatttt taaacaaacc gaaaaaccct
ttccgttttt gtttctgatt ttggctctgt atttctctaa tgttttcaag caataactga
tctcgttttt caaatttttt ctctataaaa acctctaatt caatattttt atcttctact
tcctttaatt ttctctccgt attagccaaa tgttcttttg tggtaactaa ttcattcgta
atctcttgta atttttgaac aagcgtttga ttgaactgat tttgtaattg ttgattttct
aatacttcat ccaacttctt ttctaattcc gatttttcct ctcttgaaac aaacaaatca
agttctccat tccgccatgc tcgaacttga tcataagacc atttttgcac ctgtattttt
tctttaattg ctatcaattc ttctatattt tcctttgagt acctacgatg ccctccctga
cttcgctccg tttgtatatt aaattcgttt gaccatgctt ttaacaagtc aggggtaatc
cctaaacgat ccgcaacaat tttcggtgta tacatttctg attttaattc caa
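
Before going further, it can help to confirm that the definition line really carries the bracketed modifiers. The following is a minimal sketch; header_modifiers is a hypothetical name, and the function only lists what it finds without judging which modifiers NCBI requires.

```shell
# Hypothetical helper: print each [key=value] modifier found in the first
# FASTA header of a file, one per line.
header_modifiers() {
    # sed prints the first line starting with ">" and then quits;
    # grep -o extracts every bracketed key=value group from that line.
    sed -n '/^>/{p;q;}' "$1" | grep -o '\[[^]=]*=[^]]*\]'
}

# Usage (file name taken from this article):
#   header_modifiers Toyoncin.fas
```

For the header shown above, this would list modifiers such as [organism=Bacillus toyonensis] and [gcode=11], which makes a missing or misspelled bracket easy to spot.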
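  • File 2: the feature table (.tbl) describing the gene features. Its base name must match that of the FASTA file, e.g. Toyoncin.tbl

This file can be obtained by annotating file 1 with prokka, but it needs manual editing: add the gene information and revise the product entries yourself. The file has five columns separated by tabs. The name in the header line must be identical to the one in the FASTA file, prefixed with Feature.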

>Feature Toyoncin_biosynthesis_gene_cluster
585	1	gene
			gene	orf1
585	1	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00001
			product	MarR family transcriptional regulator
1476	811	gene
			gene	orf2
1476	811	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00002
			product	YIP1 family membrane protein
2710	1496	gene
			gene	orf3
2710	1496	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00003
			product	ABC transporter permease
3387	2707	gene
			gene	orf4
3387	2707	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00004
			product	ABC transporter ATP-binding protein
4595	3384	gene
			gene	orf5
4595	3384	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00005
			product	RND family efflux transporter, MFP subunit
4746	4952	gene
			gene	orf6
4746	4952	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00006
			product	Helix-turn-helix transcriptional regulator
5010	5198	gene
			gene	orf7
5010	5198	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00007
			product	Putative membrane protein
5337	5549	gene
			gene	toyA
5337	5549	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00008
			product	Toyoncin precursor
5657	6304	gene
			gene	orf9
5657	6304	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00009
			product	ABC transporter ATP-binding protein
6349	7707	gene
			gene	orf10
6349	7707	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00010
			product	Putative membrane protein
8391	7849	gene
			gene	orf11
8391	7849	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00011
			product	MarR family transcriptional regulator
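
Because the columns must be separated by tabs, and tabs are easily turned into spaces when a listing is copied from a web page, a quick layout check can save a failed run. The sketch below assumes the simple layout used in this example: feature lines with three tab-separated fields (start, stop, feature key) and qualifier lines with three empty leading columns. check_tbl is a hypothetical name, and multi-interval features are not handled.

```shell
# Hypothetical helper: report lines of a feature table that do not follow
# the tab-separated five-column layout. Exits non-zero if any are found.
check_tbl() {
    awk -F'\t' '
        # header line, e.g. ">Feature SeqID"
        /^>Feature / { next }
        # feature line: start <TAB> stop <TAB> key
        NF == 3 && $1 ~ /^[<>]?[0-9]+$/ && $2 ~ /^[<>]?[0-9]+$/ { next }
        # qualifier line: three empty columns, then qualifier name (and value)
        NF >= 4 && $1 == "" && $2 == "" && $3 == "" && $4 != "" { next }
        # anything else is reported with its line number
        { printf "line %d: unexpected layout: %s\n", NR, $0; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Usage (file name taken from this article):
#   check_tbl Toyoncin.tbl && echo "layout looks fine"
```

A line in which the tabs were replaced by spaces is seen by awk as a single field and is therefore reported.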
  • File 3: the template file (.sbt) describing the author information

This file can be generated on the NCBI website.

Submit-block ::= {
contact {
contact {
name name {
last "xin",
first "bingyue",
middle "",
initials "",
suffix "",
title ""
},
affil std {
affil "Huaibei Normal University",
div "College of Life Sciences",
city "Huaibei",
sub "Anhui",
country "China",
street "Dongshan road No.100",
email "xinbingyuex@163.com",
postal-code "235000"
}
}
},
cit {
authors {
names std {
{
name name {
last "Xin",
first "Bingyue",
middle "",
initials "",
suffix "",
title ""
}
}
},
affil std {
affil "Huaibei Normal University",
div "College of Life Sciences",
city "Huaibei",
sub "Anhui",
country "China",
street "Dongshan road No.100",
postal-code "235000"
}
}
},
subtype new
}
Seqdesc ::= pub {
pub {
gen {
cit "unpublished",
authors {
names std {
{
name name {
last "Xin",
first "Bingyue",
middle "",
initials "",
suffix "",
title ""
}
}
}
},
title "Purification and characterization of a novel leaderless bacteriocin, toyoncin, produced by Bacillus toyonensis XIN-YC13 that specifically active against Bacilus cereus and Listeria monocytogenes"
}
}
}
Seqdesc ::= user {
type str "Submission",
data {
{
label str "AdditionalComment",
data str "ALT EMAIL:xinbingyuex@163.com"
}
}
}
Seqdesc ::= user {
type str "Submission",
data {
{
label str "AdditionalComment",
data str "Submission Title:None"
}
}
}

Note: the sequence identifier must be identical in file 1 and file 2; in this example both use "Toyoncin_biosynthesis_gene_cluster".
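
The identifier match described in this note can be checked mechanically. The following is a minimal sketch under the assumption that the identifier is the first word after ">" in the FASTA header and the first word after ">Feature" in the feature table; the function names are illustrative only.

```shell
# Hypothetical helpers; the names are illustrative and not part of table2asn.

# Identifiers in a FASTA file: the first word after ">" on each header line.
fasta_ids() {
    sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$1"
}

# Identifiers in a feature table: the word following ">Feature".
tbl_ids() {
    sed -n 's/^>Feature[[:space:]][[:space:]]*\([^[:space:]]*\).*/\1/p' "$1"
}

# Succeeds only when both files name exactly the same set of identifiers.
ids_match() {
    [ "$(fasta_ids "$1" | sort -u)" = "$(tbl_ids "$2" | sort -u)" ]
}

# Usage (file names taken from this article):
#   ids_match Toyoncin.fas Toyoncin.tbl || echo "identifiers differ"
```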

Generating the sqn file

table2asn -i Toyoncin.fas -t template.sbt -V vb

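-i  specifies the FASTA file
-t  specifies the template file
-V  verification options; the letters can be combined, as in "vb" above:
    v  validate the record and save the error messages to a validation file
    b  generate a gbf (GenBank flat) file
-x  suffix of the FASTA file (file 1); set it according to your actual files (not used in the command above)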
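
After the run, the validation file is the first thing worth reading. The sketch below tallies its messages by severity. It assumes that the file is named after the input with a .val suffix and that each line starts with a severity word followed by a colon (for example ERROR: or WARNING:); both points are assumptions to verify against your own output, and val_summary is a hypothetical name.

```shell
# Hypothetical helper: count validation messages by severity.
# Assumes each line looks like "SEVERITY: message text".
val_summary() {
    # keep the text before the first colon, then count identical values
    cut -d: -f1 "$1" | sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Usage (the .val file name is an assumption based on the input name):
#   table2asn -i Toyoncin.fas -t template.sbt -V vb
#   val_summary Toyoncin.val
```

Errors generally need to be resolved before submission, so a non-zero ERROR count is the signal to go back to the feature table or the FASTA header.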
