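Submitting a gene cluster to NCBI requires a file in sqn format. In an earlier article, "NCBI上传基因簇之tbl2asn的使用" (Uploading a gene cluster to NCBI: using tbl2asn), I described how to generate an sqn file with tbl2asn. Unfortunately, NCBI no longer offers tbl2asn for download and provides a new tool, table2asn, in its place. This article describes how to use table2asn.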
Software installation

Download table2asn

wget https://ftp.ncbi.nlm.nih.gov/asn1-converters/by_program/table2asn/linux64.table2asn.gz
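Installation

Decompress the downloaded file to obtain the executable, rename it to table2asn, and add its directory to your PATH environment variable. If you are unsure how to set environment variables, please look that up for your own system.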
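The installation steps above can be sketched as a small shell function. This is a minimal sketch, assuming a Linux shell with gunzip available; the function name install_table2asn and the install directory are illustrative choices, not part of the original instructions. The chmod step is also an assumption: a freshly decompressed file is usually not marked executable.

```shell
#!/bin/sh
# Hypothetical helper: unpack a downloaded table2asn archive and install it.
#   $1 = path to the downloaded .gz file (e.g. linux64.table2asn.gz)
#   $2 = directory to install into (e.g. "$HOME/bin")
install_table2asn() {
    archive=$1
    bindir=$2
    mkdir -p "$bindir" || return 1
    # Decompress straight to the final name, leaving the archive untouched.
    gunzip -c "$archive" > "$bindir/table2asn" || return 1
    # Mark the program as executable.
    chmod +x "$bindir/table2asn"
}
```

After running `install_table2asn linux64.table2asn.gz "$HOME/bin"`, adding `export PATH="$HOME/bin:$PATH"` to your shell startup file makes the program available as table2asn.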
File preparation

table2asn relies on three files to generate the sqn file:
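File 1: the sequence file in FASTA format. The suffix is expected to be .fsa; the example here is named Toyoncin.fas, which is why the suffix option -x is listed in the options further down.

Note that the header line must contain the bracketed modifiers as well as the descriptive information that follows them.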
>Toyoncin_biosynthesis_gene_cluster [organism=Bacillus toyonensis] [strain=XIN-YC13] [topology=linear] [moltype=DNA] [tech=wgs] [gcode=11] [country=China] Bacillus toyonensis strain XIN-YC13 Toyoncin biosynthesis gene cluster, complete sequence
ttaaaa taatttaata gggaagtttt ttagttgttt tggactcttc ccaaacactg ctttaagtgt tggattaaca tcatccctat tccccgaaaa cataatgtga ggatttatga ataatgcata tgctctaaca ttattatcat caacaccact ctctgaacga gccataatac ccttatcaat taattttcta accaatggac taactttagt ttcatgtctt ccaatttttt tagctaattc tcgctgagtt aatgggattt gttctttcga attaatatca ttaactaaac aattacttaa aaaacctaca cacattgaaa tatctactaa aaataccttc tcagcatttg ttaaataatc aatttcgaac aaatactgga tattttgttg aataatctga acaaacttcg ctttattttt cactttacgc tcaggaacta atttcattcc tcttgaacga gcttttgatt gaagtttatt tgctaaatac atctcttctt cagacaagac ctttaaatcc tcaatatctc tcaatcttga atttttttca gcttgttcta agttgataaa ctttgacata ttctttttgc tcctcttttc taagattttc aactagagaa ggaaaaaatt ttatgttatg attcctgtag aatttacaat tcaatatgta caaaagaact ccccttttct aattgatagt ttggtcgctt tcaattataa tacaagggga ttttttacat cttaaaattt ttcatttttg aatcaatccc tgaaaatata aagaacacat cacataaatt attcttaata ttttataatc gaaaaaataa taggaataaa gaaaaatact gcaataaata tattcatctg tttcttactc aaaccggcca ctatatttaa tcccattcct ataataatta attcccaaat tgaaaacact tcaaatttac tacaaattat atatagtaat gtacctggtt caaatattga acccaaacta gtatacgtta ctatttctcc tcctataaat agtgttaata atgtattaat taatttacct aaaatagaaa ttacactagc aaatattgta atagatacta actttttata agaaacatct ttactcatca gcatcattac aatctttaaa ataatccccc
aaataaaagg tgtaattaaa gcaatgaaaa tcgatgcaaa acctcctaac atcatttggg aaacaagggg tatttccata tctgcaaata cttctttttg aattttaacc aattctggat tgctatgtct tgcatataca gataaaatcc ctattattgc ttgtataact gataaataca taagaggaaa ccatatcgga ctaattattt tcatacgctc gaattcagaa ataggagatg taatcataaa aattagagat ggtttttcat aattgttttt ttctttattc actactaaac tattatccat atattaacac cttctttttt tattcataac gtaatgcttc aattggatct aattttgcag ccttattggc tggaatcaat ccaaatataa taccaagcga catcgaaaat aatacgccac ccacaacaac ttcccatgaa acaagaggcg gccattttgc aaatgtggac acaatgtacg ctccacaata accaagtcca atcccaatca atccaccaag aagtgtcaac ataattgctt caattaaaaa ttgcaacaaa attttaccac gcgttgctcc aagtgcttta cgtaccccaa tctcacgtgt acgctctgtt acagaaacaa gcatgatatt cataactcca attccgccta caactaaaga aatacttgca atacctgcaa taatcattgt cataatatta gtaactttag aaataccttt ttggatttct tctaaattta caatttcata tttcccttta aactcttcag attgtctatc atttaataat tttactccct tttttccagc tgtttgtaat tgatcaaccc ctattgcttg aattgtaata gattgttgag agttatcatc tccatataat attggccata ttgaaagtgg tattaaaatt tctgacattc caaaaccaag ctcttcatct cctgaactga atagaccaat aatttgaagt ggctgacctt taatttctat aattttacca atgactgatt catgctcatt aggaaataac tctttcacta atgtttgatt aaccattatt acattattac cttgcatcaa atcatcttca ttaagagaac gacctttctc tattttcatt ttagtcatat taaaatattc ttttgtaata ccatttatat tagttacaac ctttttatca tcaccaatta atgtctctgt actagagttt tgaacaatta catttttaat ttcttttatc ttttttaact caaaaagatc ttcttcactt acagatggtt ttttgtcatt catagatcct gttgttaata actcattaat atcttcttta tatgtaatcg gaatagtgtt attgccagaa gcggtaaatt gtgatttaag cattgcttct ccacctttac caatggctac aacagtaata atagaaccta caccaataat aattccaagc atcgtaagag ctgagcgcag tttatgagct aaaatagaag ataaggcaat ttttatacta tctaataaac tcataccata caccttctat cttctgtaat tttcccatct cgcaatatga tgcgacgtga agaataagct gctacctctt cttcatgtgt aaccataacg attgtcgtac cttctgcatt taacttcgta aagatatcca taacttgtgc accagacttc gtatcaagcg caccagttgg ctcatcagcc ataataaacg ttggattatt cgcaatcgat cttgcaatag caacacgctg cttctgtcca cctgacagct cactaggtaa 
atgatgtact ctatccgcta atccaacttt cccaagcgct tcgagcgctc tttgacgacg ctctgctttc ttcactccac cataaatcag tggtaattca acgttttcca ctgcggaaag gcgcggcaat aaattaaaat gctggaacac aaaaccgata tattcattac gaattaaagc aagttttgac tcatctgctg ttaaaatatt cacatcattc agcatatatt cgccttctgt tggacgatct aaacaaccga taatattcat aagagttgat ttaccagaac cagacggtcc cataattgaa acaaattcac caccttgaat agttaaacta ataccgtgca aaataggaac cgccattttt ccttgataat acgttttagc aatattattt aacgtaatca tttctctttc acttccattc cgtcatatac gttgtcggaa ggatttttaa ccaccttttg ccccactgtt gcgccctcta caatctctgt ccaatctcca tcagtagcac cttttttcac attttgttta cgaagcttac ctttctcttc gatatataca aatgcatcat cgcctttttc aacaatactc ttacttggaa cagcaatcat tcttttattc tctaaattta cttgtaacga aacatgataa cctggagata aaccatcttg actatcaaga cttgctttat atgtatattg agacatattt tgagtcactt cccccatgcc atcagcttga gccatttcta cacttgttgg gaactcactt acctctgtaa tcttccctgt ccactttttc ttactatttg ctttcgcagt tacagtaaac gtttgatcct tttgaatttg cgacttctga agctcagtta atgttccttg aatttggaat ggatctttag aagcaacttg taaaaaggct ttcccttgac cacctaacgc ttgtgatgaa ctttgtgctg catctttatc taacttttga acaacaccag caaaattgct ataaatcgta agttcgttct gctttttatt taactcttct ttttgtaact tccctttctc tttctcaagg tctgttgtct tttgcgctat ttctaattca cttacttgct cttccatcgg atctattact tctttcccag ctccgctatc tttcgccttc ttaatttctt tcttcaacga atcaatcttc tttttccctt ggtcataacg catatctgcc atcttttgat caagcacagc ttgcttcatt tgcaaattaa tttcttcatt atcgtaagaa aacaatttcg ttcccttttc tatttcttgt ccttctttca cttcaatatc tttcactttt cctttagtca gatccgcgta gaaactttca atattccccg gcttcacttg accagaaatt aactttgtat tattaagatt gcgctctgtg actttttcaa aactaacagt atctattttt gttaccgctt tcttcttact ttgcactacg aaaatattaa taaatgtaac aataacaatt aacgcaataa ctccaataat agctcctttc tttttatttt taaaaataaa aagttctttt ttaatcacaa caatcttctc cttattcata tctaaaattt aaacttttaa attttacata aaaatttaaa acttctaaaa tataacatgt ataatttacc atagatgatt tattttgtat aatataaaaa tatctatata aataatgcta attttcaaac aatggggtgg aagatactaa tgttagaaaa aaaagataga ctaacagaaa tagaggaaca 
aattatatac ttaatttcaa aggaattagg aaataaagaa atagcggaaa aattaaatta ttcacaacgt agcatcggtt acaaaataaa taatattttt aaaaaattaa atgttaattc aagaatcgga ctgattatag aagctgtaaa aaaaaatata atttaaatat aagaatgctt tcatgttaat attttataga aactaaatat agaggtgatt aaaatgcaaa aattttttga agctattagt gctataggta tagtaggtta ctttttaggt aaattcacaa gtattccttt aatagacaaa tatacattgt atttcggcgt aatgttgatg attggggtta ttggaagatt tattataaaa gtaattaact cagaagaaga gacacatgat tcaaacaaat aaaatactct aataaaaatg gaagaagatt gcacttaagt gcaatcttct tccattttta ttgaaaattg attaaataat gttaatattg caattgtgtg gtgcagatta gggtgattat gtaatagggg gaaattaaaa atgatcaata cagcttggaa aattattaaa gcactacaaa aatacggtac aaaagcatac aatgttatca aaaaaggcgg ccaagcaatg tacgacagct tcatggcagc taaagctaaa ggttggacac atgcagcttg gtggctagta gaacatggtt caactttagg aacattctat gatttattaa aagctgctgg attaatcgac taattacagc aactaaacaa ctaaacaact aaacaactta aaaatacaaa ttaccctaaa ctgtacccct attacatatt aactaattat tttaaaggtt ggatgataat atgtcaaata acatcatatc tgtaaaaaat ttaattaaaa gcttcgataa caaaatagta ttagataaat taaatttcga aatgaaagaa aactccactg ttgtaataat aggtaaaaac ggtgcaggta aaagtgtctt tctaaattgt ttacttggat ttattcatta caaccaaggt tcaatactaa tagatggaca acctgtagaa aatcgattac atctccgcaa gattacatcg ttaatttctt cagaccatca agaacatcta aatttattaa cccccaatga atatttttct tttttacaag atatttacca actaaaaagt aataataaag acaaaattca aaattactca gaagatctat atgttactaa agaactcaat actgtatttt catcactttc ttttggaaca aaaaagaaaa tacaattaat tggtagccta ttatattctc ctaaattatt gatttgcgac gaaatatttg aagggcttga tacagactca gtaaaatggg ttaaaaactt atttcaacaa agaaaacaag aaaatctttc tactttattt acaactcata ttactgaaca tataacagat ataacagaaa aaaattacat acttgaaaat ggaaaattaa ttgtgtaagt ttaaccactt atatttaaag ctaaaattaa ggagcttaaa atatgaattt taatatatat aagagactat atgataaatc aacagaagaa aaaagcaaaa caataacaaa acaaatatta tttggaatta taaatagttc tatattaata ggtatactac tcacatgttt ggagattttc aactttaaaa tttcaactgt aatgtatggt tatttcacta tatatataat actagaactt ttactattat tctctgcaaa tcaactatat gaaagtacag aattcataat 
aaaattcctt aaatatacac caataaccat aaataaacta tatttctcac attttctaag ttctaaatat tcattttcca atctttttga aataataact ctcacatcaa ttttattaat atataatgtc gatatcttat attcatttat tttcataatt agcttacaaa ttattagctt aataagaaca tatttagaat ttttactatt atattctcaa aaaaaacagg ttaaaatttt tactctaacc cattttgttt tcataatatc tatggttttt tatattattg ttaaaacaaa atcgatagat ttagtattct ttgaaaacac aaatatgtta attatatctg ttcttctcat aacattcttg atatcacttt taacatataa acatattata gaatacttaa tgaaaaataa tgaaattgta tataatgcta tttttatcaa gttaactttt aacacagcta atttaattag taaattattt aaatttaata catcaattgc atctttaata aaaatacata taatacgatt attacgtaat caagactata taagtagatt actaaaaata ggaatattac tatttatttt ttcttctata agctttctat ttttcgataa atcatcaaca aacaatgaaa tgagtgatat actttacttt tcatttttta tttccttatt tagtttttct aacatacgat tagactataa cttagtttct aaattaagct tagaggatta tccaataaca aaattacaat caagattaag cattgatata gcacatggaa ttttactatt tatactatct ttatttcttt tattaacaca atacttattg aatccaacaa atattctaac tctaattgat ggtttattat catttatttg tttttatttt ctaagtcttg gtatagaaaa agcagatatt ataataacac caaaaacaaa atggaaaatg tatccattat tttttgtgat gggattaata attgaagcaa tatttctatt aaaattcaaa atatggataa aattaataac tttattcctt tgtatactgt ggtcatattt acgtgtttat tggaaattaa aaaaacaata aacacaatta aaaagttccc ttcatatttt ttgaagggaa cttttatttt aaacaaaaat tacaaacaag caaagttatt taaaagtaaa cttttaaaat tattgaatta ataacaatta gtctaagata tatcagccaa atttaatttt taaacaaacc gaaaaaccct ttccgttttt gtttctgatt ttggctctgt atttctctaa tgttttcaag caataactga tctcgttttt caaatttttt ctctataaaa acctctaatt caatattttt atcttctact tcctttaatt ttctctccgt attagccaaa tgttcttttg tggtaactaa ttcattcgta atctcttgta atttttgaac aagcgtttga ttgaactgat tttgtaattg ttgattttct aatacttcat ccaacttctt ttctaattcc gatttttcct ctcttgaaac aaacaaatca agttctccat tccgccatgc tcgaacttga tcataagacc atttttgcac ctgtattttt tctttaattg ctatcaattc ttctatattt tcctttgagt acctacgatg ccctccctga cttcgctccg tttgtatatt aaattcgttt gaccatgctt ttaacaagtc aggggtaatc cctaaacgat ccgcaacaat tttcggtgta tacatttctg attttaattc caa
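Before running table2asn it can help to confirm that the FASTA header carries the bracketed modifiers described above. The following is a minimal sketch, not part of table2asn: the function names fasta_seqid and check_fasta_modifiers are hypothetical, and which modifiers to require is left to the caller.

```shell
#!/bin/sh
# Print the SeqID of the first FASTA record:
# the text between ">" and the first whitespace.
fasta_seqid() {
    awk '/^>/ { sub(/^>/, "", $1); print $1; exit }' "$1"
}

# Report which of the requested source modifiers are missing from the
# first header line. Returns 0 when all are present, 1 otherwise.
#   $1 = FASTA file, remaining arguments = modifier names (e.g. organism)
check_fasta_modifiers() {
    fasta=$1
    shift
    header=$(sed -n '/^>/{p;q;}' "$fasta")
    status=0
    for key in "$@"; do
        case $header in
            *"[$key="*) ;;                              # modifier present
            *) echo "missing [$key=...]"; status=1 ;;   # modifier absent
        esac
    done
    return $status
}
```

For the example file, a call such as `check_fasta_modifiers Toyoncin.fas organism strain gcode` checks three of the modifiers used in the header above.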
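File 2: the feature table file (.tbl) that describes the gene features. Its file name must match that of the FASTA file, e.g. Toyoncin.tbl.

This file can be obtained by annotating file 1 with prokka, but it needs manual editing afterwards: add the gene information and revise the product entries yourself. The file has 5 columns separated by tabs. The name in its header line must be the same as in the FASTA file, but the line has to start with Feature (that is, >Feature followed by the name).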
>Feature Toyoncin_biosynthesis_gene_cluster
585	1	gene
			gene	orf1
585	1	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00001
			product	MarR family transcriptional regulator
1476	811	gene
			gene	orf2
1476	811	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00002
			product	YIP1 family membrane protein
2710	1496	gene
			gene	orf3
2710	1496	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00003
			product	ABC transporter permease
3387	2707	gene
			gene	orf4
3387	2707	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00004
			product	ABC transporter ATP-binding protein
4595	3384	gene
			gene	orf5
4595	3384	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00005
			product	RND family efflux transporter, MFP subunit
4746	4952	gene
			gene	orf6
4746	4952	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00006
			product	Helix-turn-helix transcriptional regulator
5010	5198	gene
			gene	orf7
5010	5198	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00007
			product	Putative membrane protein
5337	5549	gene
			gene	toyA
5337	5549	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00008
			product	Toyoncin precursor
5657	6304	gene
			gene	orf9
5657	6304	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00009
			product	ABC transporter ATP-binding protein
6349	7707	gene
			gene	orf10
6349	7707	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00010
			product	Putative membrane protein
8391	7849	gene
			gene	orf11
8391	7849	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00011
			product	MarR family transcriptional regulator
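Because the feature table is edited by hand, a quick count of its features is a cheap way to catch a broken tab or a missing line. The sketch below assumes a tab-separated five-column table as described above; the function name count_features is hypothetical. It treats any line with a non-empty first and third column as a feature line, and everything else (apart from the >Feature line) as qualifier lines.

```shell
#!/bin/sh
# Count feature types in a five-column feature table.
# Feature lines carry start, stop and feature key in columns 1-3;
# qualifier lines leave columns 1-3 empty and use columns 4-5.
#   $1 = feature table (.tbl) file
count_features() {
    awk -F '\t' '
        /^>Feature/ { next }                    # skip the table header
        $1 != "" && $3 != "" { n[$3]++ }        # tally by feature key
        END { for (k in n) print k, n[k] }
    ' "$1" | sort
}
```

For a table like the one above, the gene and CDS counts should be equal, since every gene in this example has exactly one CDS.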
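File 3: the submission template file (.sbt), named template.sbt in this example. This file can be generated on the NCBI website.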
Submit-block ::= {
  contact {
    contact {
      name name {
        last "xin",
        first "bingyue",
        middle "",
        initials "",
        suffix "",
        title ""
      },
      affil std {
        affil "Huaibei Normal University",
        div "College of Life Sciences",
        city "Huaibei",
        sub "Anhui",
        country "China",
        street "Dongshan road No.100",
        email "xinbingyuex@163.com",
        postal-code "235000"
      }
    }
  },
  cit {
    authors {
      names std {
        {
          name name {
            last "Xin",
            first "Bingyue",
            middle "",
            initials "",
            suffix "",
            title ""
          }
        }
      },
      affil std {
        affil "Huaibei Normal University",
        div "College of Life Sciences",
        city "Huaibei",
        sub "Anhui",
        country "China",
        street "Dongshan road No.100",
        postal-code "235000"
      }
    }
  },
  subtype new
}
Seqdesc ::= pub {
  pub {
    gen {
      cit "unpublished",
      authors {
        names std {
          {
            name name {
              last "Xin",
              first "Bingyue",
              middle "",
              initials "",
              suffix "",
              title ""
            }
          }
        }
      },
      title "Purification and characterization of a novel leaderless bacteriocin, toyoncin, produced by Bacillus toyonensis XIN-YC13 that specifically active against Bacilus cereus and Listeria monocytogenes"
    }
  }
}
Seqdesc ::= user {
  type str "Submission",
  data {
    {
      label str "AdditionalComment",
      data str "ALT EMAIL:xinbingyuex@163.com"
    }
  }
}
Seqdesc ::= user {
  type str "Submission",
  data {
    {
      label str "AdditionalComment",
      data str "Submission Title:None"
    }
  }
}
Note: the sequence identifier in file 1 and file 2 must be identical. In this example both are "Toyoncin_biosynthesis_gene_cluster".
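The consistency rule in the note above is easy to check mechanically. The following is a minimal sketch under the assumption that each file holds a single record; the function name ids_match is hypothetical. It compares the identifier after ">" in the FASTA file with the identifier after ">Feature" in the feature table.

```shell
#!/bin/sh
# Compare the SeqID in a FASTA file with the SeqID in a feature table.
#   $1 = FASTA file, $2 = feature table (.tbl) file
# Prints the result and returns 1 on a mismatch or a missing identifier.
ids_match() {
    # First FASTA header: text between ">" and the first whitespace.
    fasta_id=$(sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$1" | head -n 1)
    # First table header: the word that follows ">Feature".
    table_id=$(sed -n 's/^>Feature[[:space:]]\{1,\}\([^[:space:]]*\).*/\1/p' "$2" | head -n 1)
    if [ -n "$fasta_id" ] && [ "$fasta_id" = "$table_id" ]; then
        echo "OK: $fasta_id"
    else
        echo "MISMATCH: FASTA='$fasta_id' table='$table_id'"
        return 1
    fi
}
```

A call such as `ids_match Toyoncin.fas Toyoncin.tbl` would catch, for example, stray spaces inside the identifier in one of the two files.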
Generating the files

table2asn -i Toyoncin.fas -t template.sbt -V vb
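-i  specifies the FASTA file
-t  specifies the template file
-V  verification options: v generates a validation file that records error messages; b generates a gbf file
-x  the suffix of file 1 (the FASTA file); set it according to your actual file name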
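The command above can be wrapped so that it also reports which result files appeared. This is a minimal sketch: the function name make_sqn is hypothetical, and the output names (the input basename with .sqn, .val and .gbf suffixes) are an assumption about table2asn's naming, so check them against what your version actually writes.

```shell
#!/bin/sh
# Run table2asn on one FASTA file and list the expected output files.
#   $1 = FASTA file, $2 = template (.sbt) file
make_sqn() {
    fasta=$1
    template=$2
    base=${fasta%.*}                 # Toyoncin.fas -> Toyoncin
    # Same options as in the article: validate (v) and write a flat file (b).
    table2asn -i "$fasta" -t "$template" -V vb || return 1
    # Assumed output names: input basename plus .sqn, .val and .gbf.
    for ext in sqn val gbf; do
        if [ -e "$base.$ext" ]; then
            echo "created: $base.$ext"
        else
            echo "not found: $base.$ext"
        fi
    done
}
```

With the example files this would be called as `make_sqn Toyoncin.fas template.sbt`. Review the validation file for errors before uploading the sqn file.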
Notice: When you use the scripts in this article, please cite the link of this webpage and respect the author's work. Thank you!