Uploading a gene cluster to NCBI: using tbl2asn
Posted: 2021-11-12 | Category: Bioinformatics

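Submitting a gene cluster to NCBI requires a file in sqn format, which is generated with tbl2asn.

Preparing the files

tbl2asn needs three files to generate the sqn file:

  • File 1: the genomic sequence file in FASTA format

The bracketed modifiers in the header line are optional.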

>Toyoncin_biosynthesis_gene_cluster [organism=Bacillus toyonensis] [strain=XIN-YC13] [topology=linear] [moltype=DNA] [tech=wgs] [gcode=11] [country=China] Bacillus toyonensis strain XIN-YC13 Toyoncin biosynthesis gene cluster, complete sequence
ttaaaa taatttaata
gggaagtttt ttagttgttt tggactcttc ccaaacactg ctttaagtgt tggattaaca
 tcatccctat tccccgaaaa cataatgtga ggatttatga ataatgcata tgctctaaca
ttattatcat caacaccact ctctgaacga gccataatac ccttatcaat taattttcta
accaatggac taactttagt ttcatgtctt ccaatttttt tagctaattc tcgctgagtt
 aatgggattt gttctttcga attaatatca ttaactaaac aattacttaa aaaacctaca
cacattgaaa tatctactaa aaataccttc tcagcatttg ttaaataatc aatttcgaac
 aaatactgga tattttgttg aataatctga acaaacttcg ctttattttt cactttacgc
 tcaggaacta atttcattcc tcttgaacga gcttttgatt gaagtttatt tgctaaatac
 atctcttctt cagacaagac ctttaaatcc tcaatatctc tcaatcttga atttttttca
 gcttgttcta agttgataaa ctttgacata ttctttttgc tcctcttttc taagattttc
 aactagagaa ggaaaaaatt ttatgttatg attcctgtag aatttacaat tcaatatgta
 caaaagaact ccccttttct aattgatagt ttggtcgctt tcaattataa tacaagggga
 ttttttacat cttaaaattt ttcatttttg aatcaatccc tgaaaatata aagaacacat
 cacataaatt attcttaata ttttataatc gaaaaaataa taggaataaa gaaaaatact
 gcaataaata tattcatctg tttcttactc aaaccggcca ctatatttaa tcccattcct
 ataataatta attcccaaat tgaaaacact tcaaatttac tacaaattat atatagtaat
 gtacctggtt caaatattga acccaaacta gtatacgtta ctatttctcc tcctataaat
 agtgttaata atgtattaat taatttacct aaaatagaaa ttacactagc aaatattgta
 atagatacta actttttata agaaacatct ttactcatca gcatcattac aatctttaaa
 ataatccccc aaataaaagg tgtaattaaa gcaatgaaaa tcgatgcaaa acctcctaac
 atcatttggg aaacaagggg tatttccata tctgcaaata cttctttttg aattttaacc
 aattctggat tgctatgtct tgcatataca gataaaatcc ctattattgc ttgtataact
 gataaataca taagaggaaa ccatatcgga ctaattattt tcatacgctc gaattcagaa
 ataggagatg taatcataaa aattagagat ggtttttcat aattgttttt ttctttattc
 actactaaac tattatccat atattaacac cttctttttt tattcataac gtaatgcttc
 aattggatct aattttgcag ccttattggc tggaatcaat ccaaatataa taccaagcga
 catcgaaaat aatacgccac ccacaacaac ttcccatgaa acaagaggcg gccattttgc
 aaatgtggac acaatgtacg ctccacaata accaagtcca atcccaatca atccaccaag
 aagtgtcaac ataattgctt caattaaaaa ttgcaacaaa attttaccac gcgttgctcc
aagtgcttta cgtaccccaa tctcacgtgt acgctctgtt acagaaacaa gcatgatatt
cataactcca attccgccta caactaaaga aatacttgca atacctgcaa taatcattgt
cataatatta gtaactttag aaataccttt ttggatttct tctaaattta caatttcata
 tttcccttta aactcttcag attgtctatc atttaataat tttactccct tttttccagc
tgtttgtaat tgatcaaccc ctattgcttg aattgtaata gattgttgag agttatcatc
 tccatataat attggccata ttgaaagtgg tattaaaatt tctgacattc caaaaccaag
 ctcttcatct cctgaactga atagaccaat aatttgaagt ggctgacctt taatttctat
 aattttacca atgactgatt catgctcatt aggaaataac tctttcacta atgtttgatt
 aaccattatt acattattac cttgcatcaa atcatcttca ttaagagaac gacctttctc
 tattttcatt ttagtcatat taaaatattc ttttgtaata ccatttatat tagttacaac
ctttttatca tcaccaatta atgtctctgt actagagttt tgaacaatta catttttaat
 ttcttttatc ttttttaact caaaaagatc ttcttcactt acagatggtt ttttgtcatt
 catagatcct gttgttaata actcattaat atcttcttta tatgtaatcg gaatagtgtt
 attgccagaa gcggtaaatt gtgatttaag cattgcttct ccacctttac caatggctac
 aacagtaata atagaaccta caccaataat aattccaagc atcgtaagag ctgagcgcag
 tttatgagct aaaatagaag ataaggcaat ttttatacta tctaataaac tcataccata
 caccttctat cttctgtaat tttcccatct cgcaatatga tgcgacgtga agaataagct
 gctacctctt cttcatgtgt aaccataacg attgtcgtac cttctgcatt taacttcgta
 aagatatcca taacttgtgc accagacttc gtatcaagcg caccagttgg ctcatcagcc
 ataataaacg ttggattatt cgcaatcgat cttgcaatag caacacgctg cttctgtcca
 cctgacagct cactaggtaa atgatgtact ctatccgcta atccaacttt cccaagcgct
 tcgagcgctc tttgacgacg ctctgctttc ttcactccac cataaatcag tggtaattca
 acgttttcca ctgcggaaag gcgcggcaat aaattaaaat gctggaacac aaaaccgata
tattcattac gaattaaagc aagttttgac tcatctgctg ttaaaatatt cacatcattc
agcatatatt cgccttctgt tggacgatct aaacaaccga taatattcat aagagttgat
ttaccagaac cagacggtcc cataattgaa acaaattcac caccttgaat agttaaacta
ataccgtgca aaataggaac cgccattttt ccttgataat acgttttagc aatattattt
aacgtaatca tttctctttc acttccattc cgtcatatac gttgtcggaa ggatttttaa
ccaccttttg ccccactgtt gcgccctcta caatctctgt ccaatctcca tcagtagcac
cttttttcac attttgttta cgaagcttac ctttctcttc gatatataca aatgcatcat
      cgcctttttc aacaatactc ttacttggaa cagcaatcat tcttttattc tctaaattta
      cttgtaacga aacatgataa cctggagata aaccatcttg actatcaaga cttgctttat
      atgtatattg agacatattt tgagtcactt cccccatgcc atcagcttga gccatttcta
      cacttgttgg gaactcactt acctctgtaa tcttccctgt ccactttttc ttactatttg
      ctttcgcagt tacagtaaac gtttgatcct tttgaatttg cgacttctga agctcagtta
      atgttccttg aatttggaat ggatctttag aagcaacttg taaaaaggct ttcccttgac
      cacctaacgc ttgtgatgaa ctttgtgctg catctttatc taacttttga acaacaccag
      caaaattgct ataaatcgta agttcgttct gctttttatt taactcttct ttttgtaact
      tccctttctc tttctcaagg tctgttgtct tttgcgctat ttctaattca cttacttgct
      cttccatcgg atctattact tctttcccag ctccgctatc tttcgccttc ttaatttctt
      tcttcaacga atcaatcttc tttttccctt ggtcataacg catatctgcc atcttttgat
      caagcacagc ttgcttcatt tgcaaattaa tttcttcatt atcgtaagaa aacaatttcg
      ttcccttttc tatttcttgt ccttctttca cttcaatatc tttcactttt cctttagtca
      gatccgcgta gaaactttca atattccccg gcttcacttg accagaaatt aactttgtat
      tattaagatt gcgctctgtg actttttcaa aactaacagt atctattttt gttaccgctt
      tcttcttact ttgcactacg aaaatattaa taaatgtaac aataacaatt aacgcaataa
      ctccaataat agctcctttc tttttatttt taaaaataaa aagttctttt ttaatcacaa
      caatcttctc cttattcata tctaaaattt aaacttttaa attttacata aaaatttaaa
      acttctaaaa tataacatgt ataatttacc atagatgatt tattttgtat aatataaaaa
      tatctatata aataatgcta attttcaaac aatggggtgg aagatactaa tgttagaaaa
      aaaagataga ctaacagaaa tagaggaaca aattatatac ttaatttcaa aggaattagg
      aaataaagaa atagcggaaa aattaaatta ttcacaacgt agcatcggtt acaaaataaa
      taatattttt aaaaaattaa atgttaattc aagaatcgga ctgattatag aagctgtaaa
      aaaaaatata atttaaatat aagaatgctt tcatgttaat attttataga aactaaatat
      agaggtgatt aaaatgcaaa aattttttga agctattagt gctataggta tagtaggtta
      ctttttaggt aaattcacaa gtattccttt aatagacaaa tatacattgt atttcggcgt
      aatgttgatg attggggtta ttggaagatt tattataaaa gtaattaact cagaagaaga
      gacacatgat tcaaacaaat aaaatactct aataaaaatg gaagaagatt gcacttaagt
      gcaatcttct tccattttta ttgaaaattg attaaataat gttaatattg caattgtgtg
      gtgcagatta gggtgattat gtaatagggg gaaattaaaa atgatcaata cagcttggaa
      aattattaaa gcactacaaa aatacggtac aaaagcatac aatgttatca aaaaaggcgg
      ccaagcaatg tacgacagct tcatggcagc taaagctaaa ggttggacac atgcagcttg
      gtggctagta gaacatggtt caactttagg aacattctat gatttattaa aagctgctgg
      attaatcgac taattacagc aactaaacaa ctaaacaact aaacaactta aaaatacaaa
      ttaccctaaa ctgtacccct attacatatt aactaattat tttaaaggtt ggatgataat
      atgtcaaata acatcatatc tgtaaaaaat ttaattaaaa gcttcgataa caaaatagta
      ttagataaat taaatttcga aatgaaagaa aactccactg ttgtaataat aggtaaaaac
      ggtgcaggta aaagtgtctt tctaaattgt ttacttggat ttattcatta caaccaaggt
      tcaatactaa tagatggaca acctgtagaa aatcgattac atctccgcaa gattacatcg
      ttaatttctt cagaccatca agaacatcta aatttattaa cccccaatga atatttttct
      tttttacaag atatttacca actaaaaagt aataataaag acaaaattca aaattactca
     gaagatctat atgttactaa agaactcaat actgtatttt catcactttc ttttggaaca
     aaaaagaaaa tacaattaat tggtagccta ttatattctc ctaaattatt gatttgcgac
     gaaatatttg aagggcttga tacagactca gtaaaatggg ttaaaaactt atttcaacaa
     agaaaacaag aaaatctttc tactttattt acaactcata ttactgaaca tataacagat
     ataacagaaa aaaattacat acttgaaaat ggaaaattaa ttgtgtaagt ttaaccactt
     atatttaaag ctaaaattaa ggagcttaaa atatgaattt taatatatat aagagactat
     atgataaatc aacagaagaa aaaagcaaaa caataacaaa acaaatatta tttggaatta
     taaatagttc tatattaata ggtatactac tcacatgttt ggagattttc aactttaaaa
     tttcaactgt aatgtatggt tatttcacta tatatataat actagaactt ttactattat
     tctctgcaaa tcaactatat gaaagtacag aattcataat aaaattcctt aaatatacac
     caataaccat aaataaacta tatttctcac attttctaag ttctaaatat tcattttcca
     atctttttga aataataact ctcacatcaa ttttattaat atataatgtc gatatcttat
     attcatttat tttcataatt agcttacaaa ttattagctt aataagaaca tatttagaat
     ttttactatt atattctcaa aaaaaacagg ttaaaatttt tactctaacc cattttgttt
     tcataatatc tatggttttt tatattattg ttaaaacaaa atcgatagat ttagtattct
     ttgaaaacac aaatatgtta attatatctg ttcttctcat aacattcttg atatcacttt
     taacatataa acatattata gaatacttaa tgaaaaataa tgaaattgta tataatgcta
     tttttatcaa gttaactttt aacacagcta atttaattag taaattattt aaatttaata
     catcaattgc atctttaata aaaatacata taatacgatt attacgtaat caagactata
     taagtagatt actaaaaata ggaatattac tatttatttt ttcttctata agctttctat
     ttttcgataa atcatcaaca aacaatgaaa tgagtgatat actttacttt tcatttttta
     tttccttatt tagtttttct aacatacgat tagactataa cttagtttct aaattaagct
     tagaggatta tccaataaca aaattacaat caagattaag cattgatata gcacatggaa
     ttttactatt tatactatct ttatttcttt tattaacaca atacttattg aatccaacaa
     atattctaac tctaattgat ggtttattat catttatttg tttttatttt ctaagtcttg
     gtatagaaaa agcagatatt ataataacac caaaaacaaa atggaaaatg tatccattat
     tttttgtgat gggattaata attgaagcaa tatttctatt aaaattcaaa atatggataa
     aattaataac tttattcctt tgtatactgt ggtcatattt acgtgtttat tggaaattaa
     aaaaacaata aacacaatta aaaagttccc ttcatatttt ttgaagggaa cttttatttt
     aaacaaaaat tacaaacaag caaagttatt taaaagtaaa cttttaaaat tattgaatta
     ataacaatta gtctaagata tatcagccaa atttaatttt taaacaaacc gaaaaaccct
     ttccgttttt gtttctgatt ttggctctgt atttctctaa tgttttcaag caataactga
     tctcgttttt caaatttttt ctctataaaa acctctaatt caatattttt atcttctact
     tcctttaatt ttctctccgt attagccaaa tgttcttttg tggtaactaa ttcattcgta
     atctcttgta atttttgaac aagcgtttga ttgaactgat tttgtaattg ttgattttct
     aatacttcat ccaacttctt ttctaattcc gatttttcct ctcttgaaac aaacaaatca
     agttctccat tccgccatgc tcgaacttga tcataagacc atttttgcac ctgtattttt
     tctttaattg ctatcaattc ttctatattt tcctttgagt acctacgatg ccctccctga
     cttcgctccg tttgtatatt aaattcgttt gaccatgctt ttaacaagtc aggggtaatc
     cctaaacgat ccgcaacaat tttcggtgta tacatttctg attttaattc caa
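
To make the layout of the header line concrete, here is a minimal Python sketch that splits such a line into the sequence identifier, the bracketed [key=value] modifiers and the free-text title. The function name parse_defline is made up for this example, and the parsing rules are an assumption based on the header shown above, not an official NCBI parser.

```python
import re


def parse_defline(defline):
    """Split a FASTA header into (seq_id, modifiers, title).

    Assumes the layout used in this post:
    >SeqID [key=value] [key=value] ... free-text title
    """
    if not defline.startswith(">"):
        raise ValueError("a FASTA header line must start with '>'")
    parts = defline[1:].strip().split(None, 1)
    if not parts:
        raise ValueError("the header line has no sequence identifier")
    seq_id = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    # Collect every [key=value] pair into a dictionary.
    modifiers = dict(re.findall(r"\[([^=\]]+)=([^\]]*)\]", rest))
    # Whatever is left after removing the bracketed parts is the title.
    title = re.sub(r"\[[^\]]*\]", "", rest).strip()
    return seq_id, modifiers, title
```

Running it on the header above gives Toyoncin_biosynthesis_gene_cluster as the identifier, which is the value that has to reappear in the feature table.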
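  • File 2: the feature table file (.tbl) describing the gene features

This file can be obtained by annotating file 1 with Prokka, but it has to be edited by hand: add the first few lines of the file (the >Feature line and the source block in the example) and the gene entries. Columns are separated by tabs.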

>Feature Toyoncin_biosynthesis_gene_cluster
1	8409	source
			organism	Bacillus toyonensis
			mol_type	genomic DNA
			strain	XIN-YC13
585	1	gene
			gene	orf1
585	1	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00001
			product	MarR family transcriptional regulator
1476	811	gene
			gene	orf2
1476	811	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00002
			product	YIP1 family membrane protein
2710	1496	gene
			gene	orf3
2710	1496	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00003
			product	ABC transporter permease
3387	2707	gene
			gene	orf4
3387	2707	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00004
			product	ABC transporter ATP-binding protein
4595	3384	gene
			gene	orf5
4595	3384	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00005
			product	RND family efflux transporter, MFP subunit
4746	4952	gene
			gene	orf6
4746	4952	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00006
			product	Helix-turn-helix transcriptional regulator
5010	5198	gene
			gene	orf7
5010	5198	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00007
			product	Putative membrane protein
5337	5549	gene
			gene	toyA
5337	5549	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00008
			product	Toyoncin precursor
5657	6304	gene
			gene	orf9
5657	6304	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00009
			product	ABC transporter ATP-binding protein
6349	7707	gene
			gene	orf10
6349	7707	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00010
			product	Putative membrane protein
8391	7849	gene
			gene	orf11
8391	7849	CDS
			inference	ab initio prediction:Prodigal:002006
			locus_tag	Toyoncin_biosynthesis_gene_cluster_00011
			product	MarR family transcriptional regulator
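
Adding the gene entries by hand gets tedious for larger clusters. The following is a rough Python sketch of that step, assuming the input table contains CDS features only, as in the example above without its gene lines. The function name add_gene_features and the orfN naming scheme are assumptions for illustration; the overrides argument lets you name individual genes (toyA in this post).

```python
def add_gene_features(tbl_text, overrides=None):
    """Insert a 'gene' feature in front of every CDS feature.

    overrides maps the 1-based CDS index to a gene name;
    every other CDS is named orf1, orf2, ... in order.
    """
    overrides = overrides or {}
    out = []
    n = 0
    for line in tbl_text.splitlines():
        fields = line.split("\t")
        # Feature lines carry start, stop and feature key in columns 1-3;
        # qualifier lines start with three empty columns.
        if len(fields) >= 3 and fields[0] and fields[2] == "CDS":
            n += 1
            name = overrides.get(n, f"orf{n}")
            out.append(f"{fields[0]}\t{fields[1]}\tgene")
            out.append(f"\t\t\tgene\t{name}")
        out.append(line)
    return "\n".join(out) + "\n"
```

For the table in this post the call would be add_gene_features(text, overrides={8: "toyA"}). The >Feature line and the source block are passed through unchanged, so they still have to be written by hand.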
  • File 3: the template file (.sbt) describing the author information

This file can be generated on the NCBI website.

Submit-block ::= {
  contact {
    contact {
      name name {
        last "xin",
        first "bingyue",
        middle "",
        initials "",
        suffix "",
        title ""
      },
      affil std {
        affil "Huaibei Normal University",
        div "College of Life Sciences",
        city "Huaibei",
        sub "Anhui",
        country "China",
        street "Dongshan road No.100",
        email "xinbingyuex@163.com",
        postal-code "235000"
      }
    }
  },
  cit {
    authors {
      names std {
        {
          name name {
            last "Xin",
            first "Bingyue",
            middle "",
            initials "",
            suffix "",
            title ""
          }
        }
      },
      affil std {
        affil "Huaibei Normal University",
        div "College of Life Sciences",
        city "Huaibei",
        sub "Anhui",
        country "China",
        street "Dongshan road No.100",
        postal-code "235000"
      }
    }
  },
  subtype new
}
Seqdesc ::= pub {
  pub {
    gen {
      cit "unpublished",
      authors {
        names std {
          {
            name name {
              last "Xin",
              first "Bingyue",
              middle "",
              initials "",
              suffix "",
              title ""
            }
          }
        }
      },
      title "Purification and characterization of a novel leaderless bacteriocin, toyoncin, produced by Bacillus toyonensis XIN-YC13 that specifically active against Bacillus cereus and Listeria monocytogenes"
    }
  }
}
Seqdesc ::= user {
  type str "Submission",
  data {
    {
      label str "AdditionalComment",
      data str "ALT EMAIL:xinbingyuex@163.com"
    }
  }
}
Seqdesc ::= user {
  type str "Submission",
  data {
    {
      label str "AdditionalComment",
      data str "Submission Title:None"
    }
  }
}

Note: the sequence identifier must be identical in file 1 and file 2; in this example both use “Toyoncin_biosynthesis_gene_cluster”.
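
This requirement is easy to check before running tbl2asn. A small Python sketch, assuming the identifier is the first word after ">" in the FASTA file and the second word of each ">Feature" line in the table; the helper names are invented for this example.

```python
def fasta_ids(fasta_text):
    """Return the sequence identifiers of all FASTA header lines."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.append(words[0])
    return ids


def table_ids(tbl_text):
    """Return the identifiers named on the '>Feature' lines."""
    ids = []
    for line in tbl_text.splitlines():
        if line.startswith(">Feature"):
            words = line.split()
            if len(words) >= 2:
                ids.append(words[1])
    return ids


def ids_match(fasta_text, tbl_text):
    """True if both files name the same, non-empty set of identifiers."""
    found = set(fasta_ids(fasta_text))
    return bool(found) and found == set(table_ids(tbl_text))
```

Read both files into strings and call ids_match(fasta_text, tbl_text); a False result means the two identifiers differ and one of the files needs editing.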

Generating the files

tbl2asn -t template.sbt -p ./ -V vb -x .fna

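-t  template file (.sbt)
-p  path to the directory that contains the input files
-V  verification options; the letters can be combined:
    v  validate the records and save the error messages to a validation file
    b  generate a gbf (GenBank flat) file
-x  suffix of file 1 (the FASTA file); set it to match your own file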
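
The same call can be scripted. A minimal Python sketch, assuming tbl2asn is installed and on the PATH; the function names build_tbl2asn_command and run_tbl2asn are made up for this example, and only the options described above are used.

```python
import subprocess


def build_tbl2asn_command(template, input_dir="./", fasta_suffix=".fna", verify="vb"):
    """Build the argument list for the tbl2asn call shown above."""
    return [
        "tbl2asn",
        "-t", template,      # template file (.sbt)
        "-p", input_dir,     # directory with the FASTA and .tbl files
        "-V", verify,        # v: validate, b: write a gbf file
        "-x", fasta_suffix,  # suffix of the FASTA file
    ]


def run_tbl2asn(template, input_dir="./", fasta_suffix=".fna", verify="vb"):
    """Run tbl2asn and raise CalledProcessError if it exits with an error."""
    cmd = build_tbl2asn_command(template, input_dir, fasta_suffix, verify)
    return subprocess.run(cmd, check=True)
```

Calling run_tbl2asn("template.sbt") reproduces the command above. tbl2asn pairs each FASTA file with the .tbl file that has the same base name, so both should sit in the input directory under matching names.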

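Note: if you use the tbl2asn bundled with Prokka, the date in the generated sqn and gbf files is usually 1-JAN-2019 and has to be corrected by hand to the current date, because the tbl2asn shipped with Prokka is a modified build. Using the official tbl2asn is recommended, as it avoids the wrong date.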
