BtToxin_Digger is the latest high-throughput Bacillus thuringiensis virulence factor mining pipeline on the whole network, including the basic web version and the local advanced version.
What is BtToxin_Digger?
BtToxin_Digger is a high-throughput, automatic gene mining tool that can mine toxin genes, such as Cry, Cyt and Vip, etc, from Bacillus thuringiensis. The pipeline accepts multiple forms of input data including Reads, assembled genomes, CDSs, and protein sequences and can output rich and useful results. It is derived from the re-design of the tool BtToxin_Digger we developed previously. Compared with BtToxin_Digger, BtToxin_Digger has many improvements, as follows:
Can be run in batches, suitable for large-scale genome analysis.
Added genome assembly functions, including second-generation short-reads assembly, third-generation long-reads assembly, and hybrid assembly of short-reads and long-reads, to realize the full-automatic mining of genes from Reads to virulence factors. The previous three input files (assembled genomes, ORFs and protein sequences) are still supported, and genome assembly can be used independently.
Fixed a bug where BtToxin_Digger often reported errors when processing assembled genomes.
Added support for CDSs and not limited to ORFs.
The database has been updated, adding support for App, Gpp, Mcf, Mpf, Mpp, Mtx, Pra, Prb, Spp, Tpp, Cyt, Vip, Vpa, Vpb, Xpp and other virulence factors.
BtToxin_Digger generates comprehensive and readable outputs including toxin list and sequence for each input; a matrix of all strains and the virulence factors it contains (behavior strain names, listed as virulence factor names), which can be used as virulence factors contained in the strain Database; and a file writes the information and sequences of all toxins (Table 1) to facilitate centralized processing and downstream analysis and experiment designs.
Added multi-thread support, greatly improving the running speed of the pipeline.
Installation
The BtToxin_Digger codes can be downloaded from GitHub.
- Install with Bioconda - OSX/Linux/WSL
1 | conda create -n toxin python=3 |
- Install with the docker container
1 | docker pull quay.io/biocontainers/bttoxin_digger:<tag> |
(See bttoxin_digger/tags for valid values for
Usage
1 | BtToxin_Digger [Options] |
Options:
1 | [--help] Print the help message and exit |
Examples
- Processing Illumina paired-end Reads
1 | BtToxin_Digger --SeqPath <Illumina Reads PATH> --SequenceType reads --platform illumina --reads1 <suffix name of reads 1> -reads2 <suffix name of reads 2> --threads <INT> --suffix_len <INT> |
- Processing PacBio long Reads
1 | BtToxin_Digger --SeqPath <PacBio Reads PATH> --SequenceType reads --platform pacbio --reads1 <suffix name of PacBio reads> --threads <INT> --suffix_len <INT> |
- Processing Oxford long Reads
1 | BtToxin_Digger --SeqPath <Oxford Reads PATH> --SequenceType reads --platform oxford --reads1 <suffix name of Oxford reads> --threads <INT> --suffix_len <INT> |
- Processing Hybrid Reads (Long reads + illumina short reads)
1 | BtToxin_Digger --SeqPath <Reads PATH> --SequenceType reads --platform hybrid --short1 <short reads 1> --short2 <short reads 2> --long <long reads> --threads <INT> |
- Processing Assembled genomes
1 | BtToxin_Digger --SeqPath <Assembled genome PATH> --SequenceType nucl --Scaf_suffix <suffix of genomes> --threads <INT> |
- Processing Protein sequences
1 | BtToxin_Digger --SeqPath <Protein file PATH> --SequenceType prot --prot_suffix <suffix of protein files> --threads <INT> |
- Processing Coding sequences
1 | BtToxin_Digger --SeqPath <CDSs file PATH> --SequenceType orfs --orfs_suffix <suffix of orfs files> --threads <INT> |
Outputs
Results/Assembles/*: Genome assembly results;
Results/Toxins/*.list: Toxin list of each strain;
Results/Toxins/*.gbk: Toxin sequences in Genbank format of each strain;
Results/Toxins/Btallgenes.table: A matrix describes Strains vs. Toxins;
Results/Toxins/All_Toxins.txt: A table containing all information and sequences of all toxin genes. See table 1 for details.
Contents of *.list:
1 | Overview of prediction Sequence type: nucl |
Contents of *.gbk (Partial):
1 | LOCUS NHPK02000017.1_00027 77664 bp dna linear UNK |
Table 1. Description of “All_Toxins.txt”
Header | Description |
---|---|
Strain | The name of your input |
Protein_id | The protein ID |
Protein_len | The length of protein sequence |
Strand | Positive or negative strand where the gene comes from |
Gene location on scaffold | Gene coordinates on the genome |
SVM | Is the protein predicted by SVM |
BLAST | Is the protein predicted by BLAST |
HMM | Is the protein predicted by HMM |
Hit_id | The subject sequence ID |
Hit_length | The length of subject sequence |
Aln_length | alignment length (sequence overlap) |
Query start-end | Start and end of alignment in query |
Hit stard-end | Start and end of alignment in subject |
Identity | Percentage of identical matches |
Evalue of blast | Expect value of BLAST |
Hmm hit | The subject model ID |
Hmm hit length | The length of subject model sequence |
Evalue of Hmm | Expect value of HMM |
Nomenclature | Bt nomenclature containing 4 Ranks |
Endotoxin_N | Whether the Cry protein contain Endotoxin_N domain |
Endotoxin_M | Whether the Cry protein contain Endotoxin_M domain |
Endotoxin_C | Whether the Cry protein contain Endotoxin_C domain |
Endotoxin_mid | Whether the Cry protein contain Endotoxin_mid domain |
Toxin_10 | Whether the Cry protein contain Toxin_10 domain |
ETX_MTX2 | Whether the Cry protein contain ETX_MTX2 domain |
Gene sequence | The nucleotide sequence of the toxin |
Protein sequence | Amino acid sequence of the toxin |
Scaffold sequence | The scaffold sequence where the toxin gene is located |
Table 2. The contents of “All_Toxins.txt” (Partial)
Strain | Protein_id | Protein_len | Strand | Gene location on scaffold | SVM | BLAST | HMM | Hit_id | Hit_length | Aln_length | Query start-end | Hit stard-end | Identity | Evalue of blast | Hmm hit | Hmm hit length | Evalue of Hmm | Nomenclature | Endotoxin_N | Endotoxin_M | Endotoxin_C | Endotoxin_mid | Toxin_10 | ETX_MTX2 | Gene sequence | Protein sequence | Scaffold sequence |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1126_1 | NHPK02000017.1_00027 | 329 | -3 | 10001-10987 | NO | YES | NA | Zwa5B-other | 322 | 322 | 8-329 | 1-322 | 99.0683229813665 | 0.0 | NA | NA | NA | Rank4 | NO | NO | NO | NO | NO | NO | GTCTTCTAATAAAACACCTATTTCCTTTTCCCTAGCCCTTTGATAGTAATAAGCAGCAATGGCTACATCAAAAATGGCCATACCCATTGGATTAAATAATATTGCTTTGTCATAAGGGAAGTTTTGCAGTGCTCCTCGGATTACAACGTCAGCAATTGATACTGTATCTTCTTTTTGTAAACCTCTTTCAAGGTGCATCCTTTCAACATCAGTTTTTTCACGACAGACTTCTTCCCAATTATCAACAACTAGGGATTTTGTATATTCTAAGATATCAGGCTTAAAATCACGAAGAGAAACATTGAGTAGTAGCGCTCCATCTTTAGGTTTCTTATCAATATAACCTTCTGCAGAAACAGTACAAGTAATAAAAATATCAGCATCATTATAGGCTTCTTCATAAGCATTAACCTTTTGAGTTTTAGAATAAATATCCTCAGGAATTAATTCGTCTTTAATTCCATTGATATCATATAGGTAAACACTTTCGATTTTATCGCCTAATAGGGCAGTTACCATCTTTAAATGCATCTGACCAATAGGTCCAAATCCAATAATTCCAACTTTAACATTATTTAAATCTCTTAACTTGGCAAACTCTCGAATCATTAACCCTGTTACAGAAGCTGTTCGTATTACACTGACCATTGCTGTATTAAGCGTCGCAAATGGTTTTCCAGTCATAGCATCATTTAATATAGTAACTGAATGAGCACGTTGAATACCTTTCTCAATATTCTTCGGAAAACTAGCTATCCATTTGATTCCTGAAACCTTAAATTCTCCTCCAATATAAGCTGGCATAGCAATAATTCTATTAGCCGGATCATCAAAACGTAAGTATGGCTTAATTGGCTGAGAAAAATCTTCTGCATCAAGACTTTGTACTGCCTTTGAAATGACATTTATTGTTTCTTCCCAATTAACACCCATTTCATTGATATGTTTAGTATTTAAATACATCATAATTGCTTCCTCATTTCTATA | YRNEEAIMMYLNTKHINEMGVNWEETINVISKAVQSLDAEDFSQPIKPYLRFDDPANRIIAMPAYIGGEFKVSGIKWIASFPKNIEKGIQRAHSVTILNDAMTGKPFATLNTAMVSVIRTASVTGLMIREFAKLRDLNNVKVGIIGFGPIGQMHLKMVTALLGDKIESVYLYDINGIKDELIPEDIYSKTQKVNAYEEAYNDADIFITCTVSAEGYIDKKPKDGALLLNVSLRDFKPDILEYTKSLVVDNWEEVCREKTDVERMHLERGLQKEDTVSIADVVIRGALQNFPYDKAILFNPMGMAIFDVAIAAYYYQRAREKEIGVLLED | TCTCAATACGAACCAGTAAAAATGGTTCAGGGCGTTGAAGATACTGGACGTAATTCTTTTAGCCAAGCTAGTTTTTCAAACGTACTTCTAAGAACAGATACATCTGGAGCTACCTACAAAAGTTGGAACGGTTCAATTTCTTCCTCATTTTCTAAACACGGTACTTATGGTAACAGTTTCACAGTTTATTCTCAAGTGCCATTAAGCGCAAGTTTACGCTAAAATTGTTCTTGTATACGAATAATTTTGTGAAAAAGGTAGCTGTACTTTTGTACAGCTACCTTTTTCATATGGTGAATTTTTAATAGATATTGTATCAGACATAAGTTACATTCTTCATATTTTAACTTCTTTGAATTCAATAATACTTAATCAGCTTGTCTTATTACTTCACTTAACGTATCGAGATACAGCTTATAGCAACTACAACACGCTTTATTATTAATAATACCACTCTTCTTAATTTAACTTTAATTCCATTGCATTGATTTTTTAATAATAAATATACAACTTAGTAATGATAATAGATATACAATAAGAATAATATACAACTGTTTTGGTTCAATGCTATCGATATAAAGACTCATAATTGGATATGACCAAGGAAACAGTATCCAATAGTTTGTATGCGAGATAAAAAAGCTACCCAACCCGCCTATAACCCCTACTAATAATGAAAGTAACGGGATTTTATAAAATATTATATTTATTAAAAAATGTAAATTCACAATAGTTAAAGAGCTAATCACCATAAAAAGTATAGAAAAAATTAATCTTTGATAAACAGTCCCAAGATTATCAACCGTTAATACTATAGGTATACTAGTTATGACTAAAGCACATATTAAATAACCAAAAGTAATTATTAAGCAAAACAAAAACTTATTACTTAATATAGTTTTTAAAGTGATTCCGTGATAAATAGCAGACATATATCCTTTTCTTCTGAACTCGTTTACAAAAGAAAATGACACTAATAGACCTATTGCAATAGGTAAAAATAATCTACAAAAAAGACTTAGAATCGCTATATGCATGAAAACGCCATCTTTTGATTGAGATGATACAGTTAGAAAGATTTGAAAACTAATTGGTATCAACAGAAATACTAGTAACAGGATATACAATCCATTATGAAGACTTTTTTTATATTCACCTAATATACCCACTTATTTCACCCTTTATTATTCATATGTATAATATTTAAAACTCACAGCTAATAACATAATTCCTAAAATATAGATTGAAGAACTTGCAAGATATATTAAAATAGTTGTTTCTTTCATCATTACATCACTTACTATATTCAATGCATAAAATATTGGAAATGTCTTTGTGAATACTACAGGAAATAATCCATAAGTTACGCTAAAAAAAATCCAAATAACTGACAAGGACGCCGCAAGAATTCCCTTTTTTATAATCAAATGAAAAAGTAATTGAGAATATGGTATTGCTAAAGATAGAATTGTACAAAGCATCAACATTTTCAAAAAATAAAGTATAGATATTTCATATCCATTTATAAAAATATAGATACCCATAACTGAAAAATTTAAAATATTAAAGATTAAATAAAACAGTATAAACAATAAATTTTTAGCTACCAATAATTTAAATTTTGAAATTGGCTTTGTATAGATAATGGCCCAAGTACCATATTTATTTTCAGAAGAAATGATAAAATACCACATACAAGCAGAAAAAATGCTTAATGTTAGTGTATTAAAGTTTACTGTAATTGCTATAAGATTGGTAGATTGAAACATATCGGGATTATTCTTGTTAAAAATAAAATAACTTATACACAAAAAGCTTGATAATAATGAAAAAAACAAAGAAAATAACAATAATTTTCTCCCACCAATTTTCCTCATCTCACTATTCACACTTTTAAAGATAAATTGCATTATGATAGTAACTCCTTTTGTTTAGTTAAAGATAAAAATACTTCCTCTAATGTTAGCTTTTTAGGGCTAACTTCAAAAATATCTATATCATTTTCAATTAATATTCTGGTTAAATTAGGAACACTATTTTGATCTATCATTAGTTTGAATACCCCTTCTTTTTCTATGCTATAAGGTATATCCTTTGTATTTAAAATCCTTTCTGTGGCAGTAGTGTCATTAACATGAACAATATATTCTCTAGAAGTAATAATACTGTTTAAATTCCCTTCATAGAGTAAGTTACCCTTATGCAGTAATGCTATGTAATCTGCTATCATTTCGATTTCAGCTAAATTATGACTAGAAATAAATATTGTCTTCCCCTCATTTTTTACTAAATGAAGTAATAAGCGCCTTATCTCGTGTACTCCCTCTGGGTCTAATCCATTTGTCGGCTCATCAAGAATTAGTATTTTAGGATCATTAAGAATAGCAAAGGAAATACCTAGCCTTTGTTTCATTCCTAAAGAAAAATCTTTCACTTTTTTATCTTTTGCTTTATATATTCCTGTAATTTTTAATACTCTGTCTATTTCATCTAAAGGCTTATTAATCATTTTTTGTACATAGGCTAAATTTTCATATGCTGTCAAATTAGGATAGTAGGTAGGTGAATCAATAAAACAACCCACATTTTCAAATAATTCTTCTTTCCAATCTTTGATATCTTTATTATTAAATAATATCGTTCCTTTATCTAATGGAATTAATCCTAATAGCATTCGCATTGTTGTTGTTTTTCCTGCACCATTTGGTCCAATAAAAGCATATAAACTTCCTTCAGGAACAGATAAGTTTAAATCCCTTACAATGGATTGATTGTCAAAAGACTTTGTTAAATTTTTAGTTTGGATTGCTAATTTCATTTTGTTCCTCACTTTCTTATTCTCAAATAAAAGTAAGTATCTATTAAATTCTTGATCTATTAGATATTCTATTTACCCTTTACCTAATGTTTGCAATTTGGATTTAGAAAATATTTCTCGTTGTTTTAATTTTTTATCAGTAAGTTCATAAACTCTATCTACACGCTCTAAAATATTTTCATCATGAGTGATAATAATCTTAGTGGATGGTATGGCAAATATATTTTCAATAACAGCATTTGCCGTTTCCCTATCTAAATGGTTAGTTGGTTCATCCAAAATAAGAATCGGATATTCTTTCATGAACAATCGTAACATAGCCAACCTTTGTCTTTGGCCACCGGAGAAATTTGTTCCTCCCTCTGAGATTAAAATATCATTTATACTTTGAAATTGAATATAATTACTCAGATTCAATTTATCAATAGACTCCTGGATTTTTGAAGGATCATAATTTATATCAAAGTAAGATAAATTCTCAGATAAACTTCCTCTAAATAAAACTCCATCCTGAGTCATTAGTCCTACATTTTTTCTTAGTTCCTCACTGTTCCAATCTTTAATATCTCTATCGTCTATTAATATATTACCCTTATCGGGAGTATGCAATTTTAATAATAAGTTAGCTAAAGTGGACTTACCACTGCCACTTTCTCCAACAATAGCAATACTTTCGTTATTTTTAATGTTCAAACTTAAGTTTTCTATTATTTTTTTGTCATCAAAACTAAACGAAATATTATTAAATTGAATCGATCCTCTAAGAATGTTTACTTTAGTATCCGATCCATCTTTTTCTAGTTCCTCTTCCAAGATATCTAAAGTTCTTAATAAAAGTGGTTTAACTGAATTCCAATTTAAAATTGATCCTATAATTGTTGAAACCGGAGAAAATATAATACTAGATATAGCAATAAAGAAGAAAATATTACTTAAATCCCCTTCTCCTTTACTAAAATTATAAAATCCTACAGCAAGCATTATGATAATAAATATTGTTGATAGGGAACTATTAAATGATTCTAAAGCATTTTGTTTTAGCGCTCGTTTATGTACTGTATCTAAGTATTTATTATAATCTTTTTGCCATACAGATTTAATTCTAGAAAAAATCCCTGTAGCCTTTACATAATACATCGCTTTTAATGATTCATTAATTTTCCCTCTATGTTCTCCTAGGTAATACGTTTCGACTAAATTAGCATTATATATGAATTTCCCTAAGGAAATATTGATAAATAGGGAACTAGCAATAAAAAACATCAATATTAAAGTATAAAAAGTCTCAGTATATAATAGATACATCAATAATGGAATGATAACTACTATTGATATAATCATTTCAATTAGATCCTCTAAAATATATCCTCGTATTGATTCTAGGCTCATGATTCTTACATTTAAATCCCCTGCATGACGTGTAGTAATTTTTTCTATGGGGGTATTAAACATATGCTCAAAAACTTTTGTTGTACTTTTTCTATCAATTTCTTTCTGAAAGTTTACTTTTATAATAGAAGTTATAGATTTTATCCCTATAACAACGATTAAACTTCCCAAAATAATTAAAAGGTATCTTACATCCACTTCCTTTGAAGCATTCAAAATATCCATAAACTTTTTCAAAATAAAGGGGAAAGCAAGAATTGAGATTTGTGCTAACAATGCTAATAGTGCTAATAAAACAATCTGGCTAGGAGATACAAAATGATCAATATATCTAAGTAATATATGTTTTTCACGCCTTTTAGTATTTGGCTGTATATCTTTTTTGTTTTCTATTACTAGGATATAGTCACAGAACATTTCCTCAAAACTTCTTATATCAATTATCATCGGCCCTGCCTTAGGATCTACAATATAAAATTTATTTGCTTTAACCCTTTCAATAATTACAAAGTGATTAAAGCCCCAAAATGCTATCATCGGCTTTTGTATCCTTTGTAAATCAGTTATATTTTCTATCCTATATGCAGTTGCAGAAAGTCCATACGATTTACTAACTTTTTTAATATCCAACAAACTCCATGCCGATTTTTTATTGATAATTTTACTATTTCTAATTTCCGAAGCTTGTACTGCTGAACCATAATAATGCATAATCATTGCTAGACAACTAGGCCCACAATCAAAAGATGTCATTTGCGGTATAAATTTAATTTTCTTCCTTAACATATTTATCACCTATTCTAGGCATTAAAGTATAAAATGCTATTTAGTTTATTATCCTCTAACCTTTGTAAAAAATACAAAATACTTCCTAGCCCTGTAAATAATGCGATTTTATGAGGATTAAAATCTTCTATATCAAAATTATTCAAAAAGTTATTTGTTACTTTCATAACAATTTGTTTTTCATTGCGCTCGAGCATACCTCTATTTTCTAGCTCCAATAAAAAGTCTAGTTTCCCCATTAAACCATGACATAAACAATAATCACTATTTTCATGTAATGTACTTAAGATTTGTTTTTTAAAAAAAATTAAATCTGATTGTATATCCTCATTTTGATACGATTCTAAAATTTTAATTCTCGAGATTCCCATTCCACTAAATCCTTTACACCAGCTATCTGTATACTCATATTCACTCATACTCATTTTCTCACTTTTATGTATACCATGTATTACTCTATCGTACTTATTTGTTTTTAATATGCCTTTTAACTTATAAAATGCGAGAAGTTGTCCTGATAATCCATGAACAAACCCCTTTAATGAATCCTCTAAATTTAGAGTATCGTGAGCTTTCCAATATACAAATTCATTATGTGTTTGCATATTATTGACAATGTAGTCACCGAGATTTTCAGCCACTTTAAGCAATATTTTATAATTAGGATTAATTTGATAGAACTTAACTGCCGTTAAAATTAGACCAGCTGAACCCCCTAATATATCAAAATGCTTATCGCTATATTTACCTTTTTCAATATCAGAGTTGACATCCATAAATATTTGCAATATCAATCTTTCATCGATAGATGTTATATGTGGTAAATTTGTTAAATAGTAAGAAATAATTGCATTAACACCATTTACAAAACCATAATTATTTTTATTTTTGTTTAATATTTCTTGCAGGGCTTTTTGTTCAAGTTTATTTATAATTTCAATCTGATTAGGATCAGCATTATTTCTCAATAATGAACTTATTCCCAATAATCCATCATATATTCCACTATTCAAATGGTTTATCTCAAGCTCCCCTTGCCAATTTGTTTTTAATCCAATAAAATCTACAGCACCGTCTTTTGATATATAACTGCCATCAAGAATTTTCCCTGTAATTGCTTCCTTTATTTTATTTGTAGGTAAATTAGTTCGAAAATTAATTGTTTCCTTTACTGTATTCAGCTTATATAGATTGTCATTTTGATTAGAAAGGGATATTTCTATAATTTTTCTTTGTAATAAAAAGTCAGCATATGAAATTTTTTTAAATTTTGTTTCTAGATCTATTTCATTATTTGATAAACTTTCGCAAGACTCCCATTCATACTTATTAAAGAAATATGGAACATCATAGGATAATAGTGCCTCTATCTCTTTTTGAATAATATCTTTCCTAGTAGCCAAGTATATGTTTTTTTCAAGAATATTCCGCAAATATTTTATAGTCTTTAATTTATTCTCTAATAAATCCGGAACATTTAGTCTATCTAATAAAAGGGTATAGACTTTTGTATTTCGATATACTTTTCGATAATTCCATTTAGAAACACTATTTTTTAATATACACATATACTCCTCTTTGTTTTTTAAAACTGAGTTATATGCATTCTCAAAACCTTTAATAATATCATCTATATAATTATCATACTCGTATATATCATCGTTAAGAAACGGTAAATTCTGAACCGTATCTTCTACTATTGTCTCTTCTCTTATTTCAATTGGATTCGATGTAAATTCATTTTTTATTTTAATTGTTTTTACCAAAGACTTCATTACTCTCCCAATTGCACTGTAATCTCTTATTACCTCCCTACCTCCACTAAACCTTATAGGTAGCATTCTAGAATTTAATACTGAATTAGATAATATAAAACTACTGGTTTCTTCAGTAACACTTGGATTTAATGTACTTCCTAATGTCTCAAAATCAATTATGACTGGTGAGGATTGTTTAGCTATGATATTTTCATTATGTATGTCACTTCCATTAATCAAATACATTACCGAAAGCACTACACCTAAATTATAATAATAATCTTTAATTTGCTCATTAGATTCACATGATTCATGATTAATGTATTCCATATATCCATAATTTCTTTTATTAACTACTTGCGGAATAAAGATATCCGTATTTAAATCTTTATTCAGTTGATTAATTAAACTATGATATAGTATAGAATTTTTCAAATCGACTGGTTTTAAAATTACCTTATTATTAAATTTATCTTCTAAAAGAATTGTTCTCTTTCCATTATTATGGTAGTCACCTAGGTTAATAACTTTTTCTACATGCATAATATTAAAACCCGACATTTTATGAATTTCTTTTTGATCTTTTTCAAGAGCACCTTGCAACCAAATCATATAATTTATTTCATTTTCAATGATTGTAAAAATTTTTTCGTTTAATACGGGTAATATTTTAAAAAAATTCATATAAAATTTTTTAAAATGAAAGAAGTTCTTTGTAAAATACTTATATTCTTCATTGGTATCCATTCCTCTTAAAGCATTTTCTACCCTAAGTTTATTTATGTATACAATTAAGGATGGCCTTAATACTTGGTATAGTCTTTTTTCTAAATCAAACATTATCGATTCAGAAACCTTCTCAAAATTACCAAAAAAATCAATTTTATTTATTATCATATTTTTAAAGAATACTACATATGTTTGTATTATTTTTTGAAAGTTATAGTTTTCGTCATCCTTAATGACATCATTACATGAAAAACCCTCTAAAAAGATAACAAAATTTTTTTCTAGATATTCTTTTGTTACTAAAGAATGTGATCCATTATCTATTGTATTCAAAATTCCATGTGAATTATTTTTCATAATAACCTCTTCTTTTTATAGGTAGGAGAAACAGTTAGAAATTCATTCTCATTAAAGAGAGTAAGGCCTTTCGCCTTACTCTCTTTAACTATTAGTAGCATTTTTTACAAATTCTTTGTCCCATAATCGTGACACATGCTACAGGCACATTCCAAGGACAATCTTTAGTTAAAGTTTCAACCCAACCAGGTCCTGCTCCACCAACTAATTGTTCAATCTCTTCATCTTCTACGACTTGTAAATATTTTTCAGTTTCCATTTCAATACCTACCTTTTATTTTTTTGAAAATGATGTTATAAATTCAATAATTTACCACAACATTATCTAAAAACAATATACCATTAAATGTTAATAAAAAAAAGATATATTTATAGAAAAATAAAATAATAGGATATTTAACTATATTAATATAATTAAAGGAGACTCACTTACTATATCATTAAAATTTAAAGGCTATTTTCCAAGTTAATATTTTTAAAATTTTATAAATTTATTAATAATCAATGCTGTTTAATAATTTAAATAATTCTTCTAATACCTGAATAACGATTAAGAATATTAATTAAACTTGTAAAAAATAGATTTTCAGAAGATATAGCTCAAGAATATTTTACCATAAAATAGGATAATCAATTCTTTTTTTAATAGTATAATGCTTCGTCAGAAAATTTTAATCATGAATTGCAAAAAGCATGTTAAAAATAAAAAGATGAAATTGCAAGTATATTCACTTCACAATCCCATCTCTTCTAAAAATTGTATATCATCAACTCTCTTGATCCTCGCAATTCTACCAAATTAATAGCAACACATTATGTAGATATAGGTTTATACTATTCTACATTTAAATACATGTAATATTGCCTATACTTTGTTATTTTCATGTGAAACTAATTCACTATATATCTCTTTCTCTATATTTTTCAAGAGAATTTCCAACCTATTTTTTGTAGCTAGATTAAAATGTGGATTTAATTCTTTTATTATAAATTCAAGTATCTCAAAGCTTATGTAATCTCCTAATTCAAGTGAATATTGATCACCCAAGAAAGTTTTTAATTTTTTAGAGATCTCCACTTTATCTTTTTTGGTAATATTTACCCCATCCGATAACAAACCTCTCTTAGATTAGTCTTAAATACTAATTCAAAAAGTTTTTATAACATAATAAAAATATTTTGGAATTAATATAAATCATATAATGAAATTTTTACAAATAAATAAAAAATATACATCATAATCATTAAACAATATATTAAGCATCGTAAACATATATTGGCTCGTCATAAAGGCGTAAACATTTTGACCAATTTCCACATAACGTTTCCCATTCACCAGCAGGATATACTTCCGTACGCAATATATGAAGTGGTATACGTAAATTCAATAACCGACCACTAAGCGTCGTACGAGTAACTCCCGCTGGCATTAAAAACTTTGCTTCAATTACCTTTTTCAGTTCCATCAACGAGTATTCTGGATAACTCATCAATATTTCATTTGAATGAGGCCAGACCTCAGTATCATTTGAATGACGGCGTACCGTATACTCATGTTGATATAAGCTGACAATACGATGCCATATCTCTAAATGATTAAATAAATCGCTATTTAAATCCCTTGCATATAAAAAATGTACTTGTTTATTTGCTTCTGTAATTTTTGCAAGCAAACGTTTATTAGCAATAATCTTACCTGACCAAATTATTGTAGGTTCTTTCTTTAAATTATTTACCCATTCCCCCATTGGTAATAAATGATTCCAAGCACGAATTTTTATTTTTTCATCAGGTACTACCTGTACAGCAACGCGTTTACAACCTAATGCCATCATAGCCTTCATACGATGGGCACCATCAAGTAACAGATAATTACCATCACTAAGCTCAGATACTAAGGGAGGATTCCGTAAAACACCTTCGCTTTCCATTACACGACAAATTTGTTCCAATCTTTTGTGTTCATATGATTCATGAAAGCGAATTTGCTTGGGATGTAAGAGATCTAAGGCCGAAATAATTTCACTCATAAAAACAGCTCCTCAAATAATTACATAAATTCTAAGCTTATCAAATAACTAATTTTCTATAATGATTAGTCGTGTTTACCGATATTTAGCAGCTATTAGACAATTATTTCAAACTACGTTTAATAATTTTACTAGTCTTCTAATAAAACACCTATTTCCTTTTCCCTAGCCCTTTGATAGTAATAAGCAGCAATGGCTACATCAAAAATGGCCATACCCATTGGATTAAATAATATTGCTTTGTCATAAGGGAAGTTTTGCAGTGCTCCTCGGATTACAACGTCAGCAATTGATACTGTATCTTCTTTTTGTAAACCTCTTTCAAGGTGCATCCTTTCAACATCAGTTTTTTCACGACAGACTTCTTCCCAATTATCAACAACTAGGGATTTTGTATATTCTAAGATATCAGGCTTAAAATCACGAAGAGAAACATTGAGTAGTAGCGCTCCATCTTTAGGTTTCTTATCAATATAACCTTCTGCAGAAACAGTACAAGTAATAAAAATATCAGCATCATTATAGGCTTCTTCATAAGCATTAACCTTTTGAGTTTTAGAATAAATATCCTCAGGAATTAATTCGTCTTTAATTCCATTGATATCATATAGGTAAACACTTTCGATTTTATCGCCTAATAGGGCAGTTACCATCTTTAAATGCATCTGACCAATAGGTCCAAATCCAATAATTCCAACTTTAACATTATTTAAATCTCTTAACTTGGCAAACTCTCGAATCATTAACCCTGTTACAGAAGCTGTTCGTATTACACTGACCATTGCTGTATTAAGCGTCGCAAATGGTTTTCCAGTCATAGCATCATTTAATATAGTAACTGAATGAGCACGTTGAATACCTTTCTCAATATTCTTCGGAAAACTAGCTATCCATTTGATTCCTGAAACCTTAAATTCTCCTCCAATATAAGCTGGCATAGCAATAATTCTATTAGCCGGATCATCAAAACGTAAGTATGGCTTAATTGGCTGAGAAAAATCTTCTGCATCAAGACTTTGTACTGCCTTTGAAATGACATTTATTGTTTCTTCCCAATTAACACCCATTTCATTGATATGTTTAGTATTTAAATACATCATAATTGCTTCCTCATTTCTATATCATTTTTATAAAGAAACTAATTGGTCTTCTACTGATTTTTTTGTATTAAGCCATTCTACCCATTCTACGTTGTAAATAGTTGATGTGTAAGCTTGTCCATTATCAGGACACAAGAAAACAACATTAGGTGTATTTTGAACATCCCTATTTTCAAAATACTTTTGGATTGCATAATATGAAGTCCCTGATGATCCACCAGCAAATATTGCATGTTTATTAAATAGCTCATAACAACCTGCAACAGTATGGACCTCCGGTACAATCATTACATCATCAATCAATGCTTTTTTAACCATACCAGGTATCATACTGGCACCAATTCCTGGTATGTACCGTTTACGAGGCTTATCACCAAAAATAATGGACCCTTGACTATCTACCGCAATAATTTTAATATTAGGGAATTTTTCTTTTAATCGTGTAGATACCCCAGCGATAGTTCCTCCAGTACTCACTCCAATGAAGGCATAATCTAACTGTTTAAAATCATTAGATATTTCTTCACCAATTCCTTGATAATGGGCTTCAAAGTTATCAGCATTATTATATTGATTTGTCCAGTATGCATTAGGAATAGTATTTAAAAGTTCTTCCACCTTATTTAAACGCGTTAATAAATAGCCACCTGTTTCATCTCGTTCATCCACTTTAGCTACTTGATAGGAAGTTGCTCTCAAAAAATTCTCATAACTATCATTAATATTTGGATCAATAACTGGTATGAATTTCAGGCCGATATACCTACAAAGAGTAGCAAGAGCAACCGCAAAGTTACCAGATGAAGATTCGATAATTGTAGAATTTTCAGTCACTTCACCGCGACTTATTGCTGATTTTAAAATATGGTGAGCAGCACGCACTTTAACACTATTCATTAAATTGTGATATTCTAGTTTTGCATACAGATTAATCTTTTCATGCTCTAATTTAATCATAGGTGTGTTGCCGATTACCCTTTCTAAACTCTCTAATTTCTTGAGCATAATTTGCTTCCTTTCTTGGTTAAAATCTAAAAAAATAATTAAAGATAAATAGTTGTAAATGTTCTTTCGAATGTATTTCAAATAGAACTTATACCTGAAAGACAATAAATGCTAGATTCTTTAGTATCGATAAGCTAAGCACCATTCAAGTACTGCCATAGCACTATTTAGCTTGTTTTCGGCCTGGGTAAATACAATACTTTTCGGTCCATCAATAACTGTTGCTGCTACTTCTTCCCCCCGGTGTGCTGGGAGGTCATGCATGAAAGTAGCTTCCGGATATAAAGTCAAAATTCTCTCTGTCACTGCAAATGGATTAAAGGCATCTTTCCATAAAGGATCTATTTTGCTTGTCCCTGTTGTTTGCCACCTTGTTGTATATACAACATCCACTTCACTAGGTAACATTTTTATATCATGACATTCAATAACTTTAGCACCAGATAATCGTGCATATTCTTTTGATTTTTCAAGCACACTTGGAGACACTCCATAACCAGCAGGTGTAAAAAGATATAACTCCGTACCTGGAAAACGAGATAAAGAAAGAGCTAGAGCTGCAGCACTGTTATTACCTTCTCCCATATATAAAACCCGTAATCCAGCTATTTCACCAAATTTTTGTTTCATTGTTGTTAAGTCTGTTAGCGCTTGTGTTGGATGTTCTTCCGTACTCATTGCATTAATGACTGCCATACGACTTTGAGATGCCAGCTTTTCCATTTCCTTTTGACTATCTGCAGTTCTTGCTACTAGTGCATCCAGCATTCGCGAAAGTACTTGAGTAGTGTCTTCTATAGACTCCCCTGTATTTTCTTGCAAATCATTTGGACCATACGTTACAATTTGTGCACCCATCTTTAATGCTGCAACAGAAAATGCCGATCGAGTTCTAGTCGATGTTTTACGGAAGTAAATCCCAATTATATTTCCCGCTAATATTTGATTAGGTTGTGATTTACCTGTGGCAAACTCTACCCCACGAGTTACAATTTGGTTAATATCACTATCAGTAAGATCCTCAAGAGTAATTAAATGTTTTCGAATAGCCATTATTATATTACCTCCCTTTTGAATAATTGCATGTAACAAAAATGTATTTACATATGTATGTAAATAAAAACTACTAAACAGCTAAAGTAAAAATAATAGATTGGAAATATAATTCATGTGGATCCATTACCACACTTTTCTATAATTTTTTTTAAACTAATGTATAGCAGTTGATATATAGCCTATATGAACTCGAATACTATTTTTACTCAGAAATACTTACATATAATTCTATTGTTAGTCTATATTCCTATTTTTTATGATATAGCTCTGCTGAATATATTACACACTACAACACTAAAAATATACTCAAAATACAGAATTTTCAGCTTTCTCAATTCTTTCATCAATTTTCTGTCAATTCATACTCAAATTATCTTTTTTACTAATAATATACATACAATGTTGGCTGTTTCTTTTATAGCCCACTTCTAAAAAATCCACCCATTCAAAAATAACGTTTTCCGGATTGTCAGTTAAGATACAAATTACTGCTGCAGTCATTAAAGAGTCCTCTTTAAAAGAGATATACTCCTTCTACTGGCTCGCTATTTTTAGTTATGATAACCTCTTAATTGTTACAGCTATAACGCTAACAATTGTTATTGCAACCATTGATCTAAGTAAACAAACAATGAATGACATACTAATATATACACAAAGTAATCTATTAAGCGTAAAATGATTGCATCTCCAGCAAGTTTAATTACTTATTCAGATCTTTGCTTTTATAATTTTCAAAATTAAATTTCCGAAAATTTTCTTTCCTTAAGGAAATATCGATCATTTTAAACTATGTAAGAATGCTTATCTTCTATAAAGTAATTACACAAATCAACAAAGTCAATTTCAATTATTGATTCTGGAAAACTATCTAAAGAAGCACAACAAGAAAGCTTATAATCTTTACCAATAGTAAACTTTTTATAATAAACCTCATTAGATAGCATGTCACCATAAAATATTATATTTTTTTTAATAACAAAAGAGTTTAAAGGTATAGTTAGTCCTTTCCCTATCCATTTAACAAAGCTTTCTTTTATAGTCCATAATTCATAAAACACATCTAATTGTTGTTCTACATTTAATGAATTTAAATAATTAAATTCTTCCTCAGTAAATATATTTTTAATCATTGTAGTATCCATGGGCCGGACTTTTTCTACATCTATTCCTACTTCTTCCTCATGAATAGCTCCAACAACCCAACTTTCGGAATGCGATATATTAAAATGAAAATTATTTAACTCATCCACATAAGGCTTTCCGTATTCATTGTATTTATATTTAATATCTTTATTTTCTTTTGAGAAATTTGTAATAATTAAATATCTTATTATAATGTCTCCAATTAAGGCGCTATATTGGTCGGGCTTTCTTTTATATTTCTGTATTTTCATCTGCTTTTCTTTTGAGACCAAACTCATAAGTTTCTGCATAATGTTATGTTCTATATTTCCTGGCACACGTACCTTATATATATTCATTATTTCCTTCCCACCTAACTTCTTATATAAATATAAAAAACTCTCAATGAATAATTCCTATTTATTTTTGATTAGTAAAAAACATTTTCAGAGGATAATATTCTATTTTACATTCATATTTTCGACTTATTTTCTAACCTCTTTATGTCTATTATTGAATGGGCTCAAAGAAAAAAGTCTCATAATATGTTATACACGAATTTATGAGACTTTTAGAAAACTAATAGTGTACAATTTCAAATAATATTTCACCTATTTACATGTGCTAATAAGAGCATTTGTTTAAAACTCATTTTTATACTCTTTACCAACTCATGAATATGCGGTTTCTCTAACATACTCAGATGGGAACCTGGCACTTGCAAAGCGTATACTTCCCCTGATGTATATTCATTCCATCTGTTATAATCTACTAGTGGGTGAATGTCATTAATAGATGCATTAAATAGAAAGATATCTGCTTTTATTTTTTGTTTACAATTATATTTTAGGTATGCATATCTATTAGCTATCATTACTTTTAACTTATTCATCATTGGATCTTCAAAATTTTGCTGACAACTATTTTTATTCAATGTAAATTTTTTCAGTAAAGATTCTATTAATTGTTCTTCACTCATTTGCTCAAAACTAATTTTCTCAATCCCTAACTGATCATTGAATTTTTCCAATTCTTCAAAGGCATTCTTAATATTTAAACTAAGAATCTCCTTACCTTGTTCAATAGGATGAACATCAAGTAAACCTAGAAAACTAACTTTATCACCTAGTTCTTCTAATTTTCTAGCCATTTCAAATGCAACTATTCCTCCAAAAGACCATCCTAATAAGGCATATGGCCCTTCTTTTTTTACTTGTTTAATCTCTTCTATATATCTTACCGCCATTTCCTCTACAGATAAGTTTGAAAATCTATTATCATCATATCCTATAGATTGTAACCCATATACAGTCTTATCCTCTCCTAATTCTCTAGCCAAATCATAATAGTTTAATATGCCCCCTCCTTGTCCATGAACAATGAACCATTGACTATCCTTGTTTGTCCCATTTTGAATTGGAATTAAACACTCACTGTCTATTCCTTTATTCCTACTTATTACATCACTTAATTGTTCAATAGTAGCATTTTGAAATAATAAACTTAATGGCAGTTGCACATTAAACATTCTCTTGATATTTTCGAATAACTTTAATCCCTTAAGAGAATGACCACCTAATTCAAAGAAATTATCATTTATACCAATATTATTTACGCCTAATATACTACTCCAAATATCAATTAAACTACTATCTATGTCATTTCTTGGTGGTACATAATTACTATTGTCCAGTGTATTTAGCTTCGGTAACTTACTTCTATCTATTTTCCCATTTTGTGTTAATGGTATATTGTGAATTGGTATGAGTTGTTGAGGTATCATATAATGTGGTAACTTTGTTGCCAAATATGCTCTTACCTCTGGAATAGGAATATCTTTTTCTGTAACTACGTATGCACATAAATACTTTTCCCCAGCTTCATCTTCTTGATCTATTACTACGGCCGTTTTGATTGTCTCATATTTTAACAAACTCGCTTCTATCTCACCTAGTTCTATTCGATATCCCCTTATTTTTACTTGATGATCGACTCGGCCCAAGTATTCAATGTTACCATCAGGTAGCCACCTTGCTATATCCCCTGTTTTATATAGTTTCTCACCTCGTTCAAATGGATGATCAATAAATTTATCTGCTGTTAATTCTGTTCGATTGATGTATCCTCTAGCTAACCCTATGCCAGAGATACATATTTCACCTGGAACACCTATAGGTTGTATCCTATAAAATGAATCCAATATATAAATTTTAGTATTTAACAATGGTGAACCTATCGGTACGTTTTGTGTTGTAATTTCTTTATTACTCTCATATCGATAAGATGTACAACAAACTGTAGCCTCTGTAGGTCCGTATCCATTTAATATTTGGAGGTTCCCCCTAAATAGATGATCGTATTTTGCGAGTAATTCTGTTTTAATGGGTTCTACTCCCACAAGTAGCTTATTTAACACTATCTTCTGGTTATCTCTAACAAAATAATCATATATTTCATTTAATAAAGTAGGTGGAATATATGATAATGTAACCTGTTCTTCAAGAATAACTTGTACAAGTTTTGATACATCAAACTTCTCACCTTGATAAATAGTCATTCTTGCGCCATATATCAATGGGACAAATATCTCAAAAATAGTAACATCAAAAGAAATACTACTTGAGAATAGAACGTTATCAGTTATCCCTATATCTTGAGAGAAATCTTCATACATTGCACACAAAAAATTAGTCAAGGATCGATGTTCAATCATTACTCCTTTAGGTTGTCCTGTAGAACCTGATGTATAAATAACATACGCTAAGTTGTGAGGTTCTATCATCATTTGCATGTCTTCTCCTGGTTCTTCTTCAAAAGACATATCCATTAGATCTATTACATTACCTTGGAATTCTATCCCCTTTATAATAGAGTTTTGATGTACTAATACATGTGAACACCCACTGTCTGTCAGCATATATTCCACTCTTTGTTTTGGTAAGGCGGTATCAATTGGTAAATACGCCCCACCTGCTTTTAAAACACCTAATATACCTATAACCATCTCGATGGATCGTTCCATCATCACACCAACAATGGATTCTCTTTTAACTCCTTGGTCTAATAACCTTCTCGCCAACTGATTAGCTTTTATATTTAATTCATTATATGTAATTCCTTTTTCATTACATACAACGGCTATTTGATTAGGGTTCCGTTTTACTTGTTCTTCAAACATTTTATGCACTAATAAATGATTAGAATTTGAATTCTCTTTTTTGTTAAATTCATTCATAATACAATGTTCTTCTTCTATAGATAACATATTAATATTACGCAATCGTACTCTAGGGTTATTAGTTACTTCCTCAACTATATTTGTAAAATGTACCATTAACCTTTCTATTGTCTCCGCTTTAAATAACTTAGTACTATATTCTACTTTTAAATGAATGTTATTATCTATTTCCGTTGCTACTAATGACAAATCAAATTTTGAAACTGACTGCTTAAACGGATACGGTGTAAATTCTAATTCACCAATAGATATTGGATTCATATCCATGTTTTGAAAAACAAACATGGTATCAAATAATGGATTTCTACTTGTATCCCTATGCAAGTCTAAACCTTCTAATAGTTCTTCAAAAGGATAGTCTTGATTTTCATAAGCTTCTAATGTATTGAGTTTTAATCTACTCAAAAACTCAATAAACTCATCGTCATTTTCTAGATAATTTCTCATTACCAAAGTATTAATAAACATACCAATCATATGATTAGTATCAGAATGAGACCTTCCAGCAATAGGCGAACCTACAATAATGTCTTCTTGACCTGTGTATCTAGATAAAAGTATGTTATAAATGGCCAATAAAATCATATATGGCGTAGTACCAGTTTCAGTTGCTAGCTTATTTACTTTAAAAGTTAAATCTCTTCCCAAATTAAAAGAACAAACATTACCTTTAAAGCTTTGTATAGTCGGTCTTTGAAAATCGGTTGGAAAATTTAAAACCGGGAGTTCTCCTTTTAAAGTTGTTAACCAATAATTCTTTTGTTCACTAATTAGATTCTTATAGTAAGGTCCATTTTGCCACATCACATAGTCTTTGTACTGCACTCTCAATTTTGGAAGCTCATTTCCTTTATACAATTCTACAAACTCTTTTATTAATATCCCCATTGATAAACCATCAGATATTATATGATGCATATCTACTACAAGGATATGTCTTTCTTCTGCTATCCTTAAAAGCAACACCCTTAATAATGGAGGTTTTGATAAATCAAATGGACTTATAAACTCATGTATTAAATAATCCGCATCTTTTTCATTTACATGAACGTATTCAATATTGAAATCTACATTAGGCTCAATTTTCTGCACTAATTCCCCATCTAAAATTTGAAAAGAAGTTCTTAATATTTCATGTCTCTCAATTAAAGATTGAAATATATTTTCAAACTTATCTTTACAAATATCCCCTTCTACTTTAAGTATTGTGGGCATATTATAAGTTGTATTTGTACCATCTTCGAATTGATCTACTATAAACATTCTCTTTTGTGACGTAGAAGCTAAATAATACTCTTGTTGTTTTACAGGTTCTATGGAAATATAATTACTTTTTTCCATTTCTAATATACATTTTGAAAAATCAACCAAAATAGGAAACTTAAATAGGGATTTAATAGATAATTGCACATTAAATTCTTTATTAACGATAGAAATTAGCCTAGCAGCCTTTAATGAATGACCACCTATCTCAAAAAAATTATCCCGTATTCCTACCCTTTGTATTCCTAATACATCTTTCCAAATCTCAACTAATTTTCTTTCTGTAGAATTTGTCGGCTCTAGATGACTAGATTTCAAATTATTTATAGGTTGAGGTAATTTTTTTCTATCTATTTTTCCGTTTTGTGTTAACGGTATGTTTTGAATGGATATGATTTGTTGAGGTATCATATAATATGGTAACTTGGTTGCTAAATATGCTCTTACCTCTGGAATAGGAATATCTTTTTCGGTAACTACATACGCACATAAATACTTTTCTCCACTTTCATCTTCTCGCTGTATTACTACGGCGGTTTTGATTGTCTCATATTTTAACAAACTTGCTTCTATCTCACCTAGTTCTATTCGATATCCCCTTATTTTCACTTGATGATCAACTCGTCCCAAGTATTCTATGTTACCATCAGGTAGCCATCTCGCTATATCTCCTGTTTTATATAATTTCTCACCATGTTCAAATGGATGATCAATAAATTTATCTGCTGTTAATTCTTTTCGATTGATGTATCCCCTAGCTAACCCTATGCCAGAAATACATATTTCTCCTGGAACACCTATAGGTTGTAGCCTGTGAAATGAATCCAATATATAAATTTTAGTATTTAACAATGGCGAACCTATTGGTACGTTTTGTATTGTAATTTCTTTATCCCTCTCATATTGATAAGATGTACAACAGACTGTAGCCTCTGTAGGTCCGTATAAATTTAATATTTGGAGGTTCCCCCTAAATAGATGATCGTATTTTGCAAGTAATTCTGTTTTAATTGGTTCTACTCCCACGAAAAGTTTATTTAACGATATCTTCTGATTCGCCCTTACAAAATAATCATATATTTCATTTAATAAGGTAGGTGGAATATATGCTAACGTGACCTGTTCCTCAAGAATGACTTGTACAAGTTTTGGTACATCAAACTTCTCACCTTGATAAATAGTCATTCTTGCTCCATATACCAATGGGACAAATATTTCAAATATAGTAACATCAAAAGAAATACTACTTGAGAATAGTACATTATCACTTATCCCTATATCTTGAGAGAAGTCCTCATACATTGCACACAAAAAATTAGTTAAGGAACGATGTTCAATCATTACCCCTTTGGGCTGGCCTGTAGAACCTGATGTATAAATAACATATGCCAAATTTTGAGGTTCCATCGTTATCTGCAAATCTTCTACTTGTTCTTCTTCAAAAGGAATATCCATTAGATTTATTACACTACCTTGAAATGCTACTCCCTTTATAATAGAGTTTTGATATGTTAGTACATGTGAACACCCGCTGTCTGTCAGCATATATTCCACTCTTTGTTTTGGTAACTCGGTATCAATCGGTAAATACGCCCCACCCGCTTTTAAGATACCTAATATGCCTACAATCATCTCAATAGAGCGTTCCATCATCACACCAACAATGGATTCTCTTTTAACTCCCTGGTCTAATAACCTTCTCGCCAACTGATTAGCTTTTATATTTAATTGTTTATATGTAATTTCTTTTCCATTACATACAATCGCTATTTGATTAGGATTCTGTTTTACCTGCTCTTCAAATAATTGAGGAGCTGTTACAGTCTCACAAAGTAATGTTTTATGAACTCTAGTCGTATGATTGAAATCAAATAATATTTGATTCTTTTCTGTTTTAGGCATTACATCTAGATCCATTGCAGATTTATTTGGATCTTTCATTAATATATCTAAAATATTATATAAATGATTAACTATTCTACTAACTAATCCTTCACTATATAAAATAGAATTGTAATCCACTTGAACCTTTAACTGTTCTTCATTTTTCATAAACCTAATAACCATATCAGAGTTTATTTTATCTGTACTCTCATAACAATGAATATCATCTAACATTACTATTGTATTTAATAAGGGAAGATTATTACTCTCTCCATCTAGACTTAATAATTGAGTAAGTTTATTAAAAGGAAAATGACAATGTTCATTTGACTCTAAAATTGTTTCTTTAATTTTATATATTATTTCTTTAAAATTATCTTCTTGATTTATTTGAGTACGTAATAATAAAAAGTTGTTTTGAAAAACCGTCTCTTCTTGACCTTGTTTAAA |
Table 3. The contents of “Bt_all_genes.table”
- | Cry1Ac16 | Cry1Ca15 | Cry1Da2 | Cry1Ia40 | Cry2Aa10 | Cry2Ab35 | Cry78Aa1 | Cry9Ea9 | HMM_Cry_len_492 | HMM_Cyt_len_531 | HMM_Cyt_len_533 | HMM_Cyt_len_615 | Sip1A2-other | Vip1Aa2 | Vip3Aa12 | Zwa5B-other |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1126_1 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 32.66 | 100.00 | 100.00 | 99.07 | |||||||
AFS094730 | 36.14 | 1 | 1 | 92.41 | 29.46 | |||||||||||
AFS095482 | 1 | 1 | ||||||||||||||
AK47 | 100.00 | 100.00 | 100.00 | 1 | 100.00 | 100.00 |
Footnote: The float number represent the identity of blast search, and the integer number represent the number of toxins predicted by HMM and SVM.
License
BtToxin_Digger is free software, licensed under GPLv3.
Feedback/Issues
Please report any issues about usage of the software to the issues page.
Citation
If you use this software please cite: Hualin Liu, Jinshui Zheng, Dexin Bo, Yun Yu, Weixing Ye, Donghai Peng, Ming Sun. (2021) BtToxin_Digger: a comprehensive and high-throughput pipeline for mining toxin protein genes from Bacillus thuringiensis. Bioinformatics, 2021;, btab506, https://doi.org/10.1093/bioinformatics/btab506.
If you used the genome assembly function, please also cite: Liu H, Xin B, Zheng J, Zhong H, Yu Y, Peng D, Sun M. Build a bioinformatics analysis platform and apply it to routine analysis of microbial genomics and comparative genomics. Protocol exchange, 2021. DOI: 10.21203/rs.2.21224/v5+.
FAQs
Updates
v1.0.2
- Fixed a “Can not find path” error.
v1.0.3
- Fixed a bug of “get_all_info_nucl.pl”, which can not get the gene location and strand information of some toxins.
v1.0.4
- Updated the database and models to support the latest clasiffication of Bt toxins.
v1.0.5
- The name of strains with no toxin found will be outputed into the file “Strains_without_toxins_found.txt”.
v1.0.6
- Update to adapt to PGCGAP v1.0.23, and to delete the intermediate files.
v1.0.7
- To delete the intermediate files.
v1.0.8
- To delete the intermediate files.
v1.0.9
- Update the database to keep it consistent with the BPPRC (to 2021-06-07).
More HMM models are added.
The filter length of amino acid sequence was reduced from 115 to 75.
Solved a problem: BioPerl reported an error when the length of the sequence in the input file was inconsistent.
- Update the database to keep it consistent with the BPPRC (to 2021-06-07).
v1.0.10
- Users can use the command “BtToxin_Digger –update-db” to update their local database.
The toxin database was updated to 2021.08.02.
- Users can use the command “BtToxin_Digger –update-db” to update their local database.