PGCGAP is a pipeline for prokaryotic comparative genomics analysis. It can take paired-end reads, Oxford Nanopore reads, or PacBio reads as input. In addition to genome assembly, gene prediction, and annotation, it can produce common comparative genomics results such as phylogenetic trees of single-copy core proteins and core SNPs, pan-genome, whole-genome Average Nucleotide Identity (ANI), orthogroups and orthologs, COG annotations, substitutions (SNPs) and insertions/deletions (indels), and antimicrobial and virulence gene screening, all with a single command line.
PGCGAP has been tested on Windows Subsystem for Linux (WSL), Linux x64, and macOS. Because it depends on a large number of third-party programs, installation with Bioconda is recommended.
Step 1: Install PGCGAP
$conda create -n pgcgap python=3
$conda activate pgcgap
$conda install pgcgap
(Users in China can use the TUNA mirror instead: "conda install -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda pgcgap")
Step 2: Set up the COG database (run this once after installing PGCGAP for the first time)
$conda activate pgcgap
$pgcgap --setup-COGdb
$conda deactivate
Users who have Docker installed can alternatively pull the PGCGAP container:
$docker pull quay.io/biocontainers/pgcgap:<tag>
(see the pgcgap tags page for valid values of <tag>)
$pgcgap --help
$pgcgap [modules] [options]
$pgcgap [Assemble|Annotate|ANI|AntiRes|CoreTree|MASH|OrthoF|Pan|pCOG|VAR|STREE|ACC]
$pgcgap Examples
$pgcgap --setup-COGdb
Modules:
[--All] Perform Assemble, Annotate, CoreTree, Pan, OrthoF, ANI, MASH, AntiRes and pCOG functions with one command
[--Assemble] Assemble reads (short, long or hybrid) into contigs
[--Annotate] Genome annotation
[--CoreTree] Construct a phylogenetic tree of single-copy core proteins and a SNP tree of single-copy core genes
[--Pan] Run the "roary" pan-genome pipeline with GFF3 files, and construct a phylogenetic tree from the single-copy core proteins called by roary
[--OrthoF] Identify orthologous protein sequence families with "OrthoFinder"
[--ANI] Compute whole-genome Average Nucleotide Identity ( ANI )
[--MASH] Genome and metagenome similarity estimation using MinHash
[--pCOG] Run COG annotation for each strain (*.faa), and generate a table containing the relative abundance of each flag for all strains
[--VAR] Rapid haploid variant calling and core genome alignment with "Snippy"
[--AntiRes] Screening of contigs for antimicrobial and virulence genes
[--STREE] Construct a phylogenetic tree based on multiple sequences in one file
[--ACC] Other useful gadgets (currently only 'Assess', which filters short sequences out of the genome and assesses the genome's status)
Global Options:
[--strain_num (INT)] [Required by "--All", "--CoreTree", "--Pan", "--VAR" and "--pCOG"] The total number of strains used for analysis, not including the reference genome
[--ReadsPath (PATH)] [Required by "--All", "--Assemble" and "--VAR"] Directory containing the reads of all strains ( Default ./Reads/Illumina )
[--scafPath (PATH)] [Required by "--All", "--Assess", "--Annotate", "--MASH" and "--AntiRes"] Directory containing the contigs/scaffolds ( Default "Results/Assembles/Scaf/Illumina" )
[--AAsPath (PATH)] [Required by "--All", "--CoreTree", "--Pan", "--OrthoF" and "--pCOG"] Directory containing the amino acid sequences of all strains in FASTA format ( Default "./Results/Annotations/AAs" )
[--reads1 (STRING)] [Required by "--All", "--Assemble" and "--VAR"] The suffix of the reads 1 file name ( for example: if the reads 1 file is "YBT-1520_L1_I050.R1.clean.fastq.gz" and "YBT-1520" is the strain name, the suffix should be ".R1.clean.fastq.gz" )
[--reads2 (STRING)] [Required by "--All", "--Assemble" and "--VAR"] The suffix of the reads 2 file name ( for example: if the reads 2 file is "YBT-1520_2.fq", the suffix should be "_2.fq" )
[--Scaf_suffix (STRING)] [Required by "--All", "--Assess", "--Annotate", "--MASH", "--ANI" and "--AntiRes"] The suffix of the scaffold or genome files: "-8.fa" for Illumina data, ".contigs.fasta" for PacBio and Oxford data. Other suffixes can be used to match the actual files ( Default -8.fa )
[--filter_length (INT)] [Required by "--All", "--Assemble" and "--Assess"] Sequences shorter than 'filter_length' will be removed from the assembled genomes ( Default 200 )
[--codon (INT)] [Required by "--All", "--Annotate", "--CoreTree" and "--Pan"] Translation table ( Default 11 )
[--suffix_len (INT)] [Required by "--All", "--Assemble" and "--VAR"] (Strongly recommended) The suffix length of the reads file name, i.e. the length of the file name minus the length of the strain name. For example, the --suffix_len of "YBT-1520_L1_I050.R1.clean.fastq.gz" is 26 ( "YBT-1520" is the strain name ) ( Default 0 )
[--logs (STRING)] Name of the log file ( Default Logs.txt )
[--threads (INT)] Number of threads to be used ( Default 4 )
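Several modules need "--strain_num". One way to obtain it is to count the per-strain input files. The function below is a sketch: count_strains is a hypothetical helper, not part of PGCGAP, and it assumes there is exactly one file per strain ending with the given suffix in the given directory (so the reference genome used by "--VAR" must not be stored there).

```bash
# count_strains is a hypothetical helper, not a PGCGAP command.
# It prints how many files in DIR end with SUFFIX, assuming one file per strain.
count_strains() {
  local dir=$1 suffix=$2 n=0 f
  for f in "$dir"/*"$suffix"; do
    # When nothing matches, the unexpanded pattern is left in $f; skip it.
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Usage:
#   count_strains Results/Annotations/AAs .faa    # value for --strain_num
#   count_strains Reads/Illumina _1.fastq.gz      # counts only the R1 file of each pair
```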
Local Options:
--Assemble
[--platform (STRING)] [Required] Sequencing Platform, "illumina", "pacbio", "oxford" and "hybrid" available ( Default illumina )
[--assembler (STRING)] [Required] Software used for Illumina reads assembly, "abyss", "spades" and "auto" available ( Default abyss )
[--kmmer (INT)] [Required] k-mer size for genome assembly of Illumina data ( Default 81 )
[--genomeSize (STRING)] [Required] An estimate of the size of the genome. Common suffixes are allowed, for example, 3.7m or 2.8g. Needed by PacBio data and Oxford data ( Default Unset )
[--short1 (STRING)] [Required] FASTQ file of first short reads in each pair. Needed by hybrid assembly ( Default Unset )
[--short2 (STRING)] [Required] FASTQ file of second short reads in each pair. Needed by hybrid assembly ( Default Unset )
[--long (STRING)] [Required] FASTQ or FASTA file of long reads. Needed by hybrid assembly ( Default Unset )
[--hout (STRING)] [Required] Output directory for hybrid assembly ( Default ../../Results/Assembles/Hybrid )
--Annotate
[--genus (STRING)] Genus name of your strain ( Default "NA" )
[--species (STRING)] Species name of your strain ( Default "NA")
--CoreTree
[--CDsPath (PATH)] [Required] CDs of all strains as fasta file paths ( Default "./Results/Annotations/CDs" ), if set to "NO", the SNPs of single-copy core genes will not be called
[-c (FLOAT)] Sequence identity threshold ( Default 0.5 )
[-n (INT)] Word length: -n 2 for thresholds 0.4-0.5, -n 3 for 0.5-0.6, -n 4 for 0.6-0.7, -n 5 for 0.7-1.0 ( Default 2 )
[-G (INT)] Use global (set to 1) or local (set to 0) sequence identity ( Default 0 )
[-t (INT)] Tolerance for redundancy ( Default 0 )
[-aL (FLOAT)] Alignment coverage for the longer sequence. If set to 0.9, the alignment must cover 90% of the sequence ( Default 0.5 )
[-aS (FLOAT)] Alignment coverage for the shorter sequence. If set to 0.9, the alignment must cover 90% of the sequence ( Default 0.7 )
[-g (INT)] If set to 0, a sequence is clustered into the first cluster that meets the threshold (fast mode). If set to 1, it is clustered into the most similar cluster that meets the threshold (accurate but slow mode) ( Default 1 )
[-d (INT)] Length of the description in the .clstr file. If set to 0, the FASTA defline is used up to the first space ( Default 0 )
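The "-n" recommendations above follow CD-HIT's guidance for protein clustering and can be expressed as a small lookup. This is a sketch: suggest_word_length is a hypothetical helper, not a PGCGAP or CD-HIT option. Where two ranges share a boundary, such as 0.5, it picks the smaller word length, which matches the defaults "-c 0.5" and "-n 2".

```bash
# suggest_word_length is a hypothetical helper, not a PGCGAP or CD-HIT option.
# It maps an identity threshold (-c) to a word length (-n) using the ranges above;
# on a shared boundary (0.5, 0.6, 0.7) the smaller word length is chosen.
suggest_word_length() {
  awk -v c="$1" 'BEGIN {
    if (c < 0.4 || c > 1.0) {
      print "identity threshold must be between 0.4 and 1.0" > "/dev/stderr"
      exit 1
    }
    if      (c > 0.7) print 5
    else if (c > 0.6) print 4
    else if (c > 0.5) print 3
    else              print 2
  }'
}

# Usage:
#   suggest_word_length 0.8    # prints 5, so pass "-c 0.8 -n 5" to --CoreTree
```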
--Pan
[--PanTree] Construct a phylogenetic tree of single-copy core proteins called by roary
--OrthoF
[--Sprogram (STRING)] Sequence search program. Options: blast, mmseqs, blast_gz, diamond ( Default diamond )
--ANI
[--queryL (FILE)] [Required] The file containing paths to query genomes, one per line ( Default scaf.list )
[--refL (FILE)] [Required] The file containing paths to reference genomes, one per line. ( Default scaf.list )
[--ANIO (FILE)] The name of the output file ( Default "Results/ANI/ANIs" )
--VAR
[--refgbk (FILE)] [Required] The full path and name of reference genome in GENBANK format ( recommended ), fasta format is also OK. For example: "/mnt/g/test/ref.gbk"
[--qualtype (STRING)] [Required] Type of quality values (solexa (CASAVA < 1.3), illumina (CASAVA 1.3 to 1.7), sanger (which is CASAVA >= 1.8)). ( Default sanger )
[--qual (INT)] Threshold for trimming based on average quality in a window. ( Default 20 )
[--length (INT)] Threshold to keep a read based on length after trimming. ( Default 20 )
[--mincov (INT)] The minimum number of reads covering a site to be considered ( Default 10 )
[--minfrac (FLOAT)] The minimum proportion of those reads which must differ from the reference ( Default 0.9 )
[--minqual (INT)] The minimum VCF variant call "quality" ( Default 100 )
[--ram (INT)] Try and keep RAM under this many GB ( Default 8 )
[--tree_builder (STRING)] Application to use for tree building [raxml|fasttree|hybrid] ( Default fasttree)
[--iterations (INT)] Maximum No. of iterations for gubbins ( Default 5 )
--AntiRes
[--db (STRING)] [Required] The database to use, options: argannot, card, ecoh, ecoli_vf, ncbi, plasmidfinder, resfinder and vfdb. ( Default ncbi )
[--identity (INT)] [Required] Minimum %identity to keep a result; should be a number between 1 and 100. ( Default 75 )
[--coverage (INT)] [Required] Minimum %coverage to keep a result; should be a number between 0 and 100. ( Default 50 )
--STREE
[--seqfile (STRING)] [Required] Path of the sequence file for analysis.
[--seqtype (STRING)] [Required] Type of sequence: p, d or c for protein, DNA or codons, respectively. ( Default p )
[--bsnum (INT)] [Required] Times for bootstrap. ( Default 1000 )
--ACC
Paths of external programs
These options are not needed if the programs are in your PATH. The essential programs can be checked with the "--check-external-programs" option.
[--abricate-bin (PATH)] Path to abricate binary file. Default tries if abricate is in PATH;
[--abyss-bin (PATH)] Path to abyss binary file. Default tries if abyss is in PATH;
[--canu-bin (PATH)] Path to canu binary file. Default tries if canu is in PATH;
[--cd-hit-bin (PATH)] Path to cd-hit binary file. Default tries if cd-hit is in PATH;
[--fastANI-bin (PATH)] Path to the fastANI binary file. Default tries if fastANI is in PATH;
[--Gblocks-bin (PATH)] Path to the Gblocks binary file. Default tries if Gblocks is in PATH;
[--gubbins-bin (PATH)] Path to the run_gubbins.py binary file. Default tries if run_gubbins.py is in PATH;
[--iqtree-bin (PATH)] Path to the iqtree binary file. Default tries if iqtree is in PATH;
[--mafft-bin (PATH)] Path to mafft binary file. Default tries if mafft is in PATH;
[--mash-bin (PATH)] Path to the mash binary file. Default tries if mash is in PATH.
[--modeltest-ng-bin (PATH)] Path to the modeltest-ng binary file. Default tries if modeltest-ng is in PATH.
[--muscle-bin (PATH)] Path to the muscle binary file. Default tries if muscle is in PATH.
[--orthofinder-bin (PATH)] Path to the orthofinder binary file. Default tries if orthofinder is in PATH;
[--pal2nal-bin (PATH)] Path to the pal2nal.pl binary file. Default tries if pal2nal.pl is in PATH;
[--prodigal-bin (PATH)] Path to prodigal binary file. Default tries if prodigal is in PATH;
[--prokka-bin (PATH)] Path to prokka binary file. Default tries if prokka is in PATH;
[--raxml-ng-bin (PATH)] Path to the raxml-ng binary file. Default tries if raxml-ng is in PATH;
[--roary-bin (PATH)] Path to the roary binary file. Default tries if roary is in PATH;
[--sickle-bin (PATH)] Path to the sickle-trim binary file. Default tries if sickle is in PATH.
[--snippy-bin (PATH)] Path to the snippy binary file. Default tries if snippy is in PATH;
[--snp-sites-bin (PATH)] Path to the snp-sites binary file. Default tries if snp-sites is in PATH;
[--unicycler-bin (PATH)] Path to the unicycler binary file. Default tries if unicycler is in PATH;
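"--check-external-programs" is the supported way to verify dependencies. For a quick manual look, a loop over the shell builtin "command -v" reports which names are not on PATH. This is a sketch: missing_programs is a hypothetical helper, and the names in the usage comment are the binary names given in the option descriptions above, which may not match every tool's actual executable name or the exact list PGCGAP checks.

```bash
# missing_programs is a hypothetical helper, not a PGCGAP command.
# It prints each argument that is not found on PATH and returns 1 if any is missing.
missing_programs() {
  local prog status=0
  for prog in "$@"; do
    if ! command -v "$prog" >/dev/null 2>&1; then
      echo "$prog"
      status=1
    fi
  done
  return "$status"
}

# Usage (names as written in the option descriptions above):
#   missing_programs abricate abyss canu cd-hit fastANI Gblocks run_gubbins.py \
#     iqtree mafft mash modeltest-ng muscle orthofinder pal2nal.pl prodigal \
#     prokka raxml-ng roary sickle snippy snp-sites unicycler
```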
Set up the COG database
$pgcgap --setup-COGdb
Check the required external programs (strongly recommended after installing PGCGAP):
$pgcgap --check-external-programs
Example 1: Perform all functions, using six Escherichia coli strains as an example.
Note: for flexibility, the "VAR" function is not part of "--All" and must be requested separately with "--VAR".
$pgcgap --All --platform illumina --filter_length 200 --ReadsPath Reads/Illumina --reads1 _1.fastq.gz --reads2 _2.fastq.gz --suffix_len 11 --kmmer 81 --genus Escherichia --species "Escherichia coli" --codon 11 --strain_num 6 --threads 4 --VAR --refgbk /mnt/h/PGCGAP_Examples/Reads/MG1655.gbff --qualtype sanger
Example 2: Genome assembly.
Illumina reads assembly
In this dataset, the reads files are named "strain_1.fastq.gz" and "strain_2.fastq.gz". The string after the strain name is "_1.fastq.gz", whose length is 11, so "--suffix_len" was set to 11.
$pgcgap --Assemble --platform illumina --assembler abyss --filter_length 200 --ReadsPath Reads/Illumina --reads1 _1.fastq.gz --reads2 _2.fastq.gz --kmmer 81 --threads 4 --suffix_len 11
$pgcgap --Assemble --platform illumina --assembler spades --filter_length 200 --ReadsPath Reads/Illumina --reads1 _1.fastq.gz --reads2 _2.fastq.gz --threads 4 --suffix_len 11
$pgcgap --Assemble --platform illumina --assembler auto --filter_length 200 --ReadsPath Reads/Illumina --reads1 _1.fastq.gz --reads2 _2.fastq.gz --kmmer 81 --threads 4 --suffix_len 11
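The "--suffix_len" arithmetic (length of the reads file name minus the length of the strain name) can be checked with shell parameter expansion. This is a sketch: suffix_len is a hypothetical helper, not part of PGCGAP, and it assumes the file name starts with the strain name.

```bash
# suffix_len is a hypothetical helper, not a PGCGAP command.
# It prints the length of FILE's base name minus the length of STRAIN.
suffix_len() {
  local file_name strain=$2
  file_name=$(basename "$1")   # drop any leading directories
  case $file_name in
    "$strain"*) echo $(( ${#file_name} - ${#strain} )) ;;
    *) echo "suffix_len: '$file_name' does not start with '$strain'" >&2; return 1 ;;
  esac
}

# Usage (values from the examples in this document):
#   suffix_len Reads/Illumina/strain_1.fastq.gz strain       # 11
#   suffix_len YBT-1520_L1_I050.R1.clean.fastq.gz YBT-1520   # 26
#   suffix_len pacbio.fastq pacbio                           # 6
```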
Oxford reads assembly
Oxford Nanopore sequencing produces only one reads file per strain, so only "--reads1" needs to be set, here to ".fasta". "--genomeSize" is the estimated genome size; the genome sizes of similar strains in the NCBI database can be used as a reference. It was set to "4.8m" here. The suffix of the reads file is ".fasta", whose length is 6, so "--suffix_len" was set to 6.
$pgcgap --Assemble --platform oxford --filter_length 200 --ReadsPath Reads/Oxford --reads1 .fasta --genomeSize 4.8m --threads 4 --suffix_len 6
PacBio reads assembly
PacBio also produces only one reads file, here "pacbio.fastq", so the parameter settings are similar to those for Oxford data. The strain name is "pacbio" and the suffix is ".fastq", whose length is 6, so "--suffix_len" was set to 6.
$pgcgap --Assemble --platform pacbio --filter_length 200 --ReadsPath Reads/PacBio --reads1 .fastq --genomeSize 4.8m --threads 4 --suffix_len 6
Hybrid assembly of short reads and long reads
Paired-end short reads and long reads in the directory "Reads/Hybrid/" were used as input. The Illumina reads and long reads must come from the same isolate.
$pgcgap --Assemble --platform hybrid --ReadsPath Reads/Hybrid --short1 short_reads_1.fastq.gz --short2 short_reads_2.fastq.gz --long long_reads_high_depth.fastq.gz --threads 4
Example 3: Gene prediction and annotation
$pgcgap --Annotate --scafPath Results/Assembles/Scaf/Illumina --Scaf_suffix -8.fa --genus Escherichia --species "Escherichia coli" --codon 11 --threads 4
Example 4: Constructing single-copy core protein tree and core SNPs tree
$pgcgap --CoreTree --CDsPath Results/Annotations/CDs --AAsPath Results/Annotations/AAs --codon 11 --strain_num 6 --threads 4
$pgcgap --CoreTree --CDsPath NO --AAsPath Results/Annotations/AAs --codon 11 --strain_num 6 --threads 4
Example 6: Conduct pan-genome analysis and construct a phylogenetic tree of single-copy core proteins called by roary.
$pgcgap --Pan --codon 11 --identi 95 --strain_num 6 --threads 4 --GffPath Results/Annotations/GFF --PanTree --AAsPath Results/Annotations/AAs
Example 7: Inference of orthologous gene groups.
$pgcgap --OrthoF --threads 4 --AAsPath Results/Annotations/AAs
Example 8: Compute whole-genome Average Nucleotide Identity (ANI).
$pgcgap --ANI --threads 4 --queryL scaf.list --refL scaf.list --ANIO Results/ANI/ANIs --Scaf_suffix .fa
Example 9: Genome and metagenome similarity estimation using MinHash.
$pgcgap --MASH --scafPath <PATH> --Scaf_suffix <STRING>
Example 10: Run COG annotation for each strain.
$pgcgap --pCOG --threads 4 --strain_num 6 --AAsPath Results/Annotations/AAs
Example 11: Variants calling and phylogenetic tree construction based on the reference genome.
$pgcgap --VAR --threads 4 --refgbk /mnt/h/PGCGAP_Examples/Reads/MG1655.gbff --ReadsPath Reads/Illumina --reads1 _1.fastq.gz --reads2 _2.fastq.gz --suffix_len 11 --strain_num 6 --qualtype sanger --PanTree
Example 12: Screening of contigs for antimicrobial and virulence genes
$pgcgap --AntiRes --scafPath Results/Assembles/Scaf/Illumina --Scaf_suffix -8.fa --threads 6 --db ncbi --identity 75 --coverage 50
Example 13: Filter short sequences in the genome and assess the status of the genome
$pgcgap --ACC --Assess --scafPath Results/Assembles/Scaf/Illumina --Scaf_suffix -8.fa --filter_length 200
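For illustration, the length filter behind "--filter_length" (sequences shorter than the cut-off are removed) can be approximated with awk. This is a sketch assuming a plain FASTA file, not PGCGAP's implementation; filter_fasta_by_length is a hypothetical name, and it writes each kept sequence on a single line instead of preserving the original line wrapping.

```bash
# filter_fasta_by_length is a hypothetical helper, not PGCGAP's own code.
# It keeps FASTA records whose sequence length is >= MIN_LEN and
# prints each kept sequence unwrapped on one line.
filter_fasta_by_length() {
  local min_len=$1 fasta=$2
  awk -v min="$min_len" '
    function emit() { if (header != "" && length(seq) >= min) print header "\n" seq }
    /^>/ { emit(); header = $0; seq = ""; next }   # new record: flush the previous one
         { gsub(/[ \t\r]/, ""); seq = seq $0 }     # join sequence lines, dropping whitespace
    END  { emit() }                                # flush the last record
  ' "$fasta"
}

# Usage (genome.fa is a placeholder file name):
#   filter_fasta_by_length 200 genome.fa > genome.filtered.fa
```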
Example 14: Construct a phylogenetic tree based on multiple sequences in one file
$pgcgap --STREE --seqfile proteins.fas --seqtype p --bsnum 1000 --threads 4
The directory where the PGCGAP software runs.
Paired-end reads of all strains, or PacBio or Oxford Nanopore reads, in a directory (Default: ./Reads/Illumina/ under the working directory).
Genomes files (complete or draft) in a directory (Default: Results/Assembles/Scaf/Illumina under the working directory).
QUERY_LIST and REFERENCE_LIST files containing full paths to genomes, one per line (default: scaf.list under the working directory). If the "--Assemble" function was run first, the list file will be generated automatically.
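If "--Assemble" was not run, the list file can be written by hand. A minimal sketch, assuming all genomes share one suffix in one directory: make_genome_list is a hypothetical helper that writes absolute paths, one per line, and fails if no file matched.

```bash
# make_genome_list is a hypothetical helper, not a PGCGAP command.
# It writes the absolute path of every file in DIR ending with SUFFIX to OUT.
make_genome_list() {
  local dir=$1 suffix=$2 out=$3 abs f
  abs=$(cd "$dir" && pwd) || return 1   # absolute path of DIR
  : > "$out"                             # create or truncate the list file
  for f in "$abs"/*"$suffix"; do
    if [ -e "$f" ]; then
      printf '%s\n' "$f" >> "$out"
    fi
  done
  [ -s "$out" ]                          # non-zero status if no genome matched
}

# Usage:
#   make_genome_list Results/Assembles/Scaf/Illumina .fa scaf.list
```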
Genomes files (complete or draft) in a directory (Default: Results/Assembles/Scaf/Illumina under the working directory).
Amino acid files (with ".faa" as the suffix) and nucleotide files (with ".ffn" as the suffix) of each strain, placed in two directories (default: "./Results/Annotations/AAs/" and "./Results/Annotations/CDs/"). The ".faa" and ".ffn" files of the same strain should have the same prefix. Protein IDs and gene IDs should start with the strain name. Prokka is recommended for generating these input files. If the "--Annotate" function was run first, the files will be generated automatically. If "--CDsPath" is set to "NO", the nucleotide files are not needed.
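A quick way to check the naming requirement before running "--CoreTree" is to compare every FASTA header with the file's prefix. This is a sketch: check_id_prefix is a hypothetical helper, and it only performs a simple prefix test (so strain "A" would also accept an ID such as "AB_00001").

```bash
# check_id_prefix is a hypothetical helper, not a PGCGAP command.
# It prints headers in FILE whose IDs do not start with the file's prefix
# (e.g. "S1.faa" -> "S1") and returns 1 if any are found.
check_id_prefix() {
  local file=$1 strain bad
  strain=$(basename "$file")
  strain=${strain%.*}                 # strip the extension: "S1.faa" -> "S1"
  bad=$(awk -v p=">$strain" '/^>/ && index($0, p) != 1' "$file")
  [ -z "$bad" ] && return 0
  printf '%s\n' "$bad"
  return 1
}

# Usage:
#   for f in Results/Annotations/AAs/*.faa; do check_id_prefix "$f"; done
```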
A set of protein sequence files (one per species) in FASTA format under a directory (default: "./Results/Annotations/AAs/"). If the "--Annotate" function was run first, the files will be generated automatically.
GFF3 files (with ".gff" as the suffix) of each strain placed in a directory (default: ./Results/Annotations/GFF/). They must contain the nucleotide sequence at the end of the file; all GFF3 files created by Prokka are valid. Protein sequence files (one per species) in FASTA format are also needed in another directory (default: "./Results/Annotations/AAs/"). If the "--Annotate" function was run first, the files will be generated automatically.
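Whether a GFF3 file carries its nucleotide sequence can be checked by looking for the "##FASTA" directive, which in GFF3 separates the annotation lines from the appended sequences. This is a sketch: gff_without_fasta is a hypothetical helper that lists the ".gff" files in a directory lacking that directive.

```bash
# gff_without_fasta is a hypothetical helper, not a PGCGAP command.
# It prints every *.gff file in DIR without a "##FASTA" line and returns 1 if any exist.
gff_without_fasta() {
  local dir=$1 f status=0
  for f in "$dir"/*.gff; do
    [ -e "$f" ] || continue            # no .gff files at all
    if ! grep -q '^##FASTA' "$f"; then
      printf '%s\n' "$f"
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   gff_without_fasta Results/Annotations/GFF
```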
Amino acid files (with ".faa" as the suffix) of each strain placed in a directory (default: ./Results/Annotations/AAs/). If the "--Annotate" function was run first, the files will be generated automatically.
Paired-end reads of all strains in a directory (default: ./Reads/Over/ under the working directory).
The full path of reference genome in fasta format or GenBank format (must be provided).
Genomes files (complete or draft) in a directory (Default: Results/Assembles/Scaf/Illumina under the working directory).
Multiple-FASTA sequences in a file, can be Protein, DNA and Codons.
Results/Assembles/Illumina/
Directories contain Illumina assembly files and information of each strain.
Results/Assembles/PacBio/
Directories contain PacBio assembly files and information of each strain.
Results/Assembles/Oxford/
Directories contain Oxford nanopore assembly files and information of each strain.
Results/Assembles/Hybrid/
Directory contains hybrid assembly files of the short reads and long reads of the same strain.
Results/Assembles/Scaf/Illumina
Directory contains Illumina contigs/scaffolds of all strains. "*.filtered.fas" is the genome after excluding short sequences. "*.prefilter.stats" describes the status of the genome before filtering, and "*.filtered.stats" describes the status of the genome after filtering.
Results/Assembles/Scaf/Oxford
Directory contains Oxford nanopore contigs/scaffolds of all strains.
Results/Assembles/Scaf/PacBio
Directory contains PacBio contigs/scaffolds of all strains.
Results/Annotations/*_annotation
Directories containing the annotation files of each strain.
Results/Annotations/AAs
Directory containing the amino acid sequences of all strains.
Results/Annotations/CDs
Directory containing the nucleotide sequences of all strains.
Results/Annotations/GFF
Directory containing the master annotation of all strains in GFF3 format.
Results/ANI/ANIs
Comparison results for all genome pairs. The file has five columns: query genome, reference genome, ANI value, count of bidirectional fragment mappings, and total query fragments.
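Because ANIs is a plain table with the five columns described above (tab-separated, as written by fastANI), it can be filtered with awk, for example to list genome pairs at or above an ANI cut-off. This is a sketch: ani_pairs_above is a hypothetical helper, and the value 95 in the usage comment is an illustrative threshold (a commonly used species boundary), not a PGCGAP setting.

```bash
# ani_pairs_above is a hypothetical helper, not a PGCGAP command.
# It prints query, reference and ANI for pairs with ANI >= MIN,
# skipping self-comparisons; assumes tab-separated columns.
ani_pairs_above() {
  local min=$1 table=$2
  awk -v min="$min" -F '\t' '$1 != $2 && $3 >= min { print $1 "\t" $2 "\t" $3 }' "$table"
}

# Usage:
#   ani_pairs_above 95 Results/ANI/ANIs
```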
Results/ANI/ANIs.matrix
File with identity values arranged in a PHYLIP-formatted lower triangular matrix.
Results/ANI/ANIs.heatmap
An ANI matrix of all strains.
Results/ANI/ANI_matrix.pdf
The heatmap plot of "ANIs.heatmap".
Results/MASH/MASH
Pairwise distances between genomes. The columns are Reference-ID, Query-ID, Mash distance, P-value, and Matching-hashes.
Results/MASH/MASH2
Pairwise similarities between genomes. The columns are Reference-ID, Query-ID, similarity, P-value, and Matching-hashes.
Results/MASH/MASH.heatmap
A similarity matrix of all genomes.
Results/MASH/MASH_matrix.pdf
A heat map plot of "MASH.heatmap".
Results/CoreTrees/ALL.core.protein.fasta
Concatenated and aligned sequences file of single-copy core proteins.
Results/CoreTrees/faa2ffn/ALL.core.nucl.fasta
Concatenated and aligned sequences file of single-copy core genes.
Results/CoreTrees/ALL.core.snp.fasta
Core SNPs of single-copy core genes in fasta format.
Results/CoreTrees/ALL.core.protein.*.support
The phylogenetic tree file of single-copy proteins for all strains based on the best-fit model of evolution selected using BIC, AIC and AICc criteria.
Results/CoreTrees/faa2ffn/ALL.core.snp.*.support
The phylogenetic tree file of SNPs of single-copy core genes for all strains based on the best-fit model of evolution selected using BIC, AIC and AICc criteria.
Results/CoreTrees/"Other_files"
Intermediate directories and files.
Results/PanGenome/Pangenome_Pie.pdf
A 3D pie chart and a fan chart of the breakdown of genes and the number of isolates they are present in.
Results/PanGenome/pangenome_frequency.pdf
A graph with the frequency of genes versus the number of genomes.
Results/PanGenome/Pangenome_matrix.pdf
A figure showing the tree compared to a matrix with the presence and absence of core and accessory genes.
Results/PanGenome/Core/Roary.core.protein.fasta
Alignments of single-copy core proteins called by roary software.
Results/PanGenome/Core/Roary.core.protein.*.support
A phylogenetic tree of Roary.core.protein.fasta based on the best-fit model of evolution selected using BIC, AIC and AICc criteria.
Results/PanGenome/Other_files
See the Roary documentation for a description of these outputs.
*.COG.xml, *.2gi.table, *.2id.table, *.2Sid.table
Intermediate files.
*.2Scog.table
The super COG table of each strain.
*.2Scog.table.pdf
A plot of super COG table in pdf format.
All_flags_relative_abundances.table
A table containing the relative abundance of each flag for all strains.
Results/Variants/directory-named-in-strains
Directories containing the substitutions (SNPs) and insertions/deletions (indels) of each strain. See the Snippy documentation for details.
Results/Variants/Core
The directory containing SNP phylogeny files.
PGCGAP is free software, licensed under GPLv3.
Please report any issues to the issues page or email us at liaochenlanruo@webmail.hzau.edu.cn.
If you use this software please cite: Liu H, Xin B, Zheng J, Zhong H, Yu Y, Peng D, Sun M. Build a bioinformatics analysis platform and apply it to routine analysis of microbial genomics and comparative genomics. Protocol exchange, 2020. DOI: 10.21203/rs.2.21224/v5
If you use "--Assemble", please also cite whichever of Fastp, ABySS, SPAdes, Canu, and Unicycler were used.
If you use "--Annotate", please also cite Prokka.
If you use "--CoreTree", please also cite CD-HIT, MAFFT, PAL2NAL, ModelTest-NG, RAxML-NG, and SNP-sites.
If you use "--Pan", please also cite Roary, MAFFT, ModelTest-NG, and RAxML-NG.
If you use "--OrthoF", please also cite OrthoFinder.
If you use "--ANI", please also cite fastANI.
If you use "--MASH", please also cite Mash.
If you use "--VAR", please also cite Sickle, Snippy, Gubbins, ModelTest-NG, RAxML-NG, and SnpEff.
If you use "--AntiRes", please also cite Abricate and the corresponding database you used: NCBI AMRFinderPlus, CARD, Resfinder, ARG-ANNOT, VFDB, PlasmidFinder, EcOH, or MEGARES 2.00.
If you use "--STREE", please also cite Muscle, Gblocks, and IQ-TREE.
Check the log file "strain_name.log" in the Results/Variants/<strain_name>/ directory. If it contains a message like "WARNING: All frames are zero! This seems rather odd, please check that 'frame' information in your 'genes' file is accurate.", this is a snpEff error. It can be solved by installing JDK 8:
$conda install java-jdk=8.0.112
The following error may occur when running the Annotate function:
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.UnsupportedClassVersionError: minced has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:495)
[01:09:40] Could not determine version of minced - please install version 2.0 or higher
Downgrading minced to version 0.3 solves this problem:
$conda install minced=0.3
This error may occur when running the "VAR" function on macOS. It is caused by OpenSSL and can be solved as follows:
# First, install Homebrew if it is not already installed
$/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install OpenSSL with Homebrew
$brew install openssl
# Create symbolic links to the libraries
$ln -s /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib /usr/local/lib/
$ln -s /usr/local/opt/openssl/lib/libssl.1.0.0.dylib /usr/local/lib/
This warning may appear when running the "Pan" function. It is a warning from Roary; the content of line 61 is "require Encode::ConfigLocal;". The warning can be safely ignored.
V1.0.3
V1.0.4
V1.0.5
V1.0.6
V1.0.7
V1.0.8
V1.0.9
V1.0.10
V1.0.11
Optimized the display of help information. Users can check the parameters of each module with the command "pgcgap [Assemble | Annotate | ANI | AntiRes | CoreTree | MASH | OrthoF | Pan | pCOG | VAR]", and can look up examples of each module with the command "pgcgap Examples".
V1.0.12
V1.0.13