NextGENe is compatible with the Applied Biosystems SOLiD™ System, Roche Genome Sequencer FLX™ and Illumina® Genome Analyzer and is designed for a biologist-friendly Windows® environment, significantly reducing the need for additional bioinformatics resources and costs. NextGENe runs on low-cost 64-bit desktop hardware with a minimum of 8 GB of RAM.
SNP & Indel Detection (SOLiD System & Illumina Genome Analyzer)
- 99% Accuracy in SNP Detection
- INDEL Detection (up to 30bp demonstrated)
- Easy Gene Annotation
- Simple Navigation within the Genome
- Rapid review of Variants, Nucleotides, and Amino Acids
- Quick Links to databases
- Easy exporting
- Variant Reports
- Consensus Sequence
NextGENe is a unique tool for resequencing projects conducted on massively parallel sequencing systems, providing greater than 99% accuracy and INDEL detection across short fragments.
Compatible with outputs from the ABI SOLiD system, Illumina Genome Analyzer and Roche 454 sequencing technology, NextGENe features SoftGenetics’ Condensation Tool™ (patent applied for), which statistically polishes short reads and dramatically improves sequencing accuracy. It identifies and groups identical anchor sequences, then elongates the identical short reads by a factor of greater than 1.5.
SNP Analysis

INDEL Detection

A comprehensive SNP & INDEL viewer is included in the NextGENe™ software suite for massively parallel sequencing systems. This new tool, developed at the request of and in collaboration with several prominent researchers, permits easy gene annotation; simple navigation within the genome; rapid review of variations in nucleotides and amino acids; quick links to the NCBI and dbSNP databases; and viewing and export of the consensus sequence. It provides biologists with an easy-to-use interface for reviewing the massive amounts of data generated by next generation sequencing systems.
The Condensation Tool statistically polishes sequence data, removing errant base calls and other sequencing anomalies and significantly increasing base call accuracy and variant detection, while the elongation process allows detection of micro INDELs across short fragments. NextGENe has demonstrated a 30 bp INDEL detection capability in 36 bp reads.
SNP & INDEL detection of Pyrosequencing Reads for the Roche Genome Sequencer FLX™ System
The new NextGENe module specifically addresses the homopolymer-related errors of the FLX system by utilizing the FLX’s high coverage to statistically polish and correct the inherent system errors. Additionally, NextGENe’s exclusive alignment tool incorporates an automated flexibility tool that permits base pair mismatches between the sample sequences and the reference in addition to the absolute alignment value. This allows NextGENe to accurately align reads with long INDELs and identify them as mutations.
The accuracy of the FLX SNP & INDEL data can be increased further through cross-platform alignments. The consensus FLX sequence can be easily compared to either Sanger or Illumina® Genome Analyzer sequence within NextGENe in order to further correct homopolymer-related miscalls and errant alignments. This new consensus can be saved as a reference file for future analyses.
NextGENe’s SNP detection application for data from the Genome Sequencer FLX System is able to effectively identify single nucleotide polymorphisms (SNPs) by accurately aligning sample reads with a reference. Additionally, the NextGENe Alignment tool is designed to match the sequence reads to a user-defined annotated reference sequence. Multiple methods are available for aligning the reads to the reference. Once the reads have been aligned, SNPs and Indels are highlighted for quick identification.
The Sequence Alignment Tool also provides information about amino acid changes, exon-intron boundaries, and copy numbers, and assists with the determination of methylation sites. Interactive reports displaying the variations and statistics can be produced and exported. Variations identified by the software can be linked directly to the NCBI dbSNP database.

The NextGENe FLX SNP & INDEL viewer provides “at a glance” information on the analysis results. The software’s color-coding differentiates between Known and Novel variants and indicates depth of coverage, CDS regions and mRNA. The software also provides Gene Name annotation as well as the resultant changes in Amino Acids.
de novo Assembly
- Automatically forms anchor sequences
- Paired Read (Mate Pair) Assembly forms contigs greater than 100kb (data specific)
- Assembles Illumina Genome Analyzer, SOLiD System and Roche FLX data
- Completely automated, no script writing necessary
- Provides critical review documentation on assembly results
De novo sequence assembly with the short reads from the genome analyzers presents many challenges. With many of the current techniques, it is difficult to assemble the short reads into a large contig. These techniques often create many false alignments due to two major issues: short reads with high base calling error rates and ambiguity within the genome. Short reads containing SNPs and Indels are often discarded, which is problematic for determining copy number variations in applications such as chromatin immunoprecipitation (ChIP), gene expression and transcriptome studies. The NextGENe sequence assembler was developed to solve these problems. The software is able to assemble the short reads into contigs greater than 100 kb (data specific). It uniquely aligns these contigs to a reference genome. The short reads used in the assembly of a contig are recorded to show the copy number and Indel positions. NextGENe is capable of detecting Indels of 1-30 bp.
de novo Assembly
NextGENe statistically polishes high-coverage (20-100x) datasets to remove random sequencing errors and roughly doubles the read lengths using the Condensation Assembly Tool (patent pending). Repeating the condensation removes systematic errors and further lengthens the sequence reads. The polished and elongated reads can then be assembled into large contigs while redundant reads are removed. The first step uses the Condensation Assembly Tool to generate the first assembly. All reads that share the same 12 bp anchor sequence are collected into a cluster. The two 10 bp shoulder sequences are used to sort the short reads into multiple groups, and a consensus sequence is obtained from the short reads in each group. End bases are excluded from the consensus when they are covered by only one read or when multiple reads disagree. The 5’ end is weighted more heavily than the 3’ end because of its higher quality. With 50x coverage, confidence in the condensed sequence is about 99.8%. Then all of the possible anchor sequences with 16.7 million possibilities are calculated
The Condensation Assembly Tool elongated the 35 bp reads to approximately 60 bp while removing many of the random errors produced by the instrument.
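The anchor clustering and consensus steps described above can be illustrated with a minimal Python sketch. It is not NextGENe's implementation: the function names are hypothetical, the shoulder-sequence grouping and the 5’/3’ weighting are omitted, and a simple majority vote stands in for the statistical polishing. It only shows how reads sharing a 12 bp anchor can be stacked on that anchor, merged into a longer consensus, and trimmed at ends covered by a single read or by disagreeing reads.

```python
from collections import Counter, defaultdict

ANCHOR_LEN = 12  # anchor length stated in the text

# 4 possible bases at each of 12 positions gives the "16.7 million" anchors.
NUM_POSSIBLE_ANCHORS = 4 ** ANCHOR_LEN  # 16777216


def cluster_by_anchor(reads, anchor):
    """Collect (offset, read) pairs for every read that contains the anchor."""
    hits = []
    for read in reads:
        pos = read.find(anchor)
        if pos != -1:
            hits.append((pos, read))
    return hits


def condense(hits):
    """Stack reads on their anchor position and take a majority-vote consensus.

    Assumption for this sketch: leading and trailing columns are dropped when
    they are covered by fewer than two reads or when the reads disagree.
    """
    if not hits:
        return ""
    max_left = max(pos for pos, _ in hits)
    columns = defaultdict(Counter)
    for pos, read in hits:
        shift = max_left - pos  # pad so that all anchors line up
        for i, base in enumerate(read):
            columns[shift + i][base] += 1
    ordered = [columns[i] for i in range(max(columns) + 1)]

    def unreliable(col):
        return sum(col.values()) < 2 or len(col) > 1

    start, end = 0, len(ordered)
    while start < end and unreliable(ordered[start]):
        start += 1
    while end > start and unreliable(ordered[end - 1]):
        end -= 1
    return "".join(col.most_common(1)[0][0] for col in ordered[start:end])
```

In this toy version, several overlapping 20 bp reads that share one anchor condense into a single sequence longer than any input read, and an isolated base error in the interior is outvoted by the other reads.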
Paired Read (Mate Pair) Assembly with NextGENe
NextGENe uses a de Bruijn graph method for assembly of paired read (mate pair) data from Next Generation Sequencers such as the SOLiD System and the Illumina Genome Analyzer (Solexa). This method uses short words, not entire reads, as indexes to develop the graph, which reduces redundancy. Reads are mapped as a path along the graph, with nodes representing overlaps and arcs between nodes representing links. This assembly technique for paired reads (mate pairs) is able to accurately produce large contigs greater than 100 kbp from short next generation sequencing reads.
NextGENe is able to produce large contigs, many between 1 kbp and 100 kbp, from paired read data. Additionally, several assembled contigs are generated that exceed 100 kbp. Results shown were obtained using genomic data from E. coli, which has a total genome size of roughly 4.6 Mbp.
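As a rough illustration of the de Bruijn graph idea, the Python sketch below builds a graph from short words (k-mers) and walks an unambiguous path to recover a contig. The function names and the choice of k are assumptions made for this example. The sketch ignores mate-pair information, sequencing errors and repeats, so it should be read as a generic textbook construction, not as NextGENe's assembler.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, each k-mer adds one arc."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on a cycle instead of looping forever
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Three overlapping reads, indexed by 4-base words.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unambiguous(graph, "ATG")
```

Because identical words from different reads collapse onto the same arc, the graph size depends on the number of distinct words and not on the number of reads, which is the redundancy reduction mentioned above.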
The use of paired-end or mate-pair sequence reads is a valuable tool for constructing de novo assemblies from short sequence reads. Next Generation Sequencing platforms have allowed for sequencing paired reads in a shorter time span for lower cost. However, the volume of data produced in the form of short reads with high error rates presents a challenge for data analysis.
Paired read analysis involves the use of DNA fragments containing two regions of sequenced DNA separated by an unsequenced insert of known length. Paired reads enhance the assembly of short reads by improving the specificity of the reads, since single short (25-36 bp) sequencing reads, as produced by next generation technologies, are not sufficiently unique in the genome for accurate assembly.
Transcriptome/ChIPSeq Analysis
- Reduced errors and better matching due to use of polished data.
- Condensation of sample reads
- Use any mRNA sequence database as reference
- Provides accurate copy number even in presence of several variations
- Expression ratio of multiple alleles differing by SNPs/Indels
Analyzing an organism’s transcriptome with Next Generation Sequencing technology presents several challenges, including a high level of sequence variation relative to the reference genome due to SNPs/Indels, a single analysis often including multiple transcripts for each gene, and high variability in expression rates. Short reads (25 to 35 bases) are not always unique, causing ambiguities between the various isoforms. In addition, high expression of some genes can mask genes with low expression levels.
By using the Condensation Tool, short reads are statistically polished and nearly doubled in length, so that noise and errors can be filtered out more reliably. When using the Alignment Tool, the highly expressed sequences are matched to the reference. The low-level reads, often mistaken for sequencing errors, are rescanned and matched to the reference, allowing for more accurate detection of genes expressed at lower rates.
The results of the analysis can be saved as a reference file, allowing for direct comparison to the results from another analysis. This is a useful feature for comparison studies such as Chromatin Immunoprecipitation (ChIPSeq). 
High frequency variations between the transcriptome and the sample reads, automatically highlighted in blue, can easily be aligned to the reference. Reports are available for viewing, editing and exporting this information.
Digital Gene Expression Studies
- Expression Report (Gene count, ambiguities)
- New Genes are listed separately
- Coverage Plot
- Search Tool
- Display Biological Information for each tag
Gene expression studies are currently often analyzed using microarrays or DNA sequencing techniques such as Serial Analysis of Gene Expression (SAGE). In a microarray experiment, cDNA is hybridized to the sequence targets of the genes of interest on the microarray, where the many probes of interest are located in different spots. The cDNA is labeled with a chromophore, and the fluorescence intensity is proportional to the cDNA concentration. SAGE technology measures the counts of the sequence tags associated with the genes of interest. SAGE tags are produced by cutting the cDNA with restriction enzymes while its poly-A end is bound to a biotin-labeled dT primer; the portion bound to the solid surface is kept. SAGE with the NlaIII restriction enzyme, which targets CATG, together with related techniques such as MicroSAGE, LongSAGE, RL-SAGE and SuperSAGE, offers powerful ways to read absolute expression numbers by counting the tags.
Next generation DNA sequencing technologies generate millions to hundreds of millions of short sequence reads. The Illumina® Genome Analyzer, utilizing the Solexa sequencing technology, uses PCR on a surface, and the Applied Biosystems SOLiD™ System uses emulsion PCR and sequencing by ligation. Both of these systems can produce short reads ideal for analyzing gene expression.
The NextGENe software package takes full advantage of the short sequencing reads and has tools for analyzing SAGE tags. SAGE libraries are available that contain lists of sequence tags associated with particular genes. NextGENe can load these libraries as a reference and align the sequence reads to the appropriate sequence tags. The alignment to the tag library is performed only in the forward orientation of the sequences; no reverse complementation is implemented. Digital gene expression reports are created to show the sequence of each tag, the coverage, gene names, and the location in the genome. New gene tags that are not in the library are also reported.
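The tag-counting logic described above can be sketched in a few lines of Python. Everything named here (`count_tags`, `tag_library`, the `new_tag_min` threshold and the example sequences) is hypothetical and chosen for illustration; exact matching stands in for whatever alignment NextGENe actually performs. The sketch matches reads to a tag library in the forward orientation only, counts coverage per tag, and lists frequent unmatched tags separately as candidate new tags.

```python
from collections import Counter


def count_tags(reads, tag_library, tag_len, new_tag_min=2):
    """Count forward-orientation tag matches.

    tag_library maps a tag sequence to a gene name. No reverse complement
    is tried, mirroring the forward-only matching described in the text.
    new_tag_min is an assumed cutoff for reporting an unmatched tag.
    """
    known = Counter()
    unknown = Counter()
    for read in reads:
        tag = read[:tag_len]
        if len(tag) < tag_len:
            continue  # read too short to contain a full tag
        if tag in tag_library:
            known[tag] += 1
        else:
            unknown[tag] += 1
    report = [(tag, tag_library[tag], n) for tag, n in known.most_common()]
    new_tags = [(tag, n) for tag, n in unknown.most_common() if n >= new_tag_min]
    return report, new_tags


library = {"CATGAAACCC": "geneA", "CATGTTTGGG": "geneB"}
reads = (
    ["CATGAAACCC"] * 3
    + ["CATGTTTGGG"]
    + ["CATGCCCAAA"] * 2
    + ["GGGTTTCATG"]  # reverse complement of the geneA tag: not counted for geneA
)
report, new_tags = count_tags(reads, library, tag_len=10)
```

Here the read `GGGTTTCATG` is the reverse complement of the geneA tag, and it does not contribute to geneA's count because only the forward orientation is checked.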

The Sequence Alignment Tool has a Whole Genome View at the top of the screen, which shows each sequence of the library. Mousing over the library activates a yellow box containing the biological information for the tag that is currently at the cursor. The bottom of the screen contains all reads as they have been aligned to the library.

NextGENe produces a chart with the sequence tag number on the x-axis and the coverage of each tag on the y-axis. Most tags are expressed fewer than 500 times, but several genes show very high expression levels. Positions on this chart after 23K are new genes that have been added to the reference file because the sequence was found many times. Several of these new sequences were found in the project with expression levels above 4000.
Small RNA Quantification and Discovery
Next Generation sequencing technologies such as the Applied Biosystems SOLiD™ System, the Illumina® Genome Analyzer (Solexa), and the Genome Sequencer FLX System from Roche Applied Science (454 Life Sciences) present promising opportunities for evaluating the expression of known small RNAs as well as revealing novel small RNAs. However, the volume of data and high error rates of these systems require efficient and effective software for analysis.
NextGENe’s small RNA analysis tool can be used to determine expression levels of known small RNAs as well as for discovery of novel small RNAs. Reads are aligned to a whole reference genome to determine transcript locations. Regions of high coverage are used to indicate transcript regions. These regions of the genome can be saved and used as a reference transcript sequence. Samples are then aligned to the transcript reference and coverage counts are made for each transcript.
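The idea of using regions of high coverage to mark transcripts can be sketched as follows. This is a simplified stand-in for the Peak Detection Tool, whose actual criteria are not described here; the function names and the `min_cov` and `min_len` thresholds are assumptions for illustration. Alignments are given as hypothetical (start, length) pairs on a 0-based reference.

```python
def coverage_from_alignments(alignments, genome_len):
    """Per-base coverage from (start, length) alignments on a 0-based reference."""
    cov = [0] * genome_len
    for start, length in alignments:
        for i in range(max(start, 0), min(start + length, genome_len)):
            cov[i] += 1
    return cov


def find_transcript_regions(cov, min_cov, min_len):
    """Return half-open (start, end) runs where coverage stays >= min_cov
    for at least min_len consecutive bases."""
    regions = []
    start = None
    for i, depth in enumerate(cov):
        if depth >= min_cov:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    if start is not None and len(cov) - start >= min_len:
        regions.append((start, len(cov)))
    return regions


alignments = [(5, 10), (6, 10), (7, 10), (30, 10)]
cov = coverage_from_alignments(alignments, genome_len=50)
regions = find_transcript_regions(cov, min_cov=2, min_len=5)
```

With these toy inputs, the three overlapping reads near position 5 form one candidate region, while the single isolated read at position 30 falls below the coverage threshold and is ignored.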

After the Peak Detection Tool is used, the Sequence Alignment window displays brown ticks to indicate regions that meet the transcript requirements. This figure shows two small RNA transcripts located within Gene 6241476. Blue arrows are used to indicate gene locations in the reference file. The green and gold arrows below the gene indicator identify the mRNA and coding sequence, respectively.

Once NextGENe completes aligning sample file(s) to the transcript reference file, the results are shown in the sequence alignment window which provides a graphic representation of expression levels for each transcript. Red lines indicate transcript boundaries. Sequence reads that align with each transcript are shown beneath where they align. Gray bars indicate coverage (expression level).

The Expression report displays quantitative information about each segment (transcript) including its length, the maximum and average count numbers and the read counts.
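The quantities listed for the Expression report can be computed from per-transcript coverage, as in the hedged Python sketch below. The function name, the dictionary keys and the input layout (transcripts as half-open (start, end) intervals, alignments as (start, length) pairs) are assumptions for this example and do not reflect NextGENe's report format.

```python
def expression_report(transcripts, alignments):
    """Summarize each transcript: length, max and average coverage, read count.

    transcripts: dict of name -> (start, end), half-open, with end > start.
    alignments: list of (start, length) on the same 0-based coordinates.
    A read is counted for a transcript if it overlaps it by at least one base.
    """
    rows = []
    for name, (t_start, t_end) in transcripts.items():
        length = t_end - t_start
        cov = [0] * length
        reads = 0
        for a_start, a_len in alignments:
            lo = max(a_start, t_start)
            hi = min(a_start + a_len, t_end)
            if lo < hi:
                reads += 1
                for i in range(lo, hi):
                    cov[i - t_start] += 1
        rows.append({
            "transcript": name,
            "length": length,
            "max_count": max(cov),
            "avg_count": sum(cov) / length,
            "read_count": reads,
        })
    return rows


rows = expression_report(
    {"t1": (6, 16)},
    [(5, 10), (6, 10), (7, 10), (30, 10)],
)
```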