Manual

GenomeView is a stand-alone sequence browser specifically designed to visualize and manipulate a multitude of genomics data interactively. GenomeView enables users to dynamically browse high volumes of aligned short read data, with dynamic navigation and semantic zooming, from the whole genome level to the single nucleotide. At the same time, the tool enables visualization of whole genome alignments of dozens of genomes relative to a reference sequence. GenomeView is unique in its capability to interactively handle huge data sets consisting of dozens of aligned genomes, thousands of annotation features and millions of mapped short reads, both as viewer and editor.

Installation and system requirements

System requirements
Java 1.6u10 or later is required to run the application. You can get a recent version from http://www.java.com. We recommend at least 1 GB of memory and a dual-core or better processor, but GenomeView will work with less.

Installation
The most straightforward way to start GenomeView is with Java Web Start.

Clicking the Launch button on the web version of this page immediately launches the application.

Local installation
You can download the latest version from http://sourceforge.net/projects/genomeview/

Unpack the zip file to a directory and start the genomeview-<version>.jar file, either by double-clicking the jar file or by issuing the following command at the command line:
java -Xmx1g -jar genomeview-<version>.jar
where <version> is replaced by the appropriate version number.

Supported platforms

This is a compatibility matrix of GenomeView with various OS/browser combinations. The Web Start version matrix only has the operating system as it is independent of the browser. The Applet version of GenomeView has many more combinations, not all of which work equally well.

Green means we have confirmed that this combination works. Red means we could not get it to work. Blanks in the table are combinations we have not been able to test; your help in filling in the blanks is appreciated.

GenomeView requires at least Java 1.6u10. This is newer than the version available by default on some platforms; in those cases you need to update Java first.

Webstart version

Only major names are included as we have not yet received reports of any recent unsupported versions or minor versions. If you come across any, please let us know.

Operating System Status
Windows
Linux
OS X


Applet version

There is a table per operating system, as there are typically many flavors of each OS. If a version is missing, feel free to contact us and we'll add it. We try to include only major versions, but minor versions that behave differently will be included. For example, Firefox 3.6.14 broke applet support completely.

Windows

Windows XP Windows Vista Windows 7
Firefox 3.6.15+
Firefox 4
Firefox 5
Firefox 6
Firefox 7
Firefox 8
Chrome 12
Chrome 13
Internet Explorer 7
Internet Explorer 8
Internet Explorer 9
Safari 5.0
Safari 5.1
Opera 11

Linux

Ubuntu 10.04 (Lucid Lynx) Ubuntu 11.04 (Natty Narwhal)
Firefox 3.6.15+
Firefox 4
Firefox 5
Chrome 12
Chrome 13
Internet Explorer 7
Internet Explorer 8
Internet Explorer 9
Safari 5
Opera 11

Mac OS X

OS X 10.5.8 (Leopard) OS X 10.6.8 (Snow Leopard) OS X 10.7 (Lion)
Firefox 3.6.15+
Firefox 4 *
Firefox 5 *
Chrome 12 *
Chrome 13 *
Internet Explorer 7
Internet Explorer 8
Internet Explorer 9
Safari 5.0
Safari 5.1 *
Opera 11

*On Mac OS X, some browsers need a work-around to make the applet visible:

  1. Wait for GenomeView to load, even if it is blank
  2. Press CMD+Shift and keep it pressed
  3. A bar appears at the top of the applet
  4. Click the bar (you can now release CMD+Shift) and drag it outside the browser


Your assistance needed!

We need help to fill in the blanks.

To test the applet, please go to http://genomeview.org/start/applet.html. If GenomeView starts, you can load data from one of the demo instances, and the data is displayed, then the GenomeView applet works on that OS/browser combination.

Send your experiences to support@genomeview.org and they will be included in the table.

Make sure you provide detailed version information for both the browser you tested and your operating system as well as your exact Java version.

If your experiences do not match with the table, let us know and we'll investigate further.

For Mac OS X testers: if the regular test fails, please test the following work-around and report those results as well:
Press CMD+Shift and the top bar of the applet appears in the browser. Drag the applet out of the browser. It should now paint correctly.

List of testers

Your name will appear here if you contribute to the compatibility matrix
Peter Sisk
Jon Goldberg
Ken Heyndrickx
Bram Verhelst
Michiel Van Bel

Navigation

Navigation can either be done with the buttons in the toolbar (arrows and magnifying glasses), with the navigator panel (ruler at the top with a block box), with the mouse or with your keyboard.

Navigator
Drag the edges to zoom in or out; the cursor changes to a resize icon when you are at an edge. Drag the box or the half-circle handles to move to another place in the genome. Click anywhere on the ruler to move to that place.

Mouse

Ctrl+scroll wheel: Zooming
Scroll wheel: Scroll through tracks
Drag left-right: Move left-right
Shift+drag left-right: Select sequence

Keyboard

+/- keys: Zooming
Arrow keys left-right: Move left-right

Keyboard shortcuts

Navigation
You can use the arrows on the keypad of your keyboard to move around in the evidence and structure panels. The + and - sign buttons can be used to zoom in and out.

Keypad 4: Moves the evidence and structure panel to the left
Keypad 6: Moves the evidence and structure panel to the right
Keypad +: Zooms in on the evidence and structure panel
Keypad -: Zooms out of the evidence and structure panel

Various keys

ctrl-C: Copy the selected sequence to the clipboard
ctrl-D: Load the contents of a directory as new entries
ctrl-E: Edit the selected feature
ctrl-F: Open the SearchView
ctrl-L: Load new entries from a file
ctrl-M: Merge two or more selected features
ctrl-N: Create a new feature from the selected sequence
ctrl-O: Load features from a file
ctrl-Q: Open the SequenceView
ctrl-S: Save
ctrl-U: Split a feature between two selected locations

Mouse shortcuts

Structure and evidence panel mouse actions

Left-clicking a feature: Select this feature
Double left-clicking a feature: Select this feature and center the view around it
Shift left-click a feature: Add this feature to the current selection
Shift left-click a selected feature: Remove this feature from the current selection
Mouse-wheel scrolling: Zoom in and zoom out
Right-click or mouse-wheel click: Show pop-up menu

Structure panel specific actions

Dragging the edge of an exon: Modifies the coordinate of the location you dragged
Dragging (not the edge of an exon): Selects the region over which you dragged the mouse

Evidence panel specific actions

Dragging: Moves the panel in the direction you dragged

Note: dragging means holding the mouse button down while moving the mouse.

Load data

Sources
You can load data from files on your computer, or from a file on the internet via its URL.

File formats
GenomeView supports many file formats; see the data format page for a complete list. When you browse for files, GenomeView tries to list only files in supported data formats. However, many extensions are in use, so your files may be hidden. In that case, select "all files" from the drop-down list.

Some file formats can/need to be preprocessed for optimal performance.

Opening a file
Steps to open a file:
File > Load data > Local file > pick your file

Note that you can select multiple files at once.

There are two sample data files attached to this page you could use. One file contains genomic sequence, the other one annotation for this sequence. The data represents the mitochondrial DNA of C. elegans (WS200).

The videos below show how to open data from a local file and from a URL.

Watch the video full screen for best quality

Loading data from a URL

Attachment Size
CHROMOSOME_MtDNA.fasta 13.75 KB
MtDNA.gff 3.35 KB

Data formats

GenomeView currently recognizes several commonly used formats. It uses the identifiers present in each format to link different sources, so make sure that the identifiers match (they are case-sensitive).
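One way to check this before loading is to compare the identifiers in two files from the shell. The sketch below is only an illustration: it assumes a FASTA reference and a GFF annotation, it assumes the identifier is the first word after '>' in each FASTA header, and the function names are made up for this example and are not part of GenomeView.

```shell
# Hypothetical helpers, not part of GenomeView.

# Print the unique sequence identifiers of a FASTA file.
# Assumption: the identifier is the first word after '>'.
fasta_ids() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | LC_ALL=C sort -u
}

# Print the unique identifiers in column 1 of a GFF file, skipping comments.
gff_ids() {
    awk -F '\t' '!/^#/ && NF > 1 { print $1 }' "$1" | LC_ALL=C sort -u
}

# Print GFF identifiers with no exact, case-sensitive match in the FASTA file.
unmatched_ids() {
    ids_file=$(mktemp)
    fasta_ids "$1" > "$ids_file"
    gff_ids "$2" | LC_ALL=C comm -13 "$ids_file" -
    rm -f "$ids_file"
}
```

Calling `unmatched_ids reference.fasta annotation.gff` prints nothing when every annotation identifier has a matching sequence; any identifier it does print differs from the reference, possibly only in case.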

Make sure you index your large files: reference genome, NGS data sets (SAM / BAM), annotation and read coverage plots. While this is not required, it will speed up browsing and loading data, and significantly reduce the amount of memory you need.

Input formats

Data type | File format | Index* | Max size** unindexed*** | Max size** indexed | Comments
Reference sequence | fasta ¤ | Recommended | 50 Mb | unlimited | GenomeView will automatically create an index for you if you don't have one.
Reference sequence | embl, genbank | Not possible | 50 Mb | -- | EMBL and GenBank are mixed file formats that can contain both annotation and reference sequence at the same time.
Annotation | gff ¤ | Not recommended | 50 Mb | unlimited |
Annotation | embl, genbank | Not possible | 50 Mb | -- | EMBL and GenBank are mixed file formats that can contain both annotation and reference sequence at the same time.
Annotation | bed | Not possible | 50 Mb or less | -- | By default, data from a bed file is added to the CDS track. If you want it in a different track, add a line at the top of the file: 'track name=Track_name'. No white-space is allowed in the track name.
Annotation | ptt, tbl | Not possible | 50 Mb or less | -- | Other standard annotation formats GenomeView understands.
Annotation | various formats | Not possible | 50 Mb or less | -- | GenomeView can directly parse the output of the following programs: Blast, GeneMark, TransTermHP, FindPeaks, MaqSNP, tRNA-scan.
Multiple genome alignment | maf ¤ | Recommended | 100 Mb | unlimited | If you try to load an unindexed maf file, GenomeView will prompt you to create a compressed maf file and will index it for you. MAF is the recommended file format for whole genome alignment of large/complex genomes.
Multiple genome alignment | multi-fasta ¤ | Not possible | 100 Mb | -- | Recommended for small/simple genomes with a near 1:1 relationship.
Multiple genome alignment | aln, ClustalW | Not possible | 100 Mb | -- |
Sequence read alignment | bam ¤ | Required | -- | unlimited | GenomeView will prompt you if there is no index and will create one for you.
Sequence read alignment | MAQ, MapView, BroadSolexa | Not possible | 100 Mb | -- |
Read coverage summary | tdf ¤ | Native | unlimited | unlimited | TDF files can be created with the tdformat tool that is available for download.
Read coverage summary | bigwig | Native | unlimited | unlimited | This format can be used for any wig file, not just read coverage.
Read coverage summary | pileup | Required | -- | unlimited | The pileup format becomes slow when you have extreme read depth (>5000 x coverage).
Read coverage summary | wig | Not possible | 50 Mb | -- | We strongly recommend converting your wig files to bigwig or TDF. GenomeView can automatically convert wig files to TDF. Caveats: 'track' information should all be on a single line; 'browser' lines will be ignored as they are specific to the UCSC Genome Browser. WIG files need to be sorted by chromosome and by genomic coordinate within the chromosome. BedGraph as well as Wiggle_0 format is supported. For the wiggle_0 type, both variableStep and fixedStep should work.
Allele diversity summary | pileup ¤ | Required | -- | unlimited | The pileup format becomes slow when you have extreme read depth (>5000 x coverage).

* Indicates whether this file format can/should be indexed.

** Recommended maximum file size. The first value is without index, the second with index. These values are only guidelines. When loading multiple data sets, you should add up their sizes.

*** Unindexed data files can be gzip compressed.

¤ Recommended file format for this data type.
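The wig entry in the table asks for files sorted by chromosome and by genomic coordinate. For bedGraph-style files (chromosome, start, end and value on each line) this can be done with sort before loading. The following is a minimal sketch under several assumptions: the file holds a single track, header lines start with 'track' or 'browser', chromosome names are ordered bytewise (chr1, chr10, chr2, ...), and the function name is hypothetical. It does not apply to variableStep or fixedStep files, whose data lines carry no chromosome column.

```shell
# Hypothetical helper: print a single-track bedGraph file with its header
# lines first, followed by the data lines sorted by chromosome and start.
sort_bedgraph() {
    # Header lines ('track' and 'browser') are passed through unchanged.
    awk '/^(track|browser)/' "$1"
    # Data lines: column 1 as text, column 2 as a number.
    awk '!/^(track|browser)/' "$1" | LC_ALL=C sort -k1,1 -k2,2n
}
```

A typical call would be `sort_bedgraph coverage.bedgraph > coverage.sorted.bedgraph`, where both file names are placeholders.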

Output formats

(Modified) annotations can be saved as either GFF or EMBL.

All loaded data can be exported in its original format. This export will not include modifications.

Preparing fasta files

Large reference genomes, such as vertebrate genomes, need to be indexed before GenomeView can handle them easily. This can be done with the faidx command from the samtools package.

If you are also preparing HTS data sets in the BAM format, indexing the reference is part of that procedure as well, so you can either go straight to the short read preparation page or index the reference here and skip that step there.

To index a fasta file you run

samtools faidx reference.fasta

Attention
If your file is named reference.fasta, GenomeView will search for reference.fasta.fai in the same directory. If you want to be able to load large files, make sure those two files are correctly named and in the same folder.
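A quick way to confirm this naming rule before starting GenomeView is to test for the index file from the shell. This is only a sketch; the function name is hypothetical and reference.fasta is the example file name used above.

```shell
# Hypothetical helper: succeeds only if both the FASTA file and the index
# named <fasta>.fai exist side by side.
has_fasta_index() {
    [ -f "$1" ] && [ -f "$1.fai" ]
}

# Example call; prints a hint instead of failing when the index is missing.
if has_fasta_index reference.fasta; then
    echo "reference.fasta.fai found"
else
    echo "no index yet: run 'samtools faidx reference.fasta'"
fi
```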

You can download the samtools package from Sourceforge.

Preparing feature files

Large feature files need to be indexed before you can use them properly in GenomeView.

What counts as large is not strictly defined: it depends on both the size of the file and the number of features in it.

Recommendations:

Instructions:
To index a file, you need to pre-process it with tabix, much like is done with pile-up files.

Tabix can be downloaded from the tabix download page.

For BED formatted files:

sort -k1,1 -k2,2n input.bed | bgzip -c > compressed.bed.gz
tabix -p bed compressed.bed.gz

Note that indexing will not work with BED files that have a UCSC header ("track name=blah")
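If a BED file does carry such a header, one option is to strip it while sorting. The sketch below is an illustration with a hypothetical function name; it assumes header lines start with 'track' or 'browser'. Keep in mind that removing the 'track' line also discards the track name it sets.

```shell
# Hypothetical helper: drop UCSC 'track' and 'browser' header lines, then
# sort the remaining BED lines by chromosome and start position.
strip_and_sort_bed() {
    awk '!/^(track|browser)/' "$1" | LC_ALL=C sort -k1,1 -k2,2n
}

# Usage with the tools from this page (requires bgzip and tabix on the PATH):
#   strip_and_sort_bed input.bed | bgzip -c > compressed.bed.gz
#   tabix -p bed compressed.bed.gz
```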

For GFF formatted files:

sort -T . -k1,1 -k4,4n input.gff | bgzip -c > compressed.gff.gz
tabix -p gff compressed.gff.gz

In both cases, you will get two new files: (1) a gz file and (2) a tbi file.
Load the gz file in GenomeView.

Caveat:
The structure of genes will be lost when indexing gff files.

Preparing short read alignments

The best format to present short read alignments to GenomeView is the SAM/BAM format, which is emerging as the standard.

Tools and instructions for converting the output of numerous aligners to SAM are available on the SAMtools website.

Once you have a SAM file, you need to convert it to BAM and index it. Suppose you have a reference sequence called 'reference.fasta' and a read alignment in SAM format called 'alignment.sam'.

Steps to go from SAM to indexed BAM:

samtools faidx reference.fasta
(will create reference.fasta.fai for the next step)

samtools view -bS -t reference.fasta.fai alignment.sam -o alignment.bam


samtools sort alignment.bam sorted
(will create sorted.bam; samtools 1.3 and later use the form 'samtools sort -o sorted.bam alignment.bam' instead)

samtools index sorted.bam
(will create sorted.bam.bai, which is read by GenomeView together with the bam file)
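Because GenomeView reads the index together with the BAM file, it can be useful to scan a data directory for BAM files that were never indexed. A minimal sketch, assuming the index is named <file>.bam.bai as produced by the samtools index step above; the function name is hypothetical.

```shell
# Hypothetical helper: print every .bam file in a directory that has no
# matching .bam.bai index file next to it.
bams_without_index() {
    for bam in "$1"/*.bam; do
        # Skip the unexpanded pattern when the directory has no .bam files.
        [ -e "$bam" ] || continue
        [ -f "$bam.bai" ] || printf '%s\n' "$bam"
    done
}
```

Running `bams_without_index .` in the directory holding your alignments lists the files that still need `samtools index`.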

Preparing pileup

If you are looking for help loading your BAM file, see the short read alignment preparation page.

Please see the description of the pile-up track for more information on what can be done with the pile-up track.

There are three formats supported for pileups. The first one is generated with a specific tool that is available from this page. The second one can be generated by samtools, the final one is a simple tab delimited file format. All are explained below, links to samtools and tabix can be found at the bottom of this page.

Important: TDF should not be indexed. The samtools pileup and tab delimited format MUST be indexed before GenomeView understands them.


TDF coverage plot (recommended, coverage only)

TDF is a tiled data format which contains the coverage plot, as well as multiple resolution summaries which allows fast retrieval at any scale.

Download the latest version of tdformat, a small program to generate TDF files from BAM files. The BAM file has to be indexed, i.e. there has to be a BAI file as well.

Once you've downloaded and extracted the program (you need at least the lib folder and the tdformat jar file), you can invoke it with the following command:

java -Xmx1g -jar tdformat-1576.jar <path to your BAM file>

Replace 1576 with the version number of the file you downloaded.

For large genomes, mammalian genomes for example, you may need to increase the memory allotment for the program:

java -Xmx4g -jar tdformat-1576.jar <path to your BAM file>

The TDF format does not have to be indexed.


SAMTools pileup (includes diversity information, i.e. SNP track)

Note: file name extension should contain .pileup
The first step to be able to browse a pileup is to generate one from your BAM file.

samtools pileup -f reference.fasta sorted.bam >sorted.pileup

As you run this command, you'll see that the generated file can be huge, even for small BAM files.

To be able to browse it in GenomeView, it needs to be indexed with tabix, a tool that is also available from the SAMtools web page.


sort -k1,1 -k2,2n sorted.pileup | bgzip -c > compressed.pileup.bgz
tabix -s 1 -b 2 -e 2 compressed.pileup.bgz

Tab delimited pileup (extension should contain '.swig')

The file should be organized in four columns.
The first column holds the identifier of the sequence, the second column contains the genomic position, the third column contains the number of reads on the forward strand, the final column contains the number of reads on the reverse strand.

  1. Identifier
  2. Genomic position (one-based)
  3. # forward reads
  4. # reverse reads

Example:

chr1 11 46 43
chr1 12 47 50
chr1 13 48 61
chr1 14 53 79

Note that the white-space between the columns must be a tab, with exactly one tab between adjacent columns.
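Since stray spaces are easy to miss, it may help to validate the file before indexing it. The sketch below reports rows that do not fit the four-column layout described above; it assumes positions and read counts are plain non-negative integers, and the function name is hypothetical.

```shell
# Hypothetical helper: print the line numbers of rows that break the layout
# identifier <TAB> one-based position <TAB> forward reads <TAB> reverse reads.
invalid_swig_lines() {
    awk -F '\t' '
        NF != 4 || $2 !~ /^[0-9]+$/ || $2 < 1 ||
        $3 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/ { print NR }
    ' "$1"
}
```

No output from `invalid_swig_lines filename` means every row has four tab-separated columns with a position of at least 1 and integer read counts.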

Once you have such a file, you can again index it for faster access and shorter download times.

sort -T . -k1,1 -k2,2n filename | bgzip -c > filename.bgz
tabix -s 1 -b 2 -e 2 filename.bgz

Resources

Download samtools
Download tabix

User interface

This page introduces the different components of GenomeView and explains a number of naming conventions we use throughout the documentation.

Components of GenomeView
The GenomeView GUI is divided into two columns. The left side is a graphical representation of the data, while on the right side you can find additional information, controllers and options in the form of tables.

Figure: Visual description of the user interface

Tracks

All visualizations in GenomeView are organized into tracks. A track typically holds one particular type of data or one particular data set. There can be multiple tracks of each type.
When loading new data, a new track is added.

On the right side of the window there is an overview of all tracks that are currently available.

You can reorder the tracks by dragging them up and down in this table, hide them by clicking the eye icon or remove them with the garbage bin icon.

Gene structure

Gene structure track

Figure: Gene structure track

This track shows a number of things, some of which are only visible when you are sufficiently zoomed in.
Things to know about this track:

  • This track is divided into two by a ruler which indicates the current genomic location.
  • Within this track, everything above the ruler is on the forward strand, anything below the ruler is on the reverse strand.
  • Both the forward strand part and the reverse strand part have a nucleotide band and 3 possible translation frame bands.
  • In the default configuration, potential donor and acceptor sites will be painted in yellow and blue on the nucleotide band.
  • The six reading frames have potential start and stop codons indicated in green and red.
  • The light blue rectangles visualize the structure of a gene in terms of strand and the phase of each exon.

Feature track

The feature track can display a multitude of annotation information, supplied as GFF or BED files. Features like CDS, RNA, SNP, etc... are displayed as rectangles in different colors. A triangle on one side can indicate the strand. When zoomed in enough, feature names are displayed when available.

Short read track

Short reads are displayed in the short read track as colored boxes that are in some cases connected with pink lines. The pictures below should give you an idea of the meaning of the various visual cues.

Figure: Short read track, zooming in from left to right

Default color scheme

Color Description
Green Read mapped to the forward strand from a sense fragment in a PE library or from a single end library
Blue Read mapped to the reverse strand from a sense fragment in a PE library or from a single end library
Cyan Read mapped to the reverse strand from an anti-sense fragment in a PE library
Orange Read mapped to the forward strand from an anti-sense fragment in a PE library
Yellow Mismatch between the read and the reference, the read nucleotide will be shown when zoomed in
Red Gap/deletion in the read
Black Insertion in the read. Hover over them to see inserted bases.
Gray Insertion in the read that is a multiple of 3. Hover over them to see inserted bases.
Purple/Pink Connection between two reads from a paired-end library (thin line), or connection between parts of a single read aligned over a splice junction (thick line). Both the PE connections and the splice junction connections will be shown simultaneously in data sets that have that information.

Note that some older alignment software does not include the correct information in the BAM file and that the color scheme may be off for those files. Use common sense when interpreting results!

Figure: Overview of visual cues in the short read track

Figure: Hovering over reads shows detailed information about the read

Pile up track

The pile up track can consist of two parts. The first one, the coverage plot, is always present; the second, the SNP plot, is only displayed if the loaded data set has the required information.

Typically, coverage-only data files are TDF files, while coverage+SNP files are prepared using samtools pileup. See the page on preparing pileups for more information.

Figure: Pile up track overview

Figure: Detailed description of the components of the pile up track

Multiple alignment track

Multi-fasta/ClustalW multiple alignment

Multi-fasta data can be displayed on three zoom levels.

  1. Zoomed out: Will show conservation plots.
  2. Medium zoom: Shows conservation as a gray scale. Gaps in the alignment are displayed in red.
  3. Zoomed in: Shows the individual nucleotides. Reference gaps are in yellow, alignment gaps in red. At the bottom of the track, the sequence logo for the alignment is shown.

MAF formatted multiple alignment
Details on the MAF format
Demo video showing the multiple alignment track.
Watch video full screen in HD mode for best quality, the video contains no sound

Multiple alignments can be displayed in three zoom levels.

The most detailed level shows mismatches and gaps for each alignment. Hovering over the track displays the names of the species on the left.

On the middle level, we can still hover over the track to see the species. An alignment to the forward strand is drawn in green, one to the reverse strand in blue.

When we zoom even further out, the alignments are displayed in gray. The more species align to a certain part of the reference sequence, the longer the gray line. Individual species are not displayed anymore.

Beyond this point, zooming further out will no longer display alignments, for performance reasons.

Color key:

Gray mismatch with reference
Red gap in alignment
Green Alignment to forward strand
Blue Alignment to reverse strand

Wiggle track

No screenshots or description available yet.

Plugins

Plugins are the basic extension mechanism to create new functionality for GenomeView, without modifying the core application.


Installing a plugin

Places to get data

Demo data

We provide a number of demo instances of GenomeView that come with data preloaded.

Preloaded instances

Furthermore we provide a significant number of pre-loaded GenomeView instances through the Genome Explorer.

Other data

Here is a list of a couple of places where you can find data that may be of interest. This data will generally require some processing to get it into one of the standard file formats.

Reference genomes and annotation

If you would like any of the genomes at UCSC or Ensembl included in the Genome Explorer, drop us an e-mail; we have scripts to automate the download and data massaging.
UCSC Genome Browser downloads
Ensembl downloads

Sequencing data sets

The Sequence Read Archive (SRA) is a repository that stores raw sequencing data from next-generation sequencing platforms.
EBI Sequence Read Archive is the European cousin of the SRA.

Whole genome multiple alignments

The UCSC data repository has a large number of whole genome multiple alignments. Look under the heading 'Multiple Alignments' for each species.

Some examples:
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/maf/ hg19 aligned to 45 vertebrates
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz44way/maf/ hg18 aligned to 43 vertebrates
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way/maf/ hg18 aligned to 27 vertebrates
http://hgdownload.cse.ucsc.edu/goldenPath/dm2/multiz15way/ dm3 aligned to 14 other insects