MochiView


Johnson Lab: UCSF

Overview

The two key pieces of data needed to get started with your genome of choice (called a Sequence Set in MochiView) are:

[1] Chromosome/contig sequences: These must be in FASTA format (GFF files with the FASTA sequences appended at the end are also accepted).  Consult the section of the manual titled ‘Import->Sequence Set->Format: FASTA’ for additional details.

[2] Gene data: Genes can be represented either in MochiView’s custom format (‘Import->Location Set (genes)->Format: MochiView’) or in GFF version 3 files (‘Import->Location Set (genes)->Format: GFFv3’).  The latter requires a very specific organization of the gene entries and may need some tweaking before import.
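One way to check how a GFF3 file’s gene entries are organized before attempting an import is to look at how sub-features point to their parents through the GFF3 `Parent` attribute (for example CDS → mRNA → gene). The sketch below assumes nothing beyond the generic GFF3 layout (nine tab-separated columns, `key=value` attributes separated by semicolons) and groups features by parent ID. It is not MochiView’s import code, and the exact structure MochiView expects is described in its manual; the names `Feature`, `parse_gff3_line`, and `children_by_parent` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple
from urllib.parse import unquote


class Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    strand: str
    attributes: Dict[str, List[str]]  # values split on commas, then unescaped


def parse_gff3_line(line: str) -> Feature:
    """Parse one tab-delimited GFF3 feature line (exactly 9 columns)."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attrs: Dict[str, List[str]] = {}
    for pair in cols[8].split(";"):
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        # Split before unescaping: literal commas inside values are encoded as %2C.
        attrs[key.strip()] = [unquote(v) for v in value.split(",")]
    return Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]), cols[6], attrs)


def children_by_parent(lines: Iterable[str]) -> Dict[str, List[Feature]]:
    """Group features under each ID listed in their Parent attribute."""
    groups: Dict[str, List[Feature]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blanks, comments and directives such as ##gff-version
        feature = parse_gff3_line(line)
        for parent in feature.attributes.get("Parent", []):
            groups[parent].append(feature)
    return dict(groups)
```

Pass only the lines that come before any `##FASTA` directive; sequence lines after it are not nine-column feature lines and will raise `ValueError`. A gene whose transcripts or exons do not show up under the expected parent ID is a likely candidate for the “tweaking” mentioned above.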

Downloads

The MochiView website does not currently host any genome/gene files, but here we provide links to files for a few different genomes to get you started.

Drosophila melanogaster
FlyBase downloads

This page contains a download link for a file called dmel-all-no-analysis-r5.24.gff.gz (the r#.## portion may change).  The file is compressed, but this is OK; you can leave it that way.  Follow the instructions outlined below for Saccharomyces cerevisiae to get up and running, but ignore the comment about the checkbox.
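The compressed file can be left as is for import. If you want to look inside such a file yourself, for example to confirm that it ends with a `##FASTA` section, the sketch below reads a gzip-compressed or uncompressed file line by line without writing a decompressed copy to disk. It detects gzip from the file’s first two bytes (the gzip magic number `1f 8b`) rather than from the `.gz` extension. UTF-8 text is assumed, and `iter_text_lines` and `read_lines` are hypothetical helper names, not part of MochiView.

```python
import gzip
import io
from typing import BinaryIO, Iterator

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def iter_text_lines(raw: BinaryIO, encoding: str = "utf-8") -> Iterator[str]:
    """Yield lines (without line endings) from a seekable binary stream, gzipped or not."""
    head = raw.read(2)
    raw.seek(0)  # rewind so the reader below sees the whole stream
    stream = gzip.GzipFile(fileobj=raw) if head == GZIP_MAGIC else raw
    text = io.TextIOWrapper(stream, encoding=encoding)
    for line in text:
        yield line.rstrip("\r\n")


def read_lines(path: str) -> Iterator[str]:
    """Open a file on disk and yield its text lines, decompressing gzip on the fly."""
    with open(path, "rb") as fh:
        yield from iter_text_lines(fh)
```

For example, `any(line == "##FASTA" for line in read_lines("dmel-all-no-analysis-r5.24.gff.gz"))` would report whether the downloaded file contains embedded sequences.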

Saccharomyces cerevisiae
Saccharomyces Genome Database ftp directory

This directory contains a file called saccharomyces_cerevisiae.gff that contains everything you need to import both the genome sequence and genes.  First, import the sequences using ‘Import->Sequence Set->Format: FASTA’ (the GFF file contains the FASTA sequence of the genome at the end).  Next, import the genes using ‘Import->Location Set (genes)->Format: GFFv3’.  This import utility includes a checkbox whose text contains “check if SGD/CGD file!”.  Check this box, or gene sub-feature information (e.g. introns) will not be extracted from the file.  Finally, if you want to create Location Sets from other genomic features such as centromeres or repeat regions, you can extract these Locations from the file using ‘Import->Location Set->Format: GFFv3 (by Type)’.
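Because this one file holds both features and sequence, it can help to see what each import step draws from it. The sketch below assumes only the standard GFF3 layout: nine tab-separated feature columns, followed by a `##FASTA` directive after which the rest of the file is FASTA. It splits the file at that directive, parses the sequences, and collects the coordinates of every feature of one type (e.g. `centromere`), similar in spirit to the ‘by Type’ import. It is not MochiView’s code; the function names and the `Location` tuple are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (seqid, start, end, strand); GFF3 coordinates are 1-based and inclusive.
Location = Tuple[str, int, int, str]


def split_gff_and_fasta(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split GFF3 lines at the '##FASTA' directive into (feature lines, FASTA lines)."""
    features: List[str] = []
    fasta: List[str] = []
    in_fasta = False
    for line in lines:
        line = line.rstrip("\r\n")
        if not in_fasta and line.strip() == "##FASTA":
            in_fasta = True  # everything after this directive is sequence
            continue
        (fasta if in_fasta else features).append(line)
    return features, fasta


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each sequence name (first word after '>') to its full sequence."""
    chunks: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


def locations_of_type(feature_lines: Iterable[str], feature_type: str) -> List[Location]:
    """Collect the coordinates of every feature whose type column matches feature_type."""
    found: List[Location] = []
    for line in feature_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) == 9 and cols[2] == feature_type:
            found.append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return found
```

A file without a `##FASTA` directive simply yields no FASTA lines, in which case the chromosome sequences have to come from a separate FASTA file.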

Candida albicans
Candida Genome Database ftp directory

This directory contains a file called candida_21_with_chromosome_sequences.gff.gz that contains everything you need to import both the genome sequence and genes.  The file is compressed, but this is OK; you can leave it that way.  Follow the instructions outlined above for Saccharomyces cerevisiae to get up and running.

Aspergillus nidulans
Aspergillus Genome Database ftp directory

This directory contains a file called A_nidulans_FGSC_A4_version_s03-m05-r04_features_with_chromosome_sequences.gff.gz that contains everything you need to import both the genome sequence and genes.  The file is compressed, but this is OK; you can leave it that way.  Follow the instructions outlined above for Saccharomyces cerevisiae to get up and running.

Homo sapiens
Human_Genome_Guide.pdf; Link to chromosome FASTA (NCBI); Gene import file; Warnings file

Properly formatting a gene file for the human genome is quite time-consuming, so I have prepared a ready-made file for use with the most recent genome assembly (GRCh37).  The steps required to acquire and install all of the necessary files are provided in the Human_Genome_Guide.pdf file.  In a few cases, workarounds were necessary, either to reconcile irregularities in the available gene data or to accommodate limitations in the MochiView gene format.  These are also explained in detail in the guide, and the specific loci involved are listed in the Warnings file.  Both the Warnings and Genes files are archived (zipped or gzipped) tab-delimited files; you can extract them to view or adjust them if desired.
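Since the Warnings and Genes files are ordinary tab-delimited text inside a zip or gzip archive, they can also be read programmatically without unpacking them by hand. The sketch below accepts the raw bytes of either kind of archive (or of an uncompressed file) and returns the rows as lists of fields. It makes no assumption about the column layout, which the guide describes; the function name is hypothetical, UTF-8 text is assumed, and for a zip archive only the first file inside is read.

```python
import csv
import gzip
import io
import zipfile
from typing import List


def read_tab_delimited(data: bytes) -> List[List[str]]:
    """Return the rows of tab-delimited text that may be zipped, gzipped, or plain."""
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as zf:
            # Skip directory entries; read the first actual file in the archive.
            members = [name for name in zf.namelist() if not name.endswith("/")]
            data = zf.read(members[0])
    elif data[:2] == b"\x1f\x8b":  # gzip magic number
        data = gzip.decompress(data)
    text = io.StringIO(data.decode("utf-8"), newline="")
    # QUOTE_NONE keeps quotation marks as literal characters in the fields.
    return list(csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE))
```

For example, with a hypothetical file name, `with open("genes_file.zip", "rb") as fh: rows = read_tab_delimited(fh.read())` loads every row, after which you can inspect or filter the entries before writing an adjusted copy.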

Mus musculus
Mouse_Genome_Guide.pdf; Link to chromosome FASTA (NCBI); Gene import file; Warnings file

See the description for Homo sapiens for details.  The genome used is Build 37 of reference assembly C57BL/6J.

Rattus norvegicus
Rat_Genome_Guide.pdf; Link to chromosome FASTA (NCBI); Gene import file; Warnings file

See the description for Homo sapiens for details.  The genome used is reference assembly RGSC v3.4.

Other genomes
Support for additional genomes will be added here in the future.  The MochiView tutorial contains a section titled ‘Creating Your Own Database (overview)’ that will give you tips on getting started.  If you need some advice or have a file format that you would like to see supported, feel free to contact me ([CONTACT & MAILING LIST]).  If I’m too busy to help, I’ll let you know, but it never hurts to check :).

Contributions are Welcome!

Please contact me ([CONTACT & MAILING LIST]) if you would like to contribute files (or tips on obtaining the necessary files) for your genome of choice.