sff2fastq

The basic premise of genetic sequencing is to prepare a DNA sample in a form suitable for a DNA sequencer.  The sequencer then determines the sequence of bases in the prepared sample and stores the results in a digital file, whose format depends on the sequencing technology the instrument uses.

In 454 sequencing, the native format for storing sequence data is SFF; for ABI-Sanger it is the AB1 or SCF chromatogram format; for Illumina/Solexa it is the QSEQ or Illumina FASTQ format; and for ABI-SOLiD it is the colorspace CSFASTA format.

Most scientists and biologists are more interested in the final sequence data than in the particular vendor technology that produced it.

During a biological investigation, one is often confronted with data from several sequencing platforms, so a format common to all of them is needed.  In the era of next-generation sequencing, the Sanger FASTQ format appears to have become the lingua franca of sequence file formats.  It holds both the sequence and the per-base quality data generated by the sequencer.  Many of the currently popular (and open-source) aligners and assemblers, such as maq, bwa, bowtie, SSAHA2 and velvet, accept Sanger FASTQ files as input.
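To make the format concrete: a Sanger FASTQ record is four lines (an `@` header with the read name, the bases, a `+` separator, and one quality character per base), where each Phred score is stored as the ASCII character whose code is the score plus 33. The Python sketch below builds such a record and re-encodes an Illumina 1.3+ quality string, which uses an offset of 64 instead. The helper names are hypothetical, and the sketch deliberately ignores the older Solexa scores, which are not Phred-scaled and need a non-linear conversion.

```python
"""Sketch: build a Sanger FASTQ record and re-encode Illumina 1.3+ qualities."""

SANGER_OFFSET = 33      # Sanger FASTQ: character code = Phred score + 33
ILLUMINA13_OFFSET = 64  # Illumina 1.3+ FASTQ: character code = Phred score + 64
MAX_SANGER_PHRED = 93   # '~' (ASCII 126) is the highest printable quality character


def encode_sanger_quals(phred_scores):
    """Turn integer Phred scores into a Sanger quality string, capping at Q93."""
    return "".join(chr(min(q, MAX_SANGER_PHRED) + SANGER_OFFSET) for q in phred_scores)


def fastq_record(name, seq, phred_scores):
    """Return one four-line FASTQ record: @name, bases, '+', qualities."""
    if len(seq) != len(phred_scores):
        raise ValueError("sequence and quality lengths differ")
    return f"@{name}\n{seq}\n+\n{encode_sanger_quals(phred_scores)}\n"


def illumina13_to_sanger(qual_string):
    """Shift an Illumina 1.3+ (Phred+64) quality string to Sanger (Phred+33).

    Not valid for pre-1.3 Solexa scores, which are not Phred-scaled.
    """
    shift = ILLUMINA13_OFFSET - SANGER_OFFSET
    return "".join(chr(ord(c) - shift) for c in qual_string)


if __name__ == "__main__":
    print(fastq_record("read1", "ACGT", [40, 30, 20, 10]), end="")
    # @read1
    # ACGT
    # +
    # I?5+
```

Because the offset is the only difference between Sanger and Illumina 1.3+ encodings, converting between them is a fixed character shift; that is also why a FASTQ file should state which variant it uses.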

In the world of 454 sequencing, Roche has its own set of tools for working with the data.  Unfortunately, they are not freely available.  While Roche's 454 tools can convert SFF data into FASTA, another device-independent sequence file format, there is no direct SFF-to-FASTQ conversion utility.

To that end, and for curiosity's sake, I decided to write a program to do so, called sff2fastq.  The idea is by no means unique: there are similar tools such as flower (Haskell-based) and sff_extract (Python-based), as well as other approaches discussed on seqanswers.  As they say, variety is the spice of life.
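For readers curious what such a conversion involves, here is a minimal Python sketch. It is not the sff2fastq source; `read_sff`, `sff_to_fastq` and the `trim` flag are hypothetical names. It assumes the SFF layout from Roche's format specification: big-endian integers, a common header followed by per-read headers and data sections that are each padded to 8 bytes, and flowgram format code 1 (16-bit flow values). Each read's bases and per-base Phred qualities are pulled out and written as Sanger FASTQ. Optionally, the read is trimmed to the quality/adapter clip points, which in typical 454 files also exclude the leading key sequence (usually TCAG). Whether sff2fastq trims by default is not stated in this post.

```python
"""Minimal SFF-to-FASTQ sketch (illustrative; not the sff2fastq source)."""
import struct
import sys

SFF_MAGIC = 0x2E736666  # ".sff" in ASCII


def _pad8(n):
    """Bytes needed to round n up to a multiple of 8 (SFF sections are 8-byte aligned)."""
    return (8 - n % 8) % 8


def read_sff(data):
    """Yield (name, bases, quals, clip_start, clip_end) for each read in SFF bytes.

    clip_start/clip_end are 0-based, end-exclusive trim points derived from the
    1-based quality and adapter clip fields (0 in those fields means "not set").
    """
    # Common header: magic, version, index offset/length, read count,
    # header length, key length, flows per read, flowgram format code.
    (magic, _version, index_offset, index_length, n_reads, header_len,
     _key_len, n_flows, flowgram_format) = struct.unpack(">I4sQIIHHHB", data[:31])
    if magic != SFF_MAGIC:
        raise ValueError("not an SFF file")
    if flowgram_format != 1:
        raise ValueError("only flowgram format 1 (uint16 flow values) is handled")
    pos = header_len  # header_len already includes flow chars, key and padding
    for _ in range(n_reads):
        if index_length and pos == index_offset:
            pos += index_length  # skip an index block sitting between reads
        (read_header_len, name_len, n_bases, cq_left, cq_right,
         ca_left, ca_right) = struct.unpack(">HHIHHHH", data[pos:pos + 16])
        name = data[pos + 16:pos + 16 + name_len].decode("ascii")
        pos += read_header_len  # read header length includes its padding
        # Read data: flowgram values (uint16 each), flow index per base (uint8),
        # bases (one char each), quality scores (uint8 each), then padding.
        pos += 2 * n_flows + n_bases  # skip flowgram values and flow indices
        bases = data[pos:pos + n_bases].decode("ascii")
        pos += n_bases
        quals = list(data[pos:pos + n_bases])
        pos += n_bases
        pos += _pad8(2 * n_flows + 3 * n_bases)
        # Trim to the innermost of the quality and adapter clip points.
        clip_start = max(1, cq_left, ca_left) - 1
        rights = [r for r in (cq_right, ca_right) if r > 0]
        clip_end = min(rights + [n_bases])
        yield name, bases, quals, clip_start, clip_end


def sff_to_fastq(data, trim=True):
    """Yield one Sanger FASTQ record (Phred+33) per read in the SFF bytes."""
    for name, bases, quals, start, end in read_sff(data):
        if not trim:
            start, end = 0, len(bases)
        qual = "".join(chr(q + 33) for q in quals[start:end])
        yield f"@{name}\n{bases[start:end]}\n+\n{qual}\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as fh:
        sys.stdout.writelines(sff_to_fastq(fh.read()))
```

Run as `python sff_sketch.py reads.sff > reads.fastq` (the script name is hypothetical). For simplicity the sketch reads the whole file into memory; a production tool would stream reads from the file handle instead.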

This entry was posted in biology, dna sequencing, software.

6 Responses to sff2fastq

  1. James Casbon says:

    Thanks for this, I find it very useful. Can you state in the README exactly which FastQ scores you are using?

  2. Dan says:

    Hi,

    I created an entry for sff2fastq here:

    http://seqanswers.com/wiki/Sff2fastq

    All the best,
    Dan.

  3. Daniel Brami says:

    Thanks for this utility – it's a great time-saver!

  4. fadista says:

    Does your sff2fastq handle demultiplexing? Thanks.

  5. Dan says:

    Does what it says on the box. Re: demultiplexing, GALAXY and Geneious both do this OK in my experience.
