The basic premise of genetic sequencing involves preparing a DNA sample into a form suitable for use on a DNA sequencer. The sequencer then determines the sequence of bases in the prepared sample and stores the results in a digital file. The file format depends on the sequencing methodology used by the sequencer.
In 454 sequencing, SFF is the native format for storing the sequence data; in ABI-Sanger sequencing it is the AB1 or SCF chromatogram file format; in Illumina/Solexa sequencing it is the QSEQ or Illumina FASTQ format; and in ABI-SOLiD sequencing it is the colorspace CSFASTA format.
Most scientists and biologists are more interested in the final sequence data than in the particular vendor technology that produced it.
Over the course of a biological investigation, one is often confronted with data from various sequencing platforms, so a format that is common across platforms is needed. In the era of next-generation sequencing, the Sanger FASTQ format appears to be the popular lingua franca of sequence file formats. It holds both the sequence and the quality data generated by the sequencer. Many of the currently popular (and open-source) aligners and assemblers, such as maq, bwa, bowtie, SSAHA2 and velvet, accept Sanger FASTQ files as input.
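To make "both the sequence and quality data" concrete, here is a minimal Python sketch of how one Sanger FASTQ record is laid out: a header line starting with `@`, the bases, a `+` separator line, and one ASCII character per base encoding its Phred quality score with an offset of 33. The function name `format_fastq_record` and the sample read are hypothetical and purely illustrative; this is not how sff2fastq itself is implemented.

```python
SANGER_OFFSET = 33  # Sanger FASTQ stores Phred score q as the character chr(q + 33)


def format_fastq_record(name, bases, quals):
    """Return one four-line Sanger FASTQ record as a string (illustrative helper)."""
    if len(bases) != len(quals):
        raise ValueError("need exactly one quality score per base")
    if any(q < 0 or q > 93 for q in quals):
        raise ValueError("Sanger FASTQ encodes Phred scores in the range 0..93")
    # Each quality score becomes a single printable ASCII character.
    quality_line = "".join(chr(q + SANGER_OFFSET) for q in quals)
    return "@{0}\n{1}\n+\n{2}\n".format(name, bases, quality_line)


# Hypothetical read: seven bases with made-up Phred scores.
record = format_fastq_record("read_1", "GATTACA", [40, 40, 38, 30, 20, 10, 2])
print(record, end="")
```

Because the quality string is always the same length as the sequence, a parser can read a record's sequence and quality lines and pair each base with its score by position.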
In the world of 454 sequencing, Roche has its own set of tools for working with the data. Unfortunately, they are not freely available. While the 454 tools from Roche provide a way to convert the data into the FASTA file format, another device-independent sequence file format, there is no direct SFF to FASTQ conversion utility.
To that end, and for curiosity's sake, I decided to write a program to do so, called sff2fastq. The idea is by no means unique. There are other similar tools, such as flower (Haskell-based) and sff_extract (Python-based), as well as alternative approaches discussed on SEQanswers. As they say, variety is the spice of life.