Functional genomics, the assignment of biological meaning to genomic sequence, is the ultimate goal of genome sequencing projects. Here, as elsewhere, the results you obtain can reflect less what you do than the way that you do it. The consequences of using different computational programs to predict gene structure, for example, are well exemplified by the wide range in estimates of the total number of human genes (30,000–100,000). In his article, Michael Zhang reviews the gene-prediction algorithms that are available for analysing the genome sequences of complex organisms, and their theoretical basis. Ideally, an algorithm should predict, from sequence alone, every feature of a gene and distinguish it from, say, a pseudogene or two overlapping genes. At present, however, gene prediction can only be done by combining ab initio computational analysis with EST and cDNA sequences, and with genome data from other organisms. It is therefore a pity that, according to Zhang, this is exactly where gene prediction hits a functional genomics bottleneck: so much genomic sequence, yet so little functional data with which to build new gene-prediction algorithms and verify their performance.
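
To give a loose sense of what "ab initio" prediction means at its very simplest, the sketch below scans a DNA string for open reading frames using sequence signals alone. This is not Zhang's method, and it is far cruder than the algorithms he reviews (which model splicing signals, codon bias and more); the function name, parameters and demo sequence are all invented for illustration.

```python
# Toy "ab initio" prediction: find open reading frames (ORFs) in a DNA
# string from sequence signals alone. Real gene predictors also model
# splice sites, codon usage and promoters; this is only a sketch.

START, STOPS = "ATG", {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 30):
    """Yield (start, end, frame) for ORFs of at least `min_codons` codons."""
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # open a candidate ORF at the start codon
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    yield (start, i + 3, frame)
                start = None  # close the candidate and keep scanning

if __name__ == "__main__":
    demo = "CCATGAAATTT" + "GCT" * 40 + "TAGGG"  # invented sequence
    for orf in find_orfs(demo, min_codons=10):
        print(orf)  # prints (2, 134, 2): start, end, reading frame
```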

One organism to which this certainly does not apply is Saccharomyces cerevisiae. Many large-scale functional genomics projects have been carried out to catalogue, for example, its RNA expression profiles and protein–protein interactions. But the information extracted from these genomic data sets is only as reliable as its source. As Björn Grünenfelder and Elizabeth Winzeler discuss in their review, inconsistencies often arise when independently generated data sets are compared, whether owing to experimental error or to natural genetic variation. And if the problem of reproducibility exists for this simple yeast, then the functional analysis of our own genome is unlikely to be a smooth ride.
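
To make the comparison problem concrete, a minimal sketch follows: given two independently generated expression tables keyed by gene, flag the genes whose measurements disagree beyond a fold-change tolerance. This is a generic illustration, not Grünenfelder and Winzeler's analysis; the gene names, values and cutoff are invented, and a real comparison would also have to account for platform biases and strain differences.

```python
# Hypothetical sketch: cross-check two independently generated expression
# data sets and flag genes that disagree. All data and thresholds are
# invented for illustration.

def flag_inconsistent(set_a: dict, set_b: dict, fold_cutoff: float = 2.0):
    """Return genes measured in both sets whose values differ by more
    than `fold_cutoff`-fold, a crude proxy for irreproducibility."""
    flagged = []
    for gene in set_a.keys() & set_b.keys():  # genes present in both sets
        a, b = set_a[gene], set_b[gene]
        if a <= 0 or b <= 0:
            continue  # skip absent or failed measurements
        ratio = max(a, b) / min(a, b)
        if ratio > fold_cutoff:
            flagged.append((gene, a, b, round(ratio, 1)))
    return flagged

# Toy data: two labs' expression estimates for the same genes.
lab1 = {"YAL001C": 10.0, "YBR020W": 4.0, "YCL030C": 0.5}
lab2 = {"YAL001C": 11.0, "YBR020W": 1.5, "YCL030C": 0.6}
print(flag_inconsistent(lab1, lab2))
# [('YBR020W', 4.0, 1.5, 2.7)]: only this gene exceeds the 2-fold cutoff
```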