The cost of cellulase enzymes remains a key economic impediment to the commercialization of biofuels (1). Enzymes from glycoside hydrolase family 48 (GH48) are a critical component of numerous natural lignocellulose-degrading systems. Although computational mining of large genomic data sets is a promising new approach for identifying novel cellulolytic activities, current computational methods cannot distinguish cellulases from enzymes with different substrate specificities that belong to the same protein family. We show that a robust computational approach, supported by experimental studies, can effectively separate cellulases from non-cellulases within a given protein family. Phylogenetic analysis of GH48 showed a non-monophyletic distribution, an indication of horizontal gene transfer. The enzymatic function of GH48 proteins encoded by horizontally transferred genes was verified experimentally, confirming that these proteins are cellulases. Computational and structural studies of GH48 enzymes identified structural elements that define cellulases and can be used to distinguish them computationally from non-cellulases. We propose that the structural element suitable for in silico discrimination between cellulases and non-cellulases in GH48 is an ω-loop located on the surface of the molecule and characterized by highly conserved rare amino acids. These markers were used to screen metagenomic data for “true” cellulases.
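The marker-based screening idea can be illustrated with a minimal sketch. It assumes that candidate sequences have already been aligned to a GH48 reference, so that the ω-loop occupies fixed alignment columns. The column indices, accepted residues, sequence names, and function names below are all hypothetical placeholders chosen for illustration. They are not the positions or residues identified in this study, and the sketch is not the procedure used here.

```python
from typing import Dict, List

# Hypothetical markers: alignment column (0-based) -> residues accepted there.
# Placeholders only; not the positions or residues reported in the paper.
HYPOTHETICAL_MARKERS: Dict[int, str] = {3: "W", 7: "C", 9: "WY"}


def matches_markers(aligned_seq: str, markers: Dict[int, str]) -> bool:
    """Return True if every marker column holds one of its accepted residues."""
    for column, accepted in markers.items():
        # A sequence too short to reach the column cannot carry the marker.
        if column >= len(aligned_seq):
            return False
        # Gap characters ('-') never match, so a missing loop is rejected.
        if aligned_seq[column].upper() not in accepted:
            return False
    return True


def screen(alignment: Dict[str, str], markers: Dict[int, str]) -> List[str]:
    """Return the IDs of aligned sequences that carry all marker residues."""
    return [seq_id for seq_id, seq in alignment.items()
            if matches_markers(seq, markers)]


# Toy aligned sequences (invented for the example, not real GH48 data).
toy_alignment = {
    "candidate_a": "MKTWAAACGW",
    "candidate_b": "MKTAAAACGW",
    "candidate_c": "MKTW---C-Y",
}
putative_cellulases = screen(toy_alignment, HYPOTHETICAL_MARKERS)
```

In this toy input, `candidate_b` is rejected because column 3 does not hold the required residue. A real screen would take its columns from a curated alignment and would likely tolerate some variation rather than require exact matches.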