From: Linear normalised hash function for clustering gene sequences and identifying reference sequences from multiple sequence alignments