123 The Revolution of Hash Databases in cgMLST

21.3.2024

Micro binfie podcast

17:42

In this episode of the Micro Binfie Podcast, hosts Dr. Andrew Page and Dr. Lee Katz delve into hash databases and their application in cgMLST (core genome multilocus sequence typing) for microbial bioinformatics. The discussion begins with the challenges bioinformaticians face because MLST databases around the world are siloed, which hinders synchronization and effective genomic surveillance. To address this, the hosts introduce the idea of using hashes to identify alleles. A hash is derived directly from the genetic sequence itself, so independent databases assign the same identifier to the same allele and can be synchronized without extensive system support or resources.

Dr. Katz explains the principle of hashing and its application in genomics: even a single nucleotide polymorphism (SNP) produces a different hash, which makes hashing well suited to distinguishing alleles. Hashing algorithms such as MD5 and SHA-256 are discussed, along with their advantages and the risk of hash collisions, where two different sequences receive the same hash. Using more complex (longer) hashes significantly reduces the probability of such collisions.

The episode also explores practical aspects of implementing hash databases in bioinformatics software, highlighting the need for exact-matching algorithms, because a hash can only confirm that two sequences are identical and says nothing about how similar they are. Existing tools such as EToKi and upcoming software are mentioned as examples of applications that can use hash databases. The conversation also touches on sequence types in cgMLST and the challenges of naming and standardizing them in a decentralized database system. Alternatives such as allele codes are mentioned, which could simplify the representation of sequence types.

Finally, the hosts discuss the potential for adopting this hashing approach within larger bioinformatics organizations such as PHA4GE or GMI, with an emphasis on the need for a standardized, community-supported framework to ensure the longevity and effectiveness of hash databases in microbial genomics. The episode gives an overview of how hash databases could address long-standing problems of database synchronization and allele identification, paving the way for more efficient and collaborative genomic surveillance worldwide.
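
To make the idea concrete, here is a minimal Python sketch of hash-based allele identifiers. It is an illustration under stated assumptions, not the method used by any tool named in the episode: the function names, the toy sequences, and the normalization step (uppercasing and trimming whitespace) are all hypothetical. It shows three things from the description above: a single SNP changes the hash, two labs that never coordinated can merge their allele sets by hash, and a longer hash lowers the estimated collision probability (using the standard birthday-bound approximation).

```python
import hashlib
import math


def allele_hash(sequence: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of an allele sequence.

    Assumption: sequences are normalized by trimming whitespace and
    uppercasing, so the same allele always yields the same hash.
    Reverse complements are NOT treated as equal in this sketch.
    """
    normalized = sequence.strip().upper()
    return hashlib.new(algorithm, normalized.encode("ascii")).hexdigest()


def collision_probability(n_alleles: int, hash_bits: int) -> float:
    """Birthday-bound estimate of at least one collision among n hashes.

    p is approximately 1 - exp(-n(n-1) / 2^(bits+1)).
    """
    exponent = n_alleles * (n_alleles - 1) / 2 ** (hash_bits + 1)
    return -math.expm1(-exponent)


# Toy alleles (hypothetical); allele_b differs from allele_a by one SNP.
allele_a = "ATGGCTAAAGGTCTGTAA"
allele_b = "ATGGCTAAAGGTCTCTAA"
allele_c = "ATGGCTAAAGGTTTGTAA"

# Two labs build databases independently, keyed by hash rather than
# by a centrally assigned allele number.
lab_one = {allele_hash(s): s for s in (allele_a, allele_b)}
lab_two = {allele_hash(s): s for s in (allele_b, allele_c)}

# Synchronizing is a plain dictionary merge: shared alleles collapse
# onto the same key without any central coordination.
merged = {**lab_one, **lab_two}

if __name__ == "__main__":
    print("SNP changes the hash:", allele_hash(allele_a) != allele_hash(allele_b))
    print("Alleles after merge:", len(merged))
    for name, bits in (("md5", 128), ("sha256", 256)):
        p = collision_probability(10**9, bits)
        print(f"{name}: estimated collision probability for 1e9 alleles = {p:.3g}")
```

Because hashes only support exact matching, a lookup in `merged` either finds the identical allele or finds nothing; any near-match search would still need the sequences themselves.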

More episodes from "Micro binfie podcast"

138 Decoding Tetracycline Resistance in MRSA

137 Contagion part 2

135 Assembling ATCC bacterial strains

136 Contagion part 1

134 Diving into MRSA, Genomics, and Public Health

133 The Role of Bioinformatics in Public Health and Disease Outbreaks

132 Unlocking the Secrets of Antimicrobial Resistance in Metagenomes

131 Bioinformatics Evolution: Torsten Seemann on Snippy, Open-Source Support, and Global Genomics

130 Exploring Genomic Innovation and Machine Learning in Public Health

129 Genomics on the Frontier: Bactopia, Bioinformatics, and Pathogen Surveillance with Robert Petit