A pan-cetacean MHC amplicon sequencing panel developed and evaluated in combination with genome assemblies

The major histocompatibility complex (MHC) is a highly polymorphic gene family that is crucial in immunity, and its diversity can be effectively used as a fitness marker for populations. Despite this, MHC remains poorly characterised in non-model species (e.g., cetaceans: whales, dolphins and porpoises) as high gene copy number variation, especially in the fast-evolving class I region, makes analyses of genomic sequences difficult. To date, only small sections of class I and IIa genes have been used to assess functional diversity in cetacean populations. Here, we undertook a systematic characterisation of the MHC class I and IIa regions in available cetacean genomes. We extracted full-length gene sequences to design pan-cetacean primers that amplified the complete exon 2 from MHC class I and IIa genes in one combined sequencing panel. We validated this panel in 19 cetacean species and described 354 alleles for both classes. Furthermore, we identified likely assembly artefacts for many MHC class I assemblies based on the presence of class I genes in the amplicon data compared to missing genes from genomes. Finally, we investigated MHC diversity using the panel in 25 humpback and 30 southern right whales, including four paternity trios for humpback whales. This revealed copy-number variable class I haplotypes in humpback whales, which is likely a common phenomenon across cetaceans. These MHC alleles will form the basis for a cetacean branch of the Immuno-Polymorphism Database (IPD-MHC), a curated resource intended to aid in the systematic compilation of MHC alleles across several species, to support conservation initiatives.

The dataset contains 85 fastq files. Each file contains amplicon reads from five MHC loci (DQA, DQB, DRA, DRB, and class I genes) from a single cetacean, combined across separate sequencing runs. Details on individual cetacean sample abbreviations can be found in the manuscript. Paired reads have been merged and the Illumina adapters removed.
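As a minimal sketch of how one of these per-individual files might be inspected, the Python below reads FASTQ records and reports the read count and the shortest and longest read. It assumes unwrapped four-line records (typical of Illumina output, but not stated in the dataset description), and the names `read_fastq` and `summarise` are illustrative only.

```python
from itertools import islice
from typing import Iterator, NamedTuple, TextIO, Tuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream.

    ASSUMPTION: each record is exactly four lines (header, sequence, '+',
    quality). Blank lines are skipped, so zero-length reads are not supported.
    """
    lines = (line.rstrip("\r\n") for line in handle)
    lines = (line for line in lines if line)
    while True:
        record = list(islice(lines, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def summarise(handle: TextIO) -> Tuple[int, int, int]:
    """Return (read count, shortest read, longest read) for a FASTQ stream."""
    lengths = [len(record.sequence) for record in read_fastq(handle)]
    if not lengths:
        return 0, 0, 0
    return len(lengths), min(lengths), max(lengths)
```

To use it, open one of the downloaded files in text mode and pass the handle to `summarise`. Because each file pools reads from five loci, a wide spread between the shortest and longest read would not be surprising.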

The dataset also contains one fastq file with all class I alleles found and one fastq file with the non-functional DRB alleles found. Alleles are labelled with a four-letter species abbreviation followed by the locus designation (DRB, or N for class I) and are numbered in the order in which they were discovered.
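The labelling scheme described above could be split into its parts along the following lines. This is only a sketch: the description gives the order of the components (species abbreviation, locus, number) but not the exact punctuation, so the regular expression accepts a few optional separators as an assumption. The labels in the usage note are made up and are not taken from the dataset.

```python
import re
from typing import NamedTuple, Optional

# ASSUMPTION: the separators between the components are not documented here,
# so an optional '-', '_' or space is allowed after the species abbreviation
# and an optional '-', '_', '*', '.' or space before the number.
ALLELE_LABEL = re.compile(
    r"^(?P<species>[A-Za-z]{4})[-_ ]?(?P<locus>DRB|N)[-_*. ]?(?P<number>\d+)$"
)


class AlleleLabel(NamedTuple):
    species: str  # four-letter species abbreviation
    locus: str    # "DRB", or "N" for class I
    number: int   # order of discovery


def parse_allele_label(label: str) -> Optional[AlleleLabel]:
    """Split an allele label into its parts, or return None if it does not fit."""
    match = ALLELE_LABEL.match(label.strip())
    if match is None:
        return None
    return AlleleLabel(match["species"], match["locus"], int(match["number"]))
```

For example, the hypothetical label `Abcd-N*01` would parse to species `Abcd`, locus `N` and number 1. A label that does not fit the assumed pattern returns `None`, so a mismatch with the real files is easy to spot.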

Further details are provided at: Heimeier, D., Garland, E. C., Eichenberger, F., Garrigue, C., Vella, A., Baker, C. S., & Carroll, E. L. (2024). A pan-cetacean MHC amplicon sequencing panel developed and evaluated in combination with genome assemblies. Molecular Ecology Resources, 00, e13955. https://doi.org/10.1111/1755-0998.13955

GET DATA: https://doi.org/10.5061/dryad.wh70rxwvb

Additional Info

Field: Value
Theme:
Author:
Maintainer:
Maintainer Email: d.heimeier@auckland.ac.nz
Update frequency: Irregular
Source:
Source Created: 2024-02-26T00:00:00
Source Modified: 2024-05-08T02:32:37
Language: English
Spatial: {"type": "Polygon", "coordinates": [[[-168.73, -66.8606], [175.0303, -66.8606], [175.0303, -53.8196], [-168.73, -53.8196], [-168.73, -66.8606]]]}
Source Identifier: 5c43c5d8-f6f7-438d-991b-48c78aee8e4c
Dataset metadata created 25 March 2025, last updated 25 March 2025
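
The Spatial field above is a GeoJSON polygon, in which coordinates are given as longitude, latitude. As a small sketch, its extent can be read with the Python standard library. The function name `bounding_box` is illustrative. The minimum and maximum are taken naively, so the sketch does not determine whether the polygon is meant to wrap across the 180° meridian.

```python
import json
from typing import Tuple

# Spatial value copied from the metadata above.
SPATIAL = (
    '{"type": "Polygon", "coordinates": [[[-168.73, -66.8606], '
    '[175.0303, -66.8606], [175.0303, -53.8196], [-168.73, -53.8196], '
    '[-168.73, -66.8606]]]}'
)


def bounding_box(geojson_text: str) -> Tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) of a GeoJSON Polygon."""
    geometry = json.loads(geojson_text)
    if geometry.get("type") != "Polygon":
        raise ValueError("expected a GeoJSON Polygon")
    exterior = geometry["coordinates"][0]  # first ring is the exterior ring
    lons = [point[0] for point in exterior]
    lats = [point[1] for point in exterior]
    return min(lons), min(lats), max(lons), max(lats)


if __name__ == "__main__":
    print(bounding_box(SPATIAL))
```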