A two-sequence motif-based method for the inventory of gene families in fragmented and poorly annotated genome sequences

Publication: Contribution to journal, Journal article, Research, Peer-reviewed

Databases of genome sequences are growing exponentially, but, in some cases, assembly is incomplete and genes are poorly annotated. For evolutionary studies, it is important to identify all members of a given gene family in a genome. We developed a method for identifying most, if not all, members of a gene family from raw genomes in which assembly is of low quality, using the P-type ATPase superfamily as an example. The method is based on the translation of an entire genome in all six reading frames and the co-occurrence of two family-specific sequence motifs that are in close proximity to each other. To test the method’s usability, we first used it to identify P-type ATPase members in the high-quality annotated genome of barley (Hordeum vulgare). Subsequently, after successfully identifying plasma membrane H+-ATPase family members (P3A ATPases) in various plant genomes of varying quality, we tested the hypothesis that the number of P3A ATPases correlates with the ability of the plant to tolerate saline conditions. In 19 genomes of glycophytes and halophytes, the total number of P3A ATPase genes was found to vary from 7 to 22, but no significant difference was found between the two groups. The method successfully identified P-type ATPase family members in raw genomes that are poorly assembled.
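
The core idea described above (six-frame translation followed by a search for two nearby family-specific motifs) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not name the two motifs or the allowed distance between them, so `MOTIF_A`, `MOTIF_B`, and the `max_gap` parameter are assumptions chosen for illustration; `DKTGT` and `GDGxND` are widely conserved P-type ATPase signatures and serve only as placeholders. Requiring that no stop codon lies between the two motifs is likewise an assumption of this sketch.

```python
import re
from itertools import product

# Standard genetic code (NCBI translation table 1), in the usual TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Placeholder motifs (assumptions, not taken from the source), written as regexes.
MOTIF_A = "DKTGT"
MOTIF_B = "GDG.ND"


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA codon by codon; codons not in the table (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Yield (strand, offset, protein) for all six reading frames."""
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            yield strand, offset, translate(seq[offset:])


def find_motif_pairs(dna, motif_a, motif_b, max_gap):
    """Report frames where motif_b starts within max_gap residues after motif_a ends.

    Returns tuples (strand, offset, start_of_a, start_of_b); positions are
    residue indices within the translated frame.
    """
    hits = []
    for strand, offset, protein in six_frame_translations(dna):
        starts_b = [m.start() for m in re.finditer(motif_b, protein)]
        for m in re.finditer(motif_a, protein):
            for b in starts_b:
                gap = b - m.end()
                # Assumption: both motifs lie in one open stretch (no stop codon between).
                if 0 <= gap <= max_gap and "*" not in protein[m.end():b]:
                    hits.append((strand, offset, m.start(), b))
    return hits
```

Calling `find_motif_pairs(contig, MOTIF_A, MOTIF_B, max_gap)` on each contig of a raw assembly would yield candidate loci without relying on gene annotation, which is the property the abstract emphasises. A real analysis would still need the authors' actual motifs and distance threshold, plus handling of introns and duplicated hits.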

Original language: English
Article number: 26
Journal: BMC Genomics
Volume: 25
Number of pages: 13
ISSN: 1471-2164
DOI
Status: Published - 2024

Bibliographical note

Funding Information:
Open access funding provided by Copenhagen University. The research was supported by Novo Nordisk Fonden (NovoCrops; M.P.), Carlsbergfondet (RaisingQuinoa; M.P.), Australian Research Council (S.S.), and the National Natural Science Foundation of China (S.S.).

Publisher Copyright:
© 2023, The Author(s).

ID: 380151820