A two-sequence motif-based method for the inventory of gene families in fragmented and poorly annotated genome sequences

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

A two-sequence motif-based method for the inventory of gene families in fragmented and poorly annotated genome sequences. / Nørrevang, Anton Frisgaard; Shabala, Sergey; Palmgren, Michael.

In: BMC Genomics, Vol. 25, article 26, 2024.

Harvard

Nørrevang, AF, Shabala, S & Palmgren, M 2024, 'A two-sequence motif-based method for the inventory of gene families in fragmented and poorly annotated genome sequences', BMC Genomics, vol. 25, 26. https://doi.org/10.1186/s12864-023-09859-4

APA

Nørrevang, A. F., Shabala, S., & Palmgren, M. (2024). A two-sequence motif-based method for the inventory of gene families in fragmented and poorly annotated genome sequences. BMC Genomics, 25, [26]. https://doi.org/10.1186/s12864-023-09859-4

Vancouver

Nørrevang AF, Shabala S, Palmgren M. A two-sequence motif-based method for the inventory of gene families in fragmented and poorly annotated genome sequences. BMC Genomics. 2024;25:26. https://doi.org/10.1186/s12864-023-09859-4

Author

Nørrevang, Anton Frisgaard ; Shabala, Sergey ; Palmgren, Michael. / A two-sequence motif-based method for the inventory of gene families in fragmented and poorly annotated genome sequences. In: BMC Genomics. 2024 ; Vol. 25.

Bibtex

@article{2fc83c905e564a0d802ab09dc6a8de31,
title = "A two-sequence motif-based method for the inventory of gene families in fragmented and poorly annotated genome sequences",
abstract = "Databases of genome sequences are growing exponentially, but, in some cases, assembly is incomplete and genes are poorly annotated. For evolutionary studies, it is important to identify all members of a given gene family in a genome. We developed a method for identifying most, if not all, members of a gene family from raw genomes in which assembly is of low quality, using the P-type ATPase superfamily as an example. The method is based on the translation of an entire genome in all six reading frames and the co-occurrence of two family-specific sequence motifs that are in close proximity to each other. To test the method{\textquoteright}s usability, we first used it to identify P-type ATPase members in the high-quality annotated genome of barley (Hordeum vulgare). Subsequently, after successfully identifying plasma membrane H+-ATPase family members (P3A ATPases) in various plant genomes of varying quality, we tested the hypothesis that the number of P3A ATPases correlates with the ability of the plant to tolerate saline conditions. In 19 genomes of glycophytes and halophytes, the total number of P3A ATPase genes was found to vary from 7 to 22, but no significant difference was found between the two groups. The method successfully identified P-type ATPase family members in raw genomes that are poorly assembled.",
keywords = "Fragmented genomes, Gene families, Halophytes, P-type ATPase, Plasma membrane H+-ATPase, Two-motif",
author = "N{\o}rrevang, {Anton Frisgaard} and Sergey Shabala and Michael Palmgren",
note = "Publisher Copyright: {\textcopyright} 2023, The Author(s).",
year = "2024",
doi = "10.1186/s12864-023-09859-4",
language = "English",
volume = "25",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central Ltd.",

}

RIS

TY - JOUR

T1 - A two-sequence motif-based method for the inventory of gene families in fragmented and poorly annotated genome sequences

AU - Nørrevang, Anton Frisgaard

AU - Shabala, Sergey

AU - Palmgren, Michael

N1 - Publisher Copyright: © 2023, The Author(s).

PY - 2024

Y1 - 2024

N2 - Databases of genome sequences are growing exponentially, but, in some cases, assembly is incomplete and genes are poorly annotated. For evolutionary studies, it is important to identify all members of a given gene family in a genome. We developed a method for identifying most, if not all, members of a gene family from raw genomes in which assembly is of low quality, using the P-type ATPase superfamily as an example. The method is based on the translation of an entire genome in all six reading frames and the co-occurrence of two family-specific sequence motifs that are in close proximity to each other. To test the method’s usability, we first used it to identify P-type ATPase members in the high-quality annotated genome of barley (Hordeum vulgare). Subsequently, after successfully identifying plasma membrane H+-ATPase family members (P3A ATPases) in various plant genomes of varying quality, we tested the hypothesis that the number of P3A ATPases correlates with the ability of the plant to tolerate saline conditions. In 19 genomes of glycophytes and halophytes, the total number of P3A ATPase genes was found to vary from 7 to 22, but no significant difference was found between the two groups. The method successfully identified P-type ATPase family members in raw genomes that are poorly assembled.

AB - Databases of genome sequences are growing exponentially, but, in some cases, assembly is incomplete and genes are poorly annotated. For evolutionary studies, it is important to identify all members of a given gene family in a genome. We developed a method for identifying most, if not all, members of a gene family from raw genomes in which assembly is of low quality, using the P-type ATPase superfamily as an example. The method is based on the translation of an entire genome in all six reading frames and the co-occurrence of two family-specific sequence motifs that are in close proximity to each other. To test the method’s usability, we first used it to identify P-type ATPase members in the high-quality annotated genome of barley (Hordeum vulgare). Subsequently, after successfully identifying plasma membrane H+-ATPase family members (P3A ATPases) in various plant genomes of varying quality, we tested the hypothesis that the number of P3A ATPases correlates with the ability of the plant to tolerate saline conditions. In 19 genomes of glycophytes and halophytes, the total number of P3A ATPase genes was found to vary from 7 to 22, but no significant difference was found between the two groups. The method successfully identified P-type ATPase family members in raw genomes that are poorly assembled.

KW - Fragmented genomes

KW - Gene families

KW - Halophytes

KW - P-type ATPase

KW - Plasma membrane H+-ATPase

KW - Two-motif

U2 - 10.1186/s12864-023-09859-4

DO - 10.1186/s12864-023-09859-4

M3 - Journal article

C2 - 38172704

AN - SCOPUS:85181238864

VL - 25

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

M1 - 26

ER -

ID: 380151820
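
Illustrative sketch of the approach described in the abstract

The abstract describes translating a whole genome in all six reading frames and looking for two family-specific motifs that occur close together. The Python sketch below shows one way that idea could be expressed. It is not the authors' implementation: the function names, the `max_gap` parameter and its default, and the motif patterns used in the usage note are all assumptions made for illustration, since the abstract does not state the motifs or the distance threshold that were actually used.

```python
import re
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA codon by codon; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Yield (strand, offset, protein) for the three frames on each strand."""
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            yield strand, offset, translate(seq[offset:])


def find_motif_pairs(dna, motif_a, motif_b, max_gap=60):
    """Report frames in which motif_b starts within max_gap residues after motif_a.

    motif_a and motif_b are regular expressions over amino-acid letters.
    max_gap is a hypothetical threshold, not a value taken from the paper.
    Each hit is (strand, frame offset, motif_a start, motif_b start), with
    positions given as residue indices in the translated frame.
    """
    hits = []
    for strand, offset, protein in six_frames(dna):
        b_starts = [m.start() for m in re.finditer(motif_b, protein)]
        for m in re.finditer(motif_a, protein):
            for b in b_starts:
                if 0 <= b - m.end() <= max_gap:
                    hits.append((strand, offset, m.start(), b))
    return hits
```

As a usage example, `find_motif_pairs(contig, "DKTGT", "GDG", max_gap=10)` would scan a contig string for the placeholder pattern `DKTGT` followed within ten residues by the placeholder pattern `GDG`. Because the search runs on translated frames rather than on annotated gene models, it does not depend on assembly contiguity or annotation quality. A real analysis would still need to handle introns, frameshifts and duplicate hits, which this sketch ignores.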