Expansion and collapse of VEGF diversity in major clades of the animal kingdom

In this study, we have identified homologs of PDGFs/VEGFs in most animal phyla that show tissue organization (i.e., excluding sponges) and for which more than a few genome assemblies and gene predictions exist. Despite their pervasive occurrence in many branches of deuterostomes and protostomes, the data clearly support the notion that some animal phyla are completely or partially devoid of PDGF/VEGF-like molecules. This might apply, above all, to clades with secondarily reduced body plans such as Tunicata, or to the phyla Xenacoelomorpha and Dicyemida. For many of the protostome phyla in particular, little genomic or mRNA data is available. Genome assemblies are often lacking, and the available prediction algorithms might not be very reliable because these animals are rarely the subject of genomic research. For these phyla, the lack of PDGF/VEGF-like proteins is a provisional hypothesis.

After hypothetical proteins are predicted from genomic sequences, programmatic bioinformatics workflows typically assign them to homology groups (e.g., using PANTHER [72]), resulting in automatic annotations like PREDICTED, VEGF-C. Despite this approach, many PDGF/VEGF homologs fail to be programmatically categorized into one of the 10 ortholog groups (VEGF-A, PlGF, VEGF-B, VEGF-C, VEGF-D, VEGF-F, PDGF-A, PDGF-B, PDGF-C, PDGF-D). Our algorithm emulates crowd-sourcing by comparing uncategorized homologs to the most closely related manually and programmatically annotated PDGFs/VEGFs and establishes a majority opinion, allowing for the categorization of the majority of uncategorized vertebrate proteins into one of the ortholog groups. This crowd-sourcing combines human annotation of gene and protein records with the tree-building and clustering methodology used for the protein trees available at Ensembl (https://m.ensembl.org/info/genome/compara/homology_method.html), which are used for the automatic annotation.
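The majority-opinion step can be illustrated with a minimal sketch. It is not the implementation used in this study: the function name `majority_label`, the `min_fraction` threshold, and the assumption that the annotations of the closest annotated relatives are already available as a list of strings are all hypothetical.

```python
from collections import Counter

# The 10 ortholog groups named in the text.
ORTHOLOG_GROUPS = {
    "VEGF-A", "PlGF", "VEGF-B", "VEGF-C", "VEGF-D", "VEGF-F",
    "PDGF-A", "PDGF-B", "PDGF-C", "PDGF-D",
}


def majority_label(neighbor_labels, min_fraction=0.5):
    """Return the ortholog group supported by a majority of neighbors.

    neighbor_labels: annotations of the most closely related, already
    annotated homologs (hypothetical input format: a list of strings).
    min_fraction: hypothetical threshold; the winning group must hold
    strictly more than this share of the informative votes.
    Returns None when no group reaches a majority.
    """
    # Only annotations that name one of the ortholog groups count as votes.
    votes = Counter(label for label in neighbor_labels if label in ORTHOLOG_GROUPS)
    total = sum(votes.values())
    if total == 0:
        return None
    label, count = votes.most_common(1)[0]
    return label if count / total > min_fraction else None


if __name__ == "__main__":
    print(majority_label(["VEGF-C", "VEGF-C", "VEGF-D", "hypothetical protein"]))
```

Returning `None` for ties and for neighbors without informative annotations keeps such proteins uncategorized rather than forcing an assignment.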

Based on the phylogenetic tree of the animal kingdom and our analysis of PDGF/VEGF homologs in different animal clades, the emergence of the earliest PDGF/VEGF-like molecule (“proto-PDGF/VEGF”) predates the establishment of the bilaterian body plan [73] and the split of the animal kingdom into deuterostome and protostome organisms before the start of the Cambrian about 540 MYA [74]. Intriguingly, this “proto-PDGF/VEGF” most likely featured a domain structure characteristic of the modern lymphangiogenic VEGF-C/VEGF-D subclass, having long N- and C-terminal extensions flanking the VHD and a characteristic repetitive cysteine residue pattern (the BR3P repeat) at its C-terminus. Concurrent with the relatively rapid evolution of different body plans during what has been termed the “Cambrian Explosion,” the proto-PDGF/VEGF underwent diversification, establishing a VEGF-A-like and a VEGF-C-like branch and spinning off the PDGF branch. The deuterostome/protostome split predates both the VEGF diversification and the PDGF spinoff, which explains the difficulty of classifying PDGF/VEGF-like molecules on the protostome branch (“Drosophila VEGFs,” “C. elegans VEGF-C”) as either PDGFs or VEGFs.

In the protostome branch, we can detect PDGF/VEGF-like factors (PVFs) in all but one clade for which substantial sequencing data are available: all 26 genome-sequenced flatworms seem to get along without any PVFs. In contrast, insects, mollusks, and segmented worms (Annelida) often feature more than one Pvf gene, whereas most nematodes feature only one. C. elegans PVF-1 is remarkable in that it is still able, after more than 500 million years of evolutionary separation, to activate the human VEGF receptors-1 and -2 [38]. Protostome animals do not feature a cardiovascular system, with exceptions among mollusks and segmented worms (Annelida). The phylum Annelida contains species such as the earthworm Lumbricus terrestris, which features a closed circulatory system displaying vascular specialization and hierarchy, including a vessel-like heart with valves, larger blood vessels, capillaries, blood containing free hemoglobin, and a renal filtration system [75]. Until now, no protostome PVF has been shown to play any role in vascular development, and a unifying picture of PVF function in invertebrates has yet to emerge.

Whether this functional diversity reflects an absence of a cardiovascular system in the last common ancestor of vertebrates and invertebrates is unclear. A possible cardiovascular system in the last common ancestor did not necessarily require the existence of endothelial cells since the annelid cardiovascular system does not feature endothelial cells either [76]. In such a scenario, the cardiovascular systems of vertebrates and invertebrates would be homologous, but endothelial cells would represent convergent evolution. However, some invertebrates feature endothelial-like cells [77, 78], and octopuses have a molecular makeup surprisingly similar to that of vertebrates (e.g., VEGF receptors, Notch) [79]. Blood vessel formation in leeches was even claimed to be inducible by human VEGF-A, arguing that perhaps even the endothelial cell was a feature of the last common ancestor of vertebrates and invertebrates [80]. In the invertebrate lineage, ECs are speculated to have developed from hemocytes [76], and in vertebrates, such an origin also seems likely due to the close relationship of the hematopoietic and endothelial cell lineages, both featuring VEGF receptors and a common developmental origin [31, 81].

With PlGF, VEGF-B, VEGF-D, and VEGF-F, further specializations happen in both the VEGF-A and the VEGF-C lineage after the Cambrian period in the deuterostome branch. Instrumental for this specialization is most likely the VGD2, which doubled the number of PDGF/VEGF-like genes. When ignoring teleost fishes, whole-genome duplications are responsible for about half of the newly emerging PDGFs/VEGFs, the other half requiring duplication events at the gene or chromosome level (Fig. 4).

Echinodermata are the simplest animals that display PDGF/VEGF specialization at the gene level. The least complicated explanation for the existence of both VEGF-A-like and VEGF-C-like proteins in Echinodermata is that the first gene duplication of the proto-PDGF/VEGF happened prior to the VGD1. For the same reason, the separation of the PDGF lineage also likely predates the VGD1, resulting in PDGFs being present in cephalochordates (lancelets), which are the simplest organisms to have a pressurized vascular system in which the blood is moved around by peristaltic pressure waves created by contractile vessels [82]. In line with this notion is the important role of PDGFs in the supportive layers that stabilize blood vessels (pericytes, smooth muscle cells) [83].

Despite the availability of genome assemblies from six different tunicate species, the programmatic approach identified a VEGF-like molecule in only one tunicate, Ciona intestinalis. This was surprising since prior data indicate a role of VEGF/VEGFR signaling in the circulatory system of tunicates [84]. However, the relationship between the receptor tyrosine kinase that was cloned from the tunicate Botryllus schlosseri and PDGF receptors, FGF receptors, and c-kit is not clear. The immunological detection used antibodies directed against human VEGFRs or VEGF-A, and the TK inhibitor PTK787 might equally have inhibited PDGF receptors and c-kit. Although the C. intestinalis VEGF-like growth factor appears most similar to VEGF-A, its phylogeny also allows descent from the PDGF or VEGF-C branch. In that case, the similarity to VEGF-A might have originated from convergent evolution (“long branch attraction”). Since none of the other tunicate species seems to feature PDGF/VEGF homologs, horizontal gene transfer could be an alternative explanation.

Absence of PlGF and VEGF-B from entire vertebrate classes, VEGF-F more common than expected

Most striking was the absence of some VEGF family members from entire animal classes. We did not find any PlGF ortholog in Amphibia and also no VEGF-B ortholog in the clade Archosauria, which includes extant birds and crocodiles as well as extinct dinosaurs. Since bony fish feature both PlGF and VEGF-B, the absence of VEGF-B in extant Archosauria and PlGF in extant Amphibia likely represents an example of lineage-specific gene loss. While five avian protein sequences are annotated as “VEGF-B” or “VEGF-B-like” in the searched database, they clustered on a phylogenetic tree on the same branch as the VEGF-A sequences (data not shown). In addition, their genes did not show the exon–intron structure characteristic of VEGF-B, in which overlapping open reading frames lead to two different protein sequences due to a frameshift [85].

As a counterpoint to the missing PlGF and VEGF-B, we observed, to our surprise, that VEGF-F is more common than generally thought. Discovered as a venom compound of vipers [21, 22], it was initially thought to be limited to venomous reptiles. However, we did detect VEGF-F-like sequences in non-venomous lizards and geckos. VEGF-F is more or less pervasive throughout large parts of the lepidosaurian lineage, with occurrences in species as diverse as the Green anole and the Japanese gecko (which are located distantly from each other on the lepidosaurian tree). For this reason, VEGF-F likely evolved early in the evolution of Lepidosauria, prior to the invention of venom (Supplementary file1, Fig. S3). At this moment, it is unclear which functions VEGF-F might have originally fulfilled before it became co-opted as an integral viper venom component. In vipers, VEGF-F expression is highly restricted to the venom glands [86], where it acts by accelerating venom spread through induction of vascular permeability and by incapacitating the prey through lowering of blood pressure. However, it is conceivable that VEGF-F still fulfills its original, non-venom function in the non-viper branch of the VEGF-F tree.

Complete absence of PDGFs/VEGFs

The absence of individual PDGFs/VEGFs from a species’ proteome can be either real or only apparent due to incomplete sampling or an artifact of the bioinformatics analysis pipeline. We generally found very few exceptions to the clade-specific pattern of PDGF/VEGF occurrence in terrestrial vertebrates, which all featured the same set of PDGF/VEGF paralogs, confirming the reliability of the respective genome sequencing and gene prediction pipelines. In our programmatic screen, PDGF/VEGF-like sequences were apparently completely absent in some clades for two different reasons:

1. A lack of data (false negatives): PDGFs/VEGFs were apparently absent from clades where there was no comprehensive genomic data, or the genomic data had not been analyzed (e.g., sea spiders or velvet worms).

2. A true absence: PDGFs/VEGFs were absent from clades that are likely truly devoid of VEGF-like molecules (e.g., flatworms, where a substantial number of genomes have been sequenced and analyzed).
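The distinction between the two cases can be expressed as a simple decision rule. The following is a sketch only, not the screening pipeline itself: the function name and the `min_genomes` cutoff are hypothetical, since the text does not define how many analyzed genomes count as “substantial.”

```python
def classify_absence(genomes_screened, homologs_found, min_genomes=10):
    """Interpret a clade-level screening result.

    genomes_screened: number of analyzed genomes in the clade.
    homologs_found: number of PDGF/VEGF-like sequences detected.
    min_genomes: hypothetical cutoff for "substantial" genomic coverage.
    """
    if homologs_found > 0:
        return "present"
    if genomes_screened >= min_genomes:
        # Many genomes screened without a single hit: absence is likely real.
        return "likely true absence"
    # Too little data to distinguish absence from incomplete sampling.
    return "possible false negative"


if __name__ == "__main__":
    # 26 genome-sequenced flatworms without any hit, as described in the text.
    print(classify_absence(genomes_screened=26, homologs_found=0))
```

Any such cutoff is arbitrary; assembly quality and the reliability of gene prediction for the clade would also need to be weighed.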

With increasing sequencing coverage, false negatives will disappear, as has happened for Cyclostomata during the writing of this manuscript. The occurrence of four PDGF/VEGF-like genes in Cyclostomata supports the currently largely accepted Early-1R hypothesis (i.e., that the VGD1 happened before the divergence of Cyclostomata). While recent data suggest an early hexaploidization event for the Cyclostomata branch [87], we did not find evidence for more than four PDGF/VEGF genes in any cyclostomate genome.

Viral VEGFs

While many viruses indirectly induce angiogenesis [88], some viruses encode their own VEGF homologs. These proteins have been collectively termed “VEGF-E.” Viral VEGFs have been reported from parapoxviruses, which cause skin lesions in their respective mammalian hosts [89,90,91]. In these viruses, the VEGF-E gene is specifically responsible for swelling and vascular proliferation [92]. Based on the sequence homology to VEGF-A, VEGF-E is believed to have been captured from a host during viral evolution [90], similar to the v-sis oncogene, which is believed to be derived from captured host PDGF-B sequences [25]. Our database search confirms that viral VEGF homologs exist not only in four species of the parapoxvirus genus but also in the very distantly related megalocytiviruses, which infect fish [93]. Surprisingly, despite their non-overlapping host range, both the fish and mammalian viral VEGF-Es might originate from one single acquisition from a mammalian host. Unlike megalocytiviruses, which infect fish (and occasionally amphibians), parapoxviruses have a very broad mammalian host range, which occasionally includes humans but mostly covers domesticated and wild ungulates [94, 95]. While parapoxvirus infections are typically self-limiting, megalocytiviruses cause considerable economic damage to aquaculture. The pathophysiology of megalocytiviral diseases is not well understood. The infection leads to perivascular cell hypertrophy [96], and VEGF-E might facilitate virus dissemination by increasing vascular permeability.

The “silk homology” domain (SHD)

Aligning the VEGFs’ accessory domains is non-trivial as they contain a variable number of repeats. In particular, the evolutionary history of the SHD of VEGF-C and VEGF-D is perhaps impossible to deduce with reasonable accuracy because it consists of several complete and incomplete Balbiani ring-3 protein (BR3P) repeats. The C-terminal tails of VEGF-A165 and VEGF-B167 show a reduced number of repeats, and for VEGF-A, this domain was named “heparin-binding domain” (HBD). “Heparin-binding” or binding to the extracellular matrix (ECM) and cell surfaces is one function of the SHD [97], and the HBDs of VEGF-A165 and VEGF-B167 have developed a stronger heparin affinity compared to VEGF-C or VEGF-D, which may have allowed for their size reduction. In addition, the SHD also keeps VEGF-C inactive, likely by steric hindrance [98], which is not required for the longer VEGF-A isoforms, whose HBD can mediate inactivity by sequestration [99]. However, some signaling appears possible when ECM-bound VEGF-A189 or VEGF-A206 are in direct contact with endothelial cells [100]. Also, ECM-associated VEGF-A165 can signal, although distinctly from free VEGF-A165 [101, 102]. Our analysis shows that the SHD was likely an essential part of the proto-PDGF/VEGFs. Despite the SHD being larger than the receptor-activating VHD, it has been maintained for hundreds of millions of years. Since much shorter propeptides can achieve protein inactivity, we suspect an additional function for this domain: establishing a VEGF-C gradient.

VEGF-A gradients are instrumental in vascular network patterning [103,104,105]. They are believed to result from the interaction of the longer VEGF-A isoforms with ECM and cell surfaces and to be essential for embryonic vascularization [106, 107]. Although VEGF-A165 is considered the major isoform in humans [108], the stronger ECM-binding VEGF-A189 and other long isoforms dominate sequence databases. Splice prediction algorithms do not even predict the existence of VEGF-A165 (and sometimes VEGF-A121) for numerous animals such as cattle, horses, and many birds (data not shown). In addition to mRNA splicing, teleost fish have diversified VEGF-A by gene duplication. Both zebrafish VEGF-As (Vegfaa and Vegfab) are indispensable [109]. While comparable in length, they differ significantly in their charge, but whether this translates into a differential interaction with the ECM is unknown.

Two splice isoforms might be specific to mammals: VEGF-AXXXb and VEGF-Ax. VEGF-AXXXb isoforms are generated by using a non-canonical splice acceptor site in exon 8, thus changing the last 6 amino acids of the protein [69]. In contrast, VEGF-Ax isoforms are generated by translational read-through within exon 8. Unlike VEGF-AXXXb, VEGF-Ax contains the same 6 C-terminal amino acid residues as the canonical VEGF-A isoforms but extended by another 22 amino acid residues (see Supplementary file1, Figure S5c) [70].
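The two C-terminal variants described above differ from the canonical isoform in a way that can be checked mechanically. The sketch below is illustrative only: the function name is hypothetical, the residue counts (6 and 22) are taken from the text, and the example sequences are invented placeholders, not real VEGF-A sequences.

```python
def classify_c_terminus(canonical, candidate, tail=6, readthrough=22):
    """Compare a candidate isoform with its canonical counterpart.

    tail: number of C-terminal residues replaced in XXXb-type isoforms.
    readthrough: number of residues appended in Ax-type isoforms.
    """
    if candidate == canonical:
        return "canonical"
    # Read-through: the canonical sequence is intact and extended.
    if candidate.startswith(canonical) and len(candidate) == len(canonical) + readthrough:
        return "Ax-like"
    # Alternative splice acceptor: same length, only the last residues differ.
    if len(candidate) == len(canonical) and candidate[:-tail] == canonical[:-tail]:
        return "XXXb-like"
    return "other"


if __name__ == "__main__":
    # Placeholder sequences (NOT real VEGF-A sequences).
    canonical = "GGGGGGGG" + "AAAAAA"
    xxxb = "GGGGGGGG" + "TTTTTT"
    ax = canonical + "S" * 22
    for name, seq in [("canonical", canonical), ("xxxb", xxxb), ("ax", ax)]:
        print(name, "->", classify_c_terminus(canonical, seq))
```

Exact string comparison is used for clarity; real sequences from different species would require an alignment-based comparison instead.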

Both VEGF-AXXXb and VEGF-Ax have been reported to be antiangiogenic [69, 70, 110]. Others have shown weak angiogenic potential of VEGF-Ax, which resulted from reduced or abolished interaction with NRP-1 [111], an important co-receptor in the context of angiogenesis [112]. Since exon 8-derived sequences are crucial for NRP-1 interaction [113, 114], VEGF-AXXXb would fail to interact due to the lack of these sequences, and VEGF-Ax due to active interference mediated by the read-through tail. If VEGF-Ax binding to VEGFR-2 remained strong, VEGF-Ax could displace VEGF-A from VEGFR-2, eliminate the NRP-1 contribution, and act as a partial agonist suppressing angiogenesis, e.g., in a high-VEGF-A environment [115]. However, experiments failed to observe such competition [111].

Our data cannot inform about the functional characteristics of these isoforms but confirm that exon 8 of VEGF-A is strongly conserved at the DNA level. There is some evidence for VEGF-AXXXb and VEGF-Ax being under purifying selection at the protein level. However, while the conservation of protein-coding sequences is mostly enforced at the protein level, RNA constraints can mimic protein conservation [116], especially perhaps for short sequences. In line with the latter explanation are reports that dispute the existence of inhibitory isoforms altogether [117, 118]. Interestingly, the diversifying selection of the first amino acid of exon 9 (serine/proline) manifests in its frequent conversion to cysteine, which is observed in several species (e.g., bonobos, sheep, alpaca, camels). It is tempting to speculate whether this reversal to cysteine would affect the angiogenic potency of VEGF-AXXXb.

How did the complex splicing landscape of VEGF-A evolve? Likely, transposable elements played a role [119]. The effect of a retrotransposon insertion (long interspersed nuclear element, LINE) during mammalian evolution can clearly be seen in the VEGF-A intron preceding exon 8 (Supplementary file1, Fig. S5a). Transpositions and subsequent deletions and rearrangements resulted in rather large differences in the length and internal intron structure of VEGF-A genes of different species. Also interesting, although not rare in the human genome, is that the same intron harbors the remnants of a MER20B transposable element, which is a mammal-specific progesterone-responsive enhancer instrumental in the regulatory network necessary for pregnancy [120].

Similar to VEGF-A, VEGF-C might form gradients by the interaction of its SHD with the ECM [97]. Such morphogenetic gradients might be crucial not only for developmental lymphangiogenesis but also for developmental angiogenesis and vasculogenesis [51, 121, 122], possibly explaining the strong purifying selection of VEGF-C in its VHD and SHD. All other VEGFs, except for PlGF, also showed strong conservation in the VHD (Supplementary file1, Fig. S6). For the VEGF-A isoforms, the sequence diversity outside the VHD increased with the length of the isoform (Supplementary file1, Fig. S3), perhaps facilitated by the existence of many isoforms.

Structural differences between protostome and deuterostome PDGFs/VEGFs

PDGF/VEGFs form a family within the superfamily of cystine knot growth factors. Their hallmark is a characteristically spaced pattern of eight cysteine residues, consisting of the 6-cysteine pattern of the cystine knot signature expanded by two cysteines responsible for the covalent dimer formation of PDGFs/VEGFs [123]. The 8-cysteine pattern is broken with respect to the intermolecular disulfide bond-forming cysteine by only one vertebrate member of the PDGF/VEGF family, PDGF-C. However, in protostomes, missing intermolecular disulfide bonds are the rule rather than the exception (Fig. 2). While disulfide bridges increase thermostability, low ambient water temperatures are typical for many freshwater and marine species, for which covalent dimer formation via disulfide bonds might not have any advantage over noncovalent dimer formation, while disulfide bond formation comes at a cost [124, 125]. Even at 37 °C, the cystine bridge is not strictly necessary for dimer formation: VEGF-C also forms noncovalent dimers [19], and stable VEGF-A can also be produced after the mutation of the intermolecular cystine bridge-forming cysteines [126].

Conserved when present, not needed when absent

Different from VEGF-A and VEGF-C, which are pervasively maintained within the vertebrate lineage, PlGF and VEGF-B are absent from major vertebrate classes. PlGF appears to be absent in amphibians, and VEGF-B is completely missing from birds and crocodiles. The gene duplication that led to the establishment of the PlGF and VEGF-B genes presumably happened shortly before the cartilaginous fishes branched off. Consequently, e.g., shark VEGF-B is much more similar to VEGF-A compared to VEGF-B of land animals (Supplementary file1, Fig. S7).

The absence of PlGF in amphibians and VEGF-B in birds and crocodiles is due to secondary gene loss events, which were apparently well tolerated, very similar to knockout experiments of the same genes in mice [12,13,14]. While VEGF-B has been proposed to play a role in the regulation of endothelial fatty acid uptake [127] and vascularization and tissue perfusion via indirect activation of VEGFR-2 [128], its precise role remains controversial [129, 130]. Its evolutionary loss might have been a net benefit for birds, perhaps even instrumental to enabling the high metabolic turnover needed for flight [131]. In any case, our understanding of PlGF, VEGF-B, and VEGF-D, all having been conserved for 500 million years despite their apparent present-day redundancy in mice, leaves ample room for future insights.

While very common in plants, polyploidy is rare among animals. Among vertebrates, it is tolerated best by fish and amphibians [132]. This tolerance is also seen at the gene level. We frequently found individual pdgf/vegf gene duplications in fish but not in higher vertebrates. Holostei fish, a sister clade of the teleost fish, show, for example, a duplicated vegfc gene. Whether the duplicated vegfc genes have been maintained in Holostei from one of the prior whole-genome duplications or whether they resulted from a limited gene duplication event early in the Holostei lineage is unknown and perhaps unknowable since the chromosomal context has likely been already lost. Surprisingly, both vegfc genes continue to be strongly conserved in Holostei. The conservation is strongest in the receptor binding domain but can also be seen in the SHD (Fig. 6b). Interestingly, only two of the conserved residues of the PDGF/VEGF signature were under strong purifying selection, and only three out of the nine residues under strong purifying selection were cysteines, arguing that a better, perhaps more sensitive search pattern for the detection of PDGF/VEGF proteins could be developed by taking conserved non-cysteine residues into consideration. Contrasting this strong conservation is the variability of the immediately N-terminally adjacent region, which is presumably instrumental in the activation of the inactive pro-VEGF-C into the mature VEGF-C by proteolysis [
