PRRT2 gene and protein in human: characteristics, evolution and function

This study was designed to characterize the human PRRT2 gene and protein, in order to provide a theoretical reference for research on the regulation of PRRT2 expression and its involvement in the pathogenesis of paroxysmal kinesigenic dyskinesia and related diseases. The bioinformatics tools ProtParam, ProtScale, TMHMM, SignalP 5.0, NetPhos 3.1, SWISS-MODEL, Promoter 2.0, AliBaba2.1 and EMBOSS were used to analyze the sequence characteristics of human PRRT2, the transcription factors and their binding sites in the promoter region of the gene, and the physicochemical properties, signal peptide, hydrophobicity, transmembrane regions, structure, interacting proteins and functions of the PRRT2 protein. (1) Evolutionary analysis of the PRRT2 protein showed that human PRRT2 had the smallest genetic distance from that of Pongo abelii. (2) The human PRRT2 protein was an unstable hydrophilic protein located on the plasma membrane. (3) Random coil (67.65%) and alpha helix (23.24%) were the main secondary structure elements of the PRRT2 protein, which also contained multiple potential phosphorylation sites. (4) Gene Ontology analysis showed that the cellular component of the PRRT2 protein was the plasma membrane; its molecular functions included syntaxin-1 binding and SH3 domain binding; and it was involved in the biological processes of negative regulation of soluble NSF attachment protein receptor (SNARE) complex assembly and calcium-dependent activation of synaptic vesicle fusion. (5) String database analysis revealed 10 proteins with close interactions with the human PRRT2 protein. (6) There were at least two promoter regions within the 2000 bp 5′-flanking sequence upstream of the PRRT2 gene, a 304-bp CpG island in the promoter region and four GC boxes in the 5′ regulatory region, and 13 transcription factors were predicted to bind the promoter region of the PRRT2 gene.
These results provide important information for further studies on the role and functions of the PRRT2 gene.


Background
The proline-rich transmembrane protein 2 (PRRT2) gene, located on chromosome 16p11.2, has 4 exons with a total length of 3794 bases and encodes a protein of 340 amino acids. The PRRT2 protein is a presynaptic membrane protein that plays an important role in cell exocytosis and neurotransmitter release. However, the detailed functions of the protein remain unclear. In 2011, Chen et al. were the first to identify causative mutations of this gene in paroxysmal kinesigenic dyskinesia (PKD) [1]. Subsequent studies have confirmed that mutations in the PRRT2 gene are a major cause of PKD. In addition, the PRRT2 gene is implicated in benign familial infantile seizures (BFIS) and infantile convulsions with paroxysmal choreoathetosis (ICCA) [2,3].
Bioinformatics is a field of science that combines biology, computer science, engineering, and applied mathematics to process and analyze information on DNA and protein sequences and structures, drawing on large stored collections of experimental and derived datasets. The discipline contributes to the establishment of theoretical models, the design of experimental research, and genomics and proteomics studies. In this study, we set out to analyze the physical and chemical properties and molecular structure of PRRT2 using a bioinformatics approach, and to predict the functions of PRRT2 in cells. In addition, as the sequence of the human PRRT2 gene promoter has not been recorded in the NCBI database, and no bioinformatic analysis of the PRRT2 promoter has been reported, we also screened for potential promoter sequences of the human PRRT2 gene in the genomic database and analyzed the transcription factors, their binding sites and the CpG islands in this region. The bioinformatics results will lay a foundation for in-depth study of the functions of PRRT2 in the pathogenesis of PKD and other diseases, and for the design of gene therapy. This study will also provide a theoretical reference for the construction of PRRT2 gene promoter expression vectors and the determination of promoter function in subsequent experimental studies.

Methods
The homology of human PRRT2 with the other species was analyzed with the DNAMAN 8.0 software, and phylogenetic analysis was carried out by MEGA 5.10.
The molecular weight, theoretical isoelectric point (pI), amino acid composition, formula, protein stability, half-life, hydrophobicity and transmembrane regions of human PRRT2 protein were analyzed using the online tools ProtParam, ProtScale and TMHMM. The signal peptide of human PRRT2 was predicted with SignalP 5.0. The phosphorylation sites of human PRRT2 were analyzed with NetPhos 3.1, and the nuclear localization sequence of the protein was predicted with cNLS-mapper.
The functional domains and the secondary and tertiary structures of the protein were analyzed using the SMART, SWISS-MODEL, Swiss-PdbViewer and PyMOL tools.
The Gene Ontology (GO), signaling pathway, and protein interaction analyses were carried out by using the Compartments online software, The Human Protein Atlas database and QuickGO2 database.
The potential promoter in the 5′ regulatory region of the human PRRT2 gene was predicted and analyzed with the online tools Neural Network Promoter Prediction, Promoter 2.0 and TSSG.
The transcription factor binding sites in the 5′ regulatory region of the human PRRT2 gene, and the transcription factors common to both predictions, were analyzed with the online tools AliBaba2.1 and PROMO.
The CpG island in the promoter region of the human PRRT2 gene was predicted with EMBOSS and MethPrimer.
The download information and websites of these tools are listed in Additional file 1.

Results
The analysis of human PRRT2 protein
The homology analysis of human PRRT2 protein
The human PRRT2 gene is located on the short arm of chromosome 16 (16p11.2) and encodes 340 amino acids. Its specific position is chr16: 29812193-29815920, and it contains 4 exons. The homology of Homo sapiens PRRT2 protein with that of Pongo abelii, Cavia porcellus, Equus caballus, Rattus norvegicus, Mus musculus, Bos taurus, and Danio rerio was 97.35, 82.85, 83.24, 79.07, 78.03, 65.56, and 23.98%, respectively. The protein sequences of the eight species were aligned using the DNAMAN 8.0 software (Fig. 1), and the phylogenetic tree of PRRT2 protein was constructed using the neighbor-joining (NJ) method based on the sequence homology in MEGA7 software [11] (Fig. 2). The phylogenetic tree showed that Homo sapiens and Pongo abelii were the closest relatives in PRRT2 protein evolution. Mus musculus was closely related to Rattus norvegicus, and the two were grouped as a cluster. The other species were more distantly related. These results suggest that human PRRT2 had the smallest genetic distance (0.009) from that of Pongo abelii, followed by Cavia porcellus (0.090), and the largest genetic distance from that of Danio rerio (0.951) (Table 1).
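To illustrate how a pairwise identity and a simple distance can be derived from an alignment, the following is a minimal sketch. It assumes an uncorrected p-distance with pairwise deletion of gapped columns; the exact definitions used by DNAMAN and MEGA may differ (for example through gap handling or a substitution model), which may be why the reported homology percentages and genetic distances are not simple complements of each other. The function name and the toy sequences are hypothetical and are not real PRRT2 fragments.

```python
def pairwise_identity_and_p_distance(seq_a, seq_b, gap="-"):
    """Compare two aligned sequences column by column.

    Columns in which either sequence has a gap are skipped
    (pairwise deletion). Returns (percent identity, p-distance)
    computed over the remaining columns.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    identity = 100.0 * matches / compared
    p_distance = 1.0 - matches / compared
    return identity, p_distance


# Toy aligned fragments (hypothetical, for illustration only).
toy_a = "MAASSSEISE-MKGV"
toy_b = "MAASSSEVSEEMKGV"
identity, dist = pairwise_identity_and_p_distance(toy_a, toy_b)
print(f"identity = {identity:.2f}%, p-distance = {dist:.4f}")
```

With real data, the two arguments would be rows taken from the multiple alignment of the eight species.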

Physical and chemical properties of the human PRRT2 protein
The physical and chemical properties of PRRT2 protein were analyzed with ProtParam. The results showed that the protein was composed of 340 amino acids, with a molecular weight of 34944.91 Da and a theoretical pI of 4.64. The formula of PRRT2 protein was C1508H2414N426O507S10, with 4865 atoms in total, 45 negatively charged residues (Asp + Glu), and 25 positively charged residues (Arg + Lys). The estimated half-life was 30 h (mammalian reticulocytes, in vitro). The instability index was 68.54. This protein was therefore classified as unstable, according to the criterion that a protein with an instability index below 40 is stable and one with an index above 40 is unstable [12].
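The figures reported above can be cross-checked with simple arithmetic. The sketch below recomputes the atom count and an approximate average molecular mass from the elemental formula, and applies the instability threshold of 40. It assumes rounded standard atomic masses, so the mass agrees with the ProtParam value only to within a fraction of a dalton; the function names are illustrative.

```python
# Rounded standard average atomic masses (g/mol); an assumption of this sketch.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

# Elemental composition reported in the text: C1508 H2414 N426 O507 S10.
PRRT2_FORMULA = {"C": 1508, "H": 2414, "N": 426, "O": 507, "S": 10}


def total_atoms(formula):
    """Sum the atom counts over all elements."""
    return sum(formula.values())


def average_mass(formula):
    """Approximate average molecular mass from the elemental formula."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def classify_stability(instability_index, threshold=40.0):
    """Apply the stability criterion: below the threshold is 'stable'."""
    return "stable" if instability_index < threshold else "unstable"


atoms = total_atoms(PRRT2_FORMULA)
mass = average_mass(PRRT2_FORMULA)
label = classify_stability(68.54)
print(atoms, round(mass, 2), label)
```

The excess of acidic residues (45 Asp + Glu against 25 Arg + Lys) is in line with the acidic theoretical pI of 4.64.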

Hydrophilicity/hydrophobicity analysis of the human PRRT2 protein
The hydrophilicity and hydrophobicity of human PRRT2 protein were analyzed online using the ProtScale program. The hydrophobicity results based on the Kyte-Doolittle (K-D) method are shown in Fig. 3, where a score higher than 0 indicates a hydrophobic position and a score lower than 0 indicates a hydrophilic position. The highest score (3.278) was at alanine 330, which was the most hydrophobic site; the lowest score (−2.678) was at aspartic acid 145, which was the most hydrophilic site. Of the 332 scored positions (residues 5-336) in the human PRRT2 protein, 77.71% (258 amino acids) had a score < 0 and 22.29% (74 amino acids) had a score > 0, indicating that the human PRRT2 protein is hydrophilic. Consistently, the ProtParam analysis showed that the aliphatic index of human PRRT2 was 68.06 and the grand average of hydropathicity (GRAVY) was −0.538.
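The hydropathy profile described above can be sketched as a sliding-window average over the Kyte-Doolittle scale. The sketch assumes an unweighted window of 9 residues, which is consistent with scores being reported only for positions 5-336 of a 340-residue protein; the actual ProtScale settings used are not stated in the text. The function names and the test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def kd_profile(seq, window=9):
    """Return [(position, score)] with 1-based positions.

    Each score is the unweighted mean K-D value of the window
    centred on that position; positions too close to either end
    to hold a full window are not scored.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    profile = []
    for i in range(half, len(seq) - half):
        residues = seq[i - half:i + half + 1]
        score = sum(KD[aa] for aa in residues) / window
        profile.append((i + 1, score))
    return profile


def hydrophilic_fraction(profile):
    """Fraction of scored positions with a score below 0."""
    return sum(1 for _, score in profile if score < 0) / len(profile)


def gravy(seq):
    """Grand average of hydropathicity: mean K-D value over all residues."""
    return sum(KD[aa] for aa in seq) / len(seq)


# Hypothetical 340-residue test sequence (poly-alanine), not PRRT2.
toy = "A" * 340
toy_profile = kd_profile(toy)
print(len(toy_profile), toy_profile[0][0], toy_profile[-1][0])
```

Applied to the real PRRT2 sequence, `hydrophilic_fraction` would give the share of positions scoring below 0, and `gravy` the whole-protein average.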

Prediction of signal peptide and nuclear localization sequence of human PRRT2 protein
The signal peptide of human PRRT2 protein was predicted with SignalP 5.0, a signal peptide prediction server (Fig. 4). The values of C, Y, and S were all calculated by the program to be 0, from which it could be concluded that the human PRRT2 protein has no signal peptide. Prediction with cNLS-mapper revealed that the PRRT2 protein had no nuclear localization sequence [13]. For reference, cNLS-mapper scores are interpreted as follows: a score of 8-10 indicates that a protein is localized exclusively to the nucleus; a score of 7 or 8, that it is partially localized to the nucleus; a score of 3-5, that it is localized to both the nucleus and the cytoplasm; and a score of 1-2, that it is localized to the cytoplasm [14].

Prediction of the transmembrane domain of PRRT2 protein
The TMHMM prediction showed that the 340-residue protein contained two transmembrane regions (Fig. 5). Amino acids at positions 268-290 and 315-337 formed two typical transmembrane helices, amino acids at positions 291-314 were located inside the cell, and amino acids at positions 1-267 and 338-340 were located outside the cell.
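The predicted topology can be checked for internal consistency by confirming that the reported segments tile the whole 340-residue sequence without gaps or overlaps. The sketch below does this with the coordinates given above; the segment labels and function names are illustrative and are not TMHMM output fields.

```python
# Segments as reported in the text: (label, start, end), 1-based inclusive.
SEGMENTS = [
    ("outside", 1, 267),
    ("TMhelix", 268, 290),
    ("inside", 291, 314),
    ("TMhelix", 315, 337),
    ("outside", 338, 340),
]


def covers_sequence(segments, length):
    """Return True if the segments tile 1..length with no gaps or overlaps."""
    expected_start = 1
    for _, start, end in segments:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == length + 1


def helix_lengths(segments):
    """Lengths of the transmembrane helices, in order of appearance."""
    return [end - start + 1 for label, start, end in segments if label == "TMhelix"]


print(covers_sequence(SEGMENTS, 340), helix_lengths(SEGMENTS))
```

Both predicted helices are 23 residues long, and the five segments together account for every residue of the protein.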

Analysis of the phosphorylation sites of PRRT2 protein
Phosphorylation and dephosphorylation play an important role in cell division and signal transduction in eukaryotes. NetPhos 3.1 analysis predicted that the PRRT2 protein contained 77 phosphorylation sites, including 25 serine phosphorylation sites, 8 threonine phosphorylation sites, and 1 tyrosine phosphorylation site (Fig. 6).
Secondary and tertiary structure analysis of human PRRT2 protein
SMART online software analysis showed a Pfam:CD225 domain spanning amino acids 264-331 (Fig. 7). The secondary structure of human PRRT2 protein was predicted through the PRABI website. The results showed that random coil was the predominant secondary structure, comprising 230 residues (67.65%), followed by alpha helix with 79 residues (23.24%) and extended strand with 31 residues (9.12%). The distribution of the secondary structure is shown in Fig. 8.
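The percentages above follow directly from the residue counts. A minimal sketch of the calculation, using only the counts reported in the text (the variable names are illustrative):

```python
# Residue counts per predicted secondary structure state, as reported.
COUNTS = {"random coil": 230, "alpha helix": 79, "extended strand": 31}

total = sum(COUNTS.values())

# Percentage of the sequence in each state, rounded to two decimals.
percentages = {state: round(100.0 * n / total, 2) for state, n in COUNTS.items()}

# State covering the largest share of the sequence.
predominant = max(COUNTS, key=COUNTS.get)

print(total, percentages, predominant)
```

The three states sum to the full length of 340 residues.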
The tertiary structure of human PRRT2 protein was analyzed by homology modeling on the SWISS-MODEL website. The GMQE and QMEAN scores were 0.08 and −3.91, respectively, indicating that the prediction was not satisfactory, which might be related to the low template coverage (only 10.94%). Further analysis of the per-residue similarity plot of human PRRT2 protein against its homologous protein (Fig. 9) also showed low predicted values (less than 0.6), so this model is not ideal.
Subcellular localization, tissue-specific expression and GO analysis of human PRRT2 protein
Subcellular localization analysis was conducted with the Compartments online software, and the results showed that the protein was localized to the plasma membrane (source: PSORT; evidence: 31/32). The Human Protein Atlas database showed that PRRT2 RNA tissue specificity was enhanced in brain. The GO analysis via QuickGO 2 showed that the cellular component of the human PRRT2 protein was the plasma membrane (GO:0005886), and that its molecular functions included syntaxin-1 binding and SH3 domain binding.
Fig. 2 The phylogenetic tree of PRRT2 proteins among different species

Protein interaction
The interaction network of human PRRT2 protein was constructed from the String database with confidence set at 0.400 and number limited to 10. The results showed that there were 10 proteins that may interact with the human PRRT2 protein, including KRAS, HRAS, ELK1, PRKD1, MAPK3, MAPK1, SDC3, KIT, ADRA1B, and VEGFC (Fig. 10, Table 2). The GO analysis results and signal transduction pathways of the human PRRT2 protein and the interaction proteins are shown in Table 3.
Table 4.
BLAST tool comparison of the 2000 bp 5′ upstream sequence of the human PRRT2 gene with the human PRRT2 gene promoter sequence HPRM39687 found on the GeneCopoeia website showed a consistency of 81%. The full length of HPRM39687 was 1444 bp, and the transcription start site (TSS) was located at 1240 (G). The 557-2000 bp sequence within the 5′ upstream of the PRRT2 gene was completely consistent with HPRM39687, and the 1896 (G) base corresponded to the 1240 (G) base of the HPRM39687 sequence. We speculated that the PRRT2 gene promoter was located within this 1500 bp from 5′ upstream of the PRRT2 gene.

Identification of the TATA box, GC box and CAAT box
The TATA box sequence had a format of TATAWAW (W stands for A or T), the GC box sequence had a format of GGGCGG, and the CAAT box sequence had a format of CCAAT. There were four GC boxes in the 5′ regulatory region of human PRRT2 gene, located at -773 to -768, -1146 to -1141, -1950 to -1945 and -1956 to -1950, but no TATA box or CAAT box was found.
Prediction of the CpG island in the human PRRT2 gene promoter region
The EMBOSS [15] prediction result showed that there was a CpG island with a length of 304 bp, located at 1642 bp-1945 bp of the predicted sequence (Fig. 11). MethPrimer [16] prediction results showed that there were two CpG islands, located at 1271 bp-1391 bp with a length of 121 bp and 1642 bp-1945 bp with a length of 304 bp, respectively. The prediction of the second CpG island was completely consistent with the prediction results of the EMBOSS software (Fig. 12).
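The box search and the CpG island statistics described above can be sketched as follows. The motif patterns are taken from the text. The search covers only the strand supplied, and the GC percentage and observed/expected CpG ratio are the quantities commonly used to define CpG islands; the thresholds and window settings actually applied by EMBOSS and MethPrimer are not stated in the text and are not reproduced here. The function names and the toy sequence are hypothetical.

```python
import re

# Consensus patterns from the text; W (A or T) is written as [AT].
MOTIFS = {
    "TATA box": "TATA[AT]A[AT]",
    "GC box": "GGGCGG",
    "CAAT box": "CCAAT",
}


def find_motifs(seq):
    """Return {name: [(start, end), ...]}, 1-based inclusive, given strand only."""
    seq = seq.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead lets overlapping occurrences be reported as well.
        regex = re.compile("(?=(" + pattern + "))")
        hits[name] = [
            (m.start() + 1, m.start() + len(m.group(1)))
            for m in regex.finditer(seq)
        ]
    return hits


def cpg_stats(seq):
    """Return (GC percent, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_percent = 100.0 * (c + g) / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_percent, obs_exp


def interval_length(start, end):
    """Length of a 1-based inclusive interval."""
    return end - start + 1


# Toy sequence (hypothetical), containing one instance of each box.
toy_seq = "AACCAATGGGCGGTTTATAAATCGCG"
toy_hits = find_motifs(toy_seq)
print(toy_hits)
print(interval_length(1642, 1945), interval_length(1271, 1391))
```

The interval lengths reproduce the reported island sizes of 304 bp and 121 bp.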

Discussion
PRRT2 is a proline-rich transmembrane protein type II encoded by 3 exons (exons 2-4), with a total length of 340 amino acids. Recent studies have revealed that the long N-terminus of PRRT2 is located inside the cell, and the C-terminus, which contains only 2 residues, is located outside the cell [17]. PRRT2 is enriched in the presynaptic membrane of neurons in brain regions such as the cortex, hippocampus, basal ganglia and cerebellum, and interacts with the core proteins of the SNARE complex, participating in the regulation of synaptic neurotransmitter release and promoting exocytosis of vesicles. PRRT2 also plays an important role in synaptic triggering and synaptic function, and has thus been proposed to be a new synaptic protein [17]. In this study, the amino acid sequences of PRRT2 from different species were obtained from public databases. Homology analysis showed that the PRRT2 genes of humans and other mammalian species were highly conserved during evolution. PRRT2 was predicted to be an unstable hydrophilic protein located on the plasma membrane and containing two transmembrane domains. The top 10 proteins predicted by the String database to interact with PRRT2 were involved in the Rap1, Ras and MAPK signaling pathways.
The promoter, located near the transcription initiation site, is a DNA sequence to which RNA polymerase can bind to initiate transcription. Here, we used three programs based on different principles and algorithms to analyze promoters within the 2000 bp 5′ upstream sequence of the human PRRT2 gene, and found that the gene had at least two potential promoter regions on the sense strand, and that the TSS was located at the G base at 1240 bp. In the gene expression regulation network, the combination of transcription factors and cis-acting elements can switch the expression of a specific set of genes on or off.
Here, we used AliBaba2.1 and PROMO to predict transcription factor-binding sites in the promoter region of the PRRT2 gene. Thirteen transcription factors were predicted by both programs at the same binding sites, so the probability that these factors bind is relatively high. These predictions provide clues to the functions of PRRT2, and suggest that PRRT2 expression is regulated by a variety of transcription factors and that PRRT2, as part of a complex metabolic network, has many important physiological functions. Methylation of a CpG island can inhibit normal transcription from the promoter, thereby reducing gene expression. In this study, EMBOSS and MethPrimer gave consistent predictions of the CpG island in the promoter region of the PRRT2 gene. There was one CpG island in the promoter region of human PRRT2, located at 1642-1945 bp of the 2000 bp sequence in the 5′ regulatory region and close to the first exon, which is consistent with the typical distribution of CpG islands [18]. Some studies have proposed that promoter methylation hinders the recognition of binding sites by transcription factors, thereby repressing transcription [19]. Sp1 is a zinc finger protein belonging to the SP family of transcription factors, whose classical binding sites are rich in CpG sites [20]. We speculate that Sp1 may bind directly to the promoter region of PRRT2 as a transcription factor, change the promoter activity, and thereby regulate transcription. It has been reported that methylation of the promoter region can block the binding of the transcription factor Sp1 to the promoter sequence and inhibit the transcription of target genes [21,22]. Studies have confirmed that PRRT2 can interact with the SNARE complex component synaptosomal-associated protein 25 (SNAP25) and is co-localized with it in the presynaptic and postsynaptic membranes [18,23].
Subsequent evidence has supported the localization of PRRT2 in the presynaptic membrane, with particular enrichment at synaptic junctions [17,18,24,25]. However, the exact physiological role of PRRT2 in the presynaptic membrane remains unclear. Valente et al. have confirmed that PRRT2 is a presynaptic membrane protein that is enriched in presynaptic terminals and is expressed from the onset of embryonic synapse formation [17]. The clinical phenotypes caused by PRRT2 mutations vary broadly, including a variety of episodic phenotypes from dyskinesia to epilepsy. Even the same mutation (c.649dupC) can result in different phenotypes, such as PKD, ICCA, benign familial infantile epilepsy, and FS. These results indicate that the PRRT2 gene shares the pleiotropic characteristics of the GluT1 and ATP1A2 genes [26-28]. PRRT2 protein is widely expressed in the nervous system, particularly in the globus pallidus, cerebellum, subthalamic nucleus, cerebellar peduncle, caudate nucleus, cerebral cortex, and hippocampus [29]. Studies have confirmed that the mRNA level of PRRT2 changes with the development of the mouse brain. PRRT2 mRNA began to be expressed on embryonic day 16 and then gradually increased. By the 7th day after birth it was expressed in the brain and spinal cord, and by the 14th day after birth (corresponding to 1-2 years in humans) the mRNA level of PRRT2 reached its peak, after which it declined to a relatively low level in adult mice [29]. Moreover, the change of PRRT2 expression with age is consistent with the course of some PRRT2-related diseases, such as the age-dependent characteristics of BFIS [30]. Therefore, the high expression of PRRT2 in the brain and its age-dependent expression pattern can partially explain the heterogeneity of PRRT2 mutation-related phenotypes.
PRRT2 mutations are associated with a variety of paroxysmal disorders such as dyskinesia, epilepsy and migraine, indicating an overlap in the molecular pathogenesis of these diseases. It has been confirmed that PRRT2 protein is mainly expressed in the cerebral cortex, hippocampus, basal ganglia and cerebellum, and is enriched in the presynaptic membrane of neurons. More importantly, these areas are in line with the putative neuronal origin of PRRT2-related diseases. In PRRT2-related diseases, the heterogeneous distribution of PRRT2-positive excitatory and inhibitory neurons in different brain regions, together with the haploinsufficiency of PRRT2 caused by mutations, may lead to region-specific neuronal excitation and inhibition. The imbalance between them can eventually lead to synaptic

Conclusion
In this study, we first obtained the PRRT2 gene sequence from the NCBI GenBank database, extracted the 2000 bp sequence upstream of the 5′ flank of the PRRT2 gene, and then used different bioinformatics tools to predict the promoter, CpG islands and transcription factors of the PRRT2 gene. The results provide a theoretical basis for the construction of PRRT2 gene promoter expression vectors and the detection of promoter activity, and the in silico data can serve as a reference for future functional studies. However, more studies are needed to advance research on PRRT2.