
Help Information

Last Updated: June 24th, 2008

Summary

C2H2 zinc finger genes (C2H2-ZNF) constitute the largest class of transcription factors in human and mouse. C2H2 zinc finger proteins primarily bind to DNA. In most cases, they attach to regions near certain genes and turn those genes on or off as needed. Research on these genes sheds light on the evolution of gene regulation systems and development. We therefore developed SysZNF (Systematical information resource of Zinc Finger genes) to collect information related to C2H2 zinc finger genes. The aim of SysZNF is to provide a user-friendly interface for presenting the information (DNA, Expression, Protein, Reference and so on) on each C2H2-ZNF (e.g., ZNF10) and to enable a comprehensive analysis of C2H2-ZNFs. This project was supported by the Proteome-Center at Rostock University (PCRU), which conceived the concept of the database, and the Key Laboratory of Systems Biology at the Shanghai Institutes for Biological Sciences (SIBS), which implemented the database. It is maintained jointly by PCRU and SIBS. The database can be accessed at http://lifecenter.sgst.cn/SysZNF or http://kzfgd.pzr.uni-rostock.de:8080/KZGD2007/.

C2H2 zinc finger genes are defined as genes whose proteins contain at least one C2H2 zinc finger protein domain (PFAM: PF00096). By this definition, we identified 740 human C2H2-ZNFs and 780 mouse C2H2-ZNFs. The chromosomal distributions of these genes are displayed in Figure 1 and Table 1. These genes are enriched and clustered on human chromosome 19 (~36% of all human C2H2-ZNF genes) and mouse chromosome 7 (~17% of all mouse C2H2-ZNF genes). The C2H2 ZNF domain tends to co-occur with the KRAB, SCAN and BTB domains in both the human and mouse genomes (Table 2). Table 2 summarizes the top ~10 co-occurring domains in SysZNF.

Figure 1. Ideograms of the human and mouse genomes with the C2H2 ZNF genes marked by red bars.
Panels: Human Karyotype, Mouse Karyotype

Table 1. Distribution of C2H2 ZNF genes across the human and mouse chromosomes.
Species Name    Chromosome Name    ZNF Gene Number
Homo sapiens    1                  39
Homo sapiens    2                  22
Homo sapiens    3                  29
Homo sapiens    4                  13
Homo sapiens    5                  17
Homo sapiens    6                  34
Homo sapiens    7                  44
Homo sapiens    8                  33
Homo sapiens    9                  28
Homo sapiens    10                 22
Homo sapiens    11                 15
Homo sapiens    12                 20
Homo sapiens    13                 10
Homo sapiens    14                 11
Homo sapiens    15                 10
Homo sapiens    16                 42
Homo sapiens    17                 17
Homo sapiens    18                 16
Homo sapiens    19                 265
Homo sapiens    20                 25
Homo sapiens    21                 2
Homo sapiens    22                 4
Homo sapiens    X                  21
Homo sapiens    Y                  1
Mus musculus    1                  15
Mus musculus    2                  95
Mus musculus    3                  13
Mus musculus    4                  53
Mus musculus    5                  43
Mus musculus    6                  30
Mus musculus    7                  129
Mus musculus    8                  32
Mus musculus    9                  29
Mus musculus    10                 34
Mus musculus    11                 34
Mus musculus    12                 35
Mus musculus    13                 73
Mus musculus    14                 12
Mus musculus    15                 21
Mus musculus    16                 15
Mus musculus    17                 68
Mus musculus    18                 18
Mus musculus    19                 14
Mus musculus    X                  15
Mus musculus    Y                  2
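
The percentages quoted in the Summary can be checked directly against the counts in Table 1. The short Python sketch below is only an illustration and is not part of SysZNF; the variable and function names are made up for this example, and the numbers are copied from Table 1.

```python
# Gene counts per chromosome, copied from Table 1.
HUMAN = {
    "1": 39, "2": 22, "3": 29, "4": 13, "5": 17, "6": 34, "7": 44, "8": 33,
    "9": 28, "10": 22, "11": 15, "12": 20, "13": 10, "14": 11, "15": 10,
    "16": 42, "17": 17, "18": 16, "19": 265, "20": 25, "21": 2, "22": 4,
    "X": 21, "Y": 1,
}
MOUSE = {
    "1": 15, "2": 95, "3": 13, "4": 53, "5": 43, "6": 30, "7": 129, "8": 32,
    "9": 29, "10": 34, "11": 34, "12": 35, "13": 73, "14": 12, "15": 21,
    "16": 15, "17": 68, "18": 18, "19": 14, "X": 15, "Y": 2,
}


def share(counts: dict, chromosome: str) -> float:
    """Return the percentage of all genes located on one chromosome."""
    return 100.0 * counts[chromosome] / sum(counts.values())


human_total = sum(HUMAN.values())
mouse_total = sum(MOUSE.values())
human_chr19 = share(HUMAN, "19")
mouse_chr7 = share(MOUSE, "7")

print(f"human: {human_total} genes, chr19 share {human_chr19:.1f}%")
print(f"mouse: {mouse_total} genes, chr7 share {mouse_chr7:.1f}%")
```

The totals reproduce the 740 human and 780 mouse genes stated above, and the two shares round to the quoted ~36% and ~17%.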

Table 2. Top ~10 domains co-occurring with the C2H2 zinc finger domain in proteins.
Species Name    Accession Name    Source          Short Name        ZNF Gene Number
Homo sapiens    PF01352           Pfam            KRAB              320
Homo sapiens    KRAB_Ba           znf.llnl.gov    KRAB_B            207
Homo sapiens    PF07754           Pfam            DUF1610           90
Homo sapiens    KRAB_B_longa      znf.llnl.gov    KRAB_BL           58
Homo sapiens    PF02023           Pfam            SCAN              51
Homo sapiens    PF00651           Pfam            BTB               48
Homo sapiens    KRAB_Ca           znf.llnl.gov    KRAB_C            24
Homo sapiens    XRCC_B            znf.llnl.gov    KRAB-b(lc)        15
Homo sapiens    PF00856           Pfam            SET               8
Homo sapiens    PF00628           Pfam            PHD               6
Homo sapiens    PF00046           Pfam            Homeobox          5
Homo sapiens    PF00320           Pfam            GATA              5
Homo sapiens    PF02892           Pfam            zf-BED            4
Homo sapiens    PF09237           Pfam            GAGA              4
Mus musculus    PF01352           Pfam            KRAB              327
Mus musculus    KRAB_Ba           znf.llnl.gov    KRAB_B            106
Mus musculus    KRAB_Ca           znf.llnl.gov    KRAB_C            59
Mus musculus    PF07754           Pfam            DUF1610           49
Mus musculus    PF00651           Pfam            BTB               47
Mus musculus    PF02023           Pfam            SCAN              37
Mus musculus    KRAB_B_longa      znf.llnl.gov    KRAB_BL           10
Mus musculus    PF00320           Pfam            GATA              9
Mus musculus    PF00856           Pfam            SET               7
Mus musculus    PF00628           Pfam            PHD               6
Mus musculus    PF08790           Pfam            zf-LYAR           6
Mus musculus    PF00046           Pfam            Homeobox          5
Mus musculus    PF04704           Pfam            Zfx_Zfy_act       5
Mus musculus    PF09723           Pfam            CxxC_CxxC_SSSS    5
Mus musculus    PF07975           Pfam            C1_4              5

A maximum interval of 500 kb between adjacent ZNF genes (BMC Evolutionary Biology 2008, 8:176) was used to define a cluster. The distributions of C2H2 ZNF gene cluster sizes in human and mouse are displayed in Figure A1. Note that users can apply the SysZNF tool "FunnyCluster" (Figure 11) to infer C2H2-ZNF clusters themselves.

Figure A1. Distribution of cluster sizes in human and mouse.
Distribution of cluster size

SysZNF web interface

Browsing the database.

All data in the database can be browsed from the BROWSE page. You can browse the genes by chromosome location or by domain. When you click a chromosome ideogram or a domain name, you will see all entries related to that item. For example, clicking on human chromosome 1 opens a new page (Figure 2). A gene list with basic chromosomal coordinate information is shown at the top of this page. These entries (genes) can be selected, and the DNA/protein sequences of the chosen genes can be downloaded (Figure 2) for further analysis. For domains (e.g., when browsing by the SCAN domain), you can also download the protein sequences of the domain regions in addition to the whole DNA/protein sequences.

Figure 2. The result page shown after clicking chromosome 1 in the human ideogram on the BROWSE page.
Output page of browsing

Search System

Three types of search are available in SysZNF. The first searches the database by keywords (Figure 3), the second by biological sequences (Figure 4), and the third by SQL (MySQL syntax) (Figure 3). The first and the third are combined as the text search (Figure 3).

Text search

The user can type any keywords in the "QUICK SEARCH" textbox. In the advanced search, you can search the ZNF genes in the database by gene symbol, domain name or physical position. Note that a "physical position" must follow the syntax

chr[\dXY]:StartPosition-EndPosition
or
chr[\dXY]

where \d stands for the digits of a chromosome number. For example: chr12:132217232-132246140 or chrX. The SQL search is intended for advanced users. Currently, only the "search" privilege is open to the public. The schema of the database can be accessed through the link "THE SCHEMA OF THE DATABASE".
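
To check a "physical position" string before submitting it, a small validator can help. The Python sketch below is an illustration only and is not the parser used by SysZNF. It assumes that the chromosome part is a one- or two-digit number, X or Y (so that the chr12 example is accepted), and that the start position must not exceed the end position; the function name `parse_position` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: "chr" + 1-2 digits, X or Y, optionally followed by ":start-end".
_POSITION = re.compile(r"chr(\d{1,2}|X|Y)(?::(\d+)-(\d+))?")

Position = Tuple[str, Optional[int], Optional[int]]


def parse_position(text: str) -> Optional[Position]:
    """Return (chromosome, start, end), or None if the text is malformed.

    start and end are None when only a chromosome is given (e.g. "chrX").
    """
    match = _POSITION.fullmatch(text.strip())
    if match is None:
        return None
    chromosome, start, end = match.groups()
    if start is None:
        return (chromosome, None, None)
    start_pos, end_pos = int(start), int(end)
    if start_pos > end_pos:
        return None  # assumed rule: a region must run left to right
    return (chromosome, start_pos, end_pos)


print(parse_position("chr12:132217232-132246140"))
print(parse_position("chrX"))
print(parse_position("12:1-2"))
```

Both examples from the text are accepted, while a string without the "chr" prefix is rejected.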

Figure 3. Text search engines developed for SysZNF.
Text Search

Sequence search

The sequence search engine in SysZNF is based on the NCBI BLAST engine. It supports both protein and nucleic acid sequence searches, so users can compare their own sequences with the proteins in SysZNF. The result page lists the BLAST hits ordered by E value. To see the detailed alignment of two sequences, click the "BL2SEQ" link (Figure 5). All parameters on the sequence search page are those defined by the NCBI BLAST package.

Figure 4. Sequence search engines developed for SysZNF.
Sequence Search

Figure 5. The result page of sequence search.
Result of Sequence Searching

Data Display

The database is centered on gene models. The genomic region of each gene model is defined as the region between the minimum left position and the maximum right position of all of its transcripts. A typical gene entry page is shown in Figure 6.
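
The definition of a gene model region can be written down in a few lines. The Python sketch below is only an illustration of that definition, not code from SysZNF; the function name and the example coordinates are hypothetical.

```python
from typing import Iterable, Tuple


def gene_model_region(transcripts: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (left, right) spanning all transcripts of one gene model.

    Each transcript is a pair of genomic coordinates. The pair may be given
    in either order, so the smaller value is always treated as the left end.
    """
    spans = list(transcripts)
    if not spans:
        raise ValueError("a gene model needs at least one transcript")
    left = min(min(a, b) for a, b in spans)
    right = max(max(a, b) for a, b in spans)
    return left, right


# Hypothetical transcript coordinates for a single gene model.
region = gene_model_region([(100, 500), (150, 800), (90, 400)])
print(region)  # (90, 800)
```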

Figure 6. The entry page of ZNF10.
ZNF10 Entry

From the entry page shown in Figure 6, the gene model, probesets, domain structures, homologs (putative orthologs/paralogs, synteny information between human and mouse), clusters, literature and some useful quick links for a gene can be accessed. At the top of the page, a compact tool bar provides quick links. In the gene model display, hot regions are defined for the structural units of the gene model; sequences can be retrieved through these hot regions.

In the protein sequence, domains are marked with different colors (Figure 7). Each colored region also links to the local sequence and the supporting evidence (Figure 7).

Figure 7. The protein sequence with colored domain region.
Protein Sequence with Domains

Each cluster of C2H2 ZNF genes in the genome is displayed as an image (Figure 8).

Figure 8. The largest cluster on human chromosome 19.
Cluster 166

Several related tools
We have developed several related tools for ZNF analysis and integrated them into SysZNF. The first is the genomic sequence retrieval system (Figure 9). With this system, you can retrieve the genomic sequence of any region of the human or mouse genome.

Figure 9. Genomic sequence retrieval system.
Retrieve Genomic

The second is the zinc finger binding site analysis system (Figure 10). Users can combine the fingers they want and send the resulting protein sequence to Tommy's server.

Figure 10. Zinc Finger binding site analysis system.
FunnyFinger tommyServer

The third is FunnyCluster (Figure 11), which is designed to cluster adjacent C2H2-ZNF genes on a chromosome. Users can set any interval distance separating two C2H2-ZNF genes to infer the clusters in SysZNF.

Figure 11. FunnyCluster.
funnyClusterInterface

funnyClusterRst

Pipeline to build SysZNF

  1. Download the original data from NCBI AceView (human April07 and mouse June07)
  2. Predict domains using the HMMER package
  3. Collect the transcripts and the corresponding products of each gene model
  4. Integrate information from other databases, such as Swiss-Prot, iHOP, GeneCards, TreeFam, dbPTM, EPGD and so on
  5. Develop the gene model rendering system, protein display system, query system, genomic sequence retrieval system, etc.
  6. Construct the website using MySQL, JSP and Tomcat

Data in the background database

Schema of the database
The schema of the database can be viewed in Figure 12.

Figure 12. Database Schema.
Schema

Data download
Several compressed data files can be downloaded from the Download page. Users can also access the database with SQL. For special requests, please email the developer of SysZNF (biosino.space@gmail.com).

Example Queries

Query any C2H2-ZNF gene
A gene can be queried with keywords on the Search page or with a sequence on the Blast page. The gene entry page (e.g., ZNF10) contains many cross references linked to other public databases.
Query several genes
SysZNF supports SQL queries. For example, if you have a list of gene symbols, you can retrieve the related information from the database.
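
A long symbol list is tedious to type into the SQL box by hand. As a minimal sketch (not part of SysZNF), the Python helper below assembles a statement of the same shape as the example in this section from a list of symbols. The table and column names are taken from that example; the function name `build_gene_query` and the rule that symbols may contain only letters, digits, ".", "_" and "-" are assumptions made here so that the list can be embedded in the statement safely.

```python
import re
from typing import Sequence

# Assumption: gene symbols use only these characters, so quoting is trivial.
_SAFE_SYMBOL = re.compile(r"[A-Za-z0-9._-]+")


def build_gene_query(symbols: Sequence[str], tax_id: int = 9606) -> str:
    """Build a SELECT statement for the given gene symbols and taxonomy ID.

    9606 is the NCBI taxonomy ID of Homo sapiens.
    """
    if not symbols:
        raise ValueError("at least one gene symbol is required")
    for symbol in symbols:
        if not _SAFE_SYMBOL.fullmatch(symbol):
            raise ValueError(f"unexpected character in gene symbol: {symbol!r}")
    in_list = ", ".join(f"'{symbol}'" for symbol in symbols)
    return (
        "SELECT g.idx, g.geneModelID, g.chromosome, g.strand, "
        "g.minStartPosition, g.maxEndPosition, g.coordSystem "
        "FROM genemodelsummary g "
        f"WHERE g.geneModelID IN ({in_list}) AND g.taxID = {int(tax_id)}"
    )


query = build_gene_query(["PRDM16", "ZBTB48", "RP1-27O5.1andZBTB8"])
print(query)
```

The printed statement can be pasted into the SQL search box. The hand-written statement for 44 human genes reads: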
-- 9606 is the NCBI taxonomy ID of Homo sapiens
SELECT g.idx, g.geneModelID, g.chromosome, g.strand,
       g.minStartPosition, g.maxEndPosition, g.coordSystem
FROM genemodelsummary g
WHERE g.geneModelID IN ('PRDM16','ZBTB48','CASZ1','PRDM2','ZBTB17','ZBTB40',
'ZNF436','ZNF683','RP1-27O5.1andZBTB8','FLJ25476','ZSCAN20','MTF1','RLF',
'ZNF643','ZNF642','C1orf176andZNF684','HIVEP3','ZNF691','KLF17','GLIS1',
'ZNF644','GFI1','ZNF697','ZNF687','POGZ','ZBTB7B','ZBTB37','ZNF648',
'ZBTB41','ZNF281','ZNF678','ZNF238','ZNF670andZNF695','ZNF669','ZNF124',
'LOC729806','ZNF496','ZNF672','ZNF692','KLF6','ZNF438','ZEB1','ZNF248',
'ZNF25')
  AND g.taxID = 9606;
Extract the domain sequences
There are two ways to extract domain sequences from the database.
The first method is to search for the domain directly on the Search page or to browse the Browse page by domain. On the resulting page, select the genes you are interested in and click the button labeled "Get domain region" at the bottom to get the domain sequences in FASTA format. For example, you can get the sequences of the SCAN domain in SysZNF this way.
The second method is to use the FunnyHmmer tool, which was also developed for SysZNF. When the user provides protein sequences and the HMM profile of a domain, FunnyHmmer extracts the domain sequences from the whole protein sequences. FunnyHmmer is powered by HMMER. For example, try searching for the SCAN domain.
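The last step of such an extraction, cutting the matched regions out of a protein and writing them in FASTA format, can be sketched as follows. This is not FunnyHmmer's code, and the HMM search itself is not shown. The sketch assumes that the domain hits are already available as 1-based, inclusive (start, end) coordinates; the function name, the header layout and the toy sequence are hypothetical.

```python
from typing import Iterable, Tuple


def extract_domain_fasta(
    name: str,
    sequence: str,
    hits: Iterable[Tuple[int, int]],
    width: int = 60,
) -> str:
    """Return the hit regions of one protein as FASTA records.

    hits holds 1-based, inclusive (start, end) coordinates on the protein.
    """
    records = []
    for start, end in hits:
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"hit {start}-{end} lies outside the sequence")
        region = sequence[start - 1:end]  # convert to 0-based slicing
        lines = [region[i:i + width] for i in range(0, len(region), width)]
        records.append(f">{name}/{start}-{end}\n" + "\n".join(lines) + "\n")
    return "".join(records)


# Toy sequence and coordinates, for illustration only.
fasta = extract_domain_fasta("demo", "ABCDEFGHIJ", [(3, 6), (8, 10)])
print(fasta, end="")
```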
Retrieve all sequences in SysZNF
Even though you can retrieve them with the sequence retrieval system or an SQL search, the simplest way is to download these sequences (peptide or nucleic acid sequences) from the download page.
Retrieve all C2H2-ZNF gene model structures in SysZNF
These have already been prepared for users and are available from the download page.
Get the synteny regions of human and mouse that contain C2H2-ZNF genes
These have already been prepared for users and are available from the download page.
Query the C2H2-ZNF clusters
Clusters are searched through the C2H2-ZNF genes they contain. That is, you use the record of a gene to get the link to its cluster. For example, cluster 196 can be accessed from the record of ZNF10.
In SysZNF, 500 kb is used as the threshold interval distance to infer C2H2-ZNF gene clusters. Therefore, all C2H2-ZNF gene clusters shown in SysZNF follow this definition. In addition, SysZNF provides a tool named FunnyCluster that clusters the C2H2-ZNF genes according to an interval distance set by the user.
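Clustering by interval distance can be sketched in Python as follows. This is only an illustration under stated assumptions and not the FunnyCluster implementation: the gap is measured from the rightmost end seen so far in the current cluster to the start of the next gene on the same chromosome, and a cluster must contain at least two genes. The `Gene` type, the function name and the example genes are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class Gene(NamedTuple):
    name: str
    chromosome: str
    start: int
    end: int


def cluster_genes(
    genes: Iterable[Gene], max_gap: int = 500_000, min_size: int = 2
) -> List[List[Gene]]:
    """Group genes on the same chromosome whose gaps are at most max_gap."""
    by_chromosome: Dict[str, List[Gene]] = {}
    for gene in genes:
        by_chromosome.setdefault(gene.chromosome, []).append(gene)

    clusters: List[List[Gene]] = []
    for chromosome in sorted(by_chromosome):
        ordered = sorted(by_chromosome[chromosome], key=lambda g: (g.start, g.end))
        current = [ordered[0]]
        reach = ordered[0].end  # rightmost position covered by the cluster
        for gene in ordered[1:]:
            if gene.start - reach <= max_gap:
                current.append(gene)
                reach = max(reach, gene.end)
            else:
                if len(current) >= min_size:
                    clusters.append(current)
                current = [gene]
                reach = gene.end
        if len(current) >= min_size:
            clusters.append(current)
    return clusters


# Hypothetical genes, for illustration only.
example = [
    Gene("A", "chr1", 1_000, 2_000),
    Gene("B", "chr1", 400_000, 410_000),
    Gene("C", "chr1", 1_000_000, 1_010_000),
    Gene("D", "chr2", 100, 200),
]
default_names = [[g.name for g in c] for c in cluster_genes(example)]
wide_names = [[g.name for g in c] for c in cluster_genes(example, max_gap=600_000)]
print(default_names, wide_names)
```

With the default 500 kb threshold only A and B are grouped; raising the threshold to 600 kb also pulls C into the same cluster, which is the kind of change a user-defined interval distance produces.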
Other queries to SysZNF
We plan to provide the complete SysZNF dataset as compressed files. Users will then be able to mirror the database locally and extract anything they want. If you have any questions, feel free to email us.

Licence Information

Data in the database and the code of the website are free for academic, personal, and non-profit purposes under the GNU General Public License, version 2.

About the authors

Dr. Guohui Ding
Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, P.R.China

Dr. Peter Lorenz
Institute of Immunology, University of Rostock, Germany
Email: peter.lorenz@med.uni-rostock.de

Mr. Michael Kreutzer
Institute of Immunology, University of Rostock, Germany
Email: michael.kreutzer@med.uni-rostock.de

Prof. Dr. Hans-Juergen Thiesen
Institute of Immunology, University of Rostock, Germany
Email: Hans-Juergen.Thiesen@med.uni-rostock.de

Prof. Yixue Li
Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, P.R.China
Email: yxli@sibs.ac.cn


Copyright © 2006-2009 Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, China and Institute for Immunology, Medical Faculty, University of Rostock, Germany. All rights reserved.
This page was last updated 2008-10-1 | This website works well with Internet Explorer 7 and Firefox 1.5+