Outline
Help
 How To Use Our Web-Server
      Preparing the query file
      Setting Parameters
      How to upload self-defined datasets
      Starting the Program

 How To Run Program locally
      Downloading source code and data
      Requirements
      How to run R script without parallel computing
      How to run R script using parallel computing

 What results will be like
      SortbyGONum.csv
      SortbyGOSig.csv
      Graph.pdf




How To Use Our Web-Server

      Preparing the query file
      After the microarray experiment and data normalization, prepare the query file as follows:
         1. Log-transform the signal value of each probe set.
         2. For each probe, calculate the log-ratio of the treatment profile vs. the control profile.
         3. Prepare the query file from these data. The query file should be a two-column .csv table, with the first column holding the probe ID and the second column holding the log-ratio of the corresponding probe (see Figure 1). If the platform is not HGU133a, provide a gene-to-log-ratio table instead of a probe-to-log-ratio table: the first column then holds the gene symbol, and the values of all probes mapping to one gene should be summarized into a single value (for example, by taking their mean after log-transformation).
      For reference, an example file is provided: Sample File

Figure 1
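
The three preparation steps can be sketched in R roughly as follows. This is a minimal illustration, not part of the GEMS code: the probe IDs, signal values, gene symbols, and output file names are hypothetical, and base 2 for the logarithm is an assumption because the text does not name a base. write.csv() writes a header row here; whether the server expects one should be checked against the sample file.

```r
# Hypothetical normalized signal values; "p1".."p3" stand in for real probe set IDs.
signals <- data.frame(
  probe     = c("p1", "p2", "p3"),
  treatment = c(800, 1600, 100),
  control   = c(400, 400, 200),
  stringsAsFactors = FALSE
)

# Steps 1 and 2: log-transform each signal, then subtract control from treatment.
# Base 2 is an assumption; the help text does not specify the base.
signals$logratio <- log2(signals$treatment) - log2(signals$control)

# Step 3 (HGU133a): two-column table, probe ID first and log-ratio second.
query <- signals[, c("probe", "logratio")]
write.csv(query, "my_query.csv", row.names = FALSE)

# Step 3 (other platforms): summarize probes that map to the same gene by their mean.
# The probe-to-gene mapping below is made up for illustration.
gene_of_probe <- c(p1 = "GENE_A", p2 = "GENE_A", p3 = "GENE_B")
signals$gene <- unname(gene_of_probe[signals$probe])
gene_query <- aggregate(logratio ~ gene, data = signals, FUN = mean)
write.csv(gene_query, "my_gene_query.csv", row.names = FALSE)
```

In the second variant, the mean is taken over values that are already log-transformed, which matches the summarization suggested in step 3.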


      Setting Parameters
         The following parameters can be set by users:
         1. Permutation Times: this parameter is used when calculating the P value of the similarity score. Because the P value is calculated by random permutation, "Permutation Times" gives the number of random permutations. The default is 1000. The program may run for a long time if this number is too large.
         2. GOmode: this option lets users compare GOMs with respect to different biological meanings (Biological Process, Molecular Function and Cellular Component).
         3. P-value threshold: because the expression pattern similarity of a chemical pair in each GOM is evaluated by random permutation rather than by the similarity score directly, a P-value threshold must be set to separate significant matches from insignificant ones. The default P-value threshold is 0.05. When comparing expression pattern similarity in a given GOM, only instances whose P value is smaller than the threshold are regarded as having an expression pattern similar to that of the query in this GOM.
         4. Present top n instances: if this value is set, the instances ranked in the top n are written to the result file "SortbyGONum.csv". The maximum value is 453 (the number of instances in Connectivity Map build 1).
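
To show how "Permutation Times" and the P-value threshold interact, here is a generic permutation-test sketch in R. It is not the GEMS implementation: the similarity score used by the server is not described here, so Pearson correlation is used as a stand-in, and the function names and the data are hypothetical.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Stand-in similarity score (assumption): Pearson correlation of two log-ratio vectors.
similarity <- function(x, y) cor(x, y)

permutation_p <- function(query, reference, n_perm = 1000) {
  observed <- similarity(query, reference)
  # Recompute the score n_perm times after shuffling the query values.
  null_scores <- replicate(n_perm, similarity(sample(query), reference))
  # Two-sided empirical P value; the +1 terms keep the estimate above zero.
  (sum(abs(null_scores) >= abs(observed)) + 1) / (n_perm + 1)
}

# Hypothetical log-ratios of the genes in one module, for the query and one instance.
query     <- c(2.1, 1.4, -0.3, -1.8, 0.9, -2.2, 1.1, -0.7)
reference <- c(1.9, 1.0, -0.1, -1.5, 1.2, -2.0, 0.8, -0.9)

p <- permutation_p(query, reference, n_perm = 1000)

# Apply the P-value threshold (default 0.05 in the text above).
p_threshold <- 0.05
significant <- p < p_threshold
```

The smallest P value this scheme can report is 1 / (n_perm + 1), which is one reason to raise the number of permutations when a stricter threshold is used, at the cost of a longer run time.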


      How to upload self-defined datasets
         There are two places where you can upload self-defined datasets:
         1. Self-defined reference database: in this case, both the data and the corresponding conditions should be uploaded, and both files should be in .csv format. The reference database should be uploaded in the first column and the condition file in the second. In the reference database, the first column holds the names of the probe sets and the remaining columns hold the corresponding rank values in each instance; the first row should hold the instance names. In the condition file, the following factors should be annotated: "batch", "name", "dose", "cell", "instance.id", "perturb.scan", "vehicle.scan", "array.name" (the first row of the condition file should contain these names). If some are missing, fill them with "/". Every column in the condition file annotates one condition factor, and every row represents an instance, with the instance name in the first column.
         2. Self-defined gene categories: the gene categories can also be uploaded by users. This file is a two-column .csv file. The first column contains the names of the categories and the second contains the gene names.
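
The file layouts described above could be produced in R along these lines. This is a sketch under assumptions: all instance names, probe names, values, and file names are hypothetical, the empty header cell that write.csv() puts above the first column is assumed to be acceptable, and one row per category-gene pair is assumed for the category file.

```r
# Hypothetical instance names shared by both reference files.
instances <- c("inst_1", "inst_2")

# Reference database: probe set names in the first column, one rank column per instance.
reference <- data.frame(
  inst_1 = c(1, 3, 2),
  inst_2 = c(2, 1, 3),
  row.names = c("probe_1", "probe_2", "probe_3")
)

# Condition file: one row per instance, one column per factor, "/" for missing values.
condition <- data.frame(
  batch        = c("1", "1"),
  name         = c("compound_A", "compound_B"),
  dose         = c("10 uM", "/"),
  cell         = c("MCF7", "MCF7"),
  instance.id  = c("1", "2"),
  perturb.scan = c("/", "/"),
  vehicle.scan = c("/", "/"),
  array.name   = c("/", "/"),
  row.names    = instances,
  stringsAsFactors = FALSE
)

# The instance names must agree between the two files.
stopifnot(identical(colnames(reference), rownames(condition)))

# Row names become the first column of each .csv file.
write.csv(reference, "my_reference.csv")
write.csv(condition, "my_condition.csv")

# Gene categories: category name first, gene name second (one pair per row, assumed).
categories <- data.frame(
  category = c("cat_A", "cat_A", "cat_B"),
  gene     = c("GENE_1", "GENE_2", "GENE_3"),
  stringsAsFactors = FALSE
)
write.csv(categories, "my_categories.csv", row.names = FALSE)
```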
        


     Starting the Program
       Submit your job to the server after entering your e-mail address. Results will be returned by e-mail after several minutes.



How To Run Program locally

     Downloading source code and data
       To run the program locally, first download the R script and the data set from our web site (http://www.biosino.org/GEMS2/download.jsp).
       The data set "query.RData" contains all data of Connectivity Map (build 01) as well as its annotations. If you want to query against other data, you must prepare the data set yourself and modify the R code to fit your needs.

      Requirements

       To run the program locally, R and several packages must be installed first.
         1. R can be downloaded from www.r-project.org/.
         2. The following packages need to be installed:
         "hgu133a", "annotate", "Category", "Rgraphviz", "GOstats", "marray", "limma", "GO", "Matrix" (version 0.9975-11)
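
Before running the script, it can help to check which of the listed packages are already available. The following sketch only reports what is missing; it does not install anything, since the right installation route depends on the R version in use.

```r
# Package names as listed above.
required <- c("hgu133a", "annotate", "Category", "Rgraphviz", "GOstats",
              "marray", "limma", "GO", "Matrix")

# requireNamespace() returns FALSE instead of stopping when a package is absent.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}

# The text names a specific Matrix version, so report the one that is present.
if (installed[["Matrix"]]) {
  message("Matrix version: ", as.character(packageVersion("Matrix")))
}
```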

      How to run R script without parallel computing
         1. Start R.
         2. Load the reference data set: load('query.RData')
         3. Source the R script: source('gems.R')
         4. Run the main function with the default settings: functionmain('gds1215logfc.csv'). If parameters need to be changed, set the corresponding parameters yourself.
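
Steps 2 to 4 can also be collected into one short script. The sketch below reuses the file names and the call exactly as they are written in the steps above and assumes that all three files sit in the current working directory; the existence check is an addition for convenience and is not part of the original procedure.

```r
# File names as given in the steps above; all are expected in the working directory.
needed  <- c("query.RData", "gems.R", "gds1215logfc.csv")
present <- file.exists(needed)

if (all(present)) {
  load("query.RData")               # step 2: reference data set
  source("gems.R")                  # step 3: defines the main function
  functionmain("gds1215logfc.csv")  # step 4: run with default parameters
} else {
  message("Missing from ", getwd(), ": ",
          paste(needed[!present], collapse = ", "))
}
```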

      How to run R script using parallel computing
    Parallel computing has to be run on a Linux system, and additional software and R packages must be installed.
         1. PVM needs to be installed on your system.
         2. The R packages "rpvm" and "snow" must also be installed.
     PVM should be started before you run the program.


What results will be like

     Three files will be returned to your e-mail address.
      SortbyGONum.csv
     In this file, instances are arranged in descending order of the number of matched functional modules. Under each instance, all significant GO modules shared with the query are listed (Figure 2). A Sample File is provided. The file should be opened with Excel.

Figure 2


      SortbyGOSig.csv
    In this file, GOMs affected by the query intervention are ranked by hypergeometric P value. Under each module, all instances having the same or the opposite pattern of induction within this module are listed (Figure 3). A Sample File is provided. The file should be opened with Excel.

Figure 3
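
For readers unfamiliar with the hypergeometric P value used for ranking, a generic enrichment calculation in R looks like this. The counts are hypothetical, and which gene sets GEMS actually counts is an assumption; the sketch only shows the form of the test.

```r
# Hypothetical counts, for illustration only.
N <- 12000  # genes on the array
K <- 150    # genes annotated to the GO module
n <- 400    # genes changed by the query treatment
k <- 15     # changed genes that fall in the module

# P(X >= k) when n genes are drawn without replacement from N, K of them in the module.
# phyper() with lower.tail = FALSE gives P(X > q), hence q = k - 1.
p_enrich <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# The same tail probability, summed term by term as a cross-check.
p_check <- sum(dhyper(k:min(K, n), K, N - K, n))
```

A smaller P value means that the overlap between the changed genes and the module is less likely to arise by chance, so modules with small values appear first in the ranking.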


      Graph.pdf (this file is available only when GO is chosen as the gene categories)
    In this file, a graph of the GO sub-tree is generated for each top reference instance. Each circle denotes a GO module, and its fill color indicates whether the instance has the same (red) or the opposite (green) expression pattern in this GOM compared with the query (Figure 4). A heatmap is also added at the end to give a global view of the results (Figure 5). A Sample File is provided.

Figure 4



Figure 5