CLAIM README
LICENSE
  The program is distributed under the GNU GPL.

INFORMATION
  This program is designed mainly to work under Linux and has been tested under
  different distributions. It should nevertheless also work under Windows with
  the Cygwin environment (http://www.cygwin.com/).

  Installation requires compiling the program. Although these instructions
  describe the process, elementary knowledge of a Linux-like environment may be
  required. Note also that the program DOES NOT have a GUI (Graphical User
  Interface), which may make it difficult for computer beginners.

  The following sections give instructions for configuring, compiling,
  installing and running the program, together with simple examples.

INSTALLATION
  Compiling the program requires the Boost library and the R binary. R is
  optional for the functioning of the program itself, but it is used in the
  example. Both can be installed from the distribution's repository under
  Linux, or downloaded as sources from the developers' websites, www.boost.org/
  and www.r-project.org/ respectively, where installation instructions can
  also be found. Both the library and the R binary should be installed in the
  default path, so that the configuration script and the program can find them.

  To install, simply type:
  ./configure; make

  For more information on the configuration options, run:
  ./configure --help


RUNNING
  Before running the application, check the help:
  ./claim -h
  All arguments may be passed either on the command line or through the
  standard input. To check that the program works, simply type:
  ./claim < data/input.args
 

QUICK HELP
  The main purpose of this work is to give biologists an easy tool for
  manipulating different types of biological data and for identifying
  functional modules, i.e. sets of genes performing similar tasks in living
  organisms.

  The program is easily extensible and configurable. Users may add their own
  packages to process the data.

At present, claim implements the following packages:
- microarray: reads/writes microarray data from/to a file;
- corr: computes a correlation matrix from a set of vectors (such as
  microarray data);
- shortest_path: computes the shortest path matrix between the nodes of a
  graph;
- ppi: reads PPI (protein-protein interaction) data from a file;
- graph: represents a graph as a data structure; reads a graph from and writes
  it to a file;
- limit: defines the intersection of node sets;
- claim: performs the claim analysis;
- kmeans: finds clusters using the k-means algorithm;
- results: summarizes the results of the analysis into a list of cliques.
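
To illustrate what a "shortest path matrix" is, the following Python sketch
computes one for a small graph with the Floyd-Warshall algorithm. The choice of
algorithm, the function name and the toy graph are assumptions made for this
example only; the shortest_path package of claim may be implemented differently.

```python
import math

def shortest_path_matrix(n, edges):
    """Return the n x n matrix of shortest path lengths (Floyd-Warshall).

    edges is a list of (u, v, weight) tuples describing an undirected graph.
    Pairs of nodes with no connecting path keep the value math.inf.
    """
    dist = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    # Allow every node k in turn to act as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Hypothetical toy graph: a chain 0 - 1 - 2 plus an isolated node 3.
matrix = shortest_path_matrix(4, [(0, 1, 1), (1, 2, 1)])
for row in matrix:
    print(row)
```

Entry [i][j] of the matrix is the length of the shortest path between nodes i
and j, which is the kind of information the example below stores as weights in
the graph.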


  The analysis is defined by a data flow, i.e. a graph of dependencies in which
  the output of one or more packages is passed as the input of another. The
  user can define the input of a package in three different ways:
  <filename>, {-p <package> args}, \<package number>.
  In general the program launch looks like this:
  ./claim <program options>
      -p <package name> <package options>
      -p <package name> -i {-p <package name> <options>: <options2>}
      -p <name> -i \1
    where ':' indicates that the program should use the same package as the
    preceding one, but with different arguments: <options2>.
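
The three ways of defining an input can be told apart by their shape alone. The
Python sketch below classifies a value given to -i accordingly. It is only an
illustration: the function name and the returned labels are hypothetical, and
claim's actual argument parser is not shown here.

```python
import re

def classify_input(spec):
    """Classify the value given to -i as one of the three documented forms.

    Returns a (kind, value) pair where kind is "nested" for {-p ...},
    "reference" for \\<package number>, and "file" for anything else.
    """
    spec = spec.strip()
    if spec.startswith("{") and spec.endswith("}"):
        return ("nested", spec[1:-1].strip())
    match = re.fullmatch(r"\\(\d+)", spec)
    if match:
        return ("reference", int(match.group(1)))
    return ("file", spec)

print(classify_input("data/AI_interactions.csv"))
print(classify_input("{ -p microarray -i data/input.csv }"))
print(classify_input(r"\3"))
```

The file name data/input.csv in the second call is a placeholder, not a file
shipped with the program.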

  In order to obtain more information on the available packages, one should
  run:
    ./claim --help

EXAMPLE
An example has been prepared and can be used as a reference. Besides designing
new data flows, a user can apply the claim analysis described in the related
paper to other data by simply changing the names of the input and output
files, provided that the data format requirements (see the end of this file)
are obeyed.


To run the example, execute:
    ./claim < data/input.args
  "< data/input.args" means that the file "data/input.args" contains the actual
  configuration, whose content is passed to the executable on the standard
  input.

  The example is based on the publication on the CLAIM software and has the
  following form (lines beginning with "#" are comments):
   
# General parameters. Set verbose level to info and output directory to out2.
  -v info -O out2
 
# Define the first package; calculate the correlation (package corr) out of the
# input package (-i {...}).
  -p corr -i {

# Define the second package; read the microarray data from a file. The
# delimiter in the file is a tab (-d "\t") and the data are read from
# data/AffyNaCl_Time-course_for_cliques.csv. Look into that file to see its
# format.
    -p microarray -d "\t" -i data/AffyNaCl_Time-course_for_cliques.csv
  } -r
 
# Define the third package; calculate the shortest paths between the nodes of
# the input graph (-i {...}) and store them as weights in the graph.
  -p shortest_path -i {

# Define the fourth package. Read the graph from the file
# data/AI_interactions.csv, store it in the boost adjacency_list structure
# (good for sparse graphs) and store the weights in the short data type
# (2 bytes per edge). See data/AI_interactions.csv for the file format.
    -p ppi -d ':' -g adjacency_list -t short -i data/AI_interactions.csv
  }
 
# Define the fifth package, with its input (-i {...}).
  -p graph -i {

# Define the sixth package, which takes package 3 as its input and limits its
# vertices to the common subset of packages 3 and 1.
    -p limit -i \3 -t \1 -s
  }
 
# Define the seventh package, with its input (-i {...}).
  -p graph -i {

# Define the eighth package, which takes package 1 as its input (-i \1) and
# limits its vertices to the common subset of packages 1 and 5.
    -p limit -t \5 -i \1
  }

# Define the ninth package, with its input (-i {...}), and store its output in
# the file claim_out.
-p results -i {

# Define the tenth package, claim, which computes the output clusters from the
# microarray and PPI sets clustered with the k-means algorithm.
  -p claim -i {

#    Perform k-means clustering on the graph output by the fifth package,
#    using the R package (-V), with a varying number of clusters (-N) and the
#    number of tries (-n) equal to 50.
     -p kmeans -V -i \5 -N 5  --best  -n 50
     -p kmeans -V -i \5 -N 10 --best  -n 50
     -p kmeans -V -i \5 -N 15 --best  -n 50
     -p kmeans -V -i \5 -N 20 --best  -n 50
     -p kmeans -V -i \5 -N 25 --best  -n 50
     -p kmeans -V -i \5 -N 30 --best  -n 50
  } -i {
     -p kmeans -V -i \7 -N 5  --best  -n 50
     -p kmeans -V -i \7 -N 10 --best  -n 50
     -p kmeans -V -i \7 -N 15 --best  -n 50
     -p kmeans -V -i \7 -N 20 --best  -n 50
     -p kmeans -V -i \7 -N 25 --best  -n 50
     -p kmeans -V -i \7 -N 30 --best  -n 50
  }
} -o claim_out
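
The comments in the example number the packages in the order in which their
"-p" options appear, and references such as \3 or \5 use these numbers. The
Python sketch below reproduces that numbering for the first part of the
example. The rule is inferred from the comments above rather than taken from
the program's sources, the function name is hypothetical, and the ':' shorthand
is not handled.

```python
def number_packages(text):
    """Return [(number, package name)] in order of appearance of -p.

    Lines whose first non-blank character is '#' are treated as comments.
    """
    tokens = []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        tokens.extend(line.split())
    names = [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-p"]
    return list(enumerate(names, start=1))

# First part of the example configuration shown above.
example = r"""
# General parameters.
  -v info -O out2
  -p corr -i {
    -p microarray -d "\t" -i data/AffyNaCl_Time-course_for_cliques.csv
  } -r
  -p shortest_path -i {
    -p ppi -d ':' -g adjacency_list -t short -i data/AI_interactions.csv
  }
  -p graph -i {
    -p limit -i \3 -t \1 -s
  }
"""

numbered = number_packages(example)
for number, name in numbered:
    print(number, name)
```

Under this rule, \3 refers to shortest_path and \1 to corr, which agrees with
the comments accompanying the sixth package.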

For the sake of clarity, note that different packages accept different data
structures as input and deliver different structures as output. Although the
internal, low-level representations differ, the user needs to be aware of only
3 structures: a vector of vectors (representing the microarray), a graph
(represented either by an adjacency matrix or by an adjacency list) and,
finally, the results (sets of genes). There is also an additional type
representing a set of structures: multiple. In the current set of packages the
structures are taken as input and returned as output as follows:
* graph package takes a graph or a set of graphs as input and returns a graph
  as output;
* claim package takes results or a set of results as input and returns a
  multiple of results as output;
* kmeans package takes a graph as input and returns results as output;
* microarray package takes a vector of vectors or a multiple of them as input
  and returns a vector of vectors as output;
* ppi package takes a graph as input and returns a graph as output;
* shortest_path package takes a graph as input and returns a graph as output;
* corr package takes a vector of vectors as input and returns a graph as
  output;
* limit package takes a vector of vectors or a graph as input and returns the
  same structure as output;
* results package takes results as input and returns the same as output.

The user should be aware of these data structures when defining the data flow.
The output of a package must be compatible with the input of the package it is
passed to.
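
The compatibility rule can be checked by hand with a small table. The Python
sketch below transcribes the list of packages into a dictionary and propagates
a structure through a linear chain of packages. It is a simplification made for
illustration: the names are hypothetical, only the base structure ("vectors",
"graph", "results") is compared, and the "multiple" wrapper is ignored because
this file does not describe how sets of structures are unpacked.

```python
# Accepted input structures and produced output structure for each package.
# "same" means that the output has the structure of the input.
PACKAGES = {
    "graph":         ({"graph"}, "graph"),
    "claim":         ({"results"}, "results"),
    "kmeans":        ({"graph"}, "results"),
    "microarray":    ({"vectors"}, "vectors"),
    "ppi":           ({"graph"}, "graph"),
    "shortest_path": ({"graph"}, "graph"),
    "corr":          ({"vectors"}, "graph"),
    "limit":         ({"vectors", "graph"}, "same"),
    "results":       ({"results"}, "same"),
}

def output_type(package, input_type):
    """Return the structure a package produces for a given input structure."""
    accepted, produced = PACKAGES[package]
    if input_type not in accepted:
        raise ValueError(f"{package} does not accept {input_type} as input")
    return input_type if produced == "same" else produced

def check_chain(source_type, chain):
    """Pass source_type through the packages in chain; return the final type."""
    current = source_type
    for package in chain:
        current = output_type(package, current)
    return current

# microarray -> corr, as in the first two packages of the example.
print(check_chain("vectors", ["microarray", "corr"]))

# kmeans needs a graph, so feeding it microarray vectors is rejected.
try:
    check_chain("vectors", ["microarray", "kmeans"])
except ValueError as error:
    print("incompatible:", error)
```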

For more details check the help of the program.