jClust: A tool for clustering analysis

In this section, we introduce a new version of Medusa, developed mainly by Dr. Sean Hooper: a tool for exploring and visualizing clusters of biological networks calculated by the jClust application. Medusa was developed with the help of biologists and aims to go further than most visualization tools by implementing many different clustering and layout methods that can be applied to a network. The goal is not only to produce intuitive layouts but also to help extract new scientific knowledge. Its compatibility with widely used 2D and 3D visualization tools makes it very competitive.
Medusa is a tool for visualization and clustering analysis of biological networks in 2D. It is highly interactive and supports weighted, multi-edged, directed and undirected graphs in which each edge between two bioentities can represent a different biological concept. Medusa is optimized for accessing protein interaction data from the STRING database and currently offers a variety of layout methods (grid, random, circular, hierarchical, Fruchterman-Reingold and distance geometry) together with a layout for user-predefined clusters. It is now compatible with a wider range of other tools.

Data representation and GUI
Medusa is currently offered as a standalone, closed-source Java application. It does not require special hardware or computational power and is easy to incorporate into various projects. Java 1.6 is required for Medusa to run.
Medusa visualizes nodes and their connections in 2D. It comes with its own input file format, in which the user can define node parameters such as an annotation string, URL address, shape, coordinates or colors. Medusa draws connections as Bezier curves, which makes multi-edged connections visible. In this way, the user can display up to ten different types of connections between two bioentities at any time and show or hide each of them. Each connection between two bioentities can represent a different type of relationship; for example, two nodes may be evolutionarily related, co-occur in a document or be connected in a relational database schema. Connections can be directed or undirected and can carry a weight that represents a confidence or similarity measure, such as sequence similarity for genes and proteins.
Medusa is highly interactive and easy to use. Users can drag nodes and place them anywhere, add new ones on the fly or delete existing ones. In this way, a researcher can work with a subnetwork of a larger network or expand an existing one. The state of the network can be saved and reloaded, and graphs can be exported to image or PostScript files. Medusa is currently compatible with Pajek, Arena3D, BioLayout Express3D, Cytoscape and GraphViz for further analysis or visualization. Medusa was initially developed to visualize the STRING database, which stores evidence about interactions between different data types from various sources.
Operations such as selection of subnetworks, zooming in and out, rotation, scaling and translation are supported to make exploring the network easier. Nodes can be selected graphically or with the help of regular expressions. The connections that refer to a set of nodes can also be isolated by dragging those nodes. Furthermore, connections can be filtered according to a user-defined confidence score. We used this newer version of Medusa to visualize the clustering results from jClust.
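The confidence filter and the regular-expression selection described above can be illustrated with a minimal sketch. The class `EdgeFilterSketch`, the `Edge` type and both method names are hypothetical and chosen only for illustration; they are not Medusa's actual classes or API, and the sketch assumes that a regular expression must match the whole node name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; these are not Medusa's actual classes.
class EdgeFilterSketch {

    /** A weighted connection of a given type between two named nodes. */
    static final class Edge {
        final String from;
        final String to;
        final int type;          // which of the parallel connection types this edge belongs to
        final double confidence; // confidence or similarity score

        Edge(String from, String to, int type, double confidence) {
            this.from = from;
            this.to = to;
            this.type = type;
            this.confidence = confidence;
        }

        @Override
        public String toString() {
            return from + " -> " + to + " (type " + type + ", " + confidence + ")";
        }
    }

    /** Keeps only the edges whose confidence is at least the given threshold. */
    static List<Edge> filterByConfidence(List<Edge> edges, double threshold) {
        List<Edge> kept = new ArrayList<>();
        for (Edge e : edges) {
            if (e.confidence >= threshold) {
                kept.add(e);
            }
        }
        return kept;
    }

    /** Returns the node names that fully match the regular expression (an assumption). */
    static List<String> selectNodes(List<String> names, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> selected = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(
            new Edge("TP53", "MDM2", 1, 0.95),
            new Edge("TP53", "MDM2", 2, 0.40),
            new Edge("BRCA1", "BARD1", 1, 0.80));
        System.out.println(filterByConfidence(edges, 0.7));
        System.out.println(selectNodes(Arrays.asList("TP53", "MDM2", "BRCA1", "BARD1"), "B.*"));
    }
}
```

Because every edge carries its own type and score, two nodes can be linked by several parallel edges and each of them is filtered independently.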

Layout Algorithms
Medusa comes with a variety of layout algorithms for distributing nodes on the plane. Their main purpose is to make the network more intuitive and to minimize crossings between connections. Initially, the node coordinates are those pre-defined by the user in the input file. Medusa can then distribute the nodes randomly, on a grid or on a circle. The Fruchterman-Reingold layout algorithm, which tends to place highly connected nodes in the middle and spread the rest around them, is also embedded. A hierarchical layout algorithm arranges the nodes in a tree-like structure with the most important nodes at the top. The layouts come with a rich color scheme that paints the clusters so that they are clearly distinguishable from one another.
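As a rough illustration of the two simplest layouts mentioned above, the following sketch computes circle and grid coordinates for n nodes. The class and method names are hypothetical, and the near-square grid shape is an assumption; the text does not describe how Medusa chooses its grid dimensions.

```java
// Hypothetical illustration only; not Medusa's actual layout code.
class SimpleLayoutSketch {

    /** Places n nodes evenly on a circle; returns an n x 2 array of (x, y) coordinates. */
    static double[][] circularLayout(int n, double centerX, double centerY, double radius) {
        double[][] positions = new double[n][2];
        for (int i = 0; i < n; i++) {
            double angle = 2.0 * Math.PI * i / n;
            positions[i][0] = centerX + radius * Math.cos(angle);
            positions[i][1] = centerY + radius * Math.sin(angle);
        }
        return positions;
    }

    /** Places n nodes row by row on a near-square grid (assumed shape) with the given spacing. */
    static double[][] gridLayout(int n, double spacing) {
        int columns = Math.max(1, (int) Math.ceil(Math.sqrt(n)));
        double[][] positions = new double[n][2];
        for (int i = 0; i < n; i++) {
            positions[i][0] = (i % columns) * spacing; // column index gives x
            positions[i][1] = (i / columns) * spacing; // row index gives y
        }
        return positions;
    }
}
```

Both methods depend only on the node count, so they serve as a cheap starting arrangement before a force-directed method such as Fruchterman-Reingold refines the positions.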

Predefined Clustering
The layout used to visualize pre-defined clusters is simple but effective. We use it to show, in Medusa, the clusters calculated by the jClust clustering library. First, the Medusa canvas is split into N squares of different sizes, where N is the number of calculated clusters. Then N centers are chosen and the nodes that belong to each cluster are placed on a circle around the corresponding center. The layout therefore ends up with N circular clusters of different radii. Despite its simple concept, the layout is efficient and is especially informative for larger datasets.
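The idea of this layout can be sketched as follows. This is a simplified illustration, not Medusa's implementation: it assumes equal-sized grid cells rather than squares of different sizes, and it assumes that each circle's radius grows with the square root of the cluster size. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Medusa's actual layout code.
class ClusterLayoutSketch {

    /**
     * Assigns a position to every node: one grid cell per cluster,
     * with the cluster's members placed on a circle around the cell center.
     * Returns a map from node name to {x, y}.
     */
    static Map<String, double[]> layout(List<List<String>> clusters, double width, double height) {
        Map<String, double[]> positions = new LinkedHashMap<>();
        int n = clusters.size();
        if (n == 0) {
            return positions;
        }
        // Assumption: equal-sized cells on a near-square grid.
        int columns = (int) Math.ceil(Math.sqrt(n));
        int rows = (int) Math.ceil((double) n / columns);
        double cellWidth = width / columns;
        double cellHeight = height / rows;

        int largest = 1;
        for (List<String> cluster : clusters) {
            largest = Math.max(largest, cluster.size());
        }
        double maxRadius = 0.45 * Math.min(cellWidth, cellHeight);

        for (int k = 0; k < n; k++) {
            List<String> members = clusters.get(k);
            double centerX = (k % columns + 0.5) * cellWidth;
            double centerY = (k / columns + 0.5) * cellHeight;
            // Assumption: radius scales with the square root of the cluster size.
            double radius = maxRadius * Math.sqrt((double) members.size() / largest);
            for (int i = 0; i < members.size(); i++) {
                double angle = 2.0 * Math.PI * i / members.size();
                positions.put(members.get(i), new double[] {
                    centerX + radius * Math.cos(angle),
                    centerY + radius * Math.sin(angle)});
            }
        }
        return positions;
    }
}
```

Scaling the radius with the cluster size keeps the spacing between neighbouring nodes on a circle roughly comparable across clusters, which is one plausible reason for the circles of different radii described above.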

Medusa within jClust
jClust writes its results in input file formats readable by the Medusa application. A predefined clustering layout produces neat drawings in which the members of each cluster are grouped together and separated from the rest of the network. Each cluster is painted differently, since Medusa offers advanced coloring schemes. Initially, N centers are calculated on a grid, where N is the number of clusters found by jClust. Afterwards, the nodes that belong to each cluster are placed in a circle around its center, forming distinct cluster visualizations. Medusa can be used separately or be called through the jClust application.