EXAMPLE

To identify accurate protein complexes in a protein-protein interaction network, we built a workflow consisting of a two-step procedure. In the first step, the protein-protein interaction network is clustered by the MCL or the RNSC algorithm; in the second step, the results are filtered using one of four different methods or a combination of them. This two-step approach retains only those clusters that have a high probability of being real biological complexes. A real biological complex can be defined as a set of proteins that are commonly involved in a biological process.
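The filtering step can be illustrated with a minimal Python sketch. It assumes the clustering step (MCL or RNSC) has already produced clusters as lists of protein names, and it covers only two of the four filters. The definitions used here are assumptions, not taken from the text above: density is the fraction of possible protein pairs in a cluster that interact, and haircut removes, in a single pass, proteins with fewer than a given number of interaction partners inside the cluster. The function names, the order of the two filters and the toy network are hypothetical.

```python
def internal_edges(cluster, edges):
    """Edges whose two endpoints both belong to the cluster."""
    members = set(cluster)
    return [(u, v) for u, v in edges if u in members and v in members]


def density(cluster, edges):
    """Assumed definition: 2*|E| / (|V| * (|V| - 1)) for the cluster subgraph.

    Edges are assumed to be unique and free of self-loops.
    """
    n = len(set(cluster))
    if n < 2:
        return 0.0
    return 2 * len(internal_edges(cluster, edges)) / (n * (n - 1))


def haircut(cluster, edges, min_degree):
    """Assumed definition: drop members with fewer than min_degree partners
    inside the cluster (single pass, not repeated)."""
    degree = {protein: 0 for protein in set(cluster)}
    for u, v in internal_edges(cluster, edges):
        degree[u] += 1
        degree[v] += 1
    return sorted(p for p, d in degree.items() if d >= min_degree)


def filter_clusters(clusters, edges, min_density, min_degree):
    """Apply haircut first, then keep clusters whose density reaches min_density."""
    kept = []
    for cluster in clusters:
        trimmed = haircut(cluster, edges, min_degree)
        if len(trimmed) >= 2 and density(trimmed, edges) >= min_density:
            kept.append(trimmed)
    return kept


# Toy network: a triangle (A, B, C) with a pendant protein D, plus a chain E-F-G.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("E", "F"), ("F", "G")]
clusters = [["A", "B", "C", "D"], ["E", "F", "G"]]
result = filter_clusters(clusters, edges, min_density=0.75, min_degree=2)
print(result)
```

In this toy case the haircut trims D from the first cluster, leaving a fully connected triangle that passes the density threshold, while the chain collapses to a single protein and is discarded.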

 

To test the efficiency of GIBA, we compared it with four other algorithms: MCODE, HCS, SideS and RNSC.

 

To demonstrate the use of our methodology, we used seven datasets derived from various small-scale and high-throughput methods. The benchmark used to evaluate the tested algorithms consists of known yeast protein complexes retrieved from the MIPS database. MIPS protein complexes composed of smaller ones that are also recorded in the MIPS database were removed to avoid redundancy. The datasets used were:

 

ITO dataset  -  4038 edges and 3279 vertices

Ito T, ..., Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proceedings of the National Academy of Sciences 2001, 98(8):4569-4574.

 

Tong dataset  -  7430 edges and 2262 vertices

Tong AH, ..., Chang M et al: Global mapping of the yeast genetic interaction network. Science 2004, 303(5659):808-813.

 

Krogan dataset  -  7088 edges and 2675 vertices

Krogan NJ, ..., Tikuisis AP et al: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 2006, 440(7084):637-643.

 

Gavin_2002 dataset  -  3210 edges and 1352 vertices

Gavin AC, ..., Cruciat CM et al: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002, 415(6868):141-147.

 

Gavin_2006 dataset  -  26531 edges and 1430 vertices

Gavin AC, ..., Dumpelfeld B et al: Proteome survey reveals modularity of the yeast cell machinery. Nature 2006, 440(7084):631-636.

 

DIP dataset  -  17491 edges and 4934 vertices

Xenarios I, ..., Eisenberg D: DIP: the database of interacting proteins. Nucleic Acids Res 2000, 28(1):289-291.

 

The data are downloaded with the jClust application.

 

The Markov Clustering (MCL) algorithm was chosen and the filtering parameters were set as follows:

 

Dataset       Filter
ITO           Density=0.75, Haircut=2
Tong          Density=0.75, Haircut=2
Krogan        Cutting_Edge=0.55, Density=0.7, Haircut=3
Gavin_2002    Cutting_Edge=0.5, Density=0.6, Haircut=2
Gavin_2006    Cutting_Edge=0.75, Density=0.6, Haircut=2, Best_neighbor=0.6
DIP           Cutting_Edge=0.5, Density=0.6, Haircut=3
MIPS          Cutting_Edge=0.5, Density=0.7, Haircut=2, Best_neighbor=0.75
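As an illustration, the settings listed above could be held in a small Python configuration and looked up per dataset. The sketch below is not jClust code: the dictionary layout, the function names and the toy network are hypothetical. It also assumes a definition of the cutting edge filter that is not given in the text, namely the share of a cluster's edges that stay inside the cluster rather than crossing its boundary.

```python
# Filter settings per dataset, transcribed from the list above.
# Decimal commas ("0,6", "0,75") are read as decimal points.
FILTER_SETTINGS = {
    "ITO": {"density": 0.75, "haircut": 2},
    "Tong": {"density": 0.75, "haircut": 2},
    "Krogan": {"cutting_edge": 0.55, "density": 0.7, "haircut": 3},
    "Gavin_2002": {"cutting_edge": 0.5, "density": 0.6, "haircut": 2},
    "Gavin_2006": {"cutting_edge": 0.75, "density": 0.6, "haircut": 2,
                   "best_neighbor": 0.6},
    "DIP": {"cutting_edge": 0.5, "density": 0.6, "haircut": 3},
    "MIPS": {"cutting_edge": 0.5, "density": 0.7, "haircut": 2,
             "best_neighbor": 0.75},
}


def cutting_edge_ratio(cluster, edges):
    """Assumed definition: inside edges / (inside edges + boundary edges)."""
    members = set(cluster)
    inside = sum(1 for u, v in edges if u in members and v in members)
    crossing = sum(1 for u, v in edges if (u in members) != (v in members))
    total = inside + crossing
    return inside / total if total else 0.0


def passes_cutting_edge(cluster, edges, dataset):
    """True if the dataset has no cutting edge setting or the ratio reaches it."""
    threshold = FILTER_SETTINGS[dataset].get("cutting_edge")
    if threshold is None:
        return True
    return cutting_edge_ratio(cluster, edges) >= threshold


# Toy network: a triangle (A, B, C) with one edge leaving it towards D.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(cutting_edge_ratio(["A", "B", "C"], edges))
print(passes_cutting_edge(["C", "D"], edges, "Krogan"))
```

Keeping the thresholds in one table-like structure makes it easy to see which filters are active for each dataset; ITO and Tong, for instance, have no cutting edge setting, so that filter is skipped for them.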

 

We see that both variants of our approach, GIBA (MCL) and GIBA (RNSC), perform better than other known methods at predicting protein complexes:

Figure   - The percentage of successful predictions, with respect to the complexes recorded in MIPS, for the algorithms tested.
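One way such a percentage could be computed is sketched below in Python. The text does not state how a prediction is judged successful, so the criterion here is an assumption: a predicted cluster counts as a hit if its overlap score with at least one known complex, |A∩B|² / (|A|·|B|), reaches a threshold (0.2 is used as an example value). The function names and protein labels are hypothetical.

```python
def overlap_score(predicted, known):
    """Overlap between two non-empty protein sets: |A & B|^2 / (|A| * |B|)."""
    shared = len(predicted & known)
    return shared * shared / (len(predicted) * len(known))


def percent_successful(predicted_clusters, known_complexes, threshold=0.2):
    """Percentage of predicted clusters that match at least one known complex."""
    if not predicted_clusters:
        return 0.0
    hits = sum(
        1
        for cluster in predicted_clusters
        if any(overlap_score(cluster, cx) >= threshold for cx in known_complexes)
    )
    return 100.0 * hits / len(predicted_clusters)


# Hypothetical labels, not real MIPS entries.
predicted = [{"P1", "P2", "P3"}, {"P7", "P8"}]
known = [{"P1", "P2", "P3", "P4"}, {"P5", "P6"}]
print(percent_successful(predicted, known))
```

Here the first cluster shares three of its proteins with a four-protein complex (score 0.75) and counts as a hit, while the second shares none, so half of the predictions are successful.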
 

 

Figure   - Above we see the protein-protein interaction network from the Gavin et al. 2006 dataset. After applying the spectral clustering algorithm and filtering the results with parameters density=0.7 and haircut=3, we see how the layout algorithm separates the clusters and displays them using different color schemes. Isolating some clusters shows that jClust can be used to predict real protein complexes. We show an example where the algorithm predicts the already recorded budding yeast Arp2/3 complex.