XINViewer Documentation

Introduction
Displaying and Exploring a Protein-Protein Interaction Network

Selecting Network Nodes
Panning and Zooming the Network

Searching a Protein-Protein Interaction Network
Cliques and Hubs In a Protein-Protein Interaction Network

Finding Cliques and Hubs
Extracting Cliques and Hubs

Network Statistics
Known Issues

1. Introduction

XINViewer is a Java viewer for protein-protein interaction networks from DIP (the Database of Interacting Proteins) stored in the XIN file format. It was written entirely from scratch as a final project for CH391L.

This document describes how to use XINViewer.

2. Displaying and Exploring a Protein-Protein Interaction Network

When you launch XINViewer, you will be presented with the main product window:

[Figure: XINViewer main window]

In this window, you can load and display a XIN file containing a DIP protein-protein interaction network. Go to the File menu, select Open, and choose a XIN file (such as "Ecoli20041003.xin"):

[Figure: E. coli network]

You can load as many XIN networks as memory allows inside the main XINViewer window (see Known Issues). Each network has its own window, which can be moved, closed, or hidden independently.

Selecting Network Nodes

You can click on any node in the network to select it and see its details displayed in the table on the right:

[Figure: E. coli network with a selected node]

Notes:

When you click on a network node, the network will be re-laid out with the node you clicked on in the center and all its immediate neighbors laid out radially around it. That node's properties will be displayed in the table on the right.
If you hold down the ALT key when you click on a network node, the network will not be re-laid out. Only the node's properties will be displayed in the table on the right.
If you just hover the mouse over a network node, that node and its immediate neighbors will be highlighted in orange, and you will see a tooltip with that node's name and description.
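
The click behavior in the first note (clicked node in the center, immediate neighbors arranged radially) can be illustrated by placing the neighbors at evenly spaced angles on a circle. This is a minimal sketch, not XINViewer's actual layout code: the `RadialLayout` class, the radius parameter, and starting the first neighbor at angle 0 are all assumptions.

```java
import java.awt.geom.Point2D;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical radial layout: clicked node in the center, neighbors on a circle. */
final class RadialLayout {
    /**
     * Returns a position for the center node and for each neighbor, keyed by node name.
     * Neighbors are spaced evenly around a circle of the given radius, starting at
     * angle 0 (to the right of the center). Assumes the neighbor list does not contain
     * the center node itself (self-loops removed), otherwise its position is overwritten.
     */
    static Map<String, Point2D> layout(String center, List<String> neighbors,
                                       double cx, double cy, double radius) {
        Map<String, Point2D> positions = new LinkedHashMap<>();
        positions.put(center, new Point2D.Double(cx, cy));
        int count = neighbors.size();
        for (int i = 0; i < count; i++) {
            double angle = 2 * Math.PI * i / count; // equal angular spacing
            positions.put(neighbors.get(i), new Point2D.Double(
                    cx + radius * Math.cos(angle),
                    cy + radius * Math.sin(angle)));
        }
        return positions;
    }
}
```

With four neighbors and radius 10 around (0, 0), the neighbors land at the four compass points, 10 units from the center.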

Panning and Zooming the Network

Clicking and dragging in the white area of the network window allows you to pan and zoom the network:

If you click with the left mouse button and drag left-right or up-down, the network will pan.
If you click with the right mouse button and drag up-down, the network will zoom.
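
A common way to implement this kind of mouse-driven panning and zooming is to keep a pan offset and a zoom factor, and to map network coordinates to screen coordinates through them. The sketch below is illustrative only: the `ViewTransform` class, the 0.01 zoom sensitivity, the exponential zoom curve, and the choice that dragging up zooms in are assumptions, not XINViewer internals.

```java
/**
 * Hypothetical pan/zoom state for a network view: screen = network * scale + offset.
 */
final class ViewTransform {
    private double offsetX = 0.0; // horizontal pan, in screen pixels
    private double offsetY = 0.0; // vertical pan, in screen pixels
    private double scale = 1.0;   // zoom factor (1.0 = no zoom)

    /** Left-button drag: move the view by the mouse movement (dx, dy) in pixels. */
    void pan(double dx, double dy) {
        offsetX += dx;
        offsetY += dy;
    }

    /**
     * Right-button vertical drag. Screen y grows downward, so dragging up gives a
     * negative dy, which this sketch (by assumption) treats as zooming in.
     */
    void zoom(double dy) {
        scale *= Math.exp(-dy * 0.01); // exponential: equal drags give equal zoom ratios
    }

    /** Maps a network coordinate (x, y) to a screen coordinate. */
    double[] toScreen(double x, double y) {
        return new double[] { x * scale + offsetX, y * scale + offsetY };
    }

    double scale() {
        return scale;
    }
}
```

For simplicity this zooms about the network's origin; a viewer would more typically zoom about the mouse position by adjusting the offset together with the scale.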

3. Searching a Protein-Protein Interaction Network

You can search for any text string that describes a network node. For example, to search for "chaperonin" (a string that describes the center node in the network), press CTRL+F, or go to the Search menu and select Find:

[Figure: Find input text box]

The first node containing that string (if any) is highlighted in dark cyan on the screen:

[Figure: highlighted found node]

Note that the highlighted node might lie outside the visible area of the screen, so you may have to pan and zoom the network to find it. In rare cases, other nodes are laid out on top of the highlighted node, hiding it entirely; this is a known bug.

To find additional nodes that contain the same string, press F3, or go to the Search menu and select Find Next. If no more matching nodes exist, you will be notified.
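
Find and Find Next behave like a substring search over node descriptions that resumes after the previous match. A minimal sketch, assuming case-insensitive matching and a search that stops (rather than wrapping around) after the last node; the `NodeSearch` class and both of those choices are hypothetical, not documented XINViewer behavior.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical Find / Find Next over node descriptions. */
final class NodeSearch {
    private final List<String> descriptions; // one description string per node
    private String query = "";
    private int lastMatch = -1;              // index of the last highlighted node

    NodeSearch(List<String> descriptions) {
        this.descriptions = descriptions;
    }

    /** Find (CTRL+F): start a new search from the first node. Returns a node index, or -1. */
    int find(String text) {
        query = text.toLowerCase(Locale.ROOT); // case-insensitive matching (an assumption)
        lastMatch = -1;
        return findNext();
    }

    /** Find Next (F3): continue after the last match. Returns -1 when nothing else matches. */
    int findNext() {
        for (int i = lastMatch + 1; i < descriptions.size(); i++) {
            if (descriptions.get(i).toLowerCase(Locale.ROOT).contains(query)) {
                lastMatch = i;
                return i;
            }
        }
        return -1; // no further matches: this is where the user would be notified
    }
}
```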

4. Cliques and Hubs In a Protein-Protein Interaction Network

The central idea of this project is to allow the user to explore salient features of a protein-protein interaction network via cliques and hubs. Briefly, a clique (in our definition) is an almost fully connected set of neighboring nodes; a hub (in our definition) is a set of neighboring nodes in which one node is connected to all the others, while the others have few connections among themselves (think of a hub-and-spoke arrangement). We believe that cliques and hubs in protein-protein interaction networks represent important biological processes that warrant special attention.

Finding Cliques and Hubs

You can find cliques and hubs using the controls in the bottom-right corner of the window:

[Figure: clique controls]
[Figure: hub controls]

The two text boxes in each set of controls hold the clique and hub search parameters:

N is the minimum number of nodes that must participate in a clique or hub for it to be found. In other words, no clique or hub of fewer than N nodes will be found.
K is the clustering coefficient threshold that a set of nodes must meet to be called a clique or hub.

The clustering coefficient of a set of nodes is the ratio of the number of actual connections among them to the number of possible connections, which for n nodes is n(n-1)/2.

For example, a set of 5 nodes has (5 × 4) / 2 = 10 possible connections between them.
If these nodes form a clique, the number of actual connections will be close to 10, so the clustering coefficient will be high (at most 1).
If these nodes form a hub, the central node accounts for 4 connections (one to each other node) and the remaining nodes add few if any, so the number of actual connections will be close to 4 or 5 and the clustering coefficient will be low (0.4 for a pure hub-and-spoke, 0.5 with one extra connection).
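
The definition and the 5-node example translate directly into code: count the connections among the chosen nodes and divide by n(n-1)/2. This is a minimal sketch assuming the network is stored as an adjacency map from each node name to the names of its neighbors; the `Clustering` class and that representation are illustrative, not XINViewer's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Clustering coefficient of a node set: actual connections / possible connections. */
final class Clustering {
    static double coefficient(Set<String> nodes, Map<String, Set<String>> adjacency) {
        int n = nodes.size();
        if (n < 2) {
            return 0.0; // no possible connections; returning 0 here is an assumption
        }
        List<String> list = new ArrayList<>(nodes);
        int actual = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) { // each unordered pair checked once
                if (adjacency.getOrDefault(list.get(i), Set.of()).contains(list.get(j))) {
                    actual++;
                }
            }
        }
        double possible = n * (n - 1) / 2.0; // 5 nodes -> 10 possible connections
        return actual / possible;
    }
}
```

For five nodes that are all connected to one another, `coefficient` returns 1.0; for a pure hub-and-spoke (one node linked to the other four and no other links) it returns 4 / 10 = 0.4.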

For a clique, K is the minimum clustering coefficient (only sets of nodes that have a clustering coefficient higher than K will be called cliques).
For a hub, K is the maximum clustering coefficient (only sets of nodes that have a clustering coefficient lower than K will be called hubs).
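
One way to apply N and K, consistent with the description above and with the Repeated Cliques entry under Known Issues, is to test each node's neighborhood (the node plus its immediate neighbors) against the thresholds. This sketch is an assumption about the approach, not XINViewer's actual search code; the `NeighborhoodSearch` class, the neighborhood-based candidate sets, and the symmetric adjacency-map representation are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical clique/hub test on one node's neighborhood (the node plus its
 * immediate neighbors). Assumes an undirected network stored as a symmetric
 * adjacency map: if b is in adjacency.get(a), then a is in adjacency.get(b).
 */
final class NeighborhoodSearch {
    /** Clique: at least n nodes and a clustering coefficient above k. */
    static boolean isClique(String node, Map<String, Set<String>> adjacency, int n, double k) {
        Set<String> group = neighborhood(node, adjacency);
        return group.size() >= n && coefficient(group, adjacency) > k;
    }

    /**
     * Hub: at least n nodes and a clustering coefficient below k. By construction the
     * center node is connected to every other node in its neighborhood.
     */
    static boolean isHub(String node, Map<String, Set<String>> adjacency, int n, double k) {
        Set<String> group = neighborhood(node, adjacency);
        return group.size() >= n && coefficient(group, adjacency) < k;
    }

    private static Set<String> neighborhood(String node, Map<String, Set<String>> adjacency) {
        Set<String> group = new HashSet<>(adjacency.getOrDefault(node, Set.of()));
        group.add(node);
        return group;
    }

    /** Actual connections inside the group divided by possible connections, n(n-1)/2. */
    private static double coefficient(Set<String> group, Map<String, Set<String>> adjacency) {
        int n = group.size();
        if (n < 2) {
            return 0.0;
        }
        int endpoints = 0;
        for (String a : group) {
            for (String b : adjacency.getOrDefault(a, Set.of())) {
                if (!a.equals(b) && group.contains(b)) {
                    endpoints++; // self-loops are ignored
                }
            }
        }
        int actual = endpoints / 2; // each undirected edge was counted from both ends
        return actual / (n * (n - 1) / 2.0);
    }
}
```

Run on a fully connected group of five nodes, `isClique` returns true for each of the five members, because each member's neighborhood is the same node set; this illustrates how a per-node search would report one clique several times.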

You can change N or K by typing another number in the respective text box.

Note: you must press Enter after typing a new value; otherwise the value is not committed. This is a known bug.

To search for a clique or a hub, click the button on the left side of the corresponding controls.

If a clique or a hub is found, it is highlighted (cliques are shown in green, hubs in blue).
As with text search, the highlighted nodes might lie outside the visible area of the screen, so you may have to pan and zoom the network to find them.
In rare cases, other nodes are laid out on top of the highlighted nodes, hiding them entirely; this is a known bug. Samples of each are shown below.

[Figure: a sample clique, highlighted in green]

[Figure: a sample hub, highlighted in blue]

Extracting Cliques and Hubs

Once you have found a clique or a hub, you can extract it into its own window to see it more clearly and explore its nodes. To do this, right-click any node highlighted as part of the clique or hub, and select "Extract Network" from the pop-up menu. Here is a sample extraction of a hub (highlighted in blue in the background):

[Figure: sample extracted hub]

The new (extracted) window behaves exactly the same as the original protein-protein network window.
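
Extracting a clique or hub amounts to building the subnetwork induced by the highlighted nodes: keep those nodes and only the connections whose two ends are both among them. A minimal sketch, assuming an adjacency map from each node name to its neighbors' names; the `Subnetwork` class and that representation are hypothetical, not XINViewer's extraction code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: builds the subnetwork induced by a set of selected nodes. */
final class Subnetwork {
    static Map<String, Set<String>> extract(Set<String> selected,
                                            Map<String, Set<String>> adjacency) {
        Map<String, Set<String>> result = new HashMap<>();
        for (String node : selected) {
            Set<String> kept = new HashSet<>();
            for (String neighbor : adjacency.getOrDefault(node, Set.of())) {
                if (selected.contains(neighbor)) {
                    kept.add(neighbor); // keep only connections with both ends selected
                }
            }
            result.put(node, kept);
        }
        return result;
    }
}
```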

5. Network Statistics

You can get a set of high-level statistics about the protein-protein interaction network by clicking the statistics button in the bottom-right corner of the window:

[Figure: statistics controls]

This displays a window containing the following kind of textual information:

Graph statistics:
Number of nodes: 466
Number of edges: 611
Maximum diameter: 19
Average diameter: 4
Clustering coefficients histogram:
  Bin #0: [0 - 0.1] 131.0
  Bin #1: [0.1 - 0.2] 0.0
  Bin #2: [0.2 - 0.3] 4.0
  Bin #3: [0.3 - 0.4] 0.0
  Bin #4: [0.4 - 0.5] 7.0
  Bin #5: [0.5 - 0.6] 19.0
  Bin #6: [0.6 - 0.7] 56.0
  Bin #7: [0.7 - 0.8] 5.0
  Bin #8: [0.8 - 0.9] 9.0
  Bin #9: [0.9 - 1] 2.0
  Bin #10: [1 - 1.1] 233.0
  Overflow: 0
  Underflow: 0
  Average: 0.6391008570193119
  Standard deviation: 0.42872359584364245
  Kurtosis: -1.350054769408164
  Skewness: -0.6412215419297523
Average distances histogram:
  Bin #0: [0 - 1] 0.0
  Bin #1: [1 - 2] 160.0
  Bin #2: [2 - 3] 27.0
  Bin #3: [3 - 4] 9.0
  Bin #4: [4 - 5] 66.0
  Bin #5: [5 - 6] 23.0
  Bin #6: [6 - 7] 15.0
  Bin #7: [7 - 8] 13.0
  Bin #8: [8 - 9] 7.0
  Bin #9: [9 - 10] 10.0
  Bin #10: [10 - 11] 5.0
  Bin #11: [11 - 12] 1.0
  Bin #12: [12 - 13] 0.0
  Bin #13: [13 - 14] 0.0
  Bin #14: [14 - 15] 0.0
  Bin #15: [15 - 16] 0.0
  Bin #16: [16 - 17] 0.0
  Bin #17: [17 - 18] 0.0
  Bin #18: [18 - 19] 0.0
  Overflow: 0
  Underflow: 0
  Average: 3.298268613000755
  Standard deviation: 2.5342635469921877
  Kurtosis: 0.3312019436184787
  Skewness: 1.0523915211663415
DONE!
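
In this output, the clustering coefficient histogram uses bins of width 0.1 starting at 0, plus a final bin [1 - 1.1] that catches coefficients of exactly 1; its bin counts add up to the 466 nodes, so it appears to hold one coefficient per node. The sketch below shows fixed-width binning with underflow and overflow counters in that style. It is an illustration only: the `FixedBins` class and its edge convention (lower edge included, upper edge excluded) are assumptions, not XINViewer's statistics code.

```java
/**
 * Hypothetical fixed-width histogram: equal-width bins starting at min, plus
 * underflow (below min) and overflow (at or above the last bin's upper edge) counters.
 */
final class FixedBins {
    private final double min;
    private final double width;
    private final long[] counts;
    private long underflow;
    private long overflow;

    FixedBins(double min, double width, int bins) {
        this.min = min;
        this.width = width;
        this.counts = new long[bins];
    }

    /**
     * Adds one value; each bin includes its lower edge and excludes its upper edge.
     * Values lying exactly on an edge may shift by one bin due to floating-point rounding.
     */
    void add(double value) {
        int bin = (int) Math.floor((value - min) / width);
        if (bin < 0) {
            underflow++;
        } else if (bin >= counts.length) {
            overflow++;
        } else {
            counts[bin]++;
        }
    }

    long count(int bin) { return counts[bin]; }
    long underflow() { return underflow; }
    long overflow() { return overflow; }
}
```

`new FixedBins(0.0, 0.1, 11)` gives the same eleven bins, [0 - 0.1] through [1 - 1.1], so a coefficient of exactly 1.0 lands in bin #10.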

You can use this information to get a high-level idea of the properties of the network, which may be useful for validating other research. For example, a network with a low average diameter and mostly high clustering coefficients in the clustering coefficient histogram (compared with a random network of the same size) is likely a small-world network (Watts, D. J. and Strogatz, S. H. "Collective Dynamics of 'Small-World' Networks." Nature 393, 440-442, 1998, http://tam.cornell.edu/SS_nature_smallworld.pdf).
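
Distance statistics such as the maximum and average shortest-path length can be computed with one breadth-first search from every node. The sketch below assumes an unweighted, undirected network stored as an adjacency map, ignores unreachable pairs, and averages over ordered pairs; the `Distances` class and these conventions are assumptions, and it is not claimed to reproduce XINViewer's "Maximum diameter", "Average diameter", or distance histogram exactly.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Hypothetical all-pairs BFS: maximum and average shortest-path distance. */
final class Distances {
    /** Returns {maximum distance, average distance} over all reachable ordered pairs. */
    static double[] maxAndAverage(Map<String, Set<String>> adjacency) {
        int max = 0;
        long sum = 0;
        long pairs = 0;
        for (String source : adjacency.keySet()) {
            Map<String, Integer> dist = new HashMap<>(); // BFS distances from source
            Queue<String> queue = new ArrayDeque<>();
            dist.put(source, 0);
            queue.add(source);
            while (!queue.isEmpty()) {
                String u = queue.remove();
                for (String v : adjacency.getOrDefault(u, Set.of())) {
                    if (!dist.containsKey(v)) { // first visit gives the shortest distance
                        int d = dist.get(u) + 1;
                        dist.put(v, d);
                        queue.add(v);
                        max = Math.max(max, d);
                        sum += d;
                        pairs++;
                    }
                }
            }
        }
        return new double[] { max, pairs == 0 ? 0.0 : (double) sum / pairs };
    }
}
```

On a three-node path A-B-C it returns a maximum of 2 and an average of 8 / 6 ≈ 1.33.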

6. Known Issues

Here are some of the known issues with this current release of the product:

Interactive Help: there is no interactive help currently in the product. This document serves that purpose for the time being.
Node Highlighting: when you search for nodes, cliques, or hubs, the highlighted nodes may be laid out underneath other nodes and therefore be hidden on the screen. As a workaround, you can move nodes around to uncover them.
Repeated Cliques: the same clique will be found more than once (in fact, it will be found and highlighted once for every node in the clique; this is because every node that participates in the clique meets the N and K requirements for a clique).
Self-looping Nodes: many nodes in the network have self-loops. These are currently not displayed due to a limitation in the graph drawing library.
Default Attributes: the XIN loader does not handle default attributes (http://dip.doe-mbi.ucla.edu/dip/Guide.cgi?SM=0:3)
Memory Use: the networks tend to be quite large, so if you open more than 2 or 3 of them, the program will probably run out of memory and silently crash. You can relaunch it with more heap space by passing the java launcher's -Xmx option on the command line (http://java.sun.com/j2se/1.4.2/docs/tooldocs/windows/java.html#options). By default, the program is launched with a maximum heap of 512 MB.
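
To check how much heap the Java virtual machine actually received (for example after raising the limit with the standard -Xmx option), any Java program can query `Runtime.maxMemory()`. This is a generic JVM check, not a XINViewer feature; the `HeapCheck` class name is hypothetical.

```java
/** Hypothetical utility: reports the maximum heap this JVM will attempt to use. */
final class HeapCheck {
    /** Maximum heap in megabytes; maxMemory() returns Long.MAX_VALUE if there is no limit. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024); // bytes -> MB
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it as `java -Xmx1024m HeapCheck` should report a value close to 1024; the JVM may report slightly less than the requested limit.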