Usage

Requirements

  • JDK 1.6 or later
  • AllegroGraph 3.0 or later: Please download AllegroGraph from http://www.franz.com/ and install it locally.
  • Cell Illustrator Online (CIO): Please download CIO from http://cionline.hgc.jp/ to view the CSO and CSML models. Registration is required.

To start the package

  • Click the jar file in your browser,
  • After downloading the package, run "java -jar CSOValidatorPackage.jar" on the command line,
  • Or use the batch/shell file from the Download page.

Parameter setting

The GUI consists of three tabs: Parameters, Log, and Information. The Log tab shows log output while the package is running, and the Information tab shows general information.

Parameters

csoGui.png

Name                             Description
REQUIRED                         Mandatory values
Input CSO File Name              The CSO file name to validate
Output CSO File Name             The resulting CSO file name after validation
Allegrograph Hostname            localhost by default
Allegrograph DB Dir              The DB directory in which each database is saved
Allegrograph DB Name             The DB name for the current CSO file
BASIC                            Two options for validation
Validation of biological events  Checked by default
Complementation of models        Complementation should be done only once. It is best to check this option only for the final validation.
ADVANCED                         For validation of multiple files
Allegrograph Port                4567 by default
Validation Title                 The title of the validation GUI, set automatically to each CSO file name
Script Language                  JavaScript
Script                           Script for multiple CSO files

For CSMLValidatorPackage.jar, the parameters differ slightly:

Name                                 Description
REQUIRED                             Mandatory values
Input CSML File Name                 The CSML file name to validate
Output CSO File Name [intermediate]  The CSO file after CSML2CSO conversion
Output CSO File Name                 The CSO file name after validation

The rest of this document explains how to use CSOValidatorPackage. The two packages are used in essentially the same way. CSMLValidatorPackage needs one more step than CSOValidatorPackage, namely a conversion from CSML to CSO, so it takes longer to finish. To save time, save the model as CSO when saving it in CIO.

How to set parameters for one CSO file

Five parameters must be set: Input CSO File Name, Output CSO File Name, Allegrograph Hostname, Allegrograph DB Dir, and Allegrograph DB Name.

  • To select the input CSO file, click the Value cell; an arrowhead appears at the right of the cell. Click it and choose a file.
  • For Output CSO File Name, select any directory and give a name for the output CSO file.
  • Allegrograph Hostname is localhost by default. We assume that AllegroGraph is already installed locally.
  • For Allegrograph DB Dir, select a directory in your file system for storing the AllegroGraph DB files.
  • Allegrograph DB Name must be entered manually. Each CSO file corresponds to one DB. If a DB name is reused, the new DB overwrites the previous one without warning. To avoid confusion, use the same name as the Input CSO File Name.
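
Because a reused DB name overwrites the previous DB without warning, it can help to check a set of input files for name collisions before running the validator. The following is a minimal sketch in Python and is not part of the package; it assumes the DB name is taken from the input file name without its extension, and the function names and file paths are hypothetical.

```python
from collections import Counter
from pathlib import Path


def db_name_for(cso_path):
    """Derive a DB name from an input CSO file (assumed convention: the file name without extension)."""
    return Path(cso_path).stem


def find_duplicate_db_names(cso_paths):
    """Return the DB names that more than one input file would map to."""
    counts = Counter(db_name_for(p) for p in cso_paths)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical input files; two of them share the same file name.
files = ["models/activation.cso", "models/binding.cso", "backup/activation.cso"]
print(find_duplicate_db_names(files))
```

Any name reported here would be overwritten if both files were validated with the same Allegrograph DB Dir.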

Once all necessary parameters are set, click the Run button at the bottom of the window.

How to set parameters for multiple CSO files with JavaScript

To validate multiple CSO files located in the same directory, use JavaScript. Click the Value cell of Script and then the arrowhead that appears; the Script Editor pops up. For the script, refer to CSOValidator_sample_script.txt as an example. For the CSML validator, use CSMLValidator_sample_script.txt. Usually you only need to change the input directory in line 6 of the script to your own directory. Copy and paste the content into the Script Editor, then click OK and Run.

If you use the provided script file, Validation Title is set to each CSO file name.
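
To illustrate what a batch run amounts to, the sketch below lists the CSO files in one directory and builds one parameter set per file. It is written in Python for illustration only and is not the sample JavaScript script, whose contents are not reproduced here. The function name, the dictionary keys, and the "_validated" output suffix are assumptions.

```python
from pathlib import Path


def plan_batch(input_dir, output_dir):
    """Build one parameter set per .cso file found in input_dir (illustrative only)."""
    jobs = []
    for cso in sorted(Path(input_dir).glob("*.cso")):
        jobs.append({
            "input": str(cso),
            # Hypothetical naming scheme for the output file.
            "output": str(Path(output_dir) / (cso.stem + "_validated.cso")),
            # One DB per CSO file, named after the input file.
            "db_name": cso.stem,
            # Validation Title follows the CSO file name.
            "title": cso.name,
        })
    return jobs
```

Each dictionary corresponds to one run of the validator with its own input file, output file, DB name, and Validation Title.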

script.jpg

Validation GUI

After you click the [Run] button, generating the validation results takes time, depending on the size of the model. The validation results are shown in another window (see the window below). If no such window pops up, the given model does not need to be changed.

There are two types of validation:

  • Validation of biological events: checks whether each biological process is valid.
  • Complementation of models: adds missing processes if needed:
    • a binding process for a starting complex,
    • a binding process for a starting entity other than a complex,
    • a degradation process for a protein, complex, or mRNA that has no degradation process.

In the CSO validation step, if the given model has missing elements or mistakes, a GUI window pops up to present the processes and entities that have problems, along with any recommendations. The GUI consists of 4 panes: Table, which lists warnings, on the left; Error description, which gives a detailed explanation, on the top right; Set value, which is used to correct values; and Buttons on the bottom. Pane names are shown in blue, and the column names of Table are enclosed in [].

validator1.jpg

Table

Validation is done process by process. For each process, the validator checks whether it satisfies certain conditions. Each condition that is not satisfied is listed in Table. The table shows, for a process, which property of which entity or connector is incorrect. A detailed description is shown in Error description. If the Proposed value is not what you want, it can be modified in Set value.

The detail of each column is as follows:

  • [Level]: the priority level of the error. Currently HIGH by default.
  • [Modify]: if checked, the corresponding entity will be modified with the value of the [Edit] column.
  • [Input Entities]: if input entities of the process in [Process] have a problem, they are listed here.
  • [Process]: the process that is incorrect. The problem may lie in the input entities, the process itself, the output entities, and/or the connectors. [Property] explains the reason.
  • [Output Entities]: if output entities of the process in [Process] have a problem, they are listed here.
  • [Connector]: which connectors are incorrect, if any.
  • [Property]: which property is incorrect. For details, see the section "[Property] and [Proposed]".
  • [Description]: a short description of the reason for the error.
  • [Current]: the current value of the corresponding entity's ([Input/Output Entities]) property ([Property]).
  • [Proposed]: the recommended value for the corresponding entity's property. For details, see the section "[Property] and [Proposed]".
  • [Edit]: the value in [Proposed] is shown by default. It can be changed in Set value.
  • [Biological Event]: the CSO term for each process, e.g. ME_Binding.
  • [Error]:
  • [Entities]: the list of all [Input Entities] and [Output Entities].
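
The interplay of [Modify], [Proposed], and [Edit] can be sketched as follows. This is an illustrative Python model of the behavior described above, not the validator's code: the row and model dictionaries are assumed data structures, and the entity type "Protein" is only a placeholder.

```python
import copy


def apply_checked_rows(model, rows):
    """Return a copy of model with the [Edit] value applied for every row whose [Modify] is checked."""
    updated = copy.deepcopy(model)
    for row in rows:
        if not row["modify"]:
            continue  # unchecked rows leave the model untouched
        # [Edit] shows the [Proposed] value unless the user changed it in Set value.
        value = row["edit"] if row.get("edit") is not None else row["proposed"]
        updated[row["entity"]][row["property"]] = value
    return updated


# Assumed example data: one checked row and one unchecked row.
model = {"e64": {"FEATURETYPE": None}, "e65": {"TYPE_OF_ENTITY": "Protein"}}
rows = [
    {"entity": "e64", "property": "FEATURETYPE",
     "proposed": "FT_ActivatedResidue", "edit": None, "modify": True},
    {"entity": "e65", "property": "TYPE_OF_ENTITY",
     "proposed": "DNA", "edit": None, "modify": False},
]
print(apply_checked_rows(model, rows))
```

Only the checked row changes the model; the unchecked row keeps its current value.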

Set value

To modify the model, use this pane.

  • Recommended Value: If there are recommended values, the radio button is on and the values are shown in a combo box. They are the same as the [Proposed] value.
  • Selectable Value: If the recommended value is not what you want, click the radio button and select a value from the combo box. Selectable values are listed when an already defined CSO term is not correctly selected, for example, cell compartment, entity type, etc.
  • Customized Value: If there is no recommended value, enter a user-defined value here. First click the radio button for Customized Value, enter a value, and click Set. Check the [Edit] column to confirm that the user-defined value has been set.
  • Entity to be changed: This lists the entities to be modified. If there is only one entity, it is checked by default. If there are multiple candidates for modification, select the one to be modified.

[Property] and [Proposed]

The validator checks several properties of a process and its related entities and connectors. Each process has its biological event name. If the biological event is "Binding," this process should have at least two input entities for binding and one output entity whose type is complex. The full list of rules for each process is given later.
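
As an illustration of such a rule, the Binding condition can be sketched in Python as follows. This is not the validator's implementation: the Process class and its fields are hypothetical, and "one output entity whose type is complex" is read here as "at least one".

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical, simplified view of a process: its event name and the types of its entities."""
    event: str
    input_types: list = field(default_factory=list)
    output_types: list = field(default_factory=list)


def check_binding(process):
    """Return a list of problems for a Binding process (empty if the rule is satisfied)."""
    problems = []
    if len(process.input_types) < 2:
        problems.append("Binding needs at least two input entities")
    # Assumption: "one output entity whose type is complex" means at least one.
    if "Complex" not in process.output_types:
        problems.append("Binding needs an output entity of type Complex")
    return problems


p = Process(event="ME_Binding", input_types=["Protein"], output_types=["Protein"])
for problem in check_binding(p):
    print(problem)
```

Each returned problem would correspond to one row of Table for that process.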

If the validator has a suggestion for a correction, it is shown in [Proposed]. Each entry below gives the name of a [Property], its description, and how to correct it.

TYPE_OF_ENTITY to constrain the type of entity.
For some processes, a specific entity type is needed.
For DNABinding, there should be one DNA as input and one complex as output. If none of the input entities is DNA, the input entities are shown in [Input Entities] and DNA is shown in [Proposed]. In this case, the candidate entities for DNA are listed in “Entity to be changed”. To correct it, select one entity in Set value to become DNA.
CONNECTORTYPE to constrain the type of connector.
This checks whether the CSO class of each connector is correct. For biological models, just follow the recommended values.
Note that the validator currently checks this property first. If it is not satisfied, other validations are not executed. So correct this property first and then run CSOvalidator again.
CELLCOMPONENT to constrain whether the location of the entities is correct.
For some processes, the location of entities is important.
For Transcription, the location should be nucleus, which is given as the recommended value.
For Translocation, the locations of the input and output entities should be different. If the two locations are the same, the user selects a different location for one of the two entities. In this case, no recommendation is provided.
FEATURETYPE to constrain whether the entity has correct post-modification information.
For some processes, an output entity should have a feature type, such as FT_ActivatedResidue after activation or FT_PhosphorylatedResidue after phosphorylation. For this property, select one output entity and then select the recommended value. In most cases this is sufficient.
XREF to constrain whether the input and output entities have the same external references.
For this property, only the current values are shown. The correct external references should be given by the user.
STOICHIOMETRY to constrain whether the stoichiometric coefficient is correct.
The default value is 2 for dimerization. By definition, the value is more than 3 and less than 20 for oligomerization, and more than 21 for polymerization. The user should therefore enter a natural number.
NAME to constrain whether the entity has the correct name.
For example, GDPGTPExchange should have entities whose names are GDP and GTP. This is needed because GDP and GTP are names, not entity types (their type is SmallMolecule).
Cardinality to constrain whether the number of entities is correct.
For some processes, the number of input/associate/inhibitor/output entities is constrained. The property is shown as, for example, INPUTPROCESSBIOLOGICALCONNECTORS, which means that the number of inputprocess connectors for input entities is incorrect (or that they are missing). The GUI cannot handle this problem because some entities are missing. In this case, check the model again.
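
Two of the property checks described above, the CELLCOMPONENT rule for Translocation and the TYPE_OF_ENTITY rule for the input of DNABinding, can be sketched in Python as follows. This is a simplified illustration, not the validator's code: the function names, the returned dictionaries, and the entity type "Protein" are assumptions.

```python
def check_translocation(input_location, output_location):
    """Report a CELLCOMPONENT problem if both locations are the same; no value is proposed."""
    if input_location == output_location:
        return {"property": "CELLCOMPONENT", "current": input_location, "proposed": None}
    return None


def check_dna_binding_inputs(inputs):
    """inputs maps entity ID to entity type. Report a TYPE_OF_ENTITY problem if no input is DNA."""
    if any(entity_type == "DNA" for entity_type in inputs.values()):
        return None
    # Every input entity is a candidate to be changed to DNA.
    return {"property": "TYPE_OF_ENTITY", "proposed": "DNA", "candidates": sorted(inputs)}


print(check_translocation("nucleus", "nucleus"))
print(check_dna_binding_inputs({"e65": "Protein", "e67": "Protein"}))
```

The first check has no proposed value, so the user must pick a location; the second proposes DNA and leaves the choice of entity to the user, as in “Entity to be changed”.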

Buttons

There are several “Select” buttons as follows:

  • All: to check [Modified] for all rows.
  • Selected: to check [Modified] for the selected rows. Multiple rows can be selected with the mouse. This is helpful when you want to accept the recommended values for all rows with the same [Property]: for example, for the feature type property, first click the title of the [Property] column so that the rows are ordered by its values, and then select the rows.
  • Unselect selected: to uncheck [Modified] for the selected rows.
  • None: to uncheck [Modified] for all rows.
  • Red: to check [Modified] for all rows with HIGH level.
  • Orange: to check [Modified] for all rows with some level. Currently there are no such rows.

Other buttons are as follows:

  • Edit and close: if there are rows to be modified, click this button.
  • Cancel and close: if there are no rows to be modified, click this button.
  • Display model: to display the given model. If one row is selected, the related elements are highlighted in the Display model window. This is useful for browsing the model without CIO.

Example

Tips

  • Sorting by [Process] (by clicking the title of the [Process] column) shows how many errors were found for each process. Note that if one entity with an incorrect type is involved in two processes, the related error is shown at least twice, because the error report is generated per process. Modifying one of these rows is sufficient.
  • This table can also be used simply to see which parts of the given model are incorrect. You may then modify the model directly in CIO (Cell Illustrator Online). If some errors are repeated and tedious to fix in CIO, such as feature type errors, this GUI is helpful.
  • Missing entities must be added manually. The program cannot determine how to add them.
  • Sometimes an error occurs indirectly, that is, because other conditions are not satisfied. In this case, check the model in CIO.
  • [Process] and [Input/Output Entities] show the ID, not the name. The name is shown in parentheses after the ID in Error description.
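
The first tip can be illustrated with a short Python sketch that counts rows per process and finds errors reported for the same entity and property under more than one process. It is only an illustration of the idea; the row dictionaries and their keys are assumed and are not an export format of the GUI.

```python
from collections import Counter, defaultdict


def errors_per_process(rows):
    """Count how many rows (errors) are reported for each process."""
    return Counter(row["process"] for row in rows)


def repeated_entity_errors(rows):
    """Find (entity, property) pairs reported under more than one process."""
    seen = defaultdict(set)
    for row in rows:
        seen[(row["entity"], row["property"])].add(row["process"])
    return {key: sorted(procs) for key, procs in seen.items() if len(procs) > 1}


# Assumed example rows: entity e1 has the same type error in two processes.
rows = [
    {"process": "p1", "entity": "e1", "property": "TYPE_OF_ENTITY"},
    {"process": "p2", "entity": "e1", "property": "TYPE_OF_ENTITY"},
    {"process": "p1", "entity": "e2", "property": "FEATURETYPE"},
]
print(errors_per_process(rows))
print(repeated_entity_errors(rows))
```

For a pair reported under several processes, modifying a single row is enough, as noted in the first tip.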

Case 1

In the example (activation.cso), there are 4 warnings. The next figure shows the rows sorted by [Process].

validator2.jpg

  • The first row shows that there is no input entity for p4. p4 is an UnknownActivation process, which needs at least one input entity. e64 is an input association entity, not an input process entity. In this case, either change the connector to InputProcessBiological in Element Settings in CIO, or, if e64 really is an input association entity, add a new input entity to the model.
  • The second row shows that e64, an output entity of p4, has no feature type. For this type of error, just check [Modified].
  • The third row shows that p5 (DNA binding process) needs at least one COMPLEX as an output entity.
  • The fourth row shows that p5 (DNA binding process) needs at least one DNA as an input entity. When you click the row, two candidates for DNA, e65 and e67, are shown at the bottom of Set value. One of them should be changed to DNA.