INGOR
INGOR Manual

About input data file

INGOR accepts an input data matrix written in GDF format. See the documentation of the GDF programming interface.

Options

--input-args key [ =value, ... ]
-I key [ =value, ... ]

Input file arguments. See the documentation of ytGDF_read_fp() for available arguments.

-o file
--output-file file

Output file name. If the bootstrap mode is enabled, this is used as a file name prefix.

--output-type format
-t format

Output file format. See Network File Formats for available formats. If this is omitted, the file name suffix (extension) is used to determine the output file format.

--output-args key [ =value, ... ]
-O key [ =value, ... ]

Arguments for output file format.

--dynamic
-y

Dynamic mode. This option estimates a dynamic Bayesian network where the relationships between time points are fixed among consecutive time points.

--dbn
-Y

Time expanded (T-time step) dynamic model mode. The resultant network consists of p × T nodes, where p is the number of variables in the input data and T is the number of time points (secondary IDs).

--sdh
-H

Time expanded (T-time step) dynamic and static hybrid model mode. In addition to edges between consecutive time points, this allows edges within the same time point.

-p n

The maximum number of parent candidates used in the greedy (HC) algorithm. (default: n=10)

--pc n

The maximum number of continuous parent candidates. Users can specify different values for the -p and --pc options. If this is not specified, the value of the -p option is used.

--pd n

The maximum number of discrete parent candidates. Users can specify different values for the -p and --pd options. If this is not specified, the value of the -p option is used.

-m n

The maximum number of parents. (default: n=10)

--mc n

The maximum number of continuous parents. This takes precedence over the -m option if the input data set contains only continuous values. If this is not specified, the value of the -m option is used.

--md n

The maximum number of discrete parents. This takes precedence over the -m option if the input data set contains only discrete values. (default: n=2)

-a ( cghc | sdhybrid | nnsr )

Structure search algorithm.
cghc (default): Combinatorial greedy hill-climbing algorithm. See Combinatorial Greedy Hill-Climbing Algorithm for details.
sdhybrid : Static-dynamic hybrid greedy hill-climbing algorithm. See SDHybrid: Static-dynamic hybrid greedy algorithm for details.
nnsr : Neighbor Node Sampling & Repeat algorithm. This requires MPI-enabled INGOR. See Neighbor Node Sampling and Repeat Algorithm for details.

-A key [ =value, ... ]

Arguments for the algorithm. See documents of the algorithms for available arguments.
The available algorithms are described in the description of the -a option above.

--score-args key [ =value, ... ]
-S key [ =value, ... ]

Arguments for the score function. See the documentation of the scores listed in the Network Scores section for available arguments.

--bootstrap n
-B n

Performs bootstrap resampling. Set n (≥ 1) as the bootstrap ID. Several bootstrap resampling modes are available; see the description of the "--bs-mode" option below. This ID is used as the file name suffix and as the random seed. The same ID produces the same resampled set of input data and therefore the same network. Be very careful when assigning this ID if you run multiple network estimations simultaneously.
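
The reproducibility property described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the bootstrap ID is used directly as the seed of a pseudo-random generator; how INGOR actually derives its seed from the ID is not described here, and resample_indices is a hypothetical helper, not part of INGOR.

```python
import random


def resample_indices(n_samples, bootstrap_id):
    """Draw n_samples indices with replacement, seeded by the bootstrap ID.

    Hypothetical helper: seeding directly with the ID is an assumption.
    """
    rng = random.Random(bootstrap_id)
    return [rng.randrange(n_samples) for _ in range(n_samples)]


# Two runs with the same ID draw exactly the same resampled data set.
first = resample_indices(24, bootstrap_id=7)
again = resample_indices(24, bootstrap_id=7)
```

Because first and again are identical, two simultaneous runs that share an ID would estimate the same network instead of contributing two independent bootstrap replicates.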

--bs-mode ( pseudo | pid | list)
--B-mode ( pseudo | pid | list)

Bootstrap mode. Use the --blocks option to specify the number of blocks sampled for a single network estimation. The definition of a block depends on the mode, as described below.

pseudo (default)
Resamples primary IDs among the samples that share the same secondary ID. This mode is for the dynamic model only. For example, if you have 24 samples consisting of 8 time points measured in triplicate (3 replicates each), this mode draws 1 sample from the 3 replicates at each time point. In this case, 1 block is therefore a set of 8 consecutive samples.
pid
Resamples blocks of samples that share the same primary ID. In other words, it resamples primary IDs, so all consecutive samples with the same primary ID are selected as a single block. The number of blocks, i.e. primary IDs, sampled for a single network estimation can be specified by the "--blocks" option.
list
This performs a kind of block resampling in which each block consists of a set (list) of primary IDs. The lists of primary IDs sampled together are given in the file specified by the "--bs-file" option. In that file, each line is a tab-delimited list of primary IDs that are resampled together.
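
As an illustration of the pseudo mode example above (8 time points, 3 replicates), the following Python sketch builds one block by drawing one replicate per time point. It is a conceptual sketch only, not INGOR's implementation; pseudo_block and the replicate names are hypothetical.

```python
import random


def pseudo_block(samples, rng):
    """Pick one replicate (primary ID) for each time point (secondary ID).

    `samples` maps each secondary ID to the primary IDs measured at it.
    """
    return [(t, rng.choice(samples[t])) for t in sorted(samples)]


# 8 time points x 3 replicates = 24 samples (replicate names are made up).
samples = {t: ["rep1", "rep2", "rep3"] for t in range(1, 9)}
block = pseudo_block(samples, random.Random(1))
```

The resulting block holds 8 consecutive samples, one per time point, which matches the block definition given for this mode.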

--blocks n
-b n

The number of blocks to resample for bootstrap resampling.

--bs-file file
--B-file file

File read by the list bootstrap resampling mode. See the explanation of the list mode in the description of the --bs-mode option.
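
A file for the list mode could be prepared as in the following Python sketch, which writes one tab-delimited line per block. The primary IDs, the file name blocks.tsv, and both helper functions are hypothetical; only the one-line-per-block, tab-delimited layout comes from the description of the list mode.

```python
import os
import tempfile


def write_bs_file(path, groups):
    """Write one tab-delimited line of primary IDs per block."""
    with open(path, "w") as fh:
        for group in groups:
            fh.write("\t".join(group) + "\n")


def read_bs_file(path):
    """Read the blocks back as lists of primary IDs, skipping blank lines."""
    with open(path) as fh:
        return [line.rstrip("\n").split("\t") for line in fh if line.strip()]


# Made-up primary IDs: P01..P03 are always resampled together, and so on.
groups = [["P01", "P02", "P03"], ["P04", "P05"], ["P06"]]
path = os.path.join(tempfile.mkdtemp(), "blocks.tsv")
write_bs_file(path, groups)
```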

-N n

The number of iterations of network estimation. This is used with -B for performing the single-process bootstrap method.

--single-file ( on | off )

If on, the multiple estimated networks are written to a single file in the bootstrap method. This is useful for reducing the number of files when you perform the bootstrap with a single process. The default is on.

--cons file
--constrain file
-c file

Reads the constrain structure (graph) from the file. The algorithm searches only for edges in the constrain graph. If not specified, the complete graph is assumed. Users can specify more than one constrain graph file. In that case, only edges that exist in all constrain graphs remain in the final constrain graph. This can be done by specifying multiple --cons options or by giving a single --cons option multiple files joined with the delimiter ":" (colon). For example, "--cons file1:file2:file3" and "--cons file1 --cons file2 --cons file3" are the same.
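
The way multiple constrain graphs combine can be sketched in Python as a set intersection over edges. This is a conceptual illustration rather than INGOR's code: collect_cons_files and combine_constraints are hypothetical helpers, and the edges are made up.

```python
def collect_cons_files(option_values):
    """Flatten the values given to --cons, splitting each on the colon."""
    files = []
    for value in option_values:
        files.extend(value.split(":"))
    return files


def combine_constraints(graphs):
    """Keep only the edges that are present in every constrain graph."""
    combined = set(graphs[0])
    for graph in graphs[1:]:
        combined &= set(graph)
    return combined


# Edges are (parent, child) pairs; node names are made up.
g1 = {("A", "B"), ("A", "C"), ("B", "C")}
g2 = {("A", "B"), ("B", "C")}
g3 = {("A", "B"), ("C", "A")}
final = combine_constraints([g1, g2, g3])
```

Here final contains only the edge from A to B, since that is the only edge shared by all three graphs.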

--cons-type ( parents | no_parents | format )
--constrain-type ( parents | no_parents | format )
--c-type ( parents | no_parents | format )

Constrain graph file type (format). The following types or general file formats can be specified. If a general format is specified, edges connected to nodes that do not appear in the data set are not restricted. To change this default behaviour, add notfound=restrict to the arguments of the --cons-args option.
parents: the file is a list of node names that can be parents of other nodes.
no_parents: the file is a list of nodes that cannot have parents.
format: Any network file format listed in Network File Formats.
Like the "--cons" option, this option can be specified multiple times, or multiple types concatenated with colons can be given to a single "--cons-type" option, to provide constrain graph types for multiple files. If fewer types than files are specified, the last type is used for the remaining files.
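
The rule for matching types to files can be sketched in Python as follows. pair_files_with_types is a hypothetical helper and the file names are made up; the sketch only mirrors the stated rule that the last type is reused when fewer types than files are given.

```python
def pair_files_with_types(files, types):
    """Assign a type to each file, reusing the last type when types run out."""
    if not types:
        raise ValueError("at least one type is required")
    last = len(types) - 1
    return [(name, types[min(i, last)]) for i, name in enumerate(files)]


pairs = pair_files_with_types(
    ["tf_list.txt", "leaf_nodes.txt", "more_leaves.txt"],
    ["parents", "no_parents"],
)
```

With three files and two types, the third file is read with the last type, no_parents.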

--cons-args key [ =value, ... ]
--constrain-args key [ =value, ... ]
--c-args key [ =value, ... ]

Arguments for reading the constrain graph file. Like the "--cons" option, this option can be specified multiple times to give different arguments for different files. If this option occurs fewer times than the number of specified files, the last occurrence is used for the remaining files.

--cons-write file

Saves the generated constrain graph in the specified file.

--cons-write-type format

File format for the "--cons-write" option. See Network File Formats for available formats. If the file name has an extension identical to the format type, this can be omitted.

--cons-write-args key [ =value, ... ]

Arguments for the network file format used to write the constrain graph.

--fixed file

Reads the predefined fixed structure of the network.

--fixed-type format

Network file format of the fixed network file (--fixed).

--fixed-args key [ =value, ... ]

Arguments for reading the fixed network.

--output-data file_prefix

Outputs the data values used for modeling. If specified, six files are generated: file_prefix.X, file_prefix.Y, file_prefix.PR.Y, file_prefix.LL, file_prefix.Z, and file_prefix.D. These files contain the input (explanatory) values, target (objective) values, target partial residual values, log-likelihood values, Z scores, and sample deviances, respectively. Each row in the first three files corresponds to the values of an edge. The edges are in the same order as in the TXT format. The remaining files consist of p rows and n columns, where p is the number of variables and n is the number of samples used during modeling. All files have the same number of columns (n). If --stdout is specified and "STDOUT_n" is given as the prefix, the six files are treated as STDOUT_n, STDOUT_n+1, ..., and STDOUT_n+5, where n is the index number. These six indices are therefore reserved in the standard output buffer.
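
The naming scheme can be summarized with a short Python sketch. The helpers are hypothetical, and the assumption that the six STDOUT indices follow the same order as the file list above (X, Y, PR.Y, LL, Z, D) is not confirmed by this manual.

```python
SUFFIXES = ["X", "Y", "PR.Y", "LL", "Z", "D"]


def output_data_names(prefix):
    """Return the six file names generated for a regular file prefix."""
    return [f"{prefix}.{suffix}" for suffix in SUFFIXES]


def stdout_slots(first_index):
    """Map each suffix to a STDOUT_n slot, assuming the listed order."""
    return {
        suffix: f"STDOUT_{first_index + offset}"
        for offset, suffix in enumerate(SUFFIXES)
    }


names = output_data_names("run1")
slots = stdout_slots(3)  # reserves STDOUT_3 through STDOUT_8
```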

--fix-range

Fixes modeling value ranges for B-spline nonparametric regression.

--read-range file

Fixes modeling value ranges for B-spline nonparametric regression using a file. The file is a tab-separated text file. Each line consists of three columns: the node name, the left-most (minimum) value of the range for that node, and the right-most (maximum) value.
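
A range file in this three-column layout could be produced and checked with a Python sketch like the one below. The node names and values are made up, both helpers are hypothetical, and the sketch assumes the file has no header line, which this manual does not state either way.

```python
def format_range_file(ranges):
    """Render {node: (minimum, maximum)} as tab-separated lines."""
    return "".join(f"{name}\t{lo}\t{hi}\n" for name, (lo, hi) in ranges.items())


def parse_range_file(text):
    """Parse tab-separated lines back into {node: (minimum, maximum)}."""
    ranges = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, lo, hi = line.split("\t")
        ranges[name] = (float(lo), float(hi))
    return ranges


ranges = {"geneA": (-2.5, 3.0), "geneB": (0.0, 10.0)}
text = format_range_file(ranges)
```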

--total-mem n

Total memory limit in megabytes (MiB).

--seed n

Random number seed. If not specified, the current time is used as the seed. The seed number is adjusted by the bootstrap ID. To disable this behaviour, use the --bs-seed-adjust option.

--bs-seed-adjust (on | off)

Enables or disables random seed adjustment for bootstrap. The default is "on". Normally, the random seed is adjusted by the bootstrap ID so that the user only needs to specify one fixed random seed for multiple processes in order to reproduce the bootstrap results. However, this is inconvenient when reproducing one particular iteration of a multiple bootstrap estimation run by a single process. Use this option with "-A reset=off" for that purpose.

--stdin

Reads input data sets from the standard input. Use "STDIN_n" as a file name to specify which data set to read, where n (≥ 1) is the index of the data set. If this option is specified, multiple input data sets must be supplied on the standard input, separated by the file separator (FS) control character, which is 28 in decimal ASCII code.

--stdout

Writes multiple output files to the standard output. Use "STDOUT_n" as a file name to specify the order of the output files by n (≥ 1). Multiple output files written to the standard output are separated by the file separator (FS) control character, which is 28 in decimal ASCII code.

--stdio

Same as specifying both --stdin and --stdout.
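
The FS-separated framing used by --stdin and --stdout can be handled with a few lines of Python. This is a minimal sketch under two assumptions that the manual does not settle: the separator appears only between data sets (no trailing FS), and the n-th chunk corresponds to STDIN_n or STDOUT_n. The helper names and the sample contents are hypothetical.

```python
FS = chr(28)  # ASCII file separator control character


def join_streams(datasets):
    """Concatenate several data sets into one payload for standard input."""
    return FS.join(datasets)


def split_streams(payload):
    """Split a standard output payload back into its individual files."""
    return payload.split(FS)


# Made-up contents standing in for two input data sets.
payload = join_streams(["first data set\n", "second data set\n"])
parts = split_streams(payload)
stdin_2 = parts[2 - 1]  # chunk that would be addressed as STDIN_2
```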

--show-data-stat

Prints input data statistics to the log file.

-L ( 0 | 1 | 2 )

Log mode. By default, -L 0 is assumed.
0 : Automatic mode. For bootstrap, only the process with bootstrap ID=1 writes logs to file_name.log, where file_name is the file name given by the -o option or the --log option. Other processes drop log messages. For non-bootstrap runs, log messages are written to the standard error.
1 : Forces all processes to write logs to the standard error.
2 : Forces all processes to write logs to the file file_name.log.XXXXXX, where XXXXXX is a six-digit, zero-filled bootstrap ID. For non-bootstrap runs, the ID is 0. For MPI-enabled execution, the ID corresponds to the MPI process rank number.
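
The three log modes can be summarized as a small decision function. The Python sketch below only restates the rules listed above; log_destination is a hypothetical helper, and the MPI rank case is left out for brevity.

```python
def log_destination(mode, base, bootstrap_id=None):
    """Return a log file name, "stderr", or None when messages are dropped.

    bootstrap_id=None stands for a non-bootstrap run.
    """
    if mode == 1:
        return "stderr"
    if mode == 2:
        # Non-bootstrap runs use ID 0; the ID is zero-filled to six digits.
        return f"{base}.log.{(bootstrap_id or 0):06d}"
    # Mode 0: automatic.
    if bootstrap_id is None:
        return "stderr"
    return f"{base}.log" if bootstrap_id == 1 else None
```

For example, log_destination(2, "result", 12) gives result.log.000012, while in mode 0 only bootstrap ID 1 gets a log file.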

--log file_name

Log file name. If this is not specified, the file name given by the -o option is used.


Network Scores

  • BNDC : Continuous-discrete mixed nonparametric model
  • BNRC : B-spline nonparametric regression model

Network File Formats

The following network file formats are available. Arguments available for reading and writing a file can be found in the programming documents linked from the items below.

  • ING : JSON based INGOR native network file format
  • SGN3 : SiGN-BN compatible network file format
  • TXT : Parent child pair edge list
  • PaList : Parent list network file format
  • NodeList : Node list

Filters

Network filters are applied to the network after estimation. If multiple network filters are specified, they are applied in the order in which they appear. If no input data set is specified, an empty network is passed to the first filter. Typically, you would specify ReadFilter or RNDNetworkFilter in that case, both of which generate a network without an input data set.

  • --read : ReadFilter - Reading a network from a file.
  • --write : WriteFilter - Writing a network to a file.
  • --edgeprop : EdgePropFilter - Extracting edges with a property condition.
  • --comp : CompFilter - Comparing a network to another.
  • --subnet : SubnetFilter - Extracting a subnetwork.
  • --bs : BSFilter - Compiling bootstrapped networks.
  • --score : ScoreFilter - Calculating the network score.
  • --search : SearchFilter - Searching paths.
  • --pr : PRFilter - Calculating the partial residuals.
  • --prc : PRCFilter - Performing multiple comparison with different thresholds.
  • --rndnet : RNDNetworkFilter - Generating a random network.
  • --gendata : GenDataFilter - Generating a simulated data set.
  • --npartite : NPartiteFilter - Converting the structure corresponding to a n-partite dynamic model.
  • --dag : DagFilter - Extracting a DAG from a network.
  • --ec : EdgeContribFilter - Calculates edge contributions.
  • --layout : LayoutFilter - Applying a layout algorithm.
  • --scorestest : ScoreTestFilter - Tests scores.
  • --mcmc : MCMCFilter - Markov chain Monte Carlo method. (Test & evaluation only).
  • --status : StatusFilter - Prints network information.
  • --filecheck : FileCheckFilter - Checks the multiple networks in a file.
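
Because filters run in the order they are given, the same filters in a different order can yield a different network. The Python sketch below illustrates this with two toy functions acting on a set of weighted edges; they are stand-ins invented for illustration and do not correspond to any INGOR filter.

```python
from functools import reduce


def apply_filters(network, filters):
    """Pass the network through each filter in the order given."""
    return reduce(lambda net, flt: flt(net), filters, network)


def add_weak_edge(edges):
    """Toy filter: add one made-up low-weight edge."""
    return edges | {("A", "C", 0.1)}


def keep_strong(edges):
    """Toy filter: keep edges whose weight is at least 0.5."""
    return {edge for edge in edges if edge[2] >= 0.5}


# Start from an empty network, as when no input data set is specified.
added_then_pruned = apply_filters(set(), [add_weak_edge, keep_strong])
pruned_then_added = apply_filters(set(), [keep_strong, add_weak_edge])
```

In the first ordering the weak edge is added and then removed; in the second it survives because pruning ran before it was added.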
