INGOR
INGOR accepts an input data matrix written in GDF format. See the documentation of the GDF programming interface.
--input-args
key [ =
value, ... ] -I
key [ =
value, ... ] Input file arguments. See the documentation of ytGDF_read_fp() for available arguments.
-o
file --output-file
file Output file name. If the bootstrap mode is enabled, this is used as a file name prefix.
--output-type
format -t
format Output file format. See Network File Formats for available formats. If this is omitted, the file name suffix (extension) is used to determine the output file format.
--output-args
key [ =
value, ... ] -O
key [ =
value, ... ] Arguments for output file format.
--dynamic
-y
Dynamic mode. This option estimates a dynamic Bayesian network where the relationships between time points are fixed among consecutive time points.
--dbn
-Y
Time expanded (T-time step) dynamic model mode. The resultant network consists of p × T nodes, where p is the number of variables in the input data and T is the number of time points (secondary IDs).
--sdh
-H
Time expanded (T-time step) dynamic and static hybrid model mode. In addition to edges between consecutive time points, this allows edges within the same time point.
-p
n The maximum number of parent candidates used in the greedy (HC) algorithm. (default: n=10
)
--pc
n The maximum number of continuous parent candidates. Users can specify different values for the -p
and --pc
options. If this is not specified, the value of the -p
option is used.
--pd
n The maximum number of discrete parent candidates. Users can specify different values for the -p
and --pd
options. If this is not specified, the value of the -p
option is used.
-m
n The maximum number of parents. (default: n=10
)
--mc
n The maximum number of continuous parents. This takes precedence over the -m
option if only continuous values are included in the input data set. If this is not specified, the value of the -m
option is used.
--md
n The maximum number of discrete parents. This takes precedence over the -m
option if only discrete values are included in the input data set. (default: n=2
)
-a
( cghc
| sdhybrid
| nnsr
) Structure search algorithm.
cghc
(default): Combinatorial greedy hill-climbing algorithm. See Combinatorial Greedy Hill-Climbing Algorithm for details.
sdhybrid
: Static-dynamic hybrid greedy hill-climbing algorithm. See SDHybrid: Static-dynamic hybrid greedy algorithm for details.
nnsr
: Neighbor Node Sampling & Repeat algorithm. This requires MPI-enabled INGOR. See Neighbor Node Sampling and Repeat Algorithm for details.
-A
key [ =
value, ... ] Arguments for the algorithm. See the documentation of each algorithm for available arguments.
The available algorithms are described in the description of the -a
option above.
--score-args
key [ =
value, ... ] -S
key [ =
value, ... ] Arguments for the score function. See the documentation of the scores listed in the Network Scores section for available arguments.
--bootstrap
n -B
n Performs bootstrap resampling. Set n (≥ 1) as the bootstrap ID. There are several modes of bootstrap resampling; see the description of the "--bs-mode
" option below. This ID is used as the file name suffix and for the random seed. The same ID produces the same resampled set of input data and thus the same estimated network. Therefore, be very careful when specifying this ID if you run multiple network estimations simultaneously.
--bs-mode
( pseudo
| pid
| list
) --B-mode
( pseudo
| pid
| list
) Bootstrap mode. Specify the number of blocks sampled for a single network estimation with the --blocks
option. The definition of a block depends on the mode and is described below.
pseudo
(default)
pid
--blocks
" option.
list
--bs-file
" option. In the file, each line consists of a tab-delimited list of primary IDs that are resampled together.
--blocks
n -b
n The number of blocks to resample for bootstrap resampling.
--bs-file
file --B-file
file File read by the list bootstrap resampling mode. See the explanation of the list
bootstrap mode in the description of the --bs-mode
option.
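The file layout expected by the list mode can be illustrated with a short Python sketch. Only the "one block per line, primary IDs separated by tabs" layout is taken from the description above; the helper names and the primary IDs (s1, s2, ...) are hypothetical.

```python
def format_block_file(blocks):
    """Return block-file text: one block per line, primary IDs separated by tabs."""
    return "".join("\t".join(ids) + "\n" for ids in blocks)


def parse_block_file(text):
    """Parse block-file text back into a list of blocks (lists of primary IDs)."""
    return [line.split("\t") for line in text.splitlines() if line]


# Hypothetical primary IDs; the IDs on one line are resampled together.
blocks = [["s1", "s2", "s3"], ["s4", "s5"], ["s6"]]
text = format_block_file(blocks)
```

Writing `text` to a file and passing that file to --bs-file is the intended use; the round trip through `parse_block_file` is only a convenient way to check the layout.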
-N
n The number of iterations of network estimation. This is used with -B
for performing the single-process bootstrap method.
--single-file
( on
| off
) If on
, the multiple estimated networks are written to a single file in the bootstrap method. This is useful to reduce the number of files when you perform the bootstrap in a single process. By default, on
is set.
--cons
file --constrain
file -c
file Reads the constrain structure (graph) from the file. The algorithm only searches for edges on the constrain graph. If not specified, the complete graph is assumed. Users can specify more than one constrain graph file. In that case, only edges that exist in all constrain graphs remain in the final constrain graph. This can be done by specifying multiple --cons
options or by multiple files concatenated with a delimiter ":
" (colon) for a single --cons
option. For example, "--cons file1:file2:file3
" and "--cons file1 --cons file2 --cons file3
" are the same.
--cons-type
( parents
| no_parents
| format ) --constrain-type
( parents
| no_parents
| format ) --c-type
( parents
| no_parents
| format ) Constrain graph file type (format). The following types or general file formats can be specified. If a general format is specified, edges connected to nodes that do not appear in the data set are not restricted. To change this default behaviour, add notfound=restrict
in the argument of the --cons-args
option.
parents
: the file is a list of node names that can be parents of other nodes.
no_parents
: the file is a list of nodes that cannot have parents.
format: Any network file format listed in Network File Formats.
As with the "--cons
" option, this option can be specified multiple times, or multiple types concatenated by a colon can be given to a single "--cons-type
" option, to provide constrain graph types for multiple files. If the number of types is less than the number of specified files, the last type is used for the remaining files.
--cons-args
key [ =
value, ... ] --constrain-args
key [ =
value, ... ] --c-args
key [ =
value, ... ] Arguments for reading the constrain graph file. As with the "--cons
" option, this option can be specified multiple times to give different arguments for different files. If the number of occurrences of this option is less than the number of specified files, the last one is used.
--cons-write
file Saves the generated constrain graph in the specified file.
--cons-write-type
format File format for the "--cons-write" option. See Network File Formats for available formats. If the file name has an extension identical to the format type, this can be omitted.
--cons-write-args
key [ =
value, ... ] Arguments for the network file format used to write the constrain graph.
--fixed
file Reads the predefined fixed structure of the network.
--fixed-type
format Network file format of the fixed network file (--fixed
).
--fixed-args
key [ =
value, ... ] Arguments for reading the fixed network.
--output-data
file_prefix Outputs the data values used for modeling. If specified, six files are generated: file_prefix.X
, file_prefix.Y
, file_prefix.PR.Y
, file_prefix.LL
, file_prefix.Z
, and file_prefix.D
. These files represent input (explanatory) values, target (objective) values, target partial residual values, log likelihood values, Z scores, and sample deviances, respectively. Each row in the first three files corresponds to the values of an edge. The order of the edges is the same as in the TXT format. The remaining files consist of p rows and n columns, where p is the number of variables and n is the number of samples used during modeling. The number of columns (n) is the same for all the files. If --stdout
is specified and "STDOUT_
n" is specified as prefix, the six files are regarded as STDOUT_
n, STDOUT_
n+1, ..., and STDOUT_
n+5 where n is the index number. Therefore, these six indices will be reserved for the standard output buffer.
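The naming scheme for the six output data files can be summarised with a small Python sketch. The suffixes and the STDOUT_n to STDOUT_n+5 range come from the description above; the function name and the example prefixes are hypothetical.

```python
# Suffixes of the six files, in the order listed in the description.
SUFFIXES = ["X", "Y", "PR.Y", "LL", "Z", "D"]


def output_data_names(prefix):
    """Return the six names generated for a given --output-data prefix."""
    if prefix.startswith("STDOUT_"):
        n = int(prefix[len("STDOUT_"):])
        # Six consecutive standard output indices are reserved.
        return [f"STDOUT_{n + i}" for i in range(len(SUFFIXES))]
    return [f"{prefix}.{suffix}" for suffix in SUFFIXES]


file_names = output_data_names("result")
stdout_names = output_data_names("STDOUT_3")
```

A practical consequence is that, with the prefix STDOUT_3, indices 3 through 8 should not be used for any other output file.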
--fix-range
Fixes modeling value ranges for B-spline nonparametric regression.
--read-range
file Fixes modeling value ranges for B-spline nonparametric regression by a file. The file is a tab-separated text file. Each line consists of three columns: the node name, the left-most (minimum) value of the range for that node, and the right-most (maximum) value.
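A range file in the three-column layout described above can be generated with a few lines of Python. This is a minimal sketch: the node names and values are made up, the function name is hypothetical, and the check that the minimum does not exceed the maximum is an added safeguard rather than documented INGOR behaviour.

```python
def format_range_file(ranges):
    """Return range-file text: node name, minimum, maximum, tab-separated."""
    lines = []
    for node, (minimum, maximum) in ranges.items():
        if minimum > maximum:
            raise ValueError(f"invalid range for {node}")
        lines.append(f"{node}\t{minimum}\t{maximum}\n")
    return "".join(lines)


# Hypothetical node names and value ranges.
ranges = {"geneA": (0.0, 1.5), "geneB": (-2.0, 2.0)}
range_text = format_range_file(ranges)
```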
--total-mem
n Total memory limit in megabytes (MiB).
--seed
n Random number seed. If not specified, the current time is used as the seed. The seed number is adjusted by the bootstrap ID. To disable this behaviour, use the --bs-seed-adjust
option.
--bs-seed-adjust
(on
| off
) Enables or disables random seed adjustment for bootstrap. The default is "on
". Generally, the random seed is adjusted by the bootstrap ID so that the user only need to specify a fixed random seed for multiple processes in order to reproduce the bootstrap results. However, this is inconvenient to reproduce a certain single iteration of multiple bootstrap estimation by a single process. Use this with "-A reset=off
" for that purpose.
--stdin
Enables reading input data sets from the standard input. Use "STDIN_
n" as a file name to specify the data set to read, where n (≥ 1) is the index among the multiple data sets. If this option is specified, the multiple input data sets need to be given to the standard input, separated by the file separator (FS) control character, which is 28 in decimal ASCII code.
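Preparing such a combined input stream is straightforward in Python. The sketch below only illustrates the FS-separated layout described above; the function name and the placeholder contents are hypothetical, and the assumption that the first data set in the stream corresponds to STDIN_1 follows from n starting at 1.

```python
FS = b"\x1c"  # file separator control character, decimal 28


def pack_inputs(datasets):
    """Join several data sets (as bytes) into one FS-separated stream."""
    for data in datasets:
        if FS in data:
            raise ValueError("a data set must not contain the FS character")
    return FS.join(datasets)


# Placeholder contents; real data sets would be GDF-formatted bytes.
stream = pack_inputs([b"first data set", b"second data set"])
```

The resulting bytes would then be written to the standard input of the process, with the first part addressed as STDIN_1 and the second as STDIN_2.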
--stdout
Enables writing multiple files to the standard output. Use "STDOUT_
n" as a file name to specify the order of the output files by n (≥ 1). Multiple output files written to the standard output are separated by the file separator (FS) control character, which is 28 in decimal ASCII code.
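On the receiving side, the combined standard output can be split back into its parts. The following Python sketch assumes that the k-th part of the stream corresponds to STDOUT_k and tolerates an optional trailing separator; both points are assumptions, and the function name is hypothetical.

```python
FS = b"\x1c"  # file separator control character, decimal 28


def split_outputs(stream):
    """Split an FS-separated byte stream into parts keyed by STDOUT_n."""
    parts = stream.split(FS)
    # Assumption: ignore an empty final part left by a trailing separator.
    if parts and parts[-1] == b"":
        parts.pop()
    return {f"STDOUT_{i}": part for i, part in enumerate(parts, start=1)}


# Placeholder stream standing in for captured standard output.
outputs = split_outputs(b"network file" + FS + b"second file")
```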
--stdio
Same as specifying both --stdin
and --stdout
.
--show-data-stat
Prints input data statistics to the log file.
-L
( 0
| 1
| 2
) Log mode. By default, -L 0
is assumed.
0
: Automatic mode. For bootstrap, only the process with bootstrap ID=1 writes logs to file_name.log
where file_name is the file name given by the -o
option or --log
option. Other processes drop log messages. Without bootstrap, log messages are written to the standard error.
1
: Forces all processes to output logs in the standard error.
2
: Forces all processes to write logs to the file file_name.log.XXXXXX
where XXXXXX is a six-digit, zero-filled bootstrap ID. Without bootstrap, the ID is 0. For MPI-enabled execution, the ID corresponds to the MPI process rank number.
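The three log modes can be summarised as a small decision function. This Python sketch restates the rules above for a non-MPI run; it is not INGOR's code, and the function name, the "stderr" marker, and the use of None for dropped messages and for the non-bootstrap case are illustrative choices.

```python
def log_destination(mode, file_name, bootstrap_id=None):
    """Return the log target: a file name, "stderr", or None (dropped).

    bootstrap_id is None when bootstrap is not used.
    """
    if mode == 0:
        if bootstrap_id is None:
            return "stderr"
        # Only the process with bootstrap ID 1 keeps its log.
        return f"{file_name}.log" if bootstrap_id == 1 else None
    if mode == 1:
        return "stderr"
    if mode == 2:
        run_id = 0 if bootstrap_id is None else bootstrap_id
        # Six-digit, zero-filled ID suffix.
        return f"{file_name}.log.{run_id:06d}"
    raise ValueError("log mode must be 0, 1, or 2")
```

For example, with -L 2 and bootstrap ID 42, a run whose log base name is net would write to net.log.000042.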
--log
file_name -o
option is used.
Here is the list of network file formats. Arguments available for reading and writing a file can be found in the programming documents linked in the items below.
The network filters are applied to the network after the estimation. If multiple network filters are specified, they are applied to the network in the order of their appearance. If no input data set is specified, an empty network is passed to the first filter. Typically, you would then specify ReadFilter or RNDNetworkFilter first, since both generate a network without an input data set.
--read
: ReadFilter - Reading a network from a file.
--write
: WriteFilter - Writing a network to a file.
--edgeprop
: EdgePropFilter - Extracting edges with a property condition.
--comp
: CompFilter - Comparing a network to another.
--subnet
: SubnetFilter - Extracting a subnetwork.
--bs
: BSFilter - Compiling bootstrapped networks.
--score
: ScoreFilter - Calculating the network score.
--search
: SearchFilter - Searching paths.
--pr
: PRFilter - Calculating the partial residuals.
--prc
: PRCFilter - Performing multiple comparison with different thresholds.
--rndnet
: RNDNetworkFilter - Generating a random network.
--gendata
: GenDataFilter - Generating a simulated data set.
--npartite
: NPartiteFilter - Converting the structure corresponding to an n-partite dynamic model.
--dag
: DagFilter - Extracting a DAG from a network.
--ec
: EdgeContribFilter - Calculating edge contributions.
--layout
: LayoutFilter - Applying a layout algorithm.
--scorestest
: ScoreTestFilter - Testing scores.
--mcmc
: MCMCFilter - Markov chain Monte Carlo method. (Test & evaluation only).
--status
: StatusFilter - Printing network information.
--filecheck
: FileCheckFilter - Checking the multiple networks in a file.
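The rule that filters are applied in order of appearance, starting from an empty network when there is no input data set, can be modelled conceptually in Python. This is a toy model only: the dictionary-based network, the function names, and the two example filters are hypothetical and do not correspond to any of the INGOR filters listed above.

```python
def apply_filters(filters, network=None):
    """Apply filters left to right; start from an empty network if none is given."""
    if network is None:
        network = {"nodes": set(), "edges": set()}
    for apply in filters:
        network = apply(network)
    return network


def add_edge(parent, child):
    """Toy filter that adds one edge and its two nodes."""
    def _filter(network):
        return {"nodes": network["nodes"] | {parent, child},
                "edges": network["edges"] | {(parent, child)}}
    return _filter


def drop_node(name):
    """Toy filter that removes a node and every edge touching it."""
    def _filter(network):
        return {"nodes": network["nodes"] - {name},
                "edges": {e for e in network["edges"] if name not in e}}
    return _filter


forward = apply_filters([add_edge("a", "b"), drop_node("b")])
reverse = apply_filters([drop_node("b"), add_edge("a", "b")])
```

The two results differ, which is why the order in which filter options appear on the command line matters.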