INGOR
GenDataFilter generates a simulated dataset for the given network structure. The network structure can be generated by the RNDNetworkFilter.
n=n
The number of samples to generate. If n < 3, then the normalization is automatically disabled because it is impossible to calculate variance.
normalize={on|off} (default: on)
Performs the normalization after the data generation.
file=file
The generated data set is written to the specified file in GDF format. See the programming document of ytGDF for details.
args={key[=value,...]}
Arguments for outputting the dataset. See the programming document of ytGDF for the available keys and values.
types=t1:t2:...
Types of variables. If not specified, they are determined automatically at random with the ratio specified by the r argument. Each ti is either d or c, where i is the index of the variable.
disc=r1:r2:...
The probabilities of the number of categories for discrete variables. r1 + ... + rN must equal 1.0; a discrete variable has i + 1 categories with probability ri. If this is not specified, all discrete variables have two possible values (categories).
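The mapping from disc probabilities to category counts can be illustrated with a minimal Python sketch. It is not INGOR's implementation; the function name sample_category_counts and its parameters are hypothetical and only mirror the rule stated above (ri is the probability of i + 1 categories).

```python
import random


def sample_category_counts(ratios, num_vars, seed=None):
    """Draw a category count for each of num_vars discrete variables.

    ratios[0] plays the role of r1, so the 1-based index i maps to
    i + 1 categories. Hypothetical helper, not part of INGOR.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1.0")
    rng = random.Random(seed)
    # Possible category counts: 2, 3, ..., N + 1
    counts = [i + 1 for i in range(1, len(ratios) + 1)]
    return rng.choices(counts, weights=ratios, k=num_vars)
```

For example, sample_category_counts([0.5, 0.3, 0.2], 10) returns ten counts drawn from 2, 3 and 4, while a single ratio of 1.0 reproduces the default of two categories for every variable.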
r=x
The ratio of discrete variables.
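As a rough illustration of how a ratio like r could drive random type assignment, the sketch below marks each variable as d with probability r and as c otherwise. This rests on two assumptions not confirmed by this document: that d and c stand for discrete and continuous, and that each variable is drawn independently (INGOR may instead fix the exact proportion). The function name assign_types is hypothetical.

```python
import random


def assign_types(num_vars, r, seed=None):
    """Return a types string such as 'd:c:c' for num_vars variables.

    Assumption: each variable is independently 'd' with probability r,
    otherwise 'c'. Hypothetical helper, not part of INGOR.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be between 0 and 1")
    rng = random.Random(seed)
    types = ["d" if rng.random() < r else "c" for _ in range(num_vars)]
    return ":".join(types)
```

The returned string uses the same colon-separated layout as the types argument, so the result of one random draw could be pinned down explicitly in a later run.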
dehybrid
Saves the generated data as a dynamic model represented by a bipartite graph.
categorical
Discrete variables are regarded as categorical values.
sd=x (default: x=0.3)
Standard deviation of the system noise. The system noise is generated and added to the calculated data using random values of the normal distribution with the specified standard deviation.
osd=x (default: x=0)
Standard deviation of the observation noise. The observation noise is generated and added to the generated data using random values of the normal distribution with the specified standard deviation. If 0 is specified, no observation noise is added.
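The difference between the two noise terms can be sketched in Python as follows. This is a simplified illustration rather than INGOR's code: the function name simulate_value and the parent_contribution argument (standing in for the value calculated from a node's parents) are hypothetical.

```python
import random


def simulate_value(parent_contribution, sd=0.3, osd=0.0, rng=None):
    """Return (latent, observed) for one node and one sample.

    latent   = calculated value + system noise      ~ N(0, sd)
    observed = latent           + observation noise ~ N(0, osd)
    Hypothetical helper, not part of INGOR.
    """
    rng = rng or random.Random()
    # System noise is added to the calculated value.
    latent = parent_contribution + rng.gauss(0.0, sd)
    # Observation noise is added afterwards; osd = 0 adds none.
    if osd > 0:
        observed = latent + rng.gauss(0.0, osd)
    else:
        observed = latent
    return latent, observed
```

With the defaults (sd=0.3, osd=0), the observed value equals the latent value, which matches the statement that no observation noise is added when 0 is specified.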
func=f1:f2:...
The list of function IDs to be assigned to edges.
use_net_func
Uses the function IDs for edges stored in the node property "model.func" of the input network.
dbn
Shrinks time-expanded DBN model data before writing it to a file. This mainly assumes time-expanded DBN networks generated by RNDNetworkFilter with the "m=dbn" option.
input=file
Input data file for simulated data. If specified, existing (non-NaN) values are regarded as given fixed data values, and simulated values are generated for variables that do not exist in the input data set and for NaN values. Variables in the input data set whose names do not exist in the network are ignored. If the input data set contains fewer samples than the value of argument n, then the k-th sample is used for the i-th generated sample, where k = m mod n and m represents the number of samples of the input data set.
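The rule for combining fixed input values with simulated ones can be sketched for a single sample as below. This shows only the selection rule (given value if present, simulated value otherwise, unknown variables ignored); how fixed values feed into the simulation of other nodes is not covered. The function name merge_fixed and the dictionary representation of a sample are assumptions made for illustration.

```python
import math


def merge_fixed(input_row, simulated_row):
    """Combine one input sample with one simulated sample.

    Both arguments map variable names to float values. The keys of
    simulated_row stand for the variables of the network.
    Hypothetical helper, not part of INGOR.
    """
    merged = dict(simulated_row)
    for name, value in input_row.items():
        if name not in simulated_row:
            continue  # variable not in the network: ignored
        if value is None or math.isnan(value):
            continue  # missing value: keep the simulated one
        merged[name] = value  # existing value: treated as fixed
    return merged
```

For instance, an input sample with a fixed value for one variable and NaN for another yields a merged sample in which only the first variable is overridden.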
input_args={key=values,...}
Arguments for input data.
rand_type={normal|uniform} (default: rand_type=normal)
EXPERIMENTAL: Type of the random value distribution for noise and for nodes with no parents. This supports only continuous variable nodes and is not supported for observation noise.