INGOR
BNDC Class Reference

The BNDC score function.

#include <BNDC.h>

Public Member Functions

void * BNDC_init (ytData *data, ytKeyValues *args)
 Initializes the BNDC score.
 
void BNDC_reinit (void *buff, ytData *data)
 Re-initializes the score function.
 
double BNDC_score (void *buff, const int j, const int *parents, const int q)
 BNDC score function.
 
int BDE_count (const BNRC_Data *D, int j, int n, const int *parents, int q, int R, BDE_Data *B)
 
void BNDC_prepare (const BNRC_Data *D, const BDE_Data *B, int j, const int *parents, int q, int R, int lv, BNRC_Data *DD, int **offset, int **nn, const double *Yd, double *YYd)
 Prepares for discrete parent and child value pattern specific modeling.
 
void BNDC_partialResidual (void *buff, const int j, const int *parents, const int q, FILE **fp, double *llar)
 Calculates partial residuals.
 
void BNDC_setEdgeProp (void *buff, const int j, const int *parents, const int q, const int k, const ytEdge *edge)
 Sets parameters from ytEdge.
 
void BNDC_calcEdgeContrib (void *buff, ytNetwork *network, int j, const int *parents, int q, int mode, double *ar)
 Calculates edge contribution for edges connected to a node.
 

Detailed Description

The BNDC score function.

The BNDC score function allows the calculation of the network scores for discrete and continuous mixed data.

See the documentation of BNDC_init() for the available arguments of the score function.

See also
BNRC

About Node and Edge Properties

After network estimation and/or score calculation, the following node and edge properties are added. These are included in the network output file.

Node properties

bndc.parents.continuous
Integer array. The list of indices of the continuous parent nodes. The origin of the index is 0.
bndc.parents.discrete
Integer array. The list of indices of the discrete parent nodes. The origin of the index is 0.
bndc.categories
String array. Available for a categorical variable (node) only. The list of categories (string representations of the categorical values) of this node. Categorical values are internally converted to discrete (integer) values beginning from 0; this array is the table that maps each integer value back to its category.
bndc.r
Integer. The number of discrete values (categorical values) of this node.
mean
Real. Available for a continuous node only. This represents the mean of the residuals. If continuous parents exist, this is 0 (zero).
variance
Real. Available for a continuous node only. This represents the variance of residuals.
bndc.cpd
Real array. Available for a discrete variable. The length of the array is R × R1 × ··· × Rq where q represents the number of discrete parent nodes, R the number of discrete values of this node, and Rk the number of discrete values of the k-th discrete parent node. This is the vectorized conditional probability table. That is, the values are probabilities of the discrete values of this node with respect to its parent value patterns. The order of the parent value patterns is the lexicographical order of parent values, i.e., (0, 0, ..., 0), (0, 0, ..., 1), ..., (0, 0, ..., Rq − 1), ..., (R1 − 1, R2 − 1, ..., Rq − 1), where (x1,...,xq) represents a single pattern of parent node values, and xk is the value of the k-th parent.
bndc.bspline
Real array. Available for a continuous node with both continuous and discrete parents. The length of the array is 22 × qc × R1 × ··· × Rq where qc represents the number of continuous parent nodes. The 22 values are a parameter set consisting of the range min, max and 20 coefficients of B-splines of a single regression curve. The first 22 × qc values of the array are the parameter sets of the qc continuous parents with respect to the first pattern of the discrete parents, and the order of patterns is the lexicographical order of discrete parent values. See the description of bndc.cpd above for the details of the order of the parent pattern.
bndc.means
Real array. Available for a continuous node with both continuous and discrete parents. The length of the array is qc × R1 × ··· × Rq. These are the mean parameters of the B-spline curves. The order of the values is the same as in bndc.bspline.
bndc.vars
Real array. Available for a continuous node with both continuous and discrete parents. These are the variance parameters of the B-spline curves. See bndc.means for the length and the order of the array.
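
The lexicographical order of parent value patterns described for bndc.cpd can be sketched as below. This is a minimal illustration, not part of the BNDC API: pattern_index() and cpd_lookup() are hypothetical helper names. The position of the node's own value r inside one pattern block is an assumption that mirrors the Njmr[r + m * R] layout documented for BDE_count().

```c
#include <assert.h>
#include <stddef.h>

/* Index of a discrete parent value pattern in lexicographical order.
 * x[k]  : value of the k-th discrete parent, 0 <= x[k] < Rk[k]
 * Rk[k] : number of discrete values of the k-th discrete parent
 * The last parent varies fastest, so (0,...,0,1) follows (0,...,0,0). */
static size_t pattern_index(const int *x, const int *Rk, int q)
{
    size_t m = 0;
    for (int k = 0; k < q; k++) {
        assert(x[k] >= 0 && x[k] < Rk[k]);
        m = m * (size_t)Rk[k] + (size_t)x[k];
    }
    return m;
}

/* ASSUMPTION: the R probabilities of one parent pattern are stored
 * contiguously, i.e. the entry for value r under pattern m is at r + m * R. */
static double cpd_lookup(const double *cpd, int R, size_t m, int r)
{
    assert(r >= 0 && r < R);
    return cpd[(size_t)r + m * (size_t)R];
}
```

With two discrete parents having 2 and 3 values, the patterns (0, 0), (0, 2), (1, 0) and (1, 2) map to indices 0, 2, 3 and 5.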

Edge properties

bspline
Real array. Available for an edge connecting continuous nodes without any discrete parents. The length of the array is 22, consisting of the range min, max, and 20 coefficients of B-splines of a single curve.
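
The 22-value parameter set (range min, max, then 20 coefficients) used by bspline and bndc.bspline can be sliced as in the following sketch. BsplineView, bspline_view() and bndc_bspline_set() are hypothetical names introduced for illustration. The order of the continuous parents inside one pattern block is assumed to follow their index order.

```c
#include <assert.h>
#include <stddef.h>

enum { BSPLINE_SET_LEN = 22, BSPLINE_NUM_COEF = 20 };

/* Hypothetical read-only view of one 22-value parameter set. */
typedef struct {
    double min;          /* lower end of the value range */
    double max;          /* upper end of the value range */
    const double *coef;  /* BSPLINE_NUM_COEF B-spline coefficients */
} BsplineView;

/* Interprets 22 consecutive values as (min, max, 20 coefficients). */
static BsplineView bspline_view(const double *set)
{
    BsplineView v = { set[0], set[1], set + 2 };
    return v;
}

/* Start of the parameter set in bndc.bspline for continuous parent c
 * (0 <= c < qc) under discrete parent pattern m. Each pattern owns a
 * block of 22 * qc values, and blocks follow the pattern order. */
static const double *bndc_bspline_set(const double *arr, int qc, size_t m, int c)
{
    assert(c >= 0 && c < qc);
    return arr + BSPLINE_SET_LEN * ((size_t)c + (size_t)qc * m);
}
```

For an edge-level bspline property, bspline_view() can be applied directly to the 22-element array.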

Member Function Documentation

◆ BDE_count()

int BDE_count ( const BNRC_Data *  D,
int  j,
int  n,
const int *  parents,
int  q,
int  R,
BDE_Data *  B 
)

Counts samples with respect to parent patterns.

This counts up the number of samples with respect to parent value patterns.

The number of samples with pattern 'm' is stored in B->Njm[m]. If R != 1, the number of samples with the parent pattern and the value of the target variable 'j' (D->Y) is stored in B->Njmr[r + m * R], where R represents the number of categories of the target variable 'j', 'm' the parent pattern, and 'r' the target value.

If R == 1, the j-th variable is continuous. In that case, B->Njmr[m * R] is identical to B->Njm[m].

Thus, the order of memory for B->Njmr is as follows (shown for R = 3 and two parent patterns):

INDEX  0  1  2  3  4  5
m      0  0  0  1  1  1
r      0  1  2  0  1  2

Parameters
y: Whether or not this counts up Njmr.
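
The counting layout described above can be sketched as follows. count_patterns() is a simplified, hypothetical stand-in and does not use the BDE_count() signature or the BNRC_Data and BDE_Data types. It assumes the parent pattern index of each sample has already been computed.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for the counting step (NOT the BDE_count() signature).
 * y[i] : target value of sample i (not read when R == 1)
 * m[i] : parent pattern index of sample i, 0 <= m[i] < M
 * Njm  : M counters, one per parent pattern
 * Njmr : M * R counters, stored at r + m * R */
static void count_patterns(const int *y, const int *m, int n, int M, int R,
                           int *Njm, int *Njmr)
{
    memset(Njm, 0, sizeof(int) * (size_t)M);
    memset(Njmr, 0, sizeof(int) * (size_t)M * (size_t)R);
    for (int i = 0; i < n; i++) {
        assert(m[i] >= 0 && m[i] < M);
        /* A continuous target (R == 1) has the single slot r = 0. */
        int r = (R == 1) ? 0 : y[i];
        assert(r >= 0 && r < R);
        Njm[m[i]]++;
        Njmr[r + m[i] * R]++;
    }
}
```

With R == 1 every sample lands in slot m * R, so Njmr and Njm hold the same counts, as stated above.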

◆ BNDC_calcEdgeContrib()

void BNDC_calcEdgeContrib ( void *  buff,
ytNetwork *  network,
int  j,
const int *  parents,
int  q,
int  mode,
double *  ar 
)

Calculates edge contribution for edges connected to a node.

gamma[0] needs to be set before calling this.

Parameters
ar: (n x q) matrix where q is the number of parents of the j-th variable. ar[i + k * n] represents the value for the k-th parent of the j-th variable at the i-th sample.
mode: 0 = RC, 1 = RCr, 2 = ECv
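
The ar[i + k * n] indexing above is a column-major layout: all n samples of one parent are contiguous. A minimal sketch of reading such a matrix follows. ar_at() and ar_column_means() are hypothetical helpers, and the per-parent mean is only an illustrative aggregate, not something the library is documented to compute.

```c
#include <assert.h>
#include <stddef.h>

/* Value for the k-th parent at the i-th sample of the (n x q) matrix ar. */
static double ar_at(const double *ar, int n, int q, int i, int k)
{
    assert(i >= 0 && i < n && k >= 0 && k < q);
    return ar[(size_t)i + (size_t)k * (size_t)n];
}

/* Illustrative aggregate: mean value per parent over all n samples.
 * out must have room for q values. */
static void ar_column_means(const double *ar, int n, int q, double *out)
{
    assert(n > 0);
    for (int k = 0; k < q; k++) {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += ar_at(ar, n, q, i, k);
        out[k] = sum / (double)n;
    }
}
```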

◆ BNDC_init()

void * BNDC_init ( ytData *  data,
ytKeyValues *  args 
)

Initializes the BNDC score.

Arguments

prop=n

Property output type. This changes properties to be stored in edges and nodes after the estimation.
0: standard.
1: less information (simple), for data sets that contain only continuous or only discrete variables.

hyper_bg=x
hb=x
hb value for the hyperparameter range. The hyperparameter β is determined by a grid search. The grid search starts from β = 10^hb and then decreases the value as β = 10^(hb − i × hi), where i = 1, 2, ..., hn. (default: x = 2.0)
hyper_inc=x
hi=x
hi value for hyperparameter range. See explanation of hyper_bg. (default: x = 0.4)
hyper_n=n
hn=n
hn value for hyperparameter range. See explanation of hyper_bg. (default: n = 21)
linear
Linear mode. This is short for "hb=2.0,hi=1.0,hn=2".
lv=n
Precalculation level of the BNRC score. n can be set from 0 to 3. (default: n = 3)
outer=x
Width of the outer region of the value range for B-spline nonparametric regression. (default: x = 0.0000001)
ecv_clip=x
Replaces the clipped edge contribution value with the specified value. Value x can be a specific real value or "nan".
max_loops=n
The maximum number of loops for parameter estimation by the back fitting algorithm. (default: n = 100)
stop
If specified, the algorithm stops when the parameter estimation does not converge.
verbose=n
v=n
Verbose level. (default: n = 0)
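
The hyperparameter grid controlled by hb, hi and hn can be sketched as follows. This assumes the exponent of 10 decreases linearly, i.e. β = 10^(hb − i × hi), and that the grid contains the starting value plus hn decrements; both are readings of the description of hyper_bg, not confirmed behavior. beta_log10_grid() is a hypothetical helper that returns the exponents only, so that β itself is obtained as pow(10.0, e[i]) from <math.h>.

```c
#include <assert.h>

/* Fills e[0..hn] with the base-10 exponents of the beta grid:
 *   e[i] = hb - i * hi,  i = 0, 1, ..., hn
 * ASSUMPTION: the grid holds the start value (i = 0) plus hn decrements,
 * so e must have room for hn + 1 values. Returns the number of grid points. */
static int beta_log10_grid(double hb, double hi, int hn, double *e)
{
    assert(hn >= 0);
    for (int i = 0; i <= hn; i++)
        e[i] = hb - (double)i * hi;
    return hn + 1;
}
```

Under these assumptions, the defaults (hb = 2.0, hi = 0.4, hn = 21) span exponents from 2.0 down to about −6.4, and the linear mode (hb = 2.0, hi = 1.0, hn = 2) gives the exponents 2, 1 and 0.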

The following keys are used internally.

max_parents=n
mp=n

Maximum parents. Basically, this is set automatically by the structure search algorithm.

max_cont_parents=n

Maximum number of continuous parents.

max_disc_parents=n

Maximum number of discrete parents.

xl
DoubleArray instance that defines the minimum (left-most) values of the value ranges for modeling with B-splines.
xr

DoubleArray instance that defines the maximum (right-most) values of the value ranges for modeling with B-splines.

max_mem
size_t value of the maximum memory.
Parameters
data
args
Returns
Pointer to the score buffer.

◆ BNDC_partialResidual()

void BNDC_partialResidual ( void *  buff,
const int  j,
const int *  parents,
const int  q,
FILE **  fp,
double *  llar 
)

Calculates partial residuals.

This assumes BNDCBuff::gamma (bf->gamma) to be set already by BNDC_score() with the same j-th node and its parents.

This uses (and overwrites) the following working variables.

  • RS
  • B_gamma
  • tmp_B_gamma
  • T_B

Outputs input, target and partial residuals.

  • fp[0] : .X
  • fp[1] : .Y
  • fp[2] : .PR.Y
  • fp[3] : .LL (Log likelihood)
  • fp[4] : .Z (Z score)
  • fp[5] : .D (sample deviance)
Parameters
buff: pointer to the score buffer (BNDCBuff instance).
llar: pointer to the array to store log likelihoods for particular primary IDs. This is not applicable if the dataset does not have a primary ID. NULL is acceptable; in that case, primary-ID-specific log likelihoods are not calculated.

◆ BNDC_prepare()

void BNDC_prepare ( const BNRC_Data *  D,
const BDE_Data *  B,
int  j,
const int *  parents,
int  q,
int  R,
int  lv,
BNRC_Data *  DD,
int **  offset,
int **  nn,
const double *  Yd,
double *  YYd 
)

Prepares for discrete parent and child value pattern specific modeling.

Call BDE_count() before calling this.

Parameters
D: original data matrices.
DD: resultant displaced data matrices.
nn: work area for displacing samples.

◆ BNDC_reinit()

void BNDC_reinit ( void *  buff,
ytData *  data 
)

Re-initializes the score function.

This is applicable only when nothing but the values in data has changed.

Parameters
buff: pointer to the buffer returned by BNDC_init().
data: Only ytData::X and ytData::Y are used.

◆ BNDC_score()

double BNDC_score ( void *  buff,
const int  j,
const int *  parents,
const int  q 
)

BNDC score function.

This is thread-safe with OpenMP multi-threading. Thread-dependent working memory is allocated in the buff returned by BNDC_init(). Therefore, you do not need to initialize and obtain a separate score buffer for each thread.

Parameters
buff: score buffer returned by BNDC_init().
j: target node.
parents: parent indices.
q: length of parents.
Returns
BNDC score

◆ BNDC_setEdgeProp()

void BNDC_setEdgeProp ( void *  buff,
const int  j,
const int *  parents,
const int  q,
const int  k,
const ytEdge *  edge 
)

Sets parameters from ytEdge.

BNDC_setNodeProp() should be called before calling this.

Sets coefficients (gamma) from "bspline" properties, and calculates B gamma (B_gamma) where B represents the design matrix.

By calling this, BNDC_partialResidual() can calculate valid partial residuals without the score calculation.


The documentation for this class was generated from the following file: