math - Calculation of probabilities in Naive Bayes in C#


I'm working on a Naive Bayes solution in C# where there are 2 possible outcomes. I have found a small sample of code and was wondering if anyone would be able to explain the last line.

The analyzer is finding the probability that a word belongs to 1 of 2 categories.

cat1count is the number of times the word is found in category 1 (if the word is found 2 times in category 1, bw would be 2 / total words in category 1)

cat1total = total number of words in category 1

As I understand it, bw is the probability of the word in category 1, and gw is the probability of the word in category 2.

pw and fw are where I start to get a bit lost. The full source code can be found here.

    float bw = cat1count / cat1total;
    float gw = cat2count / cat2total;
    float pw = ((bw) / ((bw) + (gw)));
    float
        s = 1f,
        x = .5f,
        n = cat1count + cat2count;
    float fw = ((s * x) + (n * pw)) / (s + n);

What is fw? I understand what bw, gw, and pw are.

This code is called over and over again for each particular word w in the text (e.g. a tweet) being analyzed. The variables are conditional probabilities estimated using frequencies.

bw is the probability that word w is seen, given that the word is in a category 1 text.

gw is the probability that word w is seen, given that the word is in a category 2 text.

pw rescales the probability bw so that rarely seen words are on a similar scale to frequently seen words (mathematically, the division indicates that pw is a conditional probability).
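As a rough sketch of those first three lines, the helper below computes bw, gw and pw from raw counts. The class and method names (`WordStats`, `WordRatio`) and the int parameter types are assumptions for illustration; the snippet in the question does not show how the counts are declared. If they are ints, they need a cast before dividing, otherwise C# performs integer division.

```csharp
using System;

public static class WordStats
{
    // Hypothetical helper, not part of the original analyzer.
    // Returns pw = bw / (bw + gw) for a single word.
    public static float WordRatio(int cat1Count, int cat1Total, int cat2Count, int cat2Total)
    {
        // Cast before dividing: int / int would truncate to 0 for most words.
        float bw = (float)cat1Count / cat1Total;
        float gw = (float)cat2Count / cat2Total;

        // If the word was never seen in either category, bw + gw is 0
        // and the result is NaN, so callers should handle that case.
        return bw / (bw + gw);
    }
}
```

With these assumptions, a rare word (`WordRatio(2, 1000, 1, 2000)`) and a common word (`WordRatio(200, 1000, 100, 2000)`) both come out at roughly 0.8, which is the rescaling effect described above.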

fw just shifts the scale so that pw can't be 0 (or one). If, for example, pw=0 and n=10, then fw = ((1 * 0.5) + (10 * 0)) / (1 + 10) = 0.045. (In general, a good way to understand code is to play around with different numbers and see what happens.)
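To play with the numbers, the fw line can be wrapped in a small function. This is only a sketch: the names `Smoothing` and `Fw` are made up here, and reading s as the weight given to the default guess x is my interpretation of the arithmetic rather than something stated in the source.

```csharp
using System;

public static class Smoothing
{
    // Same arithmetic as the fw line in the question.
    // n is how many times the word was seen in total (cat1count + cat2count).
    // s and x default to the values used in the question (1 and 0.5).
    public static float Fw(float pw, float n, float s = 1f, float x = 0.5f)
    {
        // Weighted average of x (weight s) and pw (weight n).
        return ((s * x) + (n * pw)) / (s + n);
    }
}
```

`Smoothing.Fw(0f, 10f)` gives about 0.045 and `Smoothing.Fw(1f, 10f)` gives about 0.955, so the result stays strictly between 0 and 1. For a word that has never been seen (n = 0), the result is simply x = 0.5, and as n grows fw moves closer to pw.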

In Naive Bayes, as you may know, the conditional probabilities are multiplied together (in this case via the logprobability function in the GitHub analyzer.cs file you pointed me at), so you're in trouble if you have a 0 conditional probability anywhere in the multiplications, because the end result will be zero. So, it's common practice to substitute a small number instead of zero, which is the purpose of fw.
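The effect of a single zero is easy to see in a toy example. The sketch below is not the analyzer's own code; `ZeroProblem`, `Product` and `LogSum` are hypothetical names, and the log version is included only because summing logs is a common way to carry out the same multiplication.

```csharp
using System;

public static class ZeroProblem
{
    // Multiplies the per-word probabilities directly.
    public static double Product(double[] probs)
    {
        double result = 1.0;
        foreach (double p in probs)
        {
            result *= p;
        }
        return result;
    }

    // Sums the logs of the probabilities, which equals the log of the product.
    public static double LogSum(double[] probs)
    {
        double sum = 0.0;
        foreach (double p in probs)
        {
            sum += Math.Log(p); // Math.Log(0) is negative infinity
        }
        return sum;
    }
}
```

`ZeroProblem.Product(new[] { 0.9, 0.8, 0.0 })` is 0 no matter what the other words say, and `ZeroProblem.LogSum` on the same input is negative infinity. Replacing the 0.0 with the smoothed value 0.045 gives a product of about 0.0324, so the other words still contribute to the result.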

