I'm working on a naive Bayes solution in C# where there are 2 possible outcomes. I have found a small piece of sample code but was wondering if anyone would be able to explain the last line.
The analyzer is finding the probability that a word belongs to 1 of 2 categories.
cat1count is the number of times the word is found in category 1 (if the word is found 2 times in category 1, that gives 2 / total words in category 1).
cat1total is the total number of words in category 1.
As I understand it, bw is the probability that the word is in category 1, and gw is the probability that the word is in category 2.
pw and fw are where I start to get a bit lost. The full source code can be found here.
```csharp
float bw = cat1count / cat1total;
float gw = cat2count / cat2total;
float pw = ((bw) / ((bw) + (gw)));
float s = 1f, x = .5f, n = cat1count + cat2count;
float fw = ((s * x) + (n * pw)) / (s + n);
```

What is fw? I understand what bw, gw, and pw are.
This code is called over and over again for each particular word w in the text (e.g. a tweet) being analyzed. The variables are conditional probabilities estimated using frequencies.
bw is the probability that word w is seen, given that the word is in a category 1 text.
gw is the probability that word w is seen, given that the word is in a category 2 text.
pw rescales the probability bw so that seen words are put on a similar scale to one another (mathematically, the division bw / (bw + gw) indicates that pw is a conditional probability, and it always lies between 0 and 1).
fw shifts the scale of pw so that the result can't be 0 (or 1). If, for example, pw = 0 and n = 10, then fw = ((1 * 0.5) + (10 * 0)) / (1 + 10) = 0.5 / 11 ≈ 0.045. (In general, a good way to understand code is to play around with different numbers and see what happens.)
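To play around with different numbers, a minimal sketch like the following may help. It is not the code from the repository: the class and method names (`WordScore`, `RawProbability`, `Smooth`) are made up for illustration, the counts are assumed to be integers (hence the casts before dividing), and the fallback to x when n is 0 is my own assumption to avoid the 0 / 0 that a never-seen word would otherwise produce.

```csharp
using System;

public static class WordScore
{
    // s (weight given to the prior) and x (the prior itself), with the values from the question.
    public const float S = 1f;
    public const float X = 0.5f;

    // Raw estimate pw = bw / (bw + gw), computed from per-category counts.
    public static float RawProbability(int cat1Count, int cat1Total, int cat2Count, int cat2Total)
    {
        // Cast before dividing so integer counts do not trigger integer division.
        float bw = (float)cat1Count / cat1Total;
        float gw = (float)cat2Count / cat2Total;
        return bw / (bw + gw);
    }

    // Smoothed estimate fw = (s * x + n * pw) / (s + n).
    public static float Smooth(float pw, float n)
    {
        // Assumption (not in the original code): a word seen in neither
        // category (n == 0) simply falls back to the prior x.
        if (n == 0f) return X;
        return ((S * X) + (n * pw)) / (S + n);
    }
}
```

For instance, `WordScore.Smooth(0f, 10f)` comes out at roughly 0.045, and `WordScore.Smooth(1f, 10f)` at roughly 0.955, so the smoothed value stays strictly between 0 and 1. The larger n is, the closer fw gets to pw.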
In naive Bayes, as you may know, the conditional probabilities are multiplied together (in this case via the logprobability function in the GitHub analyzer.cs file you pointed me at), and you're in trouble if you have a 0 conditional probability anywhere in the multiplications, because the end result will be zero. So, it's common practice to substitute a small number instead of zero, which is the purpose of fw.
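To see why a single zero is a problem, here is a small illustrative sketch. I have not reproduced the repository's logprobability function, so `NaiveBayesDemo`, `Product` and `LogSum` are hypothetical names, and summing logarithms is only the usual way such a function is written, not necessarily what that file does.

```csharp
using System;
using System.Linq;

public static class NaiveBayesDemo
{
    // Multiply the per-word probabilities directly.
    // A single 0 anywhere makes the whole product 0.
    public static double Product(double[] probs) =>
        probs.Aggregate(1.0, (acc, p) => acc * p);

    // Sum of logarithms: the log of the same product.
    // A probability of 0 contributes Math.Log(0), which is negative infinity.
    public static double LogSum(double[] probs) =>
        probs.Sum(p => Math.Log(p));
}
```

With the probabilities { 0.9, 0.8, 0.0 } the product is exactly 0 and the log sum is negative infinity, no matter how strong the other words are. Replacing the 0.0 with a small smoothed value such as 0.045 keeps both results finite.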