I'm trying to get a spellchecker that I came across online to work, but I'm having no luck. Any help is appreciated. The original code is at http://norvig.com/spell-correct.html
    import re, collections, codecs

    def words(text): return re.findall('[a-z]+', text.lower())

    def train(features):
        model = collections.defaultdict(lambda: 1)
        for f in features:
            model[f] += 1
        return model

    file = codecs.open('c:\88888\88888\88888\88888\8888\a word.txt', encoding='utf-8', mode='r')
    nwords = train(words(file.read()))

    alphabet = 'abcdefghijklmnopqrstuvwxyz'

    def edits1(word):
        splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes    = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
        replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
        inserts    = [a + c + b for a, b in splits for c in alphabet]
        return set(deletes + transposes + replaces + inserts)

    def known_edits2(word):
        return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in nwords)

    def known(words): return set(w for w in words if w in nwords)

    def correct(word):
        candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
        return max(candidates, key=nwords.get)

Error:
      File "c:\8888\8888\8888\8888\88888\spellcheck.py", line 11
        file = codecs.open('c:\888\888\888\8888\88888\a word.txt', encoding='utf-8', mode='r')
                          ^
    SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
OK, let's try this... If the string value contains something like '\x', try writing it as a raw string, or try doubling the backslash.
string('\x.....') returns an error, right?
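A minimal sketch of the usual ways to write a Windows path literal so that Python does not read the backslashes as escape sequences. The path below is hypothetical and is not the one from the question; `ntpath` is the standard library's Windows flavour of `os.path` and is used only so the comparison also runs on non-Windows machines.

```python
import ntpath

# Hypothetical path, for illustration only.
raw_path = r'C:\Users\example\a word.txt'        # raw string: backslashes are kept as typed
escaped_path = 'C:\\Users\\example\\a word.txt'  # every backslash doubled
forward_path = 'C:/Users/example/a word.txt'     # forward slashes need no escaping

# The raw and the doubled spelling produce the same string.
print(raw_path == escaped_path)

# Normalising the forward-slash spelling with Windows rules gives the same path.
print(ntpath.normpath(forward_path) == raw_path)
```

Any of the three spellings can be passed to `codecs.open` or `open`; the raw string is usually the least error-prone when a path is pasted from Explorer.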
So if you have a string defined as, say,
x = string('\y\o\u \c\a\n \n\e\v\e\r \c\h\a\n\g\e \t\h\i\s \i\n \p\y\t\h\o\n') then you are out of luck. It's a bummer if the user decides to type a '\' character as part of the input.
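One point worth checking before worrying about user input: Python processes escape sequences only in string literals that appear in source code. Text that arrives at run time (from `input()`, a file, and so on) keeps its backslashes unchanged. The sketch below simulates a typed line with `io.StringIO`; the helper name `read_user_text` and the sample text are assumptions made for illustration.

```python
import io

def read_user_text(stream):
    """Read one line from a text stream, much as input() reads from the keyboard."""
    return stream.readline().rstrip('\n')

# Simulated keyboard input: the characters a, backslash, n, b, then Enter.
fake_keyboard = io.StringIO('a' + chr(92) + 'nb' + '\n')
typed = read_user_text(fake_keyboard)

print(len(typed))        # 4: the backslash and the 'n' stay separate characters
print(typed == 'a\nb')   # False: no newline was created from the typed text
print(typed == r'a\nb')  # True: identical to the raw literal
```

So the escape problem is limited to paths written directly in the script; a path typed by the user can be passed to `open` as it is.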
To fix the problem, try using looping or recursive code like the approach in: How to remove illegal characters from path and filenames?
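A looping approach of that kind might look like the sketch below. It is not the code from the linked question; the function name `remove_illegal_chars` is hypothetical, and the character set assumes the characters Windows reserves in file names (`< > : " / \ | ? *` plus control characters). It is meant for a single file name, not a full path, since it would also strip the separators.

```python
# Characters that Windows does not allow in a file name (assumed set).
ILLEGAL_CHARS = set('<>:"/\\|?*')

def remove_illegal_chars(name):
    """Return name with reserved and control characters dropped."""
    kept = []
    for ch in name:
        # Skip reserved punctuation and ASCII control characters (code points below 32).
        if ch in ILLEGAL_CHARS or ord(ch) < 32:
            continue
        kept.append(ch)
    return ''.join(kept)

print(remove_illegal_chars('a word?.txt'))
```

Note that this cleans up a name; it does not cure the `unicodeescape` error, which is raised while the script is being compiled, before any such function can run.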