I'm downloading HTML files from the web in different encodings (which I know beforehand), and I need to write them to a file encoded as UTF-8. Some stuff is omitted for brevity:
    try {
        url = new URL(urlString);
        is = url.openStream();
        // Decode the incoming bytes using the page's known charset
        buf = new BufferedReader(new InputStreamReader(is, charset));
        while ((line = buf.readLine()) != null) {
            text.append(line);
        }
    } catch (MalformedURLException mue) {
        ...
    } finally {
        ...
    }
    return text.toString();

If Strings in Java are encoded as UTF-16, this should read the whole page in the given charset (e.g. windows-1252) and store it in a String object (in this case a StringBuilder), re-encoded as UTF-16.
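To make that reasoning concrete, here is a minimal sketch of the decode and re-encode step on raw bytes, without any network or file access. The class name `CharsetRoundTrip` and its methods are hypothetical helpers, not part of the code above; it assumes the source bytes really are in the charset passed as `from`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetRoundTrip {
    // Decode the bytes with the source charset into a String (UTF-16 in
    // memory), then encode that String with the target charset.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String decoded = new String(input, from);
        return decoded.getBytes(to);
    }

    // Render bytes as space-separated upper-case hex for inspection.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // 0xC4 is "Ä" in windows-1252.
        byte[] source = { (byte) 0xC4 };
        byte[] utf8 = transcode(source, Charset.forName("windows-1252"),
                StandardCharsets.UTF_8);
        System.out.println(hex(utf8)); // prints "C3 84"
    }
}
```

The single windows-1252 byte becomes two bytes in UTF-8, so the reasoning holds as long as the charset handed to the reader matches what the server actually sent.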
Now I write the exact same string to a file:

    File file = new File(savePathHtml + fileName);
    try {
        // Encode the in-memory String as UTF-8 when writing it out
        FileUtils.writeStringToFile(file, text, "utf-8");
    } catch (IOException ex) {
        logger.error(ex);
    }

When opening the file, there is gibberish and odd symbols, indicating that the encoding got messed up somehow (e.g. where an Ä should appear).
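One way such gibberish can arise, sketched below, is that the file is written correctly as UTF-8 but then decoded by the viewer with a single-byte charset. The class `MojibakeDemo` and its `misread` method are hypothetical names for illustration, and windows-1252 as the viewer's charset is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Encode text with one charset, then decode the resulting bytes with
    // another, which is what a viewer does when it guesses the wrong encoding.
    static String misread(String text, Charset writtenAs, Charset readAs) {
        byte[] onDisk = text.getBytes(writtenAs);
        return new String(onDisk, readAs);
    }

    public static void main(String[] args) {
        String shown = misread("\u00C4", StandardCharsets.UTF_8,
                Charset.forName("windows-1252"));
        // Print code points so the result does not depend on the console encoding.
        for (char c : shown.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // prints U+00C3, then U+201E
        }
    }
}
```

Here the two UTF-8 bytes of "Ä" (C3 84) are shown as the two characters "Ã„", which is a display problem rather than a problem with the bytes in the file.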
Have I misunderstood how encoding works when working with files or strings?
I've discovered that none of my text editors can correctly identify the encoding when opening the downloaded files.
If I open the file and select UTF-8 as the encoding, it looks fine.
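Since the file looks fine once UTF-8 is selected manually, a strict decoding check can help confirm whether the bytes on disk are well-formed UTF-8, independent of any editor's guess. The following is a minimal sketch; `Utf8Check` is a hypothetical helper, and passing the check only shows the bytes are valid UTF-8, not that the text is what was intended.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Utf8Check {
    // Returns true if the bytes decode as UTF-8 without any malformed
    // sequence. REPORT makes the decoder throw instead of silently
    // substituting the replacement character.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Convenience overload that checks a whole file read into memory.
    static boolean isValidUtf8(Path file) throws IOException {
        return isValidUtf8(Files.readAllBytes(file));
    }
}
```

If a downloaded file passes this check, the write step probably did its job and the editors are simply guessing the wrong encoding when they open it.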