Java - Encodings get messed up when writing to file


I'm downloading HTML files from the web in different encodings (which I know beforehand) and need to write them to a file encoded as UTF-8. Stuff omitted for brevity:

    try {
        url = new URL(urlstring);
        is = url.openStream();
        // Decode the response with the charset known for this page
        buf = new BufferedReader(new InputStreamReader(is, charset));
        while ((line = buf.readLine()) != null) {
            text.append(line);
        }
    } catch (MalformedURLException mue) {
        ...
    } finally {
        ...
    }
    return text.toString();

If strings in Java are encoded as UTF-16, this should read the whole page using the given charset (e.g. windows-1252) and store it in a String object (in this case a StringBuilder), re-encoded as UTF-16.
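To make that expectation concrete, here is a minimal sketch of the decode and re-encode steps in isolation. The class and method names (`CharsetRoundTrip`, `decode`, `toUtf8`) are made up for illustration and are not part of the original code; the single byte 0xE4 stands in for a downloaded page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the question: shows decode -> String -> UTF-8.
final class CharsetRoundTrip {
    /** Decodes raw bytes using the charset the page was served in. */
    static String decode(byte[] raw, Charset source) {
        return new String(raw, source);
    }

    /** Encodes the in-memory string as UTF-8 bytes, ready to be written out. */
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xE4 is the letter a-umlaut (U+00E4) in windows-1252.
        byte[] raw = {(byte) 0xE4};
        String text = decode(raw, Charset.forName("windows-1252"));
        byte[] utf8 = toUtf8(text);
        System.out.println(text.equals("\u00E4")); // true
        System.out.println(utf8.length);           // 2 (bytes C3 A4)
    }
}
```

If the charset passed to `decode` matches the bytes, the String holds the right characters regardless of how it is stored internally, and the UTF-8 output is correct.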

Now I write the exact same string to a file:

    File file = new File(savepathhtml + filename);
    try {
        // Encode the string as UTF-8 when writing it out
        FileUtils.writeStringToFile(file, text, "UTF-8");
    } catch (IOException ex) {
        logger.error(ex);
    }
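For comparison, the same write can be sketched with the JDK alone, which makes it easy to inspect the bytes that actually land on disk. This assumes `FileUtils` above is the Apache Commons IO class (the question does not say); `Utf8FileWrite` and its methods are hypothetical names, and a temporary file stands in for `savepathhtml + filename`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not from the question.
final class Utf8FileWrite {
    /** Writes text to target as UTF-8, replacing any existing content. */
    static void write(Path target, String text) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    /** Writes text to a temporary file and returns the raw bytes found on disk. */
    static byte[] bytesOnDisk(String text) {
        try {
            Path tmp = Files.createTempFile("utf8-demo", ".html");
            try {
                write(tmp, text);
                return Files.readAllBytes(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+00E4 should come out as the two bytes C3 A4.
        for (byte b : bytesOnDisk("\u00E4")) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```

Checking the bytes this way separates "the file was written incorrectly" from "the file is being displayed incorrectly".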

When opening the file, there's gibberish and odd symbols, indicating the encoding somehow got messed up (e.g.   turns into Ä ).

Have I misunderstood how encoding works when working with files or strings?

I've discovered that none of my text editors can correctly identify the encoding when opening the downloaded files.

If I open the file and select the encoding UTF-8, it looks fine.
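That observation is consistent with the file being valid UTF-8 and the editor guessing a different charset. A small sketch of the effect, assuming the editor falls back to windows-1252 (an assumption, the actual fallback is unknown); `MojibakeDemo` and `misread` are hypothetical names:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo, not from the question.
final class MojibakeDemo {
    /** Encodes text as UTF-8, then decodes those bytes with the guessed charset. */
    static String misread(String original, Charset guess) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, guess);
    }

    public static void main(String[] args) {
        String original = "\u00E4"; // a-umlaut, stored as bytes C3 A4 in UTF-8
        String garbled = misread(original, Charset.forName("windows-1252"));
        // Each UTF-8 byte is shown as its own character: U+00C3 then U+00A4.
        System.out.println(garbled.equals("\u00C3\u00A4"));  // true
        // With the right charset the text survives unchanged.
        String intact = misread(original, StandardCharsets.UTF_8);
        System.out.println(intact.equals(original));          // true
    }
}
```

The bytes are identical in both cases; only the charset used to interpret them differs.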

