concat task on group of utf 8 files w/ BOM 2007-08-01 - By peter reilly
On 8/1/07, Dominique Devienne <ddevienne@(protected)> wrote:
> On 8/1/07, Peter Reilly <peter.kitt.reilly@(protected)> wrote:
> > I do not think that filter chains will help here as they
> > operate on Readers and not on input streams.
>
> Actually, that may be why it "should" work. Java knows about optional
> BOMs and does the right thing, as long as you tell it that the
> encoding is UTF-16.
The encoding is not UTF-16; it is UTF-8. Having a BOM in UTF-8 makes no sense, as UTF-8 code units are single bytes and a single byte has no byte order. However, the Unicode standard allows it as an optional feature (the bytes EF BB BF at the start of the text). See: http://issues.apache.org/bugzilla/show_bug.cgi?id=28049 and http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058
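To make the UTF-16 vs UTF-8 difference concrete, here is a small sketch (the class name `BomDecodeDemo` is made up for illustration) that decodes the same text with a leading BOM through `InputStreamReader`, once as UTF-8 and once as UTF-16. On a current JDK the UTF-16 decoder consumes the BOM to pick the byte order, while the UTF-8 decoder passes it through as the character U+FEFF, which is what would end up in the middle of a concatenated file. Older JVMs may behave differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomDecodeDemo {
    // Decode all bytes with the given charset through an InputStreamReader.
    static String decode(byte[] bytes, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), cs)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // UTF-8 BOM (EF BB BF) followed by "hi"
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        // UTF-16 big-endian BOM (FE FF) followed by "hi"
        byte[] utf16 = {(byte) 0xFE, (byte) 0xFF, 0, 'h', 0, 'i'};

        // The UTF-8 decoder keeps the BOM as U+FEFF: "\uFEFFhi"
        System.out.println("UTF-8 length:  " + decode(utf8, StandardCharsets.UTF_8).length());   // 3
        // The UTF-16 decoder uses the BOM for byte order and drops it: "hi"
        System.out.println("UTF-16 length: " + decode(utf16, StandardCharsets.UTF_16).length()); // 2
    }
}
```

So telling Java the encoding is UTF-16 would indeed eat the BOM, but that does not apply to UTF-8 input.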
Some Windows programs (notepad.exe, for example) write this byte sequence to indicate that a text file is UTF-8 encoded rather than in the default Windows encoding.
XML parsers in Java know about the UTF-8 BOM, but the standard Java InputStreamReader does not. In old versions of Java it would throw an exception; newer versions convert it to a ? (I think).
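Since filter chains only see Readers, one workaround is to strip the BOM at the byte level, before any Reader decodes it. The following is a minimal sketch, assuming Java 9 or later (for `InputStream.transferTo` and `List.of`); the class and method names (`Utf8BomConcat`, `skipUtf8Bom`, `concat`) are hypothetical and are not part of Ant or the JDK. It peeks at the first three bytes of each input with a `PushbackInputStream` and pushes them back unless they are EF BB BF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class Utf8BomConcat {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns a stream positioned after a leading UTF-8 BOM, if one is present.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean isBom = n == UTF8_BOM.length
                && head[0] == UTF8_BOM[0] && head[1] == UTF8_BOM[1] && head[2] == UTF8_BOM[2];
        if (!isBom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: push the bytes back
        }
        return pin;
    }

    // Copies each input to out in order, dropping a leading BOM from every one.
    static void concat(List<InputStream> inputs, OutputStream out) throws IOException {
        for (InputStream in : inputs) {
            try (InputStream body = skipUtf8Bom(in)) {
                body.transferTo(out); // Java 9+
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b'};
        byte[] noBom = {'c', 'd'};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        concat(List.of(new ByteArrayInputStream(withBom), new ByteArrayInputStream(noBom)), out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8)); // abcd
    }
}
```

Working on raw bytes sidesteps the question of what a particular JVM's UTF-8 decoder does with U+FEFF; the concatenated output carries no BOM at all.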
Peter
> Alex, try playing with 'encoding' and 'outputencoding' attributes of
> <concat> to see if that gets rid of the BOMs. I suspect the BOMs will
> be "eaten up" for the char decoder and no longer appear in the
> streams. --DD
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@(protected)
> For additional commands, e-mail: user-help@(protected)