concat task on group of utf 8 files w/ BOM


2007-08-01       - By peter reilly

On 8/1/07, Dominique Devienne <ddevienne@(protected)> wrote:
> On 8/1/07, Peter Reilly <peter.kitt.reilly@(protected)> wrote:
> > I do not think that filter chains will help here as they
> > operate on Readers and not on input streams.
>
> Actually, that may be why it "should" work. Java knows about optional
> BOMs and does the right thing, as long as you tell it that the
> encoding is UTF-16.

The encoding is not UTF-16, it is UTF-8. A BOM in UTF-8 makes no
sense as a byte-order indicator, since a single byte has no byte order.
However, the Unicode standard allows it as an optional feature.
See: http://issues.apache.org/bugzilla/show_bug.cgi?id=28049 and
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058

Some Windows programs (notepad.exe, for example) write this byte sequence
(EF BB BF) to indicate that the text file is UTF-8 encoded, as opposed to
the default Windows encoding.
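
To check whether a given file carries this marker, it is enough to look at its
first three bytes. A minimal sketch in Java, assuming the file contents have
already been read into a byte array; the class and method names are made up
for illustration and are not part of Ant or the JDK:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not an Ant or JDK class.
final class Utf8BomCheck {

    /** Returns true if the data starts with the three-byte UTF-8 BOM (EF BB BF). */
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        byte[] plain = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] marked = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(hasUtf8Bom(plain));   // false
        System.out.println(hasUtf8Bom(marked));  // true
    }
}
```

The masking with 0xFF is needed because Java bytes are signed, so 0xEF would
otherwise compare as a negative number.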

XML readers in Java know about UTF-8 BOMs, but the standard Java
InputStreamReader does not. In old versions of Java it would throw an
exception; newer versions convert it to a ? (I think).
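
For what it is worth, on the JDKs I know of, the UTF-8 decoder neither rejects
nor drops the BOM: it hands it to the Reader as the character U+FEFF, which can
end up as a ? if the text is later written in an encoding that has no such
character. A rough sketch of decoding and then dropping it by hand; the class
and method names are hypothetical, and this is not how <concat> is implemented:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration; not part of Ant.
final class BomStrippingDemo {

    /** Decodes bytes as UTF-8 through a plain InputStreamReader. */
    static String decodeUtf8(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    /** Removes a single leading U+FEFF, if present; other text is unchanged. */
    static String stripLeadingBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        byte[] marked = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        String decoded = decodeUtf8(marked);
        // The BOM survives decoding as one extra character in front of "hi".
        System.out.println(decoded.length());
        System.out.println(Integer.toHexString(decoded.charAt(0)));
        System.out.println(stripLeadingBom(decoded));
    }
}
```

When concatenating several such files, the stripping would have to be applied
to each file's text before appending it, otherwise a U+FEFF ends up in the
middle of the output wherever a new file starts.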

Peter

>
> Alex, try playing with 'encoding' and 'outputencoding' attributes of
> <concat> to see if that gets rid of the BOMs. I suspect the BOMs will
> be "eaten up" for the char decoder and no longer appear in the
> streams. --DD
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@(protected)
> For additional commands, e-mail: user-help@(protected)
>
>
