containsregex and concat 2006-11-29 - By George Bills
Thanks Gilbert - except that doesn't work when tables span more than one line. byline="true" splits the text into one token per line, and the regex is applied independently to each token. So if the start of the expression (<table>) is on line one, the middle of the expression (<tr>blah</tr>...etc) is on line two, and the end of the expression (</table>) is on line three, then no individual line / token matches, so nothing comes out (correct me if I'm wrong, but that's what seemed to happen in my testing). If byline is false, the entire text is one big token - so if I match the token, I get the entire token (the original input) back. Also, I wanted the entire table, not just the contents. I tried using replace="\0", but that just means that within the token I'm replacing the matching text with the matching text - not very useful.
What I really wanted was a way of saying "give me the matching text and only the matching text, not the token that matches". I sort of solved it by writing a regular expression to match the entirety of the input: (.*?)(<table[^</]*class="summary"[^</]*>(.*?)</table>)(.*) . With the Ant (XML attribute) encoding that ends up as: "(.*?)(&lt;table[^&lt;/]*class=&quot;summary&quot;[^&lt;/]*>(.*?)&lt;/table>)(.*)". I can't assume that each table will only take one line of input (in fact, they won't), but since I know that there's only one table in each input file, I can match the entire file and use replace="\2" to replace the entire match (all input) with the second matching group (the table).
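As a rough sketch of the whole-input match described above (not the exact build file from this thread): the file path and the `summary` property are borrowed from Gilbert's quoted example, the project and target names are made up, and the pattern is kept in a `summary.regex` property with `<` and `>` XML-escaped. The `s` flag is an assumption so that `.` also matches newlines.

```xml
<project name="extract-summary" default="main">
  <!-- Assumed property; the value is single-quoted so the inner double quotes need no escaping. -->
  <property name="summary.regex"
            value='(.*?)(&lt;table[^&lt;/]*class="summary"[^&lt;/]*&gt;(.*?)&lt;/table&gt;)(.*)'/>

  <target name="main">
    <loadfile srcfile="Y:/test.html" property="summary">
      <filterchain>
        <!-- byline="false": the whole file is one token, so the pattern can span lines. -->
        <containsregex flags="s"
                       pattern="${summary.regex}"
                       replace="\2"
                       byline="false"/>
      </filterchain>
    </loadfile>
    <!-- The pattern covers the entire input, so replacing it with group 2
         should leave only the table in ${summary}. -->
    <echo>${summary}</echo>
  </target>
</project>
```

This only works because each input file is assumed to hold exactly one summary table; with several tables per file, the leading (.*?) and trailing (.*) would swallow the others.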
So, that works for one file. The problem I have now is getting it to work for multiple files - each file that I concatenate has exactly one summary table that I want to extract and place in a single HTML summary file. I tried:
(A) Concatenating (<concat>) all of the files and applying a filterchain - but the filterchain filters all the input once, not once per file. In effect the files are concatenated first and the regex is applied afterwards - which means I only get the one matching table from the entire concatenation, not one matching table from each file that I concatenate.
(B) Copying (<copy>) all of the files to a single file - in this case, the filterchain extracts the individual table from each file - but the destination only ends up holding one file's output, because I can't make it concatenate them all to one destination (even with a mergemapper). "enablemultiplemappings" doesn't seem to help.
If there was some way of saying "for each file, apply the transform *before* concatenating, not after", then that would work, but as far as I can see, there isn't. Any ideas?
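One possible way to get "transform first, concatenate after" is to split the work into two steps, building on the observation above that <copy> runs its filterchain once per file. This is only a sketch under assumptions: the directory and file names (`src.dir`, `tmp.dir`, `out.file`) are hypothetical, `summary.regex` holds the whole-input pattern described earlier, and the `s` flag is assumed so that `.` matches newlines.

```xml
<project name="summary-tables" default="summary">
  <!-- Hypothetical locations; adjust to the real layout. -->
  <property name="src.dir" value="reports"/>
  <property name="tmp.dir" value="build/tables"/>
  <property name="out.file" value="build/summary.html"/>
  <property name="summary.regex"
            value='(.*?)(&lt;table[^&lt;/]*class="summary"[^&lt;/]*&gt;(.*?)&lt;/table&gt;)(.*)'/>

  <target name="summary">
    <!-- Step 1: copy every file to its own destination in a scratch directory.
         The filterchain is applied to each file separately. -->
    <copy todir="${tmp.dir}" overwrite="true">
      <fileset dir="${src.dir}" includes="**/*.html"/>
      <filterchain>
        <containsregex flags="s"
                       pattern="${summary.regex}"
                       replace="\2"
                       byline="false"/>
      </filterchain>
    </copy>

    <!-- Step 2: concatenate the already-filtered copies, with no filterchain. -->
    <concat destfile="${out.file}">
      <fileset dir="${tmp.dir}" includes="**/*.html"/>
    </concat>
  </target>
</project>
```

The cost is an intermediate directory of extracted tables, but each step uses the task the way it already behaves: <copy> filters per file, <concat> only joins.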
Rebhan, Gilbert wrote:
> Hi,
>
> <target name="depends">
>   <echo file="Y:/test.html">
>     <![CDATA[
>     <html>
>     <head>
>     <title>summary</title>
>     <link rel="stylesheet" href="summary.css" type="text/css">
>     </head>
>     <body>
>     <a name="overview"></a>
>     <center>
>     <table class="summary"> was wrong </table>
>     </center>
>     </html>
>     ]]>
>   </echo>
> </target>
>
> <target name="main" depends="depends">
>
>   <loadfile srcfile="Y:/test.html" property="summary">
>     <filterchain>
>       <containsregex
>         pattern='<table[^</]*>(.*?)</table>'
>         replace="\1"
>         byline="true"
>       />
>       <tokenfilter>
>         <!-- to get rid of whitespace in ${summary} -->
>         <trim/>
>       </tokenfilter>
>     </filterchain>
>   </loadfile>
>
>   <echo>Summary == ${summary}</echo>
>
> </target>
>
> gives only the text =
>
> depends:
> main:
> [echo] Summary == was wrong
> BUILD SUCCESSFUL
> Total time: 407 milliseconds
>
>
> you have to use \1 and byline=true
>
> Regards, Gilbert
>
> -----Original Message-----
> From: George Bills [mailto:gbills@(protected)]
> Sent: Tuesday, November 28, 2006 6:14 AM
> To: Ant Users List
> Subject: Re: containsregex and concat
>
> Thanks: the regular expression works now, which is progress.
> Unfortunately I'm getting all of the concatenated text, not just the
> matching text. If I use replace:
> <filterchain>
>   <!--<tokenfilter><filetokenizer />-->
>   <containsregex flags="isg"
>     pattern="${summary.regex}"
>     replace="SUMMARYTABLE"
>     byline="false" <!-- implies filetokenizer -->
>   />
>   <!-- </tokenfilter>-->
> </filterchain>
>
> I end up getting something like:
> [concat] <html>
> [concat] <head>
> [concat] <title>summary</title>
> [concat] <link rel="stylesheet" href="summary.css" type="text/css">
> [concat] </head>
> [concat] <body>
> [concat] <a name="overview"></a>
> [concat] <center>
> [concat] SUMMARYTABLE
> [concat] </center>
> [concat] ...more HTML here...
> [concat] </html>
>
> I'm assuming it's because the file is just one big token - but if I use
> a line tokenizer, will I be able to match regular expressions over
> multiple lines?
>
> Thanks for the help.
>
> Rebhan, Gilbert wrote:
>
>> Hi,
>>
>> <table[^>/]*>(.*?)</table>
>>
>> should match :
>>
>> <table class="summary">foobar</table>
>>
>> also with more than one attribute
>>
>> <table class="summary" foo="bar">foobar</table>
>>
>>
>> foobar is \1 (group 1)
>>
>>
>> Regards, Gilbert
>>
>>
>> -----Original Message-----
>> From: George Bills [mailto:gbills@(protected)]
>> Sent: Monday, November 27, 2006 6:41 AM
>> To: Ant Users List
>> Subject: Re: containsregex and concat
>>
>> Hrm, it probably isn't since advanced regexs are still black magic to
>> me. The "." was supposed to match any character, including a newline
>> (with the s flag), the * to say match 0-n of them and the ? to say be
>> lazy, match as little as possible (so that I don't pull in
>> <table>...</table><table>...</table> in one match).
>>
>> I just tried [^<], but it doesn't seem to work - I think because of such
>> things as "<table><tr>...</tr></table>" - the opening bracket of <tr>
>> conflicts. I tried [.<>]*? to make sure that the "regex.body" part
>> was matching the brackets, but that didn't work either.
>>
>> Also, <table class="summary"> was wrong - <table class="summary"(.*?)>
>> is a little better since the tables can have more than the class
>> attribute (in fact, all of them do). But after changing that I'm
>> matching the entire document - <html> through to </html>. That might
>> just be because I'm using filetokenizer - if I make one match within
>> filetokenizer, do I end up getting the entire document? If so, how do I
>> get only the matching text?
>>
>> Regex is now: <table class="summary".*?>.*?</table>
>>
>> Thanks for the help, I appreciate it.
--------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscribe@(protected) For additional commands, e-mail: user-help@(protected)