pprint - a creeping php indent

I have read through some of the 82 items returned by a search for 'indent', and the 52 returned for 'php', but none seem to touch this particular PHP indenting problem. I seek clarification, advice ...

Using a configuration like the following on an HTML (actually a .php) file :-
--wrap 99
--indent yes
--break-before-br yes
--indent-attributes yes
--vertical-space yes
--indent-spaces 1

Tidy does indent the PHP script, that is between <?php and ?>, but at present it ADDS this indent to any existing white space that may already be there. Then if you further Tidy the output, the white space is accumulated, hence what I have called a 'creeping indent' ;=))
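The accumulation can be illustrated outside Tidy with a minimal sketch. Both helpers below are hypothetical and are not taken from pprint.c; they only model the difference between adding the indent on top of existing white space and first skipping white space that is already there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Tidy code: write `indent` spaces followed by
 * `line` into `out` (capacity `cap`). Leading white space already in
 * `line` is kept, which is what makes the indent accumulate. */
static void indent_additive(const char *line, size_t indent,
                            char *out, size_t cap)
{
    size_t len = strlen(line);
    assert(out != NULL && indent + len + 1 <= cap);
    memset(out, ' ', indent);
    memcpy(out + indent, line, len + 1);  /* also copies the NUL */
}

/* Hypothetical helper: skip up to `indent` leading blanks first, so a
 * line that is already indented by `indent` comes out unchanged. */
static void indent_idempotent(const char *line, size_t indent,
                              char *out, size_t cap)
{
    size_t skip = 0;
    while (skip < indent && (line[skip] == ' ' || line[skip] == '\t'))
        ++skip;
    indent_additive(line + skip, indent, out, cap);
}
```

Feeding the output of indent_additive() back into itself grows the leading white space by `indent` on every pass, while indent_idempotent() leaves an already indented line as it was.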

There is a mechanism to remove this 'additional' white space if the mode is CDATA or COMMENT, but it is applied ONLY to TEXT nodes???

In pprint.c, PPrintPHP() is called, and since in my case TidyWrapPhp is on, it calls PPrintText() with mode CDATA. If this option is off, then it is called with COMMENT, but either would work.

At the beginning of PPrintText(), and after each 'new line', TextStartsWithWhitespace() is called, but in this case the node is NOT one for which 'TY_(nodeIsText)(node)' is true, thus it always returns -1!!! And then when the service IncrWS() is called with -1, it does nothing.

The logic in TextStartsWithWhitespace() is -

if ( (mode & (CDATA|COMMENT)) &&
        TY_(nodeIsText)(node) &&
        node->end > node->start &&
        start >= node->start )
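The effect of that guard can be sketched in isolation. The types, mode bit values and lower-case function names below are simplified stand-ins chosen for illustration, and the bodies are an assumption about how the two services behave based on the description above; this is not the code in pprint.c.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for Tidy's types; the real definitions differ. */
typedef enum { TextNode, PhpTag } NodeType;
typedef struct { NodeType type; size_t start, end; } Node;
enum { COMMENT = 2, CDATA = 16 };  /* illustrative mode bits only */

static int node_is_text(const Node *node)
{
    return node != NULL && node->type == TextNode;
}

/* Return the number of leading blanks at `start`, or -1 when the guard
 * fails or the line does not begin with white space. */
static int text_starts_with_whitespace(const char *buf, const Node *node,
                                       size_t start, unsigned mode)
{
    assert(node != NULL);
    if ((mode & (CDATA | COMMENT)) && node_is_text(node) &&
        node->end > node->start && start >= node->start)
    {
        size_t ix = start;
        while (ix < node->end && (buf[ix] == ' ' || buf[ix] == '\t'))
            ++ix;
        if (ix > start)
            return (int)(ix - start);
    }
    return -1;
}

/* Skip at most `indent` of the `ws` leading blanks; no-op if ws <= 0. */
static size_t incr_ws(size_t start, size_t end, size_t indent, int ws)
{
    if (ws > 0) {
        size_t st = start + ((size_t)ws < indent ? (size_t)ws : indent);
        start = st < end ? st : end;
    }
    return start;
}
```

With a node of type TextNode the leading blanks are counted and incr_ws() steps over them; with a PhpTag node the guard fails, -1 comes back, and incr_ws() leaves the start position alone, so the existing white space is printed after the new indent.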

It seems the call to nodeIsText() here should be replaced with something like nodeIsTextLike(), which does something more like the following :-

   Bool TY_(nodeIsTextLike)( Node *node )
   {
      if ( node == NULL )  /* no node, so not text-like */
         return no;
      switch ( node->type )
      {
      case TextNode:   /* yes for sure */
      case CDATATag:   /* maybe??? */
      case SectionTag: /* maybe??? */
      case AspTag:     /* yes? */
      case JsteTag:    /* yes? */
      case PhpTag:     /* yes for sure */
         return yes;
      default:         /* every other node type */
         return no;
      }
   }

You can see I am UNSURE of certain tags, but I definitely want it for my PhpTag case, and of course, a real TextNode ...

I have made such a change to my personal 'development' branch, and it now works as I expect it. And I made 2 runs of test/alltest.cmd using Tidy from CVS and my TidyDEV, and found NO significant differences. There was one difference, but unrelated to this issue.

The above shows the configuration items used, and this is my simple test file, php-01.php. Of course it would need to be 'errantly' given an .html extension to go into our test cases :-

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  <meta http-equiv="Content-Type"
        content="text/html; charset=us-ascii">
   PHP creeping indent
   First para
   $file = 'file.txt';
   echo "<p>The file is $file ...</p>\n";
   Last para

Note, it is already indented, thus I expect Tidy to DO NOTHING, and generate an exact equivalent output. Which it now does with this new function used, nodeIsTextLike(), in place of 'just' nodeIsText() ...

I seek feedback, comments, etc, as I feel this is really a minor BUG in Tidy that I would like fixed for others ...

If all agree, I could open a new BUG and provide the patch, or Tidy could be fixed under a related 'issue' number that I missed, if one can be found. Please advise which.


Saturday, August 18, 2007.

PS: OT: During the above regression testing, testcases.txt contains 1168193, merge-spans, added recently, Aug 13, 2007, but it seems input/in_1168193.*ml is missing??? 2007.08.19 Now fixed ;=))


checked by tidy  Valid HTML 4.01 Transitional