Building Tidy Perl


under construction

Preamble

After a recent Tidy Support Request ( http://tidy.sf.net/issue/1786061 ) concerning the Tidy wrappers, I decided to try to build this neat set of wrapper 'components' using MSVC8 (Microsoft Visual Studio Express Editions - FREE!) ... This page covers building the Perl package. As can be seen below, I had some minimal success with an MSVC7 compile, but real problems with MSVC8 ... this is an ongoing story ;=))

Prerequisites

TOOLS:
Of course you need the tools: MSVC8 mentioned above, the Microsoft (MS) PSDK (Platform Software Development Kit), and Perl installed ... and of course a CVS client to download and update the sources.

SOURCES & FOLDERS: 1. HTML Tidy and 2. Tidy Wrappers

1. HTML Tidy:

The first source to download and build is the standard HTML Tidy, from :-
http://tidy.sourceforge.net/ 
Or more precisely, the CVS source download instructions from :-
http://sourceforge.net/cvs/?group_id=27659

At least the static libtidy.lib must be built, as the Tidy wrappers link with this static library. In my case I built it in the following folder :-
<some_root>\tidycom
To maintain a 'relative' folder access, I then COPIED the wrapper sources for building into :-
<some_root>\tidyatl  - see previous page tidy_06.htm
<some_root>\twperl  - subject of this build.

2. Tidy Wrappers:

This page gives the CVS download instructions for the Tidy Wrappers :-
http://sourceforge.net/cvs/?group_id=97170

First I set the environment variable CVSROOT to the 'root' folder I am using, C:\FGCVS (shown here simply as <myroot>; you should choose your own root folder name), like :-

> set CVSROOT=c:\fgcvs

Then the instructions I used were (just press Enter when asked for the password) :-
cvs -d:pserver:anonymous@tidywrap.cvs.sourceforge.net:/cvsroot/tidywrap login
Then in a folder <myroot>\tidywrap I did each of the following :-
cvs -z3 -d:pserver:anonymous@tidywrap.cvs.sourceforge.net:/cvsroot/tidywrap co -P tidyatl
cvs -z3 -d:pserver:anonymous@tidywrap.cvs.sourceforge.net:/cvsroot/tidywrap co -P include
cvs -z3 -d:pserver:anonymous@tidywrap.cvs.sourceforge.net:/cvsroot/tidywrap co -P twperl
cvs -z3 -d:pserver:anonymous@tidywrap.cvs.sourceforge.net:/cvsroot/tidywrap co -P www
This downloaded the sources into their respective folders.

I NEVER build in these DOWNLOADED and UPDATED CVS/SVN source folders, but ALWAYS copy them to another location to do the actual building ... thanks to the growing size and declining price of hard disk space ;=))



Building HTML::Tidy

Initially the source looked like any other Perl package download, where the first step would be :-
> perl Makefile.PL
to get the actual Makefile built, but in this case parameters must be added, so there is a BUILD.BAT batch file to handle this.

You run this, giving the root location of the static LIBTIDY.LIB you have built above. However the batch file seemed to be in error. Essentially it does :-
> perl Makefile.PL TIDYINCPATH=%1\include TIDYLIBPATH=%1\release
While this correctly sets TIDYINCPATH to the HTML Tidy includes, TIDYLIBPATH would NOT point to the location of LIBTIDY.LIB ... That line had to be altered to something like :-
> perl Makefile.PL TIDYINCPATH=%1\include TIDYLIBPATH=%1\build\msvc\release

In addition, I added some 'if' tests to the batch file, so that if it does not find the needed components on the given path, it aborts with a helpful warning ... See the new BUILD batch file below for the details.
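Purely to make the logic of those 'if' tests explicit, here is a C++17 sketch of the same checks. `check_tidy_root` is a hypothetical helper of my own, not part of the wrappers, and it assumes the same folder layout as the fixed batch file: `include\tidy.h` and `build\msvc\release\libtidy.lib` under the given root. Like the batch `IF EXIST`, `std::filesystem::exists` is satisfied by any directory entry with that name.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper mirroring the 'IF NOT EXIST' checks in BUILD.BAT.
// Returns an empty string when both files are present, otherwise the
// warning that the batch file would print before giving up.
std::string check_tidy_root(const fs::path& root) {
    const fs::path inc = root / "include";
    const fs::path lib = root / "build" / "msvc" / "release";

    if (!fs::exists(inc / "tidy.h"))
        return "Can NOT find tidy.h on PATH " + inc.string();
    if (!fs::exists(lib / "libtidy.lib"))
        return "Can NOT find libtidy.lib on PATH " + lib.string();

    // Both found: it should now be safe to run 'perl Makefile.PL ...'
    return "";
}
```

Checking up front like this means a wrong root fails immediately with a clear message, rather than much later as a missing-library link error.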

Also, those who have manually installed Perl packages before will know that all this has to be done in the special MSVC8 Command Prompt, which has run the VSVARS32.BAT batch file to establish the full build environment. This sets additional environment variables like INCLUDE, LIB, etc, and extends the PATH so that the compiler, linker, and other tools can be run ...

In my case I had to MANUALLY add the appropriate PSDK paths to the INCLUDE and LIB variables, by altering the VSVARS32.BAT found in the %VS80COMNTOOLS% folder. If VS80COMNTOOLS is NOT set in your command prompt environment, then perhaps MSVC8 has NOT been installed correctly.

Without the correct INCLUDE and LIB environment variables, running BUILD.BAT will report a warning that libraries like kernel32.lib, user32.lib, etc. can NOT be found, but it will continue ...
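A quick way to confirm the PSDK folders really made it into the environment is to look for kernel32.lib along LIB. This sketch assumes LIB holds a semicolon-separated list of directories, as VSVARS32.BAT sets it; both function names are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <sstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Split a Windows-style path list such as LIB or INCLUDE
// ("C:\\a;C:\\b;...") into its directories, skipping empty entries.
std::vector<std::string> split_path_list(const std::string& list) {
    std::vector<std::string> dirs;
    std::stringstream ss(list);
    std::string item;
    while (std::getline(ss, item, ';'))
        if (!item.empty()) dirs.push_back(item);
    return dirs;
}

// True if some directory in the list contains 'file' (e.g. "kernel32.lib").
bool found_on_path_list(const std::string& list, const std::string& file) {
    for (const auto& dir : split_path_list(list))
        if (fs::exists(fs::path(dir) / file)) return true;
    return false;
}
```

In use, the list would come from `std::getenv("LIB")`, remembering that `std::getenv` returns a null pointer when the variable is not set at all (the 'NOT a build environment' case).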



Fix Ups

To get tidyperl.cpp to compile, I had to make quite a number of fix-ups in the code. Many concerned the use of an 'unsigned long' where a 'void *' is meant. While these are the same size in 32-bit code, namely four (4) bytes, the compiler does NOT treat them as the same ;=(( and generates copious warnings, and some errors, about needing a re-cast ... warnings like :-

..\tidyperl.cpp(410) : warning C4312: 'type cast' : conversion from 'IV' to 'void *' of greater size
c:\projects\tidy\twperl\tidyx.h(97) : warning C4312: 'type cast' : conversion from 'ulong' to 'Tidy::Source *' of greater size
..\tidyperl.cpp(7701) : warning C4312: 'type cast' : conversion from 'ulong' to 'void *' of greater size
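As I understand it, these 'of greater size' warnings are about portability: on 64-bit Windows `unsigned long` stays 4 bytes while pointers grow to 8, so stashing a pointer in a `ulong` would truncate it there. The fix in the diff below simply keeps the value as a `void *`. If an integer is genuinely needed, modern C++ offers `std::uintptr_t`; note that this is a sketch in current C++, and MSVC8 itself predates `<cstdint>`. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True on 32-bit Windows (4 >= 4), false on 64-bit Windows (4 < 8).
constexpr bool kUlongHoldsPointer = sizeof(unsigned long) >= sizeof(void*);

// std::uintptr_t is wide enough to round-trip any void*.
static_assert(sizeof(std::uintptr_t) >= sizeof(void*),
              "uintptr_t must be wide enough for a pointer");

// Convert a pointer to an integer "handle" and back without truncation.
inline std::uintptr_t to_handle(void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

inline void* from_handle(std::uintptr_t h) {
    return reinterpret_cast<void*>(h);
}
```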

There were also warnings about the redefinition of certain items in HTML Tidy's platform.h that are already defined in the CORE Perl headers, like c:\perl\lib\core\win32iop.h, and others ... warnings like :-

c:\projects\tidy\tidycom\include\platform.h(455) : warning C4005: 'fstat' : macro redefinition
   c:\perl\lib\core\win32iop.h(239) : see previous definition of 'fstat'
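Warning C4005 appears when a macro is #defined a second time with a different body. One common way to avoid it, sketched here with hypothetical macro names standing in for `fstat`, is to guard the second definition with `#ifndef`. Whether that is right for platform.h depends on whether the two definitions are meant to be equivalent; the alternative is an explicit `#undef` before redefining, which makes the second header win instead.

```cpp
#include <cassert>

// First header (think win32iop.h) defines the macro.
#define DEMO_FSTAT(fd) ((fd) + 100)

// Second header (think platform.h) only defines it if nobody else has,
// so there is no 'macro redefinition' warning and the first one wins.
#ifndef DEMO_FSTAT
#define DEMO_FSTAT(fd) ((fd) + 200)
#endif

inline int demo_fstat(int fd) { return DEMO_FSTAT(fd); }
```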

See the diff file below for the numerous 'fixes' made ... with these patches in place NMAKE ran to completion ;=)) ... on to the 'testing' ...



Testing

There are a few 'test' files in t/*.t (basic.t, dump.t and opt.t), and these are run from the Makefile via the command :-

> nmake test

but the error report indicated that Tidy.pm had NOT been copied into blib/lib/HTML, that is, it could NOT be found using the paths in @INC??? OK, copying that file there got the tests running, but with a VERY POOR result ... ;=((
Failed 10/10 tests, 0.00% okay - that is ZERO percent ok!!!

t/basic....Can't load 'C:\Projects\tidy\twperl\blib\arch/auto/HTML/Tidy/Tidy.dll ' for module HTML::Tidy: load_file:The specified module could not be found at C:/Perl/lib/DynaLoader.pm line 230.
 at C:\Projects\tidy\twperl\blib\lib/HTML/Tidy.pm line 7
Compilation failed in require at t/basic.t line 3.
BEGIN failed--compilation aborted at t/basic.t line 3.

But that does NOT make sense in that C:\Projects\tidy\twperl\blib\arch/auto/HTML/Tidy/Tidy.dll DOES exist!

   19/09/2007 18:24 393,216 Tidy.dll

Reading the comments just before line 230 of DynaLoader.pm perhaps yields a clue :-

  # Many dynamic extension loading problems will appear to come from
  # this section of code: XYZ failed at line 123 of DynaLoader.pm.
  # Often these errors are actually occurring in the initialisation 
  # C code of the extension XS file. Perl reports the error as being 
  # in this perl code simply because this was the last perl code 
  # it executed.
  my $libref = dl_load_file($file, $module->dl_load_flags) or
   croak("Can't load '$file' for module $module: ".dl_error());

That makes sense, BUT WHAT TO DO ABOUT IT???

Time to reflect ...



Restart With SWIG

20070922 - Although I was able to build HTML/Tidy/Tidy.dll, running the tests produced the BIG problems above, so I backtracked and decided to download SWIG (Simplified Wrapper and Interface Generator) from http://www.swig.org/ ... I downloaded version 1.3.31 from http://sourceforge.net/project/showfiles.php?group_id=1645, and this ZIP file contained a WIN32 swig.exe ...

The twperl source already includes the tidy.i SWIG interface file. Running the command -

> [pathto\]swig -perl -c++ -I..\tidycom\include tidy.i

produces a new tidy.pm and tidy_wrap.cxx, and I copied tidy_wrap.cxx over tidyperl.cpp ... While it runs, SWIG reports a number of 'warnings', which I hope are not fatal :-

[common_root]\twperl>[path_to\]swig.exe -perl -c++ -I..\tidycom\include tidy.i
tidy.i(14): Warning(124): Specifying the language name in %typemap is deprecated -
   use #ifdef SWIG<LANG> instead.
tidy.i(19): Warning(124): Specifying the language name in %typemap is deprecated -
  use #ifdef SWIG<LANG> instead.
..\tidycom\include\platform.h(591): Warning(314): no is a perl keyword
tidyx.h(79): Warning(401): Nothing known about base class 'TidyInputSource'. Ignored.
tidyx.h(119): Warning(401): Nothing known about base class 'TidyOutputSink'. Ignored.
tidyx.h(140): Warning(401): Nothing known about base class 'TidyBuffer'. Ignored.
tidy.i(17): Warning(119): %typemap(ignore) has been replaced by %typemap(in,numinputs=0). 

Then I could run my modified build.bat in the MSVC8 command prompt, which runs 'nmake' on the generated 'makefile', but I ran into a new set of errors ;=((

C:\Program Files\Microsoft Visual Studio 8\VC\INCLUDE\cstdlib(20) : error C2039:
  'PerlProc_abort' : is not a member of '`global namespace''
C:\Program Files\Microsoft Visual Studio 8\VC\INCLUDE\cstdlib(20) : error C2873:
  'PerlProc_abort' : symbol cannot be used in a using-declaration
... and many more like this ...
C:\Program Files\Microsoft Visual Studio 8\VC\INCLUDE\cstdio(39) : error C2039:
  'PerlSIO_vprintf' : is not a member of '`global namespace''
C:\Program Files\Microsoft Visual Studio 8\VC\INCLUDE\cstdio(39) : error C2873:
  'PerlSIO_vprintf' : symbol cannot be used in a using-declaration
NMAKE : fatal error U1077:
'"C:\Program Files\Microsoft Visual Studio 8\VC\BIN\cl.EXE"' : return code '0x2'
Stop.

These are defined in Perl/lib/CORE, in several headers ...

   lib\CORE\iperlsys.h
#if defined(PERL_IMPLICIT_SYS)
   #define PerlSIO_vprintf(f,fmt,a) \
   (*PL_StdIO->pVprintf)(PL_StdIO, (f),(fmt),a)
#else   /* PERL_IMPLICIT_SYS */
   #define PerlSIO_vprintf(f,fmt,a) vfprintf(f,fmt,a)

   lib\CORE\perlsdio.h
#ifdef PERLIO_IS_STDIO
   #define PerlIO_vprintf(f,fmt,a) PerlSIO_vprintf(f,fmt,a)

   lib\CORE\XSUB.h
   # define vfprintf PerlSIO_vprintf

Yowee! How to get over these?



MSVC7

As another backtrack, I tried compiling the Perl Tidy.dll with MSVC7 (circa 2003). First I had considerable trouble with the :-
> perl Makefile.PL ... command, run through the BUILD.BAT batch file, which kept putting some 'bad' PATH values into the Makefile. Eventually I MANUALLY fixed up a Makefile.vc7 and got the DLL built, but as above, this failed to copy Tidy.pm to blib/lib/HTML ??? I HAD TO DO THIS STEP MANUALLY ???

At least now, when running the :-
> nmake /f Makefile.vc7 test ... command, I get a 'different' result: a Windows 'sorry for the inconvenience' abort dialog ... but it appears 'something' is loaded in basic.t before this ABORT. THIS IS MINOR PROGRESS ;=))

I got some dots .... before the abort, which indicates that the 'use HTML::Tidy' was causing a load, but it aborts before getting too far into the code ??? So all is NOT WELL, but it seems slightly 'better', or perhaps 'less bad', than the MSVC8 compiles ;=() In FACT, some parts do seem to work: the opt.t test created an 'opterr.txt' file, and dump.t created an 'errors.txt' file ...

I then created just a simple t/a.t like the following and added it to Makefile.vc7, and it ran without an abort ...

BEGIN {print "Load only\n";}
END {print "not ok 1\n" unless $loaded;}
use HTML::Tidy;
$loaded = 1;
print "ok 1\n";  

So the Perl Tidy.dll is at least being loaded successfully ... Expanding that a.t test file to :-

BEGIN {print "1..4\n";}
END {print "not ok 1\n" unless $loaded;}
use HTML::Tidy;
$loaded = 1;
print "ok 1\n";
my $tidy = HTML::Tidy::Document->new();
if ( $tidy ) {
 print "ok 2\n";
} else {
 print "not ok 2\n";
}
my $stat = $tidy->Create();
if ( $stat >= 0 ) {
 print "ok 3\n";
} else {
 print "not ok 3\n";
}
$tidy->Release();
print "ok 4\n";

Gave this BEAUTIFUL output :-

G:\GTools\tidyproj\twperl>nmake /f makefile.vc7 test
   C:\perl\bin\perl.exe -Mblib -IC:\perl\lib -IC:\perl\lib -e "use Test::Harness qw(&runtests $verbose); $verbose=1; runtests @ARGV;" t\a.t
   Using G:/GTools/tidyproj/twperl/blib
   t\a....1..4
   ok 1
   ok 2
   ok 3
   ok 4
   ok
   All tests successful.
   Files=1, Tests=4, 1 wallclock secs ( 0.00 cusr + 0.00 csys = 0.00 CPU)
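The lines t\a.t prints follow the TAP (Test Anything Protocol) convention that Test::Harness reads: a plan line `1..N`, then one `ok K` or `not ok K` per test. Below is a much-simplified C++ sketch of that bookkeeping, working on the raw lines the script prints (before Test::Harness adds its `t\a....` prefix). It is not how Test::Harness itself is implemented, and `TapSummary` and `summarize_tap` are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Counts gathered from TAP output.
struct TapSummary {
    int planned = -1;  // -1 means no "1..N" plan line was seen
    int passed = 0;
    int failed = 0;
    bool all_ok() const {
        return planned >= 0 && failed == 0 && passed == planned;
    }
};

// Read the plan line and the ok / not ok lines; anything else is ignored.
TapSummary summarize_tap(const std::string& output) {
    TapSummary s;
    std::istringstream in(output);
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("1..", 0) == 0)          // plan, e.g. "1..4"
            s.planned = std::stoi(line.substr(3));
        else if (line.rfind("not ok", 0) == 0)  // check before "ok"
            ++s.failed;
        else if (line.rfind("ok", 0) == 0)
            ++s.passed;
    }
    return s;
}
```

Fed the four `ok` lines above, this reports 4 of 4 passed. The first, load-only t/a.t prints "Load only" rather than a plan line, which this sketch would report as having no plan.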

Time for more reflections ...



Downloads

*TBD*  nothing at the moment, since nothing works ;=))



Geoff.
September 19, 2007.

Diff - Patch File

Initial diff file - as can be seen, almost all the differences are really just 'casting' questions; the one exception is the missing 'return' added to IsHeader(). No substantial change has been made to the code. The comment at the top gave me pause for thought: maybe I too could 'modify the SWIG interface file instead', but I do not think I have that ??? DO I??? A good question!

diff -ur C:\FGCVS\TidyWrap\twperl\tidyperl.cpp twperl\tidyperl.cpp
--- C:\FGCVS\TidyWrap\twperl\tidyperl.cpp       Wed Dec 17 15:59:51 2003
+++ twperl\tidyperl.cpp Wed Sep 19 15:05:39 2007
@@ -7,7 +7,10 @@
  * changes to this file unless you know what you are doing--modify the SWIG 
  * interface file instead. 
  * ----------------------------------------------------------------------------- */
-
+/* FIX20070919 */
+#if   defined(_MSC_VER) && (_MSC_VER > 1300)
+#pragma warning(disable:4244)
+#endif
 
 #ifdef __cplusplus
 template<class T> class SwigValueWrapper {
@@ -7677,7 +7680,8 @@
     const char *_swigerr = _swigmsg;
     {
         Tidy::Document *arg1 = (Tidy::Document *) 0 ;
-        ulong arg2 ;
+        /* FIX20070919 ulong arg2 ; */
+        void * arg2 ;
         int argvi = 0;
         dXSARGS;
         
@@ -7690,11 +7694,11 @@
             }
         }
         {
-            ulong * argp;
+           ulong * argp;
             if (SWIG_ConvertPtr(ST(1),(void **) &argp, SWIGTYPE_p_ulong,0) < 0) {
                 SWIG_croak("Type error in argument 2 of Document_SetAppData. Expected _p_ulong");
             }
-            arg2 = *argp;
+            arg2 = (void *)*argp;   /* FIX20070919 */
         }
         (arg1)->SetAppData(arg2);
         
@@ -7712,7 +7716,8 @@
     const char *_swigerr = _swigmsg;
     {
         Tidy::Document *arg1 = (Tidy::Document *) 0 ;
-        ulong result;
+        /* FIX20070919 ulong result; */
+        void * result;
         int argvi = 0;
         dXSARGS;
         
diff -ur C:\FGCVS\TidyWrap\twperl\tidyx.h twperl\tidyx.h
--- C:\FGCVS\TidyWrap\twperl\tidyx.h    Wed Dec 17 15:59:52 2003
+++ twperl\tidyx.h      Wed Sep 19 15:07:38 2007
@@ -80,10 +80,10 @@
 public:
   Source()
   {
-    getByte    = get;
-    ungetByte  = unget;
-    eof        = end;
-    sourceData = (ulong) this;
+    getByte    = (TidyGetByteFunc)get;
+    ungetByte  = (TidyUngetByteFunc)unget;
+    eof        = (TidyEOFFunc)end;
+    sourceData = (void *) this;
   }
   virtual ~Source() {}
 
@@ -120,8 +120,8 @@
 public:
   Sink()
   {
-    putByte  = put;
-    sinkData = (ulong) this;
+    putByte  = (TidyPutByteFunc)put;
+    sinkData = (void *) this;
   }
   virtual ~Sink() {}
   virtual void PutByte( byte bv ) = 0;
@@ -162,7 +162,8 @@
   { tidyBufCheckAlloc(this, buflen, chunkSize);
   }
 
-  void Attach( void* vp, uint size )  { tidyBufAttach(this, vp, size); }
+  /* void Attach( void* vp, uint size )  { tidyBufAttach(this, vp, size); } */
+  void Attach( void* vp, uint size )  { tidyBufAttach(this, (byte *)vp, size); }
   void Detach()                       { tidyBufClear(this); }
 
   void Clear()                        { tidyBufClear(this); }
@@ -265,7 +266,9 @@
 
 
     Bool IsText()        { return tidyNodeIsText( tnod() ); }
-    Bool IsHeader()      { tidyNodeIsHeader( tnod() ); } /* h1, h2, ... */
+    /* FIX20070919 */
+    /* Bool IsHeader()      { tidyNodeIsHeader( tnod() ); } /* h1, h2, ... */ 
+    Bool IsHeader()      { return (tidyNodeIsHeader( tnod() ) ? yes : no); } /* h1, h2, ... */
 
     TagId Id()           { return tidyNodeGetId( tnod() ); }
 
@@ -496,8 +499,8 @@
         Release();
         if ( _tdoc = tidyCreate() )
         {
-            tidySetAppData( _tdoc, (ulong) this );
-            tidySetReportFilter( _tdoc, ReportFilter );
+            tidySetAppData( _tdoc, (void *) this );
+            tidySetReportFilter( _tdoc, (TidyReportFilter)ReportFilter );
             return 0;
         }
         return -1;
@@ -511,8 +514,10 @@
     /* Let application store a chunk of data w/ each Tidy instance.
     ** Useful for callbacks.
     */
-    void  SetAppData( ulong data ) { tidySetAppData( _tdoc, data ); }
-    ulong GetAppData()             { return tidyGetAppData( _tdoc ); }
+    /* void  SetAppData( ulong data ) { tidySetAppData( _tdoc, data ); } */
+    /* ulong GetAppData()             { return tidyGetAppData( _tdoc ); } */
+    void  SetAppData( void * data ) { tidySetAppData( _tdoc, data ); }
+    void * GetAppData()             { return tidyGetAppData( _tdoc ); }
 
     static ctmbstr ReleaseDate()  { return tidyReleaseDate(); }
 

End of DIFF file
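The pattern behind most of these fixes is the usual C-callback 'trampoline': the C library is handed a plain function pointer plus an opaque data pointer, and the C++ side stores `this` in that pointer and casts it back inside a static member function. Holding that pointer as a `void *` rather than a `ulong` is what removes the casts. The sketch below uses hypothetical names (`ByteSourceC`, `StringSource`, `count_bytes`), loosely modelled on tidyx.h's Source class; it is not Tidy's actual API.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical C-style callback interface (illustrative, not Tidy's API).
typedef int (*GetByteFunc)(void* sourceData);

struct ByteSourceC {
    GetByteFunc getByte;
    void*       sourceData;   // opaque pointer, not an unsigned long
};

// C++ wrapper: registers a static trampoline and stores 'this' as the data.
class StringSource : public ByteSourceC {
public:
    explicit StringSource(std::string text) : text_(std::move(text)) {
        getByte    = &StringSource::get;  // static member matches the C signature
        sourceData = this;                // implicit StringSource* -> void*
    }

private:
    static int get(void* data) {
        // Cast back to exactly the type that was stored.
        StringSource* self = static_cast<StringSource*>(data);
        if (self->pos_ >= self->text_.size()) return -1;  // end of input
        return static_cast<unsigned char>(self->text_[self->pos_++]);
    }

    std::string text_;
    std::size_t pos_ = 0;
};

// C-side consumer: sees only the function pointer and the opaque pointer.
int count_bytes(const ByteSourceC& src) {
    int n = 0;
    while (src.getByte(src.sourceData) != -1) ++n;
    return n;
}
```

Casting back to the exact type that was stored is what keeps this safe; storing a base-class pointer and casting to a derived class would not be equivalent in general.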


New BUILD batch file - Rather than including a diff of the build.bat batch file, the complete file is given below ...

@if NOT "%1." == "." set TIDY=%1
@if NOT "%TIDY%." == "." goto start
@echo ERROR: NO Location of Tidy static library given ...
@echo The PATH to where libtidy has been built is required ...
@goto END

:start
@set TEMP1=%TIDY%/include
@if NOT EXIST %TEMP1%/tidy.h goto ERR1
@set TEMP2=%TIDY%/build/msvc/release
@if NOT EXIST %TEMP2%/libtidy.lib goto ERR2
@if "%INCLUDE%." == "." goto ERR3
@if "%LIB%." == "." goto ERR4

perl Makefile.PL TIDYINCPATH=%TEMP1% TIDYLIBPATH=%TEMP2%
nmake
@goto END

:ERR1
@echo Can NOT find tidy.h on PATH %TEMP1% ... check name, location ...
@goto END
:ERR2
@echo Can NOT find libtidy.lib on PATH %TEMP2% ... check name, location ...
@goto END
:ERR3
@echo Can NOT find INCLUDE in the ENVIRONMENT ... This is NOT a 'build' environment!!!
@echo Use the MSVC Command Prompt, or
@echo first run %VS80COMNTOOLS%\VSVARS32.BAT ...
@goto END
:ERR4
@echo Can NOT find LIB in the ENVIRONMENT ... This is NOT a 'build' environment!!!
@echo Use the MSVC Command Prompt, or
@echo first run %VS80COMNTOOLS%\VSVARS32.BAT ...
@goto END

:END

End of NEW BATCH file


