Building (Andy's) HTML::Tidy



July, 2008: I had tried, and TRIED, to build HTML::Tidy, a Perl 5 module which is part of tidywrap (twperl), using SWIG and MSVC8 (2005) in WIN32, but without success, see this page. While continuing to look for a solution I ran across this CPAN source - HTML::Tidy by Andy Lester - and decided I should try to build it ...

Here the 'glue' code between libtidy and perl is generated using the perl XS functions, as opposed to using SWIG. The principle is somewhat similar - you write a pseudo-code XS glue module (the SWIG equivalent is called an interface module, .i) that maps the desired functions from libtidy to perl.



The following must be installed for you to complete this build, in no particular order :-

  1. MSVC - The latest (free) release is Microsoft Visual C++ Express Edition 2008 (MSVC9)
  2. PSDK - You also need the Microsoft Platform Software Development Kit (PSDK)
  3. Perl - ActivePerl provides an excellent and easy Win32 install! (Perl)
  4. libtidy - The latest source of HTML Tidy from sourceforge (Tidy)

Then you are ready to download Andy Lester's source, and get started ...


The Build

1. Preparing libtidy

Having installed MSVC, optionally the PSDK, and downloaded the libtidy source, you are ready to build libtidy. The source has a build/msvc folder containing a Tidy.dsw, a MSVC6 build file, which can be loaded into all versions of MSVC, and converted where necessary. This will build three(3) projects -

  1. Tidy.exe - A command line WIN32 executable to 'tidy' HTML sources ...
  2. libtidy.lib - A STATIC runtime library.
  3. libtidy.dll - A Dynamic link library for Tidy, together with a link archive libtidy.lib.

Either 2, the 'static', or 3, the 'shared' (DLL) libtidy can be used. The DLL is best, since you can take advantage of any enhancements of Tidy without re-compiling the HTML::Tidy module, but you must remember to copy the DLL to one of the folders listed in your PATH environment variable. The most obvious are C:/Windows/System32 or C:/Windows, but these are not necessarily the recommended locations.
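A quick way to confirm the DLL really is reachable is to walk the PATH folders and look for it. The following is only a minimal sketch; the helper name find_on_path is my own invention for illustration, not something from libtidy or HTML::Tidy.

```perl
use strict;
use warnings;
use File::Spec;

# Return every folder in the given list that contains a file called $name.
sub find_on_path {
    my ($name, @dirs) = @_;
    return grep { -f File::Spec->catfile($_, $name) } @dirs;
}

# File::Spec->path() splits the PATH environment variable portably.
my @hits = find_on_path('libtidy.dll', File::Spec->path());
print @hits
    ? "libtidy.dll found in: @hits\n"
    : "libtidy.dll is not in any PATH folder\n";
```

If nothing is reported, copy the DLL to one of the PATH folders before running 'nmake test'.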

2. HTML::Tidy

Having installed perl, and downloaded (and unpacked) Andy's source, you can complete the build with the usual perl MakeMaker mantra, which has to be run in the special MSVC command prompt - there should be a link to this from when you installed MSVC, or go through the MSVC menu, Tools -> Visual Studio ... Command Prompt ...

  1. perl Makefile.PL
  2. nmake
  3. nmake test
  4. nmake install

Step 1: perl Makefile.PL

But before doing this, it is a good policy to check the contents of Makefile.PL first ... sometimes, but not particularly in this case, parameters can be added that set some needed paths. In this case the author has attempted to fully 'automate' the process, but it FAILED in my MSWin32, so I added a switch based on the fact that Perl places the OS in use in the global $^O variable, and it is 'MSWin32' for 32-bit windows ...

Rather than adding an interface to additional command line parameters, I HARD CODED my libtidy location, by adding the following code -

my $os = $^O;
my @vars = ();
my $libs = '';
my $incs = '';

if ($os =~ /^MSWin/i) {
    # this must set to where you unpacked, and built HTML Tidy
    my $tidylib = 'C:\Projects\Tidy\Tidy4p5';
    $incs = '-I. -I'.$tidylib.'\include';
    # and this MUST be likewise adjusted to WHERE you have built libtidy.[DLL/LIB]
    @vars = ExtUtils::Liblist->ext( '-L'.$tidylib.'\build\msvc\ReleaseDLL -ltidy', 0, 1 );
    # and REMEMBER, libtidy.dll MUST be copied to a PATH in your environment
} else {
    @vars = ExtUtils::Liblist->ext( '-L/sw/lib -ltidy', 0, 1 );
    $incs = '-I. -I/usr/include/tidy -I/usr/local/include/tidy -I/sw/include/tidy';
}

$libs = $vars[2];
... and down near the bottom, replaced the 'INC' line with ...
    INC                 => $incs,

With this in place, it correctly found my libtidy installation ...
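For anyone who prefers not to hard code the path, one possible variation is sketched below. It is NOT what my Makefile.PL does: the TIDY_DIR environment variable and the tidy_flags helper are hypothetical names invented for this example, and the sketch only builds and prints the switches, so you can check them before handing them to ExtUtils::Liblist.

```perl
use strict;
use warnings;
use File::Spec;

# Build the include and library switches for a given libtidy root folder.
# $subdir is the build output folder below that root.
sub tidy_flags {
    my ($root, $subdir) = @_;
    my $inc = '-I. -I' . File::Spec->catdir($root, 'include');
    my $lib = '-L' . File::Spec->catdir($root, $subdir) . ' -ltidy';
    return ($inc, $lib);
}

# TIDY_DIR is a hypothetical environment variable, not part of Makefile.PL;
# the fallback is the hard coded location used above.
my $root = $ENV{TIDY_DIR} || 'C:\Projects\Tidy\Tidy4p5';
warn "libtidy folder '$root' not found, adjust the path\n" unless -d $root;

my ($incs, $libspec) = tidy_flags($root, 'build/msvc/ReleaseDLL');
print "INC  => $incs\n";
print "LIBS => $libspec\n";
```

The warning when the folder is missing is the main point - a wrong path otherwise only shows up later, as a failed link in nmake.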

Step 2: nmake

This ran fine, and a Tidy.DLL was built, and placed in the 'test' location, blib\arch\auto\HTML\Tidy ... there seemed no apparent errors or warnings.

Step 3: nmake test

This is where I ran into the most problems. For some reason the Tidy.DLL can NOT be loaded by the perl dynamic library (DLL) loader, which shows an error around line 200, stating that the Tidy.DLL CAN NOT BE LOADED. Of course this results in a LOT OF NOISE, as each test fails the same way.

This is exactly where I 'stalled' when trying to build the previous, fuller implementation of HTML::Tidy, using SWIG. For some reason the Makefile auto-generated by the perl MakeMaker tool builds a non-loadable DLL! I was sometimes able to prove this by trying to load the DLL into a simple do-nothing test application that used the SAME dynamic link library loading code used in the Perl source. But sometimes this load worked while the 'tests' still failed, so it is not a good test.
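A similar load check can also be made from perl itself, through the low level DynaLoader functions. This is just a sketch, and not the test application mentioned above; try_load is a name made up for this example, and the DLL path assumes the nmake output location given earlier.

```perl
use strict;
use warnings;
use DynaLoader ();

# Try to load a shared library through perl's own loader functions.
# Returns (handle, undef) on success or (undef, error text) on failure.
sub try_load {
    my ($file) = @_;
    my $handle = DynaLoader::dl_load_file($file, 0);
    return $handle ? ($handle, undef) : (undef, DynaLoader::dl_error());
}

# Location taken from the nmake step above; adjust to your build tree.
my $dll = 'blib\arch\auto\HTML\Tidy\Tidy.dll';
my ($handle, $err) = try_load($dll);
print defined $handle ? "loaded $dll\n" : "can not load $dll: $err\n";
```

As noted, a successful load here does not guarantee that the tests will pass, but a failure at least gives the loader's own error text without the noise of the full test run.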

The single change I was able to make in the generated Makefile, was to substitute msvcrt.lib, with libcmt.lib, and then it worked ??? So I wrote a small perl script to do this - read in the generated Makefile, substitute libcmt.lib for msvcrt.lib, and write it back as Makefile ... Why it fails using msvcrt.lib, which in fact maps to the current MSVC runtime, MSVCR80.dll, is a BIG MYSTERY!

BUT IT WORKED! A single change, and HTML::Tidy functioned fine for MOST tests, but still FAILED on some.

Some changes to get most tests working are given below, although two, t/perfect.t and t/unicode.t, continued to abort with a SEGFAULT dialog, so I eventually excluded these. The other changes in the tests were to account for the NEW messages that this June 2008 version of libtidy outputs, an additional Info message ...

Step 4: nmake install

It is certainly NOT good practice to do this step BEFORE you are very satisfied that 'nmake test' works, as best as can be expected ;=))



In general the modifications were very modest, and will be passed back to Andy, and perhaps by the time you download the source, some or all will be included.

Addition to mantra

  1. perl Makefile.PL
  2. perl fixmf.cgi
  3. nmake
  4. nmake test
  5. nmake install

All this fixmf.cgi perl script does is substitute 'libcmt.lib' for 'msvcrt.lib'. As mentioned above, I have no clue WHY this modification is required. 'msvcrt.lib' is a mapping to the current MSVC runtime, in my case MSVCR80.dll, and many other binary components of ActivePerl include this 'msvcrt.lib' with no problem. Below are some more thoughts on this DLL problem.
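For those who just want the idea, the substitution can be sketched as follows. This is a minimal re-creation written for this page, not the fixmf.cgi from the download; the swap_runtime name is mine, and it assumes the generated file is called Makefile in the current folder.

```perl
use strict;
use warnings;

# Replace every msvcrt.lib with libcmt.lib in the text of a Makefile.
# Returns the new text and the number of substitutions made.
sub swap_runtime {
    my ($text) = @_;
    my $count = ($text =~ s/\bmsvcrt\.lib\b/libcmt.lib/gi);
    return ($text, $count || 0);
}

my $file = 'Makefile';
if (open my $in, '<', $file) {
    my $text = do { local $/; <$in> };   # slurp the whole file
    close $in;
    $text = '' unless defined $text;
    my ($new, $n) = swap_runtime($text);
    open my $out, '>', $file or die "can not write $file: $!";
    print {$out} $new;
    close $out;
    print "$file: $n substitution(s)\n";
} else {
    warn "can not read $file: $!\n";
}
```

Run it after 'perl Makefile.PL' and before 'nmake', since Makefile.PL regenerates the Makefile and throws the change away.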

It seems this value, 'msvcrt.lib', comes from lib\, generated when you install perl. I was not able to find a way to modify Makefile.PL to do this substitution, and thus render my 'fix' unnecessary. But maybe others, who know the perl ExtUtils::MakeMaker and ExtUtils::Liblist better, will know how, and I would love to hear from you.

Addition to and

It seems the current HTML Tidy outputs an additional message of the form -

line 4 column 9 - Info: <head> previously mentioned

To do this I added another category of message type to, 3 = Info -

 my %strings = (
     1 => 'Warning',
     2 => 'Error',
     3 => 'Info',
 );

And then added a constant and handler for this type of message in

use constant TIDY_INFO => 3;

 my $message;
 if ( $line =~ /^line (\d+) column (\d+) - (Warning|Error|Info): (.+)$/ ) {
   my ($line, $col, $type, $text) = ($1, $2, $3, $4);
   $type = ($type eq 'Warning') ? TIDY_WARNING
         : ($type eq 'Error')   ? TIDY_ERROR
         :                        TIDY_INFO;
   $message = HTML::Tidy::Message->new( $filename, $type, $line, $col, $text );
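To try the pattern without the rest of the module, here is a small stand-alone sketch. The constant values follow the 1, 2, 3 numbering shown above; parse_message and %type_for are names made up for this example and are not part of HTML::Tidy.

```perl
use strict;
use warnings;

use constant TIDY_WARNING => 1;
use constant TIDY_ERROR   => 2;
use constant TIDY_INFO    => 3;

# Map the word in the libtidy output to its numeric message type.
my %type_for = (Warning => TIDY_WARNING, Error => TIDY_ERROR, Info => TIDY_INFO);

# Split one line of libtidy output into (line, column, type code, text).
# Returns an empty list when the line is not a positioned message.
sub parse_message {
    my ($raw) = @_;
    return unless $raw =~ /^line (\d+) column (\d+) - (Warning|Error|Info): (.+)$/;
    return ($1, $2, $type_for{$3}, $4);
}

my @parts = parse_message('line 4 column 9 - Info: <head> previously mentioned');
print join('|', @parts), "\n";   # prints 4|9|3|<head> previously mentioned
```

A lookup table does the same job as the chained ?: above, and a further message type would only need one more entry.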

Of course, I also took the opportunity to increment the version number

our $VERSION = '1.09';

libtidy source

I also made a few very minor modifications to the HTML Tidy source, which is why I put it in a folder called tidy4p5 ... but these were mainly to remove a few remaining MSVC8 compiler warnings.

In Tidy.c, I added a return "uncased", even after an abort();, just to shut the compiler up.

I moved some pragma warning( disable:<NUMBER> ) to near the top of platform.h, BEFORE any 'includes' have happened, and added 4127 - conditional expression is constant, and 4090 - 'function' : different 'const' qualifiers to the list existing.

And finally, also in platform.h, encased some re-defines in an #ifndef _INC_PERL_XSUB_H, since these are also re-defined in the perl XSUB.h, which gets included before platform.h ...




(a) If you already have the libtidy HTML Tidy source, then you only need the tidy4p5.diff.txt to patch that source.
(b) If you want to try HTML::Tidy without building the HTML Tidy libraries, then you can use the libtidy.dll in, copied to a folder given in your PATH environment variable.
(c) contains the full modified source of HTML Tidy, including the binary DLL, both Debug and Release, and the MSVC8 solution files.

     download           description                                         MD5                                size
  a  tidy4p5.diff.txt   patch file for 18 June, 2008 source                 text file only                     2,527
  b                     libtidy.dll - unzip and copy to a PATH              60c79cf0d71e9cc5d0f22b614fe33e61   104,247
  c                     Full source, including binaries, and build files    10942beac428de3de848e74edc805c8e   2,043,374
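The MD5 values can be checked after download with the standard Digest::MD5 module. This is a minimal sketch; file_md5 is a hypothetical helper name, and the file to check is whatever name you pass on the command line.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return the MD5 of a file as a lowercase hex string.
sub file_md5 {
    my ($path) = @_;
    open my $fh, '<', $path or die "can not open $path: $!";
    binmode $fh;                   # essential on Win32 for binary files
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# The file name comes from the command line; compare the printed value
# by eye with the MD5 column of the table above.
if (@ARGV) {
    print file_md5($ARGV[0]), "  $ARGV[0]\n";
}
```

The binmode call matters in Win32 - without it perl translates line endings while reading, and the digest of a zip or DLL will not match.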


(a) If you already have the HTML-Tidy-1.08 source, then you only need the HTML-Tidy-1.09.diff.txt to patch that source, namely the files :-
Makefile.PL - to adjust the location of HTML Tidy (libtidy.lib and includes); - adjust version, and add TIDY_INFO; - add message type 'Info';
t\simple.t - increase expected message count; and
t\too-many-titles.t - account for the new 'Info:' message.
fixmf.cgi - the patch also contains the fixmf.cgi perl script to 'patch' the generated Makefile.
(b) If you want my full modified 1.09 source, to compare it with other source versions, then download the

     download                  description                  MD5                                size
  a  HTML-Tidy-1.09.diff.txt   patch file for 1.08 source   text file only                     6,370
  b                            full source                  2d3ca98093cfb7e12b705d31dc956c14   41,509

Happy Perl 'Tidying' ;=))


Ruminations on this Microsoft DLL problem

The Microsoft Help and Support page seems clear in its explanation in Section 1 - There are three(3) forms of the C run-time library available -
1. LIBC.LIB - a statically linked library for single-threaded programs.
2. LIBCMT.LIB - a statically linked library that supports multithreaded programs.
3. CRTDLL.LIB - an import library for CRTDLL.DLL that also supports multithreaded programs.

That seems clear, but then as you read on, there is a 'however' - the CRT in a DLL is named MSVCRT.LIB. The DLL is redistributable. Its name depends on the version of VC++ (i.e. MSVCRT10.DLL or MSVCRT20.DLL). This is also an indication of the age of this article, in that my MSVC8 uses MSVCR80.DLL!

It then goes on to make points about - Problems Encountered When Using Multiple CRT Libraries and Mixing Library Types, and ends with the simple phrase - If there is a possibility that the DLL will be called by multithreaded programs, be sure to link it with one of the libraries that support multithreaded programs (LIBCMT.LIB, CRTDLL.LIB or MSVCRT.LIB).

This seems to suggest MSVCRT.LIB (DLL) and LIBCMT.LIB (STATIC) should be somewhat interchangeable. So this seems to throw no particular light on why the Tidy.Dll fails using msvcrt.lib and works with libcmt.lib !!! Maybe I should get around to trying crtdll.lib as well ;=))


Geoff. 31 July, 2008.
