Tidy Update Sat., 16 Sep. 2006.


Tidy cvs now has native WIN32 file mapping enabled by default. While this does not produce a speed gain for small files, it is no slower either, and it gives a reduction of more than 20% in the processing time for larger files.

Below is a full 'diff' file that can be applied to the current cvs (circa 16 September, 2006) ... but first some details about the changes to get a clean compile ...

I decided it was time for a personal tidy update ;=)) I added this 'feature' to my own version of tidy some considerable time ago, but this brings it into the mainstream of development, that is, into CVS, since there is an in-progress parallel development of this feature under unix/linux ...
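For readers who have not met file mapping before, here is a minimal sketch of the idea, with a WIN32 branch and a unix/linux branch side by side. It is NOT the code in tidy's mappedio.c; the names MappedFile, map_file and unmap_file are made up for this illustration, and files of 4 GB or more are simply refused to keep it short.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

/* Hypothetical helper type, not part of tidy: a read-only view of a file. */
typedef struct {
    const unsigned char *data; /* first byte of the view, NULL if unmapped */
    size_t size;               /* number of bytes in the view */
} MappedFile;

/* Map the whole file read-only. Returns 1 on success, 0 on failure.
   An empty file counts as a failure, since neither API can map 0 bytes. */
int map_file(const char *path, MappedFile *mf)
{
    assert(path != NULL && mf != NULL);
    mf->data = NULL;
    mf->size = 0;
#ifdef _WIN32
    {
        HANDLE file, mapping;
        DWORD high = 0, low;
        void *view;

        file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (file == INVALID_HANDLE_VALUE) return 0;
        low = GetFileSize(file, &high);
        if ((low == INVALID_FILE_SIZE && GetLastError() != NO_ERROR) ||
            high != 0 || low == 0) {       /* error, >= 4 GB, or empty */
            CloseHandle(file);
            return 0;
        }
        mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
        CloseHandle(file);     /* the mapping object keeps the file open */
        if (mapping == NULL) return 0;
        view = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
        CloseHandle(mapping);  /* the view keeps the mapping object alive */
        if (view == NULL) return 0;
        mf->data = (const unsigned char *)view;
        mf->size = (size_t)low;
    }
#else
    {
        struct stat st;
        void *view;
        int fd = open(path, O_RDONLY);

        if (fd < 0) return 0;
        if (fstat(fd, &st) != 0 || st.st_size <= 0) {
            close(fd);
            return 0;
        }
        view = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);             /* the mapping stays valid after close() */
        if (view == MAP_FAILED) return 0;
        mf->data = (const unsigned char *)view;
        mf->size = (size_t)st.st_size;
    }
#endif
    return 1;
}

/* Release the view and reset the struct; safe to call on an unmapped one. */
void unmap_file(MappedFile *mf)
{
    assert(mf != NULL);
    if (mf->data != NULL) {
#ifdef _WIN32
        UnmapViewOfFile(mf->data);
#else
        munmap((void *)mf->data, mf->size);
#endif
    }
    mf->data = NULL;
    mf->size = 0;
}
```

Once mapped, the file contents are read through mf.data like an ordinary byte array, with the operating system paging the file in on demand, which is presumably where the gain on big files comes from.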

Also, I decided to try out trusty old MSVC 6 (Microsoft Visual C++ 6.0, circa 1998). Of course, the new module, mappedio.c, had to be added to the source file lists of both the tidylib and tidydll projects, that is, the tidy dsp files had to be updated. And, in the case of the static library, tidylib, the 'Disable language extensions' option under Project Settings -> C/C++ -> Customize had to be unchecked.

This latter option changes the compiler command line, replacing the /Za with /Ze. This is needed to allow mappedio.c to include <windows.h>, which, sadly, is NOT always exactly ANSI C compliant ...

I even had to add -

#ifdef   WIN32
#if _MSC_VER < 1700
#pragma warning( disable : 4115 ) /* named type definition in parentheses */
#endif /* #if _MSC_VER < 1700 */
#endif /* #ifdef WIN32 */

to the head of mappedio.c, to disable another pesky compiler 'warning' ... again, from one of the 'windows' headers that gets included when <windows.h> is included ... perhaps it is clear that MS engineers are not encouraged, or are not given the time, to try a fully ANSI C compliant compile, and thus to keep these headers in 'shape' ...

This is not needed for MSVC7.1 upwards, since rather than switch from /Za to /Ze, the option is totally removed from the compiler command line ...

There were a number of other, relatively minor changes, to allow the compile under MSVC6 ... it appears GetFileSizeEx(...) was not added to the runtime libraries until later, although the MSDN 2001 documentation states it 'should' be available in WIN98 ... and the use of the LL suffix on numbers is not supported in this compiler, but I64 is ...
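As a rough sketch of how those two MSVC6 gaps can be worked around (this is not the code from the diff, and the names i64, I64_C and make_size64 are made up here): pick the 64-bit type and literal suffix per compiler, and build the 64-bit size from the two 32-bit halves that the older GetFileSize(...) reports, which returns the low half and stores the high half through its second argument.

```c
#include <assert.h>

#ifdef _MSC_VER
/* Every MSVC version, including 6.0, knows __int64 and the i64 suffix. */
typedef __int64 i64;
#define I64_C(n) n##i64
#else
/* Other compilers: the 'long long' type and the LL suffix. */
typedef long long i64;
#define I64_C(n) n##LL
#endif

/* Combine the high and low 32-bit halves of a file size into one value.
   'high' is limited to 31 bits so the result stays a positive i64. */
i64 make_size64(unsigned long high, unsigned long low)
{
    assert(high <= 0x7FFFFFFFUL && low <= 0xFFFFFFFFUL);
    return ((i64)high << 32) | (i64)low;
}
```

For example, make_size64(1, 0) gives I64_C(4294967296), that is, exactly 4 GB.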

There were a number of other minor fixes to get these 'mapped' functions implemented for WIN32 ... the 'static' had to be removed from the function -

StreamIn* initStreamIn( TidyDocImpl* doc, int encoding )

and, of course, its name had to be encased in the new tidy definition for 'public' functions that are really meant to be 'private', namely the TY_(function name) definition, like -

StreamIn* TY_(initStreamIn)( TidyDocImpl* doc, int encoding )

I also 'elevated' the function to streamio.h, so it can be called from mappedio.c ...

There is a slightly adverse effect of using this new TY_(function) encasing, and that is that you have to be aware that prvTidy is being prepended to the function name. Thus the compiler reports a warning like -

...\mappedio.c(257) : warning C4013: 'prvTidyinitStreamIn' undefined;
  assuming extern returning int 

But of course, in doing a SEARCH for where this function is declared, you must only search for 'initStreamIn' ... as stated, very minor inconvenience, to gain a unique function name ...
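To make the renaming concrete, here is a small sketch of how such a prefixing macro can be built with the preprocessor's ## operator. The prvTidy prefix and the TY_ name come from tidy, but the exact definition in the tidy headers may differ; DEMO_APPEND, initDemo and call_demo are made up for this illustration.

```c
#include <assert.h>

/* Two-step form: the argument of TY_ is macro-expanded before ## pastes it. */
#define DEMO_APPEND(a, b) a##b
#define TY_(name) DEMO_APPEND(prvTidy, name)

/* A prototype like this, placed in a shared header, is what avoids the
   C4013 'undefined; assuming extern returning int' warning in callers. */
int TY_(initDemo)(int encoding);

/* Written as 'initDemo' in the source, compiled as 'prvTidyinitDemo'. */
int TY_(initDemo)(int encoding)
{
    assert(encoding >= 0);
    return encoding + 1;
}

/* Callers go through the same macro, so they never spell out the prefix. */
int call_demo(int encoding)
{
    return TY_(initDemo)(encoding);
}
```

The compiler and linker only ever see prvTidyinitDemo, which is why the warning text shows the prefixed name while the source only contains the short one.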

I presume this change was brought about by the fact that certain people had reported 'function name conflicts' emitted when using the static library in other projects, and this definition certainly 'screws-up' the function names ;=)) but it in no way rivals the function name embellishment done in C++, where -

"int expand_wilds(void)" becomes "?expand_wilds@@YAHXZ", or
"int process_args(int,char * *)" becomes "?process_args@@YAHHQAPAD@Z" ;=))

So, remembering to remove some letters from the beginning of a tidy function before searching for it seems trivial in comparison ;=))

Finally, I had to add a few headers to mappedio.c to get a 'clean' compile -

+ #include <errno.h>
#include "forward.h"
#include "mappedio.h"
+ #include "streamio.h"
+ #include "platform.h"
+ #include "tidy-int.h"
+ #include "message.h"

As usual, in the process of finding a smooth compile, I may have added more than is needed, and it always seems too difficult to go back and spend even more time in an 'elimination' process, especially since (a) all tidy headers are 'protected' against multiple includes of the same file, and (b) increases in disk speed and compiler processing mean that there is no significant loss of time ...

As mentioned, attached below is a full 'diff' file that can be applied to the current cvs (16 September, 2006) covering ALL these changes ... including the tidy MSVC6 DSP files ... the modified MSVC6 files can then be used to generate MSVC7.1, or MSVC8, solution files ... and a ZIP file containing a release configuration WIN32 EXE file, so others can try it ...

The zip file also contains runtimer.exe, which can be easily used to check if this version is faster ... As mentioned, there is no real change in the time to process small files, but some gain when batch processing whole groups of files ... and a considerable gain when processing BIG files ...

Hope this helps ...



tidycvs6.zip: contains tidycvs6.exe, and runtimer.exe ...

download:     MD5-sum:                          release:           build:
tidycvs6.zip  b50c853e1b76d508dd4131bd62eb2759  12 September 2006  16 September 2006
Older version, without memory mapping ...
tidycvs.zip   752bbba09a5af2a9d96fbf0d9a944c69  14 February 2006   25 February 2006

The difference between the above build and cvs as of 16 September, 2006, is given in this 'text' file.


EOF - Tidy-45.doc

PS: This file has been run through tidycvs, with no warnings, or errors, thus proudly 'sports' a tidy button, as well as my now usual, W3C HTML 4.01 logo ... maybe one day someone will be kind enough to put tidy 'online' to present a 'report' of its findings on a file, like the W3C validation service - http://validator.w3.org/ - it makes it so much easier to 'conform' when the 'testing' is made ubiquitous ;=)).

