The GNU tar Utility

Basic Usage

2009-01-28: Basic tar usage:

CREATE: Assume you have a folder called 'test02', and you want to create a compressed archive of this folder -
$ tar -caf test02.tgz test02
c=create, a=auto-select the compression from the archive file suffix, f=archive file name, followed by the directory to archive

VIEW: To view the contents of this compressed archive -
$ tar -tvzf test02.tgz
t=list, v=verbose, z=uncompress using gzip, f=file to view

EXTRACT: To extract the contents of this compressed archive -
$ tar -xvzf test02.tgz
x=extract, v=verbose, z=through gzip, f=file name


Preamble

2011-08-02: Minor update to VERSION "1.21.07", to fix double use of 'gzip'. See tar?07.zip below.

2009-09-08: Minor update to VERSION "1.21.06" to remove erroneous error issued for directories. See tar?06.zips below.

While tape drives are not used so frequently these days, the 'tar' utility is still very much in use. Many source packages, and other things, are shipped from the unix/linux environment as a 'tar' file, usually then compressed by, say, 'gzip' ...

I had been using a GNU WIN32 port of version 1.12, circa 2001 (see GNU Win32 port), and was very happy with it until I found a 'tar' file it would not expand properly. So I set about porting the 2008 1.20 source, from [ http://www.gnu.org/software/tar/ ], to WIN32, using MSVC8 (Microsoft Visual C++ 2005, Express Edition).

Of course, it does not do _ALL_ that it does in the *nix world, but does do the most important things ... that is :-

create (-c)
Given a folder name followed by a wild card, it will 'create' a tar archive of all the files in the folder, and sub-folders.
extract (-x)
It expands 'tar' archives back into the original folder structure.
list (-t)
It lists the files contained in the archive.
gzip (-z)
In addition, it will deal with 'gzip' compressed archives if, and only if, you already have gzip available in the PATH.

It does not deal with 'remote' archives, and will default to 'user' and 'group' for the owner and group names, since this information is not available from the WIN32 NTFS file system. But it will store and restore the date of the files.

The porting was made somewhat easier by the fact that the source contains templates for a number of header files not present in windows. For example, unistd.h is in the source as unistd.in.h ... with a little manipulation these files were put to good use.

Some specific things HAD to be written for WIN32, like 'opendir()', 'readdir()', etc, which are not present in the WIN32 runtime libraries. An older porting effort had already provided some of these under an MSDOS switch, and most of that still works under WIN32 ...

Also the *nix command shell automatically expands wild card file names in the shell, presenting the found list to the program. This had to be 'emulated' in WIN32 ...
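As a rough illustration of the matching half of such an emulation, here is a minimal sketch of a wild card matcher. It is NOT the code from winport.c; the name wild_match() is hypothetical, and a real WIN32 version would also have to enumerate the directory entries and would probably want to ignore case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the port.
   Returns 1 if 'name' matches 'pattern', else 0.
   '*' matches any run of characters (including none),
   '?' matches exactly one character.
   The comparison is case-sensitive. */
int wild_match(const char *pattern, const char *name)
{
    assert(pattern != NULL && name != NULL);
    while (*pattern) {
        if (*pattern == '*') {
            /* collapse consecutive '*' into one */
            while (*pattern == '*')
                pattern++;
            if (*pattern == '\0')
                return 1;   /* a trailing '*' matches the rest */
            /* try the remaining pattern against every tail of name */
            for (; *name; name++) {
                if (wild_match(pattern, name))
                    return 1;
            }
            return 0;
        }
        if (*name == '\0')
            return 0;       /* pattern still needs a character */
        if (*pattern != '?' && *pattern != *name)
            return 0;       /* literal mismatch */
        pattern++;
        name++;
    }
    /* pattern exhausted: match only if name is exhausted too */
    return *name == '\0';
}
```

Each directory entry would be passed through wild_match(), and the names that match added to the argument list before the normal 'tar' processing starts.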

I used MSVC8 (Visual Studio C++ 2005 Express Edition), and there are still LOTS of warnings that should be looked into. But it compiles and links without error. I used the /MT (multithreaded static) runtime, but think it would also compile using the /MD (multithreaded DLL) runtime.

The source consisted of three projects. A libtar.lib static library, and a tar.exe executable, which I kept as two components, and a testing application, genfile.exe. I started with a MSVC6 DSW/DSP build file set, since these can be easily manually created, and allowed MSVC8 to convert these to its SLN/VCPROJ files ...

I started with the MSVC6 DSW/DSP because I have a Perl script that does quite well in reading the *nix Makefile.am files, and building the resultant DSW/DSP file ...

In the end the 'difference' is relatively small, consisting mainly of using a _MSC_VER switch to provide alternate code, as most new pieces of code have been placed in a winport.c file ... The biggest item was developing a hand crafted config.h to simulate what is done by the automake tools.

These two files, together with the 'converted' *nix headers, and the build files, have been placed in the sub-folder called Win32 ... the actual build should be as simple as loading either the tar.dsw into any version of MSVC6 onwards, or the tar.sln file into MSVC8 or onwards, and building ...

As stated, no big effort has been made to remove the many 'warnings' ... in fact a lot of pragma warning (disable:????) have been added to config.h to suppress many, but it should build without errors. Over time these suppressed warnings should be addressed ...


Testing

Then, working through some of the *.at test suites in 'tests', and creating a Win32/tests/*.bat file equivalent, I think I have got rid of the most important bugs that come from the difference in the OSs ... Of course I was helped in this by having Ubuntu linux available to compile the source, and do the testing in, as a comparison ...

The MOST trying effort came from testing 'sparse' files. Not all windows file systems support 'sparse' files, but the now most common, NTFS, does, though it required LOTS of code changes. Basically it all starts with 'stat' and 'seek'. The standard versions only yield a 32-bit 'st_size' and offset, thus if you have an 8GB sparse file, then these bomb out ;=((

So I added some defines that gave me the 64-bit version, in WIN32. Of course if you are lucky enough to have a 64-bit OS, then this would be MUCH easier, but I wanted it to work in my 32-bit XP and Vista, and still handle BIG files. That is files greater than a 32-bit unsigned int can hold ...

Maximum unsigned int = 4,294,967,295 bytes. About 4GB.
Maximum unsigned __int64 = 18,446,744,073,709,551,615 bytes. About 16 Exa-Bytes
A BIG difference ;=))

/* ********************************************
   use 64-bit seek and stat
   ******************************************** */
#define struct_stat struct __stat64
#define  FDSEEK   _lseeki64
#define  FNSTAT   _stat64
#define  FDSTAT   _fstat64
#define  lstat    FNSTAT

Also in unix, tar 'knows' it is a 'sparse' file from checking the results of the 'stat', but it seems that is not possible in WIN32, or at least I was not able to get the ST_IS_SPARSE() macro to work - see system.h for details, so had to find a WIN32 way, and found it in the code - it is an attribute of the file ...

int win_is_sparse_file( char * file ) {
   int iret = 0;
   HANDLE hFile = CreateFile(file, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
   BY_HANDLE_FILE_INFORMATION bhfi;
   memset( &bhfi, 0, sizeof(BY_HANDLE_FILE_INFORMATION) );
   if( VFH(hFile) ) {
      if( GetFileInformationByHandle(hFile, &bhfi) ) {
         if( bhfi.dwFileAttributes & FILE_ATTRIBUTE_SPARSE_FILE )
            iret = 1;
      }
      CloseHandle(hFile);
   }
   return iret;
}

And while 'stat' returned the logical file size, I used the following to get the actual size on disk, which should be less than the logical size if there are sparse regions in the file, if needed -

__int64 win_get_sparse_file_size( char * file_name ) {
   LARGE_INTEGER li;
   __int64 i64;
   li.LowPart = GetCompressedFileSize( file_name, // LPCTSTR lpFileName,
      (LPDWORD)&li.HighPart );  // LPDWORD lpFileSizeHigh
   i64 = li.QuadPart;
   if( li.LowPart == INVALID_FILE_SIZE )
   {
      /* potential error */
      if( GetLastError() != NO_ERROR )
         return -1;
   }
   return i64;
}
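The function above gets the size back as two 32-bit halves, and relies on the LARGE_INTEGER union to join them. The same joining can be sketched in portable C as below. The name combine_size() is an assumption for this example only.

```c
#include <stdint.h>

/* Hypothetical helper: join the low and high 32-bit halves of a
   size into one 64-bit value, as the LARGE_INTEGER union does. */
uint64_t combine_size(uint32_t low, uint32_t high)
{
    /* widen BEFORE shifting, else the high bits are shifted away */
    return ((uint64_t)high << 32) | low;
}
```

Note that a low half of 0xFFFFFFFF is a perfectly valid value here, which is why the code above must also check GetLastError() before treating INVALID_FILE_SIZE as a failure.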

The current tar does not use this, and instead just reads the file, block by block, and checks if each block is all zero. This works because the 'read' function is passed all zeros for the 'sparse' blank file regions. In fact this way 'tar' gets the minimal number of bytes it needs to keep - that is, the non-zero areas. This is slower by far for BIG (mainly empty) files, but gives a very compact archive, since it seems my XP uses 64K as the minimum non-sparse region.
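The block by block zero check described above can be sketched as follows. This is only an illustration of the idea, not the code from sparse.c; the names block_is_zero() and count_data_blocks() are hypothetical, and BLOCK_SIZE is set to 512, the 'tar' block size.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 512   /* the 'tar' block size */

/* Returns 1 if all 'len' bytes of 'buf' are zero, else 0. */
int block_is_zero(const char *buf, size_t len)
{
    size_t i;
    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}

/* Counts the blocks in 'buf' that hold at least one non-zero byte.
   Only these blocks would need to be kept in the archive.
   A final short block is checked over its actual length. */
size_t count_data_blocks(const char *buf, size_t len)
{
    size_t off, count = 0;
    for (off = 0; off < len; off += BLOCK_SIZE) {
        size_t n = (len - off < BLOCK_SIZE) ? (len - off) : BLOCK_SIZE;
        if (!block_is_zero(buf + off, n))
            count++;
    }
    return count;
}
```

Every byte still has to be read and compared, which is why this approach is slow on a large, mostly empty file.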

As to supporting 64-bits, initially it seemed I only needed to re-define, or push windows to define, 'off_t' and 'size_t' as 64-bit entities, but doing this produced a non-functional 'tar' executable. I think this is because windows uses these two types in structures, and lots of other places, and forcing them to 64-bits causes big problems. Of course this is probably what happens in WIN64 ...

So I needed to define an 'off64_t' and 'size64_t' and do the replacement in all the right places. THIS WAS A BIG JOB, and no doubt I have missed some ... but I could not think of any other way. It could not be done in a 'global' replacement, since there were many places in WIN32 where a 64-bit value was NOT appropriate, like for example file reads and writes, memory allocation, etc ...

In fact I had to create a specific error to handle an exit if some values turned out over the 4GB (maximum uint) range ... but this should not happen, since 'tar' normally only allocates, writes and reads in 'blocks' of 512 bytes ...
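A guarded conversion of this kind might look like the sketch below. The port's own error routine is not shown on this page, so narrow_to_u32() is a hypothetical stand-in that just reports failure to the caller instead of exiting.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: narrow a 64-bit value to 32 bits.
   Returns 0 and stores the result in *out when the value fits.
   Returns -1, leaving *out untouched, when it is over the 32-bit
   range or when 'out' is NULL. */
int narrow_to_u32(uint64_t value, uint32_t *out)
{
    if (out == NULL || value > UINT32_MAX)
        return -1;
    *out = (uint32_t)value;
    return 0;
}
```

The caller would use this wherever a 64-bit size has to be handed to a 32-bit read, write or allocation call, and would issue the error and exit on a -1 return.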

Extracting, and re-creating a 'sparse' file also presented a problem. The only way I found to 'set' the 'sparse' attributes was through the code, which I added to the genfile application, to be able to generate 'sparse' files -

   if ( !DeviceIoControl(handle, FSCTL_SET_SPARSE,
         NULL, 0, NULL, 0, &dwatts, NULL) )
      error (EXIT_FAILURE, GetLastError(),
         _("cannot set sparse '%s'"), file_name);

Now 'tar' uses the low level file descriptor returned from 'open', and this function requires the 'HANDLE' of the file, which are different ranges, but I found a neat way to get the HANDLE from the 'descriptor' -

HANDLE h = (HANDLE)_get_osfhandle (fd);

These two have been combined into two services added to winport.[c|h] -

int win_set_sparse_attribute( char * name, long h ) {
   DWORD dwatts;
   if ( !VFH((HANDLE)h) ||
      !DeviceIoControl((HANDLE)h, FSCTL_SET_SPARSE,
       NULL, 0, NULL, 0, &dwatts, NULL) ) {
      error (0, 0, _("cannot set sparse attribute on '%s'"), name);
      return 1;
   }
   return 0;
}
int win_set_sparse_attribute_on_descriptor( char * name, int fd ) {
   long h = _get_osfhandle (fd);
   if( VFH((HANDLE)h) )
      return win_set_sparse_attribute( name, h );
   return 1;
}

So I can add the 'sparse' attribute ... and in sparse.c -> sparse_extract_file(), I call the above function to set the 'sparse' attribute ... and it works ;=)) ... As in the modified genfile, 'seek' is used to get to the next significant section before writing valid data, and windows does the sparse mapping ... and this change massively sped up the 'sparse' extraction process. But the 'tar' of a large 'sparse' file still takes a lot of time, due to the fact 'tar' reads every byte, in 'blocks', and checks if it is all zero, as mentioned above ...
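The 'seek then write' idea can be sketched in portable C as below. This is an assumption-laden illustration, not the code in sparse.c: the data_region structure and write_regions() are invented for the example, and it uses plain fseek() with a 'long' offset, so unlike the port (which uses the 64-bit seek) it is only good for small files.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical description of one run of real data in the file. */
struct data_region {
    long offset;         /* where the data starts in the file */
    const char *bytes;   /* the data itself */
    size_t length;       /* number of bytes of data */
};

/* Writes only the data regions, seeking over the gaps between them.
   Nothing is written for the gaps, so the file system is free to
   leave them unallocated if the file is marked 'sparse'.
   Returns 0 on success, -1 on any seek or write failure. */
int write_regions(FILE *fp, const struct data_region *regions, size_t count)
{
    size_t i;
    if (fp == NULL || regions == NULL)
        return -1;
    for (i = 0; i < count; i++) {
        /* jump straight to the next significant section */
        if (fseek(fp, regions[i].offset, SEEK_SET) != 0)
            return -1;
        if (fwrite(regions[i].bytes, 1, regions[i].length, fp)
                != regions[i].length)
            return -1;
    }
    return 0;
}
```

A file that ends in a gap would need one extra step to set the final length, which this sketch leaves out.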

I have now re-worked many of the tests into batch files in the folder 'Win32/tests' - the 'setup.bat' needs to be manually adjusted to suit your environment, and the tools you have available. I have several personal utilities, and of course some from the unix utilities - see GNU Win32 port - and some other windows utilities can be used - see setup-ms.bat for examples ...

The set of batch files, in Win32/tests, includes :-

append.bat         append01.bat       append02.bat       blank.bat          chtype.bat
comprec.bat        delete01.bat       delete02.bat       delete03.bat       delete04.bat
delete05.bat       exclude.bat        extrac01.bat       extrac02.bat       extrac03.bat
extrac04.bat       extrac05.bat       extrac06.bat       extrac07.bat       grow.bat
gzip.bat           ignfail.bat        incr01.bat         mdd.bat            setup.bat
setup-alt.bat      setup-ms.bat       shortrec.bat       shortupd.bat       sparse01.bat
sparse02.bat       sparse03.bat       spmvp00.bat        TAR_MVP_TEST.bat   T-empty.bat
T-null.bat         truncate.bat       update.bat         verbose.bat        version.bat
volsize.bat        volume.bat

As mentioned, setup.bat is used by all, and setup-ms.bat, setup-alt.bat are other samples. Also mdd.bat is used to both create a directory and change into the directory. Also TAR_MVP_TEST.bat is only meant to be called by others. And the blank.bat is only a template used to create new batch files ...


Downloads

As usual, take care downloading binary executables from the web!

Date        Link        Size (bytes)  MD5
2011/08/02  tare07.zip  281,355       384ab2820b215e4c6f4e13cac54d4286
2011/08/02  tar-07.zip  2,109,577     dd8d81375fe62a9ea22cad406bef22b7
2011/08/02  tarw07.zip  400,642       61d1a35b6d4a144ed0e0c344695be18e

Where:

tareNN.zip Contains a WIN32 executable. It is all you need to try it, but take care! It has not been fully debugged in every situation, and for 'gz' files, requires that you have 'gzip' in your PATH.
tar-NN.zip Contains the FULL source, including the following.
tarwNN.zip Contains just the 'Win32' folder items.

Older versions

link description MD5
2009-09-08: Version 1.21.06 for WIN32 - minor update to remove erroneous error message for directories.
tare06.zip WIN32 exe - this contains a tar.exe binary, for testing, version 1.21.06. size 187,494 bytes. 9b7af9ef8b3f7d858e2cec0c86c1980c
tarw06.zip WIN32 src - this contains the contents of the Win32 folder, and includes a tar-06.diff.txt file to patch the tar-1.20.tar.gz source. size 400,293 bytes. 1714a16e88534e5adaf7f96d65d26ede
tar-06.zip Full src - The whole shebang in one zip file, including my modified source, only excluding the tests, (*.at), language files, (*.po and *.gmo), and autoconf files (*.m4), none of which are used in this WIN32 port. size 2,109,115 bytes. 3d40d73247f9896078d92c10b7109808
Version 1.21.05 for WIN32, lots of testing; supports file max. > 4GB, and 'sparse' extract.
tare05.zip WIN32 exe - this contains a tar.exe binary, for testing, version 1.21.05. 7aaa17d6ee30d9cb67975698e8cebeac
tarw05.zip WIN32 src - this contains the contents of the Win32 folder, and includes a tar-05.diff.txt file to patch the tar-1.20.tar.gz source. 1d9e1230c59f18ff41c3da8e149ba3a4
tar-05.zip Full src - The whole shebang in one zip file, including my modified source, only excluding the tests, (*.at), language files, (*.po and *.gmo), and autoconf files (*.m4), none of which are used in this WIN32 port. 546619437fd13b3a8f5fd7db70bea01f
This is version 1.21.02 for WIN32, very little testing. BETA VERSION - file max. 4GB, no 'sparse' extract.
tare02.zip WIN32 exe - this contains a tar.exe binary, for testing, version 1.21.02. e11b3e1c96480cf7e4eaddd39c1e9042
tarw02.zip WIN32 src - this contains the contents of the Win32 folder, and includes a tar-1.21.02.patch to enable a patch to be applied to the tar-1.20.tar.gz source. b3075a05c9913b8807147f83f121a607
tar-02.zip Full src - The whole shebang in one zip file, including my modified source, only excluding the tests, (*.at), language files, (*.po and *.gmo), and autoconf files (*.m4), none of which are used in this WIN32 port. 7a5aace79fa2e5c15988c17284952f8b
This is like version 0.09 for WIN32, and only does the most basic functions! ALPHA VERSION ONLY
tare01.zip WIN32 exe - this contains a tar.exe binary, for testing, version 1.21.beta. 6aa6793f41166c0875a2baa7f7cce6c8
tarw01.zip WIN32 src - this contains the contents of the Win32 folder, and includes a tar-01.diff.txt to enable a patch to be applied to the tar-1.20.tar.gz source. 9a0b1d2c8050f2fca35f1e29abc7addb
tar-01.zip Full src - The whole shebang in one zip file, including my modified source, only excluding the tests, (*.at), language files, (*.po and *.gmo), and autoconf files (*.m4), none of which are used in this WIN32 port. 6c14c4685636f959bc62ba3742ea3a48

Geoff.
Update: 2 August, 2011
Update: 8 September, 2009.
Thursday, August 14, 2008.

EOF - original from tar-01.doc

