# Libarchive - Multi-format archive and compression [libarchive](https://www.libarchive.org/) is a multi-format archive and compression libarary. This module provides a very composable high level interface to the library for reading, processing and writing archives of files. See [PHLPM Talk](http://curt.tilmes.org/2019-PHLPM-Libarchive) for some more description and examples of usage (mostly duplicates what is here). ## Simple, streaming archive reading use Libarchive::Simple; .put for archive-read 'myfile.tar.gz'; # Print listing .extract for archive-read $*IN; # Extract all files # Print a custom listing, using field accessors for archive-read($*IN) { put "dir: {.pathname}" if .is-dir; put "file: {.pathname} {.human-size}" if .is-file; } for archive-read('this.tar.gz') { .content.put if .pathname eq 'README' } archive-read('this.zip'.IO) # Process Seq in normal ways .grep({ .pathname ~~ /README/ }) # with for, grep, map, etc. .map: { .extract :verbose }; # print listing to STDERR as extract # Many extract options to customize, either in object or extract() for archive-read('dvd.iso', :extract-no-overwrite, destpath => '/somewhere') { next unless .pathname eq 'the-file-i-want'; .extract(perm => 0o600); } Can read from filename, IO::Path, IO::Handle, Memory Buf, Supply of Blobs, Channel of Blobs `archive-read()` is just short-hand for `Libarchive::Read.new()` ## Simple, streaming archive writing use Libarchive::Simple; with archive-write('foo.zip') { .add: 'afile'; # Add a file from the filesystem to the archive .add: 'somedir'; # Add a directory, but not contents .add: dir('somedir'); # Add every file in a directory .add: 'thisdir', dir('thisdir'); # Add directory and contents .write: 'afile', "Some content\n"; # Create a file from a Str .write: 'bfile', buf8.new(1,2,3,4); # or from a Blob .write: 'bigrandomfile', # or an IO::Handle '/dev/urandom'.IO.open(:bin), size => 100_000; # override size .mkdir: 'adir'; # Create a directory .mkdir: 'bdir', perm => 0o700; # Override perm .write: 'cdir'.IO.add('another'), "this\n"; # IO::Path is fine too .symlink: 'linked', 'adir/anotherfile'; # Create a symlink .symlink: 'anotherlink' => 'adir/yetanother'; # Pair symlink is ok .close; # Always close! } Can write to filename, IO::Path, IO::Handle, Memory Buf, Supplier of Blobs, Channel of Blobs. Must specify format (optionally filters) unless filename: archive-write($*OUT, format => 'zip'); # Send zip file to STDOUT `archive-write()` is just shorthand for `Libarchive::Write.new()` ## Simple, Slurping all content into memory: use Libarchive::Simple; my $archive := archive-slurp 'this.tar'; say $archive; # Print listing put $archive; # content of a file $archive.content = "Change content\n"; # change existing file $archive:delete; # Remove file $archive.spurt: 'foo.zip'; # Dump archive back to disk `archive-slurp()` is just shorthand for `Libarchive::Archive.new()` It creates an object that is both `Iterable` just like `archive-read`, and also `Associative`, including all the data/content from the archive instead of reading it out of the stream as it goes, so you can use hyper processing in parallel without worry. The keys are paths, not just filenames. If the archive has two files with exactly the same path, you'll just get one. (Why would you do that anyway?) ## Processing Archives in a pipeline `Libarchive::Read` (and `archive-read`) produces a Seq of `Libarchive::Entry`s. You can use the `.copy` method to copy them into an `Libarchive::Write`. For example, you could hook up a reader to a writer to convert a tar file to a zip file (or ISO or whatever): use Libarchive::Simple; with archive-write($*OUT, format => 'zip') { .copy: archive-read($*IN, format => 'tar') .close; } Or even process the contents in various ways as they go: use Libarchive::Simple; with archive-write($*OUT, format => 'zip') { .write: 'NEWREADME', "This is my README\n"; # Add some extra files .write: 'LICENSE', "Special license file\n"; .copy: archive-read($*IN, format => 'tar') .grep({ .pathname ~~ /good/}) # Only pass good files .map({ .pathname(.pathname.uc) }) # Uppercase filenames .map({ .uname('fred').perm(0o600)}); # Change owner and perm .close; } When streaming, make sure you keep the sequence lazy, otherwise the stream with the data will be past before the copy occurs. If you want random access, use `Libarchive::Archive` or `archive-slurp`. # Filtering without an Archive, format 'raw' `libarchive` supports a special format 'raw' that works on a single virtual file, passing it through the specified filters. This can be used to `compress`, `gzip`, `bzip2` etc. The manual process is something like this: with archive-write($dest, format => 'raw', filter => 'gzip') { .write('ignore-filename', $source, size => ...); .close } or with archive-read($source, format => 'raw') { my $header = .read; # Read and ignore the archive header while my $buf = .read-data() { ...do something with $buf... } } These constructs have been packaged up into `Libarchive::Filter` with two subroutines `archive-encode` and `archive-decode`. Each take a `$source`, and a `$destination` that can be most of the normal things. `archive-encode`, of course, must include 1 or more filters to be useful. For example, you can read/write files: use Libarchive::Filter; archive-encode('Some content', 'file.gz', filter => 'gzip'); my $content = archive-decode('file.gz'); ... $content eq 'Some content'; or just use a memory buffer: use Libarchive::Filter; my $buf = archive-encode('Some content', filter => 'gzip'); ...encoded into $buf... my $content = archive-decode($buf); ...$content eq 'Some content' `archive-encode` sources can be anything that `archive-write` will write: content in a `Str` or `Buf`, or a filename `IO::Path`, an `IO::Handle`, a `Supply` or `Channel` of `Blob`s. `archive-encode` destinations can be anything that `archive-write` will produce: `Buf`, `IO::Handle`, `Supplier`, `Channel`, or a `Str` or `IO::Path` filename. `archive-decode` sources can be anything that `archive-read` will read: filename in a `Str` or `IO::Path`, `Blob`, `Supply`, `IO::Handle`, or `Channel`. `archive-decode` destinations can be `Blob`, `IO::Handle`, `IO::Path`, `Supplier`, `Channel`. If you don't set a destination, a `Str` with the content is returned. Note that the `Str` *into* `archive-encode` or *out* of `archive-decode` is the content itself, but `Str` *out* of `archive-encode` or *into* `archive-decode` are filenames. You can always use `IO::Path` for a filename. A number of shortcuts for various filters have also been defined: use Libarchive::Filter :gzip; my $buf = gzip('Some content'); my $content = gunzip($buf); These include: * `:gzip` -> `gzip()` and `gunzip()` * `:compress` -> `compress()` and `uncompress()` * `:bzip2` -> `bzip2()` and `bunzip2()` * `:lz4` -> `lz4()` and `unlz4()` * `:uuencode` -> `uuencode()` and `uudecode()` * `:lzma` -> `lzma()` and `unlzma()` You can also specify `use Libarchive::Filter :all` to get all the shortcut routines. These all take the same options that `archive-encode()` and `archive-decode()` do and go to/from files, IO::Handles, Supplies, Channels, etc. ## Formats and Filters Valid read formats: '7zip', 'ar', 'cab', 'cpio', 'empty', 'gnutar', 'iso9660', 'lha', 'mtree', 'rar', 'raw', 'tar', 'warc', 'xar', 'zip', 'zip-streamable', 'zip-seekable' Valid read filters: 'bzip2', 'compress', 'gzip', 'grzip', 'lrzip', 'lz4', 'lzip', 'lzma', 'lzop', 'none', 'rpm', 'uu', 'xz', 'zstd' You can specify a list of multiple formats/filters to consider if you want to limit which types you support. You can also specify 'all' for either format or filter, which is the default. Valid write formats: '7zip', 'ar', 'arbsd', 'argnu', 'arsvr4', 'bsdtar', 'cd9660', 'cpio', 'gnutar', 'iso', 'iso9660', 'mtree', 'mtree-classic', 'newc', 'odc', 'oldtar', 'pax', 'paxr', 'posix', 'raw', 'rpax', 'shar', 'shardump', 'ustar', 'v7tar', 'v7', 'warc', 'xar', 'zip' Valid write filters: 'b64encode', 'bzip2', 'compress', 'grzip', 'gzip', 'lrzip', 'lz4', 'lzip', 'lzma', 'lzop', 'uuencode', 'xz', 'zstd' By default, if you write to a file, the extension of the filename will be used to set the format (and possibly filter): You can override by explicitly specifying a format and/or filters: Libarchive::Write.new('myfile.tar.gz', format => 'zip'); will create a zip file named 'myfile.tar.gz' (but don't do that). If you are writing to a stream, you _*must*_ specify a format: Libarchive::Write.new($*OUT, format => 'zip'); You can optionally specify one or more filters to use while writing. Libarchive::Write.new('myfile', format => 'gnutar', filter => ); Multiple filters are built into a pipeline, so the order they are listed is significant. For more details on the specific way that libarchive handles each format, including some limitations, see the man page: [libarchive-formats.5](https://github.com/libarchive/libarchive/wiki/ManPageLibarchiveFormats5) and the [libarchive wiki](https://github.com/libarchive/libarchive/wiki/LibarchiveFormats). ## Libarchive Entry methods An `Libarchive::Entry` is sort of like a super-stat, holding all of the information about a file system component. `Str` and `gist` return a single line summary of the archive entry, kind of like an 'ls -l' or 'tar t' listing. The other methods can query and/or set various information about the entry: `pathname`, `size`, `uid`, `gid`, `uname`, `gname`, `fflags` `perm` - Integer permissions, for new files, defaults to `0o644`, for new directories, defaults to `0o755`. `atime`, `mtime`, `ctime`, `birthtime` - Various times, returned as `DateTime`s. Depending on the archive format, these might not be set. `symlink` - for a symbolic link, this is what it points to `strmode` - Read only unixish string for filetype/permissions (like `-rw-r--r--` or `drwxr-x-r-x`) `mode` - file mode, better to use perm and/or filetype `human-size` - uses [`Number::Bytes::Human`](https://github.com/dugword/Number-Bytes-Human) to process the size, so you get values like "15M", "25K" or "96B" for the size of a file. `filetype` - returns an `Libarchive::Filetype` object that numifys to the Unix/C filetype bits and stringifys to: REG, LINK, SOCK, CHAR, BLOCK, DIR, FIFO. You can pass in `:dir` to set filetype to DIR (or just use '.mkdir'); `is-file` - Bool shortcut to query for filetype REG `is-dir` - Bool shortcut to query for filetype DIR ## Libarchive Entry Extraction A `Libarchive::Read` produces `Libarchive::Entry::Read` objects that are `Libarchive::Entry`s with several additional methods: `data` reads the content of the entry from the data stream and returns it as a `Buf`. `content` - same as `data`, but `decode`s the `Buf` into a Str (encoding `utf-8` -- if you want other encodings, just call `decode` on data). `extract` - extracts the entry into a filesystem entity (file, directory, symlink, socket, fifo, etc.) You can change the `pathname` to rename or move the file around. You can also pass in `:destpath` either to the main object on creation, or to `extract()` and it will be prepended to the `pathname`. You can also pass in extract flags, either to the main object, or to individual `extract` calls to control the extraction: ## Extract flags: Extract flags can be specified to `Libarchive::Read.new()`, or to the `.open()`, or to `.extract()`. Flags to `.new()` and `.open()` are sticky, and will affect all future `.open`s as well. Flags to `.extract()` are not -- they affect only the specific extract. `:extract-owner` - The user and group IDs should be set on the restored file. By default, the user and group IDs are not restored. `:extract-perm` - Full permissions (including SGID, SUID, and sticky bits) should be restored exactly as specified, without obeying the current umask. Note that SUID and SGID bits can only be restored if the user and group ID of the object on disk are correct. If `:extract_owner` is not specified, then SUID and SGID bits will only be restored if the default user and group IDs of newly-created objects on disk happen to match those specified in the archive entry. By default, only basic permissions are restored, and umask is obeyed. `:extract-time` - The timestamps (mtime, ctime, and atime) should be restored. By default, they are ignored. Note that restoring of atime is not currently supported. `:extract-no-overwrite` - Existing files on disk will not be overwritten. By default, existing regular files are truncated and overwritten; existing directories will have their permissions updated; other pre-existing objects are unlinked and recreated from scratch. `:extract-unlink` - Existing files on disk will be unlinked before any attempt to create them. In some cases, this can prove to be a significant performance improvement. By default, existing files are truncated and rewritten, but the file is not recreated. In particular, the default behavior does not break existing hard links. `:extract-acl` - Attempt to restore ACLs. By default, extended ACLs are ignored. `:extract-fflags` - Attempt to restore extended file flags. By default, file flags are ignored. `:extract-xattr` - Attempt to restore POSIX.1e extended attributes. By default, they are ignored. `:extract-secure-symlinks` - Refuse to extract any object whose final location would be altered by a symlink on disk. This is intended to help guard against a variety of mischief caused by archives that (deliberately or otherwise) extract files outside of the current directory. The default is not to perform this check. If `:extract-unlink` is specified together with this option, the library will remove any intermediate symlinks it finds and return an error only if such symlink could not be removed. `:extract-secure-nodotdot` - Refuse to extract a path that contains a .. element anywhere within it. The default is to not refuse such paths. Note that paths ending in .. always cause an error, regardless of this flag. `:extract-secure-noabsolutepaths` - Refuse to extract an absolute path. The default is to not refuse such paths. `:extract-sparse` - Scan data for blocks of NUL bytes and try to recreate them with holes. This results in sparse files, independent of whether the archive format supports or uses them. `:extract-clear-nochange-fflags` - Before removing a file system object prior to replacing it, clear platform-specific file flags which might prevent its removal. ## Creating a new archive ## Writing to an archive Using either `Libarchive::Write.new()` or `archive-write()`, there are a number of methods for adding/creating filesystem entities. ### Add existing filesystem entitities: `add()` adds existing entities by filename or `IO::Path`. You may find ecosystem modules such as [`File::Find`](https://github.com/tadzik/File-Find) or [`Concurrent::File::Find`](https://github.com/gfldex/perl6-concurrent-file-find) useful for generating lists of files: ``` use Libarchive::Simple; use Concurrent::File::Find; with archive-write('somefile.tar.gz') { .add '/somedir', find('/somedir'); # Recursively add files } ``` If you add files within a directory, don't forget to add the directory itself if you want it to be created on extraction too. ### Create new files `write($filename, $content)` will create a new file `$filename` can be a `Str` or something that will convert to a `Str`, like an `IO::Path`. `$content` can be a `Str`, a `Blob`, an `IO::Handle` from which the content will be read, or an `IO::Path` from which the content will be read. ### Create directories `mkdir($pathname)` will add a new directory to the archive ### Create new symbolic links `symlink($pathname, $symlink)` or `symlink($pathname => $symlink)` ### Add a sequence of `Archive::Entry`s Use `copy()` to read from an `archive-read()` or `archive-slurp()` sequence into a new archive. ## LICENSE Copyright © 2019 United States Government as represented by the Administrator, National Aeronautics and Space Administration. No Copyright is claimed in the United States under Title 17, U.S. Code. All Other Rights Reserved.