
When Zip Isn't Enough

September 2025

A powerful command-line compression tool written in Rust.

The Problem: Native Zip's Limitations

Let's be honest—zip has been around since 1989, and it shows. While it's ubiquitous and simple, it's also severely limited for modern use cases:
Poor Compression Ratios: Native zip uses the DEFLATE algorithm, which is fast but produces significantly larger archives compared to modern algorithms like LZMA2.
Slow Performance: Even for straightforward compression tasks, native zip can be painfully slow, especially on large files or directories.
No Volume Splitting: Want to split a 10GB archive into smaller chunks for sharing or burning to media? Native zip doesn't support this out of the box.
Weak Security: Password protection in zip is notoriously weak and easy to crack.
Limited Progress Feedback: Most zip implementations provide minimal feedback during compression, leaving you wondering if the process is even working.
For someone like me who regularly backs up projects, compresses datasets, and shares large files, these limitations weren't just annoying—they were productivity killers.

The Benchmark That Changed Everything

To understand just how much room for improvement existed, I ran a simple benchmark comparing native zip against 7z compression on highly compressible dummy database files. The results were staggering:

# File listing after compression
-rw-r--r--@ 1 nk  staff   15K May  7 13:22 dummy_100mb.7z
-rw-r--r--@ 1 nk  staff  100M May  7 13:19 dummy_100mb.db
-rw-r--r--@ 1 nk  staff  298K May  7 13:27 dummy_100mb.zip

-rw-r--r--@ 1 nk  staff  155K May  7 13:29 dummy_1gb.7z
-rw-r--r--@ 1 nk  staff  1.0G May  7 13:20 dummy_1gb.db
-rw-r--r--@ 1 nk  staff  3.0M May  7 13:29 dummy_1gb.zip

Look at those numbers. A 100MB file compressed to just 15KB with 7z, versus 298KB with zip. That's roughly a 6,666x compression ratio with 7z, compared to a far more modest ~335x with native zip. For the 1GB file, the difference was equally dramatic: 155KB versus 3MB.
This wasn't just about saving disk space. It was about fundamentally changing what's possible with compression. These results convinced me that building a modern CLI around 7z was worth the effort.

Enter Rust: The Perfect Foundation

I chose Rust for Ferrozip for several reasons that aligned perfectly with the project's goals:
Memory Safety Without Garbage Collection: Rust's ownership system ensures memory safety at compile time, eliminating entire classes of bugs without the runtime overhead of garbage collection.
Zero-Cost Abstractions: Rust allows you to write high-level, expressive code that compiles down to efficient machine code. Perfect for a performance-critical tool.
Excellent CLI Ecosystem: Crates like clap for argument parsing, indicatif for progress bars, and anyhow for error handling made building a polished CLI straightforward.
Cross-Platform by Default: Rust's standard library and ecosystem make it easy to build binaries for Linux, macOS, and Windows from a single codebase.
Strong Type System: Rust's type system caught countless bugs at compile time that would have been runtime errors in other languages.


Architecture: How Ferrozip Works

Ferrozip is built around a clean, modular architecture that separates concerns and makes the codebase maintainable:

graph TB
    Start([User Command]) -->|input| CLI[CLI Entry Point<br/>main.rs]
    CLI --> Parser{Command Parser<br/>clap}
    Parser -->|compress| CompMod[Compression Module<br/>compress/core.rs]
    Parser -->|extract| DecompMod[Decompression Module<br/>decompress/core.rs]
    Parser -->|list| ListMod[Archive Listing<br/>decompress/list.rs]
    CompMod -->|validate| Util1[Size Utilities<br/>helpers/size.rs]
    CompMod -->|display| Prog1[Progress Bar<br/>indicatif]
    DecompMod -->|validate| Util2[Size Utilities<br/>helpers/size.rs]
    DecompMod -->|display| Prog2[Progress Bar<br/>indicatif]
    CompMod -->|needs split?| Split{Volume Size<br/>Specified?}
    Split -->|yes| Splitter[Volume Splitter<br/>compress/split.rs]
    Split -->|no| Wrapper1[7z Process<br/>Wrapper]
    DecompMod -->|multi-part?| Combine{Multiple<br/>Volumes?}
    Combine -->|yes| Combiner[Volume Combiner<br/>decompress/combine.rs]
    Combine -->|no| Wrapper2[7z Process<br/>Wrapper]
    Splitter --> Wrapper1
    Combiner --> Wrapper2
    Wrapper1 -->|spawn| Process1[7z CLI Tool<br/>LZMA2 Algorithm]
    Wrapper2 -->|spawn| Process2[7z CLI Tool<br/>LZMA2 Algorithm]
    ListMod -->|spawn| Process3[7z CLI Tool<br/>List Contents]
    Process1 -->|write| FS1[File System<br/>Output Archive]
    Process2 -->|read/write| FS2[File System<br/>Extract Files]
    Process3 -->|read| FS3[File System<br/>Read Archive]
    FS1 --> Result1([Compressed Archive<br/>*.7z])
    FS2 --> Result2([Extracted Files])
    FS3 --> Result3([Archive Contents<br/>List])
    style Start fill:#22c55e,stroke:#16a34a,color:#000
    style CLI fill:#3b82f6,stroke:#2563eb,color:#fff
    style Parser fill:#8b5cf6,stroke:#7c3aed,color:#fff
    style CompMod fill:#ec4899,stroke:#db2777,color:#fff
    style DecompMod fill:#f59e0b,stroke:#d97706,color:#fff
    style ListMod fill:#06b6d4,stroke:#0891b2,color:#fff
    style Process1 fill:#ef4444,stroke:#dc2626,color:#fff
    style Process2 fill:#ef4444,stroke:#dc2626,color:#fff
    style Process3 fill:#ef4444,stroke:#dc2626,color:#fff
    style Result1 fill:#22c55e,stroke:#16a34a,color:#000
    style Result2 fill:#22c55e,stroke:#16a34a,color:#000
    style Result3 fill:#22c55e,stroke:#16a34a,color:#000

CLI Layer: Handles argument parsing, validation, and user interaction using the clap crate. This layer transforms user input into structured commands.
Command Modules: Separate modules for compression, decompression, and archive management. Each module is self-contained with its own logic and tests.
7z Integration Layer: A thin wrapper around the 7z command-line tool that handles process spawning, output parsing, and error handling.
Utilities: Helper modules for size parsing (converting "5g" to bytes), text formatting, and progress display.
This modular design made it easy to add features incrementally and test each component in isolation.
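To make the CLI layer concrete, here's a minimal sketch of how these subcommands can be expressed with clap's derive API. This is an illustration rather than Ferrozip's actual app.rs/commands.rs; the struct names, field names, and default level are assumptions.

use clap::{Parser, Subcommand};

#[derive(Parser)]
#[command(name = "ferrozip", about = "A fast 7z-based compression CLI")]
struct Cli {
    #[command(subcommand)]
    command: Commands,
}

#[derive(Subcommand)]
enum Commands {
    /// Compress a file or directory into a .7z archive
    Compress {
        input: String,
        /// Compression level 1-9
        #[arg(long, default_value_t = 5)]
        level: u8,
        /// Split the archive into volumes, e.g. "1g" or "700m"
        #[arg(long)]
        split: Option<String>,
        /// Password-protect the archive
        #[arg(long)]
        password: Option<String>,
    },
    /// Extract an archive (point it at the first volume of a multi-part set)
    Extract {
        archive: String,
        /// Destination directory
        #[arg(long)]
        output: Option<String>,
    },
    /// List archive contents without extracting
    List { archive: String },
}

fn main() {
    // Parse arguments, then hand off to the compression/decompression/listing modules.
    match Cli::parse().command {
        Commands::Compress { .. } => { /* compress::core */ }
        Commands::Extract { .. } => { /* decompress::core */ }
        Commands::List { .. } => { /* decompress::list */ }
    }
}

Keeping the dispatch this thin is what lets each command module own its logic and tests without knowing anything about argument parsing.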

Key Features: More Than Just Compression

Ferrozip isn't just about compression—it's about providing a complete archiving solution with modern features (a sketch after this list shows roughly how these options map onto 7z's own switches):
Customizable Compression Levels: Choose from 1-9 for the perfect balance between speed and compression ratio, or use --max for maximum compression.
Strong Encryption: Password-protect your archives with AES-256 encryption, far superior to zip's weak password protection.
Volume Splitting: Split large archives into manageable chunks with intuitive size specifications like 5g, 700m, or 4.7g for DVD-sized volumes.
Multi-Volume Support: Seamlessly create and extract multi-part archives—just point to the first part and Ferrozip handles the rest.
Progress Indicators: Real-time progress bars and status updates so you always know what's happening.
Archive Management: List contents, verify integrity, and extract selectively with simple commands.
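All of these features ultimately become switches on the 7z command line. As a rough, hypothetical illustration, the flag-to-argument mapping could look something like this; the function and parameter names are mine, not Ferrozip's, while -mx, -p, -mhe, and -v are standard 7z switches:

// Hypothetical mapping of Ferrozip flags to 7z switches; not Ferrozip's actual code.
fn build_7z_args(
    input: &str,
    output: &str,
    level: u8,               // --level 1-9 (--max would pin this to 9)
    password: Option<&str>,  // --password
    volume: Option<&str>,    // --split, e.g. "1g" or "700m"
) -> Vec<String> {
    let mut args = vec![
        "a".to_string(),            // 7z "add to archive" command
        format!("-mx={}", level),   // compression level
        output.to_string(),         // archive path, e.g. "backup.7z"
        input.to_string(),          // file or directory to compress
    ];
    if let Some(pw) = password {
        args.push(format!("-p{}", pw));   // set the password
        args.push("-mhe=on".to_string()); // also encrypt the archive headers
    }
    if let Some(size) = volume {
        args.push(format!("-v{}", size)); // split into volumes of this size
    }
    args
}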

Installation: Getting Started

Getting Ferrozip up and running is straightforward. Currently, you can install from source or download pre-built binaries:

# Install 7z dependency (required)
# macOS
brew install p7zip

# Ubuntu/Debian
sudo apt-get install p7zip-full

# Clone and build from source
git clone https://github.com/nim444/ferrozip.git
cd ferrozip
cargo build --release

# The executable will be in target/release/ferrozip
# Optional: Move to a directory in your PATH
cp target/release/ferrozip /usr/local/bin/

Pre-built binaries are available for Windows (x86_64), macOS (Intel and Apple Silicon), and Linux. For Linux users, I recommend building from source using the included build script for optimal performance on your specific system.

Usage Examples: Ferrozip in Action

Let's look at some real-world usage examples that showcase Ferrozip's capabilities:

# Basic compression with default settings
ferrozip compress my_project

# Maximum compression for long-term storage
ferrozip compress important_docs --max

# Create encrypted backup with password
ferrozip compress financial_records --password "strong_password" --max

# Split large dataset into 1GB chunks
ferrozip compress huge_dataset --split 1g --output dataset.7z

# Create DVD-sized backups (4.7GB volumes)
ferrozip compress media_collection --split 4.7g

# Quick compression for temporary backups
ferrozip compress temp_files --level 1

# Extract with password
ferrozip extract secure_archive.7z --password "strong_password"

# Extract to specific directory
ferrozip extract backup.7z --output /path/to/restore

# List archive contents without extracting
ferrozip list my_archive.7z

The CLI is designed to be intuitive and composable. Options can be combined naturally—want maximum compression, password protection, AND volume splitting? Just combine the flags. The interface follows Unix philosophy: do one thing well and make it easy to combine operations.

Performance: Real-World Numbers

Let's talk numbers. Here's a real benchmark comparing Ferrozip to native zip on compressible database files:

Original File    Native Zip    Ferrozip (7z)    Compression Ratio
100MB DB         298KB         15KB             ~6,666x smaller
1GB DB           3.0MB         155KB            ~6,600x smaller

Technical Challenges: Lessons Learned

Building Ferrozip wasn't without its challenges. Here are some of the key technical hurdles I encountered and how I solved them:
Process Management: Wrapping the 7z command-line tool required careful process management. I needed to spawn processes, capture output, parse progress information, and handle errors gracefully. Rust's std::process::Command API made this manageable, but edge cases around process termination and signal handling required careful thought (a sketch of the wrapper approach follows this list).
Volume Splitting Logic: Implementing smart volume splitting was trickier than expected. I had to handle cases where the archive is smaller than the volume size, exact multiples of volume size, and edge cases with empty archives. The test suite in src/compress/tests.rs has comprehensive coverage of these scenarios.
Cross-Platform Compatibility: Ensuring Ferrozip works seamlessly on Windows, macOS, and Linux required attention to path handling, file permissions, and platform-specific 7z behavior. Rust's standard library helped, but I still needed platform-specific testing.
Error Handling: Making error messages helpful without being overwhelming was an art. I used the anyhow crate for error propagation and added context at each layer so users get actionable error messages rather than cryptic stack traces.
Progress Indicators: Implementing real-time progress feedback meant parsing 7z's output stream while the process is running. This required careful buffering and output parsing to provide smooth progress updates.
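To make the process management and progress-parsing pieces concrete, here's a minimal sketch of the wrapper approach, assuming 7z is on the PATH and that progress is surfaced line by line from its output. Ferrozip's real wrapper handles more edge cases; the function below is illustrative only.

use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};

use anyhow::{Context, Result};
use indicatif::ProgressBar;

// Illustrative sketch: spawn 7z, stream its stdout, and show a spinner while it runs.
fn run_7z(args: &[String]) -> Result<()> {
    let mut child = Command::new("7z")
        .args(args)
        .stdout(Stdio::piped())
        .stderr(Stdio::inherit()) // let 7z's error output pass straight through
        .spawn()
        .context("failed to spawn 7z; is p7zip installed?")?;

    let stdout = child.stdout.take().context("failed to capture 7z stdout")?;
    let bar = ProgressBar::new_spinner();

    // Read 7z's output as it arrives and reflect it in the progress indicator.
    for line in BufReader::new(stdout).lines() {
        let line = line.context("failed to read 7z output")?;
        bar.set_message(line);
        bar.tick();
    }

    let status = child.wait().context("failed to wait for 7z")?;
    bar.finish_and_clear();

    if !status.success() {
        anyhow::bail!("7z exited with status {}", status);
    }
    Ok(())
}

Adding anyhow context at this layer is what turns a failed spawn into a readable message instead of a bare OS error code.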

Code Quality: Testing and CI/CD

# Run the full test suite
cargo test

# Tests include:
# - Volume splitting edge cases
# - Multi-part archive combining
# - Size parsing and formatting
# - End-to-end compression/extraction workflows
# - Password-protected archives
# - Directory and single file handling

The project uses GitHub Actions for continuous integration with multiple workflows:

Unit Tests: Run on every commit to ensure core functionality remains intact
Multi-Platform Builds: Automated builds for Linux, macOS (Intel and Apple Silicon), and Windows
CodeQL Analysis: Automated security scanning to catch potential vulnerabilities
Binary Artifacts: Automatically built and uploaded for each platform

This automation gives me confidence that changes won't break existing functionality and that builds work across all supported platforms.

Project Structure: Clean and Modular

src/
├── main.rs                 # Application entry point
├── cli/                    # CLI functionality
│   ├── mod.rs             # Module re-exports
│   ├── app.rs             # CLI app definition
│   └── commands.rs        # CLI commands
├── compress/              # Compression functionality
│   ├── mod.rs             # Module re-exports
│   ├── core.rs            # Core compression logic
│   ├── split.rs           # Archive splitting
│   ├── utils.rs           # Compression utilities
│   └── tests.rs           # Compression tests
├── decompress/            # Decompression functionality
│   ├── mod.rs             # Module re-exports
│   ├── core.rs            # Core decompression logic
│   ├── combine.rs         # Archive combining
│   ├── list.rs            # Archive listing
│   ├── utils.rs           # Decompression utilities
│   └── tests.rs           # Decompression tests
└── helpers/               # Helper utilities
    ├── mod.rs             # Module re-exports
    ├── size.rs            # Size-related utilities
    └── text.rs            # Text-related utilities

Size Parsing: Making UX Intuitive

// From src/helpers/size.rs
use anyhow::{anyhow, Result};

/// Parse a human-friendly size such as "5g", "700m", "512k", or "4.7g" into bytes.
pub fn parse_size(size_str: &str) -> Result<u64> {
    let size_str = size_str.trim().to_lowercase();

    // Split off the unit suffix and pick the corresponding byte multiplier.
    let (num_str, unit) = if size_str.ends_with('g') {
        (&size_str[..size_str.len() - 1], 1_073_741_824u64)
    } else if size_str.ends_with('m') {
        (&size_str[..size_str.len() - 1], 1_048_576u64)
    } else if size_str.ends_with('k') {
        (&size_str[..size_str.len() - 1], 1_024u64)
    } else if size_str.ends_with('b') {
        (&size_str[..size_str.len() - 1], 1u64)
    } else {
        return Err(anyhow!("Invalid size format: {}", size_str));
    };

    // Parse as a float so fractional sizes like "4.7g" work, then truncate to whole bytes.
    let num: f64 = num_str.parse()?;
    if num < 0.0 {
        return Err(anyhow!("Size cannot be negative: {}", size_str));
    }
    Ok((num * unit as f64) as u64)
}

This simple function enables natural size specifications like 5g, 700m, or 4.7g for DVD-sized volumes. It's a small touch, but it makes the CLI feel polished and user-friendly. The function includes proper error handling and validation to catch malformed inputs early.
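As a quick illustration of the accepted formats (these checks are mine, not taken from Ferrozip's test suite), the behavior looks like this:

#[test]
fn parse_size_accepts_human_friendly_formats() {
    // Illustrative assertions for the parse_size helper above.
    assert_eq!(parse_size("1k").unwrap(), 1_024);
    assert_eq!(parse_size("700m").unwrap(), 700 * 1_048_576);
    assert_eq!(parse_size("4.7g").unwrap(), (4.7 * 1_073_741_824_f64) as u64);
    assert!(parse_size("lots").is_err()); // no recognizable unit suffix
    assert!(parse_size("4.7").is_err());  // missing unit suffix
}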