
Overview

Rugo is a specialized tool for reading Parquet metadata with high performance and minimal overhead.

What is Rugo?

Rugo is a Python library that provides fast, efficient access to Parquet file metadata. Built with C++17 and Cython, it offers:

  • High Performance: Optimized C++ parser with thin Python bindings
  • Complete Metadata: Access to the full schema, row group details, and column statistics
  • Flexible APIs: Multiple input methods (file paths, bytes, memoryview)
  • Zero Dependencies: No runtime dependencies beyond Python stdlib
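Bytes and memoryview inputs work well with how Parquet lays out a file. A file starts with the 4-byte magic `PAR1` and ends with the Thrift-encoded FileMetaData, then a 4-byte little-endian length of that metadata, then `PAR1` again. The sketch below shows how a reader can find the footer in an in-memory buffer by slicing a memoryview, which avoids copying. It illustrates the format only and is not Rugo's implementation. `footer_view` is a hypothetical helper, and it does not decode the Thrift payload.

```python
import struct

MAGIC = b"PAR1"


def footer_view(buf):
    """Hypothetical helper: return a zero-copy memoryview over the
    Thrift-encoded FileMetaData of a Parquet file held in memory.

    Accepts bytes, bytearray, or memoryview.
    """
    view = memoryview(buf)
    # Smallest possible layout: magic (4) + length (4) + magic (4).
    if len(view) < 12:
        raise ValueError("buffer too small to be a Parquet file")
    if bytes(view[:4]) != MAGIC or bytes(view[-4:]) != MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    # The 4 bytes just before the trailing magic hold the footer length.
    (footer_len,) = struct.unpack("<I", view[-8:-4])
    start = len(view) - 8 - footer_len
    if start < 4:
        raise ValueError("footer length exceeds buffer size")
    # Slicing a memoryview shares memory with the original buffer.
    return view[start:len(view) - 8]
```

Because the returned view shares the caller's buffer, it stays valid only while that buffer is alive.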

Why Rugo?

When working with Parquet files, you often need to inspect metadata before reading data:

  • Understanding file structure and schema
  • Analyzing statistics for query optimization
  • Checking encoding and compression settings
  • Examining bloom filter availability
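One way statistics support query optimization: a query engine can skip any row group whose min/max range for a column cannot overlap the query's filter. The sketch below assumes a hypothetical dictionary layout, where `row_groups` is a list of dicts with per-column `min`/`max` entries. This is not Rugo's actual output format, so the field names would need adapting.

```python
def prune_row_groups(row_groups, column, lo, hi):
    """Return indices of row groups that may contain values of `column`
    in the closed range [lo, hi].

    Assumes a hypothetical layout:
        {"columns": {"<name>": {"min": ..., "max": ...}}}
    Row groups without usable statistics are kept, since they cannot
    be ruled out.
    """
    keep = []
    for i, rg in enumerate(row_groups):
        stats = rg.get("columns", {}).get(column)
        if stats is None or stats.get("min") is None or stats.get("max") is None:
            keep.append(i)  # no statistics: must scan this row group
            continue
        # Keep the row group only if its [min, max] overlaps [lo, hi].
        if stats["max"] >= lo and stats["min"] <= hi:
            keep.append(i)
    return keep


# Illustrative data only (hypothetical shape, not real Rugo output).
row_groups = [
    {"columns": {"price": {"min": 0, "max": 9}}},
    {"columns": {"price": {"min": 10, "max": 19}}},
    {"columns": {}},
]
print(prune_row_groups(row_groups, "price", 12, 15))  # [1, 2]
```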

Traditional tools like PyArrow are excellent for reading data but can be heavyweight when you only need metadata. Rugo is purpose-built for metadata inspection, offering:

  • Faster startup - minimal overhead for quick operations
  • Lower memory usage - doesn't load data pages
  • Simpler output - pure Python dictionaries ready for JSON serialization
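Parquet keeps its metadata in a footer at the end of the file, followed by a 4-byte little-endian footer length and the `PAR1` magic. A metadata reader can therefore seek to the end of the file and read only the trailing bytes. The sketch below reads the 8-byte trailer, then the footer itself, and never touches the data pages. `read_footer_bytes` is a hypothetical helper for illustration, not part of Rugo. It returns the raw Thrift bytes, not parsed metadata.

```python
import os
import struct

MAGIC = b"PAR1"


def read_footer_bytes(path):
    """Hypothetical helper: read only the Thrift-encoded footer of a
    Parquet file. Data pages are never read."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)  # seek() returns the new position
        if size < 12:
            raise ValueError("file too small to be a Parquet file")
        # Trailer = 4-byte footer length + 4-byte magic.
        f.seek(size - 8)
        trailer = f.read(8)
        if trailer[4:] != MAGIC:
            raise ValueError("missing trailing PAR1 magic")
        (footer_len,) = struct.unpack("<I", trailer[:4])
        start = size - 8 - footer_len
        if start < 4:
            raise ValueError("footer length exceeds file size")
        # Read exactly the footer bytes; everything before them is skipped.
        f.seek(start)
        return f.read(footer_len)
```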

When to Use Rugo

Rugo is ideal for:

  • Metadata-only operations - inspecting file structure without reading data
  • Quick analysis - fast startup and low memory footprint
  • Integration scenarios - simple dict output integrates easily
  • Custom tooling - building your own Parquet utilities

Rugo may not be the best choice for:

  • Full data reading - use PyArrow or FastParquet for complete data access
  • Complex queries - dedicated query engines are better suited
  • Production decoding - the prototype decoder has limited capabilities

Architecture

Rugo consists of three main components:

  1. C++ Metadata Parser - High-performance Thrift and metadata parsing
  2. Cython Bindings - Thin layer connecting Python to C++
  3. Python Interface - Simple, Pythonic API for metadata access

This architecture provides the best of both worlds: C++ performance with Python convenience.

Next Steps