Rugo

A C++17 and Cython powered Parquet metadata reader for Python.

Rugo delivers high-throughput metadata inspection without loading columnar data pages, making it ideal for quickly analyzing Parquet file structure and statistics.
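Metadata can be read without touching data pages because the Parquet format stores it in a footer at the end of the file: the last four bytes are the `PAR1` magic, and the four bytes before that give the footer length as a little-endian integer. The sketch below illustrates that layout in pure Python. It is not Rugo's implementation, the helper name `locate_footer` is hypothetical, and it does not handle encrypted footers.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def locate_footer(buf: bytes) -> tuple[int, int]:
    """Return (offset, length) of the footer metadata inside a Parquet buffer."""
    # Smallest possible file: leading magic + footer length + trailing magic.
    if len(buf) < 12 or buf[:4] != PARQUET_MAGIC or buf[-4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file (missing PAR1 magic)")
    # The 4 bytes before the trailing magic hold the footer length (little-endian).
    (footer_len,) = struct.unpack("<I", buf[-8:-4])
    footer_start = len(buf) - 8 - footer_len
    if footer_start < 4:
        raise ValueError("footer length exceeds file size")
    return footer_start, footer_len


# Synthetic buffer: magic, 10 bytes standing in for data pages, a 4-byte fake footer.
sample = PARQUET_MAGIC + b"\x00" * 10 + b"meta" + struct.pack("<I", 4) + PARQUET_MAGIC
offset, length = locate_footer(sample)
print(sample[offset:offset + length])
```

Only the bytes in that range need to be decoded to recover the schema and row-group details, so the cost of inspection does not depend on how much column data the file holds.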

Key Features

  • Fast metadata extraction backed by an optimized C++17 parser and thin Python bindings
  • Complete schema and row-group details, including encodings, codecs, offsets, bloom filter pointers, and custom key/value metadata
  • Flexible input sources - works with file paths, byte strings, and contiguous memoryviews for zero-copy parsing
  • Optional schema conversion helpers for Orso
  • No runtime dependencies beyond the Python standard library

Quick Example

import rugo.parquet as parquet_meta

metadata = parquet_meta.read_metadata("example.parquet")

print(f"Rows: {metadata['num_rows']}")
print("Schema columns:")
for column in metadata["schema_columns"]:
    print(f"  {column['name']}: {column['physical_type']} ({column['logical_type']})")

Installation

pip install rugo

Use Cases

  • Schema inspection - quickly understand Parquet file structure without reading data
  • Metadata analysis - examine encoding, compression, and statistics across row groups
  • Query planning - use row-group min/max statistics to skip row groups that cannot satisfy a predicate (predicate pushdown)
  • Data profiling - analyze data distribution and file structure
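As a rough illustration of the query planning use case, the sketch below prunes row groups whose min/max range cannot overlap a requested value range. The dictionary layout (`columns`, `min`, `max`) and the function name `prune_row_groups` are assumptions made for this example, not Rugo's actual output structure; adapt the key names to whatever `read_metadata` returns.

```python
def prune_row_groups(row_groups, column, lower, upper):
    """Return indices of row groups whose [min, max] for `column` may overlap [lower, upper]."""
    keep = []
    for index, group in enumerate(row_groups):
        stats = group["columns"].get(column)
        # Without statistics the group cannot be ruled out, so it is kept.
        if stats is None or stats.get("min") is None or stats.get("max") is None:
            keep.append(index)
            continue
        # Two closed ranges overlap when each one starts before the other ends.
        if stats["max"] >= lower and stats["min"] <= upper:
            keep.append(index)
    return keep


# Hypothetical, hand-written metadata for three row groups.
row_groups = [
    {"columns": {"price": {"min": 0, "max": 99}}},
    {"columns": {"price": {"min": 100, "max": 199}}},
    {"columns": {"price": {}}},  # statistics missing
]
selected = prune_row_groups(row_groups, "price", 120, 150)
print(selected)
```

Keeping groups with missing statistics is the conservative choice: pruning is only safe when the statistics prove that no row can match.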

Getting Started

Ready to dive in? Check out our Getting Started guide or jump straight to the Quickstart.

Project Status

Rugo is in active development (alpha). The API may evolve as we add features and refine the interface. Contributions are welcome!

License

Licensed under the Apache License 2.0. See LICENSE for full terms.