Rugo
A C++17- and Cython-powered Parquet metadata reader for Python.
Rugo delivers high-throughput metadata inspection without loading columnar data pages, making it ideal for quickly analyzing Parquet file structure and statistics.
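Metadata-only reads are possible because of how the Parquet format lays out a file: the Thrift-encoded footer sits at the end, followed by a 4-byte little-endian footer length and the `PAR1` magic bytes. The sketch below locates that footer in an in-memory buffer. It illustrates the file layout only and is not Rugo's parser; `locate_footer` is a hypothetical name, and the sketch assumes an unencrypted footer.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def locate_footer(buf):
    """Return (start, end) byte offsets of the footer inside buf.

    Hypothetical helper for illustration; assumes an unencrypted Parquet file.
    """
    view = memoryview(buf)
    # Smallest possible file: leading magic + footer length + trailing magic.
    if len(view) < 12:
        raise ValueError("buffer too small to be a Parquet file")
    if bytes(view[:4]) != PARQUET_MAGIC or bytes(view[-4:]) != PARQUET_MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    # The 4 bytes before the trailing magic hold the footer length.
    (footer_len,) = struct.unpack("<I", view[-8:-4])
    end = len(view) - 8
    start = end - footer_len
    if start < 4:
        raise ValueError("footer length exceeds file size")
    return start, end
```

Only the bytes between `start` and `end` need to be decoded to learn the schema and row-group layout; the data pages before them are never touched.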
Key Features
- Fast metadata extraction backed by an optimized C++17 parser and thin Python bindings
- Complete schema and row-group details, including encodings, codecs, offsets, bloom filter pointers, and custom key/value metadata
- Flexible input sources - works with file paths, byte strings, and contiguous memoryviews for zero-copy parsing
- Optional schema conversion helpers for Orso
- No runtime dependencies beyond the Python standard library
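For the memoryview input source, one way to obtain a contiguous view over a file without copying it is to memory-map the file with the standard library. The sketch below stops at producing the view and deliberately does not call Rugo, because the name of the memoryview entry point is not given in this README; `open_as_memoryview` is a hypothetical helper.

```python
import mmap


def open_as_memoryview(path):
    """Map a file read-only and return (mapping, view) without copying its bytes.

    Hypothetical helper for illustration. The caller should call
    view.release() and then mapping.close() when finished.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the mapping outlives the file object.
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    view = memoryview(mapping)
    return mapping, view
```

The resulting view is contiguous and read-only, which is the shape of input the feature list describes for zero-copy parsing.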
Quick Example
import rugo.parquet as parquet_meta
metadata = parquet_meta.read_metadata("example.parquet")
print(f"Rows: {metadata['num_rows']}")
print("Schema columns:")
for column in metadata["schema_columns"]:
    print(f"  {column['name']}: {column['physical_type']} ({column['logical_type']})")
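Because the result is a plain dictionary, it can be post-processed with ordinary Python. As a minimal sketch, the helper below groups column names by physical type using only the keys shown in the example above. The sample dictionary is hand-written for illustration and is not real Rugo output; `columns_by_physical_type` is a hypothetical name.

```python
from collections import defaultdict


def columns_by_physical_type(metadata):
    """Group column names by their physical type."""
    grouped = defaultdict(list)
    for column in metadata["schema_columns"]:
        grouped[column["physical_type"]].append(column["name"])
    return dict(grouped)


# Hand-written sample mirroring the keys used above; values are assumptions.
sample_metadata = {
    "num_rows": 3,
    "schema_columns": [
        {"name": "id", "physical_type": "INT64", "logical_type": ""},
        {"name": "name", "physical_type": "BYTE_ARRAY", "logical_type": "STRING"},
        {"name": "parent_id", "physical_type": "INT64", "logical_type": ""},
    ],
}

print(columns_by_physical_type(sample_metadata))
```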
Installation
Use Cases
- Schema inspection - quickly understand Parquet file structure without reading data
- Metadata analysis - examine encoding, compression, and statistics across row groups
- Query planning - use min/max statistics for predicate pushdown
- Data profiling - analyze data distribution and file structure
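To make the query-planning use case concrete, here is a minimal sketch of row-group pruning with min/max statistics. A row group can be skipped only when its value range cannot overlap the predicate's range, and a group with missing statistics must always be kept. The `row_groups` structure and its `columns`, `min`, and `max` keys are assumptions made for this example, not Rugo's documented output layout.

```python
def row_groups_to_scan(row_groups, column, lower, upper):
    """Return indices of row groups that may hold values in [lower, upper].

    Assumes a hypothetical layout: each group is
    {"columns": {name: {"min": ..., "max": ...}}}.
    """
    selected = []
    for index, group in enumerate(row_groups):
        stats = group["columns"].get(column)
        # Missing statistics prove nothing, so the group has to be scanned.
        if stats is None or stats.get("min") is None or stats.get("max") is None:
            selected.append(index)
            continue
        # Keep the group when [min, max] overlaps [lower, upper].
        if stats["max"] >= lower and stats["min"] <= upper:
            selected.append(index)
    return selected


# Hand-written statistics for illustration only.
example_groups = [
    {"columns": {"price": {"min": 1, "max": 50}}},
    {"columns": {"price": {"min": 60, "max": 90}}},
    {"columns": {"price": {"min": None, "max": None}}},
    {"columns": {}},
]
```

For a predicate such as `price BETWEEN 55 AND 100`, the first group is pruned because its maximum is below the lower bound, while the groups without usable statistics stay in the scan list.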
Getting Started
Ready to dive in? Check out our Getting Started guide or jump straight to the Quickstart.
Project Status
Rugo is in active development (alpha). The API may evolve as we add features and refine the interface. Contributions are welcome!
License
Licensed under the Apache License 2.0. See LICENSE for full terms.