Overview
Rugo is a specialized tool for reading Parquet metadata with high performance and minimal overhead.
What is Rugo?
Rugo is a Python library that provides fast, efficient access to Parquet file metadata. Built with C++17 and Cython, it offers:
- High Performance: Optimized C++ parser with thin Python bindings
- Complete Metadata: Access to the schema, row groups, and column statistics
- Flexible APIs: Multiple input methods (file paths, bytes, memoryview)
- Zero Dependencies: No runtime dependencies beyond the Python standard library
Why Rugo?
When working with Parquet files, you often need to inspect metadata before reading data:
- Understanding file structure and schema
- Analyzing statistics for query optimization
- Checking encoding and compression settings
- Examining bloom filter availability
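Here is one way statistics support query optimization. The sketch below uses row-group pruning: a row group is skipped when its min/max range for a column cannot contain the value being searched for. The dictionary layout (a `"columns"` mapping with `"min"`/`"max"` keys) and the helper name `row_groups_to_scan` are hypothetical and chosen for illustration. They are not Rugo's documented output format, so adapt the key lookups to the metadata you actually get back.

```python
from typing import Any


def row_groups_to_scan(row_groups: list[dict[str, Any]], column: str, value: Any) -> list[int]:
    """Return indexes of row groups whose [min, max] range for `column` may contain `value`.

    Assumes a hypothetical layout: each row group is a dict whose "columns" entry maps
    column names to {"min": ..., "max": ...}. A row group with missing statistics
    cannot be ruled out, so it is kept.
    """
    keep = []
    for index, group in enumerate(row_groups):
        stats = group.get("columns", {}).get(column)
        if not stats or stats.get("min") is None or stats.get("max") is None:
            keep.append(index)  # no usable statistics: scan conservatively
        elif stats["min"] <= value <= stats["max"]:
            keep.append(index)  # value falls inside this group's range
    return keep


# Hypothetical statistics for three row groups of an integer column "id".
example = [
    {"columns": {"id": {"min": 1, "max": 100}}},
    {"columns": {"id": {"min": 101, "max": 200}}},
    {"columns": {"id": {}}},
]
print(row_groups_to_scan(example, "id", 150))  # → [1, 2]
```

The check is conservative. It only skips a row group when the statistics prove the value is absent, so pruning never changes query results.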
Traditional tools like PyArrow are excellent for reading data but can be heavyweight when you only need metadata. Rugo is purpose-built for metadata inspection, offering:
- Faster startup - minimal overhead for quick operations
- Lower memory usage - doesn't load data pages
- Simpler output - pure Python dictionaries ready for JSON serialization
When to Use Rugo
Rugo is ideal for:
✅ Metadata-only operations - inspecting file structure without reading data
✅ Quick analysis - fast startup and low memory footprint
✅ Integration scenarios - plain dict output is easy to pass to other tools
✅ Custom tooling - building your own Parquet utilities
Rugo may not be the best choice for:
❌ Full data reading - use PyArrow or FastParquet for complete data access
❌ Complex queries - dedicated query engines are better suited
❌ Production data decoding - Rugo's decoder is a prototype with limited capabilities
Architecture
Rugo consists of three main components:
- C++ Metadata Parser - High-performance Thrift and metadata parsing
- Cython Bindings - Thin layer connecting Python to C++
- Python Interface - Simple, Pythonic API for metadata access
This design pairs the speed of C++ parsing with the convenience of a Python API.
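Parquet serializes its footer metadata with the Thrift Compact Protocol. In that encoding, integer fields are zigzag-encoded and then written as variable-length unsigned LEB128 varints. The pure-Python sketch below shows those two primitive decoders to illustrate the kind of work the C++ parsing layer performs. It is illustrative only: Rugo's actual parser is C++, and the function names `read_varint` and `zigzag_decode` are hypothetical.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint starting at `pos`; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry payload
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def zigzag_decode(n: int) -> int:
    """Map zigzag-encoded unsigned ints back to signed: 0→0, 1→-1, 2→1, 3→-2, ..."""
    return (n >> 1) ^ -(n & 1)


# The signed value 300 is zigzag-encoded as 600, which is written as the varint bytes D8 04.
value, end = read_varint(b"\xd8\x04", 0)
print(zigzag_decode(value), end)  # → 300 2
```

A C++ version does the same per-byte work without interpreter overhead. The parser performs this loop for every integer in the footer, so the implementation language matters more for metadata parsing than it might first appear.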
Next Steps
- Installation Guide - Install Rugo in your environment
- Quickstart - Get started with basic examples
- User Guide - Learn about all features