API Functions
Complete reference for all Rugo functions.
Metadata Reading Functions
read_metadata
Read metadata from a Parquet file.
read_metadata(
    path: str | Path,
    schema_only: bool = False,
    include_statistics: bool = True,
    max_row_groups: int = -1
) -> dict
Parameters:
- path (str | Path): Path to the Parquet file
- schema_only (bool): Return only the schema and skip row groups (default: False)
- include_statistics (bool): Include min/max statistics (default: True)
- max_row_groups (int): Maximum number of row groups to read, -1 for all (default: -1)
Returns: Dictionary with metadata structure
Example:
import rugo.parquet as parquet_meta
metadata = parquet_meta.read_metadata("file.parquet")
schema = parquet_meta.read_metadata("file.parquet", schema_only=True)
sample = parquet_meta.read_metadata("file.parquet", max_row_groups=5)
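When many files need the same cheap inspection, the keyword arguments above can be fixed once in a small helper. The following is a minimal sketch and not part of Rugo: `read_schemas` and its `reader` parameter are hypothetical names, and the reader is injected so that the caller can supply `parquet_meta.read_metadata` (or a stub in tests).

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, Union

PathLike = Union[str, Path]
# Any callable that accepts read_metadata's documented keyword arguments.
Reader = Callable[..., Dict[str, Any]]


def read_schemas(paths: Iterable[PathLike], reader: Reader) -> Dict[str, Dict[str, Any]]:
    """Read schema-only metadata for each path, keyed by the path as a string.

    Hypothetical helper: `reader` is assumed to behave like
    rugo.parquet.read_metadata and is always called with schema_only=True.
    """
    return {str(p): reader(p, schema_only=True) for p in paths}
```

With Rugo installed, a call might look like read_schemas(Path("data").glob("*.parquet"), parquet_meta.read_metadata), where the "data" directory is only an illustration.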
read_metadata_from_bytes
Read metadata from bytes object.
read_metadata_from_bytes(
    data: bytes,
    schema_only: bool = False,
    include_statistics: bool = True,
    max_row_groups: int = -1
) -> dict
Parameters:
- data (bytes): Parquet file contents as bytes
- schema_only (bool): Return only the schema (default: False)
- include_statistics (bool): Include statistics (default: True)
- max_row_groups (int): Maximum number of row groups to read, -1 for all (default: -1)
Returns: Dictionary with metadata structure
Example:
with open("file.parquet", "rb") as f:
    data = f.read()
metadata = parquet_meta.read_metadata_from_bytes(data)
read_metadata_from_memoryview
Read metadata from memoryview (zero-copy).
read_metadata_from_memoryview(
    view: memoryview,
    schema_only: bool = False,
    include_statistics: bool = True,
    max_row_groups: int = -1
) -> dict
Parameters:
- view (memoryview): Memory view of the Parquet data
- schema_only (bool): Return only the schema (default: False)
- include_statistics (bool): Include statistics (default: True)
- max_row_groups (int): Maximum number of row groups to read, -1 for all (default: -1)
Returns: Dictionary with metadata structure
Example:
# Use a context manager so the file handle is closed promptly
with open("file.parquet", "rb") as f:
    data = f.read()
metadata = parquet_meta.read_metadata_from_memoryview(memoryview(data))
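Reading the whole file into a bytes object still copies it into memory once. One way to avoid that copy is to memory-map the file and wrap the mapping in a memoryview. The sketch below uses only the standard library; `mapped_view` and `looks_like_parquet` are hypothetical helper names, and whether Rugo accepts a view backed by `mmap` is an assumption that should be verified. The magic-byte check relies on the Parquet convention that files with an unencrypted footer begin and end with the 4 bytes `PAR1`.

```python
import mmap
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Union

PARQUET_MAGIC = b"PAR1"


@contextmanager
def mapped_view(path: Union[str, Path]) -> Iterator[memoryview]:
    """Yield a read-only memoryview over a memory-mapped file.

    The view is only valid inside the `with` block. An empty file cannot
    be mapped and makes mmap raise ValueError.
    """
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        view = memoryview(mm)
        try:
            yield view
        finally:
            # Release the view first; closing the map while a buffer is
            # still exported raises BufferError.
            view.release()
            mm.close()


def looks_like_parquet(view: memoryview) -> bool:
    """Cheap sanity check: leading and trailing 4-byte magic are both PAR1."""
    return (
        len(view) >= 8
        and bytes(view[:4]) == PARQUET_MAGIC
        and bytes(view[-4:]) == PARQUET_MAGIC
    )
```

Inside the `with mapped_view("file.parquet") as view:` block, the view could then be passed to parquet_meta.read_metadata_from_memoryview(view).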
Decoding Functions
Experimental
Decoder functions are experimental with limited capabilities.
can_decode
Check if file can be decoded with Rugo.
Parameters:
- path (str | Path): Path to the Parquet file
Returns: True if file can be decoded, False otherwise
Example:
if parquet_meta.can_decode("file.parquet"):
    values = parquet_meta.decode_column("file.parquet", "column")
decode_column
Decode column data from Parquet file.
Parameters:
- path (str | Path): Path to the Parquet file
- column_name (str): Name of the column to decode
Returns: List of Python values
Limitations:
- Only uncompressed data
- Only PLAIN encoding
- Only int32, int64, string types
- Only first row group
- Only required (non-nullable) columns
Example:
values = parquet_meta.decode_column("file.parquet", "column")
Orso Conversion Functions
Available when installed with pip install rugo[orso].
rugo_to_orso_schema
Convert Rugo metadata to Orso Relation.
from rugo.converters.orso import rugo_to_orso_schema
rugo_to_orso_schema(
    metadata: dict,
    table_name: str
) -> Relation
Parameters:
- metadata (dict): Metadata from read_metadata()
- table_name (str): Name for the relation
Returns: Orso Relation object
Example:
metadata = parquet_meta.read_metadata("file.parquet")
relation = rugo_to_orso_schema(metadata, "my_table")
extract_schema_only
Extract schema in simplified format.
Parameters:
- metadata (dict): Metadata from read_metadata()
Returns: List of column dictionaries
Example:
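# Assumed import path: the same module as rugo_to_orso_schema
from rugo.converters.orso import extract_schema_only
metadata = parquet_meta.read_metadata("file.parquet")
schema = extract_schema_only(metadata)
for col in schema:
    print(f"{col['name']}: {col['type']}")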
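For wider schemas, the per-column print above can be turned into an aligned listing. This is a minimal sketch that relies only on the `name` and `type` keys used in the example; `format_schema` is a hypothetical helper, and the sample columns in the usage note are made up for illustration.

```python
from typing import Dict, List


def format_schema(columns: List[Dict[str, str]]) -> str:
    """Render column dictionaries as left-aligned 'name  type' lines.

    Each dictionary is expected to carry 'name' and 'type' keys, as in
    the list returned by extract_schema_only.
    """
    # Pad every name to the longest one so the types line up.
    width = max((len(col["name"]) for col in columns), default=0)
    return "\n".join(f"{col['name']:<{width}}  {col['type']}" for col in columns)
```

For example, format_schema([{"name": "id", "type": "int64"}, {"name": "label", "type": "string"}]) yields two lines with the type names starting in the same column.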
Common Patterns
Read and Inspect
# Full metadata
metadata = parquet_meta.read_metadata("file.parquet")
# Schema only (faster)
schema = parquet_meta.read_metadata("file.parquet", schema_only=True)
# Sample (even faster)
sample = parquet_meta.read_metadata(
    "file.parquet",
    max_row_groups=5,
    include_statistics=False
)
From Memory
# From bytes
with open("file.parquet", "rb") as f:
    metadata = parquet_meta.read_metadata_from_bytes(f.read())
# From memoryview (zero-copy)
with open("file.parquet", "rb") as f:
    data = f.read()
metadata = parquet_meta.read_metadata_from_memoryview(memoryview(data))
Decode Data
# Check first
if parquet_meta.can_decode("file.parquet"):
    values = parquet_meta.decode_column("file.parquet", "column")
else:
    # Fallback to PyArrow
    import pyarrow.parquet as pq
    table = pq.read_table("file.parquet")
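The check-then-fallback pattern can be wrapped so that both branches return the same thing, a list of Python values. The following is a sketch under stated assumptions: `decode_with_fallback` and `pyarrow_fallback` are hypothetical names, the Rugo functions are passed in as arguments rather than imported, and the extra try/except reflects the assumption that a file-level can_decode check may not cover per-column limitations such as type or nullability.

```python
from typing import Any, Callable, List


def pyarrow_fallback(path: str, column: str) -> List[Any]:
    """Read a single column with PyArrow and return it as a Python list."""
    import pyarrow.parquet as pq  # imported lazily; only needed on fallback

    table = pq.read_table(path, columns=[column])
    return table.column(column).to_pylist()


def decode_with_fallback(
    path: str,
    column: str,
    can_decode: Callable[[str], bool],
    decode_column: Callable[[str, str], List[Any]],
    fallback: Callable[[str, str], List[Any]] = pyarrow_fallback,
) -> List[Any]:
    """Try the experimental decoder first, then fall back.

    `can_decode` and `decode_column` are expected to be the functions
    documented above; they are injected so this sketch has no hard
    dependency on Rugo.
    """
    if can_decode(path):
        try:
            return decode_column(path, column)
        except Exception:
            # Assumption: decoding can still fail for an unsupported column.
            pass
    return fallback(path, column)
```

A call might look like decode_with_fallback("file.parquet", "column", parquet_meta.can_decode, parquet_meta.decode_column).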
Error Handling
Any of these functions may raise an exception, for example FileNotFoundError when the path does not exist:
try:
    metadata = parquet_meta.read_metadata("file.parquet")
except FileNotFoundError:
    print("File not found")
except Exception as e:
    print(f"Error reading metadata: {e}")
try:
    values = parquet_meta.decode_column("file.parquet", "col")
except Exception as e:
    print(f"Cannot decode: {e}")
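When processing many files, it can be more convenient to collect failures than to print them. The sketch below returns a (metadata, error) pair instead of raising; `try_read` is a hypothetical helper, and the reader is passed in as a parameter so the same code works with `parquet_meta.read_metadata` or a stub. Only FileNotFoundError is singled out because it is the one specific exception shown above; other exception types are not documented here.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Result = Tuple[Optional[Dict[str, Any]], Optional[str]]


def try_read(path: str, reader: Callable[[str], Dict[str, Any]]) -> Result:
    """Return (metadata, None) on success or (None, message) on failure."""
    try:
        return reader(path), None
    except FileNotFoundError:
        return None, f"file not found: {path}"
    except Exception as exc:
        # Catch-all, mirroring the broad handler in the example above.
        return None, f"error reading metadata: {exc}"
```

A caller can then branch on the second element, for example metadata, error = try_read("file.parquet", parquet_meta.read_metadata).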
Type Hints
from pathlib import Path
from typing import Any, Union
import rugo.parquet as parquet_meta
# Metadata return type (Any must be imported for this alias to work)
MetadataDict = dict[str, Any]
# Path types
PathLike = Union[str, Path]
# Example with type hints
def process_file(path: PathLike) -> MetadataDict:
    return parquet_meta.read_metadata(path)
Next Steps
- Return Types - Detailed type information
- User Guide - Usage examples