patito.Database
- class patito.Database(query_handler, cache_directory=None, default_ttl=datetime.timedelta(days=364))
Construct manager for executing SQL queries and caching the results.
- Parameters:
query_handler (
Callable
[...
,Table
]) – The function that the Database object should use for executing SQL queries. Its first argument should be the SQL query string to execute, and it should return the query result as an arrow table, for instance pyarrow.Table.cache_directory (
Optional
[Path
]) – Path to the directory where caches should be stored as parquet files. If not provided, the XDG Base Directory Specification will be used to determine the suitable cache directory, by default~/.cache/patito
or${XDG_CACHE_HOME}/patito
.default_ttl (
timedelta
) – The default Time To Live (TTL), or with other words, how long to wait until caches are refreshed due to old age. The given default TTL can be overwritten by specifying thettl
parameter inDatabase.query()
. The default is 52 weeks.
Examples
We start by importing the necessary modules:
>>> from pathlib import Path ... >>> import patito as pt >>> import pyarrow as pa
In order to construct a
Database
, we need to provide the constructor with a function that can execute query strings. How to construct this function will depend on what you actually want to run your queries against, for example a local or remote database. For the purposes of demonstration we will use SQLite since it is built into Python’s standard library, but you can use anything; for example Snowflake or PostgresQL.We will use Python’s standard library documentation to create an in-memory SQLite database. It will contain a single table named
movies
containing some dummy data. The details do not really matter here, the only important part is that we construct a database which we can run SQL queries against.>>> import sqlite3 ... >>> def dummy_database() -> sqlite3.Cursor: ... connection = sqlite3.connect(":memory:") ... cursor = connection.cursor() ... cursor.execute("CREATE TABLE movies(title, year, score)") ... data = [ ... ("Monty Python Live at the Hollywood Bowl", 1982, 7.9), ... ("Monty Python's The Meaning of Life", 1983, 7.5), ... ("Monty Python's Life of Brian", 1979, 8.0), ... ] ... cursor.executemany("INSERT INTO movies VALUES(?, ?, ?)", data) ... connection.commit() ... return cursor
Using this dummy database, we are now able to construct a function which accepts SQL queries as its first parameter, executes the query, and returns the query result in the form of an Arrow table.
>>> def query_handler(query: str) -> pa.Table: ... cursor = dummy_database() ... cursor.execute(query) ... columns = [description[0] for description in cursor.description] ... data = [dict(zip(columns, row)) for row in cursor.fetchall()] ... return pa.Table.from_pylist(data)
We can now construct a
Database
object, providingquery_handler
as the way to execute SQL queries.>>> db = pt.Database(query_handler=query_handler)
The resulting object can now be used to execute SQL queries against the database and return the result in the form of a polars
DataFrame
object.>>> db.query("select * from movies order by year limit 1") shape: (1, 3) ┌──────────────────────────────┬──────┬───────┐ │ title ┆ year ┆ score │ │ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ f64 │ ╞══════════════════════════════╪══════╪═══════╡ │ Monty Python's Life of Brian ┆ 1979 ┆ 8.0 │ └──────────────────────────────┴──────┴───────┘
But the main way to use a
Database
object is to use the@Database.as_query
decarator to wrap functions which return SQL query strings.>>> @db.as_query() >>> def movies(newer_than_year: int): ... return f"select * from movies where year > {newer_than_year}"
This decorator will convert the function from producing query strings, to actually executing the given query and return the query result in the form of a polars
DataFrame
object.>>> movies(newer_than_year=1980) shape: (2, 3) ┌───────────────────────────────────┬──────┬───────┐ │ title ┆ year ┆ score │ │ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ f64 │ ╞═══════════════════════════════════╪══════╪═══════╡ │ Monty Python Live at the Hollywo… ┆ 1982 ┆ 7.9 │ │ Monty Python's The Meaning of Li… ┆ 1983 ┆ 7.5 │ └───────────────────────────────────┴──────┴───────┘
Caching is not enabled by default, but it can be enabled by specifying
cache=True
to the@db.as_query(...)
decorator. Other arguments are also accepted, such aslazy=True
if you want to retrieve the results in the form of aLazyFrame
instead of aDataFrame
,ttl
if you want to specify another TTL, and any additional keyword arguments are forwarded toquery_executor
when the SQL query is executed. You can read more about these parameters in the documentation ofDatabase.query()
.