patito.Database

class patito.Database(query_handler, cache_directory=None, default_ttl=datetime.timedelta(days=364))

Construct manager for executing SQL queries and caching the results.

Parameters:
  • query_handler (Callable[..., Table]) – The function that the Database object should use for executing SQL queries. Its first argument should be the SQL query string to execute, and it should return the query result as an arrow table, for instance pyarrow.Table.

  • cache_directory (Optional[Path]) – Path to the directory where caches should be stored as parquet files. If not provided, the XDG Base Directory Specification will be used to determine the suitable cache directory, by default ~/.cache/patito or ${XDG_CACHE_HOME}/patito.

  • default_ttl (timedelta) – The default Time To Live (TTL), or with other words, how long to wait until caches are refreshed due to old age. The given default TTL can be overwritten by specifying the ttl parameter in Database.query(). The default is 52 weeks.

Examples

We start by importing the necessary modules:

>>> from pathlib import Path
...
>>> import patito as pt
>>> import pyarrow as pa

In order to construct a Database, we need to provide the constructor with a function that can execute query strings. How to construct this function will depend on what you actually want to run your queries against, for example a local or remote database. For the purposes of demonstration we will use SQLite since it is built into Python’s standard library, but you can use anything; for example Snowflake or PostgresQL.

We will use Python’s standard library documentation to create an in-memory SQLite database. It will contain a single table named movies containing some dummy data. The details do not really matter here, the only important part is that we construct a database which we can run SQL queries against.

>>> import sqlite3
...
>>> def dummy_database() -> sqlite3.Cursor:
...     connection = sqlite3.connect(":memory:")
...     cursor = connection.cursor()
...     cursor.execute("CREATE TABLE movies(title, year, score)")
...     data = [
...         ("Monty Python Live at the Hollywood Bowl", 1982, 7.9),
...         ("Monty Python's The Meaning of Life", 1983, 7.5),
...         ("Monty Python's Life of Brian", 1979, 8.0),
...     ]
...     cursor.executemany("INSERT INTO movies VALUES(?, ?, ?)", data)
...     connection.commit()
...     return cursor

Using this dummy database, we are now able to construct a function which accepts SQL queries as its first parameter, executes the query, and returns the query result in the form of an Arrow table.

>>> def query_handler(query: str) -> pa.Table:
...     cursor = dummy_database()
...     cursor.execute(query)
...     columns = [description[0] for description in cursor.description]
...     data = [dict(zip(columns, row)) for row in cursor.fetchall()]
...     return pa.Table.from_pylist(data)

We can now construct a Database object, providing query_handler as the way to execute SQL queries.

>>> db = pt.Database(query_handler=query_handler)

The resulting object can now be used to execute SQL queries against the database and return the result in the form of a polars DataFrame object.

>>> db.query("select * from movies order by year limit 1")
shape: (1, 3)
┌──────────────────────────────┬──────┬───────┐
│ title                        ┆ year ┆ score │
│ ---                          ┆ ---  ┆ ---   │
│ str                          ┆ i64  ┆ f64   │
╞══════════════════════════════╪══════╪═══════╡
│ Monty Python's Life of Brian ┆ 1979 ┆ 8.0   │
└──────────────────────────────┴──────┴───────┘

But the main way to use a Database object is to use the @Database.as_query decarator to wrap functions which return SQL query strings.

>>> @db.as_query()
>>> def movies(newer_than_year: int):
...     return f"select * from movies where year > {newer_than_year}"

This decorator will convert the function from producing query strings, to actually executing the given query and return the query result in the form of a polars DataFrame object.

>>> movies(newer_than_year=1980)
shape: (2, 3)
┌───────────────────────────────────┬──────┬───────┐
│ title                             ┆ year ┆ score │
│ ---                               ┆ ---  ┆ ---   │
│ str                               ┆ i64  ┆ f64   │
╞═══════════════════════════════════╪══════╪═══════╡
│ Monty Python Live at the Hollywo… ┆ 1982 ┆ 7.9   │
│ Monty Python's The Meaning of Li… ┆ 1983 ┆ 7.5   │
└───────────────────────────────────┴──────┴───────┘

Caching is not enabled by default, but it can be enabled by specifying cache=True to the @db.as_query(...) decorator. Other arguments are also accepted, such as lazy=True if you want to retrieve the results in the form of a LazyFrame instead of a DataFrame, ttl if you want to specify another TTL, and any additional keyword arguments are forwarded to query_executor when the SQL query is executed. You can read more about these parameters in the documentation of Database.query().

Methods