patito.Database
- class patito.Database(query_handler, cache_directory=None, default_ttl=datetime.timedelta(days=364))
Construct manager for executing SQL queries and caching the results.
- Parameters:
query_handler (
Callable[...,Table]) – The function that the Database object should use for executing SQL queries. Its first argument should be the SQL query string to execute, and it should return the query result as an arrow table, for instance pyarrow.Table.cache_directory (
Optional[Path]) – Path to the directory where caches should be stored as parquet files. If not provided, the XDG Base Directory Specification will be used to determine the suitable cache directory, by default~/.cache/patitoor${XDG_CACHE_HOME}/patito.default_ttl (
timedelta) – The default Time To Live (TTL), or with other words, how long to wait until caches are refreshed due to old age. The given default TTL can be overwritten by specifying thettlparameter inDatabase.query(). The default is 52 weeks.
Examples
We start by importing the necessary modules:
>>> from pathlib import Path ... >>> import patito as pt >>> import pyarrow as pa
In order to construct a
Database, we need to provide the constructor with a function that can execute query strings. How to construct this function will depend on what you actually want to run your queries against, for example a local or remote database. For the purposes of demonstration we will use SQLite since it is built into Python’s standard library, but you can use anything; for example Snowflake or PostgresQL.We will use Python’s standard library documentation to create an in-memory SQLite database. It will contain a single table named
moviescontaining some dummy data. The details do not really matter here, the only important part is that we construct a database which we can run SQL queries against.>>> import sqlite3 ... >>> def dummy_database() -> sqlite3.Cursor: ... connection = sqlite3.connect(":memory:") ... cursor = connection.cursor() ... cursor.execute("CREATE TABLE movies(title, year, score)") ... data = [ ... ("Monty Python Live at the Hollywood Bowl", 1982, 7.9), ... ("Monty Python's The Meaning of Life", 1983, 7.5), ... ("Monty Python's Life of Brian", 1979, 8.0), ... ] ... cursor.executemany("INSERT INTO movies VALUES(?, ?, ?)", data) ... connection.commit() ... return cursor
Using this dummy database, we are now able to construct a function which accepts SQL queries as its first parameter, executes the query, and returns the query result in the form of an Arrow table.
>>> def query_handler(query: str) -> pa.Table: ... cursor = dummy_database() ... cursor.execute(query) ... columns = [description[0] for description in cursor.description] ... data = [dict(zip(columns, row)) for row in cursor.fetchall()] ... return pa.Table.from_pylist(data)
We can now construct a
Databaseobject, providingquery_handleras the way to execute SQL queries.>>> db = pt.Database(query_handler=query_handler)
The resulting object can now be used to execute SQL queries against the database and return the result in the form of a polars
DataFrameobject.>>> db.query("select * from movies order by year limit 1") shape: (1, 3) ┌──────────────────────────────┬──────┬───────┐ │ title ┆ year ┆ score │ │ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ f64 │ ╞══════════════════════════════╪══════╪═══════╡ │ Monty Python's Life of Brian ┆ 1979 ┆ 8.0 │ └──────────────────────────────┴──────┴───────┘
But the main way to use a
Databaseobject is to use the@Database.as_querydecarator to wrap functions which return SQL query strings.>>> @db.as_query() >>> def movies(newer_than_year: int): ... return f"select * from movies where year > {newer_than_year}"
This decorator will convert the function from producing query strings, to actually executing the given query and return the query result in the form of a polars
DataFrameobject.>>> movies(newer_than_year=1980) shape: (2, 3) ┌───────────────────────────────────┬──────┬───────┐ │ title ┆ year ┆ score │ │ --- ┆ --- ┆ --- │ │ str ┆ i64 ┆ f64 │ ╞═══════════════════════════════════╪══════╪═══════╡ │ Monty Python Live at the Hollywo… ┆ 1982 ┆ 7.9 │ │ Monty Python's The Meaning of Li… ┆ 1983 ┆ 7.5 │ └───────────────────────────────────┴──────┴───────┘
Caching is not enabled by default, but it can be enabled by specifying
cache=Trueto the@db.as_query(...)decorator. Other arguments are also accepted, such aslazy=Trueif you want to retrieve the results in the form of aLazyFrameinstead of aDataFrame,ttlif you want to specify another TTL, and any additional keyword arguments are forwarded toquery_executorwhen the SQL query is executed. You can read more about these parameters in the documentation ofDatabase.query().