patito.DataFrame.validate

DataFrame.validate()

Validate the schema and content of the dataframe.

Call .set_model() before .validate() so that the dataframe knows which model to validate against.

Returns:

The original dataframe, if validation succeeds.

Return type:

DataFrame[Model]

Raises:
  • TypeError – If DataFrame.set_model() has not been invoked prior to validation. Note that patito.Model.DataFrame automatically invokes DataFrame.set_model() for you.

  • patito.exceptions.ValidationError – If the dataframe does not match the specified schema.

Examples

>>> from typing import Literal
>>> import patito as pt
>>> class Product(pt.Model):
...     product_id: int = pt.Field(unique=True)
...     temperature_zone: Literal["dry", "cold", "frozen"]
...     is_for_sale: bool
...
>>> df = pt.DataFrame(
...     {
...         "product_id": [1, 1, 3],
...         "temperature_zone": ["dry", "dry", "oven"],
...     }
... ).set_model(Product)
>>> try:
...     df.validate()
... except pt.ValidationError as exc:
...     print(exc)
...
3 validation errors for Product
is_for_sale
  Missing column (type=type_error.missingcolumns)
product_id
  2 rows with duplicated values. (type=value_error.rowvalue)
temperature_zone
  Rows with invalid values: {'oven'}. (type=value_error.rowvalue)