Data Validation with Pydantic

Karol Szuster

Nov 1, 2024 • 17 min read

Some of the most common issues developers face come from unforeseen actions and incorrect data: software scenarios that haven’t been considered, unusual user behavior, wrong data types, or database communication errors.

To make a system as fail-safe as possible, it’s good to have control over its processes, its data, and the types of that data, and to ensure incorrect data doesn’t disrupt operations. One way is to validate the types of the values your software handles, which is where Pydantic comes into the equation.

Pydantic is a Python tool that’s primarily a parsing library, as opposed to a validation library. According to the documentation, it “guarantees the types and constraints of the output model, not the input data.” Although validation isn’t the main purpose of Pydantic, you can use the library for custom validation, including validating individual fields in data schemas through decorators such as field_validator (named validator in Pydantic 1.x, the version used in the examples below).

Introduction to Pydantic

Pydantic is a powerful Python library that provides robust data validation and settings management features. It is widely used in the Python community and is a popular choice for building data-driven applications. Pydantic’s primary way of defining data schemas is through models, which are objects that define and store data about an entity with annotated fields. The BaseModel class in Pydantic is the cornerstone for creating these models. By leveraging Python type hints, Pydantic ensures that the data conforms to the specified types, making data validation straightforward and efficient.

Understanding Data Validation

Data validation is the process of ensuring that the data entered into a system is accurate, complete, and consistent. It is an essential step in maintaining data quality and preventing errors. Pydantic provides a robust data validation system that can be used to validate data against a set of rules or constraints. By utilizing Python type hints, Pydantic makes it easy to define the structure and type of data expected or required. This approach not only enhances code readability but also ensures that the data adheres to the defined schema, thereby reducing the likelihood of errors.

What is Pydantic?

Pydantic is a Python package that provides you with two main functionalities:

  • data validation

  • settings management

We’ll discuss the data validation functionality in the article below.

The Pydantic doc states: “Pydantic enforces type hints at runtime, and provides user-friendly errors when data is invalid.”

Class methods in Pydantic can be used to add custom validation logic, such as creating validators to ensure model fields meet specific criteria.

So, in layman’s terms, Pydantic is a set of tools for controlling the format and type of input and output data. It uses Python type hints, so there’s no need to learn a domain-specific language. You can also validate fields during class creation to ensure the specified fields exist on the model.

Why should you use Pydantic?

Pydantic is handy for two main reasons. Firstly, you gain readability of the code. When someone’s working on the code (or coming back to it after a long break), Pydantic lets them clearly see the structure and type of data expected or required.

Secondly, data passed to functions is validated, saving you from undesirable actions caused by wrong data types. Sometimes, you can’t be sure what kind of data is passed to your program, so it’s better to protect yourself.

Additionally, Pydantic lets you declare default values for fields, and a custom validator can fall back to a default when a value fails validation, so your program can handle unexpected data gracefully.

Sounds like a Python dataclass?

Yes and no. Pydantic is similar because it helps you determine the type of data processed. With both dataclass and Pydantic, you define the type of expected data with type hints, and it looks like this:

from dataclasses import dataclass
from pydantic import BaseModel

@dataclass
class Bird:
   name: str
   wingspan: int
  
class PydanticBird(BaseModel):
   name: str
   wingspan: int

>>> bird_data = {"name": "Alcedo", "wingspan": 25}
>>> alcedo = Bird(**bird_data)
>>> pydantic_alcedo = PydanticBird(**bird_data)
>>> alcedo
Bird(name='Alcedo', wingspan=25)
>>> pydantic_alcedo
PydanticBird(name='Alcedo', wingspan=25)

Both seem to work the same. But, what if you pass the wrong data type?


>>> bird_data = {"name": "Alcedo", "wingspan": "blue"}
>>> alcedo = Bird(**bird_data)
>>> pydantic_alcedo = PydanticBird(**bird_data)


Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for PydanticBird
wingspan
  value is not a valid integer (type=type_error.integer)

As you can see in the traceback, Pydantic doesn’t allow you to create class instances with the wrong datatype, but a dataclass will. As such, Pydantic is a useful tool for preventing software from undesirable behavior when validation fails.

Be careful: validation or parsing of input data?

If you go deeper into the topic, you may come across information saying Pydantic doesn't really validate, but parses. What's the difference? In the previous example, Pydantic rejected the invalid value exactly as you'd hope, so the distinction wasn't visible. It's still crucial, because parsing means Pydantic may convert the input instead of rejecting it.

Let's look at the next example.

>>> bird_data = {"name": "Alcedo", "wingspan": 25.4}
>>> pydantic_alcedo = PydanticBird(**bird_data)
>>> pydantic_alcedo
PydanticBird(name='Alcedo', wingspan=25)

Didn't we want to validate the "wingspan" variable so that it always contains an integer? Looking at the output, everything's correct and we have an integer. Pydantic parsed a float to int – when Pydantic gets data, it tries to parse the data to the specified type.
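The same coercion applies to other convertible inputs. As a minimal sketch (the numeric string "25" is a made-up input, not taken from the example above), a string of digits is parsed into an integer. How lenient the coercion is depends on the Pydantic version: the float truncation shown above is Pydantic 1.x behavior, while Pydantic 2.x rejects floats that have a fractional part.

```python
from pydantic import BaseModel


class PydanticBird(BaseModel):
    name: str
    wingspan: int


# A numeric string is not an int, but Pydantic can parse it into one.
bird = PydanticBird(name="Alcedo", wingspan="25")
print(type(bird.wingspan).__name__, bird.wingspan)
```

The model ends up holding an int even though the input was a str, which is exactly the "output is guaranteed, input is not" behavior described above.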

But what if you want to avoid such parsing situations and for Pydantic to pass only integers? Pydantic offers Strict Types, such as:

  • StrictStr

  • StrictBytes

  • StrictInt

  • StrictFloat

  • StrictBool

from pydantic import BaseModel, StrictInt

class PydanticBird(BaseModel):
    name: str
    wingspan: StrictInt

>>> bird_data = {"name": "Alcedo", "wingspan": 25.4}
>>> pydantic_alcedo = PydanticBird(**bird_data)

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for PydanticBird
wingspan
  value is not a valid integer (type=type_error.integer)

Now, Pydantic expects exactly an integer type. You should therefore think carefully about what data is desired, because if you specify StrictFloat, and at some point, the software converts a floating point number to an integer (for example, 3.0 to 3), an error is thrown (meaning the validation works).

In addition to Python types, thanks to Pydantic you can also validate a variety of other useful data types such as:

  • IP addresses
  • Email addresses
  • Path to file
  • Path to directory
  • Color
  • JSON
  • URL
  • UUID
  • Payment card number (and more)
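To illustrate two entries from this list, here is a short sketch that validates a UUID and an IP address using the standard library types as annotations. The Nest model and its field names are hypothetical and exist only for this example.

```python
from ipaddress import IPv4Address
from uuid import UUID

from pydantic import BaseModel, ValidationError


class Nest(BaseModel):
    # Hypothetical model, used only for this illustration.
    nest_id: UUID
    sensor_ip: IPv4Address


# Strings are parsed into UUID and IPv4Address objects.
nest = Nest(
    nest_id="12345678-1234-5678-1234-567812345678",
    sensor_ip="192.168.0.10",
)
print(type(nest.nest_id).__name__, type(nest.sensor_ip).__name__)

# Malformed values are rejected, one error per invalid field.
try:
    Nest(nest_id="not-a-uuid", sensor_ip="999.0.0.1")
    rejected = False
except ValidationError as error:
    rejected = True
    problems = len(error.errors())
```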

Function arguments and class attributes field validation

Validating the arguments passed to a function is handy and elegant. All you have to do is add Python type hints and decorate the function with a decorator imported from the Pydantic library.

from pydantic import StrictFloat, validate_arguments

@validate_arguments
def check_if_alcedo_has_regular_wingspan(wingspan: StrictFloat):
    if 23 < wingspan < 27:
        return "Regular Alcedo"

>>> check_if_alcedo_has_regular_wingspan(25.0)
'Regular Alcedo'
>>> check_if_alcedo_has_regular_wingspan(25)
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module> 
  File "pydantic/decorator.py", line 40, in pydantic.decorator.validate_arguments.validate.wrapper_function
  File "pydantic/decorator.py", line 133, in pydantic.decorator.ValidatedFunction.call
  File "pydantic/decorator.py", line 130, in pydantic.decorator.ValidatedFunction.init_model_instance
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for CheckIfAlcedoHasRegularWingspan
wingspan
  value is not a valid float (type=type_error.float)

In custom validators, which are covered later in this article, the values argument is a dictionary containing the previously validated field values and defaults. It is crucial for accessing values from other fields during validation.

In addition to validating the type of variables that are passed to a function, you can also set rules for a variable, such as number ranges, the maximum and minimum number of objects in a list, the length of strings, etc. To do so, use the Field function, which you can combine with the Annotated type from Python’s typing module.

from typing import Annotated
from pydantic import Field, validate_arguments

@validate_arguments
def get_only_regular_alcedo(wingspan: Annotated[float, Field(gt=23, le=27)]):
    return "Regular Alcedo"

>>> get_only_regular_alcedo(21)
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module> 
  File "pydantic/decorator.py", line 40, in pydantic.decorator.validate_arguments.validate.wrapper_function
  File "pydantic/decorator.py", line 133, in pydantic.decorator.ValidatedFunction.call
  File "pydantic/decorator.py", line 130, in pydantic.decorator.ValidatedFunction.init_model_instance
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for GetOnlyRegularAlcedo
wingspan
  ensure this value is greater than 23 (type=value_error.number.not_gt; limit_value=23)

The Field function can also be used in the class when defining variables in the following way:

class PydanticBird(BaseModel):
    name: str
    wingspan: Annotated[float, Field(gt=0, lt=35)]

Or this way:

class PydanticBird(BaseModel):
    name: str
    wingspan: float = Field(gt=0, lt=35)

You can catch these errors with ValidationError, imported from the pydantic module. Printing the caught exception gives the error message in a friendlier format than the full traceback.

>>> from pydantic import ValidationError
>>> try:
...     get_only_regular_alcedo(21)
... except ValidationError as error:
...     print(error)
...
1 validation error for GetOnlyRegularAlcedo
wingspan
  ensure this value is greater than 23 (type=value_error.number.not_gt; limit_value=23)
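If you need the error details in code rather than as printed text, the exception also exposes them as data. The sketch below uses the errors() method of ValidationError; the out-of-range wingspan of 40 is a made-up input.

```python
from pydantic import BaseModel, Field, ValidationError


class PydanticBird(BaseModel):
    name: str
    wingspan: float = Field(gt=0, lt=35)


try:
    PydanticBird(name="Alcedo", wingspan=40)
except ValidationError as error:
    # errors() returns a list of dicts, one per failing field.
    details = error.errors()

for item in details:
    # "loc" is the path to the field, "msg" the human-readable reason.
    print(item["loc"], item["msg"])
```

The exact wording of msg differs between Pydantic versions, so it's safer to branch on loc and type than on the message text.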

Pydantic also gives you the ability to create your own validators, allowing you to adapt the tool to the needs of each developer.

from pydantic import BaseModel, Field, validator

class PydanticBird(BaseModel):
    name: str
    wingspan: float = Field(gt=0, lt=35)

    @validator("name")
    def name_cannot_contain_non_alphabetic_characters(cls, name: str):
        if not name.isalpha():
            raise ValueError("cannot contain non alphabetic characters")
        return name.title()

>>> bird = PydanticBird(name="Alcedo 5", wingspan=20)
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module> 
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for PydanticBird
name
  cannot contain non alphabetic characters (type=value_error)
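A validator can also look at fields that were validated earlier in the same model. The sketch below assumes the Pydantic 1.x style validator signature used in this article, where an extra values argument holds the already validated fields. The WingspanRange model and its field names are hypothetical.

```python
from pydantic import BaseModel, ValidationError, validator


class WingspanRange(BaseModel):
    # Hypothetical model: fields are validated in declaration order.
    min_wingspan: float
    max_wingspan: float

    @validator("max_wingspan")
    def max_must_not_be_below_min(cls, max_wingspan, values):
        # values only contains fields that already passed validation.
        min_wingspan = values.get("min_wingspan")
        if min_wingspan is not None and max_wingspan < min_wingspan:
            raise ValueError("max_wingspan must not be smaller than min_wingspan")
        return max_wingspan


valid_range = WingspanRange(min_wingspan=23, max_wingspan=27)

try:
    WingspanRange(min_wingspan=27, max_wingspan=23)
    range_rejected = False
except ValidationError:
    range_rejected = True
```

Using values.get() rather than indexing matters here: if min_wingspan itself failed validation, it won't be present in the dictionary.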

Sometimes, it may be the case that it’s not necessary to validate data, because it’s already been validated or comes from a trusted source. In this kind of situation, you can use the built-in construct() method to create objects. What’s the benefit? Pydantic documentation states: “It’s generally around 30 times faster than creating a model with full validation.”

>>> pydantic_alcedo = PydanticBird.construct(**bird_data)

Handling JSON

Another method available on a Pydantic model is json(). Of course, there’s no problem getting JSON from a dataclass instance, but it’s handier with Pydantic, as the following code shows:

>>> import json
>>> json.dumps(alcedo.__dict__)
'{"name": "Alcedo", "wingspan": 25}'
>>> pydantic_alcedo.json()
'{"name": "Alcedo", "wingspan": 25}'

If you’d like a more detailed description of a Pydantic model, you can call its schema() or schema_json() method:

>>> PydanticBird.schema()
{
   "title":"PydanticBird",
   "type":"object",
   "properties":{
      "name":{
         "title":"Name",
         "type":"string"
      },
      "wingspan":{
         "title":"Wingspan",
         "exclusiveMinimum":0,
         "exclusiveMaximum":35,
         "type":"number"
      }
   },
   "required":[
      "name",
      "wingspan"
   ]
}

It’s worth mentioning that Pydantic offers methods that help create objects by parsing a JSON string, a dict (dictionary), or a file.

  • parse_obj takes a dict and throws an error when the argument isn’t a dict type

>>> bird_data = {"name": "Alcedo", "wingspan": 25}
>>> PydanticBird.parse_obj(bird_data)
PydanticBird(name='Alcedo', wingspan=25.0)

  • parse_raw takes a string or bytes as an argument and parses it as JSON

>>> bird_data = '{"name": "Alcedo", "wingspan": 25}'
>>> PydanticBird.parse_raw(bird_data)
PydanticBird(name='Alcedo', wingspan=25.0)
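Put together, json() and parse_raw() give you a round trip between a model and its JSON representation. This is a minimal sketch using the Pydantic 1.x method names from this article; in Pydantic 2.x they still work but are deprecated in favor of model_dump_json() and model_validate_json().

```python
from pydantic import BaseModel


class PydanticBird(BaseModel):
    name: str
    wingspan: float


bird = PydanticBird(name="Alcedo", wingspan=25)

# Serialize to a JSON string, then parse that string back into a model.
payload = bird.json()
restored = PydanticBird.parse_raw(payload)

# Models compare equal when their field values match.
print(restored == bird)
```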

During software development, you may need to create an object that should remain unchanged. After its creation, any editing attempts should fail. With a dataclass, you can set the frozen keyword argument to True to receive an immutable object.

@dataclass(frozen=True)
class Bird:
    name: str
    wingspan: int

To achieve the same result with Pydantic, you have to set the allow_mutation flag to False in the Config class nested inside the model class.

class PydanticBird(BaseModel):
    name: str
    wingspan: float

    class Config:
        allow_mutation = False
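Here is a sketch of what an editing attempt looks like in practice. It rests on one assumption that differs from the snippet above: it uses the frozen config flag, which recent Pydantic 1.x releases accept alongside allow_mutation and which Pydantic 2.x uses in its place. The exception type differs between versions, so the example catches both.

```python
from pydantic import BaseModel


class PydanticBird(BaseModel):
    name: str
    wingspan: float

    class Config:
        # Assumption: frozen is used here instead of allow_mutation = False.
        frozen = True


bird = PydanticBird(name="Alcedo", wingspan=25)

try:
    bird.wingspan = 30
    edit_blocked = False
except (TypeError, ValueError):
    # Pydantic 1.x raises TypeError, 2.x raises a ValidationError (a ValueError).
    edit_blocked = True

print(edit_blocked, bird.wingspan)
```

The assignment fails and the original value stays in place.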

Recursive models with custom validation

Recursive models are also a useful mechanism. When you create a more complex model that contains other models, Pydantic recognizes the declared types and builds the nested objects from the passed data. You can then access nested data through model attributes instead of dictionary keys, which is what you’d be left with when building such a structure with dataclasses or in the most traditional way.

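from typing import List, Optional

from pydantic import BaseModel

class Bird(BaseModel):
    name: str
    # Optional, because a bird may be created without a known wingspan
    wingspan: Optional[float] = None

class Props(BaseModel):
    one_species: bool = True
    migrating: bool = True

class Flock(BaseModel):
    properties: Props
    birds: List[Bird]

>>> flock = Flock(properties={'migrating': False}, birds=[{'name': 'Alcedo_1'}, {'name': 'Alcedo_2'}])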
>>> flock.properties
Props(one_species=True, migrating=False)
>>> flock.properties.migrating
False
>>> flock.birds
[Bird(name='Alcedo_1', wingspan=None), Bird(name='Alcedo_2', wingspan=None)]

In nested models too, a validator can reference values from other fields: in Pydantic V2, a @field_validator accesses them through ValidationInfo.data.

Custom Validation with Pydantic

Pydantic provides several ways to perform custom validation, including field validators, annotated validators, and root validators. Field validators are used to validate individual fields, ensuring that each field meets specific criteria. Annotated validators allow you to apply validators to fields or models, providing a flexible way to enforce validation rules. Root validators, on the other hand, are used to validate the entire model’s data, ensuring that the combined data meets the required conditions. Pydantic also allows you to reuse validators across multiple fields or models, making it easy to define and maintain custom validation logic. This flexibility ensures that you can tailor the validation process to meet the specific needs of your application.
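As a sketch of a root validator, the example below checks a rule that involves two fields at once. root_validator is the Pydantic 1.x name (Pydantic 2.x offers model_validator for the same purpose), and skip_on_failure=True means the check only runs when the individual fields have already passed validation. The Sighting model and its fields are hypothetical.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, root_validator


class Sighting(BaseModel):
    # Hypothetical model: a sighting needs at least one identifier.
    name: Optional[str] = None
    ring_number: Optional[str] = None

    @root_validator(skip_on_failure=True)
    def name_or_ring_number_required(cls, values):
        # values holds all fields of the model, including defaults.
        if values.get("name") is None and values.get("ring_number") is None:
            raise ValueError("provide a name or a ring_number")
        return values


named = Sighting(name="Alcedo")

try:
    Sighting()
    empty_rejected = False
except ValidationError:
    empty_rejected = True
```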

Best Practices for Data Validation

When using Pydantic for data validation, there are several best practices to keep in mind. First, it’s essential to define clear and concise validation rules that accurately reflect the requirements of your application. This helps in maintaining data integrity and consistency. Second, it’s crucial to test your validation rules thoroughly to ensure that they are working as expected. Comprehensive testing helps in identifying and fixing issues early in the development process. Third, it’s a good idea to use Pydantic’s built-in validation features, such as field validators and annotated validators, to simplify your validation logic. These features provide a robust framework for implementing validation rules. Finally, it’s essential to handle validation errors properly by providing informative error messages and logging errors for further analysis. This approach helps in diagnosing and resolving issues efficiently.
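The last point, handling and logging validation errors, could look like the sketch below. The load_birds helper and the "bird_import" logger name are hypothetical; the idea is simply that one bad row is logged and skipped instead of stopping the whole import.

```python
import logging

from pydantic import BaseModel, Field, ValidationError

logger = logging.getLogger("bird_import")  # hypothetical logger name


class PydanticBird(BaseModel):
    name: str
    wingspan: float = Field(gt=0, lt=35)


def load_birds(rows):
    """Return the valid birds and log every rejected row."""
    birds = []
    for index, row in enumerate(rows):
        try:
            birds.append(PydanticBird(**row))
        except ValidationError as error:
            # Keep the structured details so the failure can be analyzed later.
            logger.warning("row %d rejected: %s", index, error.errors())
    return birds


birds = load_birds([
    {"name": "Alcedo", "wingspan": 25},
    {"name": "Alcedo", "wingspan": -1},  # violates gt=0, so it is skipped
])
```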

Troubleshooting Validation Issues

When troubleshooting validation issues with Pydantic, there are several steps you can take. First, check the validation rules to ensure that they are correctly defined and accurately reflect the requirements of your application. Misconfigured rules can lead to unexpected validation failures. Second, check the input data to ensure that it is correctly formatted and meets the validation rules. Incorrectly formatted data is a common cause of validation errors. Third, inspect the ValidationError exception that Pydantic raises, for example through its errors() method, to get more information about the failure. It reports which field failed and why, helping you pinpoint the issue. Finally, consult the Pydantic documentation and community resources for further guidance and support. The Pydantic community is active and can provide valuable insights and solutions to common validation issues.

ORM mode

If you're working with databases, you probably know what ORM is: object-relational mapping. In a Pydantic class, you can set the ORM mode, informing the Pydantic model that the data it receives may be an ORM object with attributes rather than a dictionary. With this config, the model reads all the data related to that object, including relationship data. When the ORM mode is set to False, relationship data won't be included, even if those relationships are declared in your Pydantic models.
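A minimal sketch of ORM mode follows. BirdRow is a hypothetical stand-in for an ORM object, so the example runs without a database. The config flag is named orm_mode in Pydantic 1.x and from_attributes in Pydantic 2.x, so the sketch picks the right one based on the installed version.

```python
from pydantic import VERSION, BaseModel


class BirdRow:
    """Hypothetical stand-in for an ORM object with plain attributes."""

    def __init__(self, name, wingspan):
        self.name = name
        self.wingspan = wingspan


class PydanticBird(BaseModel):
    name: str
    wingspan: float

    class Config:
        # The flag is called orm_mode in Pydantic 1.x and from_attributes in 2.x.
        if VERSION.startswith("1."):
            orm_mode = True
        else:
            from_attributes = True


# from_orm() reads attributes from the object instead of expecting a dict.
bird = PydanticBird.from_orm(BirdRow(name="Alcedo", wingspan=25))
print(bird)
```

In Pydantic 2.x, from_orm() still works but is deprecated in favor of model_validate().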

Proof of its usefulness is that FastAPI, a modern web framework, is based on Pydantic, and it's common to use this ORM mode with FastAPI applications.

To sum up, what do you gain using Pydantic?

  • Better data validation and thus greater control over how your software works

  • Tool with a high level of customization, making it extremely versatile

  • Tool based on the Python syntax, so no need to learn a new programming language

  • Bunch of handy methods to handle objects

  • Serialization and deserialization of Pydantic models to and from JSON

  • Settings management

Karol Szuster

Python Developer at Netguru