A Guide to Pydantic: Data Validation in Python

An introduction to Pydantic, the powerful data validation and settings management library for Python. Learn how to use Python type hints to define data models that automatically parse and validate your data.

Working with external data is a common task in programming. Whether you're processing incoming JSON from an API request or loading a configuration file, you need a reliable way to validate that the data has the correct structure and types.

In Python, the go-to library for this is Pydantic. Pydantic is a data validation and settings management library that uses Python type hints to define data models. It's fast (since version 2, its core validation logic is written in Rust), easy to use, and has become a cornerstone of the modern Python web development stack, most famously as the data validation layer for FastAPI.

The Core Idea: Data Models

With Pydantic, you define the "shape" of your data as a class that inherits from BaseModel. You use standard Python type hints to declare the fields and their types.

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str
    is_active: bool = True # A field with a default value

This User model declares that a valid user object must have:

  • An id that is an integer.
  • A name that is a string.
  • An email that is a string.
  • An optional is_active flag that defaults to True.
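Because the model is built from ordinary type hints, Pydantic can also report which fields are required. The following is a minimal sketch that assumes Pydantic v2, where the model_fields class attribute and the model_json_schema() method are available:

```python
from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str
    email: str
    is_active: bool = True  # A field with a default value


# model_fields maps each field name to its FieldInfo (Pydantic v2)
required = [name for name, field in User.model_fields.items() if field.is_required()]
print(required)  # ['id', 'name', 'email']

# The same information appears in the generated JSON Schema
schema = User.model_json_schema()
print(schema['required'])  # ['id', 'name', 'email']
print(schema['properties']['is_active']['default'])  # True
```

This is what makes models "self-documenting": tools such as FastAPI read this schema to describe request and response bodies.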

Parsing and Validation

You can now use this model to parse and validate raw data, such as a dictionary.

# Raw data from an API request
raw_data = {
    'id': 123,
    'name': 'Alice',
    'email': 'alice@example.com'
}

# Create an instance of the model
user = User(**raw_data)

print(user.id)       # 123
print(user.is_active) # True (from the default value)

# You can also get a dictionary back from the model
# (model_dump() in Pydantic v2; the v1 name .dict() is deprecated)
print(user.model_dump()) # {'id': 123, 'name': 'Alice', ...}
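Instead of unpacking a dictionary with **, Pydantic v2 also offers class methods that parse input directly: model_validate() for dictionaries and model_validate_json() for raw JSON strings, with model_dump_json() for the reverse direction. A short round-trip sketch, assuming Pydantic v2:

```python
import json

from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str
    email: str
    is_active: bool = True


raw_data = {'id': 123, 'name': 'Alice', 'email': 'alice@example.com'}

# Validate a dictionary (equivalent to User(**raw_data))
user = User.model_validate(raw_data)

# Serialize to a JSON string, then parse that string straight back into a model
payload = user.model_dump_json()
restored = User.model_validate_json(payload)

print(json.loads(payload))
print(restored.model_dump() == user.model_dump())  # True
```

Parsing JSON with model_validate_json() skips the intermediate json.loads() call, which is convenient when the input arrives as an HTTP request body.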

Pydantic does more than check types; in its default ("lax") mode it also converts data where a sensible conversion exists. For example, if the id in the raw data were the string '123', Pydantic would convert it to the integer 123 to match the type hint.
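The sketch below, which assumes Pydantic v2, shows this coercion at work and then disables it with strict mode. StrictUser is a hypothetical name used only for illustration.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class User(BaseModel):
    id: int
    name: str
    email: str
    is_active: bool = True


# Lax mode (the default): '123' becomes 123 and 'yes' becomes True
user = User(id='123', name='Alice', email='alice@example.com', is_active='yes')
print(user.id, type(user.id).__name__)  # 123 int
print(user.is_active)  # True


# Hypothetical subclass that switches the whole model to strict mode
class StrictUser(User):
    model_config = ConfigDict(strict=True)


strict_error_locs = []
try:
    StrictUser(id='123', name='Alice', email='alice@example.com')
except ValidationError as exc:
    # In strict mode a str is no longer accepted for an int field
    strict_error_locs = [err['loc'] for err in exc.errors()]
print(strict_error_locs)  # [('id',)]
```

Strict mode is useful when silently converting input would hide a bug in the client that sent it.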

If the data is invalid or missing a required field, Pydantic will raise a helpful ValidationError that clearly explains what went wrong.

from pydantic import ValidationError

invalid_data = {
    'id': 'not-an-integer',
    'name': 'Bob'
    # Missing the required 'email' field
}

try:
    User(**invalid_data)
except ValidationError as e:
    print(e.json())

This would produce a detailed JSON error message indicating that id is not a valid integer and that the email field is required.
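For programmatic handling, such as building an API error response, ValidationError.errors() returns the same information as a list of dictionaries. A sketch assuming Pydantic v2, where each entry carries loc, msg, and type keys:

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str
    email: str
    is_active: bool = True


invalid_data = {'id': 'not-an-integer', 'name': 'Bob'}

problems = {}
try:
    User(**invalid_data)
except ValidationError as e:
    print(e.error_count())  # 2
    for err in e.errors():
        # loc is a tuple path to the offending field; type is a stable error code
        print(err['loc'], err['type'])
    problems = {err['loc'][0]: err['type'] for err in e.errors()}

print(problems)  # {'id': 'int_parsing', 'email': 'missing'}
```

Matching on the type code rather than the human-readable msg keeps error handling stable if the message wording changes.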

More Advanced Validation

Pydantic provides a rich set of tools for more complex validation scenarios.

  • Constrained Types: You can use types like constr (constrained string) or conint (constrained integer) to add rules like minimum/maximum length or value.

    from pydantic import BaseModel, constr
    
    class Post(BaseModel):
        title: constr(min_length=3, max_length=50)
        content: str
    
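  • Custom Validators: You can write your own validation logic with the @field_validator decorator (called @validator in Pydantic v1). Inside the validator, info.data holds the fields that have already been validated, so a field can be checked against the ones declared before it.

    from pydantic import BaseModel, ValidationInfo, field_validator
    
    class User(BaseModel):
        password: str
        confirm_password: str
    
        @field_validator('confirm_password')
        @classmethod
        def passwords_match(cls, v: str, info: ValidationInfo) -> str:
            # password is declared first, so it is already in info.data if it was valid
            if 'password' in info.data and v != info.data['password']:
                raise ValueError('passwords do not match')
            return v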
    
  • Email and URL Validation: Pydantic has built-in types for common formats, such as EmailStr for email addresses (which requires the optional email-validator package) and HttpUrl for web URLs. These automatically check that a string conforms to the expected format.
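Pydantic v2 discourages constr in favor of typing.Annotated with Field, which static type checkers understand better. The sketch below assumes Pydantic v2; the source field is hypothetical, added only to show HttpUrl in action.

```python
from typing import Annotated

from pydantic import BaseModel, Field, HttpUrl, ValidationError


class Post(BaseModel):
    # Same rule as constr(min_length=3, max_length=50), written with Annotated
    title: Annotated[str, Field(min_length=3, max_length=50)]
    content: str
    source: HttpUrl  # hypothetical field: must be an absolute http(s) URL


post = Post(title='Hello', content='First post', source='https://example.com/hello')
print(post.source.host)  # example.com

failed_fields = []
try:
    # Title is too short and the source is not a URL, so both fields fail
    Post(title='Hi', content='Too short', source='not a url')
except ValidationError as e:
    failed_fields = [err['loc'][0] for err in e.errors()]
print(failed_fields)  # ['title', 'source']
```

Note that Pydantic reports every failing field in one ValidationError rather than stopping at the first problem.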

Settings Management

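Pydantic can also manage your application's settings. In Pydantic v2 this feature lives in the separate pydantic-settings package (pip install pydantic-settings); in v1, BaseSettings was imported directly from pydantic. A BaseSettings model automatically reads its fields from environment variables.

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Also read values from a .env file in the current working directory
    model_config = SettingsConfigDict(env_file='.env')

    database_url: str
    secret_key: str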

settings = Settings()
print(settings.database_url)

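This loads DATABASE_URL and SECRET_KEY from the system's environment variables or from a .env file. Names are matched case-insensitively by default, and a variable set in the real environment takes precedence over the same key in .env. If a required setting is missing from both sources, Settings() raises a ValidationError.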

Conclusion

Pydantic is a powerful and elegant library that solves the common and critical problem of data validation in Python. By leveraging the power of type hints, it allows you to define clear, explicit data models that are self-documenting and easy to work with. Whether you are building a web API, processing complex data, or managing application settings, Pydantic is an essential tool for writing robust and reliable Python code.