Python Data Modeling with Dataclasses and Pydantic

The Silent Bug

Imagine you’re processing customer records. The pipeline runs without errors, but customers never receive their welcome emails. After digging through the code, you discover the issue is a simple typo in a dictionary key.

Write data ('emial': ...) → Store (dict saves anything) → Read data (.get('email')) → Result (None, no error!)

💡 Tip

The output looks like the customer has no email on file, but we passed "alice@example.com". The data is there, just stored under "emial".

.get("email") finds no match and returns None instead of raising an error.

This happens because dictionaries don’t know what keys they should have. Without a schema, Python treats "emial" and "email" as equally valid. The same goes for missing fields, extra fields, and wrong types.
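A minimal sketch of this failure, assuming a hypothetical customer record whose key was misspelled when the data was written:

```python
# Hypothetical record: the key is misspelled as "emial" when the data is written.
customer = {"name": "Alice Smith", "emial": "alice@example.com"}

# Reading with the correct key finds no match and quietly returns None.
email = customer.get("email")
print(f"Email on file: {email}")

# The address is still in the dict, just stored under the wrong key.
print(customer["emial"])
```

Nothing in this snippet raises an error, which is exactly why the bug is easy to miss.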

What Are Typed Data Containers?

Python offers several ways to avoid this bug, each adding more safety than the last:

Tool	Safety	Flexibility	Dependencies	Mutability
dict	None	Any key, any value	Built-in	Mutable
NamedTuple	Basic	Fixed fields	Built-in	Immutable
dataclass	Moderate	Fixed fields, defaults	Built-in	Mutable
Pydantic	Full	Fixed fields, validators	pip install	Mutable

Notice the pattern. Each row gains something the previous one lacks:

dict → NamedTuple: Gain fixed fields, lose flexibility.
NamedTuple → dataclass: Gain mutability and safe mutable defaults.
dataclass → Pydantic: Gain runtime type validation, add a dependency.

In this course, you’ll try each tool yourself and see how it catches the mistakes that dictionaries miss.

Creating a Dictionary

A dictionary maps keys (strings, in these examples) to values. It’s the most common way to represent a record in Python, but it has no fixed structure. You can add, remove, or misspell any key at any time.

Creating one takes a single pair of curly braces:
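For instance, a customer record might be sketched like this. The five keys are assumed for illustration and mirror the customer fields used later in this course:

```python
# An illustrative customer record; the keys and values are assumptions.
customer = {
    "customer_id": "C001",
    "name": "Alice Smith",
    "email": "alice@example.com",
    "age": 28,
    "is_premium": True,
}

# Look up a value by its string key.
print(customer["name"])  # Alice Smith
```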

💡 Tip

The output prints Alice Smith by looking up the "name" key in the dictionary.

Silent Failures

Reading a key that was never stored, for example because of a typo, raises a KeyError at runtime. The error tells you which key is missing but not where the typo was introduced. When dictionaries pass through multiple functions, finding the source can take significant debugging time:
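A sketch of such a pipeline, using the function names load_customer, validate_customer, and send_email from this course. The function bodies and the sample row are assumptions made for illustration:

```python
def load_customer(row):
    # Bug: "emial" is a typo for "email", but a dict accepts any key.
    return {"customer_id": row[0], "name": row[1], "emial": row[2]}


def validate_customer(customer):
    # Only the name is checked, so the misspelled key passes through unnoticed.
    if not customer["name"]:
        raise ValueError("Name cannot be empty")
    return customer


def send_email(customer):
    # The KeyError surfaces here, two functions away from the typo.
    return f"Sending welcome email to {customer['email']}"


row = ["C001", "Alice Smith", "alice@example.com"]
try:
    send_email(validate_customer(load_customer(row)))
except KeyError as e:
    error_key = e.args[0]
    print(f"KeyError raised in send_email: {error_key!r}")
```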

💡 What the output shows

The error is raised in send_email(), but the actual bug (the typo "emial") was introduced in load_customer(). The bug and its symptom are in different functions.

Using .get() makes it worse by returning None silently:
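A short sketch of the same lookup written with .get(), assuming a hypothetical send_email helper that only builds the message:

```python
def send_email(customer):
    # .get() returns None for a missing key instead of raising KeyError.
    email = customer.get("email")
    return f"Sending welcome email to {email}"


# The typo "emial" is still present in the stored record.
customer = {"name": "Alice Smith", "emial": "alice@example.com"}
message = send_email(customer)
print(message)  # Sending welcome email to None
```

The function finishes normally, so nothing signals that the email was never found.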

Quiz

What does {"name": "Alice"}.get("email") return?

Type Confusion

Missing keys aren’t the only risk. Without a schema, dictionaries also accept the wrong type for any field.

Let’s see what happens when age is stored as a string instead of an integer:
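A minimal sketch, assuming the age arrived as the string "28" (for example, from a CSV file):

```python
# The age was stored as a string; the dict accepts it without complaint.
customer = {"name": "Alice Smith", "age": "28"}

# Multiplying a string repeats it instead of doubling a number.
doubled = customer["age"] * 2
print(doubled)  # 2828
```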

💡 What the output shows

"28" * 2 produces "2828" instead of 56. Since "28" is a string, Python repeats it twice instead of doubling the number. The code runs fine, but the result is silently wrong.

Creating a NamedTuple

NamedTuple is a lightweight way to define a fixed structure with named fields and type hints, like a dictionary with a schema.

Instead of string keys, you declare a NamedTuple class with fixed fields. Every object created from it must provide values for those exact fields:
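A sketch of such a class, assuming the five customer fields named later in this course (customer_id, name, email, age, is_premium). The sample values are illustrative:

```python
from typing import NamedTuple


class Customer(NamedTuple):
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool


# Every field must be supplied because none has a default.
customer = Customer(
    customer_id="C001",
    name="Alice Smith",
    email="alice@example.com",
    age=28,
    is_premium=True,
)
print(customer)
```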

💡 What the output shows

Printing the object displays all five fields by name and value in the order they were defined.

Once created, you can access fields with dot notation instead of string keys like customer["name"]. This allows your IDE to autocomplete the field names and catch typos immediately:
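A small sketch of dot access, using a trimmed-down two-field Customer assumed for brevity. Unlike .get() on a dictionary, a misspelled attribute raises an error instead of returning None:

```python
from typing import NamedTuple


class Customer(NamedTuple):
    name: str
    email: str


customer = Customer(name="Alice Smith", email="alice@example.com")
print(customer.name)  # Alice Smith

try:
    customer.emial  # misspelled attribute name
except AttributeError as e:
    typo_error = str(e)
    print(typo_error)
```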

Quiz

What happens if you create a Customer without providing the email field?

Catching Typos at Runtime

In the dictionary pipeline, load_customer returned {"emial": row[2]} and the typo traveled through validate_customer before crashing in send_email. With NamedTuple, the same typo fails at the source:
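A sketch of the NamedTuple version of load_customer, with a three-field Customer and a sample row assumed for illustration:

```python
from typing import NamedTuple


class Customer(NamedTuple):
    customer_id: str
    name: str
    email: str


def load_customer(row):
    # Same typo as before: "emial" instead of "email".
    return Customer(customer_id=row[0], name=row[1], emial=row[2])


try:
    load_customer(["C001", "Alice Smith", "alice@example.com"])
except TypeError as e:
    # The constructor rejects the unknown field name immediately.
    creation_error = str(e)
    print(creation_error)
```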

💡 What the output shows

The error is raised inside load_customer, exactly where the typo was made, so you spend less time tracing through functions to find the bug.

Quiz

A NamedTuple Customer has fields customer_id, name, email, age, is_premium. You write Customer(customer_id="C001", nme="Alice", email="a@b.com", age=28, is_premium=True). When does the error appear?

Exercise: Fix a Buggy Pipeline

Scenario

The load_customer function from the dictionary section had a typo ("emial") that traveled silently through the pipeline. Your team wants to prevent this class of bug entirely.

Task

Rewrite this dict-based pipeline to use a Customer NamedTuple so the typo is caught at creation. Fix the typo so the pipeline works.

Immutability Prevents Accidental Changes

Dictionaries let you change any value at any time, which means fields can be overwritten by accident. NamedTuples are immutable, so once created, their values cannot be changed:
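A minimal sketch, assuming a two-field Customer, of what happens when code tries to overwrite a field:

```python
from typing import NamedTuple


class Customer(NamedTuple):
    name: str
    email: str


customer = Customer(name="Alice Smith", email="alice@example.com")

blocked = False
try:
    customer.name = "Bob"  # NamedTuple fields are read-only
except AttributeError as e:
    blocked = True
    print(f"AttributeError: {e}")

print(customer.name)  # Alice Smith
```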

💡 What the output shows

Assigning "Bob" to customer.name raises an AttributeError. Once a NamedTuple is created, its values are fixed.

Quiz

Why is immutability useful when passing a Customer object through multiple functions?

Default Values

NamedTuple supports default values for simple types like bool and str:
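A sketch with a single defaulted field, assuming a two-field Customer for brevity:

```python
from typing import NamedTuple


class Customer(NamedTuple):
    name: str
    is_premium: bool = False  # used when the caller omits the field


alice = Customer("Alice")
bob = Customer("Bob", is_premium=True)
print(alice)
print(bob)
```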

💡 What the output shows

Customer("Alice") uses the default False for is_premium, while Customer("Bob", is_premium=True) overrides it. You only need to pass values that differ from the defaults.

However, mutable defaults like lists are shared across all instances, which can cause unexpected behavior:
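A sketch of the shared-list problem, assuming a Customer with a tags field that defaults to an empty list:

```python
from typing import NamedTuple


class Customer(NamedTuple):
    name: str
    tags: list = []  # one list object, created when the class is defined


c1 = Customer("Alice")
c2 = Customer("Bob")

# Appending through c1 mutates the single shared default list.
c1.tags.append("premium")

print(c1.name, c1.tags)
print(c2.name, c2.tags)
print(c1.tags is c2.tags)  # True
```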

💡 What the output shows

Both Alice and Bob show ["premium"]. This happens because Python evaluates the default [] once, when the class is defined, then hands that same list to every instance. There’s only one list in memory, so c1.tags and c2.tags are the same object.

This diagram shows how the single default list is shared before and after the append:

Before append: c1.tags → [ ] ← c2.tags (one shared list)
After c1.tags.append: c1.tags → ['premium'] ← c2.tags (still the same list)

Quiz

NamedTuple is immutable, yet c1.tags.append("premium") works without error. Why?

Limitations: No Runtime Type Validation

Type hints in NamedTuple are not enforced at runtime. You can still pass in wrong types:
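A minimal sketch, assuming a two-field Customer, showing that mismatched values are stored as given:

```python
from typing import NamedTuple


class Customer(NamedTuple):
    name: str
    age: int


# Neither value matches its annotation, yet no error is raised.
customer = Customer(name=123, age="old")
print(customer)
```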

💡 What the output shows

Python accepts name=123 and age="old" without complaint. NamedTuple type hints are for documentation and static analysis only. They are not enforced at runtime.

Quiz

What is the purpose of type hints like age: int in a NamedTuple if they are not enforced?

Exercise: Fix a Type Bug

Scenario

A sensor monitoring system adjusts temperature readings by a calibration factor of 2. A faulty sensor sends its reading as a string. The code runs without error, but one sensor’s adjusted value is wrong.

Task

Fix the readings list so that all adjusted temperatures are calculated correctly.

Creating a dataclass

The @dataclass decorator automatically generates __init__, __repr__, and other methods from field definitions. A dataclass provides the same fixed fields and IDE support as NamedTuple, plus:

Mutable fields: Change values after creation, unlike NamedTuple
Safe mutable defaults: Fields can default to empty lists and dicts through default_factory
Post-init logic: Run custom code right after an object is created

Creating a dataclass looks similar to NamedTuple, but you use the @dataclass decorator instead of inheriting:
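A sketch of the same customer record as a dataclass. The default for is_premium and the sample values are assumptions made for illustration:

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool = False


customer = Customer(
    customer_id="C001",
    name="Alice Smith",
    email="alice@example.com",
    age=28,
)
print(customer)
```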

💡 What the output shows

The output matches the NamedTuple format. Both give you named fields and readable printing. Where they differ is mutability and default handling, which the next sections cover.

Quiz

What happens if you try to create Customer(customer_id="C001", nmae="Alice", email="a@b.com", age=28)?

Exercise: Build a Product Record

Scenario

An inventory system receives product data as separate variables from a database query. You need to structure each product as a dataclass for type-safe access throughout the codebase.

Task

Define a Product dataclass with fields sku (str), name (str), price (float), and in_stock (bool). Create a product and print its formatted summary.

Mutability Allows Updates

Dataclass trades NamedTuple’s immutability protection for flexibility. You can modify fields after creation:
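A minimal sketch of an in-place update, assuming a two-field Customer:

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    is_premium: bool = False


customer = Customer(name="Alice Smith")
print(customer.is_premium)  # False

# Fields can be reassigned after creation.
customer.is_premium = True
print(customer.is_premium)  # True
```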

💡 What the output shows

Unlike NamedTuple, dataclass allows field modification. This is useful for objects that need to change over time, like a customer upgrading their account.

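To prevent accidentally adding new attributes, you can use @dataclass(slots=True), available in Python 3.10 and later, which fixes the set of attribute names. Declared fields stay mutable, but assigning to an undeclared name raises an AttributeError: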

💡 What the output shows

Without slots=True, the dataclass would silently create a new attribute nmae on the object. With slots, it raises an error immediately, catching the typo.

Quiz

What does @dataclass(slots=True) prevent?

Mutable Defaults with default_factory

Remember the shared list problem from NamedTuple? Dataclass prevents this by rejecting list, dict, and set defaults when the class is defined:
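A sketch of the rejection, assuming a hypothetical Order class with an items list. The error is raised while the class is being defined, before any instance exists:

```python
from dataclasses import dataclass

try:

    @dataclass
    class Order:
        order_id: str
        items: list = []  # mutable default, rejected by @dataclass

except ValueError as e:
    default_error = str(e)
    print(default_error)
```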

💡 What the output shows

Dataclass raises a ValueError instead of silently sharing the list. It forces you to use field(default_factory=...), which creates a new list for each instance.

The factory function passed to field(default_factory=...) runs each time an instance is created, so each object gets its own list:
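A sketch using a hypothetical Order class whose items field is built by default_factory=list:

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: str
    # list() is called once per instance, producing a fresh empty list.
    items: list = field(default_factory=list)


order1 = Order("O001")
order2 = Order("O002")
order1.items.append("apple")

print("Order 1:", order1.items)
print("Order 2:", order2.items)
```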

💡 What the output shows

Unlike the NamedTuple example, Order 2 stays empty because default_factory creates a fresh list for each instance. This is the safe way to use mutable defaults.

To see why this works, compare what happens at creation versus after appending:

At creation: order1.items → [ ], order2.items → [ ] (two separate lists)
After order1.items.append: order1.items → ['apple'], order2.items → [ ] (order2 is unchanged)

Quiz

Which of these dataclass fields requires field(default_factory=...)?

Exercise: Build a Shopping Cart

Scenario

An e-commerce system creates a shopping cart for each customer. Each cart needs its own independent list of items so that adding to one cart doesn’t affect another.

Task

Define a Cart dataclass where items defaults to an empty list using default_factory. Add items to one cart and verify the other stays empty.

Post-Init Validation with __post_init__

A dataclass does not inspect the values it receives, so invalid data like empty names or negative ages passes through without warning, even when every type is correct:
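A minimal sketch, assuming a three-field Customer with no validation at all:

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    email: str
    age: int


# Every value has the right type, but none of them makes sense.
customer = Customer(name="", email="invalid", age=-5)
print(customer)
```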

💡 What the output shows

An empty name, invalid email, and negative age all pass through without any error. The bad data is now in your system, potentially corrupting downstream operations.

To catch these issues early, dataclass provides a special method called __post_init__ that runs automatically after __init__ finishes. You can add validation logic here to reject bad values at creation time:
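A sketch of __post_init__ validation. The specific rules and messages here are assumptions chosen for illustration:

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    email: str
    age: int

    def __post_init__(self):
        # Runs automatically after the generated __init__ sets the fields.
        if "@" not in self.email:
            raise ValueError(f"Invalid email: {self.email}")
        if self.age < 0:
            raise ValueError(f"Age cannot be negative: {self.age}")


try:
    Customer(name="Alice Smith", email="invalid", age=28)
except ValueError as e:
    post_init_error = str(e)
    print(post_init_error)  # Invalid email: invalid
```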

💡 What the output shows

The error fires at object creation, not later when you try to send an email. This means invalid data never enters your system in the first place.

Quiz

A dataclass has __post_init__ that validates email and age. You pass a valid email but age=-5. What happens?

Limitations: Manual Validation Only

__post_init__ requires you to write every validation rule yourself. If you forget to check a field, bad data can still slip through:
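A sketch of a validator that covers only one field, assuming a Customer whose __post_init__ checks the email and nothing else:

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    email: str
    age: int

    def __post_init__(self):
        # Only the email is checked; name and age were forgotten.
        if "@" not in self.email:
            raise ValueError(f"Invalid email: {self.email}")


# Wrong types for name and age are stored without any error.
customer = Customer(name=123, email="alice@example.com", age="old")
print(customer)
```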

💡 What the output shows

The name is an integer and the age is a string, yet dataclass accepted both. Type hints do not enforce types at runtime, so any validation you need must be written manually in __post_init__.

Limitations: Nested Validation

Most real data is nested: customers have addresses, orders have items. With dataclass, error messages don’t tell you where in the structure the problem occurred:
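A sketch of a nested dataclass, assuming an Address with street, city, and zip_code fields and a simple digits-only zip check:

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    city: str
    zip_code: str

    def __post_init__(self):
        # Assumed rule for illustration: a zip code contains only digits.
        if not self.zip_code.isdigit():
            raise ValueError(f"Invalid zip: {self.zip_code}")


@dataclass
class Customer:
    name: str
    address: Address


try:
    Customer(
        name="Alice Smith",
        address=Address(street="123 Main St", city="NY", zip_code="9ABC1"),
    )
except ValueError as e:
    nested_error = str(e)
    print(nested_error)  # Invalid zip: 9ABC1
```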

💡 What the output shows

The error says “Invalid zip: 9ABC1” but doesn’t tell you it came from address.zip_code. In a deeply nested structure with multiple zip codes, you wouldn’t know which one failed.

Quiz

You pass address={"street": "123 Main St", "city": "NY", "zip_code": "10001"} to a dataclass Customer that expects address: Address. What happens?

Getting Started

So far, every container has treated type hints as documentation only. Pydantic is a third-party validation library that changes this. It checks types at runtime and raises clear errors when values don’t match.

To install Pydantic, run:

pip install pydantic

This course uses Pydantic 2.12.

Let’s verify the installation:
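One way to check, assuming Pydantic is installed in the active environment, is to print the version string it exposes:

```python
import pydantic

# VERSION holds the installed release as a string.
print(pydantic.VERSION)
```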

Creating a Pydantic Model

Creating a Pydantic model looks similar to creating a dataclass or NamedTuple. Inherit from BaseModel and declare your fields:
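A sketch of the customer record as a Pydantic model, assuming Pydantic 2.x and the same illustrative fields as the earlier examples:

```python
from pydantic import BaseModel


class Customer(BaseModel):
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool = False


# Fields are passed as keyword arguments and validated on creation.
customer = Customer(
    customer_id="C001",
    name="Alice Smith",
    email="alice@example.com",
    age=28,
)
print(customer)
```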

💡 What the output shows

The syntax looks similar to dataclass, but Pydantic validates types automatically when you create the object. You’ll see the difference in the next section.

Runtime Validation

Remember how dataclass accepted name=123 without complaint? Pydantic catches this automatically:
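A sketch of that check, assuming Pydantic 2.x in its default mode and a two-field Customer:

```python
from pydantic import BaseModel, ValidationError


class Customer(BaseModel):
    name: str
    age: int


try:
    Customer(name=123, age="thirty")
except ValidationError as e:
    # errors() returns one dict per failed field.
    errors = e.errors()
    print(e)
```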

💡 What the output shows

Pydantic reports all validation failures at once: name should be a string (got int 123) and age should be a valid integer (got string 'thirty'). This saves you from fixing one error, rerunning, and discovering another.

Quiz

How does Pydantic know that name=123 is invalid without any custom validation code?

Exercise: Validate Signup Data

Scenario

A registration endpoint receives user signup data. Some entries have the wrong types: age as a non-numeric string and username as an integer. You need the model to catch all type errors at once.

Task

Define a UserSignup model that validates username (str), email (str), and age (int). Create a user with invalid data and print the validation errors.

Type Coercion

Unlike dataclass, which stores whatever you pass, Pydantic by default converts compatible types:
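A sketch of coercion, assuming Pydantic 2.x in its default (non-strict) mode and a three-field Customer:

```python
from pydantic import BaseModel


class Customer(BaseModel):
    name: str
    age: int
    is_premium: bool


# Numeric and boolean strings are converted to the annotated types.
customer = Customer(name="Alice Smith", age="28", is_premium="true")
print(repr(customer.age))
print(repr(customer.is_premium))
```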

💡 What the output shows

The string "28" was converted to integer 28, and "true" was converted to boolean True. This is useful when reading data from CSV files or APIs where everything comes as strings.

Quiz

You pass age="twenty-eight" to a Pydantic model with age: int. What happens?

Constraint Validation

Beyond types, you often need business rules: age must be positive, names can’t be empty, customer IDs must follow a pattern.

In dataclass, you define fields in one place and validate them in __post_init__. But raise stops at the first error, so you only learn about one problem at a time:

from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool = False

    def __post_init__(self):
        if not self.customer_id:
            raise ValueError("Customer ID cannot be empty")
        if not self.name or len(self.name) < 1:
            raise ValueError("Name cannot be empty")
        if "@" not in self.email:
            raise ValueError(f"Invalid email: {self.email}")
        if self.age < 0 or self.age > 150:
            raise ValueError(f"Age must be between 0 and 150: {self.age}")


try:
    customer = Customer(
        customer_id="",  # Empty ID
        name="",  # Empty name
        email="invalid",  # Missing @
        age=-5,  # Negative age
    )
except ValueError as e:
    print(e)  # Only reports the first violation

💡 What the output shows

Four fields need four if blocks, four raise calls, and four hand-written messages. That’s a lot of boilerplate for simple rules like “name can’t be empty.”

Worse, raise halts at the first failure, so you only learn about "Customer ID cannot be empty" even though three other fields are also invalid.

Pydantic puts constraints directly in Field(), keeping rules next to the data they validate:
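A sketch of the same rules expressed with Field(), assuming Pydantic 2.x. The regex used for the email field is a simplified stand-in chosen for illustration, not a full email validator:

```python
from pydantic import BaseModel, Field, ValidationError


class Customer(BaseModel):
    customer_id: str = Field(min_length=1)
    name: str = Field(min_length=1)
    # Assumed rule: exactly one "@" with text on both sides.
    email: str = Field(pattern=r"^[^@]+@[^@]+$")
    age: int = Field(ge=0, le=150)
    is_premium: bool = False


try:
    Customer(customer_id="", name="", email="invalid", age=-5)
except ValidationError as e:
    errors = e.errors()
    for err in errors:
        # loc names the failing field; msg describes the violated rule.
        print(err["loc"], err["msg"])
```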

💡 What the output shows

The syntax is minimal: Field(min_length=1) and Field(ge=0, le=150) replace entire if blocks and hand-written error messages. Pydantic also checks every field in one pass, so all four violations surface together instead of one at a time.

Here are the most common Field() constraints:

Constraint	Type	Meaning
gt, ge	numeric	Greater than / greater than or equal
lt, le	numeric	Less than / less than or equal
multiple_of	numeric	Value must be divisible by this number
min_length, max_length	str, list	Minimum / maximum length
pattern	str	Must match a regex pattern

See the full list of Field parameters in the Pydantic docs.

Quiz

You pass name="", age=-5, and email="bad" to a Pydantic model with Field(min_length=1) on name, Field(ge=0) on age, and email validation. How many errors do you get?

Exercise: Validate a Job Posting

Scenario

A job board receives postings from employers. Each posting must have a non-empty title, a salary between 30,000 and 500,000, and a non-empty company name. Invalid postings should be rejected with all errors at once.

Task

Add Field() constraints to the JobPosting model so that invalid data is caught. Fix the constraints so the test case raises validation errors.

💡 Hint

Useful Field() constraints: gt, ge, lt, le for numbers, min_length and max_length for strings.

Nested Validation

In the dataclass example, the error only said “Invalid zip: 9ABC1” with no way to trace it back to address.zip_code. Pydantic fixes this by reporting the full path to each error:
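A sketch of the nested case, assuming Pydantic 2.x and a five-digit zip code rule chosen for illustration. The address is passed as a plain dict, which Pydantic validates against the Address model:

```python
from pydantic import BaseModel, Field, ValidationError


class Address(BaseModel):
    street: str
    city: str
    # Assumed rule: a zip code is exactly five digits.
    zip_code: str = Field(pattern=r"^\d{5}$")


class Customer(BaseModel):
    name: str
    address: Address


try:
    Customer(
        name="Alice Smith",
        address={"street": "123 Main St", "city": "NY", "zip_code": "9ABC1"},
    )
except ValidationError as e:
    # loc is a tuple describing the path to the failing field.
    error_path = e.errors()[0]["loc"]
    print(error_path)  # ('address', 'zip_code')
```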

💡 What the output shows

Unlike the dataclass error, Pydantic points directly to address.zip_code. In a structure with multiple addresses or zip codes, you can trace the problem immediately.

Quiz

In the Pydantic example, address is passed as a plain dict, not an Address(...) object. What does Pydantic do with it?

Key Takeaways

Here’s what each tool provides:

dict: Quick to create, but silent failures from typos, missing keys, and wrong types make bugs hard to trace.
NamedTuple: Catches typos at creation and provides immutability, but does not enforce types at runtime and shares mutable defaults.
dataclass: Rejects mutable defaults in favor of default_factory and supports validation via __post_init__, but errors are reported one at a time with no nesting path.
Pydantic: Enforces types at runtime, catches all validation errors at once, and reports the full path through nested structures like address.zip_code.


Work with Khuyen Tran
