The Silent Bug
Imagine you’re processing customer records. The pipeline runs without errors, but customers never receive their welcome emails. After digging through the code, you discover the issue is a simple typo in a dictionary key.
---
config:
theme: dark
layout: dagre
look: neo
---
flowchart LR
A["Write data
'emial': ..."] --> B["Store
dict saves anything"] --> C["Read data
.get('email')"] --> D["Result
None, no error!"]
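The flow in the diagram can be reproduced with a short sketch. The record below is illustrative; only the "emial" typo and the "alice@example.com" address come from the scenario described here.

```python
# Illustrative record with a misspelled key ("emial" instead of "email").
customer = {"name": "Alice", "emial": "alice@example.com"}

# .get() returns None for a missing key instead of raising an error.
email = customer.get("email")
print(email)  # None

# The address is still in the dictionary, just under the wrong key.
print(customer["emial"])  # alice@example.com
```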
The output looks like the customer has no email on file, but we passed "alice@example.com". The data is there, just stored under "emial".
.get("email") finds no match and returns None instead of raising an error.
This happens because dictionaries don’t know what keys they should have. Without a schema, Python treats "emial" and "email" as equally valid. The same goes for missing fields, extra fields, and wrong types.
What Are Typed Data Containers?
Python offers several ways to avoid this bug, each adding more safety than the last:
| Tool | Safety | Flexibility | Dependencies | Mutability |
|---|---|---|---|---|
| dict | None | Any key, any value | Built-in | Mutable |
| NamedTuple | Basic | Fixed fields | Built-in | Immutable |
| dataclass | Moderate | Fixed fields, defaults | Built-in | Mutable |
| Pydantic | Full | Fixed fields, validators | pip install | Mutable |
Each row gains something the previous one lacks:
- dict → NamedTuple: Gain fixed fields, lose flexibility.
- NamedTuple → dataclass: Gain mutability and safe mutable defaults.
- dataclass → Pydantic: Gain runtime type validation, add a dependency.
In this course, you’ll try each tool yourself and see how it catches the mistakes that dictionaries miss.
Creating a Dictionary
A dictionary maps string keys to values. It’s the most common way to represent a record in Python, but it has no fixed structure. You can add, remove, or misspell any key at any time.
Creating one takes a single pair of curly braces:
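A minimal sketch, assuming a customer record with illustrative values (the field names mirror the ones used later in this course):

```python
# A customer record as a plain dictionary; nothing enforces these keys.
customer = {
    "customer_id": "C001",
    "name": "Alice Smith",
    "email": "alice@example.com",
    "age": 28,
    "is_premium": True,
}

print(customer["name"])  # Alice Smith
```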
The output prints Alice Smith by looking up the "name" key in the dictionary.
Silent Failures
A typo in the key name causes a KeyError at runtime:
The error tells you which key is missing but not where the typo was introduced. When dictionaries pass through multiple functions, finding the source of a typo can take significant debugging time:
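One way such a pipeline might look is sketched below. The function names follow the ones used in this section; their bodies and the sample row are assumptions for illustration.

```python
def load_customer(row):
    # Bug introduced here: "emial" should be "email".
    return {"customer_id": row[0], "name": row[1], "emial": row[2]}


def validate_customer(customer):
    # Only the name is checked, so the typo passes through unnoticed.
    if not customer["name"]:
        raise ValueError("Name is required")
    return customer


def send_email(customer):
    # The symptom appears here, two functions away from the typo.
    return f"Sending welcome email to {customer['email']}"


row = ["C001", "Alice Smith", "alice@example.com"]
customer = validate_customer(load_customer(row))

try:
    send_email(customer)
except KeyError as error:
    failed_key = error.args[0]
    print(f"KeyError: {failed_key!r}")  # KeyError: 'email'
```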
The error is raised in send_email(), but the actual bug (the typo "emial") was introduced in load_customer(). The bug and its symptom are in different functions.
Using .get() makes it worse by returning None silently:
Quiz
What does {"name": "Alice"}.get("email") return?
Type Confusion
Missing keys aren’t the only risk. Without a schema, dictionaries also accept the wrong type for any field.
Let’s see what happens when age is stored as a string instead of an integer:
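A minimal sketch, assuming an illustrative record in which the age arrived as text:

```python
# The age was stored as a string by mistake; the dictionary accepts it.
customer = {"name": "Alice", "age": "28"}

# Multiplying a string repeats it instead of doing arithmetic.
doubled = customer["age"] * 2
print(doubled)  # 2828
```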
"28" * 2 produces "2828" instead of 56. Since "28" is a string, Python repeats it twice instead of doubling the number. The code runs fine, but the result is silently wrong.
Creating a NamedTuple
NamedTuple is a lightweight way to define a fixed structure with named fields and type hints, like a dictionary with a schema.
Instead of string keys, you declare a NamedTuple class with fixed fields. Every object created from it must provide values for those exact fields:
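A sketch of what such a class might look like, using the five fields named in this section with illustrative values:

```python
from typing import NamedTuple


class Customer(NamedTuple):
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool


customer = Customer(
    customer_id="C001",
    name="Alice Smith",
    email="alice@example.com",
    age=28,
    is_premium=True,
)
print(customer)
```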
Printing the object displays all five fields by name and value in the order they were defined.
Once created, you can access fields with dot notation instead of string keys like customer["name"]. This allows your IDE to autocomplete the field names and catch typos immediately:
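For example, with a shortened, illustrative version of the Customer class:

```python
from typing import NamedTuple


class Customer(NamedTuple):
    name: str
    email: str


customer = Customer(name="Alice Smith", email="alice@example.com")

# Fields are attributes, so editors can autocomplete them.
print(customer.name)   # Alice Smith
print(customer.email)  # alice@example.com
```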
Quiz
What happens if you create a Customer without providing the email field?
Catching Typos at Runtime
In the dictionary pipeline, load_customer returned {"emial": row[2]} and the typo traveled through validate_customer before crashing in send_email. With NamedTuple, the same typo fails at the source:
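A minimal sketch of the same loader rewritten for NamedTuple. The three-field Customer and the sample row are assumptions made to keep the example short.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    customer_id: str
    name: str
    email: str


def load_customer(row):
    # Same typo as before, but now it is an unknown keyword argument.
    return Customer(customer_id=row[0], name=row[1], emial=row[2])


row = ["C001", "Alice Smith", "alice@example.com"]

try:
    load_customer(row)
except TypeError as error:
    message = str(error)
    print(message)  # the message names the unexpected keyword 'emial'
```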
The error is raised inside load_customer, exactly where the typo was made, so you spend less time tracing through functions to find the bug.
Quiz
A NamedTuple Customer has fields customer_id, name, email, age, is_premium. You write Customer(customer_id="C001", nme="Alice", email="a@b.com", age=28, is_premium=True). When does the error appear?
Exercise: Fix a Buggy Pipeline
Immutability Prevents Accidental Changes
Dictionaries let you change any value at any time, which means fields can be overwritten by accident. NamedTuples are immutable, so once created, their values cannot be changed:
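As a sketch, using a shortened, illustrative Customer class:

```python
from typing import NamedTuple


class Customer(NamedTuple):
    name: str
    email: str


customer = Customer(name="Alice", email="alice@example.com")

try:
    customer.name = "Bob"
    assignment_blocked = False
except AttributeError:
    # NamedTuple fields are read-only after creation.
    assignment_blocked = True

print(assignment_blocked)  # True
print(customer.name)       # Alice
```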
Assigning "Bob" to customer.name raises an AttributeError. Once a NamedTuple is created, its values are fixed.
Quiz
Why is immutability useful when passing a Customer object through multiple functions?
Default Values
NamedTuple supports default values for simple types like bool and str:
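A minimal sketch, assuming a two-field Customer for brevity:

```python
from typing import NamedTuple


class Customer(NamedTuple):
    name: str
    is_premium: bool = False  # used when the caller omits the argument


alice = Customer("Alice")
bob = Customer("Bob", is_premium=True)

print(alice.is_premium)  # False
print(bob.is_premium)    # True
```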
Customer("Alice") uses the default False for is_premium, while Customer("Bob", is_premium=True) overrides it. You only need to pass values that differ from the defaults.
However, mutable defaults like lists are shared across all instances, which can cause unexpected behavior:
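A sketch of the problem, assuming a hypothetical tags field with a list default:

```python
from typing import NamedTuple


class Customer(NamedTuple):
    name: str
    tags: list = []  # evaluated once, when the class is defined


c1 = Customer("Alice")
c2 = Customer("Bob")

# Mutating the list through c1 also changes what c2 sees.
c1.tags.append("premium")

print(c1.tags)             # ['premium']
print(c2.tags)             # ['premium']
print(c1.tags is c2.tags)  # True
```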
Both Alice and Bob show ["premium"]. This happens because Python creates the default [] once when it reads the class, then hands that same list to every instance. There’s only one list in memory, so c1.tags and c2.tags are the same object.
This diagram shows how the single default list is shared before and after the append:
---
config:
theme: dark
layout: dagre
look: neo
---
flowchart TD
subgraph After c1.tags.append
c1b[c1.tags] --> list2["['premium']"]
c2b[c2.tags] --> list2
end
subgraph Before append
c1a[c1.tags] --> list1["[ ]"]
c2a[c2.tags] --> list1
end
Quiz
NamedTuple is immutable, yet c1.tags.append("premium") works without error. Why?
Limitations: No Runtime Type Validation
Type hints in NamedTuple are not enforced at runtime. You can still pass in wrong types:
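For instance, with a two-field Customer (an illustrative shortening of the earlier class):

```python
from typing import NamedTuple


class Customer(NamedTuple):
    name: str
    age: int


# Both values contradict the type hints, yet no error is raised.
customer = Customer(name=123, age="old")
print(customer)  # Customer(name=123, age='old')
```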
Python accepts name=123 and age="old" without complaint. NamedTuple type hints serve documentation and static analysis only.
Quiz
What is the purpose of type hints like age: int in a NamedTuple if they are not enforced?
Exercise: Fix a Type Bug
Creating a dataclass
A dataclass is a class decorator that automatically generates __init__, __repr__, and other methods from field definitions. It provides the same fixed fields and IDE support as NamedTuple, plus:
- Mutable fields: Change values after creation, unlike NamedTuple
- Default values: Fields can have defaults, including empty lists and dicts via default_factory
- Post-init logic: Run custom code right after an object is created
Creating a dataclass looks similar to NamedTuple, but you use the @dataclass decorator instead of inheriting:
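A minimal sketch using the same field names as before. The is_premium default is an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool = False


customer = Customer(
    customer_id="C001",
    name="Alice Smith",
    email="alice@example.com",
    age=28,
)
print(customer)
```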
The output matches the NamedTuple format. Both give you named fields and readable printing. Where they differ is mutability and default handling, which the next sections cover.
Quiz
What happens if you try to create Customer(customer_id="C001", nmae="Alice", email="a@b.com", age=28)?
Exercise: Build a Product Record
Mutability Allows Updates
Dataclass trades NamedTuple’s immutability protection for flexibility. You can modify fields after creation:
Unlike NamedTuple, dataclass allows field modification. This is useful for objects that need to change over time, like a customer upgrading their account.
To prevent accidentally adding new attributes, you can use @dataclass(slots=True) (Python 3.10 or later), which restricts each instance to its declared fields. Existing fields can still be updated, but new attributes cannot be added:
Without slots=True, the dataclass would silently create a new attribute nmae on the object. With slots, it raises an error immediately, catching the typo.
Quiz
What does @dataclass(slots=True) prevent?
Mutable Defaults with default_factory
Remember the shared list problem from NamedTuple? Dataclass prevents this by rejecting mutable defaults entirely:
Dataclass raises a ValueError instead of silently sharing the list, and the error message points you to field(default_factory=...).
The factory function runs each time an instance is created, so each object gets its own list:
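Both behaviors can be sketched together. The Order and BadOrder classes and their fields are assumptions for illustration.

```python
from dataclasses import dataclass, field

# A bare list default is rejected when the class is defined.
try:
    @dataclass
    class BadOrder:
        order_id: str
        items: list = []
except ValueError as error:
    mutable_default_error = str(error)


@dataclass
class Order:
    order_id: str
    # list() is called for every new instance, so nothing is shared.
    items: list = field(default_factory=list)


order1 = Order("O1")
order2 = Order("O2")
order1.items.append("apple")

print(order1.items)  # ['apple']
print(order2.items)  # []
```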
Unlike the NamedTuple example, Order 2 stays empty because default_factory creates a fresh list for each instance. This is the safe way to use mutable defaults.
To see why this works, compare what happens at creation versus after appending:
---
config:
theme: dark
layout: dagre
look: neo
---
flowchart TD
subgraph After order1.items.append
o1b[order1.items] --> list1b["['apple']"]
o2b[order2.items] --> list2b["[ ]"]
end
subgraph At creation
o1a[order1.items] --> list1a["[ ]"]
o2a[order2.items] --> list2a["[ ]"]
end
Quiz
Which of these dataclass fields requires field(default_factory=...)?
Exercise: Build a Shopping Cart
Post-Init Validation with __post_init__
Dataclass does not check the values you pass, so invalid data like empty names or negative ages passes through without warning:
An empty name, invalid email, and negative age all pass through without any error. The bad data is now in your system, potentially corrupting downstream operations.
To catch these issues early, dataclass provides a special method called __post_init__ that runs automatically after __init__ finishes. You can add validation logic here to reject bad values at creation time:
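A minimal sketch of such checks. The specific rules and messages below are assumptions, not the only reasonable choices.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    email: str
    age: int

    def __post_init__(self):
        # Runs automatically after the generated __init__.
        if not self.name:
            raise ValueError("Name cannot be empty")
        if "@" not in self.email:
            raise ValueError(f"Invalid email: {self.email}")
        if self.age < 0:
            raise ValueError(f"Age cannot be negative: {self.age}")


try:
    Customer(name="Alice", email="alice@example.com", age=-5)
except ValueError as error:
    message = str(error)
    print(message)  # Age cannot be negative: -5
```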
The error fires at object creation, not later when you try to send an email. This means invalid data never enters your system in the first place.
Quiz
A dataclass has __post_init__ that validates email and age. You pass a valid email but age=-5. What happens?
Limitations: Manual Validation Only
__post_init__ requires you to write every validation rule yourself. If you forget to check a field, bad data can still slip through:
The name is an integer and the age is a string, yet dataclass accepted both. Type hints do not enforce types at runtime, so any validation you need must be written manually in __post_init__.
Limitations: Nested Validation
Most real data is nested: customers have addresses, orders have items. With dataclass, error messages don’t tell you where in the structure the problem occurred:
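A sketch of the problem, assuming a hypothetical Address class whose zip code must contain only digits:

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    city: str
    zip_code: str

    def __post_init__(self):
        if not self.zip_code.isdigit():
            raise ValueError(f"Invalid zip: {self.zip_code}")


@dataclass
class Customer:
    name: str
    address: Address


try:
    Customer(name="Alice", address=Address("123 Main St", "NY", "9ABC1"))
except ValueError as error:
    message = str(error)
    print(message)  # Invalid zip: 9ABC1
```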
The error says “Invalid zip: 9ABC1” but doesn’t tell you it came from address.zip_code. In a deeply nested structure with multiple zip codes, you wouldn’t know which one failed.
Quiz
You pass address={"street": "123 Main St", "city": "NY", "zip_code": "10001"} to a dataclass Customer that expects address: Address. What happens?
Getting Started
So far, every container has treated type hints as documentation only. Pydantic is a third-party validation library that changes this. It checks types at runtime and raises clear errors when values don’t match.
To install Pydantic, run:
pip install pydantic
This course uses Pydantic 2.12.
Let’s verify the installation:
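One simple check, assuming Pydantic is installed in the active environment, is to print the version string:

```python
import pydantic

# VERSION holds the installed version as a string.
installed_version = pydantic.VERSION
print(installed_version)
```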
Creating a Pydantic Model
To create a Pydantic model, inherit from BaseModel and declare your fields with type hints:
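A minimal sketch, assuming Pydantic 2.x and the same illustrative fields used for the earlier containers:

```python
from pydantic import BaseModel


class Customer(BaseModel):
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool = False


customer = Customer(
    customer_id="C001",
    name="Alice Smith",
    email="alice@example.com",
    age=28,
)
print(customer.name)        # Alice Smith
print(customer.is_premium)  # False
```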
The syntax looks similar to dataclass, but Pydantic validates types automatically when you create the object. You’ll see the difference in the next section.
Runtime Validation
Remember how dataclass accepted name=123 without complaint? Pydantic catches this automatically:
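A sketch of that check, assuming Pydantic 2.x and a two-field Customer model:

```python
from pydantic import BaseModel, ValidationError


class Customer(BaseModel):
    name: str
    age: int


try:
    Customer(name=123, age="thirty")
except ValidationError as error:
    # Each entry in errors() describes one failed field.
    failed_fields = [err["loc"][0] for err in error.errors()]
    error_count = error.error_count()
    print(error_count)    # 2
    print(failed_fields)
```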
Pydantic reports all validation failures at once: name should be a string (got int 123) and age should be a valid integer (got string 'thirty'). This saves you from fixing one error, rerunning, and discovering another.
Quiz
How does Pydantic know that name=123 is invalid without any custom validation code?
Exercise: Validate Signup Data
Type Coercion
Unlike dataclass, which stores whatever you pass, Pydantic automatically converts compatible types:
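A minimal sketch, assuming Pydantic 2.x in its default (non-strict) mode:

```python
from pydantic import BaseModel


class Customer(BaseModel):
    name: str
    age: int
    is_premium: bool


# String inputs, as they might arrive from a CSV file.
customer = Customer(name="Alice", age="28", is_premium="true")

print(customer.age, type(customer.age).__name__)  # 28 int
print(customer.is_premium)                        # True
```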
The string "28" was converted to integer 28, and "true" was converted to boolean True. This is useful when reading data from CSV files or APIs where everything comes as strings.
Quiz
You pass age="twenty-eight" to a Pydantic model with age: int. What happens?
Constraint Validation
Beyond types, you often need business rules: age must be positive, names can’t be empty, customer IDs must follow a pattern.
In dataclass, you define fields in one place and validate them separately in __post_init__:
Four fields need four if blocks, four raise calls, and four hand-written messages. That’s a lot of boilerplate for simple rules like “name can’t be empty.”
Worse, raise halts at the first failure, so you only learn about "Customer ID cannot be empty" even though three other fields are also invalid.
Pydantic puts constraints directly in Field(), keeping rules next to the data they validate:
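A sketch of what this might look like, assuming Pydantic 2.x. The four fields and the simple email regex are assumptions for illustration, not a production-grade email check.

```python
from pydantic import BaseModel, Field, ValidationError


class Customer(BaseModel):
    customer_id: str = Field(min_length=1)
    name: str = Field(min_length=1)
    # Hypothetical, deliberately simple email pattern.
    email: str = Field(pattern=r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
    age: int = Field(ge=0, le=150)


try:
    Customer(customer_id="", name="", email="bad", age=-5)
except ValidationError as error:
    # All four violations are collected in a single exception.
    failed_fields = [err["loc"][0] for err in error.errors()]
    print(len(failed_fields))  # 4
```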
The syntax is minimal: Field(min_length=1) and Field(ge=0, le=150) replace entire if blocks and hand-written error messages. Pydantic also checks every field in one pass, so all four violations surface together instead of one at a time.
Here are the most common Field() constraints:
| Constraint | Type | Meaning |
|---|---|---|
| gt, ge | numeric | Greater than / greater than or equal |
| lt, le | numeric | Less than / less than or equal |
| multiple_of | numeric | Value must be divisible by this number |
| min_length, max_length | str, list | Minimum / maximum length |
| pattern | str | Must match a regex pattern |
See the full list of Field parameters in the Pydantic docs.
Quiz
You pass name="", age=-5, and email="bad" to a Pydantic model with Field(min_length=1) on name, Field(ge=0) on age, and email validation. How many errors do you get?
Exercise: Validate a Job Posting
Nested Validation
In the dataclass example, the error only said “Invalid zip: 9ABC1” with no way to trace it back to address.zip_code. Pydantic fixes this by reporting the full path to each error:
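A minimal sketch, assuming Pydantic 2.x and a hypothetical five-digit zip code rule:

```python
from pydantic import BaseModel, Field, ValidationError


class Address(BaseModel):
    street: str
    city: str
    zip_code: str = Field(pattern=r"^\d{5}$")  # assumed rule: five digits


class Customer(BaseModel):
    name: str
    address: Address


try:
    # The nested address is passed as a plain dict and validated as Address.
    Customer(
        name="Alice",
        address={"street": "123 Main St", "city": "NY", "zip_code": "9ABC1"},
    )
except ValidationError as error:
    error_path = error.errors()[0]["loc"]
    print(error_path)  # ('address', 'zip_code')
```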
Unlike the dataclass error, Pydantic points directly to address.zip_code. In a structure with multiple addresses or zip codes, you can trace the problem immediately.
Quiz
In the Pydantic example, address is passed as a plain dict, not an Address(...) object. What does Pydantic do with it?
Key Takeaways
Here’s what each tool provides:
- dict: Quick to create, but silent failures from typos, missing keys, and wrong types make bugs hard to trace.
- NamedTuple: Catches typos at creation and provides immutability, but does not enforce types at runtime and shares mutable defaults.
- dataclass: Rejects mutable defaults with default_factory and supports validation via __post_init__, but errors are reported one at a time with no nesting path.
- Pydantic: Enforces types at runtime, catches all validation errors at once, and reports the full path through nested structures like address.zip_code.