Python Data Modeling with Dataclasses and Pydantic



The Silent Bug
Imagine you’re processing customer records. The pipeline runs without errors, but customers never receive their welcome emails. After digging through the code, you discover the issue is a simple typo in a dictionary key.


Flow: write 'emial': … → dict stores anything → read data.get('email') → result is None, no error!

Run the code below to see it in action:

```python
def load_customer(row):
    return {"customer_id": row[0], "name": row[1], "emial": row[2]}  # Typo


def send_welcome_email(customer):
    email = customer.get("email")  # Returns None silently
    if email:
        print(f"Sending email to {email}")
    else:
        print("No email found. Nothing sent!")


customer = load_customer(["C001", "Alice", "alice@example.com"])
send_welcome_email(customer)  # Nothing happens
```

💡 Tip
The output looks like the customer has no email on file, but we passed "alice@example.com". The data is there, just stored under "emial".
.get("email") finds no match and returns None instead of raising an error.

This happens because dictionaries don’t know what keys they should have. Without a schema, Python treats "emial" and "email" as equally valid. The same goes for missing fields, extra fields, and wrong types.
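As a rough illustration of that last point, the sketch below builds three records that a plain dictionary accepts without complaint: one with a missing field, one with an extra field, and one with a wrong type. The records and field names are assumptions made up for this example.

```python
# Three records that a plain dict accepts without any error.
# Field names and values are illustrative assumptions.
missing_field = {"customer_id": "C001", "name": "Alice"}  # no "email" at all
extra_field = {
    "customer_id": "C002",
    "name": "Bob",
    "email": "bob@example.com",
    "nickname": "Bobby",  # a field nobody planned for
}
wrong_type = {"customer_id": "C003", "name": "Cara", "email": 12345}  # int, not str

for record in (missing_field, extra_field, wrong_type):
    # Nothing here checks the record against an expected shape.
    print(sorted(record), type(record.get("email")).__name__)
```

All three records are created and read without an exception, so any problem only shows up later, wherever the data is used.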


What Are Typed Data Containers?
Python offers several ways to avoid this bug, each adding more safety than the last:

| Tool | Safety | Flexibility | Dependencies | Mutability |
|---|---|---|---|---|
| dict | None | Any key, any value | Built-in | Mutable |
| NamedTuple | Basic | Fixed fields | Built-in | Immutable |
| dataclass | Moderate | Fixed fields, defaults | Built-in | Mutable |
| Pydantic | Full | Fixed fields, validators | pip install | Mutable |

Notice the pattern. Each row gains something the previous one lacks:

dict → NamedTuple: Gain fixed fields, lose flexibility.
NamedTuple → dataclass: Gain mutability and mutable defaults via default_factory.
dataclass → Pydantic: Gain runtime type validation, add a dependency.

In this course, you’ll try each tool yourself and see how it catches the mistakes that dictionaries miss.
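As a quick preview, here is a minimal sketch of the same record in the three built-in options. The class names `CustomerTuple` and `CustomerData` are placeholders chosen for this example, and Pydantic is left out because it needs a separate install.

```python
from dataclasses import dataclass
from typing import NamedTuple

# The same record in the three built-in options (names are illustrative).
customer_dict = {"name": "Alice", "email": "alice@example.com"}


class CustomerTuple(NamedTuple):
    name: str
    email: str


@dataclass
class CustomerData:
    name: str
    email: str


customer_tuple = CustomerTuple(name="Alice", email="alice@example.com")
customer_data = CustomerData(name="Alice", email="alice@example.com")

# dict: a misspelled key is silently accepted as a new key.
customer_dict["emial"] = "typo@example.com"

# NamedTuple: existing fields cannot be reassigned.
try:
    customer_tuple.email = "new@example.com"
except AttributeError as e:
    print(f"NamedTuple: {e}")

# dataclass: the constructor takes only the declared fields,
# but their values can be updated afterwards.
customer_data.email = "new@example.com"
print(customer_data)
```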


Creating a Dictionary
A dictionary maps keys to values; when it represents a record, the keys are usually strings. It’s the most common way to represent a record in Python, but it has no fixed structure. You can add, remove, or misspell any key at any time.

Creating one takes a single pair of curly braces:

```python
customer = {
    "customer_id": "C001",
    "name": "Alice Smith",
    "email": "alice@example.com",
    "age": 28,
    "is_premium": True,
}

print(customer["name"])
```

💡 Tip
The output prints Alice Smith by looking up the "name" key in the dictionary.
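The "no fixed structure" point can be sketched in a few lines. The `phone` key and the misspelled `nmae` key below are made up for illustration:

```python
customer = {"name": "Alice Smith", "email": "alice@example.com"}

customer["phone"] = "555-0100"       # add a key that was never planned
del customer["email"]                # remove a key other code may rely on
customer["nmae"] = "Alice S. Smith"  # misspelled "name": creates a new key

print(customer)
```

None of these lines raises an error. The misspelled assignment leaves the original `name` value untouched and stores the update under a key nothing else will read.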


Silent Failures
Reading a misspelled key with bracket access raises a KeyError at runtime:

```python
customer = {
    "customer_id": "C001",
    "name": "Alice Smith",
    "email": "alice@example.com",
    "age": 28,
    "is_premium": True,
}

try:
    customer["emial"]  # Typo: should be "email"
except KeyError as e:
    print(f"KeyError: {e}")
```

The error tells you which key is missing, but not where the wrong key was introduced. When dictionaries pass through multiple functions, finding the source of a typo can take significant debugging time:

```python
import traceback


def load_customer(row):
    return {"customer_id": row[0], "name": row[1], "emial": row[2]}  # Typo here


def validate_customer(customer):
    return customer  # Passes through unchanged


def send_email(customer):
    return customer["email"]  # KeyError raised here


try:
    customer = load_customer(["C001", "Alice", "alice@example.com"])
    validated = validate_customer(customer)
    send_email(validated)  # Error points here, but bug is in load_customer
except KeyError:
    traceback.print_exc()
```

💡 What the output shows
The error is raised in send_email(), but the actual bug (the typo "emial") was introduced in load_customer(). The bug and its symptom are in different functions.
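One manual way to narrow that gap, sketched here under the assumption that the required keys are known up front, is to check the record where it is built. `REQUIRED_KEYS` and the error message are hypothetical names for this example, not part of the course code:

```python
REQUIRED_KEYS = {"customer_id", "name", "email"}  # assumed schema for this sketch


def load_customer(row):
    customer = {"customer_id": row[0], "name": row[1], "emial": row[2]}  # same typo
    missing = REQUIRED_KEYS - set(customer)
    if missing:
        # Fail where the record is built, not where it is read later.
        raise ValueError(f"load_customer produced a record missing {sorted(missing)}")
    return customer


message = None
try:
    load_customer(["C001", "Alice", "alice@example.com"])
except ValueError as e:
    message = str(e)
    print(message)
```

The error now names the function that introduced the typo, but every loader has to repeat this check by hand.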

Using .get() makes it worse by returning None silently:

```python
def load_customer(row):
    return {"customer_id": row[0], "name": row[1], "emial": row[2]}


def send_email(customer):
    email = customer.get("email")  # Returns None silently
    if email:
        print(f"Sending email to {email}")


# Runs without any error or output
customer = load_customer(["C001", "Alice", "alice@example.com"])
send_email(customer)
print("Done. Customer had no email on file.")
```

Quiz

What does {"name": "Alice"}.get("email") return?

A
It raises a KeyError

B
It returns None

C
It returns an empty string ""

⚠ Try Again
That’s what bracket access (d["email"]) does. .get() is designed to avoid raising errors, which is why it can hide bugs.

💡 Correct
Correct! .get() returns None when the key is missing. This is convenient but dangerous: your code keeps running with None instead of failing fast.

⚠ Try Again
.get() doesn’t return an empty string by default. It returns None unless you provide a second argument like .get("email", "").
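The three behaviors from the quiz can be compared side by side. This is a minimal sketch; the profile dictionary is a hypothetical stand-in for the quiz's example.

```python
# Hypothetical record that is missing the "email" key
profile = {"name": "Alice"}

# .get() with no default returns None when the key is missing
missing = profile.get("email")

# .get() with a second argument returns that fallback instead
fallback = profile.get("email", "")

# Bracket access fails fast with a KeyError
try:
    profile["email"]
    raised = False
except KeyError:
    raised = True

print(repr(missing), repr(fallback), raised)
```

Only the bracket form stops the program at the point of the mistake; both .get() forms let it continue with a placeholder value.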


Type Confusion
Missing keys aren’t the only risk. Without a schema, dictionaries also accept the wrong type for any field.

Let’s see what happens when age is stored as a string instead of an integer:

Python

customer = {
    "customer_id": "C001",
    "name": "Alice Smith",
    "age": "28",  # String instead of int
}

# No error, but the math is wrong
print(f"Age: {customer['age']}")
print(f"Age times 2: {customer['age'] * 2}")

💡 What the output shows
"28" * 2 produces "2828" instead of 56. Since "28" is a string, Python repeats it twice instead of doubling the number. The code runs fine, but the result is silently wrong.


Creating a NamedTuple
NamedTuple is a lightweight way to define a fixed structure with named fields and type hints, like a dictionary with a schema.

Instead of string keys, you declare a NamedTuple class with fixed fields. Every object created from it must provide values for those exact fields:

Python

from typing import NamedTuple


class Customer(NamedTuple):
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool


customer = Customer(
    customer_id="C001",
    name="Alice Smith",
    email="alice@example.com",
    age=28,
    is_premium=True,
)

print(customer)

💡 What the output shows
Printing the object displays all five fields by name and value in the order they were defined.

Once created, you can access fields with dot notation instead of string keys like customer["name"]. This allows your IDE to autocomplete the field names and catch typos immediately:

Python

print(customer.name)
print(customer.email)

Quiz

What happens if you create a Customer without providing the email field?

A
The email field is set to None by default

B
Python raises a TypeError because all fields are required

C
The object is created with an empty string for email

⚠ Try Again
Not quite. NamedTuple does not provide default values unless you explicitly define them. Every field must be provided at creation.

💡 Correct
Correct! NamedTuple requires values for all fields. Leaving one out raises a TypeError immediately, unlike a dict, where a missing key goes unnoticed until some later code tries to read it.

⚠ Try Again
Not quite. NamedTuple does not fill in missing fields with placeholder values. You must provide every field when creating the object.
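The quiz scenario can be reproduced in a few lines. This sketch reuses the Customer definition from this section and leaves out email on purpose; the exact wording of the error message may vary between Python versions, so only the field name is relied on here.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool


try:
    # email is left out on purpose
    Customer(customer_id="C001", name="Alice Smith", age=28, is_premium=True)
    error_message = None
except TypeError as e:
    error_message = str(e)

print(f"TypeError: {error_message}")
```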


Catching Typos at Runtime
In the dictionary pipeline, load_customer returned {"emial": row[2]} and the typo traveled through validate_customer before crashing in send_email. With NamedTuple, the same typo fails at the source:

Python

from typing import NamedTuple


class Customer(NamedTuple):
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool


def load_customer(row):
    try:
        return Customer(
            customer_id=row[0],
            name=row[1],
            emial=row[2],  # Same typo as before
            age=row[3],
            is_premium=row[4],
        )
    except TypeError as e:
        print(f"TypeError: {e}")
        return None


customer = load_customer(["C001", "Alice", "alice@example.com", 28, True])
print(f"Customer: {customer}")

💡 What the output shows
The error is raised inside load_customer, exactly where the typo was made, so you spend less time tracing through functions to find the bug.

Quiz

A NamedTuple Customer has fields customer_id, name, email, age, is_premium. You write Customer(customer_id="C001", nme="Alice", email="a@b.com", age=28, is_premium=True). When does the error appear?

A
When you try to access customer.name later in the code

B
Immediately when creating the object, before any other code runs

C
Only when you print the object

⚠ Try Again
Not quite. Unlike a dict where missing keys fail at access time, NamedTuple catches the typo at creation. The object is never created.

💡 Correct
Correct! NamedTuple raises a TypeError at creation because nme is not a valid field. The bug is caught at the source, not downstream.

⚠ Try Again
Not quite. The error happens before the object exists. NamedTuple validates field names during creation, not when you use the object.


Exercise: Fix a Buggy Pipeline

Scenario
The load_customer function from the dictionary section had a typo ("emial") that traveled silently through the pipeline. Your team wants to prevent this class of bug entirely.

Task
Rewrite this dict-based pipeline to use a Customer NamedTuple so the typo is caught at creation. Fix the typo so the pipeline works.

# Rewrite using NamedTuple and fix the bug
def load_customer(row):
    return {
        "customer_id": row[0],
        "name": row[1],
        "emial": row[2],
    }

def send_email(customer):
    print(f"Sending email to {customer['email']}")

customer = load_customer(["C001", "Alice", "alice@example.com"])
send_email(customer)

Immutability Prevents Accidental Changes
Dictionaries let you change any value at any time, which means fields can be overwritten by accident. NamedTuples are immutable, so once created, their values cannot be changed:

Python

from typing import NamedTuple


class Customer(NamedTuple):
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool


customer = Customer(
    customer_id="C001",
    name="Alice Smith",
    email="alice@example.com",
    age=28,
    is_premium=True,
)

try:
    customer.name = "Bob"
except AttributeError as e:
    print(f"AttributeError: {e}")

💡 What the output shows
Assigning "Bob" to customer.name raises an AttributeError. Once a NamedTuple is created, its values are fixed.

Quiz

Why is immutability useful when passing a Customer object through multiple functions?

A
It makes the code run faster because Python optimizes immutable objects

B
No function can accidentally change the data, so each function sees the original values

C
It prevents other developers from reading the data

⚠ Try Again
Not quite. While immutable objects can have some performance benefits, the main advantage is data safety across function calls.

💡 Correct
Correct! When data is immutable, you can pass it through any number of functions knowing that no function can alter the original values. This eliminates a whole class of bugs.

⚠ Try Again
Not quite. Immutability prevents modification, not reading. Any function can still read and use the data freely.
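Immutability does not mean a record can never be updated. NamedTuple inherits the _replace method from namedtuple, which returns a new object with selected fields changed and leaves the original alone. The sketch below uses a shortened, two-field Customer for brevity.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    name: str
    is_premium: bool


original = Customer("Alice Smith", False)

# _replace returns a new Customer; the original is left untouched
upgraded = original._replace(is_premium=True)

print(original)
print(upgraded)
```

Any function that still holds original keeps seeing the values it was given, which is the safety property the quiz describes.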


Default Values
NamedTuple supports default values. They are safe for immutable types like bool and str:

Python

from typing import NamedTuple


class Customer(NamedTuple):
    name: str
    is_premium: bool = False


c1 = Customer("Alice")
c2 = Customer("Bob", is_premium=True)
print(f"Alice premium? {c1.is_premium}")
print(f"Bob premium? {c2.is_premium}")

💡 What the output shows
Customer("Alice") uses the default False for is_premium, while Customer("Bob", is_premium=True) overrides it. You only need to pass values that differ from the defaults.

However, mutable defaults like lists are shared across all instances, which can cause unexpected behavior:

Python

from typing import NamedTuple


class Customer(NamedTuple):
    name: str
    tags: list = []  # All customers share this list


c1 = Customer("Alice")
c2 = Customer("Bob")
c1.tags.append("premium")
print(f"Added 'premium' to Alice")
print(f"Alice: {c1.tags}")
print(f"Bob:   {c2.tags}")

💡 What the output shows
Both Alice and Bob show ['premium']. This happens because Python creates the default [] once, when the class is defined, then hands that same list to every instance. There’s only one list in memory, so c1.tags and c2.tags are the same object.

This diagram shows how the single default list is shared before and after the append:


[Diagram]
Before append: c1.tags and c2.tags both point to the same list, [ ]
After c1.tags.append: c1.tags and c2.tags both point to the same list, ['premium']

Quiz

NamedTuple is immutable, yet c1.tags.append("premium") works without error. Why?

A
append bypasses immutability because it’s a built-in method

B
Immutability prevents reassigning the field (c1.tags = […]), but the list itself is still mutable

C
NamedTuple is only immutable for string and number fields

⚠ Try Again
Not quite. append has no special privileges. The key is that immutability applies to the field reference, not to the object the field points to.

💡 Correct
Correct! c1.tags = new_list would raise an AttributeError. But c1.tags.append(…) modifies the list object that the field points to, which is allowed because the list itself is mutable.

⚠ Try Again
Not quite. NamedTuple immutability applies equally to all fields. The difference is between reassigning a field and modifying the object it references.
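The difference between reassigning a field and mutating the object it points to can be checked directly. This sketch reuses the two-field Customer from this section; the c3 line shows one possible workaround (passing an explicit list), which is an illustration rather than something the section prescribes.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    name: str
    tags: list = []  # One list object, created when the class is defined


c1 = Customer("Alice")
c2 = Customer("Bob")

# Both fields point at the same list object
same_object = c1.tags is c2.tags

# Reassigning the field is blocked
try:
    c1.tags = ["vip"]
    reassigned = True
except AttributeError:
    reassigned = False

# Mutating the list the field points to is not blocked
c1.tags.append("premium")

print(same_object, reassigned, c2.tags)

# Passing an explicit list gives this instance a list of its own
c3 = Customer("Carol", tags=[])
print(c3.tags)
```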


Limitations: No Runtime Type Validation
Type hints in NamedTuple are not enforced at runtime. You can still pass in wrong types:

Python

from typing import NamedTuple


class Customer(NamedTuple):
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool


# Wrong types are accepted without error
customer = Customer(
    customer_id="C001",
    name=123,  # Should be str, but int is accepted
    email="alice@example.com",
    age="twenty-eight",  # Should be int, but str is accepted
    is_premium=True,
)

print(f"Name: {customer.name}, Age: {customer.age}")
print(f"Name type: {type(customer.name).__name__}, Age type: {type(customer.age).__name__}")

💡 What the output shows
Python accepts name=123 and age="twenty-eight" without complaint. NamedTuple type hints are for documentation and static analysis only. They are not enforced at runtime.

Quiz

What is the purpose of type hints like age: int in a NamedTuple if they are not enforced?

A
They help IDEs and static analysis tools like mypy catch type errors before running the code

B
They convert values to the correct type automatically

C
They have no purpose and can be removed safely

💡 Correct
Correct! Type hints enable IDE autocomplete, inline warnings, and tools like mypy to catch type mismatches before you run the code. They are valuable documentation even without runtime enforcement.

⚠ Try Again
Not quite. NamedTuple does not convert types. If you pass age="28", it stays a string. Pydantic is the container that handles automatic type conversion.

⚠ Try Again
Not quite. Type hints are valuable for IDE support and static analysis, even though Python does not enforce them at runtime.
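One way to see that the hints are only metadata is to read them back from the class. The sketch below uses typing.get_type_hints for that, then builds a small manual check on top of it. The helper mismatched_fields is hypothetical, written for this illustration, and it assumes every annotation is a plain class such as str or int (isinstance does not accept parameterized hints like list[str]).

```python
from typing import NamedTuple, get_type_hints


class Customer(NamedTuple):
    name: str
    age: int


# The hints are stored on the class as plain metadata
hints = get_type_hints(Customer)
print(hints)

# Nothing consults them at construction time, so a str slips into age
customer = Customer(name="Alice", age="28")
print(type(customer.age).__name__)


def mismatched_fields(record):
    """Hypothetical helper: list fields whose value does not match its hint."""
    return [
        field_name
        for field_name, expected in get_type_hints(type(record)).items()
        if not isinstance(getattr(record, field_name), expected)
    ]


print(mismatched_fields(customer))
```

Static tools such as mypy read the same annotations without running the code, which is why the hints are still worth writing.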


Exercise: Fix a Type Bug

Scenario
A sensor monitoring system adjusts temperature readings by a calibration factor of 2. A faulty sensor sends its reading as a string. The code runs without error, but one sensor’s adjusted value is wrong.

Task
Fix the readings list so that all adjusted temperatures are calculated correctly.

from typing import NamedTuple

class SensorReading(NamedTuple):
    sensor_id: str
    temperature: float

readings = [
    SensorReading("S1", 98.5),
    SensorReading("S2", "15"),
    SensorReading("S3", 22.1),
]

for r in readings:
    adjusted = r.temperature * 2
    print(f"{r.sensor_id}: {adjusted}")

Creating a dataclass
@dataclass is a class decorator that automatically generates __init__, __repr__, and other methods from field definitions. It provides the same fixed fields and IDE support as NamedTuple, plus:

Mutable fields: Change values after creation, unlike NamedTuple
Default values: Fields can have defaults, including empty lists and dicts
Post-init logic: Run custom code right after an object is created

Creating a dataclass looks similar to NamedTuple, but you use the @dataclass decorator instead of inheriting:

Python

from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    name: str
    email: str
    age: int


customer = Customer(
    customer_id="C001",
    name="Alice Smith",
    email="alice@example.com",
    age=28,
)

print(customer)

💡 What the output shows
The output matches the NamedTuple format. Both give you named fields and readable printing. Where they differ is mutability and default handling, which the next sections cover.

Quiz

What happens if you try to create Customer(customer_id="C001", nmae="Alice", email="a@b.com", age=28)?

A
The object is created with a nmae field instead of name

B
Python raises a TypeError because nmae is not a declared field

C
The nmae value is silently ignored

⚠ Try Again
Not quite. Unlike a dict that accepts any key, dataclass only accepts the fields you declared. Unknown field names are rejected.

💡 Correct
Correct! Like NamedTuple, dataclass only accepts its declared fields. Passing nmae instead of name raises a TypeError immediately.

⚠ Try Again
Not quite. Dataclass does not silently ignore unknown fields. It raises a TypeError because nmae is not in the field definitions.


Exercise: Build a Product Record

Scenario
An inventory system receives product data as separate variables from a database query. You need to structure each product as a dataclass for type-safe access throughout the codebase.

Task
Define a Product dataclass with fields sku (str), name (str), price (float), and in_stock (bool). Create a product and print its formatted summary.

from dataclasses import dataclass

# Define the Product dataclass below
# Fields: sku (str), name (str), price (float), in_stock (bool)


product = Product(
    sku="WH-7821",
    name="USB-C Cable",
    price=12.99,
    in_stock=True,
)

status = "Available" if product.in_stock else "Out of stock"
print(f"{product.name} ({product.sku}): ${product.price} - {status}")

Mutability Allows Updates
Dataclass trades NamedTuple’s immutability protection for flexibility. You can modify fields after creation:

Python

from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool = False


customer = Customer(
    customer_id="C001",
    name="Alice Smith",
    email="alice@example.com",
    age=28,
)

customer.name = "Alice Johnson"  # Changed after marriage
customer.is_premium = True  # Upgraded their account

print(f"{customer.name}, Premium: {customer.is_premium}")

💡 What the output shows
Unlike NamedTuple, dataclass allows field modification. This is useful for objects that need to change over time, like a customer upgrading their account.

To prevent accidentally adding new attributes, you can use @dataclass(slots=True), available in Python 3.10 and later. It fixes the set of attribute names, so existing fields can still be updated but new ones cannot be added:

Python

from dataclasses import dataclass


@dataclass(slots=True)
class Customer:
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool = False


customer = Customer(
    customer_id="C001",
    name="Alice",
    email="alice@example.com",
    age=28,
)

try:
    customer.nmae = "Bob"  # Typo
except AttributeError as e:
    print(f"AttributeError: {e}")

💡 What the output shows
Without slots=True, the dataclass would silently create a new attribute nmae on the object. With slots, it raises an error immediately, catching the typo.

Quiz

What does @dataclass(slots=True) prevent?

A
Modifying existing fields like customer.name = "Bob"

B
Adding new attributes that were not declared in the class, like customer.nmae = "Bob"

C
Creating instances with missing fields

⚠ Try Again
Not quite. slots=True still allows modifying existing fields. It only prevents adding new attributes that were not part of the class definition.

💡 Correct
Correct! slots=True restricts the object to only the declared fields. Typos like customer.nmae raise an AttributeError instead of silently creating a new attribute.

⚠ Try Again
Not quite. Missing fields are already caught by the generated __init__ method, with or without slots=True. Slots specifically prevent adding undeclared attributes.


Mutable Defaults with default_factory
Remember the shared list problem from NamedTuple? Dataclass prevents this by rejecting mutable defaults such as list, dict, and set when the class is defined:

Python

from dataclasses import dataclass

try:
    @dataclass
    class Order:
        items: list = []
except ValueError as e:
    print(f"ValueError: {e}")

💡 What the output shows
Dataclass raises a ValueError instead of silently sharing the list. It forces you to use field(default_factory=…), which creates a new list for each instance.

With field(default_factory=…), the factory function is called each time an instance is created, so each object gets its own list:

Python

from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: str
    items: list = field(default_factory=list)  # Each instance gets its own list


order1 = Order("001")
order2 = Order("002")

order1.items.append("apple")
print(f"Order 1: {order1.items}")
print(f"Order 2: {order2.items}")  # Not affected by order1

💡 What the output shows
Unlike the NamedTuple example, Order 2 stays empty because default_factory creates a fresh list for each instance. This is the safe way to use mutable defaults.

To see why this works, compare what happens at creation versus after appending:


[Diagram]
At creation: order1.items points to its own list, [ ]; order2.items points to a separate list, [ ]
After order1.items.append: order1.items points to ['apple']; order2.items still points to its own list, [ ]

Quiz

Which of these dataclass fields requires field(default_factory=…)?

A
name: str = "unknown"

B
tags: list = field(default_factory=list)

C
is_active: bool = True

⚠ Try Again
Not quite. Strings are immutable in Python, so "unknown" is safe as a direct default. Each instance gets the same string object, but since it cannot be modified, sharing it causes no problems.

💡 Correct
Correct! Lists are mutable, so dataclass rejects a direct default like tags: list = [] (it would otherwise be shared across instances). default_factory=list creates a fresh list for each instance.

⚠ Try Again
Not quite. Booleans are immutable, so True is safe as a direct default. Only mutable types like list, dict, and set need default_factory.
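The same pattern applies to the other mutable types mentioned above. This sketch uses a hypothetical CustomerProfile model to show default_factory with set and dict, plus a lambda for a default that should start pre-populated; the field names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerProfile:  # hypothetical model for illustration
    name: str
    tags: set = field(default_factory=set)
    preferences: dict = field(default_factory=dict)
    # A lambda works when the default should start with content
    channels: list = field(default_factory=lambda: ["email"])


alice = CustomerProfile("Alice")
bob = CustomerProfile("Bob")

alice.tags.add("premium")
alice.preferences["newsletter"] = True
alice.channels.append("sms")

print(alice)
print(bob)  # Bob's containers are untouched
```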


Exercise: Build a Shopping Cart

Scenario
An e-commerce system creates a shopping cart for each customer. Each cart needs its own independent list of items so that adding to one cart doesn’t affect another.

Task
Define a Cart dataclass where items defaults to an empty list using default_factory. Add items to one cart and verify the other stays empty.

from dataclasses import dataclass, field

# Define Cart dataclass
# Fields: customer (str), items (list, default empty)


cart1 = Cart(customer="Alice")
cart2 = Cart(customer="Bob")

cart1.items.append("Laptop")
cart1.items.append("Mouse")

print(f"Alice: {cart1.items}")
print(f"Bob: {cart2.items}")

Post-Init Validation with __post_init__
Dataclass does not check field values, so invalid data like empty names or negative ages passes through without warning:

Python

from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool = False


customer = Customer(
    customer_id="C001",
    name="",  # Empty name
    email="invalid",
    age=-100,
)
print(f"Created: {customer}")  # No error - bad data is in your system

💡 What the output shows
An empty name, invalid email, and negative age all pass through without any error. The bad data is now in your system, potentially corrupting downstream operations.

To catch these issues early, dataclass provides a special method called __post_init__ that runs automatically after __init__ finishes. You can add validation logic here to reject bad values at creation time:

Python

from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool = False

    def __post_init__(self):
        if self.age < 0:
            raise ValueError(f"Age cannot be negative: {self.age}")
        if "@" not in self.email:
            raise ValueError(f"Invalid email: {self.email}")


try:
    customer = Customer(
        customer_id="C001",
        name="Alice",
        email="alice-at-email",
        age=28,
    )
except ValueError as e:
    print(f"ValueError: {e}")

💡 What the output shows
The error fires at object creation, not later when you try to send an email. This means invalid data never enters your system in the first place.

Quiz

A dataclass has __post_init__ that validates email and age. You pass a valid email but age=-5. What happens?

A
The object is created with age=-5 and validation runs later when you use the field

B
The object is never created. __post_init__ raises a ValueError during construction.

C
The object is created but age is automatically set to 0

⚠ Try Again
Not quite. __post_init__ runs during construction, not when you access fields. The validation happens immediately.

💡 Correct
Correct! __post_init__ runs right after __init__. Since age=-5 fails the check, a ValueError is raised and the object is never returned to the caller.

⚠ Try Again
Not quite. __post_init__ does not auto-correct values. It either lets the object through or raises an error. Any correction logic must be written explicitly.
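The quiz scenario can be run as a short check. This sketch trims Customer down to the two validated fields and starts with customer set to None, so it is visible afterwards that no object was ever assigned.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    email: str
    age: int

    def __post_init__(self):
        if "@" not in self.email:
            raise ValueError(f"Invalid email: {self.email}")
        if self.age < 0:
            raise ValueError(f"Age cannot be negative: {self.age}")


customer = None
message = None
try:
    # Valid email, invalid age
    customer = Customer(email="alice@example.com", age=-5)
except ValueError as e:
    message = str(e)

print(message)
print(customer)  # The assignment never happened
```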


Limitations: Manual Validation Only
__post_init__ requires you to write every validation rule yourself. If you forget to check a field, bad data can still slip through:

Python

from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool = False

    def __post_init__(self):
        if "@" not in self.email:
            raise ValueError(f"Invalid email: {self.email}")


# Wrong types pass because __post_init__ only checks email format
customer = Customer(
    customer_id="C001",
    name=123,  # No validation for name type
    email="alice@example.com",
    age="twenty-eight",  # No validation for age type
)

print(f"Name: {customer.name}, Age: {customer.age}")

💡 What the output shows
The name is an integer and the age is a string, yet dataclass accepted both. Type hints do not enforce types at runtime, so any validation you need must be written manually in __post_init__.
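To give a sense of what writing it manually involves, here is one possible generic type check built on dataclasses.fields. It is a sketch, not a recommended pattern: it assumes every annotation is a plain class (str, int, and so on), so it would not handle hints like list[str] or Optional[int], and the error message format is made up for this example.

```python
from dataclasses import dataclass, fields


@dataclass
class Customer:
    customer_id: str
    name: str
    age: int

    def __post_init__(self):
        # Compare each value against its annotation.
        # Assumes every annotation is a plain class such as str or int.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )


try:
    Customer(customer_id="C001", name=123, age=28)
    type_error = None
except TypeError as e:
    type_error = str(e)

print(type_error)
```

Even this small check stops at the first bad field and ignores nested or generic types, which hints at how much work a complete hand-written validator would be.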


Limitations: Nested Validation
Most real data is nested: customers have addresses, orders have items. With dataclass, error messages don’t tell you where in the structure the problem occurred:

Python

from dataclasses import dataclass
import re


@dataclass
class Address:
    street: str
    city: str
    zip_code: str

    def __post_init__(self):
        if not re.match(r"^\d{5}$", self.zip_code):
            raise ValueError(f"Invalid zip: {self.zip_code}")


@dataclass
class Customer:
    customer_id: str
    name: str
    address: Address


try:
    customer = Customer(
        customer_id="C001",
        name="Alice Smith",
        address=Address(street="123 Main St", city="New York", zip_code="9ABC1"),
    )
except ValueError as e:
    print(e)

💡 What the output shows
The error says “Invalid zip: 9ABC1” but doesn’t tell you it came from address.zip_code. In a deeply nested structure with multiple zip codes, you wouldn’t know which one failed.

Quiz

You pass address={"street": "123 Main St", "city": "NY", "zip_code": "10001"} to a dataclass Customer that expects address: Address. What happens?

A
Dataclass converts the dict to an Address object automatically

B
The dict is stored as-is without conversion or validation

C
Python raises a TypeError because a dict is not an Address

⚠ Try Again
Not quite. Dataclass does not convert types. You would need to write manual conversion in __post_init__ or pass Address(…) directly.

💡 Correct
Correct! Dataclass stores whatever you pass without checking types. The dict is accepted even though the type hint says Address. This means customer.address.city would raise an AttributeError later.

⚠ Try Again
Not quite. Dataclass does not enforce type hints at runtime. It accepts the dict silently, which can cause errors later when you try to access attributes.
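The quiz scenario, sketched with a simplified Address and Customer (no validation, fewer fields). The last lines show the explicit conversion you would have to write yourself, here by unpacking the dict into Address.

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    city: str
    zip_code: str


@dataclass
class Customer:
    name: str
    address: Address


raw_address = {"street": "123 Main St", "city": "NY", "zip_code": "10001"}

# The dict is stored as-is, despite the Address hint
customer = Customer(name="Alice", address=raw_address)
print(type(customer.address).__name__)

# The problem only surfaces when attribute access is attempted
try:
    customer.address.city
    failed = False
except AttributeError:
    failed = True
print(failed)

# Converting explicitly: unpack the dict into Address yourself
fixed = Customer(name="Alice", address=Address(**raw_address))
print(fixed.address.city)
```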


Getting Started
So far, every container has treated type hints as documentation only. Pydantic is a third-party validation library that changes this. It checks types at runtime and raises clear errors when values don’t match.

To install Pydantic, run:

pip install pydantic

This course uses Pydantic 2.12.

Let’s verify the installation:

Python

import pydantic

print(f"Pydantic version: {pydantic.__version__}")
print("Installation successful!")

Creating a Pydantic Model
Declaring a Pydantic model looks similar to dataclass and NamedTuple. Inherit from BaseModel and declare your fields:

Python

from pydantic import BaseModel


class Customer(BaseModel):
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool = False


customer = Customer(
    customer_id="C001",
    name="Alice Smith",
    email="alice@example.com",
    age=28,
)

print(f"{customer.name}, Age: {customer.age}")

💡 What the output shows
The syntax looks similar to dataclass, but Pydantic validates types automatically when you create the object. You’ll see the difference in the next section.


Runtime Validation
Remember how dataclass accepted name=123 without complaint? Pydantic catches this automatically:

Python

from pydantic import BaseModel, ValidationError


class Customer(BaseModel):
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool = False


try:
    customer = Customer(
        customer_id="C001",
        name=123,
        email="alice@example.com",
        age="thirty",
    )
except ValidationError as e:
    print(e)

💡 What the output shows
Pydantic reports all validation failures at once: name should be a string (got int 123) and age should be a valid integer (got string 'thirty'). This saves you from fixing one error, rerunning, and discovering another.

Quiz

How does Pydantic know that name=123 is invalid without any custom validation code?

A
Pydantic reads the type hint name: str and enforces it at runtime automatically

B
Pydantic uses the same __post_init__ mechanism as dataclass

C
Pydantic relies on Python’s built-in type checking

💡 Correct
Correct! Pydantic reads type hints and enforces them at runtime. With dataclass, name: str is just documentation. With Pydantic, it’s a rule that gets checked every time you create an object.

⚠ Try Again
Not quite. Pydantic uses its own validation engine, not __post_init__. Type enforcement is built into the BaseModel class itself.

⚠ Try Again
Not quite. Python does not enforce type hints at runtime. Pydantic adds this enforcement through its own validation layer on top of Python’s type system.
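Besides printing the exception, the failures can be inspected programmatically through ValidationError.errors(), which returns one dict per failed field. The sketch below assumes Pydantic 2.x is installed and collects the field name and error type for the same invalid Customer used above.

```python
from pydantic import BaseModel, ValidationError


class Customer(BaseModel):
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool = False


try:
    Customer(customer_id="C001", name=123, email="alice@example.com", age="thirty")
    failures = []
except ValidationError as e:
    # Each entry is a dict with keys such as "loc", "type", and "msg"
    failures = [(err["loc"][0], err["type"]) for err in e.errors()]

for field_name, error_type in failures:
    print(f"{field_name}: {error_type}")
```

Because each entry carries its location, the caller can tell which field failed without parsing the message text.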


Exercise: Validate Signup Data

Scenario
A registration endpoint receives user signup data. Some entries have the wrong types: age as a non-numeric string and username as an integer. You need the model to catch all type errors at once.

Task
Define a UserSignup model that validates username (str), email (str), and age (int). Create a user with invalid data and print the validation errors.

from pydantic import BaseModel, ValidationError

# Define the UserSignup model below
# Fields: username (str), email (str), age (int)


try:
    user = UserSignup(
        username=42,
        email="alice@example.com",
        age="not-a-number",
    )
    print(f"Created: {user.username}, {user.email}, age {user.age}")
except ValidationError as e:
    for error in e.errors():
        print(f"{error['loc'][0]}: {error['type']}")

Type Coercion
Unlike dataclass, which stores whatever you pass, Pydantic automatically converts compatible types:

Python

from pydantic import BaseModel


class Customer(BaseModel):
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool = False


customer = Customer(
    customer_id="C001",
    name="Alice Smith",
    email="alice@example.com",
    age="28",  # String "28" is converted to int 28
    is_premium="true",  # String "true" is converted to bool True
)

print(f"Age: {customer.age} (type: {type(customer.age).__name__})")
print(f"Premium: {customer.is_premium} (type: {type(customer.is_premium).__name__})")

💡 What the output shows
The string "28" was converted to integer 28, and "true" was converted to boolean True. This is useful when reading data from CSV files or APIs where everything comes as strings.
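The CSV case can be sketched with the standard library's csv module, assuming Pydantic v2. The inline CSV text and the trimmed Customer model are made up for illustration. Every cell arrives as a string, and the model converts each one on the way in:

```python
import csv
import io

from pydantic import BaseModel


class Customer(BaseModel):  # trimmed model for illustration
    customer_id: str
    name: str
    age: int
    is_premium: bool = False


# Hypothetical CSV content; csv.DictReader yields every value as a str
raw = (
    "customer_id,name,age,is_premium\n"
    "C001,Alice Smith,28,true\n"
    "C002,Bob Lee,35,false\n"
)

customers = [Customer(**row) for row in csv.DictReader(io.StringIO(raw))]

for c in customers:
    print(c.customer_id, c.age, type(c.age).__name__, c.is_premium)
```

No manual int(...) or string-to-bool parsing is needed; a row with an unparseable age would raise a ValidationError instead of slipping through.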

Quiz

You pass age="twenty-eight" to a Pydantic model with age: int. What happens?

A
Pydantic converts it to 28 using natural language parsing

B
Pydantic raises a ValidationError because "twenty-eight" cannot be parsed as an integer

C
Pydantic stores it as a string since conversion failed

⚠ Try Again
Not quite. Pydantic only converts values that Python can parse directly, like "28" to 28. It does not interpret natural language.

💡 Correct
Correct! Pydantic tries to convert "twenty-eight" to an integer, fails, and raises a ValidationError. Coercion only works for values that can be parsed directly, like "28" for an int field or "3.14" for a float field.

⚠ Try Again
Not quite. Pydantic does not silently fall back to storing the original value. If conversion fails, it raises a ValidationError.
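A short sketch of both outcomes, assuming Pydantic v2 (the one-field Person model is hypothetical). The failed conversion raises, and the error entry keeps the original input so you can see what was rejected:

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):  # hypothetical one-field model
    age: int


# A numeric string is coerced
ok = Person(age="28")

# A non-numeric string is rejected rather than stored as-is
try:
    Person(age="twenty-eight")
except ValidationError as e:
    failure = e.errors()[0]

print(ok.age, failure["type"], failure["input"])
```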


Constraint Validation
Beyond types, you often need business rules: age must be positive, names can’t be empty, customer IDs must follow a pattern.

In dataclass, you define fields in one place and validate them in __post_init__. But raise stops at the first error, so you only learn about one problem at a time:

Python

from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    name: str
    email: str
    age: int
    is_premium: bool = False

    def __post_init__(self):
        if not self.customer_id:
            raise ValueError("Customer ID cannot be empty")
        if not self.name or len(self.name) < 1:
            raise ValueError("Name cannot be empty")
        if "@" not in self.email:
            raise ValueError(f"Invalid email: {self.email}")
        if self.age < 0 or self.age > 150:
            raise ValueError(f"Age must be between 0 and 150: {self.age}")


try:
    customer = Customer(
        customer_id="",  # Empty ID
        name="",  # Empty name
        email="invalid",  # Missing @
        age=-5,  # Negative age
    )
except ValueError as e:
    print(e)  # Only reports the first violation

💡 What the output shows
Four fields need four if blocks, four raise calls, and four hand-written messages. That’s a lot of boilerplate for simple rules like “name can’t be empty.”
Worse, raise halts at the first failure, so you only learn about "Customer ID cannot be empty" even though three other fields are also invalid.
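You can make a dataclass report everything, but only by collecting the problems yourself. The sketch below shows one way to do it; the problems list and the joined message format are illustrative choices, not part of the course code:

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    name: str
    email: str
    age: int

    def __post_init__(self):
        problems = []  # accumulate instead of raising on the first failure
        if not self.customer_id:
            problems.append("Customer ID cannot be empty")
        if not self.name:
            problems.append("Name cannot be empty")
        if "@" not in self.email:
            problems.append(f"Invalid email: {self.email}")
        if not 0 <= self.age <= 150:
            problems.append(f"Age must be between 0 and 150: {self.age}")
        if problems:
            raise ValueError("; ".join(problems))


try:
    Customer(customer_id="", name="", email="invalid", age=-5)
except ValueError as e:
    message = str(e)

print(message)
```

This works, but it adds even more hand-written plumbing on top of the four checks.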

Pydantic puts constraints directly in Field(), keeping rules next to the data they validate:

Python

from pydantic import BaseModel, Field, ValidationError


class Customer(BaseModel):
    customer_id: str = Field(min_length=1)
    name: str = Field(min_length=1)
    email: str = Field(pattern=r".+@.+")
    age: int = Field(ge=0, le=150)
    is_premium: bool = False


try:
    customer = Customer(
        customer_id="",  # Empty ID
        name="",  # Empty name
        email="invalid",  # Missing @
        age=-5,  # Negative age
    )
except ValidationError as e:
    print(e)

💡 What the output shows
The syntax is minimal: Field(min_length=1) and Field(ge=0, le=150) replace entire if blocks and hand-written error messages. Pydantic also checks every field in one pass, so all four violations surface together instead of one at a time.
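If you want the violations as data rather than a printed message, a sketch like the following may help. It assumes Pydantic v2 and reuses the same constraints; the summary dict is just one illustrative way to organize the result:

```python
from pydantic import BaseModel, Field, ValidationError


class Customer(BaseModel):
    customer_id: str = Field(min_length=1)
    name: str = Field(min_length=1)
    email: str = Field(pattern=r".+@.+")
    age: int = Field(ge=0, le=150)


try:
    Customer(customer_id="", name="", email="invalid", age=-5)
except ValidationError as e:
    count = e.error_count()
    # Map each failing field to the short code of the rule it broke
    summary = {err["loc"][0]: err["type"] for err in e.errors()}

print(count)
for field_name, kind in summary.items():
    print(f"{field_name}: {kind}")
```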

Here are the most common Field() constraints:

Constraint | Type | Meaning
gt, ge | numeric | Greater than / greater than or equal
lt, le | numeric | Less than / less than or equal
multiple_of | numeric | Value must be divisible by this number
min_length, max_length | str, list | Minimum / maximum length
pattern | str | Must match a regex pattern

See the full list of Field parameters in the Pydantic docs.
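As a rough illustration of how these constraints combine, here is a sketch assuming Pydantic v2 and Python 3.9+. The Order model, its fields, and the SKU format are hypothetical:

```python
from pydantic import BaseModel, Field, ValidationError


class Order(BaseModel):  # hypothetical model for illustration
    quantity: int = Field(gt=0, multiple_of=5)  # positive, sold in packs of 5
    discount: float = Field(ge=0, lt=1)  # fraction in [0, 1)
    tags: list[str] = Field(min_length=1, max_length=3)  # length of the list
    sku: str = Field(pattern=r"^[A-Z]{3}-\d{4}$")  # made-up SKU format


valid = Order(quantity=10, discount=0.2, tags=["new"], sku="ABC-1234")

try:
    Order(quantity=7, discount=1.5, tags=["a", "b", "c", "d"], sku="abc")
except ValidationError as e:
    broken = {err["loc"][0]: err["type"] for err in e.errors()}

print(broken)
```

Note that min_length and max_length apply to the number of items when the field is a list, and to the number of characters when it is a str.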

Quiz

You pass name="", age=-5, and email="bad" to a Pydantic model with Field(min_length=1) on name, Field(ge=0) on age, and email validation. How many errors do you get?

A
One error for the first invalid field

B
Three errors, all reported in a single ValidationError

C
Three separate exceptions raised one after another

⚠ Try Again
Not quite. That’s how dataclass __post_init__ works, stopping at the first failure. Pydantic checks all fields.

💡 Correct
Correct! Pydantic validates every field and collects all failures into one ValidationError. Each violation is listed with its field name and the constraint that was broken.

⚠ Try Again
Not quite. Pydantic bundles all violations into a single ValidationError. You handle one exception that contains all the details.


Exercise: Validate a Job Posting

Scenario
A job board receives postings from employers. Each posting must have a non-empty title, a salary between 30,000 and 500,000, and a non-empty company name. Invalid postings should be rejected with all errors at once.

Task
Add Field() constraints to the JobPosting model so that invalid data is caught and the test case raises validation errors.

💡 Hint
Useful Field() constraints: gt, ge, lt, le for numbers; min_length and max_length for strings.

from pydantic import BaseModel, Field, ValidationError

# Add Field() constraints to reject invalid data
class JobPosting(BaseModel):
    title: str
    company: str
    salary: int


try:
    posting = JobPosting(
        title="",
        company="",
        salary=10,
    )
    print(f"Created: {posting.title} at {posting.company}, ${posting.salary}")
except ValidationError as e:
    for error in e.errors():
        print(f"{error['loc'][0]}: {error['type']}")

Nested Validation
In the dataclass example, the error only said “Invalid zip: 9ABC1” with no way to trace it back to address.zip_code. Pydantic fixes this by reporting the full path to each error:

Python

from pydantic import BaseModel, Field, ValidationError


class Address(BaseModel):
    street: str
    city: str
    zip_code: str = Field(pattern=r"^\d{5}$")  # Must be 5 digits


class Customer(BaseModel):
    customer_id: str
    name: str
    address: Address


try:
    customer = Customer(
        customer_id="C001",
        name="Alice Smith",
        address={
            "street": "123 Main St",
            "city": "New York",
            "zip_code": "9ABC1",  # Invalid zip code
        },
    )
except ValidationError as e:
    print(e)

💡 What the output shows
Unlike the dataclass error, Pydantic points directly to address.zip_code. In a structure with multiple addresses or zip codes, you can trace the problem immediately.
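The multiple-address case can be sketched as follows, assuming Pydantic v2 and Python 3.9+. The addresses list field is a hypothetical variation on the model above. Each error's loc is a tuple, and for list items it includes the index:

```python
from pydantic import BaseModel, Field, ValidationError


class Address(BaseModel):
    street: str
    city: str
    zip_code: str = Field(pattern=r"^\d{5}$")


class Customer(BaseModel):
    name: str
    addresses: list[Address]  # hypothetical variation: several addresses


try:
    Customer(
        name="Alice Smith",
        addresses=[
            {"street": "123 Main St", "city": "New York", "zip_code": "10001"},
            {"street": "9 Elm Ave", "city": "Boston", "zip_code": "9ABC1"},
        ],
    )
except ValidationError as e:
    loc = e.errors()[0]["loc"]  # tuple of field names and list indexes

# Join the parts into a dotted path for logging
path = ".".join(str(part) for part in loc)
print(path)
```

The index in the path tells you which address is bad without inspecting the data by hand.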

Quiz

In the Pydantic example, address is passed as a plain dict, not an Address(…) object. What does Pydantic do with it?

A
Stores the dict as-is, like dataclass would

B
Converts the dict into an Address model and validates all its fields automatically

C
Raises an error because a dict is not an Address object

⚠ Try Again
Not quite. That’s what dataclass does. Pydantic recognizes that the dict matches the Address model’s fields and converts it automatically.

💡 Correct
Correct! Pydantic converts the raw dict into an Address model, then validates each field against its type hints and constraints. This is why you can pass nested data from JSON or APIs without manual conversion.

⚠ Try Again
Not quite. Pydantic accepts dicts for nested models and converts them automatically. This is one of the key differences from dataclass, which stores the dict without conversion.
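A small sketch of that conversion, assuming Pydantic v2. The JSON payload is made up for illustration; model_validate_json parses and validates a JSON string in one step:

```python
from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    zip_code: str


class Customer(BaseModel):
    name: str
    address: Address


# A plain dict is converted into an Address instance
from_dict = Customer(
    name="Alice Smith",
    address={"street": "123 Main St", "city": "New York", "zip_code": "10001"},
)

# Hypothetical JSON payload, e.g. an API response body
payload = (
    '{"name": "Bob Lee", "address": '
    '{"street": "9 Elm Ave", "city": "Boston", "zip_code": "02108"}}'
)
from_json = Customer.model_validate_json(payload)

print(type(from_dict.address).__name__, from_json.address.city)
```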


Key Takeaways
Here’s what each tool provides:

dict: Quick to create, but silent failures from typos, missing keys, and wrong types make bugs hard to trace.
NamedTuple: Catches typos at creation and provides immutability, but does not enforce types at runtime and shares mutable defaults across instances.
dataclass: Rejects mutable defaults in favor of default_factory and supports validation via __post_init__, but errors are reported one at a time with no nesting path.
Pydantic: Enforces types at runtime, catches all validation errors at once, and reports the full path through nested structures like address.zip_code.
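The mutable-default point is easy to miss, so here is a minimal standard-library sketch. All class names are hypothetical. A NamedTuple shares one default list across instances, while dataclass refuses a bare list default and points you to default_factory:

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class TaggedTuple(NamedTuple):  # hypothetical name
    name: str
    tags: list = []  # one list object shared by every instance


a = TaggedTuple("a")
b = TaggedTuple("b")
a.tags.append("x")
shared = b.tags == ["x"]  # b sees the item appended through a

# dataclass raises ValueError at class definition for a bare mutable default
try:
    @dataclass
    class BadDefault:  # hypothetical name
        tags: list = []
    rejected = False
except ValueError:
    rejected = True


@dataclass
class Tagged:  # hypothetical name
    name: str
    tags: list = field(default_factory=list)  # fresh list per instance


c = Tagged("c")
d = Tagged("d")
c.tags.append("x")

print(shared, rejected, d.tags)
```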


Entity Extraction with spaCy and LLMs

/* Completion modal */
.completion-overlay {
display: none;
position: fixed;
inset: 0;
background: rgba(0, 0, 0, 0.7);
z-index: 1000;
align-items: center;
justify-content: center;
padding: 1rem;
}

.completion-modal {
background: #2F2D2E;
border: 1px solid #4a4849;
border-radius: 16px;
max-width: 520px;
width: 100%;
padding: 2.5rem;
text-align: center;
position: relative;
}

.completion-modal-close {
position: absolute;
top: 1rem;
right: 1rem;
background: none;
border: none;
color: #999;
font-size: 1.25rem;
cursor: pointer;
padding: 0.25rem;
line-height: 1;
}

.completion-modal-close:hover {
color: #fff;
}

.completion-modal h2 {
color: #72BEFA;
font-size: 1.5rem;
margin-bottom: 0.5rem;
}

.completion-modal p {
color: #ccc;
margin-bottom: 1.5rem;
font-size: 0.95rem;
line-height: 1.5;
}

.completion-courses {
display: flex;
flex-direction: column;
gap: 0.75rem;
margin-bottom: 1.5rem;
}

.completion-course-card {
display: block;
background: #3d3b3c;
border: 1px solid #4a4849;
border-radius: 10px;
padding: 1rem 1.25rem;
text-decoration: none;
text-align: left;
transition: border-color 0.2s;
}

.completion-course-card:hover {
border-color: #72BEFA;
}

.completion-course-card .card-title {
color: #72BEFA;
font-size: 0.95rem;
font-weight: 600;
margin-bottom: 0.25rem;
}

.completion-course-card .card-desc {
color: #999;
font-size: 0.8rem;
}

.completion-browse {
display: inline-block;
color: #E583B6;
font-size: 0.9rem;
text-decoration: none;
}

.completion-browse:hover {
text-decoration: underline;
}

/* Responsive */
@media (max-width: 768px) {
.course-sidebar {
width: 100%;
position: relative;
height: auto;
}

.course-content {
margin-left: 0;
padding: 1.5rem;
}

.course-layout {
flex-direction: column;
}
}

Entity Extraction with spaCy and LLMs

Course outline

Getting Started
  What is Entity Extraction?
  Sample Document

The Manual Approach
  Why Not Use Regex?

spaCy NER
  Production-Grade Named Entity Recognition
  Exercise: Build a Contact List
  Extracting from Business Documents
  Exercise: Export Contact List
  Visualizing Entities with displaCy

GLiNER
  Zero-Shot Custom Entity Extraction
  Extracting Business Entities
  Exercise: Parse Business Metrics
  Using Confidence Scores for Quality Control
  Exercise: Route Low-Confidence to Review

langextract
  AI-Powered Extraction with Source Grounding
  Exercise: Analyze Customer Feedback
  Visualizing Extractions

Summary
  When to Use Each Tool

What is Entity Extraction?
Entity extraction (also called Named Entity Recognition or NER) automatically identifies and classifies key information from unstructured text. For instance, financial reports contain company names, monetary figures, executives, dates, and locations used for competitive analysis and executive tracking.

Extracting these entities manually is time-consuming and error-prone. Automated entity extraction provides a faster and more reliable alternative.

In this course, you’ll learn three modern tools for entity extraction:

spaCy: Production-ready NER with pre-trained models
GLiNER: Zero-shot custom entity recognition
langextract: AI-powered extraction with source grounding


Sample Document
Throughout this course, we’ll extract entities from this earnings report.

earning_report = """
Apple Inc. (NASDAQ: AAPL) reported third quarter revenue of $81.4 billion,
up 2% year over year. CEO Tim Cook stated that Services revenue reached
a new all-time high of $21.2 billion. The company's board of directors
declared a cash dividend of $0.24 per share.

CFO Luca Maestri mentioned that iPhone revenue was $39.3 billion for
the quarter ending June 30, 2023. The company expects total revenue
between $89 billion and $93 billion for the fourth quarter.

Apple's Cupertino headquarters announced the acquisition of AI startup
WaveOne for an undisclosed amount. The deal is expected to close in
Q4 2023, pending regulatory approval from the SEC.
"""

print("Earnings report loaded!")
print(f"Document length: {len(earning_report)} characters")

We chose this report because it’s dense with overlapping entity types, which is exactly what makes real-world extraction challenging:

Monetary amounts appear in different contexts: revenue ($81.4B), dividends ($0.24), and forecasted ranges ($89B-$93B)
Named entities overlap: “Apple Inc.” appears next to its stock ticker (AAPL), so one company is referenced in two forms, and “SEC” is an abbreviation that needs context to identify
Temporal references mix formats: exact dates (June 30, 2023), quarters (Q4 2023), and relative time (year over year)


Why Not Use Regex?
Regular expressions define text patterns using special syntax to find matches in strings. While they may seem like a natural first choice for entity extraction, they require a separate pattern for each entity type and fail when formats vary.

Here’s what extracting financial amounts, dates, stock symbols, and quarters with regex looks like:

import re

earning_report = """
Apple Inc. (NASDAQ: AAPL) reported third quarter revenue of $81.4 billion,
up 2% year over year. CEO Tim Cook stated that Services revenue reached
a new all-time high of $21.2 billion. CFO Luca Maestri mentioned that
iPhone revenue was $39.3 billion for the quarter ending June 30, 2023.
"""

# Each entity type needs a separate complex pattern
financial_pattern = r"\$(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.[0-9]+)?(?:\s*(?:billion|million|trillion))?"
date_pattern = r"\b(?:January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2},\s+\d{4}"
stock_pattern = r"\b(?:NASDAQ|NYSE|NYSEARCA):\s*[A-Z]{2,5}\b"
quarter_pattern = r"\b(Q[1-4]\s+\d{4})\b"

print("Financial amounts:", re.findall(financial_pattern, earning_report, re.IGNORECASE))
print("Dates:", re.findall(date_pattern, earning_report))
print("Stock symbols:", re.findall(stock_pattern, earning_report))
print("Quarters:", re.findall(quarter_pattern, earning_report))

From the code above, several limitations become apparent:

Each entity type requires its own pattern, resulting in verbose boilerplate code that is difficult to read and maintain.
The patterns only match numeric quarter formats like “Q4 2023” and miss textual forms such as “third quarter” unless additional exact-match patterns are added.
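
To make the second limitation concrete, here is a minimal sketch that runs the numeric quarter pattern against a sentence using both styles, then adds a second pattern for spelled-out quarters. The name textual_quarter_pattern and the sample sentence are illustrative assumptions, not part of the course code.

```python
import re

# Numeric quarter pattern, same shape as the one used in the regex example
quarter_pattern = r"\b(Q[1-4]\s+\d{4})\b"

# Hypothetical extra pattern needed to catch spelled-out quarters
textual_quarter_pattern = r"\b((?:first|second|third|fourth)\s+quarter)\b"

# Illustrative sentence mixing both quarter styles
sample = "Apple reported third quarter revenue and expects the deal to close in Q4 2023."

numeric_matches = re.findall(quarter_pattern, sample)
textual_matches = re.findall(textual_quarter_pattern, sample, re.IGNORECASE)

print("Numeric quarters:", numeric_matches)  # ['Q4 2023']
print("Textual quarters:", textual_matches)  # ['third quarter']
```

Every new phrasing ("3rd quarter", "fiscal Q3", "the September quarter") would need yet another pattern, which is the maintenance burden described above.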

Quiz

A document contains dates in formats like “January 15, 2024”, “15/01/2024”, and “2024-01-15”. What challenge does regex face here?

A
Regex cannot match numeric characters

B
Each date format requires a separate pattern, making the code harder to maintain as formats increase

C
Regex patterns are limited to 100 characters in length

⚠ Try Again
Not quite. Regex handles numeric characters easily with patterns like \d. The challenge is handling multiple format variations.

💡 Correct
Correct! Each date format (ISO, US, European, written) needs its own pattern. As formats multiply, the codebase grows harder to maintain and test.

⚠ Try Again
Not quite. Regex patterns have no practical length limit. The challenge is writing and maintaining patterns for every format variation.
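
The quiz scenario can be sketched in a few lines. The three patterns below are hand-written assumptions covering only the formats named in the question; they are illustrative and far from exhaustive.

```python
import re

# One pattern per date format; each new format means one more entry to maintain
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
date_patterns = {
    "written": rf"\b(?:{MONTHS})\s+\d{{1,2}},\s+\d{{4}}\b",
    "slash": r"\b\d{1,2}/\d{1,2}/\d{4}\b",
    "iso": r"\b\d{4}-\d{2}-\d{2}\b",
}

# Illustrative sentence containing the same date in all three formats
text = "Signed on January 15, 2024, filed 15/01/2024, archived 2024-01-15."

found = {name: re.findall(pattern, text) for name, pattern in date_patterns.items()}
for name, matches in found.items():
    print(f"{name}: {matches}")
```

Note that none of these patterns can tell whether 15/01/2024 is day-first or month-first; that ambiguity needs context, which regex does not have.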


Production-Grade Named Entity Recognition
spaCy provides pre-trained models that automatically identify entities like PERSON, ORG, MONEY, DATE, and PERCENT from context. No pattern writing required.

Let’s install spaCy and download a small English model to get started:

pip install spacy
python -m spacy download en_core_web_sm

Extracting entities with spaCy takes just two steps:

Load the model
Process your text

import spacy

# Load the model
nlp = spacy.load("en_core_web_sm")

# Process your text
sample_text = "Apple Inc. reported revenue of $81.4 billion with CEO Tim Cook."
doc = nlp(sample_text)

print("Entities found:")
for ent in doc.ents:
    print(f"  '{ent.text}' -> {ent.label_}")

💡 What the output shows

spaCy extracted three entity types (ORG, MONEY, PERSON) without any configuration
The model understood that “Apple Inc.” is a company, not just a fruit
It captured the complete monetary amount “$81.4 billion” including the unit
Person names are recognized even without titles like “CEO”

How spaCy NER Works

spaCy labels each token individually using its BILUO tagging scheme, then groups consecutive entity tokens into spans:

Token       Tag         Entity span
"Apple"     B-ORG       ┐
"Inc."      L-ORG       ┘ "Apple Inc." → ORG
"CEO"       O
"Tim"       B-PERSON    ┐
"Cook"      L-PERSON    ┘ "Tim Cook" → PERSON
"$81.4"     B-MONEY     ┐
"billion"   L-MONEY     ┘ "$81.4 billion" → MONEY

Begin / Inside / Last mark the first, middle, and final tokens of a multi-token entity
Unit marks a single-token entity (e.g., “London” → U-GPE)
O (Outside) marks tokens that are not part of any entity

The model learns these tagging patterns from thousands of labeled examples during training.
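
The grouping step can be sketched in plain Python. This is a simplified illustration, not spaCy's implementation: the function name biluo_to_spans is hypothetical, the tags are hand-written to mirror the diagram (using spaCy's PERSON label), and tokens are joined with spaces, whereas spaCy tracks character offsets.

```python
def biluo_to_spans(tokens, tags):
    """Group BILUO-tagged tokens into (text, label) entity spans."""
    spans = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Outside any entity: discard any unfinished span
            current_tokens, current_label = [], None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "U":
            # Unit: a complete single-token entity
            spans.append((token, label))
        elif prefix == "B":
            # Begin: start collecting a multi-token entity
            current_tokens, current_label = [token], label
        elif prefix in ("I", "L") and label == current_label:
            current_tokens.append(token)
            if prefix == "L":
                # Last: close the span and emit it
                spans.append((" ".join(current_tokens), label))
                current_tokens, current_label = [], None
    return spans


tokens = ["Apple", "Inc.", "CEO", "Tim", "Cook", "$81.4", "billion"]
tags = ["B-ORG", "L-ORG", "O", "B-PERSON", "L-PERSON", "B-MONEY", "L-MONEY"]

spans = biluo_to_spans(tokens, tags)
print(spans)
# [('Apple Inc.', 'ORG'), ('Tim Cook', 'PERSON'), ('$81.4 billion', 'MONEY')]
```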

Quiz

How does spaCy determine that “Apple Inc.” is an ORG entity?

A
It matches against a built-in dictionary of known company names

B
It uses regex to match common organization name patterns

C
The pre-trained model learned patterns from labeled training data

⚠ Try Again
Not quite. spaCy doesn’t use a fixed lookup table. It uses a statistical model that can recognize entities it has never seen before based on learned patterns.

⚠ Try Again
Not quite. Regex uses fixed text patterns. spaCy’s NER model uses neural networks trained on annotated text to predict entity types from context.

💡 Correct
Correct! spaCy’s NER is a statistical model trained on annotated text. It learned patterns like capitalization, surrounding words, and name structures from its training data, not from a fixed list or regex rules.


Exercise: Build a Contact List

Scenario
The sales team wants to build a contact database from meeting notes. They only need people’s names, not dates or other information.

Task
Extract only PERSON entities into a list.

💡 Hint
Use ent.label_ to check an entity’s type.

Starter code (fill in the ___ placeholder)

import spacy
nlp = spacy.load("en_core_web_sm")

text = "On January 15, 2024, Sarah Johnson and Michael Chen signed the contract."
doc = nlp(text)

contacts = []
for ent in doc.ents:
    if ___:
        contacts.append(ent.text)
print(contacts)

Extracting from Business Documents
First, create a helper function that extracts entities and groups them by type:

import spacy
from collections import defaultdict

nlp = spacy.load("en_core_web_sm")

def extract_entities_spacy(text):
    """Extract business entities using spaCy NER."""
    doc = nlp(text)
    entities = defaultdict(list)
    for ent in doc.ents:
        entities[ent.label_].append(ent.text)
    return dict(entities)

Here’s an earnings report with companies, executives, financial figures, and dates:

earning_report = """
Apple Inc. (NASDAQ: AAPL) reported third quarter revenue of $81.4 billion,
up 2% year over year. CEO Tim Cook stated that Services revenue reached
a new all-time high of $21.2 billion. The company's board of directors
declared a cash dividend of $0.24 per share.

CFO Luca Maestri mentioned that iPhone revenue was $39.3 billion for
the quarter ending June 30, 2023. The company expects total revenue
between $89 billion and $93 billion for the fourth quarter.

Apple's Cupertino headquarters announced the acquisition of AI startup
WaveOne for an undisclosed amount. The deal is expected to close in
Q4 2023, pending regulatory approval from the SEC.
"""

Extract and display all entities found:

spacy_entities = extract_entities_spacy(earning_report)

for entity_type, entities_list in spacy_entities.items():
    print(f"\n{entity_type} ({len(entities_list)} found):")
    for entity in entities_list:
        print(f"  {entity}")

💡 What the output shows

spaCy extracted 20+ entities across 6 different types from a single document
It recognized textual dates like “third quarter” and “the quarter ending June 30, 2023”
All five monetary values were captured with their full amounts
However, some domain terms are misclassified: “iPhone” as ORG and “AI” as GPE (location)

Quiz

Why did spaCy classify “iPhone” as ORG instead of a product?

A
spaCy doesn’t have a PRODUCT entity type

B
The small model lacks sufficient training examples to recognize iPhone as a product

C
iPhone is actually a company name

⚠ Try Again
Not quite. spaCy does have a PRODUCT entity type, but the small model wasn’t trained with enough product examples.

💡 Correct
Correct! While spaCy has a PRODUCT entity type, the en_core_web_sm model wasn’t trained with enough product name examples. It predicts ORG because capitalized terms like company names are more common in its training data.

⚠ Try Again
Not quite. iPhone is Apple’s smartphone product, not a company.


Exercise: Export Contact List

Scenario
HR needs a spreadsheet of all people mentioned in meeting notes, with their mention positions for reference.

Task
Create a DataFrame with only PERSON entities, columns: name, position.

💡 Hint
Filter by ent.label_ and use ent.start_char for position.

Starter code (fill in the ___ placeholders)

import spacy
import pandas as pd

nlp = spacy.load("en_core_web_sm")
text = "Sarah Chen joined in 2020. She met with David Kim and Lisa Park yesterday."
doc = nlp(text)

contacts = []
for ent in doc.ents:
    if ent.label_ == "___":
        contacts.append({"name": ___, "position": ___})

df = pd.DataFrame(contacts)
print(df)

Visualizing Entities with displaCy
spaCy includes displaCy, a built-in visualizer that highlights entities directly in your text. This helps you quickly verify extraction results and debug misclassifications.

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")

text = "Apple Inc. CEO Tim Cook announced $81.4 billion in revenue for Q4 2023."
doc = nlp(text)

The displacy.render() function generates an HTML visualization with color-coded entity labels:

displacy.render(doc, style="ent")

💡 What the output shows

Each entity type has a distinct color: teal for ORG, purple for PERSON, beige for MONEY, and mint for DATE
Labels appear inline next to each entity, making it easy to verify classifications at a glance

When documents contain many entities, you can filter to show only specific types using the options parameter:

displacy.render(doc, style="ent", options={"ents": ["PERSON", "ORG", "MONEY"]})

💡 What the output shows
“Q4 2023” is no longer highlighted since DATE was excluded from the filter.


Zero-Shot Custom Entity Extraction
GLiNER solves spaCy’s limitation of fixed entity types. Instead of being locked into categories like ORG or GPE, GLiNER lets you define custom types using natural language descriptions.

pip install gliner

GLiNER offers several pretrained models. We’ll use gliner_small-v2.1 with threshold=0.3, which keeps only entities whose confidence score is at least 0.3:

from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")

test_text = "Apple Inc. CEO Tim Cook announced quarterly revenue of $81.4 billion."
custom_types = ["Company", "Person", "Currency"]

entities = model.predict_entities(test_text, custom_types, threshold=0.3)

for entity in entities:
    print(f"'{entity['text']}' -> {entity['label']} (confidence: {entity['score']:.3f})")

💡 What the output shows

GLiNER recognized custom entity types without any training
Confidence scores vary: “Tim Cook” (0.563) scores highest as names are distinctive, while “$81.4 billion” (0.310) scores lower because “Currency” is a less common label

📝 Other model options
For higher accuracy, try gliner_medium-v2.1. For multilingual support, use gliner_multi-v2.1.

How GLiNER Works

Instead of tagging individual tokens, GLiNER scores entire spans against every label you provide. The highest-scoring label wins, and spans below your threshold are filtered out:

┌──────────────┬───────────┬──────────────────┐
│ Span │ Label │ Confidence │
├──────────────┼───────────┼──────────────────┤
│ Apple Inc │ Company │ ████░░░░░░░ 0.36 │ ✓ above 0.3
│ Apple Inc │ Person │ █░░░░░░░░░░ 0.05 │ ✗
├──────────────┼───────────┼──────────────────┤
│ Tim Cook │ Company │ █░░░░░░░░░░ 0.04 │ ✗
│ Tim Cook │ Person │ ██████░░░░░ 0.56 │ ✓ above 0.3
├──────────────┼───────────┼──────────────────┤
│ $81.4 billion│ Company │ ░░░░░░░░░░░ 0.01 │ ✗
│ $81.4 billion│ Currency │ ███░░░░░░░░ 0.31 │ ✓ above 0.3
└──────────────┴───────────┴──────────────────┘
threshold = 0.3 ▲

This gives you two controls spaCy doesn’t: custom labels (any text, not a fixed set) and a confidence threshold to filter results.
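
The selection rule can be sketched without loading a model. The scores below are copied from the table above rather than computed, and select_entities is a hypothetical helper. Treating the threshold as inclusive is also an assumption, and GLiNER's own decoding (including how it resolves overlapping spans) is more involved than this.

```python
# Span/label scores copied from the table above (not produced by a model here)
span_scores = {
    "Apple Inc": {"Company": 0.36, "Person": 0.05},
    "Tim Cook": {"Company": 0.04, "Person": 0.56},
    "$81.4 billion": {"Company": 0.01, "Currency": 0.31},
}


def select_entities(span_scores, threshold=0.3):
    """Keep the best-scoring label per span if it clears the threshold."""
    selected = []
    for span, scores in span_scores.items():
        # Highest-scoring label wins for this span
        label, score = max(scores.items(), key=lambda item: item[1])
        if score >= threshold:
            selected.append({"text": span, "label": label, "score": score})
    return selected


default_result = select_entities(span_scores, threshold=0.3)
strict_result = select_entities(span_scores, threshold=0.35)

for entity in default_result:
    print(f"'{entity['text']}' -> {entity['label']} ({entity['score']:.2f})")
print("Kept at 0.35:", [entity["text"] for entity in strict_result])
```

Raising the threshold from 0.3 to 0.35 drops “$81.4 billion” (0.31) while keeping the other two spans, which shows how the threshold trades recall for precision.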

Quiz

How does GLiNER decide which label to assign to a text span?

A
It picks the first label in your list that partially matches

B
It scores the span against every label and picks the highest

C
It uses a dictionary lookup to map known words to labels

⚠ Try Again
Not quite. The order of labels in your list doesn’t affect the result. GLiNER evaluates all labels equally for each span.

💡 Correct
Correct! As shown in the diagram, each span is scored against all labels. “Apple Inc” scored 0.36 for Company, 0.05 for Person, and 0.02 for Currency. The highest score (Company) wins.

⚠ Try Again
Not quite. GLiNER doesn’t use a fixed dictionary. It uses a BERT-like encoder to compare text spans against label descriptions semantically.


Extracting Business Entities
First, define entity types specific to financial documents:

from gliner import GLiNER
from collections import defaultdict

model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")

business_entities = [
    "Company", "Executive", "Monetary Value", "Product",
    "Startup", "Quarter", "Location",
]

We’ll use the same earnings report from the spaCy section:

earning_report = """
Apple Inc. (NASDAQ: AAPL) reported third quarter revenue of $81.4 billion,
up 2% year over year. CEO Tim Cook stated that Services revenue reached
a new all-time high of $21.2 billion. The company's board of directors
declared a cash dividend of $0.24 per share.

CFO Luca Maestri mentioned that iPhone revenue was $39.3 billion for
the quarter ending June 30, 2023. The company expects total revenue
between $89 billion and $93 billion for the fourth quarter.

Apple's Cupertino headquarters announced the acquisition of AI startup
WaveOne for an undisclosed amount. The deal is expected to close in
Q4 2023, pending regulatory approval from the SEC.
"""

Extract entities and group them by type:

entities = model.predict_entities(earning_report, business_entities, threshold=0.3)

grouped = defaultdict(list)
for entity in entities:
    grouped[entity['label']].append({
        'text': entity['text'],
        'confidence': round(entity['score'], 3)
    })

for entity_type, entities_list in grouped.items():
    print(f"\n{entity_type.upper()} ({len(entities_list)} found):")
    for entity in entities_list:
        print(f"  '{entity['text']}' (confidence: {entity['confidence']})")

💡 What the output shows

GLiNER assigned labels that spaCy has no category for: “WaveOne” as STARTUP and “third quarter” as QUARTER (spaCy could only tag the latter as a generic DATE)
“iPhone” is now PRODUCT instead of ORG
“Cupertino headquarters” was captured as a complete LOCATION phrase

Quiz

“Apple Inc.” scored 0.908 while “$0.24 per share” scored 0.302. What explains this gap?

A
GLiNER prioritizes company names over monetary values

B
“Apple Inc.” closely matches the “Company” label, while “$0.24 per share” is an unusual format for “Monetary Value”

C
Shorter text spans always score higher than longer ones

⚠ Try Again
Not quite. GLiNER doesn’t prioritize any entity type. All labels are scored equally for each span.

💡 Correct
Correct! Confidence reflects how well a span matches its label semantically. “Apple Inc.” is a clear company name, while “$0.24 per share” includes extra context that makes it a less typical match for “Monetary Value.”

⚠ Try Again
Not quite. Span length doesn’t determine the score. “Cupertino headquarters” (long) scored 0.781 while “third quarter” (short) scored only 0.48.


Exercise: Parse Business Metrics

Scenario
The BI team needs to automatically extract KPIs from quarterly reports to populate dashboards. They want to capture metric names and time periods from business summaries.

Task
Extract metrics and time periods from a business report using custom labels.

💡 Hint
Use labels like "Metric" and "Time Period" to capture business KPIs and dates.

Starter code (fill in the ___ placeholders)

from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")

report = "Monthly active users grew 15% in Q3. Customer churn dropped to 2.1% last quarter."
labels = [___, ___]

entities = model.predict_entities(report, labels, threshold=0.3)

for ent in entities:
    print(f"{ent['label']}: {ent['text']}")

Using Confidence Scores for Quality Control
To implement quality control, categorize entities by confidence and flag low-scoring matches for manual review.

First, extract entities with a low threshold to capture all potential matches:

from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")

text = "Apple and Java announced a partnership at the Oracle conference in Austin."
labels = ["Company", "Product", "Location", "Programming Language"]

entities = model.predict_entities(text, labels, threshold=0.3)

Sort entities into two groups based on a 0.5 confidence threshold:

High confidence: Entities scoring 0.5 or above
Needs review: Entities below 0.5 that require manual check

high_confidence = []
needs_review = []

for ent in entities:
    if ent['score'] >= 0.5:
        high_confidence.append(ent)
    else:
        needs_review.append(ent)

print("High confidence (score >= 0.5):")
for ent in high_confidence:
    print(f"  '{ent['text']}' -> {ent['label']}")

print("\nNeeds review (score < 0.5):")
for ent in needs_review:
    print(f"  '{ent['text']}' -> {ent['label']} (score: {ent['score']:.3f})")

💡 What the output shows

“Apple” scores high (0.798) because it’s unambiguously a company in this context
“Java” scores low (0.366) because it could mean the programming language, coffee brand, or Indonesian island
The model correctly flags ambiguous terms for human review
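
The same routing can be wrapped in a reusable function. The sketch below is one possible shape, not the course's implementation: route_by_confidence is a hypothetical helper, and the sample entities are hand-written dictionaries reusing scores quoted in the Extracting Business Entities lesson, so no model is loaded. Sorting the review queue by ascending score is an added design choice so reviewers see the least certain items first.

```python
def route_by_confidence(entities, review_below=0.5):
    """Split entities into auto-accepted and needs-review lists."""
    accepted = [ent for ent in entities if ent["score"] >= review_below]
    # Least confident first, so reviewers start with the riskiest items
    needs_review = sorted(
        (ent for ent in entities if ent["score"] < review_below),
        key=lambda ent: ent["score"],
    )
    return accepted, needs_review


# Hand-written entities in the same dict shape that predict_entities returns
sample_entities = [
    {"text": "Apple Inc.", "label": "Company", "score": 0.908},
    {"text": "third quarter", "label": "Quarter", "score": 0.48},
    {"text": "Cupertino headquarters", "label": "Location", "score": 0.781},
    {"text": "$0.24 per share", "label": "Monetary Value", "score": 0.302},
]

accepted, needs_review = route_by_confidence(sample_entities)

print("Accepted:", [ent["text"] for ent in accepted])
print("Needs review:", [(ent["text"], ent["score"]) for ent in needs_review])
```

The 0.5 cutoff is a starting point; in practice it would be tuned against a sample of manually checked extractions.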

Quiz

What is the key advantage of GLiNER over spaCy?

A
GLiNER is faster than spaCy

B
GLiNER allows custom entity types without training data

C
GLiNER has more pre-trained models

⚠ Try Again
Not quite. spaCy is typically faster. GLiNER’s advantage is flexibility, not speed.

💡 Correct
Correct! GLiNER’s zero-shot learning lets you define custom entity types like “startup” or “regulatory_body” using natural language descriptions, without needing training examples.

⚠ Try Again
Not quite. The advantage is the ability to define custom entity types, not the number of models available.


Exercise: Route Low-Confidence to Review

Scenario
Your data pipeline extracts entities from customer emails. Ambiguous extractions need human review before updating the CRM.

Task
Create a needs_review list containing entities with score < 0.5, storing tuples of (text, label, score).

💡 Hint
Each entity has ent['text'], ent['label'], and ent['score'] keys.

Starter code (fill in the ___ placeholders)

from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")

text = "Notify the platform team about the Singapore deployment delay."
labels = ["Team", "Location", "Event"]

entities = model.predict_entities(text, labels, threshold=0.3)

needs_review = []
for ent in entities:
    if ___:
        needs_review.append((___, ___, ___))

for text, label, score in needs_review:
    print(f"Review: '{text}' [{label}] (score: {score:.3f})")

AI-Powered Extraction with Source Grounding
langextract uses large language models (Gemini, GPT) to understand entity relationships and provide source attribution.

It captures semantic context like “AI startup WaveOne” (category + name) and “between $89 billion and $93 billion” (revenue ranges) as complete phrases rather than separate pieces.

Let’s install langextract along with its dependencies to try it out:

pip install langextract python-dotenv google-genai

To authenticate, add your API key to a .env file. This course uses Gemini (get a key from AI Studio), but OpenAI models also work:

# .env file
LANGEXTRACT_API_KEY=your-api-key-here

langextract uses an LLM to extract entities. You provide examples that teach the model what to look for and how to format the output:

Example (you provide):
┌─────────────────────────────────────────────────────┐
│ Text: "Microsoft Corp. CEO Satya Nadella reported │
│ Q2 2024 revenue of $65B" │
│ │
│ Extractions: │
│ company → "Microsoft Corp." │
│ executive → "CEO Satya Nadella" ← role + name │
│ quarter → "Q2 2024" │
│ financial → "$65B" │
└──────────────────────┬──────────────────────────────┘
                       │ teaches format
                       ▼
New text: "Apple Inc… CEO Tim Cook… $81.4 billion"
                       │
                       ▼
Output (model generates):
┌─────────────────────────────────────────────────────┐
│ company → "Apple Inc." │
│ executive → "CEO Tim Cook" ← same format │
│ executive → "CFO Luca Maestri" ← generalized │
│ financial → "undisclosed amount" ← semantic │
└─────────────────────────────────────────────────────┘

The LLM generalizes from your examples. One example showing “CEO Satya Nadella” is enough for it to also extract “CFO Luca Maestri” and understand “undisclosed amount” as a financial figure, something spaCy and GLiNER would miss.

Few-Shot Learning with Examples

To use langextract, provide two components:

Prompt: A description listing entity types to extract (companies, executives, financial figures)
Examples: Sample text paired with labeled extractions showing expected output

Python

Run

aW1wb3J0IG9zCmZyb20gZG90ZW52IGltcG9ydCBsb2FkX2RvdGVudgppbXBvcnQgbGFuZ2V4dHJhY3QgYXMgbHgKZnJvbSBsYW5nZXh0cmFjdCBpbXBvcnQgZXh0cmFjdAoKbG9hZF9kb3RlbnYoKQoKZGVmIGV4dHJhY3RfZmluYW5jaWFsX2VudGl0aWVzKHRleHQpOgogICAgIiIiRXh0cmFjdCBlbnRpdGllcyB1c2luZyBsYW5nZXh0cmFjdC4iIiIKICAgIHByb21wdF9kZXNjcmlwdGlvbiA9ICIiIkV4dHJhY3QgYnVzaW5lc3MgZW50aXRpZXM6IGNvbXBhbmllcywgZXhlY3V0aXZlcywKICAgIGZpbmFuY2lhbCBmaWd1cmVzLCBxdWFydGVycywgbG9jYXRpb25zLCBwcm9kdWN0cywgc3RhcnR1cHMsCiAgICByZWd1bGF0b3J5IGJvZGllcywgc3RvY2tfc3ltYm9scywgbWFya2V0X3JlYWN0aW9uLiIiIgoKICAgIGV4YW1wbGVzID0gWwogICAgICAgIGx4LmRhdGEuRXhhbXBsZURhdGEoCiAgICAgICAgICAgIHRleHQ9Ik1pY3Jvc29mdCBDb3JwLiAoTllTRTogTVNGVCkgQ0VPIFNhdHlhIE5hZGVsbGEgcmVwb3J0ZWQgUTIgMjAyNCByZXZlbnVlIG9mICQ2NUIsIGRvd24gNSUgcXVhcnRlci1vdmVyLXF1YXJ0ZXIuIiwKICAgICAgICAgICAgZXh0cmFjdGlvbnM9WwogICAgICAgICAgICAgICAgbHguZGF0YS5FeHRyYWN0aW9uKGV4dHJhY3Rpb25fY2xhc3M9ImNvbXBhbnkiLCBleHRyYWN0aW9uX3RleHQ9Ik1pY3Jvc29mdCBDb3JwLiIpLAogICAgICAgICAgICAgICAgbHguZGF0YS5FeHRyYWN0aW9uKGV4dHJhY3Rpb25fY2xhc3M9ImV4ZWN1dGl2ZSIsIGV4dHJhY3Rpb25fdGV4dD0iQ0VPIFNhdHlhIE5hZGVsbGEiKSwKICAgICAgICAgICAgICAgIGx4LmRhdGEuRXh0cmFjdGlvbihleHRyYWN0aW9uX2NsYXNzPSJzdG9ja19zeW1ib2wiLCBleHRyYWN0aW9uX3RleHQ9Ik5ZU0U6IE1TRlQiKSwKICAgICAgICAgICAgICAgIGx4LmRhdGEuRXh0cmFjdGlvbihleHRyYWN0aW9uX2NsYXNzPSJxdWFydGVyIiwgZXh0cmFjdGlvbl90ZXh0PSJRMiAyMDI0IiksCiAgICAgICAgICAgICAgICBseC5kYXRhLkV4dHJhY3Rpb24oZXh0cmFjdGlvbl9jbGFzcz0iZmluYW5jaWFsX2ZpZ3VyZSIsIGV4dHJhY3Rpb25fdGV4dD0iJDY1QiIpLAogICAgICAgICAgICAgICAgbHguZGF0YS5FeHRyYWN0aW9uKGV4dHJhY3Rpb25fY2xhc3M9Im1hcmtldF9yZWFjdGlvbiIsIGV4dHJhY3Rpb25fdGV4dD0iZG93biA1JSBxdWFydGVyLW92ZXItcXVhcnRlciIpLAogICAgICAgICAgICBdCiAgICAgICAgKQogICAgXQoKICAgIHJldHVybiBleHRyYWN0KAogICAgICAgIHRleHRfb3JfZG9jdW1lbnRzPXRleHQsCiAgICAgICAgcHJvbXB0X2Rlc2NyaXB0aW9uPXByb21wdF9kZXNjcmlwdGlvbiwKICAgICAgICBleGFtcGxlcz1leGFtcGxlcywKICAgICAgICBtb2RlbF9pZD0iZ2VtaW5pLTIuNS1mbGFzaCIKICAgICk=

Now extract entities from the earnings report:

Python

from collections import defaultdict

earning_report = """
Apple Inc. (NASDAQ: AAPL) reported third quarter revenue of $81.4 billion,
up 2% year over year. CEO Tim Cook stated that Services revenue reached
a new all-time high of $21.2 billion. The company's board of directors
declared a cash dividend of $0.24 per share.

CFO Luca Maestri mentioned that iPhone revenue was $39.3 billion for
the quarter ending June 30, 2023. The company expects total revenue
between $89 billion and $93 billion for the fourth quarter.

Apple's Cupertino headquarters announced the acquisition of AI startup
WaveOne for an undisclosed amount. The deal is expected to close in
Q4 2023, pending regulatory approval from the SEC.
"""

result = extract_financial_entities(earning_report)

non_empty = [e for e in result.extractions if e.extraction_text]
print(f"Extracted {len(non_empty)} entities:")

grouped = defaultdict(list)
for extraction in result.extractions:
    if extraction.extraction_text:  # Filter empty extractions
        grouped[extraction.extraction_class].append(extraction.extraction_text)

for entity_class, texts in grouped.items():
    print(f"\n{entity_class.upper()} ({len(texts)} found):")
    for text in texts:
        print(f"  '{text}'")

💡 What the output shows

Role-linked executives (“CEO Tim Cook”) instead of just the name
Semantic understanding of “undisclosed amount” as a financial figure
Market reaction “up 2% year over year” captured with full context
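The grouping step itself needs no API call, so it can be tried offline. Below is a minimal sketch, assuming stand-in objects that expose only `extraction_class` and `extraction_text`. The `fake_extractions` list and the `group_by_class` helper are hypothetical: the sample entries reuse entities named in this lesson and are not real langextract output.

```python
from collections import defaultdict
from types import SimpleNamespace

# Hypothetical stand-ins for langextract extractions (not real API output).
fake_extractions = [
    SimpleNamespace(extraction_class="executive", extraction_text="CEO Tim Cook"),
    SimpleNamespace(extraction_class="executive", extraction_text="CFO Luca Maestri"),
    SimpleNamespace(extraction_class="financial_figure", extraction_text="undisclosed amount"),
    SimpleNamespace(extraction_class="market_reaction", extraction_text="up 2% year over year"),
    SimpleNamespace(extraction_class="product", extraction_text=""),  # empty, gets filtered
]

def group_by_class(extractions):
    """Group non-empty extraction texts under their extraction class."""
    grouped = defaultdict(list)
    for extraction in extractions:
        if extraction.extraction_text:  # skip empty extractions
            grouped[extraction.extraction_class].append(extraction.extraction_text)
    return dict(grouped)

grouped = group_by_class(fake_extractions)
for entity_class, texts in grouped.items():
    print(f"{entity_class.upper()} ({len(texts)} found): {texts}")
```

Because the empty `product` entry is skipped, that class never appears in the grouped result.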

Quiz

The example extracts “CEO Satya Nadella” as an executive. How does this affect the model’s output?

A. The model will only extract executives from Microsoft
B. The model learns to include the role (CEO/CFO) with the name
C. The model copies the exact format and ignores other patterns

Feedback

A. ⚠ Not quite. The example teaches a pattern, not a specific company. The model applied the same pattern to extract “CEO Tim Cook” and “CFO Luca Maestri” from Apple’s report.
B. 💡 Correct! The few-shot example teaches the model what format to use. Since the example linked the role to the name, the model did the same for “CEO Tim Cook” and “CFO Luca Maestri.”
C. ⚠ Not quite. The model generalizes from the example. It extracted “CFO Luca Maestri” even though the example only showed a CEO pattern.

langextract extracted “undisclosed amount” as a financial figure. Why would spaCy and GLiNER likely miss this?

A. “undisclosed amount” is too long for token-based models
B. It contains no numbers or currency symbols, which pattern-based models rely on to identify financial entities
C. spaCy and GLiNER can’t process sentences about acquisitions

Feedback

A. ⚠ Not quite. Both spaCy and GLiNER handle multi-token spans. “Cupertino headquarters” was captured as a two-word span by GLiNER.
B. 💡 Correct! spaCy’s MONEY type and GLiNER’s “Monetary Value” label both depend on numeric patterns. langextract’s LLM understands that “undisclosed amount” refers to money semantically, even without numbers.
C. ⚠ Not quite. Both tools can process any text. The issue is that “undisclosed amount” lacks the numeric patterns these models use to identify financial entities.
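
To make the contrast concrete, here is a deliberately simplified stand-in for number-driven detection. It is a plain regular expression, not how spaCy or GLiNER work internally (both are learned models), and the `MONEY_PATTERN` name and pattern are assumptions made only for illustration. It shows that a detector keyed on currency symbols and digits finds nothing in “undisclosed amount”:

```python
import re

# Hypothetical pattern: a dollar sign, digits, and an optional magnitude word.
MONEY_PATTERN = re.compile(r"\$\d+(?:\.\d+)?(?:\s?(?:billion|million|B|M))?")

def find_money(text):
    """Return every substring that looks like a dollar amount."""
    return MONEY_PATTERN.findall(text)

numeric = "Apple reported third quarter revenue of $81.4 billion."
non_numeric = "Apple acquired WaveOne for an undisclosed amount."

print(find_money(numeric))      # ['$81.4 billion']
print(find_money(non_numeric))  # []
```

Recognizing the second sentence as a financial mention requires reading the phrase in context, which is the gap the LLM-backed approach fills.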


Exercise: Analyze Customer Feedback

Scenario: The product team reviews app store feedback to prioritize fixes. They need to identify which feature users mention and whether the feedback is positive or negative.

Task: Complete the example by identifying what text to extract for each label. Running the code requires an AI Studio key, supplied as the Langextract API key.

💡 Hint: Read the example: “Love the calendar sync, hate the notification sounds.” What words are features? What words express how the user feels?

from collections import defaultdict
import langextract as lx
from langextract import extract

feedback = "The new dark mode is amazing! But the search function is painfully slow."

examples = [
    lx.data.ExampleData(
        text="Love the calendar sync, hate the notification sounds.",
        extractions=[
            lx.data.Extraction(extraction_class="feature", extraction_text="calendar sync"),
            lx.data.Extraction(extraction_class="sentiment", extraction_text="Love"),
            lx.data.Extraction(extraction_class="feature", extraction_text="___"),
            lx.data.Extraction(extraction_class="sentiment", extraction_text="___"),
        ]
    )
]

result = extract(
    text_or_documents=feedback,
    prompt_description="Extract features mentioned and user sentiment.",
    examples=examples,
    model_id="gemini-2.5-flash"
)

entities = defaultdict(list)
for e in result.extractions:
    if e.extraction_text:
        entities[e.extraction_class].append(e.extraction_text)
print(dict(entities))


Visualizing Extractions
langextract can generate an interactive HTML visualization where each entity is color-coded and hoverable. First, save the results to a JSONL file, then generate the visualization:

Python

import langextract as lx
from langextract import extract

text = "Apple CEO Tim Cook reported $81.4 billion in Q3 2023 revenue."

result = extract(
    text_or_documents=text,
    prompt_description="Extract companies, executives, and financial figures.",
    examples=[
        lx.data.ExampleData(
            text="Microsoft CEO Satya Nadella reported $65B revenue.",
            extractions=[
                lx.data.Extraction(extraction_class="company", extraction_text="Microsoft"),
                lx.data.Extraction(extraction_class="executive", extraction_text="CEO Satya Nadella"),
                lx.data.Extraction(extraction_class="financial_figure", extraction_text="$65B"),
            ]
        )
    ],
    model_id="gemini-2.5-flash"
)

# Save results and generate interactive visualization
lx.io.save_annotated_documents([result], output_name="extractions.jsonl", output_dir=".")
html_content = lx.visualize("extractions.jsonl")

with open("visualization.html", "w") as f:
    if hasattr(html_content, 'data'):
        f.write(html_content.data)
    else:
        f.write(html_content)

💡 What the output shows

Each entity type gets a distinct color in the visualization
Hovering over highlighted text shows the extraction class and any attributes
The full source text is displayed with all entities highlighted inline
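
The idea behind the color-coded view can be sketched in a few lines of standard-library Python. This is not langextract's renderer: `render_highlights`, the `COLORS` table, and the spans built with `str.find` are all hypothetical, chosen only to show how inline highlights with hover text can be produced from non-overlapping (start, end, class) spans.

```python
import html

# Hypothetical color per extraction class.
COLORS = {"company": "#c3e88d", "executive": "#82aaff", "financial_figure": "#ffcb6b"}

def render_highlights(text, spans):
    """Wrap each non-overlapping (start, end, cls) span in a colored <mark>."""
    parts, cursor = [], 0
    for start, end, cls in sorted(spans):
        parts.append(html.escape(text[cursor:start]))  # plain text before the span
        color = COLORS.get(cls, "#dddddd")
        parts.append(
            f'<mark style="background:{color}" title="{html.escape(cls)}">'
            f"{html.escape(text[start:end])}</mark>"
        )
        cursor = end
    parts.append(html.escape(text[cursor:]))  # trailing plain text
    return "".join(parts)

text = "Apple CEO Tim Cook reported $81.4 billion in Q3 2023 revenue."

# Locate each phrase with str.find so the offsets always match the text.
spans = []
for phrase, cls in [("Apple", "company"), ("CEO Tim Cook", "executive"),
                    ("$81.4 billion", "financial_figure")]:
    start = text.find(phrase)
    spans.append((start, start + len(phrase), cls))

page = render_highlights(text, spans)
print(page)
```

The `title` attribute is what a browser shows on hover, which mirrors the hover behavior described above in the simplest possible way.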

Quiz

What does langextract use under the hood to extract entities?

A. Pre-trained NER models to classify tokens
B. LLMs to interpret context and extract structured entities
C. Regex patterns enhanced with machine learning

Feedback

A. ⚠ Not quite. Pre-trained NER models are what spaCy and GLiNER use. langextract takes a different approach by sending text to large language models.
B. 💡 Correct! langextract sends your text along with a prompt description and few-shot examples to an LLM (like Gemini or GPT), which interprets the context to extract structured entities.
C. ⚠ Not quite. Regex uses fixed patterns. langextract relies on LLMs, which understand context and can extract entities they’ve never seen before.


When to Use Each Tool
Now that you’ve seen all three tools in action, here’s how they compare across key dimensions to help you decide which fits your workflow:

| Feature | spaCy | GLiNER | langextract |
|---|---|---|---|
| Setup | Model download | Model download | API key |
| Speed | Fast | Moderate | Slower (API) |
| Cost | Free | Free | Per-request |
| Privacy | Local | Local | Cloud API |
| Custom Types | Limited | Zero-shot | Few-shot |
| Context Understanding | Basic | Good | Best |

Here’s when to reach for each tool:

Start with spaCy if your entities fit standard types (PERSON, ORG, MONEY). It’s fast, free, and runs locally.
Move to GLiNER when you need custom entity types. It adds zero-shot flexibility while still running locally.
Use langextract when you need the deepest context understanding. It captures relationships and nuance that local models miss, at the cost of API calls.
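
The guidance above can be condensed into a small rule of thumb. The `choose_tool` function and its parameter names are hypothetical, and falling back to GLiNER when data must stay local is one reading of the table's Privacy row rather than a rule stated in this lesson:

```python
def choose_tool(needs_custom_types=False, needs_deep_context=False, must_stay_local=False):
    """Illustrative rule of thumb for picking an entity extraction tool."""
    if needs_deep_context and not must_stay_local:
        return "langextract"  # LLM-backed, paid API calls, deepest context understanding
    if needs_custom_types or needs_deep_context:
        return "GLiNER"       # zero-shot custom labels while still running locally
    return "spaCy"            # standard types: fast, free, local

print(choose_tool())                                               # spaCy
print(choose_tool(needs_custom_types=True))                        # GLiNER
print(choose_tool(needs_deep_context=True))                        # langextract
print(choose_tool(needs_deep_context=True, must_stay_local=True))  # GLiNER
```

In practice the decision also depends on volume and budget, which this sketch leaves out.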


DuckDB for Data Scientists

.completion-modal p {
color: #ccc;
margin-bottom: 1.5rem;
font-size: 0.95rem;
line-height: 1.5;
}

.completion-courses {
display: flex;
flex-direction: column;
gap: 0.75rem;
margin-bottom: 1.5rem;
}

.completion-course-card {
display: block;
background: #3d3b3c;
border: 1px solid #4a4849;
border-radius: 10px;
padding: 1rem 1.25rem;
text-decoration: none;
text-align: left;
transition: border-color 0.2s;
}

.completion-course-card:hover {
border-color: #72BEFA;
}

.completion-course-card .card-title {
color: #72BEFA;
font-size: 0.95rem;
font-weight: 600;
margin-bottom: 0.25rem;
}

.completion-course-card .card-desc {
color: #999;
font-size: 0.8rem;
}

.completion-browse {
display: inline-block;
color: #E583B6;
font-size: 0.9rem;
text-decoration: none;
}

.completion-browse:hover {
text-decoration: underline;
}

/* Responsive */
@media (max-width: 768px) {
.course-sidebar {
width: 100%;
position: relative;
height: auto;
}

.course-content {
margin-left: 0;
padding: 1.5rem;
}

.course-layout {
flex-direction: column;
}
}

DuckDB for Data Scientists

Getting Started


What is DuckDB?


Installation


Zero Configuration

Working with DataFrames


Integrate Seamlessly with pandas and Polars


Memory Efficiency


Out-of-Core Processing


Fast Performance

SQL Syntax Shortcuts


FROM-First Syntax


GROUP BY ALL


SELECT * EXCLUDE


SELECT * REPLACE

File Operations


Streamlined File Reading


Query Cloud Storage


Automatic Parsing of CSV Files


Automatic Flattening of Nested Parquet Files


Automatic Flattening of Nested JSON Files


Reading Multiple Files


Hive Partitioned Datasets


Exporting Data

Working with Complex Types


Creating Lists, Structs, and Maps


Manipulating Nested Data

Advanced Features


Parameterized Queries


ACID Transactions


Attach External Databases

Summary


Key Takeaways

What is DuckDB?
DuckDB is a fast, in-process SQL OLAP database optimized for analytics. Unlike traditional databases like PostgreSQL or MySQL that require server setup and maintenance, DuckDB runs directly in your Python process.

It’s perfect for data scientists because:

Zero Configuration: No database server setup required
Memory Efficiency: Out-of-core processing for datasets larger than RAM
Familiar Interface: SQL syntax with shortcuts like GROUP BY ALL
Performance: Columnar-vectorized engine that is often faster than pandas for aggregations and joins
Universal Access: Query files, cloud storage, and external databases


Installation
Install DuckDB with pip:

pip install duckdb

Let’s verify the installation:

```python
import duckdb

print(f"DuckDB version: {duckdb.__version__}")
print("Installation successful!")
```

Zero Configuration
SQL operations on DataFrames typically require setting up database servers. With pandas and PostgreSQL, you need to:

Install and configure a database server
Ensure the service is running
Set up credentials and connections
Write the DataFrame to a table first

```python
# Traditional approach with pandas + PostgreSQL
import pandas as pd
from sqlalchemy import create_engine

sales = pd.DataFrame({
    "product": ["A", "B", "C"],
    "amount": [100, 150, 200]
})

# Requires server setup, credentials, running service...
engine = create_engine("postgresql://user:pass@localhost:5432/db")
sales.to_sql("sales", engine, if_exists="replace")

with engine.connect() as conn:
    result = pd.read_sql("SELECT * FROM sales", conn)
```

DuckDB eliminates this overhead. Query DataFrames directly with SQL:

```python
import duckdb
import pandas as pd

sales = pd.DataFrame({
    "product": ["A", "B", "C"],
    "amount": [100, 150, 200]
})

# No server needed - query DataFrame directly!
result = duckdb.sql("SELECT * FROM sales").df()
print(result)
```

💡 What the output shows
Notice how the query returns results instantly. There’s no connection string, no server startup time, and no authentication steps.

Try it

Edit the query to select items with quantity greater than 30 from the inventory DataFrame:

```python
import duckdb
import pandas as pd

inventory = pd.DataFrame({
    "item": ["Chair", "Desk", "Lamp"],
    "quantity": [50, 20, 100]
})

# Edit this query to filter for quantity > 30
result = duckdb.sql("SELECT * FROM inventory").df()
print(result)
```

💡 Solution

```python
result = duckdb.sql("SELECT * FROM inventory WHERE quantity > 30").df()
```

Quiz

In the code above, how does DuckDB access the sales DataFrame?

A. It automatically detects Python variables and makes them queryable
B. You must register the DataFrame with duckdb.register() first
C. The DataFrame must be saved to disk before querying

💡 A: Correct! DuckDB scans your Python namespace and makes DataFrames available as SQL tables automatically.
⚠ B: Not quite. Look at the code above. There’s no duckdb.register() call before the SQL query runs.
⚠ C: Not quite. The DataFrame stays in memory. There’s no file saving step before the SQL query runs.

Integrate Seamlessly with pandas and Polars
Have you ever wanted to leverage SQL’s power while working with your favorite data manipulation libraries such as pandas and Polars?

DuckDB makes it seamless to query pandas and Polars DataFrames via the duckdb.sql function.

```python
import duckdb
import pandas as pd
import polars as pl

pd_df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

pl_df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

print("Query pandas DataFrame:")
print(duckdb.sql("SELECT * FROM pd_df").df())

print("\nQuery Polars DataFrame:")
print(duckdb.sql("SELECT * FROM pl_df").df())
```

💡 What the output shows
DuckDB recognized both pd_df (pandas) and pl_df (Polars) as DataFrame variables and queried them directly with SQL. No explicit registration step is needed.

DuckDB’s integration with pandas and Polars lets you combine the strengths of each tool. For example, you can:

Use pandas for data cleaning and feature engineering
Use DuckDB for complex aggregations and multi-step queries

```python
import pandas as pd
import duckdb

# Create sales data
sales = pd.DataFrame({
    "product": ["A", "B", "C", "A", "B", "C"] * 2,
    "region": ["North", "South"] * 6,
    "amount": [100, 150, 200, 120, 180, 220, 110, 160, 210, 130, 170, 230],
    "date": pd.date_range("2024-01-01", periods=12)
})

# Use pandas for feature engineering
sales['month'] = sales['date'].dt.month
sales['is_high_value'] = sales['amount'] > 150
print("Sales after feature engineering:")
print(sales.head())
```

💡 What the output shows
pandas makes feature engineering straightforward: extracting month from dates and creating is_high_value flags are common transformations for preparing data for analysis or machine learning.

Now use DuckDB for complex aggregations:

```python
# Use DuckDB for complex aggregations
analysis = duckdb.sql("""
    SELECT
        product,
        region,
        COUNT(*) as total_sales,
        AVG(amount) as avg_amount,
        SUM(CASE WHEN is_high_value THEN 1 ELSE 0 END) as high_value_sales
    FROM sales
    GROUP BY product, region
    ORDER BY avg_amount DESC
""").df()

print("Sales analysis by product and region:")
print(analysis)
```

💡 What the output shows
DuckDB handles aggregations like this well: combining GROUP BY, AVG, and a conditional CASE WHEN in a single query is often more readable, and on larger datasets often faster, than the equivalent pandas code.
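
For comparison, here is one way the same aggregation might be written in pandas. It is a sketch rather than the course's own code: it rebuilds the sales data from the earlier example so it runs on its own, and the choice of named aggregation with groupby/agg is an assumption about what "equivalent pandas code" would look like.

```python
import pandas as pd

# Same sales data as the feature-engineering example above
sales = pd.DataFrame({
    "product": ["A", "B", "C", "A", "B", "C"] * 2,
    "region": ["North", "South"] * 6,
    "amount": [100, 150, 200, 120, 180, 220, 110, 160, 210, 130, 170, 230],
})
sales["is_high_value"] = sales["amount"] > 150

# One possible pandas translation of the SQL query
analysis = (
    sales.groupby(["product", "region"], as_index=False)
    .agg(
        total_sales=("amount", "size"),            # COUNT(*)
        avg_amount=("amount", "mean"),             # AVG(amount)
        high_value_sales=("is_high_value", "sum"), # SUM(CASE WHEN ...)
    )
    .sort_values("avg_amount", ascending=False)    # ORDER BY avg_amount DESC
)
print(analysis)
```

Both versions compute the same numbers; the SQL version states the grouping, aggregates, and ordering in one declarative statement, while the pandas version spreads them over chained method calls.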

Try it

Edit the query to combine results from both df_2023 and df_2024 using UNION ALL:

```python
import duckdb
import pandas as pd

df_2023 = pd.DataFrame({"year": [2023, 2023], "sales": [100, 150]})
df_2024 = pd.DataFrame({"year": [2024, 2024], "sales": [200, 250]})

# Edit to combine both DataFrames with UNION ALL
result = duckdb.sql("SELECT * FROM df_2023").df()
print(result)
```

💡 Solution

```python
result = duckdb.sql("SELECT * FROM df_2023 UNION ALL SELECT * FROM df_2024").df()
```

Quiz

What makes DuckDB’s approach to complex aggregations more readable than pandas?

A. All operations are expressed in a single, declarative query
B. DuckDB uses shorter function names
C. DuckDB automatically formats the output

💡 A: Correct! SQL lets you express GROUP BY, aggregates, and sorting in one cohesive statement, while pandas requires chaining multiple methods.
⚠ B: Not quite. Function name length isn’t the key difference. Think about how operations are structured.
⚠ C: Not quite. Output formatting isn’t what makes DuckDB’s approach more readable. Look at how the query combines multiple operations.

Memory Efficiency
Pandas loads an entire dataset into RAM before filtering, which can cause out-of-memory errors on large files. DuckDB applies the filter while it scans the file, so only the matching rows are kept in memory. To see this in action, let’s compare both approaches on the same dataset.

First, create a sample CSV file:

```python
import pandas as pd

# Create sample data and save to CSV
customers = pd.DataFrame({
    "id": range(1000),
    "name": [f"Customer_{i}" for i in range(1000)],
    "region": ["North", "South", "East", "West"] * 250
})
customers.to_csv("customers.csv", index=False)
print(f"Created customers.csv with {len(customers)} rows")
```

With pandas, filtering loads ALL records into RAM first:

```python
import pandas as pd

# Read entire CSV into memory, then filter
df = pd.read_csv("customers.csv")
result = df[df["region"] == "North"]
print(f"Loaded {len(df)} rows to get {len(result)} matches")
```

With DuckDB, only matching rows enter memory:

```python
import duckdb

# Stream from file, filter during read
result = duckdb.sql("""
    SELECT *
    FROM 'customers.csv'
    WHERE region = 'North'
""").df()
print(f"Returned {len(result)} rows without loading full file")
```

The diagram below summarizes the memory difference for this example:

RAM Usage
│ ████████████ Pandas (loads all 1,000 rows)
│ ███ DuckDB (streams, keeps 250 matches)
└──────────────────────────────────────────────
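
The load-then-filter versus filter-while-scanning distinction can be illustrated with nothing but Python's standard csv module. This is a conceptual sketch, not how pandas or DuckDB are implemented internally; the function names load_then_filter and filter_while_scanning are hypothetical, and the file it writes mirrors the customers.csv layout used above.

```python
import csv
import os
import tempfile


def load_then_filter(csv_path, region):
    """Hold every row in a list first, then filter (the eager pattern)."""
    with open(csv_path, newline="") as f:
        all_rows = list(csv.DictReader(f))  # the whole file sits in memory
    matches = [row for row in all_rows if row["region"] == region]
    return len(all_rows), matches


def filter_while_scanning(csv_path, region):
    """Read one row at a time and keep only the rows that match."""
    matches = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):  # only the current row is held
            if row["region"] == region:
                matches.append(row)
    return matches


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "customers.csv")
    regions = ["North", "South", "East", "West"]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name", "region"])
        for i in range(1000):
            writer.writerow([i, f"Customer_{i}", regions[i % 4]])

    rows_held, eager = load_then_filter(path, "North")
    streamed = filter_while_scanning(path, "North")

print(f"load-then-filter held {rows_held} rows to find {len(eager)} matches")
print(f"filter-while-scanning kept only {len(streamed)} rows")
```

Both functions return the same matches; the difference is the peak number of rows held at once, which is what matters when the file is much larger than this sample.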

Quiz

What’s the key difference between how pandas and DuckDB handle the filter region = 'North'?

A. Pandas loads all rows first then filters; DuckDB filters while scanning and keeps only matching rows
B. Pandas uses more CPU; DuckDB uses more RAM
C. Pandas filters in Python; DuckDB filters in C++

💡 A: Correct! Pandas must load the entire DataFrame into memory before applying any filter. DuckDB evaluates the WHERE clause during scanning, so non-matching rows are discarded instead of accumulating in memory.
⚠ B: Not quite. The difference isn’t about CPU vs RAM usage. Think about when filtering happens relative to data loading.
⚠ C: Not quite. While implementation languages differ, the key difference is the order of operations: load-then-filter vs filter-while-loading.

Out-of-Core Processing
When your dataset exceeds available RAM, pandas requires workarounds like chunking or Dask. DuckDB solves this with out-of-core processing:

Breaks data into chunks that fit in memory
Processes each chunk separately
Stores intermediate results on disk
Merges results into the final output

Dataset (50GB)           Memory (16GB)            Disk (temp files)
┌──────────┐             ┌──────────┐             ┌──────────────┐
│ Chunk 1  │ ──────────> │ Process  │ ──────────> │ result_1.tmp │
│ Chunk 2  │             │ in RAM   │             │ result_2.tmp │
│ Chunk 3  │             └──────────┘             │ result_3.tmp │
│ …        │                                      └──────┬───────┘
└──────────┘                                             │ merge
                                                         ▼
                                                  ┌─────────────┐
                                                  │ Final Result│
                                                  └─────────────┘

The following example demonstrates this process. We’ll limit DuckDB to 10MB of memory and sort 5 million rows:

```python
import duckdb

# Configure a tiny memory limit to force disk spilling
conn = duckdb.connect()
conn.execute("SET memory_limit = '10MB'")

# Sorting 5M rows requires more memory than 10MB allows
result = conn.execute("""
    SELECT COUNT(*) as total_rows, AVG(value) as avg_value
    FROM (
        SELECT random() as value
        FROM range(5_000_000)
        ORDER BY value
    )
""").fetchone()

print(f"Processed 5M sorted rows with only 10MB memory:")
print(f"Total rows: {result[0]:,}, Average: {result[1]:.4f}")
```

💡 What the output shows
The query succeeds despite processing 5 million rows with only 10MB of memory. If you check your temp directory while the query runs, you’ll see temporary files being created. These files are automatically deleted once the query completes.

You can also configure where DuckDB stores temporary files:

```python
# Set custom temp directory for spill files
conn.execute("SET temp_directory = '/path/to/temp'")
```
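
The four steps listed at the start of this lesson (chunk, process, spill to disk, merge) can be sketched in plain Python as an external merge sort. This only illustrates the general technique and is not DuckDB's implementation; external_sort, chunk_size, and tmp_dir are hypothetical names chosen for the sketch.

```python
import heapq
import os
import random
import tempfile


def external_sort(values, chunk_size, tmp_dir):
    """Yield values in sorted order, holding at most chunk_size of them at once."""
    chunk_paths = []

    def spill(sorted_chunk):
        # Steps 2-3: a chunk has been sorted in memory; write it to a temp file.
        fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v!r}\n" for v in sorted_chunk)
        chunk_paths.append(path)

    # Step 1: break the input into chunks that fit in memory.
    chunk = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            spill(sorted(chunk))
            chunk = []
    if chunk:
        spill(sorted(chunk))

    # Step 4: merge the sorted runs, reading one line per file at a time.
    files = [open(p) for p in chunk_paths]
    try:
        runs = [(float(line) for line in f) for f in files]
        yield from heapq.merge(*runs)
    finally:
        for f in files:
            f.close()
        for p in chunk_paths:
            os.remove(p)  # temp files are deleted once the merge finishes


random.seed(0)
data = [random.random() for _ in range(10_000)]

with tempfile.TemporaryDirectory() as tmp:
    result = list(external_sort(data, chunk_size=1_000, tmp_dir=tmp))
    leftover = os.listdir(tmp)  # spill files remaining after the sort

print(result == sorted(data))
print(leftover)
```

As in the DuckDB example, the temporary files exist only while the sort is running and are gone by the time the result has been consumed.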

Quiz

What does “out-of-core processing” mean?

A. Processing data that exceeds available RAM by spilling to disk
B. Processing data on a remote server
C. Processing data using multiple CPU cores

💡 A: Correct! Out-of-core processing allows DuckDB to handle datasets larger than memory by temporarily storing intermediate results on disk.
⚠ B: Not quite. Out-of-core refers to memory management, not server location. It’s about handling data that doesn’t fit in RAM.
⚠ C: Not quite. That’s parallel processing. Out-of-core specifically refers to spilling data to disk when RAM is insufficient.

When running the example above, what happens to the temporary files after the query completes?

A. They remain on disk for future queries
B. They are automatically deleted
C. They are moved to a permanent storage location

⚠ A: Not quite. Temporary files are only needed during query execution. Keeping them would waste disk space.
💡 B: Correct! DuckDB automatically cleans up temporary files after the query completes. They only exist during execution to hold intermediate results.
⚠ C: Not quite. Temporary files are deleted, not moved. To persist data, use a persistent database file instead.

Fast Performance
While pandas runs most operations sequentially on a single CPU core, DuckDB uses a columnar-vectorized execution engine that processes data in chunks across multiple cores. The simplified diagram below shows how each approach handles data:

Pandas                          DuckDB
│                               │
├─ Row 1 ──────> process        ├─ Chunk 1 (2048 rows) ─┐
├─ Row 2 ──────> process        ├─ Chunk 2 (2048 rows) ─┼─> process
├─ Row 3 ──────> process        ├─ Chunk 3 (2048 rows) ─┘
├─ Row 4 ──────> process        │
│  …                            │
▼                               ▼
Sequential                      Parallel chunks

This architectural difference enables DuckDB to significantly outperform pandas, especially for computationally intensive operations like aggregations and joins.

Let’s compare the performance of pandas and DuckDB for aggregations on a million rows of data.

```python
import time

import duckdb
import numpy as np
import pandas as pd

# Assumed setup: the benchmark needs a million-row `customers` DataFrame
# with `region` and `segment` columns. The segment labels are placeholders.
rng = np.random.default_rng(0)
n_rows = 1_000_000
customers = pd.DataFrame({
    "region": rng.choice(["North", "South", "East", "West"], n_rows),
    "segment": rng.choice(["Consumer", "Corporate", "Home Office"], n_rows),
})

# Pandas aggregation
start_time = time.time()
pandas_agg = customers.groupby(['region', 'segment']).size().reset_index(name='count')
pandas_time = time.time() - start_time

# DuckDB aggregation
start_time = time.time()
duckdb_agg = duckdb.sql("""
    SELECT region, segment, COUNT(*) as count FROM customers GROUP BY region, segment
""").df()
duckdb_time = time.time() - start_time

print(f"Pandas aggregation time: {pandas_time:.2f} seconds")
print(f"DuckDB aggregation time: {duckdb_time:.2f} seconds")
print(f"Speedup: {pandas_time/duckdb_time:.1f}x")
```

💡 What the output shows
In the run reported for this lesson, DuckDB completes the same aggregation ~8x faster than pandas; the exact ratio depends on hardware and data size. The speedup comes from DuckDB’s columnar-vectorized execution engine processing data in parallel chunks.

📝 Note
The benchmark above was run on native Python. Results may vary in browser-based environments.
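
The chunk-at-a-time structure in the diagram can be sketched with the standard library alone. This is a structural illustration only: the names sum_value_at_a_time and sum_in_chunks are hypothetical, and because of Python's global interpreter lock the thread pool here does not deliver the real multi-core speedup a native engine gets. What it does show is the shape of the computation: split a column into 2048-row chunks, reduce each chunk independently, then combine the partial results.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor

VECTOR_SIZE = 2048  # rows per chunk, matching the diagram above


def sum_value_at_a_time(column):
    """Baseline: touch one value per loop iteration."""
    total = 0
    for value in column:
        total += value
    return total


def sum_in_chunks(column, workers=4):
    """Split the column into fixed-size chunks, reduce each, then combine."""
    chunks = [column[i:i + VECTOR_SIZE] for i in range(0, len(column), VECTOR_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))  # one partial result per chunk
    return sum(partial_sums), len(chunks)


column = array("q", range(10_000))  # a single integer column
total, n_chunks = sum_in_chunks(column)

print(total == sum_value_at_a_time(column))
print(n_chunks)
```

Because each chunk is reduced independently, the partial results can be computed in any order, which is what makes this pattern easy to spread across cores.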

Quiz

How does pandas process data differently from DuckDB?

A. Pandas processes data sequentially on a single core; DuckDB processes chunks in parallel
B. Pandas uses disk storage; DuckDB uses only RAM
C. Pandas compiles queries; DuckDB interprets them

💡 A: Correct! Pandas runs each operation on a single thread. DuckDB’s columnar-vectorized engine processes chunks of rows on multiple cores at once, enabling significant speedups for operations like GROUP BY.
⚠ B: Not quite. Both can work with in-memory data. The difference is in execution strategy, not storage location.
⚠ C: Not quite. Pandas has no query compiler; it executes method calls eagerly, one at a time. DuckDB plans and optimizes the whole query before running it.

FROM-First Syntax
Traditional SQL requires SELECT before FROM. This adds unnecessary boilerplate when you just want a quick look at your data:

```python
import duckdb
import pandas as pd

sales = pd.DataFrame({
    "product": ["A", "B", "C", "A", "B"],
    "region": ["North", "South", "North", "South", "North"],
    "amount": [100, 200, 150, 120, 180]
})

# Traditional SQL
result = duckdb.sql("SELECT * FROM sales").df()
print(result)
```

DuckDB lets you skip SELECT * entirely, making quick data exploration faster:

```python
# DuckDB: FROM-first (SELECT * is implied)
result = duckdb.sql("FROM sales").df()
print(result)
```

💡 What the output shows
Notice the results are the same. This confirms that FROM table automatically selects all columns.

Try it

Write a FROM-first query to get all sales with amount > 150:

```python
import duckdb
import pandas as pd

sales = pd.DataFrame({
    "product": ["A", "B", "C", "A", "B"],
    "region": ["North", "South", "North", "South", "North"],
    "amount": [100, 200, 150, 120, 180]
})

# Write a FROM-first query with WHERE clause
result = duckdb.sql("___").df()
print(result)
```

💡 Solution

```python
result = duckdb.sql("FROM sales WHERE amount > 150").df()
```

Quiz

What happens when you run FROM sales in DuckDB?

A. Returns only the first row from the sales table
B. Returns all rows and columns from the sales table
C. Returns the table schema without data

⚠ A: Not quite. FROM table returns all rows, not just the first one. To limit rows, you’d use FROM table LIMIT 1.
💡 B: Correct! FROM table is shorthand for SELECT * FROM table, returning all rows and all columns.
⚠ C: Not quite. FROM table returns data, not schema. To see the schema, use DESCRIBE table or SUMMARIZE table.

GROUP BY ALL
When using GROUP BY, you typically repeat every non-aggregated column:

```python
# Traditional: repeat all grouping columns
result = duckdb.sql("""
    SELECT product, region, SUM(amount) as total
    FROM sales
    GROUP BY product, region
    ORDER BY product, region
""").df()
print(result)
```

DuckDB infers grouping columns automatically with GROUP BY ALL:

```python
# DuckDB: GROUP BY ALL infers grouping columns
result = duckdb.sql("""
    SELECT product, region, SUM(amount) as total
    FROM sales
    GROUP BY ALL
    ORDER BY product, region
""").df()
print(result)
```

💡 What the output shows
Both queries produce the same result. GROUP BY ALL automatically detects product and region as grouping columns, so you don’t have to list them twice.

Try it

Rewrite this query using GROUP BY ALL instead of listing columns:

```python
import duckdb
import pandas as pd

sales = pd.DataFrame({
    "product": ["A", "B", "C", "A", "B"],
    "region": ["North", "South", "North", "South", "North"],
    "amount": [100, 200, 150, 120, 180]
})

# Rewrite using GROUP BY ALL
result = duckdb.sql("""
    SELECT region, COUNT(*) as count, AVG(amount) as avg_amount
    FROM sales
    GROUP BY region
""").df()
print(result)
```

💡 Solution

```python
result = duckdb.sql("""
    SELECT region, COUNT(*) as count, AVG(amount) as avg_amount
    FROM sales
    GROUP BY ALL
""").df()
```

Quiz

How does GROUP BY ALL determine which columns to group by?

A. It groups by all columns in the table
B. It groups by the first column in SELECT
C. It groups by columns in SELECT that aren’t inside aggregate functions

⚠ A: Not quite. GROUP BY ALL only groups by columns that appear in your SELECT clause, not all columns in the table.
⚠ B: Not quite. GROUP BY ALL considers all non-aggregated columns in SELECT, not just the first one.
💡 C: Correct! GROUP BY ALL automatically identifies columns in your SELECT that aren’t wrapped in aggregate functions like SUM, COUNT, or AVG.

SELECT * EXCLUDE
When you need all columns except a few, traditional SQL requires listing every column you want:

```python
import pandas as pd
import duckdb

users = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "email": ["a@test.com", "b@test.com", "c@test.com"],
    "password_hash": ["hash1", "hash2", "hash3"],
    "created_at": ["2024-01-01", "2024-01-02", "2024-01-03"]
})

# Traditional: list every column you want
result = duckdb.sql("""
    SELECT id, name, email, created_at
    FROM users
""").df()
print(result)
```

DuckDB’s EXCLUDE lets you specify what to remove instead:

```python
# DuckDB: exclude what you don't want
result = duckdb.sql("""
    SELECT * EXCLUDE (password_hash)
    FROM users
""").df()
print(result)
```

💡 What the output shows
The results are identical. EXCLUDE is shorter to write and stays correct even if you add new columns to the table later.

You can also exclude multiple columns at once with a comma-separated list:

```python
result = duckdb.sql("""
    SELECT * EXCLUDE (password_hash, created_at)
    FROM users
""").df()

print(result)
```

Try it

Write a query to get all columns except email from the users table:

```python
import duckdb
import pandas as pd

users = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "email": ["a@test.com", "b@test.com", "c@test.com"],
    "password_hash": ["hash1", "hash2", "hash3"]
})

# Write a query excluding the email column
result = duckdb.sql("___").df()
print(result)
```

💡 Solution

```python
result = duckdb.sql("SELECT * EXCLUDE (email) FROM users").df()
```

Quiz

What happens when you add a new column to a table queried with SELECT * EXCLUDE (password)?

A. The query fails with an error
B. The new column is excluded by default
C. The new column is automatically included in results

⚠ A: Not quite. The query continues to work. EXCLUDE only removes the specified columns.
⚠ B: Not quite. Only explicitly listed columns are excluded. New columns are included automatically.
💡 C: Correct! Since EXCLUDE starts with SELECT *, any new columns added to the table are automatically included in the results.

SELECT * REPLACE
When you need to transform one column while keeping others, traditional SQL requires listing every column:

```python
import pandas as pd
import duckdb

products = pd.DataFrame({
    "name": ["Laptop", "Phone", "Tablet"],
    "price_cents": [99999, 69999, 44999],
    "stock": [10, 25, 15]
})

# Traditional: list all columns with transformation
result = duckdb.sql("""
    SELECT name, price_cents / 100 AS price_cents, stock
    FROM products
""").df()
print(result)
```

DuckDB’s REPLACE lets you transform just the column you need:

```python
# DuckDB: REPLACE just the column you're transforming
result = duckdb.sql("""
    SELECT * REPLACE (price_cents / 100 AS price_cents)
    FROM products
""").df()
print(result)
```

💡 What the output shows
The results are identical. REPLACE modified price_cents while automatically keeping name and stock unchanged.

Try it

Write a query to convert name to uppercase using REPLACE:

```python
import duckdb
import pandas as pd

products = pd.DataFrame({
    "name": ["Laptop", "Phone", "Tablet"],
    "price": [999.99, 699.99, 449.99],
    "stock": [10, 25, 15]
})

# Write a query to make name uppercase
result = duckdb.sql("___").df()
print(result)
```

💡 Solution

```python
result = duckdb.sql("SELECT * REPLACE (UPPER(name) AS name) FROM products").df()
```

Quiz

What does SELECT * REPLACE (price / 100 AS price) do?

A. Adds a new column called price
B. Transforms the price column while keeping all other columns unchanged
C. Removes the price column from results

⚠ A: Not quite. REPLACE modifies an existing column, it doesn’t add a new one. The column name stays the same.
💡 B: Correct! REPLACE transforms the specified column while SELECT * keeps all other columns unchanged.
⚠ C: Not quite. That’s what EXCLUDE does. REPLACE transforms a column, keeping it in the results.

Streamlined File Reading
DuckDB can query files directly without loading them into memory first. It supports CSV, Parquet, and JSON formats, automatically detecting structure, delimiters, and column types.

```python
import duckdb

# Query CSV directly from URL
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"

result = duckdb.sql(f"""
    SELECT species, COUNT(*) as count
    FROM '{url}'
    GROUP BY species
""").df()

print(result)
```

💡 What the output shows
The query reads the CSV straight from the URL, with no separate download-and-save step. DuckDB streams the data and returns aggregated counts per species.

Quiz

What’s the advantage of querying a remote file with DuckDB compared to pd.read_csv(url)?

A. DuckDB downloads files faster
B. DuckDB can filter and aggregate before loading all data into memory
C. DuckDB supports more URL protocols

⚠ A: Not quite. The download speed depends on network conditions, not the tool. The key difference is how data is processed.
💡 B: Correct! DuckDB applies filters and aggregations while it streams the file, so it only keeps relevant data in memory. Pandas loads the entire file first, then filters.
⚠ C: Not quite. Both support standard HTTP/HTTPS URLs. The advantage is in how DuckDB processes data during streaming.

Query Cloud Storage
DuckDB can query files directly from cloud storage providers like AWS S3, Google Cloud Storage, and Azure Blob Storage without downloading them first.

Query from S3:

Reading files from S3 works just like reading local files. Simply pass an S3 path to read_parquet.

```python
import duckdb

# Query Parquet file directly from S3
result = duckdb.sql("""
    SELECT *
    FROM read_parquet('s3://my-bucket/data/sales.parquet')
    WHERE year = 2024
    LIMIT 10
""").df()
```

For private buckets, add your credentials once and DuckDB handles authentication automatically.

```python
import duckdb

conn = duckdb.connect()

# Set up S3 credentials
conn.execute("""
    CREATE SECRET (
        TYPE S3,
        KEY_ID 'your-access-key',
        SECRET 'your-secret-key',
        REGION 'us-east-1'
    )
""")

# Now query S3 files
result = conn.sql("""
    SELECT region, SUM(amount) as total
    FROM read_parquet('s3://my-bucket/sales/*.parquet')
    GROUP BY region
""").df()
```

The same approach works with other cloud providers.

```python
# Azure Blob Storage
result = duckdb.sql("""
    SELECT * FROM read_parquet('az://container/path/file.parquet')
""")

# Google Cloud Storage
result = duckdb.sql("""
    SELECT * FROM read_parquet('gs://bucket/path/file.parquet')
""")

# Cloudflare R2
result = duckdb.sql("""
    SELECT * FROM read_parquet('r2://bucket/path/file.parquet')
""")
```

Quiz

What makes querying cloud storage in DuckDB different from traditional approaches?

A
DuckDB streams data directly without downloading entire files first

B
DuckDB requires you to download files before querying

C
DuckDB only supports AWS S3

💡 Correct
Correct! DuckDB reads data from cloud storage on demand and pushes filters and projections down into the scan where possible. This is especially efficient for Parquet files, where DuckDB fetches only the columns and row groups it needs.

⚠ Try Again
Not quite. The key advantage is that DuckDB doesn’t require downloading. It queries files in place, streaming only the data needed.

⚠ Try Again
Not quite. DuckDB supports multiple cloud providers including AWS S3, Azure, Google Cloud Storage, and Cloudflare R2.


Automatic Parsing of CSV Files
When working with CSV files that have non-standard delimiters, pandas requires you to specify parameters like delimiter to avoid parsing errors.

With pandas, a pipe-delimited CSV is parsed incorrectly:

```python
import pandas as pd

# Create CSV with pipe delimiter
csv_content = """FlightDate|Carrier|Origin|Destination
2024-01-01|AA|NYC|LAX
2024-01-02|UA|SFO|ORD
2024-01-03|DL|ATL|DEN"""

# Save to file
with open("/tmp/flights.csv", "w") as f:
    f.write(csv_content)

# Pandas assumes comma delimiter - parses incorrectly!
df = pd.read_csv("/tmp/flights.csv")
print("Pandas (wrong - all data in one column):")
print(df)
```

💡 What the output shows
Pandas created a single column containing the entire pipe-separated row as one string. Without specifying delimiter='|', it defaulted to comma separation.
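For comparison, pandas does parse the same data correctly once it is told the delimiter. A minimal sketch, assuming an in-memory copy of the sample (the `io.StringIO` buffer is used here only so the snippet runs without touching `/tmp`):

```python
import io

import pandas as pd

# Same pipe-delimited sample as above, held in memory instead of /tmp/flights.csv
csv_content = """FlightDate|Carrier|Origin|Destination
2024-01-01|AA|NYC|LAX
2024-01-02|UA|SFO|ORD
2024-01-03|DL|ATL|DEN"""

# Passing sep="|" tells pandas how to split each row
df = pd.read_csv(io.StringIO(csv_content), sep="|")
print(df.columns.tolist())
print(df)
```

The fix is a single argument, but you have to know the delimiter in advance for every file you read.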

With DuckDB, the delimiter is auto-detected:

```python
import duckdb

# DuckDB auto-detects the pipe delimiter
result = duckdb.sql("""
    SELECT * FROM read_csv('/tmp/flights.csv')
""").df()

print("DuckDB (correct - auto-detected pipe delimiter):")
print(result)
```

💡 What the output shows
DuckDB correctly parsed 4 separate columns (FlightDate, Carrier, Origin, Destination) by auto-detecting the pipe delimiter. No configuration needed.

Try it

Change the delimiter from | to ; and run the query. Does DuckDB still parse it correctly?

```python
import duckdb

# Change the delimiter from | to ; and see if DuckDB auto-detects it
csv_content = """FlightDate|Carrier|Origin|Destination
2024-01-01|AA|NYC|LAX
2024-01-02|UA|SFO|ORD"""

with open("/tmp/flights_test.csv", "w") as f:
    f.write(csv_content)

result = duckdb.sql("SELECT * FROM read_csv('/tmp/flights_test.csv')").df()
print(result)
```

💡 Solution
Change | to ; in the csv_content string:
```python
csv_content = """FlightDate;Carrier;Origin;Destination
2024-01-01;AA;NYC;LAX
2024-01-02;UA;SFO;ORD"""
```
DuckDB auto-detects semicolons, tabs, pipes, and other common delimiters.
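To get a feel for what delimiter detection involves, Python's standard library has a comparable heuristic in `csv.Sniffer`. This is only an analogy: it is not DuckDB's detection algorithm, and the candidate list passed to `delimiters` is an assumption chosen for this sketch.

```python
import csv

samples = {
    "pipe": "FlightDate|Carrier|Origin|Destination\n2024-01-01|AA|NYC|LAX\n2024-01-02|UA|SFO|ORD\n",
    "semicolon": "FlightDate;Carrier;Origin;Destination\n2024-01-01;AA;NYC;LAX\n2024-01-02;UA;SFO;ORD\n",
}

detected = {}
for name, text in samples.items():
    # Sniffer checks how consistently each candidate character appears per line
    dialect = csv.Sniffer().sniff(text, delimiters=",;|\t")
    detected[name] = dialect.delimiter
    print(name, "->", repr(dialect.delimiter))
```

A character that shows up the same number of times on every line is a strong delimiter candidate, which is why both samples are recognized without configuration.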

Quiz

How did DuckDB correctly parse the pipe-delimited file?

A
It required delimiter='|' in the read_csv function

B
It automatically detected the pipe delimiter

C
It converted pipes to commas before parsing

⚠ Try Again
Not quite. Look at the code. There’s no delimiter parameter. DuckDB figured it out automatically.

💡 Correct
Correct! DuckDB’s read_csv function analyzes the file content and automatically detects the delimiter, whether it’s commas, pipes, tabs, or other characters.

⚠ Try Again
Not quite. DuckDB doesn’t modify the file. It detects and uses the original delimiter directly.


Automatic Flattening of Nested Parquet Files
When working with large, nested Parquet files, you typically need to pre-process the data to flatten nested structures or write complex extraction scripts, which adds time and complexity to your workflow.

With pandas, you need to manually flatten nested structures:

```python
import pandas as pd

# Create a nested dataset
data = {
    "id": [1, 2],
    "details": [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]
}

df = pd.DataFrame(data)

# Save as a nested Parquet file
df.to_parquet("/tmp/customers.parquet")

# Read back - nested structure is preserved
df = pd.read_parquet("/tmp/customers.parquet")
print("Pandas (nested structure):")
print(df)
```

💡 What the output shows
The details column contains dictionaries. To access name or age, you’d need to manually flatten with list comprehensions or apply().
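As a rough illustration of that manual step, here is what flattening with list comprehensions might look like. The `details` list mirrors the dictionaries in the `details` column; plain Python lists are used (an assumption made for this sketch) so it runs without pandas.

```python
# The nested column, represented as a plain list of dictionaries
ids = [1, 2]
details = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]

# Pull each nested field out into its own flat column
names = [d["name"] for d in details]
ages = [d["age"] for d in details]

flat_rows = list(zip(ids, names, ages))
print(flat_rows)
```

Every nested field needs its own extraction line, and the code has to be updated whenever the structure changes.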

With DuckDB, query nested fields directly using dot notation:

```python
import duckdb

# Query nested Parquet file directly with dot notation
result = duckdb.sql("""
    SELECT
        id,
        details.name AS name,
        details.age AS age
    FROM read_parquet('/tmp/customers.parquet')
""").df()

print("DuckDB (flattened with dot notation):")
print(result)
```

💡 What the output shows
DuckDB’s dot notation (details.name, details.age) extracts nested fields directly in SQL. No manual flattening, list comprehensions, or apply() needed.

Try it

Fill in the dot notation to extract color and weight from the specs struct:

```python
import duckdb
import pandas as pd

# Create nested product data
products = pd.DataFrame({
    "product_id": [1, 2, 3],
    "name": ["Laptop", "Phone", "Tablet"],
    "specs": [
        {"color": "Silver", "weight": 2.5},
        {"color": "Black", "weight": 0.4},
        {"color": "White", "weight": 0.8}
    ]
})
products.to_parquet("/tmp/products.parquet")

# Fill in the dot notation to extract color and weight
result = duckdb.sql("""
    SELECT name, ___ AS color, ___ AS weight
    FROM read_parquet('/tmp/products.parquet')
""").df()
print(result)
```

💡 Solution
```python
result = duckdb.sql("""
    SELECT name, specs.color AS color, specs.weight AS weight
    FROM read_parquet('/tmp/products.parquet')
""").df()
```

Quiz

How does DuckDB access nested fields in Parquet files?

A
Using bracket notation like details['name']

B
Calling a flatten() function first

C
Using dot notation like details.name

⚠ Try Again
Not quite. DuckDB also accepts bracket notation for struct fields, but the queries in this lesson access nested fields with dot notation.

⚠ Try Again
Not quite. DuckDB doesn’t require a separate flatten step. You access nested fields directly in the SELECT clause.

💡 Correct
Correct! DuckDB’s dot notation lets you navigate nested structures directly in SQL: details.name extracts the name field from the details struct.


Automatic Flattening of Nested JSON Files
When working with JSON files that have nested structures, pandas requires you to normalize the data before you can access nested fields as columns.

With pandas, you need json_normalize() to flatten nested structures:

```python
import pandas as pd
import json

# Create nested JSON data
data = [
    {"user_id": 1, "profile": {"name": "Alice", "active": True}},
    {"user_id": 2, "profile": {"name": "Bob", "active": False}},
    {"user_id": 3, "profile": {"name": "Charlie", "active": True}}
]

with open("/tmp/users.json", "w") as f:
    json.dump(data, f)

# Pandas: need json_normalize to flatten
df = pd.json_normalize(data)
print("Pandas (requires json_normalize):")
print(df)
```

💡 What the output shows
Flattening nested JSON required calling pd.json_normalize() on your data, and you would likely then rename the dot-notation columns (profile.name, profile.active) to something cleaner.

With DuckDB, you can query each nested field directly with the syntax field_name.nested_field_name:

```python
import duckdb

# DuckDB: query nested fields with dot notation
result = duckdb.sql("""
    SELECT
        user_id,
        profile.name AS name,
        profile.active AS is_active
    FROM read_json('/tmp/users.json')
""").df()

print("DuckDB (dot notation in SQL):")
print(result)
```

💡 What the output shows
DuckDB queries nested JSON fields with the same dot notation as Parquet. No json_normalize() step needed before analysis.

Quiz

In the query above, what does profile.name AS name do?

A
Extracts the name field from the nested profile object and renames it

B
Creates a new column called profile.name

C
Joins the profile table with the name table

💡 Correct
Correct! The dot notation navigates into the profile struct to get name, and AS name gives the result column a cleaner alias.

⚠ Try Again
Not quite. The dot notation extracts a nested field. AS name renames the output column to just name, not profile.name.

⚠ Try Again
Not quite. There’s no JOIN here. The dot notation accesses a nested field within a single column, not a separate table.


Reading Multiple Files
Reading Multiple Files from a Directory

Reading multiple files from a directory is common in data pipelines.

First, let’s create some sample CSV files to work with:

```python
import pandas as pd
from pathlib import Path

# Create sample sales files
Path("/tmp/sales").mkdir(exist_ok=True)

jan = pd.DataFrame({"month": ["Jan"]*3, "product": ["A", "B", "C"], "sales": [100, 200, 150]})
feb = pd.DataFrame({"month": ["Feb"]*3, "product": ["A", "B", "C"], "sales": [120, 180, 160]})

jan.to_csv("/tmp/sales/jan.csv", index=False)
feb.to_csv("/tmp/sales/feb.csv", index=False)

print("Created jan.csv and feb.csv in /tmp/sales/")
```

With pandas, you need to read each file separately then concatenate:

```python
import pandas as pd

# Read each file individually
df1 = pd.read_csv("/tmp/sales/jan.csv")
df2 = pd.read_csv("/tmp/sales/feb.csv")

# Manually combine them
combined = pd.concat([df1, df2])

print("Pandas (read each file separately, then concat):")
print(combined)
```

💡 What the output shows
Pandas required reading each file into separate DataFrames, then concatenating them. With more files, this approach becomes tedious and error-prone.
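A loop removes the hard-coded file names, but the collecting and combining is still done by hand. A minimal sketch using only the standard library; the temporary directory and the `read_all_csv` helper are hypothetical names introduced for illustration:

```python
import csv
import tempfile
from pathlib import Path


def read_all_csv(directory):
    """Read every *.csv file in a directory into one list of row dicts."""
    rows = []
    for path in sorted(Path(directory).glob("*.csv")):
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


# Build two small monthly files in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "jan.csv").write_text("month,product,sales\nJan,A,100\nJan,B,200\n")
    Path(tmp, "feb.csv").write_text("month,product,sales\nFeb,A,120\nFeb,B,180\n")
    combined = read_all_csv(tmp)

print(len(combined), "rows from", {row["month"] for row in combined})
```

Note that `csv.DictReader` returns every value as a string, so type conversion would be one more step to handle manually.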

With DuckDB, use glob patterns to read all files at once:

```python
import duckdb

# DuckDB: read all files with glob pattern
result = duckdb.sql("""
    SELECT month, product, sales
    FROM '/tmp/sales/*.csv'
    ORDER BY month, product
""").df()

print("DuckDB (single query with glob pattern):")
print(result)
```

💡 What the output shows
Both January and February data appear in a single result from one query. The *.csv glob pattern matched all CSV files in the directory automatically.

Quiz

Why is the glob pattern useful as your data grows?

A
Adding new CSV files to the directory automatically includes them in queries

B
It compresses files to save disk space

C
It converts CSV files to a faster format

💡 Correct
Correct! If you add mar.csv and apr.csv to the directory, the same *.csv query will include them without code changes. Your pipeline scales automatically.

⚠ Try Again
Not quite. Glob patterns don’t affect file storage. They specify which files to read, not how files are stored.

⚠ Try Again
Not quite. Glob patterns don’t convert file formats. They simply match and read multiple files at once.

Read From Multiple Sources

DuckDB lets you read from several sources in a single query, which makes it easier to combine them.

You can mix DataFrames, CSV files, Parquet files, and JSON files seamlessly.

First, let’s create two related DataFrames:

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "region": ["North", "South", "North"]
})

orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104],
    "customer_id": [1, 2, 1, 3],
    "amount": [250, 150, 300, 200]
})

print("Customers:")
print(customers)
print("\nOrders:")
print(orders)
```

Now use DuckDB to join and aggregate these DataFrames with a single SQL query:

```python
import duckdb

result = duckdb.sql("""
    SELECT
        c.name,
        c.region,
        COUNT(o.order_id) as order_count,
        SUM(o.amount) as total_spent
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.name, c.region
    ORDER BY total_spent DESC
""").df()

print(result)
```

💡 What the output shows
The query joined two separate DataFrames (customers and orders) with standard SQL syntax. No need to merge DataFrames in pandas first.

Quiz

What pandas operation does DuckDB’s JOIN replace?

A
pd.concat()

B
pd.merge()

C
pd.pivot()

⚠ Try Again
Not quite. concat() stacks DataFrames vertically. JOIN combines DataFrames horizontally based on matching keys.

💡 Correct
Correct! In pandas, you’d use pd.merge(customers, orders, on='customer_id') to combine the DataFrames. DuckDB’s SQL JOIN does this directly without an intermediate merge step.

⚠ Try Again
Not quite. pivot() reshapes data from long to wide format. JOIN combines rows from different tables based on a related column.
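For reference, a pandas version of the same join-and-aggregate might look like the sketch below. It rebuilds the sample `customers` and `orders` data so it runs on its own; the named aggregations are one reasonable way to write it, not the only one.

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "region": ["North", "South", "North"],
})
orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104],
    "customer_id": [1, 2, 1, 3],
    "amount": [250, 150, 300, 200],
})

# Inner join on customer_id, then aggregate per customer
summary = (
    pd.merge(customers, orders, on="customer_id")
    .groupby(["name", "region"], as_index=False)
    .agg(order_count=("order_id", "count"), total_spent=("amount", "sum"))
    .sort_values("total_spent", ascending=False)
    .reset_index(drop=True)
)
print(summary)
```

The merge, group, aggregate, and sort steps map one-to-one onto the JOIN, GROUP BY, aggregate functions, and ORDER BY of the SQL query.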


Hive Partitioned Datasets
When working with large datasets, it’s common to partition data into separate files by date, region, or other columns. A typical approach is organizing files into folders:

data/
├── 2023/
│   ├── 01/data.parquet
│   └── 02/data.parquet
└── 2024/
    └── 01/data.parquet

With this structure, you lose the partition information when reading. You’d need to extract year and month from file paths manually.
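That manual extraction might look like the following sketch. It assumes the `data/<year>/<month>/data.parquet` layout shown above; the `partition_from_path` helper is a hypothetical name used only for illustration.

```python
from pathlib import PurePosixPath


def partition_from_path(path):
    """Recover year and month from a data/<year>/<month>/file layout."""
    parts = PurePosixPath(path).parts
    # parts looks like ('data', '2023', '01', 'data.parquet')
    return {"year": int(parts[-3]), "month": int(parts[-2])}


paths = [
    "data/2023/01/data.parquet",
    "data/2023/02/data.parquet",
    "data/2024/01/data.parquet",
]
partitions = [partition_from_path(p) for p in paths]
print(partitions)
```

The code has to know that the second-to-last folder is the month; nothing in the path itself says so.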

Hive-style partitioning solves this by encoding column values in directory names:

data/
├── year=2023/
│   ├── month=01/data.parquet
│   └── month=02/data.parquet
└── year=2024/
    └── month=01/data.parquet

DuckDB can both write and read Hive-partitioned datasets automatically.

Writing partitioned data:

Use PARTITION_BY to create a Hive-partitioned dataset:

```python
import duckdb
import pandas as pd

# Create sample data
orders = pd.DataFrame({
    "order_id": range(1, 7),
    "product": ["A", "B", "C", "A", "B", "C"],
    "amount": [100, 200, 150, 120, 180, 160],
    "year": [2023, 2023, 2023, 2024, 2024, 2024],
    "quarter": ["Q1", "Q1", "Q2", "Q1", "Q1", "Q2"]
})

# Write as partitioned Parquet
duckdb.sql("""
    COPY orders TO '/tmp/orders_partitioned'
    (FORMAT parquet, PARTITION_BY (year, quarter))
""")

print("Partitioned dataset written to /tmp/orders_partitioned/")
```

Verify the directory structure:

```python
import os

for root, dirs, files in os.walk("/tmp/orders_partitioned"):
    level = root.replace("/tmp/orders_partitioned", "").count(os.sep)
    indent = "  " * level
    print(f"{indent}{os.path.basename(root)}/")
    for file in files:
        print(f"{indent}  {file}")
```

💡 What the output shows
DuckDB created directories like year=2023/quarter=Q1/. Each partition folder contains only rows matching those values.

Quiz

What does PARTITION_BY (year, quarter) do when writing data?

A
Sorts the data by year and quarter

B
Filters out rows without year or quarter values

C
Creates separate folders for each unique combination of year and quarter

⚠ Try Again
Not quite. PARTITION_BY organizes data into folders, not sorting within a file.

⚠ Try Again
Not quite. PARTITION_BY doesn’t filter data. It organizes all rows into folders based on their column values.

💡 Correct
Correct! PARTITION_BY creates a folder structure like year=2023/quarter=Q1/ for each unique combination of values.

Reading partitioned data:

Now read the data back. DuckDB auto-detects the column=value folder pattern:

```python
# Read with automatic partition detection
result = duckdb.sql("""
    SELECT *
    FROM read_parquet('/tmp/orders_partitioned/*/*/*.parquet')
""").df()

print("Data with partition columns extracted:")
print(result)
```

💡 What the output shows
Notice year and quarter columns are included in the results. DuckDB extracted these from the directory names without any extra code.

Quiz

Why partition large datasets into folders like year=2023/quarter=Q1/?

A
It compresses files better

B
It makes files easier to rename

C
It allows queries to skip irrelevant partitions and read less data

⚠ Try Again
Not quite. Partitioning organizes files into folders but doesn’t affect compression within files.

⚠ Try Again
Not quite. The main benefit is query performance, not file management.

💡 Correct
Correct! When you filter by year=2024, DuckDB only reads files in the year=2024/ folder, skipping all other years entirely.
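The idea behind partition pruning can be sketched in plain Python: parse the `key=value` folder names and discard paths that cannot match the filter before opening any file. This is a conceptual illustration, not DuckDB's implementation; the helper names and the `data_0.parquet` file names are assumptions made for the example.

```python
from pathlib import PurePosixPath


def hive_keys(path):
    """Parse key=value directory names into a dict of partition values."""
    pairs = (part.split("=", 1) for part in PurePosixPath(path).parts if "=" in part)
    return {key: value for key, value in pairs}


def prune(paths, **filters):
    """Keep only the files whose partition values match every filter."""
    return [
        p for p in paths
        if all(hive_keys(p).get(k) == str(v) for k, v in filters.items())
    ]


paths = [
    "orders/year=2023/quarter=Q1/data_0.parquet",
    "orders/year=2023/quarter=Q2/data_0.parquet",
    "orders/year=2024/quarter=Q1/data_0.parquet",
    "orders/year=2024/quarter=Q2/data_0.parquet",
]
to_read = prune(paths, year=2024)
print(to_read)
```

Half of the files are ruled out from their paths alone, without reading a single byte of Parquet data.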


Exporting Data
DuckDB can export query results directly to CSV, Parquet, and JSON files without converting to pandas first. This avoids memory overhead for large datasets.

Export to CSV:

Use .write_csv() to save query results directly to a CSV file:

```python
import duckdb
import pandas as pd

# Create sample data
sales = pd.DataFrame({
    "product": ["Laptop", "Phone", "Tablet"],
    "region": ["North", "South", "East"],
    "revenue": [50000, 35000, 28000]
})

# Export query results directly to CSV
duckdb.sql("""
    SELECT product, revenue
    FROM sales
    WHERE revenue > 30000
    ORDER BY revenue DESC
""").write_csv("/tmp/high_revenue.csv")

# Verify the export
print("Exported CSV contents:")
print(open("/tmp/high_revenue.csv").read())
```

Export to Parquet:

Use .write_parquet() for columnar storage with optional compression:

```python
# Export to Parquet with compression
duckdb.sql("""
    SELECT *
    FROM sales
""").write_parquet("/tmp/sales_export.parquet", compression="snappy")

# Verify by reading it back
result = duckdb.sql("FROM '/tmp/sales_export.parquet'").df()
print("Exported Parquet contents:")
print(result)
```

💡 What the output shows
Both .write_csv() and .write_parquet() export query results directly. No pandas conversion needed.

Using COPY statement:

The COPY statement provides more control over export options:

DELIMITER '|' – Custom delimiters (comma, pipe, tab)
HEADER true – Include or exclude header row
COMPRESSION 'gzip' – Compress output (gzip, zstd, snappy)
DATEFORMAT '%Y-%m-%d' – Custom date formatting
NULL 'NA' – Custom null value representation

Here’s an example using a custom delimiter and header option together.

```python
import duckdb

conn = duckdb.connect()
conn.execute("CREATE TABLE export_demo AS SELECT * FROM sales")

# COPY with options
conn.execute("""
    COPY export_demo TO '/tmp/export_with_options.csv'
    (HEADER true, DELIMITER '|')
""")

print("Exported with pipe delimiter:")
print(open("/tmp/export_with_options.csv").read())
```

💡 What the output shows
The output uses pipe (|) delimiters instead of commas, as specified by DELIMITER '|'.

Quiz

What’s the advantage of .write_parquet() over converting to pandas first?

A
It avoids creating an intermediate DataFrame in memory

B
Parquet files are smaller when created by DuckDB

C
It provides faster export performance

💡 Correct
Correct! .write_parquet() streams results directly to the file. For large datasets, this avoids the memory overhead of creating a pandas DataFrame as an intermediate step.

⚠ Try Again
Not quite. The file size depends on the data and compression settings, not which tool creates the file.

⚠ Try Again
Not quite. Both methods have similar export performance.


Creating Lists, Structs, and Maps
DuckDB supports rich nested data types that go beyond traditional SQL databases. You can create and manipulate LISTs, STRUCTs, and MAPs directly in SQL.

LIST – ordered collection of values:

Unlike traditional SQL arrays, DuckDB lists use familiar Python-style square brackets.

```python
import duckdb

# Create a list
result = duckdb.sql("""
    SELECT [1, 2, 3, 4, 5] AS numbers,
           ['apple', 'banana', 'cherry'] AS fruits
""").df()

print(result)
```

Quiz

What type of values can a DuckDB list contain?

A
Only numbers

B
Only strings

C
Any single type (numbers, strings, etc.)

⚠ Try Again
Not quite. Lists can contain strings, booleans, and other types too.

⚠ Try Again
Not quite. Lists can contain numbers, booleans, and other types too.

💡 Correct
Correct! A DuckDB list can contain any type, but all elements must be the same type within a single list.

STRUCT – named fields (like a dictionary):

Structs require the same field names. DuckDB throws an error if fields don’t match.

```python
# All rows must have the same fields
result = duckdb.sql("""
    SELECT {'name': 'Alice', 'age': 30} AS person
    UNION ALL
    SELECT {'name': 'Bob', 'age': 25}
""").df()

print(result)
```

💡 What the output shows
Both rows have identical fields: name and age. Trying to UNION structs with different field names would throw an error.

MAP – key-value pairs:

Use maps when each row needs different keys. The set of keys is not fixed, although all keys must share one type and all values another.

```python
# Each row keeps only its own keys
result = duckdb.sql("""
    SELECT 'Alice' AS name, MAP(['steps'], [10000]) AS metrics
    UNION ALL
    SELECT 'Bob', MAP(['calories'], [2000])
""").df()

print(result)
```

💡 What the output shows
Alice has only steps, Bob has only calories. Maps allow different keys per row without errors.
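A loose Python analogy may help: a STRUCT behaves like a record type with fixed fields, while a MAP behaves like a dictionary whose keys can vary per row. The `Person` class below is a hypothetical stand-in for illustration and is not part of DuckDB.

```python
from typing import NamedTuple


class Person(NamedTuple):
    # Fixed fields: every row carries exactly name and age, like a STRUCT
    name: str
    age: int


people = [Person("Alice", 30), Person("Bob", 25)]

# Free-form keys: each row stores only what it has, like a MAP
metrics = [{"steps": 10000}, {"calories": 2000}]

try:
    Person(name="Carol", age=41, height=170)  # unknown field is rejected
except TypeError as exc:
    struct_error = str(exc)

print(people[0].age, metrics[1].get("steps"))
```

The record type rejects a field it does not know, whereas looking up a missing key in the dictionary simply yields nothing.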

Quiz

You’re storing product attributes where laptops have RAM and shirts have size. Which type should you use?

A
STRUCT, because it groups related fields together

B
MAP, because each product can have different attributes

C
LIST, because attributes are ordered

⚠ Try Again
Not quite. STRUCTs require all rows to have the same fields. Different attributes would cause an error.

💡 Correct
Correct! MAPs let each product store different attributes without requiring a fixed schema.

⚠ Try Again
Not quite. LISTs store ordered values, not key-value pairs.


Manipulating Nested Data
In traditional SQL, working with nested data requires complex joins or custom functions. DuckDB provides native operations: list indexing, Python-style comprehensions, dot notation for structs, and UNNEST to flatten lists.

Access list elements:

You can access elements using 1-based indexing, negative indices for the end, and slicing.

```python
import duckdb

result = duckdb.sql("""
    SELECT
        [10, 20, 30, 40, 50] AS numbers,
        [10, 20, 30, 40, 50][1] AS first,
        [10, 20, 30, 40, 50][-1] AS last,
        [10, 20, 30, 40, 50][2:4] AS slice
""").df()

print(result)
```

💡 What the output shows
Unlike Python’s 0-based indexing, DuckDB lists start at 1. The slice [2:4] includes both endpoints.

Try it

Write the slice to extract [20, 30, 40] from the list:

```python
import duckdb

result = duckdb.sql("""
    SELECT [10, 20, 30, 40, 50][___] AS middle_three
""").df()
print(result)
```

💡 Solution
```sql
SELECT [10, 20, 30, 40, 50][2:4] AS middle_three
```
The slice [2:4] gets elements from index 2 to 4 (inclusive), which are 20, 30, and 40.
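Because DuckDB slices are 1-based and include both endpoints, they map onto Python slices with a shifted start. A small sketch of the conversion; the `duckdb_slice` helper is hypothetical, written only to make the index arithmetic explicit, and it handles positive indices only.

```python
def duckdb_slice(values, start, end):
    """Emulate a 1-based, end-inclusive slice [start:end] for positive indices."""
    # 1-based start -> subtract 1; the inclusive end equals Python's exclusive end
    return values[start - 1:end]


numbers = [10, 20, 30, 40, 50]
middle_three = duckdb_slice(numbers, 2, 4)
print(middle_three)  # [20, 30, 40]
print(numbers[1:4])  # the equivalent native Python slice
```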

Transform lists with list comprehensions:

You can transform lists directly in SQL using Python-style comprehensions.

```python
# Python-like list comprehensions in SQL
result = duckdb.sql("""
    SELECT
        [x * 2 FOR x IN [1, 2, 3, 4, 5]] AS doubled,
        [x FOR x IN [1, 2, 3, 4, 5] IF x > 2] AS filtered
""").df()

print(result)
```

💡 What the output shows
The doubled column multiplies each element by 2, while filtered keeps only values greater than 2. No Python code needed.
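For comparison, the equivalent Python comprehensions mirror the SQL closely:

```python
values = [1, 2, 3, 4, 5]

# SQL: [x * 2 FOR x IN [1, 2, 3, 4, 5]]
doubled = [x * 2 for x in values]

# SQL: [x FOR x IN [1, 2, 3, 4, 5] IF x > 2]
filtered = [x for x in values if x > 2]

print(doubled)
print(filtered)
```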

Quiz

How do you filter a list to keep only values greater than 5?

A
[x FOR x IN list WHERE x > 5]

B
[x FOR x IN list IF x > 5]

C
list.filter(x > 5)

⚠ Try Again
Not quite. List comprehensions use IF for filtering, not WHERE.

💡 Correct
Correct! DuckDB list comprehensions use IF for filtering, following Python’s syntax.

⚠ Try Again
Not quite. DuckDB uses SQL-style list comprehensions, not method chaining.

Access struct fields:

DuckDB lets you access struct fields using dot notation, just like object properties.

```python
result = duckdb.sql("""
    SELECT
        {'name': 'Bob', 'score': 95} AS student,
        {'name': 'Bob', 'score': 95}.name AS student_name,
        {'name': 'Bob', 'score': 95}.score AS student_score
""").df()

print(result)
```

💡 What the output shows
Dot notation extracts Bob from student.name and 95 from student.score. No need to parse JSON or use special functions.

Quiz

What does {'name': 'Alice', 'score': 90}.score return?

A
{'score': 90}

B
'score'

C
90

⚠ Try Again
Not quite. Dot notation extracts the value, not a nested struct.

⚠ Try Again
Not quite. Dot notation extracts the value, not the field name.

💡 Correct
Correct! Dot notation extracts the value 90 from the score field.

Unnest lists to rows:

The UNNEST function expands list elements into individual rows for row-by-row analysis.

```python
import pandas as pd
import duckdb

# Data with lists
data = pd.DataFrame({
    "id": [1, 2],
    "tags": [["python", "sql"], ["data", "analytics", "ml"]]
})

# Expand lists to individual rows
result = duckdb.sql("""
    SELECT id, UNNEST(tags) AS tag
    FROM data
""").df()

print(result)
```

💡 What the output shows
Each tag from the original lists becomes its own row. ID 1 had 2 tags, so it appears twice. ID 2 had 3 tags, so it appears three times.
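In plain Python terms, unnesting corresponds to a nested loop that repeats the row's other values once per list element. A minimal sketch of the same transformation, using a list of dictionaries in place of the DataFrame:

```python
rows = [
    {"id": 1, "tags": ["python", "sql"]},
    {"id": 2, "tags": ["data", "analytics", "ml"]},
]

# One output row per (id, tag) pair, repeating id for each tag
unnested = [(row["id"], tag) for row in rows for tag in row["tags"]]
for pair in unnested:
    print(pair)
```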

Quiz

How many rows does SELECT UNNEST([1, 2, 3]) return?

A
1 row with a list

B
3 rows with individual values

C
0 rows

⚠ Try Again
Not quite. UNNEST expands the list into separate rows, not a single row.

💡 Correct
Correct! UNNEST creates one row for each element in the list.

⚠ Try Again
Not quite. UNNEST returns one row per list element, so a 3-element list returns 3 rows.


Parameterized Queries
When working with databases, you often need to run similar queries with different parameters. For instance, you might want to filter a table using various criteria.

First, let’s create a sample products table:

```python
import duckdb

conn = duckdb.connect(":memory:")
conn.sql("""
    CREATE TABLE products (id INT, name VARCHAR, price DECIMAL)
""")
conn.sql("""
    INSERT INTO products VALUES
    (1, 'Laptop', 999.99),
    (2, 'Phone', 699.99),
    (3, 'Tablet', 449.99),
    (4, 'Watch', 299.99)
""")

print(conn.sql("SELECT * FROM products").df())
```

You might use f-strings to pass parameters to your queries:

```python
min_price = 400
result = conn.sql(
    f"SELECT * FROM products WHERE price > {min_price}"
).df()

print(f"Products over ${min_price}:")
print(result)
```

⚠ Caution
While this works, f-strings are dangerous. A malicious user could:

Input "0; DROP TABLE products; --" to delete your table
Input "0 UNION SELECT * FROM secrets" to steal data
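To see why, it helps to print the SQL text an f-string actually produces. The sketch below only builds strings and runs nothing against a database; `build_query` and `user_input` are hypothetical names standing in for your query code and for untrusted input.

```python
def build_query(min_price):
    """Interpolate the value straight into the SQL text (the unsafe pattern)."""
    return f"SELECT * FROM products WHERE price > {min_price}"


safe_query = build_query(400)
user_input = "0; DROP TABLE products; --"
unsafe_query = build_query(user_input)

print(safe_query)
print(unsafe_query)
```

The second string now contains a second statement: the input has become part of the SQL code instead of staying a value.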

DuckDB provides a safer way with parameterized queries using the ? placeholder:

```python
min_price = 400
result = conn.execute(
    "SELECT * FROM products WHERE price > ?",
    (min_price,)
).df()

print(f"Products over ${min_price}:")
print(result)
```

💡 What the output shows
DuckDB binds 400 to the ? placeholder separately from parsing the SQL. Even if min_price contained malicious SQL, it would be treated as a literal value rather than executed. This protects the query against SQL injection through its parameters.

Try it

Use the ? placeholder to find products under $300:

```python
import duckdb

conn = duckdb.connect(":memory:")
conn.sql("""
    CREATE TABLE products (id INT, name VARCHAR, price DECIMAL)
""")
conn.sql("""
    INSERT INTO products VALUES
    (1, 'Laptop', 999.99),
    (2, 'Phone', 699.99),
    (3, 'Tablet', 449.99),
    (4, 'Watch', 299.99)
""")

max_price = 300
result = conn.execute(
    "SELECT * FROM products WHERE ___",
    ___
).df()
print(result)
```

💡 Solution
```python
"SELECT * FROM products WHERE price < ?",
(max_price,)
```
The ? placeholder gets replaced with the value from the tuple. The trailing comma is required for single-element tuples.

Quiz

If a malicious user sets min_price = "0; DROP TABLE products", what happens with parameterized queries?

A
DuckDB treats the entire string as a literal value, causing a type error

B
The products table gets deleted

C
DuckDB ignores the input and uses a default value

💡 Correct
Correct! The malicious string is treated as a literal value to compare against price. Since it’s not a valid number, the query fails safely without executing any DROP command.

⚠ Try Again
Not quite. That would happen with f-strings. Parameterized queries prevent the injected SQL from being executed as code.

⚠ Try Again
Not quite. DuckDB doesn’t silently replace bad input. It processes the input as a literal value, which would cause a type mismatch error.


ACID Transactions
DuckDB supports ACID transactions for data integrity:

Atomicity: The transaction either completes entirely or has no effect at all. If any operation fails, all changes are rolled back.
Consistency: The database maintains valid data by enforcing all rules and constraints throughout the transaction.
Isolation: Transactions run independently without interfering with each other.
Durability: Committed changes are permanent and survive system failures.

Let’s demonstrate ACID properties with a bank transfer. First, set up the accounts table with three customers:

```python
import duckdb

conn = duckdb.connect(":memory:")
conn.sql("""
    CREATE TABLE accounts (id INT, name VARCHAR, balance DECIMAL)
""")
conn.sql("""
    INSERT INTO accounts VALUES
    (1, 'Alice', 500), (2, 'Bob', 500), (3, 'Charlie', 500)
""")

print("Initial balances:")
print(conn.sql("SELECT * FROM accounts").df())
```

Now define a transfer function that checks the balance before transferring. If funds are insufficient, it rolls back the transaction:

```python
def transfer_money(from_id, to_id, amount):
    conn.sql("BEGIN TRANSACTION")

    # Check balance before transfer
    balance = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (from_id,)
    ).fetchone()[0]

    if balance >= amount:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, from_id)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, to_id)
        )
        conn.sql("COMMIT")
        print(f"Transfer of ${amount} completed successfully")
    else:
        conn.sql("ROLLBACK")
        print(f"Insufficient funds: balance is ${balance}")
```

In the code above:

BEGIN TRANSACTION starts a transaction block whose changes stay invisible to other connections until they are committed
COMMIT permanently saves all changes made within the transaction
ROLLBACK cancels all changes and restores the database to its state before the transaction began

Now let’s perform a valid transfer of $200 from Alice to Bob:

```python
transfer_money(1, 2, 200)

print("\nBalances after valid transfer:")
print(conn.sql("SELECT * FROM accounts").df())
```

💡 What the output shows
Alice’s balance dropped from $500 to $300, and Bob’s increased from $500 to $700. This demonstrates two ACID properties:

Atomicity: Both updates executed as a single unit
Consistency: Total balance stayed at $1500

Now let’s attempt an invalid transfer. Bob tries to send $1000 to Charlie, but he only has $700:

```python
transfer_money(2, 3, 1000)

print("\nBalances after failed transfer (should be unchanged):")
print(conn.sql("SELECT * FROM accounts").df())
```

💡 What the output shows
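The transfer was rejected because of insufficient funds. The balance check failed before any UPDATE ran, so ROLLBACK ended the transaction with nothing to undo: Bob still has $700 and Charlie still has $500. This illustrates two more ACID properties:

Atomicity: Either all operations succeed, or none do
Durability: The committed transfer from earlier is unaffected by the later rollback. Note that this demo uses an in-memory database, so the data lasts only as long as the connection; surviving a restart requires a file-backed database.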

Quiz

In the bank transfer example, what does Atomicity guarantee?

A. Transfers happen instantly without delay
B. Either both accounts are updated, or neither is
C. Account balances are encrypted during transfer

⚠ Try Again (A)
Not quite. Atomicity isn’t about speed. It’s about ensuring operations complete as a single unit.

💡 Correct (B)
Correct! Atomicity means the two UPDATE statements (debit and credit) succeed together or fail together. You’ll never have money deducted from Alice without it arriving in Bob’s account.

⚠ Try Again (C)
Not quite. Atomicity isn’t about encryption or security. It ensures all operations in a transaction complete as one indivisible unit.


Attach External Databases
DuckDB can connect to external databases and query them as if they were local tables. This enables federated queries across PostgreSQL, MySQL, SQLite, and DuckDB files.

Attach a SQLite database:

Querying a SQLite database is straightforward. Just install the extension, attach the file, and run SQL.

```python
import duckdb

conn = duckdb.connect()

# Install and load the SQLite extension
conn.execute("INSTALL sqlite")
conn.execute("LOAD sqlite")

# Attach SQLite database
conn.execute("ATTACH 'my_sqlite.db' AS sqlite_db (TYPE sqlite)")

# Query SQLite tables directly
result = conn.sql("""
    SELECT * FROM sqlite_db.users
    WHERE created_at > '2024-01-01'
""").df()
```

Quiz

What operations does DuckDB support on attached SQLite databases?

A. Read-only queries
B. Both read and write operations
C. Only SELECT and JOIN queries

⚠ Try Again (A)
Not quite. DuckDB can do more than just read from SQLite databases.

💡 Correct (B)
Correct! DuckDB supports both read and write operations on attached SQLite databases, including creating tables, inserting data, and modifying schemas.

⚠ Try Again (C)
Not quite. DuckDB supports full read and write operations, not just queries.

Attach a PostgreSQL database:

Connecting to PostgreSQL is just as easy. Simply install the extension and provide your connection details.

```python
import duckdb

conn = duckdb.connect()

# Install and load the PostgreSQL extension
conn.execute("INSTALL postgres")
conn.execute("LOAD postgres")

# Attach PostgreSQL database
conn.execute("""
    ATTACH 'host=localhost dbname=mydb user=postgres password=secret'
    AS pg_db (TYPE postgres)
""")

# Query Postgres tables with DuckDB's speed
result = conn.sql("""
    SELECT region, SUM(revenue) as total
    FROM pg_db.sales
    GROUP BY region
""").df()
```

When a query includes filters, DuckDB can push the WHERE condition down to PostgreSQL. Only matching rows travel back over the network, and DuckDB then runs the aggregation on the rows it receives.

The query below adds a filter on year to show this:

```python
# Step 1: PostgreSQL filters rows where year = 2024
# Step 2: DuckDB runs SUM and GROUP BY locally
result = conn.sql("""
    SELECT region, SUM(revenue) as total
    FROM pg_db.sales
    WHERE year = 2024
    GROUP BY region
""").df()
```

The diagram below illustrates the network optimization:

DuckDB                            PostgreSQL
  │                                    │
  │──── WHERE year = 2024 ────────────>│
  │                                    │  (filters 1M → 10K rows)
  │<─── 10K matching rows ─────────────│
  │                                    │
  │  (runs SUM, GROUP BY locally)      │

Quiz

When you query a PostgreSQL table with a WHERE clause, what does DuckDB do?

A. Transfers all rows, then filters locally
B. Sends the WHERE condition to PostgreSQL, transfers only matching rows
C. Converts the PostgreSQL table to Parquet first

⚠ Try Again (A)
Not quite. DuckDB optimizes by pushing filters to PostgreSQL to reduce data transfer.

💡 Correct (B)
Correct! DuckDB sends the WHERE condition to PostgreSQL, so only matching rows travel over the network.

⚠ Try Again (C)
Not quite. DuckDB queries PostgreSQL directly without format conversion.

Federated queries – join across databases:

DuckDB’s real power shows when joining across databases. Simply attach each source and reference tables with their database prefix.

```python
import duckdb

conn = duckdb.connect()

# Install and load extensions
conn.execute("INSTALL sqlite; INSTALL postgres")
conn.execute("LOAD sqlite; LOAD postgres")

# Attach multiple sources
conn.execute("ATTACH 'users.db' AS sqlite_db (TYPE sqlite)")
conn.execute("ATTACH 'postgres://localhost/sales' AS pg_db (TYPE postgres)")

# Join data across SQLite, Postgres, and local Parquet
result = conn.sql("""
    SELECT
        u.name,
        s.total_orders,
        p.product_count
    FROM sqlite_db.users u
    JOIN pg_db.order_summary s ON u.id = s.user_id
    JOIN read_parquet('products.parquet') p ON u.id = p.user_id
""").df()
```

💡 What the output shows
A single query joins data from SQLite, PostgreSQL, and a Parquet file. DuckDB handles the complexity of fetching rows from each source and joining them.

Quiz

To join a SQLite table with a PostgreSQL table in DuckDB, you need to:

A. Export both tables to Parquet first
B. Attach both databases and reference tables with their aliases
C. Create foreign key relationships between the databases

⚠ Try Again (A)
Not quite. DuckDB can query SQLite and PostgreSQL directly without exporting to intermediate formats.

💡 Correct (B)
Correct! After attaching databases with aliases, you reference tables as alias.table_name in your query.

⚠ Try Again (C)
Not quite. DuckDB handles cross-database joins without requiring foreign key definitions between sources.

Save attached database locally:

To avoid repeated network calls, you can create a local copy of remote tables.

```python
import duckdb

conn = duckdb.connect()
# Assumes pg_db has already been attached on this connection
# (see "Attach a PostgreSQL database")

# Copy remote data once
conn.execute("CREATE TABLE local_sales AS SELECT * FROM pg_db.sales WHERE year = 2024")

# Run multiple queries on local data (no network calls)
total = conn.sql("SELECT SUM(revenue) FROM local_sales").fetchone()[0]
count = conn.sql("SELECT COUNT(*) FROM local_sales").fetchone()[0]
avg = conn.sql("SELECT AVG(revenue) FROM local_sales").fetchone()[0]
```

💡 What the output shows
Only the first query hits the network. The three subsequent queries run entirely on local data, avoiding repeated round trips to PostgreSQL.


Key Takeaways
DuckDB empowers data scientists with several key advantages:

Zero-configuration SQL querying without database server setup
Seamless integration with pandas and Polars DataFrames
Out-of-core processing for datasets larger than available RAM
Friendly SQL syntax with shortcuts like GROUP BY ALL and EXCLUDE
Direct querying of cloud storage (S3, Azure, GCS)
Reading and writing Hive-partitioned datasets
Rich nested data types (LIST, STRUCT, MAP) with list comprehensions
Federated queries across PostgreSQL, MySQL, and SQLite
ACID transaction support ensuring data integrity
High-performance columnar-vectorized execution engine

With DuckDB, you can unlock a new level of productivity and efficiency in your data analysis workflows.

💡 Next steps
Try DuckDB on your own datasets! Start by replacing pandas queries with DuckDB SQL and compare the performance.


DuckDB for Data Scientists Read More »


Work with Khuyen Tran
