Entity Extraction with spaCy and LLMs


What is Entity Extraction?
Entity extraction (also called Named Entity Recognition or NER) automatically identifies and classifies key information in unstructured text. For instance, financial reports contain company names, monetary figures, executive names, dates, and locations, all of which feed competitive analysis and executive tracking.

Extracting these entities manually is time-consuming and error-prone. Automated entity extraction provides a faster and more reliable alternative.

In this course, you’ll learn three modern tools for entity extraction:

spaCy: Production-ready NER with pre-trained models
GLiNER: Zero-shot custom entity recognition
langextract: AI-powered extraction with source grounding

Sample Document
Throughout this course, we’ll extract entities from this earnings report.

earning_report = """
Apple Inc. (NASDAQ: AAPL) reported third quarter revenue of $81.4 billion,
up 2% year over year. CEO Tim Cook stated that Services revenue reached
a new all-time high of $21.2 billion. The company's board of directors
declared a cash dividend of $0.24 per share.

CFO Luca Maestri mentioned that iPhone revenue was $39.3 billion for
the quarter ending June 30, 2023. The company expects total revenue
between $89 billion and $93 billion for the fourth quarter.

Apple's Cupertino headquarters announced the acquisition of AI startup
WaveOne for an undisclosed amount. The deal is expected to close in
Q4 2023, pending regulatory approval from the SEC.
"""

print("Earnings report loaded!")
print(f"Document length: {len(earning_report)} characters")

We chose this report because it’s dense with overlapping entity types, which is exactly what makes real-world extraction challenging:

Monetary amounts appear in different contexts: revenue ($81.4B), dividends ($0.24), and forecasted ranges ($89B-$93B)
Named entities overlap: the same company appears both by name (“Apple Inc.”) and by stock ticker (AAPL), and “SEC” is an abbreviation that needs context to identify
Temporal references mix formats: exact dates (June 30, 2023), quarters (Q4 2023), and relative time (year over year)

Why Not Use Regex?
Regular expressions define text patterns using special syntax to find matches in strings. While they may seem like a natural first choice for entity extraction, they require a separate pattern for each entity type and fail when formats vary.

Here’s what extracting financial amounts, dates, stock symbols, and quarters with regex looks like:

import re

earning_report = """
Apple Inc. (NASDAQ: AAPL) reported third quarter revenue of $81.4 billion,
up 2% year over year. CEO Tim Cook stated that Services revenue reached
a new all-time high of $21.2 billion. CFO Luca Maestri mentioned that
iPhone revenue was $39.3 billion for the quarter ending June 30, 2023.
"""

# Each entity type needs a separate complex pattern
financial_pattern = r"\$(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.[0-9]+)?(?:\s*(?:billion|million|trillion))?"
date_pattern = r"\b(?:January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2},\s+\d{4}"
stock_pattern = r"\b(?:NASDAQ|NYSE|NYSEARCA):\s*[A-Z]{2,5}\b"
quarter_pattern = r"\b(Q[1-4]\s+\d{4})\b"

print("Financial amounts:", re.findall(financial_pattern, earning_report, re.IGNORECASE))
print("Dates:", re.findall(date_pattern, earning_report))
print("Stock symbols:", re.findall(stock_pattern, earning_report))
print("Quarters:", re.findall(quarter_pattern, earning_report))

The code above exposes two limitations:

Each entity type requires its own pattern, resulting in verbose boilerplate code that is difficult to read and maintain.
The patterns only match numeric quarter formats like “Q4 2023” and miss textual forms such as “third quarter” unless additional exact-match patterns are added.
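To make the second limitation concrete, here is a minimal sketch of what covering spelled-out quarters might take. The `textual_quarter_pattern` name and the short sample sentence are illustrative assumptions, not part of the lesson's code; only `quarter_pattern` is copied from above.

```python
import re

# Short sample sentence written for this sketch
text = "Apple reported third quarter revenue. The deal should close in Q4 2023."

# Pattern from the lesson: numeric quarters only
quarter_pattern = r"\b(Q[1-4]\s+\d{4})\b"

# Hypothetical extra pattern for spelled-out quarters
textual_quarter_pattern = r"\b(first|second|third|fourth)\s+quarter\b"

numeric = re.findall(quarter_pattern, text)
textual = re.findall(textual_quarter_pattern, text, re.IGNORECASE)

print("Numeric quarters:", numeric)
print("Textual quarters:", textual)
```

Even with the extra pattern, variants such as “3rd quarter” would still be missed, so each new phrasing needs yet another pattern.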

Quiz

A document contains dates in formats like “January 15, 2024”, “15/01/2024”, and “2024-01-15”. What challenge does regex face here?

A. Regex cannot match numeric characters
B. Each date format requires a separate pattern, making the code harder to maintain as formats increase
C. Regex patterns are limited to 100 characters in length

Feedback

A. Not quite. Regex handles numeric characters easily with patterns like \d. The challenge is handling multiple format variations.
B. Correct! Each date format (ISO, US, European, written) needs its own pattern. As formats multiply, the codebase grows harder to maintain and test.
C. Not quite. Regex patterns have no practical length limit. The challenge is writing and maintaining patterns for every format variation.
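The quiz scenario can be sketched in a few lines. The pattern names (`written`, `slash`, `iso`) and the sample sentence are assumptions made for illustration. The point is only that each format needs its own entry.

```python
import re

# Sample sentence containing the three date formats from the quiz
text = "Filed January 15, 2024; amended 15/01/2024; effective 2024-01-15."

months = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# One hypothetical pattern per date format
date_patterns = {
    "written": rf"\b(?:{months})\s+\d{{1,2}},\s+\d{{4}}\b",
    "slash": r"\b\d{1,2}/\d{1,2}/\d{4}\b",
    "iso": r"\b\d{4}-\d{2}-\d{2}\b",
}

matches = {name: re.findall(p, text) for name, p in date_patterns.items()}
for name, found in matches.items():
    print(f"{name}: {found}")
```

Supporting a fourth format means adding a fourth pattern and testing it against the existing ones for accidental overlaps.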

Production-Grade Named Entity Recognition
spaCy provides pre-trained models that automatically identify entities like PERSON, ORG, MONEY, DATE, and PERCENT from context. No pattern writing required.

Let’s install spaCy and download a small English model to get started:

pip install spacy
python -m spacy download en_core_web_sm

Extracting entities with spaCy takes just two steps:

Load the model
Process your text

import spacy

# Load the model
nlp = spacy.load("en_core_web_sm")

# Process your text
sample_text = "Apple Inc. reported revenue of $81.4 billion with CEO Tim Cook."
doc = nlp(sample_text)

print("Entities found:")
for ent in doc.ents:
    print(f" '{ent.text}' -> {ent.label_}")

💡 What the output shows

spaCy extracted three entity types (ORG, MONEY, PERSON) without any configuration
The model understood that “Apple Inc.” is a company, not just a fruit
It captured the complete monetary amount “$81.4 billion” including the unit
Person names are recognized even without titles like “CEO”

How spaCy NER Works

spaCy labels each token individually using its BILUO tagging scheme, then groups consecutive entity tokens into spans:

 "Apple"  "Inc."   "CEO"  "Tim"  "Cook"    "$81.4"   "billion"
    │        │       │      │       │         │          │
    ▼        ▼       ▼      ▼       ▼         ▼          ▼
  B-ORG    L-ORG     O    B-PER   L-PER    B-MONEY    L-MONEY
    └───┬────┘              └───┬───┘         └────┬─────┘
        ▼                       ▼                  ▼
"Apple Inc." → ORG     "Tim Cook" → PERSON   "$81.4 billion" → MONEY

Begin / Inside / Last mark multi-token entities
Unit marks single-token entities (e.g., “London” → U-LOC)
O means outside any entity

The model learns these tagging patterns from thousands of labeled examples during training.
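The grouping step in the diagram can be sketched in plain Python. This is a simplified illustration, not spaCy's internal implementation: the `biluo_to_spans` helper is hypothetical, it joins tokens with single spaces, and the token split and `PER` tags simply mirror the diagram (spaCy's real tokenizer and label names may differ).

```python
def biluo_to_spans(tokens, tags):
    """Group BILUO-tagged tokens into (text, label) entity spans."""
    spans = []
    current = []  # tokens of the multi-token entity being built
    for token, tag in zip(tokens, tags):
        if tag == "O":  # outside any entity
            current = []
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "U":  # single-token entity
            spans.append((token, label))
        elif prefix == "B":  # first token of a multi-token entity
            current = [token]
        elif prefix == "I":  # middle token
            current.append(token)
        elif prefix == "L":  # last token closes the span
            current.append(token)
            spans.append((" ".join(current), label))
            current = []
    return spans


# Tokens and tags copied from the diagram above
tokens = ["Apple", "Inc.", "CEO", "Tim", "Cook", "$81.4", "billion"]
tags = ["B-ORG", "L-ORG", "O", "B-PER", "L-PER", "B-MONEY", "L-MONEY"]

for text, label in biluo_to_spans(tokens, tags):
    print(f"{text!r} -> {label}")
```

The sketch assumes the tags are well formed. A real decoder would also need to handle sequences such as an `I-` tag with no preceding `B-`.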

Quiz

How does spaCy determine that “Apple Inc.” is an ORG entity?

A. It matches against a built-in dictionary of known company names
B. It uses regex to match common organization name patterns
C. The pre-trained model learned patterns from labeled training data

Feedback

A. Not quite. spaCy doesn’t use a fixed lookup table. It uses a statistical model that can recognize entities it has never seen before based on learned patterns.
B. Not quite. Regex uses fixed text patterns. spaCy’s NER model uses neural networks trained on annotated text to predict entity types from context.
C. Correct! spaCy’s NER is a statistical model trained on annotated text. It learned patterns like capitalization, surrounding words, and name structures from its training data, not from a fixed list or regex rules.

Zero-Shot Custom Entity Extraction
GLiNER solves spaCy’s limitation of fixed entity types. Instead of being locked into categories like ORG or GPE, GLiNER lets you define custom types using natural language descriptions.

pip install gliner

GLiNER offers several pretrained models. We’ll use gliner_small-v2.1 with threshold=0.3 to capture entities with at least 30% confidence:

Python

from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")

test_text = "Apple Inc. CEO Tim Cook announced quarterly revenue of $81.4 billion."
custom_types = ["Company", "Person", "Currency"]

entities = model.predict_entities(test_text, custom_types, threshold=0.3)

for entity in entities:
    print(f"'{entity['text']}' -> {entity['label']} (confidence: {entity['score']:.3f})")

💡 What the output shows

GLiNER recognized custom entity types without any training
Confidence scores vary: “Tim Cook” (0.563) scores highest, likely because personal names are distinctive, while “$81.4 billion” (0.310) barely clears the 0.3 threshold, likely because “Currency” is a less common label

📝 Other model options
For higher accuracy, try gliner_medium-v2.1. For multilingual support, use gliner_multi-v2.1.

How GLiNER Works

Instead of tagging individual tokens, GLiNER scores entire spans against every label you provide. The highest-scoring label wins, and spans below your threshold are filtered out:

┌───────────────┬──────────┬──────────────────┐
│ Span          │ Label    │ Confidence       │
├───────────────┼──────────┼──────────────────┤
│ Apple Inc     │ Company  │ ████░░░░░░░ 0.36 │ ✓ above 0.3
│ Apple Inc     │ Person   │ █░░░░░░░░░░ 0.05 │ ✗
├───────────────┼──────────┼──────────────────┤
│ Tim Cook      │ Company  │ █░░░░░░░░░░ 0.04 │ ✗
│ Tim Cook      │ Person   │ ██████░░░░░ 0.56 │ ✓ above 0.3
├───────────────┼──────────┼──────────────────┤
│ $81.4 billion │ Company  │ ░░░░░░░░░░░ 0.01 │ ✗
│ $81.4 billion │ Currency │ ███░░░░░░░░ 0.31 │ ✓ above 0.3
└───────────────┴──────────┴──────────────────┘
                threshold = 0.3 ▲

This gives you two controls spaCy doesn’t: custom labels (any text, not a fixed set) and a confidence threshold to filter results.
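The selection step can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not GLiNER's implementation: the function select_entities is hypothetical, the scores are copied from the diagram above rather than computed by the model, and the real library also has to enumerate candidate spans and resolve overlaps.

```python
def select_entities(span_scores, threshold=0.3):
    """Keep each span's best-scoring label if that score reaches the threshold."""
    selected = []
    for span, label_scores in span_scores.items():
        best_label = max(label_scores, key=label_scores.get)
        best_score = label_scores[best_label]
        if best_score >= threshold:
            selected.append({"text": span, "label": best_label, "score": best_score})
    return selected


# Scores copied from the diagram (assumed values, not produced by running the model)
span_scores = {
    "Apple Inc": {"Company": 0.36, "Person": 0.05},
    "Tim Cook": {"Company": 0.04, "Person": 0.56},
    "$81.4 billion": {"Company": 0.01, "Currency": 0.31},
}

for entity in select_entities(span_scores, threshold=0.3):
    print(f"'{entity['text']}' -> {entity['label']} ({entity['score']:.2f})")

# A stricter threshold keeps fewer, more confident entities
strict = select_entities(span_scores, threshold=0.5)
print([entity["text"] for entity in strict])
```

With these numbers, a threshold of 0.3 keeps all three spans, while 0.5 keeps only “Tim Cook”. That is the trade-off the threshold controls: lower values catch more entities, higher values keep only the confident ones.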

Quiz

How does GLiNER decide which label to assign to a text span?

A
It picks the first label in your list that partially matches

B
It scores the span against every label and picks the highest

C
It uses a dictionary lookup to map known words to labels

⚠ Try Again
Not quite. The order of labels in your list doesn’t affect the result. GLiNER evaluates all labels equally for each span.

💡 Correct
Correct! As shown in the diagram, each span is scored against every label. “Apple Inc” scored 0.36 for Company and 0.05 for Person (plus 0.02 for Currency, which the diagram omits), so the highest score (Company) wins.

⚠ Try Again
Not quite. GLiNER doesn’t use a fixed dictionary. It uses a BERT-like encoder to compare text spans against label descriptions semantically.

Extracting Business Entities

Exercise: Parse Business Metrics

Using Confidence Scores for Quality Control

Exercise: Route Low-Confidence to Review

AI-Powered Extraction with Source Grounding
langextract uses large language models (Gemini, GPT) to understand entity relationships and provide source attribution.

It captures semantic context like “AI startup WaveOne” (category + name) and “between $89 billion and $93 billion” (revenue ranges) as complete phrases rather than separate pieces.

Let’s install langextract along with its dependencies to try it out:

pip install langextract python-dotenv google-genai

To authenticate, add your API key to a .env file. This course uses Gemini (get a key from AI Studio), but OpenAI models also work:

# .env file
LANGEXTRACT_API_KEY=your-api-key-here

langextract uses an LLM to extract entities. You provide examples that teach the model what to look for and how to format the output:

Example (you provide):
┌─────────────────────────────────────────────────────┐
│  Text: "Microsoft Corp. CEO Satya Nadella reported  │
│         Q2 2024 revenue of $65B"                    │
│                                                     │
│  Extractions:                                       │
│    company   → "Microsoft Corp."                    │
│    executive → "CEO Satya Nadella"  ← role + name   │
│    quarter   → "Q2 2024"                            │
│    financial → "$65B"                               │
└──────────────────────┬──────────────────────────────┘
                       │ teaches format
                       ▼
  New text: "Apple Inc… CEO Tim Cook… $81.4 billion"
                       │
                       ▼
Output (model generates):
┌─────────────────────────────────────────────────────┐
│    company   → "Apple Inc."                         │
│    executive → "CEO Tim Cook"       ← same format   │
│    executive → "CFO Luca Maestri"   ← generalized   │
│    financial → "undisclosed amount" ← semantic      │
└─────────────────────────────────────────────────────┘

The LLM generalizes from your examples. One example showing “CEO Satya Nadella” is enough for it to also extract “CFO Luca Maestri” and to treat “undisclosed amount” as a financial figure, something spaCy and GLiNER would likely miss because the phrase contains no number or currency symbol.
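To make the idea of teaching by example concrete, here is a minimal sketch of how a worked example and a new text could be combined into one few-shot prompt. The function build_few_shot_prompt and the prompt layout are hypothetical and chosen for readability. langextract builds its own prompt internally, and its actual format is not shown in this course.

```python
def build_few_shot_prompt(description, examples, new_text):
    """Assemble a plain-text few-shot prompt (illustrative layout only)."""
    lines = [description.strip(), ""]
    for example in examples:
        lines.append(f"Text: {example['text']}")
        lines.append("Extractions:")
        for entity_class, entity_text in example["extractions"]:
            lines.append(f"  {entity_class} -> {entity_text}")
        lines.append("")
    # The new text ends with an open "Extractions:" slot for the model to fill in
    lines.append(f"Text: {new_text}")
    lines.append("Extractions:")
    return "\n".join(lines)


examples = [
    {
        "text": "Microsoft Corp. CEO Satya Nadella reported Q2 2024 revenue of $65B",
        "extractions": [
            ("company", "Microsoft Corp."),
            ("executive", "CEO Satya Nadella"),  # role + name kept together
            ("quarter", "Q2 2024"),
            ("financial", "$65B"),
        ],
    }
]

prompt = build_few_shot_prompt(
    "Extract business entities: companies, executives, quarters, financial figures.",
    examples,
    "Apple Inc. CEO Tim Cook announced quarterly revenue of $81.4 billion.",
)
print(prompt)
```

Because the example keeps the role and the name in one extraction, a model completing this prompt has a clear pattern to imitate for the new text.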

Few-Shot Learning with Examples

To use langextract, provide two components:

Prompt: A description listing entity types to extract (companies, executives, financial figures)
Examples: Sample text paired with labeled extractions showing expected output

Python

from dotenv import load_dotenv
import langextract as lx

# Load LANGEXTRACT_API_KEY from the .env file
load_dotenv()

def extract_financial_entities(text):
    """Extract entities using langextract."""
    prompt_description = """Extract business entities: companies, executives,
    financial figures, quarters, locations, products, startups,
    regulatory bodies, stock_symbols, market_reaction."""

    # One worked example teaches the model the classes and the output format
    examples = [
        lx.data.ExampleData(
            text="Microsoft Corp. (NYSE: MSFT) CEO Satya Nadella reported Q2 2024 revenue of $65B, down 5% quarter-over-quarter.",
            extractions=[
                lx.data.Extraction(extraction_class="company", extraction_text="Microsoft Corp."),
                lx.data.Extraction(extraction_class="executive", extraction_text="CEO Satya Nadella"),
                lx.data.Extraction(extraction_class="stock_symbol", extraction_text="NYSE: MSFT"),
                lx.data.Extraction(extraction_class="quarter", extraction_text="Q2 2024"),
                lx.data.Extraction(extraction_class="financial_figure", extraction_text="$65B"),
                lx.data.Extraction(extraction_class="market_reaction", extraction_text="down 5% quarter-over-quarter"),
            ]
        )
    ]

    return lx.extract(
        text_or_documents=text,
        prompt_description=prompt_description,
        examples=examples,
        model_id="gemini-2.5-flash"
    )

Now extract entities from the earnings report:

Python

from collections import defaultdict

earning_report = """
Apple Inc. (NASDAQ: AAPL) reported third quarter revenue of $81.4 billion,
up 2% year over year. CEO Tim Cook stated that Services revenue reached
a new all-time high of $21.2 billion. The company's board of directors
declared a cash dividend of $0.24 per share.

CFO Luca Maestri mentioned that iPhone revenue was $39.3 billion for
the quarter ending June 30, 2023. The company expects total revenue
between $89 billion and $93 billion for the fourth quarter.

Apple's Cupertino headquarters announced the acquisition of AI startup
WaveOne for an undisclosed amount. The deal is expected to close in
Q4 2023, pending regulatory approval from the SEC.
"""

result = extract_financial_entities(earning_report)

# Drop extractions that came back with empty text
non_empty = [e for e in result.extractions if e.extraction_text]
print(f"Extracted {len(non_empty)} entities:")

# Group the extracted texts by entity class
grouped = defaultdict(list)
for extraction in non_empty:
    grouped[extraction.extraction_class].append(extraction.extraction_text)

for entity_class, texts in grouped.items():
    print(f"\n{entity_class.upper()} ({len(texts)} found):")
    for text in texts:
        print(f" '{text}'")

💡 What the output shows

Role-linked executives (“CEO Tim Cook”) instead of just the name
Semantic understanding of “undisclosed amount” as a financial figure
Market reaction “up 2% year over year” captured with full context
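Source grounding means every extraction can be traced back to a position in the original text. The sketch below shows the idea with exact substring search. It is a simplified stand-in, not langextract's own alignment logic, and the helper ground_extractions is hypothetical. An extraction that cannot be located is a useful warning sign that the model may have invented it.

```python
def ground_extractions(source_text, extraction_texts):
    """Pair each extracted string with its (start, end) offsets in the source, or None."""
    grounded = []
    for text in extraction_texts:
        start = source_text.find(text)  # exact-match search; -1 means "not found"
        span = (start, start + len(text)) if start != -1 else None
        grounded.append((text, span))
    return grounded


source = (
    "CEO Tim Cook stated that Services revenue reached "
    "a new all-time high of $21.2 billion."
)
# The last candidate does not occur in this sentence, so it cannot be grounded
candidates = ["CEO Tim Cook", "$21.2 billion", "CFO Luca Maestri"]
grounded = ground_extractions(source, candidates)

for text, span in grounded:
    if span is None:
        print(f"'{text}' -> not found in source, flag for review")
    else:
        start, end = span
        print(f"'{text}' -> characters {start}-{end}: {source[start:end]!r}")
```

Exact matching is the strictest possible check. A production pipeline would usually tolerate small differences in whitespace or casing before flagging an extraction.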

Quiz

The example extracts “CEO Satya Nadella” as an executive. How does this affect the model’s output?

A
The model will only extract executives from Microsoft

B
The model learns to include the role (CEO/CFO) with the name

C
The model copies the exact format and ignores other patterns

⚠ Try Again
Not quite. The example teaches a pattern, not a specific company. The model applied the same pattern to extract “CEO Tim Cook” and “CFO Luca Maestri” from Apple’s report.

💡 Correct
Correct! The few-shot example teaches the model what format to use. Since the example linked the role to the name, the model did the same for “CEO Tim Cook” and “CFO Luca Maestri.”

⚠ Try Again
Not quite. The model generalizes from the example. It extracted “CFO Luca Maestri” even though the example only showed a CEO pattern.

langextract extracted “undisclosed amount” as a financial figure. Why would spaCy and GLiNER likely miss this?

A
“undisclosed amount” is too long for token-based models

B
It contains no numbers or currency symbols, the surface cues these models rely on to identify financial entities

C
spaCy and GLiNER can’t process sentences about acquisitions

⚠ Try Again
Not quite. Both spaCy and GLiNER handle multi-token spans. “Cupertino headquarters” was captured as a two-word span by GLiNER.

💡 Correct
Correct! spaCy’s MONEY type and GLiNER’s “Monetary Value” label both lean heavily on numeric cues such as digits and currency symbols. langextract’s LLM understands that “undisclosed amount” refers to money semantically, even without numbers.

⚠ Try Again
Not quite. Both tools can process any text. The issue is that “undisclosed amount” lacks the numeric patterns these models use to identify financial entities.

Exercise: Analyze Customer Feedback

Visualizing Extractions

When to Use Each Tool

DuckDB for Data Scientists

/* CodeMirror 5 CSS (inlined to prevent WordPress stripping) */
.CodeMirror{font-family:’Fira Code’,monospace;height:300px;color:#000;direction:ltr}.CodeMirror-lines{padding:4px 0}.CodeMirror pre.CodeMirror-line,.CodeMirror pre.CodeMirror-line-like{padding:0 4px}.CodeMirror-gutter-filler,.CodeMirror-scrollbar-filler{background-color:#fff}.CodeMirror-gutters{border-right:1px solid #ddd;background-color:#f7f7f7;white-space:nowrap}.CodeMirror-linenumber{padding:0 3px 0 5px;min-width:20px;text-align:right;color:#999;white-space:nowrap}.CodeMirror-guttermarker{color:#000}.CodeMirror-guttermarker-subtle{color:#999}.CodeMirror-cursor{border-left:1px solid #000;border-right:none;width:0}.CodeMirror div.CodeMirror-secondarycursor{border-left:1px solid silver}.cm-fat-cursor .CodeMirror-cursor{width:auto;border:0!important;background:#7e7}.cm-fat-cursor div.CodeMirror-cursors{z-index:1}.cm-fat-cursor .CodeMirror-line::selection,.cm-fat-cursor .CodeMirror-line>span::selection,.cm-fat-cursor .CodeMirror-line>span>span::selection{background:0 0}.cm-fat-cursor .CodeMirror-line::-moz-selection,.cm-fat-cursor .CodeMirror-line>span::-moz-selection,.cm-fat-cursor .CodeMirror-line>span>span::-moz-selection{background:0 0}.cm-fat-cursor{caret-color:transparent}@-moz-keyframes blink{50%{background-color:transparent}}@-webkit-keyframes blink{50%{background-color:transparent}}@keyframes blink{50%{background-color:transparent}}.cm-tab{display:inline-block;text-decoration:inherit}.CodeMirror-rulers{position:absolute;left:0;right:0;top:-50px;bottom:0;overflow:hidden}.CodeMirror-ruler{border-left:1px solid #ccc;top:0;bottom:0;position:absolute}.cm-s-default .cm-header{color:#00f}.cm-s-default .cm-quote{color:#090}.cm-negative{color:#d44}.cm-positive{color:#292}.cm-header,.cm-strong{font-weight:700}.cm-em{font-style:italic}.cm-link{text-decoration:underline}.cm-strikethrough{text-decoration:line-through}.cm-s-default .cm-keyword{color:#708}.cm-s-default .cm-atom{color:#219}.cm-s-default .cm-number{color:#164}.cm-s-default .cm-def{color:#00f}.cm-s-default 
.cm-variable-2{color:#05a}.cm-s-default .cm-type,.cm-s-default .cm-variable-3{color:#085}.cm-s-default .cm-comment{color:#a50}.cm-s-default .cm-string{color:#a11}.cm-s-default .cm-string-2{color:#f50}.cm-s-default .cm-meta{color:#555}.cm-s-default .cm-qualifier{color:#555}.cm-s-default .cm-builtin{color:#30a}.cm-s-default .cm-bracket{color:#997}.cm-s-default .cm-tag{color:#170}.cm-s-default .cm-attribute{color:#00c}.cm-s-default .cm-hr{color:#999}.cm-s-default .cm-link{color:#00c}.cm-s-default .cm-error{color:red}.cm-invalidchar{color:red}.CodeMirror-composing{border-bottom:2px solid}div.CodeMirror span.CodeMirror-matchingbracket{color:#0b0}div.CodeMirror span.CodeMirror-nonmatchingbracket{color:#a22}.CodeMirror-matchingtag{background:rgba(255,150,0,.3)}.CodeMirror-activeline-background{background:#e8f2ff}.CodeMirror{position:relative;overflow:hidden;background:#fff}.CodeMirror-scroll{overflow:scroll!important;margin-bottom:-50px;margin-right:-50px;padding-bottom:50px;height:100%;outline:0;position:relative;z-index:0}.CodeMirror-sizer{position:relative;border-right:50px solid transparent}.CodeMirror-gutter-filler,.CodeMirror-hscrollbar,.CodeMirror-scrollbar-filler,.CodeMirror-vscrollbar{position:absolute;z-index:6;display:none;outline:0}.CodeMirror-vscrollbar{right:0;top:0;overflow-x:hidden;overflow-y:scroll}.CodeMirror-hscrollbar{bottom:0;left:0;overflow-y:hidden;overflow-x:scroll}.CodeMirror-scrollbar-filler{right:0;bottom:0}.CodeMirror-gutter-filler{left:0;bottom:0}.CodeMirror-gutters{position:absolute;left:0;top:0;min-height:100%;z-index:3}.CodeMirror-gutter{white-space:normal;height:100%;display:inline-block;vertical-align:top;margin-bottom:-50px}.CodeMirror-gutter-wrapper{position:absolute;z-index:4;background:0 0!important;border:none!important}.CodeMirror-gutter-background{position:absolute;top:0;bottom:0;z-index:4}.CodeMirror-gutter-elt{position:absolute;cursor:default;z-index:4}.CodeMirror-gutter-wrapper 
::selection{background-color:transparent}.CodeMirror-gutter-wrapper ::-moz-selection{background-color:transparent}.CodeMirror-lines{cursor:text;min-height:1px}.CodeMirror pre.CodeMirror-line,.CodeMirror pre.CodeMirror-line-like{-moz-border-radius:0;-webkit-border-radius:0;border-radius:0;border-width:0;background:0 0;font-family:inherit;font-size:inherit;margin:0;white-space:pre;word-wrap:normal;line-height:inherit;color:inherit;z-index:2;position:relative;overflow:visible;-webkit-tap-highlight-color:transparent;-webkit-font-variant-ligatures:contextual;font-variant-ligatures:contextual}.CodeMirror-wrap pre.CodeMirror-line,.CodeMirror-wrap pre.CodeMirror-line-like{word-wrap:break-word;white-space:pre-wrap;word-break:normal}.CodeMirror-linebackground{position:absolute;left:0;right:0;top:0;bottom:0;z-index:0}.CodeMirror-linewidget{position:relative;z-index:2;padding:.1px}.CodeMirror-rtl pre{direction:rtl}.CodeMirror-code{outline:0}.CodeMirror-gutter,.CodeMirror-gutters,.CodeMirror-linenumber,.CodeMirror-scroll,.CodeMirror-sizer{-moz-box-sizing:content-box;box-sizing:content-box}.CodeMirror-measure{position:absolute;width:100%;height:0;overflow:hidden;visibility:hidden}.CodeMirror-cursor{position:absolute;pointer-events:none}.CodeMirror-measure pre{position:static}div.CodeMirror-cursors{visibility:hidden;position:relative;z-index:3}div.CodeMirror-dragcursors{visibility:visible}.CodeMirror-focused div.CodeMirror-cursors{visibility:visible}.CodeMirror-selected{background:#d9d9d9}.CodeMirror-focused .CodeMirror-selected{background:#d7d4f0}.CodeMirror-crosshair{cursor:crosshair}.CodeMirror-line::selection,.CodeMirror-line>span::selection,.CodeMirror-line>span>span::selection{background:#d7d4f0}.CodeMirror-line::-moz-selection,.CodeMirror-line>span::-moz-selection,.CodeMirror-line>span>span::-moz-selection{background:#d7d4f0}.cm-searching{background-color:#ffa;background-color:rgba(255,255,0,.4)}.cm-force-border{padding-right:.1px}@media print{.CodeMirror 
div.CodeMirror-cursors{visibility:hidden}}.cm-tab-wrap-hack:after{content:”}span.CodeMirror-selectedtext{background:0 0}
/* Material Palenight theme */
.cm-s-material-palenight.CodeMirror{background-color:#292d3e;color:#a6accd}.cm-s-material-palenight .CodeMirror-gutters{background:#292d3e;color:#676e95;border:none}.cm-s-material-palenight .CodeMirror-guttermarker,.cm-s-material-palenight .CodeMirror-guttermarker-subtle,.cm-s-material-palenight .CodeMirror-linenumber{color:#676e95}.cm-s-material-palenight .CodeMirror-cursor{border-left:1px solid #fc0}.cm-s-material-palenight.cm-fat-cursor .CodeMirror-cursor{background-color:#607c8b80!important}.cm-s-material-palenight .cm-animate-fat-cursor{background-color:#607c8b80!important}.cm-s-material-palenight div.CodeMirror-selected{background:rgba(113,124,180,.2)}.cm-s-material-palenight.CodeMirror-focused div.CodeMirror-selected{background:rgba(113,124,180,.2)}.cm-s-material-palenight .CodeMirror-line::selection,.cm-s-material-palenight .CodeMirror-line>span::selection,.cm-s-material-palenight .CodeMirror-line>span>span::selection{background:rgba(128,203,196,.2)}.cm-s-material-palenight .CodeMirror-line::-moz-selection,.cm-s-material-palenight .CodeMirror-line>span::-moz-selection,.cm-s-material-palenight .CodeMirror-line>span>span::-moz-selection{background:rgba(128,203,196,.2)}.cm-s-material-palenight .CodeMirror-activeline-background{background:rgba(0,0,0,.5)}.cm-s-material-palenight .cm-keyword{color:#c792ea}.cm-s-material-palenight .cm-operator{color:#89ddff}.cm-s-material-palenight .cm-variable-2{color:#eff}.cm-s-material-palenight .cm-type,.cm-s-material-palenight .cm-variable-3{color:#f07178}.cm-s-material-palenight .cm-builtin{color:#ffcb6b}.cm-s-material-palenight .cm-atom{color:#f78c6c}.cm-s-material-palenight .cm-number{color:#ff5370}.cm-s-material-palenight .cm-def{color:#82aaff}.cm-s-material-palenight .cm-string{color:#c3e88d}.cm-s-material-palenight .cm-string-2{color:#f07178}.cm-s-material-palenight .cm-comment{color:#676e95}.cm-s-material-palenight .cm-variable{color:#f07178}.cm-s-material-palenight .cm-tag{color:#ff5370}.cm-s-material-palenight 
.cm-meta{color:#ffcb6b}.cm-s-material-palenight .cm-attribute{color:#c792ea}.cm-s-material-palenight .cm-property{color:#c792ea}.cm-s-material-palenight .cm-qualifier{color:#decb6b}.cm-s-material-palenight .cm-type,.cm-s-material-palenight .cm-variable-3{color:#decb6b}.cm-s-material-palenight .cm-error{color:#fff;background-color:#ff5370}.cm-s-material-palenight .CodeMirror-matchingbracket{text-decoration:underline;color:#fff!important}
* {
box-sizing: border-box;
margin: 0;
padding: 0;
}

body {
font-family: -apple-system, BlinkMacSystemFont, ‘Segoe UI’, Roboto, sans-serif;
background: #1a1a1a;
color: #f0f0f0;
line-height: 1.6;
}

/* Layout */
.course-layout {
display: flex;
min-height: 100vh;
}

/* Sidebar */
.course-sidebar {
width: 280px;
background: #2F2D2E;
border-right: 1px solid #4a4849;
position: fixed;
height: 100vh;
overflow-y: auto;
padding: 1.5rem 0;
}

.course-title {
padding: 0 1.5rem 1rem;
border-bottom: 1px solid #4a4849;
margin-bottom: 1rem;
}

.course-title h1 {
font-size: 1.1rem;
color: #72BEFA;
margin-bottom: 0.25rem;
}

.course-title .progress-text {
font-size: 0.75rem;
color: #888;
}

.progress-bar {
height: 4px;
background: #4a4849;
border-radius: 2px;
margin-top: 0.5rem;
overflow: hidden;
}

.progress-fill {
height: 100%;
background: #72BEFA;
width: 0%;
transition: width 0.3s;
}

/* Navigation */
.nav-section {
margin-bottom: 1rem;
}

.nav-section-title {
padding: 0.5rem 1.5rem;
font-size: 0.7rem;
text-transform: uppercase;
letter-spacing: 1px;
color: #888;
}

.nav-item {
display: flex;
align-items: center;
gap: 0.75rem;
padding: 0.6rem 1.5rem;
color: #ccc;
text-decoration: none;
font-size: 0.9rem;
transition: all 0.2s;
cursor: pointer;
border-left: 3px solid transparent;
}

.nav-item:hover {
background: #3d3b3c;
color: #fff;
}

.nav-item.active {
background: #3d3b3c;
border-left-color: #72BEFA;
color: #72BEFA;
}

.nav-item.completed .status-icon {
color: #72BEFA;
}

.status-icon {
width: 20px;
height: 20px;
min-width: 20px;
flex-shrink: 0;
display: flex;
align-items: center;
justify-content: center;
border: 2px solid #4a4849;
border-radius: 50%;
font-size: 0.7rem;
}

.nav-item.completed .status-icon {
border-color: #72BEFA;
background: rgba(114, 252, 219, 0.1);
}

.lock-icon {
margin-left: auto;
font-size: 0.75rem;
color: #666;
opacity: 0.7;
flex-shrink: 0;
min-width: 1rem;
}

/* Main content */
.course-content {
margin-left: 280px;
flex: 1;
padding: 2rem 3rem;
max-width: 900px;
}

.lesson {
display: none;
}

.lesson.active {
display: block;
}

.lesson h2 {
color: #72BEFA;
font-size: 1.75rem;
margin-bottom: 1.5rem;
padding-bottom: 0.5rem;
border-bottom: 2px solid #4a4849;
}

.lesson h3 {
color: #fff;
font-size: 1.25rem;
margin-top: 2rem;
margin-bottom: 1rem;
}

.lesson h4 {
color: #ccc;
font-size: 1.1rem;
margin-top: 1.5rem;
margin-bottom: 0.75rem;
}

.lesson p {
color: #ccc;
margin-bottom: 1rem;
}

.lesson ul, .lesson ol {
color: #ccc;
margin-bottom: 1rem;
padding-left: 1.5rem;
}

.lesson li {
margin-bottom: 0.5rem;
}

.lesson code {
background: #3d3b3c;
padding: 0.2rem 0.4rem;
border-radius: 4px;
font-family: ‘Fira Code’, monospace;
font-size: 0.9em;
color: #72BEFA;
}

.lesson pre {
background: #2F2D2E;
padding: 1rem;
border-radius: 8px;
overflow-x: auto;
margin-bottom: 1rem;
border: 1px solid #4a4849;
}

.lesson pre code {
background: none;
padding: 0;
color: #f8f8f2;
}

/* Callouts */
.callout {
padding: 1rem 1.25rem;
border-radius: 8px;
margin: 1.5rem 0;
border-left: 4px solid;
}

.callout-title {
font-weight: 600;
margin-bottom: 0.5rem;
display: flex;
align-items: center;
gap: 0.5rem;
}

.callout-tip {
background: rgba(114, 190, 250, 0.1);
border-color: #72BEFA;
}

.callout-tip .callout-title {
color: #72BEFA;
}

.callout-note {
background: rgba(114, 252, 219, 0.1);
border-color: #72FCDB;
}

.callout-note .callout-title {
color: #72FCDB;
}

.callout-warning {
background: rgba(229, 131, 182, 0.1);
border-color: #E583B6;
}

.callout-warning .callout-title {
color: #E583B6;
}

.callout a {
color: #fff;
text-decoration: underline;
}

.callout a:hover {
color: #72FCDB;
}

/* Collapsible callouts */
details.callout {
cursor: pointer;
}

details.callout summary.callout-title {
cursor: pointer;
list-style: none;
}

details.callout summary.callout-title::before {
content: ‘▶ ‘;
font-size: 0.8em;
transition: transform 0.2s;
display: inline-block;
}

details.callout[open] summary.callout-title::before {
transform: rotate(90deg);
}

details.callout summary.callout-title::-webkit-details-marker {
display: none;
}

details.callout > p {
margin-top: 0.75rem;
}

.callout pre {
background: #1a1a1a;
border-radius: 6px;
padding: 1rem;
margin-top: 0.75rem;
overflow-x: auto;
}

.callout pre code {
font-family: ‘Fira Code’, monospace;
font-size: 0.9rem;
color: #c3e88d;
}

/* Blockquotes */
.lesson blockquote {
border-left: 3px solid #72BEFA;
background: rgba(114, 190, 250, 0.08);
padding: 0.75rem 1.25rem;
border-radius: 0 6px 6px 0;
margin: 1rem 0;
}

.lesson blockquote p {
margin: 0;
color: rgba(255, 255, 255, 0.85);
}

/* Tables */
.course-table {
width: 100%;
border-collapse: collapse;
margin: 1rem 0 1.5rem 0;
font-size: 0.95rem;
}
.course-table th,
.course-table td {
border: 1px solid #4a4849;
padding: 0.6rem 1rem;
text-align: left;
}
.course-table thead th {
background: #3a3839;
color: #e0e0e0;
font-weight: 600;
}
.course-table tbody td {
color: #ccc;
}
.course-table tbody tr:nth-child(even) {
background: rgba(255, 255, 255, 0.03);
}

/* Quiz */
.quiz {
background: #2F2D2E;
border-radius: 8px;
padding: 1.5rem;
margin: 0 0 1.5rem 0;
border: 1px solid #4a4849;
}

.quiz-heading {
color: #ccc;
font-size: 1.1rem;
margin-top: 1.5rem;
margin-bottom: 0.75rem;
}

.quiz-divider {
border: none;
border-top: 1px solid #4a4849;
margin: 1.5rem 0;
}

.quiz-question {
color: #fff;
font-size: 1rem;
margin-bottom: 1rem;
font-weight: 500;
}

.quiz-options {
display: flex;
flex-direction: column;
gap: 0.75rem;
}

.quiz-option {
display: flex;
align-items: center;
gap: 0.75rem;
padding: 0.75rem 1rem;
background: #3d3b3c;
border: 2px solid #4a4849;
border-radius: 8px;
cursor: pointer;
transition: all 0.2s;
text-align: left;
width: 100%;
}

.quiz-option:hover:not(:disabled) {
border-color: #72BEFA;
background: #454243;
}

.quiz-option:disabled {
cursor: default;
}

.quiz-option.correct {
border-color: #72FCDB;
background: rgba(114, 252, 219, 0.15);
}

.quiz-option.incorrect {
border-color: #ff6b6b;
background: rgba(255, 107, 107, 0.15);
}

.option-label {
display: flex;
align-items: center;
justify-content: center;
width: 28px;
height: 28px;
min-width: 28px;
background: #4a4849;
border-radius: 50%;
font-weight: 600;
font-size: 0.85rem;
color: #fff;
}

.quiz-option.correct .option-label {
background: #72FCDB;
color: #2F2D2E;
}

.quiz-option.incorrect .option-label {
background: #ff6b6b;
color: #2F2D2E;
}

.option-content {
display: block;
flex: 1;
color: #ccc;
}

.option-content code {
background: #282a36;
padding: 0.5rem 0.75rem;
border-radius: 4px;
font-size: 0.85rem;
display: block;
color: #f8f8f2;
}

.quiz-feedback {
margin-top: 1rem;
padding-top: 1rem;
border-top: 1px solid #4a4849;
}

.quiz-feedback .callout {
margin: 0;
}

/* Code widget */
.codecut-widget {
background: #2F2D2E;
border-radius: 8px;
overflow: hidden;
margin: 1.5rem 0;
border: 1px solid #4a4849;
}

.codecut-widget-header {
display: flex;
justify-content: space-between;
align-items: center;
padding: 0.5rem 1rem;
background: #3d3b3c;
border-bottom: 1px solid #4a4849;
}

.codecut-widget-lang {
color: #72BEFA;
font-size: 0.75rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.5px;
}

.codecut-run-btn {
display: flex;
align-items: center;
gap: 0.4rem;
background: #72BEFA;
color: #2F2D2E;
border: none;
padding: 0.4rem 0.8rem;
border-radius: 4px;
font-size: 0.8rem;
font-weight: 600;
cursor: pointer;
transition: all 0.2s;
}

.codecut-run-btn:hover {
background: #5aa8e8;
}

.codecut-run-btn:disabled {
background: #666;
cursor: not-allowed;
}

.codecut-editor {
min-height: 80px;
background: #2F2D2E;
}

.codecut-editor textarea {
width: 100%;
min-height: 80px;
padding: 1rem;
background: #2F2D2E;
color: #f8f8f2;
border: none;
font-family: ‘Fira Code’, monospace;
font-size: 0.9rem;
line-height: 1.5;
resize: vertical;
outline: none;
overflow: hidden;
}

/* Static code widgets (read-only, no header/output) */
.codecut-widget[data-static=”true”] {
border-radius: 8px;
border: 1px solid #4a4849;
}

.codecut-widget[data-static=”true”] .codecut-editor {
border-radius: 8px;
min-height: auto;
}

.codecut-widget[data-static=”true”] .codecut-editor textarea {
min-height: auto;
}

.codecut-widget[data-static=”true”] .CodeMirror {
min-height: auto;
}

.codecut-widget[data-static=”true”] .CodeMirror-scroll {
min-height: auto;
}

/* CodeMirror 5 styling overrides */
.CodeMirror {
height: auto;
min-height: 80px;
font-family: ‘Fira Code’, monospace;
font-size: 0.9rem;
line-height: 1.5;
background: #282a36;
border-radius: 0;
}

.CodeMirror-scroll {
min-height: 80px;
overflow-x: auto !important;
overflow-y: hidden !important;
}

.CodeMirror-gutters {
background: #282a36;
border-right: 1px solid #4a4849;
min-width: 40px;
}

.CodeMirror-linenumber {
color: #6272a4;
padding: 0 8px 0 5px;
min-width: 25px;
text-align: right;
}

.CodeMirror-sizer {
margin-left: 40px !important;
}

.CodeMirror-cursor {
border-left-color: #72BEFA;
}

.CodeMirror-selected {
background: rgba(114, 190, 250, 0.3) !important;
}

.CodeMirror-focused .CodeMirror-selected {
background: rgba(114, 190, 250, 0.4) !important;
}

/* Suppress red error background for $ and other valid-in-context tokens */
.cm-s-material-palenight .cm-error {
background: none;
}

.codecut-output-section {
margin-top: 0.75rem;
border-top: 2px solid #4a4849;
background: #252324;
}

.codecut-output-header {
padding: 0.4rem 1rem;
background: #3d3b3c;
border-bottom: 1px solid #4a4849;
}

.codecut-output-label {
color: #aaa;
font-size: 0.75rem;
font-weight: 600;
text-transform: uppercase;
}

.codecut-output {
padding: 1rem;
min-height: 60px;
max-height: 300px;
overflow-y: auto;
font-family: ‘Fira Code’, monospace;
font-size: 0.85rem;
line-height: 1.5;
color: #f8f8f2;
white-space: pre-wrap;
}

.codecut-output.error { color: #ff6b6b; }
.codecut-output.loading { color: #72BEFA; }
.codecut-output .success { color: #72BEFA; }

.codecut-spinner {
display: inline-block;
width: 14px;
height: 14px;
border: 2px solid #2F2D2E;
border-top-color: transparent;
border-radius: 50%;
animation: spin 0.8s linear infinite;
}

@keyframes spin {
to { transform: rotate(360deg); }
}

/* Exercise widget */
.exercise-widget {
background: #1e1e2e;
border-radius: 12px;
overflow: hidden;
margin: 1.5rem 0;
border: 1px solid #4a4849;
}

.exercise-split {
display: flex;
flex-direction: column;
}

.exercise-left {
padding: 20px 24px;
background: #252535;
border-bottom: 1px solid #4a4849;
}

.exercise-title {
color: #72BEFA;
font-size: 1rem;
font-weight: 600;
margin: 0 0 1rem 0;
text-transform: uppercase;
letter-spacing: 0.5px;
}

.exercise-assignment {
color: #e0e0e0;
font-size: 0.9rem;
line-height: 1.6;
display: flex;
flex-wrap: wrap;
gap: 1.5rem 3rem;
}

.exercise-assignment p {
margin: 0;
}

.exercise-heading {
color: #72BEFA;
font-size: 0.75rem;
font-weight: 600;
margin: 0 0 0.4rem 0;
text-transform: uppercase;
letter-spacing: 0.5px;
}

.exercise-section {
flex: 1;
min-width: 200px;
}

.exercise-heading + p {
margin-top: 0;
}

.exercise-assignment em {
color: #ffffff;
font-style: italic;
}

.exercise-assignment code {
background: #3d3b3c;
padding: 0.2rem 0.4rem;
border-radius: 4px;
font-family: ‘Fira Code’, monospace;
font-size: 0.85rem;
}

.exercise-secrets {
margin-top: 1rem;
padding-top: 1rem;
border-top: 1px solid #3d3b3c;
}


DuckDB for Data Scientists

Getting Started
What is DuckDB?
Installation
Zero Configuration

Working with DataFrames
Integrate Seamlessly with pandas and Polars
Memory Efficiency
Out-of-Core Processing 🔒
Fast Performance

SQL Syntax Shortcuts
FROM-First Syntax
GROUP BY ALL 🔒
SELECT * EXCLUDE 🔒
SELECT * REPLACE 🔒

File Operations
Streamlined File Reading 🔒
Query Cloud Storage 🔒
Automatic Parsing of CSV Files 🔒
Automatic Flattening of Nested Parquet Files 🔒
Automatic Flattening of Nested JSON Files 🔒
Reading Multiple Files 🔒
Hive Partitioned Datasets 🔒
Exporting Data 🔒

Working with Complex Types
Creating Lists, Structs, and Maps 🔒
Manipulating Nested Data 🔒

Advanced Features
Parameterized Queries
ACID Transactions 🔒
Attach External Databases 🔒

Summary
Key Takeaways 🔒

What is DuckDB?
DuckDB is a fast, in-process SQL OLAP database optimized for analytics. Unlike traditional databases like PostgreSQL or MySQL that require server setup and maintenance, DuckDB runs directly in your Python process.

It’s perfect for data scientists because:

Zero Configuration: No database server setup required
Memory Efficiency: Out-of-core processing for datasets larger than RAM
Familiar Interface: SQL syntax with shortcuts like GROUP BY ALL
Performance: Columnar-vectorized engine that is often faster than pandas for aggregations and joins
Universal Access: Query files, cloud storage, and external databases


Installation
Install DuckDB with pip:

pip install duckdb

Let’s verify the installation:

import duckdb

print(f"DuckDB version: {duckdb.__version__}")
print("Installation successful!")


Zero Configuration
SQL operations on DataFrames typically require setting up database servers. With pandas and PostgreSQL, you need to:

Install and configure a database server
Ensure the service is running
Set up credentials and connections
Write the DataFrame to a table first

# Traditional approach with pandas + PostgreSQL
import pandas as pd
from sqlalchemy import create_engine

sales = pd.DataFrame({
    "product": ["A", "B", "C"],
    "amount": [100, 150, 200]
})

# Requires server setup, credentials, running service…
engine = create_engine("postgresql://user:pass@localhost:5432/db")
sales.to_sql("sales", engine, if_exists="replace")

with engine.connect() as conn:
    result = pd.read_sql("SELECT * FROM sales", conn)

DuckDB eliminates this overhead. Query DataFrames directly with SQL:

import duckdb
import pandas as pd

sales = pd.DataFrame({
    "product": ["A", "B", "C"],
    "amount": [100, 150, 200]
})

# No server needed: query the DataFrame directly
result = duckdb.sql("SELECT * FROM sales").df()
print(result)

💡 What the output shows
Notice how the query returns results instantly. There’s no connection string, no server startup time, and no authentication steps.

Try it

Edit the query to select items with quantity greater than 30 from the inventory DataFrame:

import duckdb
import pandas as pd

inventory = pd.DataFrame({
    "item": ["Chair", "Desk", "Lamp"],
    "quantity": [50, 20, 100]
})

# Edit this query to filter for quantity > 30
result = duckdb.sql("SELECT * FROM inventory").df()
print(result)

💡 Solution
result = duckdb.sql("SELECT * FROM inventory WHERE quantity > 30").df()

Quiz

In the code above, how does DuckDB access the sales DataFrame?

A
It automatically detects Python variables and makes them queryable

B
You must register the DataFrame with duckdb.register() first

C
The DataFrame must be saved to disk before querying

💡 Correct
Correct! DuckDB scans your Python namespace and makes DataFrames available as SQL tables automatically.

⚠ Try Again
Not quite. Look at the code above. There’s no duckdb.register() call before the SQL query runs.

⚠ Try Again
Not quite. The DataFrame stays in memory. There’s no file saving step before the SQL query runs.


Integrate Seamlessly with pandas and Polars
Have you ever wanted to leverage SQL’s power while working with your favorite data manipulation libraries such as pandas and Polars?

DuckDB makes it seamless to query pandas and Polars DataFrames via the duckdb.sql function.

import duckdb
import pandas as pd
import polars as pl

pd_df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

pl_df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

print("Query pandas DataFrame:")
print(duckdb.sql("SELECT * FROM pd_df").df())

print("\nQuery Polars DataFrame:")
print(duckdb.sql("SELECT * FROM pl_df").df())

💡 What the output shows
DuckDB recognized both pd_df (pandas) and pl_df (Polars) as DataFrame variables and queried them directly with SQL. No registration or data-loading step was needed.

DuckDB’s integration with pandas and Polars lets you combine the strengths of each tool. For example, you can:

Use pandas for data cleaning and feature engineering
Use DuckDB for complex aggregations and queries

import pandas as pd
import duckdb

# Create sales data
sales = pd.DataFrame({
    "product": ["A", "B", "C", "A", "B", "C"] * 2,
    "region": ["North", "South"] * 6,
    "amount": [100, 150, 200, 120, 180, 220, 110, 160, 210, 130, 170, 230],
    "date": pd.date_range("2024-01-01", periods=12)
})

# Use pandas for feature engineering
sales['month'] = sales['date'].dt.month
sales['is_high_value'] = sales['amount'] > 150
print("Sales after feature engineering:")
print(sales.head())

💡 What the output shows
pandas makes feature engineering straightforward: extracting month from dates and creating is_high_value flags are common transformations for preparing data for analysis or machine learning.

Now use DuckDB for complex aggregations:

# Use DuckDB for complex aggregations
analysis = duckdb.sql("""
    SELECT
        product,
        region,
        COUNT(*) as total_sales,
        AVG(amount) as avg_amount,
        SUM(CASE WHEN is_high_value THEN 1 ELSE 0 END) as high_value_sales
    FROM sales
    GROUP BY product, region
    ORDER BY avg_amount DESC
""").df()

print("Sales analysis by product and region:")
print(analysis)

💡 What the output shows
DuckDB excels at complex aggregations: combining GROUP BY, AVG, and conditional CASE WHEN in a single query is more readable and efficient than equivalent pandas code.
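
For comparison, here is one way the same aggregation might be written in pandas. This is a sketch of an equivalent, not the course's own code; it rebuilds the sales data from the earlier cell so that it can run on its own.

```python
import pandas as pd

sales = pd.DataFrame({
    "product": ["A", "B", "C", "A", "B", "C"] * 2,
    "region": ["North", "South"] * 6,
    "amount": [100, 150, 200, 120, 180, 220, 110, 160, 210, 130, 170, 230],
})
sales["is_high_value"] = sales["amount"] > 150

# Group, aggregate, and sort as three chained steps.
analysis = (
    sales.groupby(["product", "region"], as_index=False)
    .agg(
        total_sales=("amount", "count"),
        avg_amount=("amount", "mean"),
        high_value_sales=("is_high_value", "sum"),
    )
    .sort_values("avg_amount", ascending=False)
    .reset_index(drop=True)
)
print(analysis)
```

Both versions produce one row per product and region; which one reads better is partly a matter of taste and familiarity with SQL.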

Try it

Edit the query to combine results from both df_2023 and df_2024 using UNION ALL:

import duckdb
import pandas as pd

df_2023 = pd.DataFrame({"year": [2023, 2023], "sales": [100, 150]})
df_2024 = pd.DataFrame({"year": [2024, 2024], "sales": [200, 250]})

# Edit to combine both DataFrames with UNION ALL
result = duckdb.sql("SELECT * FROM df_2023").df()
print(result)


💡 Solution
result = duckdb.sql("SELECT * FROM df_2023 UNION ALL SELECT * FROM df_2024").df()

Quiz

What makes DuckDB’s approach to complex aggregations more readable than pandas?

A
All operations are expressed in a single, declarative query

B
DuckDB uses shorter function names

C
DuckDB automatically formats the output

💡 Correct
Correct! SQL lets you express GROUP BY, aggregates, and sorting in one cohesive statement, while pandas requires chaining multiple methods.

⚠ Try Again
Not quite. Function name length isn’t the key difference. Think about how operations are structured.

⚠ Try Again
Not quite. Output formatting isn’t what makes DuckDB’s approach more readable. Look at how the query combines multiple operations.


Memory Efficiency
Pandas loads an entire dataset into RAM before filtering, which can cause out-of-memory errors on large files. DuckDB applies the filter while it scans the file, so only the matching rows are kept in memory. To see this in action, let’s compare both approaches on the same dataset.

First, create a sample CSV file:

import pandas as pd

# Create sample data and save to CSV
customers = pd.DataFrame({
    "id": range(1000),
    "name": [f"Customer_{i}" for i in range(1000)],
    "region": ["North", "South", "East", "West"] * 250
})
customers.to_csv("customers.csv", index=False)
print(f"Created customers.csv with {len(customers)} rows")

With pandas, filtering loads ALL records into RAM first:

import pandas as pd

# Read entire CSV into memory, then filter
df = pd.read_csv("customers.csv")
result = df[df["region"] == "North"]
print(f"Loaded {len(df)} rows to get {len(result)} matches")


With DuckDB, the filter is applied while the file is read, so only matching rows are kept in memory:

import duckdb

# Stream from file, filter during read
result = duckdb.sql("""
    SELECT *
    FROM 'customers.csv'
    WHERE region = 'North'
""").df()
print(f"Returned {len(result)} rows without loading full file")

The diagram below summarizes the memory difference:

RAM Usage
│ ████████████ Pandas (loads all 1,000 rows)
│ ███ DuckDB (streams, keeps 250 matches)
└──────────────────────────────────────────────

Quiz

What’s the key difference between how pandas and DuckDB handle the filter region = 'North'?

A
Pandas loads all rows first then filters; DuckDB processes only matching rows

B
Pandas uses more CPU; DuckDB uses more RAM

C
Pandas filters in Python; DuckDB filters in C++

💡 Correct
Correct! Pandas must load the entire file into a DataFrame before applying any filter. DuckDB evaluates the WHERE clause during scanning, so non-matching rows are discarded as they are read instead of being kept in memory.

⚠ Try Again
Not quite. The difference isn’t about CPU vs RAM usage. Think about when filtering happens relative to data loading.

⚠ Try Again
Not quite. While implementation languages differ, the key difference is the order of operations: load-then-filter vs filter-while-loading.
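
The load-then-filter versus filter-while-loading distinction can be sketched in plain Python with the standard csv module. This is only a conceptual illustration of the order of operations, not how pandas or DuckDB are implemented; the function names are hypothetical.

```python
import csv
import io

# Build data with the same shape as customers.csv, kept in memory for the sketch.
regions = ["North", "South", "East", "West"] * 250
lines = ["id,name,region"] + [
    f"{i},Customer_{i},{region}" for i, region in enumerate(regions)
]
csv_text = "\n".join(lines)


def load_then_filter(text):
    """Materialize every row first, then filter."""
    all_rows = list(csv.DictReader(io.StringIO(text)))
    matches = [row for row in all_rows if row["region"] == "North"]
    return len(all_rows), matches


def filter_while_reading(text):
    """Test each row as it streams past and keep only the matches."""
    kept = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["region"] == "North":
            kept.append(row)
    return len(kept), kept


held_eager, eager_matches = load_then_filter(csv_text)
held_streaming, streaming_matches = filter_while_reading(csv_text)
print(f"Load-then-filter held {held_eager} rows")
print(f"Filter-while-reading held {held_streaming} rows")
```

Both functions return the same matches; they differ only in how many rows have to be held at once.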


Fast Performance
While pandas executes most operations eagerly on a single CPU core, DuckDB uses a columnar-vectorized execution engine that splits data into chunks and processes them in parallel across cores. The diagram below shows how each approach handles data:

Pandas                           DuckDB
│                                │
├─ Operation 1 ──> one core      ├─ Chunk 1 (2048 rows) ─┐
├─ Operation 2 ──> one core      ├─ Chunk 2 (2048 rows) ─┼─> multiple cores
├─ Operation 3 ──> one core      ├─ Chunk 3 (2048 rows) ─┘
│ …                              │
▼                                ▼
Single-threaded                  Parallel chunks

This architectural difference enables DuckDB to significantly outperform pandas, especially for computationally intensive operations like aggregations and joins.
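
The chunked pattern can be sketched in plain Python: split the data into fixed-size chunks, aggregate each chunk independently, then merge the partial results. This is a conceptual illustration only. It does not reproduce DuckDB's engine, and Python threads will not show a comparable speedup; the data and the helper name count_chunk are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 2048  # chunk size shown in the diagram above

# 10,000 illustrative values to group and count.
regions = ["North", "South", "East", "West"] * 2500


def count_chunk(chunk):
    """Aggregate one chunk independently of the others."""
    return Counter(chunk)


chunks = [regions[i:i + CHUNK_SIZE] for i in range(0, len(regions), CHUNK_SIZE)]

# Hand each chunk to a worker; results come back in chunk order.
with ThreadPoolExecutor() as pool:
    partial_counts = list(pool.map(count_chunk, chunks))

# Merge the per-chunk results into the final GROUP BY-style answer.
total = sum(partial_counts, Counter())
print(dict(total))
```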

Let’s compare the performance of pandas and DuckDB for aggregations on a million rows of data.

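import time

import duckdb
import numpy as np
import pandas as pd

# Assumed stand-in data: the original `customers` table with a `segment`
# column is defined in a lesson not shown here, so these values are illustrative.
rng = np.random.default_rng(seed=0)
n_rows = 1_000_000
customers = pd.DataFrame({
    "region": rng.choice(["North", "South", "East", "West"], size=n_rows),
    "segment": rng.choice(["Consumer", "Corporate"], size=n_rows),
})

# Pandas aggregation
start_time = time.time()
pandas_agg = customers.groupby(['region', 'segment']).size().reset_index(name='count')
pandas_time = time.time() - start_time

# DuckDB aggregation
start_time = time.time()
duckdb_agg = duckdb.sql("""
    SELECT region, segment, COUNT(*) as count FROM customers GROUP BY region, segment
""").df()
duckdb_time = time.time() - start_time

print(f"Pandas aggregation time: {pandas_time:.2f} seconds")
print(f"DuckDB aggregation time: {duckdb_time:.2f} seconds")
print(f"Speedup: {pandas_time/duckdb_time:.1f}x")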

💡 What the output shows
In the reported run, DuckDB completes the same aggregation ~8x faster than pandas; the exact ratio depends on the data and the hardware. The speedup comes from DuckDB’s columnar-vectorized execution engine processing data in parallel chunks.

📝 Note
The benchmark above was run on native Python. Results may vary in browser-based environments.

Quiz

How does pandas process data differently from DuckDB?

A
Pandas runs each operation on a single thread; DuckDB processes chunks in parallel

B
Pandas uses disk storage; DuckDB uses only RAM

C
Pandas compiles queries; DuckDB interprets them

💡 Correct
Correct! Pandas executes most operations on a single core. DuckDB’s columnar-vectorized engine splits the data into chunks and processes them across multiple threads, enabling significant speedups for operations like GROUP BY.

⚠ Try Again
Not quite. Both can work with in-memory data. The difference is in execution strategy, not storage location.

⚠ Try Again
Not quite. This is closer to reversed: DuckDB plans and optimizes the whole query before running it, while pandas executes each method call eagerly as written.


FROM-First Syntax
Traditional SQL requires SELECT before FROM. This adds unnecessary boilerplate when you just want a quick look at your data:

import duckdb
import pandas as pd

sales = pd.DataFrame({
    "product": ["A", "B", "C", "A", "B"],
    "region": ["North", "South", "North", "South", "North"],
    "amount": [100, 200, 150, 120, 180]
})

# Traditional SQL
result = duckdb.sql("SELECT * FROM sales").df()
print(result)

DuckDB lets you skip SELECT * entirely, making quick data exploration faster:

# DuckDB: FROM-first (SELECT * is implied)
result = duckdb.sql("FROM sales").df()
print(result)


💡 What the output shows
Notice the results are the same. This confirms that FROM table automatically selects all columns.

Try it

Write a FROM-first query to get all sales with amount > 150:

import duckdb
import pandas as pd

sales = pd.DataFrame({
    "product": ["A", "B", "C", "A", "B"],
    "region": ["North", "South", "North", "South", "North"],
    "amount": [100, 200, 150, 120, 180]
})

# Write a FROM-first query with WHERE clause
result = duckdb.sql("___").df()
print(result)

💡 Solution
result = duckdb.sql("FROM sales WHERE amount > 150").df()

Quiz

What happens when you run FROM sales in DuckDB?

A
Returns only the first row from the sales table

B
Returns all rows and columns from the sales table

C
Returns the table schema without data

⚠ Try Again
Not quite. FROM table returns all rows, not just the first one. To limit rows, you’d use FROM table LIMIT 1.

💡 Correct
Correct! FROM table is shorthand for SELECT * FROM table, returning all rows and all columns.

⚠ Try Again
Not quite. FROM table returns data, not schema. To see the schema, use DESCRIBE table or SUMMARIZE table.


Parameterized Queries
When working with databases, you often need to run similar queries with different parameters. For instance, you might want to filter a table using various criteria.

First, let’s create a sample products table:

import duckdb

conn = duckdb.connect(":memory:")
conn.sql("""
    CREATE TABLE products (id INT, name VARCHAR, price DECIMAL)
""")
conn.sql("""
    INSERT INTO products VALUES
        (1, 'Laptop', 999.99),
        (2, 'Phone', 699.99),
        (3, 'Tablet', 449.99),
        (4, 'Watch', 299.99)
""")

print(conn.sql("SELECT * FROM products").df())

You might use f-strings to pass parameters to your queries:

min_price = 400
result = conn.sql(
    f"SELECT * FROM products WHERE price > {min_price}"
).df()

print(f"Products over ${min_price}:")
print(result)

⚠ Caution
While this works, building SQL with f-strings is dangerous when the value comes from user input. A malicious user could:

Input "0; DROP TABLE products; --" to delete your table
Input "0 UNION SELECT * FROM secrets" to steal data

DuckDB provides a safer way with parameterized queries using the ? placeholder:

min_price = 400
result = conn.execute(
    "SELECT * FROM products WHERE price > ?",
    (min_price,)
).df()

print(f"Products over ${min_price}:")
print(result)

💡 What the output shows
DuckDB binds 400 to the ? placeholder separately from parsing the SQL text. Even if min_price contained malicious SQL, it would be treated as a literal value rather than executed, which protects this query against SQL injection through that parameter.

Try it

Use the ? placeholder to find products under $300:

import duckdb

conn = duckdb.connect(":memory:")
conn.sql("""
    CREATE TABLE products (id INT, name VARCHAR, price DECIMAL)
""")
conn.sql("""
    INSERT INTO products VALUES
        (1, 'Laptop', 999.99),
        (2, 'Phone', 699.99),
        (3, 'Tablet', 449.99),
        (4, 'Watch', 299.99)
""")

max_price = 300
result = conn.execute(
    "SELECT * FROM products WHERE ___",
    ___
).df()
print(result)

💡 Solution
"SELECT * FROM products WHERE price < ?", (max_price,)
The value from the tuple is bound to the ? placeholder. The trailing comma is required for single-element tuples.

Quiz

If a malicious user sets min_price = "0; DROP TABLE products", what happens with parameterized queries?

A
DuckDB treats the entire string as a literal value, causing a type error

B
The products table gets deleted

C
DuckDB ignores the input and uses a default value

💡 Correct
Correct! The malicious string is treated as a literal value to compare against price. Since it’s not a valid number, the query fails safely without executing any DROP command.

⚠ Try Again
Not quite. That would happen with f-strings. Parameterized queries prevent the injected SQL from being executed as code.

⚠ Try Again
Not quite. DuckDB doesn’t silently replace bad input. It processes the input as a literal value, which would cause a type mismatch error.



    Work with Khuyen Tran
