Entity Extraction with spaCy and LLMs
What is Entity Extraction?
Entity extraction (also called Named Entity Recognition or NER) automatically identifies and classifies key information from unstructured text. For instance, financial reports contain company names, monetary figures, executives, dates, and locations used for competitive analysis and executive tracking.
Extracting these entities manually is time-consuming and error-prone. Automated entity extraction provides a faster and more reliable alternative.
In this course, you’ll learn three modern tools for entity extraction:
spaCy: Production-ready NER with pre-trained models
GLiNER: Zero-shot custom entity recognition
langextract: AI-powered extraction with source grounding
Sample Document
Throughout this course, we’ll extract entities from this earnings report.
earning_report = """
Apple Inc. (NASDAQ: AAPL) reported third quarter revenue of $81.4 billion,
up 2% year over year. CEO Tim Cook stated that Services revenue reached
a new all-time high of $21.2 billion. The company's board of directors
declared a cash dividend of $0.24 per share.
CFO Luca Maestri mentioned that iPhone revenue was $39.3 billion for
the quarter ending June 30, 2023. The company expects total revenue
between $89 billion and $93 billion for the fourth quarter.
Apple's Cupertino headquarters announced the acquisition of AI startup
WaveOne for an undisclosed amount. The deal is expected to close in
Q4 2023, pending regulatory approval from the SEC.
"""
print("Earnings report loaded!")
print(f"Document length: {len(earning_report)} characters")
We chose this report because it’s dense with overlapping entity types, which is exactly what makes real-world extraction challenging:
Monetary amounts appear in different contexts: revenue ($81.4B), dividends ($0.24), and forecasted ranges ($89B-$93B)
Named entities overlap: the same company appears both as a name (“Apple Inc.”) and as a stock ticker (AAPL), and “SEC” is an abbreviation that can only be identified from context
Temporal references mix formats: exact dates (June 30, 2023), quarters (Q4 2023), and relative time (year over year)
Why Not Use Regex?
Regular expressions define text patterns using special syntax to find matches in strings. While they may seem like a natural first choice for entity extraction, they require a separate pattern for each entity type and fail when formats vary.
Here’s what extracting financial amounts, dates, stock symbols, and quarters with regex looks like:
import re
earning_report = """
Apple Inc. (NASDAQ: AAPL) reported third quarter revenue of $81.4 billion,
up 2% year over year. CEO Tim Cook stated that Services revenue reached
a new all-time high of $21.2 billion. CFO Luca Maestri mentioned that
iPhone revenue was $39.3 billion for the quarter ending June 30, 2023.
"""
# Each entity type needs a separate complex pattern
financial_pattern = r"\$(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.[0-9]+)?(?:\s*(?:billion|million|trillion))?"
date_pattern = r"\b(?:January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2},\s+\d{4}"
stock_pattern = r"\b(?:NASDAQ|NYSE|NYSEARCA):\s*[A-Z]{2,5}\b"
quarter_pattern = r"\b(Q[1-4]\s+\d{4})\b"
print("Financial amounts:", re.findall(financial_pattern, earning_report, re.IGNORECASE))
print("Dates:", re.findall(date_pattern, earning_report))
print("Stock symbols:", re.findall(stock_pattern, earning_report))
print("Quarters:", re.findall(quarter_pattern, earning_report))
From the code above, several limitations become apparent:
Each entity type requires its own pattern, resulting in verbose boilerplate code that is difficult to read and maintain.
The patterns only match numeric quarter formats like “Q4 2023” and miss textual forms such as “third quarter” unless additional exact-match patterns are added.
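The maintenance cost of regex grows with every format variation. As a rough illustration, the sketch below handles three date formats with three separate patterns. The sample sentence and the pattern names are made up for this example and are not part of the course's earnings report.

```python
import re

# Hypothetical sample sentence (not from the earnings report)
text = "Invoices dated January 15, 2024, 15/01/2024, and 2024-01-15 are pending."

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# One pattern per date format; the names are illustrative
date_patterns = {
    "written": rf"\b(?:{MONTHS})\s+\d{{1,2}},\s+\d{{4}}\b",
    "slashed": r"\b\d{1,2}/\d{1,2}/\d{4}\b",
    "iso": r"\b\d{4}-\d{2}-\d{2}\b",
}

found = {name: re.findall(pattern, text) for name, pattern in date_patterns.items()}
for name, matches in found.items():
    print(f"{name}: {matches}")

# A format nobody wrote a pattern for slips through silently
abbreviated = "Payment is due Jan 15, 2024."
missed = not any(re.search(p, abbreviated) for p in date_patterns.values())
print("Abbreviated month missed:", missed)
```

Supporting a fourth format, such as the abbreviated month in the last check, means writing and testing yet another pattern.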
Quiz
A document contains dates in formats like “January 15, 2024”, “15/01/2024”, and “2024-01-15”. What challenge does regex face here?
A. Regex cannot match numeric characters
B. Each date format requires a separate pattern, making the code harder to maintain as formats increase
C. Regex patterns are limited to 100 characters in length
Answer: B. Each date format (ISO, US, European, written) needs its own pattern. As formats multiply, the codebase grows harder to maintain and test.
Why not A: Regex handles numeric characters easily with patterns like \d. The challenge is handling multiple format variations.
Why not C: Regex patterns have no practical length limit. The challenge is writing and maintaining patterns for every format variation.
Production-Grade Named Entity Recognition
spaCy provides pre-trained models that automatically identify entities like PERSON, ORG, MONEY, DATE, and PERCENT from context. No pattern writing required.
Let’s install spaCy and download a small English model to get started:
pip install spacy
python -m spacy download en_core_web_sm
Extracting entities with spaCy takes just two steps:
Load the model
Process your text
import spacy
# Load the model
nlp = spacy.load("en_core_web_sm")
# Process your text
sample_text = "Apple Inc. reported revenue of $81.4 billion with CEO Tim Cook."
doc = nlp(sample_text)
print("Entities found:")
for ent in doc.ents:
    print(f" '{ent.text}' -> {ent.label_}")
💡 What the output shows
spaCy extracted three entity types (ORG, MONEY, PERSON) without any configuration
The model understood that “Apple Inc.” is a company, not just a fruit
It captured the complete monetary amount “$81.4 billion” including the unit
Person names are recognized even without titles like “CEO”
How spaCy NER Works
spaCy labels each token individually using its BILUO tagging scheme, then groups consecutive entity tokens into spans. The tokenizer splits the currency symbol from the number, so “$81.4 billion” covers three tokens:
"Apple"   "Inc."     "CEO"     "Tim"    "Cook"      "$"     "81.4"    "billion"
   │         │         │         │         │         │         │         │
   ▼         ▼         ▼         ▼         ▼         ▼         ▼         ▼
 B-ORG     L-ORG       O      B-PERSON  L-PERSON  B-MONEY   I-MONEY   L-MONEY
   └────┬────┘                   └────┬────┘         └─────────┬─────────┘
        ▼                             ▼                        ▼
"Apple Inc." → ORG           "Tim Cook" → PERSON    "$81.4 billion" → MONEY
Begin / Inside / Last mark the first, middle, and final tokens of a multi-token entity
Unit marks single-token entities (e.g., “London” → U-GPE)
O means outside any entity
The model learns these tagging patterns from thousands of labeled examples during training.
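To make the grouping step concrete, here is a minimal sketch of how BILUO tags can be collapsed into entity spans. It is not spaCy's internal code: the function name `biluo_to_spans` is hypothetical, and the tokens and tags are written by hand for illustration rather than produced by a model.

```python
def biluo_to_spans(tokens, tags):
    """Group BILUO-tagged tokens into (text, label) entity spans."""
    spans = []
    current = []          # tokens of the entity currently being built
    current_label = None  # label of that entity

    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Outside any entity: discard any unfinished span
            current, current_label = [], None
            continue

        prefix, label = tag.split("-", 1)
        if prefix == "U":
            # Unit: a complete single-token entity
            spans.append((token, label))
            current, current_label = [], None
        elif prefix == "B":
            # Begin: start a new multi-token entity
            current, current_label = [token], label
        elif prefix in ("I", "L") and current and label == current_label:
            current.append(token)
            if prefix == "L":
                # Last: close the entity and emit it
                spans.append((" ".join(current), label))
                current, current_label = [], None
        else:
            # Malformed sequence (e.g. I- without a preceding B-): drop it
            current, current_label = [], None

    return spans


# Hand-labeled example, not model output
tokens = ["Tim", "Cook", "met", "Bank", "of", "America", "in", "London"]
tags = ["B-PERSON", "L-PERSON", "O", "B-ORG", "I-ORG", "L-ORG", "O", "U-GPE"]

spans = biluo_to_spans(tokens, tags)
print(spans)
```

In practice you rarely need to do this yourself, since spaCy exposes the finished spans through doc.ents. The sketch joins tokens with single spaces, which is a simplification that only works when every token is separated by whitespace in the original text.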
Quiz
How does spaCy determine that “Apple Inc.” is an ORG entity?
A. It matches against a built-in dictionary of known company names
B. It uses regex to match common organization name patterns
C. The pre-trained model learned patterns from labeled training data
Answer: C. spaCy’s NER is a statistical model trained on annotated text. It learned patterns like capitalization, surrounding words, and name structures from its training data, not from a fixed list or regex rules.
Why not A: spaCy doesn’t use a fixed lookup table. It uses a statistical model that can recognize entities it has never seen before based on learned patterns.
Why not B: Regex uses fixed text patterns. spaCy’s NER model uses neural networks trained on annotated text to predict entity types from context.
Zero-Shot Custom Entity Extraction
GLiNER solves spaCy’s limitation of fixed entity types. Instead of being locked into categories like ORG or GPE, GLiNER lets you define custom types using natural language descriptions.
pip install gliner
GLiNER offers several pretrained models. We’ll use gliner_small-v2.1 with threshold=0.3, which keeps only entities whose confidence score is about 0.3 or higher:
from gliner import GLiNER
model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")
test_text = "Apple Inc. CEO Tim Cook announced quarterly revenue of $81.4 billion."
custom_types = ["Company", "Person", "Currency"]
entities = model.predict_entities(test_text, custom_types, threshold=0.3)
for entity in entities:
    print(f"'{entity['text']}' -> {entity['label']} (confidence: {entity['score']:.3f})")
Output
💡 What the output shows
GLiNER recognized custom entity types without any training
Confidence scores vary: “Tim Cook” (0.563) scores highest, likely because names are distinctive, while “$81.4 billion” (0.310) scores lower, likely because “Currency” is a less common label
📝 Other model options
For higher accuracy, try gliner_medium-v2.1. For multilingual support, use gliner_multi-v2.1.
How GLiNER Works
Instead of tagging individual tokens, GLiNER scores entire spans against every label you provide. The highest-scoring label wins, and spans below your threshold are filtered out:
┌───────────────┬──────────┬──────────────────┐
│ Span          │ Label    │ Confidence       │
├───────────────┼──────────┼──────────────────┤
│ Apple Inc     │ Company  │ ████░░░░░░░ 0.36 │ ✓ above 0.3
│ Apple Inc     │ Person   │ █░░░░░░░░░░ 0.05 │ ✗
├───────────────┼──────────┼──────────────────┤
│ Tim Cook      │ Company  │ █░░░░░░░░░░ 0.04 │ ✗
│ Tim Cook      │ Person   │ ██████░░░░░ 0.56 │ ✓ above 0.3
├───────────────┼──────────┼──────────────────┤
│ $81.4 billion │ Company  │ ░░░░░░░░░░░ 0.01 │ ✗
│ $81.4 billion │ Currency │ ███░░░░░░░░ 0.31 │ ✓ above 0.3
└───────────────┴──────────┴──────────────────┘
                threshold = 0.3 ▲
This gives you two controls spaCy doesn’t: custom labels (any text, not a fixed set) and a confidence threshold to filter results.
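The selection step described above can be sketched in a few lines of plain Python. This is not GLiNER's implementation: the scores are copied from the diagram, the `scores` dictionary layout and the `select_entities` helper are hypothetical, and treating the threshold as inclusive (score >= threshold) is an assumption.

```python
# Scores copied from the diagram; the nested-dict layout is illustrative
scores = {
    "Apple Inc": {"Company": 0.36, "Person": 0.05},
    "Tim Cook": {"Company": 0.04, "Person": 0.56},
    "$81.4 billion": {"Company": 0.01, "Currency": 0.31},
}


def select_entities(span_scores, threshold=0.3):
    """Keep each span's best-scoring label if it clears the threshold."""
    results = []
    for span, label_scores in span_scores.items():
        # The highest-scoring label wins for this span
        best_label = max(label_scores, key=label_scores.get)
        best_score = label_scores[best_label]
        # Spans whose best score falls below the threshold are dropped
        if best_score >= threshold:
            results.append({"text": span, "label": best_label, "score": best_score})
    return results


for entity in select_entities(scores):
    print(f"'{entity['text']}' -> {entity['label']} ({entity['score']:.2f})")
```

Raising the threshold trades recall for precision: with threshold=0.5, only “Tim Cook” survives in this example.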
Quiz
How does GLiNER decide which label to assign to a text span?
A
It picks the first label in your list that partially matches
B
It scores the span against every label and picks the highest
C
It uses a dictionary lookup to map known words to labels
⚠ Try Again
Not quite. The order of labels in your list doesn’t affect the result. GLiNER evaluates all labels equally for each span.
💡 Correct
Correct! Each span is scored against all labels. In the diagram, “Apple Inc” scored 0.36 for Company and 0.05 for Person; its Currency score (0.02, not shown) was lower still. The highest score (Company) wins.
⚠ Try Again
Not quite. GLiNER doesn’t use a fixed dictionary. It uses a BERT-like encoder to compare text spans against label descriptions semantically.
Extracting Business Entities
Exercise: Parse Business Metrics
Using Confidence Scores for Quality Control
Exercise: Route Low-Confidence to Review
AI-Powered Extraction with Source Grounding
langextract uses large language models (Gemini, GPT) to understand entity relationships and provide source attribution.
It captures semantic context like “AI startup WaveOne” (category + name) and “between $89 billion and $93 billion” (revenue ranges) as complete phrases rather than separate pieces.
Let’s install langextract along with its dependencies to try it out:
pip install langextract python-dotenv google-genai
To authenticate, add your API key to a .env file. This course uses Gemini (get a key from AI Studio), but OpenAI models also work:
# .env file
LANGEXTRACT_API_KEY=your-api-key-here
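A missing or placeholder key usually surfaces later as a confusing API error, so it can help to fail fast. The sketch below is one way to do that with the standard library only. `require_api_key` is a hypothetical helper, not part of langextract or python-dotenv, and it assumes the variable name shown in the .env file above.

```python
import os


def require_api_key(env=None, name="LANGEXTRACT_API_KEY"):
    """Return the API key, or raise a clear error if it is missing or a placeholder."""
    # Default to the process environment; a plain dict can be passed for testing
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key or key == "your-api-key-here":
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key


# Typical use, after load_dotenv() has read the .env file:
# api_key = require_api_key()
```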
langextract uses an LLM to extract entities. You provide examples that teach the model what to look for and how to format the output:
Example (you provide):
┌─────────────────────────────────────────────────────┐
│ Text: "Microsoft Corp. CEO Satya Nadella reported   │
│        Q2 2024 revenue of $65B"                     │
│                                                     │
│ Extractions:                                        │
│   company   → "Microsoft Corp."                     │
│   executive → "CEO Satya Nadella"  ← role + name    │
│   quarter   → "Q2 2024"                             │
│   financial → "$65B"                                │
└──────────────────────┬──────────────────────────────┘
                       │ teaches format
                       ▼
New text: "Apple Inc… CEO Tim Cook… $81.4 billion"
                       │
                       ▼
Output (model generates):
┌─────────────────────────────────────────────────────┐
│   company   → "Apple Inc."                          │
│   executive → "CEO Tim Cook"        ← same format   │
│   executive → "CFO Luca Maestri"    ← generalized   │
│   financial → "undisclosed amount"  ← semantic      │
└─────────────────────────────────────────────────────┘
The LLM generalizes from your examples. One example showing “CEO Satya Nadella” is enough for it to also extract “CFO Luca Maestri” and to treat “undisclosed amount” as a financial figure, something spaCy and GLiNER would likely miss.
Few-Shot Learning with Examples
To use langextract, provide two components:
Prompt: A description listing entity types to extract (companies, executives, financial figures)
Examples: Sample text paired with labeled extractions showing expected output
Python
from dotenv import load_dotenv
import langextract as lx

# Read LANGEXTRACT_API_KEY from the .env file
load_dotenv()


def extract_financial_entities(text):
    """Extract entities using langextract."""
    prompt_description = """Extract business entities: companies, executives,
    financial figures, quarters, locations, products, startups,
    regulatory bodies, stock_symbols, market_reaction."""

    # One worked example showing the expected classes and text format
    examples = [
        lx.data.ExampleData(
            text="Microsoft Corp. (NYSE: MSFT) CEO Satya Nadella reported Q2 2024 revenue of $65B, down 5% quarter-over-quarter.",
            extractions=[
                lx.data.Extraction(extraction_class="company", extraction_text="Microsoft Corp."),
                lx.data.Extraction(extraction_class="executive", extraction_text="CEO Satya Nadella"),
                lx.data.Extraction(extraction_class="stock_symbol", extraction_text="NYSE: MSFT"),
                lx.data.Extraction(extraction_class="quarter", extraction_text="Q2 2024"),
                lx.data.Extraction(extraction_class="financial_figure", extraction_text="$65B"),
                lx.data.Extraction(extraction_class="market_reaction", extraction_text="down 5% quarter-over-quarter"),
            ],
        )
    ]

    return lx.extract(
        text_or_documents=text,
        prompt_description=prompt_description,
        examples=examples,
        model_id="gemini-2.5-flash",
    )
Now extract entities from the earnings report:
Python
from collections import defaultdict

earnings_report = """
Apple Inc. (NASDAQ: AAPL) reported third quarter revenue of $81.4 billion,
up 2% year over year. CEO Tim Cook stated that Services revenue reached
a new all-time high of $21.2 billion. The company's board of directors
declared a cash dividend of $0.24 per share.
CFO Luca Maestri mentioned that iPhone revenue was $39.3 billion for
the quarter ending June 30, 2023. The company expects total revenue
between $89 billion and $93 billion for the fourth quarter.
Apple's Cupertino headquarters announced the acquisition of AI startup
WaveOne for an undisclosed amount. The deal is expected to close in
Q4 2023, pending regulatory approval from the SEC.
"""

result = extract_financial_entities(earnings_report)

# Skip extractions that came back with empty text
non_empty = [e for e in result.extractions if e.extraction_text]
print(f"Extracted {len(non_empty)} entities:")

# Group the extracted strings by entity class
grouped = defaultdict(list)
for extraction in non_empty:
    grouped[extraction.extraction_class].append(extraction.extraction_text)

for entity_class, texts in grouped.items():
    print(f"\n{entity_class.upper()} ({len(texts)} found):")
    for text in texts:
        print(f" '{text}'")
💡 What the output shows
Role-linked executives (“CEO Tim Cook”) instead of just the name
Semantic understanding of “undisclosed amount” as a financial figure
Market reaction “up 2% year over year” captured with full context
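Source grounding means every extraction can be traced back to a position in the input text. langextract records its own source positions; the sketch below does not use them. It is a stand-in built on str.find that illustrates the idea and gives a quick check for strings that do not appear verbatim in the source. `locate_in_source` is a hypothetical helper, and the extraction list is hand-written rather than model output.

```python
def locate_in_source(source, extraction_texts):
    """Map each extracted string to its (start, end) character offsets, or None."""
    located = {}
    for text in extraction_texts:
        # First verbatim occurrence only; paraphrased extractions will not match
        start = source.find(text)
        located[text] = (start, start + len(text)) if start != -1 else None
    return located


source = (
    "CEO Tim Cook stated that Services revenue reached "
    "a new all-time high of $21.2 billion."
)
# Hand-written extraction strings; the last one is deliberately absent from the source
offsets = locate_in_source(source, ["CEO Tim Cook", "$21.2 billion", "CFO Luca Maestri"])

for text, span in offsets.items():
    status = f"chars {span[0]}-{span[1]}" if span else "not found in source"
    print(f"{text!r}: {status}")
```

An extraction that cannot be located is a candidate for manual review, since an LLM can occasionally return text that was never in the document.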
Quiz
The example extracts “CEO Satya Nadella” as an executive. How does this affect the model’s output?
A
The model will only extract executives from Microsoft
B
The model learns to include the role (CEO/CFO) with the name
C
The model copies the exact format and ignores other patterns
⚠ Try Again
Not quite. The example teaches a pattern, not a specific company. The model applied the same pattern to extract “CEO Tim Cook” and “CFO Luca Maestri” from Apple’s report.
💡 Correct
Correct! The few-shot example teaches the model what format to use. Since the example linked the role to the name, the model did the same for “CEO Tim Cook” and “CFO Luca Maestri.”
⚠ Try Again
Not quite. The model generalizes from the example. It extracted “CFO Luca Maestri” even though the example only showed a CEO pattern.
langextract extracted “undisclosed amount” as a financial figure. Why would spaCy and GLiNER likely miss this?
A
“undisclosed amount” is too long for token-based models
B
It contains no numbers or currency symbols, which pattern-based models rely on to identify financial entities
C
spaCy and GLiNER can’t process sentences about acquisitions
⚠ Try Again
Not quite. Both spaCy and GLiNER handle multi-token spans. “Cupertino headquarters” was captured as a two-word span by GLiNER.
💡 Correct
Correct! spaCy’s MONEY type and GLiNER’s “Monetary Value” label both depend on numeric patterns. langextract’s LLM understands that “undisclosed amount” refers to money semantically, even without numbers.
⚠ Try Again
Not quite. Both tools can process any text. The issue is that “undisclosed amount” lacks the numeric patterns these models use to identify financial entities.
Exercise: Analyze Customer Feedback
Visualizing Extractions
When to Use Each Tool

