SkillNER: Automating Skill Extraction in Python

September 12, 2024


Khuyen Tran

What is SkillNER?

SkillNER is an open-source Python library built on top of spaCy that automates the extraction of skills from unstructured text. It uses a curated database of over 60,000 skills combined with spaCy’s PhraseMatcher to identify both hard skills (like Python, SQL, TensorFlow) and soft skills (like communication, leadership) in documents. Unlike generic named entity recognition tools, SkillNER is purpose-built for skill taxonomy, making it especially valuable for HR tech, recruiting platforms, and workforce analytics applications where accurate skill tagging is critical.
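The core idea behind this kind of tool is dictionary-based matching: each skill in a database is turned into a phrase pattern, and the tokenized text is scanned for those patterns. The stdlib-only sketch below illustrates that idea with a greedy longest-match scan over a tiny skill table. The skill IDs and surface forms are hypothetical, and this is not SkillNER's actual implementation. SkillNER uses spaCy's PhraseMatcher plus its own scoring step for partial matches (the `ngram_scored` entries in the example output).

```python
import re

# Hypothetical mini skill table: skill ID -> surface forms.
# These IDs and phrases are illustrative only; SkillNER ships its own, much larger database.
SKILLS = {
    "SK_PYTHON": ["python"],
    "SK_SQL": ["sql"],
    "SK_DATA_ANALYSIS": ["data analysis"],
    "SK_DATA_SCIENCE": ["data science"],
}


def tokenize(text):
    """Lowercase the text and split it into word-like tokens (keeps + and # for C++, C#)."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def match_skills(text, skills):
    """Return (skill_id, matched_phrase, start_token) for each skill phrase found in text.

    At each position the longest phrase is tried first, so overlapping
    candidates resolve to the longest match (a simplification of real matchers).
    """
    # Index every surface form by its token tuple for constant-time lookup.
    phrases = {}
    for skill_id, forms in skills.items():
        for form in forms:
            phrases[tuple(tokenize(form))] = skill_id
    max_len = max(len(p) for p in phrases)

    tokens = tokenize(text)
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in phrases:
                matches.append((phrases[candidate], " ".join(candidate), i))
                i += n  # jump past the matched phrase
                break
        else:
            i += 1  # no skill phrase starts at this token
    return matches


print(match_skills("You have solid experience in data analysis and are fluent in SQL and Python.", SKILLS))
# → [('SK_DATA_ANALYSIS', 'data analysis', 5), ('SK_SQL', 'sql', 11), ('SK_PYTHON', 'python', 13)]
```

Greedy longest-match is just one way to resolve overlaps. spaCy's PhraseMatcher returns every match, including overlapping ones, so a real pipeline needs its own rule for choosing among them (for example, `spacy.util.filter_spans`, which keeps the longest non-overlapping spans).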

Extracting skills from job postings, resumes, or other unstructured text can be time-consuming if done manually. SkillNER automates this process, making it faster and more efficient.

This tool can be useful for:

Recruiters to automate skill extraction for faster candidate screening.
Data scientists to extract structured data from unstructured job-related text.

Here’s a quick example:

```python
# Install first: pip install skillNer
#                python -m spacy download en_core_web_lg
import spacy
from spacy.matcher import PhraseMatcher
from skillNer.general_params import SKILL_DB
from skillNer.skill_extractor_class import SkillExtractor

# Load the spaCy model
nlp = spacy.load("en_core_web_lg")

# Initialize the SkillExtractor with the spaCy model, the skill database, and spaCy's PhraseMatcher class
skill_extractor = SkillExtractor(nlp, SKILL_DB, PhraseMatcher)

# Sample job description
job_description = """
You are a data scientist with strong expertise in Python. You have solid experience in 
data analysis and visualization, and can manage end-to-end data science projects. 
You quickly adapt to new tools and technologies, and are fluent in both English and SQL.
"""

# Extract skills from the job description
annotations = skill_extractor.annotate(job_description)
annotations  # displays the dict in a notebook; use print(annotations) in a script
```

Output:

```
{'text': 'you are a data scientist with strong expertise in python you have solid experience in data analysis and visualization and can manage end to end data science projects you quickly adapt to new tools and technologies and are fluent in both english and sql',
 'results': {'full_matches': [{'skill_id': 'KS120GV6C72JMSZKMTD7',
    'doc_node_value': 'data analysis',
    'score': 1,
    'doc_node_id': [15, 16]}],
  'ngram_scored': [{'skill_id': 'KS125LS6N7WP4S6SFTCK',
    'doc_node_id': [9],
    'doc_node_value': 'python',
    'type': 'fullUni',
    'score': 1,
    'len': 1},
   {'skill_id': 'KS1282T6STD9RJZ677XL',
    'doc_node_id': [18],
    'doc_node_value': 'visualization',
    'type': 'fullUni',
    'score': 1,
    'len': 1},
   {'skill_id': 'KS1218W78FGVPVP2KXPX',
    'doc_node_id': [21],
    'doc_node_value': 'manage',
    'type': 'lowSurf',
    'score': 0.63417345,
    'len': 1},
   {'skill_id': 'KS7LO8P3MXB93R3C9RWL',
    'doc_node_id': [25, 26],
    'doc_node_value': 'data science',
    'type': 'lowSurf',
    'score': 2,
    'len': 2},
   {'skill_id': 'KS120626HMWCXJWJC7VK',
    'doc_node_id': [30],
    'doc_node_value': 'adapt',
    'type': 'lowSurf',
    'score': 0.503605,
    'len': 1},
   {'skill_id': 'KS123K75YYK8VGH90NCS',
    'doc_node_id': [41],
    'doc_node_value': 'english',
    'type': 'lowSurf',
    'score': 1,
    'len': 1},
   {'skill_id': 'KS440W865GC4VRBW6LJP',
    'doc_node_id': [43],
    'doc_node_value': 'sql',
    'type': 'fullUni',
    'score': 1,
    'len': 1}]}}
```

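To see the extracted skills highlighted in the original text, call describe() (the visualization renders in a Jupyter notebook):

```python
skill_extractor.describe(annotations)
```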
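The annotations dictionary is easy to inspect but awkward to feed into downstream analysis. The sketch below flattens it into one row per matched skill and drops low-confidence n-gram matches. It relies only on the structure shown in the output above (full_matches and ngram_scored lists whose entries carry skill_id, doc_node_value, and score). The function name flatten_annotations and the 0.7 threshold are hypothetical choices, and sample_annotations is a trimmed copy of that output; in a real run you would pass the annotations variable from the example instead.

```python
# Trimmed copy of the SkillNER output shown above (only the fields used here).
sample_annotations = {
    "results": {
        "full_matches": [
            {"skill_id": "KS120GV6C72JMSZKMTD7", "doc_node_value": "data analysis", "score": 1},
        ],
        "ngram_scored": [
            {"skill_id": "KS125LS6N7WP4S6SFTCK", "doc_node_value": "python", "type": "fullUni", "score": 1},
            {"skill_id": "KS1218W78FGVPVP2KXPX", "doc_node_value": "manage", "type": "lowSurf", "score": 0.63417345},
            {"skill_id": "KS7LO8P3MXB93R3C9RWL", "doc_node_value": "data science", "type": "lowSurf", "score": 2},
            {"skill_id": "KS120626HMWCXJWJC7VK", "doc_node_value": "adapt", "type": "lowSurf", "score": 0.503605},
            {"skill_id": "KS440W865GC4VRBW6LJP", "doc_node_value": "sql", "type": "fullUni", "score": 1},
        ],
    }
}


def flatten_annotations(annotations, min_score=0.7):
    """Turn SkillNER-style annotations into one row (dict) per matched skill.

    Full matches are always kept; n-gram matches are kept only if their
    score reaches min_score (a hypothetical cut-off; tune it on your own data).
    """
    rows = []
    results = annotations["results"]
    for match in results.get("full_matches", []):
        rows.append({"skill_id": match["skill_id"], "text": match["doc_node_value"],
                     "match": "full", "score": match["score"]})
    for match in results.get("ngram_scored", []):
        if match["score"] >= min_score:
            rows.append({"skill_id": match["skill_id"], "text": match["doc_node_value"],
                         "match": match["type"], "score": match["score"]})
    return rows


for row in flatten_annotations(sample_annotations):
    print(row["text"], row["match"], row["score"])
```

With the default threshold, "manage" and "adapt" are dropped while "data analysis", "python", "data science", and "sql" remain. The rows can go straight into pandas.DataFrame(rows) or a CSV writer. Note that the two-token "data science" match has a score of 2 and a len of 2, so if you want to compare scores per token you may need to divide by len; the output alone does not show exactly how SkillNER scales its scores.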

Link to SkillNer.

Why It Beats Manual Parsing

Without a tool like SkillNER, extracting skills from resumes usually means writing brittle regex patterns or maintaining hand-curated keyword lists that break as soon as a candidate writes “ML” instead of “machine learning.” SkillNER reduces this maintenance burden with its built-in skill database and a scored n-gram matching step that also reports approximate matches (the ngram_scored entries in the output above, where lower-confidence lowSurf matches such as “manage” receive scores below 1). Instead of building and maintaining a keyword system yourself, you get a working skill extractor in a few lines of code, which matters most for teams that process large volumes of resumes every day.
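The difference between an exact keyword list and variant-tolerant matching can be sketched in plain Python: an alias table maps variants such as "ML" to one canonical skill, and a difflib fallback catches small misspellings. The alias table, canonical names, and cutoff below are hypothetical, and this illustrates the general technique rather than how SkillNER works internally.

```python
import difflib
from typing import Optional

# Hypothetical alias table: lowercase variant -> canonical skill name.
ALIASES = {
    "ml": "machine learning",
    "machine learning": "machine learning",
    "js": "javascript",
    "javascript": "javascript",
    "postgres": "postgresql",
    "postgresql": "postgresql",
}


def normalize_skill(term: str, cutoff: float = 0.85) -> Optional[str]:
    """Map a raw skill mention to a canonical name, or None if nothing is close.

    1. Exact lookup in the alias table (handles abbreviations like "ML").
    2. Fuzzy fallback with difflib for small typos (e.g. "javascrpt").
    """
    key = term.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    close = difflib.get_close_matches(key, ALIASES.keys(), n=1, cutoff=cutoff)
    return ALIASES[close[0]] if close else None


for mention in ["ML", " Machine Learning ", "javascrpt", "photoshop"]:
    print(repr(mention), "->", normalize_skill(mention))
```

The alias table comes first because string similarity alone cannot connect "ML" to "machine learning"; the fuzzy fallback only helps with near-identical spellings. Lowering the cutoff catches more typos but also starts matching unrelated short terms.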

Conclusion

SkillNER turns skill extraction from a manual, error-prone task into a reliable automated step in your NLP pipeline. It’s a solid choice when you need fast, accurate skill tagging without training a custom NER model from scratch.