Understanding how attention mechanisms work in transformer models can be challenging due to the complex interactions between multiple attention heads across different layers.
BertViz is a tool that allows you to interactively visualize and explore attention patterns through multiple views.
Installing BertViz
You can install BertViz with pip:
!pip install bertviz
Loading a Pre-Trained Model and Tokenizer
First, we load a pre-trained model and tokenizer using the transformers library:
from transformers import AutoTokenizer, AutoModel, utils
utils.logging.set_verbosity_error()  # Suppress standard warnings
model_name = "microsoft/xtremedistil-l12-h384-uncased"  # A small, 12-layer distilled BERT variant
input_text = "The cat sat on the mat"
# output_attentions=True makes the model return attention weights alongside its hidden states
model = AutoModel.from_pretrained(model_name, output_attentions=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
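The layer and head counts of the model determine the size of the visualization grid, so it can help to check them before rendering. A quick sketch, assuming the standard num_hidden_layers and num_attention_heads attributes on the model config:

from transformers import AutoConfig
config = AutoConfig.from_pretrained(model_name)
# For this model we expect 12 layers; heads per layer come from the config
print(f"Layers: {config.num_hidden_layers}, heads per layer: {config.num_attention_heads}")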
Tokenizing Input Text and Running the Model
Next, we tokenize the input text and run the model:
inputs = tokenizer.encode(input_text, return_tensors='pt')  # Tokenize and return a PyTorch tensor
outputs = model(inputs)  # Run the forward pass
attention = outputs[-1]  # Attention weights are the last element of the model output
tokens = tokenizer.convert_ids_to_tokens(inputs[0])  # Map token IDs back to readable strings
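The attention variable is a tuple with one tensor per layer, and each tensor has shape (batch_size, num_heads, sequence_length, sequence_length). A quick sanity check before visualizing:

print(len(attention))      # Number of layers, e.g. 12 for this model
print(attention[0].shape)  # (batch_size, num_heads, seq_len, seq_len); batch_size is 1 here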
Visualizing Attention with BertViz
We can now use BertViz to visualize the attention patterns in the model. Here, we display the model view:
from bertviz import model_view, head_view
model_view(attention, tokens)
This displays an interactive grid of attention patterns across the entire model, with one row per layer and one column per attention head; clicking any thumbnail expands that head's attention pattern.
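If you want to keep the visualization outside the notebook, BertViz can return the rendered HTML instead of displaying it inline, via the html_action parameter (available in recent BertViz versions):

html = model_view(attention, tokens, html_action='return')  # Return HTML instead of rendering
with open("model_view.html", "w") as f:
    f.write(html.data)  # .data holds the raw HTML string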

Displaying the Head View
We can also display the head view:
head_view(attention, tokens)

This displays an interactive visualization of attention within a single layer, with each head rendered in a different color; a drop-down menu lets you switch between layers.
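The head view also accepts optional layer and heads arguments to pre-select what is shown when it first renders. A minimal sketch; the indices below are arbitrary examples:

head_view(attention, tokens, layer=5, heads=[0, 3])  # Open with layer 5, heads 0 and 3 selected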