Faker: Generate Realistic Test Data in Python with One Line of Code

Motivation

Let’s say you need data of certain types (bool, float, text, integer) with specific characteristics (name, address, color, email, phone number, location) to test a Python library or your own implementation. Finding a dataset that matches exactly takes time. You wonder: is there a quick way to create your own data?

What if there is a package that enables you to create fake data in one line of code such as this:

fake.profile()

Output:
{
    'address': '076 Steven Trace\nJillville, ND 12393',
    'birthdate': datetime.date(1981, 11, 19),
    'blood_group': 'O-',
    'company': 'Johnson-Rodriguez',
    'current_location': (Decimal('61.969848'), Decimal('121.407164')),
    'job': 'Patent examiner',
    'mail': 'ohicks@hotmail.com',
    'name': 'Katie Romero',
    'residence': '271 Smith Wells\nMichaelport, MN 40933',
    'sex': 'F',
    'ssn': '281-84-3963',
    'username': 'eparker',
    'website': ['https://www.gonzalez.com/', 'https://rogers-scott.com/']
}

This can be done with Faker, a Python package that generates fake data for you. You can control the data type, the kind of information the data represents (names, addresses, jobs), and the locale it appears to come from. Let’s discover how to use Faker to create fake data.

💻 Get the Code: The complete source code and Jupyter notebook for this tutorial are available on GitHub. Clone it to follow along!

Basics of Faker

Start with installing the package:

pip install Faker

Import Faker:

from faker import Faker

fake = Faker()

Some basic methods of Faker:

print(fake.color_name())
print(fake.name())
print(fake.address())
print(fake.job())
print(fake.date_of_birth(minimum_age=30))
print(fake.city())

Output:
Tan
Kristin Buck
715 Peter Views
Abigailport, ME 57602
Systems analyst
1946-03-07
Evanmouth

Let's say you are a fiction author who wants to create a character but finds it difficult and time-consuming to come up with a realistic name and background. You can write:

name = fake.name()
color = fake.color_name()
city = fake.city()
job = fake.job()

print(f'Her name is {name}. She lives in {city}. Her favorite color is {color}. She works as a {job}')

Output:
Her name is Debra Armstrong. She lives in Beanview. Her favorite color is GreenYellow. She works as a Lawyer

With Faker, you can generate a convincing example instantly!

Location-Specific Data Generation

We can also specify the locale of the data we want to fake. Maybe the character you want to create is from Italy, and you also want to create some of her friends. If you are from the US, it can be difficult to come up with names that fit that location. Passing a locale to the Faker class takes care of this:

fake = Faker('it_IT')

for _ in range(10):
    print(fake.name())

Output:
Angelica Donarelli-Marangoni
Rosaria Castiglione
Federica Iacovelli
Puccio Armellini
Dina Donini-Alboni
Dott. Carolina Marrone
Olga Nosiglia
Graziella Russo
Paulina Galiazzo
Dott. Riccardo Padovano

Or create information from multiple locations:

fake = Faker(['ja_JP','zh_CN','es_ES','en_US','fr_FR'])

for _ in range(10):
    print(fake.city())

Output:
齐齐哈尔市
Blakefort
North Joeborough
玉兰市
Saint Suzanne-les-Bains
Melilla
調布市
富津市
Maillot-sur-Mer
East Jamesshire

If you are from one of these countries, I hope you recognize the locations. If you are curious about the other locales you can specify, check out the Faker documentation.

Create Text

Create Random Text

We can create random text with:

fake = Faker('en_US')
print(fake.text())

Output:
Gas threat perhaps minute energy thus. Relate group science car discussion budget art.
Let visit reach senior. Story once list almost. Enough major everyone.

Try with the Vietnamese language:

fake = Faker('vi_VN')
print(fake.text())

Output:
Như không cho số vậy tại đến. Hơn các thay. Khi từ cũng không rất là.
Gần được cho có nơi như vẫn cho. Nơi đi về giống.
Mà cũng từ nhưng lớn. Từng của nếu khi như nhưng.

None of this random text makes sense, but it is a quick way to create text for testing.

Create Text from Selected Words

We can also create a sentence from a custom list of words:

fake = Faker()
my_information = ['dog','swimming', '21', 'slow', 'girl', 'coffee', 'flower','pink']

print(fake.sentence(ext_word_list=my_information))
print(fake.sentence(ext_word_list=my_information))

Output:
Coffee pink coffee.
Dog pink 21 pink.

Create Profile Data

We can quickly create a profile with:

fake = Faker()
fake.profile()

Output:
{'job': 'Nurse, adult',
 'company': 'Johnson, Moore and Glover',
 'ssn': '762-56-8929',
 'residence': '742 Shane Groves\nLake Jasminefort, GU 12583',
 'current_location': (Decimal('-77.3842165'), Decimal('7.407430')),
 'blood_group': 'B-',
 'website': ['https://brooks.com/'],
 'username': 'brownamanda',
 'name': 'Carolyn Navarro',
 'sex': 'F',
 'address': '505 Lewis Grove Apt. 588\nHowardville, ID 68181',
 'mail': 'larry00@hotmail.com',
 'birthdate': datetime.date(1946, 6, 13)}

As we can see, most of the relevant information about a person is created with ease, including mail, ssn, username, and website.

Even more useful, we can create a DataFrame of 100 users from different countries:

import pandas as pd

fake = Faker(['it_IT','ja_JP', 'zh_CN', 'de_DE','en_US'])
profiles = [fake.profile() for _ in range(100)]

pd.DataFrame(profiles).head()

Output:
job company ssn residence current_location blood_group website username name sex address mail birthdate
0 Physiological scientist Sobrero-Mazzanti Group CLGTNO59H42A473Z Incrocio Cabrini, 14 Appartamento 59\n74100, L… (-88.2637715, 149.968584) AB+ [http://federici-endrizzi.it/, http://www.paru…] giuliagreco Dott. Liliana Serraglio F Vicolo Milo, 0\n64020, Ripattoni (TE) giolittiflavio@gmail.com 1998-10-10
1 花火師 阿部運輸株式会社 701-41-9799 和歌山県印旛郡本埜村鳥越20丁目23番18号 (79.245074, 109.117174) O+ [https://suzuki.com/, http://ishikawa.jp/] lyamamoto 斉藤 明美 F 東京都江戸川区神明内40丁目12番20号 akemiyamada@yahoo.com 1916-12-09
2 小説家 小林食品株式会社 103-28-5057 島根県富津市細野7丁目16番1号 (-84.3304275, 38.093874) A+ [https://tanaka.jp/, http://www.fujita.net/, h…] minoru62 渡辺 英樹 M 青森県川崎市川崎区長畑22丁目27番12号 minoru35@yahoo.com 2008-02-17
3 ゲームクリエイター 佐藤水産有限会社 123-85-7967 宮城県調布市隼町3丁目22番12号 アーバン台東327 (-49.3689775, -134.762867) AB- [http://www.sato.org/, http://kato.net/, http:…] ayamamoto 鈴木 洋介 M 栃木県川崎市中原区虎ノ門30丁目27番20号 yuta56@hotmail.com 1917-01-25
4 薬剤師 合同会社高橋建設 891-98-2169 山梨県山武郡横芝光町轟4丁目22番10号 コート天神島159 (-62.1493985, -105.171377) B+ [http://yamashita.jp/, http://www.shimizu.com/] yosukekimura 田中 真綾 F 山口県府中市下吉羽6丁目20番2号 hayashiyuki@yahoo.com 2001-08-09

Create Random Python Datatypes

If we only care about the type of the data, not the information it carries, we can easily generate random Python datatypes such as:

Boolean:

print(fake.pybool())

Output:
False

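A list of about 5 elements with mixed data types (because variable_nb_elements=True, the actual length varies around nb_elements, which is why the output below has 4 elements):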

print(fake.pylist(nb_elements=5, variable_nb_elements=True))

Output:
['juan28@example.org', 8515, 6618, 'UexWQJkGrJFGBAVfHgUt']

A decimal with 5 left digits and 6 right digits (after the .):

print(fake.pydecimal(left_digits=5, right_digits=6, positive=False, min_value=None, max_value=None))

Output:
-26114.564612

You can find the other Python datatypes that Faker can generate in the documentation for its Python provider.

Conclusion

I hope you find Faker a helpful tool for creating data efficiently. Even if you do not need it for what you are working on right now, it is good to know that a tool exists for generating data tailored to specific needs such as testing.

Feel free to check out the Faker documentation for more information.
