Table of Contents
- Motivation
- Basics of Faker
- Location-Specific Data Generation
- Create Text
- Create Profile Data
- Create Random Python Datatypes
- Conclusion
Motivation
Let’s say you want to create data of certain types (bool, float, text, integers) with special characteristics (names, addresses, colors, emails, phone numbers, locations) to test some Python libraries or a specific implementation. But finding that specific kind of data takes time. You wonder: is there a quick way to create your own data?
What if there is a package that enables you to create fake data in one line of code such as this:
fake.profile()
{
'address': '076 Steven Trace\nJillville, ND 12393',
'birthdate': datetime.date(1981, 11, 19),
'blood_group': 'O-',
'company': 'Johnson-Rodriguez',
'current_location': (Decimal('61.969848'), Decimal('121.407164')),
'job': 'Patent examiner',
'mail': 'ohicks@hotmail.com',
'name': 'Katie Romero',
'residence': '271 Smith Wells\nMichaelport, MN 40933',
'sex': 'F',
'ssn': '281-84-3963',
'username': 'eparker',
'website': ['https://www.gonzalez.com/', 'https://rogers-scott.com/']
}
This can be done with Faker, a Python package that generates fake data for you, ranging from a specific data type to specific characteristics of that data, and the origin or language of the data. Let’s discover how we can use Faker to create fake data.
💻 Get the Code: The complete source code and Jupyter notebook for this tutorial are available on GitHub. Clone it to follow along!
Basics of Faker
Start with installing the package:
pip install Faker
Import Faker:
from faker import Faker
fake = Faker()
Some basic methods of Faker:
print(fake.color_name())
print(fake.name())
print(fake.address())
print(fake.job())
print(fake.date_of_birth(minimum_age=30))
print(fake.city())
Tan
Kristin Buck
715 Peter Views
Abigailport, ME 57602
Systems analyst
1946-03-07
Evanmouth
Let's say you are the author of a fiction book who wants to create a character but finds it difficult and time-consuming to come up with a realistic name and background. You can write:
name = fake.name()
color = fake.color_name()
city = fake.city()
job = fake.job()
print(f'Her name is {name}. She lives in {city}. Her favorite color is {color}. She works as a {job}')
Her name is Debra Armstrong. She lives in Beanview. Her favorite color is GreenYellow. She works as a Lawyer
With Faker, you can generate a convincing example instantly!
Location-Specific Data Generation
Luckily, we can also specify the location of the data we want to fake. Maybe the character you want to create is from Italy, and you also want to create some of her friends. If you are from the US, it is difficult to come up with realistic details for that location. That is easily taken care of by passing a locale (location) argument to the Faker class:
fake = Faker('it_IT')
for _ in range(10):
    print(fake.name())
Angelica Donarelli-Marangoni
Rosaria Castiglione
Federica Iacovelli
Puccio Armellini
Dina Donini-Alboni
Dott. Carolina Marrone
Olga Nosiglia
Graziella Russo
Paulina Galiazzo
Dott. Riccardo Padovano
Or create information from multiple locations:
fake = Faker(['ja_JP','zh_CN','es_ES','en_US','fr_FR'])
for _ in range(10):
    print(fake.city())
齐齐哈尔市
Blakefort
North Joeborough
玉兰市
Saint Suzanne-les-Bains
Melilla
調布市
富津市
Maillot-sur-Mer
East Jamesshire
If you are from one of these countries, I hope the place names look familiar. If you are curious about other locations that you can specify, check out the doc here.
Create Text
Create Random Text
We can create random text with:
fake = Faker('en_US')
print(fake.text())
Gas threat perhaps minute energy thus. Relate group science car discussion budget art.
Let visit reach senior. Story once list almost. Enough major everyone.
Try with the Vietnamese language:
fake = Faker('vi_VN')
print(fake.text())
Như không cho số vậy tại đến. Hơn các thay. Khi từ cũng không rất là.
Gần được cho có nơi như vẫn cho. Nơi đi về giống.
Mà cũng từ nhưng lớn. Từng của nếu khi như nhưng.
None of this random text makes sense, but it is a good way to quickly create text for testing.
Create Text from Selected Words
We can also create text from a list of words:
fake = Faker()
my_information = ['dog','swimming', '21', 'slow', 'girl', 'coffee', 'flower','pink']
print(fake.sentence(ext_word_list=my_information))
print(fake.sentence(ext_word_list=my_information))
Coffee pink coffee.
Dog pink 21 pink.
Create Profile Data
We can quickly create a profile with:
fake = Faker()
fake.profile()
{'job': 'Nurse, adult',
'company': 'Johnson, Moore and Glover',
'ssn': '762-56-8929',
'residence': '742 Shane Groves\nLake Jasminefort, GU 12583',
'current_location': (Decimal('-77.3842165'), Decimal('7.407430')),
'blood_group': 'B-',
'website': ['https://brooks.com/'],
'username': 'brownamanda',
'name': 'Carolyn Navarro',
'sex': 'F',
'address': '505 Lewis Grove Apt. 588\nHowardville, ID 68181',
'mail': 'larry00@hotmail.com',
'birthdate': datetime.date(1946, 6, 13)}
As we can see, most of the relevant information about a person is created with ease, including mail, ssn, username, and website.
What is even more useful is that we can create a dataframe of 100 users from different countries:
import pandas as pd
fake = Faker(['it_IT','ja_JP', 'zh_CN', 'de_DE','en_US'])
profiles = [fake.profile() for i in range(100)]
pd.DataFrame(profiles).head()
| | job | company | ssn | residence | current_location | blood_group | website | username | name | sex | address | mail | birthdate |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Physiological scientist | Sobrero-Mazzanti Group | CLGTNO59H42A473Z | Incrocio Cabrini, 14 Appartamento 59\n74100, L… | (-88.2637715, 149.968584) | AB+ | [http://federici-endrizzi.it/, http://www.paru…] | giuliagreco | Dott. Liliana Serraglio | F | Vicolo Milo, 0\n64020, Ripattoni (TE) | giolittiflavio@gmail.com | 1998-10-10 |
1 | 花火師 | 阿部運輸株式会社 | 701-41-9799 | 和歌山県印旛郡本埜村鳥越20丁目23番18号 | (79.245074, 109.117174) | O+ | [https://suzuki.com/, http://ishikawa.jp/] | lyamamoto | 斉藤 明美 | F | 東京都江戸川区神明内40丁目12番20号 | akemiyamada@yahoo.com | 1916-12-09 |
2 | 小説家 | 小林食品株式会社 | 103-28-5057 | 島根県富津市細野7丁目16番1号 | (-84.3304275, 38.093874) | A+ | [https://tanaka.jp/, http://www.fujita.net/, h…] | minoru62 | 渡辺 英樹 | M | 青森県川崎市川崎区長畑22丁目27番12号 | minoru35@yahoo.com | 2008-02-17 |
3 | ゲームクリエイター | 佐藤水産有限会社 | 123-85-7967 | 宮城県調布市隼町3丁目22番12号 アーバン台東327 | (-49.3689775, -134.762867) | AB- | [http://www.sato.org/, http://kato.net/, http:…] | ayamamoto | 鈴木 洋介 | M | 栃木県川崎市中原区虎ノ門30丁目27番20号 | yuta56@hotmail.com | 1917-01-25 |
4 | 薬剤師 | 合同会社高橋建設 | 891-98-2169 | 山梨県山武郡横芝光町轟4丁目22番10号 コート天神島159 | (-62.1493985, -105.171377) | B+ | [http://yamashita.jp/, http://www.shimizu.com/] | yosukekimura | 田中 真綾 | F | 山口県府中市下吉羽6丁目20番2号 | hayashiyuki@yahoo.com | 2001-08-09 |
Create Random Python Datatypes
If we only care about the type of the data, not the information it holds, we can easily generate random datatypes such as:
Boolean:
print(fake.pybool())
False
A list of about 5 elements with different data types (because variable_nb_elements=True, the actual length varies around nb_elements):
print(fake.pylist(nb_elements=5, variable_nb_elements=True))
['juan28@example.org', 8515, 6618, 'UexWQJkGrJFGBAVfHgUt']
A decimal with 5 left digits and 6 right digits (after the decimal point):
print(fake.pydecimal(left_digits=5, right_digits=6, positive=False, min_value=None, max_value=None))
-26114.564612
You can find more about other Python datatypes that you can create here.
Conclusion
I hope you find Faker a helpful tool for creating data efficiently. Even if you do not need it for what you are working on right now, it is helpful to know that a tool exists that lets you generate data with ease for specific needs such as testing.
Feel free to check out more information about Faker here.