smart_open: The Python Library That Makes Cloud Storage Feel Local

Working with large remote files in cloud storage services like Amazon S3 can be cumbersome: with boto3 alone, you have to juggle clients, bucket names, object keys, and temporary local copies just to read or write a file.

import boto3

# Initialize the S3 client
s3_client = boto3.client('s3')

# Write a local file, then upload it to S3
with open('example_file.txt', 'w') as local_file:
    local_file.write("Hello, world!")

s3_client.upload_file('example_file.txt', 'khuyen-bucket', 'remote_file.txt')

# Download the object to a second local file before we can read it
s3_client.download_file('khuyen-bucket', 'remote_file.txt', 'example_file2.txt')

with open('example_file2.txt', 'r') as local_file:
    content = local_file.read()
    print(content)

# Hello, world!

smart_open addresses these issues by providing a single open() function that works across different storage systems and file formats.

from smart_open import open

with open('s3://khuyen-bucket/example_file.txt', 'w') as s3_file:
    s3_file.write("Hello, world!")

with open('s3://khuyen-bucket/example_file.txt', 'r') as s3_file:
    print(s3_file.read())

# Hello, world!

Notice how similar this is to working with local files. The open() function from smart_open handles all the complexities of S3 operations behind the scenes.
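
By default, the S3 backend picks up credentials from boto3's standard chain (environment variables, shared credentials file, IAM role). If you need a specific profile, region, or endpoint, smart_open 5.x lets you pass a preconfigured boto3 client via the transport_params argument. A minimal sketch, where the profile and region names are hypothetical:

import boto3
from smart_open import open

# Build a boto3 client with explicit settings (profile/region are illustrative)
session = boto3.Session(profile_name='my-profile', region_name='us-east-1')
client = session.client('s3')

# Hand the client to smart_open instead of relying on default credentials
with open('s3://khuyen-bucket/example_file.txt', 'r',
          transport_params={'client': client}) as s3_file:
    print(s3_file.read())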

Another great feature of smart_open is its ability to handle compressed files transparently. Let’s say we have a gzipped file that we want to upload to S3 and then read back:

# Uploading a gzipped file: the .gz extension tells smart_open to
# decompress on read and recompress on write
with open('example_file.txt.gz', 'r') as local_file:
    with open('s3://khuyen-bucket/example_file.txt.gz', 'w') as s3_file:
        s3_file.write(local_file.read())

# Reading the gzipped file back from S3; decompression is transparent
with open('s3://khuyen-bucket/example_file.txt.gz', 'r') as s3_file:
    content = s3_file.read()
    print(content)

# Hello, world!

smart_open infers the compression format from the file extension, so it decompresses the gzipped file on read and compresses on write, with no extra code needed to work with compressed files in S3.
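
You can also control this behavior explicitly. Recent versions of smart_open accept a compression argument to open(), where 'infer_from_extension' is the default, 'disable' turns compression handling off, and an explicit extension such as '.gz' forces it. A short sketch (the object key without a .gz suffix is a made-up example):

from smart_open import open

# Fetch the raw compressed bytes without decompressing, e.g. to copy the object as-is
with open('s3://khuyen-bucket/example_file.txt.gz', 'rb', compression='disable') as s3_file:
    raw_bytes = s3_file.read()

# Force gzip decompression for an object whose key lacks the .gz extension
with open('s3://khuyen-bucket/compressed_blob', 'rb', compression='.gz') as s3_file:
    print(s3_file.read())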

Other examples of URLs that smart_open accepts:

s3://my_bucket/my_key
s3://my_key:my_secret@my_bucket/my_key
s3://my_key:my_secret@my_server:my_port@my_bucket/my_key
gs://my_bucket/my_blob
azure://my_bucket/my_blob
hdfs:///path/file
hdfs://path/file
webhdfs://host:port/path/file
./local/path/file
~/local/path/file
local/path/file
./local/path/file.gz
file:///home/user/file
file:///home/user/file.bz2
[ssh|scp|sftp]://username@host//path/file
[ssh|scp|sftp]://username@host/path/file
[ssh|scp|sftp]://username:password@host/path/file
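
Whatever the scheme, smart_open streams data in chunks rather than downloading the whole object first, so you can iterate over a large remote file line by line in constant memory. A minimal sketch with an illustrative bucket and key:

from smart_open import open

# Count lines in a large S3 object; only one buffered chunk is in memory at a time
line_count = 0
for line in open('s3://khuyen-bucket/large_log_file.txt', 'r'):
    line_count += 1

print(f"Processed {line_count} lines without downloading the full file")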

If you frequently work with remote files or cloud storage services, smart_open is definitely a tool worth adding to your Python toolkit. It can save you time, reduce complexity, and make your code more maintainable.

Link to smart_open: https://github.com/piskvorky/smart_open
