I'm going to show you why and how to use Python's 'dataclass' instead of dictionaries or custom classes for structuring data.
As you progress through your programming journey, it's important to find ways to make your code easier to read, understand, and extend. This ultimately makes you more productive and lets you focus on the actual quality of the software you're building.
Python's 'dataclass' helps you accomplish this when working with data in your programs.
It's easy to just stick to dicts or custom classes. But this is a mistake: dicts aren't declarative enough, and custom classes require too much boilerplate code.
I hope it's clear that you're missing out if you don't know how to use dataclasses in Python :P
Read on to learn the absolute basics of using 'dataclass':
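For this little tutorial, let's assume we're pulling some raw data out of a JSON file.
To integrate the data into our imaginary application, we need to clean it and map it into the structure the application expects: rename some fields, convert quantity_in_stock from a string to an int, add the price including 19% tax, and record the import date.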
# reading our raw data from a json file
from pprint import pprint
import json
with open("./raw_data.json", "r") as file:
    raw_data = json.loads(file.read())
pprint(raw_data)
# [{'category': 'Electronics',
# 'description': 'A smooth and precise wireless mouse with ergonomic design.',
# 'id': 1,
# 'name': 'Wireless Mouse',
# 'price': 25.99,
# 'quantity_in_stock': '150'},
# {'category': 'Accessories',
# 'description': 'Noise-cancelling headphones with long battery life.',
# 'id': 2,
# 'name': 'Bluetooth Headphones',
# 'price': 89.99,
# 'quantity_in_stock': '75'},
# {'category': 'Accessories',
# 'description': 'Adjustable stand for smartphones and tablets.',
# 'id': 3,
# 'name': 'Smartphone Stand',
# 'price': 12.99,
# 'quantity_in_stock': '200'}]
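If you want to follow along without creating raw_data.json, here's a minimal sketch that embeds the same three records as a JSON string and parses them with json.loads. RAW_JSON is a hypothetical name, and the records are copied from the printed output above rather than from the original file.

```python
import json

# The same three records as in the output above (RAW_JSON is a hypothetical name).
RAW_JSON = """
[
  {"id": 1, "name": "Wireless Mouse", "category": "Electronics",
   "description": "A smooth and precise wireless mouse with ergonomic design.",
   "price": 25.99, "quantity_in_stock": "150"},
  {"id": 2, "name": "Bluetooth Headphones", "category": "Accessories",
   "description": "Noise-cancelling headphones with long battery life.",
   "price": 89.99, "quantity_in_stock": "75"},
  {"id": 3, "name": "Smartphone Stand", "category": "Accessories",
   "description": "Adjustable stand for smartphones and tablets.",
   "price": 12.99, "quantity_in_stock": "200"}
]
"""

raw_data: list[dict] = json.loads(RAW_JSON)

print(len(raw_data))  # 3
# quantity_in_stock arrives as a string, which the mapping step has to fix.
print(repr(raw_data[0]["quantity_in_stock"]))  # '150'
```

Everything that follows works the same whether raw_data comes from this string or from the file.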
Before diving into how I'd use a dataclass, let's first look at how people commonly handle this when they don't know about dataclass.
Bad example 1:
# clean and map raw_data
from datetime import datetime
now: datetime = datetime.now()
data: list[dict] = [
    {
        "id": d['id'],
        "amount": d['price'],
        "amount_with_tax": round(d['price'] * 1.19, 2),
        "in_stock": int(d['quantity_in_stock']),
        "product_name": d['name'],
        "product_description": d['description'],
        "category": d['category'],
        "import_date": now,
    }
    for d in raw_data
]
pprint(data)
# [{'amount': 25.99,
# 'amount_with_tax': 30.93,
# 'category': 'Electronics',
# 'id': 1,
# 'import_date': datetime.datetime(2025, 5, 11, 11, 37, 38, 243859),
# 'in_stock': 150,
# 'product_description': 'A smooth and precise wireless mouse with ergonomic '
# 'design.',
# 'product_name': 'Wireless Mouse'},
# {'amount': 89.99,
# 'amount_with_tax': 107.09,
# 'category': 'Accessories',
# 'id': 2,
# 'import_date': datetime.datetime(2025, 5, 11, 11, 37, 38, 243859),
# 'in_stock': 75,
# 'product_description': 'Noise-cancelling headphones with long battery life.',
# 'product_name': 'Bluetooth Headphones'},
# {'amount': 12.99,
# 'amount_with_tax': 15.46,
# 'category': 'Accessories',
# 'id': 3,
# 'import_date': datetime.datetime(2025, 5, 11, 11, 37, 38, 243859),
# 'in_stock': 200,
# 'product_description': 'Adjustable stand for smartphones and tablets.',
# 'product_name': 'Smartphone Stand'}]
Here we cleaned and mapped our raw_data into a structure that our application expects.
At least we can do something with our data now.
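Problems: the structure isn't declarative. Nothing in the code tells a reader (or an editor or type checker) which keys each dict has or what types their values are; you have to go find the mapping code to know.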
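One concrete downside of plain dicts, shown in a small sketch: nothing stops you from misspelling a key, and the mistake only surfaces as a KeyError when that line actually runs. With a class that declares its attributes, editors and static type checkers such as mypy can flag a misspelled attribute name before the code runs. The record below is a hypothetical, trimmed-down version of one mapped item.

```python
# A trimmed-down mapped record (hypothetical, for illustration only).
item: dict = {"id": 1, "product_name": "Wireless Mouse", "in_stock": 150}

# Nothing flags the misspelled key below until this line actually runs.
try:
    name = item["prodcut_name"]  # typo on purpose
except KeyError as exc:
    error_message = f"KeyError: {exc}"
    print(error_message)  # KeyError: 'prodcut_name'
```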
Bad example 2:
Let's look at another example using a custom class.
class Data:
    id: int
    amount: float
    amount_with_tax: float
    in_stock: int
    product_name: str
    product_description: str
    category: str
    import_date: datetime
    def __init__(self, id: int, amount: float, amount_with_tax: float, in_stock: int,
                 product_name: str, product_description: str, category: str):
        self.id = id
        self.amount = amount
        self.amount_with_tax = amount_with_tax
        self.in_stock = in_stock
        self.product_name = product_name
        self.product_description = product_description
        self.category = category
        self.import_date = now
data: list[Data] = [
    Data(
        id=d['id'],
        amount=d['price'],
        amount_with_tax=round(d['price'] * 1.19, 2),
        in_stock=int(d['quantity_in_stock']),
        product_name=d['name'],
        product_description=d['description'],
        category=d['category'],
    )
    for d in raw_data
]
pprint(data)
# [<__main__.Data object at 0x77b4099abdc0>,
# <__main__.Data object at 0x77b408c21ff0>,
# <__main__.Data object at 0x77b408c22e00>]
Better and more declarative.
At least this part of the class looks good and serves as a clear model for our target data:
class Data:
    id: int
    amount: float
    amount_with_tax: float
    in_stock: int
    product_name: str
    product_description: str
    category: str
    import_date: datetime
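Problems:
We still have to write the __init__ method by hand, repeating every field name in the annotations, the parameter list, and the assignments.
pprint only shows generic object representations like <__main__.Data object at 0x77b4099abdc0>. To get readable output, we'd also have to implement the .__repr__ method:
def __repr__(self):
    return (f"Data(id={self.id!r}, amount={self.amount!r}, amount_with_tax={self.amount_with_tax!r}, "
            f"in_stock={self.in_stock!r}, product_name={self.product_name!r}, "
            f"product_description={self.product_description!r}, category={self.category!r}, "
            f"import_date={self.import_date!r})")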
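To see the boilerplate difference on a smaller scale, here's a sketch with a hypothetical two-field product written both ways. The plain class needs a hand-written __init__ and __repr__; the dataclass version generates both from the annotations. The class names PlainProduct and DataProduct are made up for this comparison.

```python
from dataclasses import dataclass


class PlainProduct:
    # Every field name is repeated: annotation, parameter, assignment, and repr.
    id: int
    name: str

    def __init__(self, id: int, name: str):
        self.id = id
        self.name = name

    def __repr__(self) -> str:
        return f"PlainProduct(id={self.id!r}, name={self.name!r})"


@dataclass
class DataProduct:
    # The annotations alone are enough; __init__ and __repr__ are generated.
    id: int
    name: str


print(PlainProduct(1, "Wireless Mouse"))  # PlainProduct(id=1, name='Wireless Mouse')
print(DataProduct(1, "Wireless Mouse"))   # DataProduct(id=1, name='Wireless Mouse')
```

Both print the same kind of readable output, but only one of them needed the extra methods.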
Now let's look at how I'd do things using a dataclass.
from dataclasses import dataclass, field
@dataclass
class Data:
    id: int
    amount: float
    in_stock: int
    product_name: str
    product_description: str
    category: str
    amount_with_tax: float = field(init=False)
    import_date: datetime = field(init=False)
    def __post_init__(self):
        self.amount_with_tax = round(self.amount * 1.19, 2)
        self.import_date = now
data: list[Data] = [
    Data(
        id=d['id'],
        amount=d['price'],
        in_stock=int(d['quantity_in_stock']),
        product_name=d['name'],
        product_description=d['description'],
        category=d['category'],
    )
    for d in raw_data
]
pprint(data)
# [Data(id=1,
# amount=25.99,
# in_stock=150,
# product_name='Wireless Mouse',
# product_description='A smooth and precise wireless mouse with ergonomic '
# 'design.',
# category='Electronics',
# amount_with_tax=30.93,
# import_date=datetime.datetime(2025, 5, 11, 11, 37, 38, 243859)),
# Data(id=2,
# amount=89.99,
# in_stock=75,
# product_name='Bluetooth Headphones',
# product_description='Noise-cancelling headphones with long battery life.',
# category='Accessories',
# amount_with_tax=107.09,
# import_date=datetime.datetime(2025, 5, 11, 11, 37, 38, 243859)),
# Data(id=3,
# amount=12.99,
# in_stock=200,
# product_name='Smartphone Stand',
# product_description='Adjustable stand for smartphones and tablets.',
# category='Accessories',
# amount_with_tax=15.46,
# import_date=datetime.datetime(2025, 5, 11, 11, 37, 38, 243859))]
OK, this was easy. All I did was attach the @dataclass decorator to my class.
__init__ and __repr__ are generated automatically.
It's declarative and I don't have to write a lot of boilerplate.
I use the __post_init__ method, which the generated __init__ calls right after assigning the fields, to calculate derived fields. Declaring those fields with the field(init=False) function from the dataclasses module leaves them out of the __init__ parameters, so callers don't pass them in.
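Here's a minimal sketch of what field(init=False) does, using a hypothetical, trimmed-down Product class and the same 1.19 tax factor as above: the field is left out of the generated __init__, so passing it raises a TypeError, and __post_init__ is where it gets its value.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Product:
    amount: float
    # init=False: not a parameter of the generated __init__.
    amount_with_tax: float = field(init=False)

    def __post_init__(self):
        # Called by the generated __init__ after amount has been assigned.
        self.amount_with_tax = round(self.amount * 1.19, 2)


p = Product(amount=25.99)
print(p)  # Product(amount=25.99, amount_with_tax=30.93)

# It's still a regular field, so it appears in fields() and in the repr.
field_names = [f.name for f in fields(Product)]
print(field_names)  # ['amount', 'amount_with_tax']

# But passing it to the constructor is an error.
rejected = False
try:
    Product(amount=25.99, amount_with_tax=0.0)
except TypeError as exc:
    rejected = True
    print(exc)
```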
Let's improve things even further.
from dataclasses import dataclass, field
@dataclass
class Data:
    id: int
    amount: float
    in_stock: int
    product_name: str
    product_description: str
    category: str
    import_date: datetime = field(init=False)
    def __post_init__(self):
        self.import_date = now
    @property
    def amount_with_tax(self) -> float:
        return round(self.amount * 1.19, 2)
    @classmethod
    def map_from_raw_data(cls, data: dict) -> 'Data':
        return cls(
            id=data['id'],
            amount=data['price'],
            in_stock=int(data['quantity_in_stock']),
            product_name=data['name'],
            product_description=data['description'],
            category=data['category'],
        )
data: list[Data] = [Data.map_from_raw_data(d) for d in raw_data]
pprint(data)
# [Data(id=1,
# amount=25.99,
# in_stock=150,
# product_name='Wireless Mouse',
# product_description='A smooth and precise wireless mouse with ergonomic '
# 'design.',
# category='Electronics',
# import_date=datetime.datetime(2025, 5, 11, 11, 37, 38, 243859)),
# Data(id=2,
# amount=89.99,
# in_stock=75,
# product_name='Bluetooth Headphones',
# product_description='Noise-cancelling headphones with long battery life.',
# category='Accessories',
# import_date=datetime.datetime(2025, 5, 11, 11, 37, 38, 243859)),
# Data(id=3,
# amount=12.99,
# in_stock=200,
# product_name='Smartphone Stand',
# product_description='Adjustable stand for smartphones and tablets.',
# category='Accessories',
# import_date=datetime.datetime(2025, 5, 11, 11, 37, 38, 243859))]
Let's break down what I did here.
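I didn't like having amount_with_tax stored in the data structure. Instead, I declared it as a @property, so it's calculated on access rather than stored as a field. I can simply read it like an attribute, e.g. data[0].amount_with_tax. The nice thing is that the value isn't stored on the instance: it's recomputed from amount every time, so it can't get out of sync. As you can see in the output above, it also no longer shows up in the generated repr.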
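A quick sketch of that behavior, again using a hypothetical, trimmed-down Product class: a @property isn't a dataclass field, so it doesn't appear in fields() or the generated repr, and because it's recomputed on every access, changing amount is reflected immediately.

```python
from dataclasses import dataclass, fields


@dataclass
class Product:
    amount: float

    @property
    def amount_with_tax(self) -> float:
        # Recomputed from the current amount on every access; nothing is stored.
        return round(self.amount * 1.19, 2)


p = Product(amount=25.99)
print(p.amount_with_tax)  # 30.93, read like an attribute (no parentheses)
print(p)  # Product(amount=25.99), the property isn't part of the repr
print([f.name for f in fields(Product)])  # ['amount']

# Because nothing is cached, a changed amount is picked up immediately.
p.amount = 12.99
print(p.amount_with_tax)  # 15.46
```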
I added a @classmethod that serves as a mapper. Inside the method, I can clean and map the raw data before building the instance.
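A @classmethod mapper like this works as an alternative constructor. Here's a trimmed-down sketch with a hypothetical Product class that mirrors map_from_raw_data; the DiscountedProduct subclass is also made up, and is only there to show why the method calls cls(...) instead of hard-coding the class name.

```python
from dataclasses import dataclass


@dataclass
class Product:
    id: int
    product_name: str
    in_stock: int

    @classmethod
    def map_from_raw_data(cls, data: dict) -> "Product":
        # Renaming and type cleanup live in one place, next to the fields.
        return cls(
            id=data["id"],
            product_name=data["name"],
            in_stock=int(data["quantity_in_stock"]),  # "150" -> 150
        )


class DiscountedProduct(Product):
    # Hypothetical subclass, only here to show what `cls` buys you.
    pass


raw = {"id": 1, "name": "Wireless Mouse", "quantity_in_stock": "150"}

print(Product.map_from_raw_data(raw))
# Product(id=1, product_name='Wireless Mouse', in_stock=150)

# Because the method calls cls(...) rather than Product(...),
# calling it on a subclass builds an instance of that subclass.
print(type(DiscountedProduct.map_from_raw_data(raw)).__name__)  # DiscountedProduct
```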
The beautiful thing about this setup is that everything related to my Data class is now declared inside the class itself.
It also doesn't require much boilerplate.
And since everything is in one class, it becomes easy to identify what's going on in the app and how to make changes.
Another reason I like dataclass is that it's part of Python's standard library (since Python 3.7). There's no need to install an external package to work with it.
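However, dataclass has its limits. It doesn't validate types at runtime: the annotations are only hints for readers and tools like type checkers. If you need a full-fledged library that actually validates your data and more, you may want to have a look at "Pydantic".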
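To see what that limitation means, here's a sketch: a plain dataclass happily accepts a string where an int is annotated. If you want to stay within the standard library, one option is a hand-written check in __post_init__. The check below is an illustrative assumption, not something dataclass does for you, and it only handles plain classes like int and str (not generics such as list[int], and not string annotations produced by `from __future__ import annotations`).

```python
from dataclasses import dataclass, fields


@dataclass
class Unchecked:
    in_stock: int


# No error: annotations are not enforced at runtime.
u = Unchecked(in_stock="150")
print(repr(u.in_stock))  # '150'


@dataclass
class Checked:
    in_stock: int
    product_name: str

    def __post_init__(self):
        # Hand-rolled check: compare each value against its annotated type.
        # f.type is the class itself here because annotations aren't strings.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} should be {f.type.__name__}, got {type(value).__name__}"
                )


try:
    Checked(in_stock="150", product_name="Wireless Mouse")
except TypeError as exc:
    message = str(exc)
    print(message)  # in_stock should be int, got str
```

This gets tedious quickly for nested or generic types, which is where a dedicated validation library earns its keep.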