2025-05-11

Why You Should Use Python's Dataclasses Over Dicts or Custom Classes

Introduction

I'm going to show you why and how to use Python's 'dataclass' instead of dictionaries or custom classes for structuring data.

As you progress on your programming journey, it's important to find ways to make your code easier to read, understand and extend. This ultimately makes you more productive and lets you focus on the actual quality of the software you're building.

Using Python's 'dataclass' helps you accomplish these things when working with data in your Python programs.

It's easy to just stick with dicts or custom classes. But this is a mistake: dicts aren't declarative enough (nothing states which keys exist or what types their values have), and custom classes require too much boilerplate code.

I hope it's clear that you're missing out if you don't know how to use dataclasses in Python :P

What you'll learn

Read on to learn the absolute basics of using 'dataclass':

  • How to declare a dataclass
  • How to use calculated fields to save memory
  • How to keep calculated fields out of the constructor
  • How to build mappers for your dataclass directly in the class itself
  • The downsides of dataclass and possible alternatives

Tutorial

For this little tutorial, let's assume we're pulling some raw data out of a JSON file.

To integrate the data into our imaginary application, we need to:

  • Clean it up
  • Map it to a structure our app expects

# reading our raw data from a JSON file

from pprint import pprint
import json

with open("./raw_data.json", "r", encoding="utf-8") as file:
    raw_data = json.load(file)  # json.load parses directly from the file object

pprint(raw_data)
# [{'category': 'Electronics',
#     'description': 'A smooth and precise wireless mouse with ergonomic design.',
#     'id': 1,
#     'name': 'Wireless Mouse',
#     'price': 25.99,
#     'quantity_in_stock': '150'},
#     {'category': 'Accessories',
#     'description': 'Noise-cancelling headphones with long battery life.',
#     'id': 2,
#     'name': 'Bluetooth Headphones',
#     'price': 89.99,
#     'quantity_in_stock': '75'},
#     {'category': 'Accessories',
#     'description': 'Adjustable stand for smartphones and tablets.',
#     'id': 3,
#     'name': 'Smartphone Stand',
#     'price': 12.99,
#     'quantity_in_stock': '200'}]
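If you'd like to follow along without creating raw_data.json, here's a minimal sketch that assumes the file holds exactly the three records printed above. It parses them from an inline string instead; the RAW_JSON name is just for illustration.

```python
import json
from pprint import pprint

# Hypothetical stand-in for the contents of raw_data.json (same three records as above).
RAW_JSON = """
[
  {"id": 1, "name": "Wireless Mouse", "category": "Electronics", "price": 25.99,
   "quantity_in_stock": "150",
   "description": "A smooth and precise wireless mouse with ergonomic design."},
  {"id": 2, "name": "Bluetooth Headphones", "category": "Accessories", "price": 89.99,
   "quantity_in_stock": "75",
   "description": "Noise-cancelling headphones with long battery life."},
  {"id": 3, "name": "Smartphone Stand", "category": "Accessories", "price": 12.99,
   "quantity_in_stock": "200",
   "description": "Adjustable stand for smartphones and tablets."}
]
"""

# json.loads parses a JSON string into Python lists and dicts.
raw_data: list[dict] = json.loads(RAW_JSON)
pprint(raw_data)
```

Note that quantity_in_stock arrives as a string, which is why the examples below convert it with int().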

Bad examples

Before diving into how I'd use a dataclass, let's first look at how people commonly structure data like this when they don't know about dataclass.

Bad example 1:

# clean and map raw_data

from datetime import datetime
now: datetime = datetime.now()

data: list[dict] = [
    {
        "id":                  d['id'],
        "amount":              d['price'],
        "amount_with_tax":     round( d['price'] * 1.19, 2 ),
        "in_stock":            int( d['quantity_in_stock'] ),
        "product_name":        d['name'],
        "product_description": d['description'],
        "category":            d['category'],
        "import_date":         now,
    }
    for d in raw_data
]

pprint(data)
# [{'amount': 25.99,
#     'amount_with_tax': 30.93,
#     'category': 'Electronics',
#     'id': 1,
#     'import_date': datetime.datetime(2025, 5, 11, 11, 37, 38, 243859),
#     'in_stock': 150,
#     'product_description': 'A smooth and precise wireless mouse with ergonomic '
#                             'design.',
#     'product_name': 'Wireless Mouse'},
#     {'amount': 89.99,
#     'amount_with_tax': 107.09,
#     'category': 'Accessories',
#     'id': 2,
#     'import_date': datetime.datetime(2025, 5, 11, 11, 37, 38, 243859),
#     'in_stock': 75,
#     'product_description': 'Noise-cancelling headphones with long battery life.',
#     'product_name': 'Bluetooth Headphones'},
#     {'amount': 12.99,
#     'amount_with_tax': 15.46,
#     'category': 'Accessories',
#     'id': 3,
#     'import_date': datetime.datetime(2025, 5, 11, 11, 37, 38, 243859),
#     'in_stock': 200,
#     'product_description': 'Adjustable stand for smartphones and tablets.',
#     'product_name': 'Smartphone Stand'}]

Here we cleaned and mapped our raw_data into a structure that our application expects.

At least we can do something with our data now.

Problems:

  • "What our application expects" is unclear. A plain dict doesn't declare which keys it has or what types their values are, so a misspelled key or a wrong type slips through unnoticed.
  • It's quick and dirty. OK for prototyping, but not much more.
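To make the first problem concrete: a dict accepts any key you give it, so a typo doesn't fail, it just creates a new entry. A minimal sketch with a hypothetical, trimmed-down record:

```python
# A trimmed-down, hypothetical record in the shape produced above.
record = {"id": 1, "amount": 25.99, "in_stock": 150}

# Typo: "in_stok" instead of "in_stock". Python raises no error;
# it silently adds a new key and leaves the real one untouched.
record["in_stok"] = 140

print(record["in_stock"])  # 150, the intended update never happened
print(sorted(record))      # ['amount', 'id', 'in_stock', 'in_stok']
```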

Bad example 2:

Let's look at another example using a custom class.


class Data:

    id: int
    amount: float
    amount_with_tax: float
    in_stock: int
    product_name: str
    product_description: str
    category: str
    import_date: datetime

    def __init__(self, id: int, amount: float, amount_with_tax: float, in_stock: int, product_name: str, product_description: str, category: str):
        self.id = id
        self.amount = amount
        self.amount_with_tax = amount_with_tax
        self.in_stock = in_stock
        self.product_name = product_name
        self.product_description = product_description
        self.category = category
        self.import_date = now


data: list[Data] = [
    Data(
        id =                  d['id'],
        amount =              d['price'],
        amount_with_tax =     round( d['price'] * 1.19, 2 ),
        in_stock =            int( d['quantity_in_stock'] ),
        product_name =        d['name'],
        product_description = d['description'],
        category =            d['category']
    )
    for d in raw_data
]

pprint(data)
# [<__main__.Data object at 0x77b4099abdc0>,
#  <__main__.Data object at 0x77b408c21ff0>,
#  <__main__.Data object at 0x77b408c22e00>]

Better and more declarative.

At least this part of the class looks good and serves as a clear model for our target data:

class Data:

    id: int
    amount: float
    amount_with_tax: float
    in_stock: int
    product_name: str
    product_description: str
    category: str
    import_date: datetime

Problems:

  • I wrote a lot of boilerplate. Look at the __init__ method.
  • I also declared every type twice: once as class-level annotations and again as __init__ parameters.
  • Printing the objects only shows their memory addresses, not their attribute values. To fix that, I'd have to write even more boilerplate and add a __repr__ method:
    def __repr__(self):
        return (f"Data(id={self.id}, amount={self.amount}, amount_with_tax={self.amount_with_tax}, "
                f"in_stock={self.in_stock}, product_name='{self.product_name}', "
                f"product_description='{self.product_description}', category='{self.category}', "
                f"import_date={self.import_date})")
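There's one more piece of boilerplate missing from the class above: it has no __eq__, so two objects holding identical values don't compare equal (Python falls back to comparing identity). A minimal sketch with a hypothetical two-field class:

```python
class Product:
    """Hypothetical, trimmed-down hand-written class (no __eq__ defined)."""

    def __init__(self, id: int, amount: float):
        self.id = id
        self.amount = amount


a = Product(id=1, amount=25.99)
b = Product(id=1, amount=25.99)

print(a == b)  # False: without __eq__, == compares object identity
print(a == a)  # True: same object
```

Fixing that means writing yet another method by hand.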

Using dataclass

Now let's look at how I'd do things using a dataclass.

from dataclasses import dataclass, field

@dataclass
class Data:

    id: int
    amount: float
    in_stock: int
    product_name: str
    product_description: str
    category: str
    amount_with_tax: float = field(init=False)
    import_date: datetime = field(init=False)

    def __post_init__(self):
        self.amount_with_tax = round( self.amount * 1.19, 2)
        self.import_date = now


data: list[Data] = [
    Data(
        id =                  d['id'],
        amount =              d['price'],
        in_stock =            int( d['quantity_in_stock'] ),
        product_name =        d['name'],
        product_description = d['description'],
        category =            d['category'],
    )
    for d in raw_data
]

pprint(data)
# [Data(id=1,
#       amount=25.99,
#       in_stock=150,
#       product_name='Wireless Mouse',
#       product_description='A smooth and precise wireless mouse with ergonomic '
#                           'design.',
#       category='Electronics',
#       amount_with_tax=30.93,
#       import_date=datetime.datetime(2025, 5, 11, 11, 37, 38, 243859)),
#  Data(id=2,
#       amount=89.99,
#       in_stock=75,
#       product_name='Bluetooth Headphones',
#       product_description='Noise-cancelling headphones with long battery life.',
#       category='Accessories',
#       amount_with_tax=107.09,
#       import_date=datetime.datetime(2025, 5, 11, 11, 37, 38, 243859)),
#  Data(id=3,
#       amount=12.99,
#       in_stock=200,
#       product_name='Smartphone Stand',
#       product_description='Adjustable stand for smartphones and tablets.',
#       category='Accessories',
#       amount_with_tax=15.46,
#       import_date=datetime.datetime(2025, 5, 11, 11, 37, 38, 243859))]

OK, this was easy. All I did was add the @dataclass decorator to my class.

__init__ and __repr__ (and __eq__, for comparing instances) are generated automatically.

It's declarative and I don't have to write a lot of boilerplate.

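I use the __post_init__ method, which the generated __init__ calls right after setting the regular fields, to calculate the derived fields. Declaring those fields with field(init=False) from the dataclasses module keeps them out of the __init__ parameters, so callers don't pass them in.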
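Here's a minimal sketch of what field(init=False) and __post_init__ do, using a hypothetical trimmed-down class (Item) with the same 19% tax calculation. It also shows that @dataclass generates __eq__, which compares instances field by field:

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical, trimmed-down version of the Data class above."""

    id: int
    amount: float
    amount_with_tax: float = field(init=False)  # not a parameter of __init__

    def __post_init__(self) -> None:
        # Called by the generated __init__ after id and amount are set.
        self.amount_with_tax = round(self.amount * 1.19, 2)


item = Item(id=1, amount=25.99)
print(item)  # Item(id=1, amount=25.99, amount_with_tax=30.93)

# Passing an init=False field is rejected by the generated __init__.
try:
    Item(id=1, amount=25.99, amount_with_tax=0.0)
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True

# The generated __eq__ compares all fields, including amount_with_tax.
print(item == Item(id=1, amount=25.99))  # True
```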

Making it even better

Let's improve things even further.

from dataclasses import dataclass, field

@dataclass
class Data:

    id: int
    amount: float
    in_stock: int
    product_name: str
    product_description: str
    category: str
    import_date: datetime = field(init=False)

    def __post_init__(self):
        self.import_date = now

    @property
    def amount_with_tax(self) -> float:
        return round( self.amount * 1.19, 2)

    @classmethod
    def map_from_raw_data(cls, data: dict) -> 'Data':
        return cls(
            id = data['id'],
            amount = data['price'],
            in_stock = int(data['quantity_in_stock']),
            product_name = data['name'],
            product_description = data['description'],
            category = data['category'],
        )


data: list[Data] = [ Data.map_from_raw_data(d) for d in raw_data ]

pprint(data)
# [Data(id=1,
#       amount=25.99,
#       in_stock=150,
#       product_name='Wireless Mouse',
#       product_description='A smooth and precise wireless mouse with ergonomic '
#                           'design.',
#       category='Electronics',
#       import_date=datetime.datetime(2025, 5, 11, 11, 37, 38, 243859)),
#  Data(id=2,
#       amount=89.99,
#       in_stock=75,
#       product_name='Bluetooth Headphones',
#       product_description='Noise-cancelling headphones with long battery life.',
#       category='Accessories',
#       import_date=datetime.datetime(2025, 5, 11, 11, 37, 38, 243859)),
#  Data(id=3,
#       amount=12.99,
#       in_stock=200,
#       product_name='Smartphone Stand',
#       product_description='Adjustable stand for smartphones and tablets.',
#       category='Accessories',
#       import_date=datetime.datetime(2025, 5, 11, 11, 37, 38, 243859))]

Breakdown

Let's break down what I did here.

  • I didn't like having amount_with_tax stored in the data structure. Instead, I declared it as a @property, so it's calculated at runtime each time it's accessed rather than stored as a field. I can still read it like an attribute, e.g. data[0].amount_with_tax. The nice thing is that the result isn't stored on every instance, and it can't get out of sync with amount.

  • I added a @classmethod that serves as a mapper. Inside it, I clean and map the raw data, e.g. converting quantity_in_stock from a string to an int.
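A minimal sketch of the difference between a stored field and a @property, again with a hypothetical trimmed-down class that also has a small mapper. The property doesn't appear in dataclasses.fields(), so it's left out of the generated __init__, __repr__ and __eq__. Because it's computed on every access, it follows changes to amount; the trade-off is that the calculation runs each time you read it.

```python
from dataclasses import dataclass, fields


@dataclass
class Item:
    """Hypothetical, trimmed-down version of the improved Data class."""

    id: int
    amount: float

    @property
    def amount_with_tax(self) -> float:
        # Computed on each access; nothing is stored on the instance.
        return round(self.amount * 1.19, 2)

    @classmethod
    def map_from_raw_data(cls, data: dict) -> "Item":
        # Same idea as Data.map_from_raw_data: rename "price" to "amount".
        return cls(id=data["id"], amount=data["price"])


item = Item.map_from_raw_data({"id": 1, "price": 25.99})
print([f.name for f in fields(item)])  # ['id', 'amount']
print(item.amount_with_tax)            # 30.93

item.amount = 12.99
print(item.amount_with_tax)            # 15.46, recalculated from the new amount
```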

The beautiful thing about this setup is that everything related to my Data class is now declared inside the class itself.

  • Fields, types and calculated properties
  • Cleaning and mapping logic

It also doesn't require much boilerplate.

And since everything is in one class, it becomes easy to identify what's going on in the app and how to make changes.

Another reason I like dataclass is that it's part of Python's standard library (since Python 3.7). There's no need to install an external package to work with it.

Alternative to dataclass

However, dataclass has its limits: it doesn't validate anything at runtime, so the type annotations are only hints. If you need a full-fledged library that actually checks your types (and does a lot more), you may want to have a look at "Pydantic".
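To see that limit in action: a dataclass happily accepts values of the wrong type. If you only need a couple of checks and want to stay within the standard library, one option is to validate by hand in __post_init__. A minimal sketch with hypothetical classes (Unchecked and Checked); this is a workaround, not a replacement for what Pydantic offers:

```python
from dataclasses import dataclass


@dataclass
class Unchecked:
    """Hypothetical class: annotations only, no validation."""

    id: int
    amount: float


@dataclass
class Checked:
    """Hypothetical class that validates its fields by hand."""

    id: int
    amount: float

    def __post_init__(self) -> None:
        # Manual checks; @dataclass itself never enforces the annotations.
        if not isinstance(self.id, int):
            raise TypeError(f"id must be int, got {type(self.id).__name__}")
        if not isinstance(self.amount, (int, float)):
            raise TypeError(f"amount must be a number, got {type(self.amount).__name__}")


bad = Unchecked(id="oops", amount="25.99")  # accepted without complaint
print(bad)  # Unchecked(id='oops', amount='25.99')

try:
    Checked(id="oops", amount=25.99)
    rejected = False
except TypeError as exc:
    rejected = True
    print(exc)  # id must be int, got str
```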

Resources: