
An Introduction to Python's Dataclass Decorator

5 mins

A blueprint and finished building, illustrating the concept of Python’s data classes and their design considerations.

What is a Dataclass? #

A dataclass in Python is a class decorated with @dataclass, a decorator that automatically generates special methods for classes that primarily store data. It was introduced in Python 3.7 via the dataclasses module.

When you define a class with the @dataclass decorator, Python automatically adds methods like __init__, __repr__, __eq__, and others based on the class attributes you define. These are commonly needed methods for data-holding classes, and the dataclass decorator helps streamline their creation.

  • The __init__ method initializes the class attributes
  • The __repr__ method provides a string representation of the class instance
  • The __eq__ method allows for comparison between instances of the class

When to Use Dataclasses #

Dataclasses are a quick way to create classes that are primarily used to store data, saving you time from writing repetitive boilerplate code.

Typical uses of dataclasses include:

  • DTOs (Data Transfer Objects) - such as a class representing a row in a database table
  • Configuration objects - a class that holds configuration settings for an application
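As a minimal sketch of the configuration-object use case, the example below defines a small settings class. The AppConfig name and its fields are hypothetical, chosen only for illustration; asdict is a helper from the dataclasses module that converts an instance to a plain dictionary.

```python
from dataclasses import dataclass, asdict


@dataclass
class AppConfig:
    # Hypothetical settings, for illustration only
    host: str
    port: int
    debug: bool = False


config = AppConfig(host="localhost", port=8080)
print(config)          # AppConfig(host='localhost', port=8080, debug=False)
print(asdict(config))  # {'host': 'localhost', 'port': 8080, 'debug': False}
```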

What about NamedTuples? #

If you have a simple use case where you just need an immutable data structure, consider using NamedTuple from the typing module instead of a dataclass. See the article Named Tuples - The Best Data Structure You’re Not Using for more information.

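Both support type annotations and default values. What dataclasses add is mutability by default, per-field control through the field function (for example default_factory and compare), and a __post_init__ hook for validation.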
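As a rough comparison, the sketch below defines the same two-field structure both ways. The PointTuple and PointData names are hypothetical. It shows that a NamedTuple instance cannot be modified and behaves like a tuple, while a dataclass instance is mutable by default and only compares equal to instances of its own class.

```python
from dataclasses import dataclass
from typing import NamedTuple


class PointTuple(NamedTuple):
    x: int
    y: int = 0


@dataclass
class PointData:
    x: int
    y: int = 0


point_t = PointTuple(1)
point_d = PointData(1)

point_d.y = 5  # Allowed: dataclass instances are mutable by default

try:
    point_t.y = 5  # Not allowed: NamedTuple instances are immutable
except AttributeError as exc:
    print("NamedTuple is immutable:", exc)

x, y = point_t                   # A NamedTuple unpacks like a tuple
print(point_t == (1, 0))         # True: it is a tuple
print(PointData(1) == (1, 0))    # False: a dataclass is not a tuple
```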

Regular Class vs Dataclass #

To illustrate the difference between a regular class and a dataclass, let’s compare the two approaches for defining a simple Person class.

Regular Class #

class Person:
    def __init__(self, name: str, age: int, id: int):
        self.name = name
        self.age = age
        self.id = id

    def __repr__(self):
        # !r uses repr() for each value, matching the dataclass-generated output
        return f"Person(name={self.name!r}, age={self.age!r}, id={self.id!r})"

    def __eq__(self, other):
        if not isinstance(other, Person):
            return NotImplemented
        return (self.name == other.name and self.age == other.age
                and self.id == other.id)  

Dataclass #

There is a lot of boilerplate code in the regular class example above. We can achieve the same functionality with a dataclass.

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    id: int

Each field in the dataclass needs a type annotation. An attribute without an annotation is not treated as a field.

This is much less code. The type annotations define the fields, so the __init__, __repr__, and __eq__ methods can be generated from them.

The usage is the same for both implementations.

person1 = Person("John", 30, 123456789)
person2 = Person("Jane", 25, 987654321)
person3 = Person("John", 30, 123456789)

print(person1)
print(person1 == person2)
print(person1 == person3)

Output:

Person(name='John', age=30, id=123456789)
False
True

Without any parameters, the @dataclass decorator generates the default methods as described above. However, more features can be enabled or disabled by passing parameters to the decorator.
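As one illustration of such a parameter, the sketch below passes frozen=True, which makes instances read-only after creation. The Point class is a hypothetical example; FrozenInstanceError is the exception the dataclasses module raises on an attempted assignment.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)

try:
    p.x = 10  # Assignment to a field of a frozen dataclass is rejected
except FrozenInstanceError as exc:
    print("Cannot modify a frozen instance:", exc)

# With frozen=True (and the default eq=True), a __hash__ method is also generated,
# so equal instances hash equally and can be used as dictionary keys.
print(hash(p) == hash(Point(1, 2)))  # True
```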

Order methods #

By default, dataclasses do not generate ordering methods (__lt__, __le__, __gt__, __ge__). To enable these methods, you can set the order parameter to True.

@dataclass(order=True)

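Care must be taken when using the order parameter, as all comparable fields are used to generate the ordering methods. Instances are compared as if they were tuples of their fields, in the order the fields are defined, so here name is compared first, then age, then id.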

For example:

print("person1 < person2", person1 < person2)
print("person1 <= person2", person1 <= person2)
print("person1 <= person3", person1 <= person3)

Results in

person1 < person2 False
person1 <= person2 False
person1 <= person3 True
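Because the comparison works field by field, sorting a list of instances orders them by the first field and uses later fields only to break ties. The following is a small sketch of that behavior, using made-up sample data:

```python
from dataclasses import dataclass


@dataclass(order=True)
class Person:
    name: str
    age: int
    id: int


people = [
    Person("John", 30, 123456789),
    Person("Jane", 40, 555555555),
    Person("Jane", 25, 987654321),
]

# Sorted by name first; the two Janes are then ordered by age
for person in sorted(people):
    print(person)
# Person(name='Jane', age=25, id=987654321)
# Person(name='Jane', age=40, id=555555555)
# Person(name='John', age=30, id=123456789)
```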

Set what fields are used for ordering #

If you do not want all fields to be used for ordering, you can use the field function with the compare parameter set to False for specific fields.

from dataclasses import dataclass, field

@dataclass(order=True)
class Person:
    name: str = field(compare=False)
    age: int = field(compare=False)
    id: int

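With this change, only the id field is used for comparisons. Note that compare=False also excludes a field from the generated __eq__ method, so equality is decided by id alone as well. The ordering comparisons now result in: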

person1 < person2 True
person1 <= person2 True
person1 <= person3 True
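The effect on equality is easy to overlook. As a sketch with made-up values, two instances that differ in name and age but share an id are treated as equal, because name and age no longer take part in any comparison:

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Person:
    name: str = field(compare=False)
    age: int = field(compare=False)
    id: int


a = Person("John", 30, 1)
b = Person("Johnny", 31, 1)

print(a == b)  # True: only id is compared
print(a < b)   # False: the ids are equal
print(a <= b)  # True
```

If equality should still consider every field while ordering uses only one, a hand-written __lt__ or a sort key such as sorted(people, key=lambda p: p.id) may be a better fit than order=True.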

Default Values #

Simple Default Values #

Basic data types like int, str, and float can have default values assigned directly in the class definition.

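Fields with default values must come after fields without defaults, because the fields become the parameters of the generated __init__ in the same order.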

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    id: int
    age: int = 18  # Default age is 18


person1 = Person("John", 123456789)
print(person1)  # Output: Person(name='John', id=123456789, age=18)
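To see what happens when the ordering rule is broken, the sketch below places a field without a default after one with a default. The error is raised when the class is defined, not when an instance is created. The exact wording of the message varies between Python versions, so it is printed rather than assumed here.

```python
from dataclasses import dataclass

error = None

try:
    @dataclass
    class Person:
        name: str
        age: int = 18  # Field with a default
        id: int        # Field without a default after it: not allowed
except TypeError as exc:
    error = exc
    print("Class definition failed:", exc)
```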

Lists as Default Values #

Data types like list, dict, and set are mutable, so care must be taken when assigning default values to fields of these types.

The following code will lead to a ValueError.

@dataclass(order=True)
class Person:
    name: str
    courses: list = ["intro"]

Outputs

ValueError: mutable default <class 'list'> for field courses is not allowed: use default_factory

Instead, use the default_factory parameter of the field function. Do not forget to use the compare=False parameter if the field is not to be used for ordering.

from dataclasses import dataclass, field

@dataclass(order=True)
class Person:
    name: str = field(compare=False)
    id: int
    age: int = field(default=18, compare=False)
    courses: list = field(default_factory=lambda: ["intro"], compare=False)

person = Person("John", 123456789)
print(person)

Outputs

Person(name='John', id=123456789, age=18, courses=['intro'])
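The reason for default_factory is that the factory is called once per instance, so every object gets its own list instead of sharing a single one. A minimal sketch, using a hypothetical Student class:

```python
from dataclasses import dataclass, field


@dataclass
class Student:
    name: str
    # The lambda runs each time an instance is created without a courses argument
    courses: list = field(default_factory=lambda: ["intro"])


a = Student("John")
b = Student("Jane")

a.courses.append("algebra")

print(a.courses)  # ['intro', 'algebra']
print(b.courses)  # ['intro']
```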

Custom __init__ #

You may need to add validation logic that runs when an instance is created. Do not define the __init__ method yourself: a hand-written __init__ takes the place of the dataclass-generated one, so you would have to assign every field manually. Instead, define the __post_init__ method, which the generated __init__ calls automatically after the fields have been set.

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    id: int
    age: int = 21

    def __post_init__(self):
        if self.age < 21:
            raise ValueError("Age cannot be less than 21")
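A short usage sketch of the same class, with made-up values, shows the validation running both for an explicit argument and for the default:

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    id: int
    age: int = 21

    def __post_init__(self):
        # Runs at the end of the generated __init__, once all fields are set
        if self.age < 21:
            raise ValueError("Age cannot be less than 21")


print(Person("John", 123456789))  # Person(name='John', id=123456789, age=21)

try:
    Person("Jane", 987654321, age=18)
except ValueError as exc:
    print("Rejected:", exc)  # Rejected: Age cannot be less than 21
```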

Summary #

The @dataclass decorator is a useful tool for simplifying the creation of classes that primarily store data. It automatically generates common methods (e.g., __init__), and can be customized with various parameters for ordering, default values, and more.

However, consider whether you need the features that dataclasses provide, as a NamedTuple may be a better fit for simple immutable data structures.