Use `TypedDict`!
If there’s one thing you take away from this TIL, let it be this: there’s almost no reason not to use `TypedDict` in Python. It ships with the standard `typing` module from Python 3.8 onward (and yes, you should be on 3.8 or newer; I’d highly recommend 3.10 for new projects in 2024). It can be as simple as:
```python
from typing import TypedDict

class ConfigDict(TypedDict):
    host: str
    port: int
    user: str

config: ConfigDict = {"host": "localhost", "port": 5432, "user": "dev"}
```
But you may say:
Why not use `dataclass` instead? Or even better, `pydantic`, so that we can parse the dictionary? Oh, there are also packages called `dynaconf` and `omegaconf` that can handle cascading config!
Yup, you’re probably right, and I get the excitement, but hear me out. I’m writing this for people who still use a lot of dictionaries in their Python code. That’s not wrong: when we put together a quick PoC, we often don’t know the shape of our data yet. But once the project grows out of that phase, we want our code to explain itself, or we need to work with other developers. Adding `TypedDict` is the first step I’d suggest when refactoring that code, and a good place to start investing your time.
What we Gain? Auto Suggestion!
This alone is reason enough to use `TypedDict` in your code, especially if you’re on VSCode with the Pylance extension. Just by adding a type hint to your dictionary object, you get autocompletion for free: your editor suggests the `dict`’s keys! And you know what’s neater? Because each key has a declared type, your editor can also suggest the methods that are valid for that value. Can you imagine what a lifesaver this is when my brain can’t remember whether the method to capitalize a string is spelled `capitalise` or `capitalize`?
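To make this concrete, here’s a small sketch of a `TypedDict` flowing into a function signature (`connection_label` is a hypothetical function made up for illustration). Inside the function, Pylance/Pyright knows `config["host"]` is a `str`, so it can complete both the key names and the string methods. The commented-out lines show the kind of mistakes a type checker would flag; none of this is checked at runtime.

```python
from typing import TypedDict


class ConfigDict(TypedDict):
    host: str
    port: int
    user: str


def connection_label(config: ConfigDict) -> str:
    # The editor completes "host", "port", "user" inside the brackets,
    # and offers str methods (upper, capitalize, ...) on config["host"].
    return f"{config['user']}@{config['host'].upper()}:{config['port']}"


config: ConfigDict = {"host": "localhost", "port": 5432, "user": "dev"}
print(connection_label(config))  # dev@LOCALHOST:5432

# Mistakes a type checker such as Pyright or mypy would report:
# config["hots"]           # unknown key "hots"
# config["port"] = "5432"  # str is not assignable to int
```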
What we Don’t Get? Another Object!
While `dataclass` and Pydantic’s `BaseModel` have a whole lot of features that make our code read better and validate our data, and OmegaConf is one of my go-tos for parsing configuration files, all of them need us to pass our data into their class constructor one way or another. The resulting object may not be fully compatible with how we use our `dict`. Even something as simple as getting an item works differently, so we may need to change our code significantly.
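```python
from dataclasses import dataclass


@dataclass
class ConfigData:
    host: str
    port: int
    user: str


config = {"host": "localhost", "port": 5432, "user": "dev"}
config_data = ConfigData(**config)

config["host"]                # OK
config_data["host"]           # TypeError: 'ConfigData' object is not subscriptable
config_data.__dict__["host"]  # OK, though config_data.host is the usual spelling
```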
Whether the switch is worth it is another topic, but it’s still important to be aware that the alternatives may introduce bugs or hours of extra refactoring.
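On the flip side, here’s why `TypedDict` avoids that migration cost: it only exists for the type checker. A minimal sketch, redefining the `ConfigDict` from earlier so the snippet stands alone: the value is still a plain `dict` at runtime, so existing dictionary code keeps working. The trade-off is that nothing is validated at runtime, which is exactly where something like Pydantic earns its keep.

```python
from typing import TypedDict


class ConfigDict(TypedDict):
    host: str
    port: int
    user: str


config: ConfigDict = {"host": "localhost", "port": 5432, "user": "dev"}

# A TypedDict value is an ordinary dict at runtime...
print(type(config))  # <class 'dict'>
# ...so subscripting, .get(), iteration and ** unpacking all work as before.
print(config["host"], config.get("port"), sorted(config))

# Calling the class also just builds a plain dict, with no validation:
# a type checker flags port="not-an-int", but Python happily runs it.
unchecked = ConfigDict(host="localhost", port="not-an-int", user="dev")
print(type(unchecked), unchecked["port"])
```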
Any Other Tips?
Oh? Glad you asked :) Here’s a quick round; I may write separate TILs on these with more detail.
Use `Literal` for Choices in a Public Interface
Sometimes we want to expose a function whose argument takes one of a fixed set of categories. Using `bool` flags can crowd our parameter list, using `str` doesn’t tell our users which strings are valid, and using an `enum` means our users need to import the related enum class. We can use `Literal` in this case.
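A minimal sketch of the idea; `load_split` and its `split` choices are hypothetical names made up for illustration. The editor suggests the three allowed strings at the call site, and a type checker flags anything else. Since `Literal` isn’t enforced at runtime, the sketch also shows an optional guard built with `typing.get_args`.

```python
from typing import Literal, get_args

# Hypothetical set of choices for a public function argument.
Split = Literal["train", "val", "test"]


def load_split(path: str, split: Split = "train") -> str:
    # Literal is only checked statically; this optional guard
    # rejects bad values from callers who skip the type checker.
    if split not in get_args(Split):
        raise ValueError(f"split must be one of {get_args(Split)}, got {split!r}")
    return f"loading {split} split from {path}"


print(load_split("data/", split="val"))  # loading val split from data/
# load_split("data/", split="dev")  # flagged by a type checker; raises ValueError at runtime
```

Callers just pass a plain string, so they get the discoverability of an `enum` without an extra import.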
`dict` is Somewhat Similar to `set`
Ever wanted to combine two `dict`s?
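```python
first_dict = {"a": 1}
second_dict = {"b": 2}
```
There are several ways to do it:
```python
# dict.update() mutates in place and returns None, so copy first
new_dict_1 = first_dict.copy()
new_dict_1.update(second_dict)
# Unpacking into a new dict literal (Python 3.5+)
new_dict_2 = {**first_dict, **second_dict}
# The merge operator (Python 3.9+)
new_dict_3 = first_dict | second_dict
```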
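Where does the `set` similarity come in? Here’s a small sketch using only built-in behaviour (Python 3.9+ for the `|` and `|=` operators on `dict`): `|` merges like a set union, with the right-hand value winning when both sides share a key, and `dict.keys()` views support set operations directly.

```python
first_dict = {"a": 1, "shared": "first"}
second_dict = {"b": 2, "shared": "second"}

# Union-like merge: on a key clash, the right-hand operand wins.
merged = first_dict | second_dict
print(merged)  # {'a': 1, 'shared': 'second', 'b': 2}

# In-place variant, like set |=; it mutates `combined` (a copy),
# so first_dict itself is left untouched.
combined = dict(first_dict)
combined |= second_dict

# keys() views behave like sets: intersection and difference.
print(first_dict.keys() & second_dict.keys())  # {'shared'}
print(first_dict.keys() - second_dict.keys())  # {'a'}
```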
Want Type Hints in `pandas`?
Tough luck then: at the time of writing, `pandera` has only just started offering experimental support for `mypy`, while VSCode (through Pylance) uses `pyright`. I really hope this gets better over time. For now, what we can do is make good use of `assign`. Instead of having our function take a `pd.DataFrame` and losing all information about the column names in our editor, we can write it as a function that takes multiple `pd.Series` as its parameters.
```python
import pandas as pd

def sum_two_columns_df(df: pd.DataFrame) -> pd.Series:
    return df["a"] + df["b"]

def sum_two_columns_series(a: pd.Series, b: pd.Series) -> pd.Series:
    """Sum two pandas Series. We don't need to know the name of the columns."""
    return a + b

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
res = sum_two_columns_df(df)  # OK
# Better, since in this scope we know df contains columns "a" and "b"
res = sum_two_columns_series(df["a"], df["b"])
```
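Here’s one way the Series-based function can be combined with `assign`, assuming the `assign` mentioned above refers to `pandas.DataFrame.assign`. Treat it as a sketch of that reading rather than a fixed recipe: the column names stay visible at the call site, while the function itself only sees typed `pd.Series`.

```python
import pandas as pd


def sum_two_columns_series(a: pd.Series, b: pd.Series) -> pd.Series:
    """Sum two pandas Series. We don't need to know the name of the columns."""
    return a + b


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# assign() returns a new DataFrame with the extra column; df is untouched.
df_with_c = df.assign(c=sum_two_columns_series(df["a"], df["b"]))

# A callable also works: it receives the DataFrame being assigned to,
# which is handy when chaining several assign() calls.
df_with_d = df_with_c.assign(
    d=lambda frame: sum_two_columns_series(frame["a"], frame["c"])
)

print(df_with_d)
```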
Use `Final` to Hold that Constant
Already following PEP 8 and using `CAPITAL_CASE` for that constant? Or do you have a variable you don’t want to reassign? Help your editor remind you by annotating it with `Final`:
```python
from typing import Final

SOME_GLOBAL_CONSTANT: Final[int] = 42

def do_something(a_number: int) -> int:
    final_number: Final[int] = a_number + SOME_GLOBAL_CONSTANT
    ...
    return final_number
```
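One caveat worth knowing: `Final` is a hint for the type checker, not a runtime lock. In this sketch (`MAX_RETRIES` is just an example name), Pyright or mypy reports the reassignment, but Python itself runs it without complaint.

```python
from typing import Final

MAX_RETRIES: Final[int] = 3

# Pyright and mypy report an error on the next line (reassigning a Final name)...
MAX_RETRIES = 5

# ...but at runtime nothing stops it.
print(MAX_RETRIES)  # 5
```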
Want More?
Wow, I can’t believe you reached this spot in my TIL. If you want to discuss more things about Python, data, or Machine Learning, hit me up on Twitter (or X, actually; I still haven’t wrapped my head around it) and either DM me your question or discuss this TIL in the thread!