Define your constants as Enums
...because it's faster, safer and overall neater.
Today, we talk about Enums. More specifically about their uses and why they are useful also in the context of data science projects.
So... let's get started!
What are Enums?
Enums, or enumerations, are a very useful data type in Python that allow you to define a set of related values that can be used throughout your code. They are useful in a variety of situations, and can make your code more readable, maintainable, and efficient.
General benefits of using Enums in Python
Enums improve the readability of your code
One of the main benefits of using enums is that they can make your code more readable, especially if you use them to define constants that are used throughout your code. For example, consider the following code:
# Without enums
MAX_RETRIES = 5
def retry_operation(operation, max_retries=MAX_RETRIES):
for i in range(max_retries):
try:
return operation()
except Exception:
pass
raise Exception("Failed after {} retries".format(max_retries))
# With enums
from enum import Enum
class RetryPolicy(Enum):
MAX_RETRIES = 5
def retry_operation(operation, policy=RetryPolicy.MAX_RETRIES):
for i in range(policy.value):
try:
return operation()
except Exception:
pass
raise Exception("Failed after {} retries".format(policy.value))
In the first example, the constant MAX_RETRIES
is defined at the top of the file, and its value is used in the retry_operation
function. This can be confusing to readers of the code, as the value of MAX_RETRIES
is not immediately apparent. In contrast, the second example defines an Enum
called RetryPolicy
, which includes a value called MAX_RETRIES
. This value is used in the retry_operation
function, making the code much more readable and intuitive.
Enums improve the maintainability of your code
Enums can also make your code more maintainable, as they allow you to define a set of related values in one place, rather than scattering them throughout your code. This can be especially helpful if you need to change the values of these constants in the future. For example, consider the following code:
In the first example, the constants SUCCESS
and ERROR
are defined at the top of the file and used in the process_request
function. If you need to change the values of these constants in the future, you will need to update every instance of these constants throughout your code. In contrast, the second example defines an Enum
called RequestStatus
Enums improve type safety of your code
Enums can also improve the type safety of your code, as they allow you to define a fixed set of values that a variable can take on. This can be especially helpful in cases where you need to ensure that a variable is assigned a specific value, or a set of values. For example, consider the following code:
In the first example, the constants FRUIT_APPLE
, FRUIT_ORANGE
, and FRUIT_BANANA
are defined at the top of the file and used in the get_fruit_price
function. This can be problematic, as there is nothing to prevent someone from passing an invalid value to the fruit
argument. In contrast, the second example defines an Enum
called Fruit
, which includes values for APPLE
, ORANGE
, and BANANA
. By annotating the fruit
argument with the Fruit
type, we can ensure that only valid values are passed to the function.
Enums improve performance (in some cases)
Enums can also offer improved performance compared to other data types, such as strings or integers. This is because enums are stored as integers under the hood, and are much faster to compare than strings or other objects. For example, consider the following code:
In the first example, the compare_strings
function compares two strings using the ==
operator. This can be slow, especially if the strings are long or complex. In contrast, the second example defines an Enum
called StringEnum
, which includes values for S1
and S2
. By annotating the arguments with the StringEnum
type, we can ensure that only valid values are passed to the function. Additionally, because enums are stored as integers under the hood, the comparison is much faster than if we were comparing two strings.
In conclusion, enums are a very useful data type in Python that can improve the readability, maintainability, type safety, and performance of your code. They allow you to define a set of related values that can be used throughout your code, and are especially useful when you need to define constants or a fixed set of values. By using enums in your code, you can make your code more intuitive, maintainable, and efficient, and ensure that your variables always have the correct values.
Is there any use of Enums in data science projects?
Enums are also very useful in the context of data science, because they allow you to define a fixed set of values that can be used throughout your code. This can be especially helpful in cases where you need to ensure that a variable is assigned a specific value, or a set of values.
For example, consider the code below that processes a dataset of customer data. The dataset includes a column called customer_type
, which can take on one of three values: "standard"
, "premium"
, or "platinum"
. Without using enums, you might define these values as constants at the top of your script, like this:
This approach works, but it can be confusing to readers of the code, as the values of the constants are not immediately apparent, especially if you string along a lot of string descriptors. The becomes even more problematic, the more constants define within different locations of your code. Additionally, as already hinted at above, if you need to change the values of these constants in the future, you will need to update every instance of these constants throughout your code.
A better approach is to use enums to define these values. For example:
This approach is much more readable and intuitive, as the values of the CustomerType
enum are clearly defined in one place. Additionally, if you need to change the values of these constants in the future, you only need to update the CustomerType
enum, rather than searching for and updating every instance of the constants throughout your code.
Another advantage of using Enums in data science projects is related to the already previously discussed performance improvements.
Summing up
Enums are a very useful data type in the context of data science scripts, as they allow you to define a fixed set of values that can be used throughout your code. They can improve the readability, maintainability, and performance of your scripts, and ensure that your variables always have the correct values. By using enums in your data science scripts, you can make your code more intuitive, maintainable, and efficient, and help ensure the quality and accuracy of your results.