OOPs for Data Scientists

16 min readNov 25, 2023

There are 4 principles of OOPs that everyone is expected to know. I used to be one of those people who thought that it was enough to know how to read and write OOPs code without actually knowing what those principles were as I thought they were more of a software engineering concept. Turns out, sometimes that’s not quite enough.

Some Data Science and Machine Learning interviews will also test your OOPs knowledge. This post is an attempt to prepare you for those interviews. After all, failing to prepare is preparing to fail, right? (ᵔᴥᵔ)

Along the way we will also look at a couple of tricks and tips to help you code more easily. For now, we’ll start with an import:

import random

First things first, the 4 principles of OOPs are:

Abstraction
Encapsulation
Inheritance
Polymorphism

If you need a mnemonic to remember them:
AEIP = Astronauts Explore Invisible Planets.

Classes

Let’s make sure we have the basics of a class down. A class is a blueprint. The best example I could come up with for classes as a blueprint was to illustrate with a pokemon class:

class Pokemon:
    pass

This class is pretty sad and does nothing. Let’s rectify that. If you’ve ever played any pokemon based games, you know that each pokemon has its own Attack, Special Attack, Defense and Special Defense. Let’s add those to our class.

class Pokemon:
    def __init__(self):
        self.attack = 0
        self.special_attack = 0
        self.defense = 0
        self.special_defense = 0

pikachu = Pokemon()

Throughout this post, we will refer to instances and objects and they mean one and the same thing, an instance / object of a class. “pikachu” is an instance or object of the Pokemon class.

Know thy “self” 😉

Something that not everyone knows about the “self” keyword is that it doesn’t have to be self at all. It can be anything, as long as we are consistent about it:

class Pokemon_not_self:
    def __init__(not_self):
        not_self.attack = 10
        not_self.special_attack = 10
        not_self.defense = 10
        not_self.special_defense = 10

Example using “not_self” instead of “self” in Python classes

Though it is probably best to use the “self” keyword, as it is the convention and might confuse other people who are reading your code. Code is read more than it is written, so better follow the convention that helps readability. Let’s make the Pokemon class a little more fun by randomly allowing created pokemons (pokemon objects) to be shiny. Then we’ll jump into Inheritance.

class Pokemon:
    type = 'normal'
    def __init__(self):
        self.attack = random.randint(0, 100)
        self.special_attack = random.randint(0, 100)
        self.defense = random.randint(0, 100)
        self.special_defense = random.randint(0, 100)
        self.hp = 100
        self.shiny = random.choices([True, False], weights=[0.0001, 0.9999], k=1)[0]

Inheritance

Inheritance is the easiest to explain and understand, so let’s start there. Let’s say we want to create a class for each type (fire, water, grass, etc.) of pokemon. We can do that by inheriting from the previous Pokemon class:

class PokemonFireType(Pokemon):
    type = 'fire'

So, let’s say I create a Ratata, a normal type pokemon and Charmander, a fire type pokemon using the appropriate classes:

Charmander, a fire type Pokemon, where the PokemonFireType Class inherits from the Pokemon class

Even though we didn’t define all the attributes for the PokemonFireType class, i.e. we didn’t state that an instance of the PokemonTypeFire class would have attack, defense, .., shiny attributes, we see that it still does, because the PokemonTypeFire class inherited from the Pokemon class.

Looping through instance attributes

Another little nuance, you can access the attribute values of an instance like so:

class MyClass:
    def __init__(self):
        self.attribute1 = 1
        self.attribute2 = 2
        self.attribute3 = 3

object = MyClass()

for property in vars(object):
    print(property, ':', vars(object)[property])

Output of running the above code block — looping through instance attributes

I already feel more like a Software Engineer. Let’s move on to the next principle.

Polymorphism

Poly = many, morph = forms. Polymorphism is the ability of an object to take on many forms. Say we have the following scenario (changing the class names a bit for consistency):

class PokemonNormalType:
    type = 'normal'
    def __init__(self):
        self.attack = random.randint(0, 100)
        self.special_attack = random.randint(0, 100)
        self.defense = random.randint(0, 100)
        self.special_defense = random.randint(0, 100)
        self.hp = 100
        self.shiny = random.choices([True, False], weights=[0.0001, 0.9999], k=1)[0]
        self.default_attack = "tackle"

class PokemonFireType(PokemonNormalType):
    type = 'fire'
    def __init__(self):
        self.default_attack = "ember"

class PokemonWaterType(PokemonNormalType):
    type = 'water'
    def __init__(self):
        self.default_attack = "water-gun"

class PokemonGrassType(PokemonNormalType):
    type = 'grass'
    def __init__(self):
        self.default_attack = "razor-leaf"

Given the above class definitions, we can retrieve their types and default attacks with a single function:

ratata = PokemonNormalType()
charmander = PokemonFireType()
squirtle = PokemonWaterType()
bulbasaur = PokemonGrassType()
pokemons = [ratata, charmander, squirtle, bulbasaur]

for pokemon in pokemons:
    print(f"Current pokemon type is {pokemon.type} and default attack is {pokemon.default_attack}")

The same object attribute / class property and even functions can take on different forms depending on the class that is calling it. This is polymorphism. Let’s see a different example:

class PokemonBasic:
    def __init__(self):
        self.height = random.randint(50, 200) # in cm
        self.weight = random.randint(10, 100) # in kgs
        self.hp = 100

class DigimonBasic:
    def __init__(self):
        self.height = random.randint(250, 350) # in cm
        self.weight = random.randint(200, 300) # in kgs
        self.hp = 99

pikachu = PokemonBasic()
agumon = DigimonBasic()
all_the_mons = [pikachu, agumon]
for mon in all_the_mons:
    print(f"Height is {mon.height} cm and weight is {mon.weight} kgs")

Output of the code block above, demonstrating the polymorphism of the height and weight methods

There are two kinds of polymorphism when it comes to class methods:

Method Overloading
Method Overriding

Method Overriding

Method Overriding is similar to what we already did before with instance attributes and class properties, just overriding methods instead. Override is the same as an overwrite, we write over the previously defined method with a newly defined method. Let’s look at an example:

class PokemonNormalType:
    def __init__(self):
        self.hp = 100
    def speak(self, sound):
        print(f"{sound[:4]}-{sound[4:]}!") # rata-ta!

class PokemonFireType(PokemonNormalType):
    def speak(self, sound):
        print(f"{sound[:4]}-{sound[:4]}!") # char-char!

class PokemonElectricType(PokemonNormalType):
    def speak(self, sound):
        print(f"{sound[:4]}-{sound[:2]}!") # pika-pi!

ratata = PokemonNormalType()
charmander = PokemonFireType()
pikachu = PokemonElectricType()
pokemons = [ratata, charmander, pikachu]
sounds = ["ratata", "charmander", "pikachu"]

for index, pokemon in enumerate(pokemons):
    pokemon.speak(sounds[index])

Output of the code block above — different types of Pokemon classes output their respective speak logics

The method “speak” is overidden. The subclass PokemonFireType has its own speak method, which overrides the speak method of the parent PokemonNormalType class. The same thing happens vis-a-vis the PokemonElectricType and PokemonNormalType classes.

This overriding is evident when we look at the different outputs produced by the speak method run on the different instances.

Other languages (I’m looking at you, Java and C++), also have the concept of Method Overloading, where we can use the same method with a different number of arguments. For example:

class PokeBag:
    def __init__(self):
        self.pokemons = []
    def add_pokemon(self, pokemon):
        self.pokemons.append(pokemon)
    def add_pokemon(self, pokemon1, pokemon2):
        self.pokemons.append(pokemon1)
        self.pokemons.append(pokemon2)

We could define a PokeBag class with the appropriate syntax in Java and it would work. But in Python, the latest method will override the previous declaration of that method. So, when we try to use the previous declaration of that method within Python, it will error out :

bag = PokeBag()
try:
    bag.add_pokemon("ratata")
except Exception as e:
    print(f"Exception {e}")

Output of the code block above — Method Overloading attempt that result in Method Overriding.

However, if we use the newly defined method, it will work:

bag.add_pokemon("ratata", "charmander")

This is because Python is a dynamically typed language, i.e. we do not have to define the variable types hence they will be determined dynamically when the program is run. This is in contrast to statically typed languages like Java and C++, where we have to define the variable types before we can use them. If python were statically typed, we would have to define the variable types for the methods in the class definition.

While we can define variable types in Python, it is meant to ease programming. The type hints are NOT enforced. As a best practice, you could use type hints to indicate variable types all the same, but make sure to remember that they are hints and not variable type assignments. Let’s look at the PokeBag example with type hints:

# PokeBag class with type hints
class PokeBag:
    def __init__(self):
        self.pokemons = []
    def add_pokemon(self, pokemon: str):
        self.pokemons.append(pokemon)
    def add_pokemon(self, pokemon1: str, pokemon2: str):
        self.pokemons.append(pokemon1)
        self.pokemons.append(pokemon2)

As the developer, I know that I need to pass pokemons as strings:

bag = PokeBag()
bag.add_pokemon("ratata", "charmander")
bag.pokemons

Output of the code block above — adding pokemon to the PokeBag instance

But notice that the typing is not enforced:

bag = PokeBag()
bag.add_pokemon("ratata", 2)
bag.pokemons

Output of the code block above — demonstrating that type hinting is not enforced in Python

Dynamic typing is one of the reasons why Python is slower than Java and C++, as there is more work to do when the program is run. With all the drama above, it may seem like method overloading is a pipe dream in Python.

*args workaround

However, there is a way to achieve method overloading in Python. But first, if we just wish to call the method with an unknown number of arguments, we can just use the *args syntax :

class PokeBag:
    def __init__(self):
        self.pokemons = []
    def add_pokemon(self, *pokemons):
        for pokemon in pokemons:
            self.pokemons.append(pokemon)

bag = PokeBag()
bag.add_pokemon("ratata")
bag.pokemons

Output of the code block above — adding a single pokemon to the PokeBag instance

bag = PokeBag()
bag.add_pokemon("pikachu")
bag.add_pokemon("ratata", "charmander")
bag.add_pokemon("squirtle", "bulbasaur", "metapod")
bag.pokemons

Output of the code block above — adding multiple pokemon to the PokeBag instance

However, this is NOT method overloading, since there is no method to overload (we only have one method to add pokemon to the bag).

Method Overloading

We CAN achieve method overloading in Python using the library called multipledispatch:

# Source: https://www.geeksforgeeks.org/python-method-overloading/
from multipledispatch import dispatch
class PokeBag:
    def __init__(self):
        self.pokemons = []
    @dispatch(str)
    def add_pokemon(self, pokemon):
        self.pokemons.append(pokemon)
    @dispatch(str, str)
    def add_pokemon(self, pokemon1, pokemon2):
        self.pokemons.append(pokemon1)
        self.pokemons.append(pokemon2)

bag = PokeBag()
bag.add_pokemon("ratata")
bag.add_pokemon("arbok", "weezing")
bag.pokemons

Output of the code block above — Method Overloading in Python using multipledispatch

Notice that the code that errored out before doesn’t error out when we use the @dispatch decorator from multipledispatch library. So, if we really want to achieve method overloading in Python, we can use the method shown above.

An exception to the rule

There is one thing that python allows you to overload without using the multipledispatch library and that is defining getters, setters and deleters for the instance properties. These are a bit of an exception however, as they are identified by the decorators used to define them. We’ll take a look at these when we get to encapsulation.

I might become a Software Engineer at any second at this rate. Let’s move on to the next principle.

Abstraction

I remember the first time that the word “abstract” was explained to me in English class. It was defined as something intangible, something that you could feel but not touch. Something like love, happiness, sadness or adjectives such as soft or hard.

When I mention the word- “soft”, you already have an idea of “softness” in your head, but not an exact one. I could be referring to a soft pillow, a soft toy, a soft feeling or even a soft person. The word “soft” is abstract and the pillow / toy / feeling / person are concrete examples of abstract concept of the word “soft”. All the examples are derived from the same idea of “softness”.

Similarly, when it comes to classes we can define an abstract class, which is a class that is not fully formed. It’s a blueprint for other classes to be derived from. If we follow the “soft” example, we can define an abstract class called “Soft” and then derive classes like “SoftPillow”, “SoftToy”, “SoftFeeling” and “SoftPerson” from it.

Let’s go back to the notion of abstract classes being the blueprint for classes. If you recall, in the beginning of this post we mentioned that classes are blueprints (for objects). Hence, as abstract classes are blueprints for classes, they are… blueprints for blueprints. This is why abstract classes are also called meta classes.

Abtract Classes and abstract methods

In python, to define abstract classes and abstract methods, we need to import the ABC (Abstract Base Class) module from the abc library. Let’s look at an example:

from abc import ABC, abstractmethod, abstractproperty
class Soft(ABC):
    @abstractmethod
    def softness(self):
        pass

Any python class that derives from the Abstract Base Class (ABC) as shown is an abstract class. It is a class that serves as a blueprint for other classes. It’s a blueprint because there isn’t an actual real implementation of any functionality in the abstract class. Hence, if you try to create an instance / object of the abstract class, it will error out:

try:
    obj = Soft()
except Exception as e:
    print(f"Exception: {e}")

Output of the code block above — Error on instantiating an Abstract Class

This non-instantiation property serves as an easy decision making tool. If you want to allow the creation of parent class objects, use normal inheritance and subclassing. If you do not want to allow the creation of parent class objects, use abstract classes.

Any child class derived from the parent abstract class (“Soft”) will need to define the abstract methods mentioned in the parent class. If it doesn’t, it will error out.

class SoftPillow(Soft):
    def not_softness(self):
        print("I am not a soft pillow as I am missing the abstract method softness()!")

try:
    obj = SoftPillow()
    print(obj.not_softness())
except Exception as e:
    print(e)

Output of the code block above — Error instantiating a class deriving from an Abstract class but missing an abstract method

However, if we implement the abstract method softness() in the SoftPillow class, then we can instantiate the class:

class SoftPillow(Soft):
    def softness(self):
        print("I am a super soft Pillow!")

obj = SoftPillow()
obj.softness()

Output of the code block above — Instantiating a class deriving from an Abstract class

We could have a mix of abstract and non-abstract methods in an abstract class:

class Soft(ABC):
    @abstractmethod
    def softness(self):
        pass

    def size(self):
        pass

The abstract methods are mandatory to be implemented in the child classes, but the non-abstract methods are optional. We illustrate the same below:

class SoftPillow(Soft):
    def size(self):
        print("I am a smol Pillow!")

try:
    obj = SoftPillow()
    obj.size()
except Exception as e:
    print(e)

Output of the code block above — Error instantiating a class deriving from an Abstract class. The derived class contains a non-abstract method but is missing an abstract method

Since we didn’t have the mandatory abstract method softness() in the SoftPillow class, it errored out. However, we didn’t have to implement the optional non-abstract method size().

class Softness(Soft):
    def softness(self):
        print("I am a super soft Pillow!")

obj = Softness()
obj.softness()

Output of the code block above — Instantiating a class deriving from an Abstract class. The derived class contains the abstract method

If we implement the mandatory abstract method softness() in the SoftPillow class, then there’s no problem.

Abstract Properties

An abstract class can also have abstract properties and non abtract-properties, just like abstract methods and non-abstract methods. The abstract properties are, surprise, surprise, mandatory to be implemented in the child classes, but the non-abstract properties are optional. Example below:

class Soft(ABC):
    def __init__(self):
        self.price = "Not defined"
    
    @abstractproperty
    def get_price(self):
        pass

class SoftToy(Soft):
    def __init__(self):
        self.price = "100"
    
    def get_price(self):
        return self.price

obj = SoftToy()
obj.get_price()

Output of the code block above — conventional getter

Other than the fact that we’re using a different decorator: “@abstractproperty” instead of “@abstractmethod” there isn’t a real difference between the two (Source: StackOverflow).

The only convention based nuance is that you would use the “@abstractproperty” decorator when you are dealing with properties and the “@abstractmethod” decorator when you are dealing with methods.

In the example above, we could have used the “@abstractmethod” decorator instead of the “@abstractproperty” decorator and it would have worked just fine. However, I have used the “@abstractproperty” decorator just so we all know that there is an alternative decorator that exists for properties.

An abstract class can implement some functionality of course, but remember that it cannot be instantiated. It’s useful to add implementations to an abstract class if you want to share some common functionality between the child classes. This works in the same way as it does in vanilla inheritance, so we will not look at an example.

Encapsulation

Encapsulation refers to the process of “encapsulating” data and methods together. When we put the data and methods together in such a capsule, we are able to limit access to the data and methods, so that there’s no accidental modification.

There are 3 levels of access modifiers in general, in increasing order of restrictiveness:

Public
Protected
Private

Python has its own conventions when it comes to programming. This is glaringly obvious when we look at access modifiers. If you are a C++ / Java person before and are just discovering Python, it might be a little strange, but you’ll get used to it soon enough.

Typically, in other languages, you would use a keyword to define the access modifier, such as “public”, “protected” or “private”. For instance in Java:

public static void main(String[] args)

Python uses underscores to distinguish between public, protected and private methods. Let’s take a look at each of these in Python, starting with public properties and methods:

Public

class Pokemon:
    def __init__(self):
        self.hp = 100
        self.attack = random.randint(0, 100)
        self.defense = random.randint(0, 100)
    
    def speak(self):
        print("I am a Pokemon!")

Notice the lack of underscores in the property (hp, attack, defense) and method (speak) names. This is the convention for public properties and methods in Python. Very vanilla, nothing new. Properties are public by default. We can access these properties and methods from outside the class as well as from inside the class.

pokemon = Pokemon()
print(pokemon.attack)
print(pokemon.defense)
pokemon.speak()

Output of the code block above — public instance attributes

In the above examples we accessed the attack, defense and speak properties from outside the class. Hence, they are public. Let’s now take a look at protected properties and methods:

Protected

class Pokemon:
    def __init__(self):
        self._hp = 100
        self._attack = random.randint(0, 100)
        self._defense = random.randint(0, 100)
    
    def _speak(self):
        print("I am a Pokemon!")

The instance properties _hp, _attack and _defense as well as the _speak method are meant to be protected. Just like Python doesn’t enforce the variable types mentioned as type hints, it also doesn’t enforce the access modifiers.

Python buys into the notion that “We’re all adults here”. While the protected methods can be accessed from outside the class, just like the public methods, we as python programmers CHOOSE not to.

This means that protected properties need to be accessed via their getters and setters. And protected methods should only be used internally within a class. Let’s look at an example:

You might sometimes encounter a TypeError when using the @property decorator in .ipynb files. I raised an issue about this on the VS Code Jupyter github repo. If you ever encounter the same error, just restart your VS Code and it will be fixed (unless it’s a code error). You can also run the code in a .py file and execute the .py file from the command line like an actual software engineer.

class Pokemon():
    def __init__(self):
        self._hp = 100
        self._attack = 0
        self._defense = 0

    @property
    def attack(self): # attack getter
        return self._attack
    
    @attack.setter
    def attack(self, value):
        self._attack = value

    @property
    def defense(self): # defense getter
        return self._defense
    
    @defense.setter
    def defense(self, value):
        self._defense = value

pokemon = Pokemon()
print(f"The pokemon's attack stat is: {pokemon.attack}")
print(f"The pokemon's defense stat is: {pokemon.defense}")
pokemon.attack = 50
pokemon.defense = 30
print(f"The pokemon's attack stat is: {pokemon.attack}")
print(f"The pokemon's defense stat is: {pokemon.defense}")

Output of the code block above — accessing protected instance attributes using fancy getters and setters

Notice that in the above syntax, we do not access the protected properties directly. Usually, if we had to access a getter function called attack(), we would have called pokemon.attack(). Moreover, if we had a setter function called attack(), we would have set the pokemon’s attack using pokemon.attack(some_value). However, since we used the @property decorator, the function name is treated as a property. When the function is treated as a property, we assign and retrieve values from it as if it were a property. You will be able to contrast this with the non-fancy example that will follow.

In the example above, we define getters and setters for the protected instance properties. We use some slightly fancy syntax to define the getters and setters, but you can choose not to do so and define them in the usual way. If you notice, this is where the Python method overloading exception comes into play. We defined the getters and setters for the protected instance properties using the same method with a different number of arguments. For instance, for the _attack property, we defined the getter with 0 arguments and the setter with 1 argument.

If you do choose to use the fancy way to define getters and setters make sure you follow the order, getter before setter and use the appropriate decorators. Here’s a small template for you to use:

@property
def property(self): # property getter
    return self._property

@property.setter
def property(self, value):
    self._property = value

The non-fancy version:

class Pokemon():
    def __init__(self):
        self._hp = 100
        self._attack = 0
        self._defense = 0

    def get_attack(self): # attack getter
        return self._attack
    
    def set_attack(self, value):
        self._attack = value    
    
    def get_defense(self): # defense getter
        return self._defense
    
    def set_defense(self, value):
        self._defense = value

pokemon = Pokemon()
print(f"The pokemon's attack stat is: {pokemon.get_attack()}")
print(f"The pokemon's defense stat is: {pokemon.get_defense()}")
pokemon.set_attack(50)
pokemon.set_defense(30)
print(f"The pokemon's attack stat is: {pokemon.get_attack()}")
print(f"The pokemon's defense stat is: {pokemon.get_defense()}")

Output of the code block above — accessing protected instance attributes using non-fancy getters and setters

While the results are the same, make sure you notice the difference in syntax between the fancy version and the non-fancy version.

Private

In Python, private properties and methods are defined using double underscores at the beginning of their names. We CANNOT access these private methods and properties from outside the class. Let’s look at an example:

class Pokemon:
    def __init__(self):
        self.__hp = 100
        self.__attack = random.randint(0, 100)
        self.defense = random.randint(0, 100)
    
    def __speak(self):
        print("I am a Pokemon!")

try:
    pokemon = Pokemon()
    print(pokemon.__hp)
except Exception as e:
    print(e)

Output of the code block above — trying to access a private instance attribute

The same thing will happen if we try to access the private attack property (__attack) from outside the class. We can compare this with the defense property (note that we use “defense” and not “_defense” for the comparison, as we are not supposed to use the protected property directly):

pokemon = Pokemon()
print(pokemon.defense)

Output of the code block above

We cannot access the private methods either:

try:
    pokemon = Pokemon()
    pokemon.__speak()
except Exception as e:
    print(e)

Output of the code block above — trying to access a private method

If we really need to access the private methods and properties, for whatever purpose, we can use something called “name mangling”. Take a look at the class and the modified way to access the private properties and methods:

class Pokemon:
    def __init__(self):
        self.__hp = 100
        self.__attack = random.randint(0, 100)
        self.defense = random.randint(0, 100)
    
    def __speak(self):
        print("I am a Pokemon!")

pokemon = Pokemon()
print(pokemon._Pokemon__hp)
print(pokemon._Pokemon__attack)

Output of the code block above — accessing private instance attributes using name mangling

pokemon._Pokemon__speak()

Output of the code block above — accessing private methods using name mangling

At this point, you should be prepared for any OOPs questions in your Data Science interviews. We are well and truly on our way to becoming a Software Engineer in addition to being Data Scientists ʕ •ᴥ•ʔ. I do recommend that you practice writing code to practice these principles.

If you wanna dig deeper into this rabbit hole (I referred to most of these myself while writing this post), take a look at these: