Underscores in python for beginners


Author: Mohammad Amaan Abbasi | Aug. 25, 2018

The concept of underscores is much easier to understand than you think it is.

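We are going to talk about the following underscore patterns: single leading underscore (_x), single trailing underscore (x_), double leading underscore (__x), double leading and trailing underscore (__x__), and single underscore (_).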

Single Leading Underscore _x

A single leading underscore has meaning by convention only; the Python interpreter does not enforce it. It is a hint to other programmers that a variable or method starting with a single underscore is intended for internal use.

Unlike C++, Python does not draw a strong distinction between private and public variables.

Let's try an example to get a better understanding. Say we have a class User, which has two attributes, name and _password:

class User:
    def __init__(self):
        self.name = "Amaan"
        self._password = "private123"

Note: If you are not familiar with self, just ignore it for now. It is not a keyword, only the conventional name for the instance a method is called on.

What do you think is going to happen if you instantiate this class and try to get the values of 'name' and '_password'?

Let's check:

>>> u = User()
>>> u.name
'Amaan'
>>> u._password
'private123'

As you can see, the leading underscore in _password didn't make the attribute private; we could still read the value stored in it. Likewise, methods whose names start with an underscore can still be called from outside the class.

Leading underscores do, however, affect how names are imported from other modules. Let's say you have a file called my_file.py:
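The same holds for methods. Here is a minimal sketch, using a hypothetical Account class and _check_pin helper that are not part of the example above:

```python
class Account:
    """Hypothetical class used only to illustrate the convention."""

    def __init__(self, pin):
        self._pin = pin  # leading underscore: "internal, please don't touch"

    def _check_pin(self, candidate):
        # Internal helper by convention; Python does not block outside calls.
        return candidate == self._pin

    def unlock(self, candidate):
        # Public method that relies on the internal helper.
        return "unlocked" if self._check_pin(candidate) else "denied"


account = Account("1234")
public_result = account.unlock("1234")        # intended usage
internal_result = account._check_pin("0000")  # works, but bypasses the public API
print(public_result, internal_result)
```

Calling _check_pin directly is allowed, but the underscore signals that the author may change or remove it without warning.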

def my_func():
    return "This will be imported"

def _my_func2():
    return "This will not be imported"

Now, if you import everything from the module with a wildcard import, Python will skip the names that start with an underscore (unless the module defines an __all__ list that says otherwise).

>>> from my_file import *
>>> my_func()
'This will be imported'
>>> _my_func2()
NameError: name '_my_func2' is not defined

However, regular imports are not affected by leading single underscores:

>>> import my_file
>>> my_file._my_func2()
'This will not be imported'
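If you would like to check this behaviour without creating files by hand, the following sketch writes a throwaway module to a temporary directory and imports it both ways. The module name demo_underscore_module is made up for this illustration, and running the wildcard import through exec is just one convenient way to capture the imported names:

```python
import importlib
import pathlib
import sys
import tempfile

# Source of a throwaway module, mirroring my_file.py from the text.
source = '''
def my_func():
    return "This will be imported"

def _my_func2():
    return "This will not be imported"
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    pathlib.Path(tmp_dir, "demo_underscore_module.py").write_text(source)
    sys.path.insert(0, tmp_dir)
    try:
        namespace = {}
        # Run the wildcard import inside a fresh namespace so we can inspect it.
        exec("from demo_underscore_module import *", namespace)
        # A regular import gives access to the module object itself.
        module = importlib.import_module("demo_underscore_module")
    finally:
        sys.path.remove(tmp_dir)

# Ignore entries such as __builtins__ that exec adds on its own.
wildcard_names = sorted(name for name in namespace if not name.startswith("__"))
print(wildcard_names)                # names pulled in by the wildcard import
print(hasattr(module, "_my_func2"))  # the regular import still exposes the name
```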

Single Trailing Underscore x_

Trailing underscores are added to avoid name conflicts with names that already have a meaning in Python. Keywords such as 'class' cannot be used as names at all, and reusing built-in names such as 'list' or 'int' is legal but hides the built-in. If you still want to use such a name because you can't think of an alternative, you can append a single underscore to get rid of the conflict:

>>> def class():
SyntaxError: invalid syntax

>>> def class_():
...     print("hello world")
...
>>> class_()
hello world
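The trailing underscore is most common in parameter names. A small sketch, where the describe function is a made-up example and keyword.iskeyword from the standard library is used to check which names are reserved:

```python
import keyword


def describe(name, class_):
    # "class" is a keyword, so the parameter is spelled "class_" instead.
    return f"{name} belongs to class {class_}"


print(keyword.iskeyword("class"))   # True: cannot be used as a name
print(keyword.iskeyword("class_"))  # False: safe to use
print(keyword.iskeyword("list"))    # False: a built-in name, not a keyword
print(describe("Amaan", class_="beginner"))
```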

Double Leading Underscore __x

This one is actually enforced by the Python interpreter. Inside a class definition, a double underscore prefix causes the interpreter to rewrite the attribute name in order to avoid naming conflicts with subclasses.

This is called name mangling. Let's look at an example:

class Car:
    def __init__(self):
        self.speed = 200
        self._color = 'white' 
        self.__owner = 'Amaan'

>>> c = Car()
>>> dir(c)
['_Car__owner', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_color', 'speed']

If you look closely, the variables speed and _color are unchanged.

However, the __owner variable doesn't exist; it has been converted to _Car__owner (the first element in the list above). The Python interpreter does this to protect the variable from being overridden in subclasses.

Let's go a bit further and see how this is useful. Let's create another class that extends the Car class.

class ExtendedCar(Car):
    def __init__(self):
        super().__init__()
        self.speed = 100
        self._color = 'yellow' 
        self.__owner = 'Abbasi'

Let's create an instance and take a look at our variables:

>>> c2 = ExtendedCar()
>>> c2.speed
100
>>> c2._color
'yellow'
>>> c2.__owner
Traceback (most recent call last):
  File "<pyshell#52>", line 1, in <module>
    c2.__owner
AttributeError: 'ExtendedCar' object has no attribute '__owner'

So why didn't the last one work? Let's take a look at the list of attributes.

>>> dir(c2)
['_Car__owner', '_ExtendedCar__owner', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_color', 'speed']

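Because of name mangling, the self.__owner assigned inside ExtendedCar was turned into _ExtendedCar__owner, while the original _Car__owner set by Car is still there. No attribute is literally named __owner, and no mangling happens outside a class body, which is why c2.__owner fails.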
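To see why keeping both names is useful, here is a trimmed-down sketch of the two classes. The car_owner and extended_owner methods are additions made for this illustration; each one reads the attribute that belongs to its own class:

```python
class Car:
    def __init__(self):
        self.__owner = 'Amaan'       # stored as _Car__owner

    def car_owner(self):
        return self.__owner          # reads _Car__owner


class ExtendedCar(Car):
    def __init__(self):
        super().__init__()
        self.__owner = 'Abbasi'      # stored as _ExtendedCar__owner

    def extended_owner(self):
        return self.__owner          # reads _ExtendedCar__owner


c2 = ExtendedCar()
print(c2.car_owner())            # value set by Car
print(c2.extended_owner())       # value set by ExtendedCar
print(c2._Car__owner)            # mangled names are still reachable from outside
print(c2._ExtendedCar__owner)
```

The subclass did not overwrite the parent's value, so code in Car that relies on its own __owner keeps working. The mangled names remain accessible, so this protects against accidents, not against deliberate access.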

Double Leading And Trailing Underscore __x__

Double underscores are often referred to as dunders in the Python community. They occur quite often in Python code, so to avoid the mouthful "double underscore x double underscore", Pythonistas shorten it to "dunder x".

Names that have both leading and trailing double underscores are reserved for special use in the language.

It's best not to invent your own names that start and end with double underscores, so as to avoid collisions with Python's special methods and attributes.

A method you will see often is __init__(self), which Python calls automatically when a new instance is created.
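As a short sketch of how dunder methods are called implicitly by the language, consider a hypothetical Playlist class (not part of the original examples) that implements three of them:

```python
class Playlist:
    """Hypothetical class used only to show dunder methods in action."""

    def __init__(self, songs):
        # Called automatically by Playlist(...)
        self.songs = list(songs)

    def __len__(self):
        # Called by len(playlist)
        return len(self.songs)

    def __repr__(self):
        # Called by repr(playlist) and by the REPL when echoing the object
        return f"Playlist({self.songs!r})"


playlist = Playlist(["intro", "outro"])
print(len(playlist))
print(repr(playlist))
```

Implementing the documented dunder methods like this is fine; what you should avoid is making up new names in the same style.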

Single Underscore _

It is used in two ways:

First, to hold the value of the last expression in a Python REPL session.

>>> 5 + 5
10
>>> print('hi')
hi
>>> _
10

Second, to name variables whose value is never used. For example:

>>> for _ in range(5):
...     print("Hello")
...
Hello
Hello
Hello
Hello
Hello

In the for loop, we used an underscore as the loop variable because its value is never needed. The underscore is still an ordinary variable; the name simply tells readers that it is a throwaway.

Takeaways

  • Single Leading Underscore "_x": Naming convention indicating that a name is meant for internal use. Not enforced, except that wildcard imports skip such names.

  • Single Trailing Underscore "x_": Naming convention used to avoid conflicts with Python keywords and built-in names.

  • Double Leading Underscore "__x": Enforced by the Python interpreter, which mangles the names of such class attributes.

  • Double Leading and Trailing Underscore "__x__": These are special methods in Python, and one should avoid using this scheme to name their own methods.

  • Single Underscore "_": Used to name variables that are insignificant, and holds the result of the last expression in an interpreter session.

Well, that's it! I've certainly learned a lot from writing this down. I'll be happy to answer any questions in the comments.