Conda 101
In previous posts, we mentioned parts of the Python ecosystem such as pip (package manager) and virtualenv (environment manager).
In this post, we are going to introduce conda, a powerful system that combines the above two functionalities.
Preliminaries
The core Python language contains various features, e.g. iterables, that can be used to, say, display the integers from one to ten:
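For example, a simple loop will do (one of several equivalent ways):
for i in range(1, 11):
    print(i)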
Now suppose you wanted to calculate the square root of each of these integers.
The core Python language cannot do this; you have to import the math module from the Python standard library (the standard library link above lists the modules it contains).
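For example, a minimal sketch:
import math

for i in range(1, 11):
    print(math.sqrt(i))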
Now suppose you wanted to send an HTTP request (or achieve some other common programming task). You could do this using only the core Python language and standard library. However, typically, it is easier and more robust to use a third-party package, e.g. requests
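For instance, fetching a page with requests might look like this (the URL is just a placeholder):
import requests

response = requests.get('https://www.example.com')
print(response.status_code)  # 200 indicates success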
There are hundreds of thousands of such packages (some judgement is required to discern which ones are worth using, but usually not much).
These packages are hosted on online repositories on cloud platforms, e.g. the Python Package Index (PyPI) and Anaconda Cloud, and come with package managers, e.g. pip and conda.
pip is the package manager for packages hosted on PyPI. Anaconda Cloud supports multiple package managers, but the two most common are conda and pip.
Initial setup
On most Unix-like systems, Python and its standard library come pre-installed. Depending on the OS, some third-party packages may also come pre-installed.
To get a better idea of what conda is, let’s first consider what you get with a clean Ubuntu install:
- package manager - none
- environment - one (global)
The global environment usually contains two Python installations, in the form of binary executables:
- Python 2.x.x - /usr/bin/python
- Python 3.x.x - /usr/bin/python3
Each is a Python interpreter and a standalone piece of software with its own functionality, e.g. /usr/bin/python --help, and is usually run via the command python or python3 (which you can confirm with which python or which python3).
To list all pre-installed modules for, say, Python 3.x.x, start the interpreter with python3 and then run help('modules').
To list those that are part of the standard library, ls /usr/lib/python3.5, and to list the third-party ones, ls /usr/lib/python3/dist-packages.
Any of the modules above will ‘work out of the box’, i.e. you can start using that module’s functionality straight away. This is because /usr/lib/python3.5 and /usr/lib/python3/dist-packages are in Python’s module search path, which is the list of locations Python checks when it sees the import keyword.
To verify they are in the module search path, you can inspect sys.path.
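For example, a quick check from the shell (the exact entries will differ between systems):
python3 -c "import sys; print(sys.path)"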
Installing conda
Follow the conda installation steps. Amongst other things, this will:
- prepend /my/path/to/anaconda3/bin to PATH
- install in /my/path/to/anaconda3/bin software binaries that come ‘for free’ with conda, e.g. python, ipython, conda-build, conda-env, etc. (this means python now points to /my/path/to/anaconda3/bin/python; run which python to confirm)
- install in /my/path/to/anaconda3/lib/python3.6 the Python standard library modules
- install in /my/path/to/anaconda3/lib/python3.6/site-packages default third-party modules
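To sanity-check the installation (the paths shown follow the placeholder /my/path/to/anaconda3 used above; yours will differ):
which python
# /my/path/to/anaconda3/bin/python
which conda
# /my/path/to/anaconda3/bin/conda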
Basic conda usage
Firstly, let’s evaluate what installing conda has done in terms of our clean Ubuntu install.
Before installing conda, we had the following situation:
- package manager - none
- environment - one (global)
Now, after installation:
- package manager - conda
- environment - one (conda root)
In some senses, the conda root environment has usurped the previous global environment. But fear not, it has done so without interfering with any of the previous Python-related setup and configuration, e.g. /usr/bin/python3 works just as before.
However, if you’re used to virtualenv, you might expect to see something like
(root) joe@n24-25bu:~/Documents
rather than
joe@n24-25bu:~/Documents
Nonetheless, you can confirm you are in the conda root environment with conda info --envs:
# conda environments:
#
root * /my/path/to/anaconda3
where the asterisk indicates the root
environment is currently activated.
Further, conda list brings up the pre-installed standard library and third-party modules in /my/path/to/anaconda3/lib/python3.6 as well as the Python interpreter in /my/path/to/anaconda3/bin.
As a rough rule of thumb, the conda root
environment should not be used for
anything.
Thus the next thing to do (and whenever you start a new project) is to create a new environment:
conda create --name <myenv>
You can check it is there with conda info --envs.
This new environment <myenv> is completely empty. If you run source activate <myenv> then conda list, you should see
# packages in environment at /my/path/to/anaconda3/envs/<myenv>:
#
To delete an environment, conda remove --name <myenv> --all.
In the next post, we will go through installing packages in this empty environment and how to start using it.
P.S.
Even though <myenv> is an empty environment, commands like python, ipython, pip, wheel still work in it.
This is because, when these executables aren’t available in the currently activated environment, the shell falls back down your PATH to the conda root environment’s bin directory and uses the executables found there.
This can be misleading, e.g.
joe@n24-25bu:~$ source activate <myenv>
(<myenv>) joe@n24-25bu:~$ pip install <mypackage>
installs <mypackage> to the conda root environment and not <myenv>.
The correct way to do this is to first install pip in <myenv>:
conda install --name <myenv> pip
Now, conda list will show pip. Also, which pip should point to /my/path/to/anaconda3/envs/<myenv>/bin/pip rather than /my/path/to/anaconda3/bin/pip like before.
Finally, run pip install <mypackage>.
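Putting this together, the whole sequence might look like the following sketch (<myenv> and <mypackage> are placeholders as above):
conda create --name <myenv>
source activate <myenv>
conda install --name <myenv> pip
which pip                # should now show /my/path/to/anaconda3/envs/<myenv>/bin/pip
pip install <mypackage>  # installs into <myenv>, not root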
New post
Even though in practice you should not use the global Python and global environment, for the purposes of explaining how imports work, we will use them.
This is going to use Linux (Ubuntu) as an example. The general ideas should, however, be platform agnostic.
Motivation: the Python language itself is easy; the surrounding ecosystem of setup, imports, libraries, pip, virtual environments, package managers, open source, dev tools, and making your own libraries is not.
On a clean OS, if you are on Linux or Mac, your system will have a system Python installation. As already mentioned, in practice you should never use it, but we will refer to it for learning purposes.
It is probably not a good idea to follow some of the commands in this tutorial on your own system.
The system Python will be somewhere like /usr/bin/python; this will point to some interpreter directly or be a symbolic link. In my case, there were two Pythons that came with the system, /usr/bin/python2.7 and /usr/bin/python3.5. These are the interpreters.
Generally, what comes below applies to both, but for simplicity we will just refer to the Python 3 one.
My OS came with preinstalled Python libraries. There are two kinds: those from the standard library, and third-party ones (from PyPI, the Python Package Index).
To see all the preinstalled libraries, start the interpreter and run
help('modules')
The standard library ones are found in /usr/lib/python3.5, e.g. in this directory you will see random.py.
The other libraries, the third-party ones, are found in /usr/lib/python3/dist-packages, where you can see folders like requests and bs4.
Using the import statement
You will see that in Python, you just do import random without any file paths or extensions, and it just works: e.g. run /usr/bin/python3.5 and then import random just works, as does import requests.
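For example, a small illustrative snippet (the printed path depends on where requests lives on your system):
import random
import requests

print(random.randint(1, 10))  # no file path or extension needed
print(requests.__file__)      # shows where the import was found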
But how? The key is the module search path (MSP).
This is set each time you start an interactive Python session or run a script.
The exact details of how this is set vary, but basically there are a few ordered key components:
- the home path (if you are running a script, it is the directory containing the script; if you are in an interactive shell, it is the current working directory)
- the environment variable PYTHONPATH
- the standard library folder
- the place where your pip install downloads go (more on this later)
- the third-party library folder
You can see the MSP with
/usr/bin/python3.5 -m site
Run from /home/jim, and with certain less relevant elements removed, this gives
sys.path = [
'/home/jim', # changes if you change the dir from where you ran cmd
'/usr/lib/python3.5', # std lib preinstalled
'/usr/local/lib/python3.5/dist-packages', # where your pip installs go
'/usr/lib/python3/dist-packages', # third party lib preinstalled
]
You can also play with sys.path; it is just a list, e.g. you can do
import sys
sys.path = []
import random # ImportError: No module named 'random'
but don’t worry: next time you start a session or run a script, the value of sys.path will go back to its default.
So the libraries that come with the system, whether they are standard library or third-party, just work.
But as you do more Python, you will inevitably want to import your own files, and import other third party libraries.
First, let’s just talk about your own files.
File imports vs package imports
File imports
import_examples/
└── dir0
├── a.py
├── b.py
└── dir1
└── c.py
with the contents as follows
# a.py
import b
print(b.x)
# b.py
x = 'hello'
# c.py
y = 'bye'
Running /usr/bin/python3.5 import_examples/dir0/a.py works, outputting
hello
because when we do import b in a.py, the first element of the MSP is the home path, which is the directory containing the script being run, i.e. /path/to/import_examples/dir0.
So Python looks in that directory, sees that there is a b.py and imports it.
What if, instead, we did this in a.py:
import b
import c
print(b.x)
print(c.y)
you will get this error
Traceback (most recent call last):
File "import_examples/dir0/a.py", line 2, in <module>
import c
ImportError: No module named 'c'
which makes sense: Python goes through the MSP looking for c.py, and as c.py is in /path/to/import_examples/dir0/dir1, which is not in the MSP, it errors.
How to get round this? This is a very simple thing we want to do. You can add /path/to/import_examples/dir0/dir1 to your PYTHONPATH environment variable, or add it to sys.path in your script:
import b
import sys
sys.path.append('/path/to/import_examples/dir0/dir1')
import c
print(b.x)
print(c.y)
and now it works.
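Alternatively, if you keep the original a.py (without the sys.path.append line), the PYTHONPATH route would look something like this, using the same hypothetical paths:
export PYTHONPATH="/path/to/import_examples/dir0/dir1"
/usr/bin/python3.5 import_examples/dir0/a.py  # import c now resolves via PYTHONPATH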
Upshot: you won’t get very far with file imports without fiddling around a lot with PYTHONPATH or your sys.path directly. I would not recommend this approach. The better solution is package imports.
Package imports
A package in this context (we’ll see a different definition later on) is a directory of Python files. Before, we were importing a single file. Now we are going to import a directory. In order to do this, each directory under the parent directory (and the parent directory itself) needs to have an __init__.py file in it. dir0 is our parent directory containing our Python code, so we place these files in dir0 and dir1.
import_examples/
└── dir0
├── a.py
├── b.py
├── dir1
│ ├── c.py
│ └── __init__.py
└── __init__.py
and with our new a.py
import b
import dir1.c
print(b.x)
print(dir1.c.y)
it works. This is known as an absolute package import: we start from the home path, /path/to/import_examples/dir0 (the directory containing a.py), which is in the MSP, and to this we append dir1.c, which maps to dir1/c.py, giving /path/to/import_examples/dir0/dir1/c.py. That is why it works.
Now let’s look at relative package imports. Suppose we add a new file d.py and change c.py so that
import_examples/
└── dir0
├── a.py
├── b.py
├── dir1
│ ├── c.py
│ ├── d.py
│ └── __init__.py
└── __init__.py
and
# d.py
z = 'ciao'
# c.py
from . import d
y = d.z
which, when running /usr/bin/python3.5 import_examples/dir0/a.py, gives
hello
ciao
from . import d means we no longer use the usual MSP. Instead, whenever an import starts with from followed by a ., it is called a relative package import: the . refers to the package (directory) containing the importing file, here dir1.
The absolute package import version of the above is
# c.py
import dir1.d
y = dir1.d.z
What if we moved d.py into another directory dir2, i.e.
import_examples/
└── dir0
├── a.py
├── b.py
├── dir1
│ ├── c.py
│ ├── dir2
│ │ ├── d.py
│ │ └── __init__.py
│ └── __init__.py
└── __init__.py
Now, we update c.py so that
# c.py
from .dir2 import d
y = d.z
or the absolute form
# c.py
import dir1.dir2.d
y = dir1.dir2.d.z
Attention! You might think that for the relative package import above,
# c.py
from .dir2 import d
y = d.z
we could replace it with
# c.py
from . import dir2.d
y = d.z
or
# c.py
from . import dir2
y = dir2.d.z
both of which don’t work, because the dir2 object does not have an attribute d (the dotted chaining only works on directories, e.g. dir1.dir2; it does not extend to files).
But it is a simple fix, by adding a line to __init__.py in dir2,
# dir2/__init__.py
from . import d
but
# c.py
from . import dir2.d
y = d.z
does not work
from . import dir2.d
^
SyntaxError: invalid syntax
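With that line in dir2/__init__.py, the from . import dir2 version of c.py from earlier now does work:
# c.py
from . import dir2
y = dir2.d.z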
With the __init__.py in place, the absolute package import also still works just as before:
# c.py
import dir1.dir2.d
y = dir1.dir2.d.z
However, what might be surprising is that if we take a.py and try to do a relative package import like the one we used in c.py,
# c.py
from .dir2 import d
y = d.z
i.e.
# a.py
import b
from .dir1 import c
print(b.x)
print(c.y)
it won’t work
File "import_examples/dir0/a.py", line 3, in <module>
from .dir1 import c
SystemError: Parent module '' not loaded, cannot perform relative import
This is because a.py is being run as a top-level script, so it is not itself treated as part of a package, and relative imports only work from within a package; see https://stackoverflow.com/questions/33837717/systemerror-parent-module-not-loaded-cannot-perform-relative-import for more detail.
package imports: absolute vs relative imports
You might have noticed that random.py is a file, but in /usr/lib/python3/dist-packages, although some entries were .py files, most were either directories or files/directories ending in .egg-info. We will ignore this for now; just accept that Python’s import works on both types, files and directories.
Software distribution package vs Python package