Conda 101
In previous posts, we mentioned parts of the Python ecosystem such as pip (package manager) and virtualenv (environment manager).
In this post, we are going to introduce conda, a powerful system that combines the above two functionalities.
Preliminaries
The core Python language contains various features, e.g. iterables, that can be used to, say, display the integers from one to ten:
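For example, a simple loop will do (one of several equivalent ways):
for i in range(1, 11):
    print(i)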
Now suppose you wanted to calculate the square root of each of these integers.
The core Python language cannot do this; you have to import the math module from the Python standard library (the standard library link above lists the modules it contains).
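For example, a minimal sketch:
import math

for i in range(1, 11):
    print(math.sqrt(i))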
Now suppose you wanted to send an HTTP request (or achieve some other common programming task). You could do this using only the core Python language and standard library. However, typically, it is easier and more robust to use a third-party package, e.g. requests
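For instance, fetching a page with requests might look like this (the URL is just a placeholder):
import requests

response = requests.get('https://www.example.com')
print(response.status_code)  # 200 indicates success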
There are hundreds of thousands of such packages (some judgement is required to discern which ones are worth using, but usually not much).
These packages are hosted on online repositories on cloud platforms, e.g. the Python Package Index (PyPI) and Anaconda Cloud, and come with package managers, e.g. pip and conda.
pip is the package manager for packages hosted on PyPI. Anaconda Cloud supports multiple package managers, but the two most common are conda and pip.
Initial setup
On most Unix-like systems, Python and its standard library come pre-installed. Depending on the OS, some third-party packages may also come pre-installed.
To get a better idea of what conda is, let’s first consider what you get with a clean Ubuntu install:
- package manager - none
- environment - one (global)
The global environment usually contains two Python installations, in the form of binary executables:
- Python 2.x.x - /usr/bin/python
- Python 3.x.x - /usr/bin/python3
Each is a Python interpreter and a standalone piece of software with its own functionality, e.g. /usr/bin/python --help, and is usually run via the command python or python3 (which you can confirm with which python or which python3).
To list all pre-installed modules for, say, Python 3.x.x, start the interpreter with python3 and then run help('modules').
To list those that are part of the standard library, ls /usr/lib/python3.5, and to list the third-party ones, ls /usr/lib/python3/dist-packages.
Any of the modules above will ‘work out of the box’, i.e. you can start using that module’s functionality straight away. This is because /usr/lib/python3.5 and /usr/lib/python3/dist-packages are in Python’s module search path, which is the list of locations Python checks when it sees the import keyword.
To verify they are in the module search path, you can inspect sys.path.
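For example, a quick check from the shell (the exact entries will differ between systems):
python3 -c "import sys; print(sys.path)"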
Installing conda
Follow the conda installation steps. Amongst other things, this will:
- prepend /my/path/to/anaconda3/bin to PATH
- install in /my/path/to/anaconda3/bin software binaries that come ‘for free’ with conda, e.g. python, ipython, conda-build, conda-env, etc. (this means python now points to /my/path/to/anaconda3/bin/python; run which python to confirm)
- install in /my/path/to/anaconda3/lib/python3.6 the Python standard library modules
- install in /my/path/to/anaconda3/lib/python3.6/site-packages default third-party modules
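To sanity-check the installation (the paths shown follow the placeholder /my/path/to/anaconda3 used above; yours will differ):
which python
# /my/path/to/anaconda3/bin/python
which conda
# /my/path/to/anaconda3/bin/conda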
Basic conda usage
Firstly, let’s evaluate what installing conda has done in terms of our clean Ubuntu install.
Before installing conda, we had the following situation:
- package manager - none
- environment - one (global)
Now, after installation:
- package manager - conda
- environment - one (conda root)
In some senses, the conda root environment has usurped the previous global environment. But fear not, it has done so without interfering with any of the previous Python-related setup and configuration, e.g. /usr/bin/python3 works just as before.
However, if you’re used to virtualenv, you might expect to see something like
(root) joe@n24-25bu:~/Documents
rather than
joe@n24-25bu:~/Documents
Nonetheless, you can confirm you are in the conda root environment with conda info --envs:
# conda environments:
#
root * /my/path/to/anaconda3
where the asterisk indicates the root
environment is currently activated.
Further, conda list brings up the pre-installed standard library and third-party modules in /my/path/to/anaconda3/lib/python3.6 as well as the Python interpreter in /my/path/to/anaconda3/bin.
As a rough rule of thumb, the conda root
environment should not be used for
anything.
Thus the next thing to do (and whenever you start a new project) is to create a new environment:
conda create --name <myenv>
You can check it is there with conda info --envs.
This new environment <myenv> is completely empty. If you run source activate <myenv> then conda list, you should see
# packages in environment at /my/path/to/anaconda3/envs/<myenv>:
#
To delete an environment, conda remove --name <myenv> --all.
In the next post, we will go through installing packages in this empty environment and how to start using it.
P.S.
Even though <myenv> is an empty environment, commands like python, ipython, pip, wheel still work in it.
This is because, when these executables aren’t available in the currently activated environment, the shell falls back down your PATH to the conda root environment’s bin directory and uses the executables found there.
This can be misleading, e.g.
joe@n24-25bu:~$ source activate <myenv>
(<myenv>) joe@n24-25bu:~$ pip install <mypackage>
installs <mypackage> to the conda root environment and not <myenv>.
The correct way to do this is to first install pip in <myenv>:
conda install --name <myenv> pip
Now, conda list will show pip. Also, which pip should point to /my/path/to/anaconda3/envs/<myenv>/bin/pip rather than /my/path/to/anaconda3/bin/pip like before.
Finally, run pip install <mypackage>.
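Putting this together, the whole sequence might look like the following sketch (<myenv> and <mypackage> are placeholders as above):
conda create --name <myenv>
source activate <myenv>
conda install --name <myenv> pip
which pip                # should now show /my/path/to/anaconda3/envs/<myenv>/bin/pip
pip install <mypackage>  # installs into <myenv>, not root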
New post
Even though in practice you should not use the global Python and global environment, for the purposes of explaining how imports work, we will use them.
This is going to use Linux (Ubuntu) as an example. The general ideas should, however, be platform agnostic.
Motivation: the Python language itself is easy; the surrounding ecosystem of setup, imports, libraries, pip, virtual environments, package managers, open source, dev tools, and making your own libraries is not.
On a clean OS, if you are on Linux or Mac, your system will have a system Python installation. As already mentioned, in practice you should never use it, but we will refer to it for learning purposes.
It is probably not a good idea to follow some of the commands in this tutorial on your own system.
The system Python will be somewhere like /usr/bin/python; this will point to some interpreter directly or be a symbolic link. In my case, there were two Pythons that came with the system, /usr/bin/python2.7 and /usr/bin/python3.5. These are the interpreters.
Generally, what comes below applies to both, but for simplicity we will just refer to the Python 3 one.
My OS came with preinstalled Python libraries. There are two kinds: those from the standard library, and third-party ones (from PyPI, the Python Package Index).
To see all the preinstalled libraries, start the interpreter and run
help('modules')
The standard library ones are found in /usr/lib/python3.5, e.g. in this directory you will see random.py.
The other libraries, the third-party ones, are found in /usr/lib/python3/dist-packages, where you can see folders like requests and bs4.
Using the import statement
You will see that in Python, you just do import random without any file paths or extensions, and it just works: e.g. run /usr/bin/python3.5 and then import random just works, as does import requests.
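For example, a small illustrative snippet (the printed path depends on where requests lives on your system):
import random
import requests

print(random.randint(1, 10))  # no file path or extension needed
print(requests.__file__)      # shows where the import was found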
But how? The key is the module search path (MSP).
This is set each time you start an interactive Python session or run a script.
The exact details of how this is set vary, but basically there are a few ordered key components:
- the home path (if you are running a script, it is the directory containing the script; if you are in an interactive shell, it is the current working directory)
- the environment variable PYTHONPATH
- the standard library folder
- the place where your pip install downloads go (more on this later)
- the third-party library folder
You can see the MSP with
/usr/bin/python3.5 -m site
Run from /home/jim, and with certain less relevant elements removed, this gives
sys.path = [
'/home/jim', # changes if you change the dir from where you ran cmd
'/usr/lib/python3.5', # std lib preinstalled
'/usr/local/lib/python3.5/dist-packages', # where your pip installs go
'/usr/lib/python3/dist-packages', # third party lib preinstalled
]
You can also play with sys.path; it is just a list, e.g. you can do
import sys
sys.path = []
import random # ImportError: No module named 'random'
but don’t worry: next time you start a session or run a script, the value of sys.path will go back to its default.
So the libraries that come with the system, whether they are standard library or third-party, just work.
But as you do more Python, you will inevitably want to import your own files, and import other third party libraries.
First, let’s just talk about your own files.
File imports vs package imports
File imports
import_examples/
└── dir0
├── a.py
├── b.py
└── dir1
└── c.py
with the contents as follows
# a.py
import b
print(b.x)
# b.py
x = 'hello'
# c.py
y = 'bye'
Running /usr/bin/python3.5 import_examples/dir0/a.py works, outputting
hello
because when we do import b in a.py, the first element of the MSP is the home path, which is the directory containing the script being run, i.e. /path/to/import_examples/dir0.
So Python looks in that directory, sees that there is a b.py and imports it.
What if, instead, we did this in a.py:
import b
import c
print(b.x)
print(c.y)
you will get this error
Traceback (most recent call last):
File "import_examples/dir0/a.py", line 2, in <module>
import c
ImportError: No module named 'c'
which makes sense: Python goes through the MSP looking for c.py, and as c.py is in /path/to/import_examples/dir0/dir1, which is not in the MSP, it errors.
How to get round this? This is a very simple thing we want to do. You can add /path/to/import_examples/dir0/dir1 to your PYTHONPATH environment variable, or add it to sys.path in your script:
import b
import sys
sys.path.append('/path/to/import_examples/dir0/dir1')
import c
print(b.x)
print(c.y)
and now it works.
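Alternatively, if you keep the original a.py (without the sys.path.append line), the PYTHONPATH route would look something like this, using the same hypothetical paths:
export PYTHONPATH="/path/to/import_examples/dir0/dir1"
/usr/bin/python3.5 import_examples/dir0/a.py  # import c now resolves via PYTHONPATH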
Upshot: you won’t get very far with file imports without fiddling around a lot with PYTHONPATH or your sys.path directly. I would not recommend this approach. The better solution is package imports.
Package imports
A package in this context (we’ll see a different definition later on) is a directory of Python files. Before, we were importing a single file. Now we are going to import a directory. In order to do this, each directory under the parent directory (and the parent directory itself) needs to have an __init__.py file in it. dir0 is our parent directory containing our Python code, so we place these files in dir0 and dir1.
import_examples/
└── dir0
├── a.py
├── b.py
├── dir1
│ ├── c.py
│ └── __init__.py
└── __init__.py
and with our new a.py
import b
import dir1.c
print(b.x)
print(dir1.c.y)
it works. This is known as an absolute package import: we start from the home path, /path/to/import_examples/dir0 (the directory containing a.py), which is in the MSP, and to this we append dir1.c, which maps to dir1/c.py, giving /path/to/import_examples/dir0/dir1/c.py. That is why it works.
Now let’s look at relative package imports. Suppose we add a new file d.py and change c.py so that
import_examples/
└── dir0
├── a.py
├── b.py
├── dir1
│ ├── c.py
│ ├── d.py
│ └── __init__.py
└── __init__.py
and
# d.py
z = 'ciao'
# c.py
from . import d
y = d.z
which, when running /usr/bin/python3.5 import_examples/dir0/a.py, gives
hello
ciao
from . import d means we no longer use the usual MSP. Instead, whenever an import starts with from followed by a ., it is called a relative package import: the . refers to the package (directory) containing the importing file, here dir1.
The absolute package import version of the above is
# c.py
import dir1.d
y = dir1.d.z
What if we moved d.py into another directory dir2, i.e.
import_examples/
└── dir0
├── a.py
├── b.py
├── dir1
│ ├── c.py
│ ├── dir2
│ │ ├── d.py
│ │ └── __init__.py
│ └── __init__.py
└── __init__.py
Now, we update c.py so that
# c.py
from .dir2 import d
y = d.z
or the absolute form
# c.py
import dir1.dir2.d
y = dir1.dir2.d.z
Attention! You might think that for the relative package import above,
# c.py
from .dir2 import d
y = d.z
we could replace it with
# c.py
from . import dir2.d
y = d.z
or
# c.py
from . import dir2
y = dir2.d.z
both of which don’t work, because the dir2 object does not have an attribute d (the dotted chaining only works on directories, e.g. dir1.dir2; it does not extend to files).
But it is a simple fix, by adding a line to __init__.py in dir2,
# dir2/__init__.py
from . import d
but
# c.py
from . import dir2.d
y = d.z
does not work
from . import dir2.d
^
SyntaxError: invalid syntax
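With that line in dir2/__init__.py, the from . import dir2 version of c.py from earlier now does work:
# c.py
from . import dir2
y = dir2.d.z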
With the __init__.py in place, the absolute package import also still works just as before:
# c.py
import dir1.dir2.d
y = dir1.dir2.d.z
However, what might be surprising is that if we take a.py and try to do a relative package import like the one we used in c.py,
# c.py
from .dir2 import d
y = d.z
i.e.
# a.py
import b
from .dir1 import c
print(b.x)
print(c.y)
it won’t work
File "import_examples/dir0/a.py", line 3, in <module>
from .dir1 import c
SystemError: Parent module '' not loaded, cannot perform relative import
This is because a.py is being run as a top-level script, so it is not itself treated as part of a package, and relative imports only work from within a package; see https://stackoverflow.com/questions/33837717/systemerror-parent-module-not-loaded-cannot-perform-relative-import for more detail.
package imports: absolute vs relative imports
You might have noticed that random.py is a file, but in /usr/lib/python3/dist-packages, although some entries were .py files, most were either directories or files/directories ending in .egg-info. We will ignore this for now; just accept that Python’s import works on both types, files and directories.
Software distribution package vs Python package