Performance for yum

This page documents benchmark results for yum over various releases, along with some comparisons to rpm and related tools. There is no comparison to other package managers, because that's "much harder" and not that useful.

Quick summary for all versions

Note that these were measured at somewhat random times, and features have been added along the way, so the releases aren't doing exactly the same work and the numbers aren't 1-1 comparable. Changes in sqlite and python have also made a difference, as has the size of the repo metadata (which grows as Fedora gets bigger). The same machine did not run 3.0.1 and 3.2.27+, so there are some hardware differences (although that doesn't make a big difference for these operations). Finally, the "big lie" with benchmarks like this, for all packaging systems, is that we are operating on at least 200MB of data on disk, so if any IO has to be done (first boot, or the data just isn't in the page cache) the numbers are drastically different.

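| op | 3.0.1 (2006-11-01) | 3.2.1 (2007-06-21) | 3.2.8+ (2007-12-03) | 3.2.16+ (2008-02-27) | 3.2.24+ (2009-09-28) | 3.2.27+ (2010-06-04) |
| --- | --- | --- | --- | --- | --- | --- |
| noop | 0.3 | 0.4 | 4.5 | 0.9 | 0.6 | 0.4 |
| noop-ts | n/a | n/a | n/a | n/a | 0.6 | 0.6 |
| list ustr | 20.4 | 5.8 | 6.3 | 1.8 | 1 | 0.6 |
| search ustr | 23.3+-7 | 4.3 | 5.3 | 1.8 | 1.3 | 1.2 |
| search python | 18.2+-2 | 9.8 | 9.2 | 2.3 | 2.5 | 2.5 |
| list python\* | 28 +-6 | 6 | 6.4 | 1.9 | 1.4 | 1.1 |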
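
Entries such as "23.3+-7" read as a time plus or minus a spread over repeated runs. As a rough sketch of how numbers like that could be collected (this is not the harness used for the table above; the `time_command` helper and its defaults are hypothetical), the following runs a command several times with its output discarded and reports the mean and standard deviation of the wall-clock time:

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times; return (mean, stdev) of wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal rendering is not part of the measurement.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        samples.append(time.perf_counter() - start)
    # Note: the first sample may include cold-cache IO, which inflates the spread.
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), spread


# Stand-in command; replace with e.g. ["yum", "list", "ustr"] on a yum system.
mean, spread = time_command([sys.executable, "-c", "pass"], runs=3)
print("%.1f +- %.1f" % (mean, spread))
```

Because the data may or may not be in the page cache on the first run, it can be worth reporting the first sample separately from the rest.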

Comparison between latest "yum list", repoquery --installed and rpm

Yum is never going to be faster than rpm, because we'll always be doing a bit more work. We can try to make the difference as small as possible though. This information tries to show how close we are, and where the extra time goes.

We don't want to measure gnome-terminal etc. time, so for -qa we'll pipe everything to "wc -l".

rpm

Most people look at the obvious:

% time rpm -qa | wc -l
1886
rpm -qa  2.16s user 0.10s system 96% cpu 2.337 total
% time rpm -q yum
yum-0:3.2.27-13.fc14.noarch
rpm -q yum  0.01s user 0.02s system 92% cpu 0.034 total

...and assume that rpm's implementation of -qa is broken. The truth is that rpm does a lot of checking that is unnecessary for a simple list of packages. Turning that off we get:

% time rpm --nodigest --nosignature -qa | wc -l
1886
rpm --nodigest --nosignature -qa  0.25s user 0.09s system 93% cpu 0.362 total

...which is as good as yum is ever going to get (-q is not significantly affected).

rpm-python

Obviously running python instead of C has some overhead, as do the conversions needed for the rpm-python bindings. But they aren't that big:

% cat /tmp/r-p.py
#! /usr/bin/python -tt

import rpm
import sys

ts = rpm.TransactionSet()
ts.setVSFlags((rpm._RPMVSF_NOSIGNATURES|rpm._RPMVSF_NODIGESTS))

if len(sys.argv) > 1:
    mi = ts.dbMatch('name', sys.argv[1])
else:
    mi = ts.dbMatch()

for hdr in mi:
    print '%s-%s:%s-%s.%s' % (hdr['name'],
                              hdr['epochnum'], hdr['version'], hdr['release'],
                              hdr['arch'])
% time python /tmp/r-p.py | wc -l
1886
python /tmp/r-p.py  0.24s user 0.10s system 94% cpu 0.357 total
% time python /tmp/r-p.py yum
yum-0:3.2.27-13.fc14.noarch
python /tmp/r-p.py yum  0.04s user 0.03s system 95% cpu 0.070 total

...as you can see, running -qa is actually fractionally faster from the bindings.

Somewhat more realistic rpm-python

Obviously the above rpm-python code isn't really comparable with yum, as yum is an API and you can do many things with it. We can get somewhat closer by creating python "class" objects for each package in the rpmdb, sorting them all, filtering out the gpgkeys, and then printing them:

% cat /tmp/r-p-api.py
#! /usr/bin/python -tt

import rpm
import sys

ts = rpm.TransactionSet()
ts.setVSFlags((rpm._RPMVSF_NOSIGNATURES|rpm._RPMVSF_NODIGESTS))

if len(sys.argv) > 1:
    mi = ts.dbMatch('name', sys.argv[1])
else:
    mi = ts.dbMatch()

class NPkg:
    def __init__(self, hdr):
        for key in ('name', 'epochnum', 'version', 'release', 'arch'):
            setattr(self, key, hdr[key])
    def __str__(self):
        return '%s-%s:%s-%s.%s' % (self.name,
                                   self.epochnum, self.version,
                                   self.release,
                                   self.arch)
    def __cmp__(self, other):
        return cmp(self.name, other.name)

for pkg in sorted([NPkg(hdr) for hdr in mi]):
    if pkg.name == 'gpg-pubkey':
        continue
    print pkg
% time python /tmp/r-p-api.py | wc -l
1876
python /tmp/r-p-api.py  0.31s user 0.08s system 96% cpu 0.410 total
% time python /tmp/r-p-api.py yum
yum-0:3.2.27-13.fc14.noarch
python /tmp/r-p-api.py yum  0.03s user 0.03s system 95% cpu 0.069 total

...as you can see, we are still very close to direct rpm time and we are starting to "do something useful".
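
The NPkg class above relies on `__cmp__`, `cmp()` and the `print` statement, which only exist in Python 2. As a hedged sketch of the same idea for Python 3, the version below orders packages with `functools.total_ordering` and `__lt__`. So that it runs without the rpm bindings, it uses plain dicts as stand-ins for rpm headers; the `list_packages` helper and the sample package data are invented for illustration.

```python
import functools


@functools.total_ordering
class NPkg:
    """Minimal package object built from a header-like mapping."""

    KEYS = ('name', 'epochnum', 'version', 'release', 'arch')

    def __init__(self, hdr):
        for key in self.KEYS:
            setattr(self, key, hdr[key])

    def __str__(self):
        return '%s-%s:%s-%s.%s' % (self.name, self.epochnum, self.version,
                                   self.release, self.arch)

    # Python 3 has no __cmp__; total_ordering derives the other
    # comparison methods from __eq__ and __lt__.
    def __eq__(self, other):
        return self.name == other.name

    def __lt__(self, other):
        return self.name < other.name


def list_packages(headers):
    """Sort packages by name and drop the gpg-pubkey pseudo-packages."""
    return [str(pkg) for pkg in sorted(NPkg(hdr) for hdr in headers)
            if pkg.name != 'gpg-pubkey']


# Invented stand-ins for the headers that ts.dbMatch() would yield.
fake_headers = [
    {'name': 'yum', 'epochnum': 0, 'version': '3.2.27',
     'release': '13.fc14', 'arch': 'noarch'},
    {'name': 'gpg-pubkey', 'epochnum': 0, 'version': 'deadbeef',
     'release': '4c000000', 'arch': None},
    {'name': 'bash', 'epochnum': 0, 'version': '4.1.7',
     'release': '3.fc14', 'arch': 'x86_64'},
]

for line in list_packages(fake_headers):
    print(line)
```

Sorting with `sorted(pkgs, key=lambda p: p.name)` would avoid the comparison methods entirely; the class-based form is kept here only to mirror the original script.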

yum API

Yum provides a non-trivial API, which is why so many programs use it. However, that causes some overhead with current python implementations:

% time sudo python -c 'import yum'
sudo python -c 'import yum'  0.12s user 0.13s system 95% cpu 0.262 total
% time sudo python -c 'import yum; yum.YumBase()'
sudo python -c 'import yum; yum.YumBase()'  0.12s user 0.12s system 93% cpu 0.265 total
% time sudo python -c 'import yum; yum.YumBase().conf'
Loaded plugins: presto
sudo python -c 'import yum; yum.YumBase().conf'  0.17s user 0.15s system 97% cpu 0.328 total

...the first version does **nothing** but find all the modules for the yum API. This is roughly equivalent to linking when compiling C (where it's then basically free at runtime), and is probably the main thing most people are thinking of when they say python is slow. The last version is the minimal overhead for using the yum API, as it creates a single object and loads the main yum.conf configuration.
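
The import cost can be separated from plain interpreter startup by timing a fresh interpreter with and without the import. A minimal sketch, assuming Python 3 and using the standard-library `json` module as a stand-in for `yum` (which may not be installed); the `run_seconds` and `import_cost` helpers are hypothetical:

```python
import subprocess
import sys
import time


def run_seconds(code):
    """Wall-clock seconds to run `code` in a fresh interpreter."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", code], check=True)
    return time.perf_counter() - start


def import_cost(module, runs=5):
    """Best-of-`runs` time for importing `module`, minus bare startup."""
    baseline = min(run_seconds("pass") for _ in range(runs))
    with_import = min(run_seconds("import " + module) for _ in range(runs))
    # Can come out slightly negative for cheap modules, due to timing noise.
    return with_import - baseline


print("import json: %.3fs over bare startup" % import_cost("json"))
```

On Python 3.7 and later, `python -X importtime -c 'import json'` gives a per-module breakdown of the same cost.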

So if we take the r-p-api code and import yum, and use yum.packages.YumInstalledPackage() in place of NPkg() we get:

% time ./rpm-python-list.py | wc -l
1876
./rpm-python-list.py  0.52s user 0.28s system 96% cpu 0.830 total
wc -l  0.00s user 0.00s system 0% cpu 0.829 total
% time ./rpm-python-list.py yum
yum-3.2.27-13.fc14.noarch
./rpm-python-list.py yum  0.12s user 0.12s system 89% cpu 0.267 total

...however not all of the package data is stored in rpm now, so let's set up yumdb and get "from_repo" for the packages:

% time ./rpm-python-list.py | wc -l
1876
./rpm-python-list.py  0.68s user 0.47s system 97% cpu 1.184 total
% time ./rpm-python-list.py yum
yum-3.2.27-13.fc14.noarch @rawhide
./rpm-python-list.py yum  0.12s user 0.12s system 90% cpu 0.271 total

...combining those gives us:

% time sudo python -c 'import sys,yum; list(sys.stdout.write("%s\n" % pkg) \
                                            for pkg in sorted(yum.YumBase().rpmdb.returnPackages()))' | wc -l
1877
sudo python -c   0.69s user 0.24s system 96% cpu 0.964 total
% time sudo python -c 'import sys,yum; list(sys.stdout.write("%s\n" % pkg) \
                                            for pkg in sorted(yum.YumBase().rpmdb.returnPackages(patterns=["yum"])))'
Loaded plugins: presto
yum-3.2.27-13.fc14.noarch
sudo python -c   0.18s user 0.14s system 95% cpu 0.335 total
% time sudo python -c 'import sys,yum; list(sys.stdout.write("%s %s\n" % (pkg, pkg.ui_from_repo)) \
                                            for pkg in sorted(yum.YumBase().rpmdb.returnPackages()))' | wc -l
1877
sudo python -c   0.83s user 0.61s system 97% cpu 1.469 total
% time sudo python -c 'import sys,yum; list(sys.stdout.write("%s %s\n" % (pkg, pkg.ui_from_repo)) \
                                            for pkg in sorted(yum.YumBase().rpmdb.returnPackages(patterns=["yum"])))'
Loaded plugins: presto
yum-3.2.27-13.fc14.noarch @rawhide
sudo python -c   0.16s user 0.16s system 94% cpu 0.337 total

...this is basically the lower limit for yum API users.

repoquery

repoquery is a real command, and has its own argument parsing etc. So while the overhead is not noise, it's not terrible:

% time sudo repoquery --installed -qa | wc -l
1876
sudo repoquery --installed -qa  0.91s user 0.23s system 98% cpu 1.147 total
% time sudo repoquery --installed -qa --qf '%{nevra} %{ui_from_repo}' | wc -l
1876
sudo repoquery --installed -qa --qf '%{nevra} %{ui_from_repo}'  1.16s user 0.60s system 98% cpu 1.795 total
% time sudo repoquery --installed -q yum
yum-0:3.2.27-13.fc14.noarch
sudo repoquery --installed -q yum  0.25s user 0.15s system 97% cpu 0.417 total
% time sudo repoquery --installed -q yum --qf '%{nevra} %{ui_from_repo}'
yum-0:3.2.27-13.fc14.noarch @rawhide
sudo repoquery --installed -q yum --qf '%{nevra} %{ui_from_repo}'  0.25s user 0.15s system 96% cpu 0.421 total

yum

yum does a **lot** more work, including nicer layout and looking to see if packages have updates/etc. This affects huge output like "yum list installed" a lot, so there is significant overhead over plain repoquery:

% time sudo yum list installed | wc -l
1892
sudo yum list installed  1.04s user 0.61s system 97% cpu 1.696 total
wc -l  0.00s user 0.00s system 0% cpu 1.695 total
% time sudo yum list installed yum
Loaded plugins: aliases, keys, noop, presto, security, tmprepo, ttysz, verify
Installed Packages
yum.noarch                        3.2.27-13.fc14                        @rawhide
sudo yum list installed yum  0.34s user 0.28s system 97% cpu 0.642 total
% time sudo yum list installed yum --color=off
Loaded plugins: aliases, keys, noop, presto, security, tmprepo, ttysz, verify
Installed Packages
yum.noarch                        3.2.27-13.fc14                        @rawhide
sudo yum list installed yum --color=off  0.18s user 0.17s system 96% cpu 0.361 total

summary of comparison for -qa

| command for all packages | time |
| --- | --- |
| rpm -qa | 2.337 |
| rpm --nodigest --nosignature -qa | 0.362 |
| minimal-rpm-python | 0.357 |
| basic-rpm-python | 0.410 |
| yum-pkg | 0.830 |
| yum-API | 0.964 |
| yumdb-pkg | 1.184 |
| yum-API+yumdb | 1.469 |
| repoquery | 1.147 |
| repoquery+yumdb | 1.795 |
| yum list installed | 1.696 |
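
Reading these times as overhead relative to the fastest plain rpm run makes the layers easier to compare. The sketch below just redoes that arithmetic on the numbers above; the choice of `rpm --nodigest --nosignature -qa` as the baseline and the `relative_to` helper are ours, for illustration only.

```python
# Wall-clock seconds copied from the -qa summary above.
timings = [
    ("rpm -qa", 2.337),
    ("rpm --nodigest --nosignature -qa", 0.362),
    ("minimal-rpm-python", 0.357),
    ("basic-rpm-python", 0.410),
    ("yum-pkg", 0.830),
    ("yum-API", 0.964),
    ("yumdb-pkg", 1.184),
    ("yum-API+yumdb", 1.469),
    ("repoquery", 1.147),
    ("repoquery+yumdb", 1.795),
    ("yum list installed", 1.696),
]


def relative_to(baseline_name, rows):
    """Return (name, extra seconds, ratio) for each row against the baseline."""
    base = dict(rows)[baseline_name]
    return [(name, secs - base, secs / base) for name, secs in rows]


for name, extra, ratio in relative_to("rpm --nodigest --nosignature -qa",
                                      timings):
    print("%-34s %+.3fs  %.1fx" % (name, extra, ratio))
```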

summary of comparison for -q yum

| command to query yum (+single yumdb info) | time |
| --- | --- |
| rpm -q yum | 0.034 |
| minimal rpm-python yum | 0.070 |
| basic-rpm-python yum | 0.069 |
| yum-pkg yum | 0.267 |
| yumdb-pkg yum | 0.271 |
| yum-API yum | 0.335 |
| yum-API+yumdb yum | 0.337 |
| yum list installed yum --color=off | 0.361 |
| repoquery yum | 0.417 |
| repoquery yum+yumdb | 0.421 |
| yum list installed yum | 0.642 |

Comparison between "yum install/remove", rpm

For a long time yum had been "much worse" than rpm for install/remove commands. Some of that has been addressed by general fixes, but these commands have also had less attention overall, because any difference in speed is only really noticeable with small numbers of small packages (otherwise the I/O from the install/remove vastly outweighs any other part of the operation).

And again, yum has to do a non-trivial amount more work than plain rpm, as yum keeps a detailed history log and its own yumdb.

These timings were taken on a virtual machine (against the "zint" package: ~250k, runs ldconfig in %post and %postun):

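| command | version | operation | time (seconds) |
| --- | --- | --- | --- |
| rpm | 4.8.1-2 | install | 2.0 |
| rpm | 4.8.1-2 | remove | 1.5 |
| yum | 3.2.25-1 | install | 7.0 |
| yum | 3.2.25-1 | remove | 5.5 |
| yum | 3.2.28-3 | install | 6.5 |
| yum | 3.2.28-3 | remove | 4.0 |
| yum | 3.2.28-8 | install | 4.0 |
| yum | 3.2.28-8 | remove | 3.0 |
| yum | 3.2.28-8 | localinstall | 3.2 |
| yum (no history) | 3.2.28-8 | install | 3.5 |
| yum (no history) | 3.2.28-8 | remove | 2.4 |
| yum (no history) | 3.2.28-8 | localinstall | 2.8 |
| bare-rpm-python-erase.py | 0 | remove | 1.0 |
| echo no \| yum | 3.2.28-3 | install | 2.5 |
| echo no \| yum | 3.2.28-3 | remove | 1.5 |
| echo no \| yum | 3.2.28-8 | install | 1.5 |
| echo no \| yum | 3.2.28-8 | remove | 0.5 |
| echo no \| yum | 3.2.28-8 | localinstall | 0.6 |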

...note that for all the operations except the "echo no" variants, there could be a non-trivial delay due to IO stalls.
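
Since the "echo no" runs stop at the confirmation prompt, subtracting them from the full runs gives a rough idea of how much time is spent in the transaction itself. This is only a back-of-the-envelope sketch on the numbers above, assuming the two kinds of run are otherwise comparable; the `transaction_estimate` helper is hypothetical.

```python
# Seconds from the table above, keyed by (yum version, operation).
full_run = {
    ("3.2.28-3", "install"): 6.5,
    ("3.2.28-3", "remove"): 4.0,
    ("3.2.28-8", "install"): 4.0,
    ("3.2.28-8", "remove"): 3.0,
    ("3.2.28-8", "localinstall"): 3.2,
}
answered_no = {
    ("3.2.28-3", "install"): 2.5,
    ("3.2.28-3", "remove"): 1.5,
    ("3.2.28-8", "install"): 1.5,
    ("3.2.28-8", "remove"): 0.5,
    ("3.2.28-8", "localinstall"): 0.6,
}


def transaction_estimate(key):
    """Full run minus the run that answers 'no' at the prompt."""
    return round(full_run[key] - answered_no[key], 1)


for key in sorted(full_run):
    print("yum %s %-12s ~%.1fs after the prompt"
          % (key[0], key[1], transaction_estimate(key)))
```

The estimates can then be set against the plain rpm times for the same operation (2.0 for install, 1.5 for remove), keeping in mind that the full runs are the ones exposed to IO stalls.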