Changes between Version 2 and Version 3 of YumBenchmarks

Author:
james (IP: 65.172.155.230)
Timestamp:
06/04/10 21:11:21 (8 years ago)
Comment:

Add explanations of overhead

  • YumBenchmarks

    v2 v3  
    4848 
    4949 
    50 == Comparison between latest, and rpm == 
     50== Comparison between latest "yum list", repoquery --installed and rpm == 
     51 
     52Yum is never going to be **faster** than rpm, because we'll always be doing a bit more work. We can try to make the difference as small as possible though. This information tries to show how close we are, and where the extra time goes. 
     53 
     54We don't want to measure the time the terminal (gnome-terminal etc.) spends rendering output, so for -qa we'll pipe everything to "wc -l". 
     55 
     56=== rpm === 
     57 
     58Most people look at the obvious: 
     59 
     60{{{ 
     61% time rpm -qa | wc -l 
     621886 
     63rpm -qa  2.16s user 0.10s system 96% cpu 2.337 total 
     64% time rpm -q yum 
     65yum-0:3.2.27-13.fc14.noarch 
     66rpm -q yum  0.01s user 0.02s system 92% cpu 0.034 total 
     67}}} 
     68 
     69...and assume that rpm is broken when implementing -qa. The truth is that rpm verifies digests and signatures by default, which is a lot of unnecessary checking for a simple list of packages. Turning that off we get: 
     70 
     71{{{ 
     72% time rpm --nodigest --nosignature -qa | wc -l 
     731886 
     74rpm --nodigest --nosignature -qa  0.25s user 0.09s system 93% cpu 0.362 total 
     75}}} 
     76 
     77...which is as good as yum is ever going to get (-q is not significantly affected). 
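The same kind of measurement can be scripted instead of read off the shell's `time` output. This is a minimal sketch, assuming Python 3; `time_command` is a hypothetical helper, not part of rpm or yum. It discards the command's stdout so that terminal rendering is not measured, which is the role "wc -l" plays above, and it reports the best of a few runs, which is a choice made here and not something the numbers on this page do.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` runs of argv."""
    best = None
    for _ in range(repeats):
        start = time.perf_counter()
        # stdout is thrown away so we time the command, not the terminal.
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; on an rpm-based system
    # one might pass ["rpm", "--nodigest", "--nosignature", "-qa"] instead.
    demo = [sys.executable, "-c", "pass"]
    print("%.3fs  %s" % (time_command(demo), " ".join(demo)))
```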
     78 
     79=== rpm-python === 
     80 
     81Obviously running python instead of C has **some** overhead, as do the conversions needed for the rpm-python bindings, but it isn't that big: 
     82 
     83{{{ 
     84% cat /tmp/r-p.py 
     85#! /usr/bin/python -tt 
     86 
     87import rpm 
     88import sys 
     89 
     90ts = rpm.TransactionSet() 
     91ts.setVSFlags((rpm._RPMVSF_NOSIGNATURES|rpm._RPMVSF_NODIGESTS)) 
     92 
     93if len(sys.argv) > 1: 
     94    mi = ts.dbMatch('name', sys.argv[1]) 
     95else: 
     96    mi = ts.dbMatch() 
     97 
     98for hdr in mi: 
     99    print '%s-%s:%s-%s.%s' % (hdr['name'], 
     100                              hdr['epochnum'], hdr['version'], hdr['release'], 
     101                              hdr['arch']) 
     102% time python /tmp/r-p.py | wc -l 
     1031886 
     104python /tmp/r-p.py  0.24s user 0.10s system 94% cpu 0.357 total 
     105% time python /tmp/r-p.py yum 
     106yum-0:3.2.27-13.fc14.noarch 
     107python /tmp/r-p.py yum  0.04s user 0.03s system 95% cpu 0.070 total 
     108}}} 
     109 
     110...as you can see, listing every package (the -qa case) is actually fractionally faster from the bindings (0.357s against 0.362s total). 
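The script above is Python 2 (it uses the `print` statement). Below is a small sketch of the same formatting step in Python 3 syntax, split into a function so it can be tried without an rpmdb. `format_nevra` is a hypothetical helper name, and a plain dict stands in for the rpm header.

```python
def format_nevra(hdr):
    """Build 'name-epoch:version-release.arch' from a mapping of header tags."""
    return "%s-%s:%s-%s.%s" % (
        hdr["name"], hdr["epochnum"], hdr["version"], hdr["release"], hdr["arch"])


# A plain dict stands in for an rpm header; the values are taken from the
# "yum" output shown above.
sample = {"name": "yum", "epochnum": 0, "version": "3.2.27",
          "release": "13.fc14", "arch": "noarch"}
print(format_nevra(sample))  # yum-0:3.2.27-13.fc14.noarch
```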
     111 
     112=== Somewhat more realistic rpm-python === 
     113 
     114Obviously the above rpm-python code isn't really comparable with yum, as yum is an API and you can do many things with it. We can get somewhat closer by creating a python "class" object for each package in the rpmdb, sorting them all, filtering out the gpg-pubkey entries and then printing them: 
     115 
     116{{{ 
     117% cat /tmp/r-p-api.py 
     118#! /usr/bin/python -tt 
     119 
     120import rpm 
     121import sys 
     122 
     123ts = rpm.TransactionSet() 
     124ts.setVSFlags((rpm._RPMVSF_NOSIGNATURES|rpm._RPMVSF_NODIGESTS)) 
     125 
     126if len(sys.argv) > 1: 
     127    mi = ts.dbMatch('name', sys.argv[1]) 
     128else: 
     129    mi = ts.dbMatch() 
     130 
     131class NPkg: 
     132    def __init__(self, hdr): 
     133        for key in ('name', 'epochnum', 'version', 'release', 'arch'): 
     134            setattr(self, key, hdr[key]) 
     135    def __str__(self): 
     136        return '%s-%s:%s-%s.%s' % (self.name, 
     137                                   self.epochnum, self.version, 
     138                                   self.release, 
     139                                   self.arch) 
     140    def __cmp__(self, other): 
     141        return cmp(self.name, other.name) 
     142 
     143for pkg in sorted([NPkg(hdr) for hdr in mi]): 
     144    if pkg.name == 'gpg-pubkey': 
     145        continue 
     146    print pkg 
     147% time python /tmp/r-p-api.py | wc -l 
     1481876 
     149python /tmp/r-p-api.py  0.31s user 0.08s system 96% cpu 0.410 total 
     150% time python /tmp/r-p-api.py yum 
     151yum-0:3.2.27-13.fc14.noarch 
     152python /tmp/r-p-api.py yum  0.03s user 0.03s system 95% cpu 0.069 total 
     153}}} 
     154 
     155...as you can see, we are still very close to direct rpm time and we are starting to "do something useful". 
     156 
     157=== yum API === 
     158 
     159Yum provides a non-trivial API, which is why so many programs use it. However, that causes some overhead with current python implementations: 
     160 
     161{{{ 
     162% time sudo python -c 'import yum' 
     163sudo python -c 'import yum'  0.12s user 0.13s system 95% cpu 0.262 total 
     164% time sudo python -c 'import yum; yum.YumBase()' 
     165sudo python -c 'import yum; yum.YumBase()'  0.12s user 0.12s system 93% cpu 0.265 total 
     166% time sudo python -c 'import yum; yum.YumBase().conf' 
     167Loaded plugins: presto 
     168sudo python -c 'import yum; yum.YumBase().conf'  0.17s user 0.15s system 97% cpu 0.328 total 
     169}}} 
     170 
     171...the first version does **nothing** but find all the modules for the yum API. This is roughly equivalent to linking when compiling C (where it's then basically free at runtime), and is probably the main thing most people are thinking of when they say python is slow. The last version is the minimal overhead for using the yum API, as it creates a single object and loads the main yum.conf configuration. 
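The cost of the import step can also be estimated from inside Python. This is a rough sketch, assuming Python 3; `import_seconds` is a hypothetical helper, and `json` is only a stand-in module so the example runs on machines without yum installed. Each measurement starts a fresh interpreter, because a module that is already in `sys.modules` imports almost for free.

```python
import subprocess
import sys
import time


def import_seconds(module):
    """Estimate the wall-clock cost of importing `module`.

    Runs a fresh interpreter with and without the import and returns the
    difference. The result is noisy and can even come out slightly
    negative for very cheap modules.
    """
    def run(code):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        return time.perf_counter() - start

    baseline = run("pass")
    with_import = run("import %s" % module)
    return with_import - baseline


if __name__ == "__main__":
    # "json" is a stand-in; substitute "yum" where it is installed.
    print("import json: %.3fs" % import_seconds("json"))
```

On Python 3.7 and later, `python -X importtime -c 'import json'` gives a per-module breakdown without any helper code.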
     172 
     173So if we take the r-p-api code and import yum, and use yum.packages.YumInstalledPackage() in place of NPkg() we get: 
     174 
     175{{{ 
     176% time ./rpm-python-list.py | wc -l 
     1771876 
     178./rpm-python-list.py  0.52s user 0.28s system 96% cpu 0.830 total 
     179wc -l  0.00s user 0.00s system 0% cpu 0.829 total 
     180% time ./rpm-python-list.py yum 
     181yum-3.2.27-13.fc14.noarch 
     182./rpm-python-list.py yum  0.12s user 0.12s system 89% cpu 0.267 total 
     183}}} 
     184 
     185...however not all of the package data is stored in rpm now, so let's set up yumdb and get "from_repo" for the packages: 
     186 
     187{{{ 
     188% time ./rpm-python-list.py | wc -l 
     1891876 
     190./rpm-python-list.py  0.68s user 0.47s system 97% cpu 1.184 total 
     191% time ./rpm-python-list.py yum 
     192yum-3.2.27-13.fc14.noarch @rawhide 
     193./rpm-python-list.py yum  0.12s user 0.12s system 90% cpu 0.271 total 
     194}}} 
     195 
     196...combining those gives us: 
     197 
     198{{{ 
     199% time sudo python -c 'import sys,yum; list(sys.stdout.write("%s\n" % pkg) \ 
     200                                            for pkg in sorted(yum.YumBase().rpmdb.returnPackages()))' | wc -l 
     2011877 
     202sudo python -c   0.69s user 0.24s system 96% cpu 0.964 total 
     203% time sudo python -c 'import sys,yum; list(sys.stdout.write("%s %s\n" % (pkg, pkg.ui_from_repo)) \ 
     204                                            for pkg in sorted(yum.YumBase().rpmdb.returnPackages()))' | wc -l 
     2051877 
     206sudo python -c   0.83s user 0.61s system 97% cpu 1.469 total 
     207}}} 
     208 
     209...this is basically the lower bound for yum API users. 
     210 
     211=== repoquery === 
     212 
     213repoquery is a real command, and has its own argument parsing etc. So while the overhead is not noise, it's not terrible: 
     214 
     215{{{ 
     216% time sudo repoquery --installed -qa | wc -l 
     2171876 
     218sudo repoquery --installed -qa  0.91s user 0.23s system 98% cpu 1.147 total 
     219% time sudo repoquery --installed -qa --qf '%{nevra} %{ui_from_repo}' | wc -l 
     2201876 
     221sudo repoquery --installed -qa --qf '%{nevra} %{ui_from_repo}'  1.16s user 0.60s system 98% cpu 1.795 total 
     222% time sudo repoquery --installed -q yum          
     223yum-0:3.2.27-13.fc14.noarch 
     224sudo repoquery --installed -q yum  0.25s user 0.15s system 97% cpu 0.417 total 
     225}}} 
     226 
     227=== yum === 
     228 
     229yum does a **lot** more work, including nicer layout and looking to see if packages have updates etc. This affects huge output like "yum list installed" a lot, so there is significant overhead over plain repoquery: 
     230 
     231{{{ 
     232% time sudo yum list installed | wc -l 
     2331892 
     234sudo yum list installed  1.04s user 0.61s system 97% cpu 1.696 total 
     235wc -l  0.00s user 0.00s system 0% cpu 1.695 total 
     236% time sudo yum list installed yum 
     237Loaded plugins: aliases, keys, noop, presto, security, tmprepo, ttysz, verify 
     238Installed Packages 
     239yum.noarch                        3.2.27-13.fc14                        @rawhide 
     240sudo yum list installed yum  0.34s user 0.28s system 97% cpu 0.642 total 
     241% time sudo yum list installed yum --color=off 
     242Loaded plugins: aliases, keys, noop, presto, security, tmprepo, ttysz, verify 
     243Installed Packages 
     244yum.noarch                        3.2.27-13.fc14                        @rawhide 
     245sudo yum list installed yum --color=off  0.18s user 0.17s system 96% cpu 0.361 total 
     246}}} 
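The wall-clock totals for the "list everything" runs quoted in the sections above can be put side by side. The sketch below, assuming Python 3, parses the `time` lines as they appear on this page and expresses each total as a multiple of the `rpm --nodigest --nosignature -qa` run. `parse_timings` and `relative_to` are hypothetical helper names, and the regular expression only targets the output format seen here.

```python
import re

# `time` lines copied verbatim from the sections above.
TIMINGS = """\
rpm -qa  2.16s user 0.10s system 96% cpu 2.337 total
rpm --nodigest --nosignature -qa  0.25s user 0.09s system 93% cpu 0.362 total
python /tmp/r-p.py  0.24s user 0.10s system 94% cpu 0.357 total
python /tmp/r-p-api.py  0.31s user 0.08s system 96% cpu 0.410 total
sudo repoquery --installed -qa  0.91s user 0.23s system 98% cpu 1.147 total
sudo yum list installed  1.04s user 0.61s system 97% cpu 1.696 total
"""

LINE = re.compile(
    r"^(?P<cmd>.+?)\s+(?P<user>[\d.]+)s user (?P<system>[\d.]+)s system "
    r"(?P<cpu>\d+)% cpu (?P<total>[\d.]+) total$")


def parse_timings(text):
    """Return [(command, total_seconds), ...] for each recognised line."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            rows.append((match.group("cmd"), float(match.group("total"))))
    return rows


def relative_to(rows, baseline_cmd):
    """Return [(command, total, total / baseline_total), ...]."""
    base = dict(rows)[baseline_cmd]
    return [(cmd, total, total / base) for cmd, total in rows]


if __name__ == "__main__":
    rows = parse_timings(TIMINGS)
    for cmd, total, ratio in relative_to(rows, "rpm --nodigest --nosignature -qa"):
        print("%-36s %6.3fs %6.2fx" % (cmd, total, ratio))
```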
     247 
     248== summary ==