Version 5 (modified by skvidal, 8 years ago)

Ideas for New Repodata

problems and goals

The current metadata has made searching package information much more reasonable. However, as the repos grow in number of pkgs, users are forced to download massive chunks of data that they do not, in fact, NEED.

One of the goals of this discussion is to see if we can strike a happy medium between huge chunks of data and searchability.

filelists - break them up

Break the filelists out by path, so you don't have to download a huge glop of the complete filelists just to find out who owns /foo/baz.

possible ways to break up the files - info from rawhide as of june 2010:

2.4 million files total in pkgs in rawhide
2.3 million of those are in /usr
1.8 million of those are /usr/share
  Top 3 dirs by file count under /usr/share:
   533046 /usr/share/doc
   120555 /usr/share/javadoc
   105591 /usr/share/icons
45 file-requires requiring something in /usr/share
none of those file-requires are in the top 3 /usr/share dirs
    - most of them are fonts.

so in general about 75% of the file entries we download for a filereq check (the 1.8 million under /usr/share, out of 2.4 million total) are ones we'll almost never need.

So if we break the files up by 'top level dir', plus 3 layers deep for /usr/*/*/, then we'll reduce the number of unnecessary file-list downloads by about 75%.
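A rough sketch of the splitting rule, in Python. This is only one possible reading of "top level dir + 3 layers deep": anything under /usr is bucketed by its first three directory components (e.g. /usr/share/doc), everything else by its top-level directory. The names bucket_for and split_filelist are made up for illustration and are not part of yum or createrepo.

```python
from collections import defaultdict


def bucket_for(path, usr_depth=3):
    """Return the directory prefix (bucket) that would hold `path`.

    Assumption: paths under /usr are split `usr_depth` directories deep,
    all other paths are split by their top-level directory only.
    """
    parts = [p for p in path.split("/") if p]
    # Only directory components count toward the depth, not the filename.
    dirs = parts[:-1]
    if not dirs:
        return "/"  # a file sitting directly in the root directory
    depth = usr_depth if dirs[0] == "usr" else 1
    return "/" + "/".join(dirs[:depth])


def split_filelist(paths):
    """Group an iterable of file paths into per-bucket lists."""
    buckets = defaultdict(list)
    for path in paths:
        buckets[bucket_for(path)].append(path)
    return dict(buckets)
```

With this rule a file-requires on a font would only pull the /usr/share/fonts bucket, and the /usr/share/doc, /usr/share/javadoc and /usr/share/icons buckets would stay on the server unless somebody actually asks for a file in them.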

complete repodata per-pkg

  • in a file or a directory of files so I can grab all of a certain kind of metadata for ONLY one pkg


- provide a way to break out summary/description into a structure that supports translations.

potential structure

And some more specific ideas:


repomd.xml <-- same as before - the index for everything else - but making sure not to use any of the existing data type attributes

packagelist.sqlite contains:

           name, arch, epoch-ver-rel, 
           summary, description, 
           size (package, installed), 
           location (baseurl, mirrorlist(optional), href), 
           header byte-ranges?(maybe), 
           location-style-path to per-pkg metadata and checksum.
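To make the field list above a bit more concrete, here is a minimal sketch of what a packagelist.sqlite table might look like, using Python's sqlite3 module. The table name, the column names, the split of epoch-ver-rel into three columns, and the sample row are all assumptions for illustration; nothing here is a decided schema.

```python
import sqlite3

# Hypothetical schema: one row per package, mirroring the field list above.
SCHEMA = """
CREATE TABLE packages (
    name                TEXT NOT NULL,
    arch                TEXT NOT NULL,
    epoch               TEXT,
    version             TEXT,
    release             TEXT,
    summary             TEXT,
    description         TEXT,
    size_package        INTEGER,
    size_installed      INTEGER,
    location_base       TEXT,      -- baseurl
    location_mirrorlist TEXT,      -- optional
    location_href       TEXT NOT NULL,
    header_start        INTEGER,   -- header byte-range, maybe
    header_end          INTEGER,
    metadata_href       TEXT,      -- path to the per-pkg metadata
    metadata_checksum   TEXT
);
"""


def create_packagelist(conn):
    """Create the (hypothetical) packages table on an open connection."""
    conn.executescript(SCHEMA)


def metadata_for(conn, name, arch):
    """Return (metadata_href, metadata_checksum) for one package, or None."""
    cur = conn.execute(
        "SELECT metadata_href, metadata_checksum FROM packages "
        "WHERE name = ? AND arch = ?",
        (name, arch),
    )
    return cur.fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_packagelist(conn)
    # Made-up sample row, only to show the lookup.
    conn.execute(
        "INSERT INTO packages (name, arch, epoch, version, release, "
        "location_href, metadata_href, metadata_checksum) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        ("foo", "noarch", "0", "1.0", "1", "Packages/foo-1.0-1.noarch.rpm",
         "pkgs/foo/metadata.xml", "placeholder-checksum"),
    )
    print(metadata_for(conn, "foo", "noarch"))
```

The point of the last two columns is that a client can go from a package name straight to that one package's metadata, without touching anything else in the repo.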



provides.sqlite <-- provides: providename + flags + evr

requires.sqlite <-- requires: requiresname + flags + evr + prereq

conflicts_obsoletes.sqlite <-- conflicts and obsoletes (name, flags, evr)

files_by_path.xml <-- index file to point to the files-by-path

path_it_holds + filename + checksum, per file
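As a sketch of how a client might use such an index: given a path, pick the entry whose path_it_holds is the longest matching prefix, and download only that file. The XML layout below (a files_by_path root with chunk elements carrying path, filename and checksum attributes) is invented for this example, as are the function names and the sample values; the real format is still undecided.

```python
import xml.etree.ElementTree as ET

# Hypothetical index contents; element and attribute names are assumptions.
SAMPLE_INDEX = """<files_by_path>
  <chunk path="/usr/share/doc" filename="usr-share-doc.xml.gz" checksum="placeholder-1"/>
  <chunk path="/usr/share" filename="usr-share.xml.gz" checksum="placeholder-2"/>
  <chunk path="/etc" filename="etc.xml.gz" checksum="placeholder-3"/>
</files_by_path>"""


def load_index(xml_text):
    """Parse the index into {path_it_holds: (filename, checksum)}."""
    root = ET.fromstring(xml_text)
    return {
        chunk.get("path"): (chunk.get("filename"), chunk.get("checksum"))
        for chunk in root.iter("chunk")
    }


def chunk_for(index, path):
    """Return (filename, checksum) of the most specific chunk holding `path`.

    Returns None when no chunk in the index covers the path.
    """
    best = None
    for prefix in index:
        # Match on whole path components so /etcfoo does not match /etc.
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return index[best] if best is not None else None
```

So a lookup for /usr/share/fonts/foo.ttf would resolve to the /usr/share chunk here, while something under /usr/share/doc would resolve to its own, more specific chunk.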