Version 4 (modified by skvidal, 8 years ago)

Ideas for New Repodata

filelists - break them up

Filelists broken out by path, so you don't have to download a huge glop of the complete filelists just to find out who owns /foo/baz.

Possible ways to break up the files, based on rawhide as of June 2010:

2.4 million files total in pkgs in rawhide
2.3 million of those are in /usr
1.8 million of those are /usr/share
  Top 3 dirs by file count under /usr/share:
   533046 /usr/share/doc
   120555 /usr/share/javadoc
   105591 /usr/share/icons
45 file-requires requiring something in /usr/share
none of those file-requires are in the top 3 /usr/share dirs
    - most of them are fonts.

So in general, about 75% of the file entries we download for a filereq check (everything under /usr/share) are ones we'll almost NEVER need; only 45 file-requires point there at all.
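Where the 75% comes from, using the rounded June 2010 rawhide counts above. The second ratio shows that the top three /usr/share dirs, which no file-require touches, are about a third of all file entries on their own:

```latex
\frac{1.8\times10^{6}}{2.4\times10^{6}} = 0.75
\qquad
\frac{533046 + 120555 + 105591}{2.4\times10^{6}} = \frac{759192}{2.4\times10^{6}} \approx 0.32
```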

So if we break the files up by top-level dir, plus 3 layers deep for /usr (/usr/*/*/), then we'll have reduced the number of unnecessary filelist downloads by about 75%.
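A minimal sketch of that split rule, assuming each filelist chunk is keyed by its directory prefix: the top-level dir for most paths, three levels deep for anything under /usr. The function names, the (package, path) input shape and the sample packages are hypothetical illustrations, not existing yum or createrepo code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

USR_DEPTH = 3  # /usr/*/*/ -> e.g. /usr/share/doc, /usr/share/fonts


def bucket_for(path: str) -> str:
    """Return the chunk (directory prefix) a file path would be filed under."""
    # Keep only directory components; the last component is the file name.
    dirs = [p for p in path.split("/") if p][:-1]
    depth = USR_DEPTH if dirs[:1] == ["usr"] else 1
    return "/" + "/".join(dirs[:depth])


def split_filelists(
    entries: Iterable[Tuple[str, str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (package, path) pairs into per-directory chunks."""
    chunks: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for pkg, path in entries:
        chunks[bucket_for(path)].append((pkg, path))
    return dict(chunks)


files = [
    ("bash", "/bin/bash"),
    ("setup", "/etc/passwd"),
    ("coreutils", "/usr/bin/ls"),
    ("bash", "/usr/share/doc/bash/README"),
    ("dejavu-sans-fonts", "/usr/share/fonts/dejavu/DejaVuSans.ttf"),
]
chunks = split_filelists(files)
print(sorted(chunks))
# ['/bin', '/etc', '/usr/bin', '/usr/share/doc', '/usr/share/fonts']

# A file-requires on a font only needs the /usr/share/fonts chunk.
print(chunks[bucket_for("/usr/share/fonts/dejavu/DejaVuSans.ttf")])
# [('dejavu-sans-fonts', '/usr/share/fonts/dejavu/DejaVuSans.ttf')]
```

The big /usr/share/doc chunk never has to be fetched unless something actually asks for a path inside it.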

complete repodata per-pkg

  • in a file or a directory of files, so I can grab all of a certain kind of metadata for ONLY one pkg
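A sketch of what grabbing one kind of metadata for one package could look like, assuming a hypothetical layout of one directory per package (pkgs/<arch>/<name>/<epoch>-<ver>-<rel>) holding one file per metadata kind, with the expected checksum taken from the package list. A local temporary directory stands in for a mirror; the names pkg_path and read_pkg_metadata are illustrative only.

```python
import hashlib
import tempfile
from pathlib import Path


def pkg_path(name: str, epoch: str, ver: str, rel: str, arch: str) -> str:
    """Hypothetical per-package metadata directory, one per package."""
    return f"pkgs/{arch}/{name}/{epoch}-{ver}-{rel}"


def read_pkg_metadata(repo_root: Path, pkg_dir: str, kind: str, sha256: str) -> bytes:
    """Read one kind of metadata (e.g. 'requires') for ONE package and verify it."""
    data = (repo_root / pkg_dir / f"{kind}.xml").read_bytes()
    if hashlib.sha256(data).hexdigest() != sha256:
        raise ValueError(f"checksum mismatch for {pkg_dir}/{kind}.xml")
    return data


# Usage: a temporary directory stands in for a mirror.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    d = pkg_path("foo", "0", "1.0", "1", "noarch")
    (root / d).mkdir(parents=True)
    payload = b"<requires><entry name='bar'/></requires>"
    (root / d / "requires.xml").write_bytes(payload)
    digest = hashlib.sha256(payload).hexdigest()
    result = read_pkg_metadata(root, d, "requires", digest)

print(result == payload)  # True
```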


- provide a way to break out summary/description into a structure that supports translations.
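One possible shape for translatable summary/description, sketched under the assumption that each (package, locale) pair maps to its own text, with the untranslated text stored under the "C" locale and lookups falling back from lang_TERRITORY to lang to C. The layout and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (package name, locale) -> (summary, description).
# The untranslated text is stored under the "C" locale.
Translations = Dict[Tuple[str, str], Tuple[str, str]]


def locale_fallbacks(locale: str) -> List[str]:
    """'pt_BR.UTF-8' -> ['pt_BR', 'pt', 'C']"""
    base = locale.split(".")[0].split("@")[0]  # drop encoding and modifier
    chain = [base]
    if "_" in base:
        chain.append(base.split("_")[0])  # language without territory
    if "C" not in chain:
        chain.append("C")
    return chain


def localized_text(trans: Translations, pkg: str, locale: str) -> Tuple[str, str]:
    """Return (summary, description) in the best available locale."""
    for loc in locale_fallbacks(locale):
        if (pkg, loc) in trans:
            return trans[(pkg, loc)]
    raise KeyError(pkg)


trans: Translations = {
    ("foo", "C"): ("A foo tool", "Foo does things."),
    ("foo", "pt"): ("Uma ferramenta foo", "Foo faz coisas."),
}
print(localized_text(trans, "foo", "pt_BR.UTF-8")[0])  # Uma ferramenta foo
print(localized_text(trans, "foo", "de_DE")[0])        # A foo tool
```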

potential structure

And some more specific ideas:


repomd.xml <-- same as before (the index for everything else), but making sure not to reuse any of the existing data type attribute values
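A sketch of the "don't reuse existing data types" check, assuming the new repomd.xml keeps the current namespace (http://linux.duke.edu/metadata/repo) and the &lt;data type="..."&gt; element shape. The new type names below are hypothetical, borrowed from the file names in this proposal, and the list of existing types is not exhaustive.

```python
import xml.etree.ElementTree as ET
from typing import List

REPO_NS = "http://linux.duke.edu/metadata/repo"

# Data types already used by existing repomd.xml files (not an exhaustive list).
EXISTING_TYPES = {
    "primary", "filelists", "other",
    "primary_db", "filelists_db", "other_db",
    "group", "group_gz", "updateinfo", "prestodelta",
}


def data_types(repomd_xml: str) -> List[str]:
    """List the type attribute of every <data> element in repomd.xml."""
    root = ET.fromstring(repomd_xml)
    return [d.get("type") for d in root.findall(f"{{{REPO_NS}}}data")]


def check_new_types(repomd_xml: str) -> None:
    """Refuse a new-style repomd.xml that reuses an old data type name."""
    clashes = sorted(set(data_types(repomd_xml)) & EXISTING_TYPES)
    if clashes:
        raise ValueError(f"reuses existing data types: {clashes}")


new_repomd = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="packagelist"><location href="repodata/packagelist.sqlite"/></data>
  <data type="provides"><location href="repodata/provides.sqlite"/></data>
  <data type="files_by_path"><location href="repodata/files_by_path.xml"/></data>
</repomd>"""

check_new_types(new_repomd)  # passes: no name collides with an old type
print(data_types(new_repomd))  # ['packagelist', 'provides', 'files_by_path']
```

Keeping the names disjoint means an old client reading the new repomd.xml simply ignores entries it doesn't recognize instead of misparsing them.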

packagelist.sqlite contains:

           name, arch, epoch-ver-rel,
           summary, description,
           size (package, installed),
           location (baseurl, mirrorlist (optional), href),
           header byte-ranges (maybe),
           location-style path to the per-pkg metadata, plus its checksum.
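A sketch of what packagelist.sqlite could hold, assuming one flat table with one row per package. The table and column names are illustrative, not taken from any existing yum schema; the header byte-range columns are nullable since that field is only a "maybe", and the checksum is a placeholder.

```python
import sqlite3

# Hypothetical packagelist.sqlite schema; column names are illustrative.
SCHEMA = """
CREATE TABLE packages (
    name             TEXT NOT NULL,
    arch             TEXT NOT NULL,
    epoch            TEXT NOT NULL,
    ver              TEXT NOT NULL,
    rel              TEXT NOT NULL,
    summary          TEXT,
    description      TEXT,
    size_package     INTEGER,
    size_installed   INTEGER,
    baseurl          TEXT,
    mirrorlist       TEXT,           -- optional
    href             TEXT NOT NULL,
    header_start     INTEGER,        -- optional header byte-range
    header_end       INTEGER,
    pkgmeta_href     TEXT NOT NULL,  -- location-style path to per-pkg metadata
    pkgmeta_checksum TEXT NOT NULL,
    PRIMARY KEY (name, arch, epoch, ver, rel)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO packages VALUES (" + ",".join("?" * 16) + ")",
    ("foo", "noarch", "0", "1.0", "1",
     "A foo tool", "Foo does things.",
     1024, 4096,
     "http://example.com/repo/", None, "Packages/foo-1.0-1.noarch.rpm",
     None, None,
     "pkgs/noarch/foo/0-1.0-1", "0" * 64),  # checksum is a placeholder
)

# Everything needed to pick and download a package comes from one table.
row = conn.execute(
    "SELECT href, size_package FROM packages WHERE name = ? AND arch = ?",
    ("foo", "noarch"),
).fetchone()
print(row)  # ('Packages/foo-1.0-1.noarch.rpm', 1024)
```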



provides.sqlite <-- provides: providename + flags + evr
requires.sqlite <-- requires: requiresname + flags + evr + prereq
conflicts_obsoletes.sqlite <-- conflicts and obsoletes (name, flags, evr)
files_by_path.xml <-- index file to point to the files-by-path
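A sketch of a "what provides X" lookup against a standalone provides.sqlite, assuming a single table of providename + flags + evr plus a hypothetical pkg column linking each row back to packagelist (the proposal doesn't say how that link is made). requires.sqlite and conflicts_obsoletes.sqlite would have the same shape, with requires adding a prereq column. Version comparison is left out for brevity.

```python
import sqlite3
from typing import List

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE provides (
    pkg   TEXT NOT NULL,  -- hypothetical link back to packagelist (name.arch)
    name  TEXT NOT NULL,  -- providename
    flags TEXT,           -- EQ, LT, GT, LE, GE, or NULL when unversioned
    epoch TEXT,
    ver   TEXT,
    rel   TEXT
);
CREATE INDEX provides_name ON provides (name);
""")
db.executemany(
    "INSERT INTO provides VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("foo.noarch", "foo", "EQ", "0", "1.0", "1"),
        ("foo.noarch", "libfoo", None, None, None, None),
        ("bar.x86_64", "libfoo", "EQ", "0", "2.0", "1"),
    ],
)


def whatprovides(db: sqlite3.Connection, name: str) -> List[str]:
    """Packages providing `name`, ignoring version flags for brevity."""
    rows = db.execute(
        "SELECT DISTINCT pkg FROM provides WHERE name = ? ORDER BY pkg", (name,)
    )
    return [r[0] for r in rows]


print(whatprovides(db, "libfoo"))  # ['bar.x86_64', 'foo.noarch']
```

Splitting by dependency type means a depsolve that only needs provides never downloads the requires or conflicts/obsoletes data.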

files_by_path.xml entries: path_it_holds + filename + checksum, per file
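A sketch of how a client could use the files_by_path.xml index, assuming a hypothetical &lt;chunk path="..." href="..." checksum="..."/&gt; element per entry. Matching on the longest directory prefix lets the client find the right chunk without knowing the server's exact splitting rule. Element and attribute names, hrefs and checksums are all placeholders.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

Entry = Tuple[str, str, str]  # (path_it_holds, filename, checksum)

# Hypothetical files_by_path.xml; checksums are placeholders.
INDEX_XML = """<files_by_path>
  <chunk path="/" href="files/root.xml" checksum="c0"/>
  <chunk path="/usr/bin" href="files/usr-bin.xml" checksum="c1"/>
  <chunk path="/usr/share/doc" href="files/usr-share-doc.xml" checksum="c2"/>
  <chunk path="/usr/share/fonts" href="files/usr-share-fonts.xml" checksum="c3"/>
</files_by_path>"""


def load_index(xml_text: str) -> List[Entry]:
    """Parse every <chunk> entry into a (path, href, checksum) tuple."""
    root = ET.fromstring(xml_text)
    return [(c.get("path"), c.get("href"), c.get("checksum")) for c in root.iter("chunk")]


def chunk_for(index: List[Entry], filepath: str) -> Optional[Entry]:
    """Pick the entry whose path_it_holds is the longest directory prefix of filepath."""
    best: Optional[Entry] = None
    for entry in index:
        prefix = entry[0].rstrip("/") + "/"
        if filepath.startswith(prefix) and (best is None or len(entry[0]) > len(best[0])):
            best = entry
    return best


index = load_index(INDEX_XML)
# Resolving a font file-require means fetching only one small chunk.
print(chunk_for(index, "/usr/share/fonts/dejavu/DejaVuSans.ttf"))
# ('/usr/share/fonts', 'files/usr-share-fonts.xml', 'c3')
print(chunk_for(index, "/etc/passwd")[1])  # files/root.xml
```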