~helmut/debian-dedup.git
5 months agomultiarchanalyze: disable transitive m-a:same hints multiarchhints
Helmut Grohne [Sun, 16 Jan 2022 18:19:41 +0000 (19:19 +0100)]
multiarchanalyze: disable transitive m-a:same hints

It is a long-standing disagreement with lintian. Until now, the hinter
emitted hints when some package could plausibly be marked M-A:same even
when dependencies couldn't. lintian complains about that when such a
dependency comes from the same source package. We now follow lintian.
This deletes around 2000 hints and will cause people to have to apply
hints multiple times. For instance your libfoo-dev package will only be
flagged after your libfooN was flagged. So be it.

6 months agomultiarchimport.py: add --sequential option
Helmut Grohne [Fri, 31 Dec 2021 21:02:21 +0000 (22:02 +0100)]
multiarchimport.py: add --sequential option

6 months agomultiarchanalyze.py: speed up yaml dumping
Helmut Grohne [Fri, 31 Dec 2021 17:14:23 +0000 (18:14 +0100)]
multiarchanalyze.py: speed up yaml dumping

6 months agomultiarchimport.py: httpredir.d.o is deprecated
Helmut Grohne [Fri, 31 Dec 2021 16:52:05 +0000 (17:52 +0100)]
multiarchimport.py: httpredir.d.o is deprecated

6 months agomultiarchanalyze.py: make pylint happier
Helmut Grohne [Fri, 31 Dec 2021 14:52:27 +0000 (15:52 +0100)]
multiarchanalyze.py: make pylint happier

pylint does not recognize that the condition ensures that left and right
are defined.

6 months agodrop remaining Python 2.x support
Helmut Grohne [Thu, 30 Dec 2021 17:07:37 +0000 (18:07 +0100)]
drop remaining Python 2.x support

6 months agomultiarchimport.py: log exceptions from worker processes
Helmut Grohne [Thu, 30 Dec 2021 17:02:23 +0000 (18:02 +0100)]
multiarchimport.py: log exceptions from worker processes

6 months agomultiarchimport.py: use dedup.utils.iterate_packages
Helmut Grohne [Fri, 31 Dec 2021 14:49:14 +0000 (15:49 +0100)]
multiarchimport.py: use dedup.utils.iterate_packages

6 months agomultiarchimport.py: decodetarname was dropped in master
Helmut Grohne [Thu, 30 Dec 2021 09:20:37 +0000 (10:20 +0100)]
multiarchimport.py: decodetarname was dropped in master

Fixes: ba840e8913ef ("Merge branch master into branch multiarchhints")

6 months agoMerge branch master into branch multiarchhints
Helmut Grohne [Fri, 31 Dec 2021 14:45:33 +0000 (15:45 +0100)]
Merge branch master into branch multiarchhints

Among other things, this drops Python 2.x support.

6 months agodedup.utils: uninline helper function iterate_packages
Helmut Grohne [Fri, 31 Dec 2021 14:24:01 +0000 (15:24 +0100)]
dedup.utils: uninline helper function iterate_packages

6 months agowebapp.py: consistently close cursors using context managers
Helmut Grohne [Fri, 31 Dec 2021 12:00:29 +0000 (13:00 +0100)]
webapp.py: consistently close cursors using context managers

6 months agoDecompressedStream: improve performance
Helmut Grohne [Thu, 30 Dec 2021 16:52:38 +0000 (17:52 +0100)]
DecompressedStream: improve performance

When the decompression ratio is huge, we may be faced with a large
(multiple megabytes) bytes object. Slicing that object incurs a copy, so
repeatedly trimming it becomes O(n^2), while appending to and trimming a
bytearray is much faster.
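
A minimal sketch of the difference, not the actual DecompressedStream
code. Both function names are hypothetical and only illustrate the two
ways of trimming consumed data from the front of a buffer.

```python
def consume_with_bytes(data: bytes, chunk: int) -> int:
    """Trim the front by slicing; each slice copies the remaining tail."""
    consumed = 0
    while data:
        consumed += len(data[:chunk])
        data = data[chunk:]  # allocates a new bytes object every round
    return consumed


def consume_with_bytearray(data: bytes, chunk: int) -> int:
    """Trim the front of a mutable bytearray in place."""
    buf = bytearray(data)
    consumed = 0
    while buf:
        consumed += len(buf[:chunk])
        del buf[:chunk]  # removes the prefix without building a new object
    return consumed
```

Both return the same count; only the amount of copying differs.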

6 months agomultiarchimport.py: reduce default logging
Helmut Grohne [Wed, 29 Dec 2021 21:43:15 +0000 (22:43 +0100)]
multiarchimport.py: reduce default logging

6 months agomultiarchanalyze.py: fix python3 compatibility
Helmut Grohne [Wed, 29 Dec 2021 21:36:16 +0000 (22:36 +0100)]
multiarchanalyze.py: fix python3 compatibility

.keys() now returns a special object, but show_files really wants
something that provides len() and supports repeated iteration.

6 months agoDecompressedStream: fix endless loop
Helmut Grohne [Wed, 29 Dec 2021 21:14:50 +0000 (22:14 +0100)]
DecompressedStream: fix endless loop

Fixes: 775bdde52ad5 ("DecompressedStream: avoid mixing types for variable data")

6 months agowebapp: avoid changing variable type
Helmut Grohne [Wed, 29 Dec 2021 20:14:38 +0000 (21:14 +0100)]
webapp: avoid changing variable type

Again static type checking is the driver for the change here.

6 months agoautoimport: avoid changing variable type
Helmut Grohne [Wed, 29 Dec 2021 20:05:58 +0000 (21:05 +0100)]
autoimport: avoid changing variable type

knownpkgvers is a dict while knownpkgs is a set. Separating them helps
static type checkers.

6 months agowebapp: speed up encode_and_buffer
Helmut Grohne [Wed, 29 Dec 2021 20:00:04 +0000 (21:00 +0100)]
webapp: speed up encode_and_buffer

We now know that our parameter is a jinja2.environment.TemplateStream.
Enable buffering and accumulate via an io.BytesIO to avoid O(n^2)
append.
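
A sketch of the accumulation idea, assuming any iterable of str in place
of the jinja2 TemplateStream. The signature and the threshold parameter
are assumptions for illustration, not the webapp's actual interface.

```python
import io
from typing import Iterable, Iterator


def encode_and_buffer(chunks: Iterable[str], threshold: int = 16384) -> Iterator[bytes]:
    """Encode str chunks and emit them in larger bytes blocks."""
    buf = io.BytesIO()
    for chunk in chunks:
        buf.write(chunk.encode("utf8"))  # amortized append, no re-copying
        if buf.tell() >= threshold:
            yield buf.getvalue()
            buf = io.BytesIO()
    if buf.tell():
        yield buf.getvalue()  # flush whatever is left at the end
```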

6 months agowebapp: improve performance
Helmut Grohne [Wed, 29 Dec 2021 19:56:03 +0000 (20:56 +0100)]
webapp: improve performance

html_response expects a str-generator, but when we call the render
method, we receive a plain str. It can be iterated - one character at a
time. That's what encode_and_buffer will do in this case. So better
stream all the time.
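
To illustrate the problem: a consumer that iterates over its argument
sees one item per character when handed a plain str. The names below are
hypothetical stand-ins, not webapp code.

```python
def count_chunks(body) -> int:
    """Count how many pieces a consumer would receive from body."""
    return sum(1 for _ in body)


rendered = "<p>hello</p>"                  # plain str, as returned by a render call
streamed = iter(["<p>", "hello", "</p>"])  # stand-in for a template stream
```

Here count_chunks(rendered) equals len(rendered), while the streamed
variant yields three pieces.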

6 months agowebapp: forward compatibility with newer werkzeug
Helmut Grohne [Wed, 29 Dec 2021 19:34:51 +0000 (20:34 +0100)]
webapp: forward compatibility with newer werkzeug

6 months agoautoimport.py: convert to use pathlib
Helmut Grohne [Wed, 29 Dec 2021 17:05:36 +0000 (18:05 +0100)]
autoimport.py: convert to use pathlib

6 months agoimportpkg: fix suppression of boring content
Helmut Grohne [Wed, 29 Dec 2021 14:55:28 +0000 (15:55 +0100)]
importpkg: fix suppression of boring content

The content must be bytes. Passing str silently skips the suppression.
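
A small illustration of why the failure is silent in Python 3: comparing
str with bytes is simply False, it does not raise. The names
boring_content and is_boring are hypothetical.

```python
boring_content = b""  # hypothetical suppressed content, stored as bytes


def is_boring(content) -> bool:
    """True only when content is bytes equal to the suppressed content."""
    return content == boring_content
```

With this, is_boring(b"") is true, whereas is_boring("") is false and the
suppression never triggers.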

6 months agoDecompressedHash: also gain a name property for consistency
Helmut Grohne [Wed, 29 Dec 2021 14:36:12 +0000 (15:36 +0100)]
DecompressedHash: also gain a name property for consistency

6 months agoImageHash: gain a name property
Helmut Grohne [Wed, 29 Dec 2021 14:24:34 +0000 (15:24 +0100)]
ImageHash: gain a name property

Instead of retroactively attaching a name to an ImageHash, autogenerate
it via a property. Doing so also simplifies static type checking.

6 months agodon't return the first parameter from hash_file
Helmut Grohne [Wed, 29 Dec 2021 14:04:35 +0000 (15:04 +0100)]
don't return the first parameter from hash_file

Returning the object raises the question of what precisely the return
type is, at no benefit.

6 months agodrop unused function sql_add_version_compare
Helmut Grohne [Wed, 29 Dec 2021 13:55:43 +0000 (14:55 +0100)]
drop unused function sql_add_version_compare

6 months agoDecompressedStream: avoid mixing types for variable data
Helmut Grohne [Wed, 29 Dec 2021 12:43:48 +0000 (13:43 +0100)]
DecompressedStream: avoid mixing types for variable data

The local variable data can be bool or bytes. That's inconvenient for
static type checkers. Avoid doing so.

6 months agoDecompressedStream: eliminate redundant closed field
Helmut Grohne [Wed, 29 Dec 2021 11:00:26 +0000 (12:00 +0100)]
DecompressedStream: eliminate redundant closed field

6 months agostop hiding M-A:same conflicts in binNMUed packages
Helmut Grohne [Mon, 27 Dec 2021 12:23:44 +0000 (13:23 +0100)]
stop hiding M-A:same conflicts in binNMUed packages

The issue has been solved by Mattia Rizzolo in dh-strip-nondeterminism
via #999665.

20 months agodrop obsolete python modules
Helmut Grohne [Sun, 25 Oct 2020 15:50:56 +0000 (16:50 +0100)]
drop obsolete python modules

Both lzma and concurrent.futures are now part of the standard library
and solely exist as virtual packages.

20 months agoexternalize ar parsing to arpy
Helmut Grohne [Sun, 25 Oct 2020 09:20:34 +0000 (10:20 +0100)]
externalize ar parsing to arpy

20 months agouse python3-pil instead of removed python3-imaging
Helmut Grohne [Sun, 25 Oct 2020 09:01:37 +0000 (10:01 +0100)]
use python3-pil instead of removed python3-imaging

21 months agofix tuple mismatch
Helmut Grohne [Sun, 6 Sep 2020 11:04:35 +0000 (13:04 +0200)]
fix tuple mismatch

Fixes: e6115dd16b46 ("hide M-A:same conflicts in binNMUed packages")

21 months agohide M-A:same conflicts in binNMUed packages
Helmut Grohne [Thu, 3 Sep 2020 18:19:45 +0000 (20:19 +0200)]
hide M-A:same conflicts in binNMUed packages

binNMUed packages are not currently reproducible, because buildds don't
pass --binNMU-timestamp to sbuild. Thus they use varying
SOURCE_DATE_EPOCH and produce faulty packages. As much as this is a real
bug, it is not actionable by maintainers. Hide such issues for now.

Link: https://salsa.debian.org/perl-team/modules/packages/libtie-hash-indexed-perl/-/merge_requests/1
Link: https://bugs.debian.org/843773
2 years agofix typo in maforeign_library regex
Helmut Grohne [Mon, 17 Feb 2020 06:58:23 +0000 (07:58 +0100)]
fix typo in maforeign_library regex

2 years agodrop support for Python 2.x
Helmut Grohne [Sun, 16 Feb 2020 07:21:20 +0000 (08:21 +0100)]
drop support for Python 2.x

4 years agoadapt to python3-magic/2:0.4.15-1 API
Helmut Grohne [Mon, 25 Jun 2018 19:07:41 +0000 (21:07 +0200)]
adapt to python3-magic/2:0.4.15-1 API

4 years agomultiarchanalyze: give examples when representing arch sets
Helmut Grohne [Sun, 7 Jan 2018 19:01:59 +0000 (20:01 +0100)]
multiarchanalyze: give examples when representing arch sets

Uwe Kleine-König said that knowing example architectures for file
conflicts would be incredibly useful. The old presentation of
architecture sets would collapse sets that are too big to a single
count. This makes it difficult to find any colliding pair. Now, we
give at least two example architectures in addition to the count.
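
A sketch of such a presentation. The function name, the limit parameter
and the output wording are assumptions, not what multiarchanalyze emits.

```python
def describe_archs(archs, limit: int = 3) -> str:
    """Render an architecture set, collapsing big sets to a count plus examples."""
    ordered = sorted(archs)
    if len(ordered) <= limit:
        return ", ".join(ordered)
    # Big set: keep the count, but always name two concrete architectures.
    return "%d architectures including %s and %s" % (
        len(ordered), ordered[0], ordered[1])
```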

Reported-By: Uwe Kleine-König <ukleinek@debian.org>
4 years agofix logic inversion in package selection
Helmut Grohne [Fri, 5 Jan 2018 15:19:41 +0000 (16:19 +0100)]
fix logic inversion in package selection

We want the package with the highest version, not the lowest.

Reported-By: Uwe Kleine-König <ukleinek@debian.org>
4 years agomultiarchanalyze: opportunistically emit a version when unique
Helmut Grohne [Thu, 21 Dec 2017 05:35:42 +0000 (06:35 +0100)]
multiarchanalyze: opportunistically emit a version when unique

4 years agoadd module dedup.filemagic
Helmut Grohne [Sat, 23 Sep 2017 08:33:43 +0000 (10:33 +0200)]
add module dedup.filemagic

This module is not used anywhere and thus its dependency on
python3-magic is not recorded in the README. It can be used to guess the
file type by looking at the contents using file magic. It is not a
typical hash function, but it can be used for repurposing dedup for
other analysers.

4 years agofix HashBlacklistContent.copy
Helmut Grohne [Wed, 13 Sep 2017 07:04:24 +0000 (09:04 +0200)]
fix HashBlacklistContent.copy

It wasn't copying the stored member and thus could blacklist "wrong"
content after a copy.

5 years agomultiarchimport: python 3 forward compatibility
Helmut Grohne [Sun, 5 Mar 2017 16:37:36 +0000 (17:37 +0100)]
multiarchimport: python 3 forward compatibility

5 years agomultiarchanalyze: detect some forms of wrong M-A:foreign
Helmut Grohne [Sat, 4 Mar 2017 07:51:23 +0000 (08:51 +0100)]
multiarchanalyze: detect some forms of wrong M-A:foreign

When an arch:any package ships a .so file in a public library search
path (e.g. a symlink as many lib*-dev packages do) it most likely
shouldn't be M-A:foreign. A common exception is plugins loaded into
programs, so exclude that case.

Many thanks to Johannes Schauer and Guillem Jover for helping discover
this pattern of Multi-Arch: foreign abuse.

5 years agoautoimport: fix regression in url computation
Helmut Grohne [Sun, 13 Nov 2016 07:44:58 +0000 (08:44 +0100)]
autoimport: fix regression in url computation

The list path got inadvertently prepended to all binary package urls.

Fixes: 420804c25797 ("autoimport: improve fetching package lists")

5 years agomultiarchanalyze: make it easily consumable by tracker.d.o
Helmut Grohne [Sun, 7 Aug 2016 05:08:59 +0000 (07:08 +0200)]
multiarchanalyze: make it easily consumable by tracker.d.o

Many thanks to Paul Wise for his detailed feedback on the data format.

5 years agorepository moved
Helmut Grohne [Fri, 29 Jul 2016 15:04:12 +0000 (17:04 +0200)]
repository moved

6 years agomultiarchanalyze: speed up on sqlite3 3.8.7.1
Helmut Grohne [Sun, 12 Jun 2016 05:52:28 +0000 (07:52 +0200)]
multiarchanalyze: speed up on sqlite3 3.8.7.1

Since all users of archdepcandidate run the results through "exists()"
or "group by", "union" vs "union all" does not make any difference to
the results.

On the performance side however, it avoids a b-tree merge getting the
maforeign_candidate query down from hours to seconds.
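
The equivalence can be checked on a toy database. The tables a and b
below are made up for illustration and are unrelated to the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (pkg TEXT);
    CREATE TABLE b (pkg TEXT);
    INSERT INTO a VALUES ('libfoo'), ('libbar');
    INSERT INTO b VALUES ('libfoo'), ('libbaz');
""")


def candidates(union_op: str):
    """Run an exists() check over a compound subquery joined by union_op."""
    query = f"""
        SELECT EXISTS (
            SELECT 1
            FROM (SELECT pkg FROM a {union_op} SELECT pkg FROM b) AS c
            WHERE c.pkg = ?
        )
    """
    names = ("libfoo", "libbaz", "libqux")
    return [conn.execute(query, (name,)).fetchone()[0] for name in names]
```

candidates("UNION") and candidates("UNION ALL") return the same list,
since exists() does not care about duplicate rows.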

6 years agoadd a separate tool for generating hints on Multi-Arch headers
Helmut Grohne [Fri, 10 Jun 2016 05:26:12 +0000 (07:26 +0200)]
add a separate tool for generating hints on Multi-Arch headers

It builds on the core functionality of dedup, but uses a different
database schema. Unlike dedup, it aborts downloading Arch:all packages
early and consumes any other architecture in its entirety instead.

6 years agoDecompressedStream: fix decompression without flush
Helmut Grohne [Thu, 9 Jun 2016 20:48:46 +0000 (22:48 +0200)]
DecompressedStream: fix decompression without flush

In Python 3.x, lzma.LZMADecompressor doesn't have a flush method.
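
One way to cope with decompressors that lack flush, shown as a sketch.
The helper name finish is hypothetical and this is not necessarily how
DecompressedStream handles it.

```python
import lzma
import zlib


def finish(decompressor) -> bytes:
    """Return remaining output, tolerating decompressors without flush()."""
    flush = getattr(decompressor, "flush", None)
    return flush() if flush is not None else b""


xz = lzma.LZMADecompressor()
xz_out = xz.decompress(lzma.compress(b"hello")) + finish(xz)

gz = zlib.decompressobj()
gz_out = gz.decompress(zlib.compress(b"hello")) + finish(gz)
```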

6 years agoautoimport: fix hash check
Helmut Grohne [Thu, 9 Jun 2016 20:44:04 +0000 (22:44 +0200)]
autoimport: fix hash check

Fixes: 2f12a6e2f426 ("autoimport: add option to skip hash checking")

6 years agoautoimport: improve fetching package lists
Helmut Grohne [Wed, 25 May 2016 17:27:35 +0000 (19:27 +0200)]
autoimport: improve fetching package lists

Moving the fetching part into dedup.utils. Instead of hard coding the
gzip compressed copy, try xz, gz and plain in that order. Also take care
to actually close the connection.
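
The fallback order can be sketched as follows. The function name and the
injected opener callable are assumptions; the helper that actually lives
in dedup.utils may look different.

```python
from typing import Callable, Sequence, Tuple


def open_first_available(base_url: str, opener: Callable,
                         extensions: Sequence[str] = (".xz", ".gz", "")) -> Tuple[str, object]:
    """Try base_url with each extension in order; return (extension, handle)."""
    last_error = None
    for ext in extensions:
        try:
            return ext, opener(base_url + ext)
        except OSError as err:  # urllib.error.URLError is a subclass of OSError
            last_error = err
    if last_error is None:
        raise ValueError("no extensions given")
    raise last_error
```

Passing urllib.request.urlopen as opener and wrapping the returned handle
in contextlib.closing would take care of closing the connection.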

6 years agouse urlopen from urllib2 on py2
Helmut Grohne [Tue, 24 May 2016 15:50:57 +0000 (17:50 +0200)]
use urlopen from urllib2 on py2

This causes non-successful fetches to result in HTTPErrors like it does
in py3 already.

6 years agomove dedup.debpkg.process_control back into importpkg
Helmut Grohne [Mon, 23 May 2016 19:49:43 +0000 (21:49 +0200)]
move dedup.debpkg.process_control back into importpkg

After all, it isn't that generic. It knows what information is necessary
for running dedup. Thus it really belongs to the extractor subclass.
By building on handle_control_info, not that much parsing logic is left
in the extractor subclass.

6 years agoDebExtractor: implement parsing of control.tar
Helmut Grohne [Mon, 23 May 2016 19:48:15 +0000 (21:48 +0200)]
DebExtractor: implement parsing of control.tar

6 years agoimportpkg: fix --hash broken in previous commit
Helmut Grohne [Mon, 23 May 2016 19:09:38 +0000 (21:09 +0200)]
importpkg: fix --hash broken in previous commit

6 years agoremove curl dependency
Helmut Grohne [Mon, 23 May 2016 19:03:52 +0000 (21:03 +0200)]
remove curl dependency

Teach importpkg how to download urls using urlopen and thus remove the
need for invoking curl.

6 years agoautoimport: add option to skip hash checking
Helmut Grohne [Mon, 23 May 2016 13:33:40 +0000 (15:33 +0200)]
autoimport: add option to skip hash checking

For variations of dedup that do not consume the data.tar member, this
option can save significant bandwidth.

6 years agoautoimport: stream package list and use generic decompressor
Helmut Grohne [Sun, 22 May 2016 21:21:16 +0000 (23:21 +0200)]
autoimport: stream package list and use generic decompressor

 * streaming means that we do not need to hold the entire package list
   in memory (but the pkgs dict will become large anyway).
 * The decompress utility allows easily switching to e.g. xz which is
   the only compression format for the dbgsym suites.

6 years agoDecompressedStream: implement readline
Helmut Grohne [Sun, 22 May 2016 21:18:54 +0000 (23:18 +0200)]
DecompressedStream: implement readline

Iteration over the file-like object is required by
deb822.Packages.iter_paragraphs.

6 years agomove from deprecated optparse to argparse
Helmut Grohne [Sat, 21 May 2016 15:54:04 +0000 (17:54 +0200)]
move from deprecated optparse to argparse

6 years agotreat Pre-Depends like regular Depends
Helmut Grohne [Thu, 5 May 2016 19:21:48 +0000 (21:21 +0200)]
treat Pre-Depends like regular Depends

The former behaviour was ignoring them. The intended use for dedup is to
know whenever a package unconditionally requires another package.

6 years agopush more functionality into DebExtractor
Helmut Grohne [Sun, 1 May 2016 12:31:56 +0000 (14:31 +0200)]
push more functionality into DebExtractor

The handle_ar_member and handle_ar_end methods now have a default
implementation adding further handlers handle_debversion,
handle_control_tar and handle_data_tar.

In that process two additional bugs were fixed:
 * decompress_tar was wrongly passing errors="surrogateescape" for
   Python 2.x even though that's only supported for Python 3.x.
 * The use of decompress actually passes the extension as unicode.

6 years agouse same Python version for autoimport and importpkg
Helmut Grohne [Sun, 1 May 2016 12:26:20 +0000 (14:26 +0200)]
use same Python version for autoimport and importpkg

The autoimport tool runs the Python interpreter explicitly. Instead of
invoking just "python" and thus calling whatever the current default is,
use sys.executable which is the interpreter used to run autoimport, thus
locking both to the same Python version.
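
A minimal sketch of the technique. The helper name is hypothetical and
the child here runs an inline snippet rather than importpkg.

```python
import subprocess
import sys


def run_with_same_interpreter(code: str) -> str:
    """Run code in a child process using the interpreter running this script."""
    result = subprocess.run(
        [sys.executable, "-c", code],  # not "python": avoids the system default
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```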

6 years agosupport Python 3.x in importpkg
Helmut Grohne [Thu, 28 Apr 2016 19:35:42 +0000 (21:35 +0200)]
support Python 3.x in importpkg

In Python 2.x, TarInfo.name is a bytes object. In Python 3.x,
TarInfo.name always is a unicode object. To avoid importpkg crashing
with an exception, we direct the Python 3.x decoding to use
surrogateescapes. Thus decoding the name boils down to checking whether
it contains surrogates.
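
The surrogate check can be sketched like this. The function name is
hypothetical, and it decodes raw bytes directly, whereas importpkg gets
already decoded names from tarfile.

```python
from typing import Tuple


def decode_tar_name(raw: bytes) -> Tuple[str, bool]:
    """Decode a member name; the flag says whether it was valid UTF-8."""
    name = raw.decode("utf8", "surrogateescape")  # never raises
    try:
        name.encode("utf8")  # strict encoding fails on lone surrogates
    except UnicodeEncodeError:
        return name, False
    return name, True
```

Encoding the name again with "surrogateescape" recovers the original
bytes, so nothing is lost for names that are not valid UTF-8.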

6 years agodecouple a function decompress out of decompress_tar
Helmut Grohne [Thu, 28 Apr 2016 18:50:12 +0000 (20:50 +0200)]
decouple a function decompress out of decompress_tar

Building on the previous commit, add a decompress function that turns a
compressed filelike into a decompressed filelike. Use it to decouple the
decompression step.

6 years agoextend functionality of DecompressedStream
Helmut Grohne [Thu, 28 Apr 2016 18:28:11 +0000 (20:28 +0200)]
extend functionality of DecompressedStream

It now supports:
 * tell()
 * seek(absolute_position), forward only
 * close()
 * closed

This is sufficient for putting it as a fileobj into tarfile.TarFile. By
doing so we can decouple decompression from tar processing, which eases
papering over the Python 2.x vs Python 3.x differences.

6 years agoimportpkg: move the hash function list to the extractor class
Helmut Grohne [Thu, 21 Apr 2016 21:15:22 +0000 (23:15 +0200)]
importpkg: move the hash function list to the extractor class

They really are an aspect of the particular extractor and can easily be
changed by subclassing.

6 years agoadd a class DebExtractor for guiding feature extraction
Helmut Grohne [Tue, 19 Apr 2016 20:48:02 +0000 (22:48 +0200)]
add a class DebExtractor for guiding feature extraction

It is supposed to separate the parsing of Debian packages (understanding
how the format works) from the actual feature extraction. Its goal is to
simplify writing custom extractors for different feature sets.

6 years agoadd a validate method to HashedStream
Helmut Grohne [Sat, 16 Apr 2016 09:22:18 +0000 (11:22 +0200)]
add a validate method to HashedStream

6 years agoimportpkg: use yaml dumper directly
Helmut Grohne [Sat, 16 Apr 2016 09:14:40 +0000 (11:14 +0200)]
importpkg: use yaml dumper directly

Instead of carefully crafting an iterator to pass to yaml.safe_dump_all,
we simply take control on our own and call represent on a yaml dumper
object where needed.

6 years agoimportpkg: refactor commit handling out of process_package*
Helmut Grohne [Sat, 16 Apr 2016 07:03:51 +0000 (09:03 +0200)]
importpkg: refactor commit handling out of process_package*

6 years agourlopen moved from urllib to urllib.request in py3k
Helmut Grohne [Fri, 8 Apr 2016 18:56:42 +0000 (20:56 +0200)]
urlopen moved from urllib to urllib.request in py3k

7 years agoprocess_control: do not encode to ascii
Helmut Grohne [Thu, 16 Apr 2015 15:58:56 +0000 (17:58 +0200)]
process_control: do not encode to ascii

Otherwise the yaml will contain binary strings on py3k which end up as
binary data in the sqlite database. In py2, yaml can handle those
unicode objects just fine.

7 years agotempfile.mkdtemp does not like bytes in py3k
Helmut Grohne [Thu, 16 Apr 2015 15:56:24 +0000 (17:56 +0200)]
tempfile.mkdtemp does not like bytes in py3k

7 years agounquote moved from urllib to urllib.parse in py3k
Helmut Grohne [Thu, 16 Apr 2015 15:56:02 +0000 (17:56 +0200)]
unquote moved from urllib to urllib.parse in py3k

7 years agoelement access on bytes yields int in py3k
Helmut Grohne [Thu, 16 Apr 2015 15:47:20 +0000 (17:47 +0200)]
element access on bytes yields int in py3k

7 years agozlib.crc32 behaves inconsistently on py2 vs py3
Helmut Grohne [Thu, 16 Apr 2015 15:46:07 +0000 (17:46 +0200)]
zlib.crc32 behaves inconsistently on py2 vs py3

zlib.crc32 returns an int32_t on py2 and a uint32_t on py3.
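
Masking the result gives the same unsigned value on both. The wrapper
name is hypothetical.

```python
import zlib


def crc32_unsigned(data: bytes) -> int:
    """Return the CRC-32 of data as an unsigned 32 bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF
```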

7 years agothere is no itertools.imap in py3k
Helmut Grohne [Thu, 16 Apr 2015 15:44:31 +0000 (17:44 +0200)]
there is no itertools.imap in py3k

7 years agouse binary stdin on py3k
Helmut Grohne [Thu, 16 Apr 2015 15:43:48 +0000 (17:43 +0200)]
use binary stdin on py3k

7 years agodistinguish bytes from unicode for py3k
Helmut Grohne [Thu, 16 Apr 2015 15:43:11 +0000 (17:43 +0200)]
distinguish bytes from unicode for py3k

7 years agoimportpkg: be more liberal in control file naming
Helmut Grohne [Wed, 23 Jul 2014 16:07:39 +0000 (18:07 +0200)]
importpkg: be more liberal in control file naming

While in current sid packages the control file in control.tar is always
named "./control", some older packages name it "control".
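
A sketch of a check accepting both spellings. The function name is
hypothetical and importpkg may implement the test differently.

```python
def is_control_member(name: str) -> bool:
    """Accept the control file with or without the leading "./"."""
    return name in ("control", "./control")
```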

8 years agoimprove schema documentation
Helmut Grohne [Sat, 14 Jun 2014 10:08:09 +0000 (12:08 +0200)]
improve schema documentation

wording, more NOT NULLs, some more explanations

8 years agoadd documentation to schema.sql
Helmut Grohne [Sat, 14 Jun 2014 08:19:55 +0000 (10:19 +0200)]
add documentation to schema.sql

Thanks to Peter Palfrader for explaining what information is needed and
reviewing the documentation.

8 years agoupdate copyright information
Helmut Grohne [Sun, 11 May 2014 13:59:46 +0000 (15:59 +0200)]
update copyright information

8 years agoimportpkg: reduce copy&paste
Helmut Grohne [Sun, 11 May 2014 13:57:36 +0000 (15:57 +0200)]
importpkg: reduce copy&paste

8 years agoimportpkg: add support for data.tar.lzma
Guillem Jover [Wed, 7 May 2014 23:50:48 +0000 (01:50 +0200)]
importpkg: add support for data.tar.lzma

Creating packages with lzma compression has been deprecated since dpkg
1.16.4, but there might be some of those in the wild and supporting them
is straightforward when xz is already supported.

Signed-off-by: Guillem Jover <guillem@debian.org>
8 years agoimportpkg: add support for control.tar and control.tar.xz
Guillem Jover [Wed, 7 May 2014 19:06:38 +0000 (21:06 +0200)]
importpkg: add support for control.tar and control.tar.xz

dpkg supports those since 1.17.6.

Signed-off-by: Guillem Jover <guillem@debian.org>
8 years agodedup.arreader: remove trailing slash from ar members
Guillem Jover [Wed, 7 May 2014 23:46:21 +0000 (01:46 +0200)]
dedup.arreader: remove trailing slash from ar members

The GNU ar format adds a trailing slash to the member names, normalize
the member names to take this into account.
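
The normalization can be sketched as below. The function name is
hypothetical, and the special GNU members "/" and "//" are not handled.

```python
def normalize_ar_name(raw: bytes) -> bytes:
    """Normalize the 16 byte name field of an ar member header."""
    name = raw.rstrip(b" ")  # the field is padded with spaces
    if name.endswith(b"/"):
        name = name[:-1]     # GNU ar terminates names with a slash
    return name
```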

Signed-off-by: Guillem Jover <guillem@debian.org>
8 years agowebapp: allow git-like hash truncation
Helmut Grohne [Sun, 11 May 2014 13:25:46 +0000 (15:25 +0200)]
webapp: allow git-like hash truncation

8 years agoautoimport: support protocols besides http
Helmut Grohne [Mon, 21 Apr 2014 10:50:15 +0000 (12:50 +0200)]
autoimport: support protocols besides http

8 years agoschema: make syntax compatible with postgres
Helmut Grohne [Sat, 8 Mar 2014 08:48:17 +0000 (09:48 +0100)]
schema: make syntax compatible with postgres

8 years agoMerge branch updatesharing-eqclass
Helmut Grohne [Sun, 23 Feb 2014 19:12:18 +0000 (20:12 +0100)]
Merge branch updatesharing-eqclass

8 years agospell check comments
Helmut Grohne [Sun, 23 Feb 2014 17:19:35 +0000 (18:19 +0100)]
spell check comments

8 years agofix spelling mistake
Helmut Grohne [Sun, 23 Feb 2014 16:29:41 +0000 (17:29 +0100)]
fix spelling mistake

Reported-By: Stefan Kaltenbrunner
8 years agowebapp: fix eqclass usage in package comparison
Helmut Grohne [Sun, 23 Feb 2014 14:44:03 +0000 (15:44 +0100)]
webapp: fix eqclass usage in package comparison

When comparing two packages, objects would be considered duplicates
without considering whether the respective hash functions are comparable
by checking their equivalence classes. The current set of hash functions
does not expose this bug.

8 years agoupdate_sharing: weaken assumptions about db layout
Helmut Grohne [Fri, 21 Feb 2014 20:59:04 +0000 (21:59 +0100)]
update_sharing: weaken assumptions about db layout

Hash functions are partitioned into equivalence classes. We are
generally only interested in sharing among hash functions with the same
equivalence class, but the algorithm would compute any sharing. While
the current layout never produces the same hashes for functions in
different equivalence classes (their output lengths differ), that may
change in the future.

Also allow hash functions that belong to no equivalence class at all
(eqclass = NULL) as a means to add additional metadata to content
without computing any sharing for it.
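
The intended grouping, sketched in plain Python over made-up rows. The
tuple layout (content_id, eqclass, hashvalue) is an assumption and not
the database schema; update_sharing works on SQL tables instead.

```python
from collections import defaultdict


def shared_groups(rows):
    """Group content ids that share a hash value within one equivalence class."""
    groups = defaultdict(set)
    for content_id, eqclass, value in rows:
        if eqclass is None:
            continue  # pure metadata, never takes part in sharing
        groups[(eqclass, value)].add(content_id)
    # Only groups with at least two members constitute sharing.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Equal hash values in different equivalence classes end up in different
groups and therefore do not count as sharing.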

8 years agoblacklist content rather than hashes
Helmut Grohne [Wed, 19 Feb 2014 13:21:20 +0000 (14:21 +0100)]
blacklist content rather than hashes

Otherwise the gzip hash cannot tell the empty stream and the
compressed empty stream apart.
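
A sketch of suppressing by content, under the assumption that the
wrapper sees the raw file bytes before any decompression. The class name
and its interface are hypothetical, not the code in dedup.

```python
import hashlib


class ContentSuppressingHash:
    """Wrap a hash object and suppress the digest for blacklisted content."""

    def __init__(self, hashobj, blacklist):
        self.hashobj = hashobj
        self.blacklist = frozenset(blacklist)
        self.maxlen = max(map(len, self.blacklist), default=0)
        self.stored = b""  # raw content seen so far, or None once too long

    def update(self, data: bytes) -> None:
        if self.stored is not None:
            self.stored += data
            if len(self.stored) > self.maxlen:
                self.stored = None  # can no longer match any blacklist entry
        self.hashobj.update(data)

    def hexdigest(self):
        if self.stored is not None and self.stored in self.blacklist:
            return None  # suppressed
        return self.hashobj.hexdigest()
```

An empty file is suppressed, while a gzip file that merely decompresses
to nothing is not, because its raw bytes are not empty.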

8 years agoGzipDecompressor: don't treat checksum as garbage trailer
Helmut Grohne [Wed, 19 Feb 2014 13:19:56 +0000 (14:19 +0100)]
GzipDecompressor: don't treat checksum as garbage trailer