~helmut/debian-dedup.git
Helmut Grohne [Mon, 27 May 2013 09:59:33 +0000 (11:59 +0200)]
dedup.image: img.convert can also raise that crazy stuff

Helmut Grohne [Thu, 9 May 2013 07:20:03 +0000 (09:20 +0200)]
webapp: declare html5 and utf-8

Helmut Grohne [Thu, 9 May 2013 06:32:14 +0000 (08:32 +0200)]
webapp: enrich comparison page with version info

Helmut Grohne [Wed, 8 May 2013 13:52:42 +0000 (15:52 +0200)]
fix attribution of logo

I remembered the wrong name. The logo was made by Sune Vuorela.

Helmut Grohne [Sun, 5 May 2013 15:24:50 +0000 (17:24 +0200)]
webapp: markup error in /source template

Helmut Grohne [Sun, 5 May 2013 15:22:33 +0000 (17:22 +0200)]
webapp: validator complained about <link> with sizes

Helmut Grohne [Sun, 5 May 2013 15:10:24 +0000 (17:10 +0200)]
webapp: reference favicon from base.html

Helmut Grohne [Sun, 5 May 2013 14:19:10 +0000 (16:19 +0200)]
added favicon.ico

Authored: Cyril Brulebois

Helmut Grohne [Thu, 2 May 2013 17:28:24 +0000 (19:28 +0200)]
webapp: use jinja's filesizeformat

Except it doesn't work, so replace it with our version. At least we
might be able to drop this code in a future update.
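
Jinja2 ships a `filesizeformat` filter. The following is a minimal sketch of a hand-written replacement. It assumes binary (1024-based) units and one decimal place. The name `format_size` and its output format are hypothetical and are not the project's actual code.

```python
def format_size(size):
    """Render a byte count using binary units (hypothetical stand-in for filesizeformat)."""
    size = float(size)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            break
        size /= 1024
    else:
        # Still >= 1024 GiB after the loop, so report in TiB.
        unit = "TiB"
    if unit == "B":
        return "%d B" % size
    return "%.1f %s" % (size, unit)
```

Such a helper could be registered under the old name with `environment.filters["filesizeformat"] = format_size`. That way the templates would not need to change if the built-in filter later becomes usable.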

Helmut Grohne [Thu, 2 May 2013 16:48:14 +0000 (18:48 +0200)]
webapp: reduce size of comparison output

Only add rowspan when it carries a meaning.
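
The webapp renders HTML through templates. As an illustration only, here is a hypothetical Python helper showing the rule of emitting `rowspan` only when a cell really spans more than one row. `rowspan="1"` is the HTML default, so omitting it changes nothing visually.

```python
from html import escape

def table_cell(content, rowspan=1):
    """Return a <td>, adding rowspan only when the cell spans several rows (hypothetical helper)."""
    if rowspan > 1:
        return '<td rowspan="%d">%s</td>' % (rowspan, escape(content))
    # rowspan="1" is the default, so leaving it out keeps the output smaller.
    return "<td>%s</td>" % escape(content)
```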

Helmut Grohne [Sat, 27 Apr 2013 08:55:21 +0000 (10:55 +0200)]
webapp: add a css class binary-package

Helmut Grohne [Thu, 25 Apr 2013 12:19:58 +0000 (14:19 +0200)]
webapp: total_size is None if num_files is 0
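
This behaviour comes from SQL aggregates. Over zero rows, `count()` returns 0 but `sum()` returns NULL, which Python's DB-API drivers map to `None`. The sketch below assumes SQLite (the README mentions WAL) and uses a hypothetical `content` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (package TEXT, filename TEXT, size INTEGER)")

# Aggregating over zero matching rows: count() yields 0, but sum() yields NULL.
num_files, total_size = conn.execute(
    "SELECT count(filename), sum(size) FROM content WHERE package = ?",
    ("nosuchpackage",),
).fetchone()

# Normalise NULL (None in Python) to 0 before formatting it for display.
display_size = total_size if total_size is not None else 0
```

The same normalisation can also be done in SQL with `coalesce(sum(size), 0)`. In SQLite specifically, `total(size)` returns `0.0` instead of NULL.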

Helmut Grohne [Thu, 25 Apr 2013 12:10:47 +0000 (14:10 +0200)]
webapp: color filenames when hovering them

Helmut Grohne [Thu, 25 Apr 2013 12:10:18 +0000 (14:10 +0200)]
webapp: turn the <br> after filename into a style

Helmut Grohne [Thu, 25 Apr 2013 12:02:48 +0000 (14:02 +0200)]
move css to /style.css

Helmut Grohne [Thu, 25 Apr 2013 12:01:11 +0000 (14:01 +0200)]
webapp: make filenames css styleable

Helmut Grohne [Thu, 25 Apr 2013 07:33:03 +0000 (09:33 +0200)]
webapp: top-align fields in /compare pages

Suggested by Paul Wise.

Helmut Grohne [Thu, 25 Apr 2013 07:32:46 +0000 (09:32 +0200)]
fix markup in base.html

Helmut Grohne [Wed, 24 Apr 2013 18:56:46 +0000 (20:56 +0200)]
implement the /compare/pkg1/pkg2 page differently

The original version had two major drawbacks:
 1) The SQL query used would cause a btree sort, so the time waiting
    for the first output was rather long.
 2) For packages with many equal files, the output would grow with
    O(n^2).

Thanks to suggestions by Christine Grohne and Klaus Aehlig. The
approach now groups files in package1 by their main hash value (sha512).
It now also does manually some of the work that SQL was designed to do.
To speed up page generation, a new caching table was added that records
which files have corresponding shared files.
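
The grouping idea can be sketched in plain Python. This is not the project's implementation, and the input shape of `(filename, sha512)` tuples is an assumption. If k files in each package share one hash, a pairwise listing produces k*k rows. Grouping by hash produces a single row that lists 2k names.

```python
from collections import defaultdict

def group_shared_files(files1, files2):
    """Group package1's files by sha512 and pair each group with package2's matches.

    files1, files2: iterables of (filename, sha512) tuples (hypothetical input shape).
    Returns a list of (package1_filenames, package2_filenames) pairs.
    """
    by_hash2 = defaultdict(list)
    for filename, digest in files2:
        by_hash2[digest].append(filename)

    groups1 = defaultdict(list)
    for filename, digest in files1:
        if digest in by_hash2:  # only keep files that have a counterpart
            groups1[digest].append(filename)

    # One row per distinct hash instead of one row per (file1, file2) pair.
    return [(sorted(names), sorted(by_hash2[digest]))
            for digest, names in groups1.items()]
```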

Helmut Grohne [Sun, 14 Apr 2013 08:31:55 +0000 (10:31 +0200)]
webapp: added some useful notes

Helmut Grohne [Sat, 13 Apr 2013 07:59:45 +0000 (09:59 +0200)]
base.html: add link to wiki.debian.org

Helmut Grohne [Mon, 8 Apr 2013 12:41:23 +0000 (14:41 +0200)]
README: improve query after schemachange

Helmut Grohne [Tue, 26 Mar 2013 15:23:37 +0000 (16:23 +0100)]
webapp: fix problem from the previous merge

Helmut Grohne [Tue, 26 Mar 2013 14:59:48 +0000 (15:59 +0100)]
Merge branch schemachange

Helmut Grohne [Wed, 20 Mar 2013 18:19:50 +0000 (19:19 +0100)]
webapp: report correct sizes

Helmut Grohne [Wed, 20 Mar 2013 18:12:25 +0000 (19:12 +0100)]
webapp: remove broken assert

Fails on long inputs.

Helmut Grohne [Mon, 18 Mar 2013 15:51:17 +0000 (16:51 +0100)]
dedup.image: mask errors from PIL

Helmut Grohne [Tue, 12 Mar 2013 07:38:57 +0000 (08:38 +0100)]
dedup.arreader: missing bytes marker

Helmut Grohne [Tue, 12 Mar 2013 07:24:49 +0000 (08:24 +0100)]
move ArReader from importpkg to dedup.arreader

Also document it.
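
Below is a minimal sketch of a sequential reader for the `ar` format used by `.deb` files. It assumes the common layout: an 8-byte global magic, 60-byte member headers, and data padded to an even length. The class name follows the commit. The methods and the `build_member` test helper are hypothetical, and this is not the code in `dedup.arreader`.

```python
import io

class ArReader:
    """Sequentially read members of an ar archive from a file-like object."""

    global_magic = b"!<arch>\n"
    file_magic = b"`\n"

    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.remaining = 0  # unread data bytes of the current member
        self.padding = 0    # ar pads odd-sized members with one byte
        if fileobj.read(8) != self.global_magic:
            raise ValueError("not an ar archive")

    def skip_current_entry(self):
        """Discard whatever is left of the current member, including padding."""
        self.fileobj.read(self.remaining + self.padding)
        self.remaining = self.padding = 0

    def read_entry(self):
        """Advance to the next member; return its name, or None at end of archive."""
        self.skip_current_entry()
        header = self.fileobj.read(60)
        if not header:
            return None
        if len(header) != 60 or header[58:60] != self.file_magic:
            raise ValueError("corrupt ar member header")
        # name[0:16] mtime[16:28] uid[28:34] gid[34:40] mode[40:48] size[48:58]
        name = header[0:16].decode("ascii").rstrip(" /")
        self.remaining = int(header[48:58].decode("ascii").strip())
        self.padding = self.remaining % 2
        return name

    def read(self, n=None):
        """Read up to n bytes (default: the rest) of the current member."""
        if n is None or n > self.remaining:
            n = self.remaining
        data = self.fileobj.read(n)
        self.remaining -= len(data)
        return data

def build_member(name, data):
    """Build one ar member (header, data, padding); hypothetical test helper."""
    header = b"%-16s%-12s%-6s%-6s%-8s%-10s`\n" % (
        name, b"0", b"0", b"0", b"100644", str(len(data)).encode("ascii"))
    return header + data + (b"\n" if len(data) % 2 else b"")

# Example: list the members of a tiny in-memory archive.
archive = io.BytesIO(b"!<arch>\n" + build_member(b"debian-binary", b"2.0\n")
                     + build_member(b"control.tar.gz", b"abc"))
reader = ArReader(archive)
names = []
while True:
    name = reader.read_entry()
    if name is None:
        break
    names.append(name)
```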

Helmut Grohne [Sun, 10 Mar 2013 06:38:22 +0000 (07:38 +0100)]
README: update queries to match content table split

Helmut Grohne [Sat, 9 Mar 2013 17:43:47 +0000 (18:43 +0100)]
split content table to a hash table

In the old content table, the (package, filename, size) triple was
repeated for every hash function. Now the schema represents that each
file has precisely one size, but multiple hashes.
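
The split might look roughly like this in SQLite. The table and column names (`content`, `hash`, `cid`, `function`) are assumptions for illustration and are not taken from `schema.sql`. Each file's size is stored once in `content`, and each hash function contributes one row in `hash`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per file: package, filename and size are stored exactly once.
    CREATE TABLE content (
        id INTEGER PRIMARY KEY,
        package TEXT NOT NULL,
        filename TEXT NOT NULL,
        size INTEGER NOT NULL
    );
    -- One row per (file, hash function).
    CREATE TABLE hash (
        cid INTEGER NOT NULL REFERENCES content(id),
        function TEXT NOT NULL,
        hash TEXT NOT NULL
    );
""")

cid = conn.execute(
    "INSERT INTO content (package, filename, size) VALUES (?, ?, ?)",
    ("foo", "/usr/share/doc/foo/changelog.gz", 1234),
).lastrowid
# Dummy digests stand in for real hex values.
conn.executemany(
    "INSERT INTO hash (cid, function, hash) VALUES (?, ?, ?)",
    [(cid, "sha512", "aa11"), (cid, "gzip_sha512", "bb22")],
)
```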

Helmut Grohne [Sat, 9 Mar 2013 17:37:24 +0000 (18:37 +0100)]
webapp: drop unused function compute_sharedstats

The sharing table works great and I don't want to adapt it for the next
step in the schema change.

Helmut Grohne [Thu, 7 Mar 2013 08:05:48 +0000 (09:05 +0100)]
use "ON DELETE CASCADE" clauses

Helmut Grohne [Thu, 7 Mar 2013 07:43:15 +0000 (08:43 +0100)]
enable enforcing foreign keys
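
Assuming SQLite, the two commits above combine as follows. By default, SQLite does not enforce foreign keys; the `foreign_keys` pragma turns enforcement on per connection. With enforcement on, `ON DELETE CASCADE` removes dependent rows automatically. The table layout here is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Foreign keys are not enforced unless this is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE content (
        id INTEGER PRIMARY KEY,
        pid INTEGER NOT NULL REFERENCES package(id) ON DELETE CASCADE,
        filename TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO package (id, name) VALUES (1, 'foo')")
conn.execute("INSERT INTO content (pid, filename) VALUES (1, '/usr/bin/foo')")

# Deleting the package removes its content rows via ON DELETE CASCADE.
conn.execute("DELETE FROM package WHERE id = 1")
remaining = conn.execute("SELECT count(*) FROM content").fetchone()[0]
```

With enforcement on, inserting a row that references a missing package raises `sqlite3.IntegrityError`. This is why a column such as the dependency table's "required" must stay a plain column without `REFERENCES` when it may name untracked or virtual packages.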

Helmut Grohne [Thu, 7 Mar 2013 07:41:35 +0000 (08:41 +0100)]
schema.sql: remove unsatisfiable foreign key

In the dependency table we will insert dependencies on packages which
are not tracked. This happens during initial import and for virtual
packages. Therefore the "required" column cannot be a foreign key.

Helmut Grohne [Thu, 7 Mar 2013 07:28:56 +0000 (08:28 +0100)]
schema.sql: annotate foreign keys of sharing

Helmut Grohne [Thu, 7 Mar 2013 07:24:44 +0000 (08:24 +0100)]
integrate the source table into the package table

Helmut Grohne [Thu, 7 Mar 2013 07:12:01 +0000 (08:12 +0100)]
README: explain queries

Helmut Grohne [Wed, 6 Mar 2013 14:36:49 +0000 (15:36 +0100)]
README: added interesting query

Helmut Grohne [Tue, 5 Mar 2013 07:39:06 +0000 (08:39 +0100)]
webapp: added /source/<pkg> page

Helmut Grohne [Tue, 5 Mar 2013 07:38:39 +0000 (08:38 +0100)]
webapp: helper function function_combination

Helmut Grohne [Tue, 5 Mar 2013 07:21:13 +0000 (08:21 +0100)]
importpkg: source header may contain a version
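
In a binary package's control data, the `Source` field is either just the source package name or `name (version)`. The second form appears when the source version differs from the binary version, for example after a binNMU. The following is a minimal parsing sketch; the function name is hypothetical.

```python
import re

# "Source: name" or "Source: name (version)".
SOURCE_FIELD = re.compile(r"^([^\s(]+)(?:\s*\(([^)]+)\))?$")

def parse_source_field(value):
    """Split a Source control field into (source_name, source_version_or_None)."""
    match = SOURCE_FIELD.match(value.strip())
    if match is None:
        raise ValueError("malformed Source field: %r" % value)
    name, version = match.groups()
    return name, version
```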

Helmut Grohne [Mon, 4 Mar 2013 17:53:23 +0000 (18:53 +0100)]
webapp: fix index template

Apparently not all browsers understand <a ... /> in all rendering modes.

Helmut Grohne [Mon, 4 Mar 2013 17:49:54 +0000 (18:49 +0100)]
webapp: use caching table "shared" for /binary page

Helmut Grohne [Mon, 4 Mar 2013 12:49:22 +0000 (13:49 +0100)]
webapp: generate /comparison pages in constant-space

Helmut Grohne [Mon, 4 Mar 2013 10:44:24 +0000 (11:44 +0100)]
importpkg: record the source package relationship

Helmut Grohne [Sat, 2 Mar 2013 21:33:39 +0000 (22:33 +0100)]
update_sharing: wrong database name

Helmut Grohne [Sat, 2 Mar 2013 21:29:04 +0000 (22:29 +0100)]
add sharing table

The sharing table is a cache for the /binary web pages. It essentially
contains the numbers presented. This caching table is not automatically
populated. It needs to be reconstructed after every (group of) package
imports.
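
Rebuilding such a cache after a batch of imports could look like the sketch below. It assumes SQLite and a simplified, hypothetical schema: `content(package, filename, size, hash)` and `sharing(package1, package2, files, size)`. The real tables differ. Deduplicating the right-hand side by (package, hash) makes each file in package1 count once per other package that contains the same hash.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE content (package TEXT, filename TEXT, size INTEGER, hash TEXT);
    CREATE TABLE sharing (package1 TEXT, package2 TEXT, files INTEGER, size INTEGER);
"""

def rebuild_sharing(conn):
    """Recompute the sharing cache from scratch (run after a batch of imports)."""
    with conn:  # one transaction, committed only if both statements succeed
        conn.execute("DELETE FROM sharing")
        conn.execute("""
            INSERT INTO sharing (package1, package2, files, size)
            SELECT a.package, b.package, count(*), sum(a.size)
            FROM content AS a
            JOIN (SELECT DISTINCT package, hash FROM content) AS b
              ON a.hash = b.hash AND a.package != b.package
            GROUP BY a.package, b.package
        """)
```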

Helmut Grohne [Sat, 2 Mar 2013 20:46:47 +0000 (21:46 +0100)]
update README

 * Tell about schema.sql.
 * Explain WAL.

Helmut Grohne [Sat, 2 Mar 2013 20:24:18 +0000 (21:24 +0100)]
move fetchiter from webapp to dedup.utils
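
The following sketch shows what a `fetchiter` helper typically does. It uses only the standard DB-API `fetchmany()` call, so it replaces `fetchall()` without holding the entire result set in memory. The signature and batch size are assumptions, not the code in `dedup.utils`.

```python
import sqlite3

def fetchiter(cursor, size=1000):
    """Iterate over a DB-API cursor's result rows in batches of `size`.

    Unlike cursor.fetchall(), only one batch is held in memory at a time.
    """
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        yield from rows

# Example with an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (n INTEGER)")
conn.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(5)])
cursor = conn.execute("SELECT n FROM numbers ORDER BY n")
values = [row[0] for row in fetchiter(cursor, size=2)]
```

With `sqlite3`, a cursor is already iterable. A helper like this mainly makes the batching explicit and keeps callers independent of a particular driver.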

Helmut Grohne [Sat, 2 Mar 2013 20:18:14 +0000 (21:18 +0100)]
move sql schema to a separate file

Helmut Grohne [Sat, 2 Mar 2013 10:25:53 +0000 (11:25 +0100)]
added html form to main page

Thanks to Jan Luehr for doing the work.

Helmut Grohne [Mon, 25 Feb 2013 10:56:09 +0000 (11:56 +0100)]
webapp: open database cursor lazily

Makes things more correct when using Application in a multiprocessing
context.
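
The following is a minimal sketch of opening a cursor lazily. The `Application` name comes from the commits, but the constructor argument and the `cursor` property are hypothetical. No cursor exists until the application first needs one, for example after worker processes have been started.

```python
import sqlite3

class Application:
    """Hypothetical web application that is handed a database connection."""

    def __init__(self, db):
        self.db = db
        self._cursor = None  # not created until first use

    @property
    def cursor(self):
        # Create the cursor on first access, i.e. in whichever process
        # actually handles a request, then reuse it.
        if self._cursor is None:
            self._cursor = self.db.cursor()
        return self._cursor
```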

Helmut Grohne [Mon, 25 Feb 2013 10:52:05 +0000 (11:52 +0100)]
webapp: pass database to Application class

Helmut Grohne [Mon, 25 Feb 2013 10:49:27 +0000 (11:49 +0100)]
README: another interesting query

Helmut Grohne [Mon, 25 Feb 2013 09:00:50 +0000 (10:00 +0100)]
Merge branch 'crosshash'

Conflicts in webapp.py:
 * The fetchall -> fetchiter change caused big conflicts.
 * New hash combination (image_sha512, image_sha512) added.

Helmut Grohne [Mon, 25 Feb 2013 08:55:35 +0000 (09:55 +0100)]
webapp: complete cross hash support

Helmut Grohne [Mon, 25 Feb 2013 07:55:53 +0000 (08:55 +0100)]
autoimport: this is not how foreign key constraints work

Helmut Grohne [Sun, 24 Feb 2013 00:03:30 +0000 (01:03 +0100)]
hash image contents

Helmut Grohne [Sun, 24 Feb 2013 00:02:38 +0000 (01:02 +0100)]
README: fix mistake

Helmut Grohne [Sat, 23 Feb 2013 08:53:33 +0000 (09:53 +0100)]
importpkg: ignore filenames with encoding errors
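
Tar members in a package carry their names as raw bytes. The following minimal sketch skips names that do not decode, under the assumption that names are expected to be UTF-8. The helper name is hypothetical.

```python
def decode_filenames(raw_names, encoding="utf-8"):
    """Yield decoded filenames, silently skipping names that fail to decode."""
    for raw in raw_names:
        try:
            yield raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # e.g. a Latin-1 encoded name in a package assumed to use UTF-8
```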

Helmut Grohne [Sat, 23 Feb 2013 08:36:15 +0000 (09:36 +0100)]
autoimport: log which packages are dropped

Helmut Grohne [Fri, 22 Feb 2013 18:59:00 +0000 (19:59 +0100)]
autoimport: fix version check to actually work

Don't fail on new packages, and skip versions that were already processed
instead of importing them again.

Helmut Grohne [Fri, 22 Feb 2013 18:55:31 +0000 (19:55 +0100)]
autoimport: skip old versions entirely

Presumably this is responsible for the blocking curl processes, since
importpkg will terminate early when processing an old version.

Helmut Grohne [Fri, 22 Feb 2013 17:33:22 +0000 (18:33 +0100)]
webapp: add caching headers

Helmut Grohne [Fri, 22 Feb 2013 17:21:44 +0000 (18:21 +0100)]
webapp: stream responses

Maybe this gets memory usage down for large responses.
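
In WSGI, a response is streamed by returning an iterable, such as a generator, instead of one large byte string. The server then sends each chunk as it is produced. Below is a plain-WSGI sketch. The route and markup are made up, and Jinja2's `Template.generate()` iterator could be encoded and yielded the same way.

```python
def application(environ, start_response):
    """Plain-WSGI sketch: return a generator so the page is sent in chunks."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])

    def body():
        yield b"<ul>\n"
        for number in range(3):  # stands in for iterating over database rows
            yield ("<li>%d</li>\n" % number).encode("utf-8")
        yield b"</ul>\n"

    return body()
```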

Helmut Grohne [Fri, 22 Feb 2013 16:47:14 +0000 (17:47 +0100)]
webapp: attempt to reduce memory usage

Helmut Grohne [Fri, 22 Feb 2013 13:12:33 +0000 (14:12 +0100)]
webapp: support matching sha512 against gzip_sha512

This covers only the /binary page. The comparison may still be empty.
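
Judging by its name, `gzip_sha512` is assumed here to be the SHA-512 of a gzip file's decompressed payload. Under that assumption, a compressed file and an uncompressed copy of the same data get equal values, which is what makes the cross-match possible. The helper names are hypothetical.

```python
import gzip
import hashlib

def sha512_hex(data):
    """Hash the bytes as stored."""
    return hashlib.sha512(data).hexdigest()

def gzip_sha512_hex(data):
    """Hash the decompressed payload of gzip data (assumed meaning of gzip_sha512)."""
    return hashlib.sha512(gzip.decompress(data)).hexdigest()
```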

Helmut Grohne [Fri, 22 Feb 2013 06:24:05 +0000 (07:24 +0100)]
autoimport: first wait on the import

Otherwise the import zombifies and curl blocks.
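
The following sketch shows a `producer | consumer` pipeline, such as curl feeding an import, built with `subprocess`. It waits on the consumer first and then on the producer. The parent also closes its copy of the pipe so the producer gets an error instead of blocking if the consumer exits early. The example commands are stand-ins, not autoimport's actual invocation.

```python
import subprocess
import sys

# Stand-ins for something like ["curl", "-s", url] and an importpkg.py invocation.
EXAMPLE_PRODUCER = [sys.executable, "-c", "print('x' * 10)"]
EXAMPLE_CONSUMER = [sys.executable, "-c",
                    "import sys; print(len(sys.stdin.read().strip()))"]

def run_pipeline(producer_cmd, consumer_cmd):
    """Run `producer | consumer`, waiting on the consumer first."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    # Drop our copy of the pipe so the producer sees EPIPE/SIGPIPE if the
    # consumer exits without reading everything.
    producer.stdout.close()
    output, _ = consumer.communicate()  # reap the consumer first
    producer.wait()                     # then the producer, which can now finish
    return consumer.returncode, output
```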

Helmut Grohne [Thu, 21 Feb 2013 16:33:27 +0000 (17:33 +0100)]
move compression functions to module dedup.compression

Helmut Grohne [Thu, 21 Feb 2013 16:33:06 +0000 (17:33 +0100)]
do not track byte-compiled Python files

Helmut Grohne [Thu, 21 Feb 2013 16:10:54 +0000 (17:10 +0100)]
move hashing functions to module dedup.hashing

Helmut Grohne [Thu, 21 Feb 2013 14:31:51 +0000 (15:31 +0100)]
include maintainer information

Helmut Grohne [Thu, 21 Feb 2013 14:26:24 +0000 (15:26 +0100)]
added a base template to the webapp

Helmut Grohne [Thu, 21 Feb 2013 13:35:05 +0000 (14:35 +0100)]
added useful links to webapp

Helmut Grohne [Thu, 21 Feb 2013 07:53:06 +0000 (08:53 +0100)]
added README

Helmut Grohne [Thu, 21 Feb 2013 07:42:44 +0000 (08:42 +0100)]
rename test.py to importpkg.py

Helmut Grohne [Thu, 21 Feb 2013 07:41:56 +0000 (08:41 +0100)]
license as BSD-3

Helmut Grohne [Wed, 20 Feb 2013 21:33:54 +0000 (22:33 +0100)]
fix comparison of conflicting packages

Helmut Grohne [Wed, 20 Feb 2013 20:37:57 +0000 (21:37 +0100)]
reduce memory usage of autoimport

Helmut Grohne [Wed, 20 Feb 2013 20:24:12 +0000 (21:24 +0100)]
fix links in index

Helmut Grohne [Wed, 20 Feb 2013 20:12:58 +0000 (21:12 +0100)]
minimal index page explaining stuff

Helmut Grohne [Wed, 20 Feb 2013 18:04:18 +0000 (19:04 +0100)]
implement autoimport

Helmut Grohne [Wed, 20 Feb 2013 16:14:48 +0000 (17:14 +0100)]
mark required packages in binary view

Helmut Grohne [Wed, 20 Feb 2013 15:52:17 +0000 (16:52 +0100)]
store hard dependencies
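
The following sketch extracts "hard" dependencies from the `Pre-Depends` and `Depends` fields. It drops version constraints and architecture qualifiers such as `:any`. Skipping clauses that contain alternatives (`a | b`) is an assumption of this sketch, on the grounds that no single package in them is strictly required. The function name is hypothetical.

```python
import re

# A package name at the start of a relation, before any version or arch qualifier.
PACKAGE_NAME = re.compile(r"^\s*([a-z0-9][a-z0-9+.-]*)")

def hard_dependencies(fields):
    """Collect package names from Pre-Depends and Depends.

    Assumption: clauses with alternatives ("a | b") are skipped.
    """
    required = set()
    for key in ("Pre-Depends", "Depends"):
        for clause in fields.get(key, "").split(","):
            if "|" in clause:
                continue
            match = PACKAGE_NAME.match(clause)
            if match:
                required.add(match.group(1))
    return required
```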

Helmut Grohne [Wed, 20 Feb 2013 15:41:30 +0000 (16:41 +0100)]
determine metadata from control.tar.gz
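
Below is a minimal sketch of pulling package metadata out of the `control.tar.gz` member of a `.deb` using only the standard library. The field parser is deliberately simple. It handles one stanza and appends continuation lines, so it is not a full deb822 implementation, and both function names are hypothetical.

```python
import io
import posixpath
import tarfile

def parse_control(text):
    """Parse a single control stanza into a dict; continuation lines are appended."""
    fields = {}
    key = None
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and key is not None:
            fields[key] += "\n" + line
        elif ":" in line:
            key, value = line.split(":", 1)
            fields[key] = value.strip()
    return fields

def read_control(control_tar_gz):
    """Return the fields of the control file inside control.tar.gz bytes."""
    with tarfile.open(fileobj=io.BytesIO(control_tar_gz), mode="r:gz") as tar:
        for member in tar:
            # Members are usually named "./control"; normpath also accepts "control".
            if member.isfile() and posixpath.normpath(member.name) == "control":
                return parse_control(tar.extractfile(member).read().decode("utf-8"))
    raise ValueError("control.tar.gz has no control file")
```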

Helmut Grohne [Wed, 20 Feb 2013 14:55:05 +0000 (15:55 +0100)]
teach ArReader to read multiple entries

Helmut Grohne [Wed, 20 Feb 2013 14:39:33 +0000 (15:39 +0100)]
cleanup

Helmut Grohne [Wed, 20 Feb 2013 14:28:04 +0000 (15:28 +0100)]
many improvements

 * multiple hashes
 * template engine
 * new table package
 * comparison view
 * hashvalue view

Helmut Grohne [Wed, 20 Feb 2013 14:27:40 +0000 (15:27 +0100)]
first prototype