DecompressedStream: improve performance
author Helmut Grohne <helmut@subdivi.de>
Thu, 30 Dec 2021 16:52:38 +0000 (17:52 +0100)
committer Helmut Grohne <helmut@subdivi.de>
Thu, 30 Dec 2021 16:52:38 +0000 (17:52 +0100)
When the decompression ratio is huge, we may be faced with a large
(multiple megabytes) bytes object. Slicing that object incurs a copy, so
repeatedly trimming it becomes O(n^2) overall, while appending to and
trimming a bytearray is much faster.
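
For context, a minimal standalone sketch of the pattern being replaced
versus the one being adopted (the names CHUNK, N, drain_bytes and
drain_bytearray are illustrative, not part of this commit). Draining an
n-byte bytes object in fixed-size chunks copies the remaining tail on
every slice, so the total work is O(n^2); deleting a leading slice of a
bytearray is cheap on CPython, which advances an internal offset instead
of copying the tail.

    import time

    CHUNK = 4096
    N = 8 * 1024 * 1024  # 8 MiB of pretend decompressed data

    def drain_bytes(data):
        # Old pattern: each reslice copies the remaining tail -> O(n^2) total.
        buff = bytes(data)
        while buff:
            chunk, buff = buff[:CHUNK], buff[CHUNK:]

    def drain_bytearray(data):
        # New pattern: deleting the leading slice of a bytearray is
        # cheap on CPython, so draining stays roughly linear.
        buff = bytearray(data)
        while buff:
            chunk = bytes(buff[:CHUNK])
            buff[:CHUNK] = b""

    for fn in (drain_bytes, drain_bytearray):
        start = time.perf_counter()
        fn(b"\0" * N)
        print(fn.__name__, time.perf_counter() - start)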

dedup/compression.py

index 6d361ac..da6e9a0 100644
@@ -101,7 +101,7 @@ class DecompressedStream:
         """
         self.fileobj = fileobj
         self.decompressor = decompressor
-        self.buff = b""
+        self.buff = bytearray()
         self.pos = 0
 
     def _fill_buff_until(self, predicate):
@@ -116,8 +116,8 @@ class DecompressedStream:
                 break
 
     def _read_from_buff(self, length):
-        ret = self.buff[:length]
-        self.buff = self.buff[length:]
+        ret = bytes(self.buff[:length])
+        self.buff[:length] = b""
         self.pos += length
         return ret
 
@@ -164,7 +164,7 @@ class DecompressedStream:
             self.fileobj.close()
             self.fileobj = None
             self.decompressor = None
-            self.buff = b""
+            self.buff = bytearray()
 
 decompressors = {
     '.gz':   GzipDecompressor,
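
A brief usage note on the bytearray idiom above (a hypothetical snippet,
not part of the commit): slicing a bytearray yields a bytearray, so the
bytes(...) wrapper in _read_from_buff keeps the return type that callers
saw before, and assigning b"" to a leading slice deletes those bytes in
place.

    buff = bytearray(b"hello world")
    ret = bytes(buff[:5])  # copy the prefix out as immutable bytes
    buff[:5] = b""         # drop the prefix in place
    print(ret)             # b'hello'
    print(buff)            # bytearray(b' world')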