Comment gzip compresser une chaîne en Python?
gzip.GzipFile
existe, mais c'est pour les objets fichier - qu'en est-il des chaînes simples?
python
compression
gzip
Bdfy
la source
la source
StringIO
mais n'explique pas vraiment comment le faire. Donc poser cette question ici est tout à fait valable, à mon humble avis. Quelques essais supplémentaires avant de les poser et de nous en parler auraient été bien, cependant.gzip string in python
et est très raisonnable IMO. Il devrait être rouvert.Réponses:
Si vous souhaitez produire une
gzip
chaîne binaire compatible complète , avec l'en-tête, etc., vous pouvez utilisergzip.GzipFile
avecStringIO
:try: from StringIO import StringIO # Python 2.7 except ImportError: from io import StringIO # Python 3.x import gzip out = StringIO() with gzip.GzipFile(fileobj=out, mode="w") as f: f.write("This is mike number one, isn't this a lot of fun?") out.getvalue() # returns '\x1f\x8b\x08\x00\xbd\xbe\xe8N\x02\xff\x0b\xc9\xc8,V\x00\xa2\xdc\xcc\xecT\x85\xbc\xd2\xdc\xa4\xd4"\x85\xfc\xbcT\x1d\xa0X\x9ez\x89B\tH:Q!\'\xbfD!?M!\xad4\xcf\x1e\x00w\xd4\xea\xf41\x00\x00\x00'
la source
f = gzip.GzipFile(StringIO.StringIO(text)); result = f.read(); f.close(); return result
import zlib; my_string = "hello world"; my_bytes = zlib.compress(my_string.encode('utf-8')); my_hex = my_bytes.hex(); my_bytes2 = bytes.fromhex(my_hex); my_string2 = zlib.decompress(my_bytes); assert my_string == my_string2;
Le moyen le plus simple est l'
zlib
encodage :compressed_value = s.encode("zlib")
Ensuite, vous le décompressez avec:
plain_string_again = compressed_value.decode("zlib")
la source
s
c'est un objet Python 2.x de typestr
.s.encode('rot13')
,s.encode( 'base64' )
plain_string_again = compressed_value.decode("zlib")
str
en Python 3) et les chaînes d'octets (typebytes
).str
les objets ont uneencode()
méthode qui renvoie unbytes
objet et lesbytes
objets ont unedecode()
méthode qui renvoie unstr
. Lezlib
codec est spécial en ce qu'il convertit debytes
enbytes
, donc il ne rentre pas dans cette structure. Vous pouvez utilisercodecs.encode(b, "zlib")
etcodecs.decode(b, "slib")
pour unbytes
objet à lab
place.Version Python3 de la réponse de Sven Marnach en 2011:
import gzip exampleString = 'abcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijmortenpunnerudengelstadrocksklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuvabcdefghijklmnopqrstuv123' compressed_value = gzip.compress(bytes(exampleString,'utf-8')) plain_string_again = gzip.decompress(compressed_value)
la source
zlib
est toujours utilisé,gzip
utilise réellementzlib
, voir: docs.python.org/3/library/zlib.html et docs.python.org/3/library/gzip.html#module-gzipPour ceux qui souhaitent compresser un dataframe Pandas au format JSON:
Testé avec Python 3.6 et Pandas 0.23
import sys import zlib, lzma, bz2 import math def convert_size(size_bytes): if size_bytes == 0: return "0B" size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB") i = int(math.floor(math.log(size_bytes, 1024))) p = math.pow(1024, i) s = round(size_bytes / p, 2) return "%s %s" % (s, size_name[i]) dataframe = pd.read_csv('...') # your CSV file dataframe_json = dataframe.to_json(orient='split') data = dataframe_json.encode() compressed_data = bz2.compress(data) decompressed_data = bz2.decompress(compressed_data).decode() dataframe_aux = pd.read_json(decompressed_data, orient='split') #Original data size: 10982455 10.47 MB #Encoded data size: 10982439 10.47 MB #Compressed data size: 1276457 1.22 MB (lzma, slow), 2087131 1.99 MB (zlib, fast), 1410908 1.35 MB (bz2, fast) #Decompressed data size: 10982455 10.47 MB print('Original data size: ', sys.getsizeof(dataframe_json), convert_size(sys.getsizeof(dataframe_json))) print('Encoded data size: ', sys.getsizeof(data), convert_size(sys.getsizeof(data))) print('Compressed data size: ', sys.getsizeof(compressed_data), convert_size(sys.getsizeof(compressed_data))) print('Decompressed data size: ', sys.getsizeof(decompressed_data), convert_size(sys.getsizeof(decompressed_data))) print(dataframe.head()) print(dataframe_aux.head())
la source
s = "a long string of characters" g = gzip.open('gzipfilename.gz', 'w', 5) # ('filename', 'read/write mode', compression level) g.write(s) g.close()
la source