You may want to use the chunklen parameter explicitly to fine-tune your compression levels:
>>> a = np.arange(1e7)
>>> blz.barray(a)
barray((10000000,), float64) nbytes: 76.29 MB; cbytes: 2.57 MB; ratio: 29.72
cparams := cparams(clevel=5, shuffle=True)
[0.0, 1.0, 2.0, ..., 9999997.0, 9999998.0, 9999999.0]
>>> blz.barray(a).chunklen
16384 # 128 KB = 16384 * 8 is the default chunk size for this barray
>>> blz.barray(a, chunklen=512)
barray((10000000,), float64) nbytes: 76.29 MB; cbytes: 10.20 MB; ratio: 7.48
cparams := cparams(clevel=5, shuffle=True)
[0.0, 1.0, 2.0, ..., 9999997.0, 9999998.0, 9999999.0]
>>> blz.barray(a, chunklen=8*1024)
barray((10000000,), float64) nbytes: 76.29 MB; cbytes: 1.50 MB; ratio: 50.88
cparams := cparams(clevel=5, shuffle=True)
[0.0, 1.0, 2.0, ..., 9999997.0, 9999998.0, 9999999.0]
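The chunk size in bytes is simply chunklen times the itemsize of the dtype, which is where the 128 KB figure in the comment above comes from. A quick sanity check in plain Python (no blz needed; a float64 item is 8 bytes):

```python
import struct

itemsize = struct.calcsize('d')  # size of a float64 item: 8 bytes

# Default chunklen reported above: 16384 items -> 128 KB chunks
assert 16384 * itemsize == 128 * 1024

# chunklen=512 -> 4 KB chunks (too small: poorer compression, more overhead)
assert 512 * itemsize == 4 * 1024

# chunklen=8*1024 -> 64 KB chunks (the best compression in the example above)
assert 8 * 1024 * itemsize == 64 * 1024

print("all chunk sizes check out")
```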
As you can see, the chunk length greatly affects both the compression ratio and the I/O performance of barrays.
In general, however, it is safer (and quicker!) to use the expectedlen parameter (see next section).
If you are going to add many rows to your barrays, be sure to use the expectedlen parameter at creation time to inform the constructor of the expected length of your final barray; this lets barray fine-tune the length of its chunks more effectively. For example:
>>> a = np.arange(1e7)
>>> blz.barray(a, expectedlen=10).chunklen
512
>>> blz.barray(a, expectedlen=10*1000).chunklen
4096
>>> blz.barray(a, expectedlen=10*1000*1000).chunklen
16384
>>> blz.barray(a, expectedlen=10*1000*1000*1000).chunklen
131072
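These figures show that barray scales the chunk length with the expected final size, so larger arrays get larger chunks. The exact heuristic is internal to barray; the sketch below merely restates the doctest output above as chunk sizes in bytes:

```python
import struct

itemsize = struct.calcsize('d')  # a float64 item is 8 bytes

# expectedlen -> chunklen, taken from the doctest output above
observed = {
    10: 512,
    10 * 1000: 4096,
    10 * 1000 * 1000: 16384,
    10 * 1000 * 1000 * 1000: 131072,
}

for expectedlen, chunklen in observed.items():
    kb = chunklen * itemsize // 1024
    print(f"expectedlen={expectedlen:>14}: chunklen={chunklen:>6} ({kb} KB chunks)")
```

Note how the chunk size grows from 4 KB up to 1 MB as the expected length increases by a factor of a billion.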