Francesc Alted
I am a curious person who studied Physics and Math when I was young. Over the years, I developed a passion for handling large datasets and for using compression to enable their analysis on regular hardware that is accessible to everyone.
I lead the Blosc Development Team, and I am currently interested in determining, ahead of time, which combinations of codecs and filters can provide a personalized compression experience. This way, users can choose whether they prefer a higher compression ratio, faster compression speed, or a balance between both.
Last but not least, I have recently been awarded the "2023 Project Sustainability Award" from NumFOCUS.
You can learn more about what I am working on by reading my latest blog posts.
Sessions
Data compression is not a one-codec-fits-all problem. It necessarily involves a trade-off between compression ratio and speed: a higher compression ratio usually comes at the cost of slower compression. Depending on their needs, users may want to prioritize one over the other. The issue is that finding the optimal compression parameters can be slow because of the large number of possible combinations (codec, compression level, filter, split mode, number of threads, etc.), and it may require a significant amount of manual trial and error.
Btune (https://btune.blosc.org) is a dynamic plugin for Blosc2 that can help find the optimal combination of compression parameters for datasets compressed with Blosc2 (https://github.com/Blosc/c-blosc2, https://github.com/Blosc/python-blosc2), while significantly speeding up this process.
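To make the trade-off concrete, here is a minimal sketch of the manual trial-and-error search described above. It does not use Blosc2 or Btune: the codecs are stand-ins from the Python standard library (zlib, bz2, lzma), and the names `make_sample`, `scan`, `pick` and the `tradeoff` parameter are hypothetical, introduced only for illustration.

```python
import bz2
import lzma
import time
import zlib


def make_sample(n_values=50_000):
    # Hypothetical sample data: slowly varying 4-byte little-endian integers.
    return b"".join((i // 7).to_bytes(4, "little") for i in range(n_values))


# Stand-in codecs from the standard library (not the codecs shipped with Blosc2).
CODECS = {
    "zlib": lambda data, level: zlib.compress(data, level),
    "bz2": lambda data, level: bz2.compress(data, level),
    "lzma": lambda data, level: lzma.compress(data, preset=level),
}


def scan(data, levels=(1, 3, 6)):
    """Brute-force every (codec, level) pair, recording ratio and time."""
    results = []
    for name, compress in CODECS.items():
        for level in levels:
            start = time.perf_counter()
            packed = compress(data, level)
            # Guard against a zero reading on very coarse timers.
            elapsed = max(time.perf_counter() - start, 1e-9)
            results.append({
                "codec": name,
                "level": level,
                "ratio": len(data) / len(packed),
                "seconds": elapsed,
            })
    return results


def pick(results, tradeoff):
    """Select a combination; tradeoff=0.0 favors speed, 1.0 favors ratio."""
    max_ratio = max(r["ratio"] for r in results)
    min_time = min(r["seconds"] for r in results)

    def score(r):
        # Both terms are normalized to (0, 1] before being weighted.
        return (tradeoff * r["ratio"] / max_ratio
                + (1.0 - tradeoff) * min_time / r["seconds"])

    return max(results, key=score)


if __name__ == "__main__":
    measured = scan(make_sample())
    for t in (0.0, 0.5, 1.0):
        best = pick(measured, t)
        print(t, best["codec"], best["level"], round(best["ratio"], 1))
```

Even this toy search compresses the data once per combination, and adding filters, split modes, and thread counts multiplies the number of runs. That cost is what makes exhaustive scanning slow on real datasets.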
N-dimensional datasets are pervasive in many scientific areas, and getting quick slices of them is critical for an improved exploration experience. Blosc2 is a compression and format library that recently gained support for dealing with such multidimensional datasets. Crucially, by leveraging compression, Blosc2 can deal with sparse datasets effectively: the zeroed parts are almost entirely suppressed, whereas the non-zero parts can still be stored in less space than their uncompressed counterparts. In addition, the new double data partition inside Blosc2 minimizes the decompression of unnecessary data and provides top-class slicing speed.
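The idea of partitioning data so that a slice only decompresses what it touches can be sketched as follows. This is a toy, single-level version and not the Blosc2 format or API: it splits a 2D grid of bytes into tiles, compresses each tile independently with zlib from the standard library, and counts how many tiles a read has to decompress. The class `ChunkedGrid` and the helper `make_sparse_rows` are hypothetical names. Blosc2 itself partitions twice (chunks, then blocks inside each chunk), which this sketch does not attempt to reproduce.

```python
import zlib


def make_sparse_rows(n=64):
    # Hypothetical sparse data: an n x n grid of zeros with a small non-zero patch.
    rows = [[0] * n for _ in range(n)]
    for r in range(18, 22):
        for c in range(38, 42):
            rows[r][c] = (r * c) % 251 + 1
    return rows


class ChunkedGrid:
    """Toy 2D byte array stored as independently compressed tiles."""

    def __init__(self, rows, chunk_rows, chunk_cols):
        # rows: list of equal-length lists of ints in the range 0..255.
        self.shape = (len(rows), len(rows[0]))
        self.chunk_shape = (chunk_rows, chunk_cols)
        self.chunks = {}
        self.decompressed = 0  # number of tiles decompressed so far
        for r0 in range(0, self.shape[0], chunk_rows):
            for c0 in range(0, self.shape[1], chunk_cols):
                tile = bytes(
                    v
                    for row in rows[r0:r0 + chunk_rows]
                    for v in row[c0:c0 + chunk_cols]
                )
                key = (r0 // chunk_rows, c0 // chunk_cols)
                self.chunks[key] = zlib.compress(tile)

    def compressed_size(self):
        return sum(len(c) for c in self.chunks.values())

    def read(self, r_start, r_stop, c_start, c_stop):
        """Return rows[r_start:r_stop][c_start:c_stop]; bounds must be in range."""
        cr, cc = self.chunk_shape
        out = [[0] * (c_stop - c_start) for _ in range(r_stop - r_start)]
        # Visit only the tiles that intersect the requested window.
        for ci in range(r_start // cr, (r_stop - 1) // cr + 1):
            for cj in range(c_start // cc, (c_stop - 1) // cc + 1):
                tile = zlib.decompress(self.chunks[(ci, cj)])
                self.decompressed += 1
                r0, c0 = ci * cr, cj * cc
                tile_cols = min(cc, self.shape[1] - c0)
                for r in range(max(r0, r_start), min(r0 + cr, r_stop)):
                    for c in range(max(c0, c_start), min(c0 + cc, c_stop)):
                        value = tile[(r - r0) * tile_cols + (c - c0)]
                        out[r - r_start][c - c_start] = value
        return out


if __name__ == "__main__":
    rows = make_sparse_rows()
    grid = ChunkedGrid(rows, 16, 16)
    print("stored bytes:", grid.compressed_size(), "of", 64 * 64)
    grid.read(18, 22, 38, 42)
    print("tiles decompressed:", grid.decompressed, "of", len(grid.chunks))
```

In this sketch the all-zero tiles shrink to a handful of bytes each, and a window that fits inside one tile decompresses only that tile. A second, finer partition inside each tile would reduce the decompressed volume further for small windows.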