On disk compression

David Tinker

On disk compression

I have enabled compression (pageSize=16384, diskPageCompression=ZSTD, diskPageCompressionLevel=18), but the partition files don't appear to be very compressed. I tested by adding approximately 16,000 data items to my cache and looking at the partition files on disk.

Example: part-96.bin is 339M in size. If I compress that file with zstd (default settings) it goes down to 106M.

Is it possible to do better than this with Ignite? I need to be able to store a lot of data.

Thanks
David

Relevant parts of my Ignite config:

    <bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="consistentId" value=""/>

        <property name="dataStorageConfiguration">
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
                <property name="pageSize" value="16384"/>
...
            </bean>
        </property>

        <property name="cacheConfiguration">
            <bean class="org.apache.ignite.configuration.CacheConfiguration">
                <property name="name" value="activity-stream-data"/>
                <property name="atomicityMode" value="ATOMIC"/>
                <property name="diskPageCompression" value="ZSTD"/>
                <property name="diskPageCompressionLevel" value="18"/>
                <property name="backups" value="1"/>
            </bean>
        </property>
    </bean>

Alex Plehanov

Re: On disk compression

Hello,

Ignite compresses each page individually, so compressing the whole file will always give a better result than compressing each page on its own. Moreover, Ignite stores a page in compressed form only if compression shrinks it by at least one filesystem block. For example, with a 4 KB filesystem block size and a 16 KB page size, a page that compresses to 13 KB will be stored uncompressed.
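
To make the rule concrete, here is a worked version of it. It assumes that the disk space a page occupies is its compressed size rounded up to whole filesystem blocks, with the rest of the page left as a sparse-file hole. The symbols are introduced here for illustration only: P is the page size, B the filesystem block size, c the compressed size of one page, and s(c) the space the page ends up occupying (all in KB).

$$
s(c) =
\begin{cases}
  \lceil c / B \rceil \cdot B & \text{if } \lceil c / B \rceil \cdot B \le P - B \quad \text{(stored compressed)} \\
  P & \text{otherwise} \quad \text{(stored uncompressed)}
\end{cases}
$$

$$
P = 16,\ B = 4:\qquad
c = 13 \Rightarrow \lceil 13/4 \rceil \cdot 4 = 16 > 12 \Rightarrow s = 16,
\qquad
c = 5 \Rightarrow \lceil 5/4 \rceil \cdot 4 = 8 \le 12 \Rightarrow s = 8.
$$

Under this model a page never occupies less than one block, so with these settings per-page compression can save at most a factor of P/B = 16/4 = 4, whatever ZSTD level is used. Whole-file compression has no such per-page floor.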

BTW, how do you check the file size? Ignite compression uses sparse files. "ls -l" reports the apparent (logical) file size and ignores the "holes" in a sparse file. To see the real amount of disk space occupied by the file, use "du" or "ls -s".

In reply to David Tinker, Tue, 17 Nov 2020 at 06:18.

David Tinker

Re: On disk compression

Aha! I didn't know about the sparse file thing. Thanks!

# ll -hs
159M -rw-r--r-- 1 ignite ignite 339M Nov 16 21:32 part-96.bin

So the real space used is only 159M. That's great. I currently have all of this data stored on the filesystem in csv.gz files, using 177M of space for the 16,000 items I tested with.
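
Putting the numbers reported in this thread for part-96.bin side by side (339M apparent size from "ls -l", 159M allocated from "ls -s", 106M after compressing the whole file with zstd at default settings):

$$
\frac{159}{339} \approx 0.47, \qquad \frac{106}{339} \approx 0.31
$$

In other words, Ignite's per-page compression keeps this partition file at roughly 47% of its apparent size, while whole-file zstd gets it to roughly 31%.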

Any other tips on how to reduce disk usage? Is there any point in using a ZSTD compression level higher than 18? Most of this data will be written only once, so I am not too concerned about write speed.

In reply to Alex Plehanov, Tue, 17 Nov 2020 at 9:34 AM.

Alex Plehanov

Re: On disk compression

If you have a write-heavy workload, you can also reduce disk usage by compressing the WAL (see the "WAL compaction" and "WAL page snapshots compression" features).
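
A minimal sketch of turning on both WAL features in the same DataStorageConfiguration bean used above. It assumes Ignite 2.8 or later with the ignite-compress module on the classpath (WAL page snapshot compression depends on it), and that the property names match the DataStorageConfiguration setters in your version (walCompactionEnabled, walPageCompression, walPageCompressionLevel); please check them against the Javadoc for the release you run. The level of 18 is simply carried over from the cache config in this thread, not a recommendation.

```xml
<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="pageSize" value="16384"/>
        <!-- WAL compaction: compresses archived WAL segments in the background
             once they are no longer needed for crash recovery. -->
        <property name="walCompactionEnabled" value="true"/>
        <!-- WAL page snapshots compression: compresses the full page images
             that are written to the WAL. -->
        <property name="walPageCompression" value="ZSTD"/>
        <property name="walPageCompressionLevel" value="18"/>
    </bean>
</property>
```

Both settings only shrink the WAL and its archive; the partition files are still governed by the per-cache diskPageCompression settings.
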
I'm not sure about ZSTD compression levels; you can try them. But note the warning in the ZSTD manual: "Levels >= 20 should be used with caution, as they require more memory". Perhaps someone more familiar with ZSTD can say how higher compression levels affect resource consumption during decompression.

In reply to David Tinker, Tue, 17 Nov 2020 at 11:00.