Buckets-based hashing for gentler resource utilization #60

jeromegn · 2023-09-20T15:38:17Z

Corrosion currently performs full table scans, hashing the contents of every table, so that it can compare consistency between actors.

This is an expensive operation, and it becomes more expensive as the data grows.

The new plan would be to create buckets based on primary key hashes:

- When Corrosion has written changes, hash each unique primary key down to a u64.
- Compute <hash> - (<hash> % <bucket size>) to determine the bucket ID.
- Store the ID in a table linking table name, primary key, and bucket ID.
- Queue a background job to update the bucket's hash.
- Periodically hash all the bucket hashes (or maybe update in place with XOR?) to determine the full consistency hash.
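The plan above can be sketched in memory as follows. This is a hypothetical illustration, not Corrosion code: the issue does not name a hash function, so 64-bit FNV-1a is assumed here because, unlike std's DefaultHasher, its output is stable across machines and Rust versions. `Store`, `on_write`, `rehash_dirty` and `consistency_hash` are made-up names standing in for the SQLite tables and the background job.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// 64-bit FNV-1a (assumed hash function; chosen for its stable output).
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    h
}

/// Append a length-prefixed field so concatenated fields stay unambiguous.
fn push_field(buf: &mut Vec<u8>, field: &[u8]) {
    buf.extend_from_slice(&(field.len() as u64).to_le_bytes());
    buf.extend_from_slice(field);
}

/// Bucket "id" from the issue: <hash> - (<hash> % <bucket size>),
/// i.e. the lower bound of the hash range the key falls into.
fn bucket_id(pk_hash: u64, bucket_size: u64) -> u64 {
    pk_hash - (pk_hash % bucket_size)
}

type Key = (String, Vec<u8>); // (table name, encoded primary key)

/// Hypothetical in-memory stand-in for the SQLite tables involved.
struct Store {
    bucket_size: u64,
    rows: BTreeMap<Key, Vec<u8>>,      // row contents
    membership: BTreeMap<Key, u64>,    // (table, pk) -> bucket id
    bucket_hashes: BTreeMap<u64, u64>, // bucket id -> bucket hash
    dirty: BTreeSet<u64>,              // buckets queued for the background job
}

impl Store {
    fn new(bucket_size: u64) -> Self {
        Store {
            bucket_size,
            rows: BTreeMap::new(),
            membership: BTreeMap::new(),
            bucket_hashes: BTreeMap::new(),
            dirty: BTreeSet::new(),
        }
    }

    /// After a write: hash the primary key, record its bucket, queue a rehash.
    fn on_write(&mut self, table: &str, pk: &[u8], row: &[u8]) {
        let key: Key = (table.to_string(), pk.to_vec());
        let id = bucket_id(fnv1a64(pk), self.bucket_size);
        self.rows.insert(key.clone(), row.to_vec());
        self.membership.insert(key, id);
        self.dirty.insert(id);
    }

    /// Background job: rehash only the rows of dirty buckets. BTreeMap
    /// iteration is sorted, so every node hashes rows in the same order.
    /// (A real table would index bucket id instead of filtering linearly.)
    fn rehash_dirty(&mut self) {
        for id in std::mem::take(&mut self.dirty) {
            let mut buf = Vec::new();
            for (key, _) in self.membership.iter().filter(|&(_, &b)| b == id) {
                push_field(&mut buf, key.0.as_bytes());
                push_field(&mut buf, &key.1);
                push_field(&mut buf, &self.rows[key]);
            }
            self.bucket_hashes.insert(id, fnv1a64(&buf));
        }
    }

    /// Full consistency hash: hash the (bucket id, bucket hash) pairs in order.
    fn consistency_hash(&self) -> u64 {
        let mut buf = Vec::new();
        for (id, h) in &self.bucket_hashes {
            buf.extend_from_slice(&id.to_le_bytes());
            buf.extend_from_slice(&h.to_le_bytes());
        }
        fnv1a64(&buf)
    }
}
```

With a bucket size of `1 << 60`, the u64 hash space splits into 16 buckets. A write dirties only one bucket, so the background job rehashes that bucket's rows rather than whole tables, and two nodes holding the same rows end up with the same consistency hash regardless of write order.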

This has several benefits:

- No need to scan full tables
- Should still provide a valid, consistent hash across nodes
- Can be done continuously
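One possible reading of the "in place with XOR" idea, as a hypothetical sketch: keep a running XOR of per-row hashes and apply each write as "XOR out the old row hash, XOR in the new one". Because XOR is commutative, associative and its own inverse, the result depends only on the final set of rows, which is what would let it be maintained continuously without scans while still agreeing across nodes. `XorHash`, `row_hash` and the choice of FNV-1a are assumptions, not anything Corrosion or cr-sqlite provides.

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a (assumed; any stable 64-bit hash would do).
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hash of one row, with length-prefixed fields so ("ab", "c") != ("a", "bc").
fn row_hash(table: &str, pk: &[u8], row: &[u8]) -> u64 {
    let mut buf = Vec::new();
    for field in [table.as_bytes(), pk, row] {
        buf.extend_from_slice(&(field.len() as u64).to_le_bytes());
        buf.extend_from_slice(field);
    }
    fnv1a64(&buf)
}

/// Running XOR of every live row's hash. Each write touches a single
/// entry, so no table scan is needed to keep the hash current.
#[derive(Default)]
struct XorHash {
    acc: u64,
    live: HashMap<(String, Vec<u8>), u64>, // (table, pk) -> current row hash
}

impl XorHash {
    fn upsert(&mut self, table: &str, pk: &[u8], row: &[u8]) {
        let h = row_hash(table, pk, row);
        if let Some(old) = self.live.insert((table.to_string(), pk.to_vec()), h) {
            self.acc ^= old; // XOR out the previous version of this row
        }
        self.acc ^= h; // XOR in the new version
    }

    fn delete(&mut self, table: &str, pk: &[u8]) {
        if let Some(old) = self.live.remove(&(table.to_string(), pk.to_vec())) {
            self.acc ^= old; // XOR out the deleted row
        }
    }
}
```

The trade-off is that this needs each row's previous hash on hand (kept in `live` here), and an XOR of hashes gives weaker guarantees than hashing sorted inputs, which may be why the issue leaves it as an open question.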

I believe this depends on vlcn-io/cr-sqlite#344, so that the pk <-> bucket ID table doesn't take up too much space. We'd be able to use the cr-sqlite-encoded primary keys, which are varints and therefore much smaller.
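To illustrate the size argument: an unsigned LEB128 varint stores 7 bits per byte, so small integer keys take one or two bytes instead of a fixed-width 8-byte integer. This is a generic varint shown for illustration only; it is not claimed to be the exact encoding cr-sqlite uses for its packed primary keys, and `encode_varint` / `decode_varint` are hypothetical helpers.

```rust
/// Unsigned LEB128: low 7 bits per byte, high bit set on every byte but the last.
fn encode_varint(mut v: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte); // final byte: continuation bit clear
            return;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// Decode one varint, returning (value, bytes consumed), or None if the
/// input is truncated or longer than the 10 bytes a u64 can need.
fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut v: u64 = 0;
    for (i, &b) in bytes.iter().enumerate() {
        if i >= 10 {
            return None;
        }
        v |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((v, i + 1));
        }
    }
    None
}
```

Keys below 16,384 (2^14) fit in at most two bytes, while u64::MAX needs ten, so the saving depends on keys being small.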


jeromegn · 2023-11-28T19:06:52Z

Update: this should be implemented directly in cr-sqlite, behind a feature flag or something similar.
