Possibly halve the number of writes for small blobs #10

Open
inverted-capital opened this issue Apr 4, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@inverted-capital
Contributor

For small blobs, i.e. those under the 64 KiB value limit for Deno KV, using the blob library results in two writes: a meta write and a blob write.

I wonder whether these could be combined into a single write?

The format of the meta write is roughly:

  const key = [
    ...userSuppliedKey,
    "__kv_toolbox_meta__",
  ];
  const value = { kind: "buffer", size: 92 };

However, if the meta value were always encoded into a Uint8Array, the first blob chunk could be appended to it using a length prefix or something similar, resulting in a single write for small values and one fewer write for all other sizes?
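To make the proposal concrete, here is a minimal sketch of one possible length-prefixed layout. Everything in it is an assumption for illustration only: the function names `packMetaAndChunk` and `unpackMetaAndChunk`, the 4-byte big-endian length header, and the JSON encoding of the meta are hypothetical and are not part of kv-toolbox.

```typescript
// Hypothetical layout (not kv-toolbox's format):
// [4-byte big-endian meta length][meta as UTF-8 JSON][first chunk bytes]
interface BlobMeta {
  kind: string;
  size: number;
}

// Combine the meta and the first chunk into one value for a single write.
function packMetaAndChunk(meta: BlobMeta, chunk: Uint8Array): Uint8Array {
  const metaBytes = new TextEncoder().encode(JSON.stringify(meta));
  const packed = new Uint8Array(4 + metaBytes.byteLength + chunk.byteLength);
  // DataView writes big-endian unless told otherwise.
  new DataView(packed.buffer).setUint32(0, metaBytes.byteLength);
  packed.set(metaBytes, 4);
  packed.set(chunk, 4 + metaBytes.byteLength);
  return packed;
}

// Split a packed value back into the meta and the first chunk.
function unpackMetaAndChunk(
  packed: Uint8Array,
): { meta: BlobMeta; chunk: Uint8Array } {
  const view = new DataView(
    packed.buffer,
    packed.byteOffset,
    packed.byteLength,
  );
  const metaLength = view.getUint32(0);
  const metaEnd = 4 + metaLength;
  const metaJson = new TextDecoder().decode(packed.subarray(4, metaEnd));
  const meta = JSON.parse(metaJson) as BlobMeta;
  // subarray returns a view, so the chunk bytes are not copied.
  return { meta, chunk: packed.subarray(metaEnd) };
}
```

With a layout like this, the packed value still has to fit under the per-value size limit, so the first chunk would need to be slightly smaller than the other parts to leave room for the header and meta.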

kitsonk added the "enhancement" and "declined" labels on Apr 5, 2024
@kitsonk
Owner

kitsonk commented Apr 5, 2024

That gets really complicated when supporting all of the existing features, as well as managing the logic for detecting and "viewing" the blob without decoding it. Embedding the metadata in the first "chunk" also adds complexity. The value of each "part" of the blob is a binary chunk of the blob itself, which means the parts can be read (and streamed) without understanding or even decoding their contents.

Grabbing just the meta value works really well when you want to represent the value without touching the blob's contents, which is exactly what kview does when sending it over the wire. It uses getMeta() to represent the raw binary data, Blob, or File without actually having to read any of the blob itself. The writes are part of a single atomic commit internally, so the overhead is actually quite minimal.

I am afraid it isn't worth the added complexity and the downside of always having to fully decode the first chunk just to understand what type of blob it is.

kitsonk closed this as not planned on Apr 5, 2024
@inverted-capital
Contributor Author

OK, I understand. Thanks for the detailed explanation.

My use case is that when I know a file is small, I use a non-blob method to get a faster read; otherwise my lookup takes two round trips instead of one. This is hard to consume, though, since when reading I have to know in advance which method was used to store the value. I understand what you mean about the writes being atomic and the overhead therefore being negligible, but reads do not appear to be as immune.

When my isolate is in Australia, the RTT to the database is 400 ms each time, so it takes 800 ms to get the meta and then the blob. I was hoping to do that in one shot.

A suggestion: you could maybe speed up reads by fetching both the meta and the first blob part at once using getMany(), and then using list() to get parts two and above, if they even exist?

kv-toolbox/blob.ts

Lines 238 to 240 in b18ec53

  const list = kv.list<Uint8Array>({ prefix: [...key, BLOB_KEY] }, {
    ...options,
    batchSize: BATCH_SIZE,

Parallel reads are about the same speed as a single one when your database is half the world away...
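The read strategy suggested above could look roughly like the sketch below. It is written against a hypothetical `KvLike` interface that mirrors only the `getMany()` and `list()` calls of Deno KV, so that it runs without a database. Several details are assumptions rather than facts from the library: the value of `BLOB_KEY`, the parts being stored under `[...key, BLOB_KEY, n]` numbered from 1, and `size` in the meta being the total byte length of the blob.

```typescript
// Hypothetical stand-ins for the parts of Deno KV used in this sketch.
type Key = (string | number)[];
interface Entry<T> {
  key: Key;
  value: T | null;
}
interface KvLike {
  getMany(keys: Key[]): Promise<Entry<unknown>[]>;
  list(selector: { prefix: Key; start?: Key }): AsyncIterable<Entry<unknown>>;
}
interface BlobMeta {
  kind: string;
  size: number; // assumed to be the total byte length of the blob
}

const META_KEY = "__kv_toolbox_meta__";
const BLOB_KEY = "__kv_toolbox_blob__"; // assumed value

async function readBlob(kv: KvLike, key: Key): Promise<Uint8Array | null> {
  // Round trip 1: fetch the meta and the first part together.
  const [metaEntry, firstEntry] = await kv.getMany([
    [...key, META_KEY],
    [...key, BLOB_KEY, 1],
  ]);
  const meta = metaEntry.value as BlobMeta | null;
  if (meta === null) return null;

  const out = new Uint8Array(meta.size);
  let offset = 0;
  const first = firstEntry.value as Uint8Array | null;
  if (first !== null) {
    out.set(first, 0);
    offset = first.byteLength;
  }

  // Round trip 2, only when the first part did not cover the whole blob.
  if (offset < meta.size) {
    const rest = kv.list({
      prefix: [...key, BLOB_KEY],
      start: [...key, BLOB_KEY, 2],
    });
    for await (const entry of rest) {
      const part = entry.value as Uint8Array;
      out.set(part, offset);
      offset += part.byteLength;
    }
  }
  return out;
}
```

Under these assumptions a blob that fits in one part costs a single round trip, and larger blobs cost one `getMany()` plus the `list()` iteration, without changing the stored format.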

kitsonk reopened this on Apr 6, 2024
kitsonk removed the "declined" label on Apr 8, 2024
@kitsonk
Owner

kitsonk commented Apr 8, 2024

I am still thinking about what the right approach for this is. 🤔

2 participants