Possibly halve the number of writes for small blobs #10

inverted-capital · 2024-04-04T23:18:21Z

For small blobs, i.e., those under Deno KV's 64 KiB value limit, using the blob library results in two writes: a meta write and a blob write.

I wonder whether these could be combined into a single write?

The format of the meta write is roughly:

  const key = [
    ...userSuppliedKey,
    "__kv_toolbox_meta__",
  ];
  const value = { kind: "buffer", size: 92 };

However, if the meta were always encoded as a Uint8Array, the first blob chunk could be appended to the meta value using length-prefix encoding or something similar. That would mean a single write for small values and one fewer write for all other sizes, wouldn't it?
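A minimal sketch of the proposed combined value, assuming the meta is serialized as JSON and preceded by a 4-byte big-endian length prefix. Both choices are assumptions for illustration, and `encodeCombined` / `decodeCombined` are hypothetical names; kv-toolbox does not store values this way.

```typescript
// Hypothetical layout: [4-byte big-endian meta length][UTF-8 JSON meta][chunk bytes].
// JSON encoding of the meta is an assumption made for this sketch.

interface BlobMeta {
  kind: string;
  size: number;
}

function encodeCombined(meta: BlobMeta, firstChunk: Uint8Array): Uint8Array {
  const metaBytes = new TextEncoder().encode(JSON.stringify(meta));
  const out = new Uint8Array(4 + metaBytes.length + firstChunk.length);
  // Write the meta length as an unsigned 32-bit big-endian prefix.
  new DataView(out.buffer).setUint32(0, metaBytes.length, false);
  out.set(metaBytes, 4);
  // The raw blob bytes follow the meta directly.
  out.set(firstChunk, 4 + metaBytes.length);
  return out;
}

function decodeCombined(
  value: Uint8Array,
): { meta: BlobMeta; firstChunk: Uint8Array } {
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  const metaLength = view.getUint32(0, false);
  const metaBytes = value.subarray(4, 4 + metaLength);
  const meta = JSON.parse(new TextDecoder().decode(metaBytes)) as BlobMeta;
  // Everything after the meta is the first chunk of raw blob data.
  const firstChunk = value.subarray(4 + metaLength);
  return { meta, firstChunk };
}
```

Note that a reader has to parse the length prefix and the meta before it can tell what kind of blob the value holds, which is exactly the trade-off raised in response.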

kitsonk · 2024-04-05T05:39:01Z

That gets really complicated once you have to support all of the existing features, as well as the logic for detecting and "viewing" a blob without decoding it. Embedding the meta data in the first "chunk" also makes things more complicated. Each "part" value is a raw binary chunk of the blob itself, which means the parts can be read (and streamed) without understanding, or even decoding, their contents.

Grabbing just the meta value works really well when you want to represent the value without touching the blob's contents, which is exactly what kview does when sending it over the wire: it uses getMeta() to represent the raw binary data, Blob, or File without having to read any of the blob itself. Internally, the writes are part of a single atomic commit, so the overhead is actually quite minimal.
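A hypothetical sketch of the layout described here: one meta entry plus one entry per raw chunk, planned as a list of mutations that could go into a single atomic commit. The `__kv_toolbox_meta__` suffix comes from the issue; the `BLOB_KEY` value, the 64 KiB chunk size, 1-based part indices, and `planBlobWrites` are assumptions, not kv-toolbox's actual code.

```typescript
type KvKeyPart = string | number;
type KvKey = KvKeyPart[];

const META_KEY = "__kv_toolbox_meta__"; // from the issue
const BLOB_KEY = "__kv_toolbox_blob__"; // assumed value of BLOB_KEY
const CHUNK_SIZE = 65_536; // assumed: Deno KV's 64 KiB per-value limit

interface Mutation {
  key: KvKey;
  value: unknown;
}

function planBlobWrites(key: KvKey, data: Uint8Array): Mutation[] {
  const mutations: Mutation[] = [
    // The meta entry describes the blob without containing any of its bytes,
    // so a viewer can read it on its own.
    { key: [...key, META_KEY], value: { kind: "buffer", size: data.length } },
  ];
  // Each part is an unmodified slice of the blob (assumed 1-based index),
  // so parts can be streamed without decoding anything.
  for (
    let offset = 0, index = 1;
    offset < data.length;
    offset += CHUNK_SIZE, index++
  ) {
    mutations.push({
      key: [...key, BLOB_KEY, index],
      value: data.subarray(offset, offset + CHUNK_SIZE),
    });
  }
  return mutations;
}
```

With Deno KV, the planned mutations could be chained onto `kv.atomic()` with `.set(key, value)` and sent with a single `.commit()`, so a 92-byte blob becomes two mutations in one commit.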

I am afraid it isn't worth the added complexity, nor the downside of always having to fully decode the first chunk just to find out what type of blob it is.

inverted-capital · 2024-04-06T00:46:27Z

Ok I understand - thanks for the detailed explanation.

My use case: when I know a file is small, I store it with a non-blob method so reads are faster; otherwise a lookup takes two round trips instead of one. This is awkward to consume, though, because when reading I have to know in advance which method was used to store the value. I understand that because the writes are atomic the write overhead is negligible, but reads don't seem to be as immune.

When my isolate is in Australia, the RTT to the database is 400 ms per request, so getting the meta and then the blob takes 800 ms. I was hoping to do that in one shot.
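The latency arithmetic, assuming the reads are strictly sequential and the 400 ms round trip dominates each request:

$$T_{\text{sequential}} = 2 \times 400\ \text{ms} = 800\ \text{ms}, \qquad T_{\text{single}} \approx 1 \times 400\ \text{ms} = 400\ \text{ms}$$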

A suggestion: you could speed up reads by fetching both the meta and the first blob part at once with getMany(), then using list() to fetch parts two and above only if they exist.

kv-toolbox/blob.ts

Lines 238 to 240 in b18ec53

    
    const list = kv.list<Uint8Array>({ prefix: [...key, BLOB_KEY] }, {
      ...options,
      batchSize: BATCH_SIZE,

Parallel reads take about as long as a single read when your database is half the world away, because the round trip dominates.
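A sketch of the suggested read path, not kv-toolbox's implementation. `KvReader` is a hypothetical interface mirroring the shape of the two Deno.Kv methods used (`getMany` and `list` with a `{ prefix, start }` selector) so the logic can be exercised without a database; `readBlob`, the `BLOB_KEY` value, and 1-based part indices are assumptions.

```typescript
type KvKey = (string | number)[];

interface KvEntryMaybe<T> {
  key: KvKey;
  value: T | null;
  versionstamp: string | null;
}

// Hypothetical subset of Deno.Kv used by this sketch.
interface KvReader {
  getMany(keys: KvKey[]): Promise<KvEntryMaybe<unknown>[]>;
  list(selector: { prefix: KvKey; start: KvKey }): AsyncIterable<KvEntryMaybe<unknown>>;
}

const META_KEY = "__kv_toolbox_meta__"; // from the issue
const BLOB_KEY = "__kv_toolbox_blob__"; // assumed value of BLOB_KEY

interface BlobMeta {
  kind: string;
  size: number;
}

async function readBlob(kv: KvReader, key: KvKey): Promise<Uint8Array | null> {
  // Round trip 1: fetch the meta and the first part together.
  const [metaEntry, firstEntry] = await kv.getMany([
    [...key, META_KEY],
    [...key, BLOB_KEY, 1],
  ]);
  const meta = metaEntry.value as BlobMeta | null;
  if (meta === null) return null;
  if (meta.size === 0) return new Uint8Array(0);
  if (firstEntry.value === null) return null;

  const parts: Uint8Array[] = [firstEntry.value as Uint8Array];
  let received = parts[0].length;
  // Only blobs larger than one part need further round trips.
  if (received < meta.size) {
    const rest = kv.list({
      prefix: [...key, BLOB_KEY],
      start: [...key, BLOB_KEY, 2],
    });
    for await (const entry of rest) {
      const part = entry.value as Uint8Array;
      parts.push(part);
      received += part.length;
    }
  }
  // Concatenate the parts into one buffer.
  const out = new Uint8Array(received);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}
```

With this shape, a blob that fits in one part costs a single round trip, and larger blobs pay for at least one more.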

kitsonk · 2024-04-08T04:19:57Z

I am still thinking about what the right approach for this is. 🤔

kitsonk added the enhancement and declined labels · Apr 5, 2024

kitsonk closed this as not planned · Apr 5, 2024

kitsonk reopened this · Apr 6, 2024

kitsonk removed the declined label · Apr 8, 2024
