The blockchain data lake that captures everything in a block, as it executes.
Bitquery's blockchain data lake records raw blockchain data for every block the moment it's processed — every transaction, log, internal call and operation-level trace, plus balance changes enriched with token metadata. It's a better alternative to running archive nodes: bigger and more complete than the node's own output, across 40+ chains. Pull it raw and build anything.
More than an archive node ever exposes
We don't dump node bytes after the fact. We record what happens while each block is processed — every operation the node performs — and we capture balance and token context on top. The result is bigger and more complete than the node's own output.
Captured at execution
We observe the block as it's processed, not just the block the node hands back. Every operation the node took is recorded.
Richer than node output
Operation-level traces and balance / token context that node RPC never exposes. More complete than your own archive node, with none of the operational burden.
You own what you build
Pull the data raw and derive exactly what you need — balances, trades, token lists. No vendor schema imposed; the protobuf definitions are open.
One layout, every chain
The same protobuf + LZ4 object layout across 40+ chains, partitioned by block height and ready for bulk pulls.
From the header down to every storage write
Each object is one block, serialized to protobuf, capturing what happened as it executed — including the operation-level detail and token context most APIs discard.
Transactions & receipts
Every transaction with fees, resource / energy usage, signatures, success or failure — plus the block header, witness / validator and chain id.
Event logs
Full logs per transaction — address, data and topics — exactly as emitted on chain.
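As a rough illustration of what the address, data and topics fields carry, the sketch below decodes an EVM-style Transfer log. It assumes the standard ABI event layout (indexed addresses left-padded to 32 bytes in the topics, the value as a big-endian integer in the data). The `topic_to_address` and `decode_transfer` helpers and the sample bytes are hypothetical and are not part of the Bitquery schema.

```python
def topic_to_address(topic: bytes) -> str:
    """Return the 20-byte address held in a 32-byte indexed topic, as hex."""
    if len(topic) != 32:
        raise ValueError("indexed topics are expected to be 32 bytes")
    return topic[-20:].hex()


def decode_transfer(topics: list[bytes], data: bytes) -> dict:
    """Decode Transfer(from, to, value), assuming topics[0] is the event signature."""
    if len(topics) != 3:
        raise ValueError("expected the signature topic plus two indexed addresses")
    return {
        "from": topic_to_address(topics[1]),
        "to": topic_to_address(topics[2]),
        "value": int.from_bytes(data, "big"),
    }


# Made-up sample log, for illustration only.
signature = bytes(32)  # placeholder; a real log carries the event signature hash
sender = bytes(12) + bytes.fromhex("11" * 20)
receiver = bytes(12) + bytes.fromhex("22" * 20)
amount = (1_500_000).to_bytes(32, "big")

transfer = decode_transfer([signature, sender, receiver], amount)
print(transfer)
```

The value is kept as a raw integer here; scaling it to display units needs the token's decimals.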
Operation-level traces
Everything the node did while processing the block: internal calls, opcode-level steps, and pre/post storage diffs.
Balance changes + token metadata
Balance changes captured at transaction level, enriched with token metadata: symbol, decimals and total supply.
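To show why having decimals next to each balance change matters, here is a minimal sketch that scales raw integer balances into display units. The `to_units` and `balance_delta` helpers and the sample numbers are hypothetical; the only assumption about the data is that balances arrive as integers in the token's smallest unit.

```python
from decimal import Decimal


def to_units(raw_balance: int, decimals: int) -> Decimal:
    """Scale an integer on-chain balance by the token's decimals, exactly."""
    # Building the Decimal from a string avoids context rounding on large values.
    return Decimal(f"{raw_balance}e-{decimals}")


def balance_delta(pre: int, post: int, decimals: int) -> Decimal:
    """Signed change between two raw balances, in display units."""
    return to_units(post - pre, decimals)


# Made-up example: a 6-decimal token balance before and after a transaction.
pre, post, decimals = 2_500_000, 1_000_000, 6
before = to_units(pre, decimals)
after = to_units(post, decimals)
delta = balance_delta(pre, post, decimals)
print(before, after, delta)
```

Subtracting the integers first and scaling afterwards keeps the arithmetic exact, which matters for tokens with 18 decimals and very large supplies.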
One block, endless derivations
Because the capture is complete, you can reconstruct almost anything from a single block — no extra lookups. Here's the same block, parsed three ways.
import lz4.frame
from tron import block_message_pb2 as tron

block = tron.BlockMessage()
block.ParseFromString(lz4.frame.decompress(open("…block.lz4","rb").read()))

# balance changes for every address that appears in Transfer events
TRANSFER = bytes.fromhex("ddf252ad…")  # Transfer(from,to,value)
hit = set()
for tx in block.Transactions:
    for c in tx.Contracts:
        for log in c.Logs:
            if log.Topics and log.Topics[0].Hash == TRANSFER:
                hit.update(t.Hash for t in log.Topics[1:])  # from / to
        for bu in c.Trace.TokenBalanceUpdates:  # captured by us
            if bu.Address in hit:
                print(bu.Address, bu.Currency.Symbol, bu.Currency.Decimals,
                      bu.PreBalance, bu.PostBalance)
Parse it your way
Decode a block with the open protobuf definitions, then walk transactions, logs and traces to derive exactly what you need. These run against one block object — no node, no index.
Get the schema & decoders →
- →Balances per address, before and after
- →DEX trades — pools, amounts in / out
- →Token lists with symbol, decimals, supply
- →Internal calls and pre/post storage diffs
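As one possible take on the first derivation in the list, the sketch below collapses a block's ordered balance updates into a single before / after pair per address and token. The `BalanceUpdate` dataclass is a hypothetical stand-in for a decoded protobuf message (its fields loosely follow the sample above), and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BalanceUpdate:
    # Hypothetical stand-in for one decoded balance update.
    address: str
    symbol: str
    pre_balance: int
    post_balance: int


def net_balances(updates):
    """Collapse ordered updates into (first pre, last post) per address and token."""
    result = {}
    for u in updates:
        key = (u.address, u.symbol)
        if key in result:
            first_pre, _ = result[key]
            result[key] = (first_pre, u.post_balance)
        else:
            result[key] = (u.pre_balance, u.post_balance)
    return result


# Made-up updates in execution order: A pays out twice, B receives once.
updates = [
    BalanceUpdate("A", "USDT", 100, 70),
    BalanceUpdate("B", "USDT", 0, 30),
    BalanceUpdate("A", "USDT", 70, 50),
]
balances = net_balances(updates)
for (address, symbol), (before, after) in balances.items():
    print(address, symbol, before, after)
```

The sketch relies on the updates being in execution order, so the first pre-balance and the last post-balance bracket everything that happened to an address within the block.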
A better alternative to running your own archive node
Running archive nodes across many chains is expensive, slow to backfill, and still leaves out the data you actually need. The Bitquery Blockchain Data Lake gives you raw blockchain data with execution-level detail — without operating a single node.
Running your own archive node
the hard way
- −Sync and babysit a node per chain — disk, bandwidth, reorgs and downtime are yours to manage.
- −RPC returns the block the node hands back; internal calls and pre/post storage diffs are gone.
- −No token context — you fetch symbol, decimals and total supply yourself.
- −Backfilling genesis-to-tip history takes weeks of compute and storage.
- −Every chain has its own client, schema and quirks to maintain.
Bitquery Blockchain Data Lake
the lake
- +One object store across 40+ chains — no nodes to run, sync, or babysit.
- +Execution-level capture: internal calls, opcode-level steps and pre/post storage diffs included.
- +Token-aware balance changes with symbol, decimals and total supply built in.
- +Genesis-to-tip history ready for bulk pulls: raw blockchain data from day one.
- +One protobuf + LZ4 layout, partitioned by block height, for every chain.
Built for teams that want the whole block
If you'd otherwise run archive nodes, stitch together RPC calls, or wait on someone else's decoded schema, the lake gives you complete, execution-level data to build on directly.
Build your own indexer
Ship an indexer or analytics product without operating node fleets — pull the blocks, derive your own tables, and stay in control of the schema.
Own the pipeline end to end
Bulk-load genesis-to-tip history into your warehouse and replicate new blocks into your own bucket — one protobuf layout, no per-chain glue code.
Trace funds with full detail
Follow money through internal calls and pre/post storage diffs that node RPC drops — the trace-level completeness investigations actually need.
Train on the full corpus
Backtest and train on every block from genesis to the live tip, in one consistent format across 40+ chains — the complete record, not a sample.
Reconstruct balances per address
Rebuild balances and token histories straight from token-aware balance changes — symbol, decimals and supply included, no extra lookups.
Retire your archive nodes
Get execution detail and token context an archive node never gives you, across 40+ chains — with none of the ops, storage or sync to babysit.
Pulled the history? Stream new blocks live.
When you need to keep the lake current, subscribe to blocks as they're processed. The stream delivers the same protobuf objects, is resumable from any height, and leaves no gaps to fill or duplicates to remove. It is available over Kafka, gRPC, or WebSocket.
- One cursor: replay from any block straight into the live tip
- Re-org rollback markers so you cleanly un-apply orphaned blocks
- Same execution-level capture as the lake — history and live are one format
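A consumer of such a stream might look roughly like the sketch below, which applies blocks in height order and un-applies orphaned ones when a rollback marker arrives. The message shape (`type`, `height`, `id`) and the rule that a rollback at height h orphans every block at or above h are assumptions made for illustration; the actual stream format may differ.

```python
def apply_stream(messages, start_height):
    """Apply block messages in order, un-applying orphaned blocks on rollback."""
    applied = []  # stack of (height, block_id) currently treated as canonical
    next_height = start_height
    for msg in messages:
        if msg["type"] == "rollback":
            # Assumed semantics: every block at or above msg["height"] is orphaned.
            while applied and applied[-1][0] >= msg["height"]:
                applied.pop()
            next_height = msg["height"]
        elif msg["type"] == "block":
            if msg["height"] != next_height:
                raise ValueError(f"gap: expected {next_height}, got {msg['height']}")
            applied.append((msg["height"], msg["id"]))
            next_height += 1
    return applied


# Made-up stream: blocks 101 and 102 are orphaned and then replaced.
messages = [
    {"type": "block", "height": 100, "id": "a"},
    {"type": "block", "height": 101, "id": "b"},
    {"type": "block", "height": 102, "id": "c"},
    {"type": "rollback", "height": 101},
    {"type": "block", "height": 101, "id": "b2"},
    {"type": "block", "height": 102, "id": "c2"},
]
chain = apply_stream(messages, start_height=100)
print(chain)
```

In a real pipeline the stack would hold whatever is needed to reverse each block's effect on downstream tables, and the height check doubles as a guard against gaps when resuming from a stored cursor.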
Blockchain data lake, answered
What is the Bitquery Blockchain Data Lake?
It's a blockchain data lake that captures raw blockchain data for every block at the moment it's processed — transactions, logs, internal calls, operation-level traces and token-aware balance changes. Each block is serialized to protobuf, LZ4-compressed and stored in an S3-compatible object store across 40+ chains, from genesis to the live tip.
Is this a better alternative to running an archive node?
Yes. Instead of operating an archive node per chain, you pull execution-level data straight from the lake. It includes internal calls, opcode-level steps, pre/post storage diffs and token metadata that an archive node and node RPC never expose — with none of the sync, storage or downtime.
What does "raw blockchain data" mean here?
Each object is one complete block in an open protobuf schema. We impose no decoded schema — you pull the raw blockchain data in bulk and derive exactly what you need, such as balances, DEX trades or token lists, using the open reference decoders.
Which chains are covered, and how far back?
40+ chains, partitioned by block height from genesis to the live tip, all in the same protobuf and LZ4 object layout.
How is the data delivered?
As objects in an S3-compatible object store — one block per object, protobuf-serialized and LZ4-compressed. Pull in bulk or replicate to your own bucket, and add a live stream over Kafka, gRPC or WebSocket to stay current.
Own the whole block. Build anything on it.
Execution-level capture of every block, in a lake you control — pull it raw, derive whatever you need. Stream new blocks when you're ready to stay current. Talk to sales about bulk access, replication and streaming for your team.