Bitquery Blockchain Data Lake · 40+ chains · object-storage

The blockchain data lake that captures
everything in a block, as it executes.

Bitquery's blockchain data lake records raw blockchain data for every block the moment it's processed — every transaction, log, internal call and operation-level trace, plus balance changes enriched with token metadata. It's a better alternative to running archive nodes: bigger and more complete than the node's own output, across 40+ chains. Pull it raw and build anything.

open protobuf schema · reference decoders on GitHub
ingesting · genesis → tip
lake · tron.blocks · protobuf · lz4
tron.blocks · 81,000,010 objects · 26.3 TB · genesis → tip
40+ chains in the lake
genesis to the live tip
op-level execution traces
token-aware balance changes captured
Why it's different

More than an archive node ever exposes

We don't dump node bytes after the fact. We record what happens while each block is processed — every operation the node performs — and we capture balance and token context on top. The result is bigger and more complete than the node's own output.

Captured at execution

We observe the block as it's processed, not just the block the node hands back. Every operation the node took is recorded.

Richer than node output

Operation-level traces and balance / token context that node RPC never exposes. More complete than your own archive node, with none of the operational overhead.

You own what you build

Pull the data raw and derive exactly what you need — balances, trades, token lists. No vendor schema imposed; the protobuf definitions are open.

One layout, every chain

The same protobuf + LZ4 object layout across 40+ chains, partitioned by block height and ready for bulk pulls.

What's inside every block

From the header down to every storage write

Each object is one block, serialized to protobuf, capturing what happened as it executed — including the operation-level detail and token context most APIs discard.

Transactions & receipts

Every transaction with fees, resource / energy usage, signatures, success or failure — plus the block header, witness / validator and chain id.

Event logs

Full logs per transaction — address, data and topics — exactly as emitted on chain.

Operation-level traces

Everything the node did while processing the block: internal calls, opcode-level steps, and pre/post storage diffs.

Balance changes + token metadata

Balance changes captured at transaction level, enriched with token metadata: symbol, decimals and total supply.
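As a small illustration of why the decimals field matters, the sketch below scales a raw integer balance change into token units. The function name and sample values are hypothetical; the only things taken from the page are that a balance change has a before and after value and that the token's decimals travel with it.

```python
from decimal import Decimal


def scaled_delta(pre_balance: int, post_balance: int, decimals: int) -> Decimal:
    """Return the balance change expressed in whole-token units.

    Raw balances are assumed to be integers in the token's smallest unit,
    so shifting the decimal point left by `decimals` gives the readable amount.
    """
    raw = post_balance - pre_balance
    sign, digits, _ = Decimal(raw).as_tuple()
    # Building the Decimal from a tuple is exact, whatever the context precision,
    # which matters for very large on-chain integers.
    return Decimal((sign, digits, -decimals))


# Hypothetical values: a 6-decimal token whose balance goes 1_500_000 -> 4_000_000
print(scaled_delta(1_500_000, 4_000_000, 6))
```

Keeping amounts as Decimal rather than float avoids rounding surprises when deltas are later summed per address.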

What you can build

One block, endless derivations

Because the capture is complete, you can reconstruct almost anything from a single block — no extra lookups. Here's the same block, parsed three ways.

balance_changes.py
import lz4.frame
from tron import block_message_pb2 as tron

# read one LZ4-framed block object and parse it with the open protobuf schema
block = tron.BlockMessage()
with open("…block.lz4", "rb") as f:
    block.ParseFromString(lz4.frame.decompress(f.read()))

# balance changes for every address that appears in Transfer events
TRANSFER = bytes.fromhex("ddf252ad…")          # Transfer(from,to,value)
hit = set()
for tx in block.Transactions:
    for c in tx.Contracts:
        # collect the indexed from / to topics of each Transfer log
        for log in c.Logs:
            if log.Topics and log.Topics[0].Hash == TRANSFER:
                hit.update(t.Hash for t in log.Topics[1:])   # from / to
        # print pre / post balances for addresses seen so far
        for bu in c.Trace.TokenBalanceUpdates:               # captured by us
            if bu.Address in hit:
                print(bu.Address, bu.Currency.Symbol,
                      bu.Currency.Decimals, bu.PreBalance, bu.PostBalance)

Parse it your way

Decode a block with the open protobuf definitions, then walk transactions, logs and traces to derive exactly what you need. These run against one block object — no node, no index.

Get the schema & decoders →
Derive from one block
  • Balances per address, before and after
  • DEX trades — pools, amounts in / out
  • Token lists with symbol, decimals, supply
  • Internal calls and pre/post storage diffs
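The first derivation in the list, balances per address before and after, might look roughly like the sketch below. It walks the same field names that appear in balance_changes.py (Transactions, Contracts, Trace.TokenBalanceUpdates, PreBalance, PostBalance, Currency.Symbol). The function name is hypothetical, and it assumes balances are integers or integer-like strings, which the page does not state. A stand-in block built from SimpleNamespace replaces a real parsed object so the example runs on its own.

```python
from collections import defaultdict
from types import SimpleNamespace as NS


def net_changes(block):
    """Sum PostBalance - PreBalance per (address, symbol) across one block."""
    totals = defaultdict(int)
    for tx in block.Transactions:
        for c in tx.Contracts:
            for bu in c.Trace.TokenBalanceUpdates:
                # Assumption: balances convert cleanly with int()
                delta = int(bu.PostBalance) - int(bu.PreBalance)
                totals[(bu.Address, bu.Currency.Symbol)] += delta
    return dict(totals)


def _update(addr, symbol, pre, post):
    """Build one stand-in balance update (demo data only)."""
    return NS(Address=addr, Currency=NS(Symbol=symbol, Decimals=6),
              PreBalance=pre, PostBalance=post)


# Stand-in for a parsed block: one transaction, one contract, two updates
demo_block = NS(Transactions=[NS(Contracts=[NS(Trace=NS(TokenBalanceUpdates=[
    _update("A", "USDT", 100, 40),
    _update("B", "USDT", 0, 60),
]))])])

print(net_changes(demo_block))
```

With a real block, the same function would take the parsed message in place of demo_block.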
Archive node alternative

A better alternative to running your own archive node

Running archive nodes across many chains is expensive, slow to backfill, and still leaves out the data you actually need. The Bitquery Blockchain Data Lake gives you raw blockchain data with execution-level detail — without operating a single node.

Running your own archive node

the hard way
  • Sync and babysit a node per chain — disk, bandwidth, reorgs and downtime are yours to manage.
  • RPC returns the block the node hands back; internal calls and pre/post storage diffs are gone.
  • No token context — you fetch symbol, decimals and total supply yourself.
  • Backfilling genesis-to-tip history takes weeks of compute and storage.
  • Every chain has its own client, schema and quirks to maintain.

Bitquery Blockchain Data Lake

the lake
  • One object store across 40+ chains: no nodes to run, sync, or babysit.
  • Execution-level capture: internal calls, opcode-level steps and pre/post storage diffs included.
  • Token-aware balance changes with symbol, decimals and total supply built in.
  • Genesis-to-tip history ready for bulk pulls: raw blockchain data, day one.
  • One protobuf + LZ4 layout, partitioned by block height, for every chain.
Who it's for

Built for teams that want the whole block

If you'd otherwise run archive nodes, stitch together RPC calls, or wait on someone else's decoded schema, the lake gives you complete, execution-level data to build on directly.

indexers & analytics

Build your own indexer

Ship an indexer or analytics product without operating node fleets — pull the blocks, derive your own tables, and stay in control of the schema.

data engineering

Own the pipeline end to end

Bulk-load genesis-to-tip history into your warehouse and replicate new blocks into your own bucket — one protobuf layout, no per-chain glue code.

forensics & compliance

Trace funds with full detail

Follow money through internal calls and pre/post storage diffs that node RPC drops — the trace-level completeness investigations actually need.

ml & research

Train on the full corpus

Backtest and train on every block from genesis to the live tip, in one consistent format across 40+ chains — the complete record, not a sample.

wallets & portfolio apps

Reconstruct balances per address

Rebuild balances and token histories straight from token-aware balance changes — symbol, decimals and supply included, no extra lookups.

multi-chain infra

Retire your archive nodes

Get execution detail and token context an archive node never gives you, across 40+ chains, with no nodes to operate, no storage to provision and no sync to babysit.

Stay current

Pulled the history? Stream new blocks live.

When you need to keep the lake current, subscribe to blocks as they're processed: the same protobuf objects, resumable from any height, with no gaps and no deduplication needed on your side. Delivered over Kafka, gRPC, or WebSocket.

  • One cursor: replay from any block straight into the live tip
  • Re-org rollback markers so you cleanly un-apply orphaned blocks
  • Same execution-level capture as the lake — history and live are one format
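To make the cursor and rollback ideas above concrete, here is a minimal consumer sketch. The message shape (dicts with "type", "height", "hash" and "to_height") is entirely hypothetical and is not the stream's actual wire format. It only illustrates applying blocks in order, un-applying orphaned ones when a rollback marker arrives, and tracking where to resume.

```python
def apply_stream(messages, start_height=0):
    """Build a canonical chain view from block and rollback messages.

    Hypothetical message shapes:
      {"type": "block", "height": int, "hash": str}
      {"type": "rollback", "to_height": int}  # drop everything above to_height
    Returns (applied, cursor), where cursor is the next height to request.
    """
    applied = []
    for msg in messages:
        if msg["type"] == "rollback":
            # Un-apply orphaned blocks, newest first
            while applied and applied[-1][0] > msg["to_height"]:
                applied.pop()
        elif msg["type"] == "block":
            if msg["height"] < start_height:
                continue  # already processed before the resume point
            applied.append((msg["height"], msg["hash"]))
    cursor = applied[-1][0] + 1 if applied else start_height
    return applied, cursor


# Demo: block 101 is orphaned and replaced after a rollback marker
stream = [
    {"type": "block", "height": 100, "hash": "a"},
    {"type": "block", "height": 101, "hash": "b"},
    {"type": "rollback", "to_height": 100},
    {"type": "block", "height": 101, "hash": "c"},
    {"type": "block", "height": 102, "hash": "d"},
]
applied, cursor = apply_stream(stream, start_height=100)
print(applied, cursor)
```

In a real pipeline the pop step would also reverse whatever derived rows the orphaned block produced, and the cursor would be persisted so a restart resumes from it.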
live · tron.blocks · resume from any height
FAQ

Blockchain data lake, answered

What is the Bitquery Blockchain Data Lake?

It's a blockchain data lake that captures raw blockchain data for every block at the moment it's processed — transactions, logs, internal calls, operation-level traces and token-aware balance changes. Each block is serialized to protobuf, LZ4-compressed and stored in an S3-compatible object store across 40+ chains, from genesis to the live tip.

Is this a better alternative to running an archive node?

Yes. Instead of operating an archive node per chain, you pull execution-level data straight from the lake. It includes internal calls, opcode-level steps, pre/post storage diffs and token metadata that an archive node and node RPC never expose — with none of the sync, storage or downtime.

What does "raw blockchain data" mean here?

Each object is one complete block in an open protobuf schema. We impose no decoded schema — you pull the raw blockchain data in bulk and derive exactly what you need, such as balances, DEX trades or token lists, using the open reference decoders.

Which chains are covered, and how far back?

40+ chains, partitioned by block height from genesis to the live tip, all in the same protobuf and LZ4 object layout.

How is the data delivered?

As objects in an S3-compatible object store — one block per object, protobuf-serialized and LZ4-compressed. Pull in bulk or replicate to your own bucket, and add a live stream over Kafka, gRPC or WebSocket to stay current.
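A bulk pull from an S3-compatible store could be sketched as below. The bucket name, key prefix and key format are placeholders, not the lake's real layout. The function expects an object that behaves like a boto3 S3 client (get_paginator("list_objects_v2") and get_object), and a small in-memory stand-in is included so the example runs without credentials.

```python
import io


def iter_block_objects(client, bucket, prefix):
    """Yield (key, raw_bytes) for every object under `prefix`."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects
        for entry in page.get("Contents", []):
            key = entry["Key"]
            body = client.get_object(Bucket=bucket, Key=key)["Body"]
            yield key, body.read()


class FakeS3:
    """In-memory stand-in for an S3 client, for this demo only."""

    def __init__(self, objects):
        self._objects = objects

    def get_paginator(self, name):
        assert name == "list_objects_v2"
        return self

    def paginate(self, Bucket, Prefix):
        keys = sorted(k for k in self._objects if k.startswith(Prefix))
        yield {"Contents": [{"Key": k} for k in keys]}

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self._objects[Key])}


# Placeholder keys and payloads; real objects would be LZ4-compressed protobuf
fake = FakeS3({"tron.blocks/1.lz4": b"a", "tron.blocks/2.lz4": b"b", "other/x": b"z"})
pulled = list(iter_block_objects(fake, "example-bucket", "tron.blocks/"))
print([key for key, _ in pulled])
```

With boto3 installed, the stand-in would be replaced by boto3.client("s3", endpoint_url=...) pointed at the store's endpoint, and each payload decompressed and parsed as in balance_changes.py.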

Own the whole block. Build anything on it.

Execution-level capture of every block, in a lake you control — pull it raw, derive whatever you need. Stream new blocks when you're ready to stay current. Talk to sales about bulk access, replication and streaming for your team.