Querying Blockchain Data Using Google BigQuery: A Comprehensive Tutorial
If you want to query blockchain data, querying it from a data indexing service will be time-consuming. Most of the indexing APIs will have some kind of limit on them, so you have to query them again and again till you get all the data you need. What if there was indexed data stored on a SQL database that you could access on demand without any restrictions?
Various public and private datasets offer access to blockchain data. Here are a few notable ones:
- BigQuery Public Cryptocurrency Datasets
- AWS Public Blockchain Data
- Bitquery’s Data in Cloud
In this tutorial, we are going to learn about Google BigQuery which provides us a way to query large amounts of data for public blockchains like Bitcoin, Ethereum, Polygon and many more with their public dataset.
Today we are going to learn how to query public Ethereum datasets and query data in the Google Cloud SQL workspace using SQL, and how to use the Python client for BigQuery to fetch data from those public datasets. We will also learn about the drawbacks of these public datasets and how to overcome them.
As the title suggests, we are going to use Google products, so you need a Google account. For technical prerequisites, you should have basic knowledge of SQL, and if you’re going to continue with the Python part of the tutorial, you should have some knowledge of Python.
Setting Up Google BigQuery
If you're new to Google BigQuery, the first step is to set up your account and create a new project within Google BigQuery. You can initiate this process by visiting the following link: https://console.cloud.google.com/bigquery
Once your account setup is complete, you'll gain access to all the publicly available datasets on Google BigQuery. Additionally, you'll be able to use the built-in editor to write and execute queries, allowing you to explore the Ethereum public dataset.
You can open a new tab by clicking on the blue plus button, which will display a new editor where you can write your SQL queries.
Accessing Blockchain Data via BigQuery Dashboard
Let's begin by examining the public Ethereum dataset. You can find detailed information about the dataset here.
Upon visiting the page, you'll notice that there's a free tier available for BigQuery, allowing us to explore the data. The page also provides some examples illustrating how the dataset can be utilized.
To dive deeper into the dataset, let's initiate our exploration by running our first query to discover the available tables within this dataset. Open your editor and paste the following query:
SELECT * FROM `bigquery-public-data.crypto_ethereum.INFORMATION_SCHEMA.TABLES`;
In this query, we are accessing the metadata of the dataset to understand its structure using the dataset ID, which, in this case, is
bigquery-public-data.crypto_ethereum. To access the metadata, we utilize
INFORMATION_SCHEMA.TABLES, and by selecting all columns using
*, we can view all the details.
To execute the query, simply click the "Run" button located at the top of the editor window, and you should see the following result:
You'll be presented with a list of table names, including tokens, blocks, contracts, and more. In total, there are 11 tables available in this dataset, which are as follows:
In this tutorial, we’ll explore some of these tables by writing queries to query data from them, like query to get Ethereum balances for specific addresses and the latest transactions from a given address, etc. Let's start with the
Before diving into the Ethereum Balances table, it's important to familiarize ourselves with its schema to understand its structure. To query the schema, execute the following query:
SELECT column_name, data_type FROM `bigquery-public-data.crypto_ethereum.INFORMATION_SCHEMA.COLUMNS` WHERE table_name = 'balances';
This query will provide you with the following result:
In the schema, we have the 'address' field, which is of type
STRING, and the
eth_balance field, which is of type
This dataset set is limited to the current balance history of the address and not the balance of ERC20 tokens, the balance history of native tokens, or ERC20 tokens. Let's write some queries to extract data related to addresses.
Get balance of the address
To query the balance of a specific address, we need to filter the data based on the address using the WHERE keyword:
SELECT address, eth_balance FROM `bigquery-public-data.crypto_ethereum.balances` WHERE address = '0xcda7559bcef42e68f16233b5b8c99c757a5f4697';
Get Top 10 Addresses with Highest Balances
To get a list of the top 10 addresses with the highest balances, we simply need to order the
eth_balance in descending order and limit the results to the first 10:
SELECT address, eth_balance FROM `bigquery-public-data.crypto_ethereum.balances` ORDER BY eth_balance DESC LIMIT 10;
The Transactions table provides information about transactions that have occurred on the Ethereum blockchain. Before we dive into querying this dataset, let's explore its schema by running the following query:
SELECT column_name, data_type FROM `bigquery-public-data.crypto_ethereum.INFORMATION_SCHEMA.COLUMNS` WHERE table_name = 'transactions';
This query will display a number of columns related to transactions. The dataset provides comprehensive information about transactions, but for internal transaction details, you can explore them separately. Let's delve into this dataset by writing some queries.
Get transactions by transaction hash
To query a transaction by its hash, we'll use the
WHERE clause to specify the hash value:
SELECT * FROM `bigquery-public-data.crypto_ethereum.transactions` WHERE `hash` = "0x1f5abf832162265242a27b7f8bf6a6ec6b2f4a4fd6d9e681cab6924807636fd7" AND block_timestamp > TIMESTAMP '2023-09-06 00:00:00';
You'll notice a difference here compared to the previous queries. In the query above, you'll see that the term
hash is enclosed in backticks (
), and there's also a block_timestamp
value included. Why is this the case? Well,hash
happens to be a keyword in BigQuery, so to indicate thathash` is the name of the column, we enclose it in backticks.
block_timestamp filter is included to narrow down the data we're retrieving, making it more efficient by filtering through a smaller subset of the table rather than the entire table. This optimization improves the performance of our query.
After running this query, you will see the following result:
Get latest transaction from an address
To get the latest transaction from a specific address, we'll check if either the
to_address matches the target address and then order the results by the transaction timestamp:
SELECT * FROM `bigquery-public-data.crypto_ethereum.transactions` WHERE (from_address = '0xcda7559bcef42e68f16233b5b8c99c757a5f4697' OR to_address = '0xcda7559bcef42e68f16233b5b8c99c757a5f4697') ORDER BY block_timestamp DESC LIMIT 10;
The Token Transfers table provides information on ERC20 transfers from the Ethereum blockchain. Along with query table schema via editor, you can also explore the schema in the BigQuery Dashboard using the explorer, which is on the right-hand side.
You can explore details of the dataset, ranging from the schema of the table to a preview of the data, among other details.
Get token transfers of address
To get token transfers for a specific address, the query is similar to fetching transactions for an address, but with a change in the queried table from transaction to
SELECT * FROM `bigquery-public-data.crypto_ethereum.token_transfers` WHERE (from_address = '0xcda7559bcef42e68f16233b5b8c99c757a5f4697' OR to_address = '0xcda7559bcef42e68f16233b5b8c99c757a5f4697') ORDER BY block_timestamp DESC LIMIT 10;
Get Latest USDT Transfers
SELECT * FROM `bigquery-public-data.crypto_ethereum.token_transfers` WHERE token_address = ‘0xdAC17F958D2ee523a2206206994597C13D831ec7’ AND block_timestamp > TIMESTAMP '2023-09-04 00:00:00' ORDER BY block_timestamp DESC LIMIT 10;
We have explored some tables and written queries that show how to query data. However, you can also create more complex queries by combining multiple tables to extract more meaningful insights. Here are some example queries that you can explore:
- What are the 10 most popular Ethereum collectibles (ERC721 contracts), by number of transactions?
- What are the 10 most popular Ethereum tokens (ERC20 contracts), by number of transactions?
Querying Blockchain Data Using Python
While you can query data using SQL in the editor, you also have the option to explore the data in a Python notebook. The dashboard provides you with the tool to automatically generate a Python notebook for the query you have executed. To create one, click on "Explore Data" and select the "Explore with Python Notebook" option.
This will provide you with a ready-made Python Notebook from which you can access data from the BigQuery Ethereum dataset.
Problems with using BigQuery for Blockchain Data
BigQuery makes it easy to access data with its public datasets, which are great for simple applications. But these datasets have some limits, so if you need a wider range of data, you might face problems.
If you want data through an API, you'd usually need to build your own systems to handle it. To help with this, Bitquery offers different products for different application needs.
Bitquery can give you indexed blockchain data via API, and you can also explore it using our IDE. There's another product called "Data in the Cloud," similar to BigQuery's public dataset, which we'll talk about in the next section.
Bitquery's data covers transactions, token transfers, DEX trades, block data, smart contract details like calls and events, and more. If you want to explore this data, our IDE uses GraphQL to make it easy, so you don't need to worry about complex SQL queries. It even has a query builder, which is helpful if you're not familiar with GraphQL or our API.
Data in the Cloud by Bitquery
Bitquery’s “Data in the Cloud” product provides an even richer dataset that covers the data BigQuery provides and much more.
Datasets provided with the “Data in the Cloud’ product provide comprehensive data coverage for many other things like smart contract calls, events, token transfers (including ERC20, ERC-721, and ERC-1155), DEX trades for 40+ protocols like Uniswap, Balancer, 1Inch, and many more.
In short, if you’re looking for something that provides comprehensive blockchain data via a cloud product, Data in the Cloud is the solution you’re looking for.
Subscribe to our newsletter
Subscribe and never miss any updates related to our APIs, new developments & latest news etc. Our newsletter is sent once a week on Monday.