TL;DR: This report covers what Josie, a Bitcoin Core developer and Coinbase Crypto Community Fund grant recipient, has been working on during the first part of their year-long crypto development grant, specifically their work on Bitcoin transaction privacy.
Since late last year, I have been working with a group of researchers on a project centered around Bitcoin transactions with two or fewer outputs. While the research is still ongoing, we identified an opportunity to improve Bitcoin transaction privacy. This post details the motivation for the change and the work completed thus far.
Privacy in Bitcoin transactions
When thinking about privacy in Bitcoin, I find the following definition helpful:
“Privacy is the power to selectively reveal oneself to the world” — Eric Hughes (1993)
This definition motivates the following statement, “Software should never reveal more information than necessary about a user’s activity.” Applied to Bitcoin transactions, this means we should attempt to keep the payment address and amount private between the payer and payee. One way to break this privacy today is through the “Payment to a different script type” heuristic.
In short, this heuristic works by inferring which of the outputs in a transaction is the change output by examining script types. If a transaction is funded with bech32 (native segwit) inputs and has two outputs, one P2SH and the other bech32, it is reasonable to infer that the bech32 output is a change address generated by the payer’s wallet. This allows an outside observer to infer the payment value and change value with reasonable accuracy.
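To make the heuristic concrete, here is a minimal sketch of how an outside observer might apply it. The function name and the plain string labels for script types are assumptions made for illustration; this is not code from our research or from any chain-analysis tool.

```python
from typing import List, Optional


def guess_change_index(input_types: List[str], output_types: List[str]) -> Optional[int]:
    """Return the index of the likely change output, or None if the
    heuristic does not apply. Script types are plain labels such as
    "bech32" or "p2sh" (hypothetical naming for this sketch)."""
    # The heuristic needs all inputs to share a single script type.
    if len(set(input_types)) != 1:
        return None
    funding_type = input_types[0]

    # Find the outputs that use the same script type as the inputs.
    matches = [i for i, t in enumerate(output_types) if t == funding_type]

    # Exactly one matching output among several: that one is probably change.
    if len(matches) == 1 and len(output_types) > 1:
        return matches[0]
    return None


# bech32 inputs, one P2SH output and one bech32 output: index 1 looks like change.
print(guess_change_index(["bech32", "bech32"], ["p2sh", "bech32"]))
# Both outputs are P2SH: the heuristic has nothing to go on.
print(guess_change_index(["bech32"], ["p2sh", "p2sh"]))
```

Once the change output is guessed, the remaining output is taken to be the payment, which is how the payment amount leaks.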
How big of a problem is this?
But how often does this happen? Is this worth improving at all or is it a rare edge case? Let’s look at some data!
[Figure: Payments to different script types over time]
In analyzing transactions from 2010 to the present, we found this type of transaction first appearing after the 2012 activation of P2SH addresses and growing significantly after the 2017 segwit activation. From 2018 onward, these transactions account for ~30% of all transactions on the Bitcoin blockchain. We expect this share to keep increasing as taproot adoption grows, since taproot introduces the new bech32m address encoding. If every wallet had a solution for this, we would have an opportunity to improve privacy for up to 30% of all Bitcoin transactions today.
How can we improve this?
The first step to solve this problem is to match the payment address type when generating a change output. From our earlier example, this means our wallet should instead generate a P2SH address so that the transaction is now bech32 inputs to two P2SH outputs, effectively hiding which of the outputs is the payment and which is the change.
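As a rough illustration of change address matching, the sketch below classifies a mainnet address by its visible prefix and picks a change type that blends in with the payment outputs. The function names, the type labels, and the tie-breaking rule are assumptions for this example; it is a simplification and not the code that Bitcoin Core uses.

```python
from typing import Iterable


def address_type(address: str) -> str:
    """Classify a mainnet address by its prefix (illustrative labels)."""
    lowered = address.lower()  # bech32/bech32m addresses may be upper case
    if lowered.startswith("bc1p"):
        return "bech32m"  # segwit v1 (taproot)
    if lowered.startswith("bc1q"):
        return "bech32"   # segwit v0 (native segwit)
    if address.startswith("3"):
        return "p2sh"
    if address.startswith("1"):
        return "legacy"   # P2PKH
    raise ValueError(f"unrecognized address prefix: {address!r}")


def choose_change_type(payment_addresses: Iterable[str], default_type: str = "bech32") -> str:
    """Pick a change type that matches one of the payment outputs."""
    payment_types = {address_type(a) for a in payment_addresses}
    # Keep the wallet default when it already blends in (or nothing is paid).
    if not payment_types or default_type in payment_types:
        return default_type
    # Otherwise match one of the payment types; sorting makes the
    # (arbitrary) choice deterministic when there are several.
    return sorted(payment_types)[0]


# Placeholder strings, not valid addresses; only the prefix matters here.
print(choose_change_type(["3PlaceholderP2shAddress"]))
```

With a bech32 wallet paying a P2SH address, this returns "p2sh", so both outputs of the transaction share one type.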
This change-matching logic was merged into Bitcoin Core in #23789, meaning that our wallet will now hold a mix of output types depending on our payment patterns. What happens when we spend these UTXOs? Is our privacy from the original transaction still preserved?
Mixing output types when funding a transaction
As it turns out, we might still leak information about our first transaction (txid: a) when spending the change output in a subsequent transaction. Consider the following scenario:
[Figure: Mixing input types in subsequent transactions]
- Alice has a wallet with bech32 type UTXOs and pays Bob, who gives them a P2SH address
- Alice’s wallet generates a P2SH change output, preserving their privacy in txid: a
- Alice then pays Carol, who gives them a bech32 address
- Alice’s wallet combines the P2SH UTXO with a bech32 UTXO and txid: b has two bech32 outputs
From an outside observer’s perspective, it is reasonable to infer that the P2SH input to txid: b was the change from txid: a. To avoid leaking information about txid: a, Alice’s wallet should avoid mixing the P2SH output with other output types and fund the transaction either with only P2SH outputs or with only bech32 outputs. As a bonus, if txid: b can be funded with the P2SH output alone, the change from txid: b will be bech32, effectively cleaning the P2SH output out of the wallet by converting it into a payment and bech32 change.
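The idea of not mixing output types can be sketched as follows. This is a simplified illustration under stated assumptions: the `Utxo` type, the largest-first strategy, and the rule for choosing between candidate selections are all hypothetical, fees are ignored, and none of it is taken from Bitcoin Core's coin selection code.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Utxo(NamedTuple):
    txid: str
    value: int        # amount in satoshis
    script_type: str  # e.g. "bech32" or "p2sh" (illustrative labels)


def select_largest_first(utxos: List[Utxo], target: int) -> Optional[List[Utxo]]:
    """Add UTXOs from largest to smallest until the target is covered."""
    selected, total = [], 0
    for utxo in sorted(utxos, key=lambda u: u.value, reverse=True):
        selected.append(utxo)
        total += utxo.value
        if total >= target:
            return selected
    return None  # not enough funds in this set


def select_without_mixing(utxos: List[Utxo], target: int) -> Optional[List[Utxo]]:
    """Prefer a selection drawn from a single script type."""
    by_type: Dict[str, List[Utxo]] = defaultdict(list)
    for utxo in utxos:
        by_type[utxo.script_type].append(utxo)

    # Try to fund the payment from each script type on its own.
    candidates = []
    for group in by_type.values():
        selection = select_largest_first(group, target)
        if selection is not None:
            candidates.append(selection)

    if candidates:
        # Assumed preference: fewest inputs, then the smallest total value.
        return min(candidates, key=lambda s: (len(s), sum(u.value for u in s)))

    # No single type can cover the target, so fall back to mixing types.
    return select_largest_first(utxos, target)


wallet = [
    Utxo("a", 50_000, "p2sh"),   # change left over from txid: a
    Utxo("c", 80_000, "bech32"),
    Utxo("d", 30_000, "bech32"),
]
print(select_without_mixing(wallet, 40_000))
```

In this example the 40,000 satoshi payment is funded by the P2SH UTXO alone, which corresponds to the bonus case described above: the P2SH output leaves the wallet and the new change can be bech32.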
Avoid mixing different output types during coin selection
I have been implementing this logic on GitHub, where work and review are ongoing.
If this topic is interesting to you, or if you are looking for ways to get involved with Bitcoin Core development, you can participate in the upcoming Bitcoin PR Review Club for #24584 (or read the logs from the meeting).
Ongoing work
If this logic is merged into Bitcoin Core, my hope is that other wallets will also implement both change address matching and the avoidance of mixed output types during coin selection, improving privacy for all Bitcoin users.
This work has inspired a number of ideas for improving privacy in the Bitcoin Core wallet, as well as improving how we test and evaluate changes to coin selection. Many thanks to Coinbase for supporting my work — I hope to find other opportunities for improvement motivated by analysis as our research continues.
Coinbase is officially seeking applications for our 2022 developer grants focused on blockchain developers who contribute directly to a blockchain codebase, or researchers producing white papers. Learn more about the call for applications here.
Improving Transaction Privacy on the Bitcoin Blockchain was originally published in The Coinbase Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.