Bitcoin/Dogecoin and large blk0001.dat and blkindex.dat files

I participate in Dogecoin mining, and at one point tried Bitcoin mining (in earlier days) but decided it wasn’t worth the tradeoff (electricity in Silicon Valley is expensive). Regardless of which “coin”, the overall problem I’m about to discuss is the same.

The wallet programs (e.g. dogecoin-qt.exe, bitcoin-qt.exe, etc.) download a massive number of blocks, which together make up the block chain. If you’re curious about the innards, see here.

There’s a lot of technobabble all throughout the technical sections of the Bitcoin wiki, so I’ll sum it up as I understand it. A block is a batch of transactions, where a transaction is a coin transfer (someone sending or receiving coins), and the block chain is every block since day 1 linked together in order. Transactions get announced to the nodes on the network, miners bundle them into new blocks, and each new block is relayed to every client running the wallet software at that moment (anyone not running it will have to download the blocks they missed the next time they launch the client). You can see the effect in your wallet: look at the transaction history and hover over the status icon on the left to get a tooltip showing how many confirmations that transaction has. A confirmation isn’t a message sent back by other clients; the count is simply how many blocks, starting with the one that contains your transaction, are now in the chain, so every newly mined block adds one. The more confirmations there are, the harder the transaction is to reverse, and the more likely it is that the coin transfer (receive or send) is legitimate. (I remember reading somewhere that at least 3, or in some cases 7, confirmations need to be received for a transaction to be considered legitimate.)
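The confirmation count in that tooltip is simple arithmetic on block heights. Here’s a minimal Python sketch, assuming you already know the current chain tip height (what getblockcount reports) and the height of the block that included your transaction; the function name is my own, not anything from the wallet’s code:

```python
from typing import Optional


def confirmations(tip_height: int, tx_block_height: Optional[int]) -> int:
    """Confirmations for a transaction, given the current chain tip height.

    tx_block_height is the height of the block that included the transaction,
    or None if the transaction hasn't been mined into a block yet.
    """
    if tx_block_height is None:
        return 0  # still waiting to be included in a block
    if tx_block_height > tip_height:
        raise ValueError("block height cannot exceed the current chain tip")
    # The including block is the first confirmation; each block on top adds one.
    return tip_height - tx_block_height + 1
```

With the tip at 47036 and the transaction in block 47030, that works out to 7 confirmations; a transaction that hasn’t made it into a block yet has 0.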

Anyway, what this means is that essentially every single person using a wallet has to download a large amount of data (and someone who has never run the client before has to download every block since day 1, which in some cases can literally take 24 hours or more!).

The problems here are almost infinite. I don’t even know where to begin, so I’ll just start dumping data points and information.

The block chain data is stored in files named blkindex.dat and blkNNNN.dat, where NNNN is a four-digit number that starts at 0001 and increments every time the current file reaches 2 gigabytes. That’s right: 2GB. And that was an intentional design choice given the concerns over filesystems that had 2GB filesize limitations (more specifically, file offsets stored as a signed 32-bit value).
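The naming scheme and rollover are easy to sketch. This assumes a hard cap of 2**31 - 1 bytes per file (the largest offset a signed 32-bit value can hold); the real client rolls over somewhat before that and never splits a block across two files, so treat the file count as an approximation. The function and constant names are hypothetical:

```python
from typing import List

# Assumed cap: the largest offset a signed 32-bit value can represent.
SIGNED_32BIT_MAX = 2**31 - 1  # 2,147,483,647 bytes, just under 2 GiB


def blk_file_name(n: int) -> str:
    """Four-digit, zero-padded file name: 1 -> blk0001.dat."""
    return f"blk{n:04d}.dat"


def files_needed(total_block_bytes: int, cap: int = SIGNED_32BIT_MAX) -> List[str]:
    """Names of the blk files needed for total_block_bytes of block data,
    assuming each file is filled to exactly `cap` bytes before the next starts."""
    count = max(1, -(-total_block_bytes // cap))  # ceiling division, at least one file
    return [blk_file_name(i) for i in range(1, count + 1)]
```

With my current 1,017,079,330-byte blk0001.dat that’s still a single file; at 5 GiB of block data you’d be up to blk0003.dat.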

blkindex.dat is a Berkeley DB btree file (read this if you’re interested in its fields/format), while blkNNNN.dat is quite literally just concatenated bunches of raw data.
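Here’s a sketch of walking those records in Python, assuming the layout used by Bitcoin-derived clients of this era: each record is a 4-byte network magic value, a 4-byte little-endian block length, then the serialized block, which starts with an 80-byte header. The magic bytes c0c0c0c0 are my assumption for Dogecoin (Bitcoin’s are f9beb4d9), and the function names are hypothetical:

```python
import struct
from typing import Iterator, Tuple

# Assumption: Dogecoin's network magic. Bitcoin's would be f9beb4d9.
DOGECOIN_MAGIC = bytes.fromhex("c0c0c0c0")


def iter_block_records(data, magic: bytes = DOGECOIN_MAGIC) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, raw_block) for each record in a blkNNNN.dat image.

    Assumed record layout: magic (4 bytes) + block length (uint32, little-endian)
    + the serialized block itself.
    """
    pos = 0
    while pos + 8 <= len(data):
        if data[pos:pos + 4] != magic:
            break  # padding or a partial write at the end of the file
        (length,) = struct.unpack_from("<I", data, pos + 4)
        start = pos + 8
        block = bytes(data[start:start + length])
        if len(block) < length:
            break  # truncated record
        yield pos, block
        pos = start + length


def parse_header(block: bytes) -> dict:
    """Decode the fixed 80-byte header at the start of a serialized block."""
    version, prev_hash, merkle_root, timestamp, bits, nonce = struct.unpack_from(
        "<i32s32sIII", block, 0
    )
    return {
        "version": version,
        # Hashes are stored little-endian; reverse them for the usual hex display.
        "prev_hash": prev_hash[::-1].hex(),
        "merkle_root": merkle_root[::-1].hex(),
        "time": timestamp,
        "bits": bits,
        "nonce": nonce,
    }
```

To run it against a real file without loading 1GB into RAM, open blk0001.dat in binary mode (with the client closed) and pass mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data; slicing and struct.unpack_from work the same on an mmap.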

As of this writing (2014/01/08), there are 47036 blocks in the Dogecoin block chain. My blkindex.dat file is 432MBytes, and my blk0001.dat file is over 969MBytes. I have no debug.log, however (I’ll explain how I got rid of that at the end).

The wallet software (on Windows anyway) defaults to sticking its data into %APPDATA%, which on the vast majority of systems out there resides on one’s C: drive.

Many workstations and home computers these days use SSDs for their OS drives, which means limited capacity and justified concerns over excessive writes being issued to the drive. For example, I sure as hell don’t want a wallet wasting all my erase cycles on my SSD just because some developer somewhere didn’t think ahead.

Neither of these blk*.dat files uses any sort of compression. They’re literally just raw data being shoved into files at a massive rate. If you think this isn’t a problem (“hard disks today are so big, who cares?”), then you’re being willfully ignorant. The space-wasting nature of the wallet software has already begun to surface, and it’s only going to get worse as time goes on.

In fact, the problem has been discussed by Bitcoin developers, and I urge anyone reading this blog post to read every single comment posted there, particularly Gavin Andresen’s opinion that compression isn’t worth it. He justified his stance with the following quote, then immediately closed the ticket:

“25% savings on diskspace is ‘meh’ to me, I vote no. Disk space is cheap, development and testing time is expensive”

The ways of solving this are almost infinite. In fact, there are so many that I’m opting not to go into details here. This is an engineer (or engineers) deliberately burying their heads in the sand over a problem that is guaranteed to happen. I respect and understand prioritising time vs. effort, but this really isn’t something you can just ignore or sweep under the rug. And I’m not the only one who thinks so. Each entry in blkNNNN.dat could be compressed with zlib, gzip, or lzjb and the savings would easily be worthwhile; the CPU hit would be negligible given how this data is used and how things are designed at present. There’s literally no downside to doing this.
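To make the per-record idea concrete, here’s a minimal sketch using Python’s standard zlib module. It’s hypothetical, not how any wallet stores blocks today. Compressing each record on its own (rather than the whole file) keeps every block independently readable by offset, which fits how blkNNNN.dat is already used. Actual savings depend on the data: hashes, signatures and public keys are close to random and barely compress, so only measuring real blocks will give you a real number.

```python
import zlib
from typing import Iterable, Tuple


def compress_record(raw_block: bytes, level: int = 6) -> bytes:
    """Compress one block record independently (hypothetical per-record scheme)."""
    return zlib.compress(raw_block, level)


def decompress_record(stored: bytes) -> bytes:
    """Inverse of compress_record."""
    return zlib.decompress(stored)


def savings(raw_blocks: Iterable[bytes], level: int = 6) -> Tuple[int, int, float]:
    """Return (raw_bytes, compressed_bytes, fraction_saved) over a set of blocks."""
    raw_total = 0
    packed_total = 0
    for block in raw_blocks:
        raw_total += len(block)
        packed_total += len(compress_record(block, level))
    saved = 1 - packed_total / raw_total if raw_total else 0.0
    return raw_total, packed_total, saved
```

Feeding savings() the raw blocks from your own blk0001.dat is the honest way to check the 25% figure quoted above.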

And that’s just about disk usage.

Now think about someone starting up a brand new wallet or being introduced to Bitcoin or Dogecoin (or any coin): they’re going to have to download the entire block chain. Right now this may take hours (for Dogecoin), but as more and more transactions happen (as digital currency grows in popularity and more people get on the bandwagon), the amount of network traffic becomes staggering. Because of the distributed nature of it all, I don’t have any real-world numbers for how much network traffic a full block chain download uses, but I will point out that some places are already selling DVDs of the block chain (presumably just the blkindex.dat and blkNNNN.dat files) to help minimise network traffic. Ponder that for a little bit, while simultaneously considering the word “scalability” (and that isn’t a word I use often). And from the packets I have looked at, most do not seem to use any form of compression (think Apache mod_deflate), which is also a design disappointment.

But back to the disk usage issue…

My own situation led me to ask the same kind of question as the fellow over on the doges.org forum: can I use something other than %APPDATA%\Dogecoin, e.g. a directory on my D: drive?

The answer is a big fat YES. The problem is that nobody in any of these digital coin projects (more specifically the developers of the wallet software) bothers to actually document these features in a user-friendly manner, nor do any forum posts (or reddit posts for that matter) bother to discuss them. It’s as if people think all of this is black magic (“Don’t open the box! You’ll let out all the magic smoke!”).

I had to go read the source code to the dogecoin wallet software, specifically the HelpMessage() function in src/init.cpp.

Most (but not all) of those flags you see in the source are usable in your dogecoin.conf file. But I tend to want to keep everything in a single directory (most people refer to this as “standalone”, e.g. a standalone application).
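As far as I can tell from Bitcoin-derived code, the mapping is mechanical: a flag like -printtoconsole=1 on the command line becomes printtoconsole=1 in the .conf file, and the command line wins when both are set. Here’s a rough Python sketch of that behaviour; the function names are mine, and the real parser handles more edge cases. One reason -datadir in particular belongs on the command line: the client looks for dogecoin.conf inside the data directory by default, so it has to know the data directory before it can read the file.

```python
from typing import Dict, List


def parse_conf(text: str) -> Dict[str, str]:
    """Parse dogecoin.conf-style `name=value` lines into {"-name": "value"}."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        settings["-" + name] = value  # in this sketch, a later line overrides an earlier one
    return settings


def merge_settings(argv: List[str], conf_text: str) -> Dict[str, str]:
    """Combine config-file values with command-line flags; flags take precedence."""
    merged = parse_conf(conf_text)
    for arg in argv:
        if arg.startswith("-"):
            name, sep, value = arg.partition("=")
            # Simplification: a bare flag such as -server is treated as "1".
            merged[name] = value if sep else "1"
    return merged
```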

For example, I have all my Dogecoin wallet data in D:\DogeCoin\qtdata, since my D: drive is a 1TB mechanical hard disk (while C: is a 256GB SSD). My dogecoin.conf goes in there as well, so I could move all of this to a USB drive if I wanted it to be portable.

So how did I do that? Using the -datadir flag (passed to the dogecoin-qt.exe executable).

I placed the dogecoin-qt-v14-Win software into D:\DogeCoin\dogecoin-qt-v14-Win and created a shortcut on my desktop to D:\DogeCoin\dogecoin-qt-v14-Win\dogecoin-qt.exe. I then opened the shortcut’s Properties and changed the Target to:

D:\DogeCoin\dogecoin-qt-v14-Win\dogecoin-qt.exe -datadir=D:\DogeCoin\qtdata

(If a path contains spaces, wrap it in double quotes.)

The files in D:\DogeCoin\qtdata are the same files that used to live in %APPDATA%\Dogecoin. So it’s as easy as exiting the client, copying all the files over to the new directory of your choice, then modifying a shortcut and using that from then on. Really, it is.
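If you’d rather script the copy than drag files around in Explorer, here’s a minimal Python sketch (the function name is mine). Close the wallet first. It copies rather than moves, so the original %APPDATA%\Dogecoin stays put as a fallback until you’ve confirmed the client starts cleanly from the new location, and it refuses to run if there’s no wallet.dat in the source, as a guard against pointing it at the wrong folder.

```python
import shutil
from pathlib import Path
from typing import List


def migrate_datadir(src: Path, dst: Path) -> List[str]:
    """Copy a wallet data directory to a new location; return the copied top-level names.

    Run this only while the wallet client is closed. Nothing is deleted from src.
    """
    if not (src / "wallet.dat").exists():
        raise FileNotFoundError(f"no wallet.dat in {src}; is this the right directory?")
    # dirs_exist_ok (Python 3.8+) lets you re-run the copy into an existing folder.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(p.name for p in dst.iterdir())
```

On my machine the call would be migrate_datadir(Path(os.environ["APPDATA"]) / "Dogecoin", Path(r"D:\DogeCoin\qtdata")), followed by the shortcut change described above.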

In case you’re curious what’s in my qtdata directory:

 Directory of D:\DogeCoin\qtdata

01/08/2014  01:14    <DIR>          .
01/08/2014  01:14    <DIR>          ..
01/07/2014  22:22    <DIR>          database
12/23/2013  22:37                 0 .lock
01/08/2014  01:15     1,017,079,330 blk0001.dat
01/08/2014  01:15       453,894,144 blkindex.dat
12/23/2013  22:37                 0 db.log
01/05/2014  13:28               261 dogecoin.conf
01/08/2014  01:14           890,808 peers.dat
01/08/2014  01:15            98,304 wallet.dat
               7 File(s)  1,471,962,847 bytes
               3 Dir(s)  903,838,597,120 bytes free

This leads me to what I mentioned earlier: how I got rid of debug.log. Simply edit your dogecoin.conf file and add the following line to it:

printtoconsole=1

At least on Windows, this takes care of the problem: the client sends its log output to the console instead of debug.log, and the GUI has no console window, so debug.log is never written. You can still use the debug console (under Help / Debug window / Console) if you need to. I use it for commands like getblockcount, getmininginfo, and getpeerinfo. The debug console has a help command in case you want to poke around, but don’t hold me responsible if you break something.
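Adding printtoconsole=1 is a one-line edit by hand, but if you manage several wallets (or just like scripts), here’s a small Python sketch that appends an option to a dogecoin.conf-style file only if it isn’t already set. The function name is hypothetical, and it treats anything after a # as a comment.

```python
from pathlib import Path


def ensure_conf_option(conf_path: Path, name: str, value: str) -> bool:
    """Append `name=value` to a .conf file unless `name` is already set.

    Returns True if the file was changed. Creates the file if it doesn't exist.
    """
    lines = conf_path.read_text().splitlines() if conf_path.exists() else []
    for line in lines:
        key = line.split("#", 1)[0].split("=", 1)[0].strip()
        if key == name:
            return False  # already present; leave whatever value is there alone
    lines.append(f"{name}={value}")
    conf_path.write_text("\n".join(lines) + "\n")
    return True
```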

Happy mining — and here’s to hoping the wallet software maintainers pull their heads out of the sand and start actually thinking about the implications of their design choices.