At Cyberstorm Mauritius we work on several projects and code for fun. One of the interesting projects we have look at is an application called Tarsnap which is used to perform a secure backup on the cloud. At Cyberstorm Mauritius, myself (@TheTunnelix) and Codarren (@Devildron) recently send codes to Tarsnap and same were approved. That's really cool when someone's code is approved and used worldwide by thousands of companies. Today, I have the privilege to speak on Tarsnap at the DevConMru 2016 which was held at Voila hotel, Bagatelle. On reaching there, I was impressed by the number of people already waiting inside the conference room who were curious about Tarsnap. Some were entrepreneurs whilst others were students. I should say around 30 people attended the conference. Since it was a Sunday at 11:30 am, the team did not hesitate to bring some beer to the little crowd present there. I was busy setting up my laptop for the presentation.
As usual, I like to get the attention of my audience before the presentation. My first slide showed the logo of Tarsnap upside down.
Everyone was turning their head and making the effort to read the content. And here we go. I noticed that they are all ready and curious about it.
Check out the Slide here. Please wait some minutes. It's loading...
The basics of Tarsnap were explained. Tarsnap take streams of archive data and splits then into variable-length blocks. Those blocks are compared and any duplicate blocks are removed. Data de-duplication happens before its uploaded to the Tarsnap server. Tarsnap does not create Temporary files but instead create a cache file on the client. The cache file is the files that are being back up to the Tarsnap server. After deduplication, the data is then compressed, encrypted, signed and send to the Tarsnap server. I also explained that the archived are saved on an Amazon S3 with EC2 server to handle it. Another interesting point raised was the concept of Tarsnap which uses smart Rsync-like block oriented snapshot operations that upload only data which is charged to minimize transmission costs. One does not need to trust any vendor cryptographic claims and you have full access to the source codes which uses open-source libraries and industry-vetted protocols such as RSA, AES, and SHA.
Getting on to the other part of Tarsnap and Bandwidth, an emphasis was made on Tarsnap which synchronized blocks of data using a very intelligent algorithm. Nowadays, there are companies that still use tapes for backups. Imagine having so many tapes and when restoration time has arrived, this would take tremendous time. Tarsnap compresses, encrypts and cryptographically signs every byte you send to it. No knowledge of cryptographic protocols is required. At this point, I asked a question about volunteers who are thinking to look at the Tarsnap code. There were three persons who raised their hands. The importance of the key file was raised up as some companies secure their private key in a safe. Tarsnap also supports the division of responsibilities where an explanation was laid out where a particular key can only be used to create an archive and not delete them.
An analogy between google drive compared to Tarsnap was given. Many already understood the importance of Tarsnap compared to Google Drive. The concept of deduplication was explained using examples. For the network enthusiasts, I laid emphasis on the port 9279 which should not be blocked on the firewall as Tarsnap runs on the following port number. Coming to confidentiality, the matter was made clear enough to the audience how much the data is secured. If it happens someone lost the key there is no way of getting back the data.
Tarsnap is not an open source product. However, their client code is open to learn, break and study. I laid emphasis on the reusable open source components that come with Tarsnap, for example, the Script KDF (Key derivation function). KDF derives one or more secret keys from a secret value such as a master key, a password or passphrase or using a pseudo-random function. The Kivaloo data store was briefly explained. Its a collection of utilities which together form a data store associating keys up to 255 bytes with a value up to 255 bytes. Writes are accepted until data has been synced. If A completed before B, B will see the results of A. The SPIPED secure pipe daemon which is a utility for creating symmetrically encrypted and authenticated pipes between socket addresses so that one may connect to one address.
I also explained to the audience the pricing mechanism which was perceived rather cheap for its security and data deduplication mechanisms. Tarsnap pricing works similarly as a prepaid utility-metered model. A deposit of $5 is needed. Many were amazed when I told them that the balance is a track to 18 decimal places. Prices are paid exactly what is consumed.
Other interesting features such as regular expression support and interesting kinds of stuff with the dry run features of Tarsnap was given. The concept of Tar command compared to Tarsnap was also explained. Commands, hints, and tricks explained.
At the end, i consider it really important to credit Colin, the author of Tarsnap and i have been strongly inspired by the work of Michael Lucas on Tarsnap. Indeed, another great achievement of Cyberstorm Mauritius at the DevConMru 2016.