FTP is the most common means of moving files to and from a server. When the FTP specification was written, certain convenience features were recommended to client implementers in the form of four data type definitions. Using these definitions, clients can transform the data during the transfer, so you don't have to perform those transformations yourself every time you upload a file.
The data types are: ASCII, EBCDIC, Image, and Local. The first two are different character sets that the local file can be converted to during transmission to the other server. Image is now commonly known as “binary mode” and it transfers the data without changing it in any way. Local allows hosts to specify custom byte sizes for storage and transmission. For this article, I will be focusing on ASCII and Image and what can go wrong if you choose the incorrect transfer mode.
Why use ASCII mode?
In addition to converting the local character set to NVT-ASCII, ASCII mode converts line endings to the style used on the recipient's machine. If a file is uploaded from a Windows machine to a Linux machine, the line endings are converted from CRLF to LF, and vice versa when you download the file. This was probably quite handy years ago; without it, files would end up with extra gaps from stray CRs (on *nix) or all on one line because of missing CRs (on Windows). Today, however, this isn't much of a problem. Text editors on both operating systems can convert files to their preferred line endings. On Linux, there's usually some indication of this when you open the file.
“[ Read 4 lines (Converted from DOS format) ]”
On the Windows side, WordPad reads both formats without issue. Notepad, however, only understands CRLF, so if a file shows up as a single line in Notepad, this is why.
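The line-ending half of ASCII mode can be sketched in a few lines of Python. Per RFC 959, the sender converts its native end-of-line to CRLF (the NVT-ASCII form used on the wire) and the receiver converts CRLF into its own native form. The function names are illustrative, character-set conversion is left out, and the input is assumed to contain only the sender's native line endings.

```python
# Minimal sketch of ASCII-mode line-ending handling, assuming the input
# uses only the sender's native end-of-line. Character-set conversion
# (e.g. to or from EBCDIC) is deliberately ignored.

def native_to_wire(data: bytes, native_eol: bytes) -> bytes:
    """Sender side: convert native line endings to CRLF for transmission."""
    return data.replace(native_eol, b"\r\n")


def wire_to_native(data: bytes, native_eol: bytes) -> bytes:
    """Receiver side: convert CRLF from the wire into native line endings."""
    return data.replace(b"\r\n", native_eol)


# Uploading a text file from Windows (CRLF) to Linux (LF):
windows_file = b"line one\r\nline two\r\n"
linux_file = wire_to_native(native_to_wire(windows_file, b"\r\n"), b"\n")
print(linux_file)  # b'line one\nline two\n'

# Downloading it back to Windows restores the CRLFs:
print(wire_to_native(native_to_wire(linux_file, b"\n"), b"\r\n"))  # b'line one\r\nline two\r\n'
```

For genuine text this round trip is harmless, which is why ASCII mode was a reasonable convenience in the first place.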
How does the client choose to transfer the files?
Multiple modes can only work if the FTP client has some way of knowing how to transfer each file you've queued up. This is done through file extensions. Clients generally keep a configurable list of extensions that should be transferred in ASCII mode, and some clients also let you force a single mode for all transfers. Then there is "Auto mode": the client checks each file against the extension list and decides how to transfer it, rather than always using one mode. When a file lacks an extension, the client has no information on which to base that decision. In this case, FTP clients should always transfer the file in binary mode, which prevents possible data loss if the file isn't pure text.
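The Auto-mode decision can be sketched in Python. The extension list and the `extensionless_as_ascii` parameter are hypothetical; the parameter stands in for a client setting such as FileZilla's "Treat files without extensions as ASCII files". `'A'` and `'I'` are the FTP TYPE codes for ASCII and Image (binary).

```python
import os

# Hypothetical default list; real clients ship their own, user-editable lists.
ASCII_EXTENSIONS = {".txt", ".html", ".htm", ".php", ".css", ".js", ".xml"}


def choose_transfer_type(filename, ascii_extensions=ASCII_EXTENSIONS,
                         extensionless_as_ascii=False):
    """Return 'A' (ASCII) or 'I' (Image/binary) for a file in Auto mode."""
    _, ext = os.path.splitext(filename)  # ".bashrc" -> ("​.bashrc", "") per splitext rules
    if not ext:
        # No extension means no information: binary is the safe default.
        return "A" if extensionless_as_ascii else "I"
    return "A" if ext.lower() in ascii_extensions else "I"


print(choose_transfer_type("index.php"))   # A
print(choose_transfer_type("photo.PNG"))   # I
print(choose_transfer_type("a1b2c3d4e5"))  # I (hypothetical hashed name, no extension)
print(choose_transfer_type("a1b2c3d4e5", extensionless_as_ascii=True))  # A
```

The only line that differs between a safe client and an unsafe one is the extensionless branch, which is exactly the default this article argues about.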
What’s the problem?
If a file is transferred in the wrong mode, it can become corrupted. For text files, the mode usually doesn't matter unless you expect to see exactly how the file looks on the server. Any kind of binary file (images, executables, etc.), however, will be ruined by an ASCII-mode transfer. Did your file have a 0x0A (LF) byte in it? After an ASCII-mode download to Windows, that spot now holds 0x0D 0x0A (CRLF). Was that an image? That's unfortunate, because now it's going to render differently (if at all).
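To see the damage concretely, the sketch below simulates an ASCII-mode download to Windows on the 8-byte signature that begins every PNG file. It assumes the client adds a CR only before LFs that don't already have one; real clients may differ in the details. The PNG signature deliberately contains both a CRLF and a lone LF so that this kind of conversion can be detected.

```python
import re

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def ascii_mode_download(data: bytes) -> bytes:
    """Simulate an ASCII-mode download to Windows (assumed behavior):
    insert a CR before every LF that is not already preceded by one.
    Existing CRLF pairs are left alone."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


corrupted = ascii_mode_download(PNG_SIGNATURE)
print(corrupted)                            # b'\x89PNG\r\n\x1a\r\n'
print(len(PNG_SIGNATURE), len(corrupted))   # 8 9
print(corrupted.startswith(PNG_SIGNATURE))  # False: no longer a valid PNG header
```

Every byte after the inserted CR is shifted by one position, so the rest of the file is misaligned as well.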
This is easy enough to fix: switch the transfer mode to binary and re-upload the files.
The big problem
Backups. These handy little things let you sleep at night knowing that your data is replicated somewhere should something bad happen. They also let you move to a new web host, should the need arise. If the backup gets corrupted by your FTP client, you may be in tears when you try to restore it. I have seen this several times in our support forums. A user pulls the files down from an old host and subsequently loses all of their attachments to corruption. By then the old host has ended the user's contract, and the user is left with no good backup.
Why do attachments break? For your own safety, uploaded attachments are renamed to a hash and their file extensions are removed. (You can read about a few of the reasons for our attachment handling here.) Most FTP clients follow their list of ASCII extensions and transfer everything else in binary mode. Some clients, though, such as FileZilla, default to transferring extensionless files in ASCII mode.
Find the transfer settings page and make sure that extensionless files are not set to be transferred in ASCII mode. In FileZilla, this is listed under Edit → Settings → Transfer → File Types → Treat files without extensions as ASCII files (uncheck this box!). Alternatively, you can simply set your client to always transfer files in binary mode. If your text editor natively supports both line-ending styles, it doesn't matter which style a file arrives in.
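If you script your transfers, Python's standard `ftplib` module makes binary mode explicit: `FTP.retrbinary()` sends `TYPE I` before retrieving the file. The `download_binary` helper below is a hypothetical wrapper, not part of `ftplib`, and the host, credentials and directory in the usage comment are placeholders.

```python
from ftplib import FTP


def download_binary(ftp: FTP, remote_name: str, local_path: str) -> None:
    """Download one file byte-for-byte.

    retrbinary() switches the connection to TYPE I (Image/binary) before
    sending RETR, so the server performs no line-ending conversion.
    """
    # "wb" also avoids any newline translation on the local side.
    with open(local_path, "wb") as fh:
        ftp.retrbinary(f"RETR {remote_name}", fh.write)


# Usage sketch (placeholders; assumes a flat directory of plain files):
# with FTP("ftp.example.com") as ftp:
#     ftp.login("user", "password")
#     ftp.cwd("files")
#     for name in ftp.nlst():
#         download_binary(ftp, name, name)
```

Because the mode is chosen per call rather than guessed from the file name, extensionless attachments come down intact.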
Recovering corrupted files
Once a file has been corrupted this way, it cannot be recovered. The only option is to change the transfer mode and download it again, if you still have access to the originals. "But can't I just re-upload the files and let the line endings get converted back?" Unfortunately, no. During the ASCII-mode download, every existing line feed had a carriage return added in front of it, while existing CRLFs were left alone. Re-uploading in ASCII mode converts both the new and the original CRLFs to LF, and nothing in the downloaded file tells you which CRs were there to begin with. The file is lost.
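The loss can be demonstrated with two small Python functions that model the conversions described above (the function names are illustrative). The key point is that several different originals produce identical downloaded bytes, so no conversion, in either direction, can know which original to restore.

```python
import re


def add_cr(data: bytes) -> bytes:
    """ASCII-mode download: put a CR before every LF that lacks one."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


def strip_cr(data: bytes) -> bytes:
    """ASCII-mode re-upload: turn every CRLF back into a bare LF."""
    return data.replace(b"\r\n", b"\n")


original = b"\x00\r\n\x01\n\x02"  # binary data with one CRLF and one bare LF
downloaded = add_cr(original)     # b'\x00\r\n\x01\r\n\x02'
reuploaded = strip_cr(downloaded)
print(reuploaded)                 # b'\x00\n\x01\n\x02'
print(reuploaded == original)     # False: the original CR is gone too

# Four different originals all download to the same bytes:
candidates = [b"\x00\n\x01\n\x02", b"\x00\r\n\x01\n\x02",
              b"\x00\n\x01\r\n\x02", b"\x00\r\n\x01\r\n\x02"]
print({add_cr(c) for c in candidates})  # {b'\x00\r\n\x01\r\n\x02'}
```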
Why have they done this?
How an FTP client can assume that a completely unknown file is going to be pure ASCII is beyond me. This is an assumption that should NEVER have been made; the default for unknown data should be binary mode. If you're backing up a whole Linux file system to your Windows machine to restore later, all of your binaries will be corrupted, since they don't have file extensions. FileZilla is aware of the issue, but their developers respond with comments like, "All the extensionless files I transfer really are text files." This was three years ago, and it still hasn't been changed! About a week ago, one of our developers asked about this in their IRC channel, and the question went largely unanswered. The main response was that Windows Notepad can't handle Unix-style line endings, so ASCII mode should stay. It was also reiterated to us that they have never personally transferred an extensionless file that wasn't pure text. It's curious how that makes FileZilla omniscient about the types of files you're transferring. Trading file integrity for the assumption that all unknown files are ASCII files and that they will be opened in Windows Notepad is just irresponsible. Let the users make that call, and leave the box unchecked by default.
There have been some comments from our own users that we should fix this by leaving the existing extensions or adding a known binary extension. The first option leads to security issues as mentioned in the previously linked article. The second moves the assumptions from the FTP client to us. Now we are making a guess as to what your data could be and how your client might handle it. This position is no better than the one taken by FileZilla. The .phpbb extension would probably be safe to use, but there’s no guarantee that there isn’t an FTP client out there that will have problems with bogus extensions. It’s best to just leave data as it is, on both sides.
Do other clients do this?
It's possible, but I have not personally seen other clients do this. WinSCP has a very similar layout when it comes to selecting transfer modes, but it transfers extensionless files in binary mode automatically. FireFTP defaults to binary mode for all transfers; in its automatic mode, it lists the extensions that will be transferred in ASCII mode, and everything else is transferred in binary mode. CoreFTP defaults to transferring a specific set of extensions in ASCII mode, but offers an option to ignore that list and transfer everything in binary mode.
With the tools we have available today, I think ASCII mode should fade away. It's a file transfer protocol, so it should just transfer files and not try to be smart about what you might want to do with them later. The reality is that ASCII mode won't be going away any time soon, so configure your clients, and maybe reply to bug tickets about the issue. 😉