Carefully Call Out to Other Resources
Do not put your trust in princes, in mortal men, who cannot save.
Psalms 146:3 (NIV)
Practically no program is truly self-contained; nearly all programs call out to other programs for resources, such as programs provided by the operating system, software libraries, and so on. Sometimes this calling out to other resources isn't obvious or involves a great deal of ``hidden'' infrastructure which must be depended on, e.g., the mechanisms to implement dynamic libraries. Clearly, you must be careful about what other resources your program trusts, and you must make sure that the way you send requests to them is safe.
Call Only Safe Library Routines
Sometimes there is a conflict between security and the development principles of abstraction (information hiding) and reuse. The problem is that some high-level library routines may or may not be implemented securely, and their specifications won't tell you. Even if a particular implementation is secure, it may not be possible to ensure that other versions of the routine will be safe, or that the same interface will be safe on other platforms.
In the end, if your application must be secure, you must sometimes re-implement your own versions of library routines. Basically, you have to re-implement routines if you can't be sure that the library routines will perform the necessary actions you require for security. Yes, in some cases the library's implementation should be fixed, but it's your users who will be hurt if you choose a library routine that is a security weakness. If you can, give your re-implementation the same interface as the high-level routine it replaces; that way, you can switch back to the library's version on systems where its use is secure.
If you can, test to see if the routine is secure or not, and use it if it's secure - ideally you can perform this test as part of compilation or installation (e.g., as part of an ``autoconf'' script). For some conditions this kind of run-time testing is impractical, but for other conditions, this can eliminate many problems. If you don't want to bother to re-implement the library, at least test to make sure it's safe and halt installation if it isn't. That way, users will not accidentally install an insecure program and will know what the problem is.
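As an illustration, a configure script (for example, one using autoconf's AC_RUN_IFELSE) could compile and run a small probe like the one below and halt installation if it fails. Purely as an assumed example, the property tested here is whether snprintf(3) respects its size argument and always NUL-terminates, as C99 requires; some older C libraries did not. The function name snprintf_is_safe is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical install-time probe: returns 1 if snprintf(3) behaves as
   C99 requires (truncates to the given size, NUL-terminates, and returns
   the length the full output would have had), 0 otherwise. */
int snprintf_is_safe(void)
{
    char buf[8];
    memset(buf, 'X', sizeof buf);          /* fill with a sentinel */
    int n = snprintf(buf, 4, "%s", "abcdefg");
    if (n != 7)                            /* C99: would-be length */
        return 0;
    if (buf[3] != '\0')                    /* must be NUL-terminated */
        return 0;
    if (strcmp(buf, "abc") != 0)           /* must keep the prefix */
        return 0;
    if (buf[4] != 'X')                     /* must not write past size */
        return 0;
    return 1;
}
```

A probe program's main could simply return snprintf_is_safe() ? 0 : 1, so that a nonzero exit status stops the installation and tells the user why.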
Limit Call-outs to Valid Values
Ensure that any call out to another program only permits valid and expected values for every parameter. This is more difficult than it sounds, because many library calls or commands call lower-level routines in potentially surprising ways. For example, some library calls are implemented indirectly by calling the shell, which means that passing characters which are shell metacharacters can have dangerous effects. So, let's discuss metacharacters.
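As a small illustration of permitting only valid, expected values, the sketch below accepts a numeric parameter (say, a port number) only if it consists entirely of decimal digits and falls within a caller-supplied range. The function name parse_bounded_long is hypothetical, and it deliberately accepts only non-negative numbers; anything unexpected, including shell metacharacters, is rejected rather than passed along.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: accept s only if it is a non-negative decimal
   integer within [min, max]. Returns 1 and stores the value on success,
   0 on any invalid input (empty string, trailing junk, whitespace,
   signs, overflow, or out of range). */
int parse_bounded_long(const char *s, long min, long max, long *out)
{
    if (s == NULL || *s == '\0')
        return 0;
    /* Permit only digits, so leading whitespace or '+'/'-' are rejected
       before strtol gets a chance to accept them. */
    for (const char *p = s; *p != '\0'; p++)
        if (*p < '0' || *p > '9')
            return 0;
    errno = 0;
    char *end = NULL;
    long v = strtol(s, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return 0;
    if (v < min || v > max)
        return 0;
    *out = v;
    return 1;
}
```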
Handle Metacharacters
Many systems, such as the command line shell and SQL interpreters, have ``metacharacters'', that is, characters in their input that are not interpreted as data. Such characters might be commands, or delimit data from commands or other data. If there's a language specification for that system's interface that you're using, then it certainly has metacharacters. If your program invokes those other systems and allows attackers to insert such metacharacters, the usual result is that an attacker can completely control your program.
One of the most pervasive metacharacter problems is the one involving shell metacharacters. The standard Unix-like command shell (stored in /bin/sh) interprets a number of characters specially. If these characters are sent to the shell, then their special interpretation will be used unless escaped; this fact can be used to break programs. According to the WWW Security FAQ [Stein 1999, Q37], these metacharacters are:
    & ; ` ' \ " | * ? ~ < > ^ ( ) [ ] { } $ \n \r
I should note that in many situations you'll also want to escape the tab and space characters, since they (and the newline) are the default parameter separators. The separator values can be changed by setting the IFS environment variable, but if you can't trust the source of this variable you should have thrown it out or reset it anyway as part of your environment variable processing.
Unfortunately, in real life this isn't a complete list. Here are some other characters that can be problematic:
- '!' means ``not'' in an expression (as it does in C); if the return value of a program is tested, prepending ! could fool a script into thinking something had failed when it succeeded or vice versa. In some shells, the "!" also accesses the command history, which can cause real problems. In bash, this only occurs for interactive mode, but tcsh (a csh clone found in some Linux distributions) uses "!" even in scripts.
- '#' is the comment character; all further text on the line is ignored.
- '-' can be misinterpreted as leading an option (or, as --, disabling all further options). Even if it's in the ``middle'' of a filename, if it's preceded by what the shell considers as whitespace you may have a problem.
- ' ' (space), '\t' (tab), '\n' (newline), '\r' (return), '\v' (vertical tab), '\f' (form feed), and other whitespace characters can have many dangerous effects. They may turn a ``single'' filename into multiple arguments, for example, or turn a single parameter into multiple parameters when stored. Newline and return have a number of additional dangers; for example, they can be used to create ``spoofed'' log entries in some programs, or inserted just before a separate command that is then executed (if an underlying protocol uses newlines or returns as command separators).
- Other control characters (in particular, NIL) may cause problems for some shell implementations.
- Depending on your usage, it's even conceivable that ``.'' (the ``run in current shell'' command) and ``='' (for setting variables) might be worrisome characters. However, every example I've found so far where these are issues has other (much worse) security problems.
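If data really must pass through /bin/sh, one common escaping technique is to wrap each argument in single quotes: inside single quotes a POSIX shell treats every character literally except the single quote itself, which is written as '\'' (close the quote, add a backslash-escaped quote, reopen the quote). This is a sketch that assumes a POSIX /bin/sh; other shells (for example, csh-family shells) have different quoting rules. The function name shell_single_quote is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of s quoted for a
   POSIX /bin/sh, or NULL if memory runs out. The caller must free() it. */
char *shell_single_quote(const char *s)
{
    size_t quotes = 0;
    for (const char *p = s; *p != '\0'; p++)
        if (*p == '\'')
            quotes++;
    /* Each ' grows to the 4 characters '\'' (3 extra); add the 2 outer
       quotes and the terminating NUL. */
    size_t len = strlen(s) + quotes * 3 + 3;
    char *out = malloc(len);
    if (out == NULL)
        return NULL;
    char *q = out;
    *q++ = '\'';
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == '\'') {
            memcpy(q, "'\\''", 4);   /* close, escaped quote, reopen */
            q += 4;
        } else {
            *q++ = *p;               /* everything else is literal */
        }
    }
    *q++ = '\'';
    *q = '\0';
    return out;
}
```

Quoting does not address a leading '-': the quoted argument '-rf' still looks like options to the called program, so also pass -- (where the program supports it) or prefix such filenames with ./ before quoting. Filtering input down to a known-good set of characters remains the better approach.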
What makes the shell metacharacters particularly pervasive is that several important library calls, such as popen(3) and system(3), are implemented by calling the command shell, meaning that they will be affected by shell metacharacters too. Similarly, execlp(3) and execvp(3) may cause the shell to be called. Many guidelines suggest avoiding popen(3), system(3), execlp(3), and execvp(3) entirely and using execve(2) directly in C when trying to spawn a process [Galvin 1998b]. At the least, avoid using system(3) when you can use execve(2); since system(3) uses the shell to expand characters, there is more opportunity for mischief in system(3). In a similar manner, the Perl and shell backtick (`) operators also call a command shell.
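The sketch below shows the fork(2)/execve(2) pattern in C: the program is named by absolute path, arguments are passed as a vector so that no shell ever parses them, and the child receives a small fixed environment instead of an inherited one. The environment values and the name run_without_shell are assumptions for illustration; real code would also decide how to handle file descriptors, signals, and privileges in the child.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Minimal, fixed environment for the child: PATH and IFS are reset
   rather than inherited (the values here are illustrative choices). */
static char *const safe_env[] = {
    "PATH=/usr/bin:/bin",
    "IFS= \t\n",
    NULL
};

/* Run the program at absolute path `path` with argument vector `argv`
   (argv[0] first, NULL-terminated) without involving a shell.
   Returns the child's exit status, or -1 on error. */
int run_without_shell(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed */
    if (pid == 0) {
        execve(path, argv, safe_env);
        _exit(127);                 /* only reached if execve failed */
    }
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;              /* retry only if interrupted */
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called as run_without_shell("/bin/echo", argv) with argv holding "echo" and "hello; rm -rf /", the second string arrives in the child as a single argument and is simply printed; the semicolon is never interpreted.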
Since SQL also has metacharacters, a similar issue revolves around calls to SQL. See [http://www.spidynamics.com/papers/SQLInjectionWhitePaper.pdf] SPI Dynamics' paper ``SQL Injection: Are your Web Applications Vulnerable?'' for further discussion on this. As discussed in Chapter 4, define a very limited pattern and only allow data matching that pattern to enter; if you limit your pattern to ^[0-9]*$ or ^[0-9A-Za-z]*$ then you won't have a problem. If you must handle data that may include SQL metacharacters, a good approach is to convert it (as early as possible) to some other encoding before storage, e.g., HTML encoding (in which case you'll need to encode any ampersand characters too). Also, prepend and append a quote to all user input, even if the data is numeric; that way, insertions of white space and other kinds of data won't be as dangerous.
Forgetting one of these characters can be disastrous; for example, many programs omit backslash as a shell metacharacter [rfp 1999]. As discussed in Chapter 4, an approach some recommend is to immediately escape at least all of these characters when they are input. But again, by far the best approach is to identify which characters you wish to permit, and use a filter to only permit those characters.
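When the permitted characters form a simple set, strspn(3) gives a compact allow-list check: it counts how many leading characters belong to the set, so the string is acceptable only if that count reaches the terminating NUL. The set below (letters, digits, and underscore) and the name only_permitted are only examples; choose the set your application actually needs.

```c
#include <string.h>

/* Characters this (hypothetical) application chooses to permit. */
static const char permitted[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789_";

/* Return 1 if every character of s is in the permitted set, else 0. */
int only_permitted(const char *s)
{
    return s[strspn(s, permitted)] == '\0';
}
```

Because only listed characters pass, an overlooked metacharacter such as backslash is rejected automatically instead of slipping through.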
A number of programs, especially those designed for human interaction, have ``escape'' codes that perform ``extra'' activities. One of the more common (and dangerous) escape codes is one that brings up a command line. Make sure that these ``escape'' commands can't be included (unless you're sure that the specific command is safe). For example, many line-oriented mail programs (such as mail or mailx) use tilde (~) as an escape character, which can then be used to send a number of commands. As a result, apparently-innocent commands such as ``mail admin < file-from-user'' can be used to execute arbitrary programs. Interactive programs such as vi, emacs, and ed have ``escape'' mechanisms that allow users to run arbitrary shell commands from their session. Always examine the documentation of programs you call to search for escape mechanisms. It's best if you call only programs intended for use by other programs.
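If untrusted text must nevertheless be fed to a line-oriented mail program, a minimal precaution is to refuse any text in which a line begins with a tilde. This sketch assumes, as with common mail and mailx implementations, that tilde escapes are recognized only at the start of a line; check the documentation of the program you actually call. The function name has_tilde_escape is hypothetical.

```c
/* Return 1 if any line of text begins with '~', the escape character of
   line-oriented mail programs such as mail and mailx; otherwise 0. */
int has_tilde_escape(const char *text)
{
    int at_line_start = 1;
    for (const char *p = text; *p != '\0'; p++) {
        if (at_line_start && *p == '~')
            return 1;
        at_line_start = (*p == '\n');   /* next char starts a new line */
    }
    return 0;
}
```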
The issue of avoiding escape codes even goes down to low-level hardware components and emulators of them. Most modems implement the so-called ``Hayes'' command set. Unless the command set is disabled, inducing a delay, sending the string ``+++'', and then inducing another delay forces the modem to interpret any following text as commands to the modem instead. This can be used to implement denial-of-service attacks (by sending ``ATH0'', a hang-up command) or even to force a user to connect to someone else (a sophisticated attacker could re-route a user's connection through a machine under the attacker's control). For the specific case of modems, this is easy to counter (e.g., add "ATS2-255" in the modem initialization string), but the general issue still holds: if you're controlling a lower-level component, or an emulation of one, make sure that you disable or otherwise handle any escape codes built into them.
Many ``terminal'' interfaces implement the escape codes of ancient, long-gone physical terminals like the VT100. These codes can be useful, for example, for bolding characters, changing font color, or moving to a particular location in a terminal interface. However, do not allow arbitrary untrusted data to be sent directly to a terminal screen, because some of those codes can cause serious problems. On some systems you can remap keys (e.g., so when a user presses "Enter" or a function key it sends the command you want them to run). On some you can even send codes to clear the screen, display a set of commands you'd like the victim to run, and then send that set ``back'', forcing the victim to run the commands of the attacker's choosing without even waiting for a keystroke. This is typically implemented using ``page-mode buffering''. This security problem is why emulated ttys (represented as device files, usually in /dev/) should only be writeable by their owners and never anyone else; they should never have ``other write'' permission set, and unless only the user is a member of the group (i.e., the ``user-private group'' scheme), the ``group write'' permission should not be set either for the terminal [Filipski 1986]. If you're displaying data to the user at a (simulated) terminal, you probably need to filter out all control characters (characters with values less than 32) from data sent back to the user unless they're identified by you as safe. If worst comes to worst, you can identify tab and newline (and maybe carriage return) as safe, removing all the rest. Characters with their high bits set (i.e., values greater than 127) are in some ways trickier to handle; some old systems implement them as if they weren't set, but simply filtering them inhibits much international use. In this case, you need to look at the specifics of your situation.
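A sketch of the filtering just described: it drops control characters (values below 32) except tab and newline before data is written to a terminal. Dropping DEL (127) as well is an extra choice made in this sketch, not something stated above, and bytes above 127 are deliberately left alone, since how to treat them depends on your situation. The function name strip_terminal_controls is hypothetical.

```c
/* Copy src to dst, dropping control characters (values below 32) other
   than tab and newline, plus DEL (127). dst must hold at least
   strlen(src) + 1 bytes; dst may equal src, since output never runs
   ahead of input. Bytes above 127 are copied unchanged. */
void strip_terminal_controls(char *dst, const char *src)
{
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        if ((c < 32 && c != '\t' && c != '\n') || c == 127)
            continue;                   /* drop this byte */
        *dst++ = (char)c;
    }
    *dst = '\0';
}
```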
A related problem is that the NIL character (character 0) can have surprising effects. Most C and C++ functions assume that this character marks the end of a string, but string-handling routines in other languages (such as Perl and Ada95) can handle strings containing NIL. Since many libraries and kernel calls use the C convention, the result is that what is checked is not what is actually used [rfp 1999].
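When data arrives with an explicit length (from the network, or from a language such as Perl whose strings can contain NIL), one defense is to reject it if it contains a NIL byte before handing it to any C library or kernel call. A minimal sketch using memchr(3); the function name has_no_embedded_nul is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the len-byte buffer contains no NIL (zero) byte, so that a
   C routine treating it as a string would see all len bytes; else 0. */
int has_no_embedded_nul(const char *buf, size_t len)
{
    return memchr(buf, '\0', len) == NULL;
}
```

For instance, a buffer holding "report.txt\0.cgi" is 15 bytes long to the code that checked it, but a C routine would see only "report.txt"; this check rejects it outright.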
When calling another program or referring to a file always specify its full path (e.g., /usr/bin/sort). For program calls, this will eliminate possible errors in calling the ``wrong'' command, even if the PATH value is incorrectly set. For other file referents, this reduces problems from ``bad'' starting directories.
Call Only Interfaces Intended for Programmers
Call only application programming interfaces (APIs) that are intended for use by programs. Usually a program can invoke any other program, including those that are really designed for human interaction. However, it's usually unwise to invoke a program intended for human interaction in the same way a human would. The problem is that programs' human interfaces are intentionally rich in functionality and are often difficult to completely control. Interactive programs often have ``escape'' codes, which might enable an attacker to perform undesirable functions. Also, interactive programs often try to intuit the ``most likely'' defaults; this may not be the default you were expecting, and an attacker may find a way to exploit this.
Examples of programs you shouldn't normally call directly include mail, mailx, ed, vi, and emacs. At the very least, don't call these without checking their input first.
Usually there are parameters to give you safer access to the program's functionality, or a different API or application that's intended for use by programs; use those instead. For example, instead of invoking a text editor to edit some text (such as ed, vi, or emacs), use sed where you can.
Check All System Call Returns
Every system call that can return an error condition must have that error condition checked. One reason is that nearly all system calls require limited system resources, and users can often affect resources in a variety of ways. Setuid/setgid programs can have limits set on them through calls such as setrlimit(2) and nice(2). External users of server programs and CGI scripts may be able to cause resource exhaustion simply by making a large number of simultaneous requests. If the error cannot be handled gracefully, then fail safe as discussed earlier.
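Checking a return value often means more than comparing it against -1. For write(2), a successful call may write fewer bytes than requested, and a call interrupted by a signal fails with EINTR; the sketch below handles both and reports any other failure so that the caller can fail safe. The function name write_all is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, retrying after partial writes and
   EINTR. Returns 0 on success, -1 on any other error (errno is left set
   so the caller can report it and fail safe). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: try again */
            return -1;                  /* real error: let caller decide */
        }
        p += n;                         /* partial write: advance */
        len -= (size_t)n;
    }
    return 0;
}
```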
Avoid Using vfork(2)
The portable way to create new processes in Unix-like systems is to use the fork(2) call. BSD introduced a variant called vfork(2) as an optimization technique. In vfork(2), unlike fork(2), the child borrows the parent's memory and thread of control until a call to execve(2) or an exit occurs; the parent process is suspended while the child is using its resources. The rationale is that in old BSD systems, fork(2) would actually cause memory to be copied while vfork(2) would not. Linux never had this problem; because Linux used copy-on-write semantics internally, Linux only copies pages when they are changed (actually, there are still some tables that have to be copied; in most circumstances their overhead is not significant). Nevertheless, since some programs depend on vfork(2), recently Linux implemented the BSD vfork(2) semantics (previously vfork(2) had been an alias for fork(2)).
There are a number of problems with vfork(2). From a portability point-of-view, the problem with vfork(2) is that it's actually fairly tricky for a process to not interfere with its parent, especially in high-level languages. The ``not interfering'' requirement applies to the actual machine code generated, and many compilers generate hidden temporaries and other code structures that cause unintended interference. The result: programs using vfork(2) can easily fail when the code changes or even when compiler versions change.
For secure programs it gets worse on Linux systems, because Linux (at least 2.2 versions through 2.2.17) is vulnerable to a race condition in vfork()'s implementation. If a privileged process uses a vfork(2)/execve(2) pair in Linux to execute user commands, there's a race condition while the child process is already running as the user's UID, but hasn't entered execve(2) yet. The user may be able to send signals, including SIGSTOP, to this process. Due to the semantics of vfork(2), the privileged parent process would then be blocked as well. As a result, an unprivileged process could cause the privileged process to halt, resulting in a denial-of-service of the privileged process' service. FreeBSD and OpenBSD, at least, have code to specifically deal with this case, so to my knowledge they are not vulnerable to this problem. My thanks to Solar Designer, who noted and documented this problem in Linux on the ``security-audit'' mailing list on October 7, 2000.
The bottom line with vfork(2) is simple: don't use vfork(2) in your programs. This shouldn't be difficult; the primary use of vfork(2) is to support old programs that needed vfork's semantics.
Counter Web Bugs When Retrieving Embedded Content
Some data formats can embed references to content that is automatically retrieved when the data is viewed (not waiting for a user to select it). If it's possible to cause this data to be retrieved through the Internet (e.g., through the World Wide Web), then there is a potential to use this capability to obtain information about readers without the readers' knowledge, and in some cases to force the reader to perform activities without the reader's consent. This privacy concern is sometimes called a ``web bug.''
In a web bug, a reference is intentionally inserted into a document and used by the content author to track who, where, and how often a document is read. The author can also essentially watch how a ``bugged'' document is passed from one person to another or from one organization to another.
The HTML format has had this issue for some time. According to the [http://www.privacyfoundation.org] Privacy Foundation:
Web bugs are used extensively today by Internet advertising companies on Web pages and in HTML-based email messages for tracking. They are typically 1-by-1 pixel in size to make them invisible on the screen to disguise the fact that they are used for tracking. However, they could be any image (using the img tag), and other HTML tags can implement web bugs too, e.g., frames, form invocations, and scripts. By itself, invoking the web bug will provide the ``bugging'' site the reader's IP address, the page that the reader visited, and various information about the browser; by also using cookies it's often possible to determine the specific identity of the reader. A survey about web bugs is available at http://www.securityspace.com/s_survey/data/man.200102/webbug.html.
What is more concerning is that other document formats seem to have such a capability, too. When viewing HTML from a web site with a web browser, there are other ways of getting information on who is browsing the data, but when viewing a document in another format from an email few users expect that the mere act of reading the document can be monitored. However, for many formats, reading a document can be monitored. For example, it has been recently determined that Microsoft Word can support web bugs; see [http://www.privacyfoundation.org/advisories/advWordBugs.html] the Privacy Foundation advisory for more information. As noted in their advisory, recent versions of Microsoft Excel and Microsoft PowerPoint can also be bugged. In some cases, cookies can be used to obtain even more information.
Web bugs are primarily an issue with the design of the file format. If your users value their privacy, you probably will want to limit the automatic downloading of included files. One exception might be when the file itself is being downloaded (say, via a web browser); downloading other files from the same location at the same time is much less likely to concern users.
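One way to limit automatic downloading is a policy check applied to each embedded reference before it is fetched: allow relative references and references to the document's own origin, and refuse everything else. The sketch below is deliberately simple and hypothetical (including the name may_autofetch); it compares strings byte for byte, ignores case differences in scheme and host, and does not handle quirks such as backslashes that some browsers treat like slashes, so real code should rely on a proper URL parser.

```c
#include <string.h>

/* Hypothetical policy check: allow automatic retrieval of an embedded
   reference only if it is relative, or absolute but on the same origin
   (scheme://host[:port]) the document itself came from. Everything else,
   including protocol-relative "//host/..." references, is refused. */
int may_autofetch(const char *doc_origin, const char *ref)
{
    size_t n = strlen(doc_origin);
    if (strncmp(ref, doc_origin, n) == 0 && ref[n] == '/')
        return 1;                     /* same origin, explicit path */
    if (ref[0] == '/' && ref[1] == '/')
        return 0;                     /* protocol-relative: another host */
    /* A ':' before any '/', '?' or '#' means the reference carries its
       own scheme (http:, https:, ftp:, ...) and points elsewhere. */
    if (ref[strcspn(ref, ":/?#")] == ':')
        return 0;
    return 1;                         /* relative reference */
}
```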
Hide Sensitive Information
Sensitive information should be hidden from prying eyes, both while being input and output, and when stored in the system. Sensitive information certainly includes credit card numbers, account balances, and home addresses, and in many applications also includes names, email addresses, and other private information.
Web-based applications should encrypt all communication with a user that includes sensitive information; the usual way is to use the "https:" protocol (HTTP on top of SSL or TLS). According to the HTTP 1.1 specification (IETF RFC 2616 section 15.1.3), authors of services which use the HTTP protocol should not use GET based forms for the submission of sensitive data, because this will cause this data to be encoded in the Request-URI. Many existing servers, proxies, and user agents will log the request URI in some place where it might be visible to third parties. Instead, use POST-based submissions, which are intended for this purpose.
Databases of such sensitive data should also be encrypted on any storage device (such as files on a disk). Such encryption doesn't protect against an attacker breaking the secure application, of course, since obviously the application has to have a way to access the encrypted data too. However, it does provide some defense against attackers who manage to get backup disks of the data but not of the keys used to decrypt them. It also provides some defense if an attacker doesn't manage to break into an application, but does manage to partially break into a related system just enough to view the stored data - again, they now have to break the encryption algorithm to get the data. There are many circumstances where data can be transferred unintentionally (e.g., core files), which this also prevents. It's worth noting, however, that this is not as strong a defense as you'd think, because often the server itself can be subverted or broken.