What Can Buffer Overflow Attacks Do and Not Do

February 15, 2022

What causes a buffer overflow?

A buffer overflow occurs when a program tries to write too much data into a buffer. This can cause the program to crash or to execute arbitrary code. Buffer overflow vulnerabilities exist only in low-level programming languages such as C with direct access to memory. However, they also affect the users of high-level web languages because the frameworks are often written in low-level languages.

The idea of a buffer overflow vulnerability (also known as a buffer overrun) is simple. The following is the source code of a C program that has a buffer overflow vulnerability:

          char greeting[5];
          /* Vulnerable on purpose: copies 15 bytes into a 5-byte buffer */
          memcpy(greeting, "Hello, world!\n", 15);
          printf(greeting);

What do you think will happen when we compile and run this vulnerable program? The answer may be surprising: anything can happen. When this code snippet is executed, it will try to put 15 bytes into a destination buffer that is only five bytes long. This means that ten bytes will be written to memory addresses outside of the array. What happens later depends on the original content of the overwritten ten bytes of memory. Maybe important variables were stored there and we have just changed their values?
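To make the spill-over visible without actually invoking undefined behavior, the sketch below models memory as one 16-byte array in which only the first five bytes belong to the greeting buffer. The names simulate_overflow, REGION_SIZE and GREETING_SIZE are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <string.h>

#define REGION_SIZE 16   /* the whole simulated memory region */
#define GREETING_SIZE 5  /* bytes that belong to the greeting buffer */

/*
 * Copies len bytes of src to the start of region without checking
 * GREETING_SIZE, the way an unchecked memcpy would. The copy is clamped
 * to REGION_SIZE so that the simulation itself stays within bounds.
 * Returns how many bytes landed past the end of the greeting buffer.
 */
size_t simulate_overflow(unsigned char region[REGION_SIZE],
                         const char *src, size_t len)
{
    if (len > REGION_SIZE) {
        len = REGION_SIZE;
    }
    memcpy(region, src, len);
    return len > GREETING_SIZE ? len - GREETING_SIZE : 0;
}
```

Calling simulate_overflow(region, "Hello, world!\n", 15) reports ten spilled bytes: whatever the model kept in region[5] through region[14] has been replaced by part of the greeting.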

The greeting example above is broken in such an obvious way that no sane programmer would make such a mistake. So let's consider another example. Let's suppose that we need to read an IP address from a file. We can do it using the following C code:

          #include <stdio.h>

          #define MAX_IP_LENGTH 15

          int main(void) {
            char file_name[] = "ip.txt";
            FILE *fp;
            fp = fopen(file_name, "r");
            int ch; /* int rather than char, so that EOF is detected reliably */
            int counter = 0;
            char buf[MAX_IP_LENGTH];
            /* Vulnerable on purpose: nothing stops counter from running past the end of buf */
            while((ch = fgetc(fp)) != EOF) {
              buf[counter++] = ch;
            }
            buf[counter] = '\0';
            printf("%s\n", buf);
            fclose(fp);
            return 0;
          }

The mistake in the above example is not so obvious. We assume that the IP address, which we want to read from a file, will never exceed 15 bytes. Proper IP addresses (for example, 255.255.255.255) can't be longer than 15 bytes. Nonetheless, a malicious user can prepare a file that contains a very long fake string instead of an IP address (for example, 19222222222.16888888.0.1). This string will cause our program to overflow the destination buffer.

If you think that even this bug is too obvious and that no programmer would make such a mistake, stay tuned. Further on, you will see a real-life example of a buffer overflow bug which occurred in a serious project and is not much more sophisticated than the above example.
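One possible way to close the hole in the IP-reading example is to check the write position against the buffer size before every store. The following is a minimal sketch, not the only fix: the function name read_ip_bounded is hypothetical, and stopping at the first newline is an added assumption that the original loop does not make.

```c
#include <stdio.h>
#include <stddef.h>

#define MAX_IP_LENGTH 15  /* longest dotted-quad string: "255.255.255.255" */

/*
 * Reads characters from fp into buf until EOF or a newline.
 * buf_size must include room for the terminating '\0'.
 * Returns 0 on success, or -1 if the input does not fit
 * (buf then holds an empty string).
 */
int read_ip_bounded(FILE *fp, char *buf, size_t buf_size)
{
    size_t counter = 0;
    int ch;  /* int, not char, so EOF can be told apart from data bytes */

    if (buf_size == 0) {
        return -1;
    }
    while ((ch = fgetc(fp)) != EOF && ch != '\n') {
        if (counter + 1 >= buf_size) {  /* keep one byte for '\0' */
            buf[0] = '\0';
            return -1;
        }
        buf[counter++] = (char)ch;
    }
    buf[counter] = '\0';
    return 0;
}
```

A caller would declare char buf[MAX_IP_LENGTH + 1] and pass sizeof buf, so that a full 15-character address and its terminator both fit. The overlong string 19222222222.16888888.0.1 is then rejected instead of being written past the end of the buffer.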

Stack buffer overflow attack example

Now that we know a program can overflow an array and overwrite a fragment of memory that it should not overwrite, let's see how this can be used to mount a buffer overflow attack. In a typical scenario (called stack buffer overflow), the problem is caused – like so many problems in information security – by mixing data (meant to be processed or displayed) with commands that control program execution.

In C, like in most programming languages, programs are built using functions. Functions call each other, pass arguments to each other, and return values. For example, our code, which reads an IP address from a file, could be part of a function called readIpAddress, which reads an IP address from a file and parses it. This function could be called by some other function, for example, readConfiguration. When readConfiguration calls readIpAddress, it passes a filename to it and then the readIpAddress function returns an IP address as an array of four bytes.

Fig. 1. The arguments and the return value of the readIpAddress function

During this function call, several pieces of information are stored side-by-side in computer memory. For each program, the operating system maintains a region of memory which includes a part called the stack or call stack (hence the name stack buffer overflow). When a function is called, a fragment of the stack is allocated to it. This piece of the stack (called a frame) is used to:

Remember the line of code from which program execution should resume when function execution completes (in our case, this will be a specific line in the readConfiguration function)
Store the arguments passed to the function by its caller (in our case, let's assume /home/someuser/myconfiguration/ip.txt)
Store the return value that the function returns to its caller (in our case, it's a four-byte array, let's say (192, 168, 0, 1))
Store local variables of the called function while this function is being executed (in our case, the variable char[MAX_IP_LENGTH] buf)

So if a program has a buffer allocated in the stack frame and tries to insert more data than can fit there, user input data may spill over and overwrite the memory location where the return address is stored.

Fig. 2. Contents of the stack frame when the readIPAddress function is called

If the problem was caused by random malformed user input data, the new return address most likely will not point to a memory location where any other program is stored, so the original program will simply crash. However, if the data is carefully prepared, it may lead to unintended code execution.

The first step for the attacker is to prepare special data that can be interpreted as executable code and will work for the attacker's benefit (this is called a shellcode). The second step is to place the address of this malicious data in the exact location where the return address should be.

Fig. 3. The content of ip.txt overwrites the return address

In effect, when the function reads the IP character string and places it into the destination buffer, the return address is replaced by the address of the malicious code. When the function ends, program execution jumps to malicious code.
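The mechanics can be imitated safely with a toy model. The sketch below is not a real stack frame: it is a hypothetical struct, fake_frame, that places a 16-byte buffer directly before a stored address, and a hypothetical function, simulate_frame_overwrite, that copies input into it without regard for the buffer size. Real frame layouts depend on the compiler and platform.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a stack frame: a local buffer followed by a saved return address. */
struct fake_frame {
    char buf[16];
    uintptr_t return_address;
};

/*
 * Copies len input bytes to the start of the frame with no regard for the
 * size of buf. The copy is clamped to the size of the whole struct so that
 * the simulation never writes outside the model.
 * Returns 1 if the saved return address was changed, 0 otherwise.
 */
int simulate_frame_overwrite(struct fake_frame *frame,
                             const unsigned char *input, size_t len)
{
    uintptr_t before = frame->return_address;

    if (len > sizeof *frame) {
        len = sizeof *frame;
    }
    memcpy(frame, input, len);
    return frame->return_address != before;
}
```

Input that fits in buf leaves return_address alone. Input that is as long as the whole struct, with an attacker-chosen value placed at offsetof(struct fake_frame, return_address), replaces it. That corresponds to the second step described above.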

Can you prevent buffer overflows?

Since the discovery of the stack buffer overflow attack technique, authors of operating systems (Linux, Microsoft Windows, macOS, and others) have been trying to find prevention techniques:

The stack can be made non-executable, so even if malicious code is placed in the buffer, it cannot be executed.
The operating system may randomize the memory layout of the address space (memory space). When malicious code is then placed in a buffer, the attacker cannot predict its address.
Other protection techniques (for example, StackGuard) modify a compiler in such a way that each function calls a piece of code that makes sure the return address has not changed.

In practice, even if such protection mechanisms make stack buffer overflow attacks harder, they don't make them impossible. Some of these measures may also affect performance.
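The compiler-inserted check can be pictured with a simplified model. The sketch below assumes the common "canary" approach: a known value is placed between the buffer and the stored address, and it is verified before the function returns. The struct guarded_frame, the constant CANARY_VALUE and the function copy_and_check are hypothetical. This is not how StackGuard itself is implemented, and real canaries are typically chosen at random when the program starts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CANARY_VALUE 0xC0FFEE42u  /* fixed here for illustration only */

/* Toy frame: buffer, then a canary, then the saved return address. */
struct guarded_frame {
    char buf[16];
    uint32_t canary;
    uintptr_t return_address;
};

/*
 * Sets the canary, performs an unchecked copy into the frame (clamped to
 * the struct size so the simulation stays within the model), and then
 * verifies the canary.
 * Returns 1 if it is safe to "return", 0 if the canary was overwritten.
 */
int copy_and_check(struct guarded_frame *frame, const void *input, size_t len)
{
    frame->canary = CANARY_VALUE;          /* what the function prologue would do */

    if (len > sizeof *frame) {
        len = sizeof *frame;
    }
    memcpy(frame, input, len);             /* the vulnerable copy */

    return frame->canary == CANARY_VALUE;  /* what the function epilogue would check */
}
```

Because the canary sits between the buffer and the return address, a linear overflow cannot reach the address without first changing the canary. This is also why the technique makes attacks harder rather than impossible: an attacker who learns the canary value, or who can write to the address without crossing the canary, is not stopped by this check.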

Buffer overflow vulnerabilities exist in programming languages which, like C, trade security for efficiency and do not check memory access. In higher-level programming languages (e.g. Python, Java, PHP, JavaScript or Perl), which are often used to build web applications, buffer overflow vulnerabilities cannot exist. In these languages, you simply cannot put excess data into the destination buffer. For example, try to compile and execute the following piece of Java code:

          int[] buffer = new int[5];
          buffer[100] = 44;

The Java compiler will not warn you, but the Java virtual machine will detect the problem at runtime and, instead of overwriting random memory, it will interrupt program execution by throwing an ArrayIndexOutOfBoundsException.

Buffer overflows and the Web

However, even programmers who use high-level languages should know and care about buffer overflow attacks. Their programs are often executed within operating systems that are written in C or use runtime environments written in C, and this C code may be vulnerable to such attacks. In order to see how a buffer overflow vulnerability may affect a programmer using such a high-level programming language, let's analyze CVE-2015-3329 – a real-life security vulnerability discovered in the PHP standard library in 2015.

A PHP application is a collection of *.php files. In order to make it easier to distribute such an application, it may be packed into a single file archive – as a zip file, a tar file, or using a custom PHP format called phar. A PHP extension called phar contains a class that you can use to work with such archives. With this class, you can parse an archive, list its files, extract the files, etc. Using this class is quite simple. For example, to extract all files from an archive, use the following code:

          $phar = new Phar('phar-file.phar');
          $phar->extractTo('./directory');

When the Phar class parses an archive (that's new Phar('phar-file.phar')), it reads all filenames from the archive, concatenates each filename with the archive filename, and then calculates the checksum. For example, for an archive called myarchive.phar that contains files index.php and components/hello.php, the Phar class calculates checksums of two strings: myarchive.pharindex.php and myarchive.pharcomponents/hello.php. The reason why the authors implemented it this way is not important here – what is important is how they implemented it. Until 2015, this operation was done using the following function (see the old PHP source code):

          phar_set_inode(phar_entry_info *entry TSRMLS_DC) /* {{{ */
          {
                  char tmp[MAXPATHLEN];
                  int tmp_len;

                  tmp_len = entry->filename_len + entry->phar->fname_len;
                  memcpy(tmp, entry->phar->fname, entry->phar->fname_len);
                  memcpy(tmp + entry->phar->fname_len, entry->filename, entry->filename_len);
                  entry->inode = (unsigned short)zend_get_hash_value(tmp, tmp_len);
          }

As you can see, this function creates a char array called tmp. First, the name of the phar archive (in our example, myarchive.phar) is copied into this array using the following command:

          memcpy(tmp, entry->phar->fname, entry->phar->fname_len);

In this command:

The first argument, tmp, is a destination where bytes should be copied.
The second argument, entry->phar->fname, is a source from where bytes should be copied – in our example, the filename of the archive (myarchive.phar).
The third argument, entry->phar->fname_len, is the number of bytes that should be copied – in our example it is the length (in bytes) of the archive filename.

The function then copies the filename (in our example, index.php or components/hello.php) into the tmp char array using the following command:

          memcpy(tmp + entry->phar->fname_len, entry->filename, entry->filename_len);

In this command:

The first argument, tmp + entry->phar->fname_len, is a destination where bytes should be copied – in our case, it is a location in the tmp array just after the end of the archive filename.
The second argument, entry->filename, is a source from where bytes should be copied.
The third argument, entry->filename_len, is the number of bytes that should be copied.

Then the zend_get_hash_value function is called to calculate the hashcode.

Notice how the size of the buffer is declared:

          char tmp[MAXPATHLEN];

It has a size of MAXPATHLEN, which is a constant defined as the maximum length of a filesystem path on the current platform.

The authors assumed that if they concatenate the filename of the archive with the name of a file inside the archive, they will never exceed the maximum allowed path length. In normal situations, this assumption is met. However, if the attacker prepares an archive with unusually long filenames, a buffer overflow is imminent. The function phar_set_inode will cause an overflow in the tmp array.

An attacker can use this to crash PHP (causing a denial of service) or even make it execute malicious code. The problem is similar to our simple example from above – the developer made a simple mistake, trusted user input too much, and assumed that the data will always fit in a fixed-size buffer. Fortunately, this vulnerability was discovered in 2015 and fixed.
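A length check along the following lines would keep both copies inside the buffer. This is a self-contained sketch of the idea, not the actual PHP patch: the function concat_clamped and its parameters are hypothetical, and silently truncating the result is assumed to be acceptable only because the string is used as hash input rather than as a real path.

```c
#include <stddef.h>
#include <string.h>

/*
 * Writes the archive name followed by the entry name into tmp, which has
 * room for tmp_size bytes. Both copies are clamped so that the total never
 * exceeds tmp_size. Returns the number of bytes written.
 */
size_t concat_clamped(char *tmp, size_t tmp_size,
                      const char *fname, size_t fname_len,
                      const char *filename, size_t filename_len)
{
    /* First copy: at most the whole buffer. */
    size_t len1 = fname_len < tmp_size ? fname_len : tmp_size;
    /* Second copy: at most whatever room is left after the first. */
    size_t room = tmp_size - len1;
    size_t len2 = filename_len < room ? filename_len : room;

    memcpy(tmp, fname, len1);
    memcpy(tmp + len1, filename, len2);
    return len1 + len2;
}
```

With a 20-byte buffer, myarchive.phar (14 bytes) followed by index.php (9 bytes) yields the 20 bytes myarchive.pharindex. and nothing is written past the end. A stricter design could reject the entry outright instead of truncating.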

How to avoid buffer overflow vulnerabilities

Programmers can mitigate the risk of buffer overflow attacks by always validating user input length. However, a good general way to avoid buffer overflow vulnerabilities is to stick to using safe functions that include buffer overflow protection (which memcpy does not). Such functions are available on different platforms, for instance, strlcpy, strlcat, snprintf (OpenBSD) or strcpy_s, strcat_s, sprintf_s (Windows).
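As a brief illustration of one of these functions, the sketch below uses snprintf, which never writes more than the given number of bytes and returns the length the full output would have needed. The wrapper format_greeting is a hypothetical name introduced only for this example.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Formats "Hello, <name>!" into dst, which has room for dst_size bytes.
 * Returns 0 if the whole string fit, or -1 if it was truncated or an
 * encoding error occurred.
 */
int format_greeting(char *dst, size_t dst_size, const char *name)
{
    int needed = snprintf(dst, dst_size, "Hello, %s!", name);

    if (needed < 0 || (size_t)needed >= dst_size) {
        return -1;  /* output did not fit; dst holds a truncated string */
    }
    return 0;
}
```

With the five-byte greeting buffer from the first example, format_greeting reports failure and leaves a terminated, truncated string in the buffer instead of writing ten bytes past its end. Checking the return value matters: a silently truncated string can cause problems of its own.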

Article written by: Piotr Sobolewski

About the Author

Zbigniew Banach

Technical Content Writer at Invicti. Drawing on his experience as an IT journalist and technical translator, he does his best to bring web security to a wider audience on the Netsparker blog and website.


Source: https://www.netsparker.com/blog/web-security/buffer-overflow-attacks/
