Detecting Denial of Service Attacks

Introduction

A Denial of Service (DoS) attack is an attempt by some party to prevent a targeted network resource from supplying its intended services. The target could be a single computer, a network of computers or an entire organization. Such an attack can include a physical attack on a host or its supporting infrastructure, a successful hacking attempt that changes the configuration of the host or its infrastructure, a DNS exploit that causes clients of the service to be unable to reach it or a myriad of other assaults on the target's functionality. DoS attacks are particularly problematic because running a firewall, keeping patches up to date and following all the other recommended security practices do not necessarily provide any defense.

MOREnet's interest in detecting DoS attacks at the core stems from several incidents over the past few years that have caused slowdowns in the backbone network, sometimes to the point of making parts of it unusable. Several of the ATM line cards used in MOREnet's core routers behave badly in the face of massive congestion on individual ATM PVCs. Attacks that should affect only a single customer can impact other customers and backbone services by consuming all available resources in these cards. Although MOREnet has made architectural changes to reduce this risk, it is still important to have better tools in place so network engineers can quickly identify and characterize a DoS attack to minimize the impact on MOREnet and its customers.

In this paper we will discuss various types of attacks and the tools that exist to detect them. We are primarily concerned with flooding attacks as those generally have the greatest impact on the customer under attack and cause the greatest collateral damage to other customers' connectivity. We will mention other attacks for completeness, but a discussion of detection and mitigation strategies for these attacks lies outside the scope of this document.

Categories of Denial of Service

There are many ways that an attacker might prevent a target resource from being accessed. The type of attack used is dependent on the attacker's aims and the resources at their disposal. We cover a variety of them below but the list is far from complete. Perhaps the greatest danger is the attack you haven't thought to protect against yet.

Network Flooding Attacks

The simplest form of network-based denial-of-service attack is a traffic flood. The attacker simply transmits so much data to the target that its network links are overwhelmed and it can no longer reliably communicate with clients. Slightly smarter flooding attacks send traffic that will require the target itself to consume resources in some way. The resources consumed might include CPU time, memory, kernel resources or disk space. The average attacker doesn't have a sufficiently large connection to threaten a large organization directly, but there are ways for attackers to magnify their power through the use of intermediaries. Below are several forms of flooding attacks.

ICMP/UDP Flood

If an attacker has access to greater bandwidth than his or her target, a simple blast of data is often sufficient to saturate its network link. For instance, a machine at a university or large company can easily send a stream of data capable of saturating the bandwidth of a dialup, cable modem or DSL user. This sort of attack isn't very sophisticated, but it is sufficient for hackers trying to knock each other off of their IRC channels. Legitimate users at the source site sometimes launch such attacks, but more often someone from outside has compromised a host at the source location. Using a compromised host allows the attacker to hide his or her tracks better as well as gain access to the greater bandwidth at the site. ICMP and UDP are both good candidates for use in floods because they are simple and connectionless. Sending ICMP pings is effective, as is sending UDP packets to a running service on the target, because both not only eat bandwidth but also require the target to spend CPU time processing the packets and possibly sending a response. This attack creates load on the target host and can congest the target's outgoing bandwidth as well.

ICMP Broadcast Attack (Smurf)

One of the earliest methods attackers found to increase the power of their DoS attacks is called a Smurf attack. The attacker sends an ICMP ping packet to the IP broadcast address of a network (or multiple networks) known not to filter such packets. This destination network is not the target of the attack but an intermediary. The attacker forges the header of each packet and sets the source address to the IP address of the real target. Because the packet is addressed to the broadcast address, every machine on the intermediate network sends an ICMP reply back to the forged source address. This effectively multiplies the size of the original packet by the number of hosts on the intermediate network. Assuming that the intermediate networks have a large amount of bandwidth available, they will often be able to overwhelm the target. A single individual on a dialup connection could make a large corporation's network unusable if a large enough intermediate network could be found.

UDP Broadcast Attack (Fraggle)

The attack known as Fraggle appeared shortly after Smurf and, like Smurf, depends on sending packets to the broadcast address of an intermediate network. Instead of ICMP, though, a UDP packet is used. Several simple UDP-based services (ECHO, DAYTIME, CHARGEN and others) listen for packets to their port numbers and send replies to the packet's source. Thus, a UDP packet to the echo port of the broadcast address on the intermediate network could result in tens or hundreds of packets sent to the spoofed source address. Fraggle has the advantage of working with intermediate or target networks that filter ICMP. However, these UDP services have never been very widely implemented so finding usable intermediate networks is difficult.

DNS Servers as Intermediaries

Some newer attacks use DNS servers as intermediaries. The attacker sends a DNS query with the source address set to the IP of the target to a DNS server at a well-connected site. The query is a simple one that the attacker knows will result in a large response. For example, a request for the root name servers is the shortest possible request and requires only 49 bytes. The result is 324 bytes long. The DNS server sends the response to each packet's source address (the target) resulting in a larger stream of traffic headed to the target, thus multiplying the size of the attack.
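
As a rough illustration of the multiplication described above, the amplification factor is simply the response size divided by the query size. The sketch below uses the 49-byte and 324-byte figures quoted in the text; the function name and the 1 Mbit/s attacker rate are assumptions chosen only for illustration.

```python
def amplification_factor(query_bytes: int, response_bytes: int) -> float:
    """Return how many bytes reach the target per byte the attacker sends."""
    if query_bytes <= 0:
        raise ValueError("query_bytes must be positive")
    return response_bytes / query_bytes


# Figures quoted in the text: a 49-byte query yields a 324-byte response.
factor = amplification_factor(49, 324)

# Hypothetical attacker uplink of 1 Mbit/s (an assumption, not from the text).
attacker_mbps = 1.0
reflected_mbps = attacker_mbps * factor

print(f"amplification factor: {factor:.2f}")
print(f"traffic arriving at target: {reflected_mbps:.2f} Mbit/s")
```

With these figures each byte the attacker sends becomes roughly six and a half bytes at the target, before counting any additional intermediaries.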

TCP Attacks

TCP flooding attacks are not normally intended to saturate the target's network connection. Instead the goal is to consume resources on the target machine itself and perhaps crash it. One type of TCP-based attack is called a SYN attack. The attacker generates a stream of connection requests to the target, possibly with spoofed source addresses. This begins the three-way TCP handshake required for a full TCP connection. The target sends an acknowledgement to the source address and waits for an acknowledgement in return from the source in order to finish the connection setup. While the target waits, the half-open connection sits idle in its operating system kernel and consumes some of the limited memory and table space. The attacker never sends the expected acknowledgement, so the half-open connection remains alive until it times out, which could take minutes. Soon all of the available space for half-open connections is consumed and the target can no longer accept new connections, even from legitimate sources.
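
The way the half-open table fills can be seen in a small simulation. This is only a sketch of the bookkeeping described above, not a model of any particular operating system; the request rate, table size and timeout are all hypothetical values.

```python
from collections import deque


def simulate_syn_backlog(syn_rate, backlog_size, timeout_s, duration_s):
    """Simulate a half-open connection table in one-second steps.

    Returns a list of (second, half_open_count, refused_this_second).
    No request ever completes the handshake, as in a SYN attack.
    """
    half_open = deque()  # arrival time of each half-open entry, oldest first
    history = []
    for now in range(duration_s):
        # Entries that have waited timeout_s seconds are finally discarded.
        while half_open and now - half_open[0] >= timeout_s:
            half_open.popleft()
        refused = 0
        for _ in range(syn_rate):
            if len(half_open) < backlog_size:
                half_open.append(now)  # table slot consumed
            else:
                refused += 1  # table full: new connections are refused
        history.append((now, len(half_open), refused))
    return history


# Hypothetical figures: 50 requests/s, 128 table slots, 75 s timeout.
for second, in_table, refused in simulate_syn_backlog(50, 128, 75, 5):
    print(second, in_table, refused)
```

With these assumed numbers the table is full within three seconds, and every request after that, legitimate or not, is refused until entries begin to time out.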

Distributed Denial of Service Attacks

Most networks that were once configured to allow external broadcast packets have been fixed over the past few years, causing the effectiveness of Smurf and Fraggle attacks to dwindle. Attackers have therefore had to find other ways to amplify the power of their attacks. One obvious answer is to just take over a large number of hosts and use them to attack in concert. Distributed attack tools were created to control these compromised machines, often called zombies, and allow a user to launch an attack from all of them with a single command. DDoS tools have evolved over the past few years and have become hard to detect, easy to use and highly automated. This development has made it possible for individuals less skilled or disciplined than the original authors to launch attacks with little chance of getting caught. Some attackers have been able to put together large groups of zombies and bring the largest sites on the Internet to their knees. Yahoo and Amazon have both been victims of well-publicized DDoS attacks.

Deployment

Methods of deploying Distributed Denial of Service (DDoS) tools initially required manual effort but over time they have become more automated. The Code Red worm and more functional Ramen worm were able to automatically scan for hosts with vulnerabilities, use the vulnerabilities to compromise target hosts and install themselves. The newly compromised host would then begin scanning for new targets. A new worm called Slapper loads a fully functional DDoS attack tool on hosts as it compromises them.

Non-flooding Denial of Service Attacks

Flooding attacks are a bit like professional wrestling. They involve lots of brawn but usually very little subtlety. There are other, more devious and sometimes more effective ways to disrupt connectivity to a target host or service. These are usually less difficult to defend against than a huge traffic flood but hackers are constantly finding new vulnerabilities to exploit.

Operating System Bugs

One method of taking down a target is to exploit software bugs within the target or its infrastructure devices. New vulnerabilities in operating systems and server software are being found all the time.

Operating system kernel-level bugs are uncommon and are usually fixed relatively quickly, but they can result in a crash of the entire target computer. For instance, older versions of several operating systems were susceptible to the "ping of death." If an ICMP ping was received that was larger than the allocated buffer, the entire computer would freeze or reboot. IP fragmentation attacks take advantage of bugs in the way an operating system deals with fragmented IP packets. Some systems will crash if the fragments of a packet overlap (an illegal condition).

Server software bugs have been common for a long time. Most modern operating systems partition applications so that these bugs can't crash the entire machine but they may still allow the server software to be crashed or permit an intruder some control over the host machine. There have been numerous security bugs reported in mail server software, web servers, database servers, DNS servers and many others. A wide variety of SNMP agents have been found to hang or crash if they receive a badly formatted SNMP packet. These problems appear in commercial and open-source software and are often the result of simple, well known mistakes such as buffer overflows or bounds-checking mistakes.

Device Configuration Mistakes

Exploiting improper device configuration or other administration mistakes can lead to compromises or crashes. Failing to set a password or using a weak password makes it easier for an attacker to take over the machine and disable it. Failure to set up routing properly might allow connectivity to the device to be disrupted. A variety of mistakes in the configuration of server software such as web or FTP servers might allow access to unintended parts of the server. Failure to properly secure some often-ignored protocols such as syslog might allow an attacker to overload the servers in a number of ways.

Resource Overload Attacks

Characteristics of the normal operation of a piece of software can be exploited to cause a denial of service. Simply making valid requests to a website at a high enough rate could overload it and make it unusable. This is particularly true if some services on the target require substantial work to perform. For instance, a dynamic web page that requires a complex database search could be abused fairly easily by making requests for it at a higher rate than the server can handle. Sending a large volume of e-mail messages to a particular mail server might exhaust disk space and cause the server to reject messages. Some types of servers such as network news servers have long-lived connections and each one requires substantial resources. Opening many connections could use up all memory or CPU resources on the target.

The worst overloads often occur when an extreme number of legitimate users of a service try to use it at once. Many major news sites were overloaded and became unreachable on September 11, 2001 as news-hungry readers searched for new details of the attacks. The term "Slashdot effect" describes the overload of a site's web servers after an article about the site is placed on the http://slashdot.org message board and thousands of interested readers try to check out the referenced site at the same time.

Weak Security Models

Exploiting weaknesses in the security models of the target device or infrastructure can provide a means of access. Some protocols transmit a password across the network without encryption. If the attacker can gain access to the network between the client and target he or she may be able to capture the password. Many firewalls and other access controls provide filtering based on the source IP address, but IP addresses can be spoofed. Spoofing is trivial for UDP-based protocols and more difficult but still possible with TCP. If a host's operating system uses insufficiently random sequence numbers in TCP connections then attackers might be able to insert their own data into an existing TCP connection, allowing the attacker to cause damage or take over the host. Various mistakes made in implementing encryption algorithms can make the encryption relatively easy to break. Many other tricks are possible.

Routing Protocol and Infrastructure Attacks

It is important to remember that the host doesn't have to be directly attacked for service to be disrupted. If attackers can gain access to a router or switch in the path between the target and the rest of the world then they could reconfigure it in a variety of ways to break connectivity. Also, dynamic routing protocols can be misused to redirect or "black hole" traffic. If an attacker can gain access to a trusted router on the Internet they could potentially introduce a fake BGP route announcement to redirect traffic. Some organizations fail to sufficiently secure their internal routing protocol and might accept routing announcements from devices outside their control. Anyone connected to a router configured this way, even over a modem pool, could introduce a bogus route to misdirect traffic to the target from the organization or even the Internet at large. Vulnerabilities in DNS servers have also been exploited to introduce fake address records and misdirect traffic to another machine.

Social Engineering

Social engineering is yet another potential method for attackers to accomplish their aims. Tricking a security guard into allowing someone physical access is only one possibility. It might also be possible to trick someone who has legitimate access to equipment, such as a technician or administrator, into doing something destructive. This technique could include getting them to unplug the wrong cable or to make a server or infrastructure configuration change that inadvertently causes a vulnerability or loss of service. As with physical attacks, vulnerability can extend to equipment well outside the target organization's control, such as power company equipment, telephone company switches, Internet service provider routers or other critical infrastructure. For instance, a forged request to a DNS registrar might trick the registrar into assigning the target's DNS domain to the attacker.

This is far from a complete list. There are lots of clever computer criminals with time on their hands coming up with new methods of attack all the time.

Physical Attacks

Though it might be stretching some people's use of the term "Denial-of-Service," the most extreme method of preventing access to a resource would be to attack the physical computers and infrastructure devices (power systems, network connections, routers, etc.) providing the service. Attacks on the machine itself can range from simply unplugging the network or power cable of the target computer to bashing it to pieces with a sledgehammer. The same attacks could be used against any infrastructure equipment between the target and its clients, including such devices as Ethernet switches, routers, ATM switches and the CSUs, MUXes and other equipment required for wide-area network circuits. A particularly determined attacker might even be able to smuggle explosives into the equipment room, possibly inside the chassis of a piece of networking or computer equipment.

Even if the equipment room the servers are housed in proves to be secure against an attack, there are other ways to disable a service. If the attacker can get into other parts of the building the equipment resides in, they might be able to break into a main electrical closet and cut power or break a water pipe on a higher floor to flood the equipment room. No matter what the level of security, it is probably impossible to be completely safe against an attacker with sufficient resources. A good fake ID or a gun would likely be sufficient to get past most non-military security. Finally, a disgruntled employee could be the greatest physical threat of all.

A knowledgeable attacker could disrupt services even without gaining entry to the building where the target resides. Backhoe operators frequently cause accidental denials of service by cutting copper or fiber network connections. Intentional cuts are quite possible if the attacker can get information on the location of communications or power lines. In the most extreme cases, damage to telephone company switching facilities or to power company substations or transformers could cause a widespread outage that could affect the target as well.

We believe it unlikely that an attacker would be sufficiently determined to physically assault MOREnet's infrastructure, except perhaps indirectly as a result of an attack on some other target. There are non-physical means to disrupt our infrastructure that would be less dangerous to the attacker and could be nearly as effective. More significant physical threats would include accidents such as a backhoe hit, train derailment, or fire and natural disasters such as lightning strikes, tornados or earthquakes. MOREnet has reasonable disaster recovery schemes to deal with these possibilities, though possibly not without network disruption.

Collateral Damage

DoS attacks don't always affect only the intended target. For instance, scanning is a method of gathering information about networks but is normally not an attack itself. However, rapidly scanning every possible IP on large networks can result in ARP storms as the connected routers try to find each host, and the large number of ARP entries that collect in the router (most for nonexistent hosts) may exhaust the router's table space. For instance, the infamous Code Red worm never succeeded in its intended goal but its heavy use of scanning to find more hosts to infect caused collateral damage on many networks.

Hardening the Network Against Attack

The first and best way to deal with attacks is to either prevent them from happening or to take proactive steps to limit their damage. The most critical step is to keep up to date on software patches on all equipment in the network including desktop machines, servers, switches, routers and other infrastructure equipment. First, keeping everything current limits the chance of the equipment having catastrophic bugs that could cause it to crash, reboot or hang when presented with unexpected traffic. Also, it decreases the chance that someone might be able to take over the device with all the dangers already mentioned.

Protecting against loss of service from physical problems is important but must be balanced against expense. Filtered power, backup power and backup air conditioning systems can protect equipment from damage and keep it operational in emergencies. Temperature and humidity sensors can allow technicians to catch and fix environmental problems before they can damage equipment. Keeping spare parts on hand can reduce the length of outages. Having multiple network circuits and physically diverse building entry points can protect against loss of network connectivity. A well-written and closely followed site security policy can prevent unauthorized persons from getting access to the equipment.

Traffic filtering and rate limiting should be employed to control unwanted traffic as close to the source as possible. Dropping bad traffic as soon as it can be identified saves network resources downstream. Rate limiting protocols that normally see little volume such as ICMP can reduce the danger of some forms of DoS attack. Unfortunately these things are difficult to do at the core of a provider network. MOREnet can't simply block various types of traffic network-wide because some customers may have a legitimate need for them. Also, MOREnet's core equipment doesn't perform well with access controls in place. Traffic filters that overload the router CPU during an attack could actually contribute to the loss of service. The best MOREnet can do is to protect its server farms and business networks with these methods.
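
Rate limiting of a low-volume protocol such as ICMP is commonly built on a token bucket. The sketch below is a generic illustration of that idea rather than the mechanism any particular router uses; the class name, the rate and the burst size are all assumptions.

```python
class TokenBucket:
    """Allow up to `burst` packets at once and `rate_per_s` on average."""

    def __init__(self, rate_per_s: float, burst: float):
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = burst  # start with a full bucket
        self.last = 0.0      # time of the previous decision, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Return True if a packet arriving at time `now` may be forwarded."""
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over the limit: drop the packet


# Hypothetical ICMP limit: 2 packets per second with a burst of 3.
bucket = TokenBucket(rate_per_s=2, burst=3)
burst_at_t0 = [bucket.allow(0.0) for _ in range(5)]
print(burst_at_t0)
```

A flood far above the configured rate is cut down to that rate, while ordinary low-volume use of the protocol passes untouched. The cost, as noted above, is that the device doing the limiting must have the CPU capacity to make this decision for every packet.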

Where MOREnet has critical servers, it could provide redundancy and separate critical systems to limit the damage of a DoS attack. From a survivability standpoint, it is often better to have two or more moderate-sized web servers with load balancing between them than to have a single large server. Also, having multiple IPs means it is possible to block access to one without killing a service completely. Consider the Code Red worm that attacked a single IP address associated with www.whitehouse.gov. The White House website was made up of multiple servers on different IPs and load-balanced between them. When the attack started, their network provider created a black-hole route for the targeted IP while the rest of the cluster operated normally.

Separating different critical servers onto different Ethernet networks, if possible with different WAN links, can prevent an attack against one service from affecting others. Where possible, front-end and back-end networks should be separated as well. For instance, if only a small group of application servers need access to a database server then a private network can be set up between these devices. The public side of the application servers would be on a different network. An attack against the application servers would not affect the database server.

Another basic defensive strategy is to overprovision the network and be prepared for the worst case instead of just the normal one. In other words, circuits should have extra capacity to absorb attacks. Routers and other infrastructure equipment should have sufficient extra CPU and memory to handle worst-case scenarios. Multiple smaller routers can be used in the place of a single large router to lessen the impact of hardware failure or a router crash. Smaller routers also tend to have more CPU capacity relative to their total traffic capacity than larger routers. Use of access lists and other CPU-intensive features should be thought out with respect to the capabilities of the router and the maximum traffic it might be presented with. Features that work well under normal conditions could overload a router when an attack is in progress. Many features that would work acceptably at the edge may be unsafe or impossible at the core. Wherever possible, CPU-intensive work such as packet filtering and packet marking should be distributed to the edge routers rather than performed on core routers.

To prevent customers from sourcing DoS attacks, MOREnet should configure its customer edge routers to prevent IP spoofing. This configuration doesn't completely solve the problem of spoofing, since the attack program can still use another IP address at the same site as a source address, but it does at least make it more difficult for a customer's machines to be used in an attack without it being discovered.
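
The check an edge router performs for anti-spoofing amounts to asking whether a packet's source address belongs to the prefixes assigned to that customer. The following is a minimal sketch of that test, assuming made-up prefixes from the documentation address ranges; the function name is hypothetical.

```python
import ipaddress


def is_source_permitted(src_ip: str, customer_prefixes) -> bool:
    """Return True if src_ip falls inside one of the customer's prefixes."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in ipaddress.ip_network(p) for p in customer_prefixes)


# Hypothetical customer allocation (documentation-only address ranges).
prefixes = ["192.0.2.0/24", "198.51.100.0/24"]

print(is_source_permitted("192.0.2.17", prefixes))   # customer's own address
print(is_source_permitted("203.0.113.5", prefixes))  # forged outside address
print(is_source_permitted("192.0.2.200", prefixes))  # forged, but same site
```

The third case shows the limitation mentioned above: a forged address taken from the customer's own range still passes the filter, although the attack can then at least be traced back to the correct site.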

Most of the hardening methods above require money to implement. Serious thought must be given to how much is enough, how much is too much and which options offer the most protection for the available budget.

DoS Detection Methods

We have primarily researched methods of detecting flooding attacks for this paper. These attacks are easy to launch, require little knowledge about the target and can potentially disrupt MOREnet's network as well as the customer networks. Detection means simply noticing that something irregular is going on. Verification and characterization of attacks are discussed later in this paper.

The least desirable but most certain way to detect when an attack is underway is from customer calls to the help desk. Obviously, customers will complain any time they are seeing bad performance and this may be due to normal traffic rather than an attack. If a single customer complains then it is probably a smaller attack that isn't having a major impact on the rest of the backbone. If multiple callers complain of high latency then it is more likely to be a major attack that is congesting our Internet circuits or the backbone.

A somewhat better indicator of an attack is slow interactive access to equipment at other hub locations. In this case the latency is the result of either an unusual traffic spike or an attack that is congesting the backbone links.

MOREnet would like to know about a network slowdown more quickly than our customers do. It would be ideal to have some sort of automated system to detect and report anomalies. A simple solution would be to use a tool to monitor bandwidth or ping latency thresholds and report unusual spikes. Such a tool could be purchased or could be written in-house. This tool still doesn't provide any clues as to exactly what is going on but at least it lets technicians know congestion is occurring and gives an idea of where.
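
A threshold monitor of the kind described above could be as simple as comparing each new sample with a baseline built from the samples just before it. The sketch below is one possible approach and not a description of any existing tool; the window size, the multiplier `k` and the minimum margin are assumed tuning values.

```python
from statistics import mean, pstdev


def find_spikes(samples, window=5, k=3.0, min_delta=1.0):
    """Return indices of samples that stand well above the recent baseline.

    A sample is a spike if it exceeds the mean of the previous `window`
    samples by more than max(k * standard deviation, min_delta).
    """
    spikes = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        threshold = mean(baseline) + max(k * pstdev(baseline), min_delta)
        if samples[i] > threshold:
            spikes.append(i)
    return spikes


# Hypothetical ping latencies in milliseconds between two hub routers.
latency_ms = [10, 11, 10, 12, 11, 10, 95, 11]
print(find_spikes(latency_ms))
```

The same function works for bandwidth samples. As the text notes, a report from such a monitor says only that congestion is occurring and where it was measured, not what is causing it.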

The ideal method of DoS detection would be one that observes the actual traffic on the network and looks for anomalies. In theory, this method can tell us not only that something is happening but possibly what and where. DoS detection of this sort must be done at the core of the network in order to detect an attack against any customer. However, the requirement of capturing detailed traffic data limits solutions to only those that will fit within MOREnet's backbone architecture. For MOREnet, a successful DoS detection system would have to meet the following requirements.

  1. The system should not need to have probes placed in-line on MOREnet's network circuits. The MOREnet backbone is made up of OC3 and OC12 circuits (OC12 is used for local interconnects only) running either ATM or Packet-over-SONET (POS). In the past, the majority of in-line devices, whether they are for caching, filtering or statistics collection, have tended to support only Fast or Gigabit Ethernet. Those devices that do support ATM or POS are usually prohibitively expensive and MOREnet would generally need one probe per circuit, making the cost multiply quickly. A system that can use the header information exported by NetFlow would be ideal from an architecture standpoint because it is already active in the network.
  2. The system should not require user-configured policies in order to detect traffic anomalies. Some vendors require the user to configure rules such as "if more than X bits per second are being transmitted to host Y then take some action." MOREnet is just a provider and has little idea of what policies should be configured for its customers. Most MOREnet customers lack any security policy and none have one that could easily be converted into rules for such a system. Even if this were possible to implement for the known servers of all customers, there is no way a reasonable policy rule could be formulated for student and staff workstations at all sites. Unfortunately, these are more likely to be targets than any of the major servers. A DoS detection system should recognize attacks based on the patterns of traffic involved.
  3. The system should not detect abnormal traffic based solely on the previous history of IP addresses. The problem here is that many MOREnet customers assign IPs for students via DHCP. With DHCP, an IP address that might have been totally unused yesterday could be transferring enormous quantities of data today. History by IP address is very important but is probably not sufficient on its own for accurately identifying DoS attacks.
  4. The system should provide some means of viewing a partial snapshot of the traffic it believes is a DoS attack. A full dump of the packets is not possible with NetFlow since it only stores header information, but basic header information from a limited number of packets or flows is critical. Technicians need this data to verify that the anomaly really is an attack and to determine any valuable details.
  5. The system should report all, or nearly all, of the Denial-of-Service attacks that occur but should have minimal false alarms. Perfect detection with no false alarms is obviously an unreachable goal, but a system that misses many attacks is useless. A system that generates too many false alarms will likely be ignored and end up being equally useless. MOREnet's staff are very busy with their regular duties and do not have the time to verify a large number of false reports. MOREnet's security group estimates that if more than 300 false alarms a year must be dealt with then they would require another staff member to take part of the load.
  6. The system should report DoS attacks quickly. A report within five minutes of the start of the attack is ideal, and within ten is probably acceptable. The system doesn't necessarily have to tell us that there is a problem before we or our customers notice it, but it should tell us what the problem is and where it is occurring in much less time than we could figure it out ourselves.
  7. The system must be able to send reports of possible DoS attacks via e-mail. The ability to send SNMP traps is also desirable, as is a web-based interface for viewing a list of attacks in progress and details about them.
  8. If the system includes the ability to automatically block attacks then it must be possible to turn this feature off. Ideally, the system would simply provide a suggested remedy and allow humans to make the final decision on what to do.
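
The false-alarm limit in requirement 5 can be checked against a trial deployment with simple arithmetic. The sketch below extrapolates a trial period to a full year and compares it with the 300-per-year figure quoted above; the function names and the trial numbers are hypothetical.

```python
FALSE_ALARM_BUDGET_PER_YEAR = 300  # figure quoted in requirement 5


def projected_false_alarms_per_year(false_alarms: int, trial_days: int) -> float:
    """Scale the false alarms seen during a trial up to a 365-day year."""
    if trial_days <= 0:
        raise ValueError("trial_days must be positive")
    return false_alarms * 365 / trial_days


def within_budget(false_alarms: int, trial_days: int,
                  budget: int = FALSE_ALARM_BUDGET_PER_YEAR) -> bool:
    """Return True if the projected yearly rate stays within the budget."""
    return projected_false_alarms_per_year(false_alarms, trial_days) <= budget


# Hypothetical 30-day trials of two candidate systems.
print(within_budget(false_alarms=20, trial_days=30))
print(within_budget(false_alarms=30, trial_days=30))
```

A budget of 300 per year works out to a little under one false alarm per day, so even a system that raises a single false report every day would slightly exceed it.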

Verifying and Characterizing Attacks

Once an anomaly is noticed, staff must begin the hard process of figuring out what is really going on. This task is complicated by the fact that DoS tools go to some effort to make themselves hard to detect and block. Also, many new applications, particularly multimedia applications and peer-to-peer file trading tools, are difficult to identify. These applications generate a lot of traffic on odd port numbers and are easy to confuse with a DoS attack. Worse, they may in fact be an unintentional DoS attack, so, in some cases, ignoring them could end up being the wrong thing to do. Even if MOREnet purchases an automated DoS detection system, it will still be necessary for staff to spend some time verifying an attack is real, as these systems aren't likely to be 100% accurate.

The first and simplest tool available for locating where congestion is occurring is Ping. By pinging between the core routers and observing the resulting latency, technicians can quickly determine if the outage is affecting the backbone and if so then where. Pinging along each path (Columbia to St. Louis, Kansas City and Jefferson City, then Jefferson City to St. Louis and Springfield, and finally from Kansas City to Springfield) will indicate which circuits are OK and which are slow. If pings between one router and all of its directly connected routers are slow then the router itself is overwhelmed, either by ordinary traffic or an attack. Otherwise, the problem is either a circuit problem or traffic saturation along a backbone path, and saturation may also indicate a DoS.
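
The reasoning above can be sketched in a few lines of Python. This is only an illustration: the function name diagnose, the 50 ms threshold and the latency figures are assumptions made for the example, not values MOREnet uses.

```python
def diagnose(latency_ms, slow_threshold_ms=50.0):
    """Separate overloaded routers from slow circuits using pairwise ping times.

    latency_ms maps a (router_a, router_b) pair to a round-trip time in ms.
    """
    slow = {pair for pair, rtt in latency_ms.items() if rtt > slow_threshold_ms}

    # Collect the circuits attached to each router.
    circuits = {}
    for pair in latency_ms:
        for router in pair:
            circuits.setdefault(router, set()).add(pair)

    # A router is suspect when all of its directly connected circuits are slow.
    overloaded = sorted(r for r, links in circuits.items()
                        if len(links) > 1 and links <= slow)

    # Slow circuits that do not touch a suspect router point to a circuit
    # problem or to saturation along that path.
    suspect_circuits = sorted(p for p in slow if not set(p) & set(overloaded))
    return overloaded, suspect_circuits


# Hypothetical measurements over the backbone paths listed above.
pings = {
    ("Columbia", "St. Louis"): 12.0,
    ("Columbia", "Kansas City"): 14.0,
    ("Columbia", "Jefferson City"): 180.0,
    ("Jefferson City", "St. Louis"): 210.0,
    ("Jefferson City", "Springfield"): 195.0,
    ("Kansas City", "Springfield"): 16.0,
}
print(diagnose(pings))  # (['Jefferson City'], [])
```

In this made-up case every circuit touching Jefferson City is slow, so the router itself is the first thing to examine.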

The show ip cache flow command on the router is a good way to spot attacks consisting of large numbers of packets with consistent source, destination and port numbers. On 12000-series routers it is first necessary to attach to a line card with the attach <slot number> command. Then we run show ip cache flow | include K to view only those flows of more than 1000 packets. Particularly large flows may be a sign of an attack. Looking at the protocol and port numbers for large flows should allow the operator to guess whether the traffic is legitimate or not. Remember that each line card only contains records of the flows passing through it, so it might be necessary to check more than one. Unfortunately, many DoS attacks use randomized source IPs and port numbers, and the NetFlow inspection features on the router provide no help in this case.
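
When the flow cache is long, captured output can also be sorted offline. The sketch below assumes a simplified eight-column layout (interfaces, addresses, then protocol and ports in hexadecimal, then a packet count that may carry a K or M suffix). The exact layout varies by IOS release, so the column order, the sample text and the function names are assumptions to be checked against real output.

```python
def parse_packets(text):
    """Convert a packet count such as '12K' or '3120' to an integer."""
    multipliers = {"K": 1_000, "M": 1_000_000}
    suffix = text[-1].upper()
    if suffix in multipliers:
        return int(float(text[:-1]) * multipliers[suffix])
    return int(text)


def large_flows(output, min_packets=1000):
    """Return flows with at least min_packets packets, largest first."""
    flows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 8 or fields[0] == "SrcIf":
            continue  # skip headers and anything not in the assumed layout
        src_if, src_ip, dst_if, dst_ip, proto, sport, dport, pkts = fields
        try:
            flow = {
                "src_ip": src_ip,
                "dst_ip": dst_ip,
                "protocol": int(proto, 16),   # assumed hexadecimal
                "src_port": int(sport, 16),
                "dst_port": int(dport, 16),
                "packets": parse_packets(pkts),
            }
        except ValueError:
            continue
        if flow["packets"] >= min_packets:
            flows.append(flow)
    return sorted(flows, key=lambda f: f["packets"], reverse=True)


# Hypothetical capture using documentation addresses.
sample = """\
SrcIf     SrcIPaddress    DstIf     DstIPaddress    Pr SrcP DstP  Pkts
AT1/0.1   198.51.100.7    PO2/0     192.0.2.10      11 0400 0035   12K
AT1/0.2   198.51.100.9    PO2/0     192.0.2.20      06 0A2B 0050  3120
AT1/0.3   198.51.100.11   PO2/0     192.0.2.30      06 0C01 0019   850
"""
for flow in large_flows(sample):
    print(flow["dst_ip"], flow["protocol"], flow["dst_port"], flow["packets"])
```

Sorting by packet count puts the flows most worth a second look at the top. The operator still has to judge from the protocol and port whether the traffic is legitimate.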

Another tool in MOREnet's toolbox is Network Health, our circuit statistics system. Network Health is most useful once you have an idea of which core router seems to be affected by latency. It is possible to run trend reports (simple line graphs) for up to 10 subinterfaces at a time and look for any that seem to have peaked out (show a nearly flat line of high traffic). Unfortunately, this is a very slow way to find the problem. Network Health's real value is in checking suspicious circuits. If a circuit shows up as flat-lined then it is worth further investigation.
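
What "flat-lined" means can be made concrete with a small check over utilization samples. This sketch is not part of Network Health. The function name, the window of six samples and the 90% and 5-point thresholds are assumptions chosen for illustration.

```python
def is_flat_lined(utilization_pct, window=6, high_pct=90.0, max_spread_pct=5.0):
    """True when the most recent samples sit in a narrow band near capacity."""
    if len(utilization_pct) < window:
        return False  # not enough data to judge
    recent = utilization_pct[-window:]
    near_capacity = min(recent) >= high_pct
    nearly_flat = (max(recent) - min(recent)) <= max_spread_pct
    return near_capacity and nearly_flat


# Hypothetical percent-of-capacity samples for two subinterfaces.
saturated = [40, 60, 95, 96, 97, 96, 95, 96]
bursty = [40, 55, 70, 92, 60, 45, 50, 48]
print(is_flat_lined(saturated), is_flat_lined(bursty))  # True False
```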

MOREnet's NetFlow reports page can provide more detailed information about what is occurring at a site after it has been identified. Looking at the top incoming IPs over the past hour can give a technician an idea of what machine is likely to be under attack. If outgoing traffic is the problem then the report might show the culprit under the top outgoing IPs, but this is less likely since the source IPs are easily spoofed. Large amounts of traffic on odd port numbers might give a hint as to what sort of attack is occurring, providing the attack doesn't use randomized port numbers.
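
A "top IPs" report of this kind is an aggregation over flow records. The sketch below assumes each record is a dictionary with src_ip, dst_ip, dst_port and bytes fields. Those field names and the sample records are hypothetical and do not reflect the format of MOREnet's report tool.

```python
from collections import Counter


def top_by_bytes(flows, keys=("dst_ip",), n=10):
    """Sum bytes per combination of the given fields and return the top n."""
    totals = Counter()
    for flow in flows:
        totals[tuple(flow[k] for k in keys)] += flow["bytes"]
    return totals.most_common(n)


# Hypothetical flow records using documentation addresses.
records = [
    {"src_ip": "198.51.100.7", "dst_ip": "192.0.2.10", "dst_port": 80, "bytes": 4000},
    {"src_ip": "198.51.100.8", "dst_ip": "192.0.2.10", "dst_port": 80, "bytes": 6000},
    {"src_ip": "198.51.100.9", "dst_ip": "192.0.2.20", "dst_port": 25, "bytes": 1500},
]
print(top_by_bytes(records))                               # top incoming IPs
print(top_by_bytes(records, keys=("dst_ip", "dst_port")))  # per target and port
```

Grouping on the destination address shows the likely target. Adding the port to the key helps only when the attack does not randomize port numbers.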

Finally, MOREnet collects raw statistics that can be mined for details of what is occurring. Viewing summaries of traffic on each interface of a core router that is performing badly can give an idea of which interfaces might be overloaded. Traffic can be filtered in a variety of ways and summarized by nearly any combination of interface number, IP address, protocol and port number, site ID or AS number. It may take running many different reports to discover the right incantation to make the attack stand out clearly. The process can be slow, particularly since there are often several times more flows than normal to sift through during an attack. Unfortunately, the raw data is currently scattered across several different servers for load balancing purposes. Also, the report tool is rather complex to use. Currently only the Network Systems group has access to the accounts needed to run these reports. Since it is easy to accidentally create reports many megabytes in size it doesn't seem wise to create a front end for the report program.

Attack Mitigation

Once the characteristics of the attack have been determined (the target, some of the sources and the type of traffic involved, etc.) MOREnet can take steps to block it. The options available for preventing the attack from reaching its target are generally fairly simple but are also rather limited.

The first option is to black hole all traffic to the target IP. If a single IP address is under attack then it is easiest to create a "null" route that essentially discards all traffic to that IP. If the target were "1.1.1.1" then the command ip route 1.1.1.1 255.255.255.255 null0 would perform this action in any of MOREnet's hub routers. This should be done in the router closest to the ingress point of the traffic if at all possible. Note that although this option protects the network it also completes the job that the attacker started. The target can no longer reach the network.
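
Because a mistyped null route can black hole far more than intended, it may help to generate the command rather than type it by hand. The following sketch uses Python's standard ipaddress module. The function name and the refusal of prefixes shorter than /24 are assumptions added for safety, not MOREnet policy.

```python
import ipaddress


def null_route_command(target, shortest_prefix=24):
    """Build an 'ip route ... null0' line for an IPv4 host or prefix."""
    network = ipaddress.ip_network(target)  # raises ValueError on bad input
    if network.version != 4:
        raise ValueError("only IPv4 targets are handled in this sketch")
    if network.prefixlen < shortest_prefix:
        raise ValueError(f"refusing to black hole a prefix shorter than /{shortest_prefix}")
    return f"ip route {network.network_address} {network.netmask} null0"


print(null_route_command("1.1.1.1"))
# ip route 1.1.1.1 255.255.255.255 null0
```

A bare address is treated as a /32 host route, which matches the single-target case described above.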

The next option is an access list. Access lists require more knowledge on the part of the technician involved to create, but can be tailored to fit an attack much better. An access list can block traffic based on source address, port numbers and other criteria which makes it far more flexible than a null route. The disadvantage is that access lists are much more resource-intensive than null routes. Performance problems with access lists in MOREnet's Cisco 12000-series routers have occurred in the past so blocking at this level is not an option. MOREnet's newer Cisco 10000 distribution-layer routers seem to be much better suited for this type of activity so far.

The third option, rate limiting, is even more of a performance problem and MOREnet has not experimented with it yet. If the target is being attacked by a type of traffic that it doesn't need to operate (e.g., an ICMP flood against a web server), and if it is critical that access to the device is not blocked completely, then it may be worthwhile to use this feature. The 12000 routers almost certainly cannot handle rate limiting but the 10000 routers may be able to. To use it, a technician creates an access list X matching the unwanted traffic, then uses the interface configuration command rate-limit output access-group X <bps rate> <normal burst> <maximum burst> conform-action transmit exceed-action drop. This technique shouldn't be used until MOREnet has a clear idea of what impact it could have on the router.
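
For readers unfamiliar with how a rate limit decides which packets to drop, the sketch below shows a simplified token bucket. It is an illustration of the general idea only. It does not model the router's actual implementation or its separate burst parameters, and the class and parameter names are hypothetical.

```python
class RateLimiter:
    """Simplified token bucket: packets that conform are sent, others dropped."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_s = rate_bps / 8
        self.capacity = float(burst_bytes)
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last = 0.0

    def allow(self, packet_bytes, now):
        """Return True to transmit the packet, False to drop it."""
        # Refill in proportion to elapsed time, up to the burst size.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_bytes_per_s)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False


# 8000 bps (1000 bytes per second) with a 1500-byte burst allowance.
limiter = RateLimiter(rate_bps=8000, burst_bytes=1500)
arrivals = [(1000, 0.0), (1000, 0.0), (1000, 0.5), (1000, 0.6)]
print([limiter.allow(size, t) for size, t in arrivals])
# [True, False, True, False]
```

Legitimate traffic of the same type as the attack shares the same bucket, so a rate limit reduces the damage but does not distinguish good packets from bad.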

Finally, if the attack doesn't end fairly quickly then MOREnet needs to contact the upstream provider that is delivering the traffic and ask them to block the traffic. They can black hole or filter it before it reaches MOREnet's circuit, thus preventing Internet connectivity from being wasted. It often takes time to find an engineer at our providers that can do this, however, so blocking the traffic ourselves should always be the first step.

DoS Detection: Lessons Learned

As part of our research we have made a simple attempt to create our own DoS detection system in order to understand the difficulties involved. We used NetFlow as our source of data since we already collect it for statistical purposes. The tool was run on a periodic basis over data from a single core router. We generated statistics on total bytes and traffic flows per IP address and then threw away those that were responsible for less than 1% of the total traffic in each category. We compared the activity of the resulting addresses with historical data we collected for them in order to find unusual traffic spikes. We experimented with several different methods of determining when traffic was out of profile but none were wholly satisfactory. We also experimented with some simple heuristics to try to identify specifically what sort of activity was occurring, but this technique also proved more problematic than we expected. Traffic patterns we first thought would be indicative of improper activity often turned out to be common among legitimate software as well.
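
The general shape of such a tool can be sketched as follows. This is not the experimental tool itself. The 1% cut comes from the description above, but the spike test (current value more than three times the historical mean) is only one assumed example of an out-of-profile rule, and all names and data are hypothetical.

```python
from collections import defaultdict


def significant_addresses(flows, min_share=0.01):
    """Total bytes and flow counts per address, dropping those below
    min_share of the total in both categories."""
    byte_totals = defaultdict(int)
    flow_totals = defaultdict(int)
    for ip, nbytes in flows:
        byte_totals[ip] += nbytes
        flow_totals[ip] += 1
    all_bytes = sum(byte_totals.values())
    all_flows = sum(flow_totals.values())
    return {
        ip: (byte_totals[ip], flow_totals[ip])
        for ip in byte_totals
        if byte_totals[ip] >= min_share * all_bytes
        or flow_totals[ip] >= min_share * all_flows
    }


def out_of_profile(current, history, factor=3.0):
    """Flag addresses whose bytes or flows exceed factor times their
    historical mean (an assumed rule), or that have no history at all."""
    flagged = []
    for ip, (nbytes, nflows) in current.items():
        past = history.get(ip)
        if not past:
            flagged.append(ip)
            continue
        mean_bytes = sum(b for b, _ in past) / len(past)
        mean_flows = sum(f for _, f in past) / len(past)
        if nbytes > factor * mean_bytes or nflows > factor * mean_flows:
            flagged.append(ip)
    return sorted(flagged)


# Hypothetical period: (address, bytes) per flow record.
period = [("192.0.2.10", 100)] * 200 + [("198.51.100.5", 50000), ("203.0.113.7", 10)]
history = {
    "192.0.2.10": [(18000, 10), (22000, 12)],
    "198.51.100.5": [(45000, 1), (55000, 1)],
}
print(out_of_profile(significant_addresses(period), history))  # ['192.0.2.10']
```

A rule this simple also flags ordinary spikes such as large file transfers.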

The tool was reasonably good at detecting traffic anomalies and found many network scans along with a few DoS and DDoS attacks. Unfortunately, the real problem activity was nearly lost in the noise of the false alarms. Most of the reported anomalies weren't security problems at all but were simply ordinary traffic spikes such as those of file-sharing programs, multimedia streaming and large FTP transfers. We never got to a point where false alarms could be limited to a reasonable level.

One worrying thing we found was that even humans often have difficulty telling if a series of traffic flows to a device is really an attack or not. NetFlow data doesn't preserve the order in which packets are received or the payload of the packets. Both would have provided a lot of help in figuring out what was really going on. We thought we had a good heuristic for detecting port scans but even this turned out to be more difficult than we originally thought, as a few peer-to-peer file-sharing programs turned out to have similar characteristics. In the final analysis, it is probably beyond our abilities to write a DoS detection tool that catches all attacks and still provides an acceptably low level of false positives.

Possible DoS Detection Solutions

There are several possible options. Each solution has its pros and cons, and it might be possible to use some of them together.

Purchase a Commercial Product

Products of this type are relatively new but are starting to be implemented in a few provider networks. Some of these products must inspect traffic inline but several can use NetFlow data. Commercial detection products claim to detect DoS attacks and other suspicious activity through either network baselining or by applying rule sets. The drawback is that there are only a few vendors of special-purpose DoS detection systems so far and they appear to be rather expensive. One solution cost in excess of $100,000 for the MOREnet network. It is not clear how quickly such a system would pay for itself without a good idea of how often MOREnet customers are really subject to attacks, how long they normally last and what impact they have on the network. Also, we have not yet placed one of these devices in our network to observe its behavior. Vendors have assured MOREnet technicians that the system really does catch most or all flooding DoS attacks without extensive false alarms, but more testing would be necessary before any system could be implemented.

Purchase an IDS

Some, but not all, intrusion detection systems (IDS) are also capable of detecting DoS attacks in progress. This doesn't appear to be a good option for protecting our entire network. A true IDS must be able to look into the payload of packets, requiring it to exist as a "bump on the line" between our router and our ATM switch or a vendor's SONET gear. This is an architectural problem because many IDS devices would be required to cover all of MOREnet's network links. Many IDS solutions lack interfaces to monitor the ATM and POS circuits used in our backbone, and those that do are quite expensive. Also, according to independent product reviews these IDS tools aren't likely to handle traffic on a scale like that of our backbone. Finally these tools also are reported to generate many false alarms. Tuning an IDS to prevent these false alarms might require more knowledge about our customers than we as a network provider are ever likely to have, and would likely be extremely time consuming.

Write a Detection System

MOREnet could possibly build a tool to process NetFlow data and look for suspicious traffic patterns that might indicate Denial of Service attacks. As mentioned above we have made some initial attempts. However, the problem of accurately identifying attacks while generating a minimum of false alarms seems complex and may require skills and effort beyond what we have available. It would take a human an average of one to two minutes per hub router per 15-minute period to do a good job of looking over the output of the existing experimental tool, plus considerably more time if something suspicious were detected. This could work out to more than half of an FTE for all five hubs, which is probably unacceptable. Improving accuracy through more complex heuristics is certainly possible, but it is uncertain if we could ever get them to an acceptable level of false alarms.
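
The staffing estimate follows from simple arithmetic: the share of one person's time is the review time per hub, multiplied by the number of hubs and divided by the length of the reporting period. The short sketch below works this through. The function name is hypothetical and the extra time needed to follow up on suspicious reports is left out.

```python
def review_load(minutes_per_review, hubs=5, period_minutes=15):
    """Fraction of one person's time spent reviewing reports for all hubs."""
    return hubs * minutes_per_review / period_minutes


low = review_load(1)   # one minute per hub per period
high = review_load(2)  # two minutes per hub per period
print(f"{low:.0%} to {high:.0%} of one person's time")  # 33% to 67%
```

At two minutes per report the review work alone exceeds half of an FTE, before any time is spent investigating suspicious reports.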

Use an Event Correlation System

There are tools that purport to correlate events from our own and participating customers' firewalls and intrusion detection systems. Implementing something like this would likely be useful for detecting scanning, automated worms and other widespread attacks, but it probably wouldn't be much help in identifying DoS attacks against individual targets.

Recommendations

Recommendations on the best course of action for MOREnet obviously depend on the scope of what we decide to accomplish. We see the following three options, depending on the level of effort we are willing to commit.

Do Nothing

The simplest and cheapest solution is to simply do nothing. The serious latency issues experienced on the backbone during the 2001-2002 school year were the result of the deficiencies of the ATM line cards in some of MOREnet's core routers. MOREnet introduced new distribution-layer routers over the summer and these have been shown to handle customer line saturation without excessively impacting other customers.

However, individual customers would still be in danger of attack and MOREnet would have no method to determine if attacks are occurring, who is being attacked and in what manner. Attacks could still consume large portions of our Internet and backbone bandwidth. The impact to other customers would no longer be debilitating but it would still be considerable.

Do Simple Congestion Monitoring

If MOREnet must do something about DoS attacks then the most basic need is to detect when customer, backbone or Internet circuits become saturated. This can be done with a fairly simple application to monitor utilization levels and generate an alarm of some sort when they become too high. This application should be fairly easy to write in-house and should not require much in the way of resources to operate. As mentioned before, this option only helps technicians know when and where the network is becoming saturated. It won't help figure out what is really going on. It could take anywhere from 10 minutes to more than an hour for technicians to run NetFlow reports manually and determine what is really going on. This is painfully slow when a customer is having problems and looking to MOREnet for an answer.
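
A minimal sketch of such a monitor is shown below. It assumes that interface octet counters are sampled at a fixed interval by some other means, such as SNMP polling, which is not shown. The 90% threshold, the three-sample rule, the circuit name and all function and class names are assumptions made for illustration.

```python
def utilization_pct(octets_before, octets_after, interval_s, capacity_bps,
                    counter_bits=64):
    """Percent utilization between two octet-counter samples."""
    # The modulo tolerates a single counter wrap between samples.
    delta = (octets_after - octets_before) % (2 ** counter_bits)
    return 100.0 * delta * 8 / (interval_s * capacity_bps)


class CongestionMonitor:
    """Raise an alarm after several consecutive high-utilization samples."""

    def __init__(self, threshold_pct=90.0, consecutive=3):
        self.threshold_pct = threshold_pct
        self.consecutive = consecutive
        self.streak = {}

    def observe(self, circuit, pct):
        """Record one sample and return True if the circuit should alarm."""
        if pct >= self.threshold_pct:
            self.streak[circuit] = self.streak.get(circuit, 0) + 1
        else:
            self.streak[circuit] = 0
        return self.streak[circuit] >= self.consecutive


monitor = CongestionMonitor()
samples = [95.0, 96.0, 50.0, 91.0, 92.0, 93.0]  # hypothetical readings
print([monitor.observe("customer-circuit-1", pct) for pct in samples])
# [False, False, False, False, False, True]
```

Requiring several consecutive samples keeps a single busy interval from raising an alarm, at the cost of a slower report.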

Full DoS Detection

The most effective but difficult and expensive option is to provide a full-fledged DoS detection system for our network. As we indicated previously, writing a system in-house that has acceptable accuracy and low false-alarm rates is going to be difficult and time consuming at best. At worst we may never be able to write such an application. Purchasing a system would give us a more reliable tool with much less staff time required for implementation. The prices for commercial DoS detection tools are high, however, and there is little guarantee that commercial software would truly meet our performance goals. Note that we do recommend a tool developed specifically for DoS detection, not a general-purpose IDS. A full-blown IDS is probably not feasible for MOREnet's network.

Additional Resources

Arbor Networks - http://www.arbornetworks.com

Asta Networks - http://www.astanetworks.com

Captus Networks - http://www.captusnetworks.com

Esphion - http://www.esphion.com

Mazu Networks - http://www.mazunetworks.com

Panoptis - http://panoptis.sourceforge.net

References

CERT, "Denial of Service Attacks", http://www.cert.org/tech_tips/denial_of_service.html

CERT, "DoS Attacks Using Nameservers", http://www.cert.org/incident_notes/IN-2000-04.html

CERT, "Managing the Threat of Denial-of-Service Attacks", http://www.cert.org/archive/pdf/Managing_DoS.pdf

CERT, "Trends in Denial of Service Attack Technology", http://www.cert.org/archive/pdf/DoS_trends.pdf

Dittrich, Dave, University of Washington, "Distributed Denial of Service Attacks/Tools", http://staff.washington.edu/dittrich/misc/ddos/

Messmer, Ellen, Network World, "The Anti-DDos prescription", http://www.nwfusion.com/buzz2001/antiddos/, Sept. 24, 2001

Messmer, Ellen, Network World, "Test Reveals IDS Strengths, Weaknesses", http://www.nwfusion.com/news/2002/0701ids.html, July 1, 2002

Credits

Will O'Brien, MOREnet Network Systems
Beth Young, MOREnet Security

Glossary

ATM - Asynchronous Transfer Mode
A technology developed to allow many voice, video and data streams to flow across the same physical circuit or circuits. Each stream is called a PVC or SVC and generally has specific bandwidth and timing guarantees.
BGP - Border Gateway Protocol
The protocol used to exchange routing information between IP networks that are under different administrative control. It is generally needed whenever a network connects to two or more peers or providers. It provides many route filtering and control features needed for complex routing policies.
Black hole
The act of throwing away traffic. Also a machine or interface that discards traffic. Generally this means sending packets to a Null interface or bit bucket. Packets go in but they don't come out.
DDoS - Distributed Denial of Service
Many network hosts attacking a single target.
DoS - Denial of Service
An attack or action which prevents legitimate users from accessing a resource or service.
Fraggle
A small plush creature from a children's show created by Jim Henson. Also, a variant of the Smurf attack which uses UDP Echo packets instead of ICMP Echo packets.
IDS - Intrusion Detection System
These tools inspect all traffic on a network, including the payload. They look for attempts to compromise machines within the network based on a set of pattern matching rules. IDSs often know about specific types of exploits and can also recognize some general classes of suspicious activity. They tend to be highly configurable but require a good understanding of the network in order to configure effectively.
NetFlow
A proprietary Cisco technology for collecting IP traffic information. Each flow consists of all observed packets that have the same source and destination IP address, protocol, port numbers, type of service and other header parameters. Data about these flows can be forwarded to servers which can process the data for traffic statistics, detection of attack patterns, etc. Only IP and TCP or UDP header data is recorded. There is no access to payload information.
PVC - Permanent Virtual Circuit
This is a logical stream of data that runs across a physical network circuit.
Smurf
A small, blue cartoon creature. Also, a form of network flooding attack that uses an intermediate network's broadcast address to multiply the ICMP response packets sent to the target.
Spoofing
In the context of the Internet, spoofing generally means forcing the source address of an IP packet to be something other than the real IP of the transmitting machine.
SVC - Switched Virtual Circuit
This is similar to a PVC, except that SVCs are created and destroyed on demand.
Zombie
A zombie is a mindless undead creature known for attacking in groups. Often many are under the control of a single evil individual. The word also refers to a computer that has been compromised and has had software installed on it to allow remote control.

Copyright © 2002 MOREnet. All rights reserved. Reviewed October 8, 2002.
Contact strategic-tech@more.net. DMCA and other copyright information.