Five well-known Border Gateway Protocol (BGP) anomalies
WannaCrypt, Moscow blackout, Slammer, Nimda, and Code Red I occurred in May 2017, May 2005, January 2003, September 2001, and July 2001, respectively.
- WannaCrypt (WannaCry) is a ransomware cryptoworm that gains administrative privileges and employs the EternalBlue exploit and the DoublePulsar backdoor on systems running Microsoft Windows 7.
- The Chagino substation of the Moscow energy ring experienced a transformer failure on May 24, 2005 at 20:57 (MSK). The failure caused a complete shutdown of the substation and a blackout that affected all customers until 16:00 (MSK) on May 26, 2005. During the blackout, the Internet traffic exchange point MSK-IX was disconnected from 11:00 to 17:00 (MSK).
- Slammer infected Microsoft SQL servers through a small piece of code that generated IP addresses at random. The number of infected machines doubled approximately every 9 seconds.
- Nimda exploited vulnerabilities in Microsoft Internet Information Services (IIS) web servers and in Internet Explorer 5. The worm also propagated by email, sending an infected attachment that was automatically downloaded once the email was viewed.
- The Code Red I worm attacked Microsoft IIS web servers by replicating itself through IIS server weaknesses. Unlike the Slammer worm, Code Red I searched for vulnerable servers to infect. The number of infected servers doubled every 37 minutes.
The Reseaux IP Europeens (RIPE) Network Coordination Centre (NCC) makes BGP update messages publicly available, including the WannaCrypt, Moscow blackout, Slammer, Nimda, Code Red I, and regular data: https://www.ripe.net/analyse/.
Regular data are also collected from BCNET: http://www.bc.net/.
Thirty-seven features are extracted from BGP update messages that originated from AS 513 (route collector rrc04). The BGP update messages are originally collected in Multi-threaded Routing Toolkit (MRT) format. The Perl tool "zebra-dump-parser" is used to convert them to ASCII, and a C# tool then extracts the 37 BGP features to generate the uploaded datasets (CSV files). The file content (column layout) and the list of features extracted from BGP update messages are given after the raw-data sources below.
Labels have been added based on the periods when the data were collected. The data collected during periods of Internet anomalies span:
- an eight-day period for WannaCrypt (the four days of the attack as well as two days before and two days after the attack);
- a five-day period for Moscow blackout, Slammer, and Code Red I (the day of the attack as well as two days before and two days after the attack);
- a six-day period for Nimda (the two days of the attack as well as two days before and two days after the attack). Note that 31 data points are missing from the Nimda dataset.
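Each collection window above follows the same pattern: the attack days plus a two-day margin on each side. A minimal sketch, assuming day-level granularity and the label convention used in the CSV files (1 for anomalous, -1 for regular); collection_window is a hypothetical helper, and the published labels may mark the anomaly at a finer time resolution than whole days.

```python
from datetime import date, timedelta
from typing import List, Tuple


def collection_window(first_attack_day: date, attack_days: int,
                      margin: int = 2) -> List[Tuple[date, int]]:
    """Return (day, label) pairs covering the attack plus `margin` days on each side.

    Hypothetical helper: labels are assigned per whole day (an assumption),
    1 for attack days and -1 for the regular days before and after.
    """
    start = first_attack_day - timedelta(days=margin)
    window = []
    for k in range(attack_days + 2 * margin):
        day = start + timedelta(days=k)
        is_attack = margin <= k < margin + attack_days
        window.append((day, 1 if is_attack else -1))
    return window


# Usage with a placeholder start date (not taken from the datasets):
example = collection_window(date(2001, 1, 10), attack_days=2)
print(len(example))                     # 6 days, as for Nimda
print([label for _, label in example])  # [-1, -1, 1, 1, -1, -1]
```

With attack_days set to 4, 1, and 2, the window lengths are 8, 5, and 6 days, matching the periods listed for WannaCrypt, the one-day anomalies, and Nimda.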
Raw data from route collector rrc04 are organized in folders labeled by the year and month of the collection date.
Complete datasets for WannaCrypt, Moscow blackout, Slammer, Nimda, and Code Red I are available from the RIPE route collector rrc04 site:
- RIPE NCC: https://www.ripe.net
- Analyze: https://www.ripe.net/analyse
- Internet Measurements: https://www.ripe.net/analyse/internet-measurements
- Routing Information Service (RIS): https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris
- RIS Raw Data: https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris/ris-raw-data
- rrc04.ripe.net: data.ris.ripe.net/rrc04/
The date of last modification and the size of the datasets are also included.
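Because the raw rrc04 data are stored in one folder per collection month, the folders needed for a collection window can be derived from its dates. A minimal sketch, assuming the folder names follow a YYYY.MM pattern under https://data.ris.ripe.net/rrc04/ (the scheme and the naming are assumptions to check against the directory listing); month_folders is a hypothetical helper, and no file names inside the folders are assumed.

```python
from datetime import date, timedelta
from typing import Iterable, List

# Assumed base URL; the page lists it as data.ris.ripe.net/rrc04/ without a scheme.
RRC04_BASE = "https://data.ris.ripe.net/rrc04/"


def month_folders(days: Iterable[date]) -> List[str]:
    """Return distinct monthly folder URLs (assumed YYYY.MM naming) in order of first appearance."""
    folders: List[str] = []
    for d in days:
        url = f"{RRC04_BASE}{d.year:04d}.{d.month:02d}/"
        if url not in folders:
            folders.append(url)
    return folders


# A five-day window that crosses a month boundary spans two monthly folders.
window = [date(2003, 1, 30) + timedelta(days=k) for k in range(5)]
for url in month_folders(window):
    print(url)
```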
File content:
Columns 1-4: time (column 1: hour+minute; column 2: hour; column 3: minute; column 4: second)
Columns 5-41: features
Column 42: labels for the regular (-1) and anomalous (1) data.
Note that when training RNN models with the PyTorch library, label -1 must be changed to 0, because PyTorch classification losses such as CrossEntropyLoss expect class indices starting at 0.
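The column layout above maps directly onto a row parser. A minimal sketch using Python's standard csv module, assuming the files have no header row and exactly 42 comma-separated columns; the time columns are kept as raw strings because their exact formatting is not specified here. The optional remapping turns label -1 into 0 for class-index losses such as those used with PyTorch. parse_rows and load_bgp_csv are hypothetical names.

```python
import csv
from typing import Iterable, List, Tuple

N_COLUMNS = 42   # 4 time columns + 37 features + 1 label
N_FEATURES = 37  # columns 5-41


def parse_rows(lines: Iterable[str], zero_one_labels: bool = False
               ) -> Tuple[List[List[str]], List[List[float]], List[int]]:
    """Split each CSV row into time columns, feature vector, and label."""
    times, features, labels = [], [], []
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        if len(row) != N_COLUMNS:
            raise ValueError(f"expected {N_COLUMNS} columns, got {len(row)}")
        times.append(row[0:4])                          # columns 1-4: hour+minute, hour, minute, second
        features.append([float(v) for v in row[4:41]])  # columns 5-41: the 37 features
        label = int(float(row[41]))                     # column 42: -1 regular, 1 anomalous
        if zero_one_labels and label == -1:
            label = 0                                   # {-1, 1} -> {0, 1} for class-index losses
        labels.append(label)
    return times, features, labels


def load_bgp_csv(path: str, zero_one_labels: bool = False):
    """Read one dataset file from disk (assumes no header row)."""
    with open(path, newline="") as f:
        return parse_rows(f, zero_one_labels)
```

Calling load_bgp_csv(path, zero_one_labels=True) returns labels in {0, 1}; leaving the flag off keeps the original -1/1 convention, which suits algorithms that expect signed labels.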
List of features extracted from BGP update messages:
1. Number of announcements
2. Number of withdrawals
3. Number of announced NLRI prefixes
4. Number of withdrawn NLRI prefixes
5. Average AS-path length
6. Maximum AS-path length
7. Average unique AS-path length
8. Number of duplicate announcements
9. Number of implicit withdrawals
10. Number of duplicate withdrawals
11. Maximum edit distance
12. Arrival rate
13. Average edit distance
14-23. Maximum AS-path length = n, where n = 11, ..., 20
24-33. Maximum edit distance = n, where n = 7, ..., 16
34. Number of Interior Gateway Protocol (IGP) packets
35. Number of Exterior Gateway Protocol (EGP) packets
36. Number of incomplete packets
37. Packet size (B)
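Several of the listed features can be computed by tracking, for each prefix, the last AS path announced for it. A minimal sketch under assumed definitions (the C# tool's exact rules are not described here): a duplicate announcement repeats the prefix's current AS path; an implicit withdrawal replaces it with a different path; a duplicate withdrawal withdraws a prefix that is already withdrawn; the edit distance is the Levenshtein distance between the old and new AS paths of an implicit withdrawal; and the unique AS-path length counts distinct AS numbers, ignoring prepending. Updates are modeled at the prefix level as hypothetical (kind, prefix, AS path) tuples, so the message-level counts (features 1 and 2), the arrival rate, and features 14-37 are not covered. RibTracker and bin_features are hypothetical names.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical prefix-level update: ("A" or "W", prefix, AS path; empty path for withdrawals).
Update = Tuple[str, str, List[int]]


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two AS paths; each AS number is one symbol."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(prev[j] + 1,             # delete x
                          curr[j - 1] + 1,         # insert y
                          prev[j - 1] + (x != y))  # substitute (free if equal)
        prev = curr
    return prev[-1]


@dataclass
class RibTracker:
    """Keeps per-prefix state across time bins (assumed feature definitions)."""
    table: Dict[str, List[int]] = field(default_factory=dict)  # prefix -> current AS path
    withdrawn: Set[str] = field(default_factory=set)            # prefixes currently withdrawn

    def bin_features(self, updates: List[Update]) -> Dict[str, float]:
        path_lens: List[int] = []
        unique_lens: List[int] = []
        dists: List[int] = []
        n_ann = n_wd = dup_ann = implicit_wd = dup_wd = 0
        for kind, prefix, path in updates:
            if kind == "A":
                n_ann += 1
                path_lens.append(len(path))
                unique_lens.append(len(set(path)))  # ignores AS prepending
                old = self.table.get(prefix)
                if old == path:
                    dup_ann += 1
                elif old is not None:
                    implicit_wd += 1
                    dists.append(edit_distance(old, path))
                self.table[prefix] = list(path)
                self.withdrawn.discard(prefix)
            elif kind == "W":
                n_wd += 1
                if prefix in self.withdrawn:
                    dup_wd += 1
                self.table.pop(prefix, None)
                self.withdrawn.add(prefix)
            else:
                raise ValueError(f"unknown update kind: {kind!r}")
        return {
            "announced_prefixes": n_ann,                                          # feature 3
            "withdrawn_prefixes": n_wd,                                           # feature 4
            "avg_as_path_len": mean(path_lens) if path_lens else 0.0,             # feature 5
            "max_as_path_len": max(path_lens, default=0),                         # feature 6
            "avg_unique_as_path_len": mean(unique_lens) if unique_lens else 0.0,  # feature 7
            "duplicate_announcements": dup_ann,                                   # feature 8
            "implicit_withdrawals": implicit_wd,                                  # feature 9
            "duplicate_withdrawals": dup_wd,                                      # feature 10
            "max_edit_distance": max(dists, default=0),                           # feature 11
            "avg_edit_distance": mean(dists) if dists else 0.0,                   # feature 13
        }


tracker = RibTracker()
feats = tracker.bin_features([
    ("A", "192.0.2.0/24", [513, 3356, 64500]),
    ("A", "192.0.2.0/24", [513, 3356, 64500]),      # duplicate announcement
    ("A", "192.0.2.0/24", [513, 174, 174, 64500]),  # implicit withdrawal, edit distance 2
    ("W", "192.0.2.0/24", []),
    ("W", "192.0.2.0/24", []),                      # duplicate withdrawal
])
print(feats)
```

The tracker is kept outside bin_features so that prefix state carries over from one time bin to the next, which is needed to recognize duplicates and implicit withdrawals that span bin boundaries.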
Questions: Please contact Zhida Li at <zhidal at sfu.ca>.