Unwanted traffic detected yesterday


As part of my cybersecurity research project, the dashboard on this page presents a summary of the unwanted traffic information recorded by my honeypot. I chose the SSH & Telnet Cowrie honeypot running on a Raspberry Pi 400 computer as my basic setup. The Cowrie data is combined with information from two external providers: MaxMind GeoLite2 is used to obtain the geolocation and autonomous system details of the IP addresses from which unwanted traffic targeting the honeypot originates; VirusTotal is used to obtain details on the malware files identified as used in attacks against the honeypot. More details on my honeypot configuration can be found here.

The following definitions and assumptions are used throughout this document:

Unwanted traffic is all traffic to the honeypot that is not in response to requests made by the honeypot. "Unrequested traffic" would be an alternative way to describe it. Unwanted traffic is divided into scans and attacks.
Scans are instances of unwanted traffic that do not attempt to log into the honeypot. Scans include harmless activity, such as that originating from search engines, web crawlers, and non-malicious scanners of IP addresses and ports like those used for research. Scans also include more nefarious activity such as the type of malicious reconnaissance that is often a prelude to attacks.
Attacks are instances of unwanted traffic that attempt to log into the honeypot by presenting credentials in the form of username and password pairs.
Successful attacks are those attacks that manage to log into the honeypot by presenting a username and password combination accepted by the honeypot. Successful attacks get access to a remote shell on the honeypot.
Active attacks are the subset of successful attacks where the attacker, once on the remote shell, executes Linux commands or runs malware code. By contrast, passive attacks terminate the connection to the remote shell right after logging in. This latter pattern is often seen in malicious reconnaissance work.
Created malware are malware samples/files that are created by the attacker on the honeypot through the use of commands and redirection during the attack session.
Transferred malware are external malware files that are placed on the honeypot through downloads originating from within the honeypot or uploads originating from outside. The difference is that the URLs of the malware samples are known and captured during downloads, but not during uploads. Examples of commands used in malware transfers are curl, wget, ftp, scp, etc.
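
The definitions above amount to a small decision procedure over the events recorded for each session. The following is a minimal sketch of it, assuming Cowrie's JSON log format, in which every event carries a `session` and an `eventid` field. The function names and label strings are hypothetical, and treating `cowrie.command.input` as the only sign of an active attack is a simplifying assumption; the processing behind this dashboard may differ.

```python
from collections import defaultdict

# Cowrie event IDs that indicate credentials were presented.
LOGIN_EVENTS = {"cowrie.login.success", "cowrie.login.failed"}


def classify_session(event_ids):
    """Label one session from the set of Cowrie event IDs it produced."""
    if not (event_ids & LOGIN_EVENTS):
        return "scan"                  # no login attempt at all
    if "cowrie.login.success" not in event_ids:
        return "unsuccessful attack"   # credentials presented, none accepted
    if "cowrie.command.input" in event_ids:
        return "active attack"         # commands were run on the remote shell
    return "passive attack"            # logged in, then disconnected


def classify_log(events):
    """Group events by session ID, then classify each session."""
    by_session = defaultdict(set)
    for event in events:
        by_session[event["session"]].add(event["eventid"])
    return {sid: classify_session(ids) for sid, ids in by_session.items()}
```

In this sketch `events` would be the parsed lines of a Cowrie JSON log, for example one `json.loads` call per line.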

Unwanted Traffic Indicators
Attacks Indicators

All Attacks
Successful Attacks

Malware Indicators

Interactions
Files

Malware Lists

Files
Distribution Sites

Credentials List
Protocols List
IP Addresses Lists

Sources of Unwanted Traffic
Sources of Scans
Sources of Attacks

Traffic Trends

Unwanted Traffic Trend
Scans Trend
Attacks Trend

Distributions by Country of Origin

Unwanted Traffic Distribution
Scans Distribution
Attacks Distribution

Distributions by Autonomous System of Origin

Unwanted Traffic Distribution
Scans Distribution
Attacks Distribution

Geolocation Map

A connection or session is defined as a unique interaction by a third-party entity with the honeypot. It captures all the activity (i.e., traffic flowing back and forth) between the initial connection and the termination of the session. Unwanted traffic connections in Table 1 are divided into scans and attacks. Also provided is the number of unique IP addresses from where unwanted traffic originated:

Attacks in Table 2 are broken down into successful (i.e., the username and password combination was accepted by the honeypot) and unsuccessful attacks. (NOTE: The successful/unsuccessful attack distribution depends on the honeypot configuration, as explained here.) Unique IP Addresses is the number of unique IP addresses from where attacks (of both kinds) originated:

Successful attacks in Table 3 are broken down into active (i.e., with commands or malware programs executed on the remote shell) and passive attacks. Unique IP Addresses is the number of unique IP addresses from where successful attacks originated:

In Table 4, the number of malware interactions is divided into the number of file creations through command redirection, and the number of file transfers through uploads and downloads. In general, redirection is used to create auxiliary files such as those that gather information on the target system, while uploads and downloads are used to bring malware payloads to the honeypot:

Table 5 shows the distribution of unique malware files (both created and transferred) and the URLs of malware download sites:

Table 6 shows the SHA-256 hashes, prevalence (expressed as the number of attacks in which they were used), type, and VirusTotal assessment of the 10 most-common files used in attacks. NA indicates that VirusTotal information was not available at the time of the check. This could mean either that the VirusTotal feed providers did not have information on the file (e.g., a new sample), or that the VirusTotal lookup could not be completed. Two files dominate the list: a846...f8f2 is an OpenSSH RSA public key file confirmed by the VirusTotal feed providers to be part of a malicious Trojan script. 01ba...546b is a one-byte line break file originally reported to be a false positive but now considered to be malicious. The file tries to overwrite /etc/hosts.deny in an attempt to allow all future connections:
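
The abbreviated hashes in Table 6 keep the first and last four hexadecimal digits of each SHA-256 digest. As an illustration, here is a minimal sketch that produces such a label with Python's standard `hashlib` module. The helper name `short_sha256` is hypothetical, and the sample input assumes that the "one-byte line break file" contains a single line-feed byte.

```python
import hashlib


def short_sha256(data: bytes) -> str:
    """Return the SHA-256 of data, abbreviated as 'xxxx...yyyy'."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{digest[:4]}...{digest[-4:]}"


# Assumption: the one-byte file holds a single line feed ("\n").
label = short_sha256(b"\n")
print(label)
```

Under that assumption the label comes out as 01ba...546b, which matches the second file described above.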

Table 7 shows the 10 most-used sites from where malware utilized in attacks was downloaded:

Table 8 shows the 10 most-used combinations of usernames and passwords attempted on the honeypot. In addition to expected terms (e.g., root, admin), some interesting credentials were detected, such as 345gs5662d34 and variants. The research community is unclear about the origin and purpose of those values. One proposal is that they are strings unique enough to serve as breadcrumbs that can be searched for to get a sense of how hacker activity is being tracked. Alternatively, the strings may be what more traditional or default username or password values turn into as ASCII text when typed on a keyboard set up for a non-Latin script:
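
A ranking like the one in Table 8 can be derived by counting username and password pairs across all login events. This is a minimal sketch, assuming Cowrie's JSON login events, which carry `username` and `password` fields; the function name `top_credentials` is hypothetical.

```python
from collections import Counter

LOGIN_EVENTS = ("cowrie.login.success", "cowrie.login.failed")


def top_credentials(events, n=10):
    """Return the n most-attempted (username, password) pairs with counts."""
    pairs = Counter(
        (event["username"], event["password"])
        for event in events
        if event.get("eventid") in LOGIN_EVENTS
    )
    return pairs.most_common(n)
```

Counting the pair rather than the username and password separately keeps combinations such as root/admin and admin/root distinct.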

The distribution of unwanted traffic connections by protocol is shown in Table 9. This confirms that, at least in my honeypot, SSH is preferred over Telnet as a vector of attack by more than 4:1:

Table 10 lists the 10 most-active IP addresses in terms of unwanted connections (scans or attacks). For each IP address, the country of operation and autonomous system number (ASN) are shown. In general, IP addresses that were the origin of very large numbers of unwanted connections had short bursts of activity, and most of them were cleaned up within 24 hours. There were exceptions to this. A handful of IP addresses showed sustained, high-volume activity for over 72 hours. Those IP addresses were reported to their respective service providers, with the clean-up taking place within 24 hours after reporting. Geolocating IP addresses is not an exact science, so this information should be handled with care. At this point in my research, I am not trying to ascertain whether an IP address was the true origin of unwanted traffic or an intermediary, such as a VPN or proxy:

Table 11 shows the information corresponding to the top 10 most-active IP addresses that were the origin of scans:

Table 12 shows the information corresponding to the top 10 most-active IP addresses that were the origin of attacks:

Chart 1 displays the trend over time of the unwanted traffic received on the honeypot, in each of its two categories of scans and attacks. The time between data points in the chart is hours. The volume of attacks is very volatile and shows different patterns over time. In contrast, there is a fairly consistent baseline of scans that started when the honeypot was first brought up and spans the entire data series. I speculate that this traffic is the result of always-on scanning over the entire IPv4 address space, in both its harmless and malicious variants. The periods of time where the number of attacks dropped and remained low were the result of controlled experiments that changed the IP address of the honeypot to understand the impact of the target IP address on traffic collection:
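
A trend series like the one in Chart 1 can be built by grouping sessions into fixed-width time bins and counting scans and attacks separately. This is a minimal sketch under stated assumptions: timestamps are ISO 8601 strings as written in Cowrie's JSON logs, each session has already been labelled as a scan or an attack, and the bin width (`bin_hours`, which should divide 24) is a hypothetical parameter rather than the value used for the chart.

```python
from collections import Counter
from datetime import datetime


def bin_sessions(sessions, bin_hours=1):
    """Count (timestamp, category) pairs per time bin and category."""
    counts = Counter()
    for timestamp, category in sessions:
        # Keep 'YYYY-MM-DDTHH:MM:SS'; drop fractional seconds and the 'Z'.
        when = datetime.strptime(timestamp[:19], "%Y-%m-%dT%H:%M:%S")
        # Round down to the start of the bin that contains this timestamp.
        start = when.replace(
            hour=when.hour - when.hour % bin_hours, minute=0, second=0
        )
        counts[(start, category)] += 1
    return counts
```

Each key in the result is a (bin start, category) pair, so the scans and attacks series can be plotted from the same counter.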

Chart 2 displays the trend of scans over time. Although there are spikes, a consistent baseline of scans can be observed over the entire data collection period. In future research, I will attempt to differentiate between harmless and malicious scans by cross-referencing the list of IP addresses from where scans originated with 1) a list of known IP addresses of reputable and/or known scanners and crawlers (e.g., Censys, Shodan, Googlebot, Bingbot, etc.), and 2) lists of known malicious IP addresses from services such as VirusTotal, AbuseIPDB, etc.:
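
The planned cross-referencing comes down to set operations on IP addresses. The sketch below illustrates the idea only, since this work has not been done yet. The function name, the bucket names, and the choice to report addresses found on both lists as a separate "conflicting" bucket are all assumptions, and the reference lists are assumed to have been loaded already as plain collections of IP address strings.

```python
def triage_scan_sources(scan_ips, known_benign, known_malicious):
    """Split scan source IPs into benign, malicious, conflicting, unknown."""
    scan_ips = set(scan_ips)
    on_benign = scan_ips & set(known_benign)
    on_malicious = scan_ips & set(known_malicious)
    return {
        "benign": on_benign - on_malicious,
        "malicious": on_malicious - on_benign,
        "conflicting": on_benign & on_malicious,  # on both lists
        "unknown": scan_ips - on_benign - on_malicious,
    }
```

Keeping the conflicting and unknown buckets visible avoids silently labelling an address as harmless just because it is missing from the malicious lists.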

Chart 3 displays the trend of attacks over time. Distinct traffic patterns have been observed. A pattern characterized by a low baseline of attacks was observed during the first month and a half of honeypot operation. I believe that this was a sort of "honeypot warmup" period during which the honeypot was discovered by bots performing reconnaissance. After about 6 weeks, the baseline of attacks increased significantly to a new level. This honeypot warmup pattern was observed every time the IP address of the honeypot changed: January 19, 2024; May 23, 2024; and November 7, 2024. It is interesting to note that the warmup time appears to decrease over time: 6 weeks, 4 weeks, 3 weeks, and 2 weeks. Sporadic, short-lived spikes of high attack activity were observed throughout. The traffic in those spikes originated from a very small number of IP addresses; some of the spikes were attributed to just one suspect IP address. The spike pattern seems to indicate that most of the IP addresses from where new high volumes of attacks originate are detected and cleaned up within 24 hours. I did report to the corresponding networks a small number of IP addresses responsible for high volumes of attacks over periods of time longer than 72 hours. All flagged IP addresses ceased their malicious activity (at least the activity visible through the honeypot) within 24 hours of reporting. Some, but not all, of the network operators contacted replied confirming their cleanup activities:
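
The reporting criterion described above (high attack volume sustained for more than 72 hours) can be expressed as a simple filter over attack records. This is a minimal sketch, not the procedure actually used: the function name and the `min_count` threshold are hypothetical, and the time between the first and last attack from an address is used as a rough stand-in for sustained activity.

```python
from datetime import datetime, timedelta


def sustained_sources(attacks, min_span=timedelta(hours=72), min_count=1000):
    """Return IPs with at least min_count attacks spread over more than min_span.

    attacks is an iterable of (ip_address, datetime) pairs.
    """
    first, last, count = {}, {}, {}
    for ip, when in attacks:
        first[ip] = min(first.get(ip, when), when)
        last[ip] = max(last.get(ip, when), when)
        count[ip] = count.get(ip, 0) + 1
    return sorted(
        ip
        for ip in count
        if count[ip] >= min_count and last[ip] - first[ip] > min_span
    )
```

Because only the first and last timestamps are compared, an address that is quiet for two days between two bursts would also be flagged; a stricter check would look at activity in each intermediate time bin.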

Chart 4 shows the top 10 countries that were the origin of unwanted traffic (scans and attacks):

Chart 5 shows the top 10 countries that were the origin of scans:

Chart 6 shows the top 10 countries that were the origin of attacks:

Chart 7 shows the top 10 autonomous systems that were the origin of unwanted traffic (scans and attacks):

Chart 8 shows the top 10 autonomous systems that were the origin of scans:

Chart 9 shows the top 10 autonomous systems that were the origin of attacks:

The following map shows the place of origin of the unwanted traffic (scans and attacks) received in my honeypot. The map is interactive. Click on the layer icon in the bottom-left corner to toggle the legend. With the legend on, click on the visualize icon next to each of the three types of traffic to view scans, attacks, all, or combinations of the above. Geolocation information was obtained by using the MaxMind GeoLite2 databases on the IP addresses from where unwanted traffic originated. As noted earlier, this information should be handled with caution, as it is subject to the usual limitations of IP address geolocation and attribution. It is also possible that the IP addresses were used with a VPN or proxy service to mask the true origin of the traffic.