Introduction
As part of my cybersecurity research project, the dashboard on this page presents a summary of the unwanted traffic information recorded by
my honeypot. I chose the SSH & Telnet Cowrie honeypot running on a Raspberry Pi 400 computer as my basic setup. The Cowrie data is
combined with information from two external providers: MaxMind GeoLite2
is used to obtain the geolocation and autonomous system details of the IP addresses from which unwanted traffic targeting the honeypot originates;
VirusTotal is used to obtain details on the malware files identified as used in attacks against the honeypot.
More details on my honeypot configuration can be found here.
The following definitions and assumptions are used throughout this document:
Unwanted traffic is all traffic to the honeypot that is not in response to requests made by the honeypot. "Unrequested traffic" would be an alternative
way to describe it. Unwanted traffic is divided into scans and attacks.
Scans are instances of unwanted traffic that do not attempt to log into the honeypot. Scans include harmless activity, such as that originating from
search engines, web crawlers, and non-malicious scanners of IP addresses and ports like those used for research. Scans also include more nefarious
activity such as the type of malicious reconnaissance that is often a prelude to attacks.
Attacks are instances of unwanted traffic that attempt to log into the honeypot by presenting credentials in the form of username and password pairs.
Successful attacks are those attacks that manage to log into the honeypot by presenting a username and password combination accepted by the honeypot.
Successful attacks get access to a remote shell on the honeypot.
Active attacks are the subset of successful attacks where the attacker, once on the remote shell, executes Linux commands or runs malware code. By contrast,
passive attacks terminate the connection to the remote shell right after logging in. This latter pattern is often seen in malicious reconnaissance work.
Created malware are malware samples/files that are created by the attacker on the honeypot through the use of commands and redirection during the attack session.
Transferred malware are external malware files that are placed on the honeypot through downloads originating from within the honeypot or uploads originating
from outside. The difference is that the URLs of the malware samples are known and captured during downloads, but not during uploads. Examples of commands used in malware
transfers are curl, wget, ftp, scp, etc.
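The definitions above can be expressed as a small classifier. The following is a minimal sketch, not the code behind this dashboard. It assumes events come from Cowrie's JSON log as dictionaries with `session` and `eventid` fields, and that the event IDs shown are the ones emitted by the installed Cowrie version. The function name and the label strings are hypothetical.

```python
from collections import defaultdict

# Cowrie event IDs as assumed to appear in its JSON log.
LOGIN_EVENTS = {"cowrie.login.success", "cowrie.login.failed"}
ACTIVITY_EVENTS = {
    "cowrie.command.input",
    "cowrie.session.file_download",
    "cowrie.session.file_upload",
}


def classify_sessions(events):
    """Label each session as a scan or as one of three kinds of attack."""
    seen = defaultdict(set)
    for event in events:
        seen[event["session"]].add(event["eventid"])

    labels = {}
    for session, ids in seen.items():
        if not ids & LOGIN_EVENTS:
            labels[session] = "scan"                 # no credentials presented
        elif "cowrie.login.success" not in ids:
            labels[session] = "unsuccessful attack"  # every login was rejected
        elif ids & ACTIVITY_EVENTS:
            labels[session] = "active attack"        # commands or files after login
        else:
            labels[session] = "passive attack"       # logged in, then disconnected
    return labels
```

Grouping by session first keeps the classification independent of the order in which events are read from the log.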
A connection or session is defined as a unique interaction by a third-party entity with the honeypot. It captures all the activity (i.e., traffic flowing back and
forth) between the initial connection and the termination of the session. Unwanted traffic connections in Table 1 are divided into scans and attacks.
Also provided is the number of unique IP addresses from where unwanted traffic originated:
Table 1 — Unwanted traffic indicators
2. Attacks Indicators
Attacks in Table 2 are broken down into successful (i.e., the username and password combination was accepted by the honeypot) and unsuccessful attacks.
(NOTE: The successful/unsuccessful attack distribution depends on the honeypot configuration, as explained here.)
Unique IP Addresses is the number of unique IP addresses from where attacks (of both kinds) originated:
Table 2 — Overall attacks indicators
Successful attacks in Table 3 are broken down into active (i.e., with commands or malware programs executed on the remote shell) and passive attacks.
Unique IP Addresses is the number of unique IP addresses from where successful attacks originated:
Table 3 — Successful attacks indicators
3. Malware Indicators
In Table 4, the number of malware interactions is divided into the number of file creations through command redirection, and the number of file transfers through uploads and
downloads. In general, redirection is used to create auxiliary files such as those that gather information on the target system, while uploads and downloads are used to bring malware
payloads to the honeypot:
Table 4 — Malware interactions indicators
Table 5 shows the distribution of unique malware files (both created and transferred) and the URLs of malware download sites:
Table 5 — Malware files indicators
4. Malware Lists
Table 6 shows the SHA-256 hashes, prevalence (expressed as the number of attacks in which they were used), type, and VirusTotal assessment of the 10 most-common files used
in attacks. NA indicates that VirusTotal information was not available at the time of the check. This could mean either that the VirusTotal feed providers did not have information on the
file (e.g., a new sample), or that the VirusTotal lookup could not be completed. Two files dominate the list: a846...f8f2 is an OpenSSH RSA public key file
confirmed by the VirusTotal feed providers to be part of a malicious Trojan script. 01ba...546b is a one-byte line break file originally reported to be a false
positive but now considered to be malicious. The file tries to overwrite /etc/hosts.deny in an attempt to allow all future connections:
Table 6 — Top 10 malware files
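As a small illustration of how the SHA-256 identifiers in Table 6 relate to file contents, the sketch below hashes a byte string and abbreviates the digest in the same first-four, last-four form used in the text. It assumes the one-byte line break file is a single LF character; under that assumption the abbreviated digest should match the 01ba...546b entry. The helper names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def short_form(digest: str) -> str:
    """Abbreviate a digest to its first and last four characters."""
    return f"{digest[:4]}...{digest[-4:]}"


# Assumption: the "one-byte line break file" holds a single LF byte.
newline_digest = sha256_hex(b"\n")
print(short_form(newline_digest))
```

Because the hash depends only on content, every attacker that writes the same single byte produces the same identifier, which would explain why such a file can rank high in prevalence.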
Table 7 shows the 10 most-used sites from where malware utilized in attacks was downloaded:
Table 7 — Top 10 malware distribution sites
5. Credentials List
Table 8 shows the 10 most-used combinations of usernames and passwords attempted on the honeypot. In addition to expected terms (e.g.,
root, admin), some interesting credentials were detected, such as
345gs5662d34 and variants. The research community is unclear about the origin and purpose of those values. It has been
proposed that they might be strings unique enough to serve as breadcrumbs that can be searched for to get a sense of how hacker activity is being
tracked. Or perhaps the strings are how more traditional or default username or password values come out as ASCII text when typed on a keyboard for a
non-Latin script:
Table 8 — Top 10 username/password combinations
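A ranking like the one in Table 8 can be sketched with a counter over login events. This is an illustration only; it assumes Cowrie JSON log events carrying `eventid`, `username`, and `password` fields, and the function name is hypothetical.

```python
from collections import Counter

LOGIN_EVENTS = ("cowrie.login.success", "cowrie.login.failed")


def top_credentials(events, n=10):
    """Return the n most frequent (username, password) pairs with their counts."""
    pairs = Counter(
        (event["username"], event["password"])
        for event in events
        if event.get("eventid") in LOGIN_EVENTS
    )
    return pairs.most_common(n)
```

Both accepted and rejected logins are counted here, since the table concerns combinations attempted rather than combinations that worked.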
6. Protocols List
The distribution of unwanted traffic connections by protocol is shown in Table 9. This confirms
that, at least in my honeypot, SSH is preferred over Telnet as a vector of attack by more than 4:1:
Table 9 — Distribution of unwanted connections by protocol
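The protocol split and the SSH-to-Telnet ratio could be derived as in the following sketch. It assumes each `cowrie.session.connect` event in the JSON log carries a `protocol` field with the value `ssh` or `telnet`; the function names are hypothetical.

```python
from collections import Counter


def protocol_counts(events):
    """Count connections per protocol from session-connect events."""
    return dict(
        Counter(
            event["protocol"]
            for event in events
            if event.get("eventid") == "cowrie.session.connect"
        )
    )


def ratio(counts, numerator="ssh", denominator="telnet"):
    """Return numerator/denominator connection counts, or None if undefined."""
    if not counts.get(denominator):
        return None
    return counts.get(numerator, 0) / counts[denominator]
```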
7. IP Addresses Lists
Table 10 lists the top 10 most-active IP addresses in terms of unwanted connections (scans or attacks). For
each IP address, the country of operation and autonomous system number (ASN) are shown. In general, IP addresses that were the origin of very large numbers of
unwanted connections had short bursts of activity, and most of them were cleaned up within 24 hours. There were exceptions to this. A handful of IP addresses
showed sustained, high-volume activity for over 72 hours. Those IP addresses were reported to their respective service providers, with the
clean-up taking place within 24 hours after reporting. Geolocating IP addresses is not an exact science, so this information should be handled with care. At this
point in my research, I am not trying to ascertain whether an IP address was the absolute origin of unwanted traffic, or that of an intermediary, such as a
VPN or proxy:
Table 10 — Details of top 10 IP addresses that were the origin of unwanted traffic
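One way to spot the sustained, high-volume sources described above is to compare each address's first and last sighting. The sketch below is a crude proxy under stated assumptions: events are Cowrie `cowrie.session.connect` records with `src_ip` and ISO 8601 `timestamp` fields, and the `min_connections` threshold is a hypothetical value, not one used on this page.

```python
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def sustained_sources(events, min_span=timedelta(hours=72), min_connections=1000):
    """Return IPs seen at least min_connections times over at least min_span."""
    first, last, count = {}, {}, {}
    for event in events:
        if event.get("eventid") != "cowrie.session.connect":
            continue
        ip, ts = event["src_ip"], parse_ts(event["timestamp"])
        first[ip] = min(first.get(ip, ts), ts)
        last[ip] = max(last.get(ip, ts), ts)
        count[ip] = count.get(ip, 0) + 1
    return sorted(
        ip
        for ip in count
        if count[ip] >= min_connections and last[ip] - first[ip] >= min_span
    )
```

A long gap between first and last sighting does not prove continuous activity, so candidates returned by this check would still need a look at their hourly volumes before reporting.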
Table 11 shows the information corresponding to the top 10 most-active IP addresses that were the origin of scans:
Table 11 — Details of top 10 IP addresses that were the origin of scans
Table 12 shows the information corresponding to the top 10 most-active IP addresses that were the origin of attacks:
Table 12 — Details of top 10 addresses that were the origin of attacks
8. Traffic Trends
Chart 1 displays the trend over time of the unwanted traffic received on the honeypot, in each of its two categories of scans and attacks. The time between
data points in the chart is hours. The volume of attacks is very volatile and shows different patterns over time. In contrast,
there is a fairly consistent baseline of scans that started when the honeypot was first brought online and spans the entire data series. I speculate that this traffic is the result
of always-on scanning over the entire IPv4 address space, in both its harmless and malicious variants. The periods of time where the number of attacks dropped and
remained low were the result of controlled experiments that changed the IP address of the honeypot to understand the impact of the target IP address on traffic collection:
Chart 1 — Trend of unwanted traffic
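A time series like the one in Chart 1 can be built by counting connections per fixed-width time bucket. The sketch below assumes ISO 8601 timestamps as written in Cowrie's JSON log; the `bucket_hours` parameter is hypothetical and would be set to whatever interval the chart uses.

```python
from collections import Counter
from datetime import datetime, timezone


def bucket_counts(timestamps, bucket_hours=1):
    """Count timestamps per fixed-width bucket, keyed by bucket start in UTC."""
    width = bucket_hours * 3600  # bucket width in seconds
    counts = Counter()
    for value in timestamps:
        ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
        start = int(ts.timestamp()) // width * width  # floor to bucket start
        counts[datetime.fromtimestamp(start, tz=timezone.utc)] += 1
    return dict(sorted(counts.items()))
```

Running it once on scan timestamps and once on attack timestamps yields the two series; buckets with no traffic are simply absent and would need to be filled with zeros before plotting.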
Chart 2 displays the trend of scans over time. Although there are spikes, a consistent baseline of scans can be observed over the entire data collection
period. In future research, I will attempt to differentiate between harmless and malicious scans by cross-referencing the list of IP addresses from where scans
originated with 1) a list of known IP addresses of reputable and/or known scanners and crawlers (e.g., Censys, Shodan, Googlebot, Bingbot, etc.), and 2)
lists of known malicious IP addresses from services such as VirusTotal, AbuseIPDB, etc.:
Chart 2 — Trend of scans
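The cross-referencing idea described above might look like the following sketch. It assumes the reputable-scanner list is available as CIDR ranges and the malicious list as individual addresses; both inputs, the function name, and the label strings are hypothetical, and no particular provider's data format is implied.

```python
import ipaddress


def label_scanners(scan_ips, benign_networks, malicious_ips):
    """Label each scanning IP as 'malicious', 'benign', or 'unknown'."""
    benign = [ipaddress.ip_network(net) for net in benign_networks]
    malicious = {ipaddress.ip_address(ip) for ip in malicious_ips}

    labels = {}
    for raw in scan_ips:
        ip = ipaddress.ip_address(raw)
        if ip in malicious:
            labels[raw] = "malicious"   # a blocklist hit takes precedence
        elif any(ip in net for net in benign):
            labels[raw] = "benign"      # inside a known scanner's range
        else:
            labels[raw] = "unknown"
    return labels
```

Checking the malicious list first is a design choice: an address inside a reputable range that also appears on a blocklist is treated as suspect.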
Chart 3 displays the trend of attacks over time. Distinct traffic patterns have been observed. A pattern characterized by a low baseline of attacks was
observed during the first month and a half of honeypot operation. I believe that this was a sort of "honeypot warmup" period during which it was discovered by bots
performing reconnaissance. After about 6 weeks, the baseline of attacks increased significantly to a new level. This honeypot warmup pattern was observed every time
the IP address of the honeypot changed: January 19, 2024; May 23, 2024; and November 7, 2024. It is interesting to note that the warmup time appears to decrease
over time: 6 weeks, 4 weeks, 3 weeks, and 2 weeks. Sporadic, short-lived spikes of high attack activity were observed throughout. The traffic in those spikes originated
from a very small number of IP addresses; some of the spikes were attributed to just one suspect IP address. The spike pattern seems to indicate that most of the IP
addresses from which new high volumes of attacks originate are detected and cleaned up within 24 hours. I did report to the corresponding networks a small number of IP
addresses responsible for high volumes of attacks over periods longer than 72 hours. All flagged IP addresses ceased their malicious activity (at
least the activity visible through the honeypot) within 24 hours of reporting. Some, but not all, of the network operators contacted replied confirming
their cleanup activities:
Chart 3 — Trend of attacks
9. Distributions by Country of Origin
Chart 4 shows the top 10 countries that were the origin of unwanted traffic (scans and attacks):
Chart 4 — Top 10 countries of origin of unwanted traffic
Chart 5 shows the top 10 countries that were the origin of scans:
Chart 5 — Top 10 countries of origin of scans
Chart 6 shows the top 10 countries that were the origin of attacks:
Chart 6 — Top 10 countries of origin of attacks
10. Distributions by Autonomous System of Origin
Chart 7 shows the top 10 autonomous systems that were the origin of unwanted traffic (scans and attacks):
Chart 7 — Top 10 ASNs of unwanted traffic
Chart 8 shows the top 10 autonomous systems that were the origin of scans:
Chart 8 — Top 10 ASNs of scans
Chart 9 shows the top 10 autonomous systems that were the origin of attacks:
Chart 9 — Top 10 ASNs of attacks
11. Geolocation Map
The following map shows the place of origin of the unwanted traffic (scans and attacks) received by my honeypot. The map is interactive. Click on the layer
icon in the bottom-left corner to toggle the legend. With the legend on, click on the visualize icon next to each of the three types of traffic to view scans,
attacks, all, or combinations of the above.
Geolocation information was obtained by using the MaxMind GeoLite2 databases on the IP addresses from where unwanted traffic originated.
As noted earlier, this information should be handled with caution, as it is subject to the usual limitations of IP address geolocation and attribution.
It is also possible that the IP addresses were used with a VPN or proxy service to mask the true origin of the traffic.
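For reference, a lookup against the GeoLite2 databases can be sketched with MaxMind's `geoip2` Python package. This is an assumption about tooling, not a description of how this page is built: the sketch supposes the City and ASN editions of GeoLite2 are used, and the function names and file paths are hypothetical.

```python
def open_readers(city_path, asn_path):
    """Open GeoLite2 City and ASN files (needs the third-party geoip2 package)."""
    import geoip2.database  # imported lazily so enrich() works with any reader

    return geoip2.database.Reader(city_path), geoip2.database.Reader(asn_path)


def enrich(ip, city_reader, asn_reader):
    """Return country, coordinates, and AS details for one IP address."""
    city = city_reader.city(ip)
    asn = asn_reader.asn(ip)
    return {
        "ip": ip,
        "country": city.country.iso_code,
        "latitude": city.location.latitude,
        "longitude": city.location.longitude,
        "asn": asn.autonomous_system_number,
        "as_org": asn.autonomous_system_organization,
    }
```

Addresses missing from a database raise `geoip2.errors.AddressNotFoundError`, so a caller processing a full log would want to catch that and record the address as unlocated.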