Honeypot Setup

As part of my cybersecurity research, I implemented a fully functional, completely automated honeypot data collection and analysis environment in my home lab. This page provides step-by-step directions for my installation and configuration work:


  1. The Honeypot
    1. Choose the honeypot host
    2. Install the system dependencies
    3. Create a user account
    4. Get the Cowrie code
    5. Set up a Python virtual environment
    6. Configure Cowrie
    7. Customize Cowrie
    8. Forward listening ports
    9. Start Cowrie
  2. The Data Repository
    1. Wrangle the Data
    2. Install MySQL
    3. Create a MySQL database for Cowrie
    4. Install Apache, PHP and phpMyAdmin
    5. Import the Cowrie data into the MySQL database
    6. Configure the MySQL REST Service (MRS)
    7. Verify the MRS Core API
    8. Reference: The MRS JSON filter grammar

My honeypot environment

Part 1: The Honeypot

I chose Cowrie as the key component of my unwanted traffic detection infrastructure. Cowrie is a superb medium/high interaction honeypot designed to log brute-force attempts and shell interactions launched by attackers over both SSH and Telnet. Cowrie is very popular among both researchers and enthusiasts due to an optimal combination of rich capabilities and ease of use. It is open-source and is backed by an active community led by Michel Oosterhof, the project's maintainer, creator, and main developer.

1. Choose the honeypot host

You need to start by choosing a Linux system on which to install the honeypot. Since Cowrie is very efficient in its resource consumption, I opted for a tiny Raspberry Pi 400 computer as the Cowrie host.

2. Install the system dependencies

Install the system dependencies on the Cowrie host:


 $ sudo apt-get install git python3-virtualenv libssl-dev libffi-dev build-essential libpython3-dev python3-minimal authbind virtualenv 
      

3. Create a user account

Creating a user account without a password is not an absolute requirement, but it is recommended by the Cowrie authors:


 $ sudo adduser --disabled-password cowrie
 Adding user 'cowrie' ...
 Adding new group 'cowrie' (1002) ...
 Adding new user 'cowrie' (1002) with group 'cowrie' ... 
 Changing the user information for cowrie
 Enter the new value, or press ENTER for the default
 Full Name []:
 Room Number []:
 Work Phone []:
 Home Phone []:
 Other []:
 Is the information correct? [Y/n]

 $ sudo su - cowrie
      

4. Get the Cowrie code

Clone the cowrie project from GitHub:


 $ git clone https://github.com/cowrie/cowrie
 Cloning into 'cowrie'...
 remote: Counting objects: 2965, done.
 remote: Compressing objects: 100% (1025/1025), done.
 remote: Total 2965 (delta 1908), reused 2962 (delta 1905), pack-reused 0 
 Receiving objects: 100% (2965/2965), 3.41 MiB | 2.57 MiB/s, done.
 Resolving deltas: 100% (1908/1908), done.
 Checking connectivity... done.

 $ cd cowrie
       

5. Set up a Python virtual environment

Technically speaking, this step is not needed, but it is highly recommended to ensure that package updates on the Cowrie host system will not cause incompatibilities with the honeypot operation:


 $ pwd
 /home/cowrie/cowrie
 
 $ python -m venv cowrie-env
 New python executable in ./cowrie/cowrie-env/bin/python 
 Installing setuptools, pip, wheel...done.
      

After you install the virtual environment, activate it and install required packages:


 $ source cowrie-env/bin/activate
 (cowrie-env) $ python -m pip install --upgrade pip
 (cowrie-env) $ python -m pip install --upgrade -r requirements.txt 
      

6. Configure Cowrie

The Cowrie configuration is stored in the cowrie/etc/cowrie.cfg file. To run the honeypot with a standard configuration, there is no need to change anything. By default, Cowrie accepts traffic over SSH only. I wanted the honeypot to also accept traffic over Telnet and to use ports 2022 for SSH and 2023 for Telnet, so I modified the configuration file as follows:


 [telnet]
 enabled = true
 ...
 [proxy]
 backend_ssh_port = 2022
 backend_telnet_port = 2023 
      

I also wanted to change the default user configuration and the list of credentials accepted to log in to the remote shell. These changes are made by modifying the cowrie/etc/userdb.txt file. Each line in the file consists of three fields separated by the : character, where the first field is the username, the second field is unused (conventionally set to x), and the third field is the password specification: a literal string, a regular expression enclosed in / characters, or the wildcard * to match anything, with a ! prefix rejecting whatever the field matches.

As an example, the following settings configure a username admin that accepts all passwords except 1) the case-sensitive string admin, 2) passwords made up exclusively of numeric characters, and 3) the case-insensitive string honeypot:

 admin:x:!admin
 admin:x:!/^[0-9]+$/
 admin:x:!/honeypot/i 
 admin:x:*
      

7. Customize Cowrie

Optionally, you can change the look and feel of the Cowrie interface to make it look more realistic. A number of files allow you to do that, including the hostname setting in etc/cowrie.cfg, the fake filesystem layout in share/cowrie/fs.pickle, the file contents under honeyfs/, and the canned command outputs under share/cowrie/txtcmds/.
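As an illustration, here are two quick cosmetic tweaks. I believe the hostname key lives in the [honeypot] section of etc/cowrie.cfg and honeyfs/etc/motd holds the banner shown after login, but treat the exact paths as assumptions:

 # Change the hostname reported by the fake shell
 $ sed -i 's/^hostname = .*/hostname = files01/' etc/cowrie.cfg

 # Replace the message of the day shown after a successful login
 $ echo "Welcome to Ubuntu 22.04 LTS" > honeyfs/etc/motd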

8. Forward listening ports

As we saw above, I configured Cowrie to accept SSH traffic over port 2022 and Telnet traffic over port 2023. In order to preserve the fidelity of the decoy, I opened ports 22 and 23 on the router and forwarded their traffic to ports 2022 and 2023, respectively, on the system hosting Cowrie.
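If your router cannot do this, the same effect can be achieved with a NAT redirect on the Cowrie host itself. A minimal iptables sketch, assuming the external interface is eth0:

 # Redirect inbound SSH and Telnet traffic to Cowrie's listening ports
 $ sudo iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 22 -j REDIRECT --to-port 2022
 $ sudo iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 23 -j REDIRECT --to-port 2023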

9. Start Cowrie

Start the honeypot by calling the cowrie/bin/cowrie executable that is part of the Cowrie distribution. If a virtual environment is already activated, it is used as-is; otherwise, Cowrie will attempt to load the environment called cowrie-env that we created earlier:


 bin/cowrie start
 Activating virtualenv "cowrie-env"
 Starting cowrie with extra arguments [] ... 
      
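To confirm that the honeypot is up, you can query its status and watch the JSON event feed; the locations below assume the default Cowrie layout:

 # Check the daemon status
 $ bin/cowrie status

 # Watch events arrive in real time
 $ tail -f var/log/cowrie/cowrie.json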

Part 2: The Data Repository

I opted for MySQL running on Windows as the warehouse of my Cowrie-generated data because it's a powerful and mature technology, offers nice API-based data querying and manipulation functionality, can be installed locally, and — best of all — is open-source and free. Originally, I went with a Splunk Enterprise solution. Although proprietary, Splunk is free if you keep your data volume under 500 MB per day. That worked fine for about six months, but then one day the honeypot experienced a spike in traffic of 2.5 TB, which allowed me to see the ugly side of the freemium business model and Splunk's heavy-handed approach to user management: "pay up or be blocked for a month." As both options were unacceptable to me, I ditched Splunk, moved to MySQL, and never looked back. If you are interested, my old Splunk-based installation and configuration notes are available here.

Cowrie offers a set of instructions to send its data to a MySQL database that are significantly simpler than the process described on this page. Those instructions are a great starting point. Ultimately, I decided to go with a custom configuration to go beyond the bare-bones capabilities offered out-of-the-box by Cowrie. Specifically, I needed the ability to import a modified version of the Cowrie feed and a way to analyze the data through an API. I also needed to run MySQL on a separate Windows host. The instructions below provide that functionality.

1. Wrangle the Data

You can use the Cowrie data stream as it comes from the honeypot in the form of daily JSON files. This is essentially an event-based feed, where every session or unwanted interaction with the honeypot is broken down into a series of events that constitute an attack: e.g., connect, attempt to log in, execute commands on the shell, create or upload/download files to the honeypot, disconnect, etc.
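For illustration, this is roughly what two consecutive raw events look like; the values are invented, but the eventid types and field names are standard Cowrie:

 $ head -2 var/log/cowrie/cowrie.json
 {"eventid":"cowrie.session.connect","src_ip":"203.0.113.7","src_port":51522,"dst_port":2022,"session":"a1b2c3d4e5f6","protocol":"ssh","timestamp":"2024-03-01T04:12:09.123456Z"}
 {"eventid":"cowrie.login.failed","username":"root","password":"123456","session":"a1b2c3d4e5f6","timestamp":"2024-03-01T04:12:11.654321Z"}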

I opted for an alternative view of the unwanted traffic based on sessions, not events. To do this, I translated the default Cowrie feed into a new one where the main unit of information, which will later be stored as a row in a SQL database, is the session rather than the event. This required merging all the events belonging to the same session into a single "row". This work was part of the data normalization I did when I was using Splunk as my data repository; you can find the details here. As a reminder, data normalization is the process of reorganizing or 'massaging' the data so that it is easier and faster to work with. It involves reducing or eliminating data redundancy and implementing the data dependencies in a way that respects the constraints of the underlying database that holds the data. This allows the data to be queried and analyzed more easily. Splunk does not use a conventional database, so the normalization that produced the new session-based feed was all that was needed. But MySQL, our new data store, uses a SQL database, and for SQL data, normalization often requires splitting large tables into smaller ones and linking them through relationships. And that's exactly what we had to do. To understand why, consider a key aspect of our new session-based feed: some of its fields hold a single value per session (e.g., the source IP address or the session start time), while others hold lists of values (e.g., the credentials tried during the session).

Single-valued fields are easy to handle and require no additional processing: they map directly to columns in a SQL table. The SQL rules, however, prohibit or greatly restrict the use of lists in table columns. The solution to this challenge is to normalize the data in the Cowrie feed for SQL as follows: the single-valued fields become columns of a primary sessions table, while each list-valued field is moved into a secondary table of its own, whose rows are linked back to the parent session through its ID, as sketched below.
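Here is a minimal schema sketch illustrating the arrangement; the table and column names are placeholders, and the real schema carries many more fields:

$ mysql -u cowrie -p honeypot <<'EOF'
-- Primary table: one row per session, single-valued fields only
CREATE TABLE sessions (
   id           BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
   session      CHAR(12)     NOT NULL,   -- Cowrie's session identifier
   src_ip       VARCHAR(45)  NOT NULL,
   src_country  VARCHAR(64),
   start_time   DATETIME     NOT NULL,
   commands     TEXT                     -- kept as one long string (see below)
);

-- Secondary table: one row per login attempt, linked back to its session
CREATE TABLE credentials (
   session_id   BIGINT UNSIGNED NOT NULL,
   username     VARCHAR(128),
   password     VARCHAR(128),
   login        BOOLEAN,
   FOREIGN KEY (session_id) REFERENCES sessions(id)
);
EOF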

Notice that the commands field is a bit of an anomaly. Technically speaking, it's a comma-separated list of Linux commands. But, as I use it as a single entity, I'm currently treating it as a single-valued long string field (i.e., as a column in the primary table). I may change this arrangement later and implement the commands field as a separate secondary table by breaking it down into its individual Linux commands.

Finally, I decided to leave out of the new feed a few fields in the original data stream that add limited value to my research.

2. Install MySQL

I installed the Windows version of MySQL from the MySQL Community Downloads area. At the time of my install, the latest available version of the installer was 8.0.36. Installation is straightforward and self-explanatory.

After the above installation and configuration is done, the Windows MySQL service will start, and both the MySQL Shell and MySQL Workbench will automatically launch.
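A quick way to confirm that the server is reachable, assuming the mysql client is on your PATH, is to ask for its version from a terminal:

 $ mysql -u root -p -e "SELECT VERSION();"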

3. Create a MySQL database for Cowrie

Next, you need to create a MySQL database to host your Cowrie data, together with a dedicated account for the import job to connect as.
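The following is a minimal sketch of what that entails, using the mysql client; the database name, account name, and password are placeholders:

$ mysql -u root -p <<'EOF'
-- Database and dedicated account for the honeypot data
CREATE DATABASE honeypot;
CREATE USER 'cowrie'@'localhost' IDENTIFIED BY 'CHANGE_ME';
GRANT ALL PRIVILEGES ON honeypot.* TO 'cowrie'@'localhost';
FLUSH PRIVILEGES;
EOF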

Congratulations! You now have a MySQL database ready to ingest your Cowrie data.

4. Install Apache, PHP and phpMyAdmin

This step is not needed, but if, like me, you want the ability to view your MySQL-hosted Cowrie data through a web interface and are used to configuring MySQL with the good old phpMyAdmin, you may want to consider it. I find that the easiest way to install phpMyAdmin on Windows is by using the XAMPP (cross-platform Apache, MariaDB, PHP, Perl) distribution by Apache Friends. You can download the Windows installer from their website. At the time of my install, the latest available version of the Windows installer was 8.2.12. Installation is easy.

At this point, Apache, PHP, and phpMyAdmin, together with the useful XAMPP Control Panel, are installed on your system. We now need to tie the earlier MySQL installation together with the new phpMyAdmin installation; in my case, this meant editing phpMyAdmin's config.inc.php so that it points at the MySQL server installed earlier rather than at XAMPP's bundled MariaDB.

5. Import the Cowrie data into the MySQL database

Instead of a more elaborate message broker-based architecture, I implemented a simpler system that takes advantage of the fact that Cowrie saves the traffic it collects in daily JSON files. I wrote a program that runs every day under the Windows Task Scheduler with the following high-level logic: fetch the previous day's JSON file from the honeypot, merge its events into one record per session, split the list-valued fields into their secondary tables, and insert the results into the MySQL database. The sketch below illustrates the idea.
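This is a minimal sketch of that pipeline rather than my actual program; the host name, file paths, jq transformation, and table layout are all assumptions for illustration (the mysql client reads the password from MYSQL_PWD or ~/.my.cnf):

#!/bin/bash
# Sketch of a daily Cowrie-to-MySQL import job; all names are placeholders
set -euo pipefail

DAY=$(date -d yesterday +%Y-%m-%d)
SRC="cowrie@honeypot:cowrie/var/log/cowrie/cowrie.json.${DAY}"
TMP="/tmp/cowrie-${DAY}.json"

# 1. Collect the previous day's JSON feed from the honeypot
scp "${SRC}" "${TMP}"

# 2. Merge the events of each session into a single TSV row
jq -r -s 'group_by(.session)[] |
          [ .[0].session,
            .[0].src_ip,
            (.[0].timestamp[0:19] | sub("T"; " ")) ] | @tsv' \
   "${TMP}" > "/tmp/sessions-${DAY}.tsv"

# 3. Bulk-load the rows into the primary table
mysql --local-infile=1 -u cowrie honeypot -e \
      "LOAD DATA LOCAL INFILE '/tmp/sessions-${DAY}.tsv'
       INTO TABLE sessions (session, src_ip, start_time);"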

Hooray! We now have our Cowrie data — nicely normalized into sessions — available in a MySQL database. Take a few minutes to celebrate.

6. Configure the MySQL REST Service (MRS)

We now have a MySQL database that can be accessed through three different interfaces: MySQL Shell, MySQL Workbench, and the browser-based phpMyAdmin. In this next step, we'll add a fourth one in the form of a Visual Studio Code extension. Although technically speaking this is not required, it will significantly simplify the process of setting up access to the Cowrie data on MySQL through a REST API. For this, we'll use the MySQL REST Service (MRS), a technology that enables fast and secure HTTPS access to your MySQL data. Implemented as a MySQL Router feature, MRS provides the ability to publish RESTful web services for interacting with the data stored in MySQL solutions. I use it to programmatically extract the Cowrie data stored in MySQL as part of my analytics workflow. Although MRS can be configured directly from the MySQL Shell, it's much easier to use the Visual Studio Code extension. I'm assuming that you are familiar with Visual Studio Code and use it for some or all of your code editing activities, so we won't go over its installation, which is straightforward.

We are done! Our data should now be available at the https://localhost:8443/honeypot/v1 URI through two endpoints: /sessions for the session records and /attempts for the login attempts.

7. Verify the MRS Core API

As the final step, we are going to verify that the Cowrie data we imported into our MySQL database is indeed available through the endpoints. From Visual Studio Code, right-click DATABASE CONNECTIONS\Cowrie\MySQL REST Service\honeypot\v1\sessions and select Open REST Object Request Path in Web Browser. A new tab should open in your browser displaying the first 25 Cowrie sessions/attacks in JSON format.

For this to work, make sure that a MySQL Router instance is running. You can start MySQL Router from a command-line terminal with the aid of the following Bash script:


#!/bin/bash
# Start MySQL Router (Cygwin/MSYS bash on Windows; adjust the paths to your install)

# MRS-generated router configuration and the router bundled with the VS Code extension
declare +i -r MSRCONF="c:/Users/YOUR_WINDOWS_USER_ID/AppData/Roaming/MySQL/mysqlsh-gui/plugin_data/mrs_plugin/router_configs/1/mysqlrouter"
declare +i -r MSRPATH="c:/Users/YOUR_WINDOWS_USER_ID/.vscode/extensions/oracle.mysql-shell-for-vs-code-1.14.2-win32-x64/router"
declare +i    pid=""
export        PATH="${PATH}:${MSRPATH}/lib"
export        ROUTER_PID="${MSRCONF}/mysqlrouter.pid"

# ps -W lists Windows processes under Cygwin; grab the PID if the router is already up
pid=$(ps -W | grep mysqlrouter | awk '{print $1}')
if [ ! "${pid}" = "" ]
then
   echo "MySQL Router is already running with PID = ${pid}"
   exit 0
else
   # Launch the router in the background, detached from this shell
   "${MSRPATH}/bin/mysqlrouter.exe" -c "${MSRCONF}/mysqlrouter.conf" > /dev/null 2>&1 &
   disown %-
   sleep 1   # give the router a moment to initialize before re-checking
   pid=$(ps -W | grep mysqlrouter | awk '{print $1}')
   if [ ! "${pid}" = "" ]
   then
      echo "MySQL Router is running with PID = ${pid}"
      exit 0
   else
      echo "Error: MySQL Router could not be started"
      exit 1
   fi
fi
      

For completeness, you can stop MySQL Router with the following script:
 

#!/bin/bash
# Stop MySQL Router (Cygwin/MSYS bash on Windows; adjust the path to your install)

declare +i -r MSRCONF="c:/Users/YOUR_WINDOWS_USER_ID/AppData/Roaming/MySQL/mysqlsh-gui/plugin_data/mrs_plugin/router_configs/1/mysqlrouter"
declare +i    pid=""

pid=$(ps -W | grep mysqlrouter | awk '{print $1}')
if [ ! "${pid}" = "" ]
then
   echo "MySQL Router is running with PID = ${pid}"
   # Cygwin's /usr/bin/kill -f force-terminates a Windows process;
   # 'env' bypasses the shell builtin, which lacks the -f option
   env kill -f "${pid}" > /dev/null 2>&1
   sleep 1   # give the process a moment to exit before re-checking
   pid=$(ps -W | grep mysqlrouter | awk '{print $1}')
   if [ "${pid}" = "" ]
   then
      # Remove the stale PID file left behind by the router
      rm -f "${MSRCONF}/mysqlrouter.pid"
      echo "MySQL Router is no longer running"
      exit 0
   else
      echo "Error: MySQL Router could not be stopped"
      exit 1
   fi
else
   echo "MySQL Router is not running"
   exit 0
fi
      

We can also check the API using the curl command from a terminal window or shell script. The following are examples of curl command invocations that extract Cowrie data from the MySQL database using the MySQL REST API:


# Show the first 25 Cowrie sessions
curl -s -k 'https://localhost:8443/honeypot/v1/sessions' | jq
 
# Show the first 25 scans, server filtering
curl -s -k 'https://localhost:8443/honeypot/v1/sessions' -G --data-urlencode 'q={"type":"scan"}' | jq

# Show the first 25 attacks, server filtering
curl -s -k 'https://localhost:8443/honeypot/v1/sessions' -G --data-urlencode 'q={"type":"attack"}' | jq

# Show the first 25 attacks with successful logins, server filtering
curl -s -k 'https://localhost:8443/honeypot/v1/attempts' -G --data-urlencode 'q={"login":"true"}' | jq

# Show the first 25 attacks with successful logins, client filtering
curl -s -k 'https://localhost:8443/honeypot/v1/sessions' | jq '.items[] | select(.credentials?[]?.login == "true")'

# Show the first 25 attacks with unsuccessful logins, client filtering
curl -s -k 'https://localhost:8443/honeypot/v1/sessions' | jq '.items[] | select(.credentials? | length > 0 and all(.login == "false"))'

# Show sessions 10,001 to 10,500, server filtering
curl -s -k 'https://localhost:8443/honeypot/v1/sessions?offset=10000&limit=500' | jq

# Show session # 12,345, server filtering
curl -s -k 'https://localhost:8443/honeypot/v1/sessions' -G --data-urlencode 'q={"id":12345}' | jq

# Show the first 25 sessions that originated from IP addresses operating in Spain, server filtering
curl -s -k 'https://localhost:8443/honeypot/v1/sessions' -G --data-urlencode 'q={"srcCountry":"Spain"}' | jq

# Show the first 25 sessions that originated from IP addresses operating in either Singapore or France, server filtering
curl -s -k 'https://localhost:8443/honeypot/v1/sessions' -G --data-urlencode 'q={"$or":[{"srcCountry":"Singapore"},{"srcCountry":"France"}]}' | jq
      

And that's all. We covered a lot of ground and should be ready to go. Happy threat hunting!

Reference: The MRS JSON filter grammar

The last example in the previous section shows how to combine filter clauses with a logical operator (in the example, $or).
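As one more illustration, here is a hypothetical query that combines a comparison operator with a sort specification; srcCountry appears in my feed, while startTime is an assumed column name:

# Show the first 25 sessions from Spain, newest first, server filtering
curl -s -k 'https://localhost:8443/honeypot/v1/sessions' -G \
     --data-urlencode 'q={"srcCountry":{"$eq":"Spain"},"$orderby":{"startTime":"DESC"}}' | jq

The complete specification of the JSON filter grammar supported by the MySQL REST Service is as follows: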


 FilterObject { orderby , asof, wmembers }

 orderby
    "$orderby": {orderByMembers}

 orderByMembers
    orderByProperty
    orderByProperty , orderByMembers

 orderByProperty
    columnName : sortingValue

 sortingValue
    "ASC"
    "DESC"
    "-1"
    "1"
    -1
    1

 asof
    "$asof": date
    "$asof": "datechars"
    "$asof": scn
    "$asof": +int

 wmembers
    wpair
    wpair , wmembers

 wpair
    columnProperty
    complexOperatorProperty

 columnProperty
    columnName : string
    columnName : number
    columnName : date
    columnName : simpleOperatorObject
    columnName : complexOperatorObject
    columnName : [complexValues]

 columnName
    "\p{Alpha}[[\p{Alpha}]]([[\p{Alnum}]#$_])*$"

 complexOperatorProperty
    complexKey : [complexValues]
    complexKey : simpleOperatorObject 

 complexKey
    "$and"
    "$or"

 complexValues
    complexValue , complexValues

 complexValue
    simpleOperatorObject
    complexOperatorObject
    columnObject

 columnObject
    {columnProperty}

 simpleOperatorObject
    {simpleOperatorProperty}

 complexOperatorObject
    {complexOperatorProperty}

 simpleOperatorProperty
    "$eq" : string | number | date
    "$ne" : string | number | date
    "$lt" :  number | date
    "$lte" : number | date
    "$gt" : number | date
    "$gte" : number | date
    "$instr" : string 
    "$ninstr" : string
    "$like" : string
    "$null" : null
    "$notnull" : null
    "$between" : betweenValue
    "$like": string

 betweenValue
    [null , betweenNotNull]
    [betweenNotNull , null]
    [betweenRegular , betweenRegular]

 betweenNotNull
    number
    date
    
 betweenRegular
    string
    number
    date

 string 
    JSONString
 
 number
    JSONNumber
 
 date
    {"$date":"datechars"}
 
 scn
    {"$scn": +int}

 datechars is an RFC3339 date format in UTC (Z)
        
 JSONString
    ""
    " chars "
 
 chars
    char
    char chars
 
 char
    any-Unicode-character except-"-or-\-or-control-character
    \"
    \\
    \/
    \b
    \f
    \n
    \r
    \t
    \u four-hex-digits

 JSONNumber
    int
    int frac
    int exp
    int frac exp
 
 int
    digit
    digit1-9 digits 
    - digit
    - digit1-9 digits
 
 frac
    . digits
 
 exp
    e digits
 
 digits
    digit
    digit digits
 
 e
    e
    e+
    e-
    E
    E+
    E-