Gentoo Cloud Production Server Tricks

Author: Artem Butusov

Intro

Here I describe some tricks that help keep a production server self-maintaining.

The tricks below suit a small standalone server, where a central monitoring system would be unnecessary overhead.

Here I will assume that you have:
– Gentoo
– OpenRC
– AWS
– vixie-cron
– syslog-ng
– mailx
– ansi2html
– monit
– ntpd
– awscli

Working sendmail

I assume that you have a working sendmail, so emails can go out to the admin address and the admin can respond in time.

Backup block devices

You need to back up all block devices to snapshots so that you can recover everything if something fails.

Create a dedicated user and access key in the AWS console and grant that user only the permissions needed to manage EC2 snapshots. It is also recommended to restrict that user to the IP address of the server.

The following IAM policy could be used:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateSnapshot",
                "ec2:DeleteSnapshot",
                "ec2:DescribeSnapshots"
            ],
            "Resource": "*",
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": "A.B.C.D/32"
                }
            }
        }
    ]
}

Then configure the credentials of that user on the server:

aws configure

There are a lot of utilities available for making rsnapshot-like backups of block devices in AWS.

You could use this utility: https://github.com/sormy/aws-ec2-rsnapshot

The script below runs every day, tries to sync the filesystem, creates a snapshot and keeps the last 7 daily snapshots.

/etc/cron.daily/aws-ec2-rsnapshot:

#!/bin/bash

export HOME=/root
output=$(aws-ec2-rsnapshot artembutusov.com/daily/root 7 vol-AABBCCDD sync)
if [ $? != 0 ]; then
    echo "$output" | mailx -s "Unable to create volume snapshot" root
fi

The script below runs every week, tries to sync the filesystem, creates a snapshot and keeps the last 8 weekly snapshots (2 months).

/etc/cron.weekly/aws-ec2-rsnapshot:

#!/bin/bash

export HOME=/root
output=$(aws-ec2-rsnapshot artembutusov.com/weekly/root 8 vol-AABBCCDD sync)
if [ $? != 0 ]; then
    echo "$output" | mailx -s "Unable to create volume snapshot" root
fi

Both scripts stay silent if the snapshot was created successfully and mail the full output if it failed.
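The daily and weekly scripts share the same "run, capture, mail on failure" pattern, and so do several scripts further down. A minimal sketch of a helper that factors it out could look like the one below. The names `notify_on_failure` and `MAIL_CMD` are my own assumptions and are not part of aws-ec2-rsnapshot or mailx.

```shell
#!/bin/bash

# Hypothetical knob: the command used to send mail, mailx by default.
# Overriding it makes the helper easy to try out without sending real email.
MAIL_CMD=${MAIL_CMD:-mailx}

# notify_on_failure SUBJECT COMMAND [ARGS...]
# Runs COMMAND, captures stdout and stderr, and mails the captured output
# to root only when COMMAND exits with a non-zero status.
notify_on_failure() {
    local subject=$1
    shift
    local output status
    output=$("$@" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "$output" | "$MAIL_CMD" -s "$subject" root
    fi
    return "$status"
}
```

With such a helper the body of the daily script could shrink to a single call like `notify_on_failure "Unable to create volume snapshot" aws-ec2-rsnapshot artembutusov.com/daily/root 7 vol-AABBCCDD sync`.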

Keep updated portage tree

You should keep the portage tree up to date at all times.

There are several reasons for that:
– you will be able to merge a fresh version of any package at any time
– you will be able to detect if the system is too outdated
– IMPORTANT: you will be able to identify whether the system is affected by security vulnerabilities published in GLSA

I also have eix installed, so the easiest way is to put a script in cron.

/etc/cron.daily/eix-sync:

#!/bin/bash

eix-sync > /dev/null 2>&1

GLSA check

We need to check GLSA every day and notify the admin if the system is vulnerable.

The script below checks GLSA and sends an email if anything is found.

/etc/cron.daily/glsa-check:

#!/bin/bash

glsa=$(glsa-check -t -n -v all 2>&1)
[[ "$glsa" =~ "This system is not affected by any of the listed GLSAs" ]] || (
    echo "$glsa" | mailx -s "GLSA check failed" root
)

or, if you want a colored email notification:

#!/bin/bash

glsa=$(glsa-check -t -v all 2>&1)
[[ "$glsa" =~ "This system is not affected by any of the listed GLSAs" ]] || (
    echo "$glsa" | ansi2html | mailx -a "Content-Type: text/html" -s "GLSA check failed" root
)

PS: Coloring requires the ansi2html package to be installed.

SSL check (cron)

It is very important to renew SSL certificates in time so that your services do not break because of an expired certificate.

Install the following script.

/usr/local/bin/ssl-check:

#!/bin/bash

cert_path=$1
days=${2:-1}

if [ -z "$cert_path" ]; then
    echo "Usage: ssl-check <certificate_path> [days_to_notify]"
    exit 0
fi

if [ ! -f "$cert_path" ]; then
    echo "Unable to find file: $cert_path"
    exit 1
fi

now_time=$(date +%s)
cert_cname=$(openssl x509 -text -noout -in "$cert_path" | grep 'Subject:.* CN=' | sed 's/^.*=//')
cert_end_ts=$(openssl x509 -text -noout -in "$cert_path" | grep "Not After" | sed 's/\s*Not After : //')
cert_end_time=$(date -d "$cert_end_ts" +%s)
cert_expire_days=$(( ($cert_end_time - $now_time) / 60 / 60 / 24 ))

if [ "$cert_expire_days" -le 0 ]; then
    echo "SSL certificate $cert_cname is expired"
    exit 2
elif [ "$cert_expire_days" -le "$days" ]; then
    echo "SSL certificate $cert_cname will expire in $cert_expire_days day(s)"
    exit 3
else
    echo "SSL certificate $cert_cname will expire in $cert_expire_days day(s)"
    exit 0
fi
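If you only need a yes/no answer and not the number of remaining days, openssl can do the date arithmetic itself through the `-checkend` option of `openssl x509`. Below is a minimal sketch of that variant. The function name `cert_expires_within` is my own, and the sketch treats an unreadable certificate as "expiring" so that it still triggers a notification.

```shell
#!/bin/bash

# cert_expires_within CERT_PATH [DAYS]
# Returns 0 when the certificate expires within DAYS days (or has already
# expired, or cannot be read), and 1 when it stays valid longer than that.
cert_expires_within() {
    local cert_path=$1
    local days=${2:-1}
    # -checkend N exits with 0 if the certificate is still valid N seconds
    # from now and with a non-zero status otherwise.
    if openssl x509 -checkend "$(( days * 86400 ))" -noout -in "$cert_path" > /dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

A cron script could then call `cert_expires_within /path/to/cert.pem 7` and send the email when it returns 0.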

The script below checks the local certificate daily and notifies the admin if it expires within 7 days.

/etc/cron.daily/ssl-check:

#!/bin/bash

output=$(ssl-check /etc/letsencrypt/live/artembutusov.com/cert.pem 7)
if [ $? != 0 ]; then
    echo "$output" | mailx -s "SSL check failed" root
fi

SSL check (monit)

A certificate check can be implemented even more easily with monit, but it requires a running service that uses that certificate.

/etc/monit.d/ssl-check:

check host ssl-artembutusov-com with address artembutusov.com
    if failed
        port 443
        protocol https
        and certificate valid > 7 days
    then alert

Monit check

If you use monit, it can monitor almost everything except the situation where monit itself has died for some reason =). So we need a simple check script that lets us know if monit died.

/etc/cron.hourly/monit-check:

#!/bin/bash

ps -p $(cat /var/run/monit.pid 2> /dev/null) > /dev/null 2>&1 || (
    echo "Monit is not running" | mailx -s "Monit is not running" root
)
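The one-liner above relies on `ps` failing when the pid file is missing or empty. If you prefer the checks to be explicit, a sketch like the one below could be used. The function name `pidfile_alive` is hypothetical, and `kill -0` only tests whether a signal could be sent, it does not actually signal the process.

```shell
#!/bin/bash

# pidfile_alive PIDFILE
# Returns 0 if PIDFILE contains the numeric pid of a running process,
# 1 if the file is missing, unreadable, malformed or the process is gone.
pidfile_alive() {
    local pidfile=$1
    local pid
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile" 2> /dev/null)
    # Reject empty or non-numeric content before handing it to kill.
    [[ "$pid" =~ ^[0-9]+$ ]] || return 1
    # kill -0 performs the existence and permission check without signalling.
    kill -0 "$pid" 2> /dev/null
}
```

In the cron script this would be used as `pidfile_alive /var/run/monit.pid || echo "Monit is not running" | mailx -s "Monit is not running" root`.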

NTP sync

An incorrect clock can cause a lot of problems, so we need to make sure the time is right.

/etc/cron.hourly/ntp-sync:

#!/bin/bash

ntpd -gq > /dev/null 2>&1

Check updates

In most cases we update the system when we need new features, when the system is vulnerable, or when the system is too old.

The first case is always handled manually, the second is addressed by the GLSA check, and the last one should be monitored.

The system has become outdated when an attempt to update world against the latest portage tree creates problems for portage (for example, conflicting updates).

We can detect that with the script below.

/etc/cron.weekly/check-updates:

#!/bin/bash

output=$(emerge --update --deep --newuse --color n world -vp 2>&1)
if [ $? != 0 ]; then
    echo "$output" | mailx -s "There are new conflicting updates available for system" root
elif [[ ! "$output" =~ "Total: 0 packages" ]]; then
    echo "$output" | mailx -s "There are new updates available for system" root
fi

The same but with coloring:

#!/bin/bash

output=$(emerge --update --deep --newuse --color y world -vp 2>&1)
if [ $? != 0 ]; then
    echo "$output" | ansi2html | mailx -a "Content-Type: text/html" -s "There are new conflicting updates available for system" root
elif [[ ! "$output" =~ "Total: 0 packages" ]]; then
    echo "$output" | ansi2html | mailx -a "Content-Type: text/html" -s "There are new updates available for system" root
fi

PS: Coloring requires the ansi2html package to be installed.

syslog alerts

Some processes write important messages to syslog that should be delivered to the admin.

The script below is required for syslog-ng to deliver messages to the admin.

/usr/local/bin/syslog-alert-sender:

#!/bin/bash

strwrap () {
    local str="$1"
    local width="${2-80}"
    if [ "${#str}" -gt "$width" ]; then
        echo -n "${str:0:$width}..."
    else
        echo -n "$str"
    fi
}

while IFS= read -r line; do
    echo "$line" | mailx -s "$(strwrap "$line")" "root"
done < /dev/stdin

The Apple Mail client can hang if it receives an email with a very long “Subject” header, so we truncate the subject after 80 characters.

In the example below, all messages with a level from error to emergency are delivered to the admin.

/etc/syslog-ng/syslog-ng.conf:

...
filter sshd_ignore { level(err); program("^sshd$"); };
filter monit_ignore { level(err); program("^monit$"); };
filter alert { level(err..emerg); not filter(sshd_ignore); not filter(monit_ignore); };
destination alert_sender { program("syslog-alert-sender"); };
log { source(src); filter(alert); destination(alert_sender); };

I guess you do not want to get an email each time bots try to brute-force your ssh password, so sshd is excluded here. You also need to exclude monit, otherwise every time monit has an issue you will get a flood of emails from your server (one for each error entry in the log).

By default syslog does not store the message level in the log, so it is very hard to identify the level of a message, or to create an ignore rule, without that information.

The syslog-ng log format can be changed as shown below.

/etc/syslog-ng/syslog-ng.conf:

...
template full_message { template("$ISODATE $HOST $FACILITY.$LEVEL $MSGHDR$MSG\n"); };

options {
    ...
    file-template(full_message);
    proto-template(full_message);
};
...
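Once the level is written to the log, picking out messages of a given level becomes a simple field match. The sketch below assumes the `$ISODATE $HOST $FACILITY.$LEVEL ...` layout from the template above, so the third whitespace-separated field looks like `daemon.err`. The function name `filter_by_level` is hypothetical.

```shell
#!/bin/bash

# filter_by_level LEVEL [FILE...]
# Prints only the log lines whose FACILITY.LEVEL field (third field)
# carries the given level. Reads stdin when no FILE is given.
filter_by_level() {
    local level=$1
    shift
    awk -v lvl="$level" '{
        n = split($3, parts, ".")
        if (n == 2 && parts[2] == lvl) print
    }' "$@"
}
```

For example, `filter_by_level err /var/log/messages` would list the error-level entries, which helps when deciding which programs deserve their own ignore filter.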

ssh fail2ban

We need to protect SSH from suspicious activity.

Install syslog-ng, fail2ban and iptables.

/etc/fail2ban/jail.d/sshd.conf:

[sshd]
enabled = true
logpath = /var/log/messages
action = iptables[name=SSH, port=ssh, protocol=tcp]

portage tree (monit)

If the portage tree becomes outdated for some reason, we need to notify the admin.

/etc/monit.d/portage-tree:

check file portage-tree path /usr/portage/metadata/timestamp.chk
    # allow 5 cycles for portage update (file could be unavailable)
    if timestamp > 2 days within 5 cycles then alert

service status (monit)

If any service crashes for some reason, we need to notify the admin.

/etc/monit.d/rc-status:

check program rc-status with path "/bin/rc-status --crashed --nocolor"
    if status == 0 then alert

free space (monit)

Running out of free space can freeze the whole server, so we need to monitor that.

/etc/monit.d/rootfs:

check filesystem rootfs with path /dev/xvda1
    if space usage > 80% then alert
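If monit is not available, roughly the same check can be done from cron. This is only a sketch that assumes the POSIX output format of `df -P`, where the fifth column of the second line is the usage percentage. The function names `fs_usage_percent` and `fs_over_threshold` are my own.

```shell
#!/bin/bash

# fs_usage_percent PATH
# Prints the used-space percentage (without the % sign) of the filesystem
# that holds PATH.
fs_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# fs_over_threshold PATH [PERCENT]
# Returns 0 when usage is above PERCENT (80 by default), 1 when it is not,
# and 2 when the usage could not be determined.
fs_over_threshold() {
    local usage
    usage=$(fs_usage_percent "$1")
    [[ "$usage" =~ ^[0-9]+$ ]] || return 2
    [ "$usage" -gt "${2:-80}" ]
}
```

A daily cron script could then run `fs_over_threshold / 80 && df -h / | mailx -s "Low disk space" root`.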
