Server Monitoring Tools – htop, Netdata, and Uptime Checks

Monitoring a VPS or dedicated server - what to watch, the tools worth installing, and the right uptime/alerting setup so you find out before customers do.

A VPS that is “fine until it’s down” gives you no early warning. Monitoring catches creeping issues (slow memory leaks, runaway processes, a disk filling up) before they cause outages, and alerts you when an outage does occur. This guide covers practical monitoring on a Linux VPS: live tools like htop, dashboards like Netdata, and external uptime services that watch from outside.

What to monitor

  • CPU usage — High usage may indicate runaway processes or attacks.
  • Memory — When RAM runs out, the kernel’s OOM killer terminates processes, often the database or another large memory consumer.
  • Disk space — Full disk = everything breaks.
  • Disk I/O — High I/O wait = slow site even when CPU is fine.
  • Network — Sudden spikes can indicate an attack; sustained high traffic more often reflects legitimate growth.
  • Service status — Is httpd / nginx / mysql / php-fpm running?
  • External reachability — Can the internet reach your site?

Different tools handle each layer. No single tool does everything well.
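As a rough illustration of those layers, the sketch below takes a one-shot reading of load, memory, disk, and service status from the shell. The function names (mem_available_pct, disk_used_pct, snapshot) and the service names are assumptions made for illustration, not part of any tool; disk I/O, network, and external reachability are left to the tools covered in the following sections.

```shell
#!/bin/bash
# One-shot health snapshot. Reads /proc directly, so it needs no extra packages.

# Percentage of RAM still available, from a meminfo-format file (default /proc/meminfo).
mem_available_pct() {
    awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2} END {if (t > 0) printf "%d\n", a * 100 / t}' "${1:-/proc/meminfo}"
}

# Usage percentage of the filesystem that holds the given path (default /).
disk_used_pct() {
    df -P "${1:-/}" | awk 'NR==2 {gsub("%", "", $5); print $5}'
}

snapshot() {
    echo "load (1/5/15 min): $(cut -d' ' -f1-3 /proc/loadavg)"
    echo "memory available:  $(mem_available_pct)%"
    echo "root disk used:    $(disk_used_pct /)%"
    # Service names are examples; substitute the units your server actually runs.
    for svc in nginx mysql php-fpm; do
        echo "service $svc: $(systemctl is-active "$svc" 2>/dev/null || true)"
    done
}
```

Source the file and call snapshot to print the reading. It is a spot check only; it keeps no history and sends no alerts.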

Live interactive: htop

top is the classic; htop is the friendlier version with colors, scrolling, and tree view. Install:

# AlmaLinux / RHEL / CloudLinux
dnf install htop -y

# Ubuntu / Debian
apt install htop -y

Run: htop

What you see:

  • CPU bars per core at top.
  • Memory and swap bars.
  • Load averages.
  • Process list sorted by CPU (default).

Keyboard:

  • F6 — change sort (try MEM%, RES, TIME).
  • F5 — tree view (show parent-child process relationships).
  • F4 — filter (only show processes matching).
  • k — kill selected process.
  • q — quit.

htop tells you what’s happening right now. For “what happened yesterday” you need persistent monitoring.
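Because htop is interactive, it can't feed a log or a cron job. A minimal non-interactive counterpart, assuming the procps version of ps that ships with the distributions above, might look like this (top_procs is a hypothetical helper name):

```shell
#!/bin/bash
# Print the header plus the top N processes by a ps sort key.
# Defaults: sorted by %cpu, 5 processes. Other useful keys: %mem, rss.
top_procs() {
    local key="${1:-%cpu}" count="${2:-5}"
    # The leading "-" on the key sorts in descending order; +1 keeps the header line.
    ps -eo pid,user,%cpu,%mem,rss,comm --sort="-${key}" | head -n "$((count + 1))"
}

top_procs %mem 5
```

Appending the output to a file from cron gives a crude record of what was consuming resources at a given time, which htop alone does not keep.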

Persistent monitoring: Netdata

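Netdata is a real-time monitoring agent with a built-in web dashboard. It installs in minutes and gives you per-second metrics with short-term history out of the box (how far back it goes depends on the agent’s version and storage settings); longer history is available with paid cloud.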

bash <(curl -Ss https://my-netdata.io/kickstart.sh)

Visit http://your-vps-ip:19999 in your browser.

Out of the box you get charts for:

  • CPU, memory, disk, network per second.
  • Per-application resource usage.
  • Web server stats (if Apache/nginx detected).
  • Database stats (if MySQL/PostgreSQL detected).
  • Disk I/O patterns.
  • Container stats if Docker is running.

Security note: Netdata exposes a web dashboard on port 19999 by default. Restrict access:

  • Firewall rule allowing only your IP.
  • Or proxy behind nginx with basic auth.
  • Or use Netdata Cloud (free tier) which gives you an authenticated dashboard from outside without exposing port 19999.
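For the firewall option, the sketch below prints (rather than runs) the commands for ufw and firewalld so you can review them first. The helper name netdata_firewall_cmds is hypothetical and 203.0.113.10 is a placeholder from the documentation address range; it assumes port 19999 is not already opened by another rule or zone setting.

```shell
#!/bin/bash
# Print firewall commands that let a single admin IP reach the Netdata dashboard port.
# Nothing is executed; copy the lines for the firewall your distribution uses.
netdata_firewall_cmds() {
    local ip="$1" port="${2:-19999}"
    # Ubuntu / Debian (ufw)
    printf 'ufw allow from %s to any port %s proto tcp\n' "$ip" "$port"
    # AlmaLinux / RHEL / CloudLinux (firewalld)
    printf "firewall-cmd --permanent --add-rich-rule='rule family=\"ipv4\" source address=\"%s\" port port=\"%s\" protocol=\"tcp\" accept'\n" "$ip" "$port"
    printf 'firewall-cmd --reload\n'
}

netdata_firewall_cmds 203.0.113.10
```

After applying the rule, confirm from a different network that the dashboard on port 19999 is no longer reachable.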

Resource footprint: small. Netdata is designed to run on production servers.

External uptime monitoring

Critical separate layer: services that check your site FROM the outside and alert when it’s unreachable. A monitoring agent running on the server can’t tell you when the server itself is offline — that’s what external uptime monitoring is for.

Free tier options:

  • UptimeRobot — 50 checks free at 5-minute intervals.
  • BetterStack (formerly Better Uptime) — 10 checks at 3-minute intervals free.
  • HetrixTools — generous free tier with multi-location.
  • StatusCake — free tier 10 checks.

Setup: add your URL, then configure alert notifications (email, SMS, Slack, or Telegram). When the site doesn’t respond, you get pinged.

For mission-critical sites: paid tier with 1-minute checks, multiple geographic locations, escalation policies.

What to check externally

  • HTTP/HTTPS — main site loads.
  • HTTPS certificate — alert before cert expires (most services include this).
  • Specific endpoint — homepage might be cached/CDN-served; check a dynamic endpoint (e.g. /wp-login.php for WordPress).
  • Keyword check — verify expected text appears in response (catches “site loads but is broken” cases).
  • SMTP — your mail server responding on port 25.
  • SSH — your server reachable on port 22 (helps distinguish “web service down” from “whole server down”).
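To make the keyword check concrete, here is a minimal sketch of what such a probe does, written with curl. check_keyword is a hypothetical helper, and the URL and keyword in the usage note are placeholders. A hosted uptime service runs this kind of request from several outside locations, which a script on the server itself cannot replace.

```shell
#!/bin/bash
# Succeeds only if the URL answers without an HTTP error and the body contains the keyword.
check_keyword() {
    local url="$1" keyword="$2" body
    # --fail: non-zero exit on HTTP 4xx/5xx; --location: follow redirects;
    # --max-time: give up on a server that hangs
    body=$(curl --silent --fail --location --max-time 10 "$url") || return 1
    # -F treats the keyword as literal text, not a regular expression
    printf '%s\n' "$body" | grep -qF -- "$keyword"
}
```

A call such as check_keyword "https://example.com/wp-login.php" "Log In" || echo "check failed" would flag both an unreachable site and a page that loads without the expected text. Run it from a machine other than the server being checked.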

Log aggregation (advanced)

For multi-server setups or complex sites, centralizing logs makes troubleshooting much faster:

  • Grafana Loki — open source, self-hosted.
  • ELK Stack (Elasticsearch + Logstash + Kibana) — heavier but capable.
  • CloudWatch / Datadog / Splunk — commercial, expensive, very capable.

Overkill for single-VPS setups. Once you’re managing 5+ servers, worth considering.

Custom alerts via simple cron

For specific custom alerts not covered by monitoring tools:

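#!/bin/bash
# /usr/local/bin/disk-alert.sh
# Example: alert if the root filesystem is more than 90% full.
# Requires a working mail command (mailx / mailutils) and chmod +x on this file.

# -P forces one line per filesystem so the awk field positions stay stable
USAGE=$(df -P / | awk 'NR==2 {print $5}' | tr -d '%')
if [ "$USAGE" -gt 90 ]; then
    echo "Disk on $(hostname) is at ${USAGE}%" | mail -s "DISK ALERT" alert@youremail.com
fi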

Cron entry:

0 */6 * * * /usr/local/bin/disk-alert.sh

Crude but effective for catching specific thresholds.
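The same pattern extends to the service-status layer. The sketch below is an assumption-laden companion to the disk script: the file name, function names, and unit names are all hypothetical, and unit names in particular vary by distribution (mysqld or mariadb instead of mysql, versioned php-fpm units, and so on).

```shell
#!/bin/bash
# /usr/local/bin/service-alert.sh (hypothetical name, mirroring disk-alert.sh)

# Print each service for which the probe command fails.
# PROBE defaults to systemctl; it is overridable so the logic can be dry-run.
down_services() {
    local probe="${PROBE:-systemctl is-active --quiet}" svc
    for svc in "$@"; do
        $probe "$svc" >/dev/null 2>&1 || echo "$svc"
    done
}

# Mail one summary line if any of the given services is not active.
alert_if_down() {
    local down
    down=$(down_services "$@" | tr '\n' ' ')
    if [ -n "$down" ]; then
        echo "Not running on $(hostname): ${down}" | mail -s "SERVICE ALERT" alert@youremail.com
    fi
}

# Only act when service names are passed on the command line (as cron would).
if [ "$#" -gt 0 ]; then
    alert_if_down "$@"
fi
```

A cron entry along the lines of */5 * * * * /usr/local/bin/service-alert.sh nginx mysql php-fpm would run the check every five minutes. Like the disk script, it depends on a working mail command.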

A practical small-VPS monitoring setup

  1. Install htop for live debugging.
  2. Install Netdata for trend visibility and short-term history.
  3. Set up UptimeRobot watching homepage HTTPS, with email + Telegram alerts.
  4. Add a custom cron disk-alert script for the 90% threshold.
  5. Subscribe to your hosting provider’s status page (or follow @iWebVault on social for incident updates).

Total time: 30 minutes. Coverage: surprisingly comprehensive.

For managed hosting customers

If you’re on managed hosting (shared, reseller, managed VPS), iWebVault handles most server-side monitoring and alerts you to the issues we detect.

But add external uptime monitoring anyway. Even managed providers can’t always catch every type of outage; an independent watcher catches what internal monitoring misses.

Common questions

“How often should I check monitoring dashboards?” Set up alerts so you don’t need to check proactively. If alerts are happening regularly enough that you’d be checking dashboards anyway, fix the underlying issues.

“What’s a normal load average?” Depends on CPU count. Rule of thumb: load below number of cores = healthy. Load 2x cores = stressed. Load 5x+ = problems.
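That rule of thumb can be sketched as a small shell function. The name load_status is hypothetical, and the “elevated” band between 1x and 2x cores is an assumption added to fill the gap the rule leaves open.

```shell
#!/bin/bash
# Classify a load average against the core count.
load_status() {
    local load="$1" cores="$2"
    awk -v l="$load" -v c="$cores" 'BEGIN {
        if (c < 1) { print "unknown"; exit }
        r = l / c
        if (r < 1)      print "healthy"
        else if (r < 2) print "elevated"   # band not named in the rule of thumb; an assumption
        else if (r < 5) print "stressed"
        else            print "problem"
    }'
}

# 1-minute load from /proc/loadavg, core count from nproc (coreutils)
load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

A single reading says little; a load that stays in the stressed or problem range across the 5- and 15-minute averages is the more meaningful signal.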

“My monitoring agent itself crashed.” Who monitors the monitor? External uptime checks. They notice when the server is down regardless of agent status.

“How many alert channels should I have?” Email primary; SMS or Telegram secondary for critical alerts. Slack/Discord for team notification. Don’t have so many channels that you ignore them.

“What about Imunify360 / fail2ban — aren’t those monitoring?” Security monitoring, slightly different. They catch attacks; performance monitoring catches resource exhaustion.

What’s next

Monitoring is one of those things that feels optional until your first 3 AM outage. Set it up while everything is working calmly. The 30-minute investment to install Netdata + UptimeRobot pays dividends every time something goes sideways — turning “site has been down for hours, why didn’t I notice” into “alerted within 5 minutes, recovered in 15”.
