Posted 16 Hours Ago Job ID: 2114816 14 quotes received

Senior DevOps Engineer

Fixed Price: Under $250

  Send before: January 12, 2026

SUMMARY
We’re building a simple, reliable failover setup so our business keeps running if our on-prem server goes down. Mattermost is the most mission critical service.

This job starts with a clear Phase 1 delivery task: set up a warm standby VPS using Docker, Cloudflare, and cloud backups to Cloudflare R2 buckets. If we work well together, there’s ongoing on-call and ad-hoc DevOps work afterwards.

CORE HOURS AND AVAILABILITY
Preferred availability is 09:00 to 00:00 ICT daily.
We don’t need constant presence, but we do need quick responses when we reach out.

PHASE 1 — VPS FAILOVER SETUP ON DOCKER PLUS CLOUDFLARE PLUS CLOUD BACKUPS

GOAL
Create a warm standby VPS that can take over the mission critical services quickly with a single switch, so the team can continue working during an outage.

WHAT YOU WILL SET UP
A) VPS BASE AND SECURITY
Ubuntu Server 24.04 LTS or 22.04 LTS
Docker and Docker Compose installed cleanly
Firewall enabled and locked down
SSH hardened using key-only access
Clean folder structure under /opt/shozzle/service-name
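
A minimal sketch of how the folder structure above could be created. The service names are placeholders based on the Phase 1 list ("tasks" stands in for whichever of Plane or Vikunja is chosen), and the config/data subfolders are an assumption, not a requirement of this job.

```python
from pathlib import Path

# Placeholder names for the Phase 1 services; "tasks" stands in for Plane or Vikunja.
SERVICES = ["proxy", "authelia", "postgres", "mattermost",
            "tasks", "bookstack", "vaultwarden", "uptime-kuma"]


def create_layout(base="/opt/shozzle", services=SERVICES):
    """Create <base>/<service>/{config,data} and return the service directories."""
    created = []
    for name in services:
        service_dir = Path(base) / name
        for sub in ("config", "data"):  # assumed subfolders, adjust as needed
            (service_dir / sub).mkdir(parents=True, exist_ok=True)
        created.append(service_dir)
    return created
```

Calling create_layout() with no arguments would need write access to /opt/shozzle; passing a different base makes it easy to try out first.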

B) CLOUDFLARE CONNECTIVITY
We use shozzle.com subdomains already.
You will set up the routing so services are reachable safely, using either Cloudflare Tunnel or DNS plus proxy as appropriate.
The objective is a single, simple failover switch.
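
As an illustration of what a single switch might look like, here is a rough sketch that assumes the DNS plus proxy option (not Cloudflare Tunnel) and one proxied A record per service. It builds a request against the Cloudflare v4 DNS records endpoint. The zone ID, record ID, IP address, and token are all hypothetical inputs, and the request body should be checked against the current Cloudflare API documentation before use.

```python
import json
import urllib.request

API = "https://api.cloudflare.com/client/v4"


def build_dns_switch(zone_id, record_id, target_ip, api_token):
    """Build a PATCH request that points one proxied A record at target_ip."""
    body = json.dumps({"type": "A", "content": target_ip, "proxied": True}).encode()
    return urllib.request.Request(
        f"{API}/zones/{zone_id}/dns_records/{record_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


def switch(request):
    """Send the request and return the 'success' flag from the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)["success"]
```

Failback would be the same call with the on-prem address as target_ip, which keeps the switch symmetrical.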

C) CLOUD BACKUPS USING CLOUDFLARE R2 BUCKETS
This is mandatory. Backups must go offsite to Cloudflare R2 buckets as the primary destination.
Optional secondary copy to Google Drive can be configured if time allows.

You must implement backups for the stateful services and prove restore works.

MISSION CRITICAL SERVICES FOR THE VPS
We have around 50 containers overall, but Phase 1 is only the business continuity core. Target is no more than eight services:

Reverse proxy using Nginx Proxy Manager or Traefik
SSO and access control using Authelia
PostgreSQL database
Mattermost team chat
Tasks using Plane or Vikunja, choose one and justify
Documentation using BookStack
Passwords using Vaultwarden
Monitoring and alerts using Uptime Kuma

Hard rule: do not attempt to migrate everything. Phase 1 is core continuity only.

DELIVERABLES AND PROOF
To keep this simple and objective, please provide the following before sign-off:

A short written assessment
Export container and volume inventory
Confirm where stateful data lives
Confirm fix-in-place versus rebuild approach for the current on-prem deployment
Confirm the final VPS service list

Working deployments on the VPS
Working URLs for each service
Docker ps output from the VPS
Proof services survive a VPS reboot and restart automatically
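
One possible way to support the reboot proof, sketched under the assumption that every service should run with a restart policy of "always" or "unless-stopped". The function takes the JSON text printed by docker inspect for the running containers and lists any container with a different policy.

```python
import json


def containers_without_restart_policy(inspect_json):
    """Return names of containers whose restart policy is not
    'always' or 'unless-stopped'."""
    missing = []
    for container in json.loads(inspect_json):
        policy = container["HostConfig"]["RestartPolicy"]["Name"]
        if policy not in ("always", "unless-stopped"):
            # docker inspect reports names with a leading slash
            missing.append(container["Name"].lstrip("/"))
    return missing
```

An empty result before the reboot, plus Docker ps output after it, would together make reasonable evidence.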

Backups and restore proof
Automated backups configured for
PostgreSQL daily dumps
Mattermost database, config, and file uploads
BookStack database and uploads
Vaultwarden volume
Tasks database
Authelia configuration

Backups stored in Cloudflare R2 buckets
Restore test completed from R2 to the VPS with proof
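
A rough sketch of the daily PostgreSQL dump and R2 upload, assuming pg_dump is run inside the database container and rclone is already configured with an S3-compatible remote for R2. The remote name "r2", the bucket "shozzle-backups", the "postgres" prefix, and the /opt/shozzle/backups folder are all hypothetical.

```python
import subprocess
from datetime import date


def dump_command(container, db_user, db_name):
    """pg_dump in custom format, run inside the container, written to stdout."""
    return ["docker", "exec", container, "pg_dump", "-U", db_user, "-Fc", db_name]


def upload_command(local_path, remote="r2", bucket="shozzle-backups", prefix="postgres"):
    """rclone copy of one local file into <remote>:<bucket>/<prefix>/."""
    return ["rclone", "copy", local_path, f"{remote}:{bucket}/{prefix}/"]


def run_backup(container, db_user, db_name, out_dir="/opt/shozzle/backups", day=None):
    """Dump one database to a dated file, then copy that file to R2."""
    day = day or date.today()
    local_path = f"{out_dir}/{db_name}-{day.isoformat()}.dump"
    with open(local_path, "wb") as out:
        subprocess.run(dump_command(container, db_user, db_name), stdout=out, check=True)
    subprocess.run(upload_command(local_path), check=True)
    return local_path
```

Custom-format dumps are restored with pg_restore, so the restore test would pull the file back from R2 with rclone and load it into a scratch database on the VPS.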

Failover and failback runbook
A short runbook explaining exactly how to
switch traffic to the VPS
verify services
switch back to on-prem after recovery
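
The verify step in the runbook could be backed by a small check like the sketch below. It treats any HTTP status below 500 as "up", on the assumption that services behind Authelia may answer with a redirect or an authorization error rather than 200. The URLs passed in are whatever the final service list ends up being.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, just not with a 2xx
    except (urllib.error.URLError, OSError):
        return None


def verify_services(urls, fetch=http_status):
    """Map each URL to True if it answered with a status below 500."""
    results = {}
    for url in urls:
        status = fetch(url)
        results[url] = status is not None and status < 500
    return results
```

The fetch parameter is only there so the check can be exercised without real network access.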

PHASE 2 — ONGOING ON-CALL AND AD-HOC DEVOPS
If Phase 1 goes well, we would like to keep you on for ongoing support as needed, including:

Docker maintenance and upgrades
Monitoring and alert tuning
Security patching and hardening
Monthly DR test and validation
General VPS support across our estate, including WordPress, MERN, WHMCS, and cPanel
Xibo digital signage related systems as needed

REQUIRED SKILLS
Linux system administration on Ubuntu or Debian
Docker and Docker Compose troubleshooting
Reverse proxy configuration and SSL
PostgreSQL backup and restore
Monitoring and alerting systems
Cloudflare DNS, SSL, tunnels
Clear documentation and runbook writing

BONUS SKILLS
Coolify or similar self-hosted deploy platforms
WordPress operations, WHMCS, cPanel
Xibo or signage platforms
Automation scripting using bash or python
rclone and S3 compatible storage such as Cloudflare R2

HOW TO APPLY
Please answer these three questions in your proposal:

Confirm you understand Phase 1 is limited to the mission critical core only
Share one example of a DR or failover setup you’ve built and how backups and switching worked
Confirm availability within 09:00 to 00:00 ICT and your typical response time

$50 initial part, then ongoing ad hoc


Marc S United Kingdom