Skip to main content

Self Hosting Setup

How to set up and maintain a PostgreSQL database on AWS tailored to support TargetBoard as well as requirements and best practices for setting up and maintaining a virtual machine to run your ETL pipelines.

Guidelines for Setting Up and Maintaining the Required Resources to Self-Host Your Data and Connect to TargetBoard

This document outlines the how to set up and maintain a PostgreSQL database on AWS tailored to support TargetBoard as well as requirements and best practices for setting up and maintaining a virtual machine to run your ETL pipelines (e.g., dbt, Airbyte, Stitch, custom Docker workloads). Adhering to these guidelines will ensure optimal performance, security, and compatibility with TargetBoard services.

As you will be hosting your own database and Virtual Machine, you are responsible for their security and maintenance.

Setting Up Your Database

Database Type and Version
  • Database Engine: Use PostgreSQL hosted on AWS RDS or Amazon Aurora for PostgreSQL.
  • Version: Minimum required version is 15; version 16 is preferred.
Instance Specifications
  • Minimal Compute Resources:
    • CPU: Minimum of 2 vCPUs.
    • Memory: At least 8 GB RAM.
  • Minimal Storage:
    • Allocate 100 GB of storage.
    • Enable storage auto-scaling to accommodate future growth.
AWS Region
  • The database can be deployed in any region. However, for optimal performance, select between these two options:
    • New York region: us-east-1 or us-east-2
    • Frankfurt region: eu-central-1, for GDPR compliance.

Network and Security Configuration

VPC and Subnet Setup
  • Deploy the database within its own Virtual Private Cloud (VPC).
  • Use private subnets to prevent direct internet access to the database instance.
Security Groups and IP Whitelisting
Encryption and Security
  • Encryption at Rest:
    • Enable storage encryption using AWS Key Management Service (KMS).
  • Encryption in Transit:
    • Enforce SSL connections to encrypt data in transit.
  • Database Credentials:
    • Use strong passwords and consider integrating with AWS Secrets Manager for credential management.
Access and Permissions
  • Provide TargetBoard with an admin user account with necessary privileges for database management.

Maintenance and Updates

Maintenance Windows
  • Schedule maintenance windows during off-peak hours (Sunday or Friday).
  • Ensure these windows are communicated to all stakeholders.
Updates and Patching
  • Enable automatic security updates, patching, and general database maintenance.
Backups and Recovery
  • Automated Backups:
    • Enable automated backups with a retention period that meets business requirements.
    • We recommend maintaining a retention period of at least 1 month.
  • Point-In-Time Recovery (PITR):
    • Ensure PITR is enabled to recover the database to any point within the backup retention period.
  • Amazon CloudWatch:
    • Set up CloudWatch metrics and alarms for performance monitoring.
  • Enhanced Monitoring:
    • Enable enhanced monitoring for granular insights.
  • Logging:
    • Configure PostgreSQL to log sufficient details for auditing and troubleshooting.
    • Enable AWS CloudTrail to log API calls made in your AWS account.
Performance Optimization
  • Enable Regular Maintenance Tasks:
    • Schedule tasks like VACUUM, ANALYZE, and REINDEX to maintain database performance.
    • Usually, these are set by default with AWS Aurora for PostgreSQL.

Setting Up Your Virtual Machine

VM Type and Specifications

Minimal Compute Resources
  • CPU:
    • Minimum of 2 vCPUs.
    • Memory: At least 8 GB RAM.
  • Disk:
    • Allocate a minimum of 256 GB disk storage.
    • Enable storage auto-scaling (if supported by the cloud provider).
  • Supported Platforms:
    • AWS, Google Cloud, Azure, or equivalent.
Operating System
  • Recommended OS: Linux (Debian).
  • Keep the OS minimal (disable unnecessary packages and services).
Disk and Storage
  • Use SSD-backed storage for ETL workloads.
  • Configure file system with sufficient I/O throughput for Docker containers.
  • Enable disk monitoring and alerts for low free space.

Network and Security Configuration

Firewall and IP Whitelisting
  • Deploy the VM behind a firewall.
  • Restrict inbound traffic only to whitelisted IPs used by TargetBoard.
    • TargetBoard VPN:
      • 69.55.59.137
  • If the database is hosted on-premises, ensure one of the following:
    • It resides on the same VPC as the VM.
    • Alternatively, its IP address is explicitly added to the whitelist above.
Outbound Ports
  • The VM requires outbound access on the following ports/protocols:
    • HTTP (80)
    • HTTPS (443)
    • UDP (for services that require lightweight data transfer or DNS resolution)
    • TCP (for postgres connection - whitelist the postgres port).
Encryption and Security
  • Encryption at Rest:
    • Enable disk encryption with provider-managed keys (AWS KMS, GCP KMS, etc.).
  • Encryption in Transit:
    • Enforce TLS/SSL for all connections.
    • Avoid plain-text passwords in environment files.
Access and Permissions
  • Provide TargetBoard support team with a restricted SSH user account if access is required.
  • Enforce key-based authentication (no password login).
  • Apply role-based access controls for multi-user environments.

Maintenance and Updates

Maintenance Windows
  • Schedule maintenance windows during off-peak hours (Friday or Sunday)
  • Communicate downtime to stakeholders.
Updates and Patching
  • Apply automatic security updates to the OS.
  • Regularly update Docker and Docker Compose to the latest stable versions.
Backups and Recovery
  • VM Snapshots:
    • Take weekly full VM snapshots as a minimum.
    • Retain backups for at least 1 month.
  • Recovery:
    • Test VM restoration quarterly to ensure business continuity.
  • Monitoring:
    • Use provider tools (CloudWatch, Stackdriver, or equivalent).
    • Track CPU, memory, disk usage, and network throughput.
  • Logging:
    • Centralized logs are recommended (Cloud Logging, ELK, or Datadog).
    • Enable auditing of SSH access.
Containerization
  • Install Docker and Docker Compose for ETL workloads.
  • Keep container images updated with security patches.

How did we do?

Installing Airbyte (On-Prem) and Connecting Sources

Contact