Managed Instance technical documentation
Operations
Please review the Managed Instances operations guide for instructions.
Release process
SOC2/CI-100
Sourcegraph upgrades every test and customer instance according to the SLA.
The release process is performed in steps:
- A new version is released via the release guild.
- A GitHub issue is created in the Sourcegraph Customer repository with the `mi2 env create-tracking-issue -e prod $TARGET_VERSION` command.
- The GitHub issue is labeled with `team/cloud`, and the Cloud Team is automatically notified to perform the Managed Instances upgrade. The label is part of the template.
- The Cloud Team upgrades all instances in the following order:
Stage | Working days since release | Action | Condition not met? |
---|---|---|---|
1 | 0-2 | Upgrade internal instances by Cloud Team (incl. demo and rctest) | |
2 | 3-4 | Time for verification by Sourcegraph teams | New patch created -> start from 1st stage |
3 | 5-6 | Upgrade: 30% trials 10% customers | New patch created -> upgrade internal in 1 working day and start from 2nd stage |
4 | 7-8 | Upgrade: 100% trials 40% customers | New patch created -> upgrade internal in 1 working day and start from 3rd stage |
5 | 9-10 | Upgrade: 100% customers | New patch created -> upgrade internal in 1 working day and start from 3rd stage |
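The staged schedule in the table above can be expressed as a small lookup. This is only an illustrative sketch: the `Stage` class, `STAGES` list, and `stage_for_day` function are hypothetical names that are not part of `mi2` or any Sourcegraph tooling, and the data simply mirrors the table rows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    number: int
    first_day: int  # working days since release, inclusive
    last_day: int   # working days since release, inclusive
    action: str
    # Stage to resume from if a new patch is created during this stage
    # (None when the table lists no condition).
    restart_stage: Optional[int]


# Hypothetical encoding of the rollout table; values copied from the table.
STAGES = [
    Stage(1, 0, 2, "Upgrade internal instances (incl. demo and rctest)", None),
    Stage(2, 3, 4, "Verification by Sourcegraph teams", 1),
    Stage(3, 5, 6, "Upgrade 30% of trials and 10% of customers", 2),
    Stage(4, 7, 8, "Upgrade 100% of trials and 40% of customers", 3),
    Stage(5, 9, 10, "Upgrade 100% of customers", 3),
]


def stage_for_day(working_days_since_release: int) -> Optional[Stage]:
    """Return the rollout stage covering the given working day, if any."""
    for stage in STAGES:
        if stage.first_day <= working_days_since_release <= stage.last_day:
            return stage
    return None  # outside the 0-10 working-day rollout window
```

For example, `stage_for_day(6)` returns the stage-3 entry, and its `restart_stage` field records that a new patch at that point sends the rollout back to stage 2 after internal instances are upgraded.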
After each instance is upgraded, its uptime checks are verified. This includes automated monitoring.
Sample upgrade:
- tracking issue - 5.1.4.
- GitHub Pull Requests for 5.1.4 upgrade
Release process for patch releases
With a bi-weekly patch release schedule, the Cloud Team uses a simplified release process to ensure Cloud customers can obtain patches as soon as possible.
Known limitations of managed instances
Sourcegraph managed instances currently run on Kubernetes, specifically GKE.
The current Cloud architecture has been tested to support a workload of >100000 repositories (440GB Git storage) and 10000 simulated users on an `n2-standard-32` VM.
Security
- Isolation: Each managed instance is created in an isolated GCP project with strict gcloud access ACLs and network ACLs for security reasons.
- Admin access: Both the customer and Sourcegraph personnel will have access to an application-level admin account. Learn more about how we ensure secure access to your instance.
- VM/SSH access: Only Sourcegraph personnel will have access to the actual GCP environment. This access is provided securely through GCP IAP TCP proxy only. Sourcegraph personnel can make changes or provide data from the environment upon request by the customer.
- Inbound network access: The customer may choose between having the deployment accessible via the public internet and protected by their SSO provider, or, for additional security, having the deployment restricted to an allowlist of IP addresses only (such as their corporate VPN). Filtering against the IP allowlist is performed by our WAF provider, Cloudflare. Note: in addition to the customer-provided IP allowlist, traffic from well-known public code hosts (e.g. GitHub.com) is also permitted to access selected Sourcegraph endpoints to ensure that certain features work.
- Outbound network access: The Sourcegraph deployment will have unfettered egress TCP/ICMP access, and customers will need to allow the Sourcegraph deployment to contact their code host. This can be done by making their code host publicly accessible, or by allowing the static IP of the Sourcegraph deployment to access their code host.
- Web Application Firewall (WAF) protections: All managed instances are proxied through Cloudflare and leverage security features such as rate limiting and the Cloudflare WAF.
Access can be requested in #it-tech-ops WITH manager approval.
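To illustrate the IP allowlist option described under inbound network access, the following minimal sketch checks a client address against a list of CIDR ranges using Python's standard `ipaddress` module. It is not how Cloudflare implements the filtering; the `is_allowed` function and the example ranges (taken from reserved documentation address space) are hypothetical.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowlist: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any CIDR range in allowlist."""
    addr = ipaddress.ip_address(client_ip)
    # strict=False tolerates entries such as "203.0.113.5/24" with host bits set.
    return any(
        addr in ipaddress.ip_network(cidr, strict=False) for cidr in allowlist
    )


# Hypothetical allowlist, e.g. a corporate VPN range plus a single static IP.
EXAMPLE_ALLOWLIST = ["203.0.113.0/24", "198.51.100.7/32"]
```

With this allowlist, `is_allowed("203.0.113.10", EXAMPLE_ALLOWLIST)` is true, while an address outside both ranges, such as `192.0.2.1`, is rejected.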
Monitoring and alerting
SOC2/CI-86 SOC2/CI-25
Each managed instance is created in an isolated GCP project, with exclusive resources.
Metrics are visible to Sourcegraph employees with access in a centralized GCP metrics scoping project. All metrics can be seen in the scoped projects dashboard.
Every customer managed instance has alerts configured:
- cloud provider-managed uptime check is configured in dedicated GCP managed instance project
- instance performance metrics alerts are configured in the scoped project for all managed instances
- additional v2.0 infrastructure performance metrics are configured per instance
- application performance metrics - based on application log events
Alerting flow:
- When an alert is triggered, it is sent to the Opsgenie channel.
- From Opsgenie, the alert is sent to the on-call Cloud engineer and to Slack channels (`#opsgenie`, `#alerts-managed-instances`).
- The on-call Cloud engineer decides what type of alert it is and whether an incident should be opened, then follows the incident procedure. The on-call Cloud engineer should use the generated managed instances operations to check, assess and repair the broken managed instance.
- When the alert is closed via incident resolution, post-mortem actions have to be assigned and performed.
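The alerting flow above can be modelled roughly as follows. This is a sketch for orientation only: the `Alert` class and both functions are hypothetical and do not call the Opsgenie or Slack APIs; the channel names come from the list above.

```python
from dataclasses import dataclass
from typing import List

# Slack channels named in the alerting flow.
SLACK_CHANNELS = ["#opsgenie", "#alerts-managed-instances"]


@dataclass
class Alert:
    instance: str
    message: str
    is_incident: bool = False  # decided by the on-call Cloud engineer
    resolved: bool = False     # set once the incident has been resolved


def notification_targets() -> List[str]:
    """Destinations an alert is forwarded to from Opsgenie."""
    return ["on-call Cloud engineer"] + SLACK_CHANNELS


def follow_up_steps(alert: Alert) -> List[str]:
    """Steps the on-call engineer takes, in the order given in the flow."""
    steps = []
    if alert.is_incident:
        steps.append("open an incident and follow the incident procedure")
    steps.append("check, assess and repair the instance")
    if alert.is_incident and alert.resolved:
        steps.append("assign and perform post-mortem actions")
    return steps
```

Keeping the incident decision as an explicit field reflects that it is a human judgment made by the on-call engineer, not something derived automatically from the alert.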
Opsgenie alerts Sample managed instance incident - customer XXX is down.
List Trials
Please visit go/cloud-ops