Resume
Dimitrios Andreas Kafetzis
Site Reliability and Platform Engineering
WORK EXPERIENCE
Senior Site Reliability Engineer (SRE) | Platform Reliability Engineer
Sept 2024 - Present Adyen - Singapore
Joined as a member of the global SRE team.
- Eased onboarding by structuring and writing an intro-to-platform course.
Site Reliability Engineer (SRE) | Platform Engineer
Nov 2022 - Aug 2024 VegaSolutions / TokkaLabs / Native / RangeProtocol - Singapore
Joined as the first dedicated SRE/Platform engineer under Vega Solutions, building a platform to support multiple tenants. For the final months with these companies, June - August 2024, my employment moved under Tokka Labs.
- Designed and implemented a new system architecture (AWS -> AWS), reducing externally exposed service latency to below 500 ms (from ~1 s) and supporting multiple regions (provisioning a new region with full services takes less than 2 days).
- Introduced shared services such as SSO and Vault, eliminating the need for separate accounts per service.
- Created template integrations for instrumenting Node.js and Python services against Prometheus, Grafana and Jaeger, improving visibility and allowing engineers to investigate production issues in a matter of minutes.
- Refactored Helm templates and values files by introducing separation and hierarchy, simplifying them and easing their maintenance. This allowed application engineers to generate them automatically and reduced the time to introduce a new service from a couple of days to a few minutes.
- Pushed for a cloud-native architecture by provisioning databases inside k8s and advocating for k8s-only deployments. Succeeded in deploying small MySQL, PostgreSQL/TimescaleDB and Redis clusters in k8s.
Site Reliability Engineer (SRE)
Sept 2021 - Oct 2022 (1 yr 1 mo) Bytedance - Global E-commerce TikTok - Singapore
Joined the newly formed SRE team supporting the Global E-commerce function of TikTok.
- Implemented a Kubernetes sidecar measuring the availability SLI of more than 200 HTTP services.
- Analysed SLI-based alerts and pushed for adoption of an error-budget burn-rate approach.
- Designed and implemented the MVP version of an incident record tool aiming to reduce manual bookkeeping during on-call cases and to build a knowledge base from them.
Principal Integration Engineer | Site Reliability Engineer
Aug 2020 - Sep 2021 (1 yr 2 mos) OpenBet - Singapore
- Defined the initial process for an SRE rota across two teams spread over multiple locations.
- Updated existing Grafana dashboards and created new ones to visualise our production problems across the stack (HAProxy, applications, database).
Principal Software Engineer
Sep 2018 - Jul 2020 (1 yr 11 mos) OpenBet - Singapore
- Led a team of 3 engineers with the goal of hardening the funding flow and ensuring eWallet integrity, resulting in a drop from 1 incident per week to 1 per year.
- Enhanced the performance of slow database queries, bringing the number of production incidents due to human design error down to zero (0).
- Investigated more than 20 high-profile production issues and presented 4 sensitive incidents to C-level stakeholders.
- Migrated 80 million lottery bets to a renewed, more robust data model.
Senior Software Engineer
Oct 2015 - Aug 2018 (2 yrs 11 mos) OpenBet - Singapore
- Coordinated development of cross-functional features across teams in different locations with the goal of rolling out our solution to a new tier-1 customer.
- Owned the data migration into the new platform, including 100 million rows of transaction history and user activity, and completed it in under 3 hours.
- Designed distributed queries for an Informix database, serving the reporting needs of the Finance and Trading departments.
- Influenced the release process from code to production, increasing the delivery capacity of the team.
- Rewrote the Ansible deployment scripts, reducing release-introduced incidents to zero.
Software Engineer
Nov 2012 - Sep 2015 (2 yrs 11 mos) Rizariou 10, Chalandri, Greece
- Wrote and performed the extraction of one billion rows during uptime with no impact on operations.
- Integrated with the PayPal and Safecharge payment providers, tokenising credit card information and removing the need for PCI compliance.
EDUCATION
I have not completed tertiary education.
Studied Electrical Engineering and Computer Science at the University of Patras, Greece, from Sep 2003 to Jul 2009.