What platform teams should be doing
• Sep 15, 2020
First of all, what exactly is a platform anyway?
Enabling modern software development and true DevOps demands that development teams rely on infrastructure to deploy their containerized apps. In some organizations, this is handled by individual development teams, but more and more organizations are setting up platform teams to establish a platform consisting of a basic cloud-native stack built around Kubernetes.
Development teams can then use it to deploy, run, upgrade, decommission, secure, and monitor the applications behind mission-critical aspects of their business. For example, at our customer adidas, the platform team doesn't run the shop; rather, they build and run the platform that enables the deployment and scaling of the shop.
Building the platform from scratch
Many platform teams believe their primary goal is to build the cloud-native stack and don’t realize this is like reinventing the wheel. While understanding the entire stack helps in quickly identifying and resolving issues that impact uptime, the goal should be to attach the customer-specific part of the platform to standardized APIs without lock-in.
In many cases, the teams don’t actually have the resources to cover every aspect of ‘the platform’ while still delivering on what would really give their software development teams a competitive edge. It’s also important to remember that ‘the platform’ as we see it is a moving target, and not too long ago included mainframes in on-premises data centers.
Making a difference
Platform teams are uniquely positioned to accelerate the cloud-native journey and drive transformation by providing development teams with repeatable services. Development teams can consume these services easily and cost-effectively, saving the time, energy, and budget that project teams would otherwise have had to invest. As a result, project teams can accelerate their development, delivery, and innovation cycles for their digital products.
Where platform teams provide real value:
The platform team builds and enforces processes to ensure that security is consistent.
Working together with the security department, the platform team establishes governance rules and guidelines and makes sure that security concerns are addressed. It’s not just about implementing security; it’s about actively managing security in the right way.
This would lead to the following results:
- CVEs and security issues being addressed rapidly
- Everything scanned and monitored where necessary
- Alerts about threats being taken care of
- All of these tasks being coordinated with the right groups of people
Consider that you are running in three locations with 15 clusters each (45 clusters in total), plus automated launches of new clusters, and you start to see the complexity of this part alone.
The platform team sets up the CI/CD pipeline and can make it as complex as you want or need.
Setting up a highly automated CI/CD pipeline empowers the DevOps teams to test, deploy, verify, and improve their digital product. For example, our customers are constantly adapting the entire CI/CD pipeline so it works better with Kubernetes.
One of the most difficult aspects of the entire setup is the deployment pipeline. The CI/CD pipeline takes your code, compiles it if necessary, checks it for security compliance, automatically runs tests, analyzes the test results, and so on. The platform team automates the integration, testing, and deployment processes for fast and frequent updates across identical dev, staging, and production environments.
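The stage sequence described above can be sketched as follows. This is only a minimal illustration, not any particular CI/CD product: the `Stage` type, the stage names, and the placeholder checks are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True when the stage passes


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order; stop at the first failure and return a log."""
    log = []
    for stage in stages:
        passed = stage.run()
        log.append(f"{stage.name}: {'ok' if passed else 'FAILED'}")
        if not passed:
            break  # later stages (such as deploy) never run after a failure
    return log


# Hypothetical placeholder stages; real ones would call a compiler,
# a security scanner, a test runner, and a deployment tool.
stages = [
    Stage("compile", lambda: True),
    Stage("security-scan", lambda: True),
    Stage("tests", lambda: False),  # simulate a failing test suite
    Stage("deploy", lambda: True),
]
pipeline_log = run_pipeline(stages)
```

In this sketch the failing test stage stops the run, so the deploy stage is never reached. That fail-fast ordering is one common design choice; a real pipeline may also run independent stages in parallel.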
The platform team provides a universal and tightly integrated observability stack.
This supports the DevOps teams in identifying, analyzing, and resolving issues based on root-cause analysis, so they can continuously improve and deliver on a faster time-to-market. Achieving this is a manual and tedious process that includes making sure the right logs end up in the right place. For every threshold, it must be clear who should be alerted and exactly when.
To illustrate the complexity of this, one of the teams at our customer Deutsche Telekom built their entire monitoring with our provided tooling, and it still took almost three weeks for the initial setup and configuration. Work of this kind includes choosing what to log, what to graph, when to alert, and so on. In addition, it’s vital to keep in mind that this setup will keep evolving: deepening knowledge leads to more insights and better, more targeted alerts.
In terms of the tasks related to observability, a high-functioning platform team can be expected to implement and continuously improve:
- Logging of applications and business data for rapid discovery of issues
- Monitoring critical components and interfaces
- Alerting based on clever thresholds allowing for preventative actions
- Tracing to help identify root causes
- Supporting development teams in identifying problems and improving application resiliency
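As a rough sketch of threshold-based alerting, where it is clear who gets alerted and when, consider the following. The metric name, threshold values, and notification targets are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    metric: str
    threshold: float
    severity: str  # "warning" or "critical"
    notify: str    # who gets alerted


# Hypothetical rules, ordered so the most severe rule is checked first.
RULES: List[Rule] = [
    Rule("disk_used_pct", 95.0, "critical", "on-call"),
    Rule("disk_used_pct", 80.0, "warning", "team-channel"),
]


def route_alert(metric: str, value: float) -> Optional[Rule]:
    """Return the first rule whose threshold is met, or None."""
    for rule in RULES:
        if rule.metric == metric and value >= rule.threshold:
            return rule
    return None
```

Here the lower warning threshold notifies the team early enough to act preventatively, while the higher one pages whoever is on call. For example, `route_alert("disk_used_pct", 85.0)` matches the warning rule only.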
The platform team collaborates with other teams to build applications that make the best use of technology.
The bigger the organization, the more teams will work on the systems, which in turn means that more learning needs to happen across teams. This knowledge needs to be collected and shared systematically. The platform team needs to stay in the loop on team discussions about tools, tasks, what works, and, perhaps more importantly, what doesn’t work. The platform team can be thought of as bouncers: they make sure the good stuff stays and the bad stuff is booted out.
The platform team should build tooling that makes things easier.
They need to treat the development teams as their customers and build tooling that makes their lives easier. This, of course, is a never-ending story. Applying cloud-native principles to application development is essential to fully transition and benefit from the DevOps model. In order to understand and tackle the development teams’ challenges, the platform team must be available to collaborate and share. However, the impact of this collaboration and sharing is dependent on their knowledge of the cloud-native stack and broader ecosystem.
That being said, it also depends on the size of the platform team and on whether it is a Kubernetes platform team or a general platform team. We’ve collaborated with customers who have a split platform team: one team develops new things (e.g. tooling to allow for easier movement of an application from cluster A to cluster B or changing the CDN), and an operations team takes care of the day-to-day work, including upgrades, monitoring, and more.
To achieve top marks in this category, the platform team should tick these boxes:
- Training development teams in cloud-native principles and the technology stack so they can make the best use of it
- Educating teams and giving recommendations to change applications to ensure better scaling, improved reliability, and seamless upgrades while avoiding infrastructure issues
- Guaranteeing that automation is in place to provision clusters that are fully secured, versioned, and configured, with mandatory apps installed, and to reconcile them if unintended changes are made
- Encouraging risk-free experimentation by providing a playground for development teams so that they can easily try out new technologies that will help them advance their offerings
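The reconciliation idea in the list above can be sketched as a comparison of desired state against actual state. This is a simplified illustration only: the app names and versions are made up, and treating every unlisted app as an unintended change is an assumption of this example.

```python
from typing import Dict, List


def reconcile(desired: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Return the actions needed to bring `actual` back to `desired`.

    Both arguments map an app name to its installed version.
    """
    actions = []
    for app, version in sorted(desired.items()):
        if app not in actual:
            actions.append(f"install {app}@{version}")
        elif actual[app] != version:
            actions.append(f"set {app} {actual[app]} -> {version}")
    # Assumption: anything not in the desired state is an unintended change.
    for app in sorted(set(actual) - set(desired)):
        actions.append(f"remove {app}")
    return actions


# Hypothetical cluster state for illustration.
desired_apps = {"ingress": "1.2", "monitoring": "3.0"}
actual_apps = {"ingress": "1.1", "debug-shell": "0.1"}
planned_actions = reconcile(desired_apps, actual_apps)
```

Running such a comparison on a schedule, and applying the resulting actions, is what keeps many clusters aligned with one versioned definition. When the two states already match, the function returns an empty list.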
Allocating cost (or not)
The platform team must allocate costs to different teams and metrics.
In many cases, the platform team isn’t a profit or cost center, yet it still needs to allocate costs to different teams. In the cloud world, deciding in August on a fixed budget for the next year doesn’t really work. The challenge lies in identifying cost drivers and savings, and in allocating costs in a way that propels cloud-native adoption and fosters reuse and adoption of common technologies.
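One simple way to allocate costs is in proportion to usage. The sketch below assumes a single usage metric per team (CPU-hours, say); the team names and figures are invented for illustration, and real allocation models usually combine several cost drivers.

```python
from typing import Dict


def allocate_cost(total_cost: float, usage: Dict[str, float]) -> Dict[str, float]:
    """Split total_cost across teams in proportion to their usage."""
    total_usage = sum(usage.values())
    if total_usage == 0:
        # No usage recorded: split evenly rather than divide by zero.
        share = total_cost / len(usage) if usage else 0.0
        return {team: share for team in usage}
    return {
        team: total_cost * used / total_usage
        for team, used in usage.items()
    }


# Hypothetical monthly platform bill and per-team CPU-hours.
shares = allocate_cost(1000.0, {"shop": 300.0, "search": 100.0})
```

With these example figures, the shop team, which accounts for three quarters of the usage, carries three quarters of the bill.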
Cross charging and the red tape associated with budgeting can be a huge blocker for innovation. Over the years, we’ve seen companies strategically offering a platform (Kubernetes, Big Data, or otherwise) for free in an effort to drive wide-scale adoption.
Being ahead of the curve
The platform team should make their teams feel successful and strike the delicate balance between giving them what they want and educating them about where they need to go.
The most important thing is to make sure that customers make smart choices. The platform you are building isn’t about saying ‘Yes!’ to everything, but about making sure you get the right stuff right.
It’s also important to try stuff and innovate, which means there might be periods of downtime, and this is something that needs to be navigated. It’s not necessarily a bad thing if you embrace the journey. It’s not the old corporate process whereby you buy a product, install software, and then you’re done. The entire point is that you’re continuously innovating. The cloud-native journey is never complete. The technology landscape is continually changing: new technologies appear and others reach their end-of-life. The platform team must occupy the present (today’s demands) while peering into the future (What will the teams need in two years?) and provide support all along the way. You don’t need to do that if you have a finite business model. If you don’t need to innovate, this isn’t for you. However, if you do need to innovate, the platform offers immense value as long as you know that the right platform is never complete.