We’re going to deliver our first-ever OpenShift Commons Gathering live online with Q&A, and take the Gatherings to an even wider global audience.
We will still share all of our main stage sessions, including OpenShift 4 and Kubernetes Release Update and Road Map with Clayton Coleman, and all of our engineering project leads will still be delivering their “State of” Deep Dive talks. We’re working to enable our case study speakers and other guest speakers to share their talks as well.
Thank you for your enthusiasm for and participation in the OpenShift Commons community. We couldn’t do this without the ongoing support of our members, sponsors, speakers and staff.
This is the second part of a two-part blog series discussing software deployment options on OpenShift leveraging Helm and Operators. In the previous part, we discussed the general differences between the two technologies. In this blog, we will look specifically at the advantages and disadvantages of deploying a Helm chart directly via the helm tooling or via an Operator.
Part II – Helm charts and Helm-based Operators
Users that have already invested in Helm to package their software stack now have multiple options to deploy it on OpenShift: With the Operator SDK, Helm users have a supported option to build an Operator from their charts and use it to create instances of that chart by leveraging a Custom Resource Definition. With Helm v3 in Tech Preview and helm binaries shipped by Red Hat, users and software maintainers now also have the ability to use helm directly against an OpenShift cluster.
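As a quick sketch of the direct approach, installing a chart with the Helm v3 CLI into the current project looks like the following; the repository URL and chart name here are placeholders for whatever chart you maintain:
# add a chart repository and install a release into an OpenShift project
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm install my-db stable/cockroachdb --namespace my-project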
When comparing Helm charts and Helm-based Operators, in principle the same considerations as outlined in the first part of this series apply. The caveat is that, in the beginning, the Helm-based Operator does not possess advanced lifecycle capabilities beyond those of the standalone chart itself. There are, however, still advantages.
With Helm-based Operators, a Kubernetes-native interface exists for users on the cluster to create Helm releases. Using a Custom Resource, an instance of the chart can be created in a namespace and configured through the properties of the resource. The allowed properties and values are the same as the values.yaml of the chart, so users familiar with the chart don’t need to learn anything new. Since a Helm-based Operator internally uses the Helm libraries for rendering, any chart type and Helm feature is supported. Users of the Operator, however, don’t need the helm CLI to be installed; kubectl alone is enough to create an instance. Consider the following example:
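A minimal sketch of such a Custom Resource is shown below. The API group and version are assumptions (the SDK derives them from the project configuration), and the keys under .spec simply mirror the chart’s values.yaml:
apiVersion: charts.helm.k8s.io/v1alpha1
kind: Cockroachdb
metadata:
  name: example
  namespace: my-project
spec:
  # values copied straight from the chart's values.yaml
  statefulset:
    replicas: 3
  storage:
    persistentVolume:
      size: 10Gi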
The Custom Resource Cockroachdb is owned by the CockroachDB operator which has been created using the CockroachDB helm chart. The entire .spec section can essentially be a copy and paste from the values.yaml of the chart. Any value supported by the chart can be used here. Values that have a default are optional.
The Operator will transparently create a release in the same namespace where the Custom Resource is placed. Updates to this object cause the deployed Helm release to be updated automatically. This is in contrast to Helm v3, where this flow originates from the client side and installing and upgrading a release are two distinct commands.
While a Helm-based Operator does not magically extend the lifecycle management capabilities of Helm, it does provide a more native Kubernetes experience to end users, who interact with charts like any other Kubernetes resource.
Everything concerning an instance of a Helm chart is consolidated behind a Custom Resource. As such, access to those resources can be restricted via standard Kubernetes RBAC, so that only entitled users can deploy certain software, irrespective of their privileges in a given namespace. Through tools like the Operator Lifecycle Manager, a selection of vetted charts can be presented as a curated catalog of Helm-based Operators.
As the Helm-based Operator constantly re-applies releases, manual changes to chart resources are automatically rolled back and configuration drift is prevented. This is different from using helm directly, where deleted objects are not detected and modified chart resources are only merged, not rolled back, and even that merge does not happen until a user runs the helm utility again. Dealing with Kubernetes Custom Resources may also prove the easier choice in GitOps workflows where only kubectl tooling is present.
When installed through the Operator Lifecycle Manager, a Helm-based Operator can also leverage other Operators’ services by expressing a dependency on them. Manifests containing Custom Resources owned by other Operators can simply be made part of the chart. For example, the above manifest creating a CockroachDB instance could be shipped as part of another Helm chart that deploys an application that will write to this database.
When such charts are converted to an Operator as well, OLM will take care of installing the dependency automatically, whereas with Helm this is the responsibility of the user. This is also true for any dependencies expressed on the cluster itself, for example when the chart requires certain API or Kubernetes versions. These may even change over the lifetime of a release. While such out-of-band changes would go unnoticed by Helm itself, OLM constantly ensures that these requirements are fulfilled or clearly signals to the user when they are not.
On the flip side, a new Helm-based Operator has to be created, published to a catalog and updated on the cluster whenever a new version of the chart becomes available. In order to avoid the same security challenges Tiller had in Helm v2, the Operator should not run with global all-access privileges. Hence, the RBAC of the Operator is usually explicitly constrained by the maintainer according to the least-privilege principle.
The SDK attempts to generate the Operator’s RBAC rules automatically during conversion from a chart, but manual tweaks might be required. At a high level, the conversion scaffolds an Operator project around the existing chart:
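A sketch of that flow with the Operator SDK CLI of the time is shown below; the chart reference and image name are placeholders, and the exact commands and flags differ between operator-sdk releases:
# scaffold a Helm-based Operator project from an existing chart
operator-sdk new cockroachdb-operator --type=helm --helm-chart=stable/cockroachdb
# review and tighten the generated RBAC rules before building
vi cockroachdb-operator/deploy/role.yaml
# build and push the Operator image
operator-sdk build quay.io/example/cockroachdb-operator:v0.1.0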
Restricted RBAC now applies to Helm v3: chart maintainers need to document the required RBAC for the chart to be deployed since it can no longer be assumed that cluster-admin privileges exist through Tiller.
Quite recently the Operator-SDK moved to Helm v3. This is a transparent change for both users and chart maintainers. The SDK will automatically convert existing v2 releases to v3 once an updated Operator is installed.
In summary: end users that have existing Helm charts at hand can now deploy them on OpenShift using the helm tooling, assuming they have sufficient permissions. Software maintainers can now also ship their Helm charts unchanged to OpenShift users.
Using the Operator SDK, they get more control over the user and admin experience by converting their chart to an Operator. While the resulting Operator eventually deploys the chart in the same way the Helm binary would, it plays along very well with the rest of the cluster interaction using just kubectl, Kubernetes APIs and proper RBAC, which also drives GitOps workflows. On top of that there is transparent updating of installed releases and constant remediation of configuration drift.
Helm-based Operators also integrate well with other Operators through the use of OLM and its dependency model, avoiding re-inventing how certain software is deployed. Finally for ISVs, Helm-based Operators present an easy entry into the Operator ecosystem without any change required to the chart itself.
Some time ago, I published an article about the idea of self-hosting a load balancer within OpenShift to meet the various requirements for ingress traffic (master, routers, load balancer services). Since then, not much has changed with regards to the load balancing requirements for OpenShift. However, in the meantime, the concept of operators, as an approach to capture automated behavior within a cluster, has emerged. The release of OpenShift 4 fully embraces this new operator-first mentality.
Prompted by the needs of a customer, additional research on this topic was performed on the viability of deploying a self-hosted load balancer via an operator.
The requirement is relatively simple: an operator watches for the creation of services of type LoadBalancer and provides load balancing capabilities by allocating a load balancer in the same cluster for which the service is defined.
[Diagram: an application exposed through a LoadBalancer service, with a self-hosted load balancer operator programming a VIP across a set of daemons]
In the diagram above, an application is deployed with a LoadBalancer type of service. The hypothetical self-hosted load balancer operator is watching for those kinds of services and will react by instructing a set of daemons to expose the needed IP in an HA manner (creating effectively a Virtual IP [VIP]). Inbound connections to that VIP will be load balanced to the pods of our applications.
In OpenShift 4, by default, the router instances are fronted by a LoadBalancer type of service, so this approach would also be applicable to the routers.
In Kubernetes, a cloud provider plugin is normally in charge of implementing the load balancing capability of LoadBalancer services, by allocating a cloud-based load balancing solution. Such an operator as described previously would enable the ability to use LoadBalancer services in those deployments where a cloud provider is not available (e.g. bare metal).
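For reference, this is the kind of object such an operator watches for; a minimal LoadBalancer service sketch, with arbitrary names and ports:
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080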
Metallb
Metallb is a fantastic bare metal-targeted operator for powering LoadBalancer types of services.
In layer 2 mode, one of the nodes advertises the load balanced IP (VIP) via either the ARP (IPv4) or NDP (IPv6) protocol. This mode has several limitations: first, given a VIP, all the traffic for that VIP goes through a single node potentially limiting the bandwidth. The second limitation is a potentially very slow failover. In fact, Metallb relies on the Kubernetes control plane to detect the fact that a node is down before taking the action of moving the VIPs that were allocated to that node to other healthy nodes. Detecting unhealthy nodes is a notoriously slow operation in Kubernetes which can take several minutes (5-10 minutes, which can be decreased with the node-problem-detector DaemonSet).
In BGP mode, Metallb advertises the VIP to BGP-compliant network routers providing potentially multiple paths to route packets destined to that VIP. This greatly increases the bandwidth available for each VIP, but requires the ability to integrate Metallb with the router of the network in which it is deployed.
Based on my tests and conversations with the author, I found that the layer 2 mode of Metallb is not a practical solution for production scenarios as it is typically not acceptable to have failover-induced downtimes in the order of minutes. At the same time, I have found that the BGP mode instead would much better suit production scenarios, especially those that require very large throughput.
Back to the customer use case that spurred this research. They were not allowed to integrate with the network routers at the BGP level, and it was not acceptable to have a failover downtime of the order of minutes.
What we needed was a VIP managed with the VRRP protocol, so that it could fail over in a matter of milliseconds. This approach can easily be accomplished by configuring the keepalived service on a normal RHEL machine. For OpenShift, Red Hat has provided a supported container called ose-keepalived-ipfailover with keepalived functionality. Given all of these considerations, I decided to write an operator to orchestrate the creation of ipfailover pods.
Keepalived Operator
The keepalived operator works closely with OpenShift to enable self-servicing of two features: LoadBalancer and ExternalIP services.
It is possible to configure OpenShift to serve IPs for LoadBalancer services from a given CIDR in the absence of a cloud provider. As a prerequisite, OpenShift expects a network administrator to manage how traffic destined to those IPs reaches one of the nodes. Once reaching a node, OpenShift will make sure traffic is load balanced to one of the pods selected by that given service.
Similarly for ExternalIPs, additional configuration must be provided to specify the CIDR ranges users are allowed to pick ExternalIPs from. Once again, a network administrator must configure the network to send traffic destined to those IPs to one of the OpenShift nodes.
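Both knobs live in the cluster Network configuration. A sketch with arbitrarily chosen CIDRs is shown below: autoAssignCIDRs feeds LoadBalancer services, while policy.allowedCIDRs governs which ExternalIPs users may request.
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  externalIP:
    autoAssignCIDRs:
    - 192.168.130.0/24
    policy:
      allowedCIDRs:
      - 192.168.132.0/24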
The keepalived operator plays the role of the network administrator by automating the network configuration prerequisites.
When LoadBalancer services or services with ExternalIPs are created, the Keepalived operator will allocate the needed VIPs on a portion of the nodes by adding additional IPs to the nodes’ NICs. This will draw the traffic for those VIPs to the selected nodes.
VIPs are managed by a cluster of ipfailover pods via the VRRP protocol, so in case of a node failure, the failover of the VIP is relatively quick (in the order of hundreds of milliseconds).
Installation
To install the Keepalived operator in your own environment, consult the documentation within the GitHub repository.
Conclusions
The objective of this article was to provide an overview of options for self-hosted load balancers that can be implemented within OpenShift. This functionality may be required in those scenarios where a cloud provider is not available and there is a desire to enable self-servicing capability for inbound load balancers.
Neither of the examined approaches allows for the definition of a self-hosted load balancer for the master API endpoint. This remains an open challenge especially with the new OpenShift 4 installer. I would be interested in seeing potential solutions in this space.
In 2016, CoreOS coined the term Operator, starting a movement around a whole new type of managed application that achieves automated Day-2 operations with a user experience that feels native to Kubernetes.
Since then, the extension mechanisms that underpin the Operator pattern have evolved significantly. Custom Resource Definitions, an integral part of any Operator, became stable and gained validation and a versioning feature that includes conversion. Also, the experience the Kubernetes community gained when writing and running Operators accumulated critical mass. If you’ve attended any KubeCon in the past two years, you will have noticed the increased coverage and countless sessions focusing on Operators.
The popularity that Operators enjoy is based on the possibility of achieving a cloud-like service experience for almost any workload, available wherever your cluster runs. Thus, Operators strive to be the world’s best provider of their workload as-a-service.
But what actually does make for a good Operator? Certainly the user experience is an important pillar, but it is mostly defined through the interaction between the cluster user running kubectl and the Custom Resources that are defined by the Operator.
This is possible with Operators being extensions of the Kubernetes control plane. As such, they are global entities that run on your cluster for a potentially very long time, often with wide privileges. This has some implications that require forethought.
These best practices cover recommendations concerning the design of an Operator as well as behavioral best practices that come into play at runtime. They reflect a culmination of experience from the Kubernetes community writing Operators for a broad range of use cases, and in particular the observations the Operator Framework community made when developing tooling for writing and lifecycling Operators.
Some highlights include the following development practices:
One Operator per managed application
Multiple operators should be used for complex, multi-tier application stacks
A CRD can only be owned by a single Operator; shared CRDs should be owned by a separate Operator
One controller per custom resource definition
As well as many others.
With regard to best practices around runtime behavior, it’s noteworthy to point out these:
Do not self-register CRDs
Be capable of updating from a previous version of the Operator
Be capable of managing an Operand from an older Operator version
Use CRD conversion (webhooks) if you change API/CRDs
There are additional runtime practices (please, don’t run as root) in the document worth reading.
This list, being a community effort, is of course open to contributions and suggestions. Maybe you are planning to write an Operator in the near future and are wondering how a certain problem would be best solved using this pattern? Or you recently wrote an Operator and want to share some of your own learnings as your users started to adopt this tool? Let us know via GitHub issues or file a PR with your suggestions and improvements. Finally, if you want to publish your Operator or use an existing one, check out OperatorHub.io.
A common request from OpenShift users has long been to raise the number of pods per node. OpenShift has set the limit to 250 starting with the first Kubernetes-based release (3.0) through 4.2, but with very powerful nodes, it can support many more than that.
This blog describes the work we did to achieve 500 pods per node, starting from initial testing, bug fixes and other changes we needed to make, the testing we performed to verify function, and what you need to do if you’d like to try this.
Background
Computer systems have continued unabated their relentless progress in computation power, memory, storage capacity, and I/O bandwidth. Systems that not long ago were exotic supercomputers are now dwarfed in their capability (if not physical size and power consumption) by very modest servers. Not surprisingly, one of the most frequent questions we’ve received from customers over the years is “can we run more than 250 pods (the — until now — tested maximum) per node?”. Today we’re happy to announce that the answer is yes!
In this blog, I’m going to discuss the changes to OpenShift, the testing process to verify our ability to run much larger numbers of pods, and what you need to do if you want to increase your pod density.
Goals
Our goal with this project was to run 500 pods per node on a cluster with a reasonably large number of nodes. We also considered it important that these pods actually do something; pausepods, while convenient for testing, aren’t a workload that most people are interested in running on their clusters. At the same time, we recognized that the incredible variety of workloads in the rich OpenShift ecosystem would be impractical to model, so we wanted a simple workload that’s easy to understand and measure. We’ll discuss this workload below, which you can clone and experiment with.
Initial Testing
Early experiments on OpenShift 4.2 identified issues with communication between the control plane (in particular, the kube-apiserver) and the kubelet when attempting to run nodes with a large number of pods. Using a client/server builder application replicated to produce the desired number of pods, we observed that the apiserver was not getting timely updates from the kubelet when pods came into existence, resulting in problems such as networking not coming up for pods and the pods (when they required networking) failing as a result.
Our test was to run many replicas of the application to reproduce the requisite number of pods. We observed that up to about 380 pods per node, the applications would start running normally. Beyond that, we would see some pods remain in Pending state, and some start but then terminate. When a pod that is expected to keep running terminates like this, it is because the process inside the pod decided to exit. There were no messages in the logs identifying particular problems; the pods appeared to be starting up correctly, but the code within the pods was failing, resulting in the pods terminating. Studying the application, the most likely reason for the terminations was that the client pod was unable to connect to the server, indicating that it did not have a network available.
As an aside, we observed that the kubelet declared the pods to be Running very quickly; the delay was in the apiserver realizing this. Again, there were no log messages in either the kubelet or the apiserver logs indicating any issue. The network team requested that we collect logs from the openshift-sdn that manages pod networking; that too showed nothing out of the ordinary. Indeed, even using host networking didn’t help.
To simplify the test, we wrote a much simpler client/server deployment, where the client would simply attempt to connect to the server until it succeeded rather than failing, using only two nodes. The client pods logged the number of connection attempts made and the elapsed time before success. We ran 500 replicas of this deployment, and found that up to about 450 pods total (225 per node), the pods started up and quickly went into Running state. Between 450 and 620, the rate of pods transitioning to Running state slowed down, and actually stalled out for about 10 minutes, after which the backlog cleared at a rate of about 3 pods/minute until eventually (after a few more hours) all of the pods were running. This supported the hypothesis that there was nothing really wrong with the kubelet; the client pods were able to start running, but most likely timed out connecting to the server, and did not retry.
On the hypothesis that the issue was rate of pod creation, we tried adding sleep 30 between creating each client-server pair. This staved off the point at which pod creation slowed down to about 375 pods/node, but eventually the same problem happened. We tried another experiment placing all of the pods within one namespace, which succeeded — all of the pods quickly started and ran correctly. As a final experiment, we used pause pods (which do not use the network) with separate namespaces, and hit the same problem, starting at around 450 pods (225/node). So clearly this was a function of the number of namespaces, not the number of pods; we had established that it was possible to run 500 pods per node, but without being able to use multiple namespaces, we couldn’t declare success.
Fixing the problem
By this point, it was quite clear that the issue was that the kubelet was unable to communicate at a fast enough rate with the apiserver. When that happens, the most obvious issue is the kubelet throttling transmission to the apiserver per the kubeAPIQPS and kubeAPIBurst kubelet parameters. These are enforced by Go rate limiting. The defaults that we inherit from upstream Kubernetes are 5 and 10, respectively. This allows the kubelet to send at most 5 queries to the apiserver per second, with a short-term burst rate of 10. It’s easy to see how under a heavy load that the kubelet may need a greater bandwidth to the apiserver. In particular, each namespace requires a certain number of secrets, which have to be retrieved from the apiserver via queries, eating into those limits. Additional user-defined secrets and configmaps only increase the pressure on this limit.
The throttling is used in order to protect the apiserver from inadvertent overload by the kubelet, but this mechanism is a very broad brush. However, rewriting it would be a major architectural change that we didn’t consider to be warranted. Therefore, the goal was to identify the lowest safe settings for KubeAPIQPS and KubeAPIBurst.
Experimenting with different settings, we found that setting QPS/burst to 25/50 worked fine for 2000 pods on 3 nodes with a reasonable number of secrets and configmaps, but 15/30 didn’t.
The difficulty in tracking this down is that there’s nothing in either the logs or Prometheus metrics identifying this. Throttling is reported by the kubelet at verbosity 4 (v=4 in the kubelet arguments), but the default verbosity, both upstream and within OpenShift, is 3. We didn’t want to change this globally. Throttling had been seen as a temporary, harmless condition, hence its being relegated to a low verbosity level. However, with our experiments frequently showing throttling of 30 seconds or more, and this leading to pod failures, it clearly was not harmless. Therefore, I opened https://github.com/kubernetes/kubernetes/pull/80649, which eventually merged, and then pulled it into OpenShift in time for OpenShift 4.3. While this alone would not solve throttling, it greatly simplifies diagnosis. Adding throttling metrics to Prometheus would be desirable, but that is a longer-term project.
The next question was what to set the kubeAPIQPS and kubeAPIBurst values to. It was clear that 5/10 wouldn’t be suitable for larger numbers of pods. We decided that we wanted some safety margin above the tested 25/50, hence settled on 50/100 following node scaling testing on OpenShift 4.2 with these parameters set.
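These values became the 4.3 defaults. On a cluster that still ships the older defaults, or to experiment with different settings, a KubeletConfig along the following lines could be applied to a labeled machine config pool; the pool label and object name here are assumptions:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: custom-kube-api-qps
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: high-qps
  kubeletConfig:
    kubeAPIQPS: 50
    kubeAPIBurst: 100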
Another piece of the puzzle was the watch-based configmap and secret manager for the kubelet. This allows the kubelet to set watches on secrets and configmaps supplied by the apiserver which, for items that don’t change very often, is much more efficient for the apiserver to handle, as it caches the watched objects locally. This change, which didn’t make OpenShift 4.2, enables the apiserver to handle a heavier load of secrets and configmaps, easing the potential burden of the higher burst/QPS values. If you’re interested in the details of the underlying change in Go 1.12, they are described under net/http in the Go 1.12 release notes.
To summarize, we made the following changes between OpenShift 4.2 and 4.3 to set the stage for scaling up the number of pods:
Change the default kubeAPIQPS from 5 to 50.
Change the default kubeAPIBurst from 10 to 100.
Change the default configMapAndSecretChangeDetectionStrategy from Cache to Watch.
Testing 500 pods/node
The stage was now set to actually test 500 pods/node as part of OpenShift 4.3 scaling testing. The questions we had to decide were:
What hardware do we want to use?
What OpenShift configuration changes would be needed?
How many nodes do we want to test?
What kind of workload do we want to run?
Hardware
A lot of pods, particularly with many namespaces, can put considerable stress on the control plane and the monitoring infrastructure. Therefore, we deemed it essential to use large nodes for the control plane and monitoring infrastructure. As we expected the monitoring database to have very large memory requirements, we placed (as is our standard practice) the monitoring stack on a separate set of infrastructure nodes rather than sharing that with the worker nodes. We settled on the following, using AWS as our underlying platform:
Master Nodes
The master nodes were r5.4xlarge instances. r5 instances are memory-optimized, to allow for large apiserver and etcd processes. The instance type consists of:
CPU: 16 cores, Intel Xeon Platinum 8175
Memory: 128 GB
Storage: EBS (no local storage), 4.75 Gbps
Network: up to 10 Gbps.
Infrastructure Nodes
The infrastructure nodes were m5.12xlarge instances. m5 instances are general purpose. The instance type consists of:
CPU: 48 cores, Intel Xeon Platinum 8175
Memory: 192 GB
Storage: EBS (no local storage), up to 9.5 Gbps
Network: 10 Gbps
Worker Nodes
The worker nodes were m5.2xlarge. This allows us to run quite a few reasonably simple pods, but typical application workloads would be heavier (and customers are interested in very big nodes!). The instance type consists of:
CPU: 8 cores, Intel Xeon Platinum 8175
Memory: 16 GB
Storage: EBS (no local storage), 4.75 Gbps
Network: up to 10 Gbps
Configuration Changes
The OpenShift default for maximum pods per node is 250. Worker nodes have to contain parts of the control infrastructure in addition to user pods; there are about 10 such control pods per node. Therefore, to ensure that we could definitely achieve 500 worker pods per node, we elected to set maxPods to 520 using a custom KubeletConfig using the procedure described here
This requires an additional configuration change. Every pod on a node requires a distinct IP address allocated out of the host IP range. By default, when creating a cluster, the hostPrefix is set to 23 (i.e. a /23 network), allowing for up to 510 addresses — not quite enough. So clearly we had to set hostPrefix to 22 for this test in the install-config.yaml used to install the cluster.
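To illustrate both changes, here is a sketch of a KubeletConfig and the relevant install-config.yaml excerpt. The pool label and object names are assumptions, and the clusterNetwork CIDR shown is the installer default:
# KubeletConfig raising the pod limit for nodes in a labeled machine config pool
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods
  kubeletConfig:
    maxPods: 520

# install-config.yaml excerpt widening the per-node address range
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 22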
In the end, no other configuration changes from stock 4.3 were needed. Note that if you want to run 500 pods per node, you’ll need to make these two changes yourself, as we did not change the defaults.
How many nodes do we want to test?
This is a function of how many large nodes we believe customers will want to run in a cluster. We settled on 100 for this test.
What kind of workload do we want to run?
Picking the workload to run is a matter of striking a balance between the test doing something interesting and being easy to set up and run. We settled on a simple client-server workload in which the client sends blocks of data to the server which the server returns, all at a pre-defined rate. We elected to start at 25 nodes, then follow up with 50 and 100, and use varying numbers of namespaces and pods per namespace. Large numbers of namespaces typically stress the control plane more than the worker nodes, but with a larger number of pods per worker node, we didn’t want to discount the possibility that that would impact the test results.
Test Results
We used ClusterBuster to generate the necessary namespaces and deployments for this test to run.
ClusterBuster is a simple tool that I wrote to generate a specified number of namespaces, and secrets and deployments within those namespaces. There are two main types of deployments that this tool generates: pausepod, and client-server data exchange. Each namespace can have a specified number of deployments, each of which can have a defined number of replicas. The client-server can additionally specify multiple client containers per pod, but we didn’t use this feature. The tool uses oc create and oc apply to create objects; we created 5 objects per oc apply, running two processes concurrently. This allows the test to proceed more quickly, but we’ve found that it also creates more stress on the cluster. ClusterBuster labels all objects it creates with a known label that makes it easy to clean up everything with
oc delete ns -l clusterbuster
In client-server mode, the clients can be configured to exchange data at a fixed rate for either a fixed number of bytes or for a fixed amount of time. We used both here in different tests.
We ran tests on 25, 50, and 100 nodes, all of which were successful; the “highest” test (i.e. greatest number of namespaces) in each sequence was:
25 node pausepod: 12500 namespaces each containing one pod.
25 node client-server: 2500 namespaces each containing one client-server deployment consisting of four replica client pods and one server (5 pods/deployment). Data exchange was at 100 KB/sec (in each direction) per client, total 10 MB in each direction per client.
50 node pausepod: 12500 namespaces * 2 pods.
50 node client-server: 5000 namespaces, one deployment with 4 clients + server, 100 KB/sec, 10 MB total.
100 node client-server: 5000 namespaces, one deployment with 9 clients + server, 100 KB/sec for 28800 seconds. In addition, we created and mounted 10 secrets per namespace.
Test Data
I’m going to cover what we found doing the 100 node test here, as we didn’t observe anything during the smaller tests that was markedly different (scaled appropriately).
We collected a variety of data during the test runs, including Prometheus metrics, and used another utility from my OpenShift 4 tools package (monitor-pod-status) along with Grafana dashboards to monitor cluster activity. monitor-pod-status strictly speaking duplicates what we can get from Grafana, but it presents the data in an easy-to-read textual format. Finally, I used yet another tool, clusterbuster-connstat, to retrieve log data left by the client pods and analyze the rate of data flow.
Test Timings
The time required to create and tear down the test infrastructure is a measure of how fast the API and nodes can perform operations. This test was run with a relatively low parallelism factor, and operations didn’t lag significantly.
Operation and approximate time (minutes):
Create namespaces: 4
Create secrets: 43
Create deployments: 34
Exchange data: 480
Delete pods and namespaces: 29
One interesting observation during the pod creation time is that pods were being created at about 1600/minute, and at any given time, there were about 270 pods in ContainerCreating state. This indicates that the process of pod creation took about 10 seconds per pod throughout the run.
Networking
The expected total rate of data exchange is 2 * Nclients * XferRate. In this case, of the 50,000 total pods, 45,000 were clients. At .1 MB/sec, this would yield an expected aggregate throughput of 9000 MB/sec (72,000 Mb/sec). The aggregate expected transfer rate per node would therefore be expected to be 720 Mb/sec, but as we’d expect on average about 1% of the clients to be colocated with the server, the actual average network traffic would be slightly less. In addition, we’d expect variation due to the number of server pods that happened to be located per node; in the configuration we used, each server pod handles 9x the data each client pod handles.
I inspected 5 nodes at random; each node showed a transfer rate during the steady state data transfer of between 650 and 780 Mbit/sec, with no noticeable peaks or valleys, which is as expected. This is nowhere near the 10 Gbps limit of the worker nodes we used, but the goal of this test was not to stress the network.
Quis custodiet ipsos custodes?
With apologies to linguistic purists, the few events we observed were related to Prometheus. During the tests, one of the Prometheus replicas typically used about 130 Gbytes of RAM, but a few times the memory usage spiked toward 300 Gbytes before ramping down over a period of several hours. In two cases, Prometheus crashed; while we don’t have records of why, we believe it likely that it ran out of memory. The high resource consumption of Prometheus reinforces the importance of robust monitoring infrastructure nodes!
Future Work
We have barely scratched the surface of pod density scaling with this investigation. There are many other things we want to look at, over time:
Even more pods: as systems grow even more powerful, we can look at even greater pod densities.
Adding CPU and memory requests: investigate the interaction between CPU/memory requests and large numbers of pods.
Investigate the interaction with other API objects: raw pods per node is only part of what stresses the control plane and worker nodes. Our synthetic test was very simple, and real-world applications will do a lot more. There are a lot of other dimensions we can investigate:
Number of configmaps/secrets: very large numbers of these objects in combination with many pods can stress the QPS to the apiserver, in addition to the runtime and the Linux kernel (as each of these objects must be mounted as a filesystem into the pods).
Many containers per pod: this stresses the container runtime.
Probes: these likewise could stress the container runtime.
More workloads: the synthetic workload we used is easy to analyze, but is hardly representative of every use people make of OpenShift. What would you like to see us focus on? Leave a comment with your suggestions.
More nodes: 100 nodes is a starting point, but we’re surely going to want to go higher. We’d also like to determine whether there’s a curve for maximum number of pods per node vs. number of nodes.
Bare metal: typical users of large nodes are running on bare metal, not virtual instances in the cloud.
Credits
I’d like to thank the many members of the OpenShift Perf/Scale, Node, and QE teams who worked with me on this, including (in alphabetical order) Ashish Kamra, Joe Talerico, Ravi Elluri, Ryan Phillips, Seth Jennings, and Walid Abouhamad.
Recently in Milan, Red Hat presented an Italian edition of the OpenShift Commons Gathering. The event brings together experts from all over the world to discuss open source projects that support the OpenShift and Kubernetes ecosystem, as well as to explore best practices for cloud-native application development and getting business value from container technologies at scale. Presenting in Milan were three organizations leading the way: SIA, Poste Italiane and Amadeus.*
Amadeus’ OpenShift infrastructure
Amadeus Software Engineer Salvatore Dario Minonne spoke of the five-year relationship between Red Hat and Amadeus. “In the fall of 2014 we got to know the Red Hat engineering team in Raleigh in the United States. Our teams got their hands on the first versions of OpenShift and started a fruitful collaboration with Red Hat that has become a true engineering partnership. We continue to contribute our use cases to the community, to help drive open source innovation that meets our real-world needs,” said Minonne.
“Not all Amadeus applications are in the cloud,” added Minonne, underlining that their infrastructure is a hybrid of public and private cloud, and there is a careful consideration when migrating workloads to the cloud.
“At Amadeus,” said Minonne, “We are looking closely into multicloud, not just to avoid vendor lock-in, but also to mitigate the risks of impact if something goes wrong with a provider, and to give us the ability to spin down a particular cluster if it is buggy or there is a security issue.”
Minonne talked about the change in mindset required with a move to hybrid cloud. “Software development and management practices must also change, to mitigate compatibility issues that might occur with applications not originally designed for the cloud. In fact, many Kubernetes resources have been created precisely to reduce these incompatibilities.”
Poste Italiane
Pierluigi Sforza and Paolo Gigante, Senior Solutions Architects working in Poste Italiane’s IT Technological Architecture Group, spoke to the OpenShift Milan Commons audience about how Poste Italiane has accelerated its digital transformation efforts in the last year.
Sforza emphasised how they are embracing a DevOps philosophy along with their increased use of open source, which has involved building a closer relationship with Red Hat. Gigante added that the rise in open source at Poste Italiane “reflects the current technology landscape, where rapidly evolving competition, increased digitalization and changing customer expectations require faster time to market, which is one area where proprietary technologies from traditional vendors often fall short.”
Sforza added that, “the need for agility and speed of delivery sometimes necessitates taking a risk in trying less mature technologies, starting by experimenting with the open source community and then relying on trusted vendors, such as Red Hat, to have the levels of security and stability needed to go into production.”
Poste Italiane has been adapting its legacy infrastructures and processes to the new world of DevOps and containerization. This laid the foundation for new projects, such as an adaptation it has made to its financial platform in line with the PSD2 directive. “With OpenShift, we were able to create a reliable, high performance platform perfectly adapted to our needs, in order to meet another of our major business goals: to be at the forefront of innovation,” said Sforza.
The organization’s infrastructure modernization implicates the migration of some workloads off the mainframe. Sforza explained: “Where it makes sense, we are aiming to move monolithic workloads to a containerized infrastructure using microservices, which is more cost effective and gives us greater scalability. This will help us manage applications more efficiently and provide a more slick end-user experience, especially given the rise in customers using our digital channels.”
SIA
SIA is headquartered in Milan and operates in 50 countries. SIA is a European leader in the design, construction and management of technology infrastructures and services for financial institutions, central banks, public companies and government entities, focusing on payments, e-money, network services and capital markets.
Nicola Nicolotti, a Senior System Administrator at SIA, explained how they are supporting customers with the move to containers: “the traditional waterfall approach is often not compatible with the adoption of new technologies, which require a deeper level of integration. However, many traditionally structured organizations face multiple difficulties when adopting new technologies and putting changes into practice, so we aim to help them understand what those challenges might be as well as the corresponding solutions that can help them meet their business objectives.”
Matteo Combi, SIA Solution Architect, emphasised the importance of collaboration when working with the open source community – not just via software development. “When we participated in Red Hat Summit in Boston, we recognised the value in sharing diverse experiences at an international level. Being able to compare different scenarios enables us to develop new ideas to improve our use of the technology itself as well as how it can be applied to meet our business goals.”
* Customer insights in this post originally appeared in Italian as part of a special feature in ImpresaCity magazine, issue #33, October 2019, available to read here.
When OCS 4.2 GA was released, I was thrilled to finally test and deploy it in my lab. I read the documentation and saw that only vSphere and AWS installations are currently supported. My lab is installed in an RHV environment following the UPI bare metal documentation so, at first, I was a bit disappointed. Then I realized that it could be an interesting challenge to find a different way to use it, and I found one during my late-night tinkering. All the following procedures are unsupported.
Prerequisites
An OCP 4.2.x cluster installed (the current latest version is 4.2.14)
The possibility to create new local disks inside the VMs (if you are using a virtualized environment) or servers with disks that can be used
Issues
The official OCS 4.2 installation in vSphere requires a minimum of 3 nodes, each using a 2TB volume (a PVC using the default “thin” storage class) for the OSD volumes, plus 10GB for each mon pod (3 in total, always using a PVC). It also requires 16 CPUs and 64GB RAM per node.
Use case scenario
bare-metal installations
vSphere cluster
without a shared datastore
you don’t want to use the vSphere dynamic provisioner
without enough space in the datastore
without enough RAM or CPU
other virtualized installation (for example RHV which is the one used for this article)
Challenges
create a PVC using local disks
change the default 2TB volumes size
define a different StorageClass (without using a default one) for the mon PODs and the OSD volumes
define different limits and requests per component
Solutions
use the local storage operator
create the ocs-storagecluster resource using a YAML file instead of the new interface. That also means adding the labels to the worker nodes that are going to be used by OCS
Procedures
Add the disks to the VMs: 2 disks for each node, a 10GB disk for the mon pod and a 100GB disk for the OSD volume.
Repeat for the other 2 nodes
The disks MUST be in the same order and have the same device name in all the nodes. For example, /dev/sdb MUST be the 10GB disk and /dev/sdc the 100GB disk in all the nodes.
[root@utility ~]# for i in {1..3} ; do ssh core@worker-${i}.ocp42.ssa.mbu.labs.redhat.com lsblk | egrep "^sdb.*|sdc.*$" ; done
sdb 8:16 0 10G 0 disk
sdc 8:32 0 100G 0 disk
sdb 8:16 0 10G 0 disk
sdc 8:32 0 100G 0 disk
sdb 8:16 0 10G 0 disk
sdc 8:32 0 100G 0 disk
[root@utility ~]#
Install the Local Storage Operator. Here the official documentation
Then install the operator from the OperatorHub.
Wait for the operator pod to be up and running
[root@utility ~]# oc get pod -n local-storage
NAME READY STATUS RESTARTS AGE
local-storage-operator-ccbb59b45-nn7ww 1/1 Running 0 57s
[root@utility ~]#
The Local Storage Operator works using the devices as reference. The LocalVolume resource scans the nodes which match the selector and creates a StorageClass for the device.
Do not use different StorageClass names for the same device.
We need the Filesystem type for these volumes. Prepare the LocalVolume YAML file to create the resource for the mon pods, which use /dev/sdb.
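A sketch of such a LocalVolume, covering both devices, is shown below. The namespace and node selector label follow the Local Storage and OCS documentation of the time and may need adapting to your environment:
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: local-disks
  namespace: local-storage
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: cluster.ocs.openshift.io/openshift-storage
        operator: In
        values:
        - ""
  storageClassDevices:
  # Filesystem volumes on /dev/sdb back the mon PVCs (local-sc)
  - storageClassName: local-sc
    volumeMode: Filesystem
    devicePaths:
    - /dev/sdb
  # Block volumes on /dev/sdc back the OSD PVCs (localblock-sc)
  - storageClassName: localblock-sc
    volumeMode: Block
    devicePaths:
    - /dev/sdc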
In the ocs-cluster-service.yaml sketched below, notice the “monPVCTemplate” section, in which we define the StorageClass “local-sc”, and the “storageDeviceSets” section with the different storage sizes and the StorageClass “localblock-sc” used by the OSD volumes.
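This is a sketch of that ocs-cluster-service.yaml, sized for the 10GB/100GB disks used here; treat the exact structure as a starting point, since the StorageCluster API has evolved across OCS releases:
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  manageNodes: false
  monPVCTemplate:
    spec:
      storageClassName: local-sc
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
  storageDeviceSets:
  - name: ocs-deviceset
    count: 1
    replica: 3
    portable: false
    placement: {}
    resources: {}
    dataPVCTemplate:
      spec:
        storageClassName: localblock-sc
        accessModes:
        - ReadWriteOnce
        volumeMode: Block
        resources:
          requests:
            storage: 100Gi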
Now we can create the resource
[root@utility ~]# oc create -f ocs-cluster-service.yaml
storagecluster.ocs.openshift.io/ocs-storagecluster created
[root@utility ~]#
During the creation of the resources, we can see how the PVCs created are bound to the Local Storage PVs
[root@utility ~]# oc get pvc -n openshift-storage
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
rook-ceph-mon-a Bound local-pv-68faed78 10Gi RWO local-sc 13s
rook-ceph-mon-b Bound local-pv-b640422f 10Gi RWO local-sc 8s
rook-ceph-mon-c Bound local-pv-780afdd6 10Gi RWO local-sc 3s
[root@utility ~]# oc get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
local-pv-5c4e718c 100Gi RWO Delete Available localblock-sc 28m
local-pv-68faed78 10Gi RWO Delete Bound openshift-storage/rook-ceph-mon-a local-sc 34m
local-pv-6a58375e 100Gi RWO Delete Available localblock-sc 28m
local-pv-780afdd6 10Gi RWO Delete Bound openshift-storage/rook-ceph-mon-c local-sc 34m
local-pv-b640422f 10Gi RWO Delete Bound openshift-storage/rook-ceph-mon-b local-sc 33m
local-pv-d6db37fd 100Gi RWO Delete Available localblock-sc 28m
[root@utility ~]#
And now we can see the OSD PVCs and the PVs bound to them
Our installation is now complete and OCS is fully operational.
Now we can browse the NooBaa management console (for now it only works in Chrome) and create a new user to test the S3 object storage.
Test it with your preferred S3 client (I use Cyberduck on the Windows desktop I’m using to write this article).
Create something to check that you can write.
It works!
Set the ocs-storagecluster-cephfs StorageClass as the default one
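A sketch of the commands involved: the first annotates the class as default, and the second (optional) asks the image registry operator to claim a PVC, which should produce the image-registry-storage claim shown below.
oc patch storageclass ocs-storagecluster-cephfs -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
oc patch configs.imageregistry.operator.openshift.io cluster --type merge -p '{"spec":{"storage":{"pvc":{"claim":""}}}}'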
Check the PVC created and wait for the new pod to be up and running
[root@utility ~]# oc get pvc -n openshift-image-registry
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
image-registry-storage Bound pvc-ba4a07c1-3d86-11ea-ad40-001a4a1601e7 100Gi RWX ocs-storagecluster-cephfs 12s
[root@utility ~]# oc get pod -n openshift-image-registry
NAME READY STATUS RESTARTS AGE
cluster-image-registry-operator-655fb7779f-pn7ms 2/2 Running 0 36h
image-registry-5bdf96556-98jbk 1/1 Running 0 105s
node-ca-9gbxg 1/1 Running 1 35h
node-ca-fzcrm 1/1 Running 0 35h
node-ca-gr928 1/1 Running 1 35h
node-ca-jkfzf 1/1 Running 1 35h
node-ca-knlcj 1/1 Running 0 35h
node-ca-mb6zh 1/1 Running 0 35h
[root@utility ~]#
Test it in a new project called test
[root@utility ~]# oc new-project test
Now using project "test" on server "https://api.ocp42.ssa.mbu.labs.redhat.com:6443".
You can add applications to this project with the 'new-app' command. For example, try:
oc new-app django-psql-example
to build a new example application in Python. Or use kubectl to deploy a simple Kubernetes application:
kubectl create deployment hello-node --image=gcr.io/hello-minikube-zero-install/hello-node
[root@utility ~]# podman pull alpine
Trying to pull docker.io/library/alpine...
Getting image source signatures
Copying blob c9b1b535fdd9 done
Copying config e7d92cdc71 done
Writing manifest to image destination
Storing signatures
e7d92cdc71feacf90708cb59182d0df1b911f8ae022d29e8e95d75ca6a99776a
[root@utility ~]# podman login -u $(oc whoami) -p $(oc whoami -t) $REGISTRY_URL --tls-verify=false
Login Succeeded!
[root@utility ~]# podman tag alpine $REGISTRY_URL/test/alpine
[root@utility ~]# podman push $REGISTRY_URL/test/alpine --tls-verify=false
Getting image source signatures
Copying blob 5216338b40a7 done
Copying config e7d92cdc71 done
Writing manifest to image destination
Storing signatures
[root@utility ~]# oc get is -n test
NAME IMAGE REPOSITORY TAGS UPDATED
alpine default-route-openshift-image-registry.apps.ocp42.ssa.mbu.labs.redhat.com/test/alpine latest 3 minutes ago
[root@utility ~]#
The registry works!
Other Scenario
If your cluster is deployed in vSphere and uses the default “thin” StorageClass but your datastore isn’t big enough, you can start from the OCS installation.
When it comes to creating the OCS Cluster Service, create a YAML file with your desired sizes and without storageClassName (it will use the default one).
You can also remove the “monPVCTemplate” if you are not interested in changing the storage size.
[root@utility ~]# oc apply -f ocs-cluster-service-modified.yaml
Warning: oc apply should be used on resource created by either oc create --save-config or oc apply
storagecluster.ocs.openshift.io/ocs-storagecluster configured
We have to wait for the operator to read the new configs and apply them
Now you can enjoy your brand-new OCS 4.2 on OCP 4.2.x.
Things have changed compared to OCS 3.x, for example the use of PVCs instead of directly using the attached disks, and for now there are a lot of limitations for sustainability and supportability reasons.
We will wait for a fully supported installation for these scenarios.
UPDATES
The cluster used to write this article has been updated from 4.2.14 to 4.2.16 and then from 4.2.16 to 4.3.0.
The current OCS setup is still working.
Our very own Burr Sutter has produced a video explaining how Kubernetes and OpenShift relate to one another, and why OpenShift is Kubernetes, not a fork thereof.
With Role Based Access Control, we have an OpenShift-wide tool to determine the actions (or verbs) each user can perform against each object in the API. For that, rules are defined combining resources with the API verbs into sets called roles, and with role bindings we attribute those rules to users. Once we have those Users or Service Accounts, we can attribute them to particular resources to give them access to those actions. For example, a Pod may be able to delete a ConfigMap, but not a Secret, when running under a specific Service Account. That’s an upper-level control plane feature that doesn’t take into account the underlying node permission model, meaning the Unix permission model and some of its newer kernel accouterments.
So, the container platform is protected with good RBAC practices for the objects it creates, but the node may not be. That is, a Pod may not be able to delete an object in etcd using the API because it’s restricted by RBAC, but it may still be able to delete important files on the system and even stop the kubelet if programmed to do so. To prevent this scenario, SCCs (Security Context Constraints) come to the rescue.
Linux Processes and Privileges
Before going into deep waters with SCCs, let’s go back in time and take a look at some of the key concepts Linux brings to us regarding processes. A good start is entering the command man capabilities in a Linux terminal. That manual page contains very important fundamentals for understanding the goal behind SCCs.
The first important distinction that we need to make is between privileged and unprivileged processes. While privileged processes have user ID 0, being the superuser or root, unprivileged processes have non-zero user IDs. Privileged processes bypass kernel permission checks. That means the actions a process or thread can perform on operating system objects such as files, directories, symbolic links, pseudo filesystems (procfs, cgroupfs, sysfs, etc.) and even memory objects such as shared memory regions, pipes and sockets are unlimited and not verified by the system. In other words, the kernel won’t check user, group or others permissions (taken from the Unix permission model UGO: user, group and others) to grant access to that specific object on behalf of the process.
If we look at the list of running processes on a Linux system using the command ps -u root, we will find very important processes such as systemd, which has PID 1 and is responsible for bootstrapping the user space in most distributions and initializing most common services. For that, it needs unrestricted access to the system.
Unprivileged processes, though, are subject to full permission checking based on process credentials (user ID, group ID, supplementary group list, etc.). The kernel will make an iterative check across each category (user, group and others), trying to match the user and group credentials of the running process against the target object’s permissions in order to grant or deny access. Keep in mind that this is not the service account in OpenShift; it is the system user that runs the container process, if we want to speak containers.
With kernel 2.2, the concept of capabilities was introduced. In order to have more flexibility and enable the use of superuser or root features in a granular way, those super privileges were broken into small pieces that can be enabled or disabled independently. That is what we call capabilities. We can take a deeper look at http://man7.org/linux/man-pages/man7/capabilities.7.html
As an example, let’s say that we have an application that needs special networking configurations: we need to configure one interface, open a port on the system’s firewall, create a NAT rule and add a new custom route to the system’s routing table, but we don’t need to make arbitrary changes to any file in the system. In that case we can set CAP_NET_ADMIN instead of running the process as a privileged one.
Beyond privileges and capabilities we have SELinux and AppArmor that are both kernel security modules that can be added on top of capabilities to get even more fine grained security rules by using access control security policies or program profiles. In addition, we have Seccomp which is a secure computing mode kernel facility that reduces the available system calls to the kernel for a given process.
Finally, adding to all that, we still have interprocess communication, privilege escalation and access to the host namespaces when we begin to talk about containers, but that is out of scope at this point.
How does that translate to containers?
That said, we come back to containers and ask: what are containers again? They are processes segregated by namespaces and cgroups, and as such they have all the same security features described above. So how do we create containers with those security features?
Let’s first take a look at the smallest piece of software that creates the container process: runc. As its definition on the GitHub page says, it’s a tool to spawn and run containers according to the OCI specification. It’s the default choice for OCI runtimes, although we have others such as Kata Containers. In order to use runc, we need a file system image and a bundle with the configuration for the process. The short story on the bundle is that we must put a JSON-formatted specification for the container in it, where all the configurations are taken into account. Check this part of its documentation: https://github.com/opencontainers/runtime-spec/blob/master/config.md#linux-process
From there we have fields such as apparmorProfile, capabilities or selinuxLabel. We can set user ID, group ID and supplementary group IDs. What tool then automates the process of getting the file system ready and passing down those parameters for us?
We can use podman, for example, for testing or development, running isolated containers or pods. It allows us to do it with special privileges as we show below:
Privileged bash terminal:
sudo podman run --privileged -it registry.access.redhat.com/rhel7/rhel /bin/bash
Process ntpd with privilege to change the system clock:
sudo podman run -d --cap-add SYS_TIME ntpd
Ok. Cool. But when the time comes to run those containers on Kubernetes or OpenShift, how do we configure those capabilities and security features?
Inside the OpenShift platform, CRI-O is the container engine that runs and manages containers. It is compliant with the Kubernetes Container Runtime Interface (CRI). It complies with the kubelet’s rules in order to give it a standard interface to call the container engine, and all the magic is done by automating runc behind the scenes while allowing other features to be developed in the engine itself.
Following the workflow above to run a pod in Kubernetes or OpenShift, we’ll first make an API call to kubernetes asking to run a particular Pod. It could come from an oc command or from code, for example. Then the API will process that request and store it in etcd; the pod will be scheduled for a specific node since the scheduler watches those events; finally, kubelet, in that node, will read that event and call the container runtime (CRI-O) with all the parameters and options requested to run the pod. I know it’s very summarized. But the important thing here is that we need to pass parameters down to the API in order to have our Pod with the desired privileges configured. In the example below a new pod gets scheduled to run in node 1.
What goes into that yaml file in order to request those privileges? Two different objects are implemented under the Kubernetes API: PodSecurityContext and SecurityContext. The first one, obviously, relates to Pods and the second one to a specific container. They are part of their respective types, so you can find those fields in the Pod and Container specs of yaml manifests. That way they can be applied to an entire Pod, no matter how many containers it has, or to specific containers inside that Pod. In the latter case, the SecurityContext settings take precedence over the PodSecurityContext ones. You can find the security context source code under https://github.com/kubernetes/api/blob/master/core/v1/types.go.
Here are a few examples of how to configure security contexts for Pods. Below I present the first three fields of the SecurityContext object.
type SecurityContext struct {
// The capabilities to add/drop when running containers.
// Defaults to the default set of capabilities granted by the container runtime.
// +optional
Capabilities *Capabilities `json:"capabilities,omitempty" protobuf:"bytes,1,opt,name=capabilities"`
// Run container in privileged mode.
// Processes in privileged containers are essentially equivalent to root on the host.
// Defaults to false.
// +optional
Privileged *bool `json:"privileged,omitempty" protobuf:"varint,2,opt,name=privileged"`
// The SELinux context to be applied to the container.
// If unspecified, the container runtime will allocate a random SELinux context for each
// container. May also be set in PodSecurityContext. If set in both SecurityContext and
// PodSecurityContext, the value specified in SecurityContext takes precedence.
// +optional
SELinuxOptions *SELinuxOptions `json:"seLinuxOptions,omitempty" protobuf:"bytes,3,opt,name=seLinuxOptions"`
<...>
}
Here is an example of a yaml manifest configuration with capabilities on securityContext field:
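A minimal sketch of such a manifest, with an illustrative image and capability set:
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  containers:
  - name: demo
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "3600"]
    securityContext:
      # container-level settings take precedence over the Pod-level PodSecurityContext
      capabilities:
        add: ["NET_ADMIN", "SYS_TIME"]
        drop: ["MKNOD"]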
Ok. Now what? We have an idea of how to give superpowers to a container or Pod, even if they are RBAC restricted. How can we control this behavior?
Security Context Constraints
Finally we get back to our main subject. How can I make sure that a specific Pod or Container doesn’t request more than what it should in terms of process privileges, and not only OpenShift object privileges under its API?
That’s the role of Security Context Constraints: to check beforehand whether the system can pass that pod or container configuration request, with a privileged or custom security context, on to the cluster API that will end up running a powerful container process. To get a taste of what an SCC looks like, here is an example:
oc get scc restricted -o yaml
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: false
allowedCapabilities: null
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: MustRunAs
groups:
- system:authenticated
kind: SecurityContextConstraints
metadata:
  annotations:
    kubernetes.io/description: restricted denies access to all host features and requires
      pods to be run with a UID, and SELinux context that are allocated to the namespace. This
      is the most restrictive SCC and it is used by default for authenticated users.
  creationTimestamp: "2020-02-08T17:25:39Z"
  generation: 1
  name: restricted
  resourceVersion: "8237"
  selfLink: /apis/security.openshift.io/v1/securitycontextconstraints/restricted
  uid: 190ef798-af35-40b9-a980-0d369369a385
priority: null
readOnlyRootFilesystem: false
requiredDropCapabilities:
- KILL
- MKNOD
- SETUID
- SETGID
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users: []
volumes:
- configMap
- downwardAPI
- emptyDir
- persistentVolumeClaim
- projected
- secret
The above is the default restricted SCC, which has pretty basic permissions and accepts Pod configurations that don’t request special security contexts. Just by looking at the names of the fields we can get an idea of how many features it verifies before letting a workload with containers pass the API and get scheduled.
In conclusion, we have at hand a tool that allows an OpenShift admin to decide, before the Pod gets requested to the API and passed to the container runtime, whether an entire pod can run in privileged mode, have special capabilities, access directories and volumes on the host namespace, use special SELinux contexts, and what ID the container process can use, among other features.
In the next blog posts we’ll explore each field of an SCC, dig into their underlying Linux technology, present the prebuilt SCCs and understand their relationship with the RBAC system to grant or deny the special security contexts declared under a Pod’s or container’s spec field. Stay tuned!
Just deployed the EFK Logging-Stack on top of OpenShift and now you cannot see all the logs on Kibana?
Suddenly the infra nodes start to hang and run out of resources? In this OpenShift Commons Briefing, Red Hat’s Gabriel Ferraz Stein shows us how to check the installation of the EFK Logging-Stack, how to do better capacity planning so you don’t run out of resources, and how to work effectively with Red Hat Support Services to solve Logging-Stack issues.
In the briefing, Gabriel covers many aspects of the EFK Logging-Stack installed on top of OpenShift Container Platform 3.x/4.x, including:
What are exactly the components from the EFK Logging-Stack?
What functions do they have?
What are the most common recommendations and requirements to deploy an EFK Logging-Stack safely and in a reliable way?
How should you calculate the capacity of your resources?
Best practices and the sizing options
Debugging the EFK Logging Stack
Generate a Logging-Dump to help you understand all the components from the dump
If you would like to learn more about what the OpenShift Container Storage team is up to or provide feedback on any of the new 4.2 features, take this brief 3-minute survey.
I am sure many of you are as excited as we are about cloud native development, and one of the hot topics in the space is serverless. With that in mind, let’s talk about our most recent release of OpenShift Serverless, which includes a number of features and functionalities that definitely improve the developer experience in Kubernetes and really enable many interesting application patterns and workloads.
For the uninitiated, OpenShift Serverless is based on the open source project Knative and helps developers deploy and run almost any containerized workload as a serverless workload. Applications can scale up or down (to zero) or react to and consume events without lock-in concerns. The Serverless user experience can be integrated with other OpenShift services, such as OpenShift Pipelines, Monitoring and Metering. Beyond autoscaling and events, it also provides a number of other features, such as:
Immutable revisions allow you to deploy new features: performing canary, A/B or blue-green testing with gradual traffic rollout with no sweat and following best practices.
Ready for the hybrid cloud: Truly portable serverless running anywhere OpenShift runs, that is on-premises or on any public cloud. Leverage data locality and SaaS when needed.
Use any programming language or runtime of choice. From Java, Python, Go and JavaScript to Quarkus, SpringBoot or Node.js.
One of the most interesting aspects of running serverless containers is that it offers an alternative to application modernization that allows users to reuse investments already made and what is available today. If you have a number of web applications, microservices or RESTful APIs built as containers that you would like to scale up and down based on the number of HTTP requests, that’s a perfect fit. But if you also would like to build new event-driven systems that consume Apache Kafka messages or are triggered by new files being uploaded to Ceph (or S3), that’s possible too. Autoscaling your containers to match the number of requests can improve your response time, offering a better quality of service, and increase your cluster density by allowing more applications to run, optimizing resource usage.
New Features in 1.5.0 – Technology Preview
Based on Knative 0.12.1 – Keeping up with the release cadence of the community, we already include Knative 0.12 in Serving, Eventing and kn – the official Knative CLI. As with anything we ship as a product at Red Hat, this means we have validated these components on a variety of different platforms and configurations that OpenShift runs on.
Use of Kourier – By using Kourier we can keep the list of requirements to get Serverless installed on OpenShift to a minimum, with low resource consumption, faster cold starts and no impact on non-serverless workloads running in the same namespace. In combination with fixes we implemented in OpenShift 4.3.5, the time to create an application from a pre-built container improved by 40-50% depending on the container image size.
Before Kourier
After Kourier
Disconnected installs (air gapped) – Given requests from several customers that want to benefit from serverless architectures and their programming model, but in controlled environments with restricted or no internet access, we are enabling the OpenShift Serverless operator to be installed in disconnected OpenShift clusters. The kn CLI, used to manage applications in Knative, is also available to download from the OpenShift cluster itself, even in disconnected environments.
The journey so far
We already have OpenShift Serverless deployed and used on a number of OpenShift clusters by a variety of customers during the Technology Preview. These clusters are running on a number of different providers, such as on premises with bare metal hardware or virtualized systems, or in the cloud on AWS or Azure. These environments exposed our team to the kind of varied configurations that you really only get by running hybrid cloud solutions, which enabled us to cast a wide net during this validation period and take feedback back to the community, improving quality and usability.
Install experience and upgrades with the Operator
The Serverless operator deals with all the complexities of installing Knative on Kubernetes, offering a simplified experience. It takes it one step further by enabling an easy path to upgrades and updates, which are delivered over-the-air and can be applied automatically, letting system administrators rest assured that they can receive CVE and bug fixes on production systems. For those concerned with automatic updates, applying them manually is an option as well.
Integration with Console
With the integration with the OpenShift console, users have the ability to configure traffic distribution using the UI as an alternative to using kn, the CLI. Traffic splitting lets users perform a number of different techniques to roll out new versions and new features of their applications, the most common ones being A/B testing, canary releases or dark launches. By letting users visualize this in the topology view, they can quickly get an understanding of the architecture and deployment strategies being used and course correct if needed.
The integration with the console also provides a good visualization of event sources connected to services. In one example screenshot, a service (kiosk) consumes messages from Apache Kafka, while two other applications (frontend) are scaled down to zero.
Deploy your first application and use Quarkus
To deploy your first serverless container using the CLI (kn), download the client and from a terminal execute:
[markito@anakin ~]$ kn service create greeter --image quay.io/rhdevelopers/knative-tutorial-greeter:quarkus
Creating service 'greeter' in namespace 'default':
0.133s The Route is still working to reflect the latest desired specification.
0.224s Configuration "greeter" is waiting for a Revision to become ready.
5.082s ...
5.132s Ingress has not yet been reconciled.
5.235s Ready to serve.
Service 'greeter' created to latest revision 'greeter-pjxfx-1' is available at URL:
http://greeter.default.apps.test.mycluster.org
This will create a Knative Service based on the container image provided. Quarkus, a Kubernetes native Java stack, is a perfect fit for building serverless applications in Java, given its blazing fast startup time and low memory footprint, but Knative can also run any other language or runtime. Creating a Knative Service object will manage multiple Kubernetes objects commonly used to deploy an application, such as Deployments, Routes and Services, providing a simplified experience for anyone getting started with Kubernetes development, with the added benefit of making it autoscale based on the number of requests and all other benefits already mentioned on this post.
You can also follow the excellent Knative Tutorial for more scenarios and samples.
The journey so far has been exciting and we have been contributing to the Knative community since its inception. I would also like to send a big “thank you” to our team across engineering, QE and documentation for keeping up with the fast pace of the serverless space; they have been doing phenomenal work.
In this blog we would like to demonstrate how to use the new NVIDIA GPU operator to deploy GPU-accelerated workloads on an OpenShift cluster.
The new GPU operator enables OpenShift to schedule workloads that require GPGPUs as easily as one would schedule CPU or memory for more traditional, non-accelerated workloads. Create a container that has a GPU workload inside it, request the GPU resource when creating the pod, and OpenShift will take care of the rest. This makes deployment of GPU workloads to OpenShift clusters straightforward for users and administrators, as it is all managed at the cluster level and not on the host machines. The GPU operator for OpenShift will help to simplify and accelerate the compute-intensive ML/DL modeling tasks for data scientists, as well as help running inferencing tasks across data centers, public clouds, and at the edge. Typical workloads that can benefit from GPU acceleration include image and speech recognition, visual search and several others.
We assume that you have an OpenShift 4.x cluster deployed with some worker nodes that have GPU devices.
In order to expose what features and devices each node has to OpenShift we first need to deploy the Node Feature Discovery (NFD) Operator (see here for more detailed instructions).
Once the NFD Operator is deployed we can take a look at one of our nodes; here we see the difference between the node before and after. Among the new labels describing the node features, we see:
feature.node.kubernetes.io/pci-10de.present=true
This indicates that we have at least one PCIe device from the vendor ID 0x10de, which is for Nvidia. These labels created by the NFD operator are what the GPU Operator uses in order to determine where to deploy the driver containers for the GPU(s).
However, before we can deploy the GPU Operator we need to ensure that the appropriate RHEL entitlements have been created in the cluster (see here for more detailed instructions). After the RHEL entitlements have been deployed to the cluster, then we may proceed with installation of the GPU Operator.
The GPU Operator is currently installed via helm chart, so make sure that you have helm v3+ installed. Once you have helm installed we can begin the GPU Operator installation.
1. Add the Nvidia helm repo:
$ helm repo add nvidia https://nvidia.github.io/gpu-operator
"nvidia" has been added to your repositories
2. Update the helm repo:
$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "nvidia" chart repository
Update Complete. ⎈ Happy Helming!⎈
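3. Install the GPU Operator chart and watch it roll out. The exact invocation is not reproduced in this post; a sketch based on the chart’s OpenShift values at the time of writing (value names can change between chart versions, so check the chart documentation) is:
$ helm install --devel nvidia/gpu-operator \
    --set platform.openshift=true,operator.defaultRuntime=crio,nfd.enabled=false \
    --wait --generate-name

$ oc get pods -n gpu-operator-resources -w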
This command will watch the gpu-operator-resources namespace as the operator rolls out on the cluster. Once the installation is completed you should see something like this in the gpu-operator-resources namespace.
We can see that both the nvidia-driver-validation and the nvidia-device-plugin-validation pods have completed successfully and we have four daemonsets, each running the number of pods that match the node label feature.node.kubernetes.io/pci-10de.present=true. Now we can inspect our GPU node once again.
Here we can see the latest changes to our node, which now include Capacity, Allocatable and Allocated Resources entries for a new resource called nvidia.com/gpu. Since our GPU node only has one GPU, we can see that reflected.
Now that we have the NFD Operator, cluster entitlements, and the GPU Operator deployed we can assign workloads that will use the GPU resources.
Let’s begin by configuring Cluster Autoscaling for our GPU devices. This will allow us to create workloads that request GPU resources and then will automatically scale our GPU nodes up and down depending on the amount of requests pending for these devices.
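The autoscaler manifests are not reproduced in this post; a sketch of a MachineAutoscaler targeting the GPU MachineSet used in this example (a ClusterAutoscaler resource named default must also exist for it to take effect) looks roughly like:
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: gpu-worker-us-east-1c
  namespace: openshift-machine-api
spec:
  minReplicas: 1
  maxReplicas: 6
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: sj-022820-01-h4vrj-worker-us-east-1c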
The metadata name should be a unique MachineAutoscaler name, and the MachineSet name at the end of the file should be the value of an existing MachineSet.
Looking at our cluster, we check what MachineSets are available:
$ oc get machinesets -n openshift-machine-api
NAME                                   DESIRED   CURRENT   READY   AVAILABLE   AGE
sj-022820-01-h4vrj-worker-us-east-1a   1         1         1       1           4h45m
sj-022820-01-h4vrj-worker-us-east-1b   1         1         1       1           4h45m
sj-022820-01-h4vrj-worker-us-east-1c   1         1         1       1           4h45m
In this example the third MachineSet sj-022820-01-h4vrj-worker-us-east-1c is the one that has GPU nodes.
We can now start to deploy RAPIDs using shared storage between multiple instances. Begin by creating a new project:
$ oc new-project rapids
Assuming you have a StorageClass that provides ReadWriteMany functionality, like OpenShift Container Storage with CephFS, we can create a PVC to attach to our RAPIDS instances (`storageClassName` is the name of the StorageClass).
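A sketch of such a PVC; the claim name, size and StorageClass name are illustrative (check oc get storageclass for the CephFS-backed class on your cluster):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rapids-shared
  namespace: rapids
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: ocs-storagecluster-cephfs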
Now that we have our shared storage deployed we can finally deploy the RAPIDs template and create the new application inside our rapids namespace:
$ oc create -f 0004-rapids_template.yaml
template.template.openshift.io/rapids created
$ oc new-app rapids
--> Deploying template "rapids/rapids" to project rapids

     RAPIDS
     ---------
     Template for RAPIDS

     A RAPIDS pod has been created.

     * With parameters:
        * Number of GPUs=1
        * Rapids instance number=1

--> Creating resources ...
    service "rapids" created
    route.route.openshift.io "rapids" created
    pod "rapids" created
--> Success
    Access your application via route 'rapids-rapids.apps.sj-022820-01.perf-testing.devcluster.openshift.com'
    Run 'oc status' to view your app.
In a browser we can now load the route that the template created above: rapids-rapids.apps.sj-022820-01.perf-testing.devcluster.openshift.com
Image shows example notebook running using GPUs on OpenShift
We can also see on our GPU node that RAPIDs is running on it and using the GPU resource:
$ oc describe node <gpu-node-name>
Given that we have more than one person who wants to run Jupyter notebooks, let’s create a second RAPIDS instance with its own dedicated GPU.
$ oc new-app rapids -p INSTANCE=2
--> Deploying template "rapids/rapids" to project rapids

     RAPIDS
     ---------
     Template for RAPIDS

     A RAPIDS pod has been created.

     * With parameters:
        * Number of GPUs=1
        * Rapids instance number=2

--> Creating resources ...
    service "rapids2" created
    route.route.openshift.io "rapids2" created
    pod "rapids2" created
--> Success
    Access your application via route 'rapids2-rapids.apps.sj-022820-01.perf-testing.devcluster.openshift.com'
    Run 'oc status' to view your app.
But we just used our only GPU resource on our GPU node, so the new deployment of rapids (rapids2) is not schedulable due to insufficient GPU resources.
$ oc get pods -n rapids
NAME      READY   STATUS    RESTARTS   AGE
rapids    1/1     Running   0          30m
rapids2   0/1     Pending   0          2m44s
If we look at the event state of the rapids2 pod:
$ oc describe pod/rapids2 -n rapids
...
Events:
  Type     Reason            Age        From                Message
  ----     ------            ----       ----                -------
  Warning  FailedScheduling  <unknown>  default-scheduler   0/9 nodes are available: 9 Insufficient nvidia.com/gpu.
  Normal   TriggeredScaleUp  44s        cluster-autoscaler  pod triggered scale-up: [{openshift-machine-api/sj-022820-01-h4vrj-worker-us-east-1c 1->2 (max: 6)}]
We just need to wait for the ClusterAutoscaler and MachineAutoscaler to do their job and scale up the MachineSet as we see above. Once the new node is created:
$ oc get no
NAME                         STATUS   ROLES    AGE   VERSION
(old nodes)
...
ip-10-0-167-0.ec2.internal   Ready    worker   72s   v1.16.2
The new RAPIDs instance will deploy to the new node once it becomes Ready with no user intervention.
To summarize, the new NVIDIA GPU operator simplifies the use of GPU resources in OpenShift clusters. In this blog we’ve demonstrated the use-case for multi-user RAPIDs development using NVIDIA GPUs. Additionally we’ve used OpenShift Container Storage and the ClusterAutoscaler to automatically scale up our special resource nodes as they are being requested by applications.
As you observed, the NVIDIA GPU Operator is already relatively easy to deploy using Helm, and work is ongoing to support deployments right from OperatorHub, simplifying this process even further.
Welcome to the first briefing of the “All Things Data” series of OpenShift Commons briefings. We’ll be holding future briefings on Tuesdays at 8:00am PST, so reach out with any topics you’re interested in and remember to bookmark the OpenShift Commons Briefing calendar!
In this first briefing for the “All Things Data” OpenShift Commons series, Red Hat’s Guillaume Moutier and Landon LaSmith demo’d how to easily integrate Open Data Hub and OpenShift Container Storage to build your own data science platform. When working on data science projects, it’s a guarantee that you will need different kinds of storage for your data: block, file, object.
Open Data Hub (ODH) is an open source project that provides open source AI tools for running large and distributed AI workloads on OpenShift Container Platform.
OpenShift Container Storage (OCS) is software-defined storage for containers that provides you with every type of storage you need, from a simple, single source.
If you would like to learn more about what the OpenShift Container Storage team is up to or provide feedback on any of the new 4.2 features, take this brief 3-minute survey.
OpenShift stands out as a leader with a security-focused, supported Kubernetes platform—including a foundation based on Red Hat Enterprise Linux.
But we already knew all that. The game changer for OpenShift is the release of OCP version 4.x: OpenShift 4 is powered by Kubernetes Operators and Red Hat’s commitment to full-stack security, so you can develop and scale big ideas for the enterprise.
OpenShift started with distributed systems. It was eventually extended to IBM Power Systems, and now it is available on IBM Z. This creates a seamless user experience across major architectures such as x86, PPC and s390x!
This article’s goal is to share my experience on how to install the OpenShift Container Platform (OCP) 4.2.19 on IBM Z. We will use the minimum requirements to get our environment up and running. That said, for production or performance testing use the recommended hardware configuration from the official Red Hat documentation. The minimum machine requirements for a cluster with user-provisioned infrastructure are as follows:
The smallest OpenShift Container Platform clusters require the following hosts:
* One temporary bootstrap machine.
* Three control plane, or master, machines.
* At least two compute, or worker, machines.
The bootstrap, control plane (often called masters), and compute machines must use Red Hat Enterprise Linux CoreOS (RHCOS) as the operating system.
All the RHCOS machines require network in initramfs during boot to fetch Ignition config files from the Machine Config Server. The machines are configured with static IP addresses. No DHCP server is required.
To install on IBM Z under z/VM, we require a single z/VM virtual NIC in layer 2 mode. You also need:
* A direct-attached OSA
* A z/VM VSwitch set up.
Minimum Resource Requirements
Each cluster machine must meet the following minimum requirements; in our case, these are the resource requirements for the VMs on IBM z/VM.
For our testing purposes (and resource limitations) we used a DASD model 54 for each node instead of the 120GB recommended by the official Red Hat documentation.
Make sure to install OpenShift Container Platform version 4.2 on one of the following IBM hardware platforms:
IBM Z: z13, z14, or z15.
LinuxONE, any version.
Hardware Requirements
1 LPAR with 3 IFLs that supports SMT2.
1 OSA or RoCE network adapter.
Operating System Requirements
One instance of z/VM 7.1.
This is the environment that we created to install the OpenShift Container Platform following the minimum resource requirements. Keep in mind that other services are required in this environment, and you can have them either on Z or provided to the Z box from outside: DNS (name resolution), HAProxy (our load balancer), Workstation (our client system where we run the CLI commands for OCP), and HTTPd (serving files such as the Red Hat CoreOS image as well as the Ignition files that will be generated in later sections of this guide).
Network Topology Requirements
Before you install OpenShift Container Platform, you must provision two layer-4 load balancers. The API requires one load balancer and the default Ingress Controller needs the second load balancer to provide ingress to applications. In our case, we used a single instance of HAProxy running on a Red Hat Enterprise Linux 8 VM as our load balancer.
The following HAProxy configuration provides the load balancer layer for our purposes. Edit /etc/haproxy/haproxy.cfg and add:
listen ingress-http
    bind *:80
    mode tcp
    server worker0 <worker0-IP>:80 check
    server worker1 <worker1-IP>:80 check

listen ingress-https
    bind *:443
    mode tcp
    server worker0 <worker0-IP>:443 check
    server worker1 <worker1-IP>:443 check

listen api
    bind *:6443
    mode tcp
    server bootstrap <bootstrap-IP>:6443 check
    server master0 <master0-IP>:6443 check
    server master1 <master1-IP>:6443 check
    server master2 <master2-IP>:6443 check

listen api-int
    bind *:22623
    mode tcp
    server bootstrap <bootstrap-IP>:22623 check
    server master0 <master0-IP>:22623 check
    server master1 <master1-IP>:22623 check
    server master2 <master2-IP>:22623 check
Don’t forget to open the respective ports on the system’s firewall as well as set the SELinux boolean that lets HAProxy use them, for example:
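A sketch of those commands, assuming firewalld is in use and that the haproxy_connect_any SELinux boolean is the one you want to toggle:
$ sudo firewall-cmd --permanent --add-port=80/tcp --add-port=443/tcp --add-port=6443/tcp --add-port=22623/tcp
$ sudo firewall-cmd --reload
$ sudo setsebool -P haproxy_connect_any=1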
The following DNS records are required for an OpenShift Container Platform cluster that uses user-provisioned infrastructure. In each record, <cluster_name> is the cluster name and <base_domain> is the cluster base domain that you specify in the install-config.yaml file.
Required DNS Records:
api.<cluster_name>.<base_domain>.
This DNS record must point to the load balancer for the control plane machines. This record must be resolvable by both clients external to the cluster and from all the nodes within the cluster.
api-int.<cluster_name>.<base_domain>.
This DNS record must point to the load balancer for the control plane machines. This record must be resolvable from all the nodes within the cluster.
The API server must be able to resolve the worker nodes by the host names that are recorded in Kubernetes. If it cannot resolve the node names, proxied API calls can fail, and you cannot retrieve logs from Pods.
*.apps.<cluster_name>.<base_domain>.
A wildcard DNS record that points to the load balancer that targets the machines that run the Ingress router pods, which are the worker nodes by default. This record must be resolvable by both clients external to the cluster and from all the nodes within the cluster.
etcd-<index>.<cluster_name>.<base_domain>.
OpenShift Container Platform requires DNS records for each etcd instance to point to the control plane machines that host the instances. The etcd instances are differentiated by <index> values, which start with 0 and end with n-1, where n is the number of control plane machines in the cluster. The DNS record must resolve to a unicast IPv4 address for the control plane machine, and the records must be resolvable from all the nodes in the cluster.
_etcd-server-ssl._tcp.<cluster_name>.<base_domain>.
For each control plane machine, OpenShift Container Platform also requires a SRV DNS record for etcd server on that machine with priority 0, weight 10 and port 2380. A cluster that uses three control plane machines requires the following records:
# _service._proto.name. TTL class SRV priority weight port target
_etcd-server-ssl._tcp.<cluster_name>.<base_domain>.  86400 IN SRV 0 10 2380 etcd-0.<cluster_name>.<base_domain>.
_etcd-server-ssl._tcp.<cluster_name>.<base_domain>.  86400 IN SRV 0 10 2380 etcd-1.<cluster_name>.<base_domain>.
_etcd-server-ssl._tcp.<cluster_name>.<base_domain>.  86400 IN SRV 0 10 2380 etcd-2.<cluster_name>.<base_domain>.
As a summary, this is how our DNS records defined in our domain zone would look like when using Bind as my DNS server :
$TTL 86400
@ IN SOA .. admin.. (
2020021813 ;Serial
3600 ;Refresh
1800 ;Retry
604800 ;Expire
86400 ;Minimum TTL
)
;Name Server Information
@ IN NS ..
;IP Address for Name Server
IN A
;A Record for the following Host name
haproxy IN A
bootstrap IN A
master0 IN A
master1 IN A
master2 IN A
workstation IN A
compute0 IN A
compute1 IN A
etcd-0. IN A
etcd-1. IN A
etcd-2. IN A
;CNAME Record
api. IN CNAME haproxy..
api-int. IN CNAME haproxy..
*.apps. IN CNAME haproxy..
_etcd-server-ssl._tcp... 86400 IN SRV 0 10 2380 etcd-0...
_etcd-server-ssl._tcp... 86400 IN SRV 0 10 2380 etcd-1...
_etcd-server-ssl._tcp... 86400 IN SRV 0 10 2380 etcd-2...
Don’t forget to create the reserve records for your zone as well, example of how we setup ours:
$TTL 86400
@ IN SOA .. admin.. (
2020021813 ;Serial
3600 ;Refresh
1800 ;Retry
604800 ;Expire
86400 ;Minimum TTL
)
;Name Server Information
@ IN NS ..
IN A
;Reverse lookup for Name Server
IN PTR ..
;PTR Record IP address to Hostname
IN PTR haproxy..
IN PTR bootstrap..
IN PTR master0..
IN PTR master1..
IN PTR master2..
IN PTR master3..
IN PTR compute0..
IN PTR compute1..
IN PTR workstation..
Where the value for each record is the last octet of the corresponding IP address.
Make sure that your Bind9 DNS server also provides access to the outside world (a.k.a. Internet access) through the appropriate parameters in the options section of your /etc/named.conf.
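A sketch of the relevant options, assuming you forward queries to an upstream resolver (replace the placeholder with your upstream DNS IP):
options {
    ...
    recursion yes;
    forwarders {
        <upstream-dns-IP>;
    };
    ...
};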
For the sections Generating an SSH private key, Installing the CLI and Manually creating the installation configuration files, we used the Workstation VM running RHEL 8.
Generating an SSH private key and adding it to the agent
In our case, we used a Linux workstation as the base system outside of the OCP cluster. The next steps were done in this system.
If you want to perform installation debugging or disaster recovery on your cluster, you must provide an SSH key to both your ssh-agent and to the installation program.
If you do not have an SSH key that is configured for password-less authentication on your computer, create one. For example, on a computer that uses a Linux operating system, run the following command:
$ ssh-keygen -t rsa -b 4096 -N '' \
    -f <path>/<file_name>
Then access the Infrastructure Provider page on the Red Hat OpenShift Cluster Manager site. If you have a Red Hat account, log in with your credentials. If you do not, create an account.
Navigate to the page for your installation type, download the installation program for your operating system, and place the file in the directory where you will store the installation configuration files:
https://…/openshift-v4/s390x/clients/ocp/latest/openshift-install-linux-4.2.18.tar.gz
Extract the installation program. For example, on a computer that uses a Linux operating system, run the following command:
$ tar xvf <installation_program>.tar.gz
From the Pull Secret page on the Red Hat OpenShift Cluster Manager site, download your installation pull secret as a .txt file. This pull secret allows you to authenticate with the services that are provided by the included authorities, including Quay.io, which serves the container images for OpenShift Container Platform components.
Installing the CLI
You can install the CLI in order to interact with OpenShift Container Platform using a command-line interface.
From the Infrastructure Provider page on the Red Hat OpenShift Cluster Manager site, navigate to the page for your installation type and click Download Command-line Tools.
Click the folder for your operating system and architecture and click the compressed file.
– Save the file to your file system.
https://…/openshift-v4/s390x/clients/ocp/latest/openshift-client-linux-4.2.18.tar.gz
– Extract the compressed file.
– Place it in a directory that is on your PATH.
After you install the CLI, it is available using the oc command:
$ oc <command>
Manually creating the installation configuration file
For installations of OpenShift Container Platform that use user-provisioned infrastructure, you must manually generate your installation configuration file.
Create an installation directory to store your required installation assets in:
$ mkdir <installation_directory>
Customize the following install-config.yaml file template and save it in the <installation_directory>.
Sample install-config.yaml file for bare metal
You can customize the install-config.yaml file to specify more details about your OpenShift Container Platform cluster’s platform or modify the values of the required parameters. For IBM Z, please make sure to add architecture: s390x for both compute and controlPlane nodes or the config-cluster.yaml file will be generated with AMD64.
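A sketch of such a file for this kind of user-provisioned install, with placeholder values and the s390x architecture set as noted above (compare it against the sample in the official documentation for your exact release):
apiVersion: v1
baseDomain: <base_domain>
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
  architecture: s390x
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
  architecture: s390x
metadata:
  name: <cluster_name>
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '<pull_secret>'
sshKey: '<ssh_public_key>'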
Creating the Kubernetes manifest and Ignition config files
Because you must modify some cluster definition files and manually start the cluster machines, you must generate the Kubernetes manifest and Ignition config files that the cluster needs to make its machines.
Generate the Kubernetes manifests for the cluster:
$ ./openshift-install create manifests --dir=<installation_directory>
WARNING There are no compute nodes specified. The cluster will not fully initialize without compute nodes.
INFO Consuming “Install Config” from target directory
Modify the <installation_directory>/manifests/cluster-scheduler-02-config.yml Kubernetes manifest file to prevent Pods from being
scheduled on the control plane machines:
1. Open the manifests/cluster-scheduler-02-config.yml file.
2. Locate the mastersSchedulable parameter and set its value to False.
3. Save and exit the file.
The following files are generated in the <installation_directory>:
.
├── auth
│ ├── kubeadmin-password
│ └── kubeconfig
├── bootstrap.ign
├── master.ign
├── metadata.json
└── worker.ign
Copy the files master.ign, worker.ign and bootstrap.ign to the HTTPd node, where you should have configured an HTTP server (Apache) to serve these files during the creation of the Red Hat Enterprise Linux CoreOS VMs.
Creating Red Hat Enterprise Linux CoreOS (RHCOS) machines
Download the Red Hat Enterprise Linux CoreOS installation files from the RHCOS image mirror
Download the following files:
* The initramfs: rhcos-<version>-installer-initramfs.img
* The kernel: rhcos-<version>-installer-kernel
* The operating system image for the disk on which you want to install RHCOS. This type can differ by virtual machine:
* rhcos-<version>-s390x-metal-dasd.raw.gz for DASD (we used the DASD version)
Create parameter files. The following parameters are specific for a particular virtual machine:
* For coreos.inst.install_dev=, specify dasda for a DASD installation.
* For rd.dasd=, specify the DASD where RHCOS is to be installed.
The bootstrap machine ignition file is called bootstrap-0, the master ignition files are numbered 0 through 2, the worker ignition files from 0 upwards. All other parameters can stay as they are.
Example parameter file we used on our environment, bootstrap-0.parm, for the bootstrap machine:
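The file itself is not reproduced here; a sketch with placeholder values, built from the installer parameters and device addresses used elsewhere in this article (yours will differ), looks roughly like:
rd.neednet=1 coreos.inst=yes coreos.inst.install_dev=dasda
coreos.inst.image_url=http://<httpd-server>/rhcos-<version>-s390x-metal-dasd.raw.gz
coreos.inst.ignition_url=http://<httpd-server>/bootstrap.ign
ip=<ip-address>::<gateway>:<netmask>:<hostname>::none nameserver=<dns-ip>
vlan=eth0.1100:enc1e00
rd.znet=qeth,0.0.1e00,0.0.1e01,0.0.1e02,layer2=1,portno=0
rd.dasd=0.0.0201 cio_ignore=all,!condev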
Where enc1e00 is the physical interface, eth0.1100 is the virtual interface alias for enc1e00, and 1100 is the VLAN ID.
Note that the rd.znet=, rd.dasd= and coreos.inst.install_dev= values will all be different for your environment.
Each VM on z/VM requires access to the initramfs, kernel, and parameter (.parm) files on an internal disk. We used a common approach, which is to create a VM that uses its internal disk as a repository for all these files; all the other VMs in the cluster (bootstrap, master0, master1, ..., worker1) have access to this repository VM’s disk (often in read-only mode), saving disk space since these files are only used in the first stage of the process to load the files for each cluster VM into the server’s memory. Each cluster VM has a dedicated disk for RHCOS, which is a completely separate disk (as previously covered, the model 54 ones).
Transfer the initramfs, kernel and all parameter (.parm) files to the repository VM’s local A disk on z/VM from an external FTP server:
==> ftp <VM_REPOSITORY_IP>
VM TCP/IP FTP Level 710
Connecting to <VM_REPOSITORY_IP>, port 21
220 (vsFTPd 3.0.2)
USER (identify yourself to the host):
>>>USER <username>
331 Please specify the password.
Password:
>>>PASS ********
230 Login successful.
Command:
cd <repositoryofimages>
ascii
get <parmfile_bootstrap>.parm
get <parmfile_master>.parm
get <parmfile_worker>.parm
locsite fix 80
binary
get <kernel_image>.img
get <initramfs_file>
Example of the VM definition (userid=LNXDB030) for the bootstrap VM on IBM z/VM for this installation:
USER LNXDB030 LBYONLY 16G 32G
INCLUDE DFLT
COMMAND DEFINE STORAGE 16G STANDBY 16G
COMMAND DEFINE VFB-512 AS 0101 BLK 524288
COMMAND DEFINE VFB-512 AS 0102 BLK 524288
COMMAND DEFINE VFB-512 AS 0103 BLK 524288
COMMAND DEFINE NIC 1E00 TYPE QDIO
COMMAND COUPLE 1E00 SYSTEM VSWITCHG
CPU 00 BASE
CPU 01
CPU 02
CPU 03
MACHINE ESA 8
OPTION APPLMON CHPIDV ONE
POSIXINFO UID 100533
MDISK 0191 3390 436 50 USAW01
MDISK 0201 3390 1 END LXDBC0
Where USER LNXDB030 LBYONLY 16G 32G is the user ID, password option and memory definition; COMMAND DEFINE VFB-512 AS 0101 BLK 524288 is the swap definition; COMMAND DEFINE NIC 1E00 TYPE QDIO is the NIC definition; COMMAND COUPLE 1E00 SYSTEM VSWITCHG couples the NIC to the VSwitch; MDISK 0191 3390 436 50 USAW01 is where you put the EXEC to run; and MDISK 0201 3390 1 END LXDBC0 is the model 54 minidisk for RHCOS.
Punch the files to the virtual reader of the z/VM guest virtual machine that is to become your bootstrap node.
Log in to CMS on the bootstrap machine.
IPL CMS
Create the EXEC file to punch the other files (kernel, parm file, initramfs) and start the Linux installation on each Linux server that is part of the OpenShift cluster, using minidisk 191. This example shows the bootstrap EXEC file:
/* EXAMPLE EXEC FOR OC LINUX INSTALLATION */
TRACE O
'CP SP CON START CL A *'
'EXEC VMLINK MNT3 191 <1191 Z>'
'CL RDR'
'CP PUR RDR ALL'
'CP SP PU * RDR CLOSE'
'PUN KERNEL IMG Z (NOH'
'PUN BOOTSTRAP PARM Z (NOH'
'PUN INITRAMFS IMG Z (NOH'
'CH RDR ALL KEEP NOHOLD'
'CP IPL 00C'
The line EXEC VMLINK MNT3 191 <1191 Z> shows that the disk from the repository VM will be linked to this VM’s EXEC process, making the files we already transferred to the repository VM’s local disk available to the VM where this EXEC file will be run, for example the bootstrap VM.
Call the EXEC file to start the bootstrap installation process
<BOOTSTRAP> EXEC
Once the installation of the Red Hat CoreOS finishes, make sure to re-IPL this VM so it will load the Linux OS from it’s internal DASD:
#CP IPL 201
Then you will see RHCOS loading from its internal model 54 DASD disk:
Red Hat Enterprise Linux CoreOS 42s390x.81.20200131.0 (Ootpa) 4.2
SSH host key: <SHA256key>
SSH host key: <SHA256key>
SSH host key: <SHA256key>
eth0.1100: <ipaddress> fe80::3ff:fe00:9a
bootstrap login:
Repeat this procedure for the other machines in the cluster, which means applying the same steps for creating the Red Hat Enterprise Linux CoreOS with the respective changes to master0, master1, master2, compute0 and compute1.
Make sure to include IPL 201 in the VM definition so that whenever the VM logs on, it automatically IPLs the 201 disk (RHCOS), for example:
USER LNXDB030 LBYONLY 16G 32G
INCLUDE DFLT
COMMAND DEFINE STORAGE 16G STANDBY 16G
COMMAND DEFINE VFB-512 AS 0101 BLK 524288
COMMAND DEFINE VFB-512 AS 0102 BLK 524288
COMMAND DEFINE VFB-512 AS 0103 BLK 524288
COMMAND DEFINE NIC 1E00 TYPE QDIO
COMMAND COUPLE 1E00 SYSTEM VSWITCHG
CPU 00 BASE
CPU 01
CPU 02
CPU 03
IPL 201
MACHINE ESA 8
OPTION APPLMON CHPIDV ONE
POSIXINFO UID 100533
MDISK 0191 3390 436 50 USAW01
MDISK 0201 3390 1 END LXDBC0
Creating the cluster
To create the OpenShift Container Platform cluster, you wait for the bootstrap process to complete on the machines that you provisioned by using the Ignition config files that you generated with the installation program.
Monitor the bootstrap process:
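For example, using the installer and the installation directory created earlier:
$ ./openshift-install --dir=<installation_directory> wait-for bootstrap-complete --log-level=info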
After the bootstrap process is complete, remove the bootstrap machine from the load balancer.
Logging in to the cluster
You can log in to your cluster as a default system user by exporting the cluster kubeconfig file. The kubeconfig file contains information about the cluster that is used by the CLI to connect a client to the correct cluster and API server. The file is specific to a cluster and is created during
OpenShift Container Platform installation.
Export the kubeadmin credentials:
$ export KUBECONFIG=<installation_directory>/auth/kubeconfig
Verify you can run oc commands successfully using the exported configuration:
$ oc whoami
system:admin
Review the pending certificate signing requests (CSRs) and ensure that you see a client and server request with Pending or Approved status for each machine that you added to the cluster.
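A sketch of the commands (<csr_name> is a placeholder):
$ oc get csr
$ oc adm certificate approve <csr_name>
Once the CSRs are approved, the nodes report Ready: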
$ oc get nodes
NAME STATUS ROLES AGE VERSION
master0. Ready master 3d3h v1.14.6+c383847f6
master1. Ready master 3d3h v1.14.6+c383847f6
master2. Ready master 3d3h v1.14.6+c383847f6
worker0. Ready worker 3d3h v1.14.6+c383847f6
worker1. Ready worker 3d3h v1.14.6+c383847f6
Initial Operator configuration
After the control plane initializes, you must immediately configure some Operators so that they all become available.
Watch the cluster components come online (wait until all report True in the AVAILABLE column):
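For example:
$ watch -n5 oc get clusteroperators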
Once the image registry configuration gets patched (see the sketch below), the Operator automatically makes sure that the image-registry container follows that state.
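The patch itself is not shown in this post; on a user-provisioned cluster with no persistent storage configured for the registry, the documented stop-gap is to back the image registry with emptyDir (suitable for non-production use only), roughly:
$ oc patch configs.imageregistry.operator.openshift.io cluster \
    --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'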
This is how the command $ oc get co (an abbreviation of clusteroperators) should look.
$ ./openshift-install --dir=<installation_directory> wait-for install-complete
INFO Waiting up to 30m0s for the cluster to initialize…
The command succeeds when the Cluster Version Operator finishes deploying the OpenShift Container Platform cluster from Kubernetes API server.
INFO Waiting up to 30m0s for the cluster at https://api..:6443 to initialize...
INFO Waiting up to 10m0s for the openshift-console route to be created...
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/root//auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps..
INFO Login to the console with user: kubeadmin, password: 3cXGD-Mb9CC-hgAN8-7S9YG
Login using a web browser: http://console-openshift-console.apps..
This article only covers the installation process. For day 2 operations, keep in mind that no storage was configured for persistent storage workloads; I will cover that process in my next article. For now, Red Hat OpenShift 4 is ready to be explored, and the following video helps you get familiar with the graphical user interface from the developer perspective:
Key people that collaborated with this article:
Alexandre de Oliveira, Edi Lopes Alves, Alex Souza, Adam Young, Apostolos Dedes (Toly) and Russ Popeil
Filipe Miranda is a Senior Solutions Architect at Red Hat. The views expressed in this article are his alone, and he is responsible for the information provided in the article.
This is the second briefing of the “All Things Data” series of OpenShift Commons briefings. Future briefings are Tuesdays at 8:00am PST, so reach out with any topics you’re interested in and remember to bookmark the OpenShift Commons Briefing calendar!
In this second briefing for the “All Things Data” OpenShift Commons series, Red Hat’s Sagy Volkov gave a live demonstration of an OpenShift workload remaining online and running while Ceph storage updates and additions were being performed. This workload resilience and consistency during storage updates and additions is crucial to maintaining highly available applications in your OpenShift clusters.
If you would like to learn more about what the OpenShift Container Storage team is up to or provide feedback on any of the new 4.2 features, take this brief 3-minute survey.
OpenShift 4 was launched not quite a year ago at Red Hat Summit 2019. One of the more significant announcements was the ability for the installer to deploy an OpenShift cluster using full-stack automation. This means that the administrator only needs to provide credentials to a supported Infrastructure-as-a-Service, such as AWS, and the installer would provision all of the resources needed, e.g. virtual machines, storage, networks, and integrating them all together as well.
Over time, the full-stack automation experience has expanded to include Azure, Google Cloud Platform, and Red Hat OpenStack, allowing customers to deploy OpenShift clusters across different clouds and even on-premises with the same fully automated experience.
For organizations who need enterprise virtualization, but not the API-enabled, quota enforced consumption of infrastructure provided by Red Hat OpenStack, Red Hat Virtualization (RHV) provides a robust and trusted platform to consolidate workloads and provide the resiliency, availability, and manageability of a traditional hypervisor.
When using RHV, OpenShift’s “bare metal” installation experience, where there existed no testing or integration between OpenShift and the underlying infrastructure, has been the solution so far. But, the wait is over! OpenShift 4.4 nightly releases now offer the full-stack automation experience for RHV!
Getting started with OpenShift on RHV
As you would expect from the full-stack automation installation experience, getting started is straightforward with just a few prerequisites, listed below. You can also use the quick start guide for more thorough and detailed instructions.
You need a RHV deployment with RHV Manager. It doesn’t matter if you’re using a self-hosted Manager or standalone, just be sure you’re using RHV version 4.3.7.2 or later.
Until OpenShift 4.4 is generally available, you will need to download and use the nightly release of the OpenShift installer, available from https://cloud.redhat.com.
Network requirements:
DHCP is required for full-stack automated installs to assign IPs to nodes as they are created.
Identify three (3) IP addresses you can statically allocate to the cluster and create two (2) DNS entries, as below. These are used for communicating with the cluster as well as internal DNS and API access.
An IP address for the internal-only OpenShift API endpoint
An IP address for the internal OpenShift DNS, with an external DNS record of api.clustername.basedomain for this address
An IP address for the ingress load balancer, with an external DNS record of *.apps.clustername.basedomain for this address.
Create an ovirt-config.yaml file for the credentials you want to use; this file has just four lines:
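The file contents are not reproduced here; based on the fields the installer reads (names may vary slightly between installer builds), it looks roughly like:
ovirt_url: https://<rhvm-fqdn>/ovirt-engine/api
ovirt_username: admin@internal
ovirt_password: <password>
ovirt_insecure: true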
For now, the last value, “ovirt_insecure”, should be “True”. As documented in this BZ, even if the RHV-M certificate is trusted by the client where openshift-install is executing from, that doesn’t mean that the pods deployed to OpenShift trust the certificate. We are working on a solution to this, so please keep an eye on the BZ for when it’s been addressed! Remember, this is tech preview :D
With the prerequisites out of the way, let’s move on to deploying OpenShift to Red Hat Virtualization!
Magic (but really automation)!
Starting the install process, as with all OpenShift 4 deployments, uses the openshift-install binary. Once we answer the questions, the process is wholly automated and we don’t have to do anything but wait for it to complete!
# log level debug isn’t necessary, but gives detailed insight to what’s
# happening
# the “dir” parameter tells the installer to use the provided directory
# to store any artifacts related to the installation
[notroot@jumphost ~] openshift-install create cluster --log-level=debug --dir=orv
? SSH Public Key /home/notroot/.ssh/id_rsa.pub
? Platform ovirt
? Select the oVirt cluster Cluster2
? Select the oVirt storage domain nvme
? Select the oVirt network VLAN101
? Enter the internal API Virtual IP 10.0.101.219
? Enter the internal DNS Virtual IP 10.0.101.220
? Enter the ingress IP 10.0.101.221
? Base Domain lab.lan
? Cluster Name orv
? Pull Secret [? for help] **********************
snip snip snip
INFO Waiting up to 30m0s for the cluster at https://api.orv.lab.lan:6443 to initialize...
INFO Waiting up to 10m0s for the openshift-console route to be created...
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/notroot/orv/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.orv.lab.lan
INFO Login to the console with user: kubeadmin, password: passw-wordp-asswo-rdpas
The result, after a few minutes of waiting, is a fully functioning OpenShift cluster, ready for the final configuration to be applied, like deploying logging and monitoring, and configuring a persistent storage provider.
From a RHV perspective, the installer has created a template virtual machine, which was used to deploy all of the member nodes, regardless of role, for the OpenShift cluster. As you saw at the end of the video, not only does the installer use this template, but the Machine API integration also makes use of it when creating new VMs when scaling the nodes. Scaling nodes manually is as easy as one command line (oc scale --replicas=# machineset)!
Deploying OpenShift
To get started testing and trying OpenShift full-stack automated deployments to your RHV clusters, the installer can be found from the Red Hat OpenShift Cluster Manager. For now, deploying the full-stack automation experience on RHV is in developer preview, so please send us any feedback and questions you have via BugZilla. The quickest way to reach us is using “OpenShift Container Platform” as the product, with “Installer” as the component and “OpenShift on RHV” for sub-component.
While installing an OpenShift cluster on a cloud isn’t difficult, the old school developer in me wants as much of my environment as possible to be in my house and fully under my control. I have some spare hardware in my basement that I wanted to use as an OpenShift 4 installation, but not enough to warrant a full blown cluster.
CodeReady Containers, or CRC for short, is perfect for that. Rather than try to rephrase what it is, I’ll just copy it directly from their site:
“CodeReady Containers brings a minimal, preconfigured OpenShift 4.1 or newer cluster to your local laptop or desktop computer for development and testing purposes. CodeReady Containers is delivered as a Red Hat Enterprise Linux virtual machine that supports native hypervisors for Linux, macOS, and Windows 10.”
The hiccup is that while my server is in my basement, I don’t want to have to physically sit at the machine to use it. Since CRC is deployed as a virtual machine, I needed a way to get to that VM from any other machine on my home network. This blog talks about how to configure HAProxy on the host machine to allow access to CRC from elsewhere on the network.
I ran the following steps on a CentOS 8 installation, but they should work on any of the supported Linux distributions. You’ll also need some form of DNS resolution between your client machines and the DNS entries that CRC expects. In my case, I use a Pi-hole installation running on a Raspberry Pi (which effectively uses dnsmasq as described later in this post).
It’ll become obvious very quickly when you read this, but you’ll need sudo access on the CRC host machine.
Running CRC
The latest version of CRC can be downloaded from Red Hat’s site. You’ll need to download two things:
The crc binary itself, which is responsible for the management of the CRC virtual machine
Your pull secret, which is used during creation; save this in a file somewhere on the host machine
This blog isn’t going to go into the details of setting up CRC. Detailed information can be found in the Getting Started Guide in the CRC documentation.
That said, if you’re looking for a TL;DR version of that guide, it boils down to:
crc setup
crc start -p <path-to-pull-secret-file>
Make sure CRC is running on the destination machine before continuing, since we’ll need the IP address that the VM is running on.
Configuring the Host Machine
We’ll use firewalld and HAProxy to route the host’s inbound traffic to the CRC instance. Before we can configure that, we’ll need to install a few dependencies:
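On CentOS 8 the package set is roughly the following (firewalld is usually preinstalled; policycoreutils-python-utils is only needed if you plan to adjust SELinux settings):
$ sudo dnf install -y haproxy policycoreutils-python-utils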
The CRC host machine needs to allow inbound connections on a variety of ports used by OpenShift. The following commands configure the firewall to open up those ports:
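A sketch of the firewalld commands, opening the HTTP/HTTPS ports for application routes and 6443 for the API:
$ sudo firewall-cmd --permanent --add-port=80/tcp --add-port=443/tcp --add-port=6443/tcp
$ sudo firewall-cmd --reload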
Once the firewall is configured to allow traffic into the server, HAProxy is used to forward it to the CRC instance. Before we can configure that, we’ll need to know the IP of the server itself, as well as the IP of the CRC virtual machine:
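For example (adjust SERVER_IP if your host has multiple interfaces):
$ export SERVER_IP=$(hostname --ip-address)
$ export CRC_IP=$(crc ip)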
Note: If your server is running DHCP, you’ll want to take steps to ensure its IP doesn’t change, either by changing it to run on a static IP or by configuring DHCP reservations. Instructions for how to do that are outside the scope of this blog, but chances are if you’re awesome enough to want to set up a remote CRC instance, you know how to do this already.
We’re going to replace the default haproxy.cfg file, so to be safe, create a backup copy:
cd /etc/haproxy
sudo cp haproxy.cfg haproxy.cfg.orig
Replace the contents of the haproxy.cfg file with the following:
global
    debug

defaults
    log global
    mode http
    timeout connect 0
    timeout client 0
    timeout server 0

frontend apps
    bind SERVER_IP:80
    bind SERVER_IP:443
    option tcplog
    mode tcp
    default_backend apps

backend apps
    mode tcp
    balance roundrobin
    option ssl-hello-chk
    server webserver1 CRC_IP check

frontend api
    bind SERVER_IP:6443
    option tcplog
    mode tcp
    default_backend api

backend api
    mode tcp
    balance roundrobin
    option ssl-hello-chk
    server webserver1 CRC_IP:6443 check
Note: Generally speaking, setting the timeouts to 0 is a bad idea. In this context, we set those to keep websockets from timing out. Since you are (or rather, “should”) be running CRC in a development environment, this shouldn’t be quite as big of a problem.
You can either manually change the instances of SERVER_IP and CRC_IP as appropriate, or run the following commands to automatically perform the replacements:
sudo sed -i "s/SERVER_IP/$SERVER_IP/g" haproxy.cfg
sudo sed -i "s/CRC_IP/$CRC_IP/g" haproxy.cfg
Once that’s finished, start HAProxy:
sudo systemctl start haproxy
Configuring DNS for Clients
As I said earlier, your client machines will need to be able to resolve the DNS entries used by CRC. This will vary depending on how you handle DNS. One possible option is to use dnsmasq on your client machine.
Before doing that, you’ll need to update NetworkManager to use dnsmasq. This is done by creating a new NetworkManager config file:
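A sketch of the two files involved; the crc.testing names are the defaults CRC uses, and SERVER_IP is replaced below:
# /etc/NetworkManager/conf.d/00-use-dnsmasq.conf
[main]
dns=dnsmasq

# /etc/NetworkManager/dnsmasq.d/01-crc.conf
address=/apps-crc.testing/SERVER_IP
address=/api.crc.testing/SERVER_IP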
Again, you can either manually enter the IP of the host machine or use the following commands to replace it:
sudo sed -i "s/SERVER_IP/$SERVER_IP/g" /etc/NetworkManager/dnsmasq.d/01-crc.conf
Once the changes have been made, restart NetworkManager:
sudo systemctl reload NetworkManager
Accessing CRC
The crc binary provides subcommands for discovering the authentication information to access the CRC instance:
crc console --url
https://console-openshift-console.apps-crc.testing
crc console --credentials
To login as a regular user, run 'oc login -u developer -p developer https://api.crc.testing:6443'.
To login as an admin, run 'oc login -u kubeadmin -p mhk2X-Y8ozE-9icYb-uLCdV https://api.crc.testing:6443'
The URL from the first command can be used to access the web console from any machine with the appropriate DNS resolution configured. The login credentials can be found in the output of the second command.
To give credit where it is due, much of this information came from this gist by Trevor McKay.
Egress IP is an OpenShift feature that allows an IP address (the egress IP) to be assigned to a namespace so that all outbound traffic from that namespace appears to originate from that IP address (technically, it is NATed with the specified IP).
This feature is useful within many enterprise environments as it allows for the establishment of firewall rules between namespaces and other services outside of the OpenShift cluster. The egress IP becomes the network identity of the namespace and all the applications running in it. Without egress IP, traffic from different namespaces would be indistinguishable because by default outbound traffic is NATed with the IP of the nodes, which are normally shared among projects.
[Diagram: two namespaces, A and B, with pods A1, A2, B1, B2, as described below]
To clarify the concept, the diagram above shows two namespaces (A and B), each running two pods (A1, A2, B1, B2). A is a namespace whose applications are allowed to connect to a database on the company’s network; B is not authorized to do so. The A namespace is configured with an egress IP, so all of its pods’ outbound connections egress with that IP, and a firewall is configured to allow connections from that IP to the enterprise database. The B namespace is not configured with an egress IP, so its pods egress using the nodes’ IPs, which the firewall does not allow to connect to the database.
However, enabling this feature requires some manual configuration steps, and when running on cloud providers, additional configuration is needed.
While reasoning through this with a customer, we realized there was an opportunity to automate the entire process with an operator.
The egressip-ipam-operator
The purpose of the egressip-ipam-operator is to manage the assignment of egressIPs (IPAM) to namespaces and to ensure that the necessary configuration in OpenShift and the underlying infrastructure is consistent.
IPs can be assigned to namespaces via an annotation or the egressip-ipam-operator can select one from a preconfigured CIDR range.
For a bare metal deployment, the configuration would be similar to the example below:
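This sketch is based on the examples in the operator’s repository; verify the API group, version, and field names against the release you install:
apiVersion: redhatcop.redhat.io/v1alpha1
kind: EgressIPAM
metadata:
  name: egressipam-bare-metal
spec:
  cidrAssignments:
    - labelValue: "true"
      CIDR: 192.169.0.0/24
  topologyLabel: egressGateway
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""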
This configuration states that nodes selected by the nodeSelector should be divided into groups based on the topology label, and each group will receive egress IPs from the specified CIDR.
In this example, we have only one group, which in most cases will be enough for a bare metal configuration. Multiple groups are needed when nodes are spread across multiple subnets, where different CIDRs are required to make the addresses routable. This is exactly what happens with multi-AZ deployments on cloud providers (more on this below).
Users can opt in to having their namespaces receive egress IPs by adding the following annotation to the namespace:
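For example, assuming a namespace named team-a and the EgressIPAM resource from the sketch above (the annotation key is taken from the operator’s README and should be checked against your installed version):
oc annotate namespace team-a egressip-ipam-operator.redhat-cop.io/egressipam=egressipam-bare-metal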
When this occurs, the namespace is assigned an egress IP per cidrAssignment.
In the case of bare metal, a node is selected by OpenShift to carry that egress IP.
It is also possible for the user to specify which egress IPs a namespace should have. In this case, a second annotation is needed with the following format:
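Continuing the hypothetical team-a example, and again deferring to the operator’s README for the exact annotation key:
oc annotate namespace team-a egressip-ipam-operator.redhat-cop.io/egressips=192.169.0.10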
The annotation value is a comma-separated list of IPs. There must be exactly one IP per cidrAssignment.
AWS Support
The egressip-ipam-operator can also work with Amazon Web Services (AWS). In this case, the operator has additional tasks to perform because it needs to configure the EC2 instances to carry the additional IPs. As with most cloud providers, AWS must be made aware of any IPs assigned to its VMs.
For the AWS use case, the EgressIPAM configuration appears as follows:
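The following is a sketch; the zone names and CIDRs are placeholders and must match your cluster’s actual availability zones and node subnets:
apiVersion: redhatcop.redhat.io/v1alpha1
kind: EgressIPAM
metadata:
  name: egressipam-aws
spec:
  cidrAssignments:
    - labelValue: "us-east-1a"
      CIDR: 10.0.128.0/20
    - labelValue: "us-east-1b"
      CIDR: 10.0.144.0/20
    - labelValue: "us-east-1c"
      CIDR: 10.0.160.0/20
  topologyLabel: topology.kubernetes.io/zone
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""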
Here, we can see multiple cidrAssignments, one per availability zone, in which the cluster is installed. Also, notice that the topologyLabel must be specified as topology.kubernetes.io/zone to identify the availability zone. The CIDRs must be the same as the CIDRs used for the node subnet.
When a namespace with the opt-in annotation is created, the following actions occur:
One IP per cidrAssignment is assigned to the namespace.
One VM per zone is selected to carry the corresponding IP.
The OpenShift nodes corresponding to the AWS VMs are configured to carry that IP.
Installation
For detailed instructions on how to install the egressip-ipam-operator, see the GitHub repository.
Conclusion
Every time there is an automation opportunity in or around OpenShift, we should consider capturing that automation as an operator and, possibly, open sourcing the result. In this case, we automated the operations around egress IPs.
Keep in mind that this operator is not officially supported by Red Hat; it is currently managed by the Container Community of Practice (CoP) at Red Hat, which provides best-effort support. Feedback and contributions (for example, support for additional cloud providers) are welcome.
In this briefing, IBM Cloud’s Chris Rosen discusses the logistics of bringing OpenShift to IBM Cloud and walks us through how to make the most of this new offering from IBM Cloud.
Red Hat OpenShift is now available on IBM Cloud as a fully managed OpenShift service that leverages the enterprise scale and security of IBM Cloud, so you can focus on developing and managing your applications. It’s directly integrated into the same Kubernetes service that maintains 25 billion on-demand forecasts daily at The Weather Company.
Chris Rosen walks us through how to:
Enjoy dashboards with a native OpenShift experience, and push-button integrations with high-value IBM and Red Hat middleware and advanced services.
Rely on continuous availability with multizone clusters across six regions globally.
Move workloads and data more securely with Bring Your Own Key; Level 4 FIPS; and built-in industry compliance including PCI, HIPAA, GDPR, SOC1 and SOC2.
Start fast and small using one-click provisioning and metered billing, with no long-term commitment.
To stay abreast of all the latest releases and events, please join OpenShift Commons and sign up for our mailing lists & Slack channel.
What is OpenShift Commons?
Commons builds connections and collaboration across OpenShift communities, projects, and stakeholders. In doing so we’ll enable the success of customers, users, partners, and contributors as we deepen our knowledge and experiences together.
Our goals go beyond code contributions. Commons is a place for companies using OpenShift to accelerate its success and adoption. To do this we’ll act as resources for each other, share best practices and provide a forum for peer-to-peer communication.
Take OKD 4, the Community Distribution of Kubernetes that powers Red Hat OpenShift, for a test drive on your Home Lab.
Craig Robinson at East Carolina University has created an excellent blog explaining how to install OKD 4.4 in your home lab!
What is OKD?
OKD is the upstream community-supported version of the Red Hat OpenShift Container Platform (OCP). OpenShift expands vanilla Kubernetes into an application platform designed for enterprise use at scale. Starting with the release of OpenShift 4, the default operating system is Red Hat CoreOS, which provides an immutable infrastructure and automated updates. OKD’s default operating system is Fedora CoreOS which, like OKD, is the upstream version of Red Hat CoreOS.
Instructions for Deploying OKD 4 Beta on your Home Lab
For those of you who have a home lab, check out the step-by-step guide here, which helps you build an OKD 4.4 cluster at home using VMware as the example hypervisor; you can just as easily use Hyper-V, libvirt, VirtualBox, bare metal, or other platforms.
Experience is an excellent way to learn new technologies. Used hardware for a home lab that could run an OKD cluster is relatively inexpensive these days ($250–$350), especially when compared to a cloud-hosted solution costing over $250 per month.
The purpose of this step-by-step guide is to help you successfully build an OKD 4.4 cluster at home that you can take for a test drive. VMware is the example hypervisor used in this guide, but you could use Hyper-V, libvirt, VirtualBox, bare metal, or other platforms.
This guide assumes you have a virtualization platform, basic knowledge of Linux, and the ability to Google.
Once you’ve gained some experience with OpenShift by using the open source upstream combination of OKD and FCOS (Fedora CoreOS) to build your own cluster in your home lab, be sure to share your feedback and any issues with the OKD-WG on this beta release of OKD in the OKD GitHub repo here: https://github.com/openshift/okd