
Important OpenShift Commons Gathering Amsterdam 2020 Update: Shifts to Digital Conference


We’re turning OpenShift Commons Gathering/Amsterdam into a digital conference rather than a live event.

We’re going to deliver our first-ever OpenShift Commons Gathering live online with Q&A, and take the Gatherings to an even wider global audience.

We will still share all of our main stage sessions, including OpenShift 4 and Kubernetes Release Update and Road Map with Clayton Coleman, and all of our engineering project leads will still be delivering their “State of” Deep Dive talks. We’re working to enable our case study speakers and other guest speakers to share their talks as well.

We will provide updates here soon. You can register here for the free virtual event to be notified via email with further details about when and how to tune in.

If you’ve purchased a ticket to the Gathering and any of the workshops, you will receive a full refund.

The full agenda will still be here.

Thank you for your enthusiasm for and participation in the OpenShift Commons community. We couldn’t do this without the ongoing support of our members, sponsors, speakers and staff.

Please Note: As of March 3, 2020, KubeCon/CloudNativeCon EU in Amsterdam (March 30 – April 2) is still happening. For relevant information, CNCF is regularly updating its site with the latest Novel Coronavirus updates here: https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/attend/novel-coronavirus-update/

The post Important OpenShift Commons Gathering Amsterdam 2020 Update: Shifts to Digital Conference appeared first on Red Hat OpenShift Blog.


Helm and Operators on OpenShift Part 2

This is the second part of a two-part blog series discussing software deployment options on OpenShift leveraging Helm and Operators. In the first part, we discussed the general differences between the two technologies. In this blog, we will look specifically at the advantages and disadvantages of deploying a Helm chart directly via the helm tooling versus via an Operator.

Part II – Helm charts and Helm-based Operators

Users that have already invested in Helm to package their software stack now have multiple options to deploy it on OpenShift: with the Operator SDK, Helm users have a supported option to build an Operator from their charts and use it to create instances of that chart by leveraging a Custom Resource Definition. With Helm v3 in Tech Preview and helm binaries shipped by Red Hat, users and software maintainers now also have the ability to use helm directly on an OpenShift cluster.
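As a rough sketch of that direct flow with the Helm v3 CLI (the repository URL, chart and release names below are placeholders, not a Red Hat-provided source):

# create a project and install a chart directly with the Helm v3 client
$ oc new-project my-db
$ helm repo add example-charts https://charts.example.com    # placeholder chart repository
$ helm install my-db example-charts/cockroachdb               # creates a Helm release in the current namespace
$ helm list                                                   # release metadata is stored in the namespace; no Tiller involved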

When comparing Helm charts and Helm-based Operators, in principle the same considerations as outlined in the first part of this series apply. The caveat is that, in the beginning, the Helm-based Operator does not possess advanced lifecycle capabilities beyond the standalone chart itself. There are, however, still advantages.

With Helm-based Operators, a Kubernetes-native interface exists for users on the cluster to create Helm releases. Using a Custom Resource, an instance of the chart can be created in a namespace and configured through the properties of the resource. The allowed properties and values are the same as the values.yaml of the chart, so users familiar with the chart don’t need to learn anything new. Since a Helm-based Operator internally uses the Helm libraries for rendering, any chart type and Helm feature is supported. Users of the Operator, however, don’t need the helm CLI at all; kubectl is enough to create an instance. Consider the following example:

apiVersion: charts.helm.k8s.io/v1alpha1
kind: Cockroachdb
metadata:
  name: example
spec:
  Name: cdb
  Image: cockroachdb/cockroach
  ImageTag: v19.1.3
  Replicas: 3
  MaxUnavailable: 1
  Component: cockroachdb
  InternalGrpcPort: 26257
  ExternalGrpcPort: 26257
  InternalGrpcName: grpc
  ExternalGrpcName: grpc
  InternalHttpPort: 8080
  ExternalHttpPort: 8080
  HttpName: http
  Resources:
    requests:
      cpu: 500m
      memory: 512Mi
  Storage: 10Gi
  StorageClass: null
  CacheSize: 25%
  MaxSQLMemory: 25%
  ClusterDomain: cluster.local
  UpdateStrategy:
    type: RollingUpdate

The Custom Resource Cockroachdb is owned by the CockroachDB operator which has been created using the CockroachDB helm chart. The entire .spec section can essentially be a copy and paste from the values.yaml of the chart. Any value supported by the chart can be used here. Values that have a default are optional.

The Operator will transparently create a release in the same namespace where the Custom Resource is placed. Updates to this object cause the deployed Helm release to be updated automatically. This is in contrast to Helm v3, where this flow originates from the client side and installing and upgrading a release are two distinct commands.
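To make the contrast concrete, here is a minimal sketch of the two flows side by side (file, repository, chart and release names are illustrative):

# Operator-backed flow: creating and updating are the same declarative action
$ kubectl apply -f cockroachdb-cr.yaml    # first apply creates the release
$ kubectl apply -f cockroachdb-cr.yaml    # later applies are reconciled into upgrades by the Operator

# Helm v3 flow: install and upgrade are distinct, client-driven commands
$ helm install cdb example-charts/cockroachdb -f values.yaml
$ helm upgrade cdb example-charts/cockroachdb -f values.yaml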


While a Helm-based Operator does not magically extend the lifecycle management capabilities of Helm, it does provide a more native Kubernetes experience to end users, who interact with charts like with any other Kubernetes resource.

Everything concerning an instance of a Helm chart is consolidated behind a Custom Resource. As such, access to those resources can be restricted via standard Kubernetes RBAC, so that only entitled users can deploy certain software, irrespective of their privileges in a given namespace. Through tools like the Operator Lifecycle Manager, a selection of vetted charts can be presented as a curated catalog of Helm-based Operators.
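A minimal sketch of such a restriction, assuming a hypothetical group db-admins that alone should be allowed to manage CockroachDB instances in a namespace (the plural resource name follows the CRD generated for this chart):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cockroachdb-editor
  namespace: my-project
rules:
- apiGroups: ["charts.helm.k8s.io"]
  resources: ["cockroachdbs"]        # the Custom Resource shown above
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cockroachdb-editor-binding
  namespace: my-project
subjects:
- kind: Group
  name: db-admins                    # hypothetical group of entitled users
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: cockroachdb-editor
  apiGroup: rbac.authorization.k8s.io

Other users keep whatever access they already have in the namespace; they simply cannot create or modify Cockroachdb resources.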

 

 

As the Helm-based Operator constantly applies releases, manual changes to chart resources are automatically rolled back and configuration drift is prevented. This is different from using helm directly, where deleted objects are not detected, and modified chart resources are only merged rather than rolled back, and only once a user runs the helm utility again. Dealing with Kubernetes Custom Resources may also prove the easier choice in GitOps workflows where only kubectl tooling is present.

When installed through the Operator Lifecycle Manager, a Helm-based Operator can also leverage other Operators’ services by expressing a dependency on them. Manifests containing Custom Resources owned by other Operators can simply be made part of the chart. For example, the above manifest creating a CockroachDB instance could be shipped as part of another Helm chart that deploys an application that will write to this database.

When such charts are converted to an Operator as well, OLM will take care of installing the dependency automatically, whereas with Helm this is the responsibility of the user. This is also true for any dependencies expressed on the cluster itself, for example when the chart requires certain API or Kubernetes versions. These may even change over the lifetime of a release. While such out-of-band changes would go unnoticed by Helm itself, OLM constantly ensures that these requirements are fulfilled or clearly signals to the user when they are not.

On the flip side, a new Helm-based Operator has to be created, published to a catalog, and updated on cluster whenever a new version of the chart becomes available. In order to avoid the same security challenges Tiller had in Helm v2, the Operator should not run with global all-access privileges. Hence, the RBAC of the Operator is usually explicitly constrained by the maintainer according to the least-privilege principle.

The SDK attempts to generate the Operator’s RBAC rules automatically during conversion from a chart but manual tweaks might be required. The conversion process at a high level looks like this:

(Diagram: high-level overview of converting a Helm chart into a Helm-based Operator with the SDK.)
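As a rough sketch of the scaffolding step, using the CLI flags of the pre-1.0 Operator SDK releases available at the time (the chart reference and image name are illustrative; consult the SDK documentation for the exact invocation):

# scaffold a Helm-based Operator from an existing chart
$ operator-sdk new cockroachdb-operator --type=helm --helm-chart=stable/cockroachdb
$ cd cockroachdb-operator
# review the generated watches.yaml, CRD and RBAC manifests under deploy/, tweak the RBAC if needed,
# then build and push the Operator image
$ operator-sdk build quay.io/example/cockroachdb-operator:v0.1.0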

Restricted RBAC now applies to Helm v3: chart maintainers need to document the required RBAC for the chart to be deployed since it can no longer be assumed that cluster-admin privileges exist through Tiller. 

Quite recently the Operator-SDK moved to Helm v3. This is a transparent change for both users and chart maintainers. The SDK will automatically convert existing v2 releases to v3 once an updated Operator is installed.

In summary: end users that have an existing Helm chart at hand can now deploy it on OpenShift using helm tooling, assuming they have sufficient permissions. Software maintainers can now ship their Helm charts unchanged to OpenShift users as well.

Using the Operator SDK, they get more control over the user and admin experience by converting their chart to an Operator. While the resulting Operator eventually deploys the chart in the same way the Helm binary would, it plays along very well with the rest of the cluster interaction using just kubectl, Kubernetes APIs and proper RBAC, which also drives GitOps workflows. On top of that, there is transparent updating of installed releases and constant remediation of configuration drift.

Helm-based Operators also integrate well with other Operators through the use of OLM and its dependency model, avoiding re-inventing how certain software is deployed. Finally for ISVs, Helm-based Operators present an easy entry into the Operator ecosystem without any change required to the chart itself.

The post Helm and Operators on OpenShift Part 2 appeared first on Red Hat OpenShift Blog.

Self-hosted Load Balancer for OpenShift: an Operator Based Approach

Introduction

Some time ago, I published an article about the idea of self-hosting a load balancer within OpenShift to meet the various requirements for ingress traffic (master, routers, load balancer services). Since then, not much has changed with regard to the load balancing requirements for OpenShift. However, in the meantime, the concept of operators, as an approach to capturing automated behavior within a cluster, has emerged. The release of OpenShift 4 fully embraces this new operator-first mentality.

Prompted by the needs of a customer, additional research on this topic was performed on the viability of deploying a self-hosted load balancer via an operator.

The requirement is relatively simple: an operator watches for the creation of services of type LoadBalancer and provides load balancing capabilities by allocating a load balancer in the same cluster for which the service is defined.
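For reference, the kind of object such an operator would react to is an ordinary service of type LoadBalancer (names below are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-project
spec:
  type: LoadBalancer        # the operator watches for this service type
  selector:
    app: my-app
  ports:
  - name: http
    port: 80
    targetPort: 8080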

(Diagram: an application deployed with a LoadBalancer service, with the self-hosted load balancer operator providing a VIP in front of it.)

In the diagram above, an application is deployed with a LoadBalancer type of service. The hypothetical self-hosted load balancer operator is watching for those kinds of services and will react by instructing a set of daemons to expose the needed IP in an HA manner (effectively creating a Virtual IP [VIP]). Inbound connections to that VIP will be load balanced to the pods of our application.

In OpenShift 4, by default, the router instances are fronted by a LoadBalancer type of service, so this approach would also be applicable to the routers.

In Kubernetes, a cloud provider plugin is normally in charge of implementing the load balancing capability of LoadBalancer services, by allocating a cloud-based load balancing solution. Such an operator as described previously would enable the ability to use LoadBalancer services in those deployments where a cloud provider is not available (e.g. bare metal).

Metallb

Metallb is a fantastic bare metal-targeted operator for powering LoadBalancer types of services. 

It can work in two modes: Layer 2 and Border Gateway Protocol (BGP) mode.

In layer 2 mode, one of the nodes advertises the load balanced IP (VIP) via either the ARP (IPv4) or NDP (IPv6) protocol. This mode has several limitations: first, given a VIP, all the traffic for that VIP goes through a single node potentially limiting the bandwidth. The second limitation is a potentially very slow failover. In fact, Metallb relies on the Kubernetes control plane to detect the fact that a node is down before taking the action of moving the VIPs that were allocated to that node to other healthy nodes. Detecting unhealthy nodes is a notoriously slow operation in Kubernetes which can take several minutes (5-10 minutes, which can be decreased with the node-problem-detector DaemonSet).

In BGP mode, Metallb advertises the VIP to BGP-compliant network routers providing potentially multiple paths to route packets destined to that VIP. This greatly increases the bandwidth available for each VIP, but requires the ability to integrate Metallb with the router of the network in which it is deployed. 

Based on my tests and conversations with the author, I found that the layer 2 mode of Metallb is not a practical solution for production scenarios as it is typically not acceptable to have failover-induced downtimes in the order of minutes. At the same time, I have found that the BGP mode instead would much better suit production scenarios, especially those that require very large throughput.

Back to the customer use case that spurred this research. They were not allowed to integrate with the network routers at the BGP level, and it was not acceptable to have a failover downtime of the order of minutes. 

What we needed was a VIP managed with the VRRP protocol, so that it could fail over in a matter of milliseconds. This approach can easily be accomplished by configuring the keepalived service on a normal RHEL machine. For OpenShift, Red Hat has provided a supported container called ose-keepalived-ipfailover with keepalived functionality. Given all of these considerations, I decided to write an operator to orchestrate the creation of ipfailover pods.

Keepalived Operator

The keepalived operator works closely with OpenShift to enable self-servicing of two features: LoadBalancer and ExternalIP services.

It is possible to configure OpenShift to serve IPs for LoadBalancer services from a given CIDR in the absence of a cloud provider. As a prerequisite, OpenShift expects a network administrator to manage how traffic destined to those IPs reaches one of the nodes. Once reaching a node, OpenShift will make sure traffic is load balanced to one of the pods selected by that given service.

Similarly for ExternalIPs, additional configurations must be provided to specify the CIDRs range users are allowed to pick ExternalIPs from. Once again, a network administrator must configure the network to send traffic destined to those IPs to one of the OpenShift nodes.
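In OpenShift 4, both prerequisites are expressed on the cluster-wide Network configuration resource; a sketch with placeholder CIDRs (adjust them to your own network) could look like this:

apiVersion: config.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  externalIP:
    autoAssignCIDRs:        # pool used to assign IPs to LoadBalancer services
    - 192.168.130.0/24
    policy:
      allowedCIDRs:         # ranges users may request ExternalIPs from
      - 192.168.132.0/24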

The keepalived operator plays the role of the network administrator by automating the network configuration prerequisites.


When LoadBalancer services or services with ExternalIPs are created, the Keepalived operator will allocate the needed VIPs on a portion of the nodes by adding additional IPs to the nodes’ NICs. This will draw the traffic for those VIPs to the selected nodes.

VIPs are managed by a cluster of ipfailover pods via the VRRP protocol, so in case of a node failure, the failover of the VIP is relatively quick (in the order of hundreds of milliseconds).

Installation

To install the Keepalived operator in your own environment, consult the documentation within the GitHub repository.

Conclusions

The objective of this article was to provide an overview of options for self-hosted load balancers that can be implemented within OpenShift. This functionality may be required in those scenarios where a cloud provider is not available and there is a desire to enable self-servicing capability for inbound load balancers.

Neither of the examined approaches allows for the definition of a self-hosted load balancer for the master API endpoint. This remains an open challenge especially with the new OpenShift 4 installer. I would be interested in seeing potential solutions in this space.

The post Self-hosted Load Balancer for OpenShift: an Operator Based Approach appeared first on Red Hat OpenShift Blog.

What makes a good Operator?

In 2016, CoreOS coined the term Operator. They started a movement around a whole new type of managed application that achieves automated Day-2 operations with a user experience that feels native to Kubernetes.

Since then, the extension mechanisms that underpin the Operator pattern have evolved significantly. Custom Resource Definitions, an integral part of any Operator, became stable and gained validation and a versioning feature that includes conversion. The experience the Kubernetes community gained when writing and running Operators has also accumulated critical mass. If you’ve attended any KubeCon in the past two years, you will have noticed the increased coverage and countless sessions focusing on Operators.

The popularity that Operators enjoy is based on the possibility of achieving a cloud-like service experience for almost any workload, available wherever your cluster runs. Thus, Operators strive to be the world’s best provider of their workload as-a-service.

But what actually makes a good Operator? Certainly the user experience is an important pillar, but it is mostly defined through the interaction between the cluster user running kubectl and the Custom Resources that are defined by the Operator.

This is possible because Operators are extensions of the Kubernetes control plane. As such, they are global entities that run on your cluster for a potentially very long time, often with wide privileges. This has some implications that require forethought.

For this kind of application, best practices have evolved to mitigate potential issues, security risks, or simply to make the Operator more maintainable in the future. The Operator Framework Community has published a collection of these practices: https://github.com/operator-framework/community-operators/blob/master/docs/best-practices.md

They cover recommendations concerning the design of an Operator as well as behavioral best practices that come into play at runtime. They reflect a culmination of experience from the Kubernetes community writing Operators for a broad range of use cases, and in particular the observations the Operator Framework community made when developing tooling for writing and lifecycling Operators.

Some highlights include the following development practices:

  • One Operator per managed application
  • Multiple operators should be used for complex, multi-tier application stacks
  • A CRD can only be owned by a single Operator; shared CRDs should be owned by a separate Operator
  • One controller per custom resource definition

As well as many others.

With regard to best practices around runtime behavior, it’s noteworthy to point out these:

  • Do not self-register CRDs
  • Be capable of updating from a previous version of the Operator
  • Be capable of managing an Operand from an older Operator version
  • Use CRD conversion (webhooks) if you change API/CRDs

There are additional runtime practices (please, don’t run as root) in the document worth reading.

This list, being a community effort, is of course open to contributions and suggestions. Maybe you are planning to write an Operator in the near future and are wondering how a certain problem would be best solved using this pattern? Or you recently wrote an Operator and want to share some of your own learnings as your users started to adopt this tool? Let us know via GitHub issues or file a PR with your suggestions and improvements. Finally, if you want to publish your Operator or use an existing one, check out OperatorHub.io.

The post What makes a good Operator? appeared first on Red Hat OpenShift Blog.

OpenShift Scale: Running 500 Pods Per Node

The Basics

A common request from OpenShift users has long been to raise the number of pods per node. OpenShift has set the limit to 250 starting with the first Kubernetes-based release (3.0) through 4.2, but with very powerful nodes, it can support many more than that.

This blog describes the work we did to achieve 500 pods per node, starting from initial testing, bug fixes and other changes we needed to make, the testing we performed to verify function, and what you need to do if you’d like to try this.

Background

Computer systems have continued unabated their relentless progress in computation power, memory, storage capacity, and I/O bandwidth. Systems that not long ago were exotic supercomputers are now dwarfed in their capability (if not physical size and power consumption) by very modest servers. Not surprisingly, one of the most frequent questions we’ve received from customers over the years is “can we run more than 250 pods (the — until now — tested maximum) per node?”. Today we’re happy to announce that the answer is yes!

In this blog, I’m going to discuss the changes to OpenShift, the testing process to verify our ability to run much larger numbers of pods, and what you need to do if you want to increase your pod density.

Goals

Our goal with this project was to run 500 pods per node on a cluster with a reasonably large number of nodes. We also considered it important that these pods actually do something; pausepods, while convenient for testing, aren’t a workload that most people are interested in running on their clusters. At the same time, we recognized that the incredible variety of workloads in the rich OpenShift ecosystem would be impractical to model, so we wanted a simple workload that’s easy to understand and measure. We’ll discuss this workload below, which you can clone and experiment with.

Initial Testing

Early experiments on OpenShift 4.2 identified issues with communication between the control plane (in particular, the kube-apiserver) and the kubelet when attempting to run nodes with a large number of pods. Using a client/server builder application replicated to produce the desired number of pods, we observed that the apiserver was not getting timely updates from the kubelet when pods came into existence, resulting in problems such as networking not coming up for pods and the pods (when they required networking) failing as a result.

Our test was to run many replicas of the application to reproduce the requisite number of pods. We observed that up to about 380 pods per node, the applications would start and run normally. Beyond that, we would see some pods remain in Pending state, and some start but then terminate. A pod that is expected to keep running only terminates if the code inside it decides to exit. There were no messages in the logs identifying particular problems; the pods appeared to be starting up correctly, but the code within the pods was failing, causing them to terminate. Studying the application, the most likely reason a pod would terminate was that the client pod was unable to connect to the server, indicating that it did not have a network available.

As an aside, we observed that the kubelet declared the pods to be Running very quickly; the delay was in the apiserver realizing this. Again, there were no log messages in either the kubelet or the apiserver logs indicating any issue. The network team requested that we collect logs from the openshift-sdn that manages pod networking; that too showed nothing out of the ordinary. Indeed, even using host networking didn’t help.

To simplify the test, we wrote a much simpler client/server deployment, where the client would simply attempt to connect to the server until it succeeded rather than failing, using only two nodes. The client pods logged the number of connection attempts made and the elapsed time before success. We ran 500 replicas of this deployment, and found that up to about 450 pods total (225 per node), the pods started up and quickly went into Running state. Between 450 and 620, the rate of pods transitioning to Running state slowed down, and actually stalled out for about 10 minutes, after which the backlog cleared at a rate of about 3 pods/minute until eventually (after a few more hours) all of the pods were running. This supported the hypothesis that there was nothing really wrong with the kubelet; the client pods were able to start running, but most likely timed out connecting to the server, and did not retry.

On the hypothesis that the issue was rate of pod creation, we tried adding sleep 30 between creating each client-server pair. This staved off the point at which pod creation slowed down to about 375 pods/node, but eventually the same problem happened. We tried another experiment placing all of the pods within one namespace, which succeeded — all of the pods quickly started and ran correctly. As a final experiment, we used pause pods (which do not use the network) with separate namespaces, and hit the same problem, starting at around 450 pods (225/node). So clearly this was a function of the number of namespaces, not the number of pods; we had established that it was possible to run 500 pods per node, but without being able to use multiple namespaces, we couldn’t declare success.

Fixing the problem

By this point, it was quite clear that the issue was that the kubelet was unable to communicate at a fast enough rate with the apiserver. When that happens, the most obvious issue is the kubelet throttling transmission to the apiserver per the kubeAPIQPS and kubeAPIBurst kubelet parameters. These are enforced by Go rate limiting. The defaults that we inherit from upstream Kubernetes are 5 and 10, respectively. This allows the kubelet to send at most 5 queries to the apiserver per second, with a short-term burst rate of 10. It’s easy to see how under a heavy load that the kubelet may need a greater bandwidth to the apiserver. In particular, each namespace requires a certain number of secrets, which have to be retrieved from the apiserver via queries, eating into those limits. Additional user-defined secrets and configmaps only increase the pressure on this limit.

The throttling is used in order to protect the apiserver from inadvertent overload by the kubelet, but this mechanism is a very broad brush. However, rewriting it would be a major architectural change that we didn’t consider to be warranted. Therefore, the goal was to identify the lowest safe settings for KubeAPIQPS and KubeAPIBurst.

Experimenting with different settings, we found that setting QPS/burst to 25/50 worked fine for 2000 pods on 3 nodes with a reasonable number of secrets and configmaps, but 15/30 didn’t.

The difficulty in tracking this down is that there’s nothing in either the logs or Prometheus metrics identifying this. Throttling is reported by the kubelet at verbosity 4 (v=4 in the kubelet arguments), but the default verbosity, both upstream and within OpenShift, is 3. We didn’t want to change this globally. Throttling had been seen as a temporary, harmless condition, hence its being relegated to a low verbosity level. However, with our experiments frequently showing throttling of 30 seconds or more, and this leading to pod failures, it clearly was not harmless. Therefore, I opened https://github.com/kubernetes/kubernetes/pull/80649, which eventually merged, and then pulled it into OpenShift in time for OpenShift 4.3. While this alone would not solve throttling, it greatly simplifies diagnosis. Adding throttling metrics to Prometheus would be desirable, but that is a longer-term project.

The next question was what to set the kubeAPIQPS and kubeAPIBurst values to. It was clear that 5/10 wouldn’t be suitable for larger numbers of pods. We decided that we wanted some safety margin above the tested 25/50, hence settled on 50/100 following node scaling testing on OpenShift 4.2 with these parameters set.

Another piece of the puzzle was the watch-based configmap and secret manager for the kubelet. This allows the kubelet to set watches on secrets and configmaps supplied by the apiserver which, in the case of items that don’t change very often, are much more efficient for the apiserver to handle, as it caches the watched objects locally. This change, which didn’t make OpenShift 4.2, would enable the apiserver to handle a heavier load of secrets and configmaps, easing the potential burden of the higher burst/QPS values. If you’re interested in the details of the change, they are described here, under net/http in Go 1.12.

To summarize, we made the following changes between OpenShift 4.2 and 4.3 to set the stage for scaling up the number of pods:

  • Change the default kubeAPIQPS from 5 to 50.
  • Change the default kubeAPIBurst from 10 to 100.
  • Change the default configMapAndSecretChangeDetectionStrategy from Cache to Watch.
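For reference, all three are ordinary kubelet configuration fields. A sketch of how they could be applied through a custom KubeletConfig targeting a labeled MachineConfigPool (illustrative only, since in OpenShift 4.3 these values are already the defaults) could look like this:

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-kubelet-qps
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods        # label on the target MachineConfigPool
  kubeletConfig:
    kubeAPIQPS: 50
    kubeAPIBurst: 100
    configMapAndSecretChangeDetectionStrategy: Watch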

Testing 500 pods/node

The stage was now set to actually test 500 pods/node as part of OpenShift 4.3 scaling testing. The questions we had to decide were:

  • What hardware do we want to use?
  • What OpenShift configuration changes would be needed?
  • How many nodes do we want to test?
  • What kind of workload do we want to run?

Hardware

A lot of pods, particularly with many namespaces, can put considerable stress on the control plane and the monitoring infrastructure. Therefore, we deemed it essential to use large nodes for the control plane and monitoring infrastructure. As we expected the monitoring database to have very large memory requirements, we placed (as is our standard practice) the monitoring stack on a separate set of infrastructure nodes rather than sharing that with the worker nodes. We settled on the following, using AWS as our underlying platform:

  • Master Nodes

    The master nodes were r5.4xlarge instances. r5 instances are memory-optimized, to allow for large apiserver and etcd processes. The instance type consists of:

    • CPU: 16 cores, Intel Xeon Platinum 8175
    • Memory: 128 GB
    • Storage: EBS (no local storage), 4.75 Gbps
    • Network: up to 10 Gbps.
  • Infrastructure Nodes

    The infrastructure nodes were m5.12xlarge instances. m5 instances are general purpose. The instance type consists of:

    • CPU: 48 cores, Intel Xeon Platinum 8175
    • Memory: 192 GB
    • Storage: EBS (no local storage), up to 9.5 Gbps
    • Network: 10 Gbps
  • Worker Nodes

    The worker nodes were m5.2xlarge. This allows us to run quite a few reasonably simple pods, but typical application workloads would be heavier (and customers are interested in very big nodes!). The instance type consists of:

    • CPU: 8 cores, Intel Xeon Platinum 8175
    • Memory: 32 GB
    • Storage: EBS (no local storage), 4.75 Gbps
    • Network: up to 10 Gbps

Configuration Changes

The OpenShift default for maximum pods per node is 250. Worker nodes have to contain parts of the control infrastructure in addition to user pods; there are about 10 such control pods per node. Therefore, to ensure that we could definitely achieve 500 worker pods per node, we elected to set maxPods to 520 using a custom KubeletConfig using the procedure described here

% oc label --overwrite machineconfigpool worker custom-kubelet=large-pods
% oc apply -f - <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: "set-max-pods"
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: large-pods
  kubeletConfig:
    maxPods: 520
EOF

This requires an additional configuration change. Every pod on a node requires a distinct IP address allocated out of the host IP range. By default, when creating a cluster, the hostPrefix is set to 23 (i. e. a /23 net), allowing for up to 510 addresses — not quite enough. So clearly we had to set hostPrefix to 22 for this test in the install-config.yaml used to install the cluster.
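In install-config.yaml, that amounts to a networking stanza along these lines (the CIDRs shown are the common defaults and may differ in your environment):

networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 22        # a /22 per node allows roughly 1000 pod IPs, comfortably above 520
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16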

In the end, no other configuration changes from stock 4.3 were needed. Note that if you want to run 500 pods per node, you’ll need to make these two changes yourself, as we did not change the defaults.

How many nodes do we want to test?

This is a function of how many large nodes we believe customers will want to run in a cluster. We settled on 100 for this test.

What kind of workload do we want to run?

Picking the workload to run is a matter of striking a balance between the test doing something interesting and being easy to set up and run. We settled on a simple client-server workload in which the client sends blocks of data to the server which the server returns, all at a pre-defined rate. We elected to start at 25 nodes, then follow up with 50 and 100, and use varying numbers of namespaces and pods per namespace. Large numbers of namespaces typically stress the control plane more than the worker nodes, but with a larger number of pods per worker node, we didn’t want to discount the possibility that that would impact the test results.

Test Results

We used ClusterBuster to generate the necessary namespaces and deployments for this test to run.

ClusterBuster is a simple tool that I wrote to generate a specified number of namespaces, and secrets and deployments within those namespaces. There are two main types of deployments that this tool generates: pausepod, and client-server data exchange. Each namespace can have a specified number of deployments, each of which can have a defined number of replicas. The client-server can additionally specify multiple client containers per pod, but we didn’t use this feature. The tool uses oc create and oc apply to create objects; we created 5 objects per oc apply, running two processes concurrently. This allows the test to proceed more quickly, but we’ve found that it also creates more stress on the cluster. ClusterBuster labels all objects it creates with a known label that makes it easy to clean up everything with

oc delete ns -l clusterbuster

In client-server mode, the clients can be configured to exchange data at a fixed rate for either a fixed number of bytes or for a fixed amount of time. We used both here in different tests.

We ran tests on 25, 50, and 100 nodes, all of which were successful; the “highest” test (i. e. greatest number of namespaces) in each sequence was:

  • 25 node pausepod: 12500 namespaces each containing one pod.
  • 25 node client-server: 2500 namespaces each containing one client-server deployment consisting of four replica client pods and one server (5 pods/deployment). Data exchange was at 100 KB/sec (in each direction) per client, total 10 MB in each direction per client.

  • 50 node pausepod: 12500 namespaces * 2 pods.

  • 50 node client-server: 5000 namespaces, one deployment with 4 clients + server, 100 KB/sec, 10 MB total.

  • 100 node client-server: 5000 namespaces, one deployment with 9 clients + server, 100 KB/sec for 28800 seconds. In addition, we created and mounted 10 secrets per namespace.

Test Data

I’m going to cover what we found doing the 100 node test here, as we didn’t observe anything during the smaller tests that was markedly different (scaled appropriately).

We collected a variety of data during the test runs, including Prometheus metrics, output from another utility in my OpenShift 4 tools package (monitor-pod-status), and Grafana dashboards to monitor cluster activity. Strictly speaking, monitor-pod-status duplicates what we can get from Grafana, but it presents it in an easy-to-read textual format. Finally, I used yet another tool, clusterbuster-connstat, to retrieve log data left by the client pods and analyze the rate of data flow.

Test Timings

The time required to create and tear down the test infrastructure is a measure of how fast the API and nodes can perform operations. This test was run with a relatively low parallelism factor, and operations didn’t lag significantly.

Operation                      Approximate Time (minutes)
Create namespaces              4
Create secrets                 43
Create deployments             34
Exchange data                  480
Delete pods and namespaces     29

One interesting observation during the pod creation time is that pods were being created at about 1600/minute, and at any given time, there were about 270 pods in ContainerCreating state. This indicates that the process of pod creation took about 10 seconds per pod throughout the run.

Networking

The expected total rate of data exchange is 2 * Nclients * XferRate. In this case, of the 50,000 total pods, 45,000 were clients. At .1 MB/sec, this would yield an expected aggregate throughput of 9000 MB/sec (72,000 Mb/sec). The aggregate expected transfer rate per node would therefore be expected to be 720 Mb/sec, but as we’d expect on average about 1% of the clients to be colocated with the server, the actual average network traffic would be slightly less. In addition, we’d expect variation due to the number of server pods that happened to be located per node; in the configuration we used, each server pod handles 9x the data each client pod handles.

I inspected 5 nodes at random; each node showed a transfer rate during the steady state data transfer of between 650 and 780 Mbit/sec, with no noticeable peaks or valleys, which is as expected. This is nowhere near the 10 Gbps limit of the worker nodes we used, but the goal of this test was not to stress the network.

Quis custodiet ipsos custodes?

With apologies to linguistic purists, the few events we observed were related to Prometheus. During the tests, one of the Prometheus replicas typically used about 130 Gbytes of RAM, but a few times the memory usage spiked toward 300 Gbytes before ramping down over a period of several hours. In two cases, Prometheus crashed; while we don’t have records of why, we believe it likely that it ran out of memory. The high resource consumption of Prometheus reinforces the importance of robust monitoring infrastructure nodes!

Future Work

We have barely scratched the surface of pod density scaling with this investigation. There are many other things we want to look at, over time:

  • Even more pods: as systems grow even more powerful, we can look at even greater pod densities.
  • Adding CPU and memory requests: investigate the interaction between CPU/memory requests and large numbers of pods.

  • Investigate the interaction with other API objects: raw pods per node is only part of what stresses the control plane and worker nodes. Our synthetic test was very simple, and real-world applications will do a lot more. There are a lot of other dimensions we can investigate:

    • Number of configmaps/secrets: very large numbers of these objects in combination with many pods can stress the QPS to the apiserver, in addition to the runtime and the Linux kernel (as each of these objects must be mounted as a filesystem into the pods).
    • Many containers per pod: this stresses the container runtime.
    • Probes: these likewise could stress the container runtime.
  • More workloads: the synthetic workload we used is easy to analyze, but is hardly representative of every use people make of OpenShift. What would you like to see us focus on? Leave a comment with your suggestions.
  • More nodes: 100 nodes is a starting point, but we’re surely going to want to go higher. We’d also like to determine whether there’s a curve for maximum number of pods per node vs. number of nodes.

  • Bare metal: typical users of large nodes are running on bare metal, not virtual instances in the cloud.

Credits

I’d like to thank the many members of the OpenShift Perf/Scale, Node, and QE teams who worked with me on this, including (in alphabetical order) Ashish Kamra, Joe Talerico, Ravi Elluri, Ryan Phillips, Seth Jennings, and Walid Abouhamad.

The post OpenShift Scale: Running 500 Pods Per Node appeared first on Red Hat OpenShift Blog.

Digital Transformation in Italy, Powered by OpenShift


Recently in Milan, Red Hat presented an Italian edition of the OpenShift Commons Gathering. The event brings together experts from all over the world to discuss open source projects that support the OpenShift and Kubernetes ecosystem, as well as to explore best practices for cloud-native application development and getting business value from container technologies at scale. Presenting in Milan were three organizations leading the way: SIA, Poste Italiane and Amadeus.*

Amadeus’ OpenShift infrastructure

Amadeus Software Engineer Salvatore Dario Minonne spoke of the five-year relationship between Red Hat and Amadeus. “In the fall of 2014 we got to know the Red Hat engineering team in Raleigh in the United States. Our teams got their hands on the first versions of OpenShift and started a fruitful collaboration with Red Hat that has become a true engineering partnership. We continue to contribute our use cases to the community, to help drive open source innovation that meets our real-world needs,” said Minonne.

“Not all Amadeus applications are in the cloud,” added Minonne, underlining that their infrastructure is a hybrid of public and private cloud, and there is a careful consideration when migrating workloads to the cloud. 

“At Amadeus,” said Minonne, “We are looking closely into multicloud, not just to avoid vendor lock-in, but also to mitigate the risks of impact if something goes wrong with a provider, and to give us the ability to spin down a particular cluster if it is buggy or there is a security issue.” 

Minonne talked about the change in mindset required with a move to hybrid cloud. “Software development and management practices must also change, to mitigate compatibility issues that might occur with applications not originally designed for the cloud. In fact, many Kubernetes resources have been created precisely to reduce these incompatibilities.”

Poste Italiane

Pierluigi Sforza and Paolo Gigante, Senior Solutions Architects working in Poste Italiane’s IT Technological Architecture Group, spoke to the OpenShift Milan Commons audience about how Poste Italiane has accelerated its digital transformation efforts in the last year.

Sforza emphasised how they are embracing a DevOps philosophy along with their increased use of open source, which has involved building a closer relationship with Red Hat. Gigante added that the rise in open source at Poste Italiane “reflects the current technology landscape, where rapidly evolving competition, increased digitalization and changing customer expectations require faster time to market, which is one area where proprietary technologies from traditional vendors often fall short.” 

Sforza added that, “the need for agility and speed of delivery sometimes necessitates taking a risk in trying less mature technologies, starting by experimenting with the open source community and then relying on trusted vendors, such as Red Hat, to have the levels of security and stability needed to go into production.”

Poste Italiane has been adapting its legacy infrastructures and processes to the new world of DevOps and containerization. This laid the foundation for new projects, such as an adaptation it has made to its financial platform in line with the PSD2 directive. “With OpenShift, we were able to create a reliable, high performance platform perfectly adapted to our needs, in order to meet another of our major business goals: to be at the forefront of innovation,” said Sforza.

The organization’s infrastructure modernization involves the migration of some workloads off the mainframe. Sforza explained: “Where it makes sense, we are aiming to move monolithic workloads to a containerized infrastructure using microservices, which is more cost effective and gives us greater scalability. This will help us manage applications more efficiently and provide a more slick end-user experience, especially given the rise in customers using our digital channels.”

SIA

SIA is headquartered in Milan and operates in 50 countries. SIA is a European leader in the design, construction and management of technology infrastructures and services for financial institutions, central banks, public companies and government entities, focusing on payments, e-money, network services and capital markets.

Nicola Nicolotti, a Senior System Administrator at SIA, explained how they are supporting customers with the move to containers: “the traditional waterfall approach is often not compatible with the adoption of new technologies, which require a deeper level of integration. However, many traditionally structured organizations face multiple difficulties when adopting new technologies and putting changes into practice, so we aim to help them understand what those challenges might be as well as the corresponding solutions that can help them meet their business objectives.”

Matteo Combi, SIA Solution Architect, emphasised the importance of collaboration when working with the open source community – not just via software development. “When we participated in Red Hat Summit in Boston, we recognised the value in sharing diverse experiences at an international level. Being able to compare different scenarios enables us to develop new ideas to improve our use of the technology itself as well as how it can be applied to meet our business goals.”

Learn more about our customer stories at Red Hat Summit Virtual Experience.

* Customer insights in this post originally appeared in Italian as part of a special feature in ImpresaCity magazine, issue #33, October 2019, available to read here

The post Digital Transformation in Italy, Powered by OpenShift appeared first on Red Hat OpenShift Blog.

OpenShift Container Storage 4.2 in OpenShift Container Platform 4.2.14 – UPI Installation in Red Hat Virtualization

OCS 4.2 in OCP 4.2.14 – UPI installation in RHV

When OCS 4.2 GA was released, I was thrilled to finally test and deploy it in my lab. I read the documentation and saw that only vSphere and AWS installations are currently supported. My lab is installed in an RHV environment following the UPI bare metal documentation so, in the beginning, I was a bit disappointed. I realized that it could be an interesting challenge to find a different way to use it and, well, I found one during my late-night fun. All the following procedures are unsupported.

Prerequisites

  • An OCP 4.2.x cluster installed (the current latest version is 4.2.14)
  • The possibility to create new local disks inside the VMs (if you are using a virtualized environment) or servers with disks that can be used

Issues

The official OCS 4.2 installation on vSphere requires a minimum of 3 nodes, each using a 2TB volume (a PVC using the default “thin” storage class) for the OSD volumes, plus 10GB for each mon POD (3 in total, always using a PVC). It also requires 16 CPUs and 64GB RAM per node.

Use case scenario

  • bare-metal installations
  • vSphere cluster
    • without a shared datastore
    • you don’t want to use the vSphere dynamic provisioner
    • without enough space in the datastore
    • without enough RAM or CPU
  • other virtualized installation (for example RHV which is the one used for this article)

Challenges

  • create a PVC using local disks
  • change the default 2TB volumes size
  • define a different StorageClass (without using a default one) for the mon PODs and the OSD volumes
  • define different limits and requests per component

Solutions

  • use the local storage operator
  • create the ocs-storagecluster resource using a YAML file instead of the new interface. That also means adding the labels to the worker nodes that are going to be used by OCS

Procedures

Add the disks to the VMs: 2 disks for each node, a 10GB disk for the mon POD and a 100GB disk for the OSD volume.


Repeat for the other 2 nodes

The disks MUST be in the same order and have the same device name in all the nodes. For example, /dev/sdb MUST be the 10GB disk and /dev/sdc the 100GB disk in all the nodes.

[root@utility ~]# for i in {1..3} ; do ssh core@worker-${i}.ocp42.ssa.mbu.labs.redhat.com lsblk | egrep "^sdb.*|sdc.*$" ; done
sdb      8:16   0   10G  0 disk
sdc      8:32   0  100G  0 disk
sdb      8:16   0   10G  0 disk
sdc      8:32   0  100G  0 disk
sdb      8:16   0   10G  0 disk
sdc      8:32   0  100G  0 disk
[root@utility ~]#

Install the Local Storage Operator. The official documentation is here.

Create the namespace

[root@utility ~]# oc new-project local-storage

Then install the operator from the OperatorHub

Wait for the operator POD to be up&running

[root@utility ~]# oc get pod -n local-storage
NAME                                     READY   STATUS    RESTARTS   AGE
local-storage-operator-ccbb59b45-nn7ww   1/1     Running   0          57s
[root@utility ~]#

The Local Storage Operator works using the devices as reference. The LocalVolume resource scans the nodes which match the selector and creates a StorageClass for the device.

Do not use different StorageClass names for the same device.

We need the Filesystem type for these volumes. Prepare the LocalVolume YAML file to create the resource for the mon PODs which use /dev/sdb

[root@utility ~]# cat <<EOF > local-storage-filesystem.yaml
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-disks-fs"
  namespace: "local-storage"
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-1.ocp42.ssa.mbu.labs.redhat.com
          - worker-2.ocp42.ssa.mbu.labs.redhat.com
          - worker-3.ocp42.ssa.mbu.labs.redhat.com
  storageClassDevices:
    - storageClassName: "local-sc"
      volumeMode: Filesystem
      devicePaths:
        - /dev/sdb
EOF

Then create the resource

[root@utility ~]# oc create -f local-storage-filesystem.yaml
localvolume.local.storage.openshift.io/local-disks-fs created
[root@utility ~]#

Check if all the PODs are up&running and if the StorageClass and the PVs exist

[root@utility ~]# oc get pod -n local-storage
NAME                                     READY   STATUS    RESTARTS   AGE
local-disks-fs-local-diskmaker-2bqw4     1/1     Running   0          106s
local-disks-fs-local-diskmaker-8w9rz     1/1     Running   0          106s
local-disks-fs-local-diskmaker-khhm5     1/1     Running   0          106s
local-disks-fs-local-provisioner-g5dgv   1/1     Running   0          106s
local-disks-fs-local-provisioner-hkj69   1/1     Running   0          106s
local-disks-fs-local-provisioner-vhpj8   1/1     Running   0          106s
local-storage-operator-ccbb59b45-nn7ww   1/1     Running   0          15m
[root@utility ~]# oc get sc
NAME       PROVISIONER                    AGE
local-sc   kubernetes.io/no-provisioner   109s
[root@utility ~]# oc get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
local-pv-68faed78   10Gi       RWO            Delete           Available           local-sc                84s
local-pv-780afdd6   10Gi       RWO            Delete           Available           local-sc                83s
local-pv-b640422f   10Gi       RWO            Delete           Available           local-sc                9s
[root@utility ~]#

The PVs were created.

Prepare the LocalVolume YAML file to create the resource for the OSD volumes which use /dev/sdc

We need the Block type for these volumes.

[root@utility ~]# cat <<EOF > local-storage-block.yaml
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-disks"
  namespace: "local-storage"
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-1.ocp42.ssa.mbu.labs.redhat.com
          - worker-2.ocp42.ssa.mbu.labs.redhat.com
          - worker-3.ocp42.ssa.mbu.labs.redhat.com
  storageClassDevices:
    - storageClassName: "localblock-sc"
      volumeMode: Block
      devicePaths:
        - /dev/sdc
EOF

Then create the resource

[root@utility ~]# oc create -f local-storage-block.yaml
localvolume.local.storage.openshift.io/local-disks created
[root@utility ~]#

Check if all the PODs are up&running and if the StorageClass and the PVs exist

[root@utility ~]# oc get pod -n local-storage
NAME                                     READY   STATUS    RESTARTS   AGE
local-disks-fs-local-diskmaker-2bqw4     1/1     Running   0          6m33s
local-disks-fs-local-diskmaker-8w9rz     1/1     Running   0          6m33s
local-disks-fs-local-diskmaker-khhm5     1/1     Running   0          6m33s
local-disks-fs-local-provisioner-g5dgv   1/1     Running   0          6m33s
local-disks-fs-local-provisioner-hkj69   1/1     Running   0          6m33s
local-disks-fs-local-provisioner-vhpj8   1/1     Running   0          6m33s
local-disks-local-diskmaker-6qpfx        1/1     Running   0          22s
local-disks-local-diskmaker-pw5ql        1/1     Running   0          22s
local-disks-local-diskmaker-rc5hr        1/1     Running   0          22s
local-disks-local-provisioner-9qprp      1/1     Running   0          22s
local-disks-local-provisioner-kkkcm      1/1     Running   0          22s
local-disks-local-provisioner-kxbnn      1/1     Running   0          22s
local-storage-operator-ccbb59b45-nn7ww   1/1     Running   0          19m
[root@utility ~]# oc get sc
NAME            PROVISIONER                    AGE
local-sc        kubernetes.io/no-provisioner   6m36s
localblock-sc   kubernetes.io/no-provisioner   25s
[root@utility ~]# oc get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS    REASON   AGE
local-pv-5c4e718c   100Gi      RWO            Delete           Available           localblock-sc            10s
local-pv-68faed78   10Gi       RWO            Delete           Available           local-sc                 6m13s
local-pv-6a58375e   100Gi      RWO            Delete           Available           localblock-sc            10s
local-pv-780afdd6   10Gi       RWO            Delete           Available           local-sc                 6m12s
local-pv-b640422f   10Gi       RWO            Delete           Available           local-sc                 4m58s
local-pv-d6db37fd   100Gi      RWO            Delete           Available           localblock-sc            5s
[root@utility ~]#

All the PVs were created.

Install OCS 4.2. The official documentation is here.

Create the namespace "openshift-storage"

[root@utility ~]# cat <<EOF > ocs-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-storage
  labels:
    openshift.io/cluster-monitoring: "true"
EOF
[root@utility ~]# oc create -f ocs-namespace.yaml
namespace/openshift-storage created
[root@utility ~]#

Add the labels to the workers

oc label node worker-1.ocp42.ssa.mbu.labs.redhat.com "cluster.ocs.openshift.io/openshift-storage=" --overwrite
oc label node worker-1.ocp42.ssa.mbu.labs.redhat.com "topology.rook.io/rack=rack0" --overwrite
oc label node worker-2.ocp42.ssa.mbu.labs.redhat.com "cluster.ocs.openshift.io/openshift-storage=" --overwrite
oc label node worker-2.ocp42.ssa.mbu.labs.redhat.com "topology.rook.io/rack=rack1" --overwrite
oc label node worker-3.ocp42.ssa.mbu.labs.redhat.com "cluster.ocs.openshift.io/openshift-storage=" --overwrite
oc label node worker-3.ocp42.ssa.mbu.labs.redhat.com "topology.rook.io/rack=rack3" --overwrite

Install the operator from the web interface

Check on the web interface if the operator is Up to date

And wait for the PODs to be up&running

[root@utility ~]# oc get pod -n openshift-storage
NAME                                  READY   STATUS    RESTARTS   AGE
noobaa-operator-85d86479fc-n8vp5      1/1     Running   0          106s
ocs-operator-65cf57b98b-rk48c         1/1     Running   0          106s
rook-ceph-operator-59d78cf8bd-4zcsz   1/1     Running   0          106s
[root@utility ~]#

Create the OCS Cluster Service YAML file

[root@utility ~]# cat <<EOF > ocs-cluster-service.yaml
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  manageNodes: false
  monPVCTemplate:
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: 'local-sc'
      volumeMode: Filesystem
  storageDeviceSets:
  - count: 1
    dataPVCTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: 'localblock-sc'
        volumeMode: Block
    name: ocs-deviceset
    placement: {}
    portable: true
    replica: 3
    resources: {}
EOF

Notice the "monPVCTemplate" section, in which we define the StorageClass "local-sc", and the "storageDeviceSets" section, which defines the different storage sizes and the StorageClass "localblock-sc" used by the OSD volumes.

Now we can create the resource

[root@utility ~]# oc create -f ocs-cluster-service.yaml
storagecluster.ocs.openshift.io/ocs-storagecluster created
[root@utility ~]#

During the creation of the resources, we can see how the newly created PVCs are bound to the Local Storage PVs

[root@utility ~]# oc get pvc -n openshift-storage
NAME              STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS   AGE
rook-ceph-mon-a   Bound    local-pv-68faed78   10Gi       RWO            local-sc       13s
rook-ceph-mon-b   Bound    local-pv-b640422f   10Gi       RWO            local-sc       8s
rook-ceph-mon-c   Bound    local-pv-780afdd6   10Gi       RWO            local-sc       3s
[root@utility ~]# oc get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                               STORAGECLASS    REASON   AGE
local-pv-5c4e718c   100Gi      RWO            Delete           Available                                       localblock-sc            28m
local-pv-68faed78   10Gi       RWO            Delete           Bound       openshift-storage/rook-ceph-mon-a   local-sc                 34m
local-pv-6a58375e   100Gi      RWO            Delete           Available                                       localblock-sc            28m
local-pv-780afdd6   10Gi       RWO            Delete           Bound       openshift-storage/rook-ceph-mon-c   local-sc                 34m
local-pv-b640422f   10Gi       RWO            Delete           Bound       openshift-storage/rook-ceph-mon-b   local-sc                 33m
local-pv-d6db37fd   100Gi      RWO            Delete           Available                                       localblock-sc            28m
[root@utility ~]#

And now we can see the OSD PVCs and the PVs they are bound to

[root@utility ~]# oc get pvc -n openshift-storage
NAME                      STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS    AGE
ocs-deviceset-0-0-7j2kj   Bound    local-pv-6a58375e   100Gi      RWO            localblock-sc   3s
ocs-deviceset-1-0-lmd97   Bound    local-pv-d6db37fd   100Gi      RWO            localblock-sc   3s
ocs-deviceset-2-0-dnfbd   Bound    local-pv-5c4e718c   100Gi      RWO            localblock-sc   3s
[root@utility ~]# oc get pv | grep localblock-sc
local-pv-5c4e718c                          100Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-2-0-dnfbd   localblock-sc                          31m
local-pv-6a58375e                          100Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-0-0-7j2kj   localblock-sc                          31m
local-pv-d6db37fd                          100Gi      RWO            Delete           Bound    openshift-storage/ocs-deviceset-1-0-lmd97   localblock-sc                          31m
[root@utility ~]#

This is the first PVC created inside the OCS cluster, used by NooBaa

[root@utility ~]# oc get pvc -n openshift-storage
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-core-0          Bound    pvc-d8dbb86f-3d83-11ea-ac51-001a4a16017d   50Gi       RWO            ocs-storagecluster-ceph-rbd   72s

Wait for all the pods to be up and running

[root@utility ~]# oc get pod -n openshift-storage
NAME                                                              READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-2qkl8                                            3/3     Running     0          5m31s
csi-cephfsplugin-4pbvl                                            3/3     Running     0          5m31s
csi-cephfsplugin-j8w82                                            3/3     Running     0          5m31s
csi-cephfsplugin-provisioner-647cd6996c-6mw9t                     4/4     Running     0          5m31s
csi-cephfsplugin-provisioner-647cd6996c-pbrxs                     4/4     Running     0          5m31s
csi-rbdplugin-9nj85                                               3/3     Running     0          5m31s
csi-rbdplugin-jmnqz                                               3/3     Running     0          5m31s
csi-rbdplugin-provisioner-6b8ff67dc4-jk5lm                        4/4     Running     0          5m31s
csi-rbdplugin-provisioner-6b8ff67dc4-rxjhq                        4/4     Running     0          5m31s
csi-rbdplugin-vrzjq                                               3/3     Running     0          5m31s
noobaa-core-0                                                     1/2     Running     0          2m34s
noobaa-operator-85d86479fc-n8vp5                                  1/1     Running     0          13m
ocs-operator-65cf57b98b-rk48c                                     0/1     Running     0          13m
rook-ceph-drain-canary-worker-1.ocp42.ssa.mbu.labs.redhat.w2cqv   1/1     Running     0          2m41s
rook-ceph-drain-canary-worker-2.ocp42.ssa.mbu.labs.redhat.whv6s   1/1     Running     0          2m40s
rook-ceph-drain-canary-worker-3.ocp42.ssa.mbu.labs.redhat.ll8gj   1/1     Running     0          2m40s
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-d7d64976d8cm7   1/1     Running     0          2m28s
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-864fdf78ppnpm   1/1     Running     0          2m27s
rook-ceph-mgr-a-5fd6f7578c-wbsb6                                  1/1     Running     0          3m24s
rook-ceph-mon-a-bffc546c8-vjrfb                                   1/1     Running     0          4m26s
rook-ceph-mon-b-8499dd679c-6pzm9                                  1/1     Running     0          4m11s
rook-ceph-mon-c-77cd5dd54-64z52                                   1/1     Running     0          3m46s
rook-ceph-operator-59d78cf8bd-4zcsz                               1/1     Running     0          13m
rook-ceph-osd-0-b46fbc7d7-hc2wz                                   1/1     Running     0          2m41s
rook-ceph-osd-1-648c5dc8d6-prwks                                  1/1     Running     0          2m40s
rook-ceph-osd-2-546d4d77fb-qb68j                                  1/1     Running     0          2m40s
rook-ceph-osd-prepare-ocs-deviceset-0-0-7j2kj-s72g4               0/1     Completed   0          2m56s
rook-ceph-osd-prepare-ocs-deviceset-1-0-lmd97-27chl               0/1     Completed   0          2m56s
rook-ceph-osd-prepare-ocs-deviceset-2-0-dnfbd-s7z8v               0/1     Completed   0          2m56s
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-d7b4b5b6hnpr   1/1     Running     0          2m12s

Our installation is now complete and OCS is fully operational.

Now we can browse the NooBaa management console (for now it only works in Chrome) and create a new user to test the S3 object storage

Get the endpoint for the S3 object server

[root@utility ~]# oc get route s3 -o jsonpath='{.spec.host}' -n openshift-storage
s3-openshift-storage.apps.ocp42.ssa.mbu.labs.redhat.com

Test it with your preferred S3 client (I use Cyberduck on the Windows desktop I’m using to write this article)

Create something to check if you can write

It works!

Set the ocs-storagecluster-cephfs StorageClass as the default one

[root@utility ~]# oc patch storageclass ocs-storagecluster-cephfs -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/ocs-storagecluster-cephfs patched
[root@utility ~]#

Test the ocs-storagecluster-cephfs StorageClass by adding persistent storage to the registry

[root@utility ~]# oc edit configs.imageregistry.operator.openshift.io
storage:
  pvc:
    claim:
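
Alternatively, if you prefer a non-interactive approach, the same change can be applied with a patch. This is a minimal sketch, assuming the default resource name (cluster) and an empty claim so that the operator creates the image-registry-storage PVC for you:

[root@utility ~]# oc patch configs.imageregistry.operator.openshift.io cluster \
    --type merge \
    --patch '{"spec":{"storage":{"pvc":{"claim":""}}}}'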

Check the PVC that was created and wait for the new pod to be up and running

[root@utility ~]# oc get pvc -n openshift-image-registry
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE
image-registry-storage   Bound    pvc-ba4a07c1-3d86-11ea-ad40-001a4a1601e7   100Gi      RWX            ocs-storagecluster-cephfs   12s
[root@utility ~]# oc get pod -n openshift-image-registry
NAME                                               READY   STATUS    RESTARTS   AGE
cluster-image-registry-operator-655fb7779f-pn7ms   2/2     Running   0          36h
image-registry-5bdf96556-98jbk                     1/1     Running   0          105s
node-ca-9gbxg                                      1/1     Running   1          35h
node-ca-fzcrm                                      1/1     Running   0          35h
node-ca-gr928                                      1/1     Running   1          35h
node-ca-jkfzf                                      1/1     Running   1          35h
node-ca-knlcj                                      1/1     Running   0          35h
node-ca-mb6zh                                      1/1     Running   0          35h
[root@utility ~]#

Test it in a new project called test
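
The session below references a $REGISTRY_URL variable. As a sketch, assuming the default registry route has been exposed via the image registry operator’s defaultRoute option, it could be set like this:

[root@utility ~]# oc patch configs.imageregistry.operator.openshift.io cluster \
    --type merge --patch '{"spec":{"defaultRoute":true}}'
[root@utility ~]# REGISTRY_URL=$(oc get route default-route -n openshift-image-registry -o jsonpath='{.spec.host}')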

[root@utility ~]# oc new-project test
Now using project "test" on server "https://api.ocp42.ssa.mbu.labs.redhat.com:6443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app django-psql-example

to build a new example application in Python. Or use kubectl to deploy a simple Kubernetes application:

    kubectl create deployment hello-node --image=gcr.io/hello-minikube-zero-install/hello-node

[root@utility ~]# podman pull alpine
Trying to pull docker.io/library/alpine...
Getting image source signatures
Copying blob c9b1b535fdd9 done
Copying config e7d92cdc71 done
Writing manifest to image destination
Storing signatures
e7d92cdc71feacf90708cb59182d0df1b911f8ae022d29e8e95d75ca6a99776a
[root@utility ~]# podman login -u $(oc whoami) -p $(oc whoami -t) $REGISTRY_URL --tls-verify=false
Login Succeeded!
[root@utility ~]# podman tag alpine $REGISTRY_URL/test/alpine
[root@utility ~]# podman push $REGISTRY_URL/test/alpine --tls-verify=false
Getting image source signatures
Copying blob 5216338b40a7 done
Copying config e7d92cdc71 done
Writing manifest to image destination
Storing signatures
[root@utility ~]# oc get is -n test
NAME     IMAGE REPOSITORY                                                                        TAGS     UPDATED
alpine   default-route-openshift-image-registry.apps.ocp42.ssa.mbu.labs.redhat.com/test/alpine   latest   3 minutes ago
[root@utility ~]#

The registry works!

Other Scenario

If your cluster is deployed in vSphere and uses the default “thin” StorageClass but your datastore isn’t big enough, you can start from the OCS installation.
When it comes to creating the OCS Cluster Service, create a YAML file with your desired sizes and without storageClassName (it will use the default one).
You can also remove the “monPVCTemplate” if you are not interested in changing the storage size.

[root@utility ~]# cat <<EOF > ocs-cluster-service.yaml
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  manageNodes: false
  monPVCTemplate:
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: ''
      volumeMode: Filesystem
  storageDeviceSets:
  - count: 1
    dataPVCTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: ''
        volumeMode: Block
    name: ocs-deviceset
    placement: {}
    portable: true
    replica: 3
    resources: {}
EOF

Limits and Requests

Limits and Requests are set by default as follows

[root@utility ~]# oc describe node worker-1.ocp42.ssa.mbu.labs.redhat.com
...
  Namespace           Name                               CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------           ----                               ------------  ----------  ---------------  -------------  ---
  openshift-storage   noobaa-core-0                      4 (25%)       4 (25%)     8Gi (12%)        8Gi (12%)      13m
  openshift-storage   rook-ceph-mgr-a-676d4b4796-54mtk   1 (6%)        1 (6%)      3Gi (4%)         3Gi (4%)       12m
  openshift-storage   rook-ceph-mon-b-7d7747d8b4-k9txg   1 (6%)        1 (6%)      2Gi (3%)         2Gi (3%)       13m
  openshift-storage   rook-ceph-osd-1-854847fd4c-482bt   1 (6%)        2 (12%)     4Gi (6%)         8Gi (12%)      12m
...

We can create our new YAML file to change those settings in the ocs-storagecluster StorageCluster resource

[root@utility ~]# cat <<EOF > ocs-cluster-service-modified.yaml
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  resources:
    mon:
      limits:
        cpu: 1
        memory: 1Gi
      requests:
        cpu: 1
        memory: 1Gi
    mgr:
      limits:
        cpu: 1
        memory: 1Gi
      requests:
        cpu: 1
        memory: 1Gi
    noobaa-core:
      limits:
        cpu: 1
        memory: 1Gi
      requests:
        cpu: 1
        memory: 1Gi
    noobaa-db:
      limits:
        cpu: 1
        memory: 1Gi
      requests:
        cpu: 1
        memory: 1Gi
  manageNodes: false
  monPVCTemplate:
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: 'local-sc'
      volumeMode: Filesystem
  storageDeviceSets:
  - count: 1
    dataPVCTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: 'localblock-sc'
        volumeMode: Block
    name: ocs-deviceset
    placement: {}
    portable: true
    replica: 3
    resources:
      limits:
        cpu: 1
        memory: 4Gi
      requests:
        cpu: 1
        memory: 4Gi
EOF

And apply

[root@utility ~]# oc apply -f ocs-cluster-service-modified.yaml
Warning: oc apply should be used on resource created by either oc create --save-config or oc apply
storagecluster.ocs.openshift.io/ocs-storagecluster configured

We have to wait for the operator to read the new configuration and apply it

[root@utility ~]# oc describe node worker-1.ocp42.ssa.mbu.labs.redhat.com
...
  Namespace           Name                               CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------           ----                               ------------  ----------  ---------------  -------------  ---
  openshift-storage   noobaa-core-0                      2 (12%)       2 (12%)     2Gi (3%)         2Gi (3%)       23s
  openshift-storage   rook-ceph-mgr-a-54f87f84fb-pm4rn   1 (6%)        1 (6%)      1Gi (1%)         1Gi (1%)       56s
  openshift-storage   rook-ceph-mon-b-854f549cd4-bgdb6   1 (6%)        1 (6%)      1Gi (1%)         1Gi (1%)       46s
  openshift-storage   rook-ceph-osd-1-ff56d545c-p7hvn    1 (6%)        1 (6%)      4Gi (6%)         4Gi (6%)       50s
...

And now we have our pods running with the new configuration applied.

Note that the OSD pods won’t start if you choose values that are too low.

Sections:

  • mon for rook-ceph-mon
  • mgr for rook-ceph-mgr
  • noobaa-core and noobaa-db for the 2 containers in the pod noobaa-core-0
  • mds for rook-ceph-mds-ocs-storagecluster-cephfilesystem
  • rgw for rook-ceph-rgw-ocs-storagecluster-cephobjectstore
  • the resources section in the end for rook-ceph-osd

The rgw and mds sections take effect only the first time the resource is created.

---
spec:
  resources:
    mds:
      limits:
        cpu: 2
        memory: 4Gi
      requests:
        cpu: 2
        memory: 4Gi
    rgw:
      limits:
        cpu: 1
        memory: 2Gi
      requests:
        cpu: 1
        memory: 2Gi
---

Conclusions

Now you can enjoy your brand-new OCS 4.2 in OCP 4.2.x.
Things have changed compared to OCS 3.x, for example the use of PVCs instead of consuming the attached disks directly, and for now there are still quite a few limitations for sustainability and supportability reasons.
We will wait for a fully supported installation for these scenarios.

UPDATES

  • The cluster used to write this article has been updated from 4.2.14 to 4.2.16 and then from 4.2.16 to 4.3.0.

The current OCS setup is still working

  • Added Requests and Limits configurations.

The post Open Container Storage 4.2 in Open Container Platform 4.2.14 – UPI Installation in Red Hat Virtualization appeared first on Red Hat OpenShift Blog.

Video: OpenShift is Kubernetes


Introduction to Security Contexts and SCCs

With Role Based Access Control, we have an OpenShift-wide tool to determine the actions (or verbs) each user can perform against each object in the API. For that, rules are defined combining resources with API verbs into sets called roles, and with role bindings we attribute those rules to users. Once we have those Users or Service Accounts, we can attribute them to particular resources to give them access to those actions. For example, a Pod may be able to delete a ConfigMap, but not a Secret, when running under a specific Service Account. That’s an upper-level control plane feature that doesn’t take into account the underlying node permission model, meaning the Unix permission model and some of its newer kernel accouterments.

So, the container platform and the objects it creates are protected with good RBAC practices, but the node may not be. A Pod may not be able to delete an object in etcd using the API because it’s restricted by RBAC, yet it may still be able to delete important files on the system and even stop the kubelet if programmed to do so. To prevent this scenario, SCCs (Security Context Constraints) come to the rescue.

Linux Processes and Privileges

Before going into deep waters with SCCs, let’s go back in time and take a look at some of the key concepts Linux brings to us regarding processes. A good start is entering the command man capabilities on a Linux terminal. That’s the manual page that contains very important fundamentals to understand the goal behind the SCCs.

The first important distinction that we need to make is between privileged and unprivileged processes. While privileged processes have user ID 0, being the superuser or root, unprivileged processes have non-zero user IDs. Privileged processes bypass kernel permission checks. That means the actions a process or thread can perform on operating system objects such as files, directories, symbolic links, pseudo filesystems (procfs, cgroupfs, sysfs, etc.) and even memory objects such as shared memory regions, pipes and sockets are unlimited and not verified by the system. In other words, the kernel won’t check user, group or others permissions (taken from the Unix permission model UGO – user, group and others) to grant access to that specific object on behalf of the process.

If we look at the list of running processes on a Linux system using the command ps -u root, we will find very important processes such as systemd, which has PID 1 and is responsible for bootstrapping the user space in most distributions and initializing the most common services. For that it needs unrestricted access to the system.

Unprivileged processes, though, are subject to full permission checking based on process credentials (user ID, group ID, supplementary group list, etc.). The kernel makes an iterative check under each category (user, group and others), trying to match the user and group credentials of the running process against the target object’s permissions in order to grant or deny access. Keep in mind that this is not the Service Account in OpenShift; it is the system user that runs the container process, if we want to speak in container terms.

With kernel 2.2, the concept of capabilities was introduced. In order to have more flexibility and enable the use of superuser or root features in a granular way, those super privileges were broken into small pieces that can be enabled or disabled independently. That is what we call capabilities. We can take a deeper look at http://man7.org/linux/man-pages/man7/capabilities.7.html

As an example, let’s say that we have an application that needs special networking configuration: it needs to configure one interface, open a port on the system’s firewall, create a NAT rule for that and add a new custom route to the system’s routing table, but it doesn’t need to make arbitrary changes to any file in the system. We can grant it CAP_NET_ADMIN instead of running the process as a privileged one.
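
Outside of containers, this is typically done with the file capability tools from libcap. As a small sketch, assuming a hypothetical binary /usr/local/bin/netconfigd, the first command below grants only the network administration capability to the file, and the second verifies which capabilities it now carries:

sudo setcap 'cap_net_admin+ep' /usr/local/bin/netconfigd
getcap /usr/local/bin/netconfigd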

Beyond privileges and capabilities we have SELinux and AppArmor, both kernel security modules that can be layered on top of capabilities to get even more fine-grained security rules by using access control security policies or program profiles. In addition, we have seccomp, a secure computing mode kernel facility that reduces the system calls available to a given process.

Finally, adding to all that, when we begin to talk about containers we still have interprocess communication, privilege escalation and access to the host namespaces to consider. That is out of scope at this point.

How does that translate to containers?

That said, we come back to containers and ask: what are containers again? They are processes segregated by namespaces and cgroups, and as such they have all the same security features described above. So how do we create containers with those security features?

Let’s first take a look at the smallest piece of software that creates the container process: runc. As its definition on the GitHub page says, it’s a tool to spawn and run containers according to the OCI specification. It’s the default choice for OCI runtimes, although we have others such as Kata Containers. In order to use runc, we need to have a file system image and a bundle with the configuration for the process. The short story on the bundle is that we must provide a JSON-formatted specification for the container where all the configuration is taken into account. Check this part of its documentation: https://github.com/opencontainers/runtime-spec/blob/master/config.md#linux-process

From there we have fields such as apparmorProfile, capabilities or selinuxLabel. We can set user ID, group ID and supplementary group IDs. What tool then automates the process of getting the file system ready and passing down those parameters for us?

We can use podman, for example, for testing or development, running isolated containers or pods. It allows us to do it with special privileges as we show below:

Privileged bash terminal:

sudo podman run --privileged -it registry.access.redhat.com/rhel7/rhel /bin/bash

Process ntpd with privilege to change the system clock:

sudo podman run -d --cap-add SYS_TIME ntpd

Ok. Cool. But when it comes time to run those containers on Kubernetes or OpenShift, how do we configure those capabilities and security features?

Inside the OpenShift platform, the CRI-O container engine is the one that runs and manages containers. It is compliant with the Kubernetes Container Runtime Interface (CRI), which gives the kubelet a standard interface to call the container engine; all the magic is done by automating runc behind the scenes, while allowing other features to be developed in the engine itself.

Following the workflow above to run a pod in Kubernetes or OpenShift, we first make an API call to Kubernetes asking to run a particular Pod. It could come from an oc command or from code, for example. The API then processes that request and stores it in etcd; the pod is scheduled to a specific node, since the scheduler watches those events; finally kubelet, on that node, reads the event and calls the container runtime (CRI-O) with all the parameters and options requested to run the pod. I know it’s very summarized. But the important thing here is that we need to pass parameters down to the API in order to have our Pod configured with the desired privileges. In the example below a new pod gets scheduled to run on node 1.


What goes into that YAML file in order to request those privileges? Two different objects are implemented in the Kubernetes API: PodSecurityContext and SecurityContext. The first one, obviously, relates to Pods and the second one to a specific container. They are part of their respective types, so you can find those fields in the Pod and Container specs of YAML manifests. With that, they can be applied to an entire Pod, no matter how many containers it has, or to specific containers within that Pod; the SecurityContext settings take precedence over the PodSecurityContext ones. You can find the security context source code under https://github.com/kubernetes/api/blob/master/core/v1/types.go.

Here we can find a few examples on how to configure security contexts for Pods. Below I present the first three fields of the SecurityContext object.

type SecurityContext struct {
    // The capabilities to add/drop when running containers.
    // Defaults to the default set of capabilities granted by the container runtime.
    // +optional
    Capabilities *Capabilities `json:"capabilities,omitempty" protobuf:"bytes,1,opt,name=capabilities"`
    // Run container in privileged mode.
    // Processes in privileged containers are essentially equivalent to root on the host.
    // Defaults to false.
    // +optional
    Privileged *bool `json:"privileged,omitempty" protobuf:"varint,2,opt,name=privileged"`
    // The SELinux context to be applied to the container.
    // If unspecified, the container runtime will allocate a random SELinux context for each
    // container.  May also be set in PodSecurityContext.  If set in both SecurityContext and
    // PodSecurityContext, the value specified in SecurityContext takes precedence.
    // +optional
    SELinuxOptions *SELinuxOptions `json:"seLinuxOptions,omitempty" protobuf:"bytes,3,opt,name=seLinuxOptions"`
    <...>
}

Here is an example of a yaml manifest configuration with capabilities on securityContext field:

apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo-4
spec:
  containers:
  - name: sec-ctx-4
    image: gcr.io/google-samples/node-hello:1.0
    securityContext:
      capabilities:
        add: ["NET_ADMIN", "SYS_TIME"]
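
To illustrate the precedence rules mentioned above, here is a small sketch (names are hypothetical) where the Pod-level securityContext sets a default user ID and one container overrides it with its own SecurityContext:

apiVersion: v1
kind: Pod
metadata:
  name: security-context-precedence-demo
spec:
  securityContext:            # PodSecurityContext: default for all containers
    runAsUser: 1000
  containers:
  - name: uses-pod-default    # runs with UID 1000
    image: registry.access.redhat.com/ubi8/ubi
    command: ["sleep", "3600"]
  - name: overrides-default   # container-level SecurityContext wins: runs with UID 2000
    image: registry.access.redhat.com/ubi8/ubi
    command: ["sleep", "3600"]
    securityContext:
      runAsUser: 2000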

Ok. Now what? We have an idea on how to give super powers to a container or Pod even though they may be RBAC restricted. How can we control this behavior?

Security Context Constraints

Finally we get back to our main subject: how can I make sure that a specific Pod or Container doesn’t request more than it should in terms of process privileges, and not only in terms of OpenShift object privileges under its API?

That’s the role of Security Context Constraints: to check beforehand whether the system can pass that pod or container configuration request, with a privileged or custom security context, further on to the cluster API, which would end up running a powerful container process. To get a taste of what an SCC looks like, here is an example:

oc get scc restricted -o yaml

allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: false
allowedCapabilities: null
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: MustRunAs
groups:
- system:authenticated
kind: SecurityContextConstraints
metadata:
  annotations:
    kubernetes.io/description: restricted denies access to all host features and requires
      pods to be run with a UID, and SELinux context that are allocated to the namespace.  This
      is the most restrictive SCC and it is used by default for authenticated users.
  creationTimestamp: "2020-02-08T17:25:39Z"
  generation: 1
  name: restricted
  resourceVersion: "8237"
  selfLink: /apis/security.openshift.io/v1/securitycontextconstraints/restricted
  uid: 190ef798-af35-40b9-a980-0d369369a385
priority: null
readOnlyRootFilesystem: false
requiredDropCapabilities:
- KILL
- MKNOD
- SETUID
- SETGID
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users: []
volumes:
- configMap
- downwardAPI
- emptyDir
- persistentVolumeClaim
- projected
- secret

The above is the default SCC; it has pretty basic permissions and will accept Pod configurations that don’t request special security contexts. Just by looking at the names of the fields we can get an idea of how many features it can verify before letting a workload with containers pass the API and get scheduled.

In conclusion, we have at hand a tool that allows an OpenShift admin to decide, before the Pod gets accepted by the API and passed to the container runtime, whether an entire pod can run in privileged mode, have special capabilities, access directories and volumes in the host namespace, use special SELinux contexts, and what IDs the container process can use, among other features.
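
As a small sketch of how SCCs tie back into RBAC (covered in more depth in the next posts), an admin can allow a workload to use a less restrictive SCC by granting it to the Service Account the Pod runs under, and then check which SCC was actually applied through the openshift.io/scc annotation. The service account, pod and project names below are hypothetical:

oc adm policy add-scc-to-user anyuid -z my-app -n my-project
oc describe pod my-app-pod -n my-project | grep openshift.io/scc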

In the next blog posts we’ll explore each field of an SCC, explore their underlying Linux technology, present the prebuilt ones and understand their relationship with the RBAC system to grant or deny special security contexts declared under Pod’s or container’s Spec field. Stay tuned!

The post Introduction to Security Contexts and SCCs appeared first on Red Hat OpenShift Blog.

OpenShift Commons Briefing: Deep Dive on the OpenShift Logging-Stack – Gabriel Stein (Red Hat)

 

 

Just deployed the EFK Logging-Stack on top of OpenShift and now you cannot see all the logs on Kibana?

Suddenly the infra nodes start to hang and they are running out of resources? In this OpenShift Commons Briefing, Red Hat’s Gabriel Ferraz Stein shows us how to check the installation of the EFK Logging-Stack, how to do better capacity planning so you don’t run out of resources, and also how to work effectively with Red Hat Support Services to solve Logging-Stack issues.

In the briefing, Gabriel covers many aspects of the EFK Logging-Stack installed on top of OpenShift Container Platform 3.x/4.x, including:

  • What are exactly the components from the EFK Logging-Stack?
  • What functions do they have?
  • What are the most common recommendations and requirements to deploy a EFK Logging-Stack safely and in a reliable way?
  • How should you calculate the capacity of your resources?
  • Best practices and the sizing options
  • Debugging the EFK Logging Stack
  • Generating a logging dump and understanding all the components in it

 

Briefing Slides: A deep dive on the OpenShift Logging-Stack

Feedback:

To find out more about OpenShift Container Storage or to take a test drive, visit https://www.openshift.com/products/container-storage/.

If you would like to learn more about what the OpenShift Container Storage team is up to or provide feedback on any of the new 4.2 features, take this brief 3-minute survey.

The post OpenShift Commons Briefing: Deep Dive on the OpenShift Logging-Stack – Gabriel Stein (Red Hat) appeared first on Red Hat OpenShift Blog.

Announcing OpenShift Serverless 1.5.0 Tech Preview – A sneak peak of our GA

I am sure many of you are as excited as we are about cloud native development, and one of the hot topics in the space is Serverless. With that in mind, let’s talk about our most recent release of OpenShift Serverless, which includes a number of features and functionalities that definitely improve the developer experience on Kubernetes and really enable many interesting application patterns and workloads.

For the uninitiated, OpenShift Serverless is based on the open source project Knative and helps developers deploy and run almost any containerized workload as a serverless workload. Applications can scale up or down (to zero) or react to and consume events without lock-in concerns. The Serverless user experience can be integrated with other OpenShift services, such as OpenShift Pipelines, Monitoring and Metering. Beyond autoscaling and events, it also provides a number of other features, such as:

  • Immutable revisions allow you to deploy new features: performing canary, A/B or blue-green testing with gradual traffic rollout with no sweat and following best practices.
  • Ready for the hybrid cloud: truly portable serverless running anywhere OpenShift runs, that is on-premises or on any public cloud. Leverage data locality and SaaS when needed.
  • Use any programming language or runtime of choice. From Java, Python, Go and JavaScript to Quarkus, SpringBoot or Node.js.

One of the most interesting aspects of running serverless containers is that it offers an alternative to application modernization that allows users to reuse investments already made and what is available today. If you have a number of web applications, microservices or RESTful APIs built as containers that you would like to scale up and down based on the number of HTTP requests, that’s a perfect fit. But if you also would like to build new event driven systems that will consume Apache Kafka messages or be triggered by new files being uploaded to Ceph (or S3), that’s possible too. Autoscaling your containers to match the number of requests can improve your response time, offering a better quality of service and increase your cluster density by allowing more applications to run, optimizing resources usage.

New Features in 1.5.0 – Technology Preview

Based on Knative 0.12.1 – Keeping up with the release cadence of the community, we already include Knative 0.12 in Serving, Eventing and kn – the official Knative CLI. As with anything we ship as a product at Red Hat, this means we have validated these components on a variety of different platforms and configurations OpenShift runs.

Use of Kourier – By using Kourier we can keep the list of requirements to get Serverless installed on OpenShift to a minimum, with low resource consumption, faster cold starts and no impact on non-serverless workloads running in the same namespace. In combination with fixes we implemented in OpenShift 4.3.5, the time to create an application from a pre-built container improved by 40-50% depending on the container image size.

Before Kourier


After Kourier 


 

Disconnected installs (air gapped) – Given the request of several customers that want to benefit from serverless architectures and its programming model but in controlled environments with restricted or no internet access, we are enabling the OpenShift Serverless operator to be installed in disconnected OpenShift clusters. The kn CLI, used to manage applications in Knative, is also available to download from the OpenShift cluster itself, even in disconnected environments. 


The journey so far

We already have OpenShift Serverless being deployed and used on a number of OpenShift clusters by a variety of customers during the Technology Preview. These clusters are running on a number of different providers, such as on-premises with bare metal hardware or virtualized systems, or in the cloud on AWS or Azure. These environments exposed our team to a number of different configurations that you really only get by running hybrid cloud solutions, which enabled us to cast a wide net during this validation period and take the feedback back to the community, improving quality and usability.

Install experience and upgrades with the Operator 


The Serverless operator deals with all the complexities of installing Knative on Kubernetes, offering a simplified experience. It takes it one step further by enabling an easy path to upgrades and updates, which are also delivered over-the-air and can be applied automatically, letting system administrators rest assured that they will receive CVE and bug fixes on production systems. Those concerned with automatic updates can also opt to apply them manually.

Integration with Console

With the integration with the OpenShift console, users have the ability to configure traffic distribution using the UI as an alternative to using kn, the CLI. Traffic splitting lets users apply a number of different techniques to roll out new versions and new features of their applications, the most common ones being A/B testing, canary releases or dark launches. By letting users visualize this in the topology view, they can quickly get an understanding of the architecture and deployment strategies being used and course correct if needed.
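
For reference, the same traffic distribution can also be driven from the CLI. This is a sketch assuming a service named greeter with two existing revisions (the revision names are hypothetical):

kn service update greeter \
  --traffic greeter-v1=90 \
  --traffic greeter-v2=10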

 

 


The integration with the console also provides a good visualization of event sources connected to services. The screenshot below, for example, shows a service (kiosk) consuming messages from Apache Kafka, while two other applications (frontend) are scaled down to zero.

 

Deploy your first application and use Quarkus

To deploy your first serverless container using the CLI (kn), download the client and from a terminal execute: 

[markito@anakin ~]$ kn service create greeter --image quay.io/rhdevelopers/knative-tutorial-greeter:quarkus
Creating service 'greeter' in namespace 'default':
0.133s The Route is still working to reflect the latest desired specification.
0.224s Configuration "greeter" is waiting for a Revision to become ready.
5.082s ...
5.132s Ingress has not yet been reconciled.
5.235s Ready to serve.
Service 'greeter' created to latest revision 'greeter-pjxfx-1' is available at URL:
http://greeter.default.apps.test.mycluster.org

This will create a Knative Service based on the container image provided. Quarkus, a Kubernetes-native Java stack, is a perfect fit for building serverless applications in Java, given its blazing fast startup time and low memory footprint, but Knative can also run any other language or runtime. Creating a Knative Service object will manage multiple Kubernetes objects commonly used to deploy an application, such as Deployments, Routes and Services, providing a simplified experience for anyone getting started with Kubernetes development, with the added benefit of autoscaling based on the number of requests and all the other benefits already mentioned in this post.
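
For those who prefer declarative manifests, the kn command above corresponds roughly to a Knative Service object like the following sketch (the name and image mirror the CLI example; it could be applied with oc apply -f):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: greeter
  namespace: default
spec:
  template:
    spec:
      containers:
      - image: quay.io/rhdevelopers/knative-tutorial-greeter:quarkus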

 

You can also follow the excellent Knative Tutorial for more scenarios and samples. 

 

The journey so far has been exciting and we have been contributing to the Knative community since its inception. I would also like to send a big “thank you” to our team across engineering, QE and documentation for keeping up with the fast pace of the serverless space; they have been doing phenomenal work. 

 

Get started today with OpenShift Serverless following the installation instructions

The post Announcing OpenShift Serverless 1.5.0 Tech Preview – A sneak peak of our GA appeared first on Red Hat OpenShift Blog.

Simplifying deployments of accelerated AI workloads on Red Hat OpenShift with NVIDIA GPU Operator

In this blog we would like to demonstrate how to use the new NVIDIA GPU operator to deploy GPU-accelerated workloads on an OpenShift cluster.

The new GPU operator enables OpenShift to schedule workloads that require GPGPUs as easily as one would schedule CPU or memory for more traditional, non-accelerated workloads. Start by creating a container that has a GPU workload inside it, request the GPU resource when creating the pod, and OpenShift will take care of the rest. This makes deployment of GPU workloads to OpenShift clusters straightforward for users and administrators, as it is all managed at the cluster level and not on the host machines. The GPU operator for OpenShift will help simplify and accelerate the compute-intensive ML/DL modeling tasks for data scientists, as well as help run inference tasks across data centers, public clouds, and at the edge. Typical workloads that can benefit from GPU acceleration include image and speech recognition, visual search and several others.

We assume that you have an OpenShift 4.x cluster deployed with some worker nodes that have GPU devices.

$ oc get no
NAME                           STATUS   ROLES    AGE     VERSION
ip-10-0-130-177.ec2.internal   Ready    worker   33m     v1.16.2
ip-10-0-132-41.ec2.internal    Ready    master   42m     v1.16.2
ip-10-0-156-85.ec2.internal    Ready    worker   33m     v1.16.2
ip-10-0-157-132.ec2.internal   Ready    master   42m     v1.16.2
ip-10-0-170-127.ec2.internal   Ready    worker   4m15s   v1.16.2
ip-10-0-174-93.ec2.internal    Ready    master   42m     v1.16.2

In order to expose what features and devices each node has to OpenShift we first need to deploy the Node Feature Discovery (NFD) Operator (see here for more detailed instructions).

Once the NFD Operator is deployed we can take a look at one of our nodes; here we see the difference between the node before and after. Among the new labels describing the node features, we see:

feature.node.kubernetes.io/pci-10de.present=true

This indicates that we have at least one PCIe device from the vendor ID 0x10de, which is for Nvidia. These labels created by the NFD operator are what the GPU Operator uses in order to determine where to deploy the driver containers for the GPU(s).
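
As a quick sanity check, you can list the nodes that carry this label (a sketch; the node names will be whatever your GPU workers are called):

$ oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true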

However, before we can deploy the GPU Operator we need to ensure that the appropriate RHEL entitlements have been created in the cluster (see here for more detailed instructions). After the RHEL entitlements have been deployed to the cluster, then we may proceed with installation of the GPU Operator.

The GPU Operator is currently installed via helm chart, so make sure that you have helm v3+ installed. Once you have helm installed we can begin the GPU Operator installation.

     1. Add the Nvidia helm repo:

 

$ helm repo add nvidia https://nvidia.github.io/gpu-operator
"nvidia" has been added to your repositories

     2. Update the helm repo:


$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "nvidia" chart repository
Update Complete. ⎈ Happy Helming!⎈

     3. Install the GPU Operator helm chart:

 

$ helm install --devel https://nvidia.github.io/gpu-operator/gpu-operator-1.0.0.tgz \
    --set platform.openshift=true,operator.defaultRuntime=crio,nfd.enabled=false --wait --generate-name

     4. Monitor deployment of GPU Operator:

 

$ oc get pods -n gpu-operator-resources -w


This command will watch the gpu-operator-resources namespace as the operator rolls out on the cluster. Once the installation is complete, you should see something like this in the gpu-operator-resources namespace.

We can see that both the nvidia-driver-validation and the nvidia-device-plugin-validation pods have completed successfully and we have four daemonsets, each running the number of pods that match the node label feature.node.kubernetes.io/pci-10de.present=true. Now we can inspect our GPU node once again.

Here we can see the latest changes to our node, which now include Capacity, Allocatable and Allocated Resources entries for a new resource called nvidia.com/gpu. Since our GPU node only has one GPU, we can see that reflected.

Now that we have the NFD Operator, cluster entitlements, and the GPU Operator deployed we can assign workloads that will use the GPU resources.
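
Requesting a GPU from a workload is done through the nvidia.com/gpu extended resource in the container’s resource limits. A minimal sketch (the pod name and sample image are just examples) could look like this:

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vectoradd
    image: nvidia/samples:vectoradd-cuda11.2.1
    resources:
      limits:
        nvidia.com/gpu: 1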

Let’s begin by configuring Cluster Autoscaling for our GPU devices. This will allow us to create workloads that request GPU resources and then will automatically scale our GPU nodes up and down depending on the amount of requests pending for these devices.

The first step is to create a ClusterAutoscaler resource definition, for example:

$ cat 0001-clusterautoscaler.yaml
apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  podPriorityThreshold: -10
  resourceLimits:
    maxNodesTotal: 24
    gpus:
      - type: nvidia.com/gpu
        min: 0
        max: 16
  scaleDown:
    enabled: true
    delayAfterAdd: 10m
    delayAfterDelete: 5m
    delayAfterFailure: 30s
    unneededTime: 10m

$ oc create -f 0001-clusterautoscaler.yaml
clusterautoscaler.autoscaling.openshift.io/default created

 

Here we define the number of nvidia.com/gpu resources that we expect for the Autoscaler.

 

After we deploy the ClusterAutoscaler, we deploy the MachineAutoscaler resource that references the MachineSet that is used to scale the cluster:

$ cat 0002-machineautoscaler.yaml
apiVersion: "autoscaling.openshift.io/v1beta1"
kind: "MachineAutoscaler"
metadata:
  name: "gpu-worker-us-east-1a"
  namespace: "openshift-machine-api"
spec:
  minReplicas: 1
  maxReplicas: 6
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: gpu-worker-us-east-1a

$ oc create -f 0002-machineautoscaler.yaml
machineautoscaler.autoscaling.openshift.io/sj-022820-01-h4vrj-worker-us-east-1c created

 

The metadata name should be a unique MachineAutoscaler name, and the MachineSet name at the end of the file should be the value of an existing MachineSet.

 

Looking at our cluster, we check what MachineSets are available:

 

$ oc get machinesets -n openshift-machine-api
NAME                                   DESIRED   CURRENT   READY   AVAILABLE   AGE
sj-022820-01-h4vrj-worker-us-east-1a   1         1         1       1           4h45m
sj-022820-01-h4vrj-worker-us-east-1b   1         1         1       1           4h45m
sj-022820-01-h4vrj-worker-us-east-1c   1         1         1       1           4h45m

In this example the third MachineSet, sj-022820-01-h4vrj-worker-us-east-1c, is the one that has GPU nodes.

 

$ oc get machineset sj-022820-01-h4vrj-worker-us-east-1c -n openshift-machine-api -o yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: sj-022820-01-h4vrj-worker-us-east-1c
  namespace: openshift-machine-api
...
spec:
  replicas: 1
...
    spec:
      metadata:
          instanceType: p3.2xlarge
          kind: AWSMachineProviderConfig
          placement:
            availabilityZone: us-east-1c
            region: us-east-1
 

We can create our MachineAutoscaler resource definition, which would look like this:

 

$ cat 0002-machineautoscaler.yaml
apiVersion: "autoscaling.openshift.io/v1beta1"
kind: "MachineAutoscaler"
metadata:
  name: "sj-022820-01-h4vrj-worker-us-east-1c"
  namespace: "openshift-machine-api"
spec:
  minReplicas: 1
  maxReplicas: 6
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: sj-022820-01-h4vrj-worker-us-east-1c

$ oc create -f 0002-machineautoscaler.yaml
machineautoscaler.autoscaling.openshift.io/sj-022820-01-h4vrj-worker-us-east-1c created

We can now start to deploy RAPIDs using shared storage between multiple instances. Begin by creating a new project:

 

$ oc new-project rapids

 

Assuming you have a StorageClass that provides ReadWriteMany functionality, like OpenShift Container Storage with CephFS, we can create a PVC to attach to our RAPIDS instances (`storageClassName` is the name of the StorageClass).

 

$ cat 0003-pvc-for-ceph.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rapids-cephfs-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 25Gi
  storageClassName: example-storagecluster-cephfs

$ oc create -f 0003-pvc-for-ceph.yaml
persistentvolumeclaim/rapids-cephfs-pvc created

$ oc get pvc -n rapids
NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                    AGE
rapids-cephfs-pvc   Bound    pvc-a6ba1c38-6498-4b55-9565-d274fb8b003e   25Gi       RWX            example-storagecluster-cephfs   33s

 

Now that we have our shared storage deployed we can finally deploy the RAPIDs template and create the new application inside our rapids namespace:

 

$ oc create -f 0004-rapids_template.yaml
template.template.openshift.io/rapids created

$ oc new-app rapids
--> Deploying template "rapids/rapids" to project rapids

     RAPIDS
     ---------
     Template for RAPIDS

     A RAPIDS pod has been created.

     * With parameters:
        * Number of GPUs=1
        * Rapids instance number=1

--> Creating resources ...
    service "rapids" created
    route.route.openshift.io "rapids" created
    pod "rapids" created
--> Success
    Access your application via route 'rapids-rapids.apps.sj-022820-01.perf-testing.devcluster.openshift.com'
    Run 'oc status' to view your app.
 

In a browser we can now load the route that the template created above: rapids-rapids.apps.sj-022820-01.perf-testing.devcluster.openshift.com


Image shows example notebook running using GPUs on OpenShift

 

We can also see on our GPU node that RAPIDs is running on it and using the GPU resource:

 

$ oc describe node <gpu-node-name>

 

Given that we have more than one person who wants to run Jupyter notebooks, let’s create a second RAPIDS instance with its own dedicated GPU.

 

$ oc new-app rapids -p INSTANCE=2
--> Deploying template "rapids/rapids" to project rapids

     RAPIDS
     ---------
     Template for RAPIDS

     A RAPIDS pod has been created.

     * With parameters:
        * Number of GPUs=1
        * Rapids instance number=2

--> Creating resources ...
    service "rapids2" created
    route.route.openshift.io "rapids2" created
    pod "rapids2" created
--> Success
    Access your application via route 'rapids2-rapids.apps.sj-022820-01.perf-testing.devcluster.openshift.com'
    Run 'oc status' to view your app.

 

But we just used our only GPU resource on our GPU node, so the new deployment of rapids (rapids2) is not schedulable due to insufficient GPU resources.

 

$ oc get pods -n rapids
NAME      READY   STATUS    RESTARTS   AGE
rapids    1/1     Running   0          30m
rapids2   0/1     Pending   0          2m44s

 

If we look at the event state of the rapids2 pod:

 

$ oc describe pod/rapids2 -n rapids
...
Events:
  Type     Reason            Age        From                Message
  ----     ------            ---        ----                -------
  Warning  FailedScheduling  <unknown>  default-scheduler   0/9 nodes are available: 9 Insufficient nvidia.com/gpu.
  Normal   TriggeredScaleUp  44s        cluster-autoscaler  pod triggered scale-up: [{openshift-machine-api/sj-022820-01-h4vrj-worker-us-east-1c 1->2 (max: 6)}]

 

We just need to wait for the ClusterAutoscaler and MachineAutoscaler to do their job and scale up the MachineSet as we see above. Once the new node is created:

 

$ oc get no
NAME                           STATUS   ROLES    AGE   VERSION
(old nodes)
...
ip-10-0-167-0.ec2.internal     Ready    worker   72s   v1.16.2

 

The new RAPIDs instance will deploy to the new node once it becomes Ready with no user intervention.

To summarize, the new NVIDIA GPU operator simplifies the  use of GPU resources in OpenShift clusters. In this blog we’ve demonstrated the use-case for multi-user RAPIDs development using NVIDIA GPUs. Additionally we’ve used OpenShift Container Storage and the ClusterAutoscaler to automatically scale up our special resource nodes as they are being requested by applications.

As you observed, the NVIDIA GPU Operator is already relatively easy to deploy using Helm, and work is ongoing to support deployments right from OperatorHub, simplifying this process even further.

For more information on NVIDIA GPU Operator and OpenShift, please see the official Nvidia documentation.

1 – Helm 3 is in Tech Preview in OpenShift 4.3, and will GA in OpenShift 4.4

The post Simplifying deployments of accelerated AI workloads on Red Hat OpenShift with NVIDIA GPU Operator appeared first on Red Hat OpenShift Blog.

OpenShift Commons Briefing: JupyterHub on-demand (and other tools) with Red Hat’s Guillaume Moutier and Landon LaSmith


Welcome to the first briefing of the “All Things Data” series of OpenShift Commons briefings. We’ll be holding future briefings on Tuesdays at 8:00am PST, so reach out with any topics you’re interested in and remember to bookmark the OpenShift Commons Briefing calendar!

In this first briefing for the “All Things Data” OpenShift Commons series, Red Hat’s Guillaume Moutier and Landon LaSmith demo’d how to easily integrate Open Data Hub and OpenShift Container Storage to build your own data science platform. When working on data science projects, it’s a guarantee that you will need different kinds of storage for your data: block, file, object.

Open Data Hub (ODH) is an open source project that provides open source AI tools for running large and distributed AI workloads on OpenShift Container Platform.

OpenShift Container Storage (OCS) is software-defined storage for containers that provides you with every type of storage you need, from a simple, single source.

Briefing Slides: ODH on OCS

Additional Resources:

Culture of innovation: Open Data Hub

Open Data Hub Community Project Website: opendatahub.io

OpenShift AI/ML Resources: openshift.com/ai-ml

Product Documentation for Red Hat OpenShift Container Storage 4.2

Feedback:

To find out more about OpenShift Container Storage or to take a test drive, visit https://www.openshift.com/products/container-storage/.

If you would like to learn more about what the OpenShift Container Storage team is up to or provide feedback on any of the new 4.2 features, take this brief 3-minute survey.

 

The post OpenShift Commons Briefing: JupyterHub on-demand (and other tools) with Red Hat’s Guillaume Moutier and Landon LaSmith appeared first on Red Hat OpenShift Blog.

Red Hat OpenShift Installation Process Experiences on IBM Z/LinuxONE


OpenShift stands out as a leader with a security-focused, supported Kubernetes platform—including a foundation based on Red Hat Enterprise Linux.
But we already knew all that; the game changer for OpenShift is the release of OCP version 4.x: OpenShift 4 is powered by Kubernetes Operators and Red Hat’s commitment to full-stack security, so you can develop and scale big ideas for the enterprise.

OpenShift started with distributed systems. It was eventually extended to IBM Power Systems, and now it is available on IBM Z. This creates a seamless user experience across major architectures such as x86, PPC and s390x!

This article’s goal is to share my experience on how to install the OpenShift Container Platform (OCP) 4.2.19 on IBM Z. We will use the minimum requirements to get our environment up and running. That said, for production or performance testing use the recommended hardware configuration from the official Red Hat documentation. The minimum machine requirements for a cluster with user-provisioned infrastructure are as follows:

The smallest OpenShift Container Platform clusters require the following hosts:
* One temporary bootstrap machine.
* Three control plane, or master, machines.
* At least two compute, or worker, machines.

The bootstrap, control plane (often called masters), and compute machines must use Red Hat Enterprise Linux CoreOS (RHCOS) as the operating system.
All the RHCOS machines require network in initramfs during boot to fetch Ignition config files from the Machine Config Server. The machines are configured with static IP addresses. No DHCP server is required.

To install on IBM Z under z/VM, we require a single z/VM virtual NIC in layer 2 mode. You also need:
* A direct-attached OSA
* A z/VM VSwitch set up.

Minimum Resource Requirements

Each cluster machine must meet the following minimum requirements, in our case, these are the resource requirements for the VMs on IBM z/VM:

For our testing purposes (and resource limitations) we used DASD model 54 volumes for each node instead of the 120GB recommended by the official Red Hat documentation.
Make sure to install OpenShift Container Platform version 4.2 on one of the following IBM hardware platforms:

  • IBM Z: z13, z14, or z15.
  • LinuxONE, any version.

Hardware Requirements

  • 1 LPAR with 3 IFLs that supports SMT2.
  • 1 OSA or RoCE network adapter.

Operating System Requirements

  • One instance of z/VM 7.1.

This is the environment that we created to install the OpenShift Container Platform following the minimum resource requirements. Keep in mind that other services will be required in this environment, and you can have them either on Z or provided to the Z box from outside: DNS (name resolution), HAProxy (our load balancer), a workstation (our client system where we run the CLI commands for OCP), and HTTPd (serving files such as the Red Hat CoreOS image as well as the Ignition files that will be generated in later sections of this guide):
Image may be NSFW.
Clik here to view.

Network Topology Requirements

Before you install OpenShift Container Platform, you must provision two layer-4 load balancers. The API requires one load balancer and the default Ingress Controller needs the second load balancer to provide ingress to applications. In our case, we used a single instance of HAProxy running on a Red Hat Enterprise Linux 8 VM as our load balancer.

The following HAProxy configuration provides the load balancer layer for our purposes. Edit /etc/haproxy/haproxy.cfg and add:


listen ingress-http
    bind *:80
    mode tcp
    server worker0 <worker0_IP>:80 check
    server worker1 <worker1_IP>:80 check

listen ingress-https
    bind *:443
    mode tcp
    server worker0 <worker0_IP>:443 check
    server worker1 <worker1_IP>:443 check

listen api
    bind *:6443
    mode tcp
    server bootstrap <bootstrap_IP>:6443 check
    server master0 <master0_IP>:6443 check
    server master1 <master1_IP>:6443 check
    server master2 <master2_IP>:6443 check

listen api-int
    bind *:22623
    mode tcp
    server bootstrap <bootstrap_IP>:22623 check
    server master0 <master0_IP>:22623 check
    server master1 <master1_IP>:22623 check
    server master2 <master2_IP>:22623 check

Don’t forget to open the respective ports on the system’s firewall as well as set the SELinux boolean as follows:

# firewall-cmd --add-port=443/tcp
# firewall-cmd --add-port=443/tcp --permanent

# firewall-cmd --add-port=80/tcp
# firewall-cmd --add-port=80/tcp --permanent

# firewall-cmd --add-port=6443/tcp
# firewall-cmd --add-port=6443/tcp --permanent

# firewall-cmd --add-port=22623/tcp
# firewall-cmd --add-port=22623/tcp --permanent

# setsebool -P haproxy_connection_any 1

The following DNS records are required for an OpenShift Container Platform cluster that uses user-provisioned infrastructure. In each record, <cluster_name> is the cluster name and <base_domain> is the cluster base domain that you specify in the install-config.yaml file.
Required DNS Records:

api.<cluster_name>.<base_domain>.

This DNS record must point to the load balancer for the control plane machines. This record must be resolvable by both clients external to the cluster and from all the nodes within the cluster.

api-int.<cluster_name>.<base_domain>.

This DNS record must point to the load balancer for the control plane machines. This record must be resolvable from all the nodes within the cluster.
The API server must be able to resolve the worker nodes by the host names that are recorded in Kubernetes. If it cannot resolve the node names, proxied API calls can fail, and you cannot retrieve logs from Pods.

*.apps.<cluster_name>.<base_domain>.

A wildcard DNS record that points to the load balancer that targets the machines that run the Ingress router pods, which are the worker nodes by default. This record must be resolvable by both clients external to the cluster and from all the nodes within the cluster.

etcd-<index>.<cluster_name>.<base_domain>.

OpenShift Container Platform requires DNS records for each etcd instance to point to the control plane machines that host the instances. The etcd instances are differentiated by <index> values, which start with 0 and end with n-1, where n is the number of control plane machines in the cluster. The DNS record must resolve to a unicast IPv4 address for the control plane machine, and the records must be resolvable from all the nodes in the cluster.

_etcd-server-ssl._tcp.<cluster_name>.<base_domain>.

For each control plane machine, OpenShift Container Platform also requires an SRV DNS record for the etcd server on that machine with priority 0, weight 10, and port 2380. A cluster that uses three control plane machines requires the following records:

# _service._proto.name.                             TTL   class SRV priority weight port target

_etcd-server-ssl._tcp.<cluster_name>.<base_domain>. 86400 IN SRV 0 10 2380 etcd-0.<cluster_name>.<base_domain>.
_etcd-server-ssl._tcp.<cluster_name>.<base_domain>. 86400 IN SRV 0 10 2380 etcd-1.<cluster_name>.<base_domain>.
_etcd-server-ssl._tcp.<cluster_name>.<base_domain>. 86400 IN SRV 0 10 2380 etcd-2.<cluster_name>.<base_domain>.

As a summary, this is how the DNS records defined in our domain zone look when using BIND as the DNS server:

$TTL 86400
@ IN SOA <nameserver>.<base_domain>. admin.<base_domain>. (
                                2020021813 ;Serial
                                3600       ;Refresh
                                1800       ;Retry
                                604800     ;Expire
                                86400      ;Minimum TTL
)

;Name Server Information
@ IN NS <nameserver>.<base_domain>.

;IP Address for Name Server
<nameserver> IN A <nameserver_IP>

;A Records for the following host names

haproxy     IN A <haproxy_IP>
bootstrap   IN A <bootstrap_IP>
master0     IN A <master0_IP>
master1     IN A <master1_IP>
master2     IN A <master2_IP>
workstation IN A <workstation_IP>

compute0    IN A <compute0_IP>
compute1    IN A <compute1_IP>

etcd-0.<cluster_name> IN A <master0_IP>
etcd-1.<cluster_name> IN A <master1_IP>
etcd-2.<cluster_name> IN A <master2_IP>

;CNAME Records

api.<cluster_name>     IN CNAME haproxy.<base_domain>.
api-int.<cluster_name> IN CNAME haproxy.<base_domain>.
*.apps.<cluster_name>  IN CNAME haproxy.<base_domain>.

_etcd-server-ssl._tcp.<cluster_name>.<base_domain>. 86400 IN SRV 0 10 2380 etcd-0.<cluster_name>.<base_domain>.
_etcd-server-ssl._tcp.<cluster_name>.<base_domain>. 86400 IN SRV 0 10 2380 etcd-1.<cluster_name>.<base_domain>.
_etcd-server-ssl._tcp.<cluster_name>.<base_domain>. 86400 IN SRV 0 10 2380 etcd-2.<cluster_name>.<base_domain>.

Don't forget to create the reverse records for your zone as well. Here is an example of how we set up ours:

$TTL 86400
@ IN SOA <nameserver>.<base_domain>. admin.<base_domain>. (
                                2020021813 ;Serial
                                3600       ;Refresh
                                1800       ;Retry
                                604800     ;Expire
                                86400      ;Minimum TTL
)
;Name Server Information
@ IN NS <nameserver>.<base_domain>.
<nameserver> IN A <nameserver_IP>

;Reverse lookup for Name Server
<last_octet> IN PTR <nameserver>.<base_domain>.

;PTR Records, IP address to hostname
<last_octet> IN PTR haproxy.<base_domain>.
<last_octet> IN PTR bootstrap.<base_domain>.
<last_octet> IN PTR master0.<base_domain>.
<last_octet> IN PTR master1.<base_domain>.
<last_octet> IN PTR master2.<base_domain>.
<last_octet> IN PTR compute0.<base_domain>.
<last_octet> IN PTR compute1.<base_domain>.
<last_octet> IN PTR workstation.<base_domain>.

Where <last_octet> for each record is the last octet of that host's IP address.

Make sure that your BIND 9 DNS server also provides access to the outside world (that is, Internet access) by using the forwarders and allow-query parameters in the options section of your /etc/named.conf:

options {
// listen-on port 53 { 127.0.0.1; };
// listen-on-v6 port 53 { ::1; };
directory "/var/named";
dump-file "/var/named/data/cache_dump.db";
statistics-file "/var/named/data/named_stats.txt";
memstatistics-file "/var/named/data/named_mem_stats.txt";
secroots-file "/var/named/data/named.secroots";
recursing-file "/var/named/data/named.recursing";
allow-query { localhost; <cluster_network>; };
forwarders { <upstream_DNS_IP>; };

For the sections Generating an SSH private key, Installing the CLI, and Manually creating the installation configuration file, we used the Workstation VM running RHEL 8.

Generating an SSH private key and adding it to the agent
In our case, we used a Linux workstation as the base system outside of the OCP cluster. The next steps were done in this system.
If you want to perform installation debugging or disaster recovery on your cluster, you must provide an SSH key to both your ssh-agent and to the installation program.

If you do not have an SSH key that is configured for password-less authentication on your computer, create one. For example, on a computer that uses a Linux operating system, run the following command:

$ ssh-keygen -t rsa -b 4096 -N '' \
    -f <path>/<file_name>
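The section heading also mentions adding the key to the agent. The commands for that step are not shown in this post, but a minimal sketch, assuming the key path used above, would be:

$ eval "$(ssh-agent -s)"
$ ssh-add <path>/<file_name>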

Then access the Infrastructure Provider page on the Red Hat OpenShift Cluster Manager site. If you have a Red Hat account, log in with your credentials. If you do not, create an account.

Navigate to the page for your installation type, download the installation program for your operating system, and place the file in the directory where you will store the installation configuration files:
https://…/openshift-v4/s390x/clients/ocp/latest/openshift-install-linux-4.2.18.tar.gz

Extract the installation program. For example, on a computer that uses a Linux operating system, run the following command:

$ tar xvf <installation_program>.tar.gz

From the Pull Secret page on the Red Hat OpenShift Cluster Manager site, download your installation pull secret as a .txt file. This pull secret allows you to authenticate with the services that are provided by the included authorities, including Quay.io, which serves the container images for OpenShift Container Platform components.

Installing the CLI

You can install the CLI in order to interact with OpenShift Container Platform using a command-line interface.
From the Infrastructure Provider page on the Red Hat OpenShift Cluster Manager site, navigate to the page for your installation type and click Download Command-line Tools.

Click the folder for your operating system and architecture and click the compressed file:
https://…/openshift-v4/s390x/clients/ocp/latest/openshift-client-linux-4.2.18.tar.gz
– Save the file to your file system.
– Extract the compressed file.
– Place it in a directory that is on your PATH (see the example commands below).
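As an illustration, the extraction and placement steps on a Linux workstation might look like the following sketch (assuming the tarball name from the link above and that /usr/local/bin is on your PATH):

$ tar xvf openshift-client-linux-4.2.18.tar.gz
$ sudo mv oc kubectl /usr/local/bin/
$ oc version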

After you install the CLI, it is available using the oc command:

$ oc <command>

Manually creating the installation configuration file

For installations of OpenShift Container Platform that use user-provisioned infrastructure, you must manually generate your installation configuration file.

Create an installation directory to store your required installation assets in:

$ mkdir <installation_directory>

Customize the following install-config.yaml file template and save it in the <installation_directory>.
Sample install-config.yaml file for bare metal

You can customize the install-config.yaml file to specify more details about your OpenShift Container Platform cluster's platform or modify the values of the required parameters. For IBM Z, make sure to add architecture: s390x for both the compute and controlPlane sections, or the generated cluster configuration will default to the amd64 architecture.

apiVersion: v1
baseDomain: <base_domain>
compute:
- architecture: s390x
  hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  architecture: s390x
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: <cluster_name>
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: ''
sshKey: ''

Creating the Kubernetes manifest and Ignition config files

Because you must modify some cluster definition files and manually start the cluster machines, you must generate the Kubernetes manifest and Ignition config files that the cluster needs to make its machines.
Generate the Kubernetes manifests for the cluster:

$ ./openshift-install create manifests --dir=<installation_directory>

WARNING There are no compute nodes specified. The cluster will not fully initialize without compute nodes.
INFO Consuming “Install Config” from target directory

Modify the <installation_directory>/manifests/cluster-scheduler-02-config.yml Kubernetes manifest file to prevent Pods from being scheduled on the control plane machines:
1. Open the manifests/cluster-scheduler-02-config.yml file.
2. Locate the mastersSchedulable parameter and set its value to false.
3. Save and exit the file.
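If you prefer to script this change, a one-liner such as the following does the same thing; this is a sketch that assumes GNU sed and that the parameter is currently set to true:

$ sed -i 's/mastersSchedulable: true/mastersSchedulable: false/' <installation_directory>/manifests/cluster-scheduler-02-config.yml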

Create the Ignition config files:

$ ./openshift-install create ignition-configs --dir=<installation_directory>

The following files are generated in the directory:
.
├── auth
│ ├── kubeadmin-password
│ └── kubeconfig
├── bootstrap.ign
├── master.ign
├── metadata.json
└── worker.ign

Copy the files master.ign, worker.ign and bootstrap.ign to the HTTPD node, where you should have configured an HTTP server (Apache) to serve these files during the creation of the Red Hat Enterprise Linux CoreOS VMs.
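As an illustration, copying the Ignition files to the web server might look like the sketch below; the user name, host name, and the default Apache document root /var/www/html are assumptions for your environment:

$ scp bootstrap.ign master.ign worker.ign <user>@<httpd_server>:/var/www/html/
$ ssh <user>@<httpd_server> "chmod 644 /var/www/html/*.ign"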

Creating Red Hat Enterprise Linux CoreOS (RHCOS) machines

Download the Red Hat Enterprise Linux CoreOS installation files from the RHCOS image mirror.
Download the following files:
* The initramfs: rhcos-<version>-installer-initramfs.img
* The kernel: rhcos-<version>-installer-kernel
* The operating system image for the disk on which you want to install RHCOS. This type can differ by virtual machine:
* rhcos-<version>-s390x-metal-dasd.raw.gz for DASD (we used the DASD version)

Create parameter files. The following parameters are specific to a particular virtual machine:
* For coreos.inst.install_dev=, specify dasda for a DASD installation.
* For rd.dasd=, specify the DASD where RHCOS is to be installed.

The bootstrap machine ignition file is called bootstrap-0, the master ignition files are numbered 0 through 2, the worker ignition files from 0 upwards. All other parameters can stay as they are.
Example parameter file we used on our environment, bootstrap-0.parm, for the bootstrap machine:

rd.neednet=1 coreos.inst=yes
coreos.inst.install_dev=dasda
coreos.inst.image_url=http://<httpd_server_IP>/rhcos-4.2.18.raw.gz
coreos.inst.ignition_url=http://<httpd_server_IP>/bootstrap.ign
vlan=eth0.<1110>:<enc1e00>
ip=<bootstrap_IP>::<gateway_IP>:<netmask>:<hostname>:eth0.<1110>:off
nameserver=<DNS_server_IP>
rd.znet=qeth,<0.0.1f00>,<0.0.1f01>,<0.0.1f02>,layer2=1,portno=0
cio_ignore=all,!condev
rd.dasd=<0.0.0202>

Where <enc1e00> is the physical interface, <eth0> is the virtual interface alias for enc1e00, and <1110> is the VLAN ID.
Note that the values for rd.znet=, rd.dasd=, and coreos.inst.install_dev= will be different in your environment.

Each VM on z/VM requires access to the initramfs, kernel, and parameter (.parm) files. We used a common approach: create a VM whose internal disk serves as a repository for all of these files, and give all the other VMs in the cluster (bootstrap, master0, master1, …, worker1) access to that repository VM's disk, usually in read-only mode. This saves disk space, since the files are only used in the first stage of the process, when they are loaded into each cluster VM's memory. Each cluster VM also has a dedicated disk for RHCOS, which is a completely separate disk (the model 54 DASDs covered previously).

Transfer the initramfs, kernel and all parameter (.parm) files to the repository VM’s local A disk on z/VM from an external FTP server:

==> ftp <VM_REPOSITORY_IP>

VM TCP/IP FTP Level 710

Connecting to <VM_REPOSITORY_IP>, port 21
220 (vsFTPd 3.0.2)
USER (identify yourself to the host):
>>>USER <username>
331 Please specify the password.
Password:
>>>PASS ********
230 Login successful.
Command:
cd <repositoryofimages>
ascii
get <parmfile_bootstrap>.parm
get <parmfile_master>.parm
get <parmfile_worker>.parm
locsite fix 80
binary
get <kernel_image>.img
get <initramfs_file>

Example of the VM definition (userid=LNXDB030) for the bootstrap VM on IBM z/VM for this installation:

USER LNXDB030 LBYONLY 16G 32G                     
   INCLUDE DFLT                                                     
   COMMAND DEFINE STORAGE 16G STANDBY 16G       
   COMMAND DEFINE VFB-512 AS 0101 BLK 524288     
   COMMAND DEFINE VFB-512 AS 0102 BLK 524288        
   COMMAND DEFINE VFB-512 AS 0103 BLK 524288        
   COMMAND DEFINE NIC 1E00 TYPE QDIO             
   COMMAND COUPLE 1E00 SYSTEM VSWITCHG              
   CPU 00 BASE                                  
   CPU 01                                       
   CPU 02                                       
   CPU 03                                                           
   MACHINE ESA 8                                
   OPTION APPLMON CHPIDV ONE            
   POSIXINFO UID 100533                                                 
   MDISK 0191 3390 436 50 USAW01                                        
   MDISK 0201 3390 1 END LXDBC0

Where:
* USER LNXDB030 LBYONLY 16G 32G is the userid, password, and memory definition
* COMMAND DEFINE VFB-512 AS 0101 BLK 524288 is the swap definition
* COMMAND DEFINE NIC 1E00 TYPE QDIO is the NIC definition
* COMMAND COUPLE 1E00 SYSTEM VSWITCHG is the vswitch couple
* MDISK 0191 3390 436 50 USAW01 is where you put the EXEC to run
* MDISK 0201 3390 1 END LXDBC0 is the mdisk (model 54) for RHCOS

Punch the files to the virtual reader of the z/VM guest virtual machine that is to become your bootstrap node.

Log in to CMS on the bootstrap machine.

IPL CMS

Create the EXEC file that punches the files (kernel, parm file, initramfs) to start the Linux installation on each Linux server that is part of the OpenShift cluster, using minidisk 191. This example shows the bootstrap EXEC file:

/* EXAMPLE EXEC FOR OC LINUX INSTALLATION  */ 

TRACE O
'CP SP CON START CL A *'
'EXEC VMLINK MNT3 191 <1191 Z>'
'CL RDR'
'CP PUR RDR ALL' 
'CP SP PU * RDR CLOSE'
'PUN KERNEL IMG Z (NOH'
'PUN BOOTSTRAP PARM Z (NOH'
'PUN INITRAMFS IMG Z (NOH'
'CH RDR ALL KEEP NOHOLD'                        
'CP IPL 00C'

The line EXEC VMLINK MNT3 191 <1191 Z> shows that the disk from the repository VM will be linked during this VM's EXEC processing, making the files we already transferred to the repository VM's local disk available to the VM where this EXEC file is run, for example the bootstrap VM.

Call the EXEC file to start the bootstrap installation process

<BOOTSTRAP> EXEC

Once the installation of Red Hat CoreOS finishes, make sure to re-IPL this VM so it will load the Linux OS from its internal DASD:

#CP IPL 201

Then you will see RHCOS loading from its internal model 54 DASD disk:

Red Hat Enterprise Linux CoreOS 42s390x.81.20200131.0 (Ootpa) 4.2
SSH host key: <SHA256key>
SSH host key: <SHA256key>
SSH host key: <SHA256key>
eth0.1100: <ipaddress> fe80::3ff:fe00:9a
bootstrap login:

Repeat this procedure for the other machines in the cluster, which means applying the same steps for creating the Red Hat Enterprise Linux CoreOS with the respective changes to master0, master1, master2, compute0 and compute1.

Make sure to include IPL 201 in the VM's definition so that whenever the VM is restarted it will automatically IPL the 201 disk (RHCOS), for example:

USER LNXDB030 LBYONLY 16G 32G                     
   INCLUDE DFLT                                                     
   COMMAND DEFINE STORAGE 16G STANDBY 16G       
   COMMAND DEFINE VFB-512 AS 0101 BLK 524288     
   COMMAND DEFINE VFB-512 AS 0102 BLK 524288        
   COMMAND DEFINE VFB-512 AS 0103 BLK 524288        
   COMMAND DEFINE NIC 1E00 TYPE QDIO             
   COMMAND COUPLE 1E00 SYSTEM VSWITCHG              
   CPU 00 BASE                                  
   CPU 01                                       
   CPU 02                                       
   CPU 03
   IPL 201                                                          
   MACHINE ESA 8                                
   OPTION APPLMON CHPIDV ONE            
   POSIXINFO UID 100533                                                 
   MDISK 0191 3390 436 50 USAW01                                        
   MDISK 0201 3390 1 END LXDBC0

Creating the cluster

To create the OpenShift Container Platform cluster, you wait for the bootstrap process to complete on the machines that you provisioned by using the Ignition config files that you generated with the installation program.
Monitor the bootstrap process:

$ ./openshift-install --dir=<installation_directory> wait-for bootstrap-complete --log-level=debug

After the bootstrap process is complete, remove the bootstrap machine from the load balancer.
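With the HAProxy configuration shown earlier, this is just a matter of commenting out (or deleting) the bootstrap entries and reloading the service; a sketch assuming that configuration:

# In /etc/haproxy/haproxy.cfg, comment out the bootstrap lines:
#   server bootstrap <bootstrap_IP>:6443 check
#   server bootstrap <bootstrap_IP>:22623 check
# Then reload HAProxy:
$ sudo systemctl reload haproxy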

Logging in to the cluster

You can log in to your cluster as a default system user by exporting the cluster kubeconfig file. The kubeconfig file contains information about the cluster that is used by the CLI to connect a client to the correct cluster and API server. The file is specific to a cluster and is created during
OpenShift Container Platform installation.

Export the kubeadmin credentials:

$ export KUBECONFIG=<installation_directory>/auth/kubeconfig

Verify you can run oc commands successfully using the exported configuration:

$ oc whoami
system:admin

Review the pending certificate signing requests (CSRs) and ensure that you see a client and server request with Pending or Approved status for each machine that you added to the cluster:

$ oc get csr

NAME AGE REQUESTOR CONDITION
csr-2qwv8 106m system:node:worker1. Approved,Issued
csr-2sjrr 61m system:node:worker1. Approved,Issued
csr-5s2rd 30m system:node:worker1. Approved,Issued
csr-9v5wz 15m system:node:worker1. Approved,Issued
csr-cffn6 127m system:servi…:node-bootstrapper Approved,Issued
csr-lmlsj 46m system:node:worker1. Approved,Issued
csr-qhwd8 76m system:node:worker1. Approved,Issued
csr-zz2z7 91m system:node:worker1. Approved,Issued
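If any requests remain in Pending state, you can approve them manually; a sketch using standard oc commands (the CSR name is a placeholder):

$ oc adm certificate approve <csr_name>
$ oc get csr -o name | xargs oc adm certificate approve   # approve everything still pending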

Check if all the nodes are Ready and healthy:

$ oc get nodes

NAME STATUS ROLES AGE VERSION
master0. Ready master 3d3h v1.14.6+c383847f6
master1. Ready master 3d3h v1.14.6+c383847f6
master2. Ready master 3d3h v1.14.6+c383847f6
worker0. Ready worker 3d3h v1.14.6+c383847f6
worker1. Ready worker 3d3h v1.14.6+c383847f6

Initial Operator configuration

After the control plane initializes, you must immediately configure some Operators so that they all become available.
Watch the cluster components come online (wait until all of them show True in the AVAILABLE column):

$ watch -n5 oc get clusteroperators

NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.2.0 True False False 69s
cloud-credential 4.2.0 True False False 12m
cluster-autoscaler 4.2.0 True False False 11m
console 4.2.0 True False False 46s
dns 4.2.0 True False False 11m
image-registry 4.2.0 False True False 5m26s
ingress 4.2.0 True False False 5m36s
kube-apiserver 4.2.0 True False False 8m53s
kube-controller-manag 4.2.0 True False False 7m24s
kube-scheduler 4.2.0 True False False 12m
machine-api 4.2.0 True False False 12m
machine-config 4.2.0 True False False 7m36s
marketplace 4.2.0 True False False 7m54m
monitoring 4.2.0 True False False 7h54s
network 4.2.0 True False False 5m9s
node-tuning 4.2.0 True False False 11m
openshift-apiserver 4.2.0 True False False 11m
openshift-controller- 4.2.0 True False False 5m43s
openshift-samples 4.2.0 True False False 3m55s
operator-lifecycle-man 4.2.0 True False False 11m
operator-lifecycle-ma 4.2.0 True False False 11m
service-ca 4.2.0 True False False 11m
service-catalog-apiser 4.2.0 True False False 5m26s
service-catalog-contro 4.2.0 True False False 5m25s
storage 4.2.0 True False False 5m30s

You will notice that the image-registry operator shows False. To fix this, patch its configuration as follows:

$ oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'

Reference: https://docs.openshift.com/container-platform/4.2/installing/installing_ibm_z/installing-ibm-z.html#installation-registry-storage-config_installing-ibm-z

Once the config is patched, the image-registry operator automatically reconciles to that state.
This is how the output of the command $ oc get co (an abbreviation of clusteroperators) should look:

$ oc get clusteroperators

NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.2.0 True False False 69s
cloud-credential 4.2.0 True False False 12m
cluster-autoscaler 4.2.0 True False False 11m
console 4.2.0 True False False 46s
dns 4.2.0 True False False 11m
image-registry 4.2.0 True False False 1ms
ingress 4.2.0 True False False 5m36s
kube-apiserver 4.2.0 True False False 8m53s
kube-controller-manag 4.2.0 True False False 7m24s
kube-scheduler 4.2.0 True False False 12m
machine-api 4.2.0 True False False 12m
machine-config 4.2.0 True False False 7m36s
marketplace 4.2.0 True False False 7m54m
monitoring 4.2.0 True False False 7h54s
network 4.2.0 True False False 5m9s
node-tuning 4.2.0 True False False 11m
openshift-apiserver 4.2.0 True False False 11m
openshift-controller- 4.2.0 True False False 5m43s
openshift-samples 4.2.0 True False False 3m55s
operator-lifecycle-man 4.2.0 True False False 11m
operator-lifecycle-ma 4.2.0 True False False 11m
service-ca 4.2.0 True False False 11m
service-catalog-apiser 4.2.0 True False False 5m26s
service-catalog-contro 4.2.0 True False False 5m25s
storage 4.2.0 True False False 5m30s

Monitor for cluster completion:

$ ./openshift-install --dir=<installation_directory> wait-for install-complete
INFO Waiting up to 30m0s for the cluster to initialize…

The command succeeds when the Cluster Version Operator finishes deploying the OpenShift Container Platform cluster from Kubernetes API server.

INFO Waiting up to 30m0s for the cluster at https://api.<cluster_name>.<base_domain>:6443 to initialize...
INFO Waiting up to 10m0s for the openshift-console route to be created...
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/root/<installation_directory>/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.<cluster_name>.<base_domain>
INFO Login to the console with user: kubeadmin, password: 3cXGD-Mb9CC-hgAN8-7S9YG

Log in using a web browser: https://console-openshift-console.apps.<cluster_name>.<base_domain>

This article only covers the installation process. For day-2 operations, keep in mind that no storage was configured for persistent storage workloads; I will cover that process in my next article. For now, Red Hat OpenShift 4 is ready to be explored, and the following video helps you familiarize yourself with the graphical user interface from the developer perspective:

Youtube Developer video: https://www.youtube.com/watch?v=opdrYhIjqrg&feature=youtu.be

References:

Official Red Hat OpenShift Documentation:
https://docs.openshift.com/container-platform/4.2/installing/installing_ibm_z/installing-ibm-z.html

Key people who collaborated on this article:
Alexandre de Oliveira, Edi Lopes Alves, Alex Souza, Adam Young, Apostolos Dedes (Toly) and Russ Popeil

Filipe Miranda is a Senior Solutions Architect at Red Hat. The views expressed in this article are his alone, and he is responsible for the information provided in the article.

The post Red Hat OpenShift Installation Process Experiences on IBM Z/LinuxONE appeared first on Red Hat OpenShift Blog.

OpenShift Commons Briefing: Workload Consistency During Ceph Updates and Adding New Storage Devices with Red Hat’s Sagy Volkov


This is the second briefing of the “All Things Data” series of OpenShift Commons briefings. Future briefings are Tuesdays at 8:00am PST, so reach out with any topics you’re interested in and remember to bookmark the OpenShift Commons Briefing calendar!

In this second briefing for the “All Things Data” OpenShift Commons series, Red Hat’s Sagy Volkov gave a live demonstration of an OpenShift workload remaining online and running while Ceph storage updates and additions were being performed. This workload resilience and consistency during storage updates and additions is crucial to maintaining highly available applications in your OpenShift clusters.

Additional Resources:

OpenShift Container Storage: openshift.com/storage

Product Documentation for Red Hat OpenShift Container Storage 4.2

Feedback:

To find out more about OpenShift Container Storage or to take a test drive, visit https://www.openshift.com/products/container-storage/.

If you would like to learn more about what the OpenShift Container Storage team is up to or provide feedback on any of the new 4.2 features, take this brief 3-minute survey.

 

The post OpenShift Commons Briefing: Workload Consistency During Ceph Updates and Adding New Storage Devices with Red Hat’s Sagy Volkov appeared first on Red Hat OpenShift Blog.


Red Hat OpenShift 4 and Red Hat Virtualization: Together at Last

OpenShift 4 was launched not quite a year ago at Red Hat Summit 2019. One of the more significant announcements was the ability for the installer to deploy an OpenShift cluster using full-stack automation. This means that the administrator only needs to provide credentials to a supported Infrastructure-as-a-Service, such as AWS, and the installer provisions all of the resources needed (virtual machines, storage, networks) and integrates them all together as well.

Over time, the full-stack automation experience has expanded to include Azure, Google Cloud Platform, and Red Hat OpenStack, allowing customers to deploy OpenShift clusters across different clouds and even on premises with the same fully automated experience.

For organizations that need enterprise virtualization, but not the API-enabled, quota-enforced consumption of infrastructure provided by Red Hat OpenStack, Red Hat Virtualization (RHV) provides a robust and trusted platform to consolidate workloads and provide the resiliency, availability, and manageability of a traditional hypervisor.

Until now, using RHV meant relying on OpenShift's "bare metal" installation experience, with no testing or integration between OpenShift and the underlying infrastructure. But the wait is over! OpenShift 4.4 nightly releases now offer the full-stack automation experience for RHV!


Getting started with OpenShift on RHV

As you would expect from the full-stack automation installation experience, getting started is straightforward, with just a few prerequisites below. You can also use the quick start guide for more thorough and detailed instructions.

  1. You need a RHV deployment with RHV Manager. It doesn't matter if you're using a self-hosted Manager or standalone; just be sure you're using RHV version 4.3.7.2 or later.
  2. Until OpenShift 4.4 is generally available, you will need to download and use the nightly release of the OpenShift installer, available from https://cloud.redhat.com.
  3. Network requirements:
    1. DHCP is required for full-stack automated installs to assign IPs to nodes as they are created.
    2. Identify three (3) IP addresses you can statically allocate to the cluster and create two (2) DNS entries, as below.  These are used for communicating with the cluster as well as internal DNS and API access.
      1. An IP address for the internal-only OpenShift API endpoint
      2. An IP address for the internal OpenShift DNS, with an external DNS record of api.clustername.basedomain for this address
      3. An IP address for the ingress load balancer, with an external DNS record of *.apps.clustername.basedomain for this address.
  4. Create an ovirt-config.yaml file for the credentials you want to use, this file has just four lines:
    ovirt_url: https://rhv-m.host.name/ovirt-engine/api
    ovirt_username: user@domain.tld
    ovirt_password: password
    ovirt_insecure: True
    

    For now, the last value, “ovirt_insecure”, should be “True”.  As documented in this BZ, even if the RHV-M certificate is trusted by the client where openshift-install is executing from, that doesn’t mean that the pods deployed to OpenShift trust the certificate.  We are working on a solution to this, so please keep an eye on the BZ for when it’s been addressed!  Remember, this is tech preview :D

With the prerequisites out of the way, let’s move on to deploying OpenShift to Red Hat Virtualization!

Magic (but really automation)!

Starting the install process, as with all OpenShift 4 deployments, uses the openshift-install binary.  Once we answer the questions, the process is wholly automated and we don’t have to do anything but wait for it to complete!

# log level debug isn’t necessary, but gives detailed insight to what’s
# happening
# the “dir” parameter tells the installer to use the provided directory
# to store any artifacts related to the installation
[notroot@jumphost ~] openshift-install create cluster --log-level=debug --dir=orv
? SSH Public Key /home/notroot/.ssh/id_rsa.pub
? Platform ovirt
? Select the oVirt cluster Cluster2
? Select the oVirt storage domain nvme
? Select the oVirt network VLAN101
? Enter the internal API Virtual IP 10.0.101.219
? Enter the internal DNS Virtual IP 10.0.101.220
? Enter the ingress IP  10.0.101.221
? Base Domain lab.lan
? Cluster Name orv
? Pull Secret [? for help] **********************

snip snip snip

INFO Waiting up to 30m0s for the cluster at https://api.orv.lab.lan:6443 to initialize...
INFO Waiting up to 10m0s for the openshift-console route to be created...
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/notroot/orv/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.orv.lab.lan
INFO Login to the console with user: kubeadmin, password: passw-wordp-asswo-rdpas

The result, after a few minutes of waiting, is a fully functioning OpenShift cluster, ready for the final configuration to be applied, like deploying logging and monitoring, and configuring a persistent storage provider.

From a RHV perspective, the installer has created a template virtual machine, which was used to deploy all of the member nodes, regardless of role, for the OpenShift cluster. As you saw at the end of the video, not only does the installer use this template, but the Machine API integration also makes use of it when creating new VMs as the nodes are scaled. Scaling nodes manually is as easy as a single command (oc scale --replicas=# machineset), as shown below.
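For example, listing the machinesets and scaling one of them could look like the sketch below; the machineset name is a placeholder, since yours is derived from the cluster name:

$ oc get machinesets -n openshift-machine-api
$ oc scale --replicas=3 machineset <machineset_name> -n openshift-machine-api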


Deploying OpenShift

To get started testing and trying OpenShift full-stack automated deployments to your RHV clusters, the installer can be downloaded from the Red Hat OpenShift Cluster Manager. For now, deploying the full-stack automation experience on RHV is in developer preview, so please send us any feedback and questions you have via Bugzilla. The quickest way to reach us is using "OpenShift Container Platform" as the product, with "Installer" as the component and "OpenShift on RHV" for the sub-component.

The post Red Hat OpenShift 4 and Red Hat Virtualization: Together at Last appeared first on Red Hat OpenShift Blog.

Accessing CodeReady Containers on a Remote Server

While installing an OpenShift cluster on a cloud isn’t difficult, the old school developer in me wants as much of my environment as possible to be in my house and fully under my control. I have some spare hardware in my basement that I wanted to use as an OpenShift 4 installation, but not enough to warrant a full blown cluster.

CodeReady Containers, or CRC for short, is perfect for that. Rather than try to rephrase what it is, I’ll just copy it directly from their site:

“CodeReady Containers brings a minimal, preconfigured OpenShift 4.1 or newer cluster to your local laptop or desktop computer for development and testing purposes. CodeReady Containers is delivered as a Red Hat Enterprise Linux virtual machine that supports native hypervisors for Linux, macOS, and Windows 10.”

The hiccup is that while my server is in my basement, I don’t want to have to physically sit at the machine to use it. Since CRC is deployed as a virtual machine, I needed a way to get to that VM from any other machine on my home network. This blog talks about how to configure HAProxy on the host machine to allow access to CRC from elsewhere on the network.

I ran the following steps on a CentOS 8 installation, but they should work on any of the supported Linux distributions. You'll also need some form of DNS resolution between your client machines and the DNS entries that CRC expects. In my case, I use a Pi-hole installation running on a Raspberry Pi (which effectively uses dnsmasq as described later in this post).

It’ll become obvious very quickly when you read this, but you’ll need sudo access on the CRC host machine.

Running CRC

The latest version of CRC can be downloaded from Red Hat’s site. You’ll need to download two things:

  • The crc binary itself, which is responsible for the management of the CRC virtual machine
  • Your pull secret, which is used during creation; save this in a file somewhere on the host machine

This blog isn’t going to go into the details of setting up CRC. Detailed information can be found in the Getting Started Guide in the CRC documentation.

That said, if you’re looking for a TL;DR version of that guide, it boils down to:

crc setup
crc start -p <path-to-pull-secret-file>

Make sure CRC is running on the destination machine before continuing, since we’ll need the IP address that the VM is running on.

Configuring the Host Machine

We’ll use firewalld and HAProxy to route the host’s inbound traffic to the CRC instance. Before we can configure that, we’ll need to install a few dependencies:

sudo dnf -y install haproxy policycoreutils-python-utils

Configuring the Firewall

The CRC host machine needs to allow inbound connections on a variety of ports used by OpenShift. The following commands configure the firewall to open up those ports:

sudo systemctl start firewalld
sudo firewall-cmd --add-port=80/tcp --permanent
sudo firewall-cmd --add-port=6443/tcp --permanent
sudo firewall-cmd --add-port=443/tcp --permanent
sudo systemctl restart firewalld
sudo semanage port -a -t http_port_t -p tcp 6443

Configuring HAProxy

Once the firewall is configured to allow traffic into the server, HAProxy is used to forward it to the CRC instance. Before we can configure that, we’ll need to know the IP of the server itself, as well as the IP of the CRC virtual machine:

export SERVER_IP=$(hostname --ip-address)
export CRC_IP=$(crc ip)

Note: If your server is running DHCP, you’ll want to take steps to ensure its IP doesn’t change, either by changing it to run on a static IP or by configuring DHCP reservations. Instructions for how to do that are outside the scope of this blog, but chances are if you’re awesome enough to want to set up a remote CRC instance, you know how to do this already.

We’re going to replace the default haproxy.cfg file, so to be safe, create a backup copy:

cd /etc/haproxy
sudo cp haproxy.cfg haproxy.cfg.orig

Replace the contents of the haproxy.cfg file with the following:

global
debug

defaults
log global
mode http
timeout connect 0
timeout client 0
timeout server 0

frontend apps
bind SERVER_IP:80
bind SERVER_IP:443
option tcplog
mode tcp
default_backend apps

backend apps
mode tcp
balance roundrobin
option ssl-hello-chk
server webserver1 CRC_IP check

frontend api
bind SERVER_IP:6443
option tcplog
mode tcp
default_backend api

backend api
mode tcp
balance roundrobin
option ssl-hello-chk
server webserver1 CRC_IP:6443 check

Note: Generally speaking, setting the timeouts to 0 is a bad idea. In this context, we set those to keep websockets from timing out. Since you are (or rather, “should”) be running CRC in a development environment, this shouldn’t be quite as big of a problem.

You can either manually change the instances of SERVER_IP and CRC_IP as appropriate, or run the following commands to automatically perform the replacements:

sudo sed -i "s/SERVER_IP/$SERVER_IP/g" haproxy.cfg
sudo sed -i "s/CRC_IP/$CRC_IP/g" haproxy.cfg

Once that’s finished, start HAProxy:

sudo systemctl start haproxy

Configuring DNS for Clients

As I said earlier, your client machines will need to be able to resolve the DNS entries used by CRC. This will vary depending on how you handle DNS. One possible option is to use dnsmasq on your client machine.

Before doing that, you’ll need to update NetworkManager to use dnsmasq. This is done by creating a new NetworkManager config file:

cat << EOF > /tmp/00-use-dnsmasq.conf
[main]
dns=dnsmasq
EOF

sudo mv /tmp/00-use-dnsmasq.conf /etc/NetworkManager/conf.d/00-use-dnsmasq.conf

You’ll also need to add DNS entries for the CRC server:

cat << EOF > /tmp/01-crc.conf
address=/apps-crc.testing/SERVER_IP
address=/api.crc.testing/SERVER_IP
EOF

sudo mv /tmp/01-crc.conf /etc/NetworkManager/dnsmasq.d/01-crc.conf

Again, you can either manually enter the IP of the host machine or use the following commands to replace it:

sudo sed -i "s/SERVER_IP/$SERVER_IP/g" /etc/NetworkManager/dnsmasq.d/01-crc.conf

Once the changes have been made, restart NetworkManager:

sudo systemctl reload NetworkManager

Accessing CRC

The crc binary provides subcommands for discovering the authentication information to access the CRC instance:

crc console --url
https://console-openshift-console.apps-crc.testing

crc console --credentials
To login as a regular user, run 'oc login -u developer -p developer https://api.crc.testing:6443'.
To login as an admin, run 'oc login -u kubeadmin -p mhk2X-Y8ozE-9icYb-uLCdV https://api.crc.testing:6443'

The URL from the first command will access the web console from any machine with the appropriate DNS resolution configured. The login credentials can be determined from the output of the second command.

To give credit where it is due, much of this information came from this gist by Trevor McKay.

Happy Coding :)

The post Accessing CodeReady Containers on a Remote Server appeared first on Red Hat OpenShift Blog.

Fully Automated Management of Egress IPs with the egressip-ipam-operator

Introduction

Egress IPs is an OpenShift feature that allows for the assignment of an IP to a namespace (the egress IP) so that all outbound traffic from that namespace appears as if it is originating from that IP address (technically it is NATed with the specified IP).

This feature is useful within many enterprise environments as it allows for the establishment of firewall rules between namespaces and other services outside of the OpenShift cluster. The egress IP becomes the network identity of the namespace and all the applications running in it. Without egress IP, traffic from different namespaces would be indistinguishable because by default outbound traffic is NATed with the IP of the nodes, which are normally shared among projects. 


To clarify the concept, consider two namespaces (A and B), each running two pods (A1, A2, B1, B2). A is a namespace whose applications can connect to a database in the company's network; B is not authorized to do so. The A namespace is configured with an egress IP, so all of its pods' outbound connections egress with that IP, and a firewall is configured to allow connections from that IP to an enterprise database. The B namespace is not configured with an egress IP, so its pods egress using the nodes' IPs, which the firewall does not allow to connect to the database.

However, enabling this feature requires some manual configuration steps. Also, when running on cloud providers, additional configuration is needed.

Reasoning about this with a customer, we realized that there was an opportunity to automate the entire process with an operator.

 

The egressip-ipam-operator 

The purpose of the egressip-ipam-operator is to manage the assignment of egressIPs (IPAM) to namespaces and to ensure that the necessary configuration in OpenShift and the underlying infrastructure is consistent.

IPs can be assigned to namespaces via an annotation or the egressip-ipam-operator can select one from a preconfigured CIDR range.

For a bare metal deployment, the configuration would be similar to the example below:

 

apiVersion: redhatcop.redhat.io/v1alpha1
kind: EgressIPAM
metadata:
 name: egressipam-baremetal
spec:
 cidrAssignments:
   - labelValue: "true"
     CIDR: 192.169.0.0/24
 topologyLabel: egressGateway
 nodeSelector:
   matchLabels:
     node-role.kubernetes.io/worker: ""

This configuration states that nodes selected by the nodeSelector should be divided into groups based on the topology label, and each group will receive egress IPs from the specified CIDR.

In this example, we have only one group, which in most cases is enough for a bare metal configuration. Multiple groups are needed when nodes are spread across multiple subnets, where different CIDRs are required to make the addresses routable. This is exactly what happens with multi-AZ deployments in cloud providers (see more about this below).

Users can opt in to having their namespaces receive egress IPs by adding the following annotation to the namespace: 

egressip-ipam-operator.redhat-cop.io/egressipam=<egressIPAM>

So, in the case of the example from above the annotation would take the form: 

egressip-ipam-operator.redhat-cop.io/egressipam=egressipam-baremetal.
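For instance, opting an existing namespace in could be done with a single command like the sketch below (the namespace name is a placeholder):

oc annotate namespace <namespace> egressip-ipam-operator.redhat-cop.io/egressipam=egressipam-baremetal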

When this occurs, the namespace is assigned an egress IP per cidrAssignment.

In the case of bare metal, a node is selected by OpenShift to carry that egress IP.

It is also possible for the user to specify which egress IPs a namespace should have. In this case, a second annotation is needed with the following format: 

egressip-ipam-operator.redhat-cop.io/egressips=IP1,IP2...

The annotation value is a comma-separated array of IPs. There must be exactly one IP per cidrAssignment.
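Putting the two annotations together, a namespace that requests a specific egress IP from the bare metal example above might look like the following sketch; the namespace name and the IP (taken from the 192.169.0.0/24 range in that example) are illustrative:

apiVersion: v1
kind: Namespace
metadata:
  name: <namespace>
  annotations:
    egressip-ipam-operator.redhat-cop.io/egressipam: egressipam-baremetal
    egressip-ipam-operator.redhat-cop.io/egressips: 192.169.0.10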

AWS Support

The egress-ipam-operator can also work with Amazon Web Services (AWS). In this case, the operator has additional tasks to perform, because it needs to configure the EC2 VM instances to carry the additional IPs. This is because, as with most cloud providers, AWS needs to control the IPs that are assigned to VMs.

For the AWS use case, the EgressIPAM configuration appears as follows:

apiVersion: redhatcop.redhat.io/v1alpha1
kind: EgressIPAM
metadata:
 name: egressipam-aws
spec:
 cidrAssignments:
   - labelValue: "eu-central-1a"
     CIDR: 10.0.128.0/20
   - labelValue: "eu-central-1b"
     CIDR: 10.0.144.0/20
   - labelValue: "eu-central-1c"
     CIDR: 10.0.160.0/20
 topologyLabel: topology.kubernetes.io/zone
 nodeSelector:
   matchLabels:
     node-role.kubernetes.io/worker: ""

Here, we can see multiple cidrAssignments, one per availability zone in which the cluster is installed. Also, notice that the topologyLabel must be specified as topology.kubernetes.io/zone to identify the availability zone. The CIDRs must be the same as the CIDRs used for the node subnets.

When a namespace with the opt-in annotation is created, the following actions occur:

  1. One IP per cidrAssignment is assigned to the namespace.
  2. One VM per zone is selected to carry the corresponding IP.
  3. The OpenShift nodes corresponding to the AWS VMs are configured to carry that IP.

Installation

For detailed instructions on how to install the egress-ipam-operator, see the GitHub repository.

Conclusion

Every time there is an automation opportunity around OpenShift, we should consider capturing the automation as an operator and, possibly, also consider open sourcing the resulting operator. In this case, we automated the operations around egress IPs.

Keep in mind that this operator is not officially supported by Red Hat and it is currently managed by the container Community of Practice (CoP) at Red Hat, which will provide best effort support. Feedback and contributions (for example, supporting additional cloud providers) are welcome.

 

The post Fully Automated Management of Egress IPs with the egressip-ipam-operator appeared first on Red Hat OpenShift Blog.

OpenShift Commons Briefing: Bringing OpenShift to IBM Cloud with Chris Rosen (IBM)


 

In this briefing, IBM Cloud's Chris Rosen discusses the logistics of bringing OpenShift to IBM Cloud and walks us through how to make the most of this new offering from IBM Cloud.

Red Hat OpenShift is now available on IBM Cloud as a fully managed OpenShift service that leverages the enterprise scale and security of IBM Cloud, so you can focus on developing and managing your applications. It’s directly integrated into the same Kubernetes service that maintains 25 billion on-demand forecasts daily at The Weather Company.

Chris Rosen walks us through how to:

  • Enjoy dashboards with a native OpenShift experience, and push-button integrations with high-value IBM and Red Hat middleware and advanced services.
  • Rely on continuous availability with multizone clusters across six regions globally.
  • Move workloads and data more securely with Bring Your Own Key; Level 4 FIPS; and built-in industry compliance including PCI, HIPAA, GDPR, SOC1 and SOC2.
  • Start fast and small using one-click provisioning and metered billing, with no long-term commitment.

Slides here: Red Hat OpenShift on IBM Cloud – Webinar – 2020-03-18

Additional Resources:


To stay abreast of all the latest releases and events, please join the OpenShift Commons and join our mailing lists & slack channel.


What is OpenShift Commons?

Commons builds connections and collaboration across OpenShift communities, projects, and stakeholders. In doing so we’ll enable the success of customers, users, partners, and contributors as we deepen our knowledge and experiences together.

Our goals go beyond code contributions. Commons is a place for companies using OpenShift to accelerate its success and adoption. To do this we’ll act as resources for each other, share best practices and provide a forum for peer-to-peer communication.

Join OpenShift Commons today!

The post OpenShift Commons Briefing: Bringing OpenShift to IBM Cloud with Chris Rosen (IBM) appeared first on Red Hat OpenShift Blog.

Guide to Installing an OKD 4.4 Cluster on your Home Lab


Take OKD 4, the Community Distribution of Kubernetes that powers Red Hat OpenShift, for a test drive on your Home Lab. 

Craig Robinson at East Carolina University has created an excellent blog explaining how to install OKD 4.4 in your home lab!

What is OKD?

OKD is the upstream community-supported version of the Red Hat OpenShift Container Platform (OCP).  OpenShift expands vanilla Kubernetes into an application platform designed for enterprise use at scale.  Starting with the release of OpenShift 4, the default operating system is Red Hat CoreOS, which provides an immutable infrastructure and automated updates. OKD’s default operating system is  Fedora CoreOS which, like OKD, is the upstream version of Red Hat CoreOS. 

Instructions for Deploying OKD 4 Beta on your Home Lab

For those of you who have a home lab, the step-by-step guide linked here helps you successfully build an OKD 4.4 cluster at home, using VMware as the example hypervisor, though you can use Hyper-V, libvirt, VirtualBox, bare metal, or other platforms just as easily.

Experience is an excellent way to learn new technologies. Used hardware for a home lab that could run an OKD cluster is relatively inexpensive these days ($250–$350), especially when compared to a cloud-hosted solution costing over $250 per month.

The purpose of this step-by-step guide is to help you successfully build an OKD 4.4 cluster at home that you can take for a test drive.  VMWare is the example hypervisor used in this guide, but you could use Hyper-V, libvirt, VirtualBox, bare metal, or other platforms. 

This guide assumes you have a virtualization platform, basic knowledge of Linux, and the ability to Google.


Check out the step-by-step guide here on Medium.com


Once you've gained some experience with OpenShift by using the open source upstream combination of OKD and FCOS (Fedora CoreOS) to build your own cluster on your home lab, be sure to share your feedback and any issues with the OKD-WG on this beta release of OKD in the OKD GitHub repo here: https://github.com/openshift/okd

 

Additional Resources:

This should get you up and going. Good luck on your journey with OpenShift! 

The post Guide to Installing an OKD 4.4 Cluster on your Home Lab appeared first on Red Hat OpenShift Blog.
