IBMPartnerDemo


Install instructions for Cloud Pak for Data (work in progress)

Industry Accelerators

You may want to start out by looking at the industry accelerators to understand what you will want to install.

Data sets

Here are a few good places to find data sets

Installing CLI

The first step is to set up the CPD CLI. There have been some issues with the new Mac processors and Podman/Docker, so I have switched to the more common practice of using a Jump Server/Bastion Node or a Linux VM. I place oc and kubectl into /usr/local/bin/, and the cpd-cli and env.sh file in a /root/cpd directory. To get the binaries, I use wget. Here are a couple of commands downloading the cpd-cli for CPD 4.6.5 and the oc client; adjust the version numbers in the URLs to match your cluster.

  wget https://github.com/IBM/cpd-cli/releases/download/v12.0.5/cpd-cli-linux-EE-12.0.5.tgz
  wget https://mirror.openshift.com/pub/openshift-v4/clients/oc/4.6/linux/oc.tar.gz
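Once downloaded, the archives need to be unpacked into the locations described above. Here is a minimal sketch; the PREFIX variable is my own addition so you can rehearse the layout in a scratch directory first (leave it empty for the real paths), and the tar lines are commented out because they assume the archives are sitting in the current directory:

```shell
# Sketch: unpack the CLI archives into the layout described above.
# PREFIX is a rehearsal root (an assumption, not part of the real procedure);
# set it to the empty string to write to the real /usr/local/bin and /root/cpd.
PREFIX="${PREFIX:-./scratch}"
mkdir -p "${PREFIX}/usr/local/bin" "${PREFIX}/root/cpd"

# Uncomment once the archives from the wget commands above are present:
# tar -xzf oc.tar.gz -C "${PREFIX}/usr/local/bin"
# tar -xzf cpd-cli-linux-EE-12.0.5.tgz -C "${PREFIX}/root/cpd"
```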

Installation Instructions are here

Collect required information

While the documentation asks you to collect 2 separate items, they all get placed into a single file, which you will source prior to running commands. Go ahead and collect the API key.

  1. In a web browser, log in and access the entitlement registry. Click Get entitlement key on the left side above Library. Either generate a key or copy the existing key; this places it on your clipboard, from which you can paste it into the env.sh file. If there is no API key, follow the instructions on that page to generate one.
  2. If you are using IBM ROKS, then you will have an env.sh like this one. Note that the OCP_TOKEN and OCP_URL values come from the login command in the OCP console. I tend to use PROJECT_CPD_INSTANCE=cpd, but feel free to change this as you desire.
  3. There is also a custom storage class, which uses faster network IO. You will have to apply it prior to install. The YAML to create it is here. You will use ibmc-file-custom-gold-gid as the STG_CLASS_FILE value. Download the file and run oc apply -f storageclass-ibmc-file-custom-gold-gid.yaml
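Item 2 above references an env.sh file. Here is a minimal sketch with placeholder values; the variable names match the commands used later in this guide, but the token, server, entitlement key, and component list are hypothetical stand-ins that you must replace with your own:

```shell
# env.sh -- sketch with placeholder values; source this before running cpd-cli.
# OCP_TOKEN and OCP_URL come from the copy-login-command link in the OCP console.
export OCP_USERNAME="kubeadmin"
export OCP_TOKEN="sha256~REPLACE_WITH_YOUR_TOKEN"
export OCP_URL="https://REPLACE-WITH-YOUR-API-SERVER:6443"
export IBM_ENTITLEMENT_KEY="REPLACE_WITH_YOUR_ENTITLEMENT_KEY"
export VERSION="4.6.5"
export PROJECT_CPD_INSTANCE="cpd"
# Hypothetical component list -- adjust to the services you plan to install.
export COMPONENTS="cpfs,cpd_platform,wkc"
export STG_CLASS_BLOCK="ibmc-block-gold"
export STG_CLASS_FILE="ibmc-file-custom-gold-gid"
```

Source it with `source env.sh` so the variables are available to the cpd-cli commands below.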

Product Documentation for collecting required information

Preparing the bastion node

On the bastion node, you will need to install Podman. Assuming yum is already installed, run yum install podman; this will pull the latest Podman to run the CLI containers.

Pulling the cli container

After you have the bastion node prepped, you will want to start the CLI container. From the directory where you house cpd-cli, source the env.sh with source env.sh (the documentation may use a different file name). Then run ./cpd-cli manage restart-container. I am using the leading ./ because I do not have cpd-cli in my path; you may choose to add it. This command pulls the latest CPD-CLI containers used to execute the cpd-cli commands, and it creates a directory called cpd-cli-workspace alongside your cpd-cli binary; all files will be pulled and run from there.

Logging into OCP using cpd-cli

Once the CLI container is running, you will want to log into OCP using cpd-cli. Remember that the token is listed in the env.sh file; you can cut and paste it onto the command line instead of using the OCP_TOKEN environment variable, but I tend to update env.sh and source it again for ease of repetition. Note that this maintains the login credential inside the running container instance, which is different from running oc login in your shell; this can cause some confusion.

Here is the login command: `./cpd-cli manage login-to-ocp --username=${OCP_USERNAME} --token=${OCP_TOKEN} --server=${OCP_URL}`

Create global secret with entitlement key

This is a tricky part. On ROKS you will need to reload the worker nodes to pick up the global pull secret. This secret contains the API entitlement key and the registry locations. Since I am doing this with just the public registry, you only need to run the following command and then reload the nodes. For expedience, I tend to reduce the number of worker nodes in the pool to 1 before running this command.
./cpd-cli manage add-icr-cred-to-global-pull-secret ${IBM_ENTITLEMENT_KEY}
NOTE: the IBM_ENTITLEMENT_KEY is listed in env.sh. Once this is executed, go back and resize the worker pool to the desired size. As new nodes are deployed, they will pick up the global pull secret and you will not get any image pull errors.
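The shrink-then-grow sequence described above can be sketched as follows. The cluster name, pool name, and final pool size are hypothetical placeholders, and the commands are echoed rather than executed here so the sequence can be reviewed first (the worker-pool resize is an ibmcloud ks command, run from outside the CLI container):

```shell
# Dry-run sketch of the ROKS reload sequence described above.
# CLUSTER, POOL, and the final size are hypothetical -- substitute your own.
CLUSTER="my-roks-cluster"
POOL="default"
IBM_ENTITLEMENT_KEY="REPLACE_WITH_YOUR_ENTITLEMENT_KEY"

# 1. Shrink the worker pool to one node.
RESIZE_DOWN="ibmcloud ks worker-pool resize --cluster ${CLUSTER} --worker-pool ${POOL} --size-per-zone 1"
# 2. Add the entitlement key to the global pull secret.
ADD_SECRET="./cpd-cli manage add-icr-cred-to-global-pull-secret ${IBM_ENTITLEMENT_KEY}"
# 3. Grow the pool back; new nodes pick up the secret as they join.
RESIZE_UP="ibmcloud ks worker-pool resize --cluster ${CLUSTER} --worker-pool ${POOL} --size-per-zone 3"

# Echoed for review; remove the echoes to actually run the sequence.
echo "${RESIZE_DOWN}"
echo "${ADD_SECRET}"
echo "${RESIZE_UP}"
```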

Creating OLM objects

This is the step which creates the underlying links between services. For example, Watson Knowledge Catalog requires Common Core Services (CCS) to be ready before progressing to the next installation step, and it also requires a host of other OperandRequests to be created. During this phase, all of the catalog operators, subscriptions, service operators, and other base dependencies are installed and configured prior to creating the actual instances of the platform and services. This architecture allows you to create multiple instances of the same services in different projects. Here is a link to the documentation for further reading.

You will list the components you wish to install in the env.sh, or you can add them by hand. Then issue the following command, which can take up to 30 minutes to execute. Each version seems to be getting faster.
./cpd-cli manage apply-olm --release=${VERSION} --components=${COMPONENTS}

Creating the platform and services

This is the last step in installing Cloud Pak for Data. In this step, you will be creating an instance of each Custom Resource Definition (CRD); this is the definition of the service. Generally, you will not need to touch any of these definitions. In the next section, you will want to update WKC's instance definition while it is being installed, or WKC could fail during installation. Luckily, the OpenShift operators you configured in the previous step will fix this during reconciliation.

You will list the components you wish to install in the env.sh, or you can add them by hand. Then issue the following command, which can take up to 30 minutes to execute for each service component in the env.sh. Each version seems to be getting faster.
./cpd-cli manage apply-cr --components=${COMPONENTS} --release=${VERSION} --cpd_instance_ns=${PROJECT_CPD_INSTANCE} --block_storage_class=${STG_CLASS_BLOCK} --file_storage_class=${STG_CLASS_FILE} --license_acceptance=true --upgrade=true

Updating for WKC services

If you are installing Watson Knowledge Catalog, then you will have to set up some Security Context Constraints first. Here is the cpd-cli command to set these up.
./cpd-cli manage apply-scc --cpd_instance_ns=${PROJECT_CPD_INSTANCE} --components=wkc

There are also CRI-O and kernel parameters to set up for the Db2 family of services and Watson Knowledge Catalog. Here is the documentation on what to change; these will be configured for you if you set the following entries in the wkc-cr YAML.

  iis_db2u_set_kernel_params: true
  wkc_db2u_set_kernel_params: true

Also with WKC, you might run into the need to extend a timeout for the data rules service. This is done by applying the following values in the same wkc-cr YAML. These should sit at the same level as the values above, which are indented 2 spaces; YAML can be touchy.

  wkc_data_rules_resources:
    limits:
      cpu: 1
      memory: 2048Mi
    requests:
      cpu: 100m
      memory: 800Mi
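Putting the two fragments above together, the relevant section of the wkc-cr YAML would look roughly like the sketch below. The version value and the exact placement of these fields under spec are assumptions on my part; verify them against the WKC custom resource actually deployed in your cluster before applying.

```yaml
# Sketch: spec section of the WKC custom resource (wkc-cr), combining the
# kernel-parameter flags and the data-rules resource settings shown above.
# The version and field placement are assumptions -- check your deployed CR.
spec:
  version: "4.6.5"
  license:
    accept: true
  iis_db2u_set_kernel_params: true
  wkc_db2u_set_kernel_params: true
  wkc_data_rules_resources:
    limits:
      cpu: 1
      memory: 2048Mi
    requests:
      cpu: 100m
      memory: 800Mi
```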

Knowledge Accelerators for Watson Knowledge Catalog

The Knowledge Accelerators can be a great starting point for your governance journey. There are 4 main industries represented, as well as cross-industry content. Here is a link to the documentation for Knowledge Accelerators so that you can investigate their depth and decide whether you want to install them. Make sure you read through all of these sections to better understand the power that these provide.

If you choose to install them, they can be imported from an archive or via API. I feel the API is the most efficient method if you are not in an air-gapped environment. The import documentation is pretty clear on how to do this. Just understand that you will probably need to iterate to install all the components you desire. One thing that I missed was that I needed to run the import for the Scopes, Business Core Vocabulary, and Industry Alignment Vocabulary.