IBM Analytics Engine Library¶
This project is a Python library for working with the IBM Analytics Engine. The GitHub repository for this library can be found at: https://github.com/snowch/ibm-analytics-engine-python
Examples¶
This section shows example code snippets for working with this library.
Prerequisites¶
API Key
The lifecycle of an IBM Analytics Engine cluster is controlled through Cloud Foundry (e.g. create, delete, status operations). This Python library requires an API key to work with the Cloud Foundry APIs. For more information on IBM Cloud API keys, including how to create and download an API key, see https://console.bluemix.net/docs/iam/userid_keys.html#userapikey
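For example, you can create and download an API key with the bluemix CLI (this is the same command used in the example notebook later in this documentation; the key name, description, and output file are placeholders):

bluemix iam api-key-create My_IAE_Key -d "This is my IAE API key" -f my_api_key.json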
Installation
Ensure you have installed this library. Install it with:
pip install --upgrade git+https://github.com/snowch/ibm-analytics-engine-python@master
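To check that the library is importable after installation (a minimal sanity check; it assumes only that the pip install above succeeded):

# If this import raises ImportError, the installation did not succeed
import ibm_analytics_engine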
Logging¶
Log level is controlled with the environment variable LOG_LEVEL.
You may set it programmatically in your code:
import os

os.environ["LOG_LEVEL"] = "DEBUG"
Typical valid values are ERROR, WARNING, INFO, DEBUG. For a full list of values, see: https://docs.python.org/3/library/logging.html#logging-levels
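Alternatively, set the variable in your shell before starting Python. This is a plain environment-variable export, nothing library-specific (your_script.py is a placeholder):

export LOG_LEVEL=DEBUG
python your_script.py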
Finding your space guid¶
Many operations in this library require you to specify a space guid. You can list the space guids for your account using this example:
from ibm_analytics_engine.cf.client import CloudFoundryAPI
cf = CloudFoundryAPI(api_key_filename='your_api_key_filename')
cf.print_orgs_and_spaces()
Alternatively, if you know your organisation name and space name, you can use the following:
from ibm_analytics_engine.cf.client import CloudFoundryAPI
cf = CloudFoundryAPI(api_key_filename='your_api_key_filename')
try:
    space_guid = cf.space_guid(org_name='your_org_name', space_name='your_space_name')
    print(space_guid)
except ValueError as e:
    # Space not found
    print(e)
Create Cluster¶
This example shows how to create a basic Spark cluster.
from ibm_analytics_engine.cf.client import CloudFoundryAPI
from ibm_analytics_engine import IAE, IAEServicePlanGuid
cf = CloudFoundryAPI(api_key_filename='your_api_key_filename')
space_guid = cf.space_guid(org_name='your_org_name', space_name='your_space_name')
iae = IAE(cf_client=cf)
cluster_instance_guid = iae.create_cluster(
    service_instance_name='SPARK_CLUSTER',
    service_plan_guid=IAEServicePlanGuid.LITE,
    space_guid=space_guid,
    cluster_creation_parameters={
        "hardware_config": "default",
        "num_compute_nodes": 1,
        "software_package": "ae-1.0-spark",
    }
)
print('>> IAE cluster instance id: {}'.format(cluster_instance_guid))
# This call blocks for several minutes. See the Get Cluster Status example
# for alternative options.
status = iae.status(
    cluster_instance_guid=cluster_instance_guid,
    poll_while_in_progress=True)
print('>> Cluster status: {}'.format(status))
The above example creates a LITE cluster. See IAEServicePlanGuid in the API Docs section for the available service plan guids.
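The available plan guids are listed below (values taken from the IAEServicePlanGuid class documented in the API Docs section):

from ibm_analytics_engine import IAEServicePlanGuid

IAEServicePlanGuid.LITE         # 'Lite' plan
IAEServicePlanGuid.STD_HOURLY   # 'Standard Hourly' plan
IAEServicePlanGuid.STD_MONTHLY  # 'Standard Monthly' plan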
Delete Cluster¶
from ibm_analytics_engine.cf.client import CloudFoundryAPI, CloudFoundryException
from ibm_analytics_engine import IAE
cf = CloudFoundryAPI(api_key_filename='your_api_key_filename')
iae = IAE(cf_client=cf)
try:
    iae.delete_cluster(
        cluster_instance_guid='12345-12345-12345-12345',
        recursive=True)
    print('Cluster deleted.')
except CloudFoundryException as e:
    print('Unable to delete cluster: ' + str(e))
Get or Create Credentials¶
import json
from ibm_analytics_engine.cf.client import CloudFoundryAPI
from ibm_analytics_engine import IAE
cf = CloudFoundryAPI(api_key_filename='your_api_key_filename')
iae = IAE(cf_client=cf)
vcap_json = iae.get_or_create_credentials(cluster_instance_guid='12345-12345-12345-12345')
# prettify json
vcap_formatted = json.dumps(vcap_json, indent=4, separators=(',', ': '))
print(vcap_formatted)
To save the returned data to disk, you can do something like:
with open('./vcap.json', 'w') as vcap_file:
    vcap_file.write(vcap_formatted)
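To load the saved credentials back from disk later (this is how the nb2kg example notebook below reads vcap.json):

import json

with open('./vcap.json') as vcap_file:
    vcap = json.load(vcap_file)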
Get Cluster Status¶
To return the Cloud Foundry status:
import time
from ibm_analytics_engine.cf.client import CloudFoundryAPI
from ibm_analytics_engine import IAE
cf = CloudFoundryAPI(api_key_filename='your_api_key_filename')
iae = IAE(cf_client=cf)
while True:
    status = iae.status(cluster_instance_guid='12345-12345-12345-12345')
    if status == 'succeeded' or status == 'failed': break
    time.sleep(60)
print(status)
An alternative option is to let the library poll for the Cloud Foundry status. Note that this approach can block for many minutes while a cluster is being provisioned, and while it is blocked there is no progress output:
from ibm_analytics_engine.cf.client import CloudFoundryAPI
from ibm_analytics_engine import IAE
cf = CloudFoundryAPI(api_key_filename='your_api_key_filename')
iae = IAE(cf_client=cf)
status = iae.status(
    cluster_instance_guid='12345-12345-12345-12345',
    poll_while_in_progress=True)
print(status)
To return the Data Platform API status:
from ibm_analytics_engine.cf.client import CloudFoundryAPI
from ibm_analytics_engine import IAE
cf = CloudFoundryAPI(api_key_filename='your_api_key_filename')
iae = IAE(cf_client=cf)
vcap = iae.get_or_create_credentials(cluster_instance_guid='12345-12345-12345-12345')
status = iae.dataplatform_status(vcap)
print(status)
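A possible way to wait until a cluster is fully ready is to combine the two checks: block on the Cloud Foundry status, then query the Data Platform API status. This is a sketch assembled from the calls shown above; the exact values returned by dataplatform_status are not documented here, so the final print is illustrative only:

from ibm_analytics_engine.cf.client import CloudFoundryAPI
from ibm_analytics_engine import IAE

cf = CloudFoundryAPI(api_key_filename='your_api_key_filename')
iae = IAE(cf_client=cf)

# Block until Cloud Foundry reports a terminal state ('succeeded' or 'failed')
status = iae.status(
    cluster_instance_guid='12345-12345-12345-12345',
    poll_while_in_progress=True)

if status == 'succeeded':
    # Check what the Data Platform API reports for the same cluster
    vcap = iae.get_or_create_credentials(
        cluster_instance_guid='12345-12345-12345-12345')
    print(iae.dataplatform_status(vcap))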
List Clusters¶
from ibm_analytics_engine.cf.client import CloudFoundryAPI
from ibm_analytics_engine import IAE
cf = CloudFoundryAPI(api_key_filename='your_api_key_filename')
space_guid = cf.space_guid(org_name='your_org_name', space_name='your_space_name')
iae = IAE(cf_client=cf)
for i in iae.clusters(space_guid=space_guid):
    print(i)
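The clusters() method also accepts an optional status filter (see the API Docs section below). For example, assuming the documented 'succeeded' last-operation state, you can list only fully provisioned clusters:

for i in iae.clusters(space_guid=space_guid, status='succeeded'):
    print(i)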
Jupyter Notebook Gateway¶
This is an example script for running a Dockerized Jupyter notebook that connects to the cluster over the JNBG protocol using the credentials in your vcap.json file.
#!/bin/bash
export VCAP_STR="$(cat vcap.json)"
KG_HTTP_USER=$(python -c "import json, os; print(json.loads(os.environ['VCAP_STR'])['cluster']['user'])")
KG_HTTP_PASS=$(python -c "import json, os; print(json.loads(os.environ['VCAP_STR'])['cluster']['password'])")
KG_HTTP_URL=$(python -c "import json, os; print(json.loads(os.environ['VCAP_STR'])['cluster']['service_endpoints']['notebook_gateway'])")
KG_WS_URL=$(python -c "import json, os; print(json.loads(os.environ['VCAP_STR'])['cluster']['service_endpoints']['notebook_gateway_websocket'])")
# Create a directory for the notebooks so they don't disappear when the docker container shuts down
if [ ! -d notebooks ]
then
mkdir notebooks
fi
docker run -it --rm \
-v $(pwd)/notebooks:/tmp/notebooks \
-e KG_HTTP_USER=$KG_HTTP_USER \
-e KG_HTTP_PASS=$KG_HTTP_PASS \
-e KG_URL=$KG_HTTP_URL \
-e KG_WS_URL=$KG_WS_URL \
-p 8888:8888 \
biginsights/jupyter-nb-nb2kg
# Open a browser window to: http://127.0.0.1:8888
API Docs¶
A Python library for working with IBM Analytics Engine.
Service Instance Operations¶
Lifecycle operations for an IAE service instance.
class ibm_analytics_engine.IAE(cf_client)¶
This class provides methods for working with IBM Analytics Engine (IAE) deployment operations. Many of the methods in this class work by calling the Cloud Foundry REST API (https://apidocs.cloudfoundry.org/272/). The Cloud Foundry API is quite abstract, so this class provides method names that are more meaningful for those just wanting to work with IAE.
This class does not save the state from the Cloud Foundry operations - it retrieves all state from Cloud Foundry as required.
__init__(cf_client)¶
Create a new instance of the IAE client.
Parameters: cf_client (CloudFoundryAPI) – The object that makes the Cloud Foundry REST API calls.
clusters(space_guid, short=True, status=None)¶
Returns a list of clusters in the given space_guid.
Parameters:
- space_guid (str) – The space guid to query for IAE clusters.
- short (bool, optional) – Whether to return short (brief) output. If False, returns the full Cloud Foundry API output.
- status (str, optional) – Filter the returned clusters to only those with the provided status value.
Returns: If short=True, this method returns: [ (cluster_name, cluster_guid, last_operation_state), … ]. The last_operation_state may be: in progress, succeeded, or failed.
Return type: list
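Since the short-form result is a list of tuples, you can unpack it directly (the tuple shape is taken from the Returns note above; iae and space_guid are assumed to be set up as in the earlier examples):

for cluster_name, cluster_guid, last_operation_state in iae.clusters(space_guid=space_guid):
    print(cluster_name, cluster_guid, last_operation_state)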
create_cluster(service_instance_name, service_plan_guid, space_guid, cluster_creation_parameters)¶
Create a new IAE cluster.
Parameters:
- service_instance_name (str) – The name you would like for the cluster.
- service_plan_guid (IAEServicePlanGuid) – The guid representing the type of cluster to create.
- space_guid (str) – The space guid where the cluster will be created.
- cluster_creation_parameters (dict) – The cluster creation parameters. An example is shown below:

{
    "num_compute_nodes": 1,
    "hardware_config": "Standard",
    "software_package": "ae-1.0-spark",
    "customization": [{
        "name": "action1",
        "type": "bootstrap",
        "script": {"script_params": []}
    }]
}

Returns: The cluster_instance_guid
Return type: str
More API docs coming soon …
Spark Livy Operations¶
Not implemented yet.
Spark SSH Operations¶
Not implemented yet.
Ambari Operations¶
Not implemented yet.
Cloud Foundry Operations¶
Utility Classes¶
IAEServicePlanGuid¶
class ibm_analytics_engine.IAEServicePlanGuid¶
Service plan guids for IBM Analytics Engine.

LITE = 'acb06a56-fab1-4cb1-a178-c811bc676164'
IBM Analytics Engine 'Lite' plan.

STD_HOURLY = '9ba7e645-fce1-46ad-90dc-12655bc45f9e'
IBM Analytics Engine 'Standard Hourly' plan.

STD_MONTHLY = 'f801e166-2c73-4189-8ebb-ef7c1b586709'
IBM Analytics Engine 'Standard Monthly' plan.
Example notebook - Create Cluster¶
This example uses a Python library for working with an IAE instance.
IBM Analytics Engine Python Library links:
- documentation is on readthedocs
- source code repository is on github
- this notebook is on github
In [ ]:
! pip install --quiet --upgrade git+https://github.com/snowch/ibm-analytics-engine-python@master
In [ ]:
from ibm_analytics_engine import CloudFoundryAPI
from ibm_analytics_engine import IAE, IAEServicePlanGuid, IAEClusterSpecificationExamples
We use an IBM Cloud API key to work with an IAE instance. You can create an API key using the bluemix CLI tools, e.g.
bluemix iam api-key-create My_IAE_Key -d "This is my IAE API key" -f my_api_key.json
Alternatively, follow these instructions to create an API key using the IBM Cloud web console and then save it in a secure location.
In [ ]:
cf = CloudFoundryAPI(api_key_filename='./my_api_key.json')
You aren’t restricted to just using an API key file. If you have the API key value, you can do this:
from getpass import getpass
api_key = getpass("Enter your api key: ")
cf = CloudFoundryAPI(api_key=api_key)
In [ ]:
# Provide your organization name and space name:
SPACE_GUID = cf.space_guid(org_name='my_org_name', space_name='my_space_name')
print(SPACE_GUID)
If you couldn’t find your space guid, try printing out all your orgs and spaces:
cf.print_orgs_and_spaces()
In [ ]:
# We interact with the IBM Analytics Engine through the IAE class.
# Let's create an instance of it:
iae = IAE(cf_client=cf)
In [ ]:
# List the clusters in the space
iae.clusters(space_guid=SPACE_GUID)
In [ ]:
cluster_guid = iae.create_cluster(
    service_instance_name='MY_SPARK_CLUSTER',
    service_plan_guid=IAEServicePlanGuid.LITE,
    cluster_creation_parameters={
        "hardware_config": "default",
        "num_compute_nodes": 1,
        "software_package": "ae-1.0-spark",
    },
    space_guid=SPACE_GUID)
Alternative options for service_plan_guid:
IAEServicePlanGuid.STD_HOURLY
IAEServicePlanGuid.STD_MONTHLY
There are also some examples of cluster_creation_parameters in the IAEClusterSpecificationExamples class:
IAEClusterSpecificationExamples.SINGLE_NODE_BASIC_SPARK = {
    'num_compute_nodes': 1,
    'hardware_config': 'default',
    'software_package': 'ae-1.0-spark'
}
and:
IAEClusterSpecificationExamples.SINGLE_NODE_BASIC_HADOOP = {
    'num_compute_nodes': 1,
    'hardware_config': 'default',
    'software_package': 'ae-1.0-hadoop-spark'
}
These have been provided so you don’t have to remember the parameters for creating a default basic cluster.
You would use them like this:
iae.create_cluster(...,
    cluster_creation_parameters=IAEClusterSpecificationExamples.SINGLE_NODE_BASIC_SPARK,
    ...)
In [ ]:
# Poll the cluster until provisioning has finished
import time
while True:
    status = iae.status(cluster_instance_guid=cluster_guid)
    print(status)
    if status == 'succeeded' or status == 'failed': break
    time.sleep(60)
In [ ]:
# Only run this cell after the previous cell has finished with the status 'succeeded',
# otherwise you will receive an error trying to get or create the credentials.
import json
# get the credentials data for the cluster in vcap json format
vcap = iae.get_or_create_credentials(cluster_instance_guid=cluster_guid)
# print the credentials out
vcap_formatted = json.dumps(vcap, indent=4, separators=(',', ': '))
print(vcap_formatted)
# save the credentials to a file
with open('./vcap.json', 'w') as vcap_file:
    vcap_file.write(vcap_formatted)
In [ ]:
# Grab the ambari console url
print(vcap['cluster']['service_endpoints']['ambari_console'])
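If you are running this notebook locally, one option for opening the console is the standard library webbrowser module (a convenience sketch, not part of this library):

import webbrowser

# Opens the Ambari console url from the credentials in your default browser
webbrowser.open(vcap['cluster']['service_endpoints']['ambari_console'])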
In [ ]:
# Delete the cluster. recursive=True will delete service bindings, service keys,
# and routes associated with the service instance.
iae.delete_cluster(cluster_guid, recursive=True)
Example notebook - nb2kg¶
This notebook is an example of connecting a Jupyter notebook to IBM Analytics Engine (IAE) using nb2kg. It is expected that you will run this notebook in a local Jupyter environment (i.e. not from DSX).
The notebook connecting to IAE is spun up in a docker container. This is because:
- When you enable the nb2kg extension, all kernels run on the configured Kernel Gateway instead of on the Notebook server host; the extension does not support local kernels. Installing nb2kg in your local notebook environment would therefore prevent you from running local kernels, so Docker is used to avoid corrupting your local notebook environment.
- Setting up a notebook environment with all the dependencies for nb2kg can be tricky. Docker allows me to provide you with an environment that is pre-configured.
Python is used to interact with Docker because this was easier to script in a notebook. However, if you run docker with sudo, this notebook may not work for you.
WARNING: This project is just a demo of connecting to IAE using nb2kg. Your mileage may vary.
In [ ]:
! pip install --quiet docker
Load the IBM Analytics Engine (IAE) credentials. See this example notebook for creating an IAE instance and saving the vcap.json credentials file.
In [ ]:
import json
vcap = json.load(open('./vcap.json'))
In [ ]:
import docker
client = docker.from_env()
api_client = docker.APIClient()
The Docker images can be quite large and take a long time to pull. Here we pull the parent image and periodically print some output so you can see what is going on.
In [ ]:
import json
lines_processed = 0
api_client = docker.APIClient()
for line in api_client.pull('jupyter/minimal-notebook:fa77fe99579b', tag=None, stream=True):
    if lines_processed % 25 == 0:
        try:
            status = json.loads(str(line)[2:-5]) # strip quotes and newline
            if 'progressDetail' in status:
                print(status['progressDetail'])
        except:
            pass
    lines_processed += 1
I have created a custom docker environment that uses nb2kg. This environment has been kept as simple as possible to make it easy for you to adapt to your own requirements.
In [ ]:
image = client.images.build(
    path='https://github.com/snowch/docker_jupyter_notebook_kg.git',
    tag='docker_jupyter_notebook_kg')
print(image.tags)
Delete any containers hanging around from a previous run of this notebook.
WARNING: Ensure you backup any work you want to keep before running this command.
In [ ]:
from datetime import datetime as dt
import dateutil.parser
for cont in client.containers.list(filters={'name': 'iae_nb2kg_example'}):
    created = dt.utcnow() - dateutil.parser.parse(cont.attrs['Created']).replace(tzinfo=None)
    print("Name: {} | Status: {} | Age (H:M:S): {}".format(
        cont.attrs['Name'],
        cont.attrs['State']['Status'],
        created
    ))
    # print(json.dumps(cont.attrs, indent=4, sort_keys=True))
    cont.kill()
Run the notebook. You may need to change the LOCAL_PORT if this port is not free on your local machine.
Change the LOCAL_NOTEBOOKS_FOLDER to a folder on your local machine where you want your notebooks in the ‘work’ folder to be saved.
In [ ]:
LOCAL_PORT = 8899
LOCAL_NOTEBOOKS_FOLDER = '/Users/snowch/Desktop/notebooks'
container = client.containers.run(
    image=image,
    volumes={LOCAL_NOTEBOOKS_FOLDER: '/home/jovyan/work'},
    ports={str(LOCAL_PORT) + '/tcp': LOCAL_PORT},
    environment={
        'NB_PORT': LOCAL_PORT,
        'KG_HTTP_USER': vcap['cluster']['user'],
        'KG_HTTP_PASS': vcap['cluster']['password'],
        'KG_URL': vcap['cluster']['service_endpoints']['notebook_gateway'],
        'KG_WS_URL': vcap['cluster']['service_endpoints']['notebook_gateway_websocket'],
        'KG_CONNECT_TIMEOUT': '50.0',
        'KG_REQUEST_TIMEOUT': '40.0'
    },
    detach=True,
    stdout=True,
    stderr=True,
    remove=True,
    name='iae_nb2kg_example'
)
The next cell prints the log output. Ensure there are no errors reported.
You should see a url, e.g.
Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://0.0.0.0:8899/?token=12345
Click on the url in the output to open it in your browser.
From here, you should be able to create a new notebook.
In [ ]:
import sys
from IPython.lib import backgroundjobs as bg

jobs = bg.BackgroundJobManager()

def printlogs():
    for line in container.logs(stream=True):
        print(str(line)[2:-3]) # strip quotes and newline
        sys.stdout.flush()

jobs.new('printlogs()')
jobs.status()
Finally, remove the notebook Docker container.
In [ ]:
container.kill()
NOTE: This documentation is a work-in-progress. It will be complete within the next few days/weeks. Please come back soon …