Developer’s Guide

This guide is intended for those interested in the inner workings of Nodepool and its various processes.

Operation

If you send a SIGUSR2 to one of the daemon processes, Nodepool will dump a stack trace for each running thread into its debug log. The dump is written under the log bucket nodepool.stack_dump. This is useful for tracking down deadlocked or otherwise slow threads.
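
The general technique behind such a stack dump can be sketched with the Python standard library alone. This is an illustration of the idea, not Nodepool's actual handler: the function names below are hypothetical, and only the logger name nodepool.stack_dump comes from the text above.

```python
import logging
import signal
import sys
import threading
import traceback


def format_stack_dump():
    """Return a formatted stack trace for every running thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        chunks.append("Thread: %s (%s)\n"
                      % (names.get(thread_id, "unknown"), thread_id))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


def stack_dump_handler(signum, frame):
    # Write the dump at debug level under the log bucket named above.
    logging.getLogger("nodepool.stack_dump").debug(format_stack_dump())


def install_stack_dump_handler():
    # SIGUSR2 exists only on Unix-like systems, and signal handlers
    # can only be installed from the main thread.
    if hasattr(signal, "SIGUSR2"):
        signal.signal(signal.SIGUSR2, stack_dump_handler)
```

With a handler like this installed, running kill -USR2 <pid> against the process would emit one trace per thread, which makes a thread stuck on a lock easy to spot.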

Nodepool Builder

The following is the overall diagram for the nodepool-builder process and its most important pieces:

                     +-----------------+
                     |    ZooKeeper    |
                     +-----------------+
                       ^      |
                bld    |      | watch
+------------+  req    |      | trigger
|   client   +---------+      |           +--------------------+
+------------+                |           | NodepoolBuilderApp |
                              |           +---+----------------+
                              |               |
                              |               | start/stop
                              |               |
                      +-------v-------+       |
                      |               <-------+
            +--------->   NodePool-   <----------+
            |     +---+   Builder     +---+      |
            |     |   |               |   |      |
            |     |   +---------------+   |      |
            |     |                       |      |
      done  |     | start           start |      | done
            |     | bld             upld  |      |
            |     |                       |      |
            |     |                       |      |
        +---------v---+               +---v----------+
        | BuildWorker |               | UploadWorker |
        +-+-------------+             +-+--------------+
          | BuildWorker |               | UploadWorker |
          +-+-------------+             +-+--------------+
            | BuildWorker |               | UploadWorker |
            +-------------+               +--------------+

Drivers

class nodepool.driver.Driver

The Driver interface

This is the main entrypoint for a Driver. A single instance of this will be created for each driver in the system and will persist for the lifetime of the process.

The class or instance attribute name must be provided as a string.

abstract getProvider(provider_config)

Return a Provider instance

Parameters

provider_config (ProviderConfig) – A ProviderConfig instance

abstract getProviderConfig(provider)

Return a ProviderConfig instance

Parameters

provider (dict) – The parsed provider configuration

reset()

Called before loading configuration to reset any global state

class nodepool.driver.Provider(*args, **kw)

The Provider interface

Drivers implement this interface to supply Providers. Each “provider” in the nodepool configuration corresponds to an instance of a class which implements this interface.

If the configuration is changed, old provider instances will be stopped and new ones created as necessary.

The class or instance attribute name must be provided as a string.

abstract cleanupLeakedResources()

Clean up any leaked resources

This is called periodically to give the provider a chance to clean up any resources which may have leaked.

abstract cleanupNode(node_id)

Cleanup a node after use

The driver may delete the node or return it to the pool. This may be called after the node was used, or as part of cleanup from an aborted launch attempt.

Parameters

node_id (str) – The id of the node

abstract getRequestHandler(poolworker, request)

Return a NodeRequestHandler for the supplied request

abstract join()

Wait for provider to finish

On shutdown, this is called after stop() and should return when the provider has completed all tasks. This may not be called on reconfiguration (so drivers should not rely on this always being called after stop).

abstract labelReady(name)

Determine if a label is ready in this provider

If the prerequisites for this label are ready, return True. For example, if the label requires an image that is not present, this should return False. This method should not examine inventory or quota. In other words, it should return True if a request for the label would be expected to succeed with no resource contention, but False if it is not possible to satisfy a request for the label.

Parameters

name (str) – The name of the label

Returns

True if the label is ready in this provider, False otherwise.

abstract start(zk_conn)

Start this provider

Parameters

zk_conn (ZooKeeper) – A ZooKeeper connection object.

This is called after each configuration change to allow the driver to perform initialization tasks and start background threads. The ZooKeeper connection object is provided if the Provider needs to interact with it.

startNodeCleanup(node)

Starts a background process to delete a node

This should return a NodeDeleter to implement and track the deletion of a node.

Parameters

node (Node) – A locked Node object representing the instance to delete.

Returns

A NodeDeleter instance.

abstract stop()

Stop this provider

Before shutdown or reconfiguration, this is called to signal to the driver that it will no longer be used. It should not begin any new tasks, but may allow currently running tasks to continue.

abstract waitForNodeCleanup(node_id)

Wait for a node to be cleaned up

If this method is called, it is called after cleanupNode().

This method should return after the node has been deleted or returned to the pool.

Parameters

node_id (str) – The id of the node

class nodepool.driver.ProviderNotifications

Notification interface for Provider objects.

This groups all notification messages bound for the Provider. The Provider class inherits from this by default. A Provider overrides the methods here if it wants to handle the notification.

nodeDeletedNotification(node)

Called after the ZooKeeper object for a node is deleted.

Parameters

node (Node) – Object describing the node just deleted.

class nodepool.driver.NodeRequestHandler(pw, request)

Class to process a single nodeset request.

The PoolWorker thread will instantiate a class of this type for each node request that it pulls from ZooKeeper.

Subclasses are required to implement the launch method.

abstract property alive_thread_count

Return the number of active node launching threads in use by this request handler.

This is used to limit request handling threads for a provider.

This is an approximate, top-end number for alive threads, since some threads obviously may have finished by the time we finish the calculation.

Returns

A count (integer) of active threads.

checkReusableNode(node)

Handler may implement this to verify a node can be re-used. The OpenStack handler uses this to verify the node az is correct.

hasProviderQuota(node_types)

Checks if a provider has enough quota to handle a list of nodes. This does not take our currently existing nodes into account.

Parameters

node_types – list of node types to check

Returns

True if the node list fits into the provider, False otherwise

hasRemainingQuota(ntype)

Checks if the predicted quota is enough for an additional node of type ntype.

Parameters

ntype – node type for the quota check

Returns

True if there is enough quota, False otherwise

abstract imagesAvailable()

Handler needs to implement this to determine whether the requested images in self.request.node_types are available for this provider.

Returns

True if they are available, False otherwise.

abstract launch(node)

Handler needs to implement this to launch the node.

abstract launchesComplete()

Handler needs to implement this to check if all nodes in self.nodeset have completed the launch sequence.

This method will be called periodically to check on launch progress.

Returns

True if all launches are complete (successfully or not), False otherwise.

run()

Execute node request handling.

This code is designed to be re-entrant. Because we can’t always satisfy a request immediately (due to lack of provider resources), we need to be able to call run() repeatedly until the request can be fulfilled. The node set is saved and added to between calls.
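
The re-entrant pattern described above can be sketched as follows. This is a simplified illustration, not Nodepool's run() implementation: the ReentrantHandler class, its capacity_per_call parameter (which simulates limited provider resources), and the dictionary-based nodes are all hypothetical.

```python
class ReentrantHandler:
    """Hypothetical handler whose run() can be called repeatedly."""

    def __init__(self, wanted_types, capacity_per_call):
        self.wanted_types = list(wanted_types)
        # Assumption: at most this many nodes can be obtained per call.
        self.capacity_per_call = capacity_per_call
        self.nodeset = []   # saved between calls and added to
        self.done = False

    def run(self):
        if self.done:
            return
        # Only handle the node types not yet satisfied by earlier calls.
        remaining = self.wanted_types[len(self.nodeset):]
        for node_type in remaining[:self.capacity_per_call]:
            self.nodeset.append({"type": node_type, "state": "ready"})
        self.done = len(self.nodeset) == len(self.wanted_types)


handler = ReentrantHandler(["small", "small", "large"], capacity_per_call=2)
calls = 0
while not handler.done:
    handler.run()
    calls += 1
```

Because the node set lives on the handler rather than in local variables, each call resumes where the previous one stopped instead of starting over.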

setNodeMetadata(node)

Handler may implement this to store driver-specific metadata in the Node object before building the node. This data is normally dynamically calculated during runtime. The OpenStack handler uses this to set az, cloud and region.

unlockNodeSet(clear_allocation=False)

Attempt unlocking all Nodes in the node set.

Parameters

clear_allocation (bool) – If true, clears the node allocated_to attribute.

class nodepool.driver.NodeRequestHandlerNotifications

Notification interface for NodeRequestHandler objects.

This groups all notification messages bound for the NodeRequestHandler. The NodeRequestHandler class inherits from this by default. A request handler overrides the methods here if it wants to handle the notification.

nodeReusedNotification(node)

Handler may implement this to be notified when a node is re-used. The OpenStack handler uses this to set the chosen_az.

class nodepool.driver.ProviderConfig(provider)

The Provider config interface

The class or instance attribute name must be provided as a string.

abstract getSchema()

Return a voluptuous schema for config validation

abstract getSupportedLabels(pool_name=None)

Return a set of label names supported by this provider.

Parameters

pool_name (str) – If provided, get labels for the given pool only.

abstract load(newconfig)

Update this config object from the supplied parsed config

abstract property manage_images

Return True if provider manages external images, False otherwise.

abstract property pools

Return a dict of ConfigPool-based objects, indexed by pool name.

Writing A New Provider Driver

Nodepool drivers are loaded from the nodepool/drivers directory. A driver is composed of three main objects:

  • A ProviderConfig to manage validation and loading of the provider.

  • A Provider to manage resource allocations.

  • A NodeRequestHandler to manage nodeset (collection of resource) allocations.

Those objects are referenced from the Driver main interface that needs to be implemented in the __init__.py file of the driver directory.
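
As a rough sketch of how these pieces fit together, the following uses plain stand-in classes so that it runs without Nodepool installed. In a real driver the classes would subclass nodepool.driver.ProviderConfig, nodepool.driver.Provider and nodepool.driver.Driver and implement all of their abstract methods; the Example* names and the configuration dictionary are hypothetical.

```python
class ExampleProviderConfig:
    """Stand-in for a ProviderConfig subclass."""

    def __init__(self, provider):
        self.provider = provider        # parsed provider configuration
        self.name = provider["name"]


class ExampleProvider:
    """Stand-in for a Provider subclass."""

    def __init__(self, provider_config):
        self.provider = provider_config


class ExampleDriver:
    """Stand-in for the Driver subclass defined in the driver's __init__.py."""

    def getProviderConfig(self, provider):
        return ExampleProviderConfig(provider)

    def getProvider(self, provider_config):
        return ExampleProvider(provider_config)

    def reset(self):
        # Nothing to reset: this sketch keeps no global state.
        pass


driver = ExampleDriver()
config = driver.getProviderConfig({"name": "example-cloud", "driver": "example"})
provider = driver.getProvider(config)
```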

ProviderConfig

The ProviderConfig is constructed with the driver object and the provider configuration dictionary.

The main procedures of the ProviderConfig are:

  • getSchema() exposes a voluptuous schema of the provider configuration.

  • load(config) parses the provider configuration. Note that the config argument is the global Nodepool.yaml configuration. Each provided label needs to be referenced back to the global config.labels dictionary so that the launcher service knows which provider provides which labels.
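
The label back-reference performed by load() might look roughly like the sketch below. The data shapes are assumptions made for illustration: the global configuration is modelled with SimpleNamespace objects, each label is given a hypothetical pools list, and load_provider is a free function rather than a ProviderConfig method.

```python
from types import SimpleNamespace


def load_provider(provider, config):
    """Parse pools from a provider dict and register them on global labels."""
    pools = {}
    for pool in provider.get("pools", []):
        pool_config = SimpleNamespace(
            name=pool["name"],
            labels=[label["name"] for label in pool.get("labels", [])],
        )
        pools[pool_config.name] = pool_config
        for label_name in pool_config.labels:
            # Reference the pool back from the global label entry so the
            # launcher can find out which provider supplies the label.
            config.labels[label_name].pools.append(pool_config)
    return pools


global_config = SimpleNamespace(labels={
    "ubuntu": SimpleNamespace(name="ubuntu", pools=[]),
    "fedora": SimpleNamespace(name="fedora", pools=[]),
})
provider = {
    "name": "example-cloud",
    "pools": [{"name": "main", "labels": [{"name": "ubuntu"}]}],
}
pools = load_provider(provider, global_config)
```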

Provider

The Provider is constructed with the ProviderConfig.

The main procedures of the Provider are:

  • cleanupNode(external_id) terminates a resource

  • listNodes() returns the list of existing resources. This procedure needs to map the nodepool_node_id to each resource. If the provider doesn’t support resource metadata, the driver needs to implement a storage facility to associate resources created by Nodepool with the internal nodepool_node_id. The launcher periodically looks for non-existent node_id values in listNodes() to delete any leaked resources.

  • getRequestHandler(pool, request) returns a NodeRequestHandler object to manage the creation of resources. The contract between the handler and the provider is free form. As a rule of thumb, the handler should be in charge of interfacing with Nodepool’s database while the provider should provide primitives to create resources. For example, the Provider is likely to implement a createResource(pool, label) procedure that will be used by the handler.
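
The leak check mentioned for listNodes() can be illustrated with a small helper. This is a sketch under assumptions: resources are modelled as dictionaries with a metadata key, and find_leaked_resources is a hypothetical name, not part of the launcher.

```python
def find_leaked_resources(resources, known_node_ids):
    """Return resources tagged with a node id that Nodepool no longer knows."""
    leaked = []
    for resource in resources:
        node_id = resource.get("metadata", {}).get("nodepool_node_id")
        if node_id is None:
            # Not created by Nodepool; never treat it as leaked.
            continue
        if node_id not in known_node_ids:
            leaked.append(resource)
    return leaked


resources = [
    {"external_id": "vm-1", "metadata": {"nodepool_node_id": "0000000001"}},
    {"external_id": "vm-2", "metadata": {"nodepool_node_id": "0000000002"}},
    {"external_id": "vm-3", "metadata": {}},
]
leaked = find_leaked_resources(resources, known_node_ids={"0000000001"})
```

Here only vm-2 is reported: vm-1 still has a known node, and vm-3 carries no Nodepool metadata at all, so it is left alone.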

NodeRequestHandler

The NodeRequestHandler is constructed with the assigned pool and the request object. Before the handler is used, the following attributes are set:

  • self.provider : the provider configuration.

  • self.pool : the pool configuration.

  • self.zk : the database client.

  • self.manager : the Provider object.

The main procedures of the NodeRequestHandler are:

  • launch(node) starts the creation of a new resource.

  • launchesComplete() returns True if all the nodes in the handler’s nodeset (self.nodeset) are READY.

A handler may not have to launch every node of the nodeset, as Nodepool will re-use existing nodes.

The launch procedure usually consists of the following operations:

  • Use the provider to create the resources associated with the node label. Once an external_id is obtained, it should be stored in node.external_id.

  • Once the resource is created, READY should be stored in node.state. Otherwise, raise an exception to restart the launch attempt.
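
The launch steps above can be sketched as follows. Everything here is a stand-in: Node, FlakyProvider and the READY constant imitate the real objects, createResource(pool, label) is the example primitive mentioned earlier, and the retry loop is an assumption about how a caller might restart a failed attempt.

```python
READY = "ready"  # stand-in for Nodepool's READY node state


class Node:
    def __init__(self, label):
        self.type = label
        self.external_id = None
        self.state = "building"


class FlakyProvider:
    """Hypothetical provider whose first createResource() call fails."""

    def __init__(self):
        self.calls = 0

    def createResource(self, pool, label):
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("cloud API error")
        return "resource-%d" % self.calls


def launch(manager, pool, node):
    # Step 1: create the resource and record its external id on the node.
    node.external_id = manager.createResource(pool, node.type)
    # Step 2: creation succeeded, so mark the node READY.
    node.state = READY


def launch_with_retries(manager, pool, node, attempts=3):
    # An exception from launch() restarts the attempt until we give up.
    for attempt in range(1, attempts + 1):
        try:
            launch(manager, pool, node)
            return attempt
        except RuntimeError:
            if attempt == attempts:
                raise


node = Node("small")
attempts_used = launch_with_retries(FlakyProvider(), "main", node)
```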

TaskManager

If you need to use a thread-unsafe client library, or you need to manage rate limiting in your driver, you may want to use the TaskManager class. Implement any remote API calls as tasks and invoke them by submitting the tasks to the TaskManager. It will run them sequentially from a single thread, and assist in rate limiting.

The BaseTaskManagerProvider class is a subclass of Provider which starts and stops a TaskManager automatically. Inherit from it to build a Provider as described above with a TaskManager.
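
The idea of funnelling all remote calls through one thread can be sketched with a queue and a worker thread. This is an illustration of the pattern only: MiniTaskManager and this Task are simplified stand-ins written for the example, rate limiting is omitted, and none of it is the code in nodepool.driver.taskmanager.

```python
import queue
import threading


class Task:
    """Stand-in task: keyword arguments are kept on self.args."""
    name = "task"

    def __init__(self, **kw):
        self.args = kw
        self._done = threading.Event()
        self._result = None
        self._error = None

    def main(self, manager):
        raise NotImplementedError

    def run(self, manager):
        try:
            self._result = self.main(manager)
        except Exception as e:
            self._error = e
        finally:
            self._done.set()

    def wait(self):
        # Block until the worker thread has run the task.
        self._done.wait()
        if self._error is not None:
            raise self._error
        return self._result


class MiniTaskManager:
    """Runs submitted tasks one at a time on a single worker thread."""

    def __init__(self, name):
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, name=name, daemon=True)
        self._thread.start()

    def submitTask(self, task):
        self._queue.put(task)
        return task  # returned so calls can be chained

    def stop(self):
        self._queue.put(None)  # sentinel: tells the worker to exit
        self._thread.join()

    def _run(self):
        while True:
            task = self._queue.get()
            if task is None:
                return
            task.run(self)


class AddTask(Task):
    name = "add"

    def main(self, manager):
        return self.args["a"] + self.args["b"]


manager = MiniTaskManager("example-provider")
result = manager.submitTask(AddTask(a=2, b=3)).wait()
manager.stop()
```

Since only the worker thread ever calls main(), a client library that is not thread-safe is never used from two threads at once.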

class nodepool.driver.taskmanager.Task(**kw)

Base task class for use with TaskManager

Subclass this to implement your own tasks.

Set the name field to the name of your task and override the main() method.

Keyword arguments to the constructor are stored on self.args for use by the main() method.

main(manager)

Implement the work of the task

Parameters

manager (TaskManager) – The instance of TaskManager running this task.

Arguments passed to the constructor are available as self.args.

wait()

Call this method after submitting the task to the TaskManager to receive the results.

class nodepool.driver.taskmanager.TaskManager(name, rate_limit)

A single-threaded task dispatcher

This class is meant to be instantiated by a Provider in order to execute remote API calls from a single thread with rate limiting.

Parameters
  • name (str) – The name of the TaskManager (usually the provider name) used in logging.

  • rate_limit (float) – The rate limit of the task manager expressed in requests per second.

rateLimit()

Return a context manager to perform rate limiting. Use as follows:

with task_manager.rateLimit():
    <execute API call>
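
A rate-limiting context manager of this kind could be written roughly as below. This is a sketch, not the TaskManager implementation: the RateLimiter class is hypothetical, and it assumes that limiting means keeping at least 1 / rate_limit seconds between the end of one call and the start of the next.

```python
import time
from contextlib import contextmanager


class RateLimiter:
    def __init__(self, rate_limit):
        # rate_limit is in requests per second, as for TaskManager.
        self.delta = 1.0 / rate_limit
        self.last_ts = None

    @contextmanager
    def rateLimit(self):
        if self.last_ts is not None:
            # Sleep for whatever is left of the minimum interval.
            wait = self.delta - (time.monotonic() - self.last_ts)
            if wait > 0:
                time.sleep(wait)
        try:
            yield
        finally:
            self.last_ts = time.monotonic()


limiter = RateLimiter(rate_limit=50.0)
start = time.monotonic()
for _ in range(3):
    with limiter.rateLimit():
        pass  # the remote API call would go here
elapsed = time.monotonic() - start
```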

stop()

Stop the task manager.

submitTask(task)

Submit a task to the task manager.

Parameters

task (Task) – An instance of a subclass of Task.

Returns

The submitted task for use in function chaining.

class nodepool.driver.taskmanager.BaseTaskManagerProvider(provider)

Subclass this to build a Provider with an included taskmanager

Simple Drivers

If your system is simple enough, you may be able to use the SimpleTaskManagerDriver class to implement support with just a few methods. In order to use this class, your system must create and delete instances as a unit (without requiring multiple resource creation calls such as volumes or floating IPs).

Note

This system is still in development and lacks robust support for quotas or image building.

To use this system, you will need to implement a few subclasses. First, create a ProviderConfig subclass as you would for any driver. Then, subclass SimpleTaskManagerInstance to map remote instance data into a format the simple driver can understand. Next, subclass SimpleTaskManagerAdapter to implement the main API methods of your provider. Finally, subclass SimpleTaskManagerDriver to tie them all together.

See the gce provider for an example.

class nodepool.driver.simple.SimpleTaskManagerInstance(data)

Represents a cloud instance

This class is used by the Simple Task Manager Driver classes to represent a standardized version of a remote cloud instance. Implement this class in your driver, override the load() method, and supply as many of the fields as possible.

Parameters

data – An opaque data object to be passed to the load method.

getQuotaInformation()

Return quota information about this instance.

Returns

A QuotaInformation object.

load(data)

Parse data and update this object’s attributes

Parameters

data – An opaque data object which was passed to the constructor.

Override this method and extract data from the data parameter.

The following attributes are required:

  • ready: bool (whether the instance is ready)

  • deleted: bool (whether the instance is in a deleted state)

  • external_id: str (the unique id of the instance)

  • interface_ip: str

  • metadata: dict

The following are optional:

  • public_ipv4: str

  • public_ipv6: str

  • private_ipv4: str

  • az: str

  • region: str
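
A load() override might map a cloud API record onto these attributes as sketched below. The ExampleInstance class is a stand-in rather than a real SimpleTaskManagerInstance subclass, and the input keys (status, id, labels, public_ip, private_ip, zone) are invented for the example; a real cloud will use its own field names.

```python
class ExampleInstance:
    """Stand-in for a SimpleTaskManagerInstance subclass."""

    def __init__(self, data):
        # Optional attributes default to None until load() fills them in.
        self.public_ipv4 = None
        self.public_ipv6 = None
        self.private_ipv4 = None
        self.az = None
        self.region = None
        self.load(data)

    def load(self, data):
        # Required attributes.
        self.ready = data["status"] == "RUNNING"
        self.deleted = data["status"] == "TERMINATED"
        self.external_id = data["id"]
        self.metadata = dict(data.get("labels", {}))
        # Optional attributes, when the cloud reports them.
        self.public_ipv4 = data.get("public_ip")
        self.private_ipv4 = data.get("private_ip")
        self.az = data.get("zone")
        # Prefer the public address for connecting, else the private one.
        self.interface_ip = self.public_ipv4 or self.private_ipv4


instance = ExampleInstance({
    "id": "vm-42",
    "status": "RUNNING",
    "labels": {"nodepool_node_id": "0000000042"},
    "private_ip": "10.0.0.5",
    "zone": "zone-a",
})
```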

class nodepool.driver.simple.SimpleTaskManagerAdapter(provider)

Public interface for the simple TaskManager Provider

Implement these methods as simple synchronous calls, and pass this class to the SimpleTaskManagerDriver class.

You can establish a single long-lived connection in the initializer. The provider will call methods on this object from a single thread.

All methods accept a task_manager argument. Use this to control rate limiting:

with task_manager.rateLimit():
    <execute API call>

createInstance(task_manager, hostname, metadata, label_config)

Create an instance

Parameters
  • task_manager (TaskManager) – An instance of TaskManager.

  • hostname (str) – The intended hostname for the instance.

  • metadata (dict) – A dictionary of key/value pairs that must be stored on the instance.

  • label_config (ProviderLabel) – A LabelConfig object describing the instance which should be created.

deleteInstance(task_manager, external_id)

Delete an instance

Parameters
  • task_manager (TaskManager) – An instance of TaskManager.

  • external_id (str) – The id of the cloud instance.

getQuotaForLabel(task_manager, label_config)

Return information about the quota used for a label

The default implementation returns a simple QuotaInformation for one instance; override this to return more detailed information including cores and RAM.

Parameters
  • task_manager (TaskManager) – An instance of TaskManager.

  • label_config (ProviderLabel) – A LabelConfig object describing a label for an instance.

Returns

A QuotaInformation object.

getQuotaLimits(task_manager)

Return the quota limits for this provider

The default implementation returns a simple QuotaInformation with no limits. Override this to provide accurate information.

Parameters

task_manager (TaskManager) – An instance of TaskManager.

Returns

A QuotaInformation object.

listInstances(task_manager)

Return a list of instances

Parameters

task_manager (TaskManager) – An instance of TaskManager.

Returns

A list of SimpleTaskManagerInstance objects.
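
To show how the three core adapter methods relate, here is an in-memory sketch. It is not a working adapter: InMemoryAdapter and FakeTaskManager are hypothetical, the "cloud" is a dictionary, and listInstances returns plain dictionaries where a real adapter would return SimpleTaskManagerInstance objects.

```python
import itertools
from contextlib import nullcontext


class FakeTaskManager:
    """Stand-in whose rateLimit() does no actual limiting."""

    def rateLimit(self):
        return nullcontext()


class InMemoryAdapter:
    def __init__(self, provider):
        self.provider = provider
        self._instances = {}
        self._ids = itertools.count(1)

    def createInstance(self, task_manager, hostname, metadata, label_config):
        with task_manager.rateLimit():
            external_id = "vm-%d" % next(self._ids)
            # The metadata must be stored so it can be returned later.
            self._instances[external_id] = {
                "hostname": hostname,
                "metadata": dict(metadata),
            }

    def deleteInstance(self, task_manager, external_id):
        with task_manager.rateLimit():
            self._instances.pop(external_id, None)

    def listInstances(self, task_manager):
        with task_manager.rateLimit():
            return [dict(record, external_id=external_id)
                    for external_id, record in self._instances.items()]


tm = FakeTaskManager()
adapter = InMemoryAdapter(provider=None)
adapter.createInstance(tm, "node-1", {"nodepool_node_id": "0000000001"}, None)
listed = adapter.listInstances(tm)
adapter.deleteInstance(tm, listed[0]["external_id"])
remaining = adapter.listInstances(tm)
```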

class nodepool.driver.simple.SimpleTaskManagerDriver

Subclass this to make a simple driver

getAdapter(provider_config)

Instantiate an adapter

Parameters

provider_config (ProviderConfig) – An instance of ProviderConfig previously returned by getProviderConfig().

Returns

An instance of SimpleTaskManagerAdapter

getProvider(provider_config)

Return a provider.

Usually this method does not need to be overridden.

getProviderConfig(provider)

Instantiate a config object

Parameters

provider (dict) – A dictionary of YAML config describing the provider.

Returns

A ProviderConfig instance with the parsed data.

State Machine Drivers

Note

This system is still in development and lacks robust support for quotas or image building.

To use this system, you will need to implement a few subclasses. First, create a ProviderConfig subclass as you would for any driver.

Then, subclass Instance to map remote instance data into a format the driver can understand.

Next, create two subclasses of StateMachine to implement creating and deleting instances.

Subclass Adapter to implement the main methods that interact with the cloud.

Finally, subclass StateMachineDriver to tie them all together.

See the example provider for an example.
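
A creation state machine can be pictured as an object that is advanced repeatedly, doing one non-blocking step per call. The sketch below is illustrative only: the StateMachine interface is not described in this guide, so the advance() method, the complete flag and the state names are assumptions, and FakeCloud is an invented stand-in for a cloud API.

```python
class FakeCloud:
    """Hypothetical cloud that reports ready after a number of polls."""

    def __init__(self, polls_until_ready):
        self.polls_left = polls_until_ready

    def start_create(self, hostname):
        return "id-" + hostname

    def is_ready(self, external_id):
        self.polls_left -= 1
        return self.polls_left <= 0


class CreateStateMachine:
    START = "start"
    CREATING = "creating"
    COMPLETE = "complete"

    def __init__(self, cloud, hostname):
        self.cloud = cloud
        self.hostname = hostname
        self.state = self.START
        self.external_id = None
        self.complete = False

    def advance(self):
        # Each call performs at most one step and never blocks.
        if self.state == self.START:
            self.external_id = self.cloud.start_create(self.hostname)
            self.state = self.CREATING
        elif self.state == self.CREATING:
            if self.cloud.is_ready(self.external_id):
                self.state = self.COMPLETE
                self.complete = True


machine = CreateStateMachine(FakeCloud(polls_until_ready=2), "node-1")
steps = 0
while not machine.complete:
    machine.advance()
    steps += 1
```

Structuring creation this way would let one thread interleave many pending launches, since no single call waits for the cloud to finish.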

class nodepool.driver.statemachine.Instance

Represents a cloud instance

This class is used by the State Machine Driver classes to represent a standardized version of a remote cloud instance. Implement this class in your driver, override the load() method, and supply as many of the fields as possible.

The following attributes are required:

  • ready: bool (whether the instance is ready)

  • deleted: bool (whether the instance is in a deleted state)

  • external_id: str (the unique id of the instance)

  • interface_ip: str

  • metadata: dict

The following are optional:

  • public_ipv4: str

  • public_ipv6: str

  • private_ipv4: str

  • az: str

  • region: str

  • driver_data: any

And the following are even more optional (as they are usually already set from the image configuration):

  • username: str

  • python_path: str

  • shell_type: str

  • connection_port: str

  • connection_type: str

  • host_keys: [str]

getQuotaInformation()

Return quota information about this instance.

Returns

A QuotaInformation object.

class nodepool.driver.statemachine.StateMachine

class nodepool.driver.statemachine.Adapter(provider_config)

Cloud adapter for the State Machine Driver

This class will be instantiated once for each Nodepool provider. It may be discarded and replaced if the configuration changes.

You may establish a single long-lived connection to the cloud in the initializer if you wish.

Parameters

provider_config (ProviderConfig) – A config object representing the provider.

deleteImage(external_id)

Delete an image from the cloud

Parameters

external_id (str) – The external id of the image to delete

deleteResource(resource)

Delete the supplied resource

The driver has identified a leaked resource and the adapter should delete it.

Parameters

resource (Resource) – A Resource object previously returned by listResources().

getCreateStateMachine(hostname, label, image_external_id, metadata, retries, log)

Return a state machine suitable for creating an instance

This method should return a new state machine object initialized to create the described node.

Parameters
  • hostname (str) – The hostname of the node.

  • label (ProviderLabel) – A config object representing the provider-label for the node.

  • image_external_id (str) – If provided, the external id of a previously uploaded image; if None, then the adapter should look up a cloud image based on the label.

  • metadata (dict) – A dictionary of metadata that must be stored on the instance in the cloud. The same data must be able to be returned later on Instance objects returned from listInstances.

  • retries (int) – The number of attempts which should be made to launch the node.

  • log (Logger) – A logger instance for emitting annotated logs related to the request.

Returns

A StateMachine object.

getDeleteStateMachine(external_id, log)

Return a state machine suitable for deleting an instance

This method should return a new state machine object initialized to delete the described instance.

Parameters
  • external_id (str) – The external_id of the instance, as supplied by a creation StateMachine or an Instance.

  • log (Logger) – A logger instance for emitting annotated logs related to the request.

getQuotaForLabel(label)

Return information about the quota used for a label

The default implementation returns a simple QuotaInformation for one instance; override this to return more detailed information including cores and RAM.

Parameters

label (ProviderLabel) – A config object describing a label for an instance.

Returns

A QuotaInformation object.

getQuotaLimits()

Return the quota limits for this provider

The default implementation returns a simple QuotaInformation with no limits. Override this to provide accurate information.

Returns

A QuotaInformation object.

listInstances()

Return an iterator of instances accessible to this provider.

The yielded values should represent all instances accessible to this provider, not only those under the control of this adapter, so that quota can be calculated accurately.

Returns

A generator of Instance objects.

listResources()

Return an iterator of resources accessible to this provider.

The yielded values should represent all resources accessible to this provider, not only those under the control of this adapter, so that the driver can identify leaked resources and instruct the adapter to remove them.

Returns

A generator of Resource objects.

stop()

Release any resources as this provider is being stopped

uploadImage(provider_image, image_name, filename, image_format=None, metadata=None, md5=None, sha256=None)

Upload the image to the cloud

Parameters
  • provider_image (ProviderImageConfig) – The provider’s config for this image

  • image_name (str) – The name of the image

  • filename (str) – The path to the local file to be uploaded

  • image_format (str) – The format of the image (e.g., “qcow”)

  • metadata (dict) – A dictionary of metadata that must be stored on the image in the cloud.

  • md5 (str) – The md5 hash of the image file

  • sha256 (str) – The sha256 hash of the image file

Returns

The external id of the image in the cloud
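
An adapter that wants to check the supplied md5 and sha256 values against the local file could compute them as sketched below. The compute_image_hashes helper is hypothetical; how Nodepool itself produces the values it passes in is not described here.

```python
import hashlib


def compute_image_hashes(filename, chunk_size=1024 * 1024):
    """Return (md5, sha256) hex digests of a file, read in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(filename, "rb") as f:
        # Image files can be large, so avoid loading them whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()
```

Inside uploadImage, the results could then be compared with the md5 and sha256 arguments before starting the transfer, skipping the check when they are None.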

class nodepool.driver.statemachine.StateMachineDriver

Entrypoint for a state machine driver

getAdapter(provider_config)

Instantiate an adapter

Parameters

provider_config (ProviderConfig) – An instance of ProviderConfig previously returned by getProviderConfig().

Returns

An instance of Adapter

getProvider(provider_config)

Return a Provider instance

Parameters

provider_config (ProviderConfig) – A ProviderConfig instance

getProviderConfig(provider)

Instantiate a config object

Parameters

provider (dict) – A dictionary of YAML config describing the provider.

Returns

A ProviderConfig instance with the parsed data.