Cinder Newton Design Summit Summary
At the Design Summit in Austin, the Cinder team met over three days
to go over a variety of topics. This is a general summary of the
notes captured from each session.
We were also able to record most sessions. Please see the
openstack-cinder YouTube channel for all its minute and tedious
Replication Next Steps
Replication v2.1 was added in Mitaka. This was a first step in supporting
a simplified use case. A few drivers were able to implement support for
this in Mitaka, with a few already in the queue for support in Newton.
There is a desire to add the ability to replicate smaller groups of volumes
and control them individually for failover, failback, etc. Eventually we
would also want to expose this functionality to non-admin users. This will
allow tenants to group their volumes by application workload or other user
specific constraint and give them control over managing that workload.
It was agreed that it is too soon to expose this at this point. We would
first like to get broader vendor support for the current replication
capabilities before we add anything more. We also want to improve the admin
experience with handling full site failover. As it is today, there is a lot
of manual work that the admin would need to do to be able to fully recover
from a failover. There are ways we can make this experience better. So before
we add additional things on top of replication, we want to make sure what
we have is solid and at least slightly polished.
Personally, I would like to see some work done with Nova or some third party
entity like Smaug or other projects to be able to coordinate activities on
the compute and storage sides in order to fail over an environment completely
from a primary to secondary location.
Related to the group replication (tiramisu) work was the idea of generic
volume groups. Some sort of grouping mechanism would be required to tie in
to that. We have a grouping today with consistency groups, but that has its
own set of semantics and expectations that doesn’t always fully mesh with
what users would want for group replication.
There have also been others looking at using consistency groups to enable
vendor specific functionality not quite inline with the intent of what
CGs are meant for.
We plan on creating a new concept of a group that has a set of possible types.
One of these types will be consistency, with the goal that internally we can
shift things around to convert our current CG concept to be a group of type
consistency while still keeping the API interface that users are used to for
working with them.
But beyond that we will be able to add things like a “replication” type that
will allow users to group volumes, that may or not be able to be snapped in
a IO order consistent manner, but that can be acted on as a group to be
replicated. We can also expand this group type to other concepts moving
forward to meet other use cases without needing to introduce a wholly new
concept. The mechanisms for managing groups will already be in place and a new
type will be able to be added using existing plumbing.
Active/Active High Availability
Work continues on HA. Gorka gave an overview of the work completed so far and
the work left to do. We are still on the plan proposed at the Tokyo Summit,
just a lot of work to get it all implemented. The biggest variations are around
the host name used for the “clustered” service nodes and the idea that we will
not attempt to do any sort of automatic cleanup for in-progress work that gets
orphaned due to a node failure.
Two sessions were devoted to going over what had changed in Mitaka. There were
a lot of things introduced that developers and code reviewers now need to be
aware of, so we wanted to spend some time educating everyone on these things.
Conditional DB Updates
To try to eliminate races (partly related to the HA work) we will now use
conditional updates. This will eliminate the gap between checking a value in
setting it, making it one atomic DB update. Better performance than locking
API microversions was implemented in Mitaka. The new /v3 endpoint should be
used. Any change in the API should now be implemented as a micrversion bump.
Devref in Cinder with details of how to use this and more detail as to when
a microversion is needed and when it is not.
Devref added for rolling upgrades and versioned objects. Discussed need to make
incremental DB changes rather than all in one release. First release add new
colume – write to both, read from original. Second release – write to both,
read from new column. Third release – original column can now be deleted.
Recommended service upgrade order: cinder-api, cinder-scheduler, cinder-volume,
cinder-backup. After all services upgraded, send SIGHUP to each to release
Multinode grenade tests need to be set up to get regular testing on rolling
upgrades to ensure we don’t let anything through that will cause problems.
Some additional patches are in progress to fix a few internal things like
unversioned dicts that are passed around. This will help us make sure we don’t
change one of those structures in an incompatible way.
The backup service was decoupled from the c-vol service in Mitaka. This allows
us to move backup to other nodes to offload that work. We can also scale out
to allow more backup operations to work in parallel.
Also some discussion of whether KVMs change block tracking could be used to
further optimize this process.
Some other issues were identified with backup. It is currently a sequential
process, so backups can take a long time. Some work is being done to break out
the backup chunks into separate processes to allow concurrent processing.
The idea of having different backup types was brought up. This will require
more discussion. The idea is to have a way to configure different volumes to
be able to be backed up to different backup target types (i.e., one to Swift
backend, one to Ceph backend, one to GCS, etc.).
There are some implications for the HA work being done and how to do clean up.
Those issues will need to be fully identified and worked through.
We discussed our various available testing and how we can get better test
coverage. We also want to optimize the testing process so everyone isn’t stuck
running long running tests just to validate small changes.
For unit tests, tests that take more than 3-5 seconds to run will be moved out
to our integration tests. Most of these are doing more than just “unit” testing
so makes more sense for them to be separated out.
There’s some desire to gate on coverage, but it was discussed how that can be
a difficult thing to enforce. There was a lot of concern that a hard policy
around coverage would lead to a lot of useless unit tests that don’t really add
valuable testing, just adds coverage over a given path to get the numbers up.
It may be nice to have something like our non-voting pylint jobs to help
detect when our test coverage decreases, but not sure if the additional infra
load to run this would be worth it.
We have a functional test job that is currently non-voting. It has been broken
recently and due to it being non-voting it was missed for some time. This needs
to be fixed and the job changed to be voting. Since the majority of these tests
were moved from the unit tests, they should be fine to make voting once passing
Having in-tree tempest tests were also discussed. We’ve used tempest for
testing things not in line with the mission of tempest. We also have tests in
there that may not really be relevant or needed for other projects getting
them by running all tempest tests. For the ones that are really Cinder specific
we should migrate them from being in tempest to being in tree with Cinder. That
also gives us full control over what is included.
We also discussed plans for how to test API microversions and adding pylint
testing to os-brick.
CinderClient and OpenStackClient
A group of folks are working through a list of commands to make sure osc has
functional parity with cinderclient. We need to have the same level of support
in osc before we can even consider deprecating the cinder client CLI.
The idea was raised to create a wiki page giving a “lookup” of cinder client
commands to openstackclient commands to help users migrate over to the new
client. Will need to decided how to support this given long term plans for the
wiki, but having the lookup posted somewhere should help.
We do have an extra challenge in that we added a cinderclient extension for
working with os-brick as a stand alone storage management tool. We will work
with the osc team to see what the options are for supporting something like
this and decide what the best end user experience would be for this. It may be
that we don’t deprecate python-cinderclient CLI just for this use case.
Made midcycle plans based on survey results. (Has since changed since we ran in
to complications with hotal availability)
Talked breifly about non-Cinder volume replication and how to recover during
DR. Discussed whether all backends support managing existing volumes. How to
discover available volumes. What to do with snapshots on manage.
Nova Cross Project
All about multiattach. Matt Riedman wrote up a nice recap of this discussion
API capability and feature discovery.
What to do for backends that only support allocation in units greater than 1G.
Next steps for user message reporting.
- User messaging
- Active/active high availability
- Improved functional test coverage
- Better support for cheesecake (repl v2.1)
Extending attached volume.
Discussed qemu-img convert error a couple drivers were seeing.
FICON protocol support.
Generic group discussion.
QoS support for IOPs per GB.
Per tenant QoS.
Cinder callbacks to Nova for task completion.
Config options inheritance (defined in Default, apply to subsections)
Status of image caching.
Driver interface compliance checks and interface documentation.