ddiss/linux.git
8 years agotarget/rbd: add pr_read_reservation support christie_dfuller_lio_rbd_module_scsi2_reserve_v5
David Disseldorp [Wed, 26 Aug 2015 20:27:27 +0000 (22:27 +0200)]
target/rbd: add pr_read_reservation support

Add a tcm_rbd_pr_ops handler for PR READ RESERVATION requests.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agotarget/pr: add target_pr_ops read reservation hook
David Disseldorp [Wed, 26 Aug 2015 20:24:37 +0000 (22:24 +0200)]
target/pr: add target_pr_ops read reservation hook

The new pr_read_reservation hook is conditionally invoked on an incoming
PERSISTENT RESERVE IN command with a READ RESERVATION service action.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agotarget/rbd: add pr_release support
David Disseldorp [Wed, 26 Aug 2015 19:54:07 +0000 (21:54 +0200)]
target/rbd: add pr_release support

Add a tcm_rbd_pr_ops handler for PR RELEASE requests.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agotarget/pr: add target_pr_ops release hook
David Disseldorp [Wed, 26 Aug 2015 18:58:42 +0000 (20:58 +0200)]
target/pr: add target_pr_ops release hook

The new pr_release hook is conditionally invoked on an incoming PERSISTENT
RESERVE OUT command with a RELEASE service action. As with RESERVE, the
reservation scope is not passed through to the backend - LU_SCOPE is
validated beforehand.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agotarget/rbd: add pr_reserve support
David Disseldorp [Wed, 26 Aug 2015 18:44:24 +0000 (20:44 +0200)]
target/rbd: add pr_reserve support

Add a tcm_rbd_pr_ops handler for PR reserve requests. The on-disk PR
info xattr format is updated to accommodate a reservation key, IT nexus
and type, alongside existing registration information.
The on-disk format change is done without a corresponding version bump,
as this version zero is as yet unreleased.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agotarget/pr: add target_pr_ops reservation hook
David Disseldorp [Wed, 26 Aug 2015 18:35:42 +0000 (20:35 +0200)]
target/pr: add target_pr_ops reservation hook

The new pr_reserve hook is conditionally invoked on an incoming PERSISTENT
RESERVE OUT command with a RESERVE service action. The reservation type
and scope fields are validated prior to invocation.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agotarget/rbd: add support for PR register & read keys
David Disseldorp [Tue, 25 Aug 2015 14:41:26 +0000 (16:41 +0200)]
target/rbd: add support for PR register & read keys

Store and retrieve persistent reservation information in a "pr_info"
xattr on the rbd header object.
Persistent reservation information is encoded in an ASCII string, in
order to use the atomic compare-and-set functionality offered by Ceph
OSDs.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agotarget/pr: add PR registration & read keys hooks
David Disseldorp [Tue, 25 Aug 2015 14:10:38 +0000 (16:10 +0200)]
target/pr: add PR registration & read keys hooks

Allow for backends to separately handle persistent reservation
registration and read keys requests. This will allow for the future
propagation of such requests through the block layer.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: add getxattr device attribute for debugging christie_dfuller_lio_rbd_module_scsi2_reserve_v2
David Disseldorp [Sat, 22 Aug 2015 16:52:53 +0000 (18:52 +0200)]
rbd: add getxattr device attribute for debugging

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: add rbd_dev_getxattr() helper
David Disseldorp [Sat, 22 Aug 2015 16:43:42 +0000 (18:43 +0200)]
rbd: add rbd_dev_getxattr() helper

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agoceph/osd_client: add support for CEPH_OSD_OP_GETXATTR
David Disseldorp [Sun, 23 Aug 2015 22:32:00 +0000 (00:32 +0200)]
ceph/osd_client: add support for CEPH_OSD_OP_GETXATTR

Allows for xattr retrieval. Response data buffer allocation is the
responsibility of the osd_req_op_xattr_init() caller.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: add cmpsetattr device attribute for debugging
David Disseldorp [Sat, 22 Aug 2015 13:30:33 +0000 (15:30 +0200)]
rbd: add cmpsetattr device attribute for debugging

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: add rbd_dev_cmpsetxattr helper
David Disseldorp [Sat, 22 Aug 2015 13:26:22 +0000 (15:26 +0200)]
rbd: add rbd_dev_cmpsetxattr helper

Allows for the atomic update of an xattr, by comparing against an
existing string, and writing the new value only if the old value
matches.
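
The compare-and-set semantics described above can be sketched in userspace C. This is only an illustrative model, not the rbd_dev_cmpsetxattr implementation: the xattr_slot type, the xattr_cmpset name and the error codes are assumptions, and the real operation is made atomic by the OSD rather than by the caller.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a single xattr on an object. */
struct xattr_slot {
	char value[64];
};

/*
 * Store new_val only if the current value equals old_val.
 * Returns 0 on success, -ECANCELED if the comparison fails (error code
 * chosen for illustration), or -ERANGE if new_val does not fit.
 * Not thread-safe: atomicity is assumed to come from the storage side.
 */
static int xattr_cmpset(struct xattr_slot *slot, const char *old_val,
			const char *new_val)
{
	if (strlen(new_val) >= sizeof(slot->value))
		return -ERANGE;
	if (strcmp(slot->value, old_val) != 0)
		return -ECANCELED;
	strcpy(slot->value, new_val);
	return 0;
}
```

A caller that loses the race sees the comparison fail, re-reads the current value and retries with it as the new old_val.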

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: add setxattr device attribute for debugging
David Disseldorp [Fri, 21 Aug 2015 16:09:28 +0000 (18:09 +0200)]
rbd: add setxattr device attribute for debugging

Allows for the testing of the kernel RADOS setxattr functionality.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: add rbd_dev_setxattr() helper
David Disseldorp [Fri, 21 Aug 2015 16:07:32 +0000 (18:07 +0200)]
rbd: add rbd_dev_setxattr() helper

To be used for persistent reservation state storage in future.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agotarget: call check_conflict hook before processing PRs
David Disseldorp [Fri, 21 Aug 2015 09:52:39 +0000 (11:52 +0200)]
target: call check_conflict hook before processing PRs

Use the type flag added in the previous commit to ensure that there are
no SCSI2 reservations prior to processing a PR IN/OUT request.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agotarget: check_conflict() arg for any initiator
David Disseldorp [Fri, 21 Aug 2015 09:07:08 +0000 (11:07 +0200)]
target: check_conflict() arg for any initiator

A PR request attempt must fail if a SCSI2 reservation (from any initiator)
is already in place. Add a new target_pr_check_type enum argument to the
check_conflict() target_pr_ops function to support this behaviour.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agotarget/rbd: add support for clustered SCSI2 reservations
David Disseldorp [Wed, 19 Aug 2015 13:46:12 +0000 (15:46 +0200)]
target/rbd: add support for clustered SCSI2 reservations

Convert reserve/release requests into corresponding RADOS lock
operations. Locks carry a name of "scsi2_reserve", and a cookie derived
from the initiator name and session identifier.

All SCSI2 reservation RADOS locks are dropped on LU reset, via the pr_ops
reset hook.

Support for persistent reservations will follow.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agotarget: use stringify.h instead of own definition
David Disseldorp [Wed, 19 Aug 2015 08:45:27 +0000 (10:45 +0200)]
target: use stringify.h instead of own definition

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agotarget: add pr_ops->reset hook
David Disseldorp [Wed, 19 Aug 2015 18:38:42 +0000 (20:38 +0200)]
target: add pr_ops->reset hook

This allows for clustered lock release on LU reset TMF.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agotarget: add backend API for reservation handling
David Disseldorp [Mon, 10 Aug 2015 23:46:08 +0000 (01:46 +0200)]
target: add backend API for reservation handling

With Persistent Reservation support moving to the block layer, it makes
sense to allow for backend modules (such as iblock) to handle
reservation requests directly.

This change only adds initial SCSI2 reserve, release and conflict check
hooks via a new target_pr_ops entry in target_backend_ops.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: add rbd_dev_check_lock_cookies helper
David Disseldorp [Thu, 20 Aug 2015 11:35:02 +0000 (13:35 +0200)]
rbd: add rbd_dev_check_lock_cookies helper

Iterate over all lockers, and return the number of occurrences of a
specific cookie.
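
As a rough illustration of the counting logic, here is a userspace sketch. The locker_model struct and count_cookie function are hypothetical names, and the real helper obtains its locker list from RADOS rather than from an array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one lock holder. */
struct locker_model {
	const char *cookie;
};

/* Return how many lockers carry exactly the given cookie. */
static int count_cookie(const struct locker_model *lockers, size_t n,
			const char *cookie)
{
	int count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(lockers[i].cookie, cookie) == 0)
			count++;
	}
	return count;
}
```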

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: use rbd_dev_[un]lock helpers
David Disseldorp [Thu, 20 Aug 2015 11:33:17 +0000 (13:33 +0200)]
rbd: use rbd_dev_[un]lock helpers

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: export rbd_dev_lock() and rbd_dev_unlock()
David Disseldorp [Thu, 20 Aug 2015 09:24:37 +0000 (11:24 +0200)]
rbd: export rbd_dev_lock() and rbd_dev_unlock()

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: export rbd_dev_break_locks()
David Disseldorp [Wed, 19 Aug 2015 18:36:49 +0000 (20:36 +0200)]
rbd: export rbd_dev_break_locks()

For use by the tcm_core_rbd module.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: add rados locking
Mike Christie [Tue, 28 Apr 2015 22:05:46 +0000 (17:05 -0500)]
rbd: add rados locking

This patch adds support for rados lock, unlock and break lock.
This will be used to sync up scsi pr info manipulation and
TMF execution.

It also adds support for list locks and get lock info, but
that and the sysfs support is only for debugging. I do not
think that we want the sysfs interface for the final version
and will remove it in the final patchset. I just kept it in
in case people wanted to test with it. Do we want it in debugfs though?

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
[ddiss@suse.de: use cls_lock.h fns, remove rbd_warn '\n's]
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agoceph/strings: export ceph_entity_type_name
David Disseldorp [Wed, 5 Aug 2015 17:56:44 +0000 (19:56 +0200)]
ceph/strings: export ceph_entity_type_name

Needed for RBD lock dumps.

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agoceph/cls_lock: Add lock iteration helper
David Disseldorp [Wed, 5 Aug 2015 17:36:02 +0000 (19:36 +0200)]
ceph/cls_lock: Add lock iteration helper

The lock iteration helper obtains the list of locks for a specific
object, and calls lock_iter_fn for each entry.
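
The callback pattern can be sketched as follows. Only the lock_iter_fn name comes from the commit message; its signature here, the for_each_lock helper and the stop-on-nonzero behaviour are assumptions made for illustration.

```c
#include <stddef.h>

/* Assumed callback shape: a non-zero return stops the iteration. */
typedef int (*lock_iter_fn)(const char *lock_name, void *arg);

/* Call fn once per lock name; propagate the first non-zero result. */
static int for_each_lock(const char *const *names, size_t n,
			 lock_iter_fn fn, void *arg)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = fn(names[i], arg);

		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count the locks seen so far in *arg. */
static int count_lock_cb(const char *lock_name, void *arg)
{
	(void)lock_name;
	(*(int *)arg)++;
	return 0;
}
```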

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agoceph/cls_lock: add lock info iteration helper
David Disseldorp [Fri, 14 Aug 2015 14:25:20 +0000 (16:25 +0200)]
ceph/cls_lock: add lock info iteration helper

Based on ceph_cls_get_lock_info(), but offers a per locker callback, and
fixes some bugs:
- memleak of req/rsp pages
- no check for lock tag string decode errors

Signed-off-by: David Disseldorp <ddiss@suse.de>
8 years agocls_lock: add support for lock info decoding
Douglas Fuller [Tue, 30 Jun 2015 20:28:01 +0000 (13:28 -0700)]
cls_lock: add support for lock info decoding

Add an interface for the lock.lock_info method and associated data
structures.

Based heavily on Mike Christie's code originally authored for the previous
commit.

Signed-off-by: Douglas Fuller <dfuller@redhat.com>
[ddiss@suse.de: keep decode helper only]
Reviewed-by: David Disseldorp <ddiss@suse.de>
8 years agocls_lock: add rados locking
Douglas Fuller [Tue, 30 Jun 2015 20:28:00 +0000 (13:28 -0700)]
cls_lock: add rados locking

This patch adds support for rados lock, unlock and break lock.
This will be used to sync up scsi pr info manipulation and
TMF execution.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
[djf: broke out lock functions and moved to new source file]
[djf: changed interface to make use of osd_client utility functions]
[djf: snipped paragraph from commit message related to rbd code not moved]
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
8 years agoosd_client: added single object method call
Douglas Fuller [Tue, 30 Jun 2015 20:27:59 +0000 (13:27 -0700)]
osd_client: added single object method call

Add a convenience function to osd_client to call class ops. The interface
assumes that request and reply data each consist of single pages.

Signed-off-by: Douglas Fuller <dfuller@redhat.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
8 years agoauth.c: added ceph_entity_name_encode
Douglas Fuller [Tue, 30 Jun 2015 20:27:58 +0000 (13:27 -0700)]
auth.c: added ceph_entity_name_encode

Signed-off-by: Douglas Fuller <dfuller@redhat.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
8 years agoceph: add start/finish encoding helpers
Mike Christie [Tue, 30 Jun 2015 20:27:57 +0000 (13:27 -0700)]
ceph: add start/finish encoding helpers

This patch adds helpers to encode/decode the starting blocks used by the
locking code. They are the equivalent of ENCODE_START and
DECODE_START_LEGACY_COMPAT_LEN in the userspace ceph code.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
[djf: added fixup from mailing list]
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
8 years agoAdd support for userspace ceph DECODE_START.
Mike Christie [Tue, 30 Jun 2015 20:27:56 +0000 (13:27 -0700)]
Add support for userspace ceph DECODE_START.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: David Disseldorp <ddiss@suse.de>
8 years agoosd_client: send watch ping messages
Douglas Fuller [Wed, 17 Jun 2015 14:25:56 +0000 (07:25 -0700)]
osd_client: send watch ping messages

Send CEPH_OSD_WATCH_OP_PING every osd_keepalive_timeout for each watch
event registered. When errors are detected, look up the watch event and
send it CEPH_WATCH_EVENT_DISCONNECTED.

Signed-off-by: Douglas Fuller <dfuller@redhat.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
8 years agoosd_client: add support for notify payloads via notify event
Douglas Fuller [Wed, 17 Jun 2015 14:25:55 +0000 (07:25 -0700)]
osd_client: add support for notify payloads via notify event

Add support in notify events for receiving data from notify_ack. Notify
events are optional; data is discarded if no event is found.

Signed-off-by: Douglas Fuller <dfuller@redhat.com>
[ddiss@suse.de: remove rebase conflict divider from source]
Reviewed-by: David Disseldorp <ddiss@suse.de>
8 years agoosd_client, rbd: update event interface for watch/notify2
Douglas Fuller [Wed, 17 Jun 2015 14:25:54 +0000 (07:25 -0700)]
osd_client, rbd: update event interface for watch/notify2

Change unused ceph_osd_event structure to refer to pending watch/notify2
messages. Watch events include the separate watch and watch error callbacks
used for watch/notify2. Update rbd to use separate watch and watch error
callbacks via the new watch event.

Signed-off-by: Douglas Fuller <dfuller@redhat.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
8 years agoceph/rbd: update watch-notify ceph_osd_op
Mike Christie [Wed, 17 Jun 2015 14:25:53 +0000 (07:25 -0700)]
ceph/rbd: update watch-notify ceph_osd_op

This syncs the ceph_osd_op struct with the current version of ceph
where the watch struct has been updated to support more ops and
the notify-ack support has been broken out of the watch struct.

Ceph commits
1a82cc3926fc7bc4cfbdd2fd4dfee8660d5107a1
2288f318e1b1f6a1c42b185fc1b4c41f23995247
73720130c34424bf1fe36058ebe8da66976f40fb

It still has us use the legacy watch op for now. I will add support
later. It is mostly a preparation patch for more advanced notify support.

Questions:

1. Should linger also be set for CEPH_OSD_WATCH_OP_RECONNECT?
2. Not sure what watch.gen is used for. Is that for our internal
use or does the osd do something with it?

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
[djf: moved watch event definitions to ceph_fs.h]
[djf: removed changes to rbd.c for SCSI]
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
8 years agoceph/rbd: add support for header version 2 and 3
Mike Christie [Wed, 17 Jun 2015 14:25:52 +0000 (07:25 -0700)]
ceph/rbd: add support for header version 2 and 3

This adds support for watch-notify header versions 2 and 3, so we can
get a return_code from those operations.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
[djf: fixed decoding bits]
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
8 years agoceph/rbd: add support for watch notify payloads
Mike Christie [Wed, 17 Jun 2015 14:25:51 +0000 (07:25 -0700)]
ceph/rbd: add support for watch notify payloads

This patch adds support for proto version 1 of watch-notify,
so drivers like rbd can be sent a buffer with information like
the notify operation being performed.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
8 years agotarget: add lio rbd to makefile/Kconfig
Mike Christie [Wed, 29 Jul 2015 09:23:55 +0000 (04:23 -0500)]
target: add lio rbd to makefile/Kconfig

Add lio rbd backend module to target Makefile and Kconfig.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agotarget: add rbd backend
Mike Christie [Wed, 29 Jul 2015 09:23:54 +0000 (04:23 -0500)]
target: add rbd backend

This adds a lio rbd backend. It translates scsi commands to ceph/rados
calls directly instead of going through the block layer (se_cmd -> bio ->
request -> ceph/rados).

It currently still uses the request_queue created by rbd for some setup
in tcm_rbd_configure_device and for matching a rbd device to a lio
se_device in tcm_rbd_set_configfs_dev_params.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: move structs used by lio rbd to new header
Mike Christie [Wed, 29 Jul 2015 09:23:53 +0000 (04:23 -0500)]
rbd: move structs used by lio rbd to new header

This moves structs and other definitions needed by the lio rbd
backend module to a header.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: export some functions used by lio rbd backend
Mike Christie [Wed, 29 Jul 2015 09:23:52 +0000 (04:23 -0500)]
rbd: export some functions used by lio rbd backend

The lio rbd backend will make img_request rbd calls, so this
patch exports the functions it uses.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agolibceph: fix pr_fmt compile issues
Mike Christie [Wed, 29 Jul 2015 09:23:51 +0000 (04:23 -0500)]
libceph: fix pr_fmt compile issues

When using ceph from other modules like the target ones, pr_fmt
might already be defined. This just ifndefs it.
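
The guard pattern looks roughly like this. The "libceph: " prefix and the format_msg helper are assumptions for a userspace illustration, not the exact definition used in the kernel headers.

```c
#include <stdio.h>

/* Only provide a default pr_fmt if the including module has none. */
#ifndef pr_fmt
#define pr_fmt(fmt) "libceph: " fmt
#endif

/* Hypothetical helper showing how the prefix ends up in a message. */
static int format_msg(char *out, size_t len, int osd)
{
	return snprintf(out, len, pr_fmt("osd%d down"), osd);
}
```

A module that defines its own pr_fmt before including the header keeps its prefix, and no redefinition warning is raised.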

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agotarget: add COMPARE_AND_WRITE sg creation helper
Mike Christie [Wed, 29 Jul 2015 09:23:50 +0000 (04:23 -0500)]
target: add COMPARE_AND_WRITE sg creation helper

sbc core and the rbd backend driver want separate scatterlists
for the write phase of COMPARE_AND_WRITE. This moves the sbc
code to a helper function.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agotarget: compare and write backend driver sense handling
Mike Christie [Wed, 29 Jul 2015 09:23:49 +0000 (04:23 -0500)]
target: compare and write backend driver sense handling

Currently, backend drivers seem to only fail IO with
SAM_STAT_CHECK_CONDITION which gets us
TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE.
For compare and write support we will want to be able to fail with
TCM_MISCOMPARE_VERIFY. This patch adds a new helper that allows backend
drivers to fail with specific sense codes.

It also allows the backend driver to set the miscompare offset.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agotarget: add compare and write callback
Mike Christie [Wed, 29 Jul 2015 09:23:48 +0000 (04:23 -0500)]
target: add compare and write callback

Add an sbc ops callout for compare and write commands, so backends
like rbd which support that command can execute it natively.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: add support for writesame requests
Mike Christie [Wed, 29 Jul 2015 09:23:47 +0000 (04:23 -0500)]
rbd: add support for writesame requests

This adds support for ceph writesame requests.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agolibceph: add support for write same requests
Mike Christie [Wed, 29 Jul 2015 09:23:46 +0000 (04:23 -0500)]
libceph: add support for write same requests

This adds a new ceph request writesame. Write a buffer of length
writesame.data_length bytes at writesame.offset over writesame.length
bytes.
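
The write-same semantics can be modelled in userspace like this. The write_same function and its flat byte-array "disk" are hypothetical, and truncating the final repetition when length is not a multiple of data_length is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Repeat the data_length-byte pattern in data across length bytes of
 * disk, starting at offset. Returns 0, or -EINVAL for a bad range.
 */
static int write_same(unsigned char *disk, size_t disk_size, size_t offset,
		      size_t length, const unsigned char *data,
		      size_t data_length)
{
	size_t done = 0;

	if (data_length == 0 || offset > disk_size ||
	    length > disk_size - offset)
		return -EINVAL;

	while (done < length) {
		size_t chunk = length - done;

		if (chunk > data_length)
			chunk = data_length;
		memcpy(disk + offset + done, data, chunk);
		done += chunk;
	}
	return 0;
}
```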

This command maps to SCSI's WRITE SAME request. In the next patches
rbd and lio will hook in to this support.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: add support for COMPARE_AND_WRITE/CMPEXT
Mike Christie [Wed, 29 Jul 2015 09:23:45 +0000 (04:23 -0500)]
rbd: add support for COMPARE_AND_WRITE/CMPEXT

This patch adds support to rbd for SCSI COMPARE_AND_WRITE commands. Higher
levels like LIO will work with IMG_REQ_CMP_AND_WRITE requests, but
rbd breaks it up into CMPEXT and WRITE Ceph requests.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: add num ops calculator helper
Mike Christie [Wed, 29 Jul 2015 09:23:44 +0000 (04:23 -0500)]
rbd: add num ops calculator helper

The next patches add new commands that have a different
number of ops, so this adds a helper.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: add write test helper
Mike Christie [Wed, 29 Jul 2015 09:23:43 +0000 (04:23 -0500)]
rbd: add write test helper

The next patches add a couple new commands that have write data.
This patch adds a helper to combine all the IMG_REQ write tests.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agolibceph: add support for CMPEXT compare extent requests
Mike Christie [Wed, 29 Jul 2015 09:23:42 +0000 (04:23 -0500)]
libceph: add support for CMPEXT compare extent requests

This adds support for the CMPEXT request. The request compares
extent.length bytes from the supplied buffer with extent.length bytes at
extent.offset on disk. If there is a miscompare the osd will return
-EILSEQ, the offset in the buffer where it occurred, and the buffer.
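
A userspace model of the comparison, for illustration only. The cmpext function is hypothetical, the "disk" is a plain byte array, and the caller is assumed to have validated that the extent lies within it.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Compare length bytes of buf against disk starting at offset.
 * Returns 0 on a match. On the first differing byte, stores its
 * position within buf in *mismatch_off and returns -EILSEQ.
 */
static int cmpext(const unsigned char *disk, size_t offset,
		  const unsigned char *buf, size_t length,
		  size_t *mismatch_off)
{
	size_t i;

	for (i = 0; i < length; i++) {
		if (disk[offset + i] != buf[i]) {
			*mismatch_off = i;
			return -EILSEQ;
		}
	}
	return 0;
}
```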

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agolibceph: support bidirectional requests
Mike Christie [Wed, 29 Jul 2015 09:23:41 +0000 (04:23 -0500)]
libceph: support bidirectional requests

The next patch will add support for SCSI's compare and write
command. This command sends N bytes, compares them to N bytes on disk,
then returns success or the offset in the buffer where a miscompare
occurred. For Ceph support, I implemented this as a multiple op request:

1. a new CMPEXT (compare extent) operation that compares N bytes
and, if a miscompare occurred, returns the offset at which it miscompared
and also returns the buffer.
2. a write request. If the CMPEXT succeeds then this will be executed.

This patch modifies libceph so it can support both a request buffer
and response buffer for extent based IO, so the CMPEXT command can
send its comparison buffer and also receive the failed buffer if needed.
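
The two-op sequence described above can be sketched in userspace. The compare_and_write function is a hypothetical model over a plain byte array. It shows only the ordering (compare first, write only on success), not the libceph request plumbing.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Op 1: compare length bytes of cmp_buf with disk at offset.
 * Op 2: if they match, overwrite that range with write_buf.
 * On a miscompare, nothing is written, *miscompare_off receives the
 * offset of the first differing byte, and -EILSEQ is returned.
 */
static int compare_and_write(unsigned char *disk, size_t offset,
			     const unsigned char *cmp_buf,
			     const unsigned char *write_buf, size_t length,
			     size_t *miscompare_off)
{
	size_t i;

	for (i = 0; i < length; i++) {
		if (disk[offset + i] != cmp_buf[i]) {
			*miscompare_off = i;
			return -EILSEQ;
		}
	}
	memcpy(disk + offset, write_buf, length);
	return 0;
}
```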

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: add lio specific data area
Mike Christie [Wed, 29 Jul 2015 09:23:40 +0000 (04:23 -0500)]
rbd: add lio specific data area

The LIO RBD backend is going to make img_request calls, so this patch
adds a pointer so it can store its cmd for completions.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agorbd: add support for scatterlist obj_request_type
Mike Christie [Wed, 29 Jul 2015 09:23:39 +0000 (04:23 -0500)]
rbd: add support for scatterlist obj_request_type

This adds support for a scatterlist rbd obj_request_type, so LIO
can pass down its sg to rbd.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agolibceph: add scatterlist messenger data type
Mike Christie [Wed, 29 Jul 2015 09:23:38 +0000 (04:23 -0500)]
libceph: add scatterlist messenger data type

LIO uses scatterlist for its page/data management. This patch
adds a scatterlist messenger data type, so LIO can pass its sg
down directly to rbd.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: David Disseldorp <ddiss@suse.de>
8 years agoMerge tag 'dmaengine-fix-4.2-rc8' of git://git.infradead.org/users/vkoul/slave-dma
Linus Torvalds [Tue, 18 Aug 2015 19:17:36 +0000 (12:17 -0700)]
Merge tag 'dmaengine-fix-4.2-rc8' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fix from Vinod Koul:
 "We recently found issue with dma_request_slave_channel() API causing
  privatecnt value to go bad.  This is fixed by balancing the privatecnt"

* tag 'dmaengine-fix-4.2-rc8' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: fix balance of privatecnt inc/dec operations

8 years agoMerge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Tue, 18 Aug 2015 14:55:05 +0000 (07:55 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "These came in late last week, I wanted to look over the mst one before
  forwarding, but it seems good.

  Just three i915 and one MST fix"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/i915: Commit planes on each crtc separately.
  drm/i915: calculate primary visibility changes instead of calling from set_config
  drm/i915: Only dither on 6bpc panels
  drm/dp/mst: Remove port after removing connector.

8 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Linus Torvalds [Mon, 17 Aug 2015 23:26:30 +0000 (16:26 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma bugfix from Doug Ledford:
 "Bugfix in iw_cxgb4"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  iw_cxgb4: gracefully handle unknown CQE status errors

8 years agoMerge branch 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj...
Linus Torvalds [Mon, 17 Aug 2015 23:20:45 +0000 (16:20 -0700)]
Merge branch 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata

Pull libata fixes from Tejun Heo:
 "Three minor device-specific fixes and revert of NCQ autosense added
  during this -rc1.

  It turned out that NCQ autosense as currently implemented interferes
  with the usual error handling behavior.  It will be revisited in the
  near future"

* 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  ata: ahci_brcmstb: Fix misuse of IS_ENABLED
  sata_sx4: Check return code from pdc20621_i2c_read()
  Revert "libata: Implement NCQ autosense"
  Revert "libata: Implement support for sense data reporting"
  Revert "libata-eh: Set 'information' field for autosense"
  ata: ahci_brcmstb: Fix warnings with CONFIG_PM_SLEEP=n

8 years agoMerge branch 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj...
Linus Torvalds [Mon, 17 Aug 2015 23:15:26 +0000 (16:15 -0700)]
Merge branch 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fix from Tejun Heo:
 "A fix for a subtle bug introduced back during 3.17 cycle which
  interferes with setting configurations under specific conditions"

* 'for-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cpuset: use trialcs->mems_allowed as a temp variable

8 years agodmaengine: fix balance of privatecnt inc/dec operations
Robert Baldyga [Fri, 7 Aug 2015 10:26:47 +0000 (12:26 +0200)]
dmaengine: fix balance of privatecnt inc/dec operations

This patch increments the privatecnt value and sets DMA_PRIVATE in the
device caps in the dma_request_slave_channel() function. This is needed to
keep the privatecnt increment/decrement balance.

As function dma_release_channel() decrements the privatecnt counter, we need
to increment it when a channel is requested. Otherwise privatecnt drops
into negatives after a few dma_release_channel() calls.
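
The balance argument can be illustrated with a small userspace model. The dma_dev_model struct and both functions are hypothetical stand-ins. Clearing the private flag when the counter reaches zero is an assumption about the release path, not a copy of the dmaengine code.

```c
/* Hypothetical stand-in for the per-device state involved. */
struct dma_dev_model {
	int privatecnt;
	int is_private;	/* models the DMA_PRIVATE capability bit */
};

/* Request path after the fix: mark private and take a count. */
static void request_slave_channel_model(struct dma_dev_model *d)
{
	d->is_private = 1;
	d->privatecnt++;
}

/* Release path: drop the count, clear the flag on the last release. */
static void release_channel_model(struct dma_dev_model *d)
{
	if (--d->privatecnt == 0)
		d->is_private = 0;
}
```

Without the increment in the request path, each request/release pair would leave privatecnt one lower than before.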

Reported-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Robert Baldyga <r.baldyga@samsung.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
8 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Mon, 17 Aug 2015 14:57:46 +0000 (07:57 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "This fixes the following issues:

   - a regression caused by the conversion of IPsec ESP to the new AEAD
     interface: ESN with authencesn no longer works because it relied on
     the AD input SG list having a specific layout which is no longer
     the case.  In linux-next authencesn is fixed properly and no longer
     assumes anything about the SG list format.  While for this release
     a minimal fix is applied to authencesn so that it works with the
     new linear layout.

   - fix memory corruption caused by bogus index in the caam hash code.

   - fix powerpc nx SHA hashing which could cause module load failures
     if module signature verification is enabled"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: caam - fix memory corruption in ahash_final_ctx
  crypto: nx - respect sg limit bounds when building sg lists for SHA
  crypto: authencesn - Fix breakage with new ESP code

8 years agoLinux 4.2-rc7
Linus Torvalds [Sun, 16 Aug 2015 23:34:13 +0000 (16:34 -0700)]
Linux 4.2-rc7

8 years agoMerge tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm...
Linus Torvalds [Sun, 16 Aug 2015 22:44:33 +0000 (15:44 -0700)]
Merge tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "A smallish batch of fixes, a little more than expected this late, but
  all fixes are contained to their platforms and seem reasonably low
  risk:

   - a somewhat large SMP fix for ux500 that still seemed warranted to
     include here
   - OMAP DT fixes for pbias regulator specification that broke due to
     some DT reshuffling
   - PCIe IRQ routing bugfix for i.MX
   - networking fixes for keystone
   - runtime PM for OMAP GPMC
   - a couple of error path bug fixes for exynos"

* tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: dts: keystone: Fix the mdio bindings by moving it to soc specific file
  ARM: dts: keystone: fix the clock node for mdio
  memory: omap-gpmc: Don't try to save uninitialized GPMC context
  ARM: imx6: correct i.MX6 PCIe interrupt routing
  ARM: ux500: add an SMP enablement type and move cpu nodes
  ARM: dts: dra7: Fix broken pbias device creation
  ARM: dts: OMAP5: Fix broken pbias device creation
  ARM: dts: OMAP4: Fix broken pbias device creation
  ARM: dts: omap243x: Fix broken pbias device creation
  ARM: EXYNOS: fix double of_node_put() on error path
  ARM: EXYNOS: Fix potentian kfree() of ro memory

8 years agoMerge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Linus Torvalds [Sun, 16 Aug 2015 22:39:31 +0000 (15:39 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

Pull MIPS bugfix from Ralf Baechle:
 "Only a single MIPS fix - the math when invoking syscall_trace_enter
  was wrong"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: Fix seccomp syscall argument for MIPS64

8 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 16 Aug 2015 22:11:25 +0000 (15:11 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Merge x86 fixes from Ingo Molnar:
 "Two followup fixes related to the previous LDT fix"

Also applied a further FPU emulation fix from Andy Lutomirski to the
branch before actually merging it.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
  x86/ldt: Further fix FPU emulation
  x86/ldt: Correct FPU emulation access to LDT
  x86/ldt: Correct LDT access in single stepping logic

8 years agox86/ldt: Further fix FPU emulation
Andy Lutomirski [Fri, 14 Aug 2015 22:02:55 +0000 (15:02 -0700)]
x86/ldt: Further fix FPU emulation

The previous fix confused a selector with a segment prefix.  Fix it.

Compile-tested only.

Cc: stable@vger.kernel.org
Cc: Juergen Gross <jgross@suse.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 4809146b86c3 ("x86/ldt: Correct FPU emulation access to LDT")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agofs/fuse: fix ioctl type confusion
Jann Horn [Sun, 16 Aug 2015 18:27:01 +0000 (20:27 +0200)]
fs/fuse: fix ioctl type confusion

fuse_dev_ioctl() performed fuse_get_dev() on a user-supplied fd,
leading to a type confusion issue. Fix it by checking file->f_op.
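
The idea behind the f_op check can be pictured with a small userspace
model. Everything below (struct demo_file, demo_dev_ops,
demo_get_dev_checked()) is a hypothetical stand-in and not the fuse code
itself; it only illustrates why the private data of a user-supplied file
should not be interpreted before its operations table has been compared:

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct file_operations and struct file. */
struct demo_ops {
        const char *name;
};

struct demo_file {
        const struct demo_ops *f_op;
        void *private_data;
};

struct demo_dev {
        int id;
};

/* The one operations table whose files carry a struct demo_dev. */
static const struct demo_ops demo_dev_ops = { "demo-dev" };

/*
 * Return the device behind @file, or NULL if @file was not opened
 * through demo_dev_ops. Without the f_op comparison, private_data of an
 * unrelated file would be treated as a struct demo_dev.
 */
static struct demo_dev *demo_get_dev_checked(const struct demo_file *file)
{
        if (file == NULL || file->f_op != &demo_dev_ops)
                return NULL;
        return file->private_data;
}
```

A caller would treat a NULL return as "not one of our files" and fail the
request instead of dereferencing the pointer.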

Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMerge tag 'keystone-dts-late-fixes-v2' of git://git.kernel.org/pub/scm/linux/kernel...
Olof Johansson [Sun, 16 Aug 2015 19:29:57 +0000 (21:29 +0200)]
Merge tag 'keystone-dts-late-fixes-v2' of git://git./linux/kernel/git/ssantosh/linux-keystone into fixes

ARM: A couple of Keystone MDIO DTS fixes for 4.2-rc6+

These are necessary to get the NIC card working on all Keystone
EVMs. A couple of boards are broken without these two fixes.

* tag 'keystone-dts-late-fixes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone:
  ARM: dts: keystone: Fix the mdio bindings by moving it to soc specific file
  ARM: dts: keystone: fix the clock node for mdio

Signed-off-by: Olof Johansson <olof@lixom.net>
8 years agoMIPS: Fix seccomp syscall argument for MIPS64
Markos Chandras [Thu, 13 Aug 2015 07:47:59 +0000 (08:47 +0100)]
MIPS: Fix seccomp syscall argument for MIPS64

Commit 4c21b8fd8f14 ("MIPS: seccomp: Handle indirect system calls (o32)")
fixed indirect system calls on O32 but it also introduced a bug for MIPS64
where it erroneously modified the v0 (syscall) register with the assumption
that the syscall offset hasn't been taken into consideration. This breaks
seccomp on MIPS64 n64 and n32 ABIs. We fix this by replacing the addition
with a move instruction.
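
The arithmetic behind the bug can be shown in a few lines of C. The names
below are hypothetical, and 5000 is assumed here as the base of the n64
syscall number range; the real change is in assembly, so this sketch only
contrasts the addition with the move:

```c
/* Assumed for illustration: base of the n64 syscall number range. */
#define DEMO_NR_64_BASE 5000L

/* Buggy variant: adds the base to a value that already contains it. */
static long demo_trace_nr_add(long v0)
{
        return v0 + DEMO_NR_64_BASE;
}

/* Fixed variant: v0 already holds the full syscall number, so copy it. */
static long demo_trace_nr_move(long v0)
{
        return v0;
}
```

With v0 equal to 5000, the buggy variant hands 10000 to the tracing code,
a number that a filter written against the real syscall numbers would not
match.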

Fixes: 4c21b8fd8f14 ("MIPS: seccomp: Handle indirect system calls (o32)")
Cc: <stable@vger.kernel.org> # 3.15+
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10951/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
8 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 15 Aug 2015 20:54:53 +0000 (13:54 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This has two libfc fixes for bugs causing rare crashes, one iscsi fix
  for a potential hang on shutdown, and a fix for an I/O blocksize issue
  which caused a regression"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  sd: Fix maximum I/O size for BLOCK_PC requests
  libfc: Fix fc_fcp_cleanup_each_cmd()
  libfc: Fix fc_exch_recv_req() error path
  libiscsi: Fix host busy blocking during connection teardown

8 years agoMerge tag 'topic/drm-fixes-2015-08-14' of git://anongit.freedesktop.org/drm-intel...
Dave Airlie [Sat, 15 Aug 2015 04:52:12 +0000 (14:52 +1000)]
Merge tag 'topic/drm-fixes-2015-08-14' of git://anongit.freedesktop.org/drm-intel into drm-next

A single MST fix from Maarten.

* tag 'topic/drm-fixes-2015-08-14' of git://anongit.freedesktop.org/drm-intel:
  drm/dp/mst: Remove port after removing connector.

8 years agoMerge tag 'drm-intel-fixes-2015-08-14' of git://anongit.freedesktop.org/drm-intel...
Dave Airlie [Sat, 15 Aug 2015 04:51:31 +0000 (14:51 +1000)]
Merge tag 'drm-intel-fixes-2015-08-14' of git://anongit.freedesktop.org/drm-intel into drm-next

Three display fixes for Intel.

* tag 'drm-intel-fixes-2015-08-14' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: Commit planes on each crtc separately.
  drm/i915: calculate primary visibility changes instead of calling from set_config
  drm/i915: Only dither on 6bpc panels

8 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sat, 15 Aug 2015 00:27:52 +0000 (17:27 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Just two very small & simple patches"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Use adjustment in guest cycles when handling MSR_IA32_TSC_ADJUST
  KVM: x86: zero IDT limit on entry to SMM

8 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 15 Aug 2015 00:05:26 +0000 (17:05 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge fixes from Andrew Morton:
 "11 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  Update maintainers for DRM STI driver
  mm: cma: mark cma_bitmap_maxno() inline in header
  zram: fix pool name truncation
  memory-hotplug: fix wrong edge when hot add a new node
  .mailmap: Andrey Ryabinin has moved
  ipc/sem.c: update/correct memory barriers
  mm/hwpoison: fix panic due to split huge zero page
  ipc,sem: remove unneeded sem_undo_list lock usage in exit_sem()
  ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits
  mm/hwpoison: fix fail isolate hugetlbfs page w/ refcount held
  mm/hwpoison: fix page refcount of unknown non LRU page

8 years agoMerge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 14 Aug 2015 23:10:04 +0000 (16:10 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux

Pull clock fix from Stephen Boyd:
 "A one-liner for a regression found in the PXA clock driver"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: pxa: pxa3xx: fix CKEN register access

8 years agoUpdate maintainers for DRM STI driver
Benjamin Gaignard [Fri, 14 Aug 2015 22:35:24 +0000 (15:35 -0700)]
Update maintainers for DRM STI driver

Add Vincent Abriou and myself as maintainers.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Cc: Vincent Abriou <vincent.abriou@st.com>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm: cma: mark cma_bitmap_maxno() inline in header
Gregory Fong [Fri, 14 Aug 2015 22:35:21 +0000 (15:35 -0700)]
mm: cma: mark cma_bitmap_maxno() inline in header

cma_bitmap_maxno() was marked as static and not static inline, which can
cause warnings about this function not being used if this file is included
in a file that does not call that function, and violates the conventions
used elsewhere.  The two options are to move the function implementation
back to mm/cma.c or make it inline here, and it's simple enough for the
latter to make sense.

Signed-off-by: Gregory Fong <gregory.0xf0@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agozram: fix pool name truncation
Sergey Senozhatsky [Fri, 14 Aug 2015 22:35:19 +0000 (15:35 -0700)]
zram: fix pool name truncation

zram_meta_alloc() constructs a pool name for zs_create_pool() call as

    snprintf(pool_name, sizeof(pool_name), "zram%d", device_id);

However, it defines the pool name buffer to be only 8 bytes long (minus
trailing zero), which means that we can have only 1000 pool names: zram0
-- zram999.

With CONFIG_ZSMALLOC_STAT enabled an attempt to create a device zram1000
can fail if device zram100 already exists, because snprintf() will
truncate the new pool name to zram100 and pass it to debugfs_create_dir(),
causing:

  debugfs dir <zram100> creation failed
  zram: Error creating memory pool

... and so on.

Fix it by passing zram->disk->disk_name to zram_meta_alloc() instead of
device_id.  We construct the zram%d name earlier and keep it as
->disk_name, so there is no need to snprintf() it again.
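
The truncation itself is easy to reproduce in userspace. The helper below
is a hypothetical sketch (demo_pool_name() is not a kernel function); it
formats the name into an 8-byte buffer as described above and uses the
snprintf() return value to detect that the name did not fit:

```c
#include <stdio.h>

/*
 * Format "zram<device_id>" into an 8-byte buffer and report whether the
 * name was truncated. Returns 1 if truncated, 0 otherwise.
 * @out must point to at least 8 bytes.
 */
static int demo_pool_name(char *out, int device_id)
{
        int needed = snprintf(out, 8, "zram%d", device_id);

        /* snprintf() returns the length it wanted to write, without NUL. */
        return needed >= 8;
}
```

For device_id 100 the name fits; for device_id 1000 the buffer ends up
holding the same seven characters, which is the collision the debugfs
error reports.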

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomemory-hotplug: fix wrong edge when hot add a new node
Xishi Qiu [Fri, 14 Aug 2015 22:35:16 +0000 (15:35 -0700)]
memory-hotplug: fix wrong edge when hot add a new node

When we add a new node, the edge of memory may be wrong.

e.g. system has 4 nodes, and node3 is movable, node3 mem:[24G-32G],

1. hotremove the node3,
2. then hotadd node3 with a part of memory, mem:[26G-30G],
3. call hotadd_new_pgdat()
        free_area_init_node()
                get_pfn_range_for_nid()
4. it will return the wrong start_pfn and end_pfn, because we have not
updated the memblock.

This patch also fixes a BUG_ON during hot-addition, please see
http://marc.info/?l=linux-kernel&m=142961156129456&w=2

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago.mailmap: Andrey Ryabinin has moved
Andrey Ryabinin [Fri, 14 Aug 2015 22:35:13 +0000 (15:35 -0700)]
.mailmap: Andrey Ryabinin has moved

Update my email address.

Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoipc/sem.c: update/correct memory barriers
Manfred Spraul [Fri, 14 Aug 2015 22:35:10 +0000 (15:35 -0700)]
ipc/sem.c: update/correct memory barriers

sem_lock() did not properly pair memory barriers:

!spin_is_locked() and spin_unlock_wait() are both only control barriers.
The code needs an acquire barrier, otherwise the cpu might perform read
operations before the lock test.

As no primitive exists inside <include/spinlock.h> and since it seems
no one wants another primitive, the code creates a local primitive within
ipc/sem.c.
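
The difference between a control dependency and an acquire barrier can be
sketched with C11 atomics. This is a userspace analogy under stated
assumptions, not the kernel code: struct demo_lock and
demo_read_after_unlocked() are hypothetical names, and the kernel uses its
own barrier primitives rather than <stdatomic.h>:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_lock {
        atomic_bool locked;
};

/*
 * Wait until @l is observed unlocked, then read *@data.
 * The relaxed load on its own only orders the loop exit (a control
 * dependency); the acquire fence is what keeps the later read of *@data
 * from being performed before the lock test.
 */
static int demo_read_after_unlocked(struct demo_lock *l, const int *data)
{
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
                ;       /* spin until the lock is seen as free */

        atomic_thread_fence(memory_order_acquire);
        return *data;
}
```

Dropping the fence would still compile and usually appear to work, which
is why this kind of missing pairing tends to go unnoticed.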

With regards to -stable:

The change of sem_wait_array() is a bugfix, the change to sem_lock() is a
nop (just a preprocessor redefinition to improve the readability).  The
bugfix is necessary for all kernels that use sem_wait_array() (i.e.:
starting from 3.10).

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org> [3.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm/hwpoison: fix panic due to split huge zero page
Wanpeng Li [Fri, 14 Aug 2015 22:35:08 +0000 (15:35 -0700)]
mm/hwpoison: fix panic due to split huge zero page

Bug:

  ------------[ cut here ]------------
  kernel BUG at mm/huge_memory.c:1957!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: snd_hda_codec_hdmi i915 rpcsec_gss_krb5 snd_hda_codec_realtek snd_hda_codec_generic nfsv4 dns_re
  CPU: 2 PID: 2576 Comm: test_huge Not tainted 4.2.0-rc5-mm1+ #27
  Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
  task: ffff880204e3d600 ti: ffff8800db16c000 task.ti: ffff8800db16c000
  RIP: split_huge_page_to_list+0xdb/0x120
  Call Trace:
    memory_failure+0x32e/0x7c0
    madvise_hwpoison+0x8b/0x160
    SyS_madvise+0x40/0x240
    ? do_page_fault+0x37/0x90
    entry_SYSCALL_64_fastpath+0x12/0x71
  Code: ff f0 41 ff 4c 24 30 74 0d 31 c0 48 83 c4 08 5b 41 5c 41 5d c9 c3 4c 89 e7 e8 e2 58 fd ff 48 83 c4 08 31 c0
  RIP  split_huge_page_to_list+0xdb/0x120
   RSP <ffff8800db16fde8>
  ---[ end trace aee7ce0df8e44076 ]---

Testcase:

    #define _GNU_SOURCE
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <errno.h>
    #include <string.h>

    #define MB (1024 * 1024)

    int main(void)
    {
            char *mem;

            posix_memalign((void **)&mem, 2 * MB, 200 * MB);

            madvise(mem, 200 * MB, MADV_HWPOISON);

            free(mem);

            return 0;
    }

A huge zero page is allocated on a page fault without the
FAULT_FLAG_WRITE flag.  get_user_pages_fast(), which is called in
madvise_hwpoison(), will get the huge zero page if the page has not been
allocated before.  The huge zero page is a transparent huge page;
however, it is not an anonymous page.  memory_failure() will split the
huge zero page and trigger
BUG_ON(is_huge_zero_page(page));

After commit 98ed2b0052e6 ("mm/memory-failure: give up error handling
for non-tail-refcounted thp"), memory_failure() no longer catches non-anon
thp from the madvise_hwpoison() path, and this bug occurs.

Fix it by catching non-anon thp in memory_failure() so that the huge zero
page is not split in the madvise_hwpoison() path.

After this patch:

  Injecting memory failure for page 0x202800 at 0x7fd8ae800000
  MCE: 0x202800: non anonymous thp
  [...]

[akpm@linux-foundation.org: remove second split, per Wanpeng]
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoipc,sem: remove unneeded sem_undo_list lock usage in exit_sem()
Herton R. Krzesinski [Fri, 14 Aug 2015 22:35:05 +0000 (15:35 -0700)]
ipc,sem: remove unneeded sem_undo_list lock usage in exit_sem()

After we acquire the sma->sem_perm lock in exit_sem(), we are protected
against a racing IPC_RMID operation.  Also at that point, we are the last
user of sem_undo_list.  Therefore it isn't required that we acquire or use
ulp->lock.

Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
CC: Aristeu Rozanski <aris@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits
Herton R. Krzesinski [Fri, 14 Aug 2015 22:35:02 +0000 (15:35 -0700)]
ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits

The current semaphore code allows a potential use after free: in
exit_sem() we may free the task's sem_undo_list while another task is
still looping through the same semaphore set and cleaning the sem_undo
list in freeary() (that task called IPC_RMID for the same semaphore
set).

For example, with a test program [1] running which keeps forking a lot
of processes (which then do a semop call with the SEM_UNDO flag), and with
the parent removing the semaphore set with IPC_RMID right afterwards, and
a kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
in the kernel log:

   Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
   000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b  kkkkkkkk.kkkkkkk
   010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
   Prev obj: start=ffff88003b45c180, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff  ...........7....
   Next obj: start=ffff88003b45c200, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff  ........h).<....
   BUG: spinlock wrong CPU on CPU#2, test/18028
   general protection fault: 0000 [#1] SMP
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: spin_dump+0x53/0xc0
   Call Trace:
     spin_bug+0x30/0x40
     do_raw_spin_unlock+0x71/0xa0
     _raw_spin_unlock+0xe/0x10
     freeary+0x82/0x2a0
     ? _raw_spin_lock+0xe/0x10
     semctl_down.clone.0+0xce/0x160
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     SyS_semctl+0x236/0x2c0
     ? syscall_trace_leave+0xde/0x130
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
   RIP  [<ffffffff810d6053>] spin_dump+0x53/0xc0
    RSP <ffff88003750fd68>
   ---[ end trace 783ebb76612867a0 ]---
   NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 3 PID: 18053 Comm: test Tainted: G      D         4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: native_read_tsc+0x0/0x20
   Call Trace:
     ? delay_tsc+0x40/0x70
     __delay+0xf/0x20
     do_raw_spin_lock+0x96/0x140
     _raw_spin_lock+0xe/0x10
     sem_lock_and_putref+0x11/0x70
     SYSC_semtimedop+0x7bf/0x960
     ? handle_mm_fault+0xbf6/0x1880
     ? dequeue_task_fair+0x79/0x4a0
     ? __do_page_fault+0x19a/0x430
     ? kfree_debugcheck+0x16/0x40
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     ? do_audit_syscall_entry+0x66/0x70
     ? syscall_trace_enter_phase1+0x139/0x160
     SyS_semtimedop+0xe/0x10
     SyS_semop+0x10/0x20
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
   Kernel panic - not syncing: softlockup: hung tasks

I wasn't able to trigger any badness on a recent kernel without the
proper debug config options enabled; however, I have softlockup reports
on some kernel versions, in the semaphore code, which are similar to the
above (the scenario is seen on some servers running IBM DB2, which uses
semaphore syscalls).

The patch here fixes the race against freeary, by acquiring or waiting
on the sem_undo_list lock as necessary (exit_sem can race with freeary,
while freeary sets un->semid to -1 and removes the same sem_undo from
list_proc or when it removes the last sem_undo).

After the patch I'm unable to reproduce the problem using the test case
[1].

[1] Test case used below:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>
    #include <sys/wait.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>
    #include <errno.h>

    #define NSEM 1
    #define NSET 5

    int sid[NSET];

    void thread()
    {
            struct sembuf op;
            int s;
            uid_t pid = getuid();

            s = rand() % NSET;
            op.sem_num = pid % NSEM;
            op.sem_op = 1;
            op.sem_flg = SEM_UNDO;

            semop(sid[s], &op, 1);
            exit(EXIT_SUCCESS);
    }

    void create_set()
    {
            int i, j;
            pid_t p;
            union {
                    int val;
                    struct semid_ds *buf;
                    unsigned short int *array;
                    struct seminfo *__buf;
            } un;

            /* Create and initialize semaphore set */
            for (i = 0; i < NSET; i++) {
                    sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
                    if (sid[i] < 0) {
                            perror("semget");
                            exit(EXIT_FAILURE);
                    }
            }
            un.val = 0;
            for (i = 0; i < NSET; i++) {
                    for (j = 0; j < NSEM; j++) {
                            if (semctl(sid[i], j, SETVAL, un) < 0)
                                    perror("semctl");
                    }
            }

            /* Launch threads that operate on semaphore set */
            for (i = 0; i < NSEM * NSET * NSET; i++) {
                    p = fork();
                    if (p < 0)
                            perror("fork");
                    if (p == 0)
                            thread();
            }

            /* Free semaphore set */
            for (i = 0; i < NSET; i++) {
                    if (semctl(sid[i], NSEM, IPC_RMID))
                            perror("IPC_RMID");
            }

            /* Wait for forked processes to exit */
            while (wait(NULL)) {
                    if (errno == ECHILD)
                            break;
            };
    }

    int main(int argc, char **argv)
    {
            pid_t p;

            srand(time(NULL));

            while (1) {
                    p = fork();
                    if (p < 0) {
                            perror("fork");
                            exit(EXIT_FAILURE);
                    }
                    if (p == 0) {
                            create_set();
                            goto end;
                    }

                    /* Wait for forked processes to exit */
                    while (wait(NULL)) {
                            if (errno == ECHILD)
                                    break;
                    };
            }
    end:
            return 0;
    }

[akpm@linux-foundation.org: use normal comment layout]
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
CC: Aristeu Rozanski <aris@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm/hwpoison: fix fail isolate hugetlbfs page w/ refcount held
Wanpeng Li [Fri, 14 Aug 2015 22:34:59 +0000 (15:34 -0700)]
mm/hwpoison: fix fail isolate hugetlbfs page w/ refcount held

Hugetlbfs pages will get a refcount in get_any_page() or
madvise_hwpoison() if soft offlining through madvise.  The refcount which
is held by the soft offline path should be released if we fail to isolate
hugetlbfs pages.

Fix it by reducing the refcount for both isolation success and failure.
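
The intended reference balance can be modelled in a few lines. This is a
simplified sketch with hypothetical names (struct demo_page,
demo_soft_offline()); it tracks only the reference held by the soft
offline path and leaves out any reference the isolation step itself may
take:

```c
#include <errno.h>

/* Hypothetical stand-in for a page with a reference count. */
struct demo_page {
        int refcount;
};

/*
 * Model of the soft offline path: take a reference, attempt isolation,
 * and drop the reference whether or not isolation succeeded.
 * @can_isolate simulates the outcome of the isolation attempt.
 */
static int demo_soft_offline(struct demo_page *page, int can_isolate)
{
        int isolated;

        page->refcount++;               /* held by the soft offline path */
        isolated = can_isolate;         /* stand-in for the isolation step */
        page->refcount--;               /* released on success and failure */

        return isolated ? 0 : -EBUSY;
}
```

In this model the count after the call equals the count before it on both
paths, which is the property the failure path was missing.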

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org> [3.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm/hwpoison: fix page refcount of unknown non LRU page
Wanpeng Li [Fri, 14 Aug 2015 22:34:56 +0000 (15:34 -0700)]
mm/hwpoison: fix page refcount of unknown non LRU page

After trying to drain pages from the pagevec/pageset, we try to get a
reference on the page again; however, the reference count of the page is
not reduced if the page is still not on the LRU list.

Fix it by adding a put_page() to drop the page reference taken by
__get_any_page().

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org> [3.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 14 Aug 2015 18:06:43 +0000 (11:06 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
 "A single clocksource driver suspend/resume fix"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clockevents/drivers/sh_cmt: Only perform clocksource suspend/resume if enabled

8 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 14 Aug 2015 17:57:16 +0000 (10:57 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "Misc fixes: PMU driver corner cases, tooling fixes, and an 'AUX'
  (Intel PT) race related core fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/cqm: Do not access cpu_data() from CPU_UP_PREPARE handler
  perf/x86/intel: Fix memory leak on hot-plug allocation fail
  perf: Fix PERF_EVENT_IOC_PERIOD migration race
  perf: Fix double-free of the AUX buffer
  perf: Fix fasync handling on inherited events
  perf tools: Fix test build error when bindir contains double slash
  perf stat: Fix transaction length metrics
  perf: Fix running time accounting

8 years agoMerge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 14 Aug 2015 17:45:23 +0000 (10:45 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull locking fix from Ingo Molnar:
 "A single fix for a locking self-test crash"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/pvqspinlock: Fix kernel panic in locking-selftest

8 years agoMerge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Fri, 14 Aug 2015 17:39:32 +0000 (10:39 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "Back from holidays, found these in the cracks: one nouveau revert, one
  vmwgfx locking fix and a bunch of exynos fixes"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  Revert "drm/nouveau/fifo/gk104: kick channels when deactivating them"
  drm/vmwgfx: Fix execbuf locking issues
  drm/exynos/fimc: fix runtime pm support
  drm/exynos/mixer: always update INT_EN cache
  drm/exynos/mixer: correct vsync configuration sequence
  drm/exynos/mixer: fix interrupt clearing
  drm/exynos/hdmi: fix edid memory leak
  drm/exynos: gsc: fix wrong bitwise operation for swap detection

8 years agoRevert "drm/nouveau/fifo/gk104: kick channels when deactivating them"
Alexandre Courbot [Wed, 12 Aug 2015 04:17:38 +0000 (13:17 +0900)]
Revert "drm/nouveau/fifo/gk104: kick channels when deactivating them"

This reverts commit 1addc1264852

This commit seems to cause crashes in gk104_fifo_intr_runlist() by
returning 0xbad0da00 when register 0x2a00 is read. Since this commit was
intended for GM20B which is not completely supported yet, let's revert
it for the time being.

Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Afzal Mohammed <afzal.mohd.ma@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agodrm/vmwgfx: Fix execbuf locking issues
Thomas Hellstrom [Wed, 12 Aug 2015 05:31:17 +0000 (22:31 -0700)]
drm/vmwgfx: Fix execbuf locking issues

This addresses two issues that cause problems with viewperf maya-03 in
situations with memory pressure.

The first issue causes attempts to unreserve buffers if batched
reservation fails due to, for example, a signal pending. While previously
the ttm_eu api was resistant against this type of error, it no longer is,
and the lockdep code will complain about attempting to unreserve buffers
that are not reserved. The issue is resolved by not calling
ttm_eu_backoff_reservation in the buffer reserve error path.

The second issue is that the binding_mutex may be held when user-space
fence objects are created and hence during memory reclaims. This may cause
recursive attempts to grab the binding mutex. The issue is resolved by not
holding the binding mutex across fence creation and submission.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agoMerge branch 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git...
Dave Airlie [Thu, 13 Aug 2015 23:47:07 +0000 (09:47 +1000)]
Merge branch 'exynos-drm-fixes' of git://git./linux/kernel/git/daeinki/drm-exynos into drm-fixes

   This pull request fixes a memory leak and some issues in the
   mixer and gscaler drivers.

* 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
  drm/exynos/fimc: fix runtime pm support
  drm/exynos/mixer: always update INT_EN cache
  drm/exynos/mixer: correct vsync configuration sequence
  drm/exynos/mixer: fix interrupt clearing
  drm/exynos/hdmi: fix edid memory leak
  drm/exynos: gsc: fix wrong bitwise operation for swap detection

8 years agoMerge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Linus Torvalds [Thu, 13 Aug 2015 23:34:56 +0000 (16:34 -0700)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
 "Another few small ARM fixes, mostly addressing some VDSO issues"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8410/1: VDSO: fix coarse clock monotonicity regression
  ARM: 8409/1: Mark ret_fast_syscall as a function
  ARM: 8408/1: Fix the secondary_startup function in Big Endian case
  ARM: 8405/1: VDSO: fix regression with toolchains lacking ld.bfd executable

8 years agox86: fix error handling for 32-bit compat out-of-range system call numbers
Linus Torvalds [Thu, 13 Aug 2015 23:19:44 +0000 (16:19 -0700)]
x86: fix error handling for 32-bit compat out-of-range system call numbers

Commit 3f5159a9221f ("x86/asm/entry/32: Update -ENOSYS handling to match
the 64-bit logic") broke the ENOSYS handling for the 32-bit compat case.
The proper error return value was never loaded into %rax, except if
things just happened to go through the audit paths, which ended up
reloading the return value.

This moves the loading of %rax into the normal system call path, just to
make sure the error case triggers it.  It's kind of sad, since it adds a
useless instruction to reload the register to the fast path, but it's
not like that single load from the stack is going to be noticeable.

Reported-by: David Drysdale <drysdale@google.com>
Tested-by: Kees Cook <keescook@chromium.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>