obnox/glusterfs.git
8 years agobuild: Fix autoconf warnings
Anoop C S [Wed, 14 Oct 2015 07:41:37 +0000 (13:11 +0530)]
build: Fix autoconf warnings

This patch avoids the following warnings on running autogen script

. . .
Running autoconf...
configure.ac:896: warning: AC_LANG_CONFTEST: no AC_LANG_SOURCE call detected in body
../../lib/autoconf/lang.m4:193: AC_LANG_CONFTEST is expanded from...
configure.ac:896: the top level
. . .

This change uses AC_LINK_IFELSE to check for atomic built-in function
support. AC_COMPILE_IFELSE only checks that the test program compiles,
so AC_LINK_IFELSE, which also links it, is the more appropriate check.

Reference links:
[1] https://autotools.io/forwardporting/autoconf.html
[2] http://www.gnu.org/software/autoconf/manual/autoconf.html#Generating-Sources
[3] http://www.gnu.org/software/autoconf/manual/autoconf.html#Running-the-Compiler

Change-Id: I4597f2976623496745b66f98bb78a0c9f1b07f79
BUG: 1198849
Signed-off-by: Anoop C S <anoopcs@redhat.com>
Reviewed-on: http://review.gluster.org/12351
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
8 years agodht: heal directory path if the directory is not present
Mohammed Rafi KC [Thu, 15 Oct 2015 06:31:14 +0000 (12:01 +0530)]
dht: heal directory path if the directory is not present

After a successful nameless lookup, if the directory is not
present on any of the subvolumes, we get the path of
the directory and recursively send a named lookup on
each parent directory.
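
The per-parent lookup described above can be pictured as walking the path one component at a time. The following is only a minimal sketch under stated assumptions: `heal_path`, `lookup_fn` and `record_dir` are hypothetical names, the callback stands in for a named lookup, and the top-down order is an assumption rather than what the DHT code is known to do.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a named lookup on one directory. */
typedef void (*lookup_fn)(const char *dir, void *ctx);

/*
 * Call lookup() on every ancestor of an absolute path, top-down,
 * ending with the path itself: "/a/b/c" -> "/a", "/a/b", "/a/b/c".
 * Returns the number of lookups issued, or -1 if the path is not
 * absolute or does not fit in the local buffer.
 */
static int heal_path(const char *path, lookup_fn lookup, void *ctx)
{
    char buf[4096];
    size_t len = strlen(path);
    int count = 0;

    if (len == 0 || path[0] != '/' || len >= sizeof(buf))
        return -1;

    for (size_t i = 1; i <= len; i++) {
        /* A component ends at each '/' and at the end of the string. */
        if (path[i] == '/' || path[i] == '\0') {
            if (path[i - 1] == '/')
                continue; /* skip empty components ("//", trailing '/') */
            memcpy(buf, path, i);
            buf[i] = '\0';
            lookup(buf, ctx);
            count++;
        }
    }
    return count;
}

/* Example callback: append each visited directory to a string. */
static void record_dir(const char *dir, void *ctx)
{
    char *out = ctx;

    if (out[0] != '\0')
        strcat(out, " ");
    strcat(out, dir);
}
```

Calling `heal_path("/a/b/c", record_dir, seen)` with an empty, sufficiently large `seen` buffer visits `/a`, `/a/b` and `/a/b/c` in that order.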

This helps particularly in scenarios like add-brick
and attach-tier.

Change-Id: I64c2118a5ab03bbaa59b0dfc62babdf4472a92a3
BUG: 1272949
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/12376
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
8 years agodht: update cached subvolume during readdirp cbk
Mohammed Rafi KC [Wed, 28 Oct 2015 17:40:00 +0000 (23:10 +0530)]
dht: update cached subvolume during readdirp cbk

This reverts commit bb2370514598a99e6ab268af81df57dc16caa2c5.

Issue and impact: readdirp_cbk was not resetting the layout for files.
This causes problems if a file is moved away from its cached subvolume
and the layout was not proper: entry fops executed without a lookup
can then fail, because the cached subvolume does not change and the
application assumes the file is present on the cached subvolume, so
the fop fails with ENOENT.

This patch presets the layout information in readdirp cbk
for each file in the entry list. That leaves the problem that commit
bb2370514598a99e6ab268af81df57dc16caa2c5 tried to fix. We will fix that
problem in a separate patch.

Change-Id: I878ec32f44edde2fb9d4f132d9b1b547cde993d9
BUG: 1272949
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/12449
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
8 years agogeo-rep: Fix syncing chown in xsync crawl
Kotresh HR [Mon, 21 Sep 2015 09:21:13 +0000 (14:51 +0530)]
geo-rep: Fix syncing chown in xsync crawl

GEO-REP INTEROP WITH SHARD FEATURE

Problem:
    The sequence of entry creation and chown on the master
 is recorded in the xsync changelog as creation of the entry
 with the resulting user:group. During sync, entry
 creation is always split into two ops, MKNOD and
 SETATTR. That is why the issue is not hit; otherwise
 it would have failed with EPERM if the parent is owned
 by a different user. But with the shard translator
 enabled on the slave, doing entry creation with MKNOD and
 SETATTR is not allowed: SETATTR fails because it accesses
 an inode structure which is not linked.

Solution:
    The sequence of entry creation and chown on the master
 should always be recorded as MKNOD and SETATTR separately,
 and entry creation should be done with a single op in the
 gfid-access xlator. The gfid-access patch will be sent separately.

Change-Id: I93e554bf9342397a7660503f5128e9709f8a0cd8
BUG: 1265148
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/12205
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Aravinda VK <avishwan@redhat.com>
8 years agoRevert "fuse: resolve complete path after a graph switch"
Mohammed Rafi KC [Thu, 15 Oct 2015 14:10:55 +0000 (19:40 +0530)]
Revert "fuse: resolve complete path after a graph switch"

This reverts commit d0edb6d555d687f76837515207b9408be0bdd55e.
The same functionality will be provided in a different patch

Change-Id: I3139478b218fa32e803bb088df585fbbdf94af34
BUG: 1272949
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/12375
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
8 years agocluster/tier correct promotion cycle calculation
Dan Lambright [Sun, 1 Nov 2015 15:22:00 +0000 (10:22 -0500)]
cluster/tier correct promotion cycle calculation

The tier translator should only choose candidate files for promotion
from the most recent cycle, not a multiple of the most recent cycles.
Otherwise user-observed behavior can be inconsistent. Remove the related
test in tier.t, which is subject to a race condition.

Change-Id: I9ad1523cac00f904097ce468efa6ddd515857024
BUG: 1275524
Signed-off-by: root <root@rhs-cli-15.gdev.lab.eng.bos.redhat.com>
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12480
Reviewed-by: Joseph Fernandes
Tested-by: Gluster Build System <jenkins@build.gluster.com>
8 years agotiering:Message shown in gluster vol tier <volname> status output is incorrect
Mohamed Ashiq [Fri, 6 Nov 2015 18:49:48 +0000 (00:19 +0530)]
tiering:Message shown in gluster vol tier <volname> status output is incorrect

Change-Id: I15a1a637090f1cc2f200d5c3582317e4aa3cf334
BUG: 1278927
Signed-off-by: Mohamed Ashiq <mliyazud@redhat.com>
Reviewed-on: http://review.gluster.org/12532
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agotier/ctr: Ignore CTR Lookup heal insert errors
Joseph Fernandes [Tue, 3 Nov 2015 06:38:16 +0000 (12:08 +0530)]
tier/ctr: Ignore CTR Lookup heal insert errors

CTR doesn't read from the DB, so to make sure that file records are
created it does a heal during a lookup. It remembers the decision in
the inode context cache and retries periodically. When the volume is
restarted it loses the inode cache from the previous run, and the CTR
lookup heal tries the heal again. This time it finds that the records
are already in the SQL database and logs an error, and remembers this until
the volume is restarted or the inode is flushed out of the brick's inode cache.

Solution: reduce the log level to trace for this case, since
customers need not see this.

Change-Id: I67b568fb6904f8597e2c6d32894a247c4f500b94
BUG: 1277352
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/12491
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agosnapshot: Add bug-1275616.t to bad test list
N Balachandran [Fri, 6 Nov 2015 08:53:07 +0000 (14:23 +0530)]
snapshot: Add bug-1275616.t to bad test list

bug-1275616.t fails spuriously in regression tests

Change-Id: Iea01476a9ffd811091865196e1536361d2298ab7
BUG: 1278418
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12527
Reviewed-by: Avra Sengupta <asengupt@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
8 years agoquota: fix for spurious failure
vmallika [Fri, 6 Nov 2015 07:47:26 +0000 (13:17 +0530)]
quota: fix for spurious failure

Filed bug# 1278689.
For now, mark the testcase 'tests/bugs/quota/bug-1235182.t' as bad.
Once bug# 1278689 is fixed, remove the testcase from the bad list.

Change-Id: I224f907153d3e5f35834007a40b0050246d8787a
BUG: 1278689
Signed-off-by: vmallika <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/12526
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agotier/libgfdb: Replacing ASCII query file with binary
Joseph Fernandes [Tue, 13 Oct 2015 18:30:41 +0000 (00:00 +0530)]
tier/libgfdb: Replacing ASCII query file with binary

Earlier, when the database was queried, we saved
all the queried records in ASCII format in the query file.
This caused issues such as filenames containing the ASCII delimiter, and
took a lot of space. The tier.c file also had a lot of parsing code.

Here we change the format of the query file to binary.
All the logic for serializing and formatting query records is done
by libgfdb. Libgfdb provides the API functions
gfdb_write_query_record() and gfdb_read_query_record(),
which users, i.e. the tier migrator and the CTR xlator, can use to
write to and read from the query file.
The binary format saves disk space, i.e. reduces it to 50% at least,
because GFIDs are saved in binary format (16 bytes) rather than string format
(36 bytes), the path of the file is not saved, and no ASCII delimiters
are saved.

The on disk format of query record is as follows,

+---------------------------------------------------------------------------+
| Length of serialized query record |       Serialized Query Record         |
+---------------------------------------------------------------------------+
             4 bytes                     Length of serialized query record
                                                      |
                                                      |
     -------------------------------------------------|
     |
     |
     V
   Serialized Query Record Format:
   +---------------------------------------------------------------------------+
   | GFID |  Link count   |  <LINK INFO>  |.....                      | FOOTER |
   +---------------------------------------------------------------------------+
     16 B        4 B         Link Length                                  4 B
                                |                                          |
                                |                                          |
   -----------------------------|                                          |
   |                                                                       |
   |                                                                       |
   V                                                                       |
   Each <Link Info> will be serialized as                                  |
   +-----------------------------------------------+                       |
   | PGID | BASE_NAME_LENGTH |      BASE_NAME      |                       |
   +-----------------------------------------------+                       |
     16 B       4 B             BASE_NAME_LENGTH                           |
                                                                           |
                                                                           |
   ------------------------------------------------------------------------|
   |
   |
   V
   FOOTER is a magic number 0xBAADF00D indicating the end of the record.
   This also serves as a serialized schema validator.
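
The layout in the diagram can be sketched as a small serializer. This is only an illustration under stated assumptions, not the libgfdb implementation: `serialize_record`, `struct link_info` and `put_u32` are hypothetical names, integers are written in host byte order, and the base name is stored without a terminating NUL.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define RECORD_FOOTER 0xBAADF00Du /* end-of-record magic from the diagram */

/* One <LINK INFO>: parent GFID plus base name (hypothetical struct). */
struct link_info {
    uint8_t pgfid[16];
    const char *base_name;
};

/* Copy a 32-bit value in host byte order (assumption) and advance. */
static uint8_t *put_u32(uint8_t *p, uint32_t v)
{
    memcpy(p, &v, sizeof(v));
    return p + sizeof(v);
}

/*
 * Serialize one query record into buf:
 * [length][GFID][link count][PGID, name length, name]...[FOOTER].
 * Returns total bytes written, including the 4-byte length prefix,
 * or 0 if buf is too small.
 */
static size_t serialize_record(uint8_t *buf, size_t cap,
                               const uint8_t gfid[16],
                               const struct link_info *links,
                               uint32_t nlinks)
{
    size_t body = 16 + 4 + 4; /* GFID + link count + footer */

    for (uint32_t i = 0; i < nlinks; i++)
        body += 16 + 4 + strlen(links[i].base_name);
    if (cap < 4 + body)
        return 0;

    uint8_t *p = put_u32(buf, (uint32_t)body);
    memcpy(p, gfid, 16);
    p += 16;
    p = put_u32(p, nlinks);
    for (uint32_t i = 0; i < nlinks; i++) {
        uint32_t n = (uint32_t)strlen(links[i].base_name);

        memcpy(p, links[i].pgfid, 16);
        p += 16;
        p = put_u32(p, n);
        memcpy(p, links[i].base_name, n);
        p += n;
    }
    p = put_u32(p, RECORD_FOOTER);
    return (size_t)(p - buf);
}
```

A reader would do the reverse: read the 4-byte length, read that many bytes, and reject the record if the last 4 bytes are not the footer value.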

Change-Id: I9db7416fd421e118dd44eafab8b535caafe50d5a
BUG: 1272207
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/12354
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agotests/tier : Corrected filename in run-tests.sh
N Balachandran [Fri, 6 Nov 2015 09:06:37 +0000 (14:36 +0530)]
tests/tier : Corrected filename in run-tests.sh

bug-1214222-directories_miising_after_attach_tier.t
was renamed to bug-1214222-directories_missing_after_attach_tier.t
 but run-tests.sh was not updated.

Change-Id: I64d6475ffb08e3252e56b4083cb0e828ba3584d9
BUG: 1278709
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12528
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agosnapshot: Making bug-1275616.t more regression failure tolerant
Avra Sengupta [Thu, 5 Nov 2015 12:49:14 +0000 (18:19 +0530)]
snapshot: Making bug-1275616.t more regression failure tolerant

Snapshot clone creation fails spuriously on the
regression setup because the brick rpc connect for snap3
in the testcase happens well after the snap was created.

So add an EXPECT_WITHIN $PROCESS_UP_TIMEOUT check (read: delay)
to help the cause. This isn't a 100% guaranteed fix: on an
even slower machine this check will also fail, followed by the
subsequent failures that this patch is trying to fix in the first place.

Change-Id: I2f31558b717fd610111f14e451fe444c09f3f254
BUG: 1278418
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/12516
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
8 years agocluster/tier: fix lookup-unhashed on tiered volumes
Dan Lambright [Tue, 3 Nov 2015 23:27:18 +0000 (18:27 -0500)]
cluster/tier: fix lookup-unhashed on tiered volumes

During attach tier the commit hash must be copied to the hot tier.

Change-Id: I91b92fd8e98696993433856e1436409b657c439d
BUG: 1277716
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12498
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
8 years agotier/ctr: Ignore bitrot related fops
Joseph Fernandes [Thu, 5 Nov 2015 09:29:54 +0000 (14:59 +0530)]
tier/ctr: Ignore bitrot related fops

Ignore bitrot related fops since they are internal fops.

Change-Id: I5db8cf4e3fa1b186a6987eed54287bc0e964fbd4
BUG: 1278326
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/12512
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agoxlators/nfs add mount-nfs-auth.t to ignored tests list
Dan Lambright [Thu, 5 Nov 2015 15:04:42 +0000 (10:04 -0500)]
xlators/nfs add mount-nfs-auth.t to ignored tests list

mount-nfs-auth.t fails spuriously in regression. After discussion, the
NFS leads agreed to put it into the ignored list until the problem is solved.

Change-Id: I44efc3332409ef963819f31d1775138d8a04a0f9
BUG: 1278476
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12521
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
8 years agocluster/ec: Fix bad management of lock owners
Xavier Hernandez [Wed, 28 Oct 2015 13:00:41 +0000 (14:00 +0100)]
cluster/ec: Fix bad management of lock owners

Since the addition of parallel reads patch for ec, a lock can have
more than one owner at the same time. The list of owners was stored
inside the 'owner_list' field of each fop.

The problem was with fops that required more than one lock (like
rename). In this case the same field was used to add the fop to
more than one list, causing an overwrite of the previous list.

This has been solved by moving the 'owner_list' field from the
ec_fop_data_t structure to the ec_lock_link_t structure.
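
Why a single list node per fop breaks, and a node per link does not, can be shown with a tiny intrusive list. This is a simplified sketch: `struct fop`, `struct lock`, `struct lock_link` and the list helpers are hypothetical stand-ins, not the real ec structures or the GlusterFS list API.

```c
#include <stddef.h>

/* Minimal circular doubly linked list (hypothetical helper). */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static int list_len(const struct list_head *h)
{
    int n = 0;

    for (const struct list_head *p = h->next; p != h; p = p->next)
        n++;
    return n;
}

struct lock {
    struct list_head owners; /* fops that currently own this lock */
};

/* One link per (fop, lock) pair; each link carries its own list node. */
struct lock_link {
    struct lock *lock;
    struct list_head owner_list;
};

/* A fop such as rename needs two locks, so it holds two links. */
struct fop {
    struct lock_link links[2];
};

/* Register the fop as an owner of lock l through link number idx. */
static void fop_own(struct fop *f, int idx, struct lock *l)
{
    f->links[idx].lock = l;
    list_add_tail(&f->links[idx].owner_list, &l->owners);
}
```

With the node stored in the link, one fop can sit on the owner lists of both locks at once. Had the node lived in `struct fop`, the second `list_add_tail` would have overwritten the pointers set by the first.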

Change-Id: I6042129f09082497b80782b5704a52c35c78f44d
BUG: 1276031
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: http://review.gluster.org/12445
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
8 years agomgmt/glusterd: Store arbiter-count and restore it
Pranith Kumar K [Fri, 30 Oct 2015 10:26:14 +0000 (15:56 +0530)]
mgmt/glusterd: Store arbiter-count and restore it

Problem:
1) Glusterd doesn't persist the arbiter information of a replica volume in
   its store.  When glusterd goes down and comes back up, arbiter volumes
   become plain replica volumes.

2) Glusterd doesn't import/export arbiter information to/from the other peers.

3) Volume info doesn't show any arbiter count in the output.

Fix:
1) Persist arbiter information in glusterd-store
2) Import/Export arbiter information of the volume
3) Change volume info output to show arbiter count.

Change-Id: I2db81e73d2694b01f7d07b08a17b41ad5a55c361
BUG: 1276675
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/12475
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
8 years agotier/dht: Ignoring replica for migration counting
Joseph Fernandes [Thu, 29 Oct 2015 06:36:57 +0000 (12:06 +0530)]
tier/dht: Ignoring replica for migration counting

We used to count replica files for migration counting even though
they were ignored for migration, as the replica brick didn't have
the ownership (as per the replication xlator, either AFR or EC).
As a result the number of files migrated would show a wrong count,
i.e. each replicated file would be counted 1 + number of replicas times.

This patch ignores such cases.

Change-Id: I91aa352ee3b0a5029790653266e9333f3947d0ac
BUG: 1276141
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/12453
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agov info for disperse count fails while upgrading
Hari Gowtham [Tue, 3 Nov 2015 12:53:56 +0000 (18:23 +0530)]
v info for disperse count fails while upgrading

The upgrade from 3.7.5-3 to 3.7.5-5 causes the type and number
of bricks for the cold tier to be printed wrong.

Change-Id: Ia45b97c35fef88f9c66e15e5bdb93fd30cb342af
BUG: 1277481
Signed-off-by: Hari Gowtham <hgowtham@redhat.com>
Reviewed-on: http://review.gluster.org/12495
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agogeo-rep: Update last_synced_time in XSync
Aravinda VK [Tue, 28 Jul 2015 09:22:00 +0000 (14:52 +0530)]
geo-rep: Update last_synced_time in XSync

During XSync crawl, last_synced time in status file was not updated.
This patch fixes the issue by updating status file when stime xattr
is updated after Xsync or Changelog Crawl.

Change-Id: I4dc3a2d4c3d8378a939da0868caf1aef4f789599
Signed-off-by: Aravinda VK <avishwan@redhat.com>
BUG: 1247536
Reviewed-on: http://review.gluster.org/11771
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
8 years agoglusterd: move new feature (tiering) enum op to the last of the array
Gaurav Kumar Garg [Fri, 30 Oct 2015 11:09:16 +0000 (16:39 +0530)]
glusterd: move new feature (tiering) enum op to the last of the array

Currently the new tiering feature has the GD_OP_DETACH_TIER and
GD_OP_TIER_MIGRATE enums in the middle of the glusterd_op_ enum array.
In a multi-node cluster, when one of the nodes is upgraded from a lower
version to a higher version, executing a command can end up with a mismatch
in enum ops at the receiving end, causing command execution to fail.

The fix is to put every new feature's glusterd operation enum code at the
end of the enum array.
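
The effect is easy to see with three small enums. The op names below are hypothetical and much shorter than the real glusterd_op_ list, and the trailing MAX sentinel is an assumption. The point is only that C numbers enumerators by position, so an insertion in the middle renumbers everything after it.

```c
/* Values as an older node knows them (hypothetical op names). */
enum op_old {
    OP_OLD_NONE,
    OP_OLD_CREATE,
    OP_OLD_START,
    OP_OLD_MAX
};

/* Bad: a new op inserted in the middle shifts every later value. */
enum op_mid {
    OP_MID_NONE,
    OP_MID_CREATE,
    OP_MID_DETACH_TIER, /* takes the value the old node calls START */
    OP_MID_START,
    OP_MID_MAX
};

/* Good: a new op appended before the sentinel keeps old values stable. */
enum op_end {
    OP_END_NONE,
    OP_END_CREATE,
    OP_END_START,
    OP_END_DETACH_TIER,
    OP_END_MAX
};
```

An upgraded node built with `op_mid` that sends DETACH_TIER puts on the wire the same number an old node decodes as START, which is the kind of mismatch described above.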

Change-Id: I640f811065e8c84add624237aa80fed43fde5967
BUG: 1276643
Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
Reviewed-on: http://review.gluster.org/12473
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Nekkunti <anekkunt@redhat.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
8 years agocorrection of message displayed after attach tier
hari gowtham [Mon, 2 Nov 2015 09:51:19 +0000 (15:21 +0530)]
correction of message displayed after attach tier

The message shown after attach tier refers to rebalance.
It is changed to refer to tiering.

Change-Id: I1834511f86483fa60f404d7defe5be59c025e9d6
BUG: 1277081
Signed-off-by: hari gowtham <hgowtham@redhat.com>
Reviewed-on: http://review.gluster.org/12488
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agocluster/tier : Files skipped during tier query parsing
N Balachandran [Fri, 30 Oct 2015 07:16:22 +0000 (12:46 +0530)]
cluster/tier : Files skipped during tier query parsing

The tier query parsing code was using fscanf to read each record.
As space is a delimiter for fscanf, filenames containing spaces
caused the parsing to return unexpected values causing various
issues in the tier process, including crashes due to buffer
 overflows.
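
The whitespace behaviour is the same for fscanf and sscanf, so it can be shown on a string. This is a sketch, not the actual fix: the "<gfid> <name>" record layout and both function names are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse a "<gfid> <name>" line the fragile way: %s stops at the first
 * whitespace, so a name containing a space is silently truncated.
 * The field widths keep this version from overflowing its buffers.
 */
static int parse_with_scanf(const char *line, char gfid[37], char name[256])
{
    return sscanf(line, "%36s %255s", gfid, name);
}

/* Parse the same line by splitting only at the first space. */
static int parse_at_first_space(const char *line, char gfid[37],
                                char name[256])
{
    const char *sp = strchr(line, ' ');
    size_t glen, nlen;

    if (sp == NULL)
        return -1;
    glen = (size_t)(sp - line);
    nlen = strlen(sp + 1);
    if (glen > 36 || nlen > 255)
        return -1;
    memcpy(gfid, line, glen);
    gfid[glen] = '\0';
    memcpy(name, sp + 1, nlen + 1); /* include the terminating NUL */
    return 0;
}
```

For the input `abc my file.txt`, the scanf version reports two fields but yields the name `my`, while the second version yields `my file.txt`.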

Change-Id: Ife602cb7ecb158fccbc2c89e4d2959bd97098a87
BUG: 1276562
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12469
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agoglusterd : vol replace-brick fails when transport.socket.bind-address is set in glusterd
Mohamed Ashiq Liyazudeen [Thu, 29 Oct 2015 15:10:00 +0000 (20:40 +0530)]
glusterd : vol replace-brick fails when transport.socket.bind-address is set in glusterd

Change-Id: Id8c29aa46b526bc003a1d7023714b67805e35a99
BUG: 1276386
Signed-off-by: Mohamed Ashiq Liyazudeen <mliyazud@redhat.com>
Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com>
Reviewed-on: http://review.gluster.org/12461
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
8 years agosnapshot: Inherit snap-max-hard-limit from original volume
Avra Sengupta [Wed, 28 Oct 2015 07:00:34 +0000 (12:30 +0530)]
snapshot: Inherit snap-max-hard-limit from original volume

A snapshot should inherit snap-max-hard-limit from the original
volume while being created and when being restored to, it should
restore the same.

Similarly a clone taken from a snapshot should inherit
snap-max-hard-limit from the snapshot.

Change-Id: If8e90e2ffc10e22086b803ac8e2638a16bcec968
BUG: 1275616
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/12437
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
8 years agosnapshot: Don't display snapshot's hard-limit and soft-limit in vol info
Avra Sengupta [Wed, 28 Oct 2015 09:48:07 +0000 (15:18 +0530)]
snapshot: Don't display snapshot's hard-limit and soft-limit in vol info

The snap-max-hard-limit being displayed in the volume info
currently is propagated from system's snap-max-hard-limit as
that is a global option common for all volumes, and hence ends
up showing the system's snap-max-hard-limit.

We should not be displaying snap-max-hard-limit and
snap-max-soft-limit in the volume info at all, as these are
snap config options and should be set and displayed via snap
config command.

Modified bug-1113476.t to test the same behaviour.

Change-Id: I90891f0cf7fb39fd686787297c7f7cd8c1e7daa1
BUG: 1276018
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/12443
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
8 years agoio-stats: fix BSD build failure
Atin Mukherjee [Mon, 2 Nov 2015 05:33:26 +0000 (11:03 +0530)]
io-stats: fix BSD build failure

Change-Id: Ieb372cb686d32a09c6df31ec849f1b3c52e0e1cd
BUG: 1277024
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-on: http://review.gluster.org/12484
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
8 years agotests: Move ec-readdir.t to bad tests
Pranith Kumar K [Mon, 2 Nov 2015 02:26:51 +0000 (07:56 +0530)]
tests: Move ec-readdir.t to bad tests

Change-Id: Ie7f6d25cbc617ff347aeb7d77fc0a60924c83f09
BUG: 1276989
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/12481
Tested-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
8 years agoquota: add version to quota xattrs
vmallika [Thu, 15 Oct 2015 07:11:13 +0000 (12:41 +0530)]
quota: add version to quota xattrs

When quota is disabled, the clean-up process may terminate
without completely cleaning up the quota xattrs.
When quota is enabled again, this can mess up the accounting.

A version number is suffixed to all quota xattrs, and this version
number is specific to the marker xlator, i.e. when quota xattrs are
requested by quotad/client, marker will remove the version suffix from the
key before sending the response.
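
Removing the suffix from a key could look like the sketch below. The ".<digits>" suffix format and the example key are assumptions for illustration, and `strip_version_suffix` is a hypothetical helper, not the marker xlator's function.

```c
#include <ctype.h>
#include <string.h>

/*
 * Remove a trailing ".<digits>" version suffix from an xattr key in place.
 * Returns 1 if a suffix was removed, 0 if the key had none.
 */
static int strip_version_suffix(char *key)
{
    char *dot = strrchr(key, '.');
    const char *p;

    if (dot == NULL || dot[1] == '\0')
        return 0;
    for (p = dot + 1; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p))
            return 0; /* last component is a name, not a version */
    }
    *dot = '\0';
    return 1;
}
```

Under that assumed format, `trusted.glusterfs.quota.size.2` becomes `trusted.glusterfs.quota.size`, and a key without a numeric last component is left unchanged.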

Change-Id: I1ca2c11460645edba0f6b68db70d476d8d26e1eb
BUG: 1272411
Signed-off-by: vmallika <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/12386
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
8 years agodebug/io-stats: Add FOP sampling feature
Richard Wareing [Wed, 24 Jun 2015 00:03:11 +0000 (17:03 -0700)]
debug/io-stats: Add FOP sampling feature

Summary:
- Using sampling feature you can record details about every Nth FOP.
  The fields in each sample are: FOP type, hostname, uid, gid, FOP priority,
  port and time taken (latency) to fulfill the request.
- Implemented using a ring buffer which is not (m/c) allocated in the IO path,
  this should make the sampling process pretty cheap.
- DNS resolution done @ dump time not @ sample time for performance w/
  cache
- Metrics can be used for both diagnostics, traffic/IO profiling as well
  as P95/P99 calculations
- To control this feature there are two new volume options:
  diagnostics.fop-sample-interval - The sampling interval, e.g. 1 means
  sample every FOP, 100 means sample every 100th FOP
  diagnostics.fop-sample-buf-size - The size (in bytes) of the ring
  buffer used to store the samples.  In the event that more samples
  are collected in the stats dump interval than can be held in this buffer,
  the oldest samples shall be discarded.  Samples are stored in the log
  directory under /var/log/glusterfs/samples.
- Uses DNS cache written by sshreyas@fb.com (Thank-you!), the DNS cache
  TTL is controlled by the diagnostics.stats-dnscache-ttl-sec option
  and defaults to 24hrs.
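
The sampling scheme summarized above can be sketched as a preallocated ring buffer that records every Nth FOP and overwrites the oldest entry when full. This is an illustration only: the struct and function names are hypothetical, the sample carries just two fields, and treating an interval of 0 as "disabled" is an assumption.

```c
#include <stdint.h>
#include <stddef.h>

#define SAMPLE_CAP 4 /* tiny capacity, for illustration only */

struct sample {
    uint64_t fop_seq;   /* which FOP this sample describes */
    double latency_us;  /* time taken to serve it */
};

struct sampler {
    struct sample buf[SAMPLE_CAP]; /* preallocated: no malloc in the IO path */
    size_t head;        /* next slot to write */
    size_t count;       /* valid samples, at most SAMPLE_CAP */
    uint64_t seen;      /* FOPs observed so far */
    uint64_t interval;  /* record every interval-th FOP; 0 disables (assumption) */
};

/* Called once per FOP; stores a sample only on every interval-th call. */
static void sampler_observe(struct sampler *s, double latency_us)
{
    s->seen++;
    if (s->interval == 0 || s->seen % s->interval != 0)
        return;
    s->buf[s->head].fop_seq = s->seen;
    s->buf[s->head].latency_us = latency_us;
    s->head = (s->head + 1) % SAMPLE_CAP; /* wrap: oldest slot is reused */
    if (s->count < SAMPLE_CAP)
        s->count++;
}

/* Sequence number of the oldest retained sample, or 0 if empty. */
static uint64_t sampler_oldest(const struct sampler *s)
{
    size_t idx;

    if (s->count == 0)
        return 0;
    idx = (s->head + SAMPLE_CAP - s->count) % SAMPLE_CAP;
    return s->buf[idx].fop_seq;
}
```

With an interval of 2 and twelve observed FOPs, six samples are taken but only the last four are retained, so the oldest retained sample is FOP number 6.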

Test Plan:
- Valgrind'd to ensure it's leak free
- Run prove test(s)
- Shadow testing on 100+ brick cluster

Change-Id: I9ee14c2fa18486b7efb38e59f70687249d3f96d8
BUG: 1271310
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/12210
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
8 years agocore: fix Ubuntu code audit (cppcheck) results
Kaleb S. KEITHLEY [Wed, 3 Jun 2015 13:59:30 +0000 (09:59 -0400)]
core: fix Ubuntu code audit (cppcheck) results

This change includes an additional fix (forward port) of a fix
made on the release-3.x branches to address a comment made after
the original change was merged on the master branch.

* release-3.7
* Change-Id: Ie15c5919e5bf9b0a1c66e20dc42d80fdfa8bd7f4
* BZ: 1227808
*  http://review.gluster.org/11069

Change-Id: I4fc2672ab1a17998b2e40bc43eb6a3e15058a086
BUG: 1109180
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/11067
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
8 years agoglusterd: fix info file checksum mismatch during upgrade
anand [Thu, 29 Oct 2015 16:06:57 +0000 (21:36 +0530)]
glusterd: fix info file checksum mismatch during upgrade

Issue: probing a new node (>= 3.6) from a 3.5 cluster moves the peer to the rejected state.

Fix: disperse volume support was added in the 3.6 release, so write the disperse fields
(disperse_count=0 and redundancy_count=0) to the vol info file only if the cluster version supports them.
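
The idea of gating fields on the cluster version can be sketched as follows. This is a hedged illustration, not glusterd's store code: `render_vol_info`, the `replica_count` line and the `OPVERSION_DISPERSE` value are assumptions chosen for the example.

```c
#include <stdio.h>

#define OPVERSION_DISPERSE 30600 /* hypothetical: first version with disperse volumes */

/*
 * Render the volume info text. The disperse fields are emitted only when
 * the cluster-wide version supports them, so a newer node in an older
 * cluster writes the same bytes, and thus gets the same checksum, as its
 * older peers. Returns the number of bytes written, or -1 on overflow.
 */
static int render_vol_info(char *out, size_t cap, int cluster_version,
                           int replica_count)
{
    int n = snprintf(out, cap, "replica_count=%d\n", replica_count);

    if (n < 0 || (size_t)n >= cap)
        return -1;
    if (cluster_version >= OPVERSION_DISPERSE) {
        int m = snprintf(out + n, cap - (size_t)n,
                         "disperse_count=0\nredundancy_count=0\n");

        if (m < 0 || (size_t)m >= cap - (size_t)n)
            return -1;
        n += m;
    }
    return n;
}
```

The same volume rendered for a cluster below the threshold contains only the fields the older peers know about, which keeps the file content identical across nodes.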

Change-Id: I11d5e2e337b9bbaddc8e52ca7295ba481beb1132
BUG: 1276423
Signed-off-by: anand <anekkunt@redhat.com>
Reviewed-on: http://review.gluster.org/12464
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaushal M <kaushal@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
8 years agoafr/glusterd: Fix naming issue in tier related changes
Mohammed Rafi KC [Tue, 27 Oct 2015 04:07:56 +0000 (09:37 +0530)]
afr/glusterd: Fix naming issue in tier related changes

changing some of the function names added recently as
part of the tiering changes.

Change-Id: I238831128ee00cdf83f8a80be937d3528d133099
BUG: 1275489
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/12431
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agosnapshot : copying nfs-ganesha export file
Jiffin Tony Thottan [Thu, 27 Aug 2015 17:56:40 +0000 (23:26 +0530)]
snapshot : copying nfs-ganesha export file

While taking a snapshot, the export file used by the volume should
be copied to the snap directory, so that when the snapshot is restored
the volume retains all its configuration for exporting via
nfs-ganesha. The export file is stored at "/etc/ganesha/export" in
the following format: "export.<volname>.conf".

The fix handles given cases in the following manner :

case a: The nfs-ganesha(global) is ON during snapshot and restore.
        i.) Volume was exported during snapshot. When we restore snapshot,
            then volume should be exported back with old configuration file.
        ii.) Volume was unexported during snapshot. When we restore snapshot,
             then volume should be unexported again.

case b: The nfs-ganesha is ON during snapshot and OFF during restore
        Volume was exported during snapshot. When we restore the snapshot, the
        conf is copied to the corresponding location, and if nfs-ganesha is
        enabled again, the volume will be exported.

For clones, the export conf file is created in /etc/ganesha/export and the
clone is then exported via ganesha.

Change-Id: Ideecda15bd4db58e991cf6c8de7bb93f3db6cd20
BUG: 1257709
Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
Reviewed-on: http://review.gluster.org/12034
Reviewed-by: Avra Sengupta <asengupt@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
8 years agocluster/tier enable CTR on attach tier
Dan Lambright [Fri, 23 Oct 2015 16:10:42 +0000 (12:10 -0400)]
cluster/tier enable CTR on attach tier

CTR is currently disabled by default, and must be manually enabled
for tiering to start. This is an overhead on the administrator and
easy to overlook. Enable it automatically when a tier is attached.

Change-Id: I0c29de8762faec1bfe6d1376a57eeef3357ad15a
BUG: 1274847
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12420
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
8 years agonfs : avoid invalid usage of `cs` variable in nfs fops
Jiffin Tony Thottan [Thu, 29 Oct 2015 06:59:04 +0000 (12:29 +0530)]
nfs : avoid invalid usage of `cs` variable in nfs fops

Due to changes from http://review.gluster.org/#/c/12162/ a path variable
was added to nfs3_log_common_res(), and usually `cs->resolvedloc.path` is
passed for it. But in certain fop functions `cs` may not be filled in due to
an error, and logging it with nfs3_log_common_res() results in a crash.
This patch fixes that.

Change-Id: I5a709818923e7884bd04e329834ee352a1b3a58f
BUG: 1276243
Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
Reviewed-on: http://review.gluster.org/12458
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
8 years agogeo-rep: Handle FXATTROP and XATTROP
Kotresh HR [Thu, 24 Sep 2015 07:29:08 +0000 (12:59 +0530)]
geo-rep: Handle FXATTROP and XATTROP

GEO-REP INTEROP WITH SHARD FEATURE

If it is FXATTROP or XATTROP in changelog,
add the gfid to rsync queue.
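
A minimal sketch of the rule above, assuming changelog entries have already
been parsed into (fop_name, gfid) tuples. That record layout and the function
name are assumptions made for illustration, not geo-rep's actual parser.

```python
# Fop names from the commit message; the record layout is an assumption.
XATTROP_FOPS = {"XATTROP", "FXATTROP"}


def queue_xattrop_gfids(records, rsync_queue):
    """Append the gfid of every XATTROP/FXATTROP record to rsync_queue.

    `records` is an iterable of (fop_name, gfid) tuples, a simplified
    stand-in for parsed changelog entries.
    """
    for fop, gfid in records:
        if fop in XATTROP_FOPS:
            rsync_queue.append(gfid)
    return rsync_queue
```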

Change-Id: If68d38d7ed00b70a4618cfcc8e75df3fbadbf724
BUG: 1265148
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/12226
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Aravinda VK <avishwan@redhat.com>
8 years agofuse: Avoid redundant lookup on "." and ".."
Susant Palai [Fri, 16 Oct 2015 13:57:22 +0000 (09:57 -0400)]
fuse: Avoid redundant lookup on "." and ".."
credit: R. Gowdappa

Change-Id: I3bc1534e499f2eccd114db69a29c0b2ce82775db
BUG: 1273315
Reviewed-on: http://review.gluster.org/12374
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
8 years agocluster/dht/rebalance: rebalance failure handling
Susant Palai [Wed, 26 Aug 2015 08:49:29 +0000 (04:49 -0400)]
cluster/dht/rebalance: rebalance failure handling

Currently, rebalance aborts on almost any failure, such as
fix-layout of a directory, readdirp, opendir etc. Unless it is
a remove-brick process, we can ignore these failures.

Major impact:  Any failure in the gf_defrag_process_dir means there
are files left unmigrated in the directory.

Fix-layout(setxattr) failure will impact its child subtree, i.e.
the child subtree will not be rebalanced.

Settle-hash (commit-hash)failure will trigger lookup_everywhere for
immediate children until the next commit-hash.

Note: Remove-brick operation is still sensitive to any kind of failure.
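
The policy can be sketched as a loop that records failures and keeps going,
except for remove-brick. This is an illustration only: `rebalance` and
`process_dir` are hypothetical names, and OSError stands in for whatever
failure the real directory processing reports.

```python
def rebalance(dirs, process_dir, is_remove_brick):
    """Process each directory and return the ones that failed.

    For a plain rebalance a failure is recorded and the crawl goes on;
    for remove-brick the first failure is re-raised.
    """
    failed = []
    for d in dirs:
        try:
            process_dir(d)
        except OSError:
            if is_remove_brick:
                raise  # remove-brick stays sensitive to any failure
            failed.append(d)  # files here stay unmigrated; keep going
    return failed
```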

Change-Id: I08ab71909bc832f03cc1517172525376f7aed14a
BUG: 1257076
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/12013
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
8 years agocluster/afr: disable self-heal lock compatibility for arbiter volumes
Pranith Kumar K [Mon, 26 Oct 2015 11:26:25 +0000 (16:56 +0530)]
cluster/afr: disable self-heal lock compatibility for arbiter volumes

Problem:
afrv2 takes locks from infinity-2 to infinity-1 to be compatible with <=3.5.x
clients. For arbiter volumes this leads to problems as the I/O takes full file
locks.

Solution:
Don't be compatible with <=3.5.x clients on arbiter volumes, as arbiter volumes
were introduced in 3.7.

Change-Id: I48d6aab2000cab29c0c4acbf0ad356a3fa9e7bab
BUG: 1275247
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/12426
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
8 years agocore: use syscall wrappers instead of direct syscalls
Kaleb S. KEITHLEY [Thu, 1 Oct 2015 20:16:52 +0000 (16:16 -0400)]
core: use syscall wrappers instead of direct syscalls

various xlators and other components are invoking system calls
directly instead of using the libglusterfs/syscall.[ch] wrappers.

If not using the system call wrappers there should be a comment
in the source explaining why the wrapper isn't used.

Change-Id: I8ef94c48728666465abf126c778b70c9e5c00e47
BUG: 1267967
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/12273
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
8 years agocli : 'gluster volume help' output sorted alphabetically
Mohamed Ashiq [Wed, 15 Jul 2015 08:49:49 +0000 (14:19 +0530)]
cli : 'gluster volume help' output sorted alphabetically

'gluster volume help' output is not sorted alphabetically.
This makes it a little harder for the user to search for, or learn the
usage of, gluster volume commands from the gluster cli alone.

Change-Id: I855da2e4748a5c2ff3be319c50fa9548d676ee8a
BUG: 1242894
Signed-off-by: Mohamed Ashiq <mliyazud@redhat.com>
Reviewed-on: http://review.gluster.org/11663
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Reviewed-by: Anand Nekkunti <anekkunt@redhat.com>
8 years agocore: use syscall wrappers instead of direct syscalls - miscellaneous
Kaleb S. KEITHLEY [Thu, 1 Oct 2015 20:31:19 +0000 (16:31 -0400)]
core: use syscall wrappers instead of direct syscalls - miscellaneous

various xlators and other components are invoking system calls
directly instead of using the libglusterfs/syscall.[ch] wrappers.

If not using the system call wrappers there should be a comment
in the source explaining why the wrapper isn't used.

Change-Id: I1f47820534c890a00b452fa61f7438eb2b3f667c
BUG: 1267967
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/12276
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
8 years agocore: use syscall wrappers instead of direct syscalls -- glusterd
Kaleb S. KEITHLEY [Fri, 16 Oct 2015 17:52:28 +0000 (13:52 -0400)]
core: use syscall wrappers instead of direct syscalls -- glusterd

various xlators and other components are invoking system calls
directly instead of using the libglusterfs/syscall.[ch] wrappers.

If not using the system call wrappers there should be a comment
in the source explaining why the wrapper isn't used.

Change-Id: I28bf2a5f7730b35914e7ab57fed91e1966b30073
BUG: 1267967
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/12379
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
8 years agoafr: write zeros to sink for non-sparse files
Ravishankar N [Wed, 21 Oct 2015 15:35:46 +0000 (21:05 +0530)]
afr: write zeros to sink for non-sparse files

Problem: If a file is created with zeroes ('dd', 'fallocate' etc.) when
a brick is down, the self-heal does not write the zeroes to the sink
after it comes up. Consequently, there is a mismatch in disk-usage
amongst the bricks of the replica.

Fix: If we definitely know that the file is not sparse, then write the
zeroes to the sink even if the checksums match.
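
A sketch of the decision described in the fix, under the assumption that
self-heal compares one block at a time and knows whether the source file is
sparse (True), not sparse (False), or unknown (None). The function name and
the three-state flag are hypothetical.

```python
def must_write_block(src_block, checksums_match, file_is_sparse):
    """Return True if self-heal should write src_block to the sink.

    file_is_sparse is None when sparseness is unknown.
    """
    if not checksums_match:
        return True  # data differs, always heal
    all_zero = not any(src_block)
    if all_zero and file_is_sparse is False:
        # Checksums match because a hole on the sink reads back as
        # zeroes, but the source really allocated this block: write it.
        return True
    return False
```

Only the "definitely not sparse" case forces the write; an unknown state
keeps the old behaviour of skipping matching blocks.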

Change-Id: Ic739b3da5dbf47d99801c0e1743bb13aeb3af864
BUG: 1272460
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/12371
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
8 years agofeatures/shard: Force cache-refresh when lookup/readdirp/stat detect that xattr value...
Krutika Dhananjay [Tue, 20 Oct 2015 06:16:10 +0000 (11:46 +0530)]
features/shard: Force cache-refresh when lookup/readdirp/stat detect that xattr value has changed

Change-Id: Ia3225a523287f6689b966ba4f893fc1b1fa54817
BUG: 1272986
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/12400
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
8 years agocluster/tier do not log error message on lookup heal for files on hot tier
Dan Lambright [Mon, 26 Oct 2015 18:19:24 +0000 (14:19 -0400)]
cluster/tier do not log error message on lookup heal for files on hot tier

On fix-layout heal, files are scanned. Files found exist on either the hot
or the cold subvolume. Those not found in the cold tier would exist on the
hot tier. They should not be flagged as an error.

Replace INFO with TRACE for common tier migration logs. Frequent migration
was growing the log files too quickly.

On migration failures, do not accrue files towards the cycle limit's budget.

Change-Id: Ie832ee07c43bce5477ae81c939d1fe8416a11615
BUG: 1275383
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12430
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Joseph Fernandes
8 years agocluster/ec: update version and size on good bricks
Ashish Pandey [Fri, 23 Oct 2015 07:57:51 +0000 (13:27 +0530)]
cluster/ec: update version and size on good bricks

Problem: readdir/readdirp fops call [f]xattrop with
fop->good, which contains only one brick for these operations.
That causes xattrop to fail, as it requires at least the
"minimum" number of bricks.

Solution: Use lock->good_mask to call xattrop. lock->good_mask
contains all the good locked bricks on which the previous write
operation was successful.
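
To illustrate why the choice of mask matters, here is a sketch that treats a
brick set as a bitmask (one bit per brick). The helper names and the example
value of "minimum" are assumptions of this sketch, not taken from the ec code.

```python
def popcount(mask):
    """Number of bricks set in a bitmask."""
    return bin(mask).count("1")


def can_xattrop(mask, minimum):
    """xattrop needs at least `minimum` bricks in the mask."""
    return popcount(mask) >= minimum
```

With a hypothetical minimum of 4, a single-brick mask such as 0b000001 fails
the check, while a mask of five good locked bricks (0b011111) passes.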

Change-Id: If1b500391aa6fca6bd863702e030957b694ab499
BUG: 1274629
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: http://review.gluster.org/12419
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
8 years agoglusterd: call glusterd_store_volinfo in bump up op-version
Atin Mukherjee [Mon, 14 Sep 2015 11:39:29 +0000 (17:09 +0530)]
glusterd: call glusterd_store_volinfo in bump up op-version

After an upgrade, op-version is expected to be updated through gluster volume
set. If the new version introduces any feature which changes the volinfo
structure, failing to store the default values of these new options results
in cksum issues.

Change-Id: I57b4667f3403839811735bf66bef29e5200a9241
BUG: 1262805
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-on: http://review.gluster.org/12171
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
8 years agofeatures/changelog: record mknod if tier-dht linkto is set
Saravanakumar Arumugam [Fri, 23 Oct 2015 06:27:42 +0000 (11:57 +0530)]
features/changelog: record mknod if tier-dht linkto is set

This is a series of patches which aims to fix geo-replication
in a Tiering Volume.

Problem:
Consider, a file is placed in volume initially and then hot tier is
attached. During any operation on the file, due to lookup a linkto
file is created in hot tier.

Now, any namespace operation carried out on the file is recorded in
both cold and hot tier.
There is room for races when both changelogs are replayed.

Solution:
So, We are going to replay (namespace related)operations
only in the hot tier.

Why?
a. If the file is directly placed in Hot tier, all fops will be
recorded in HOT tier.

b. If  the file is already present in Cold tier, and if any fop is
carried out, it creates linkto file in Hot tier.
Now, operations like UNLINK, RENAME are captured in Hot tier(by means of linkto file).

This way, we can get both tier's operation in HOT tier itself.

But, We may miss initial Data sync immediately after creating the
file as it is only recording MKNOD. So, if MKNOD encountered
with sticky bit set, queue DATA operation for the corresponding gfid.
(These geo-rep related changes are addressed in this patch: http://review.gluster.org/12326/ )

So, If tier-dht linkto is set, we need to record the corresponding
MKNOD. Earlier this was avoided as it was set as INTERNAL fop.
(This is addressed here in this patch)

Change-Id: I25514fe3e25f68592a8d6361507f8c8a4fcb70b1
BUG: 1266875
Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com>
Reviewed-on: http://review.gluster.org/12417
Reviewed-by: Aravinda VK <avishwan@redhat.com>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
8 years agofeatures/changelog: ignore recording tiering rebalance fops
Saravanakumar Arumugam [Mon, 28 Sep 2015 11:01:54 +0000 (16:31 +0530)]
features/changelog: ignore recording tiering rebalance fops

Recording of tiering rebalance process's fops like Creation
and Deletion of file must be avoided.
Ignore the fops using corresponding pid.

Change-Id: Ifdc7765598d04d033f93e6339e9b188f7566cb65
BUG: 1266875
Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com>
Reviewed-on: http://review.gluster.org/12239
Reviewed-by: Aravinda VK <avishwan@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
8 years agotests: Separate logs for each test
Raghavendra Talur [Sun, 6 Sep 2015 18:54:05 +0000 (00:24 +0530)]
tests: Separate logs for each test

Change-Id: Ib286e3d4d7c432dab8073fce582ccbf723eb31d2
BUG: 1251592
Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-on: http://review.gluster.org/12110
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
8 years agotier: Typo while setting the wrong value of low/hi watermark
hari gowtham [Tue, 27 Oct 2015 06:21:33 +0000 (11:51 +0530)]
tier: Typo while setting the wrong value of low/hi watermark

While setting the wrong value of watermark-hi/low the output
shows "compatiblevalue" whereas it should be "compatible value"

Change-Id: I29c8f9a954928d22e436465f4ebc30bd08640138
BUG: 1275502
Signed-off-by: hari gowtham <hgowtham@redhat.com>
Reviewed-on: http://review.gluster.org/12432
Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
8 years agotests: fix timeout in mount-nfs-auth.t
Jeff Darcy [Mon, 19 Oct 2015 20:05:17 +0000 (16:05 -0400)]
tests: fix timeout in mount-nfs-auth.t

The mount timeout was too short.  The normal configuration-change path
(construct graph, call reconfigure) and the auth-refresh path might in
effect run serially.  Therefore we have to wait for the *sum* of those
two intervals.  As with all too-short-timeout problems, the result was
that the test would run fine most of the time.  However, it has caused
spurious failures on my own patches a half dozen times, and I have a
half dozen other emails about it nuking other people's as well (most
often but not always on NetBSD).

The fix, obviously, is to calculate and use the right timeout value for
NFS mount actions.  Other actions and timeouts have been left alone.
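
The arithmetic behind the fix, as a tiny sketch. The function and parameter
names are hypothetical, and no actual interval values from the test are
assumed.

```python
def nfs_mount_timeout(reconfigure_secs, auth_refresh_secs):
    """Worst case when both paths run back to back: add the intervals."""
    return reconfigure_secs + auth_refresh_secs


def too_short_timeout(reconfigure_secs, auth_refresh_secs):
    """Waiting only for the longer interval is what made the test flaky."""
    return max(reconfigure_secs, auth_refresh_secs)
```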

Change-Id: Ic8f013c8c830e33c48bcc6d1b603d6d22a8ba3c5
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/12396
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
8 years agofeatures/shard: Support geo-rep for sharded volume
Kotresh HR [Thu, 24 Sep 2015 10:12:14 +0000 (15:42 +0530)]
features/shard: Support geo-rep for sharded volume

Approach:
      Shard xlator on the slave side is bypassed for all the fops
to the geo-rep mount. So each shard on master is considered a
separate file for geo-rep, which syncs them separately to the
slave. The extended attribute in which shard maintains the
size is also synced from master; shard on slave doesn't
calculate it by itself.

Pre-requisites:
      1. If master is sharded volume, slave also should be sharded.
      2. Slave's shard configurations should be same as master.
      3. Geo-rep config of xattr sync should not be disabled.

All other dependent patches:
      1. http://review.gluster.org/#/c/12205/
      2. http://review.gluster.org/#/c/12206/
      3. http://review.gluster.org/#/c/12225/
      4. http://review.gluster.org/#/c/12226/

Change-Id: I474220d69fa030b1e06a4fa0868c34fabe02efcf
BUG: 1265148
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: http://review.gluster.org/12228
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
8 years agogeo-rep: Avoid cold tier bricks during ENTRY operation
Saravanakumar Arumugam [Wed, 14 Oct 2015 06:19:49 +0000 (11:49 +0530)]
geo-rep: Avoid cold tier bricks during ENTRY operation

This is a series of patches which aims to fix geo-replication
in a Tiering Volume.

Problem:
Consider, a file is placed in volume initially and then hot tier is
attached. During any operation on the file, due to lookup a linkto
file is created in hot tier.

Now, any namespace operation carried out on the file is recorded in
both cold and hot tier.
There is room for races when both changelogs are replayed.

Solution:
So, We are going to replay (namespace related)operations
only in the hot tier.

Why?
a. If the file is directly placed in Hot tier , all fops will be
recorded in HOT tier.
b. If  the file is already present in Cold tier, and if any fop is
carried out, it creates linkto file in Hot tier.
Now, operations like UNLINK, RENAME are captured in Hot
tier(by means of linkto file).

This way, we can get both tier's operation in HOT tier itself.

Now, once the file is demoted to COLD tier, any namespace operation
carried out on the cold tier can be avoided as we directly RECORD
the same in HOT tier.

How?
1. Check whether the brick is cold tier and skip ENTRY operation.
2. Also, if it is a cold tier brick, use Xsync (which is used during the initial run).
   This helps in getting all cold tier brick changes using a file system crawl
   and in avoiding races with the hot tier brick (which can happen
   if historychangelog is used in the cold tier brick).
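
The two steps above can be sketched as follows. This is an illustration, not
the geo-rep worker code: the function names, the (kind, gfid) tuples and the
"changelog" label for hot tier bricks are assumptions of the sketch.

```python
def ops_to_replay(ops, is_cold_tier):
    """Filter (kind, gfid) pairs; kind is "ENTRY", "DATA" or "META"."""
    if is_cold_tier:
        # Namespace operations are already recorded on the hot tier.
        return [op for op in ops if op[0] != "ENTRY"]
    return list(ops)


def crawl_for_brick(is_cold_tier):
    """Pick the crawl used to discover changes on a brick."""
    # "changelog" for hot tier bricks is an assumption of this sketch.
    return "xsync" if is_cold_tier else "changelog"
```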

Dependent patches:
1. http://review.gluster.org/12239
2. http://review.gluster.org/12326

Change-Id: I7692b1dbb8813a7e253451bca02f8f09a5782dde
BUG: 1266875
Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com>
Reviewed-on: http://review.gluster.org/12355
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Aravinda VK <avishwan@redhat.com>
8 years agomount/fuse: use a queue instead of pipe to communicate with thread
Raghavendra G [Tue, 20 Oct 2015 10:57:14 +0000 (16:27 +0530)]
mount/fuse: use a queue instead of pipe to communicate with thread
doing inode/entry invalidations.

Writing to pipe can block if pipe is full. This can lead to deadlocks
in some situations. Consider following situation:

1. Kernel sends a write on an inode. Client is waiting for a response
   to write from brick.
2. A lookup happens on behalf of different application/thread on the
   same inode. In response, mdc tries to invalidate the inode.
3. fuse_invalidate_inode is called. It writes an invalidation request
   to pipe. Another thread which reads from this pipe writes the
   request to /dev/fuse. The invalidate code in fuse-kernel-module,
   tries to acquire lock on all pages for the inode and is blocked as
   a write is in progress on same inode (step 1)
4. Now, poller thread is blocked in invalidate notification and cannot
   receive any messages from same socket (on which lookup response
   came). But client is expecting a response for write from same
   socket (again step1) and we've a deadlock.

The deadlock can be solved in two ways:
1. Use a queue (and a condition variable for notifications) to pass
   invalidation requests from poller to invalidate thread. This is a
   variant of using non-blocking pipe, but doesn't have any limit on the
   amount of data (worst case we run out of memory and error out).

2. Allow events from sockets, immediately after we read one
   rpc-msg. Currently we disallow events till that rpc-msg is read from
   socket, processed and handled by higher layers. That way we won't run
   into this kind of issue. Also, it'll increase parallelism in
   reading from sockets.

This patch implements solution 1 above.
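
The shape of solution 1 can be sketched with a queue guarded by a condition
variable. This is a Python illustration of the technique only; the class and
function names are hypothetical and the real change lives in the C fuse
bridge.

```python
import threading
from collections import deque


class InvalidationQueue:
    """Unbounded queue: put() never blocks on a full buffer."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def put(self, request):
        # Called by the poller thread; only holds the lock briefly.
        with self._cond:
            self._items.append(request)
            self._cond.notify()

    def get(self):
        # Called by the invalidate thread; sleeps until work arrives.
        with self._cond:
            while not self._items:
                self._cond.wait()
            return self._items.popleft()


def run_invalidator(queue, deliver):
    """Drain the queue until a None sentinel is seen."""
    while True:
        request = queue.get()
        if request is None:
            break
        deliver(request)
```

Unlike a pipe, the producer side cannot stall behind a slow consumer, so the
poller thread stays free to read the write response from the socket. The
trade-off is unbounded memory growth in the worst case.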

Change-Id: I8e8199fd7f4da9eab46a719d9292f35c039967e1
BUG: 1273387
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/12402

8 years agogeo-rep: Add data operation if mknod with tier attribute
Saravanakumar Arumugam [Fri, 9 Oct 2015 14:59:30 +0000 (20:29 +0530)]
geo-rep: Add data operation if mknod with tier attribute

This is a series of patches which aims to fix geo-replication
in a Tiering Volume.

Problem:
Consider, a file is placed in volume initially and then hot tier is
attached. During any operation on the file, due to lookup a linkto
file is created in hot tier.

Now, any namespace operation carried out on the file is recorded in
both cold and hot tier.
There is room for races when both changelogs are replayed.

Solution:
So, We are going to replay (namespace related)operations
only in the hot tier.

Why?
a. If the file is directly placed in Hot tier, all fops will be
recorded in HOT tier.

b. If  the file is already present in Cold tier, and if any fop is
carried out, it creates linkto file in Hot tier.
Now, operations like UNLINK, RENAME are captured in Hot tier(by means of linkto file).

This way, we can get both tier's operation in HOT tier itself.

But, We may miss initial Data sync immediately after creating the
file as it is only recording MKNOD. So, if MKNOD encountered
with sticky bit set, queue DATA operation for the corresponding gfid.
(This is addressed here in this patch)

So, If tier-gfid linkto is set, we need to record the corresponding
MKNOD. Earlier this was avoided as it was set as INTERNAL fop.
(These changelog related changes are addressed in the patch:
 - http://review.gluster.org/12417)
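
A minimal sketch of the MKNOD rule described above, assuming each changelog
record exposes the gfid and the file mode. The function name and the tuple
layout are hypothetical; `stat.S_ISVTX` is the standard sticky-bit constant.

```python
import stat


def ops_for_mknod(gfid, mode):
    """Return the operations to queue for one MKNOD record."""
    ops = [("MKNOD", gfid)]
    if mode & stat.S_ISVTX:
        # Sticky bit set: also queue a DATA sync for this gfid so the
        # initial data is not missed.
        ops.append(("DATA", gfid))
    return ops
```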

Change-Id: I2fa84cfa2b0f86506c3d15d484138ab9651e4f83
BUG: 1266875
Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com>
Reviewed-on: http://review.gluster.org/12326
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
Reviewed-by: Aravinda VK <avishwan@redhat.com>
8 years agoafr: wind writes only on subvols where preop succeeded
Ravishankar N [Fri, 23 Oct 2015 05:49:30 +0000 (11:19 +0530)]
afr: wind writes only on subvols where preop succeeded

1. Call local->transaction.wind() only on subvols where pre-op
succeeded.

2. Update op_errno in afr_changelog_cbk call path. This fixes a bug in
commit 7945121dda340ec8f25711b2ad3ca70b544de967 where we return EUCLEAN
to the application if pre-op fails on all bricks.

Change-Id: Iab8776e49a992e7a255314bba542742f7607f3ec
BUG: 1272362
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/12415
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
8 years agotier/ctr: Correcting the internal fop calculation
Joseph Fernandes [Fri, 23 Oct 2015 06:57:32 +0000 (12:27 +0530)]
tier/ctr: Correcting the internal fop calculation

Correcting the internal fop calculation method, as it had wrong logic.

Change-Id: I1d0b40a1e27548147203ddd503794059652ac049
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/12418
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agolibglusterfs: replace default functions with generated versions
Jeff Darcy [Tue, 6 Oct 2015 17:19:01 +0000 (13:19 -0400)]
libglusterfs: replace default functions with generated versions

Replacing repetitive code like this with code generated from a more
compact "canonical" definition carries several advantages.

 * Ease the process of adding new fops (e.g. GF_FOP_IPC).

 * Ease the process of making global changes to existing fops (e.g.
   adding "xdata").

 * Ensure strict consistency between all of the pieces that must be
   compatible with each other, through both kinds of changes.

What we have right now is just a start.  The above benefits will only
truly be realized when we use the same definitions to generate stubs,
syncops, and perhaps even parts of gfapi or glupy.

This same infrastructure can also be used to reduce code duplication and
potential for error in many of our translators.  NSR already uses a
similar technique, using a few hundred lines of templates to generate a
few *thousand* lines of code.  The ability to make a global "aspect"
change (e.g. to quorum checking) in one place instead of seventy has
already been demonstrated there.

Other candidates for code generation include the AFR/EC transaction
infrastructure, or stub creation/resumption in io-threads.
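
To make the idea concrete, here is a small sketch of generating repetitive
functions from a compact table. It is not the generator added by this patch:
the fop table, the template text and the `wind_to_first_child` helper in the
emitted pseudo-C are all invented for illustration.

```python
from string import Template

# Hypothetical canonical definitions: fop name -> extra argument names.
FOPS = {
    "stat": ["loc"],
    "fsync": ["fd", "datasync"],
}

# Pseudo-C body; only the templating technique is the point here.
DEFAULT_TMPL = Template(
    "int default_${name}(frame, this, ${args}, xdata)\n"
    "{\n"
    "    return wind_to_first_child(${name}, ${args}, xdata);\n"
    "}\n"
)


def generate_defaults(fops):
    """Emit one default function per fop, in sorted order."""
    out = []
    for name in sorted(fops):
        args = ", ".join(fops[name])
        out.append(DEFAULT_TMPL.substitute(name=name, args=args))
    return "\n".join(out)
```

Adding a fop then means adding one table entry, and a global change such as
a new trailing argument means editing the template once.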

Change-Id: If7d59de7a088848b557f5aea00741b4fe19017c1
BUG: 1271325
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/9411
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
8 years agotests/tier: Move common functions to tier.rc
N Balachandran [Tue, 20 Oct 2015 16:53:17 +0000 (22:23 +0530)]
tests/tier:  Move common functions to tier.rc

Move common functions in tier .t files to tier.rc

Change-Id: Ibc312d987be9d93e7cc7fc47d0bf598bb1c944c2
BUG: 1272319
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12404
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agocluster/tier: add pause tier for snapshots
Dan Lambright [Mon, 5 Oct 2015 19:52:02 +0000 (19:52 +0000)]
cluster/tier: add pause tier for snapshots

Snaps of tiered volumes cannot handle files undergoing migration.
We implement a helper mechanism to "pause" migration. Any files
undergoing migration are aborted. Clean up is done to remove
sticky bits and data at the destination. Migration is restarted
after snap completes.

For testing an internal switch is added. It is not exposed externally.

gluster volume set vol1 tier-pause [true|false]

Change-Id: Ia85bbf89ac142e9b7e73fcbef98bb9da86097799
BUG: 1267950
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12304
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
8 years agoTier/cli: removing warning message for tiering
hari gowtham [Wed, 21 Oct 2015 06:42:09 +0000 (12:12 +0530)]
Tier/cli: removing warning message for tiering

The warning message for tiering being under experimental status is removed.

Change-Id: I7d1d535d380b672c70f03ecc0d24a113600ea43f
BUG: 1273726
Signed-off-by: hari gowtham <hgowtham@redhat.com>
Reviewed-on: http://review.gluster.org/12407
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agocluster/dht : op_ret not set correctly in dht_fsync_cbk
N Balachandran [Tue, 20 Oct 2015 10:23:15 +0000 (15:53 +0530)]
cluster/dht : op_ret not set correctly in dht_fsync_cbk

local->op_ret was not set correctly in dht_fsync_cbk in case
of files being migrated

Change-Id: If73ae04368ea0c7f6868c8704dfc2deb2faee753
BUG: 1273372
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12401
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
8 years agocluster/tier do not abort migration if a single brick is down
Dan Lambright [Tue, 20 Oct 2015 00:42:56 +0000 (20:42 -0400)]
cluster/tier do not abort migration if a single brick is down

When bricks are down, promotion/demotion should still be possible.
For example, if an EC brick is down, the other bricks are able to
recover the data and migrate it.

Change-Id: I8e650c640bce22a3ad23d75c363fbb9fd027d705
BUG: 1273215
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12397
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Joseph Fernandes
8 years agotests: return success if the last test ends up with core and a bad test
Atin Mukherjee [Fri, 9 Oct 2015 16:41:23 +0000 (22:11 +0530)]
tests: return success if the last test ends up with core and a bad test

Change-Id: Ie2695ebff8678851edb6b0b6e1de37e1f5ec9077
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-on: http://review.gluster.org/12328
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
8 years agosnapshot: Fix snapshot clone postvalidate
Avra Sengupta [Thu, 15 Oct 2015 10:38:03 +0000 (16:08 +0530)]
snapshot: Fix snapshot clone postvalidate

In glusterd_snapshot_clone_postvalidate(), we were deleting
the snap object and snap vol by looking up snapname. Hence, it
was deleting the original snapshot from which the clone was
being created.

Instead it should fetch the clonename, the respective
clone vol, and its corresponding snap object, and delete them.

Also glusterd_snap_remove() needs to differentiate a clone
snap object from a snapshot snap object, as in case of a clone
snap object, we don't have any persisted data in
/var/run/gluster/snaps/ and hence it shouldn't try to delete
anything there.

Change-Id: I02bb22a3898d5720e318a02d6cc32d25f75d317d
BUG: 1272339
Signed-off-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-on: http://review.gluster.org/12364
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
8 years agoafr: do not wind write if pre-op fails on all children
Ravishankar N [Fri, 16 Oct 2015 00:53:29 +0000 (06:23 +0530)]
afr: do not wind write if pre-op fails on all children

1. When winding the pre-op, transaction.pre_op[i] is set. If the pre-op fails,
transaction.failed_subvols[i] is set. If it fails on all children, we can
directly proceed to unlock (via afr_changelog_post_op_now) without trying
to wind the write, fail and then go to unlock.

2. 'fop_subvols' seems to be an unused variable, hence removing it.
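
Point 1 can be sketched as below, modelling transaction.pre_op and
transaction.failed_subvols as per-child boolean lists. The function names and
the returned step labels are hypothetical; the real logic is in the AFR
transaction code.

```python
def subvols_to_wind(pre_op, failed_subvols):
    """Indices of children where the pre-op was wound and did not fail."""
    return [
        i
        for i, (wound, failed) in enumerate(zip(pre_op, failed_subvols))
        if wound and not failed
    ]


def next_step(pre_op, failed_subvols):
    """Choose between winding the write and going straight to unlock."""
    targets = subvols_to_wind(pre_op, failed_subvols)
    if not targets:
        # Pre-op failed everywhere: skip the write entirely.
        return ("unlock", [])
    return ("wind-write", targets)
```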

Change-Id: I9525628daf48082e979b0093fa0478934495e61f
BUG: 1272362
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/12368
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Anuradha Talur <atalur@redhat.com>
8 years agofeatures/snap : cleanup the root loc in statfs
Ashish Pandey [Tue, 8 Sep 2015 06:57:50 +0000 (12:27 +0530)]
features/snap : cleanup the root loc in statfs

Problem : In svc_statfs function, wipe_loc is getting called on the loc
          passed by nfs. This loc is being used by svc_stat, which
          throws an error if loc->inode is NULL.

Solution : wipe_loc should be called on local root_loc.

Change-Id: I9cc5ee3b1bd9f352f2362a6d997b7b09051c0f68
BUG: 1260848
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: http://review.gluster.org/12123
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
8 years agocluster/tier remove spurious log messages on valid failed migration
Dan Lambright [Mon, 19 Oct 2015 13:04:07 +0000 (09:04 -0400)]
cluster/tier remove spurious log messages on valid failed migration

On a write to a replica volume, we record an entry in every brick's database.
When the tier daemon runs, it will only move the file if it is the true
owner of the file as defined by the XATTR_NODE_UUID_KEY.

Change-Id: Ib82717f87a3f94f3d0d9f969773de9e88d6aaf22
BUG: 1273043
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12391
Reviewed-by: Joseph Fernandes
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
8 years agocluster/tier update man pages for tier feature
Dan Lambright [Fri, 16 Oct 2015 18:16:32 +0000 (14:16 -0400)]
cluster/tier update man pages for tier feature

Add to gluster man pages instructions for tier commands.

Change-Id: I0918460eeaba22bb6a11238d4f5501fa8e61da88
BUG: 1272557
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12380
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
8 years agocluster/tier: Changed tier xattr-name value
N Balachandran [Tue, 13 Oct 2015 11:41:29 +0000 (17:11 +0530)]
cluster/tier: Changed tier xattr-name value

Each tier layer (for future stacking implementations)
must have a unique xattr name. We are currently using
the name of the tier subvolume excluding the volume name.

Change-Id: Id4adea61dc1c8473fb1d4d7364d1940278c6e129
BUG: 1259298
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12350
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agocluster/ec : Remove index entries if file/dir does not exist
Ashish Pandey [Tue, 13 Oct 2015 17:55:59 +0000 (23:25 +0530)]
cluster/ec : Remove index entries if file/dir does not exist

Problem: During write and rebalance, if a brick is down, index
entries will be created. If the same file gets migrated to
another subvol by the rebalance process, these index entries will
remain in the index directory. During heal, these indices should
be removed when we get ENOENT or ESTALE for an index.

Solution: Capture correct errno and take appropriate action
to purge these indices.
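
A minimal Python sketch of the purge rule, assuming a hypothetical lookup callable that returns 0 on success or an errno value on failure. The function name and the (kept, purged) return shape are illustrative only and do not reflect the ec heal code.

```python
import errno


def sweep_index(index_entries, lookup):
    """Split index entries into those to keep and those to purge.

    lookup(gfid) is a hypothetical callable returning 0 on success
    or an errno value on failure.
    """
    kept, purged = [], []
    for gfid in index_entries:
        err = lookup(gfid)
        # ENOENT/ESTALE mean the file no longer exists on this subvol
        # (for example, it was migrated away), so the index is stale.
        if err in (errno.ENOENT, errno.ESTALE):
            purged.append(gfid)
        else:
            kept.append(gfid)
    return kept, purged
```

Any other error, such as ENOTCONN, leaves the index entry in place so that a later heal can retry it.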

Change-Id: I1aad8b99e4df2e139648e3bf971e4cb1c4b38699
Bug: 1271358
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: http://review.gluster.org/12353
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
8 years agoprotocol/server: define the max number of inodes in lru list as a number
Raghavendra Bhat [Mon, 28 Sep 2015 11:17:01 +0000 (16:47 +0530)]
protocol/server: define the max number of inodes in lru list as a number

* The max number of inodes in the lru list of the inode table was being defined
  in terms of memory units (GF_UNIT_MB) instead of number. And the description
  of the option was also referring to it in memory units instead of number.

Change-Id: I48f07e7d2826406697eb2a13714ab22feae81d89
BUG: 1266883
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/12242
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
8 years agocluster/dht : Do not migrate files with POSIX locks held
N Balachandran [Tue, 13 Oct 2015 09:32:00 +0000 (15:02 +0530)]
cluster/dht : Do not migrate files with POSIX locks held

dht_migrate_file does not migrate file locks to the dst file.
Any locks held on the source file are lost once the migration
is complete. This issue is magnified in the case of a tier volume
as file migrations occur more frequently and repeatedly as compared
to a DHT rebalance.

The fix makes 2 changes:
1. Before starting the actual migration process, check if there are
 any locks held on the file. If yes, do not migrate the file.
2. The rebalance process tries to lock on the entire file just before
 moving into the Phase 2 of the file migration. If the lock acquisition
fails, the file migration does not proceed.
If the lock is granted, the file migration proceeds.
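
The two checks can be sketched in Python as below. This is a toy model under stated assumptions: locks is a hypothetical dict mapping a path to the set of lock owners, phase1_copy stands in for the first migration phase, and the return strings are illustrative. It is not the dht_migrate_file implementation.

```python
def migrate_file(path, locks, phase1_copy, owner="rebalance"):
    """Toy model of the two lock checks described above."""
    # Change 1: refuse to start if any lock is already held.
    if locks.get(path):
        return "skipped"

    phase1_copy(path)

    # Change 2: try to lock the entire file before Phase 2. In this
    # model the attempt fails if anyone took a lock during phase 1.
    holders = locks.setdefault(path, set())
    if holders:
        return "aborted"
    holders.add(owner)
    try:
        # Phase 2 (switching clients over to the dst file) would run here.
        return "migrated"
    finally:
        holders.discard(owner)
```

The model does not cover the remaining window mentioned in the next paragraph, where locks are requested on the linkto file and the dst data file separately.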

This still leaves a small window where conflicting locks can be granted to
different clients. If client1 requests a lock on the src file just after
it is converted to a linkto file and client2 requests a lock on the dst
data file, they will both be granted, but all FOPs will be redirected
to the dst data file. This issue will be taken up in a subsequent patch.

Change-Id: I8c895fc3cced50dd2894259d40a827c7b43d58ac
BUG: 1271148
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12347
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agolibglusterfs: pass buffer size to gf_store_read_and_tokenize function
Gaurav Kumar Garg [Tue, 13 Oct 2015 09:10:55 +0000 (14:40 +0530)]
libglusterfs: pass buffer size to gf_store_read_and_tokenize function

Previously, if a user set an option where the length of key=value went beyond
PATH_MAX (4096) characters, tokenizing the option while reading the
configuration file would fail.
This is because fgets was restricted to reading a maximum
of PATH_MAX (4096) characters.
As a consequence, when the user tried to restart glusterd after setting a
key=value longer than PATH_MAX (4096) characters, glusterd would not restart.

With this fix, instead of using PATH_MAX, the consumer of the
gf_store_read_and_tokenize function decides the size of the buffer.
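
The effect of a caller-supplied buffer size can be illustrated with a small Python sketch that mimics fgets (at most buf_size - 1 characters per read). The function name, the error handling, and the use of io.StringIO are assumptions made for the illustration; the real function is C code in libglusterfs.

```python
import io


def read_and_tokenize(stream, buf_size):
    """Read one 'key=value' line, fgets-style, and split it."""
    line = stream.readline(buf_size - 1)
    if not line:
        return None  # end of file
    if not line.endswith("\n") and stream.read(1) != "":
        # The line did not fit: more characters follow on the same line.
        raise ValueError("line longer than buffer")
    key, sep, value = line.rstrip("\n").partition("=")
    if not sep:
        raise ValueError("missing '=' separator")
    return key, value
```

A 5000-character value fails with a 4096-byte buffer but parses once the caller passes a larger size, for example read_and_tokenize(io.StringIO(line), 8192).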

Change-Id: I655a8ce982effdfff8f3e785ea31f543dbe39301
BUG: 1271150
Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
Reviewed-on: http://review.gluster.org/12346
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Nekkunti <anekkunt@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
8 years agocli/quota : rm -rf on /<mountpoint>/<dir> is not showing quota header
Manikandan Selvaganesh [Tue, 13 Oct 2015 08:58:03 +0000 (14:28 +0530)]
cli/quota : rm -rf on /<mountpoint>/<dir> is not showing quota header

Currently, when the 'gluster v quota <VOLNAME> list' command is issued
after an rm -rf on /run/gluster/vol/<directory>, the quota output header is
not shown. This is because list_count was calculated correctly for
'gluster v quota <VOLNAME> remove /path' but not for an rm -rf. The patch
fixes this issue.

Change-Id: I5266a8b0b9322b7db1b9e1d6b0327065931f4bcb
BUG: 1269375
Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Reviewed-on: http://review.gluster.org/12345
Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaushal M <kaushal@redhat.com>
8 years agocli : freeing the allocated memory
Manikandan Selvaganesh [Mon, 12 Oct 2015 09:33:28 +0000 (15:03 +0530)]
cli : freeing the allocated memory

Change-Id: Ibcbad94c091a9c24fe5aff2d7e8bcd9ac88da7bf
BUG: 1248521
Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Reviewed-on: http://review.gluster.org/12337
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com>
Reviewed-by: Kaushal M <kaushal@redhat.com>
8 years agoglusterd: disabling enable-shared-storage option should not delete volume
Gaurav Kumar Garg [Thu, 24 Sep 2015 12:34:23 +0000 (18:04 +0530)]
glusterd: disabling enable-shared-storage option should not delete volume

Previously, if a user created a volume named "glusterd_shared_storage"
and then disabled the enable-shared-storage option, gluster would
delete the "glusterd_shared_storage" volume.

With this fix, gluster validates the enable-shared-storage option
appropriately and will not delete a volume named
"glusterd_shared_storage" if it is a user-created volume.

Change-Id: I2bd92f938fb3de6ef496a934933bdcea9f251491
BUG: 1266818
Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
Reviewed-on: http://review.gluster.org/12232
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Anand Nekkunti <anekkunt@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
8 years agotier/shd: inline warning when compiled with gcc v.5
Mohammed Rafi KC [Mon, 12 Oct 2015 11:52:20 +0000 (17:22 +0530)]
tier/shd: inline warning when compiled with gcc v.5

Change-Id: I487a26263d6e940eed364a831e99f9b8390bc96a
BUG: 1226881
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/12342
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Anoop C S <anoopcs@redhat.com>
Tested-by: Anoop C S <anoopcs@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agofeatures/shard: Return ENOTSUP as opposed to ENOTCONN in unimplemented fops
Krutika Dhananjay [Wed, 29 Jul 2015 08:57:07 +0000 (14:27 +0530)]
features/shard: Return ENOTSUP as opposed to ENOTCONN in unimplemented fops

Change-Id: Idba1070b11c5c1de26ef57e6843c93c105b8b8a5
BUG: 1270694
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/12340
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
8 years agofeatures/shard: Dump private members and addresses in statedump
Krutika Dhananjay [Mon, 12 Oct 2015 07:42:46 +0000 (13:12 +0530)]
features/shard: Dump private members and addresses in statedump

Change-Id: I3c5e5bd93288c4c9a2665a26c0d6a76e67ecf914
BUG: 1270694
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/12334
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
8 years agoprotocol/client: give preference to loc->gfid over inode->gfid
Ravishankar N [Wed, 16 Sep 2015 11:05:19 +0000 (16:35 +0530)]
protocol/client: give preference to loc->gfid over inode->gfid

There are xlators which perform fops even before the inode gets linked. Because of
this, loc.gfid is preferred at the time of inodelk/entrylk, but by the time
unlock can happen, the inode could be linked with a different gfid than the one in
loc.gfid (because of the way dht was giving preference). Due to this, unlock goes
to a different inode than the one we sent inodelk on, which leads to a hang.
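
The preference rule in the title can be sketched in Python as follows. The function name and the use of uuid.UUID for gfids are assumptions for illustration; the actual change is in the C protocol/client code.

```python
import uuid

NULL_GFID = uuid.UUID(int=0)


def gfid_for_lock(loc_gfid, inode_gfid):
    """Pick the gfid to send for both inodelk and the matching unlock.

    Preferring loc.gfid whenever it is set keeps lock and unlock on the
    same gfid even if the inode gets linked with another gfid in between.
    """
    if loc_gfid != NULL_GFID:
        return loc_gfid
    return inode_gfid
```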

Credits to Pranith for the fix.

Change-Id: I7d162d44852ba876f35aa1bb83e4afdb184d85b9
BUG: 1266834
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/12233
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
8 years agotier/shd: make shd commands compatible with tiering
Mohammed Rafi KC [Tue, 8 Sep 2015 07:34:45 +0000 (13:04 +0530)]
tier/shd: make shd commands compatible with tiering

Tiering volfiles may contain afr and disperse together,
or multiple times, based on configuration. The
information for those configurations is stored in
tier_info. So most of the volgen code generation
needs to be changed to make it compatible with it.

Change-Id: I563d1ca6f281f59090ebd470b7fda1cc4b1b7e1d
BUG: 1261276
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/12135
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
8 years agoquota/marker: dir_count accounting is not atomic
vmallika [Wed, 7 Oct 2015 09:54:46 +0000 (15:24 +0530)]
quota/marker: dir_count accounting is not atomic

Consider the scenario below:

Quota is enabled on pre-existing data.
The quota-crawl process now starts healing xattrs.
If a write is performed where healing is not complete, there is a
possibility that 'update txn' is started before 'create xattr txn'; in
this case dir count can be missed on a dir where the quota size xattr is not
yet created.

One solution is to get the size xattr and, if the xattr is missing, add 1 for
dir_count; this requires one additional fop if done in marker during
each update iteration.
A better solution is to use xattrop GF_XATTROP_ADD_ARRAY64_WITH_DEFAULT.
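
The idea of an add-with-default operation can be sketched in Python as below. The semantics shown (start from the supplied default when the xattr is missing, then add the delta element-wise) are an assumption inferred from the name and from the description above. The function name, the dict-based xattr store, the key, and the (size, file_count, dir_count) layout are all hypothetical.

```python
def xattrop_add_with_default(xattrs, key, delta, default):
    """Add delta element-wise to xattrs[key] in a single operation.

    If the xattr is missing, start from 'default' instead of failing,
    which avoids the extra 'get xattr' round trip described above.
    """
    current = xattrs.get(key, default)
    xattrs[key] = [c + d for c, d in zip(current, delta)]
    return xattrs[key]
```

For a directory whose size xattr has not been created yet, a default of (0, 0, 1) would account for the directory itself the first time an update lands.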

Change-Id: Idc8978860a3914e70c98f96effeff52e9a24e6ba
BUG: 1243798
Signed-off-by: vmallika <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/11694
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
8 years agotier/shd: create shd volfile for tiering
Mohammed Rafi KC [Thu, 20 Aug 2015 06:49:51 +0000 (12:19 +0530)]
tier/shd: create shd volfile for tiering

Currently the shd graph will only start if the volume is a replicate
or disperse volume. But in the case of tiering, the volume type
will be tier. So we need to start shd if either the cold
or the hot tier is compatible with shd.

Change-Id: Ic689746ac7d2fc6a9eccdabd8518dc9139829de2
BUG: 1261276
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/11962
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
8 years agotier/ctr: CTR DB named lookup heal of cold tier during attach tier
Joseph Fernandes [Tue, 4 Aug 2015 15:08:06 +0000 (20:38 +0530)]
tier/ctr: CTR DB named lookup heal of cold tier during attach tier

Heal hardlinks in the db for already existing data in the cold
tier during attach tier, i.e. during fix layout do a lookup on files
in the cold tier.

The CTR xlator on the brick/server side does a db update/insert of the hardlink on a named lookup.
Currently the named lookup is done synchronously with the fix layout that is
triggered by attach tier. This is not performant, adding more time to
fix layout. The performant approach is to record the hardlinks in a compressed
datastore and then do the named lookup asynchronously later, giving the ctr db
eventual consistency.

Change-Id: I4ffc337fffe7d447804786851a9183a51b5044a9
BUG: 1252586
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/11828
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agocluster/tier: add watermarks and policy driver
Dan Lambright [Fri, 18 Sep 2015 04:49:06 +0000 (00:49 -0400)]
cluster/tier: add watermarks and policy driver

This fix introduces infrastructure to support different
policies for promotion and demotion.

Currently the tier feature automatically promotes and demotes
files periodically based on access. This is good for testing
but too stringent for most real workloads. It makes it
difficult to fully utilize a hot tier: data will be demoted
before it is touched, as it's unlikely a 100GB hot SSD will have
all its data touched in a window of time.

A new parameter "mode" allows the user to pick promotion/demotion
policies.

The "test mode" will be used for *.t and other general testing.
This is the current mechanism.

The "cache mode" introduces watermarks. The watermarks
represent levels of data residing on the hot tier.

"cache mode" policy:

The % the hot tier is full is called P.

Do not promote or demote more than D MB or F files.

A random number [0-100] is called R.

Rules for migration:

if (P < watermark_low) don't demote, always promote.

if (P >= watermark_low) && (P < watermark_hi) demote if R < P; promote if R > P.

if (P > watermark_hi) always demote, don't promote.
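
The migration rules above can be sketched in Python as follows. The function name and the (promote, demote) return tuple are hypothetical. The rules as written do not cover P == watermark_hi, so the sketch assumes that case behaves like P > watermark_hi.

```python
import random


def tier_decision(p, watermark_low, watermark_hi, r=None):
    """Return (promote, demote) for a hot tier that is p percent full."""
    if r is None:
        r = random.randint(0, 100)  # the random number R in [0, 100]
    if p < watermark_low:
        return (True, False)  # always promote, never demote
    if p < watermark_hi:
        # Between the watermarks: the fuller the hot tier, the more
        # likely a demotion and the less likely a promotion.
        return (r > p, r < p)
    # Assumption: P == watermark_hi is handled like P > watermark_hi.
    return (False, True)  # always demote, never promote
```

The limits of D MB and F files per cycle are not modeled here; they would cap how many of the selected files actually move.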

gluster volume set {vol} cluster.watermark-hi %
gluster volume set {vol} cluster.watermark-low %
gluster volume set {vol} cluster.tier-max-mb {D}
gluster volume set {vol} cluster.tier-max-files {F}
gluster volume set {vol} cluster.tier-mode {test|cache}

Change-Id: I157f19667ec95aa1d53406041c1e3b073be127c2
BUG: 1257911
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12039
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
8 years agoPorting developer guide to source code repo from glusterdocs project
Humble Devassy Chirammal [Thu, 24 Sep 2015 09:23:52 +0000 (14:53 +0530)]
Porting developer guide to source code repo from glusterdocs project

Change-Id: Ib8d9c668ebb05863918e6ec2b89908f206626f38
BUG: 1206539
Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com>
Reviewed-on: http://review.gluster.org/12227
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
Tested-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
Tested-by: Raghavendra Talur <rtalur@redhat.com>
8 years agocluster/tier: fix transport endpoint not connected in tier.t (rare)
Dan Lambright [Fri, 9 Oct 2015 16:18:03 +0000 (12:18 -0400)]
cluster/tier: fix transport endpoint not connected in tier.t (rare)

The script did not cleanly unmount/mount gluster and change the current
working directory when stopping and starting the volume. Most of the
time this problem would self-resolve before subsequent tests, but
very occasionally races would lead to the errors/failures.

Change-Id: I128b913a71e2745512ee81c3d71852311e3b4a1b
BUG: 1270328
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12327
Reviewed-by: Joseph Fernandes
Tested-by: Gluster Build System <jenkins@build.gluster.com>
8 years agoglusterfsd: Initialize ctx, cmd_args
Pranith Kumar K [Wed, 7 Oct 2015 13:09:42 +0000 (18:39 +0530)]
glusterfsd: Initialize ctx, cmd_args

Change-Id: I9c71ae264665b7bba609c7f86cf42a52a6b47260
BUG: 1269696
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/12311
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
8 years agocluster/ec: Implement gfid-hash read-policy
Pranith Kumar K [Tue, 8 Sep 2015 10:53:36 +0000 (16:23 +0530)]
cluster/ec: Implement gfid-hash read-policy

Add a policy in ec to perform reads from the same bricks as long as they
are good. Based on the gfid of the file/directory, it determines the
bricks to be considered for reading.
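
One way such a policy could work is sketched below in Python. The hash (gfid as an integer modulo the brick count) and the forward scan for good bricks are assumptions chosen for illustration; the commit message does not describe the actual hash or selection order used by ec.

```python
import uuid


def pick_read_bricks(gfid, good, k):
    """Choose k good bricks for reads, starting at a gfid-derived index.

    good is a list of booleans, one per brick; k is the number of
    bricks needed to serve a read. Returns None if fewer than k
    bricks are good.
    """
    n = len(good)
    start = uuid.UUID(gfid).int % n
    chosen = []
    for i in range(n):
        idx = (start + i) % n
        if good[idx]:
            chosen.append(idx)
            if len(chosen) == k:
                return chosen
    return None
```

Because the starting index depends only on the gfid, repeated reads of the same file land on the same bricks for as long as those bricks stay good.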

Change-Id: Ic97b5c54c086a28b5e07a330a4fd448551b49376
BUG: 1261260
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/12133
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
8 years agogfapi: xattr key length check to avoid brick crash
Milind Changire [Tue, 22 Sep 2015 13:00:22 +0000 (18:30 +0530)]
gfapi: xattr key length check to avoid brick crash

Added check to test if xattr key length > max allowed for OS
distribution and return:
EINVAL if xattr name pointer is NULL or 0 length
ENAMETOOLONG if xattr name length > max allowed for distribution

Typically the VFS does this in the kernel for us.  But since we are
bypassing the VFS by providing the libgfapi to talk directly to the
brick process, we need to add such checks.
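
The check can be sketched in Python as below. The 255-byte limit is the Linux XATTR_NAME_MAX value and is an assumption here, since the limit depends on the platform; the function name is hypothetical and this is not the gfapi C code.

```python
import errno

XATTR_NAME_MAX = 255  # Linux value; assumed here, other platforms differ


def check_xattr_name(name):
    """Return 0 if the xattr name is acceptable, else an errno value."""
    if name is None or len(name) == 0:
        return errno.EINVAL
    if len(name.encode()) > XATTR_NAME_MAX:
        return errno.ENAMETOOLONG
    return 0
```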

Change-Id: I610a8440871200ae4640351902b752777a3ec0c2
BUG: 1263056
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: http://review.gluster.org/12207
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
8 years agofeatures/shard: Regulate memory consumption by individual shards' inode_t objects
Krutika Dhananjay [Tue, 29 Sep 2015 09:43:37 +0000 (15:13 +0530)]
features/shard: Regulate memory consumption by individual shards' inode_t objects

The shard translator will now maintain a constant-size lru list of inodes
associated with individual shards, and will make sure that at no point the
number of these inodes exceeds the configured limit.
This is to keep the memory consumption by the thousands of shards of every large
file from exploding.
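
A bounded lru list of this kind can be sketched in Python with an OrderedDict. The class and method names are hypothetical, and the eviction shown (dropping the least recently used entry when the limit is reached) is the generic LRU technique rather than the shard translator's exact C implementation.

```python
from collections import OrderedDict


class ShardInodeLRU:
    """Keep at most 'limit' shard inodes, evicting the least recently used."""

    def __init__(self, limit):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.inodes = OrderedDict()

    def touch(self, shard_id, inode):
        """Add or refresh an entry; return the evicted shard id, if any."""
        if shard_id in self.inodes:
            self.inodes.move_to_end(shard_id)  # now most recently used
            self.inodes[shard_id] = inode
            return None
        evicted = None
        if len(self.inodes) >= self.limit:
            evicted, _ = self.inodes.popitem(last=False)  # oldest entry
        self.inodes[shard_id] = inode
        return evicted
```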

Change-Id: I5e60eea5dcf3130257fb431ca70cfaba53cae7f3
BUG: 1252263
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/12254
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
8 years agotiering/glusterd: keep afr/ec xlators name constant
Mohammed Rafi KC [Mon, 7 Sep 2015 09:16:33 +0000 (14:46 +0530)]
tiering/glusterd: keep afr/ec xlators name constant

afr uses the translator name for locking purposes,
so it is mandatory to keep afr/ec xlator names constant
across graph changes.

Currently, when a tier is attached, afr names are appended
with either hot or cold, which breaks the above-mentioned
constraint.

Change-Id: I3699dcdaa8190bab3ba81cbc01e8fa126d37ba0d
BUG: 1261276
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/12134
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
8 years agofeature/quota: Make message-id for quota start from 120000
Susant Palai [Wed, 23 Sep 2015 09:01:47 +0000 (05:01 -0400)]
feature/quota: Make message-id for quota start from 120000

Change-Id: I2076fcab51f4ecc529dffd89ca6ee9eb99d80f09
BUG: 1265531
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/12218
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
8 years agoquota: fix crash in quota_fallocate
vmallika [Thu, 8 Oct 2015 07:07:42 +0000 (12:37 +0530)]
quota: fix crash in quota_fallocate

The list head was not initialized and the brick
was crashing with fallocate.

This patch fixes the issue.

Change-Id: I9757b88eab61054892f0fe3de63af2683cd4fef7
BUG: 1269754
Signed-off-by: vmallika <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/12314
Reviewed-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
8 years agotier/ctr: Solution for db locks for tier migrator and ctr using sqlite version less...
Joseph Fernandes [Fri, 18 Sep 2015 14:27:54 +0000 (19:57 +0530)]
tier/ctr: Solution for db locks for tier migrator and ctr using sqlite version less than 3.7 i.e rhel 6.7

Problem: On RHEL 6.7, we have sqlite version 3.6.2, which doesn't support
WAL journaling mode, as this journaling mode is only available in sqlite 3.7 and above.
As a result we cannot have two processes concurrently accessing sqlite without
running into db locks! WAL is also needed for performance on the CTR side.

Solution: Use the CTR db connection for doing queries when WAL mode is
absent, i.e. the tier migrator will send sync_op ipc calls to CTR, which in turn will
do the query and create/update the query file suggested by the tier migrator.

Pending: This solution will stop the db locks, but performance is still an issue for CTR.
We are developing an in-Memory Transaction Log (iMeTaL) which will help boost CTR
performance by doing in-memory updates on the IO path and later flushing the updates to
the db in a batch/segment flush.

Change-Id: Ie3149643ded159234b5cc6aa6cf93b9022c2f124
BUG: 1240577
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/12191
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Luis Pabon <lpabon@redhat.com>