Erroneous SwapFree on LXC causes problems with CDH install

I was installing CDH 5.7 on Linux Containers (LXC) hosted on Proxmox 4. Everything was going well until Cluster Setup, at which point it failed on Start YARN (MR2 Included):

Completed only 0/1 steps. First failure: Failed to execute command Start on service YARN (MR2 Included)

The log in /var/log/hadoop-yarn/ showed:

org.apache.hadoop.service.AbstractService: Service containers-monitor failed in state INITED; cause: java.lang.NumberFormatException: For input string: "18446744073709550364"
java.lang.NumberFormatException: For input string: "18446744073709550364"

Looking down the stack trace, this came from org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin.readProcMemInfoFile, which the source code shows is reading /proc/meminfo. Looking at this file on each node showed:

[root@cdh57-01-node-02 hadoop-yarn]# cat /proc/meminfo
MemTotal:       24576000 kB  
MemFree:        22123008 kB  
MemAvailable:   22123008 kB  
Buffers:               0 kB  
Cached:          1194376 kB  
SwapCached:            0 kB  
Active:         73536116 kB  
Inactive:       21903364 kB  
Active(anon):   64138128 kB  
Inactive(anon): 11784916 kB  
Active(file):    9397988 kB  
Inactive(file): 10118448 kB  
Unevictable:    26832052 kB  
Mlocked:        26832052 kB  
SwapTotal:             0 kB  
SwapFree:       18446744073709550384 kB  
Dirty:              2008 kB  

Erm ... SwapFree is 16 million petabytes???
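That value cannot be stored in a signed 64-bit integer, whose maximum is 9223372036854775807, which is consistent with the NumberFormatException above if the field is being parsed as a Java long. As a rough sketch (the function names fits_in_long and meminfo_overflows are made up for illustration and are not part of any tool), the following bash flags every field in a meminfo-style file whose value is out of range. It compares digit strings rather than numbers, because bash arithmetic is itself signed 64-bit.

```shell
#!/usr/bin/env bash
# Largest value a signed 64-bit integer (such as a Java long) can hold.
LONG_MAX=9223372036854775807

# Return 0 if the decimal string $1 fits in a signed 64-bit integer, 1 otherwise.
fits_in_long() {
  local v=$1
  [[ $v =~ ^[0-9]+$ ]] || return 1
  # Drop leading zeros so that string length reflects magnitude.
  while [[ ${#v} -gt 1 && $v == 0* ]]; do v=${v#0}; done
  if (( ${#v} != ${#LONG_MAX} )); then
    (( ${#v} < ${#LONG_MAX} ))
    return
  fi
  # Same length: lexical order matches numeric order for digit strings.
  [[ ! $v > $LONG_MAX ]]
}

# Print the name of every field whose value does not fit.
# Reads /proc/meminfo unless another file is given as $1.
meminfo_overflows() {
  local file=${1:-/proc/meminfo} name value rest
  while read -r name value rest; do
    [[ -n $name ]] || continue
    fits_in_long "$value" || printf '%s\n' "${name%:}"
  done < "$file"
}
```

Called with no argument, meminfo_overflows reads /proc/meminfo on the current node; fed the listing above, it would report SwapFree and nothing else.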

In my LXC configuration in Proxmox I'd set zero swap, thinking that disabling swap would be a good idea. Evidently not.

As soon as I updated the container's swap to 128 MB, SwapFree looked better:

[root@cdh57-01-node-02 hadoop-yarn]# cat /proc/meminfo
SwapTotal:        131072 kB  
SwapFree:         129840 kB  
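Comparing the two outputs hints at where the huge number came from. With swap enabled, SwapTotal minus SwapFree is 131072 - 129840 = 1232 kB, and the earlier bogus value, 18446744073709550384, is exactly 2^64 - 1232. That is what a small negative number looks like when printed as an unsigned 64-bit integer, so presumably SwapFree was being derived as total minus used with a total of zero. This is an inference from the numbers, not something confirmed from any source code. A minimal bash sketch of the reinterpretation follows; to_signed64 is a hypothetical helper and assumes a plain decimal input below 2^64.

```shell
#!/usr/bin/env bash
# Reinterpret an unsigned 64-bit decimal string as a signed 64-bit value.
# The number is split into "leading digits" and "last 18 digits" so that
# no intermediate result overflows bash's own signed 64-bit arithmetic.
to_signed64() {
  local v=$1 n hi lo
  if (( ${#v} <= 18 )); then
    # Below 10^18, which is below 2^63: nothing to reinterpret.
    echo "$(( 10#$v ))"
    return
  fi
  n=$(( ${#v} - 18 ))
  hi=$(( 10#${v:0:n} ))   # leading digits, in units of 10^18
  lo=$(( 10#${v:n} ))     # last 18 digits
  if (( hi < 9 || (hi == 9 && lo <= 223372036854775807) )); then
    # At most 2^63 - 1, so the value is already a valid signed long.
    echo "$(( hi * 1000000000000000000 + lo ))"
  else
    # Subtract 2^64 = 18 * 10^18 + 446744073709551616, piece by piece.
    echo "$(( (hi - 18) * 1000000000000000000 + (lo - 446744073709551616) ))"
  fi
}

to_signed64 18446744073709550384   # value from /proc/meminfo, prints -1232
to_signed64 18446744073709550364   # value from the YARN log, prints -1252
```

The slightly different figure in the YARN log fits the same pattern: it corresponds to 1252 kB rather than 1232 kB, as would be expected if the two readings were taken at different moments.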

To apply this to all six container nodes I could have used the Proxmox web GUI, but the CLI saved some time: a little bash iteration over the six container IDs (111 to 116) with the pct set command, this time giving each container 512 MB of swap:

# Set swap to 512 MB on containers 111 to 116
for i in 11{1..6}; do pct set "$i" -swap 512; done
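The same loop can be wrapped in a small function with a dry-run switch, so the commands can be reviewed before they touch any container. This is only a sketch: set_swap_all and the DRY_RUN variable are names invented here, while the pct set ... -swap invocation is the one used above.

```shell
#!/usr/bin/env bash
# Set the swap allowance (in MB) on several Proxmox containers.
# Usage: set_swap_all <swap_mb> <container_id>...
# With DRY_RUN=1 the pct commands are printed instead of executed.
set_swap_all() {
  local swap_mb=$1 id
  shift
  for id in "$@"; do
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      echo "pct set $id -swap $swap_mb"
    else
      pct set "$id" -swap "$swap_mb"
    fi
  done
}
```

For example, DRY_RUN=1 set_swap_all 512 11{1..6} prints the six commands without running them; dropping DRY_RUN=1 applies them.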

To check the values on all nodes at once, I used pdsh from my laptop to run the same grep on each node directly:

rmoff@asgard:~> pdsh -l root -w cdh57-01-node-0[1-6] "grep Swap /proc/meminfo"|sort  
cdh57-01-node-01: SwapCached:            0 kB  
cdh57-01-node-01: SwapFree:         515496 kB  
cdh57-01-node-01: SwapTotal:        524288 kB  
cdh57-01-node-02: SwapCached:            0 kB  
cdh57-01-node-02: SwapFree:         523056 kB  
cdh57-01-node-02: SwapTotal:        524288 kB  
cdh57-01-node-03: SwapCached:            0 kB  
cdh57-01-node-03: SwapFree:         523476 kB  
cdh57-01-node-03: SwapTotal:        524288 kB  
cdh57-01-node-04: SwapCached:            0 kB  
cdh57-01-node-04: SwapFree:         523760 kB  
cdh57-01-node-04: SwapTotal:        524288 kB  
cdh57-01-node-05: SwapCached:            0 kB  
cdh57-01-node-05: SwapFree:         522272 kB  
cdh57-01-node-05: SwapTotal:        524288 kB  
cdh57-01-node-06: SwapCached:            0 kB  
cdh57-01-node-06: SwapFree:         519456 kB  
cdh57-01-node-06: SwapTotal:        524288 kB  
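Eyeballing eighteen lines is fine for six nodes, but the check can also be automated. The sketch below (check_swap is a hypothetical helper) reads pdsh output in the format shown above on stdin and reports any node whose SwapFree is larger than its SwapTotal, which is the symptom seen earlier. awk compares the values as floating point, which loses precision on the huge number but is still enough to show that it exceeds the total.

```shell
#!/usr/bin/env bash
# Read "node: Field: value kB" lines on stdin and flag nodes whose
# SwapFree exceeds SwapTotal. Exit status is 1 if any node is flagged.
check_swap() {
  awk '
    $2 == "SwapTotal:" { total[$1] = $3 }
    $2 == "SwapFree:"  { free[$1]  = $3 }
    END {
      bad = 0
      for (n in free) {
        # Adding 0 forces a numeric (floating point) comparison.
        if (free[n] + 0 > total[n] + 0) {
          print n, "SwapFree exceeds SwapTotal"
          bad = 1
        }
      }
      exit bad
    }'
}
```

It can be appended to the pdsh command above, for example: pdsh -l root -w cdh57-01-node-0[1-6] "grep Swap /proc/meminfo" | check_swap && echo "swap looks sane".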

From the Cloudera Manager Cluster Setup page I then clicked Retry and YARN came up successfully.

Robin Moffatt
