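OK, thanks to Penk for the introduction. Let's get started.
Free software developer, currently at Canonical, working on Ubuntu OEM derivative releases. In Taipei, a regular at the TOSSUG and Hacking Thursday community meetups.
A brief self-introduction. (Pause for three seconds, then move on.)
Canonical (英屬曼島商肯諾有限公司, an Isle of Man company)
46-47F, Taipei 101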
http://www.canonical.com/careers
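We are still hiring. The open positions are listed above; if you are interested, the details are at the URL above.
This is the background. First, a brief introduction to the virtualization technologies used on Linux, focusing on x86.
Uses CPU virtualization support such as Intel Virtualization Technology (Intel VT) and AMD Virtualization (AMD-V) to run an OS that normally runs on PC hardware directly inside a virtual machine.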
Almost complete simulation of the actual hardware to allow software, which typically consists of a guest operating system, to run unmodified.
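Full virtualization emulates an entire computer, so an OS normally installed on a physical machine can be installed in the virtual environment without modification. To make it run more smoothly, however, you need to install extra drivers. VirtualBox users will recognize this: after installing a Linux system in VirtualBox, you still need to install some additional drivers.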
http://commons.wikimedia.org/wiki/File:Hardware_Virtualization_(copy).svg
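This image comes from Wikipedia. It looks roughly like this: each guest OS has its own virtual hardware.
The guest OS needs to know that it is running in a virtualized environment, so its kernel and drivers must be modified. A paravirtualized guest OS is called a PV guest, and a paravirtualized driver is called a PV driver.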
A hardware environment is not simulated; however, the guest programs are executed in their own isolated domains, as if they are running on a separate system. Guest programs need to be specifically modified to run in this environment.
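Paravirtualization, by contrast, requires installing a modified Linux kernel and drivers. Admittedly, I am not very familiar with this area; I only mention it here so you know it exists.
Features provided by the operating system isolate the guest's execution environment while the guest shares the host OS kernel. From inside the guest, it looks like an independent environment.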
The same OS kernel is used to implement the "guest" environments. Applications running in a given "guest" environment view it as a stand-alone system.
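This is today's topic. It is neither full virtualization nor paravirtualization. It simply creates a special container that uses the same Linux kernel as the host, while the system environment is isolated by features the Linux kernel provides.
Official website: https://linuxcontainers.org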
"LXC is often considered as something in the middle between a chroot on steroids and a full fledged virtual machine. The goal of LXC is to create an environment as close as possible as a standard Linux installation but without the need for a separate kernel."
Now we come to today's topic, Linux Containers. The description above is quoted from the official website. (Read the Chinese translation of the quote aloud.)
Statistics as of 2014-07-16, from git shortlog -sne
551 Stéphane Graber <stgraber [at] ubuntu.com>
529 Serge Hallyn <serge.hallyn [at] ubuntu.com>
243 Dwight Engen <dwight.engen [at] oracle.com>
200 Daniel Lezcano <daniel.lezcano [at] free.fr>
190 dlezcano <dlezcano>
140 Daniel Lezcano <dlezcano [at] fr.ibm.com>
116 Michel Normand <normand [at] fr.ibm.com>
80 KATOH Yasufumi <karma [at] jazz.email.ne.jp>
77 S.Çağlar Onur <caglar [at] 10ur.org>
65 Christian Seiler <christian [at] iwakd.de>
59 Natanael Copa <ncopa [at] alpinelinux.org>
47 Serge Hallyn <serge.hallyn [at] canonical.com>
29 Michael H. Warfield <mhw [at] WittsEnd.com>
26 Qiang Huang <h.huangqiang [at] huawei.com>
...
Let's start with the developers. The list above is the output of running that command in the git repository on 2014-07-16.
Top five after merging duplicate entries
576 Serge Hallyn <serge.hallyn [at] ubuntu.com>
551 Stéphane Graber <stgraber [at] ubuntu.com>
530 Daniel Lezcano <dlezcano [at] fr.ibm.com>
243 Dwight Engen <dwight.engen [at] oracle.com>
116 Michel Normand <normand [at] fr.ibm.com>
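The merge step can be reproduced mechanically. The following is only a minimal sketch, not the script used to prepare these slides: `merge_authors` is a hypothetical helper that reads `git shortlog -sne` output on standard input, drops the e-mail address, and sums the commit counts per author name. Treating `dlezcano` as an alias of Daniel Lezcano is an assumption inferred from the merged totals above.

```shell
#!/bin/sh
# merge_authors (hypothetical helper): sum commit counts per author name.
# Input (stdin): lines in `git shortlog -sne` format, for example
#   529  Serge Hallyn <serge.hallyn@ubuntu.com>
# Output: "<total><TAB><name>", highest total first.
merge_authors() {
    awk '
    NF == 0 { next }                        # skip blank lines
    {
        count = $1                          # leading commit count
        sub(/^[ \t]*[0-9]+[ \t]+/, "")      # drop the count
        sub(/[ \t]*<[^>]*>[ \t]*$/, "")     # drop the trailing <email>
        name = $0
        # Assumption: this short name is the same person.
        if (name == "dlezcano") name = "Daniel Lezcano"
        total[name] += count
    }
    END {
        for (name in total) printf "%d\t%s\n", total[name], name
    }' | sort -rn
}
```

In the lxc git repository this would be used as `git shortlog -sne | merge_authors | head -n 5`. For aliases that differ only by e-mail address, grouping by name is enough; other aliases need an explicit rule like the one shown.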
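The original author, Daniel Lezcano, is from IBM
The main commercial backing comes from Canonical, IBM, and Oracle
Once the duplicates are merged, it becomes clear that these three companies are the ones employing full-time developers to contribute. So why does Canonical, my employer, invest in LXC development?
Because we use it in our own products, of course.
First, Ubuntu Juju
Ubuntu Juju is a tool and platform for quickly building cloud deployments; its goal is to let users set up a website easily and painlessly. If you install and use it on a local machine, it uses LXC. There is a YouTube video you can watch after the talk, but first let's look at a demo.
Next, Ubuntu Touch
Ubuntu Touch is a system Canonical developed for phones and tablets. It shares all its software packages with regular Ubuntu but adds some extra software distribution mechanisms. Let's take a quick look at the Ubuntu Touch internal design documents to see where LXC is used.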
man lxc
...
* General setup
  * Control Group support
    -> Namespace cgroup subsystem
    -> Freezer cgroup subsystem
    -> Cpuset support
    -> Simple CPU accounting cgroup subsystem
    -> Resource counters
      -> Memory resource controllers for Control Groups
  * Group CPU scheduler
    -> Basis for grouping tasks (Control Groups)
  * Namespaces support
    -> UTS namespace
    -> IPC namespace
    -> User namespace
    -> Pid namespace
    -> Network namespace
* Device Drivers
  * Character devices
    -> Support multiple instances of devpts
  * Network device support
    -> MAC-VLAN support
    -> Virtual ethernet pair device
* Networking
  * Networking options
    -> 802.1d Ethernet Bridging
* Security options
  -> File POSIX Capabilities
...
Let's see which features the Linux kernel provides. If you run man lxc, you will find a section describing Linux kernel build options. Looking up the help text for those options in the Linux kernel source tree gives what follows.
This option adds support for grouping sets of processes together, for use with process control subsystems such as Cpusets, CFS, memory controls or device isolation.
See:
- Documentation/scheduler/sched-design-CFS.txt (CFS)
- Documentation/cgroups/ (features for grouping, isolation and resource control)
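Control Group, also called cgroup, is the main option. The many cgroup subsystems that follow, also called controllers, all depend on it.
cgroups let processes be placed into separate groups, and controllers then let us apply different operations to each group.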
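To see which controllers a running kernel actually offers, you can read `/proc/cgroups`. The sketch below is only an illustration under stated assumptions: `list_enabled_controllers` is a hypothetical helper name, and it relies on the four-column layout of that file (`subsys_name`, `hierarchy`, `num_cgroups`, `enabled`).

```shell
#!/bin/sh
# list_enabled_controllers (hypothetical helper): print the names of the
# cgroup controllers whose "enabled" column is 1.
# $1: a file in /proc/cgroups format (defaults to /proc/cgroups itself).
# Columns: subsys_name, hierarchy, num_cgroups, enabled.
list_enabled_controllers() {
    # Skip the "#subsys_name ..." header line, keep rows with enabled == 1.
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}
```

Called with no argument, it reads the live `/proc/cgroups` of the host; passing a saved copy of the file makes it easy to compare machines.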
Provides a simple namespace cgroup subsystem to provide hierarchical naming of sets of namespaces, for instance virtual servers and checkpoint/restart jobs.
2.6.24–2.6.39
The namespace controller lets cgroups make use of the namespace feature.
Namespaces are the other major feature. They are explained in more detail shortly, so we skip them for now.
Provides a way to freeze and unfreeze all tasks in a cgroup.
As the description suggests, this freezes all the processes in a cgroup.
This option will let you create and manage CPUSETs which allow dynamically partitioning a system into sets of CPUs and Memory Nodes and assigning tasks to run only within those sets. This is primarily useful on large SMP or NUMA systems.
Simply put, it specifies which CPUs a process is allowed to run on.
Provides a simple Resource Controller for monitoring the total CPU consumed by the tasks in a cgroup.
Accounts for the total CPU time used by the processes in each cgroup.
This option enables controller independent resource accounting infrastructure that works with cgroups.
Provides a common mechanism for accounting the usage of various resources.
Provides a memory resource controller that manages both anonymous memory and page cache. (See Documentation/cgroups/memory.txt)
Note that setting this option increases fixed memory overhead associated with each page of memory in the system. By this, 8(16)bytes/PAGE_SIZE on 32(64)bit system will be occupied by memory usage tracking struct at boot. Total amount of this is printed out at boot.
Only enable when you're ok with these trade offs and really sure you need the memory resource controller. Even when you enable this, you can set "cgroup_disable=memory" at your boot option to disable memory resource controller and you can avoid overheads. (and lose benefits of memory resource controller)
This config option also selects MM_OWNER config option, which could in turn add some fork/exit overhead.
Controls memory resource usage.
This feature lets CPU scheduler recognize task groups and control CPU bandwidth allocation to such task groups. It uses cgroups to group tasks.
Controls CPU scheduling for groups of processes.
Provides the way to make tasks work with different objects using the same id. For example same IPC id may refer to different objects or same user id or pid may refer to different tasks when used in different namespaces.
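Lets the container use the same IDs as the outside world, for example process IDs, user IDs, and IPC IDs.
At least they look the same from inside the container. They are of course not really the same; the container simply believes it is an independent environment.
For example, init inside the container has PID 1, and init outside the container also has PID 1. Seen from outside, however, the container's init is not PID 1 but some other number.
Let's look at the init example in practice.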
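One way to observe both views of the same process from the host is the `NSpid:` line in `/proc/<pid>/status`. This is a sketch under an assumption worth stating loudly: that field only exists on Linux 4.1 and later, which is newer than the 3.13 kernel used elsewhere in this talk. `pid_views` is a hypothetical helper name.

```shell
#!/bin/sh
# pid_views (hypothetical helper): print every PID a process has, one per
# line, starting with the outermost PID namespace.
# $1: a file in /proc/<pid>/status format.
# Assumption: the kernel is 4.1 or later, so the "NSpid:" line is present.
pid_views() {
    awk '$1 == "NSpid:" { for (i = 2; i <= NF; i++) print $i }' "$1"
}
```

For a container's init process, `pid_views /proc/<host pid>/status` would print the host-side PID first and then 1, the PID seen inside the container. On older kernels the line is absent and the helper prints nothing.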
In this namespace tasks see different info provided with the uname() system call
Lets uname inside the container report a different result. (Example: uname -m inside the container started with sudo lxc-start -n wheezy-sh4.)
In this namespace tasks work with IPC ids which correspond to different IPC objects in different namespaces.
Makes IPC IDs independent inside the container.
This allows containers, i.e. vservers, to use user namespaces to provide different user info for different servers.
When user namespaces are enabled in the kernel it is recommended that the MEMCG and MEMCG_KMEM options also be enabled and that user-space use the memory control groups to limit the amount of memory a memory unprivileged users can use.
Makes user IDs independent inside the container, and allows memory usage limits to be applied to unprivileged users.
Support process id namespaces. This allows having multiple processes with the same pid as long as they are in different pid namespaces. This is a building block of containers.
Makes process IDs independent inside the container.
Allow user space to create what appear to be multiple instances of the network stack.
Allows user space to create multiple instances of the network stack, so each container appears to have its own set of Ethernet interfaces.
Enable support for multiple instances of devpts filesystem. If you want to have isolated PTY namespaces (eg: in containers), say Y here. Otherwise, say N. If enabled, each mount of devpts filesystem with the '-o newinstance' option will create an independent PTY namespace.
Used to provide things like /dev/tty1 inside the container. The lxc-console command covered later relies on this feature.
This allows one to create virtual interfaces that map packets to or from specific MAC addresses to a particular interface.
Macvlan devices can be added using the "ip" command from the iproute2 package starting with the iproute2-2.6.23 release:
"ip link add link <real dev> [ address MAC ] [ NAME ] type macvlan"
To compile this driver as a module, choose M here: the module will be called macvlan.
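As I understand it, this creates several virtual interfaces, each with its own MAC address, on top of one real interface, so that they work independently without affecting one another.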
This device is a local ethernet tunnel. Devices are created in pairs. When one end receives the packet it appears on its pair and vice versa.
Connects the network inside a Linux container to the network outside, a bit like a virtual network cable plugged in at both ends.
If you say Y here, then your Linux box will be able to act as an Ethernet bridge, which means that the different Ethernet segments it is connected to will appear as one Ethernet to the participants. Several such bridges can work together to create even larger networks of Ethernets using the IEEE 802.1 spanning tree algorithm. As this is a standard, Linux bridges will cooperate properly with other third party bridge products.
In order to use the Ethernet bridge, you'll need the bridge configuration tools; see <file:Documentation/networking/bridge.txt> for location. Please read the Bridge mini-HOWTO for more information.
If you enable iptables support along with the bridge support then you turn your bridge into a bridging IP firewall. iptables will then see the IP packets being bridged, so you need to take this into account when setting up your firewall rules. Enabling arptables support when bridging will let arptables see bridged ARP traffic in the arptables FORWARD chain.
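Joins several Ethernet segments so that they behave as a single Ethernet. This is how the containers' virtual interfaces are connected to the host's network.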
This enables filesystem capabilities, allowing you to give binaries a subset of root's powers without using setuid 0.
(Removed in Linux kernel 2.6.33 and later.)
This provides capability support on the filesystem, but the build option was later removed.
$ sudo apt-get install lxc lxc-templates
Next, a quick tour of a few lxc commands. First, of course, lxc has to be installed on the system.
$ lxc-checkconfig
Kernel configuration not found at /proc/config.gz; searching...
Kernel configuration found at /boot/config-3.13.0-32-generic
--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: enabled
Network namespace: enabled
Multiple /dev/pts instances: enabled
--- Control groups ---
Cgroup: enabled
Cgroup clone_children flag: enabled
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: enabled
Cgroup cpuset: enabled
--- Misc ---
Veth pair device: enabled
Macvlan: enabled
Vlan: enabled
File capabilities: enabled
Note : Before booting a new kernel, you can check its configuration
usage : CONFIG=/path/to/config /usr/bin/lxc-checkconfig
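When the list is long, it can help to show only the entries that are not enabled. The following is a rough sketch, assuming plain-text output in the `Feature: status` shape shown above with a one-word lowercase status; `unsupported_features` is a hypothetical helper, and colored terminal output would need its escape sequences stripped first.

```shell
#!/bin/sh
# unsupported_features (hypothetical helper): list lxc-checkconfig entries
# whose status is anything other than "enabled".
# Input (stdin): lxc-checkconfig output as plain text.
unsupported_features() {
    # Keep only "Feature: status" lines with a one-word lowercase status,
    # which skips the section headers and the trailing Note/usage lines.
    awk -F': ' 'NF == 2 && $2 ~ /^[a-z]+$/ && $2 != "enabled" { print $1 }'
}
```

A typical invocation would be `lxc-checkconfig | unsupported_features`; empty output means every listed feature reported `enabled`.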
$ tree /usr/share/lxc/templates
/usr/share/lxc/templates
├── lxc-alpine
├── lxc-altlinux
├── lxc-archlinux
├── lxc-busybox
├── lxc-centos
├── lxc-cirros
├── lxc-debian
├── lxc-download
├── lxc-fedora
├── lxc-gentoo
├── lxc-openmandriva
├── lxc-opensuse
├── lxc-oracle
├── lxc-plamo
├── lxc-sshd
├── lxc-ubuntu
└── lxc-ubuntu-cloud

0 directories, 17 files
$ sudo lxc-create -t debian -h
$ sudo lxc-create -t debian -n sid -- -r sid -a amd64
$ sudo lxc-destroy -n sid
$ sudo lxc-start -d -n sid
$ sudo lxc-freeze -n sid
$ sudo lxc-unfreeze -n sid
$ sudo lxc-stop -n sid
$ sudo lxc-ls -f
NAME  STATE   IPV4       IPV6  AUTOSTART
---------------------------------------------------
sid   FROZEN  10.0.3.56  -     NO
$ sudo lxc-info -n sid
Name: sid
State: FROZEN
PID: 13843
IP: 10.0.3.56
CPU use: 0.59 seconds
Memory use: 24.69 MiB
KMem use: 0 bytes
Link: vethL2RL9Y
TX bytes: 2.49 KiB
RX bytes: 24.61 KiB
Total bytes: 27.09 KiB
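For scripting, a single value can be pulled out of this output. This is a minimal sketch, assuming the `Label: value` layout shown above; `info_field` is a hypothetical helper, not part of lxc.

```shell
#!/bin/sh
# info_field (hypothetical helper): print the value of one field from
# `lxc-info` output.
# $1: field label, for example "State" or "IP".
# Input (stdin): lxc-info output in "Label: value" form.
info_field() {
    awk -v key="$1" -F':[ \t]+' '
        { sub(/^[ \t]+/, "") }          # tolerate indented labels
        $1 == key { print $2; exit }    # first match wins
    '
}
```

With the sample output above, `sudo lxc-info -n sid | info_field State` prints `FROZEN`, which is convenient for waiting on a state change in a script.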
$ sudo lxc-console -n sid
This is where the devpts Linux kernel build option mentioned earlier comes in. The command emulates tty1 of a plain console environment, and you can run it again to get tty2, tty3, and so on.
https://www.stgraber.org/2013/12/20/lxc-1-0-blog-post-series/
http://steamcommunity.com/linux
Demo on Ubuntu 12.04: http://youtu.be/IorxJsw09vY
$ sudo apt-add-repository ppa:ubuntu-lxc/stable
$ sudo apt-get update
$ sudo apt-get install steam-lxc
$ sudo mkdir -p /var/lib/lxc /var/cache/lxc
$ sudo steam-lxc create
$ sudo steam-lxc run