Unnatural Acts with AI

January 26th, 2010

I’m pretty sure this is not what the AI team had in mind when they gave us bootable AI.  But in my quest to find the oldest piece of gear on which I can run OpenSolaris, here’s a fun one:

jack@opensolaris:~$ uname -a
SunOS opensolaris 5.11 snv_130 sun4u sparc SUNW,UltraAX-i2 Solaris
jack@opensolaris:~$ cat /etc/release
                      OpenSolaris Development snv_130 SPARC
           Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 18 December 2009
jack@opensolaris:~$
telnet> send brk
Type  'go' to resume
ok banner
Sun Fire V100 (UltraSPARC-IIe 500MHz), No Keyboard
OpenBoot 4.0, 1024 MB memory installed, Serial #51701117.
Ethernet address 0:3:ba:14:e5:7d, Host ID: 8314e57d.
ok go
jack@opensolaris:~$ prtdiag
System Configuration:  Sun Microsystems  sun4u Sun Fire V100 (UltraSPARC-IIe 500MHz)
System clock frequency: 100 MHz
Memory size: 1024 Megabytes

Pretty much took forever to install, but it works like a champ.  More news as it occurs!

Random movie thoughts on a momentous day

January 21st, 2010

So, the EU has approved the acquisition of Sun by Oracle.  This coming Monday makes 15 years that I have been at Sun.  And this morning two movie scenes keep rolling around in my head.

First is from 1776, the great old musical about how the US came to be.  Right after the Continental Congress unanimously votes, after a long struggle, for independence from England, they all sit stunned that it has passed, and John Adams, played by William Daniels, just says, "It’s done.  It’s done."  They really didn’t know what was in store, but they were confident of a bright future, albeit one that would require a huge effort on everyone’s part.  (Just before this, he sang one of my favorite songs about commitment and pushing forward.)

The second is from Camelot.  I suspect a lot of us long-time Sun folks are feeling just a tad nostalgic right now for the place where "the rain would never fall till after sundown."  For many, Sun has been "for one brief shining moment" a really special place to be.  I’ve always thought of the people at Sun first when I think about this company.  Some of the finest, smartest, most willing to pitch in and help each other no matter what folks you will ever find.

Scott McNealy said it best in his keynote at Oracle OpenWorld this past October.  He said he wanted people to remember that Sun Kicked Butt, Had Fun, Didn’t Cheat, Loved Our Customers, and Changed Computing Forever.   That about sums it up.

Now, the next chapter in Sun is about to start.  I think it’s going to be bright with lots of opportunity and a great time all around.  But, no matter how wonderful it is, it will be different and we’ll look back and miss many of the good times at Sun.

I’m excited about the next chapter.

Bootable AI ISO is way cool

January 15th, 2010

Alok Aggarwal posted, just before Christmas, a blog mentioning that the ISO images for the Auto Installer in OpenSolaris are now bootable.  Not just for x86 but also for SPARC.

This is huge!  While it does not provide a LiveCD desktop environment for SPARC, it does give us a way to easily install OpenSolaris on  SPARC gear.  Previously, it was necessary to set up an AI install server (running on an x86 platform since that was the only thing you could install natively) and use WAN Boot to install OpenSolaris on the SPARC boxes.  Well, that was a tough hurdle for some of us to get over.

Now, you can burn the AI ISO to a CD and boot it directly.  The default manifest on the disk will install a default system from the pkg.opensolaris.org  release repository.   Or, better yet, build a simple AI manifest that changes the release repository to the dev repo and put it somewhere you can fetch via http.  When you boot up, you will be prompted for the URL of the manifest.  AI will fetch it and use it to install the system.

{2} ok boot cdrom - install prompt
Resetting ...
Sun Fire 480R, No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.10.8, 16384 MB memory installed, Serial #57154911.
Ethernet address 0:3:ba:68:1d:5f, Host ID: 83681d5f.
Rebooting with command: boot cdrom - install prompt
Boot device: /pci@8,700000/ide@6/cdrom@0,0:f  File and args: - install prompt
SunOS Release 5.11 Version snv_130 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: opensolaris
Remounting root read/write
Probing for device nodes ...
Preparing automated install image for use
Done mounting automated install image
Configuring devices.
Enter the URL for the AI manifest [HTTP, default]: http://<my web server>/bootable.xml

See!  This is really easy and gives new life to really old gear.  In this case, the manifest is super simple, too.  I just grabbed the default manifest from an AI image and changed the repository and package to install.

$ pfexec lofiadm -a `pwd`/osol-dev-130-ai-x86.iso
/dev/lofi/1
$ pfexec mount -o ro -F hsfs /dev/lofi/1 /mnt
$ cp /mnt/auto_install/default.xml /etc/apache2/2.2/htdocs/bootable.xml

Edit this file and change

<main url="http://pkg.opensolaris.org/release" publisher="opensolaris.org"/>

to

<main url="http://pkg.opensolaris.org/dev" publisher="opensolaris.org"/>

Or, as a speedup, add a mirror for pkg.opensolaris.org:

<main url="http://pkg.opensolaris.org/dev" publisher="opensolaris.org"/>
<mirror url="http://pkg-na-2.opensolaris.org/dev"/>

And change

<pkg name="entire"/>

to

<pkg name="entire@0.5.11-0.130"/>

You can add a mirror site for the repo in this manifest.  Or you can list other packages that you want to be installed as the system is installed.  The docs for the AutoInstaller talk about how to create and modify a manifest.
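
As a rough sketch, the repository and package edits above could also be scripted with sed.  The function name retarget_manifest is made up for illustration, and the patterns assume the manifest contains those two lines exactly as shown; a differently formatted default.xml would need different patterns.  The optional mirror line is not added here.

```shell
# Hypothetical helper (not part of AI): point a copy of default.xml at the
# dev repository and pin the "entire" package to build 130.
# Usage: retarget_manifest /etc/apache2/2.2/htdocs/bootable.xml
retarget_manifest() {
    manifest=$1
    tmp="$manifest.tmp"
    # Write to a temporary file and move it into place, since not every
    # sed supports in-place editing.
    sed -e 's|http://pkg.opensolaris.org/release|http://pkg.opensolaris.org/dev|' \
        -e 's|<pkg name="entire"/>|<pkg name="entire@0.5.11-0.130"/>|' \
        "$manifest" > "$tmp" && mv "$tmp" "$manifest"
}
```

It is worth looking over the resulting file by hand before pointing the installer at it.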

Some caveats that I found:  First, NWAM and DHCP might take longer than you think.  If you quickly try to type in the URL for the manifest, you may find that you have no network yet and become concerned.  I spent the better part of a day on this.  Then, I let it sit for a couple of minutes before trying the manifest URL and life was good.  My DHCP server is particularly slow on my network.

Second, installing without the mirror on a slow system took a really long time.  I have not diagnosed whether that is down to network download time or processing time.  I think some of both, since things like the installation phase of babel_install took nearly an hour on one system.

Third, there must be a lower bound on what sort of system will work.  T2000 works just fine.  SF480R has worked fine.  My SF280R is busted – as soon as it’s fixed, I’ll try it.  Not so great on E220 and E420 systems.  They appear to work, but at the very end it says it failed.  The only failure message I can see this time is due to the installer finding a former non-global zone environment on the disk. But so far, my experience on UltraSPARC-II systems is that once the installation completes, it hangs on the first reboot or fails to boot at all.  I am not surprised that systems that are no longer supported are not supported by AI.  I think I saw in Alok’s notes that OBP 4.17 was the minimum supported.  That means my USII boxes are right out, and  I think even the SF280.  I hate doing firmware updates, so I have not updated the SF480.

Fourth, when I tried to install on a system that previously had the root disk mirrored with SVM, zpool create for the root pool failed.  I had to delete the metadbs and the metadevices before I could proceed.

But, I am very impressed!  Bootable AI media is way cool.  Keep your eyes and ears open, though, for more developments in the AutoInstaller in the coming months.

ATLOSUG January Slides Posted

January 13th, 2010

Slides from the January meeting of ATLOSUG – the Atlanta OpenSolaris User Group – are posted at mediacast.sun.com.

Next meeting will be February 9, 2010.  Check our web site for details.

Time to move to OpenSolaris completely

January 13th, 2010

The last build of SXCE, the Solaris Express Community Edition, Build 130, has been released.  So, what?

Well, this means that it’s time for all of us laggards, who have been basking in the glow of new features and capabilities given to us by the Solaris developers but who have not been willing to take the plunge into OpenSolaris completely, to get off the fence and move straight away to OpenSolaris.

I made that move over the holidays.  Got a new laptop.  Perfect time to make the move.  Used to be, I would run SXCE natively on my laptop and run OpenSolaris in a VirtualBox.  My rationale was that I do a lot of demonstrations for customers and I wanted my laptop to look as much like production Solaris 10 as possible, while still getting the cool new stuff.

Now, I run OpenSolaris native on the laptop and run Solaris 10 in a VirtualBox when I need it.

Turns out the migration has been remarkably painless.  My only hassle was actually moving my own data from one laptop to the other.

I guess that in the eggs and bacon breakfast of OSes, I have moved from being the chicken (involved in the process) to being the pig (fully committed).  And this is some tasty, thick sliced, smoked bacon!  Mmmm.

Silly ZFS Dedup Experiment

December 14th, 2009

Just for grins, I thought it would be fun to do some "extreme" deduping.  I started out by creating a pool from a pair of mirrored drives on a system running OpenSolaris build 129.  We’ll call the pool p1.  Notice that everyone agrees on the size when we first create it.  zpool list, zfs list, and df -h all show 134G available, more or less.  Notice that when we created the pool, we turned deduplication on from the very start.

# zpool create -O dedup=on p1 mirror c0t2d0 c0t3d0
# zfs list p1
NAME   USED  AVAIL  REFER  MOUNTPOINT
p1      72K   134G    21K  /p1
# zpool list p1
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
p1     136G   126K   136G     0%  1.00x  ONLINE  -
# df -h /p1
Filesystem             size   used  avail capacity  Mounted on
p1                     134G    21K   134G     1%    /p1

So, what if we start copying a file over and over?  Well, we would expect that to dedup pretty well.  Let’s get some data to play with.  We will create a set of 8 files, each one being made up of 128K of random data.  Then we will cat these together over and over and over and over and see what we get.

Why choose 128K for my file size?  Remember that we are trying to deduplicate as much as possible within this dataset.  As it turns out, the default recordsize for ZFS is 128K.  ZFS deduplication works at the ZFS block level.  By selecting a file size of 128K, each of the files I create fits exactly into a single ZFS block.  What if we picked a file size that was different from the ZFS block size? The blocks across the boundaries, where each file was cat-ed to another, would create some blocks that were not exactly the same as the other boundary blocks and would not deduplicate as well.

Here’s an example.  Assume we have a file A whose contents are "aaaaaaaa", a file B containing "bbbbbbbb", and a file C containing "cccccccc".  If our blocksize is 6, while our files all have length 8, then each file spans more than 1 block.

# cat A B C > f1
# cat f1
aaaaaaaabbbbbbbbcccccccc
111111222222333333444444
# cat B A C > f2
# cat f2
bbbbbbbbaaaaaaaacccccccc
111111222222333333444444

The combined contents of the three files span 4 blocks.  Notice that the only block in this example that is replicated is block 4 of f1 and block 4 of f2.  The other blocks all end up being different, even though the source files were the same.  Think about how this would work as the number of files grew.
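
The toy example can be played with directly in the shell.  This is only a sketch: it uses fold to cut the strings into fixed-size "blocks" and compares the block contents directly, whereas ZFS compares block checksums.  The blocks helper is made up for the illustration.

```shell
# Contents of the example files A, B and C.
A=aaaaaaaa; B=bbbbbbbb; C=cccccccc
f1="$A$B$C"     # same bytes as: cat A B C > f1
f2="$B$A$C"     # same bytes as: cat B A C > f2

# Hypothetical helper: split a string into fixed-size blocks, one per line.
blocks() { printf '%s\n' "$1" | fold -w "$2"; }

# Blocks that show up more than once across both files only need storing once.
shared6=$( { blocks "$f1" 6; blocks "$f2" 6; } | sort | uniq -d )
echo "blocksize 6, shared blocks: $shared6"

# With a blocksize of 8 (matching the file size), count the shared blocks.
shared8=$( { blocks "$f1" 8; blocks "$f2" 8; } | sort | uniq -d | wc -l )
echo "blocksize 8, number of shared blocks: $shared8"
```

With a blocksize of 6, only the trailing cccccc block is shared.  With a blocksize of 8, all three blocks are.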

So, if we want to make an example where things are guaranteed to dedup as well as possible, our files need to always line up on block boundaries (remember we’re not trying to be real-world here; we’re trying to get silly dedupratios).  So, let’s create a set of files that all match the ZFS blocksize.  We’ll just create files b1-b8 full of blocks of /dev/

# zfs get recordsize p1
NAME  PROPERTY    VALUE    SOURCE
p1    recordsize  128K     default
# dd if=/dev/random bs=1024 count=128 of=/p1/b1

# ls -ls b1 b2 b3 b4 b5 b6 b7 b8
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b1
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b2
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b3
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b4
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b5
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b6
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b7
 205 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b8
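
Only the dd for b1 is shown above; the other seven files were presumably made the same way.  A loop along these lines would do it.  The make_blocks name and its arguments are made up for this sketch, and the random source is left as a parameter that defaults to the /dev/random used above.

```shell
# Hypothetical helper: create b1..b8 in a directory, each 128K of random data.
# Usage: make_blocks /p1            (reads from /dev/random by default)
make_blocks() {
    dir=$1
    src=${2:-/dev/random}
    for i in 1 2 3 4 5 6 7 8; do
        # 128 blocks of 1024 bytes = 131072 bytes, one ZFS record at the
        # default recordsize of 128K.
        dd if="$src" bs=1024 count=128 of="$dir/b$i" 2>/dev/null || return 1
    done
}
```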

Now, let’s make some big files out of these.

# cat b1 b2 b3 b4 b5 b6 b7 b8 > f1
# cat f1 f1 f1 f1 f1 f1 f1 f1 > f2
# cat f2 f2 f2 f2 f2 f2 f2 f2 > f3
# cat f3 f3 f3 f3 f3 f3 f3 f3 > f4
# cat f4 f4 f4 f4 f4 f4 f4 f4 > f5
# cat f5 f5 f5 f5 f5 f5 f5 f5 > f6
# cat f6 f6 f6 f6 f6 f6 f6 f6 > f7

# ls -lh
total 614027307
-rw-r--r--   1 root     root        128K Dec 14 15:28 b1
-rw-r--r--   1 root     root        128K Dec 14 15:28 b2
-rw-r--r--   1 root     root        128K Dec 14 15:28 b3
-rw-r--r--   1 root     root        128K Dec 14 15:28 b4
-rw-r--r--   1 root     root        128K Dec 14 15:28 b5
-rw-r--r--   1 root     root        128K Dec 14 15:28 b6
-rw-r--r--   1 root     root        128K Dec 14 15:28 b7
-rw-r--r--   1 root     root        128K Dec 14 15:28 b8
-rw-r--r--   1 root     root        1.0M Dec 14 15:28 f1
-rw-r--r--   1 root     root        8.0M Dec 14 15:28 f2
-rw-r--r--   1 root     root         64M Dec 14 15:28 f3
-rw-r--r--   1 root     root        512M Dec 14 15:28 f4
-rw-r--r--   1 root     root        4.0G Dec 14 15:28 f5
-rw-r--r--   1 root     root         32G Dec 14 15:30 f6
-rw-r--r--   1 root     root        256G Dec 14 15:49 f7

This looks pretty weird.  Remember our pool is only 134GB big.  Already the file f7 is 256G and we are not using any sort of compression.  What does df tell us?

# df -h /p1
Filesystem             size   used  avail capacity  Mounted on
p1                     422G   293G   129G    70%    /p1

Somehow, df now believes that the pool is 422GB instead of 134GB.  Why is that?  Well, rather than reporting the amount of available space by subtracting used from size, df now calculates its size dynamically as the sum of the space used plus the space available.  We have lots of space available since we have many many many duplicate references to the same blocks.

# zfs list p1
NAME   USED  AVAIL  REFER  MOUNTPOINT
p1     293G   129G   293G  /p1
# zpool list p1
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
p1     136G   225M   136G     0%  299594.00x  ONLINE  -

zpool list tells us the actual size of the pool, along with the amount of space that it views as being allocated and the amount free.  So, the pool really has not changed size.  But the pool says that 225M are in use.  Metadata and pointer blocks, I presume.

Notice that the dedupratio is 299594!  That means that on average, there are almost 300,000 references to each actual block on the disk.
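
That number can be checked with a little arithmetic, assuming every file is made up purely of the same 8 unique 128K blocks.  Count the block references in b1-b8 and f1-f7, then divide by 8.  A quick sketch in the shell:

```shell
# b1..b8 contribute one reference to each of the 8 unique blocks.
refs=8
# f1 is b1..b8 concatenated (8 blocks); each later fN is 8 copies of the
# previous one.
level=8
for n in 1 2 3 4 5 6 7; do
    refs=$((refs + level))
    level=$((level * 8))
done
unique=8
echo "referenced blocks: $refs"
echo "dedup ratio: $((refs / unique))"
```

This prints 2396752 referenced blocks and a ratio of 299594, matching what zpool list reports.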

One last bit of interesting output comes from zdb.  Try zdb -DD on the pool.  This will give you a histogram of how many blocks are referenced how many times.  Not for the faint of heart, zdb will give you lots of ugly internal info on the pool and datasets. 

# zdb -DD p1
DDT-sha256-zap-duplicate: 8 entries, size 768 on disk, 1024 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced         
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
  256K        8      1M      1M      1M    2.29M    293G    293G    293G
 Total        8      1M      1M      1M    2.29M    293G    293G    293G

dedup = 299594.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 299594.00

So, what’s my point?  I guess the point is that dedup really does work.  For data that has some commonality, it can save space.  For data that has a lot of commonality, it can save a lot of space.  With that come some surprises in terms of how some commands have had to adjust to changing sizes (or perceived sizes) of the storage they are reporting.

My suggestion?  Take a look at zfs dedup.  Think about where it might be helpful.  And then give it a try!

ATLOSUG December Meeting slides posted

December 11th, 2009

We had a great meeting of ATLOSUG, the Atlanta OpenSolaris User Group, this past Tuesday.  20+ people attended our first meeting with our new host, GCA Technology Services, at their training facility in Atlanta.  A big "Thank You" to Dawn and GCA for hosting our group.

Our topic this time was "What’s New In ZFS" and we talked about some of the new features that have gone into ZFS recently, especially DeDupe.  George Wilson of the ZFS team was kind enough to share some slides that he had been working on and they are posted here.

Our next meeting will be Tuesday, January 12 at GCA.  Details and info can be found on the ATLOSUG website at http://hub.opensolaris.org/bin/view/User+Group+atl-osug/

Gizmodo reviews Virtual Box

October 18th, 2009

Nice review at Gizmodo, here, of VirtualBox.  Title is Virtualize Any OS for Free.  Check it out.

ATLOSUG COMSTAR slides posted

August 17th, 2009

Slides from last week’s meeting of the Atlanta OpenSolaris User Group (ATLOSUG) are posted now on the group website – http://opensolaris.org/os/project/atl-osug

We had a good group of about 16 people in attendance and a great discussion around how and why to use COMSTAR. 

The next meeting will be held on Sept. 8.  The topic will be how COMSTAR and other OpenSolaris technologies fit together in the Sun Unified Storage family of products.  Hope to see you there!

Quick Review of Pro OpenSolaris

June 29th, 2009

Pro OpenSolaris – Harry Foxwell and Christine Tran

Several (too many) weeks ago, I said that I was going to read and review Harry & Christine’s new book, Pro OpenSolaris. Finally, I am getting around to doing this.

Overall, I was pleased with Pro OpenSolaris.  It does a good job at what it tries to do.  The key is to recognize when it is the right text and when others might be the right text.  Right in the Introduction, the authors are clear that this is an orientation tour.  They say "We assume that you are a professional system administrator … and that your learning style needs only an orientation and an indication of what should be learned first in order to take advantage of OpenSolaris."  That’s a good summary of the main direction of the book.  And at this, it does a very nice job!

This means that Pro OpenSolaris is not an exhaustive reference manual on all of the features and nuances of OpenSolaris.  Instead, it’s a broad overview of what OpenSolaris is, how it got to be what it is, what its key features and differentiators are, and why I might choose to use OpenSolaris instead of some other system.  That’s important to realize from the outset.  If you are looking for the thousand-page reference guide, this is not the one.  If you have heard about OpenSolaris and want to explore a bit more deeply, to decide whether or not OpenSolaris is something that might help your business or might be a tool you can use, this is a great place to start.

Pro OpenSolaris spends a good bit of time on the preliminaries.  There is an extensive section on the philosophical differences between the approaches and requirements of different open source licenses and styles of licenses.  Pro OpenSolaris explains clearly why OpenSolaris uses the CDDL license as opposed to other licenses and how this fits in with the overall goal of the OpenSolaris project.

Pro OpenSolaris helps you get started, with a lengthy discussion of how to go about installing OpenSolaris either on  bare metal or in a virtual machine.

Compare this to the OpenSolaris Bible (Solter, Jelinek, & Miner), which really does aspire to be the thousand-page reference guide.  In the OpenSolaris Bible, licensing and installation are given only a short discussion, since they are not central to the book’s focus.  Instead, the reader is directed to other places for that discussion.

But that’s why it’s important to have both books.  Pro OpenSolaris gives the tour of the important parts of the OpenSolaris operating system, how and why I might use them, and why they are important, but it does not go deeply into the details.  That’s probably wise for an operating system that is still growing and changing substantially with each new release.

One thing that particularly interested me in Pro OpenSolaris was that it includes a large section on the OpenSolaris Webstack, which provides IPS-packaged versions of the commonly used pieces of an AMP stack (notably Apache, MySQL, PHP, lighttpd, nginx, Ruby, Rails, etc.), all compiled and optimized for OpenSolaris and including key add-ons such as DTrace providers where applicable.  Pro OpenSolaris also has a nice, long chapter on NetBeans and its role as a part of an overall OpenSolaris development environment.

What’s my take overall?  Pro OpenSolaris is a quick read that will give you a good understanding of what OpenSolaris is and why you would want to use it; what its key features are and why they are important; and how you can use these to your best advantage.  There are lots of examples and technical details so that you can see that what Harry & Christine talk about is for real.  I would recommend this as part of your library.  But I would also recommend the OpenSolaris Bible.  The two complement each other nicely to complete the picture.