Friday, May 3, 2013

CUDA - Figuring out how many threads have launched Part 2

Part one is here.

Last time I determined that I could see how many threads had been launched on my GTS 450.  However, each of those threads was making its own atomicAdd() call to a pointer in host memory: an atomic operation from every single thread, hitting host memory every time. Two things that are quite bad.

If I want the host to be able to read what the threads are doing, then I need to keep the memory location in host memory.  However, I don't want to have every single thread updating that location, and thankfully, I don't have to.  Blocks are loaded onto individual SMPs all at once and retired all at once. Not only that, but I can get the block size quite easily from inside the kernel with the blockDim variable.  Implementing this change, my new kernel looks like this:

    __global__ void wasteTime(int nIter, int *threads){

        // Thread (0,0,0) of each block adds the whole block's
        // thread count to the counter in one atomic operation.
        int blkThreads = 0;
        if(     threadIdx.x == 0 &&
                threadIdx.y == 0 &&
                threadIdx.z == 0){
            blkThreads = blockDim.x * blockDim.y * blockDim.z;
            atomicAdd(threads, blkThreads);
        }

        // Busy work so the kernel lives long enough to be watched.
        int j;
        float data = 1.0;
        for (j = 0; j < nIter; j++){
            data = cos(data) + data;
        }

        // The same leader thread counts the block back out on exit.
        if(     threadIdx.x == 0 &&
                threadIdx.y == 0 &&
                threadIdx.z == 0){
            atomicSub(threads, blkThreads);
        }
    }

I also made a modification to the main routine after the kernel call, so that it can give repeated updates on how many threads are currently on the device:

        wasteTime<<<grid, block, 0, stream>>>(nIter, threads);

        // Poll the counter every 200 ms until it drains back to zero.
        struct timespec delay;
        delay.tv_sec = 0;
        delay.tv_nsec = 200000000L;

        printf("%d threads running\n", *threads);
        nanosleep(&delay, NULL);

        while (*threads > 0){
            printf("%d threads running\n", *threads);
            nanosleep(&delay, NULL);
        }

        cudaFreeHost(threads); // pinned allocation, so cudaFreeHost rather than cudaFree

This also requires a #include <time.h> up at the top of the source. One last change is that I swapped the grid to be 4x4x4, to launch more blocks.  The full code is here. Running the code generated the following results:


0 threads running
2048 threads running
2048 threads running
2048 threads running
1024 threads running

As you can see, the first time I called the printf statement, no threads had been spawned yet.  The total number of threads launched was actually (4 x 4 x 4) x (4 x 4 x 4) = 4096, but the maximum number of threads that could be resident on the GPU at once (for this problem and this GPU) appears to be 2048.  Another good thing, however, is that I can see the number of threads decreasing right before the kernel finishes.

There is a small problem in the kernel, though, especially with small block sizes.  Any time there is divergence among the threads within a warp (a set of 32 threads), all of the threads in that warp step through the instructions for all of the divergent paths, with the non-applicable threads having their results masked out.  What this means for my code is that threads 1-31 all wait and twiddle their thumbs while thread 0 goes off and does its DMA atomicAdd and atomicSub. With my block size of 64, a full half of my threads are affected by the time delay associated with 1/64 of my threads.

I can attack this in a few ways. The first is to do the atomic commands on only a single thread from the entire grid. Another is to set the value of *threads prior to the launch and remove the atomicAdd, but leave in the atomicSub. I think I want to leave the atomicSub at the end of the kernel anyway (and calculate blkThreads within that if statement) because I like seeing individual blocks retired. So this version has only block 0,0,0 thread 0,0,0 atomicAdd the number of threads for the whole grid, and then has thread 0,0,0 of each block atomicSub it back down (code here).
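Reconstructed from that description, the new kernel looks something like this (a sketch, not the linked file verbatim):

    __global__ void wasteTime(int nIter, int *threads){

        // Only thread (0,0,0) of block (0,0,0) counts threads in, and it
        // adds the total for the entire grid in a single atomic.
        if(     blockIdx.x == 0 && blockIdx.y == 0 && blockIdx.z == 0 &&
                threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0){
            atomicAdd(threads, blockDim.x * blockDim.y * blockDim.z *
                               gridDim.x * gridDim.y * gridDim.z);
        }

        int j;
        float data = 1.0;
        for (j = 0; j < nIter; j++){
            data = cos(data) + data;
        }

        // Thread (0,0,0) of *every* block still counts its own block out,
        // which is what lets me watch individual blocks retire.
        if(     threadIdx.x == 0 &&
                threadIdx.y == 0 &&
                threadIdx.z == 0){
            atomicSub(threads, blockDim.x * blockDim.y * blockDim.z);
        }
    }

Running this version gave the following results: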

0 threads running
4096 threads running
3072 threads running
2048 threads running
1024 threads running

The count starts off at 4096 and goes down as the blocks retire. Having the host increment the number of threads prior to launch, though, may be even better.  This would allow me to see how many threads have been scheduled for launch, which is what I really want to know for load balancing purposes.

To avoid race conditions I'll still have to use atomicAdd, just from the host, especially if I branch out into threading, where we may have multiple host threads trying to do this all at once. Simple, right? Well, you can try it, but I'll spoil it: atomicAdd() is a "device" function; it can only be called from the device. Obviously the idea of atomics isn't unique to CUDA, though, so we can look elsewhere.

I could try to use the C++11 standard library atomics, but that requires me to create the variable as a std::atomic<int> object. This is problematic because I also need to allocate the memory using cudaMallocHost() to get my DMA access.  Hmm. Well, I can also use intrinsics: GCC has an intrinsic __sync_fetch_and_add.  This works in C (which means I can drop it into my runningThreads.cu file), but it is, as I said, intrinsic to GCC, making it less portable. This could bite me later if I want to use the Intel or PGI compilers.

I implemented the increment with __sync_fetch_and_add and changed the loop so that it would launch a new grid (with a total of 3 grids being launched) whenever less than 30% of threadsPerGrid was still on the device. This could just as easily be set based on some percentage of cudaDeviceProp::maxThreadsPerMultiProcessor * cudaDeviceProp::multiProcessorCount. In any case, the code can be found here.
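The linked file is the real thing; the shape of the host-side loop it describes is roughly this (a sketch; threadsPerGrid and gridsLaunched are names I've made up here, and the 3-grid cap and 30% threshold just echo the prose above):

        int threadsPerGrid = grid.x * grid.y * grid.z *
                             block.x * block.y * block.z;
        int gridsLaunched = 0;

        // Count the threads in from the host, atomically, *before* launching.
        __sync_fetch_and_add(threads, threadsPerGrid);
        wasteTime<<<grid, block, 0, stream>>>(nIter, threads);
        gridsLaunched++;

        while (*threads > 0){
            printf("%d threads running\n", *threads);
            // Top the device back up once less than 30% of a grid's
            // worth of threads remains, up to 3 grids total.
            if (gridsLaunched < 3 && *threads < 0.3 * threadsPerGrid){
                __sync_fetch_and_add(threads, threadsPerGrid);
                wasteTime<<<grid, block, 0, stream>>>(nIter, threads);
                gridsLaunched++;
            }
            nanosleep(&delay, NULL);
        }

The output was: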

16384 threads running
16384 threads running
12288 threads running
12288 threads running
8192 threads running
8192 threads running
20480 threads running
20480 threads running
16384 threads running
16384 threads running
12288 threads running
12288 threads running
8192 threads running
8192 threads running
20480 threads running
20480 threads running
16384 threads running
16384 threads running
12288 threads running
12288 threads running
8192 threads running
8192 threads running
4096 threads running
4096 threads running

Looking pretty good. Next time I'll throw some OpenMP threads into the mix, and maybe try out an idea I just had about using the C++11 atomics to capture the number of currently launched threads, though I haven't even tried to implement it yet.

Tuesday, April 30, 2013

CUDA - Figuring out how many threads have launched Part 1

I've talked about multiple threads sharing a single GPU once before. That got me thinking, though: what if I don't want all of them using the GPU simultaneously? What if the GPU actually becomes the bottleneck in my computational code? In all likelihood, anything I implement on a GPU will probably also have a CPU implementation somewhere, because I would like my code to run on any system. If it can run on an accelerated system, all the better.

Now that I've laid out my motivation, it's time to figure out what I can do about it.  It is pretty simple to figure out how much of the device's memory has been utilized with the function cudaMemGetInfo(). But what about the number of blocks or threads? Now, it may be that my Google-Fu is weak, but I couldn't find a built-in way to determine the current number of threads running, so I decided to see if I could build one myself.
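The memory half of that, for reference, is basically a one-liner (a minimal sketch):

        size_t freeMem, totalMem;
        cudaMemGetInfo(&freeMem, &totalMem); // bytes free and bytes total on the current device
        printf("%zu of %zu bytes free\n", freeMem, totalMem);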

I started off using the same code I used last time, though I modified my  wasteTime routine so that it would cut back on the registers used a bit.  The first iteration of the new wasteTime function looks like this:

     __global__ void wasteTime(int nIter, int *threads){

        // Every single thread bumps the counter in pinned host memory.
        atomicAdd(threads, 1);

        // Busy-work loop to keep the kernel alive for a while.
        int j;
        float data = 1.0;
        for (j = 0; j < nIter; j++){
            data = cos(data) + data;
        }
    }

In the main function, I got rid of the OpenMP (for now) to focus on getting this call right.

     int main(int argc, char *argv[]){
        int nIter;
        nIter = 100000;
        int *threads;
        cudaMallocHost(&threads, sizeof(int)); // pinned, so the GPU can write to it directly

        *threads = 0;

        dim3 grid(2, 1, 1);
        dim3 block(4, 4, 4);

        cudaStream_t stream;

        cudaStreamCreate(&stream);

        wasteTime<<<grid, block, 0, stream>>>(nIter, threads);

        printf("%d threads started\n", *threads);
        cudaStreamDestroy(stream);
        printf("%d threads started\n", *threads);

        cudaFreeHost(threads); // allocated with cudaMallocHost, so cudaFreeHost, not cudaFree
    }

A few notes before I run this:
  1. The threads pointer in main is the same threads pointer in the CUDA code; this is done via pinned memory on the host and direct memory access (DMA) from the GPU to the host.
  2. DMA from the device to the host is incredibly expensive.
  3. DMA is not available for all compute cards, though it is for all Fermi-based cards and above. Honestly, Fermi is old enough now that you should be able to assume that if people are going to accelerate their code, they'll have it (a quick check for this is sketched just after this list).
  4. Atomic operations are expensive because they get serialized.
  5. I have changed the grid and block dimensions for launch so that they can be a bit more general. I get the feeling I'll be using this general project as a framework for a bunch of different tests. If I don't, no biggie.
  6. The source code up to this point is here.
I've left in the streams from last time because I want to parallelize this later.
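As for note 3, a quick way to check from code whether a given card can map host memory at all (a sketch; canMapHostMemory is the relevant cudaDeviceProp field, the rest is filler):

        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0); // device 0
        if (!prop.canMapHostMemory){
            // no DMA/zero-copy on this card; a CPU-only fallback would go here
            fprintf(stderr, "device cannot map host memory\n");
        }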

Building and running (still on my 4 core AMD with a GTS 450) I get an answer of:

0 threads started
71 threads started

Hmm, interesting. Doing some basic math real quick: we have 2 blocks, and each block has 4*4*4 = 64 threads. The first time the printf statement is called, we haven't actually launched any threads yet, or at least they haven't modified the counter yet. We then destroy the stream, which I had thought would synchronize the device and the host, yet when we call printf again, only 71 threads have started: more than 1 block, but not fully 2.

Now we'll go back and synchronize the device to the host. This is a simple change: we just add a call to cudaDeviceSynchronize() right after the second printf call, and put another printf right after that to see how things have changed. Also, I've modified the wasteTime function and given it an atomicAdd(threads, -1); at the end, so that instead of seeing how many threads have started, I can see how many threads are currently running.  I've also modified the printf statements to reflect this. The code for the new version is here.
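Based on that description, the tail of main now looks something like this (a sketch; the linked file is the authoritative version):

        wasteTime<<<grid, block, 0, stream>>>(nIter, threads);

        printf("%d threads running\n", *threads);
        cudaStreamDestroy(stream);
        printf("%d threads running\n", *threads);
        cudaDeviceSynchronize(); // actually wait for the kernel to finish
        printf("%d threads running\n", *threads);

The results from 3 separate runs are: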

ross@kronix:~/dev/cuda/runningThreads$ Debug/runningThreads 
0 threads running
113 threads running
0 threads running
ross@kronix:~/dev/cuda/runningThreads$ Debug/runningThreads 
0 threads running
77 threads running
0 threads running
ross@kronix:~/dev/cuda/runningThreads$ Debug/runningThreads 
0 threads running
68 threads running
0 threads running

As you can see, there is a fair amount of entropy in the number of threads "caught" running by the middle printf, which sits between the stream destroy and the device synchronize.

I definitely have some more to add to this, such as not having every thread run the atomic functions, and spawning into multiple OpenMP threads as well. I'll be following this post up with some future posts detailing my thought process and approach to solving this problem I may or may not have invented out of thin air.

In fact, you may be wondering why I even care about this at all. The reason is I may be working on a project soon which will literally have thousands of concurrent tasks each one of which may or may not be able to saturate a GPU for a short time. If I am even running on an accelerated system, it will still have a high CPU to GPU ratio on each node. While spawning some of these tasks onto the GPU to free up CPU time will be beneficial, if I end up being stuck waiting for the GPU to crunch through all of the tasks I've given it, because I haven't paid attention to load balancing, well, that would be the opposite of accelerated functionality, now wouldn't it. That last sentence is long and has a lot of commas, but I'm going to leave it like that anyway.

On a more pragmatic tangent, like "why wouldn't you just develop for CPUs and then accelerate with a Xeon Phi?": the fact that I can do these kinds of tests and experiments at home is one of the nice things about how NVIDIA has set up their system. I can tinker with my low-cost system, then port that code and expertise over to the production systems at work with minimal effort. While I'd love to get my hands on an Intel Xeon Phi and do some work with it (a minimum $2k proposition at the low end, including the fact that my current desktop couldn't run it), I would pretty much have to purchase the Intel compilers to go with it to be able to build toward it. Contrast that with my $65 video card and free compilers, and the choice isn't that hard at the moment. That, and I find it fun.

Still, let me reiterate that: I would love to get my hands on an Intel Xeon Phi.

Finally, on a non-GPU related note, I want to thank Tod for giving me some space on his server. He set up my account a while ago for D&D stuff, but I don't think he'll mind if I put some code up for others to try out. If you want me to take it down (or actually link to you, other than tod.net) let me know.

Saturday, April 27, 2013

Thoughts on Spring Cleaning

I feel like my wife and I were pretty productive today. First, I finally got around to mowing the lawn for the first time this season. There was one point where the grass was so thick, and I was moving fast enough (at a walk), that I actually stalled the mower. Going over that patch a bit slower let me take it out, though.

After that, and lunch, we started in on what we refer to as the office. This is where our computers are and it was in pretty bad shape. Truth be told it still isn't actually clean, but I can actually see the surfaces of both of the desks that make up my corner. I also raised my two monitors (having experienced 2 monitors, I far prefer the increased real estate to a single monitor) up onto a mostly unused shelf that was sitting behind them.

This made an incredible difference for my entire desk. Not only did it make the space they were sitting in usable, it made the mostly unused space behind them (and under the shelf) usable as well. Now I'll just have to get used to the raised position of the monitors. While I am slouching pretty bad right now, for the most part I have found that I have had better posture with them standing higher.

I'm sure this post has been pretty boring so far, so here are some amusing tidbits from our adventures in cleaning.

  1. We found a small LEGO Creator kit. I built the race car, and it is now sitting on the shelf with my monitors. I may or may not have been playing with the race car all evening, to the annoyance of my wife.
  2. We filled a medium sized trash bin twice with recyclables.
  3. We found a birthday card that was supposed to go to somebody a few weeks ago (technically this was found in the kitchen, and they did get the present at the appropriate time).
  4. We found a hospital bill that was over a year old.
  5. Pants which had a loose button, and therefore had been on the sewing table for over a year, were fixed.
  6. ???
  7. Profit
Now to get ready for the D&D adventure I'm running tonight.

Wednesday, April 24, 2013

Sharing a (nVidia) GPU between OpenMP Threads

In case you couldn't tell from the title, this will be somewhat code heavy.

I was thinking about one of my problems for work and thought that at some point it may be worthwhile to know whether or not I could share a GPU between different threads using OpenMP. I thought it was possible, since they would all share the same context (still not sure what that means, but I know that MPI processes do not share the same context, and that they can't share a GPGPU on pre-Kepler hardware).

A couple of disclaimers: I only write in CUDA when I'm doing GPGPU work, which I know ties me to nVidia hardware, but that's the way it is.  Also, this code has no error checking, because it was a quick and dirty test this evening. I'm pretty sure I spent more time writing this blog post than I did writing the code.

At work I have access to most compilers as well as a GTX 680 (in my workstation, it's sweet), some K10s, a K20, a system where I can get up to 4 nodes with each node having one M2050 (Fermi), and a system with multiple M2070s per node (Fermi). At home though, well, I'll let some of the device properties speak for themselves.

 
==== Device 0 ===
                  name: GeForce GTS 450
    concurrent kernels: T
                   SMP: 4
   Max Threads per SMP: 1536
 Max Threads per Block: 1024
            Global Mem: 1.00 G
               Compute: 2.1
  Asynchronous Engines: 1

So yeah, not the most powerful GPU out there, but it has compute capability 2.1, which is why I bought it. It only cost ~$65 at the time. But I think my next one might be in the GTX Titan family. But that is a different post.

As an aside, if anybody wants the code that generates the queries above, it is here.
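If that link ever dies, the whole thing boils down to cudaGetDeviceProperties(). A rough sketch (not the linked file itself) that prints more or less the fields above:

    #include <stdio.h>

    int main(void){
        int nDev = 0;
        cudaGetDeviceCount(&nDev);
        for (int i = 0; i < nDev; i++){
            cudaDeviceProp p;
            cudaGetDeviceProperties(&p, i);
            printf("==== Device %d ===\n", i);
            printf("%22s: %s\n",    "name", p.name);
            printf("%22s: %c\n",    "concurrent kernels", p.concurrentKernels ? 'T' : 'F');
            printf("%22s: %d\n",    "SMP", p.multiProcessorCount);
            printf("%22s: %d\n",    "Max Threads per SMP", p.maxThreadsPerMultiProcessor);
            printf("%22s: %d\n",    "Max Threads per Block", p.maxThreadsPerBlock);
            printf("%22s: %.2f G\n","Global Mem", p.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
            printf("%22s: %d.%d\n", "Compute", p.major, p.minor);
            printf("%22s: %d\n",    "Asynchronous Engines", p.asyncEngineCount);
        }
        return 0;
    }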

The first thing I decided I had to do was create a kernel that could waste some serious time, so that there was no chance one thread's kernel would finish before the next thread's had even gotten through its launch overhead. I didn't want to deal with allocating and freeing memory, though. I came up with a very simple and useless kernel:
 
    __global__ void wasteTime(int nIter){
        int j;
        float data = 1.0;
        for (j = 0; j < nIter; j++){
            data = cos(data) + sin(data);
        }
    }

Next, I set up a simple program to call the kernel from a bunch of different threads.
 
    int main(int argc, char *argv[]){
        int nIter;
        nIter = 100000;

    #pragma omp parallel
        {
            wasteTime<<<1,128>>>(nIter);
        }
    }

I created this project using the nVidia Nsight Eclipse Edition (I'm running on the Linux box), which has nvvp (a handy-dandy GPGPU profiler) built in, which allows me to easily see whether I was able to overlap my kernel calls. Here is a profile of that first attempt:
Figure 1.
So the good news is that my kernel takes more than half a second, which makes it a supreme time waster for this situation.  Also, at first glance, it appears as though I have 3 kernels that are overlapping. First glance is wrong, though, and my initial inkling that it was wrong was that I have 4 cores on my CPU, so I should see 4 kernels if everything were working correctly. Looking a bit closer, I'm pretty sure I actually have only 1 kernel running there. To check, I put a printf statement inside the parallel section. If I do have more than 1 thread running, I should see more than one output line.

    int main(int argc, char *argv[]){
        int nIter;
        nIter = 100000;

    #pragma omp parallel
        {
            wasteTime<<<1,128>>>(nIter);
            printf("done.\n");
        }
    }

Figure b.

Hmm, only one 'done.' That's a problem. And then I remember: I have to pass in some flags to make sure that it actually compiles the #pragma omp parallel statement (with nvcc, the OpenMP flag has to be handed through to the host compiler, something like -Xcompiler -fopenmp). I didn't see any warnings about not being able to handle the #pragma statements, which I usually get. That may have something to do with nvcc, or it may have something to do with me only checking whether it successfully built, and not looking through the console output.  It took me a bit to figure out where the flags go when using Nsight Eclipse Edition (I often roll my own Makefiles), but I'll give you a couple of screenshots so that you can easily find them if you do the same thing.  You can get the dialog below by selecting 'Project' -> 'Properties'.
Figure iii.

Note, I only opened the dialog once; I did not have 2 dialogs stacked one on top of the other, I just thought I would make sure that was clear. I then had to go into my Debug folder and call 'make clean' (though there was probably another way to do this), then rebuild and re-profile, yielding the following results:

Figure we're getting closer.

Well, we got the 'done' to show up 4 times, and I can also see the 4 individual threads in the profiler.  The bad news is that there is absolutely no overlap of the time-wasting kernels.  There should be plenty of room on the GPU; after all, there are 4 SMPs, and each one can handle 1.5k threads. I've also established that it can handle concurrent kernels.

And that's when I remember it: streams.  At least, I think that is the answer. Basically, all of the threads are calling into the GPU's default stream (which appears to be stream 2, according to the profiler). Not only that, the default stream doesn't do things quite as asynchronously as other streams. To get concurrent kernels, I have to have each of the threads call into a different stream. That should be easy enough.

    int main(int argc, char *argv[]){
        int nIter;
        nIter = 100000;

    #pragma omp parallel
        {
            // Declared inside the parallel region so each OpenMP thread
            // gets its own private stream, rather than all of them racing
            // on a single shared handle.
            cudaStream_t stream;

            cudaStreamCreate(&stream);
            wasteTime<<<1,128, 0, stream>>>(nIter);
            cudaStreamDestroy(stream);
            printf("done.\n");
        }
    }

Figure Whoo popcorn superposition (kernel overlap)!

I've collapsed the CPU threads, as they aren't all that important for what I want to test. However, as we can see, stream 7 and stream 10 are both executing at the same time. The red line in there is apparently overhead directly associated with the profiler, and I may look into that some more.  I also ran (from the Debug directory) /usr/bin/time -p ./ompTest, which gave me an elapsed time of 1.07 seconds. That is obviously more than 1x the time to run wasteTime(), but less than 2x, so the profiler might be messing up some other stuff in there too.

Finally, the final version of my (somewhat small) source code file is here. If you've made it this far, hopefully this post has been of some use to you.

Saturday, April 20, 2013

Thoughts on exercise

I don't get to exercise as much as I would like to. Or rather, I don't make as much time for exercise as I ought to. I enjoy exercise in general. In the past I have enjoyed soccer, ultimate, and even a bit of basketball (though I cannot claim to have ever had much skill in that last one). For solo exercise, I actually somewhat enjoy using an elliptical, and I think I would like to run at least a half-marathon at some point. Maybe I should make myself an official 'bucket list', but that is not what this post is about.

The primary problems for me are motivation and time management. In terms of motivation, I almost always enjoy the exercise while I am doing it, and I typically feel pretty good afterward as well. For some reason, though, it can be hard to drag myself out of bed or away from the computer to actually start the activity.

The issue of time management has become more pronounced lately. I used to hit the gym on Monday, Wednesday, and Friday mornings. However, about a month ago we got a dog, which means I now typically take him on a walk every morning and evening, each taking about 20 minutes. So while I do get some exercise every day, it is not very intense. I have considered running a bit on these walks, and I may start doing that.

Another option is to start going to the gym in the evenings, but that takes time away from my family. So this post comes to an end without any real resolution, but rather just some thoughts which have been bouncing around my head lately.

Thursday, April 11, 2013

Thoughts on loading ... loading ... loading ... done

Almost all of the coding that I do is command line based. I work in an environment where pretty much everything has to be accessed via the command line, and even when we do work with a GUI, it typically is to set up a command line string to be passed to a system call. That is just the nature of the work when you operate in a batch environment.

Anyway, back to my point: sometimes I have to watch a screen doing some boring stuff, like telling me how many of my 12k files have been copied over to the work space before the job can start. For things like this I would kind of like a more interesting screen than:

100 of 12224 files copied
200 of 12224 files copied
300 of you get the picture.

I think I may start to work on a library (written in C, probably using ncurses) to do some text-based animations.  There are a few interesting ones in a thread over at Stack Overflow which could be kind of neat, but few of them actually give information about the progress of the process.

I had some ideas, like a bulge going through a pipe and filling up a bucket, along with some information about the progress.  An example might look something like:

 
---------/ \----\
---------\_/---\ \
             |     |
             |     |
 x of y      |-----|
 items       |     |
             +-----+

Another option would be something along the lines of files flying toward something, maybe a giant mouth that wants to eat it:

 
                           /-----\
               ____       / o     \
        --    /   /       +--\     \
       --    /___/            |    |
________________________  +--/     /
     #.# % done           \       /
     x of y items          \-----/


Another one which I won't draw out here at all, but I thought was funny was a cow which eats some grass on one side of the screen, then moves over to the other side of the screen and, um, deposits the grass in a slightly altered form.

I figure the API would look something along the lines of:
 
    int pipe(int progress, // the number of items / tasks which have finished
             int max,      // the total number of items to be done
             int substeps, // the number of steps to generate for an animation
             const char* itemName); // replaces 'items' above

Of course, I am by no means tied to this API, and it may even make sense to write the library in C++ so that a single animation could be treated as an object, making it easier to keep state.
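To make the idea concrete, here is a minimal sketch of one frame of the pipe animation, everything about it hypothetical, including the name (I went with pipeFrame, since pipe() is already taken by POSIX):

    #include <stdio.h>

    /* Hypothetical frame renderer for the pipe idea above: a pipe with a
     * bulge whose position tracks progress, plus the "x of y items"
     * readout. Plain stdout for now; ncurses would come later. */
    int pipeFrame(int progress, int max, int width, const char *itemName){
        if (max <= 0 || width <= 1)
            return -1;

        int pos = (progress * (width - 1)) / max; /* bulge position along the pipe */
        putchar(' ');
        for (int i = 0; i < width; i++)
            putchar(i == pos ? 'O' : '-');
        printf("\n %d of %d %s\n", progress, max, itemName);
        return 0;
    }

    int main(void){
        for (int done = 0; done <= 12224; done += 1528)
            pipeFrame(done, 12224, 40, "files");
        return 0;
    }

A real version would redraw in place instead of scrolling, which is exactly the sort of thing ncurses is for.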

Finally, while this was inspired by something at work, it is by no means appropriate for me to develop it while at work. Though if I do get it developed on my own time, and like the results, I have no problem with incorporating it into my projects at work. I welcome other ideas for animations, the API, or any other form of help/support.

Sunday, April 7, 2013

Thoughts on running a D&D adventure

Again, not a coding post, but I didn't feel like working on one of those tonight. I'd rather talk a bit about the D&D adventure I ran last night.

So I'm the DM for a (generally) biweekly gaming group.  We are spread out geographically, but all in the Eastern time zone. We use a couple of tools for communication, mostly Skype for voice and roll20 for map-support. Anyway, one of the members of my group is a paladin for Garl Glittergold. The group is at level 4 now, and I decided that Glittergold (also known as Watchful Protector, The Joker, Priceless Gem, and Sparkling Wit) has all of his paladins undergo a quest at around this point in their development.

As it turns out I also knew that the group was going to head through a portal near the very beginning of this particular adventure. They decided to all jump through simultaneously, which made it really easy. "As you all are jumping into the portal, you notice it change from a glowing blue to a glowing yellow."

They found themselves standing in a 30' x 30' room, with a 10' wide ethereal glowing scroll on the south side of the room that looked something like this:
They then saw that the northern "wall" was more like a shimmering, glowing ethereal curtain. When they walked through that, they found themselves in a 50' x 50' room where the north, east, and west "walls" were all ethereal curtains.  The north wall and the west wall were glowing; the east wall was dark. Essentially they had walked into a giant 5 x 5 grid, where each tile was 50' x 50'. If they followed the path laid out by the glowing light, it would eventually take them to a room where all of the walls were either stone or dark, but in that room would be a large-sized lantern archon (I know they aren't large size, but Glittergold enlarged this one. That's my story and I'm sticking to it) in the northeast corner of the room. Not that any of my PCs have Knowledge: The Planes, but an NPC cleric with them did.

Anyway, when any part of a party member got into the same square as part of the archon, it said "Eeee" and left, leaving 2 new walls glowing.  The party had found the puzzle.

Following the archon, the players found that it repeats a pattern of positions. Actually, they solved how this part of the puzzle worked before they got through the pattern the first time.  The pattern was:

Black lines indicate rooms, faint blue lines indicate 10' squares, and yes, that is a screen capture of Excel. While the paladin was investigating the 2nd position, they kicked all of the non-gnomes out of that "room". Around this time they also discovered they could ask the archon questions, and it would answer, though not always in a useful manner. For example: Q: "When is the puzzle over?" A: "When you have solved it or have given up." 

The non-gnomes headed south, where one of them noticed some very faint lines on the ground, lines which formed an "F".  They then moved to the room (marked above with a 9 and a 6) where they found the lines associated with the letter L. It was at this point that one of my players very quickly pieced together that the rooms correlated directly with the alphabet, starting in the upper left and reading like a book. He then told the other party members to have the archon move. When they had gotten to the 4th position, they were trying to figure out what "HATO" meant. They finally asked the archon if its position within each room mattered, to which it responded with an actually helpful "yes".

From that they quickly recalled where it had shown up the previous 3 times, and generated the string "HeAdToOv", where the capital letter is the room, and the lowercase letter is the location within the room. They eventually got the string "HeAdToOvErLoOkHiLl" which is pretty obviously "Head to Overlook Hill". 

There were some other small things which they thought might have been important (only the rooms with the letters C, D, F, I, L, O, Q, T, U, X, and Y had glowing letters on the floor), and they seemed convinced that the poem itself had some sort of clue to solving the riddle. Of course, the full puzzle isn't over yet. They don't even know where Overlook Hill is. And, as the last thing they did, they all stepped back into a portal, not knowing what lies on the other side >=).

If I were to run it again, I think I would have tracked how many questions the group asked the archon, and once they had asked 20, the only way the archon would respond is "I have already answered 20 questions".  Another option might be to have the archon only answer yes/no questions. One of my favorite question and answers was "Are your movements spelling out a word?" to which the archon answered "no". This was truthful because the archon's movements were spelling out multiple words.

So there you have it. Feel free to drop this puzzle into a game if you are a GM; it can fit in quite a few places whenever you want the party to spend some time working out a message. Heck, if you have a paladin of Garl Glittergold in your group and want to use the scroll and the cheesy poem, feel free.

Thursday, April 4, 2013

Thoughts on Connections


So this is non-alcoholic beer, right?
This is another blog post about something not related to computers or code, though I've got a couple of those posts kicking around in my head. I'll try to get one up this weekend, but no promises.

So a couple of days ago my wife shared a photo by Elise Andrew on her Facebook page (IFLS; I'm going to try not to swear here). Through that post, I'll link what she linked, which is here. Yes, that is an awesome-looking jet pack on a toddler. You could probably say that the link is relevant to my interests.


Anyway, we decided that we wanted to do it too. I mean, it's awesome; why wouldn't we want to make one? But to make one we would need some 2-liter bottles. We don't just buy 2-liter bottles, though; we buy soda in 2-liter bottles.  I decided to buy some of the soda seen at right.

Now, this soda is fairly tasty on its own. But it can be improved, and quite simply. "How?" you may ask, but I believe that you, dear reader, already know the answer.

If you have not figured it out by the time you read this line, then the images are probably broken. But yes, we added vanilla ice cream to our already tasty root beer so that we might consume our root beer floats.


Tastiness ensued. And that is how a toddler rocket-pack craft project turned into a root beer float.

Monday, April 1, 2013

Thoughts on April 1st

I'm not a big April Fool's Day prankster. My wife has pulled off a few good ones in the past, and I have sometimes provided some help in wording and/or the initial idea, but for some reason I don't ever think of ones that involve myself.  Also, at this point I don't think I could really pull them on my children.  One of them wouldn't even know what is going on, and the other takes things so literally that he doesn't get the idea of a practical joke yet.

But this does raise a question (note: it does not beg a question) for me. Why do we pull pranks on people in the first place? I feel like the purpose of pranks is to make people look foolish, silly, and/or stupid. I have realized for a long time that I am not a fan of schadenfreude. Laughing at other people's misfortunes is not for me. People who know me in real life can also attest that I'm not really a fan of sitcoms because much of the humor is derived from people being in awkward situations. I wouldn't want to be in that situation, why would I laugh at somebody who is in that situation?

It may very well be that I don't know how to take a joke. When I look at what I do find funny, I gravitate toward puns, some physical comedy, and general silliness. Perhaps interestingly, I'm typically not good at crafting the type of humor I enjoy.

I often tell people that I don't enjoy movies or TV shows with plot. However, I can watch documentaries, sports, stand-up comedy, and game shows very easily. I seem to have a dislike for tension build-up. Unfortunately, tension build-up seems to be a requirement for drama. This is in no way a criticism of drama, as obviously people have enjoyed it for thousands of years, but more of an introspective on myself.

This dislike for tension, of course, generates tension of its own. A common social activity is to watch a movie, either in the theater or on a couch at home. If I am always against watching movies, that doesn't bode well for interacting well socially, and this has created problems in the past, and will likely cause problems in the future.

Maybe I'll come to appreciate tension build-up, hoaxes, pranks, and the like. Maybe I'm just a fool.

Wednesday, March 27, 2013

Thoughts on Gay Marriage

Or, you know, marriage.

This will be rambletastic, you have been warned.

For a long time, probably over a decade, I have thought that consenting adults should be allowed to enter into legally binding, and respected, agreements.  This has progressively changed from allowing "Civil Unions" with all of the rights of marriage to my current position of just calling it what it is: marriage.  I have asked people who are against it to provide me with one non-religious reason for disallowing it, and the one I heard most recently was that he did not want to have to explain to his granddaughter why two people were doing something that they thought was immoral while the government was condoning it.

Really? Outlawing the ability of two people who love each other to care for each other because you can't talk to children about how different people have different views of what is immoral? Though I suppose that is how prohibition started too. Of course, this is still a religious argument.

Even before I was married, I held the opinion that two people of the same gender getting married would in no way devalue my marriage.  If two people who love each other, and who happen to be of the same gender, get to marry, and this somehow has a deleterious effect on my marriage, that says a whole lot more about my marriage than it does about the gay couple who love each other.

In terms of the national good, I would think that allowing gays to marry each other would actually stimulate the economy, at least a little, as there are more people employing reception halls, DJs, bakeries, and everything else that goes with a wedding, whether or not the ceremony proper is held in a church.  Though I guess more lawyers will probably be getting paid too as the inevitable divorces come through.

That of course points us to the "sanctity of marriage" argument. I'm not certain of the current statistics, but I was under the impression that it is still something like 50% of marriages end in divorce. In this country marriage hasn't been sacred for quite some time. When a celebrity has been married for over 10 years we applaud them. When non-celebrities are married for more than 30 years, we, as a society, marvel at their matrimonial longevity.

Maybe I should have made this post yesterday, or many days ago. Regardless, the members of the Supreme Court will not see it prior to their decision, nor should online essays sway their decision one way or the other. They should base their decision on their interpretation of the Constitution, including the part which provides equal protection under the law.

This also brings up my final thought: the only place the word "faith" shows up in the original Constitution has the effect of saying that marriages between same-sex partners performed in New York have to be upheld in Utah.  That is the full faith and credit clause. The word "faithfully" does show up in 2 places: one in the oath for the President, and the other stating that the President will ensure that the laws are faithfully executed.

Monday, March 25, 2013

Bring out your dead!

First non code-related post!

My wife and I have been playing a lot of Zombicide recently, and we are backing Zombicide: Season Two as well.  If you don't want to click through the links, it is a cooperative board game for 1-6 people where you are trying to survive the mission at hand.  You can pick up items which make killing zombies easier, but the more experience you rack up (by killing zombies) the more zombies spawn each turn, making it a balance of killing vs. running away.

The game comes with about 10 missions in the rule book, but the website has a bunch of other missions for download.  There is also a game editor so you can easily create your own missions.  They even run a weekly contest where the winning submitted mission gets an exclusive survivor sent to them.

My wife and I are working on a mission which we will probably be playtesting in the next week or so. We hope to eventually submit it.  If it wins, I'll be sure to post a link to it here.  If it doesn't, I'll probably post a link to our write-up so that others can play it anyway.

Hmm, I wonder if Zombicide will ever make it onto #TableTop?

Sunday, March 24, 2013

Building gcc 4.8

So apparently on 22 March 2013, GNU put gcc 4.8.0 into production, so I decided to build it on my home computer (AMD Phenom II, Ubuntu 12.04 64-bit).

Here are the steps that I took for a pretty smooth installation.

Edit: I initially used option 4.1; however, I have since built gcc-4.8.0 on other (non-Debian) systems where I have not had root access. Option 4.2 should work in those cases. This has worked on CentOS 5.9 on an Ivy Bridge Core i7 and on CentOS 6.3 on a Sandy Bridge-based Xeon. My prefix directory in those cases was $HOME/app/gcc/4.8.0.
  1. Download the source code
  2. Unzip the file (tar -xjf <file name> for .bz2 file, like what I linked to)
  3. Make a build directory (mkdir gcc-build && cd gcc-build)
  4. Install some supporting parts 
    1. Option 1: sudo apt-get install libgmp-dev libmpfr-dev libmpc-dev libc6-dev libc6-dev-i386
    2. Option 2 (from source code directory): contrib/download_prerequisites (prereqs will then be built as well)
  5. Run configuration (../gcc-4.8.0/configure --prefix=/app/gcc/4.8.0)
  6. make
  7. sudo make install
I use environment modules, so I've put together a module file in modulefiles/compiler/gcc/4.8.0 which currently looks like this:

 
#%Module1.0
##
##

conflict compiler

prepend-path PATH /app/gcc/4.8.0/bin
prepend-path LD_LIBRARY_PATH /app/gcc/4.8.0/lib64
prepend-path MANPATH /app/gcc/4.8.0/share/man
prepend-path C_INCLUDE_PATH /app/gcc/4.8.0/include
prepend-path CPLUS_INCLUDE_PATH /app/gcc/4.8.0/include

setenv CC /app/gcc/4.8.0/bin/gcc
setenv CXX /app/gcc/4.8.0/bin/g++
setenv FC /app/gcc/4.8.0/bin/gfortran

After that, a 'module load compiler/gcc/4.8.0' and the 4.8 versions of gcc, g++, and gfortran are my default C, C++, and Fortran compilers. Pretty simple overall.

Text Editors

I write code on occasion.  I've found that most people who write code have a favorite text editor for doing that.  I haven't found anybody for whom that text editor is Notepad, but I'm getting ahead of myself.  This happens to be a subject where I have some opinions.  However, I have to be honest and say that I haven't really investigated the multitude of text editors out there.  Another thing to note is that I spent my formative computer years using Windows products, so I like having certain keyboard shortcuts, most notably CTRL-C for copy, CTRL-X for cut, CTRL-V for paste, CTRL-S for save, CTRL-Z for undo, and CTRL-Y for redo.  I know there are plenty of text editor editorials and guides out there, but this is mine.

Geany
Right now my text editor of choice is Geany.  Geany seems to fit my style pretty well.  It has automatic syntax highlighting for the languages I use most, it has tabs, and it is easily available on Windows as well as Linux.  I know that it is in the repositories for Ubuntu, and an RPM can be found for Red Hat/Fedora/CentOS.  I have also been able to build it from source on CentOS when I did not have root access, though it does have a few library dependencies (GTK+ is one of them, I think).  I haven't decided if Geany is a heavyweight text editor or a lightweight IDE.  However, it loads up pretty fast, and if I have other files open, it can autocomplete variables, class names, and functions out of those open files.  If I'm writing XML, it will also automatically create closing tags, which can be pretty nice.

Notepad
I hate Notepad.  Notepad wasn't quite the first text editor I ever used (I think that was edit, back in DOS), but it was close.  There are many reasons I dislike Notepad.  Here are the main ones:

  • No tabs: If I am working with text files, there is a reasonable chance I am working with many text files simultaneously.  Having each file in a separate window is infuriating.
  • No text highlighting: text highlighting is very helpful when working with code, 'nuff said
  • No line numbers: again, something very helpful when working with code
  • And the kicker: no support for new lines without a carriage return
Seriously, I can't read any text file generated on a *nix system because Notepad doesn't realize that \n means newline; it requires \r\n.  I can't imagine this is that hard to fix, and I'm sure anybody who is actually going to write code at all is going to get a better alternative (the previously mentioned Geany; Notepad++ is pretty good too).  However, to not include something at least marginally better than Notepad seems like a big FU to developers.

Notepad++
Notepad++ is an excellent choice for replacing Notepad on Windows.  It fixes all 4 gripes I posted about Notepad (but not all that I have, believe me).  The main reason that it is not my primary text editor on Windows is that I can't get it for Linux.  I work in Linux often enough that having my text editor be the same in both Windows and Linux is enough of a boon to push Geany above Notepad++.  Because of this, I haven't actually used Notepad++ in probably over a year.  However, it is still a quality text editor.

vi / vim
If I only have access to a command line editor, or if all I want to do is make a very small modification to a text file, I use vim (or plain vi if vim is not available).  vi is incredibly powerful if you know how to use it.  I know how to use very small parts of it: enough to move around somewhat quickly, do search/replace, a little bit of cut and paste, and mass deletions.  However, these don't follow my normal keyboard shortcuts, so sometimes I mess up pretty badly (which is why repositories are my friend).

If I have to do lots of coding on a system with no GUI (and therefore no Geany) what I'll actually do is develop on a system with Geany, and then just rsync files up to the GUI-lacking system.

Emacs
I don't hate emacs, but I don't typically use it.  If a system has emacs, it probably has vi as well, and I have already become proficient enough with vi.  Also, the control characters really throw me off.  I know that I could create my own initialization file which sets those to my preferences, but then I would have to port that file around everywhere.  I haven't even gotten to the point of porting a .bashrc around everywhere (Note to self: I should probably push a .bashrc onto my Google Drive at some point).

UltraEdit
UltraEdit was the first text editor I ever used that I actively liked.  I actually paid for the license back in high school when I was using it (and the license cost $30) for version 9.  I haven't used it in years due to the cost associated with it, and by then I had found other text editors that I liked, but I do remember the column edit mode was pretty sweet: I could grab a rectangle of text in the middle of other text.  Maybe I'll grab the free trial and play around with it some more, to see if it is worth the money and capable of ousting Geany from my top spot.  Another bonus (at least on the version I used to have): a built-in hex editor.

I think that is all for now.  I may talk about IDEs I have used (Eclipse, NetBeans, Visual Studio Express, MonoDevelop, more Geany) at some later point.  Or this blog could go the way of many other journals I have started, and I'll just forget about it (I'm looking at you, LiveJournal).

Saturday, March 23, 2013

Cypher which has probably been done before

I had an idea for a somewhat simple cypher a few weeks ago, and I finally got around to coding it up in Python. Basically, it takes a letter, finds the index of that letter within the cypher key, and then adds that index to a running offset value. The offset value is then taken modulo the length of the cypher key, and the character at the resulting index within the cypher key is output. Optionally, a static value can be added to every offset.
If anybody actually sees this, let me know what you think.

 
'''
A module for creating and decoding a simple cypher-text
'''

from collections import OrderedDict

cyStr = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ,.;:"\'!@#$%^&*()[]{}-?/'

def rollingEncode(string, offset=0, cyString=cyStr):
 cyString = "".join(OrderedDict.fromkeys(cyString).keys())
 ncyStr = len(cyString)

 if ncyStr == 0:
  return string

 anslist = []
 cyPos = 0
 
 for a in string:
  index = cyString.find(a)
  if index != -1:
   cyPos += index + offset
   while cyPos < 0:
    cyPos += ncyStr
   cyPos = cyPos % ncyStr
   anslist.append(cyString[cyPos])
  else:
   anslist.append(a)
   
 return ''.join(anslist)
 
def rollingDecode(string, offset = 0, cyString=cyStr):
 cyString = "".join(OrderedDict.fromkeys(cyString).keys())
 ncyStr = len(cyString)

 if ncyStr == 0:
  return string

 anslist = []
 cyPos = offset
 
 for a in string:
  index = cyString.find(a)
  if index != -1:
   index -= cyPos
   while index < 0:
    index += ncyStr
    
   index = index % ncyStr
   
   anslist.append(cyString[index])
   cyPos += index + offset
   
   while cyPos < 0:
    cyPos += ncyStr
   
   cyPos = cyPos % ncyStr
  else:
   anslist.append(a)
   
 return ''.join(anslist)
   
if __name__ == '__main__':
 import sys
 if len(sys.argv) > 1:
  if len(sys.argv) > 2:
   cypher = sys.argv[2]
  else:
   cypher = cyStr
   
  print rollingEncode(sys.argv[1], cyString=cypher)
  sys.exit(0)
 
 cypher = 'The quick brown fox jumped over the lazy dog.!?'
 a = "This is a test. WHOOOO!"
 offsets = [-500, -50, -5, 0, 5, 50, 500]
 for o in offsets:
  b = rollingEncode(a, o, cypher)
  c = rollingDecode(b, o, cypher)
  print "offset =", o, "cypher = '" + cypher + "'"
  print "a =", a
  print "b =", b
  print "c =", c
  print ""
  
  b = rollingEncode(a, o)
  c = rollingDecode(b, o)
  print "offset =", o, "cypher = '" + cyStr + "'"
  print "a =", a
  print "b =", b
  print "c =", c
  print ""