MediaWiki on CentOS 7

I’ve set up a few installations of MediaWiki in the past on different operating systems. It’s not the type of thing I do a lot, but when I do, It’s usually not too much trouble.

I was migrating an old MediaWiki installation from a Windows 2007 server to CentOS 7. I figured it would be pretty routine. It turns out there were a couple little hiccups along the way that I figure I should document to help me out in the future.

The first problem I ran into is that CentOS 7 doesn’t ship with a new enough version of of PHP for the current release of Media Wiki.

Uninstall built in version of PHP

yum erase php
yum erase php-common

Install yum repo for newer PHP release

rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
rpm -Uvh https://mirror.webtatic.com/yum/el7/webtatic-release.rpm

Install PHP packages

yum install php55w* --skip-broken

This allowed MediaWiki to run finally, and I was able to create a new LocalSettings.php. From there I restored the SQL database without any issue. Don’t forget to install MariaDB.

yum install mariadb-server mariadb

To import the old SQL data to the new DB, first creat a DB with the same name, and then pupolate it with the sql file.

# Get into mysql shell
mysql --password=********
# Create DB with same name as old DB
mysql> CREATE DATABASE my_wiki;
# Exit shell

# Import old SQL data to new DB
mysql --password=******** my_wiki < /wiki.sql

I also copied over the images directory from the old installation.

The next issue I had is permissions. SELinux wasn’t even on my mind, and provided all sort of permission problems. I ran the following commands on my html directory where MediaWiki is installed in an attempt to fix everything.

sudo chown apache:apache -R /var/www/html/

find . -type f -exec chmod 0644 {} \;
find . -type d -exec chmod 0755 {} \;

sudo chcon -t httpd_sys_content_t /var/www/html -R
sudo chcon -t httpd_sys_rw_content_t /var/www/html/images/ -R

At this point, MediaWiki is running, but I couldn’t see any images, or upload any files. I tweak settings in LocalSettings.php but to no avail. I enabled logs and better error messages by adding these lines to LocalSettings.php

$wgShowExceptionDetails = true;
$wgDebugLogFile = "/var/log/mediawiki/debug-{$wgDBname}.log";

When I upload a file using MediaWiki, I get the error:

Could not create directory "mwstore://local-backend/local-public/1/1e".

The log file shows the following error:

[FileOperation] mkdir(): Permission denied
[FileOperation] FSFileBackend::doPrepareInternal: cannot create directory /1/1e

I am unable to upload images, so I Google the error message and check my settings. Almost all of the sites talking about this error suggested permission issues with the images folder. I checked my permissions over, and over again with no luck. The rest fo the sites talked about having the $wgTmpDirectory set correctly, which I’m pretty sure I did.

$wgEnableUploads = true;
$wgUseImageMagick = true;
$wgImageMagickConvertCommand = "/usr/bin/convert";
$wgTmpDirectory = '/var/www/html/images/temp';

What bothers me is that the error message in the log looks like it is trying to mkdir at the root level, however MediaWiki switches a lot between url paths (relative) and absolute. It wasn’t clear. The next debugging step was using strace to see what the OS was doing. I attached strace to one of the apache processes and uploaded a file, and lo:

mkdir("/temp", 0777)                    = -1 EACCES (Permission denied)

It is trying to use my root directory as the images fodler location, even though I have a different tmp directory specified in my settings file.

It turns out that there are default setting in the img_auth.php file that don’t appear to be correct. They get automatically set based on the environment. I chose to explicitely set them in the LocalSettings.php document instead.

# Redefining Upload Directory and Path from img_auth.php
$wgUploadDirectory = '/var/www/html/images';
$wgUploadPath = '/images';

Finally. My images instantly start working and I can upload files again. Thank goodness! This wasn’t a difficult thing, but it is surprising that I couldn’t find any help on the internet for this specific problem. And since I don’t install/perform maintenance on MediaWikis very often, there is a lot of stuff I have to relearn. So here it is, some notes on this issue for my future self, in case I ever need it.

2014 Crypto Challenge

In 2014, the Khan Academy had a cryptography challenge, called the 2014 Cryptochallenge. It consisted of a narrative leading to clues that were progressively more difficult to decode. The idea was to introduce people unfamiliar with ciphers and cryptography to some basic ciphering methods through examples and clues. I stumbled upon it well after it was debuted, but still had fun going through the challenge.

Please note that while I am displaying the code I wrote to decode the messages, and describe the methodology to arrive at the answers, I will not reveal the decoded messages because I don’t want to ruin the fun for others.

It all starts with the final clue, actually. In the challenge, the goal is to decipher this message that is found:

clue_4b

After finding out about a pair of burglars in the vicinity, you find a ciphered communique. The goal is to decode the message by finding clues left behind the pair of villains who communicate using various ciphers. Although this main message is the one we want to decode, we have to work through their previous communications in order to figure out the method of encoding.

Clue 1

clue_1

The first clue is a simple cipher. Khan Academy gives us the clue that it is a Caeser Cipher, which is where the value of each letter is shifted n places. For example, we could encode the word “CAT” by shifting all the letters 3 values, and end up with “FDW”. Although easy to decode if you know the method, to someone who has no idea, it will just look like gibberish.

This is simple to brute force, as there are only 25 ways the shift the alphabet. I wrote the following C code to decipher the message in all 25 ways, and then I looked through the results for reasonable output.

void cipher(const char *input, char *output, int shift) {
    for (int i=0 ; i<=strlen(input) ; i++)
    if ('A' <= input[i] && input[i] <= 'Z') //Capitals
        output[i]=(input[i]-'A'+shift)%26+'A';
    else if ('a' <= input[i] && input[i] <= 'z') //Lowers
        output[i]=(input[i]-'a'+shift)%26+'a';
    else
    output[i]=input[i];
}

int main(int argc, char *argv[]) {
    const char *messageA="vwduwljudeehghyhubwklqjlfrxogilqgsohdvhuhwxuqdqbeoxhsulqwviruydxowdqgdodupghvljqedvhgrqzklfkedqnbrxghflghrqldpvhwwlqjxsvdihkrxvhfr";

    const char *messageB="gluhflishjrvbadvyyplkaohavbyjpwolypzavvdlhrvuuleatlzzhnlzdpajoavcpnlulyljpwolyrlfdvykpzaolopkkluzftivsvmklhaoputfmhcvypalovsilpuluk";

    for (int i=1 ; i<=26; i++) {
        char outputA[strlen(messageA)];
        cipher(messageA, outputA, i);
        printf("1/2, Offset: %d\n%s\n\n", i, outputA);

        char outputB[strlen(messageB)];
        cipher(messageB, outputB, i);
        printf("2/2, Offset: %d\n%s\n\n", i, outputB);
    }

    return 0;
}

Clue 2

clue_2

For this message, they give us the clue of Polyalphabetic Cipher. For this type of cipher, there is a key word that is repeated. Each letter of the alphabet is assigned a value based on it’s order (A=0, B=1… Z=25). The value of the key letter is added to the value of the message letter. If the value is greater than 25, you wrap the value around back to 0 and continue. From Wikipedia, we see this example:

Plaintext:  ATTACKATDAWN
Key:        LEMONLEMONLE
Ciphertext: LXFOPVEFRNHR

A (the 0th letter of the alphabet) plus L (the 11th), when added together would produce 11, which is L.
T (19th) plus E (4th) equals 23, which corresponds to X. Etc.

To reverse the cipher, all you need to do is subtract the values of the keyword from the scrambled text.

The key we will use was alluded to in the decoded first clue. For this clue, the cipher key was doubled; that is each letter of the word was used twice before advancing to the next letter. This is apparently a common practice. What tipped me off is that the thieves previous communication began with START, and ended with END. I presumed that they might do the same with this message. By subtracting the supposed value START from the scrambled message, I could confirm that the keyword was in fact doubled, and then plugged it into the program I wrote to decode the whole message.

void cipher(const char *key, const char *input, char *output) {
    int n = 0; // Key index

    for (int i=0 ; i<strlen(input) ; i++) {
        int char_enc = input[i] - 'A'; // Get encoded char value
        int char_key = key[n++ % strlen(key)] - 'A'; // get key char value
        int char_dec = (26 + char_enc - char_key) % 26; // Reverse vigenere cipher
        output[i] = char_dec + 'A'; // Store deciphered message
    }
}

int main(int argc, char *argv[]) {
    const char *code="SSKKUULLLL"; // Doubled letters for key is common
    const char *message="KLKBNQLCYTFYSRYUCOCPHGBDIZZFCMJWKUCHZYESWFOGMMETWWOSSDCHRZYLDSBWNYDEDNZWNEFYDTHTDDBOJICEMLUCDYGICCZHOADRZCYLWADSXPILPIECSKOMOLTEJTKMQQYMEHPMMJXYOLWPEEWJCKZNPCCPSVSXAUYODHALMRIOCWPELWBCNIYFXMWJCEMCYRAZDQLSOMDBFLJWNBIJXPDDSYOEHXPCESWTOXWBLEECSAXCNUETZYWFN";

    char output[strlen(message)];
    cipher(code, message, output);
    printf("%s\n", output);
    return 0;
}

Clue 3

clue_3

Clue three was a tough one. I had to look up the hint, which points to a Kahn Academy video called “visual telegraph.” Within the video, we learn of a character encoding method called “Polybius Square.” It explains how characters can be communicated using pairs of numbers that describe a row and column in a character lookup table. Interestingly, in this clue, the message is composed entirely of numbers! We can assume that the table of characters is five by five in size because we never see any numbers smaller than one or larger than five. Additionally, we can also assume that the message starts with the word START again. This can help us figure out the orientation of the lookup table. I made a program to take pairs of numbers and print out the message using the table, and it worked like a charm!

void cipher(const char *input, char *output) {
    char key[5][5]={{'A','F','K','P','U'},
                    {'B','G','L','Q','V'},
                    {'C','H','M','R','W'},
                    {'D','I','N','S','X'},
                    {'E','J','O','T','Y'}};

    for (int i=0 ; i<strlen(input)-2 ; i=i+2) {
        //Get coordinates from key grid
        printf("%c",key[input[i]-'1'][input[i+1]-'1']);

        //Save ascii to output
        output[i]=key[input[i]-'1'][input[i+1]-'1'];
    }
}

int main() {
    const char *message="44541134541123335344541242434244325141212311311353155442544244424344325141534354324234411125513553341342432253431144543453432251343142143251341253341215541534513351444411225144425442444415345123551543213451111311212351425431533321424351445315341434512542531544335154325341443043513544";

    char output[strlen(message)];
    cipher(message, output);
    printf("%s\n", output);
    return 0;
}

The Final Clue

The final clue stumped me for a while. I ended up getting busy and forgetting about the cryptochallenge. A few months later I found the files on my computer and resumed where I had left off.

Thanks to the previous messages, we found a “safe house” that yielded clues to how this final cipher was put together. Some key insights:

  • The message is converted to digits using a method similar to clue 3
  • There is a binary key which is obtained from a newspaper clipping where consonants are 0 and vowels are 1
  • The symbols represent 2 digits; 1 digit for the vertical/horizontal and 1 digit for the diagonal positions
clue_4c
Clue indicating glyph’s numerical values
clue_4d
Clue indicating glyph’s numerical precedence
clue_4f
Clue indicating visual telegraph layout, and process for ciphering
clue_newspaper
Clue used for cipher pad

There were a lot of unknowns, and a few rounds of guess-and-check. One of the key distractions was thinking that each symbol was a letter. That is not the case! From the clues above, we can basically figure out that the process goes like this:

  1. Each letter first gets transformed into a 2 digit number using a Polybius Square (like clue 3).
  2. From there, the 2 digit number (values maxing out at 66) gets transformed to a 6 bit binary number.
  3. The binary number then gets logically XOR’d with the binary numbers produced by the newspaper article. (This is called the pad)
  4. The final binary gets split up into 3 bit chunks which get encoded into the final glyphs we found.

To successfully decode the message, we need to reverse the process. Looking at the clues, I made a guess as to what value the glyph positions indicated. After running the program, I didn’t get meaningful output, so I reversed them, and voila, it worked.

This time, I wrote the program in Python since Python is so much more forgiving with types.

The program first takes the pad text (the newspaper article) and converts it into binary.

I converted the glyphs to digits by hand, which was the most time consuming part. I separated the first digit and second digit into two different arrays. Vertical/horizontal lines are called ‘messageA’ and diagonal lines are called ‘messageB’.

I then convert these values sequentially into binary to create a “messagebin” array. Each value in the array gets XOR’d against the corresponding position’s value in the newspaper binary pad array. The result is processed 3 bits at a time and converted into an array of digits.

The list of digits is processed in pairs, and letters are looked up from a table. There was a clue showing that the table of letters was a clockwise spiral starting from the lower left. By adding digits to the spiral at the end, it worked out to fit perfectly in a 6 by 6 arrangement.

After some finagling, the program successfully deciphered the message! I was then able to thwart the would-be criminals!

def bin(x):
    return ''.join(x & (1 << i) and '1' or '0' for i in range(1,-1,-1))

# Convert pad to binary
vowels = ["a","e","i","o","u","y"]
padbin = []
pad = "The whole-grain goodness of blue chip dividend stocks has its limits. Utility stocks, consumer staples, pipelines, telecoms, and real estate investment trusts have all lost ground over the past month, even while the broader market has been flat. With the bond market signalling an expectation of rising interest rates, the five-year rally for steady blue-chip dividend payers has stalled. Should you be scared if you own a lot of these stocks, either directly or through mutual funds or exchange-traded-funds? David Baskin, president of Baskin Financial Services, has a two-pronged answer: Keep your top-quality dividend stocks, but be prepared to follow his firm's example in trimming holdings in stocks such as TransCanada Corp., Keyera Corp., and Pembina Pipeline Corp. Lets have Mr Baskin run us through his thinking on divedend"

for c in pad:
    if ord(c) < ord('A') or ord(c) > ord('z'): continue
    if c.lower() in vowels :
        padbin.append(1)
    else:
        padbin.append(0)

# Message based on pictograms
#  2 0 3
#   \|/
# 3 - - 1    where straight lines are first digit (A)
#   /|\      and slanted lines are second digit (B)
#  1 2 0

messageA = [2,3,2,2,0,3,3,0,0,2,2,0,3,2,1,3,0,3,0,3,0,0,2,0,3,2,
            2,1,0,2,1,1,2,0,1,0,2,2,2,2,2,3,2,1,2,1,3,2,
            3,3,0,2,2,1,1,1,0,3,2,1,2,0,0,1,3,0,3,2,3,0,1,
            0,1,3,3,2,0,0,0,3,1,1,3,0,1,2,3,1,0,0,3,2,3,1,
            0,0,0,0,3,0,2,2,3,1,0,1,2,0,0,1,3,0,1,1,2,0,2,0,
            2,1,0,3,3,3,2,3,2,0,1,3,3,0,3,1,3,1,0,0,0,0,
            2,2,0,3,2,0,2,2,3,2,0,0,3,2,2,1,3,0]

messageB = [0,3,2,1,0,3,0,1,2,0,2,2,2,0,1,3,3,0,3,2,3,0,2,1,3,3,
            3,0,3,2,3,3,0,1,1,3,2,0,0,0,2,3,0,3,3,3,3,2,
            0,3,1,0,1,0,2,1,0,2,3,3,2,2,0,0,1,2,3,0,1,3,2,
            1,1,3,2,3,2,1,0,2,0,0,0,1,0,3,1,0,2,0,0,3,1,0,
            3,3,1,2,3,2,3,1,0,2,3,2,2,0,3,3,1,0,0,1,1,3,3,2,
            0,3,2,2,0,1,3,3,0,2,2,3,0,0,0,2,0,3,3,1,3,3,
            3,2,2,0,0,3,2,3,2,3,2,2,1,0,3,3,0,2]

# Convert pictogram elements to binary
messagebin = ""
for i in range(len(messageA)):
    messagebin += bin(messageA[i])
    messagebin += bin(messageB[i])

# XOR pad and message together in binary
charcoords = []
tmp = ""
i = 0
while i < len(padbin) and i < len(messagebin):
    tmp += str(padbin[i]^int(messagebin[i]))
    # Set length of binary digits here (Probably 3 digit?)
    if len(tmp) == 3:
        charcoords.append(int(tmp, 2))
        tmp = ""
    i += 1

# Convert message from binary to alphabet based on Polybius square
key = [["F","G","H","I","J","K"],
       ["E","X","Y","Z","0","L"],
       ["D","W","7","8","1","M"],
       ["C","V","6","9","2","N"],
       ["B","U","5","4","3","O"],
       ["A","T","S","R","Q","P"]]

for i in range(0, len(charcoords)-1, 2):
    if charcoords[i] > 5 or charcoords[i+1] > 5:
        print "?",
    else:
        print key[charcoords[i]][charcoords[i+1]],

This was a very enjoyable exercise, and taught me some basics about ciphers (and patience). I hope they put on similar exercises in the future!

Project Euler: 10

check

Problem 10 of Project Euler is to find the sum of all primes under 2,000,000.

This isn’t a hard problem in most languages. I have recently been going through and re-solving problems in various languages for the fun of it. Just for fun, I decided to solve some problems using Bash scripting. Bash has all the constructs of a real programming language, however, the main draw back for problems like this is that it’s slow. VERY slow. Well, lets see what we can do to speed this problem up.

Brute Force

The brute force approach is pretty simple. Loop through every number, n, less than 2,000,000, and try dividing it by every number from 2 to n-1. If it is divisible, break out of the loop and try the next number.

sum=2
for (( i=3; i<2000000; i++ )); do
    prime=1
    for (( d=2; d<$i; d++ )); do
    if [ $(($i % $d)) -eq 0 ]; then
        prime=0
        break;
    fi
    done
    if [ $prime -eq 1 ]; then
    sum=$(($sum + $i))
    fi
done
echo $sum

This solution has no optimization, but will still yield a result in a relatively small time-frame for most languages. This would take incredibly long to process primes smaller than 2,000,000. For a benchmark, I have run this through primes less than 2,000, and it took 6.430 seconds. We need to optimize.

Optimize

There are a lot of simple changes that we can do to get our compute times down. For instance, if we only loop through odd numbers, we have half as many to process.

Another classic technique to optimize a prime test is to check for factors less than or equal to the square root of the number. The reason this works is that if a factor of our number, n, is larger than the square root of n, we will find the other factor has to be smaller than the square root of n, and will thus be detected.

sum=2
for (( i=3; i<2000; i+=2 )); do
    prime=1
    for (( d=2; d*d<=$i; d++ )); do
    if [ $(($i % $d)) -eq 0 ]; then
        prime=0
        break;
    fi
    done
    if [ $prime -eq 1 ]; then
    sum=$(($sum + $i))
    fi
done
echo $sum

These two changes alone bring our compute time down to .319 seconds for primes under 2,000! I still wouldn’t recommend trying computing all primes under 2,000,000. I tried, and killed it after 3 days without a result!

Sieve of Eratosthenes

Prime sieves are a super fast way of computing prime numbers without having to do any division. This method is attributed to Eratosthenes of Cyrene, a Greek mathematician living about 200 BC. (You can read more about the history on Wikipedia).

Basically, you have an array of every number you want to check. Start by making each element of the array equal to 1, indicating it is a prime.  Go through the array, starting at two. Whenever you hit a value that is 1, mark all the multiples of that value as 0’s. So, we look at element 2, which is prime because it’s value is 1. element whose index is a multiple of 2 should then get marked as 0, since that cannot be prime. You then advance to the next element, and repeat until you’ve visited every number in the array.

limit=2000
primes[$limit]=0
primeList[0]=2
primeIndex=0
sum=0
for (( i=2; i<$limit; i++ )); do
    if [[ ${primes[$i]} -eq \0 ]]; then
        sum=$(($sum+$i))
        primeList[$primeIndex]=$i
        primeIndex=$(($primeIndex+1))
        for (( n=$i; n<$limit; n+=$i )); do
            primes[$n]=1
        done
    fi
done

echo $sum

Implementing this method definitely improves results for our benchmark of primes less than 2,000; .186 seconds! However, this approach does not scale up to the desired 2,000,000 because Bash arrays are super slow as they get larger. Even though this approach worked great in our benchmark, it will still take upwards of 3 days to complete for all the primes under 2,000,000.

Combination of Methods

By using a sieve method combined with our optimized method, I was able to finally get the correct answer in 34 minutes using a Bash script.

I used a sieve to generate the primes between 1 and sqrt(2000000). This was fast since the array wasn’t nearly as big. I then used the array of primes that I generated to check every odd number between 2 and 2,000,000.

Another shortcut to reduce computation is to make a second array containing the squares of each prime. This means that we don’t need to perform multiplication in each loop when we are checking for primality.

limit=2000000
sum=0

# Generate primes up to sqrt of 2000000 using sieve
primeLimit=$(echo "scale=0;sqrt($limit*2)" | bc)
primes[$primeLimit]=0
primeList[0]=2
primeListSquared[0]=4
primeIndex=0
for (( i=2; i<$primeLimit; i++ )); do
    if [[ ${primes[$i]} -eq \0 ]]; then
        sum=$(($sum+$i))
        primeList[$primeIndex]=$i
        primeListSquared[$primeIndex]=$i*$i
        primeIndex=$(($primeIndex+1))
        for (( n=$i; n<$primeLimit; n+=$i )); do
            primes[$n]=1
        done
    fi
done

# Make sure primeLimit is an odd number
[[ $(($primeLimit % 2)) -eq 0 ]] && primeLimit=$(($primeLimit + 1))

# Use known primes from above to check each remaining number for primality
for (( i=$primeLimit; i<$limit; i+=2 )); do
    prime=1
    for (( p=0; ${primeListSquared[$p]}<=$i; p++ )); do
        if [[ $(($i % ${primeList[$p]})) -eq 0 ]]; then
            prime=0
            break
        fi
    done
    # Record!
    if [[ $prime -eq 1 ]]; then
        sum=$(($sum+$i))
    fi
done

echo $sum

34 minutes. Finally, a “reasonable” result (by some definitions of reasonable).

Multi-Processing

The last step I took was by splitting up the task of finding primes into multiple processes. This allowed all 8 of my cores to work on the solution. I moved my prime search method into a function that took a start and stopping point as parameters, and then echoed the sum for that portion back to a while loop that added them all together.

limit=2000000

# Generate primes up to sqrt of limit
primeLimit=$(echo "scale=0;sqrt($limit*2)" | bc)
primes[$primeLimit]=0
primeList[0]=2
primeListSquared[0]=4
primeIndex=0
for (( i=2; i<$primeLimit; i++ )); do
    if [[ ${primes[$i]} -eq \0 ]]; then
        primeList[$primeIndex]=$i
        primeListSquared[$primeIndex]=$i*$i
        primeIndex=$(($primeIndex+1))
        for (( n=$i; n<$primeLimit; n+=$i )); do
            primes[$n]=1
        done
    fi
done

function checkRange() {
    start=$1
    end=$2
    rangeSum=0

    # Make sure range starts at an odd number
    [[ $start -lt 2 ]] && start=3 && rangeSum=2
    [[ $(($start % 2)) -eq 0 ]] && start=$(($start + 1))

    # Use known primes to check all remaining numbers for primality
    for (( i=$start; i<$end; i+=2 )); do
        prime=1
        for (( p=0; ${primeListSquared[$p]}<=$i; p++ )); do
            if [[ $(($i % ${primeList[$p]})) -eq 0 ]]; then
                prime=0
                break
            fi
        done
        # Record!
        if [[ $prime -eq 1 ]]; then
            rangeSum=$(($rangeSum+$i))
        fi
    done
    echo $rangeSum
}

# Split up task into chunks, and check with multiple processes
sum=0
range=$(($limit / 50))
for (( start=0; start<$limit; start+=$range )); do
    checkRange $start $(($start+$range)) &
done | while read result; do
    sum=$(($sum+$result))
    echo $sum
done | tail -1

This method brought the compute time down to 3 minutes 22.075 seconds.

Conclusion

Project Euler gives the guideline that if your solution takes more than a minute, you’re probably doing it wrong. For this particular task though, I think 3 minutes is rather successful given the constraints of Bash. It was a fun project to try to optimize such a relatively simple algorithm within the constraints of this little scripting language.

If you can find any way to speed up the results, please let me know!

Visit the solution on my GitHub page.

© 2007-2015 Michael Caldwell