Running WordPress with Nginx on ArchLinux

I just moved this blog from Apache httpd over to Nginx, and I’m pretty satisfied with the overall result. I had to take some time to convert my httpd configuration, since Nginx supports neither .htaccess nor mod_rewrite. These were my requirements for the move:

  • The site is available on both HTTP and HTTPS.
  • “wp-admin” session is forced to use SSL.
  • I have “quicklook” (to check my server status) and “webalizer” directories under the blog, and they are protected by HTTP BasicAuth.
  • HTTP BasicAuth is to be carried out via SSL.
  • Gzip compression is enforced on HTTP connections and disabled on HTTPS.

Basically I followed the ArchLinux wiki for the implementation, and I will briefly describe what I did.

Nginx (pronounced “Engine X”) is a light-weight open-source HTTP server. Its low resource consumption was the primary reason for the move, and it suits my server in the cloud.

First, I installed the nginx package, together with the “php-cgi” package, which provides the FastCGI interface to PHP.

~$ sudo pacman -S nginx php-cgi

Then I configured the FastCGI daemon and added it to rc.d, by saving the following script to /etc/rc.d as “fastcgi”:

#!/bin/bash

. /etc/rc.conf
. /etc/rc.d/functions

case "$1" in
  start)
	stat_busy 'Starting Fastcgi Server'
	if /usr/bin/php-cgi -b 127.0.0.1:9000 &
	then
		add_daemon fastcgi
		stat_done
	else
		stat_fail
	fi
	;;
  stop)
	stat_busy 'Stopping Fastcgi Server'
	[ -e /var/run/daemons/fastcgi ] && kill $(pidof php-cgi) &> /dev/null;
	if [ $? -gt 0 ]; then 
		stat_fail
	else
		rm_daemon fastcgi
		stat_done
	fi
	;;
  restart)
	$0 stop
	$0 start
	;;
  *)
	echo "Usage: $0 {start|stop|restart}"
	;;
esac

And I gave it an executable permission:

~$ sudo chmod +x /etc/rc.d/fastcgi

That script has the php-cgi process listen on port 9000. Now we can start/stop/restart the daemon with “sudo /etc/rc.d/fastcgi start”. But the daemon will not start automatically when the machine reboots unless it is listed in /etc/rc.conf, so I added fastcgi to the DAEMONS array there. Here’s the snippet.

...
DAEMONS=(syslog-ng ... fastcgi nginx ...)
...
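After starting the daemon, one quick way to confirm that php-cgi is really listening on 127.0.0.1:9000 is a small socket probe — this is just a hypothetical helper for illustration; netstat gives the same answer:

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if something is accepting TCP connections on host:port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return True
    except socket.error:
        return False
    finally:
        s.close()

print(port_open('127.0.0.1', 9000))  # True once the fastcgi daemon is up
```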

Then I edited /etc/nginx/conf/nginx.conf to point to my blog’s physical directory. We need two server blocks, one for HTTP and one for HTTPS. This is my sample configuration for the server myfineblog.local:

    server {
        listen       80;
        server_name  myfineblog.local;
        access_log      /var/log/httpd/myfineblog.local-access.log;
        error_log       /var/log/httpd/myfineblog.local-error.log;
        root            /srv/http/myfineblog;
        gzip            on;

        location ~ ^/(wp-admin|quicklook|webalizer)/* {
            rewrite ^/(.*) https://myfineblog.local/$1 permanent;
        }

        location / {
            index  index.html index.htm index.php;
            root                /srv/http/myfineblog;
            if (!-e $request_filename) {
                rewrite ^.+/?(/wp-.*) $1 last;
                rewrite ^.+/?(/.*\.php)$ $1 last;
                rewrite ^(.+)$ /index.php?q=$1 last;
            }
        }

        location ~ \.php$ {
            fastcgi_pass   127.0.0.1:9000;
            fastcgi_index  index.php;
            fastcgi_param  SCRIPT_FILENAME  /srv/http/myfineblog/$fastcgi_script_name;
            include        fastcgi_params;
        }
    }

The server_name directive defines the server name (so we can configure name-based virtual hosts).
The access_log and error_log directives define the log files for this site.
The root directive is the physical location of the site on the local filesystem.
“gzip on” turns on gzip compression.
The first location block redirects to SSL, sending an HTTP redirect whenever the URI starts with wp-admin, quicklook, or webalizer.
The “location /” block defines the website directory and carries the equivalent of Apache’s mod_rewrite rules.
The last location block passes PHP requests to the FastCGI daemon we configured above. It is *important* to change the SCRIPT_FILENAME parameter to match the real physical path of the WordPress scripts.

To enable the SSL server, I assume we already have the certificate and key for the website. The configuration looks much the same, but with SSL options enabled and an HTTP Basic Auth section for certain directories.

    server {
        listen          443;
        server_name     myfineblog.local;
        ssl                     on;
        ssl_certificate         /etc/ssl/certs/myfineblog.crt;
        ssl_certificate_key     /etc/ssl/private/myfineblog.key;
        ssl_session_timeout     5m;
        ssl_ciphers             HIGH:MEDIUM;
        ssl_prefer_server_ciphers       on;
        ssl_protocols           SSLv3 TLSv1;

        root                    /srv/http/myfineblog;
        access_log              /var/log/httpd/myfineblog.local-ssl_access.log;
        error_log               /var/log/httpd/myfineblog.local-ssl_error.log debug;
        gzip                    off;

        location ~ ^/(quicklook|webalizer)/* {
                auth_basic      "Private Section";
                auth_basic_user_file    /srv/http/myfineblog/.htpasswd;
        }
        location / {
                index   index.html index.htm index.php;
                root    /srv/http/myfineblog;
        }
        location ~ \.php$ {
            fastcgi_pass   127.0.0.1:9000;
            fastcgi_index  index.php;
            fastcgi_param  SCRIPT_FILENAME  /srv/http/myfineblog/$fastcgi_script_name;
            fastcgi_param  HTTPS on;
            include        fastcgi_params;
        }
    }

This configuration turns on SSL with SSLv2 and weak ciphers disabled, enables HTTP Basic Authentication for the two protected directories, disables gzip on the SSL stream, and tells the FastCGI backend that HTTPS is on.

Finally, I started the daemons with “/etc/rc.d/fastcgi start” and “/etc/rc.d/nginx start”.

YQL, Python, and Yahoo Finance

YQL is a way to get information from web services using SQL-like queries. It also provides a console where we can test our queries and generate the corresponding REST query. To see how it works, just go to the console page and enter the following as the YQL statement:

select * from yahoo.finance.quotes where symbol='FFIV'

Set the output to either “XML” or “JSON” and click “Test”. I personally prefer JSON and will use it throughout the example. I unchecked “Diagnostics” and emptied the text field next to “JSON”.

The query fetches the information for the stock symbol FFIV (F5 Networks, NASDAQ) from Yahoo Finance. Inside the “Formatted View” window, you will see a result like this:

{
 'query': {
  'count': '1',
  'created': '2010-07-22T04:59:35Z',
  'lang': 'en-US',
  'results': {
   'quote': {
    'symbol': 'FFIV',
    'Ask': '80.00',
    'AverageDailyVolume': '1699250',
    'Bid': '77.63',
    'AskRealtime': '80.00',
    'BidRealtime': '77.63',
    'BookValue': '11.167',
    'Change_PercentChange': '-3.68 - -4.79%',
    'Change': '-3.68',
    'Commission': null,
    'ChangeRealtime': '-3.68',
    'AfterHoursChangeRealtime': 'N/A - N/A',
    'DividendShare': '0.00',
    'LastTradeDate': '7/21/2010',
    'TradeDate': null,
    'EarningsShare': '1.412',
    'ErrorIndicationreturnedforsymbolchangedinvalid': 'N/A',
    'EPSEstimateCurrentYear': '2.29',
    'EPSEstimateNextYear': '2.71',
    'EPSEstimateNextQuarter': '0.62',
    'DaysLow': '72.48',
    'DaysHigh': '77.74',
    'YearLow': '33.43',
    'YearHigh': '79.21',
    'HoldingsGainPercent': '- - -',
    'AnnualizedGain': '-',
    'HoldingsGain': null,
    'HoldingsGainPercentRealtime': 'N/A - N/A',
    'HoldingsGainRealtime': null,
    'MoreInfo': 'cnsprmiIed',
    'OrderBookRealtime': 'N/A',
    'MarketCapitalization': '5.859B',
    'MarketCapRealtime': null,
    'EBITDA': '188.9M',
    'ChangeFromYearLow': '+39.68',
    'PercentChangeFromYearLow': '+118.70%',
    'LastTradeRealtimeWithTime': 'N/A - 73.11',
    'ChangePercentRealtime': 'N/A - -4.79%',
    'ChangeFromYearHigh': '-6.10',
    'PercebtChangeFromYearHigh': '-7.70%',
    'LastTradeWithTime': 'Jul 21 - 73.11',
    'LastTradePriceOnly': '73.11',
    'HighLimit': null,
    'LowLimit': null,
    'DaysRange': '72.48 - 77.74',
    'DaysRangeRealtime': 'N/A - N/A',
    'FiftydayMovingAverage': '72.3831',
    'TwoHundreddayMovingAverage': '63.8515',
    'ChangeFromTwoHundreddayMovingAverage': '+9.2585',
    'PercentChangeFromTwoHundreddayMovingAverage': '+14.50%',
    'ChangeFromFiftydayMovingAverage': '+0.7269',
    'PercentChangeFromFiftydayMovingAverage': '+1.00%',
    'Name': 'F5 Networks, Inc.',
    'Notes': '-',
    'Open': null,
    'PreviousClose': '76.79',
    'PricePaid': null,
    'ChangeinPercent': '-4.79%',
    'PriceSales': '8.42',
    'PriceBook': '6.88',
    'ExDividendDate': 'N/A',
    'PERatio': '54.38',
    'DividendPayDate': 'N/A',
    'PERatioRealtime': null,
    'PEGRatio': '1.65',
    'PriceEPSEstimateCurrentYear': '33.53',
    'PriceEPSEstimateNextYear': '28.34',
    'Symbol': 'FFIV',
    'SharesOwned': null,
    'ShortRatio': '3.60',
    'LastTradeTime': '4:00pm',
    'TickerTrend': ' -===== ',
    'OneyrTargetPrice': '75.11',
    'Volume': '3213004',
    'HoldingsValue': null,
    'HoldingsValueRealtime': null,
    'YearRange': '33.43 - 79.21',
    'DaysValueChange': '- - -4.79%',
    'DaysValueChangeRealtime': 'N/A - N/A',
    'StockExchange': 'NasdaqNM',
    'DividendYield': null,
    'PercentChange': '-4.79%'
   }
  }
 }
}

Below that text field, we can find the REST statement we can use to send queries to the server. For our query it looks like this:

http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.quotes%20where%20symbol%3D'FFIV'%0A%09%09&format=json&env=http%3A%2F%2Fdatatables.org%2Falltables.env&callback=

This is how we can fetch stock information using YQL and receive it in JSON. Since this is public data, we can send the REST request directly; otherwise we would need API keys to access the data.
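For illustration, the same REST URL can be assembled programmatically — the YQL statement is simply percent-encoded into the q parameter. This is a sketch, not part of the console output; note that urlencode encodes spaces as “+” rather than “%20”, which is equivalent in a query string:

```python
try:  # Python 3
    from urllib.parse import urlencode
except ImportError:  # Python 2, as used elsewhere in this post
    from urllib import urlencode

yql = "select * from yahoo.finance.quotes where symbol='FFIV'"
params = urlencode({
    'q': yql,                                      # the YQL statement itself
    'format': 'json',                              # ask for a JSON response
    'env': 'http://datatables.org/alltables.env',  # open data tables
    'callback': '',                                # empty: no JSONP wrapper
})
url = 'http://query.yahooapis.com/v1/public/yql?' + params
print(url)
```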

Let’s see how we can fetch the information via Python.

>>> import urllib2
>>> result = urllib2.urlopen("http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.quotes%20where%20symbol%3D'FFIV'%0A%09%09&format=json&env=http%3A%2F%2Fdatatables.org%2Falltables.env&callback=").read()
>>> print result

That should print the whole JSON response. We can use the simplejson module to parse the result. It looks like this:

>>> import simplejson
>>> data = simplejson.loads(result)
>>> data['query']['results']['quote']['LastTradePriceOnly']
'73.11'

The Python statements are pretty much self-explanatory.

Here’s another example from Yahoo, to get stock information from open data tables.

http://www.yqlblog.net/blog/2009/06/02/getting-stock-information-with-yql-and-open-data-tables/

SSL and HTTP Basic Authentication

In general, when I want to force the browser to access certain parts of my website via HTTPS when the request is made over HTTP, I put a .htaccess file inside that web directory.

RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI}

But when I want to protect the directory with HTTP Basic Auth as well, this creates a double authentication prompt. I’ll expand this section after I capture the headers.

As a quick workaround, I use this hack in .htaccess

SSLOptions +StrictRequire
SSLRequireSSL
AuthUserFile /home/minn/.htpasswd
AuthType Basic
AuthName "Private Section"
Require valid-user
ErrorDocument 403 https://www.minnmyatsoe.com/private/

Haiku OS

Haiku is another open-source operating system, and IMO we can say it continues from where BeOS left off. I haven’t had a chance to try BeOS, but I have read about what it was supposed to do and seen some beautiful screenshots. BeOS was a closed-source OS, and some loyal users have tried to re-create it under an open-source license.

And so came Haiku OS, an open-source OS written in C++, which released its alpha version in September 2009. The ISO image, as well as qemu/vmware images, are now available for download. I just did a test run via their live CD image, and I would say I’m quite impressed. I hope it continues to the R2 release soon.


A Walk in the Clouds

I’ve moved this site from my previous shared host over to cloud servers by Rackspace. I was actually looking for a cloud server and cloud storage so that I could play with Hadoop. I found Amazon’s EC2 servers and S3, but their service charges are too expensive for me. While searching for alternatives, CloudServers caught my attention.

It is cheaper than Amazon’s services, but at the moment I don’t think I can test Hadoop on CloudServers with CloudSpace. I’m using it more like a virtual private server that gives me “root” access. The good thing is that you can modify the resources as you wish, so I would say it’s quite scalable. You are charged by the hour of uptime; Rackspace charges even while the machine is turned off, and stops charging only after the server is deleted. If you want to test something for a project, you can subscribe for the desired amount of memory and disk space, delete the server when you are done, and be charged only for that period. That’s the flexibility I prefer.

I’ll see what I can do with my server, and update the blog again.

Kudos to ICA (Singapore)

Recently I applied for entry visas for four of my relatives, about two or three weeks in advance, and the visas were approved by ICA.

Just a day before the flight, one of my relatives learnt that her daughter (2 years old) needed her own visa and air ticket. I opened the SAVE application website and made the application for the child. Travel arrangements had already been made, and they were worried that the visa approval might come too late and they would have to rearrange the flights; it usually takes one business day to process. But to my surprise, ICA approved the visa within 3 hours of submission, which relieved all the family’s worries.

I really would like to give my heartfelt thanks to the officers at ICA, who work hard and understand the need for urgency.

To those who want to submit visa application from the web:

  • The online visa application can be found on the home page of ICA, at http://www.ica.gov.sg
  • Or do a search on Google for “save singapore” and follow the links.
  • Please have your Singpass ready. If you don’t have one, it’s a good idea to request it at http://www.singpass.gov.sg.
  • The application requires a fee of S$30, payable by eNETS (Visa/MasterCard).
  • Please have the applicant’s information ready. Most of the data can be found in the passport, plus the current address in the home country, educational qualification, and a digital photo.

Standard Chartered Bank, Singapore

Last week, I went to a branch to open an XtraSaver account. As usual, their personal banking consultant asked me if I wanted to open their Supersalary account. I said no and told him I just wanted the XtraSaver account. He then started to open an account for me, but suddenly had to see his manager for some verification. About three minutes later, he told me that Burmese citizens are not allowed to open an account. They were probably not supposed to tell me this, and they have the right to reject my application without giving any reason, but it was good to hear the reason for the rejection.

I just left the bank and checked their website to see if they had any written information about this. I couldn’t find any, so I sent them an email enquiring about account opening, stating my nationality and residential status.

A few days later, a girl called and invited me to open an account at Six Battery Road. I was surprised, and she arranged an appointment for me with the staff at the branch.

It was my fault for not checking thoroughly with her, and I blamed myself for trusting Standard Chartered Bank again. This time I wasn’t told the reason, only that it was due to some policies. It might be the same reason; I’m not interested in their policies. All I know is that Standard Chartered Bank just wasted my time and resources. The bank doesn’t seem to have communication between their departments: although I wasn’t allowed to open the savings/checking account, the staff asked me if I was interested in fixed deposits. Huh. I’m done with that bank. I should also warn Myanmar nationals not to waste their time going to the bank to open an account.

I understand that Burmese people can be rejected by US or European financial institutions due to sanctions. If that is the case, my enquiry should have been answered with a negative reply so that I wouldn’t have wasted my time going to Standard Chartered Bank.

Cloudy

These days, software platforms that perform distributed computing on a cluster have caught my attention, and this led me to Hadoop.

The Hadoop platform is essentially an open-source implementation of Google’s MapReduce.

I think the most basic ingredient for this platform is a distributed file system. Basically, a MapReduce framework works in two steps: it maps and then it reduces. At the end of the workflow it writes the output to a distributed file system (GFS for Google, HDFS for Hadoop). GFS is proprietary to Google, and it is implemented in userspace as opposed to in the kernel. Google Research has published a paper on GFS.
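The map-then-reduce flow fits in a few lines of Python; the grouping in the middle is the “shuffle” step that real frameworks perform between the two phases. This is only a toy sketch using word count, the canonical example:

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    # map phase: each record emits (key, value) pairs;
    # shuffle: group all intermediate values by key
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # reduce phase: collapse each key's values to a single result
    return dict((key, reducer(key, values)) for key, values in groups.items())

# word count over two tiny "documents"
lines = ["the quick fox", "the lazy dog"]
counts = map_reduce(
    lines,
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=lambda word, ones: sum(ones),
)
print(counts['the'])  # 2
```

In Hadoop the same mapper and reducer would run on many machines, with HDFS holding the input and output.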

Some people say that the implementation is low-level, and some have tried to add more layers on top of the original implementations. For example, Facebook layered Hive on top of the Hadoop engine.

A MapReduce framework is supposed to handle huge amounts of data, so in general we need a data structure that can hold and process that amount of data comfortably. Google implemented BigTable, and HBase is the open-source alternative from the Hadoop project.

I think I’ll look into Hadoop (the Java implementation) and Qt Concurrent (a Qt C++ implementation of MapReduce).

Last.fm’s bashreduce looks interesting, too.

Short notes on Linux Libraries

Libraries are compiled code that is usually incorporated into a program at a later time.

  • Three types: Static Libraries, Shared Libraries, and Dynamically Loaded Libraries
  • Static libraries are a collection of normal object files.
  • They usually end with “.a”.
  • The collection is created with the “ar” command.
  • Shared libraries are loaded at program start-up and shared between programs.
  • Dynamically loaded libraries can be loaded and used at any time while a program is running.
  • DL libraries are not really in any kind of library format.
  • Both static and shared libraries can be used as DL libraries.
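To illustrate the last points about dynamically loaded libraries, Python’s ctypes module wraps dlopen(), so a shared library such as libm can be loaded and called at any time while the program is running. A sketch, assuming a Linux system with glibc:

```python
import ctypes
import ctypes.util

# locate and load the shared math library at runtime (dlopen under the hood)
libm = ctypes.CDLL(ctypes.util.find_library('m') or 'libm.so.6')

# declare sqrt()'s C signature so ctypes converts arguments and results
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(9.0))  # 3.0
```

The same mechanism in C would be dlopen()/dlsym() from libdl.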

Linux Processes and CPU Performance

In Linux, a process can be either:

  • runnable, or
  • blocked (awaiting some events to complete)

When it’s runnable, the process is in competition with other processes for CPU time. A runnable process may or may not be consuming CPU time; it is the CPU scheduler that decides which process to run next from the list of runnable processes. Waiting processes form a line known as the run queue.

When it’s blocked, it may mean it’s waiting for data from IO device or the results of a system call.

The system usually shows the load by totalling the running and runnable processes.
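On Linux, both counts are exported in /proc/stat as procs_running and procs_blocked. A small parser sketch (the sample text here is made up for illustration):

```python
def proc_counts(stat_text):
    """Extract procs_running/procs_blocked from /proc/stat contents."""
    counts = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] in ('procs_running', 'procs_blocked'):
            counts[fields[0]] = int(fields[1])
    return counts

# in real use: proc_counts(open('/proc/stat').read())
sample = """cpu  2255 34 2290 22625563 6290 127 456
ctxt 1990473
procs_running 6
procs_blocked 2
"""
print(proc_counts(sample))
```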

Multitasking
When it comes to multitasking, the OS can be:

  • cooperative multitasking, or
  • preemptive multitasking

In preemptive multitasking, the scheduler gives each process a time slice of CPU. The process is involuntarily suspended after it has consumed its allocated time; this prevents one process from monopolizing the available CPU time.

In cooperative multitasking, a process will not stop running until it does so voluntarily. When it suspends itself, it is said to yield. The scheduler cannot decide how long the process should run.
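Cooperative yielding can be mimicked with Python generators: each task runs until it chooses to yield, and the scheduler regains control only at those voluntary points. A toy model, not how a real kernel scheduler works:

```python
def task(name, steps):
    for i in range(steps):
        # ... do one unit of work, then voluntarily give up the CPU
        yield '%s:%d' % (name, i)

def run_cooperative(tasks):
    """Round-robin scheduler: a task is resumed only after others yield."""
    trace = []
    queue = list(tasks)
    while queue:
        current = queue.pop(0)
        try:
            trace.append(next(current))  # run until the task yields
            queue.append(current)        # requeue it for another turn
        except StopIteration:
            pass                         # task finished; drop it
    return trace

print(run_cooperative([task('a', 2), task('b', 1)]))  # ['a:0', 'b:0', 'a:1']
```

If a task never yields, this scheduler never runs anything else — exactly the monopolization problem preemption solves.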

Scheduler
Starting from kernel 2.5, Linux got a new scheduler, the O(1) scheduler. It has since been replaced with CFS, as I’ve written in earlier posts.

Tools to view the CPU performance
I usually use these tools to check:

  • vmstat
  • top

Those tools are quite basic, yet are able to produce pretty good information, and they come with almost every distro.

With vmstat, I check the number of interrupts fired (in) and the number of context switches (cs), as well as CPU utilization: user (us), system (sy), and idle (id). I expect to see a lower “cs” than “in”. I’ll try to explain context switches and interrupts in future posts; for the time being, kindly google for them.

With top (version 3 produces more stats), we can check the states of the processes, as well as the user CPU stats and system CPU stats (softirq, iowait, irq).