YQL, Python, and Yahoo Finance

YQL is a way to query web services using SQL-like statements. It also provides a console where we can test our queries and generate the corresponding REST query. To see how it works, go to the console page and enter the following as the YQL statement:

Then set the output to either “XML” or “JSON” and click “Test”. I personally prefer JSON and will use it throughout this example. I unchecked “Diagnostics” and emptied the text field next to “JSON”.

The statement fetches the quote information for FFIV (F5 Networks, NASDAQ) from Yahoo Finance. In the “Formatted View” window, you will see a result like this:

Below that text field, we can find the REST statement that we can use to send the query to the server. For our query, it looks like this:

This is how we can fetch stock information using YQL and receive it as JSON. Since this is public data, we can send the REST request directly; otherwise we would need API keys to access the data.

Let’s see how we can fetch the information via Python.
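A minimal sketch of such a request, assuming Python 3 and the standard library only. The endpoint URL, the yahoo.finance.quotes table name, and the env value are my assumptions about the public YQL service of that time, and the endpoint may no longer respond, so treat this as an illustration of the request shape rather than a working client.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public YQL endpoint (not confirmed by the text above).
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_yql_url(symbol):
    """Build the REST URL for a stock quote query on one symbol."""
    # Assumed YQL statement and community table name.
    query = 'select * from yahoo.finance.quotes where symbol in ("%s")' % symbol
    params = {
        "q": query,
        "format": "json",  # ask for JSON instead of XML
        "env": "store://datatables.org/alltableswithkeys",  # assumed open data tables env
    }
    return YQL_ENDPOINT + "?" + urlencode(params)


def fetch_quote_json(symbol, timeout=10):
    """Send the request and return the raw JSON text of the response."""
    with urlopen(build_yql_url(symbol), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling print(fetch_quote_json("FFIV")) would send the request for our example symbol.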

That should print the whole JSON response. We can use the simplejson module to parse the result. It looks like this:

The Python statements are pretty much self-explanatory.
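Here is a small sketch of the parsing step. It uses the standard json module, which offers the same loads() call as simplejson. The response layout (query, results, quote) and the field names are assumptions about what the service returned, and the sample values are placeholders, not real market data.

```python
import json

# Hypothetical response body; the structure and values are placeholders.
SAMPLE_RESPONSE = '''
{"query": {"count": 1,
           "results": {"quote": {"symbol": "FFIV",
                                 "Name": "F5 Networks",
                                 "LastTradePriceOnly": "123.45"}}}}
'''


def extract_quote(raw_json):
    """Parse the JSON text and return the dictionary holding the quote."""
    data = json.loads(raw_json)
    return data["query"]["results"]["quote"]


quote = extract_quote(SAMPLE_RESPONSE)
# Values arrive as strings, so convert before doing arithmetic.
price = float(quote["LastTradePriceOnly"])
print(quote["symbol"], price)
```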

Here’s another example from Yahoo, to get stock information from open data tables.

SSL and HTTP Basic Authentication

In general, when I want to force the browser to access a certain part of my website via https even if the request is made over http, I put a .htaccess file inside that web directory.

But when I also want to protect the directory with HTTP Basic Auth, it results in double authentication. I’ll expand this section after I capture the headers.

As a quick workaround, I use this hack in .htaccess

Haiku OS

Haiku is another open source operating system, and IMO we can say it continues from where BeOS left off. I never had a chance to try BeOS, but I have read about what it was supposed to do and seen some beautiful screenshots. BeOS was a closed source OS, and some loyal users set out to re-create it under an open source license.

And so came Haiku OS, an open source OS, which released its alpha version in September 2009. It is written in C++. The ISO image, as well as qemu/vmware images, are now available for download. I just did a test run with their live CD image, and I would say I’m quite impressed. I hope it continues on to the R2 release soon.


A Walk in the Clouds

I’ve moved this site from my previous shared host over to the cloud servers by Rackspace. Actually, I was looking for a cloud server and cloud storage so that I could play with Hadoop. I looked at Amazon EC2 servers and S3, but their service charges are too expensive for me. While searching for alternatives, CloudServers caught my attention.

It is cheaper than the Amazon services, but at the moment I don’t think I can test Hadoop on CloudServer with CloudSpace. I’m using it more like a virtual private server that gives me “root” access. The good thing is that you can modify the resources as you wish, so I would say it’s quite scalable. You are charged by the hour (uptime). Rackspace will still charge you if you only turn off the machine; the charges stop once the server has been deleted. If you want to test something for a project, you can subscribe for the desired amount of memory and disk space, and delete the server after it has been used. You will only be charged for that period. That’s the flexibility that I prefer.

I’ll see what I can do with my server, and update the blog again.

Kudos to ICA (Singapore)

Recently I applied for entry visas for four of my relatives, about two or three weeks in advance, and the visas were approved by ICA.

Just one day before the flight, one of my relatives learnt that her daughter (2 years old) also needed a visa and an air ticket. I opened the SAVE application website and made the application for the baby. Travel arrangements had already been made, and they were worried that the visa approval might come late and that they would have to rearrange the flights. It usually takes one business day to process. But to my surprise, ICA approved the visa within 3 hours of submission, and it relieved all the worries of the families.

I would really like to give my heartfelt thanks to the officers at ICA, who work hard and understand the need for urgency.

To those who want to submit visa application from the web:

  • The online visa application can be found on the home page of ICA, which is http://www.ica.gov.sg
  • Or do a search on Google for “save singapore”, and follow the links.
  • Please have your Singpass ready. If you don’t have one, it’s a good idea to request one at http://www.singpass.gov.sg.
  • The application requires a fee of S$30, which is payable by eNETS (Visa/Master).
  • Please have the applicant’s information ready. Most of the data can be found in the passport; you will also need the current address in the home country, the educational qualification, and a digital photo.

Standard Chartered Bank, Singapore

Last week, I went to a branch to open an XtraSaver account. As usual, their personal banking consultant asked me if I wanted to open a Supersalary account. I said no and told him I just wanted the XtraSaver account. He started to open the account for me, then suddenly had to see his manager for some verification. About three minutes later, he told me that Burmese nationals are not allowed to open an account. They were not obliged to tell me this, and they had the right to decline my application without giving any reason. But it was good to hear the reason for the rejection.

I left the bank and checked their website to see if they had any written information about this. I couldn’t find any, so I sent them an email inquiring about account opening, stating my nationality and residential status.

A few days later, a girl called me and asked me to open an account at Six Battery Road. I was surprised, and she arranged an appointment for me with the staff at the branch.

It was my fault that I didn’t check thoroughly with her, and I blamed myself for trusting Standard Chartered Bank again. This time, I wasn’t told the reason; I was only told it was due to some policies. It might be the same reason. I’m not interested in their policies. All I know is that Standard Chartered Bank wasted my time and resources. The bank doesn’t seem to have any communication between its departments. Although I wasn’t allowed to open a savings/checking account, he asked me if I was interested in Fixed Deposits. Huh. I’m done with that bank. I should also warn Myanmar nationals not to waste their time going to the bank to open an account.

I understand that Burmese people can be rejected by any US or European financial institution due to sanctions. If that is the case, my enquiry should have been answered with a negative reply so that I wouldn’t waste my time going to Standard Chartered Bank.

Cloudy

These days, software platforms that perform distributed computing on a cluster have caught my attention, and this led me to:

The Hadoop platform is the open-source implementation of Google’s MapReduce.

I think the most basic ingredient for this platform is a distributed file system. Basically, the MapReduce framework works in two steps: it maps, and then it reduces. At the end of the workflow, it writes the output to a distributed file system (GFS for Google or HDFS for Hadoop). GFS is proprietary to Google, and it’s implemented in userspace as opposed to in the kernel. Please find the Google Research publication for GFS here.
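To make the two steps concrete, here is a toy word count in plain Python. It is a single-process sketch and not Hadoop’s or Google’s API: the function names are made up for illustration, and the grouping step in the middle (often called the shuffle) is what a real framework performs across machines between map and reduce.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    return reduce_phase(shuffle(map_phase(documents)))


print(word_count(["a b a", "b c"]))
```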

Some people say that the implementation is low-level, and some have tried to add more layers on top of the original implementations. For example, Facebook layered Hive on the Hadoop engine.

The MapReduce framework is meant to handle huge amounts of data, so in general we need a data structure that can hold and process that much data comfortably. Google implemented BigTable, and HBase is the open-source alternative from Hadoop.

I think I’ll look into the Hadoop (Java) and Qt Concurrent (Qt C++) implementations of MapReduce.

Last.fm’s bashreduce looks interesting, too.

Short notes on Linux Libraries

Libraries are compiled code that is usually incorporated into a program at a later time.

  • Three types: Static Libraries, Shared Libraries, and Dynamically Loaded Libraries
  • Static libraries are a collection of normal object files.
  • They usually end with “.a”.
  • The collection is created with the “ar” command.
  • Shared libraries are loaded at program start-up and shared between programs.
  • Dynamically loaded libraries can be loaded and used at any time while a program is running.
  • DL libraries are not really a separate kind of library format.
  • Both static and shared libraries can be used as DL libraries.
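As a rough illustration of the dynamically loaded case, Python’s ctypes module can open a shared library while the program is already running and call a function from it. This sketch assumes a Linux system where the C math library can be located; the fallback file name libm.so.6 is an assumption that holds on typical glibc systems.

```python
import ctypes
import ctypes.util


def load_libm():
    """Locate and load the C math library at run time."""
    # find_library may return None on minimal systems; fall back to the
    # usual glibc file name (an assumption, not guaranteed everywhere).
    path = ctypes.util.find_library("m") or "libm.so.6"
    return ctypes.CDLL(path)


libm = load_libm()

# Declare the C signature double cos(double) so ctypes converts correctly.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))
```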

Linux Processes and CPU Performance

In Linux, a process can be either:

  • runnable, or
  • blocked (waiting for some event to complete)

When it’s runnable, the process is in competition with other processes for CPU time. A runnable process may or may not be consuming CPU time. It is the CPU scheduler that decides which process to run next from the list of runnable processes. The processes form a line, known as the run queue, when they are waiting to use the CPU.

When it’s blocked, it may be waiting for data from an IO device or for the result of a system call.

The system usually reports the load by totalling the running processes and the runnable processes. On Linux, processes in uninterruptible sleep (typically waiting on disk IO) are counted in the load average as well.
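On Linux, these numbers are exposed in /proc/loadavg. The sketch below parses one line in that file’s format: three load averages, then the currently runnable scheduling entities over the total, then the most recently created PID. The sample line and the dictionary keys are my own choices for illustration.

```python
def parse_loadavg(line):
    """Split one /proc/loadavg line into named fields."""
    fields = line.split()
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),   # 1-minute load average
        "load5": float(fields[1]),   # 5-minute load average
        "load15": float(fields[2]),  # 15-minute load average
        "runnable": int(runnable),   # currently runnable entities
        "total": int(total),         # all scheduling entities
        "last_pid": int(fields[4]),  # most recently created PID
    }


# Sample line in the documented format; the values are illustrative.
sample = "0.20 0.18 0.12 1/80 11206"
info = parse_loadavg(sample)
print(info["load1"], info["runnable"], info["total"])
```

On a live system, the same function can be fed open("/proc/loadavg").read().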

Multitasking
When it comes to multitasking, an OS can use:

  • cooperative multitasking, or
  • preemptive multitasking

In preemptive multitasking, the scheduler gives each process a time slice on the CPU. The process is involuntarily suspended after it has consumed its allocated time. This prevents one process from monopolizing the available CPU time.

In cooperative multitasking, a process does not stop running until it gives up the CPU voluntarily. When it suspends itself, this is called yielding. The scheduler cannot decide how long a process should run.
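Python generators give a small-scale picture of cooperative multitasking: each task runs until it reaches a yield, and only then can the scheduler pick the next one. This is a toy round-robin loop with made-up names, not how a kernel scheduler is written.

```python
from collections import deque


def worker(name, steps, log):
    """A task that does `steps` units of work, yielding after each one."""
    for i in range(steps):
        log.append((name, i))
        yield  # voluntarily hand control back to the scheduler


def run_cooperative(tasks):
    """Round-robin over tasks; a task only stops when it yields or finishes."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)  # run the task until its next yield
        except StopIteration:
            continue  # task finished, do not requeue it
        queue.append(task)


log = []
run_cooperative([worker("A", 2, log), worker("B", 3, log)])
print(log)
```

If a worker looped forever without reaching a yield, run_cooperative would never regain control, which is exactly the weakness that preemptive time slices address.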

Scheduler
Starting with kernel 2.5, Linux got a new scheduler, the O(1) scheduler. It has since been replaced by CFS, which I wrote about in my earlier posts.

Tools to view the CPU performance
I usually use these tools to check:

  • vmstat
  • top

These tools are quite basic, yet they produce pretty good information, and they come with almost every distro.

With vmstat, I check the number of interrupts fired (in), the number of context switches (cs), as well as CPU utilization such as User (us), System (sy), and Idle (id). I expect to see a lower “cs” than “in”. I’ll try to explain context switches and interrupts in future posts. For the time being, kindly google for them.

With top, version 3 produces more stats. We can check the states of the processes, as well as the user CPU stats and the system CPU stats (softirq, iowait, irq).
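The same counters can be read from /proc/stat, where the intr line starts with the total number of interrupts serviced and the ctxt line holds the total number of context switches since boot. Per-second figures like the ones vmstat prints come from the difference between two samples. The sketch below parses sample text, so the numbers and the helper names are made up.

```python
def parse_proc_stat(text):
    """Extract total interrupts and context switches from /proc/stat text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "intr":
            counters["interrupts"] = int(fields[1])  # first number is the total
        elif fields[0] == "ctxt":
            counters["context_switches"] = int(fields[1])
    return counters


def per_second(before, after, seconds):
    """Turn two cumulative samples into per-second rates."""
    return {key: (after[key] - before[key]) / seconds for key in before}


# Two hypothetical samples taken 5 seconds apart.
first = parse_proc_stat("cpu 10 0 5 100\nintr 1000 3 0 7\nctxt 4000\n")
second = parse_proc_stat("cpu 12 0 6 120\nintr 2500 9 0 8\nctxt 5000\n")
rates = per_second(first, second, 5)
print(rates)
```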

Linux Package Management

I always like to play around with new distros that I find on distrowatch.com. Gentoo is my primary distribution, and Arch is my second. Arch also offers a flexible system. Almost all Linux systems are the same in functionality and features, and as far as I can see, the only differences are how they implement the front-ends and how they manage packages.

With Gentoo, I am not drawn in by easy or pretty front-ends (you can say Gentoo’s text output is quite colorful); I’m more interested in how to add, remove, and update software packages on the system. I don’t think anyone will be content with only the packages that come with the distro. Package management offers various ways to install and remove software, as well as to update a single package or the whole system. It also lets us select the software repositories from which we download the packages. These are some package management systems that are usually tied to a distro and its variants:

apt-get for Debian, Ubuntu, etc.
emerge for Gentoo, Sabayon
yum for Fedora, etc.

For more information about the package management systems of Linux distributions, you can always refer to these good documents: