- BNotions uses Hadoop and Mahout for their Vu mobile app. Vu is a smart news reader that recommends articles based on article similarity to things you like as well as user similarity to you.
- Graph theory and graph processing algos are helpful for this work.
- Likes, dislikes, reads, and skips are the most important inputs for their machine learning. Also relevant: user preference for breadth of topics vs depth; recency; natural language processing to extract topic keywords and organize topics by similarity.
- Redis is used for transient storage. It has some useful ops above just key-value. They use S3 as a data warehouse, but it could just as easily be HDFS.
- They use Amazon EMR as the Hadoop cluster. EMR constrains technology choices. For example, it is harder to use HDFS, hence Redis instead. They are evaluating HBase as an alternative; performance differences are not relevant for their use case.
- They don’t currently adjust for article length as a factor in recommendations.
- They use a third-party API for NLP, not Hadoop specifically. It runs only once per article, so it is not a bottleneck yet. They are not happy with the NLP quality, though.
- Cascalog/JCascalog to query the Hadoop data using Scala.
- Scalability is limited by cost, not capability. They may switch from EMR to a dedicated cluster, etc. as cost grows.
- Data science 10%, engineering 90%. Stock algos for rapid application development, tweak after. Deployment (my own specialty!) can be painful.
- Service-oriented architecture (SOA) helps with deployment. Simplifies components, but adds a devops layer. Jenkins is used to automate builds.
By default, Bash treats uninitialized variables the same way Perl does: they are blank strings. If you want them treated more like Python, where using an undefined name is an error, you can issue the following command in your Bash script:
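A minimal sketch, assuming the command meant here is Bash's `nounset` option, written `set -u` or equivalently `set -o nounset`. The child `bash -uc` call and the `status` and `output` names are only there so the failure can be shown without ending the script itself; `FOO_BAR` is the example name from the message quoted in this post.

```shell
#!/usr/bin/env bash
# Assumption: the option meant above is nounset.
set -u   # same as: set -o nounset

# Trigger the error in a child Bash so this script keeps running.
# FOO_BAR is deliberately unset before it is expanded.
output=$(bash -uc 'unset FOO_BAR; echo "$FOO_BAR"' 2>&1) && status=0 || status=$?

echo "child exit status: $status"
echo "child message: $output"
```

In a non-interactive script the unbound reference is fatal rather than just noisy: Bash stops at that line and exits with a non-zero status.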
You will then start seeing warning messages like the following:
./my_script.sh: line 419: FOO_BAR: unbound variable
Note that this means you can’t check for the non-existence of environment variables with a simple [[ -z "$ENVIRONMENT_VARIABLE" ]], because expanding the unset variable is itself an error. Instead, you could do something like the following:
[[ $( set | grep -c "^ENVIRONMENT_VARIABLE=" ) -lt 1 ]]
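A parameter-expansion check avoids the pipeline altogether. The sketch below is an alternative to the `set | grep` test, not this post's original method; the `describe` function name and the `production` value are made up for illustration.

```shell
#!/usr/bin/env bash
set -u

# describe prints whether ENVIRONMENT_VARIABLE is unset, empty, or non-empty.
# ${VAR+x} expands to "x" when VAR is set (even to ""), and to nothing
# otherwise, so it never trips the unbound-variable check.
describe() {
  if [ -z "${ENVIRONMENT_VARIABLE+x}" ]; then
    echo "unset"
  elif [ -z "${ENVIRONMENT_VARIABLE:-}" ]; then
    echo "empty"
  else
    echo "non-empty"
  fi
}

unset ENVIRONMENT_VARIABLE
describe    # prints: unset

ENVIRONMENT_VARIABLE=""
describe    # prints: empty

ENVIRONMENT_VARIABLE="production"
describe    # prints: non-empty
```

If you only care about "unset or empty", `[[ -z "${ENVIRONMENT_VARIABLE:-}" ]]` is enough, since `:-` substitutes an empty default instead of raising the error.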
PuTTY or one of its forks is a standard tool for administering Unix and Linux machines from Windows. It provides SSH connectivity for command line access, as well as keypair management for compatible programs like WinSCP.
Unfortunately, PuTTY has some terrible defaults. For example, it limits itself to 200 lines of scrollback by default, which guarantees that you’ll lose some history in most SSH sessions.
There’s a way to fix this and other defaults.
Then, save the new default settings:
PuTTY will now have sensible defaults whenever you’re connecting to a random server.
I had to harden my WordPress some time ago. Here are the guides I followed when hardening my installation:
Additional steps I’ve taken today:
- Install the Limit Login Attempts plugin
Now that I’ve moved from Google Reader to Fever, I’d like to reduce my reliance on other Google services. Switching from Google search to Bing is pretty easy, but I’m on much less sure ground when it comes to replacing Gmail.
- Paid service (If you aren’t paying, you are the product, not the customer)
- Search-driven interface
- Reasonable limits on message and mailbox size
I’ve heard of HushMail. Is there anything else worthwhile?
Edit: HushMail is a no-go. It doesn’t have a way to set up a filter or rule to automatically file incoming mail.
The perfidious vandals at Google will kill Google Reader on July 1, 2013. Accordingly, it is time to wean ourselves off Google dependence and find an alternative. Perhaps this will prove to be a good thing, as Google Reader has strangled RSS innovation through its monopolist, good-enough position much like IE6 once strangled the web.
NewsBlur and The Old Reader are two services I’ve seen mentioned. Unfortunately, both are currently buckling under the load of my fellow reader-heads fleeing the sinking Google ship. (Edit: More alternatives are listed in the roundups at Kikolani and LifeHacker.)
Accordingly, I’ve just installed Fever on my shared hosting. I’m not going to recommend my hosting provider as my account is based on a grandfathered plan, but Dreamhost is popular. The more technically inclined may want to spin up an Amazon EC2 instance.
Fever is a PHP/MySQL web application. It’s very easy to install, assuming you have access to a web server. It costs a one-time $30, which is likely why it is very easy to install. It also comes with lots of really neat features that innovate beyond what Google Reader ever did, none of which I care about.
Migrate from Google Reader to Fever
- Log into Google Takeout.
- Download your Google Reader data.
- Unzip it. The subscriptions.xml file contains your feeds and folders in standard OPML format.
- Download the Fever Server Compatibility Suite
- Upload it to your server and let it verify compatibility.
- Is it compatible? Great! PayPal over the $30.
- Copy the activation code from the email in your inbox into the wizard.
- Let the wizard install Fever for you, importing your precious subscriptions.xml.
- Fever will display a brisk progress bar as it quickly processes your myriad feeds.
- Oh, you may want to enable the unread messages count:
Sometimes when my VPN connection to work goes down, certain applications that rely on intranet servers (e.g. Lotus Notes, Lotus Sametime) become unable to reconnect to their servers even after I reconnect to VPN. This is due to the operating system’s DNS lookup cache reusing the failed hostname lookup from when VPN was down rather than doing a fresh hostname lookup now that there is a fresh VPN connection.
On Windows, you can fix the issue by opening up the Command Prompt as Administrator and running the following command: