At hoppr, we receive checkins over SMS and USSD. Our apps will soon launch across these mediums, and after that we will open up other access mediums.
At hoppr, we pay particular attention to USSD checkins because we must respond to users quickly, within 200ms. That budget covers the entire checkin, so checkins arriving over the other access mediums get the same fast processing. To make this work, we have done many things to improve checkins, SMS sending and our responses to users.
Our biggest decision was to rewrite our core app in Ruby, which in practice means we use Rails and Sinatra. Our core business logic is implemented in Rails, with separate services handling the different input mediums and calling the main core services. We had to do this because the old system was full of JSPs and every telco application (we have 6 of them) had its own database. This caused problems with mobile number portability and left the same logic repeated all over the place. The code was also not unit-tested, because it had grown organically over the last 18 months. The rewrite was a big challenge, and moving 2mn+ users without losing any customer information had to be handled carefully.
We wanted a future-proof service and we wanted to evolve our database rapidly. Sticking to an RDBMS would have made that difficult, because schema migrations over 2mn+ entries would have taken a long time. I had worked with MongoDB at my previous company, where I built a system with 20mn+ records for an online store with images and tags, so choosing MongoDB was easy for us.
NoSQL to the rescue - way to go, Mongo
The major advantage of moving to MongoDB was not having to deal with migrations in Rails. Migrations can be a headache for large production DBs, and our DB was evolving from loosely coupled, somehow-put-together tables into a proper domain model. We decided not to embed too many documents and to keep the model close to a relational database built from Mongo documents. We lost some of the flexibility of map-reduce, but it was a good decision for ETL. Without migrations, we could deploy our code 6 - 8 times a day without worrying about big table migrations. The database is huge, with an enormous number of checkins coming from WAP, campaigns, coupon downloads, USSD and SMS, and we handled it pretty nicely. We had a few performance problems with Mongo until we realized that our indexes were not optimized for the task. Once we fixed them, Mongo was a breeze to work with.
The major problem we faced was MongoDB's lack of multi-master replication. We later solved it by implementing a custom REST-over-JMS service that replicates domain objects across multiple WAN environments* (side note on environment down)
Not many people use JRuby with TorqueBox, but I have always been a fan of JRuby. It gives us speed over time, and it also gives us the native advantages of the Java ecosystem. Easy deployment came as an added advantage: our deployments became a single-line shell script that tells TorqueBox to deploy our services, and they finish quickly. We process a huge load of SMS every day across six telcos for checkins. Doing checkins synchronously would have been a nightmare, but TorqueBox's HornetQ scaled up like crazy for us and we were able to deliver messages quickly.
We moved all asynchronous task processing to JMS. At hoppr, a checkin goes through multiple stages. For example, when a user checks into a place with an offer, we need to send 2 - 3 SMSs depending on the offer type: if you check in to Barista, we check you in and also send you an offer coupon code. Every checkin earns you points, so we also need to recalculate your rank and points. In addition, hoppr has a few points triggers that entitle you to awards once you cross a benchmark. For example, when you cross 100,000 points we send you a coupon code for a nice free service. Processing all of this serially would have slowed the system down. Instead, we check you in and JMS takes care of the rest, with message processing driven by triggers and notifications. This technique lets us handle around 100K+ messages per hour. It also cut the synchronous processing time for the first SMS from 500 ms to 50 ms.
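A minimal in-process sketch of this split between the synchronous checkin and the queued follow-up work. It assumes Python's standard `queue.Queue` and worker threads as a stand-in for JMS/HornetQ queues and listeners. The function names, the points-per-checkin value and the in-memory stores are hypothetical, and only the 100,000-point trigger comes from the text.

```python
import queue
import threading

# Hypothetical values; only the 100,000-point trigger comes from the text.
POINTS_PER_CHECKIN = 10
AWARD_THRESHOLD = 100_000

tasks = queue.Queue()      # stand-in for a JMS queue
points = {}                # user -> points (stand-in for the points store)
outbox = []                # SMS messages handed to the sender
lock = threading.Lock()    # guards points and outbox across workers


def send_offer_sms(user, place):
    with lock:
        outbox.append((user, f"Your offer coupon for {place}"))


def update_points(user):
    with lock:
        before = points.get(user, 0)
        after = before + POINTS_PER_CHECKIN
        points[user] = after
        # Points trigger: fires only on the checkin that crosses the benchmark.
        if before < AWARD_THRESHOLD <= after:
            outbox.append((user, "Award coupon: you crossed 100,000 points"))


def check_in(user, place, has_offer):
    """Synchronous part: record the checkin, queue everything else, reply."""
    # Persisting the checkin itself is omitted from this sketch.
    if has_offer:
        tasks.put((send_offer_sms, (user, place)))
    tasks.put((update_points, (user,)))
    return f"Checked in {user} at {place}"


def worker():
    """Consumer loop, analogous to a JMS message listener."""
    while True:
        func, args = tasks.get()
        try:
            func(*args)
        finally:
            tasks.task_done()


for _ in range(2):
    threading.Thread(target=worker, daemon=True).start()
```

`check_in` returns as soon as the work is queued, and that early return is what keeps the first reply fast. The test `before < AWARD_THRESHOLD <= after` makes the award fire exactly once, on the checkin that crosses the benchmark.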
Sinatra and Rails
As I stated earlier, our core logic is implemented in Rails, while our other services use Sinatra. hoppr has multiple modules that use Sinatra as a web service for each endpoint. For example, when an SMS arrives from a user, a service called receptionist receives it. For all practical purposes, receptionist is the entry and exit point for the hoppr core service. Apart from that, we have implemented DND, image, subscription, points trigger, WAP and coupon services for all other purposes. Sinatra is a beautiful, small framework without much overhead. All services also talk to the core over HTTP, so we can deploy multiple instances when required and scale up the system quickly.
MySQL - Percona
For all our services, we use the Percona variant of MySQL as the logging and reporting DB. Every call is recorded in a log, and we log extensively to figure out what is causing problems. Because the services live on separate instances, logging to an RDBMS was the easiest way to support graphs and everything else. This way we were able to scale our logging and generate graphs from it.
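A minimal sketch of per-call logging to a relational table, with a reporting query on top. It assumes SQLite from Python's standard library as a stand-in for Percona MySQL. The table layout, the `logged` decorator and the `calls_per_status` report are hypothetical.

```python
import functools
import sqlite3
import time

# In-memory SQLite stands in for the Percona MySQL logging/reporting DB.
db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE call_log (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           service TEXT NOT NULL,
           endpoint TEXT NOT NULL,
           status TEXT NOT NULL,
           duration_ms REAL NOT NULL,
           logged_at REAL NOT NULL)"""
)


def logged(service):
    """Decorator that writes one log row per call, including failed calls."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                db.execute(
                    "INSERT INTO call_log (service, endpoint, status, duration_ms, logged_at) "
                    "VALUES (?, ?, ?, ?, ?)",
                    (service, fn.__name__, status, elapsed_ms, time.time()),
                )
                db.commit()
        return inner
    return wrap


def calls_per_status(service):
    """Reporting query of the kind that feeds a graph: count and mean latency."""
    rows = db.execute(
        "SELECT status, COUNT(*), AVG(duration_ms) FROM call_log "
        "WHERE service = ? GROUP BY status",
        (service,),
    )
    return {status: (count, avg_ms) for status, count, avg_ms in rows}
```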
Mongo is great and efficient, but looking up users, places, offers and points in the DB every time was troublesome, and lookups and updates on these models were expensive. For example, we update user points on every checkin. That means 1000s of users get updated every minute, which locks the database and slows the system down by a few milliseconds. USSD checkins must not cross our 200ms boundary, so we could not do without a near cache. A simple key-value hash in Redis has improved our performance manyfold: lookups are down to 5 ms, and we can store live checkin counters. Foursquare does something similar on MongoDB; we did the same with Redis and got much faster throughput. Beyond this, we use Redis to cache many other objects so that we do not hit the database every time, and it has become a kind of shared memory across all services! At times, Redis has improved our service performance by 10 - 50 times.
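A minimal read-through near-cache sketch. It assumes a plain Python dict in place of Redis and stores values serialized, as they would be in Redis strings. With the redis-py client, the matching calls would be `get`, `set(key, value, ex=ttl)` and `incrby`. The class, method names and TTL are hypothetical.

```python
import json
import time


class NearCache:
    """Read-through key-value cache; a plain dict stands in for Redis."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._store = {}        # key -> (expires_at, JSON string)
        self._counters = {}     # key -> int, like Redis integer counters
        self._ttl = ttl_seconds
        self._clock = clock
        self.db_hits = 0        # times we had to fall through to the database

    def get(self, key, load_from_db):
        """Return the cached value, loading and caching it on a miss or expiry."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            # Decoding gives each caller a fresh copy, as a Redis GET would.
            return json.loads(entry[1])
        self.db_hits += 1
        value = load_from_db(key)
        self._store[key] = (now + self._ttl, json.dumps(value))
        return value

    def invalidate(self, key):
        """Drop a key after its source record changes."""
        self._store.pop(key, None)

    def incr(self, key, amount=1):
        """Update a counter in memory instead of writing to the database."""
        self._counters[key] = self._counters.get(key, 0) + amount
        return self._counters[key]
```

Only a miss or an expired entry reaches the database. Frequently changing values such as points go through `incr`, which avoids the write traffic that was locking the DB.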
We also use Redis with Node.js as a pub-sub client for live checkin counters, with Raphael as the graph library. It is fascinating to watch the live checkin counts tick over minute by minute, showing hourly, daily and overall checkins!
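A minimal sketch of the counter side. It assumes one counter key per hour, one per day and one overall, with an in-process subscriber list standing in for Redis `INCR` and `PUBLISH`. On the Node.js side, a subscriber would receive each update and redraw the graph. The key naming scheme and function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

counters = defaultdict(int)   # stand-in for Redis integer counters
subscribers = []              # stand-in for Redis pub/sub subscribers


def subscribe(callback):
    """Register a listener, e.g. the process that pushes updates to the graph."""
    subscribers.append(callback)


def counter_keys(ts):
    """Hypothetical key scheme: one counter per hour, one per day, one overall."""
    return [
        f"checkins:hour:{ts:%Y%m%d%H}",
        f"checkins:day:{ts:%Y%m%d}",
        "checkins:total",
    ]


def record_checkin(ts=None):
    """Bump every counter for this checkin and publish the new values."""
    ts = ts or datetime.now()
    update = {}
    for key in counter_keys(ts):
        counters[key] += 1            # Redis equivalent: INCR key
        update[key] = counters[key]
    for callback in subscribers:      # Redis equivalent: PUBLISH channel message
        callback(dict(update))
    return update
```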
Well, if you are a web service, you need a little bit of PHP somewhere! PHP is not part of our core system, but it helped us migrate from the old system. Its major advantage is that you can connect to multiple DBs in a single line. We used PHP to connect Mongo with the old MySQL system and migrate useful data such as places, users and merchants in cross-DB ETL tasks. For example, 15 lines of PHP looked up places in the old system and inserted them into the new system as MongoDB documents. It ran as a cron job every 15 minutes and carried us through a migration period of about 8 weeks! We now use PHP in several ways, such as checking in less than 2ms whether a number is among our 25mn+ DND entries, and it has helped us scale. PHP serves as simple shell scripts wherever we did not want to use Ruby!
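A minimal sketch of an incremental, re-runnable copy job of the kind a 15-minute cron can drive, written in Python rather than PHP. It assumes an in-memory SQLite table for the old MySQL system, a dict keyed by legacy id for the MongoDB collection, and an auto-increment id as the watermark. All names are hypothetical.

```python
import sqlite3

# Old system: an in-memory SQLite table stands in for the legacy MySQL DB.
legacy = sqlite3.connect(":memory:")
legacy.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# New system: a dict keyed by legacy id stands in for a MongoDB collection.
places_collection = {}
state = {"last_id": 0}   # watermark that would be persisted between cron runs


def migrate_places(batch_size=500):
    """Copy places added since the last run; safe to re-run (upsert by legacy id)."""
    rows = legacy.execute(
        "SELECT id, name, city FROM places WHERE id > ? ORDER BY id LIMIT ?",
        (state["last_id"], batch_size),
    ).fetchall()
    for row_id, name, city in rows:
        doc = {"legacy_id": row_id, "name": name, "city": city}
        places_collection[row_id] = doc   # MongoDB: update_one(..., upsert=True)
        state["last_id"] = row_id
    return len(rows)
```

The job reads only rows whose `id` is above the watermark, so each run copies just the places added since the previous run. Picking up edits to existing rows would also need something like an updated-at column.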
Hoppr cannot live without Java. All our SMS sending modules, basic entry points and callback URLs are JSPs and Java JMS clients. In the telco ecosystem, Ruby has fewer libraries than Java for sending SMS over the CIMD and SMPP protocols. The Java sending modules also run as simple background programs in our system. They are efficient and fast, and long-running Java programs keep improving in performance over time. We have been processing 1mn+ SMS every day with just this setup.
No production environment is complete unless Nagios is monitoring it continuously, and Nagios is at the core of how we monitor everything at hoppr.
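A minimal sketch of a Nagios check for the 200ms USSD budget. It follows the standard Nagios plugin convention: exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, and a single status line with optional performance data after `|`. The 150ms warning level, the use of the 95th percentile and passing latency samples as arguments are assumptions.

```python
import math

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3   # standard Nagios plugin exit codes
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_checkin_latency(samples_ms, warn_ms=150.0, crit_ms=200.0):
    """Return (exit_code, status_line) for recent USSD checkin latencies."""
    if not samples_ms:
        return UNKNOWN, "UNKNOWN - no checkin latency samples"
    p95 = percentile(samples_ms, 95)
    if p95 >= crit_ms:
        code = CRITICAL
    elif p95 >= warn_ms:
        code = WARNING
    else:
        code = OK
    # Performance data: label=value[UOM];warn;crit;min
    perfdata = f"p95={p95:g}ms;{warn_ms:g};{crit_ms:g};0"
    return code, f"{LABELS[code]} - checkin p95 {p95:g}ms | {perfdata}"


def main(argv):
    """Print the status line and return the exit code for Nagios."""
    try:
        samples = [float(arg) for arg in argv]
    except ValueError:
        print("UNKNOWN - non-numeric latency sample")
        return UNKNOWN
    code, line = check_checkin_latency(samples)
    print(line)
    return code
```

To run this as a plugin, a small wrapper script would call `sys.exit(main(sys.argv[1:]))` with the sampled latencies as arguments, for example `40 55 180`.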
We have just started using Node for our web presence! There has not been a significant advantage or disadvantage so far, but more on that later...
Testing & CI
No code at hoppr reaches our servers unless it has been run through RSpec and JUnit and has passed with a green light on our CI server. In the last 90 days we have crossed 1200+ builds, around 15 builds and 4 deployments per day! We write tests before writing code!
If all this excites you (a near cache with Redis, NoSQL with MongoDB, excellent MVC with Ruby on Rails or Sinatra, and playing with low-level protocols using JMS, Java, SMPP and CIMD for sending and receiving SMS) and you want to keep application performance within a 200ms benchmark, then we are waiting for you! Drop me a note on Twitter or send a mail at hoppr.com and we will get back to you! At hoppr you get fast MacBooks and the IDE of your choice! The only choice you don't have at hoppr is to stop innovating with the rest of us! :-)
More on architecture in the following post!