Tuesday, November 15, 2011

Learning the Architecture for Big Data Big Insights Analytics Platform

I have started exploring and configuring the Amazon EC2 cloud to build a big data analytics platform. What I like about EC2:

- easy to control the cost
- ideal for on demand services for clients
- high and flexible availability
- focus on core competency, that is big data analytics

The following readings are worth going through before deciding how much capacity to sign up for and how to run a cluster computing platform on EC2.

- Read the basic materials at http://aws.amazon.com
- Benefits of spot instances: a discounted setup for the testing environment in the early stages
- Building a cluster of choice in 10 minutes
- More how-to videos and webinars
- Running R on EC2 (the author also has notes on running Windows in EC2 with the associated R program)
- One more tutorial: http://user2010.org/tutorials/Zolot_tut.pdf

Monday, November 14, 2011

Some Important Reading for Open Source Data Mining Software

The following is an excellent comparative review of Orange, R, RapidMiner, Statistica, and WEKA.

Orange, R, RapidMiner, Statistica and WEKA

An interface for truly ‘big’ data sets is developed and tested in http://prekopcsak.hu/papers/preko-2011-rcomm.pdf and is available as a beta version at http://radoop.eu. A Linux cluster hardware architecture is ideal for it.

Another open source data mining package is Tanagra:

http://eric.univ-lyon2.fr/~ricco/tanagra/index.html

Wednesday, November 02, 2011

A great predictive scoring system for options

http://seekingalpha.com/article/304428-a-daily-options-trading-strategy-for-high-flying-stocks?source=yahoo

Sunday, October 30, 2011

Best Practices for Dashboards

A dashboard is a quick, visual way of capturing and presenting the current status of operations, sales, production, or logistics in an organization. Dashboards should stimulate discussion and warn when the status, trends, or exceptions could negatively affect key stakeholder measurements. To be effective, dashboards should follow these best practices.

- Show the status, trends, projections, and exceptions
- Be visual, and capture and present information at least daily. Gone are the days of monthly and even weekly reports
- Have alerts with analytical insights - Advanced
- Explain key projections within a few drill-downs, including how the metrics behind them are measured, to protect against the adverse events being projected - Advanced
- Have distinct dashboard layers (operations, sales, and executive), since the metrics each layer concentrates on are quite different - Advanced
- The latest trend is customer/client comments from social sites flowing in real time in a sidebar
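
To make the first practice concrete, here is a minimal Python sketch of how status, trend, projection, and exception could be derived from one daily metric. Everything in it is an assumption for illustration: the function name `dashboard_summary`, the straight-line projection, and the 2-standard-deviation exception rule are hypothetical choices, not something taken from a particular dashboard product. It needs at least three daily values.

```python
from statistics import mean, stdev


def dashboard_summary(daily_values, horizon=7, z_threshold=2.0):
    """Summarize a daily metric as status, trend, projection, and exception flag."""
    n = len(daily_values)
    days = range(n)
    x_bar, y_bar = mean(days), mean(daily_values)

    # Trend: least-squares slope of the metric against the day index
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(days, daily_values)) / sum(
        (x - x_bar) ** 2 for x in days
    )
    intercept = y_bar - slope * x_bar

    # Status: the most recent value
    status = daily_values[-1]

    # Projection (naive assumption): extend the fitted line `horizon` days ahead
    projection = intercept + slope * (n - 1 + horizon)

    # Exception: latest value is far from the mean of the earlier values
    history = daily_values[:-1]
    spread = stdev(history)
    z = (status - mean(history)) / spread if spread else 0.0

    return {
        "status": status,
        "trend_per_day": slope,
        "projection": projection,
        "exception": abs(z) > z_threshold,
    }


# Made-up daily sales figures, with a sudden drop on the last day
print(dashboard_summary([100, 102, 101, 105, 104, 107, 60]))
```

In a real dashboard the exception rule would normally be tuned per metric and per layer (operations, sales, executive) rather than fixed at one threshold.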

Some reasonable dashboard examples:
- http://www.enterprise-dashboard.com/tag/insurance-marketing-dashboard/


Fair balance statement: I am not connected to this company in any way.

Thursday, September 29, 2011

Predictive Analytics in Auditing



It is fascinating how a simple method like Benford's law could have helped identify that something unusual was going on, and hence could have helped predict Europe's current economic condition.

Fact and Fiction in EU-Governmental Economic Data, by Bernhard Rauch, Max Göttsche, Gernot Brähler, and Stefan Engel, in German Economic Review.

I was fascinated by the article's abstract, which reads:

"To detect manipulations or fraud in accounting data, auditors have successfully used Benford's law as part of their fraud detection processes. Benford's law proposes a distribution for first digits of numbers in naturally occurring data. Government accounting and statistics are similar in nature to financial accounting. In the European Union (EU), there is pressure to comply with the Stability and Growth Pact criteria. Therefore, like firms, governments might try to make their economic situation seem better. In this paper, we use a Benford test to investigate the quality of macroeconomic data relevant to the deficit criteria reported to Eurostat by the EU member states. We find that the data reported by Greece shows the greatest deviation from Benford's law among all euro states."

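Benford's law says that the first digits of naturally occurring measurements tend to follow a specific probability distribution: the digit d (1 through 9) appears as the first digit with probability P(d) = log10(1 + 1/d). So about 30.1% of values start with 1, while only about 4.6% start with 9.

Picture and equation reference: Wikipedia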
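
As a rough illustration of the kind of Benford test the abstract describes, here is a minimal Python sketch. It is not the method used in the paper: the function names are hypothetical, the chi-square statistic is my assumed measure of deviation, and the sample data is synthetic (powers of 2 and the integers 100 to 999), not Eurostat figures.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities under Benford's law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading non-zero digit of a non-zero number (sign ignored)."""
    # Scientific notation puts the leading digit first, e.g. 0.042 -> '4.200000e-02'
    return int(f"{abs(x):e}"[0])


def benford_chi_square(values):
    """Chi-square distance between observed first digits and Benford's law."""
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )


# Synthetic illustration only: powers of 2 follow Benford's law closely
benford_like = [2 ** k for k in range(1, 201)]
# Integers 100..999 have uniformly distributed first digits, so they deviate strongly
uniform_like = list(range(100, 1000))

print("powers of 2:", round(benford_chi_square(benford_like), 2))
print("100..999:   ", round(benford_chi_square(uniform_like), 2))
```

The larger the statistic, the further the observed first digits are from what Benford's law predicts; a data set with a large value is a candidate for a closer look, not proof of manipulation.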

Wednesday, September 28, 2011

Top 10 Data Mining Algorithms - IEEE Knowledge and Information Systems 2008

Top 10 data mining algorithms, published in Knowledge and Information Systems (2008).


The paper not only captures the experience and thought processes of the 145 data mining experts who voted on these algorithms, but also serves as a great review paper for those in the field of data mining. Kudos to the organizer of this panel and the team that worked on it. A great contribution to the science.

Though the selection looks heavily influenced by current trends in the field, I tend to think that ease of use, interpretability, and amenability to automated scoring will keep these 10 at the top for many years to come. For example, even with new trends such as web data, big data, and unstructured data, these algorithms will continue to dominate in terms of applications, interpretability, and how quickly they can be put to use.

Tuesday, September 27, 2011

NY Institute of Analytics is Hiring Trainers for ONLINE Training Program - SAS BASE, SAS ADVANCED, R BASE, R ADVANCED



Institute of Analytics (NY), at 344 West 38th Street, is looking for additional great trainers to teach students from around the globe online and help them achieve SAS Basic, SAS Advanced, R Basic, and R Advanced certifications.

Please forward your resume to Arpitha (arpitha@instituteofanalytics.com - PH: 215-803-1488)

If you are interested in teaching the R courses, please attach the course contents you consider desirable for R Basic and R Advanced.