I started exploring and configuring the Amazon EC2 cloud for building a big data analytics platform. The main attractions are:
- easy to control costs
- ideal for on-demand services for clients
- highly flexible availability
- lets us focus on our core competency, big data analytics
The following readings are well worth going through before deciding how much capacity to sign up for and how to run a cluster computing platform on EC2.
- Basic materials at http://aws.amazon.com
- Benefits of spot instances: a discounted setup for testing environments in the early stages
- Building a cluster of choice in 10 minutes
- More how-to videos and webinars
- Running R on EC2: the author also has notes on running Windows on EC2 with R
- One more tutorial: http://user2010.org/tutorials/Zolot_tut.pdf
Predictive Models, Data Mining, CRM Portals Inc, Institute of Analytics
Consumer-centric predictive models and data mining. The CRMportals Inc. site is being redesigned; until then, this blog will be my main way of reaching out to you.
Tuesday, November 15, 2011
Monday, November 14, 2011
Some Important Reading for Open Source Data Mining Software
The following is an excellent review comparing Orange, R, RapidMiner, Statistica, and WEKA.
Orange, R, RapidMiner, Statistica and WEKA
An interface for truly ‘big’ data sets has been developed and tested (see http://prekopcsak.hu/papers/preko-2011-rcomm.pdf) and is available in beta at http://radoop.eu; a Linux cluster is the ideal hardware architecture for it.
Another open source data mining package is Tanagra:
http://eric.univ-lyon2.fr/~ricco/tanagra/index.html
Wednesday, November 02, 2011
A great predictive scoring system for options
http://seekingalpha.com/article/304428-a-daily-options-trading-strategy-for-high-flying-stocks?source=yahoo
Sunday, October 30, 2011
Best Practices for Dashboards
A dashboard is a quick, visual way of capturing and presenting the current status of operations, sales, production, or logistics in an organization. Dashboards should stimulate discussion and warn when the status, trends, or exceptions could negatively affect the measurements that key stakeholders care about. To be effective, dashboards should follow these best practices:
- Show the status, trends, projections, and exceptions
- Be visual, capturing and presenting information at least daily. Gone are the days of monthly and even weekly reports
- Provide alerts with analytical insights - Advanced
- Explain key projections with a few drill-downs, including how the underlying metrics are measured, to guard against adverse events in the projections - Advanced
- Offer several distinct layers, such as operations, sales, and executive layers, since the metrics each one focuses on are quite different - Advanced
- A recent trend: customer/client comments from social sites flowing in real time in a sidebar
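The exception and alert practices above can be illustrated with a very simple rule. This is a minimal sketch, assuming a trailing-window z-score rule: a day is flagged when its value sits more than `k` standard deviations from the mean of the preceding days, and the z-score is returned as the "analytical insight" behind the alert. The function name `flag_exceptions`, the 7-day window, the 2-standard-deviation threshold, and the sample sales figures are all hypothetical choices, not taken from any particular dashboard product.

```python
import math
from statistics import mean, stdev


def flag_exceptions(daily_values, window=7, k=2.0):
    """Flag days whose value is more than k standard deviations away from the
    mean of the preceding `window` days. Returns (day_index, z_score) pairs.
    Hypothetical rule: window and k are illustrative defaults."""
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a standard deviation")
    alerts = []
    for i in range(window, len(daily_values)):
        history = daily_values[i - window:i]  # trailing window, excludes day i
        mu, sigma = mean(history), stdev(history)
        deviation = daily_values[i] - mu
        if sigma == 0:
            # Flat history: any change at all counts as an exception.
            if deviation != 0:
                alerts.append((i, math.copysign(math.inf, deviation)))
            continue
        z = deviation / sigma
        if abs(z) > k:
            alerts.append((i, z))
    return alerts


# Hypothetical daily sales figures with one sharp drop on day 7.
daily_sales = [100, 102, 98, 101, 99, 100, 103, 60, 101]
for day, z in flag_exceptions(daily_sales):
    print(f"day {day}: value {daily_sales[day]} is {z:+.1f} std devs from its 7-day mean")
```

Here only day 7 (the drop to 60) is flagged; day 8 is not, because the drop itself widens the trailing standard deviation. A production dashboard would likely use a more robust baseline (seasonality, medians), but the shape of the alert is the same.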
Some reasonable dashboard examples:
- http://www.enterprise-dashboard.com/tag/insurance-marketing-dashboard/
Fair balance statement: I am not affiliated with this company in any way.
Thursday, September 29, 2011
Predictive Analytics in Auditing


It is fascinating how a simple method like Benford's law could have flagged that something unusual was going on, and hence might have helped anticipate Europe's current economic situation.
Fact and Fiction in EU-Governmental Economic Data, by Bernhard Rauch, Max Göttsche, Gernot Brähler, and Stefan Engel, in German Economic Review.
I was fascinated by the article's abstract, which reads:
"To detect manipulations or fraud in accounting data, auditors have successfully used Benford's law as part of their fraud detection processes. Benford's law proposes a distribution for first digits of numbers in naturally occurring data. Government accounting and statistics are similar in nature to financial accounting. In the European Union (EU), there is pressure to comply with the Stability and Growth Pact criteria. Therefore, like firms, governments might try to make their economic situation seem better. In this paper, we use a Benford test to investigate the quality of macroeconomic data relevant to the deficit criteria reported to Eurostat by the EU member states. We find that the data reported by Greece shows the greatest deviation from Benford's law among all euro states."
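Benford's law says that the first digits of many naturally occurring measurements (especially those spanning several orders of magnitude) are not uniformly distributed: the probability that the leading digit is d is P(d) = log10(1 + 1/d), for d = 1, ..., 9. So about 30.1% of values start with 1, while only about 4.6% start with 9.
Picture and equation reference: Wikipedia.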
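The first-digit test described above can be sketched in a few lines of Python. This is a minimal sketch, assuming a Pearson chi-square goodness-of-fit test as the "Benford test"; the paper's exact procedure is not described here, and the names `benford_expected`, `leading_digit`, `benford_chi_square`, and `CRITICAL_5PCT_8DF` are hypothetical. With nine digit categories the statistic has 8 degrees of freedom, so values above roughly 15.51 would be flagged at the 5% level.

```python
import math
from collections import Counter


def benford_expected():
    """Benford's law: P(d) = log10(1 + 1/d) for leading digit d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first non-zero digit of x, or None if there is none (e.g. 0)."""
    # repr() gives the digits of an int, or the shortest round-trip form of a
    # float such as '0.00314' or '1.5e+16'; the first digit 1-9 is the leading one.
    for ch in repr(abs(x)):
        if ch in "123456789":
            return int(ch)
    return None


def benford_chi_square(values):
    """Assumed 'Benford test': Pearson chi-square statistic comparing observed
    leading-digit counts with the Benford proportions (8 degrees of freedom)."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no non-zero values to test")
    counts = Counter(digits)
    return sum(
        (counts[d] - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )


# Approximate 5% critical value of the chi-square distribution with 8 d.f.
CRITICAL_5PCT_8DF = 15.507

powers_of_two = [2 ** k for k in range(1, 1001)]  # known to follow Benford closely
all_nines = [9, 95, 900] * 100                     # clearly does not
print(benford_chi_square(powers_of_two))
print(benford_chi_square(all_nines))
```

The powers of two should give a small statistic well under the critical value, while the "all nines" data gives a very large one. Applied to reported government or accounting figures, a large statistic is only a signal for closer inspection, not proof of manipulation.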
Wednesday, September 28, 2011
Top 10 Data Mining Algorithms - IEEE Knowledge and Information Systems 2008
Top 10 data mining algorithms - Knowledge and Information Systems (2008) publication.
This paper not only captures the experience and thinking of the 145 data mining experts who voted on these algorithms, but is also a great review for anyone working in data mining. Kudos to the organizers of this panel and the team who worked on it. A great contribution to the science.
Though the selection looks heavily influenced by the trends of the time, I tend to think that ease of use, interpretability, and amenability to automated scoring will keep these 10 algorithms relevant for many years to come. Even with newer trends such as web data, big data, and unstructured data, I expect these algorithms to keep dominating because of their broad applicability, interpretability, and quick usability.
Tuesday, September 27, 2011
NY Institute of Analytics is Hiring Trainers for Its ONLINE Training Program - SAS BASE, SAS ADVANCED, R BASE, R ADVANCED

Institute of Analytics (NY), at 344 West 38th Street, is looking for additional great trainers to teach students from around the globe online and help them achieve SAS Basic, SAS Advanced, R Basic, and R Advanced certifications.
Please forward your resume to Arpitha (arpitha@instituteofanalytics.com - PH: 215-803-1488)
If you are interested in teaching the R courses, please attach the course content you consider desirable for R Basic and R Advanced.