Sunday, December 22, 2013

The Essence of Big Data

It seems like the term "Big Data" has come to refer to so many things that it has become more of an aspirational marketing term than a technical one.
In dealing with "big data" projects at work I have come to a definition that seems useful.

Big Data is composed of several systems that work together:

  • The hardware that runs all of the algorithms needed to handle and analyze the data of interest.
  • The algorithms which efficiently load, transform and store data from varied, high volume data flows.
  • The algorithms which retrieve and manage the stored data in a way that allows other algorithms and s/w tools to act on it efficiently.
  • Correlation algorithms which automatically comb through the data looking for data items which are related in ways which might be "interesting" to the end user.
  • The visualization tools which are used to look at the automatically flagged "interesting" subset of data in order to determine if the data is actually useful and if so, how.
  • The actionable information that the end user extracts from the system and uses to further whatever goals originally justified implementing the big data system.
The hardware, ETL and data storage parts have been addressed by a fairly large number of vendors using proprietary and open source methods. You could say that the data handling platform is becoming a commodity because the movement of data is an undifferentiated requirement for all big data users.

What is still hard is algorithmically filtering the flood of incoming data to pull out the nuggets of interest so that someone can confirm their meaning. The aspects which qualify something as interesting differ from industry to industry and company to company, so a common, turn-key solution may not be possible. Until that statement is proven false, the seller's market for data scientists will continue.
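
To make the correlation step a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: the series names, the readings and the 0.9 threshold are all made up, and plain Pearson correlation stands in for whatever definition of "interesting" a real industry would need. It computes the correlation for every pair of series and flags only the strongly related pairs for a human to look at.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def flag_interesting(series, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, xs), (name_b, ys) in combinations(series.items(), 2):
        r = pearson(xs, ys)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical readings, invented purely for this example.
readings = {
    "chamber_pressure": [1.0, 2.0, 3.0, 4.0, 5.0],
    "film_thickness": [2.1, 3.9, 6.2, 8.0, 9.9],
    "operator_badge_id": [7.0, 3.0, 9.0, 1.0, 5.0],
}

# Only the flagged pairs would move on to the visualization step.
for pair in flag_interesting(readings):
    print(pair)
```

The hard part described above is hidden in the threshold and in the choice of measure: a pair that clears 0.9 is not necessarily useful, and a useful relationship is not necessarily linear. That is where the person confirming the meaning still comes in.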

Tuesday, July 23, 2013

Mixing Metaphors: BAC! Who left this MES?

I have been learning a bit about building management systems (BMS) and realized that, abstracted the right way, the manufacturing execution systems (MES) of semiconductor fabs are pretty similar in overall concept and components. That parallel made it much easier for me to frame my learning. 

The brains of the system: BMS vs MES.
These s/w systems take schedules, control targets and feedback to make the building "work" (literally).

The communication protocols: BACnet vs SECS/GEM.
Granted, building equipment uses many more protocols than just BACnet, some of which are proprietary to the equipment vendor.

The equipment: whatever is actually being controlled to make the building useful to its owners.
An office building could be thought of as making an environment conducive to worker productivity with maximum efficiency.
A fab could be thought of as controlling the flow of materials between equipment to maximize output of wafers/chips at minimum cost.

The sensors: the systems which allow the control system to make smart choices about controlling the building equipment.
For an office building this might mean feedback control for HVAC (don't overcool) and switching off unused lights or dimming lights for daylight harvesting.
For a fab this might mean monitoring the voltage and flow rates for a particular piece of process equipment and adjusting the process recipe for the next lot or wafer to ensure uniform film properties from lot to lot.
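
Both kinds of feedback can be sketched in a few lines of Python. This is a rough illustration rather than how any particular BMS or MES works. The function and class names, the 1 degree deadband, the linear process model (thickness = gain * setting + offset) and every number are assumptions of mine. The fab side uses an exponentially weighted moving average (EWMA) of the observed offset, which is one common textbook way to do lot-to-lot adjustment.

```python
def hvac_command(temp_c, setpoint_c, deadband_c=1.0):
    """Office case: act only when the room leaves the band around the setpoint."""
    if temp_c > setpoint_c + deadband_c:
        return "cool"
    if temp_c < setpoint_c - deadband_c:
        return "heat"
    return "idle"  # inside the band: do nothing, so we don't overcool


class RunToRunController:
    """Fab case: EWMA adjustment of one recipe setting between lots.

    Assumes (hypothetically) a linear process:
        measured = gain * setting + offset
    where gain is known and offset drifts slowly from lot to lot.
    """

    def __init__(self, target, gain, weight=0.5, offset_estimate=0.0):
        self.target = target
        self.gain = gain
        self.weight = weight  # how much to trust the newest lot
        self.offset_estimate = offset_estimate

    def next_setting(self):
        """Recipe setting expected to hit the target on the next lot."""
        return (self.target - self.offset_estimate) / self.gain

    def update(self, setting_used, measured):
        """Blend the offset seen on the last lot into the running estimate."""
        observed_offset = measured - self.gain * setting_used
        self.offset_estimate = (
            self.weight * observed_offset
            + (1 - self.weight) * self.offset_estimate
        )


# Simulated lots with a made-up process offset the controller cannot see.
controller = RunToRunController(target=100.0, gain=2.0)
true_offset = 5.0
for lot in range(5):
    setting = controller.next_setting()
    measured = 2.0 * setting + true_offset  # stand-in for a metrology reading
    controller.update(setting, measured)
    print(lot, round(setting, 3), round(measured, 3))
```

With these assumed numbers the gap between the measured value and the target shrinks with every lot, which is the "uniform film properties from lot to lot" idea in miniature. The thermostat does the same job in the building: compare a sensor reading to a target and act only when the difference matters.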

Friday, March 8, 2013

Wait... There's a Genie in my Economy!

What do the following have in common?
  • International Paper
  • United Airlines
  • Google
  • Facebook
  • Twitter
  • Amazon EC2
  • Kickstarter
  • 3D Printing
  • Genies

They each reduce friction. More specifically, they reduce the friction of lining up all of the things that must be in place to get something that you want.
  • International Paper - What does it take to get the paper in your printer? 
    • Rights to log trees. The labor to log them. The tools to cut down a tree. The knowledge to use those tools. The transports to move the trees. The machines to pulp the tree. The labor and knowledge to use the pulping equipment. The chemicals to process the pulp into paper... you get the idea...
  • United Airlines - What does it take to move yourself to Japan?
    • The money to buy an airplane. The knowledge to fly it. The contacts required and hours spent to negotiate the rights to take off from SFO and land at NRT. The labor and knowledge to service the aircraft... etc...
  • Google - What does it take to find out about everything on the internet?
    • The knowledge to create an algorithm that is helpful at finding what you want amid tons of stuff you don't. The programming skills to implement it. The knowledge to build the IT infrastructure to process and store all of the data required to run the algorithm. The servers and real-estate required to hold the servers... how easy would those be to get on your own?
  • Facebook - What does it take to find all of your long lost high school friends?
    • The hours and hours of phone calls to numbers in your old Day Runner (they still make these?) hoping that their parents still remember you and still live there. Or trawling through phone directories looking for the right Joe Smith... ugh...
    • OR build your own content site which will attract half of the planet AND get them to list their high school... pretty simple...
The remaining companies or topics flip the equation a bit as they are more general tools for reducing friction towards the end of doing something else.
  • Twitter - How could I publish my thoughts to "everyone" at a reasonable cost?
    • I could never mail a letter, call by phone or place enough radio and TV ads to do this. What would it cost to generate the lead list and qualify the leads to do this in a more focused way?
  • Amazon EC2 - How do I start a s/w business that scales without major capital outlays?
    • How else can I get enough computers to scale my SaaS business to profitability without the friction of convincing someone to front a significant amount of money to purchase and administer a server farm?
  • Kickstarter - How do I raise the capital to do something people want to see done?
    • Am I lucky enough to be born rich? Did I get lucky enough to know powerful, rich people? Am I a good enough social engineer to find these people? Do I know the right VCs? Is my product profitable enough for a VC to consider? What would it cost to build the audience of millions who are engaged enough to put money on the table - sight unseen?
  • 3D Printing - How do I make a complicated, custom physical part in low volume (qty 1)?
    • The money and space to buy a CNC machine plus the experience and knowledge to operate it? Or the hours spent to find a machine shop that will do a low volume run, now, for a reasonable price?
  • Genies - How do I do anything with anyone, anywhere at any time?
    • You have 3 wishes...
The interesting thing about removing the friction around doing "something else" is that it enables new ways for people to do things for themselves and, ultimately, find others who might want those things. Which they then might trade something for (like money). Which sounds sort of like an economy.
Take that to its logical conclusion, where friction is, genie-like, reduced to near zero between all people and the resources / skills they hold, and what is the purpose of a corporation as we know it today? We could do anything for ourselves by finding and coordinating the right people.

Maybe this does not happen in my lifetime, but the idea of friction seems like a powerful filter for looking at the value of any product or service that you are trying to create today. If it is not reducing friction then you're heading the wrong way.


Thanks to Gabe Newell for throwing the lightning bolt which fused the 10,000 threads in my brain into a coherent idea.

Take a look at this talk if you have an hour to spare.
Why is Valve structured as it is?
What is the purpose of a corporation in recent history and today?

Good stuff.