Everyone seems to
agree unstructured data is a problem.
Digging down into the real causes must start with challenging the
assumptions being made about the problem itself. The first place to start is with the basic
definitions. Just what is unstructured
data? Who says so? Why?
Who should really define unstructured data?
At first glance, the question of what unstructured data is appears quite simple: it is just a pile of files.
I’m going to show that such a definition is completely wrong and is at the heart of the problem.
But first, let’s look at who even gets to set the definition.
The storage industry would seem to be a logical choice to own the definition. They are the ones building the
actual technology (file systems, object storage, etc.). Are they the right people, though? Who else could be candidates?
IT organizations are the next choice, but they aren’t much different from storage people. IT treats unstructured data
as a pile of files that have to be stored and backed up. They are also missing the point.
Well, OK. Who’s left?
The group of people who should have the most input into the definition of
unstructured data are the ones who actually create it in the first place,
are responsible for it once created, get judged by it, and get yelled at
over it. These are the business users and managers who
are actually using unstructured data to run their organizations. Remember them? They are the ones paying for all this storage
in the first place. Unfortunately, they seem
to be the most ignored by the storage industry.
However, if you spend some time talking to these people, watching what
they try to do with storage, and really listening to what they are saying, what
you learn can be amazingly revealing.
So, what do THEY talk about when they refer to unstructured data?
The first thing you notice when talking to business users about managing their
unstructured data is that they view it as something quite different from
files. They refer to what we call information assets, not individual files.
We will drill into what these really are in much more detail, but they are
essentially business definitions of their information. They talk about their
contracts, purchase orders, final reports, vendor agreements, marketing plans,
etc. They don’t talk about files and directories. Information assets are at a
much higher level. They are collections of files, tracking data, logs, emails,
rules, and supporting items like photographs, people lists, spreadsheets, and
contact information that collectively are meaningful to the business.
This is where the concept of the information asset came from. These business
people work with these assets but are disappointed to find that all they get
to contend with are individual files.
The next thing you notice is that these same business users quickly transition
to talking about collections of their assets. They start to refer to “all
approved contracts”, “paid invoices from last year”, or similar descriptions.
They are actually referring to something incredibly powerful for IT people.
These are what we call containers, which are defined as collections of assets
with the same business context. Storage management functions can be specified
on these containers rather than on individual files. If you listen to these
business people carefully, they are showing you how to solve one of the big
problems with unstructured data management. You don’t have to decide how to
apply functions to every single file separately. There are simply far too many
files to make that practical. Applying them at the container level is much
simpler, more efficient, and can actually match the business requirements of
that information.
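To make the asset and container idea a little more concrete, here is a minimal sketch in Python. It is only an illustration of the concept, not how any real product implements it. Every name in it (InformationAsset, Container, Policy, effective_policies), the example file paths, and the 7-year retention figure are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical storage-management settings applied to a whole container."""
    retention_years: int
    backup: bool


@dataclass
class InformationAsset:
    """A business-level item (e.g. a contract) made up of many files."""
    name: str
    kind: str      # business type, e.g. "contract"
    status: str    # business state, e.g. "approved"
    files: list = field(default_factory=list)


@dataclass
class Container:
    """Assets sharing a business context, with one policy for all of them."""
    description: str
    kind: str
    status: str
    policy: Policy

    def members(self, assets):
        # Membership is decided by business context, not by directory layout.
        return [a for a in assets
                if a.kind == self.kind and a.status == self.status]


def effective_policies(container, assets):
    """Map every file of every member asset to the container's policy."""
    return {path: container.policy
            for asset in container.members(assets)
            for path in asset.files}


# Made-up example data.
assets = [
    InformationAsset("Acme supply contract", "contract", "approved",
                     ["acme/contract.docx", "acme/signatures.pdf",
                      "acme/emails.mbox"]),
    InformationAsset("Globex draft contract", "contract", "draft",
                     ["globex/draft.docx"]),
]

approved = Container("all approved contracts", "contract", "approved",
                     Policy(retention_years=7, backup=True))

per_file = effective_policies(approved, assets)
```

The point of the sketch is that the policy is written once, on the container. The per-file settings are derived from it, so nobody has to make a decision file by file.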
One
of the more surprising observations when doing these interviews is just how
short they can be. The people in charge
of the assets generally know how they work.
They can tell you very quickly what they really need from storage. Many will ask for advice on the process,
different ways of doing it, how others handle it, etc. They all complain about it, and most will
tell a story or two about some disaster in the past. The key thing is that having a discussion
with the actual stewards of the asset can very quickly and efficiently define
how storage management should operate.
Now IT can provide that level of service for the business. This is an example of the long-standing,
much-desired but seldom-realized objective of better aligning business and IT.
We will drill down into this new definition in much more detail in later postings, but
what are the ramifications of such a different view of unstructured data? The rest of this blog is going to focus
on how the information asset model impacts the storage industry, and the
ramifications are profound.
It will send shock waves through the storage industry. Just about every aspect of unstructured data
will be changed. Many of the core values
of some companies will be impacted greatly.
Even the economics of how storage management functions are funded,
valued, implemented, configured, and deployed will change, in many cases
drastically. Needless to say, this will
make storage people “uncomfortable”!
The biggest challenge the new definition presents concerns the very operating
system architecture and implementation of unstructured data. The core file interfaces down in the kernel
of today’s operating systems, the file system implementations, and file sharing
protocols like CIFS and NFS are all incorrect. The very assumptions these were built on so
many years ago were actually wrong!
That is a pretty
bold statement. It is also kind of
scary. If this is right, and it will
take a while to convince you it is right, what can we do about it? It would seem that the architecture so
ingrained into today’s computers could not possibly be uprooted and
replaced. Does this mean that a solution
is simply not possible?
The “file
infrastructure must change but the infrastructure can’t possibly change”
paradox has kept a lot of good people away from attacking the problem. Even Microsoft isn’t big enough (foolish
enough?) to rototill that much code.
What chance does anyone else have at solving this? Does this mean we are stuck with all the
problems and that’s just life? No.
All is not lost. We have shown that with careful
systems engineering, adding information asset support to today’s operating
systems (both Windows and Linux) can be done without ANY changes to the
infrastructure. No changes are needed to
applications, operating systems, file systems, file sharing protocols, drivers,
storage, or anything else for that matter.
The ramifications of
this new business view of unstructured data will be the focus of future posts…