Unstructured data – This kind of digital data is also known as unformed data.
It is not stored in a database and does not conform to any data model, so it is difficult to determine the meaning of the data. It does not follow any predefined rules or schema.
The data can be of any type, e.g. audio, video, or text files. Examples include:
- Body of an e-mail
- Word document
- PowerPoint presentations
While these sorts of files may have an internal structure, they are still considered “unstructured” because the data they contain doesn’t fit neatly in a database.
Issues in analyzing unstructured data:
Unstructured data comes in different formats (images, text), so it is difficult to generalize and analyze.
Lack of standardization:
To store data in a relational database, the data must be similar in form (standardized). Images and free text cannot be standardized, which makes them difficult to store in relational tables.
The content of unstructured data is generally written in informal language (slang). When people communicate on Facebook, for example, they use many different kinds of slang, which makes the data difficult to store and standardize.
Because unstructured data takes the form of text, images, and so on, its true meaning is difficult to determine. Two people can interpret the same unstructured data in different ways, which leads to ambiguity.
Ways to manage unstructured data
- Indexing helps in searching and retrieval.
- Text can be indexed based on text strings, but for non-text files (e.g. audio or video), indexing depends on file names.
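The indexing idea above can be illustrated with a minimal sketch of an inverted index, which maps each word to the files that contain it. The file names, the `build_index` and `search` helpers, and the convention of passing `None` for files without text are all assumptions made for illustration, not part of any particular product.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(files):
    """Map each word to the set of file names it is associated with.

    `files` maps a file name to its text content, or to None for
    non-text files (audio, video), which are indexed by name only.
    """
    index = defaultdict(set)
    for name, text in files.items():
        source = text if text is not None else name
        for word in tokenize(source):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of files associated with every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data: two text files and one video file.
files = {
    "mail_001.txt": "Quarterly sales report attached",
    "notes.txt": "Meeting notes about the sales forecast",
    "sales_demo.mp4": None,  # no text content, so only the name is indexed
}
index = build_index(files)
print(sorted(search(index, "sales report")))
```

Note that the video file can only be found through words in its name, which is why non-text files are harder to search than text.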
CAS (content-addressable storage):
- It stores data based on its metadata.
- It assigns a unique name to every object stored in it.
- The object is retrieved based on its content and not its location.
- It is used to store e-mails etc.
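As a rough illustration of content-based retrieval, the sketch below derives each object's unique name from a SHA-256 hash of its content. The `ContentAddressableStore` class and the choice of SHA-256 are assumptions for this example; real CAS systems differ in how they compute addresses and handle metadata.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory store where an object's address is a hash of its content."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        """Store the data and return its content-derived address."""
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data
        return address

    def get(self, address: str) -> bytes:
        """Retrieve an object by its address; raises KeyError if absent."""
        return self._objects[address]


store = ContentAddressableStore()
mail = b"Subject: Meeting\n\nSee you at 10."
address = store.put(mail)

# Storing identical content again yields the same address,
# so duplicates occupy a single entry.
same_address = store.put(mail)
other_address = store.put(b"Subject: Lunch\n\nNoon?")
```

Because the address depends only on the content, the same e-mail stored twice is kept once, and any change to the content produces a different address.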
Characteristics of unstructured data
- It does not reside in RDBMS tables.
- It has no predefined format.
- Difficult to categorize.
- Not arranged in any order.