Semi-Structured Data
Semi-structured data does not follow a rigid data model such as the relational model, so it does not fit naturally into fixed rows and columns. Instead, it carries tags or other markers that group the data and describe how it is organized.
In semi-structured data, similar entities are grouped and organized in a hierarchy. The attributes within the group may or may not be the same.
For example, two addresses may or may not have the same number of attributes:
<house number><street number><area name><city>
<house number><street number><city>
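As a rough illustration, the two addresses above could be written as XML and read with Python's standard xml.etree.ElementTree module. The element names (addresses, address, house_number and so on) and the sample values are made up for this sketch; the point is only that each record keeps whichever fields it happens to have.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: two address records with different sets of attributes.
XML_TEXT = """
<addresses>
  <address>
    <house_number>12</house_number>
    <street_number>4</street_number>
    <area_name>Green Park</area_name>
    <city>Delhi</city>
  </address>
  <address>
    <house_number>7</house_number>
    <street_number>9</street_number>
    <city>Pune</city>
  </address>
</addresses>
"""


def read_addresses(xml_text):
    """Return each <address> as a dict of tag -> text, keeping only the fields present."""
    root = ET.fromstring(xml_text.strip())
    return [
        {child.tag: child.text for child in address}
        for address in root.findall("address")
    ]


addresses = read_addresses(XML_TEXT)
for record in addresses:
    # The second record simply has no "area_name" key.
    print(sorted(record))
```

Because each record becomes its own dictionary, no column has to be declared in advance, which is what a table with fixed columns would require.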
Example 2: Web pages.
Sources of data
- XML documents
- TCP/IP packets
- Zipped files
- Binary executables
- Integration of data from heterogeneous sources
Characteristics of data
The pattern of semi-structured data is irregular. For example, a date may appear as 7/10/2017 in one place and as 2017/oct/7 in another. This irregularity makes the data difficult to store and mine.
This data does not have any fixed schema and that’s why it is also called schema-less data.
Most of the data comes from various Internet sources, so it is updated constantly. Semi-structured data is therefore fast-changing data.
This data is heterogeneous because most of it comes from various Internet sources.
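The irregular date formats mentioned above give a sense of the clean-up work involved. Below is a minimal sketch that tries a list of candidate formats in turn, using datetime.strptime from the standard library. The list of formats is an assumption for this example, and it reads 7/10/2017 as day/month/year because the second form puts the day last; real data may follow a different convention. Matching the month name "oct" also assumes an English (or default C) locale.

```python
from datetime import date, datetime

# Candidate formats are an assumption for this sketch; real data may need more.
CANDIDATE_FORMATS = ["%d/%m/%Y", "%Y/%b/%d"]


def parse_date(text):
    """Try each known format in turn; return a date, or None if none match."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


first = parse_date("7/10/2017")
second = parse_date("2017/oct/7")
print(first, second)
```

Returning None for unrecognized input, rather than raising an error, lets a caller set aside records that need manual review instead of halting on the first odd value.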
Storage of semi-structured data
This data is highly irregular and schema-less, which makes storing it a key issue. Two models help with the storage of semi-structured data:
- OEM (Object Exchange Model)
- DOM (Document Object Model)
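To make the two models a little more concrete, here is a small sketch. The DOM part uses Python's standard xml.dom.minidom parser. The OEM part is only an illustration: OemObject is a hypothetical class written for this example, loosely following the usual description of OEM, in which every object has an identifier and a label and its value is either atomic or a collection of other objects. The sample address is invented.

```python
from dataclasses import dataclass, field
from typing import Union
from xml.dom import minidom

# DOM: the parser turns the document into a tree of nodes that can be walked.
doc = minidom.parseString(
    "<address><house_number>7</house_number><city>Pune</city></address>"
)
dom_fields = {
    node.tagName: node.firstChild.data
    for node in doc.documentElement.childNodes
    if node.nodeType == node.ELEMENT_NODE
}


# OEM-style sketch (hypothetical class, not a library API): each object has an
# identifier and a label; its value is atomic or a list of child objects.
@dataclass
class OemObject:
    oid: int
    label: str
    value: Union[str, int, list] = field(default_factory=list)

    def is_atomic(self):
        return not isinstance(self.value, list)


oem_address = OemObject(1, "address", [
    OemObject(2, "house_number", 7),
    OemObject(3, "city", "Pune"),
])
oem_labels = [child.label for child in oem_address.value]
print(dom_fields, oem_labels)
```

In both cases the structure travels with the data itself: the tags or labels say what each value is, so no separate schema has to be defined first.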