According to industry estimates, only 21% of the available data exists in
structured form. Data is being generated as we speak, as we tweet, as we
send messages on WhatsApp, and through countless other activities. The majority
of this data exists as text, which is highly unstructured in nature.
A few notable examples include tweets and posts on social media, user-to-user
chat conversations, news, blogs and articles, product or service reviews, and
patient records in the healthcare sector. More recent sources include
chatbots and other voice-driven bots.
Although this data is high-dimensional, the information it contains is not
directly accessible unless it is processed (read and understood) manually or
analyzed by an automated system.
To extract meaningful, actionable insights from text data, it is
important to become familiar with the techniques and principles of Natural
Language Processing (NLP).