pasterequipment.blogg.se - Target text extractor

Previously I have worked on projects where I only had to predict the sentiment of some given data, but this project adds something to that task: here I need to extract a phrase from a sentence based on a given sentiment.

1.1 : Now let's talk about the machine learning problem that I am solving :

This is the first thing we should ask ourselves. With all of the tweets circulating every second, it is hard to tell whether the sentiment behind a specific tweet will impact a company's, or a person's, brand for being viral (positive), or devastate profit because it strikes a negative tone. Capturing sentiment in language is important in these times where decisions and reactions are created and updated in seconds. But which words actually lead to the sentiment description? This case study is about capturing the sentiment or meaning behind a tweet: we need to pick out the part of the tweet (word or phrase) that reflects the sentiment.

Here is a tweet : “my boss is bullying me…”
Words defining that it is a negative tweet : “bullying me” (this is what we want to find in this case study)

1.2 : Now the question is which kind of ML problem it is (classification, regression or something else) ?

We should answer questions like: what kind of problem is it (classification or regression)? This is a little difficult to answer, but I would say it is a classification task of a different kind, because we have to classify which phrase of the text carries the sentiment.

1.3 : Where can we find the data :

We can easily get the data from Kaggle; here is the link :

1.4 : Before going ahead it is good to think about the objective and constraints :

The objective is clear and simple: given a text and its sentiment, we have to predict the selected text (which is a word or phrase of the text).

- No low-latency requirement (but latency should be reasonable).
- Interpretability is important (it is going to be difficult because we will be working with a deep learning model, but we can get it through post-analysis of our ML models).
- It should pick the right phrase from the text as often as possible.

1.5 : For the given task, what should be my KPI (Key Performance Indicator) :

As this was a competition on Kaggle, they have chosen Jaccard similarity as their KPI, and from my point of view it is the right metric to care about. Another thing we need to look at is how many sentences come in per minute.

Jaccard score is a measure of how similar or dissimilar two sets are. The higher the score, the more similar the two strings. The idea is to find the number of common tokens and divide it by the total number of unique tokens. In mathematical terms, for two token sets A and B, it is expressed as:

J(A, B) = |A ∩ B| / |A ∪ B|

The baseline model should be simple, so that if any error occurs we can detect it easily, and so that in the future we can compare it with other models. The baseline model is a simple bidirectional LSTM.
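
To make the Jaccard score described above concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not necessarily the exact scoring code used by the competition: tokens are taken to be lowercased, whitespace-separated words, and two empty strings are treated as identical. The function name `jaccard` and the example strings are chosen for illustration.

```python
def jaccard(str1: str, str2: str) -> float:
    """Word-level Jaccard similarity between two strings.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    a = set(str1.lower().split())
    b = set(str2.lower().split())
    if not a and not b:
        # Assumption: two empty strings count as a perfect match.
        return 1.0
    common = a & b  # tokens present in both strings
    union = a | b   # all unique tokens across both strings
    return len(common) / len(union)


# Hypothetical prediction compared against a hypothetical true phrase.
predicted = "bullying me"
selected = "is bullying me"
print(f"{jaccard(predicted, selected):.3f}")  # prints 0.667 (2 common tokens / 3 unique tokens)
```

Note that with plain whitespace splitting, punctuation stays attached to words, so "me" and "me…" count as different tokens. That is one reason the predicted phrase has to match the selected text closely to score well.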