Hi, my name is EKO WAHYUDIHARTO
I'm just an ordinary man


Trustworthy | Competent | Harmonious | Loyal | Adaptive | Collaborative

PAST

PRESENT

FUTURE

TAKE A LOOK @ WHAT I RECORD AND WRITE

Something that might be useful for anyone who has a passion for computer science, quality control circles and project management

  • Posted on Sunday, September 19, 2021
    Definition of terms:
    Sastrawi is a simple PHP library that allows you to reduce inflected words in Indonesian (Bahasa Indonesia) to their basic form (stem). The code below uses its Python port.
    Cleansing is an activity to improve data systematically using certain algorithms.
    Stemming is the process of changing affixed words into root words.
    Tokenizing is the process of dividing text, which can be in the form of sentences, paragraphs or documents, into certain tokens/parts. Tokenization is often used in linguistics, and the tokenization results are useful for further text analysis.
    Stop-words are common words that usually appear in large numbers and are considered meaningless. Stop words are commonly filtered out in information retrieval tasks, including by Google.

    Source:

        # Install the stemmer first (in a notebook: !pip install Sastrawi)
        import re
        import string

        import nltk
        import requests
        from bs4 import BeautifulSoup
        from nltk.corpus import stopwords
        from Sastrawi.Stemmer.StemmerFactory import StemmerFactory

        # Fetch the page and parse it with an explicit parser
        web = requests.get('https://wartakota.tribunnews.com/').text
        soup = BeautifulSoup(web, 'html.parser')

        # Drop script and style elements, then keep only the visible text
        for s in soup(['script', 'style']):
            s.decompose()
        teks = ' '.join(soup.stripped_strings)
        print(teks)

        # Cleansing
        teks = teks.lower()
        teks = re.sub(r"\d+", "", teks)  # remove numbers
        teks = teks.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
        teks = teks.strip()  # remove leading/trailing whitespace

        # Stemming
        factory = StemmerFactory()
        stemmer = factory.create_stemmer()
        output = stemmer.stem(teks)
        print(output)

        # Tokenizing
        tokens = output.split()
        print(tokens)

        # Stop-word removal (download only the stop-word corpus)
        nltk.download('stopwords')
        stop_words = set(stopwords.words('indonesian'))
        clean_tokens = [token for token in tokens if token not in stop_words]

        # Word frequencies: print them all, then plot the 30 most common
        freq = nltk.FreqDist(clean_tokens)
        for key, val in freq.items():
            print(str(key) + ':' + str(val))
        freq.plot(30)

    And I wrapped them all into a single video below:
    Please support this blog or my video channel with subscribe ...
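
    For readers who want to try the cleansing, tokenizing and stop-word steps from the post without a network connection or extra installs, here is a minimal sketch using only the Python standard library. It is not the post's own code: the function names and the tiny SAMPLE_STOP_WORDS set are hypothetical and for illustration only (the post uses nltk's Indonesian stop-word list), stemming with Sastrawi is skipped, and collections.Counter stands in for nltk.FreqDist.

```python
import re
import string
from collections import Counter

# Hypothetical mini stop-word list, for illustration only; the post uses
# nltk's stopwords.words('indonesian') instead.
SAMPLE_STOP_WORDS = {"yang", "di", "dan", "ke", "dari", "itu"}


def clean_text(text):
    """Cleansing: lowercase, drop digits and ASCII punctuation, trim whitespace."""
    text = text.lower()
    text = re.sub(r"\d+", "", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.strip()


def tokenize(text):
    """Tokenizing: split the cleaned text on whitespace."""
    return text.split()


def remove_stop_words(tokens, stop_words):
    """Keep only the tokens that are not in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def word_frequencies(text, stop_words=SAMPLE_STOP_WORDS):
    """Run cleansing, tokenizing and stop-word removal, then count the words."""
    tokens = remove_stop_words(tokenize(clean_text(text)), stop_words)
    return Counter(tokens)


if __name__ == "__main__":
    sample = "Banjir di Jakarta: 3 jalan ditutup, dan banjir itu meluas ke Bekasi."
    for word, count in word_frequencies(sample).most_common():
        print(f"{word}:{count}")
```

    Using a set for the stop words keeps each membership check cheap, which matters once the text is a whole web page rather than one sentence.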

I LIVE IN JAKARTA

Jakarta is the political, economic and cultural capital of Indonesia. ... Around 230 languages are spoken here and you'll find a wealth of different cultures and communities throughout the capital. ... Jakarta's world-class tourist attractions are renowned across the globe.