Commit f368da46 authored by Patrick Schlindwein

[learn, #48] spike done

parent ef8a1413
Yamaha is reminding people that musical equipment cases are for musical equipment - not people -
two weeks after fugitive auto titan Carlos Ghosn reportedly was smuggled out of Japan in one. In a tweet over the
weekend, the Japanese musical equipment company said it was not naming any names, but noted there had been many
recent stories about people getting into musical equipment cases. Yamaha (YAMCY) warned people not to get into,
or let others get into, its cases to avoid "unfortunate accidents." Multiple media outlets have reported that Ghosn
managed to sneak through a Japanese airport to a private jet that whisked him out of the country by hiding in a
large, black music equipment case with breathing holes drilled in the bottom. CNN Business has not independently
confirmed those details of his escape. The former Nissan (NSANF) CEO had been out on bail awaiting trial in Japan on
charges of financial wrongdoing before making his stunning escape to Lebanon at the end of December. Ghosn has
referred to his departure as an effort to "escape injustice." In an interview with CNN's Richard Quest last week,
Ghosn did not comment on the nature of his escape, saying he didn't want to endanger any of the people who aided in
the operation. Ghosn did, however, respond to a question about what it felt like to ride through the airport in a
packing case by first declining to comment but then adding: "Freedom, no matter the way it happens, is always sweet."
In a press conference in Lebanon ahead of the CNN interview last Wednesday, Ghosn's first public appearance since
fleeing Japan, Ghosn said he decided to leave the country because he believed he would not receive a fair trial,
a claim Japanese authorities have disputed. Brands sometimes capitalize on their tangential relationship to big news
in order to attract attention on social media. Yamaha is one of Japan's best known brands and Ghosn was one of
Japan's top executives before being ousted from Nissan — a match made in social media heaven. Not surprisingly,
Yamaha's post went viral on Twitter over the weekend.
\ No newline at end of file
import spacy
from collections import Counter
from string import punctuation
# import de_core_news_sm  # German pipeline; enable together with the load() call below
import en_core_web_sm

# nlp = de_core_news_sm.load()
nlp = en_core_web_sm.load()


def top_sentence(text, limit):
    # Collect keywords: content words that are neither stop words nor punctuation.
    keyword = []
    pos_tag = ['PROPN', 'ADJ', 'NOUN', 'VERB']
    doc = nlp(text.lower())
    for token in doc:
        if token.text in nlp.Defaults.stop_words or token.text in punctuation:
            continue
        if token.pos_ in pos_tag:
            keyword.append(token.text)
    if not keyword:
        # Nothing to score (e.g. empty input), so there is no summary.
        return ''
    # Normalize keyword counts by the count of the most frequent keyword.
    freq_word = Counter(keyword)
    max_freq = freq_word.most_common(1)[0][1]
    for w in freq_word:
        freq_word[w] = freq_word[w] / max_freq
    # Score each sentence by summing the normalized frequencies of its keywords.
    sent_strength = {}
    for sent in doc.sents:
        for word in sent:
            if word.text in freq_word:
                sent_strength[sent] = sent_strength.get(sent, 0) + freq_word[word.text]
    # Keep the `limit` highest-scoring sentences, best first.
    sorted_x = sorted(sent_strength.items(), key=lambda kv: kv[1], reverse=True)
    summary = [str(sent).capitalize() for sent, _ in sorted_x[:limit]]
    return ' '.join(summary)


if __name__ == '__main__':
    with open('exampletext.txt', 'r') as f:
        source = f.read()
    print(top_sentence(source, 3))
## How it works
This spaCy-based approach doesn't use `Term Frequency-Inverse Document Frequency` (TF-IDF), unlike many other summarization tools.\
It identifies the top sentences by tokenizing the article and extracting important keywords (proper nouns, adjectives, nouns and verbs that are not stop words).\
Then it scores each sentence by summing the normalized frequencies of the keywords it contains.
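
To make the scoring easier to follow, here is a minimal sketch of the same idea without spaCy. It is an illustration under assumptions, not the script from this commit: the names `score_sentences`, `top_sentences` and `STOP_WORDS` are hypothetical, sentences are split with a simple regular expression, and there is no part-of-speech filter.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; spaCy ships a much larger one).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "not", "of", "the", "to"}


def score_sentences(text):
    """Return (sentence, score) pairs in their original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    keywords = [w for words in tokenized for w in words if w not in STOP_WORDS]
    if not keywords:
        return [(s, 0.0) for s in sentences]
    # Normalize each keyword count by the highest count.
    counts = Counter(keywords)
    max_count = max(counts.values())
    weights = {w: c / max_count for w, c in counts.items()}
    # A sentence's score is the sum of the weights of the keywords it contains.
    return [
        (s, sum(weights.get(w, 0.0) for w in words))
        for s, words in zip(sentences, tokenized)
    ]


def top_sentences(text, limit):
    """Return the `limit` highest-scoring sentences, best first."""
    ranked = sorted(score_sentences(text), key=lambda pair: pair[1], reverse=True)
    return [s for s, _ in ranked[:limit]]
```

Because longer sentences contain more keywords, this kind of scoring tends to favor them.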
## Requirements
1. Download the appropriate trained pipeline using `python -m spacy download <name>`. A selection can be found here: [spaCy trained models](https://spacy.io/models)
2. Import the desired trained pipeline with `import <name>` and initialize `nlp` with it: `nlp = <name>.load()`
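
The two steps above can be combined into a small helper that fails with a readable hint when the pipeline package is missing. This is only a sketch: `load_pipeline` is a hypothetical name, and it assumes the pipeline package exposes `load()` the same way `en_core_web_sm` does in the script.

```python
import importlib
import importlib.util


def load_pipeline(name):
    """Import an installed spaCy pipeline package by name and return its nlp object."""
    # find_spec returns None when no package with that name is installed.
    if importlib.util.find_spec(name) is None:
        raise RuntimeError(
            f"Pipeline '{name}' is not installed. "
            f"Run: python -m spacy download {name}"
        )
    # Same effect as `import <name>` followed by `<name>.load()`.
    return importlib.import_module(name).load()
```

With this, switching languages would mean changing a single string, for example `nlp = load_pipeline("de_core_news_sm")`.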
## Running it
Paste the article you'd like to summarize into the `exampletext.txt` file and run the Python file.\
Currently it's configured for English texts.
## Problems and limitations
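- The result does not retain the original capitalization: the text is lowercased before processing, and only the first letter of each selected sentence is capitalized afterwards
- Sentences are ordered by score rather than by their position in the text, so the result can seem incoherent
- The length of the result has to be specified manually

The code is not pretty and was copy-pasted. It just serves as a proof of concept and an example.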
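
Two of these limitations could probably be eased with a small selection step. The sketch below is an assumption, not part of this commit: `select_in_order` is a hypothetical helper that takes per-sentence scores in document order, derives the summary length from a ratio, and returns the chosen sentence indices in their original order.

```python
def select_in_order(scores, ratio=0.25):
    """Pick the best-scoring sentences and return their indices in document order.

    scores: per-sentence scores, listed in the order the sentences appear.
    ratio:  rough fraction of sentences to keep (at least one is kept).
    """
    if not scores:
        return []
    limit = max(1, round(len(scores) * ratio))
    # Rank sentence indices by score, best first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore the original order of the selected sentences.
    return sorted(ranked[:limit])
```

Keeping the original capitalization would additionally mean lowercasing only for the keyword lookup instead of lowercasing the whole text up front.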