Commit 097c7896 authored by Patrick Schlindwein's avatar Patrick Schlindwein

Merge branch 'Learn/#50' into 'master'

[learn,#50] done
parents 054eb744 0641a9cc
import math
import re
import urllib.request

import bs4 as bs
import nltk
from nltk.stem import WordNetLemmatizer
import spacy

# nltk.download('wordnet')  # uncomment on the first run to fetch the lemmatizer data
nlp = spacy.load('en_core_web_sm')
lemmatizer = WordNetLemmatizer()
def wiki_text(url):
    """Download a Wikipedia article and return its plain paragraph text."""
    scrap_data = urllib.request.urlopen(url)
    article = scrap_data.read()
    parsed_article = bs.BeautifulSoup(article, 'lxml')
    paragraphs = parsed_article.find_all('p')
    article_text = ""
    for p in paragraphs:
        article_text += p.text
    # strip citation markers such as [12]
    article_text = re.sub(r'\[[0-9]*\]', '', article_text)
    return article_text
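As a quick sanity check, the citation-stripping regex above can be exercised on a made-up string, with no network access needed (the sample sentence is invented for illustration):

```python
import re

# Illustrative input: Wikipedia-style bracketed citation markers.
sample = "AI was founded in 1956.[3] Early work was optimistic.[12]"
cleaned = re.sub(r'\[[0-9]*\]', '', sample)
print(cleaned)  # AI was founded in 1956. Early work was optimistic.
```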
wiki_url = input("Enter Wikipedia URL to load Article : ")
text = wiki_text(wiki_url)
def frequency_matrix(sentences):
    """Build a word-frequency table per sentence, keyed by the sentence's first 15 tokens."""
    freq_matrix = {}
    stopWords = nlp.Defaults.stop_words
    for sent in sentences:
        freq_table = {}
        words = [word.text.lower() for word in sent if word.text.isalnum()]
        for word in words:
            word = lemmatizer.lemmatize(word)
            if word not in stopWords:
                if word in freq_table:
                    freq_table[word] += 1
                else:
                    freq_table[word] = 1
        freq_matrix[sent[:15]] = freq_table
    return freq_matrix
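The counting step can be illustrated without spaCy or NLTK; the stop-word set and token list below are invented stand-ins for `nlp.Defaults.stop_words` and a lemmatized sentence:

```python
# Stand-alone sketch of the frequency-table step; the stop-word set and
# tokens are illustrative stand-ins, not spaCy's real data.
stop_words = {"the", "is", "of"}
tokens = ["the", "study", "of", "intelligence", "is", "the", "study"]

freq_table = {}
for word in tokens:
    if word not in stop_words:
        freq_table[word] = freq_table.get(word, 0) + 1

print(freq_table)  # {'study': 2, 'intelligence': 1}
```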
def tf_matrix(freq_matrix):
    """Term frequency per sentence."""
    tf_matrix = {}
    for sent, freq_table in freq_matrix.items():
        tf_table = {}  # dictionary with 'word' itself as a key and its TF as value
        # note: len(freq_table) is the number of *distinct* words, not the token count
        total_words_in_sentence = len(freq_table)
        for word, count in freq_table.items():
            tf_table[word] = count / total_words_in_sentence
        tf_matrix[sent] = tf_table
    return tf_matrix
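A tiny worked example of the TF step, with an invented frequency table; note the denominator is the number of distinct words in the sentence, so a repeated word can reach a TF of 1.0:

```python
# Hypothetical per-sentence frequency table.
freq_table = {"ai": 2, "learn": 1}
total = len(freq_table)  # 2 distinct words
tf_table = {word: count / total for word, count in freq_table.items()}
print(tf_table)  # {'ai': 1.0, 'learn': 0.5}
```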
def sentences_per_words(freq_matrix):
    """Count, for each word, how many sentences it appears in (document frequency)."""
    sent_per_words = {}
    for sent, f_table in freq_matrix.items():
        for word, count in f_table.items():
            if word in sent_per_words:
                sent_per_words[word] += 1
            else:
                sent_per_words[word] = 1
    return sent_per_words
def idf_matrix(freq_matrix, sent_per_words, total_sentences):
    """Inverse document frequency for each word, per sentence."""
    idf_matrix = {}
    for sent, f_table in freq_matrix.items():
        idf_table = {}
        for word in f_table.keys():
            idf_table[word] = math.log10(total_sentences / float(sent_per_words[word]))
        idf_matrix[sent] = idf_table
    return idf_matrix
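A worked IDF example with invented counts shows why the log ratio rewards rare words: a word found in 2 of 200 sentences scores far higher than one found in 100 of them.

```python
import math

total_sentences = 200
rare = math.log10(total_sentences / 2)      # word in 2 of 200 sentences
common = math.log10(total_sentences / 100)  # word in 100 of 200 sentences
print(round(rare, 4), round(common, 4))  # 2.0 0.301
```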
def tf_idf_matrix(tf_matrix, idf_matrix):
    """Multiply TF and IDF entry-wise. The zip is safe because both dicts
    were built from the same freq_matrix and so share insertion order."""
    tf_idf_matrix = {}
    for (sent1, f_table1), (sent2, f_table2) in zip(tf_matrix.items(), idf_matrix.items()):
        tf_idf_table = {}
        for (word1, tf_value), (word2, idf_value) in zip(f_table1.items(), f_table2.items()):
            tf_idf_table[word1] = float(tf_value * idf_value)
        tf_idf_matrix[sent1] = tf_idf_table
    return tf_idf_matrix
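A miniature version of the entry-wise multiplication, with made-up TF and IDF values for a single sentence key `"s1"`:

```python
# Hypothetical one-sentence TF and IDF matrices.
tf = {"s1": {"ai": 1.0, "learn": 0.5}}
idf = {"s1": {"ai": 0.3, "learn": 2.0}}

tf_idf = {}
for (sent, tf_table), (_, idf_table) in zip(tf.items(), idf.items()):
    tf_idf[sent] = {w: tf_table[w] * idf_table[w] for w in tf_table}

print(tf_idf)  # {'s1': {'ai': 0.3, 'learn': 1.0}}
```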
def score_sentences(tf_idf_matrix):
    """Score each sentence by the mean TF-IDF of its words."""
    sentenceScore = {}
    for sent, f_table in tf_idf_matrix.items():
        total_tfidf_score_per_sentence = 0
        total_words_in_sentence = len(f_table)
        for word, tf_idf_score in f_table.items():
            total_tfidf_score_per_sentence += tf_idf_score
        if total_words_in_sentence != 0:
            sentenceScore[sent] = total_tfidf_score_per_sentence / total_words_in_sentence
    return sentenceScore
def average_score(sentence_score):
    """Mean sentence score, used as the base for the summarization threshold."""
    total_score = 0
    for sent in sentence_score:
        total_score += sentence_score[sent]
    average_sent_score = total_score / len(sentence_score)
    return average_sent_score
def create_summary(sentences, sentence_score, threshold):
    """Keep every sentence whose score is at least the threshold."""
    summary = ''
    for sentence in sentences:
        if sentence[:15] in sentence_score and sentence_score[sentence[:15]] >= threshold:
            summary += " " + sentence.text
    return summary
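The selection step can be sketched with invented scores: the threshold is 1.3 times the mean score, and only sentences at or above it survive.

```python
# Hypothetical sentence scores keyed by placeholder sentence ids.
scores = {"s1": 0.40, "s2": 0.10, "s3": 0.25}
threshold = 1.3 * (sum(scores.values()) / len(scores))  # 1.3 * 0.25 = 0.325
summary = [s for s, score in scores.items() if score >= threshold]
print(summary)  # ['s1']
```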
original_words = text.split()
original_words = [w for w in original_words if w.isalnum()]
num_words_in_original_text = len(original_words)
text = nlp(text)
sentences = list(text.sents)
total_sentences = len(sentences)
# bind results to new names so they do not shadow the functions above
freq_mat = frequency_matrix(sentences)
tf_mat = tf_matrix(freq_mat)
num_sent_per_words = sentences_per_words(freq_mat)
idf_mat = idf_matrix(freq_mat, num_sent_per_words, total_sentences)
tf_idf_mat = tf_idf_matrix(tf_mat, idf_mat)
sentence_scores = score_sentences(tf_idf_mat)
threshold = average_score(sentence_scores)
summary = create_summary(sentences, sentence_scores, 1.3 * threshold)
print("\n****summary****\n")
print(summary)
print("\n")
print("Total words in original article = ", num_words_in_original_text)
print("Total words in summarized article = ", len(summary.split()))
"""
input : https://en.wikipedia.org/wiki/Artificial_intelligence
output:
****summary****
The distinction between the former and the latter categories is often revealed by the acronym chosen. ' A quip in Tesler's Theorem says "AI is whatever hasn't been done yet. AGI is among the field's long-term goals. These issues have been explored by myth, fiction and philosophy since antiquity. Some people also consider AI to be a danger to humanity if it progresses unabated. Marvin Minsky agreed, writing, "within a generation ... the problem of creating 'artificial intelligence' will substantially be solved". They failed to recognize the difficulty of some of the remaining tasks. By 1985, the market for AI had reached over a billion dollars. No. 1 ranking for two years. and it's time to move on. Other programs handle imperfect-information games; such as for poker at a superhuman level, Pluribus (poker bot) and Cepheus (poker bot). See: General game playing. In a 2017 survey, one in five companies reported they had "incorporated AI in some offerings or processes". Goals can be explicitly defined or induced. AI often revolves around the use of algorithms. An algorithm is a set of unambiguous instructions that a mechanical computer can execute.[b] A complex algorithm is often built on top of other, simpler, algorithms. These learners could therefore derive all possible knowledge, by considering every possible hypothesis and matching them against the data. These inferences can be obvious, such as "since the sun rose every morning for the last 10,000 days, it will probably rise tomorrow morning as well". Besides classic overfitting, learners can also disappoint by "learning the wrong lesson". Modifying these patterns on a legitimate image can result in "adversarial" images that the system misclassifies.[c]
This gives rise to two classes of models: structuralist and functionalist. The functional model refers to the correlating data to its computed counterpart. The traits described below have received the most attention.
They solve most of their problems using fast, intuitive judgments. However, if the agent is not the only actor, then it requires that the agent can reason under uncertainty. This calls for an agent that can not only assess its environment and make predictions but also evaluate its predictions and adapt based on its assessment. Multi-agent planning uses the cooperation and competition of many agents to achieve a given goal. Emergent behavior such as this is used by evolutionary algorithms and swarm intelligence. Regression is the attempt to produce a function that describes the relationship between inputs and outputs and predicts how the outputs should change as the inputs change. In reinforcement learning the agent is rewarded for good responses and punished for bad ones. Applications include speech recognition, facial recognition, and object recognition. Computer vision is the ability to analyze visual input. AI is heavily used in robotics. Motion planning is the process of breaking down a movement task into "primitives" such as individual joint movements. Moravec's paradox can be extended to many forms of social intelligence. Many advances have general, cross-domain significance. No established unifying theory or paradigm guides AI research. Or is human biology as irrelevant to AI research as bird biology is to aeronautical engineering? This includes embodied, situated, behavior-based, and nouvelle AI. Nowadays results of experiments are often rigorously measurable, and are sometimes (with difficulty) reproducible. AI is relevant to any intellectual task. Modern artificial intelligence techniques are pervasive and are too numerous to list here. AI can also produce Deepfakes, a content-altering technology. As such, there is a need for policy making to devise policies for and regulate artificial intelligence and robotics. In all cases, only human beings have engaged in ethical reasoning. The time has come for adding an ethical dimension to at least some machines. 
Machine ethics is sometimes referred to as machine morality, computational ethics or computational morality. He argues that "any sufficiently advanced benevolence may be indistinguishable from malevolence. Some question whether this kind of check could actually remain in place.
Lethal autonomous weapons are of concern. The hard problem is explaining how this feels or why it should feel like anything at all. Human information processing is easy to explain, however human subjective experience is difficult to explain. The hard problem is that people also know something else—they also know what red looks like. (Consider that a person born blind can know that something is red without knowing what red looks like.)[k] If a machine can be created that has intelligence, could it also feel? If it can feel, does it have the same rights as a human? Are there limits to how intelligent machines—or human-machine hybrids—can be? Superintelligence may also refer to the form or degree of intelligence possessed by such an agent. The improved software would be even better at improving itself, leading to recursive self-improvement. The new intelligence could thus increase exponentially and dramatically surpass humans. Science fiction writer Vernor Vinge named this scenario "singularity". The long-term economic effects of AI are uncertain. The relationship between automation and employment is complicated. And, of course, other risks come from things like job losses. Algorithms already have numerous applications in legal systems. Humans, who are limited by slow biological evolution, couldn't compete and would be superseded. The goal of the institute is to "grow wisdom with which we manage" the growing power of technology. I think there is potentially a dangerous outcome there." This includes such works as Arthur C. Clarke's and Stanley Kubrick's 2001: Logic machines in fiction and List of fictional computers
Total words in original article = 7903
Total words in summarized article = 902
"""