Merge branch 'kd_dataset_description' into 'main'

Described missing datasets and fixed wrong number in exercise_deep_learning.

See merge request !131

Commit ce08461e authored 3 years ago by Daniel Müller
--- a/notebooks/data/Auto_mpg/README.md
+++ b/notebooks/data/Auto_mpg/README.md
+# Context
+
+Fuel economy, in miles per gallon (mpg), of various cars, together with their technical specifications.   
+Origin: This dataset was taken from the StatLib library, which is
+maintained at Carnegie Mellon University. The dataset was
+used in the 1983 American Statistical Association Exposition.
+(c) Date: July 7, 1993
+
+# Content
+
+The dataset has 398 entries and 9 attributes.
+This file contains the basic information about each car (mpg, cylinders, displacement, horsepower, weight, acceleration, model year, origin, car name). Be careful: the 'horsepower' column contains 6 invalid values.
+
+# Example Entries
+
+|mpg|cylinders|displacement|horsepower|weight|acceleration|model year|origin|car name|
+|----|----|----|----|----|----|----|----|----|
+|18|8|307|130|3504|12|70|1|chevrolet chevelle malibu|
+|15|8|350|165|3693|11.5|70|1|buick skylark 320|
+|18|8|318|150|3436|11|70|1|plymouth satellite|
+|16|8|304|150|3433|12|70|1|amc rebel sst|
+|17|8|302|140|3449|10.5|70|1|ford torino|
+
+
+# Credit
+
+Dataset acquired from [Kaggle](https://www.kaggle.com/uciml/autompg-dataset)
+
--- a/notebooks/data/Auto_mpg/ReadMe.md
+++ b/notebooks/data/Auto_mpg/ReadMe.md
-https://www.kaggle.com/shubhampundir/autompg-dataset
\ No newline at end of file
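The Auto_mpg README warns about invalid entries in the 'horsepower' column. A minimal pandas sketch for dropping them is shown here. It assumes the invalid entries are non-numeric placeholders such as `?`; the function name `drop_invalid_horsepower` is hypothetical, and the inline sample reuses two rows from the README's example table plus one made-up row.

```python
import io

import pandas as pd


def drop_invalid_horsepower(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without the rows whose 'horsepower' is not numeric."""
    cleaned = df.copy()
    # errors="coerce" turns any unparsable entry (e.g. "?") into NaN
    cleaned["horsepower"] = pd.to_numeric(cleaned["horsepower"], errors="coerce")
    # drop the rows that became NaN and renumber the index
    return cleaned.dropna(subset=["horsepower"]).reset_index(drop=True)


# Two rows from the example table plus one made-up row with "?" as horsepower
sample_csv = io.StringIO(
    "mpg,cylinders,displacement,horsepower,weight,acceleration,model year,origin,car name\n"
    "18,8,307,130,3504,12,70,1,chevrolet chevelle malibu\n"
    "15,8,350,165,3693,11.5,70,1,buick skylark 320\n"
    "20,4,100,?,2500,15,75,1,hypothetical car\n"
)
cars = drop_invalid_horsepower(pd.read_csv(sample_csv))
print(len(cars))                 # 2
print(cars["horsepower"].dtype)  # float64
```

Coercing instead of filtering on one specific marker also catches any other non-numeric placeholder. With the real data you would pass the DataFrame read from the Kaggle CSV instead of the inline sample.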
--- a/notebooks/data/Letters/README.md
+++ b/notebooks/data/Letters/README.md
+# Context
+
+This dataset contains example images (28x28 pixels) of handwritten letters.
+
+# Content
+
+The dataset has 14999 training and 3999 test images, each with a label.
+
+# Credit
+## Test Data
+Dataset acquired from [Kaggle](https://www.kaggle.com/crawford/emnist?select=emnist-letters-test.csv)
+
+## Train Data
+Dataset acquired from [Kaggle](https://www.kaggle.com/crawford/emnist/version/3?select=emnist-letters-train.csv)
--- a/notebooks/data/Letters/ReadMe.md
+++ b/notebooks/data/Letters/ReadMe.md
-## Test Data
-https://www.kaggle.com/crawford/emnist?select=emnist-letters-test.csv
-
-## Train Data
-https://www.kaggle.com/crawford/emnist/version/3?select=emnist-letters-train.csv
\ No newline at end of file
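How a single row of the EMNIST letters CSVs maps to an image can be sketched as follows. It assumes the Kaggle layout: the label comes first (1 to 26 for the letters a to z), followed by 28 x 28 = 784 pixel values stored transposed, so the image needs a `.T` to appear upright. `row_to_image` is a hypothetical helper and the row is synthetic.

```python
import numpy as np

IMAGE_SIDE = 28  # EMNIST images are 28x28 pixels


def row_to_image(row):
    """Split one CSV row into (label, 28x28 image).

    Assumes the first value is the label and the remaining 784 values are
    the pixels, stored transposed as in the Kaggle EMNIST CSVs.
    """
    values = np.asarray(row)
    label = int(values[0])
    pixels = values[1:].reshape(IMAGE_SIDE, IMAGE_SIDE)
    # transpose so the letter is displayed upright instead of mirrored/rotated
    return label, pixels.T


# A synthetic row: label 1 followed by the pixel values 0..783
label, image = row_to_image(np.concatenate(([1], np.arange(784))))
print(label, image.shape)  # 1 (28, 28)
```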
--- a/notebooks/exercises/exercise_deep_learning.ipynb
+++ b/notebooks/exercises/exercise_deep_learning.ipynb
@@ -41,7 +41,7 @@
   "metadata": {},
   "source": [
    "## Load Test and Training Data\n",
-    "We reduced the training dataset to **15000** and the test dataset to **4000** entries. Otherwise the nodebook will fail because of RAM issues (especially for the **Raspberry PI 3**)   \n",
+    "We reduced the training dataset to **14999** and the test dataset to **3999** entries. Otherwise, the notebook would run out of RAM (especially on the **Raspberry Pi 3**).   \n",
    "Unfortunately, this reduces the accuracy of the model."
   ]
  },

 %% Cell type:markdown id:97b380db-e5c4-47fc-9239-723ef2b96c89 tags:

 # Deep Learning
 Your task is to build a deep neural network from `Dense` layers that classifies the letters of the EMNIST dataset.

 %% Cell type:markdown id:22cfd362-a197-4d6b-97b4-7ebcd09eb091 tags:

 ## Imports

 %% Cell type:code id:8604e125-06c3-4710-b812-029499e87d21 tags:

 ``` python
 import numpy as np
 import pandas as pd
 import matplotlib.pyplot as plt
 import tensorflow as tf
 from tensorflow.keras.models import Sequential
 from tensorflow.keras.layers import Dense, Lambda
 from tensorflow.keras.utils import to_categorical
 from mpl_toolkits.axes_grid1 import ImageGrid
 ```

 %% Cell type:markdown id:ddce98e9-e324-4a64-93ac-d76285673b27 tags:

 ## Load Test and Training Data
-We reduced the training dataset to **15000** and the test dataset to **4000** entries. Otherwise the nodebook will fail because of RAM issues (especially for the **Raspberry PI 3**)
+We reduced the training dataset to **14999** and the test dataset to **3999** entries. Otherwise, the notebook would run out of RAM (especially on the **Raspberry Pi 3**).
 Unfortunately, this reduces the accuracy of the model.

 %% Cell type:code id:2f607d91-ca98-4204-9d09-d02504c327b4 tags:

 ``` python
 # load the training and test data
 train = pd.read_csv('../data/Letters/emnist-train.csv.gz')
 test = pd.read_csv('../data/Letters/emnist-test.csv.gz')
 ```

 %% Cell type:markdown id:1505a3f0-3e03-4e2e-8bfe-d4df14990378 tags:

 ## Split into images and labels

 %% Cell type:code id:2c08d87c-fed0-4036-9460-daca9602a888 tags:

 ``` python
 # code here
 ```
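One possible approach for this step is shown below. It assumes the label is the first column and the 784 pixel columns follow it. `split_images_labels` is a hypothetical helper; in the notebook you would call it on `train` and on `test`.

```python
import numpy as np
import pandas as pd


def split_images_labels(df: pd.DataFrame):
    """Return (images, labels) as NumPy arrays from an EMNIST-style DataFrame."""
    labels = df.iloc[:, 0].to_numpy()   # first column: the letter label
    images = df.iloc[:, 1:].to_numpy()  # remaining 784 columns: the pixels
    return images, labels


# Small synthetic frame with 3 rows: label followed by 784 zero pixels each
demo = pd.DataFrame(np.hstack([np.array([[1], [2], [26]]), np.zeros((3, 784), dtype=int)]))
images, labels = split_images_labels(demo)
print(images.shape, labels.shape)  # (3, 784) (3,)
```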

 %% Cell type:markdown id:819d75a2-8eca-4d3b-8792-e075dccac862 tags:

 ## Show the first 9 images

 %% Cell type:code id:931516c6-bc6c-4245-8dfe-4a0e52a14c9e tags:

 ``` python
 # code here
 ```
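A sketch for displaying the first nine images with the `ImageGrid` class imported above. It assumes flattened 784-pixel rows, EMNIST's transposed storage (hence `.T`) and labels 1 to 26 for a to z; `show_first_nine` is a hypothetical helper and the demo images are blank.

```python
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.axes_grid1 import ImageGrid


def show_first_nine(images, labels):
    """Plot the first 9 flattened 28x28 images in a 3x3 grid and return the figure."""
    fig = plt.figure(figsize=(6, 6))
    grid = ImageGrid(fig, 111, nrows_ncols=(3, 3), axes_pad=0.3)
    for ax, pixels, label in zip(grid, images[:9], labels[:9]):
        # assumes EMNIST's transposed storage, hence the .T
        ax.imshow(pixels.reshape(28, 28).T, cmap="gray")
        # map label 1..26 to 'a'..'z' (assumes the EMNIST letters labelling)
        ax.set_title(chr(ord("a") + int(label) - 1))
        ax.axis("off")
    return fig


# Blank placeholder images with labels 1..9
fig = show_first_nine(np.zeros((9, 784)), np.arange(1, 10))
```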

 %% Cell type:markdown id:04a3c6bb-68b6-468c-8643-0c1d92b4b362 tags:

 ## Flatten each image into a single vector

 %% Cell type:code id:abb287f2-0f5c-4717-8986-af92a1db0e98 tags:

 ``` python
 # code here
 ```
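`Dense` layers expect each sample as one flat vector. If the images were reshaped to 28 x 28 (for example for plotting), they can be flattened back as sketched here; `flatten_images` is a hypothetical helper. Pixel columns taken straight from the CSV should already be one vector per image.

```python
import numpy as np


def flatten_images(images):
    """Reshape an array of shape (n, 28, 28) into (n, 784), one row vector per image."""
    images = np.asarray(images)
    # -1 lets NumPy infer the vector length (28 * 28 = 784)
    return images.reshape(images.shape[0], -1)


batch = np.zeros((5, 28, 28))
print(flatten_images(batch).shape)  # (5, 784)
```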

 %% Cell type:markdown id:f7673eb1-a8bd-41e0-9e21-c1a72d2cf971 tags:

 ## Function to normalize pixel values

 %% Cell type:code id:a6fbc91d-9127-4c61-8f59-0c3ced262d5b tags:

 ``` python
 def preprocess_image(image): # input for this method needs to be an image
    return image / np.float32(255.0) # divide each pixel by 255 and return the new image
 ```

 %% Cell type:markdown id:788d6b65-341a-4fd0-8778-49554e12a7f6 tags:

 ## How many classes do we have?

 %% Cell type:code id:76d7a70f-35c0-4ccb-a834-52371b23fb7d tags:

 ``` python
 # code here
 ```
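Counting the distinct labels is one way to answer this question. One detail is worth checking, assuming the EMNIST letters labels run from 1 to 26: when `num_classes` is not given, `to_categorical` sizes its output as `max(label) + 1`, so unshifted labels would produce 27 columns with column 0 never used. Subtracting 1 from the labels avoids that. `count_classes` is a hypothetical helper and the labels below are made up.

```python
import numpy as np


def count_classes(labels):
    """Return the sorted distinct labels and how many there are."""
    classes = np.unique(labels)  # np.unique returns the values sorted
    return classes, len(classes)


# Hypothetical labels in the EMNIST letters range 1..26
labels = np.array([1, 5, 26, 5, 1, 13])
classes, n = count_classes(labels)
print(n)  # 4
```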

 %% Cell type:markdown id:e9ddaa85-9695-4368-9d2f-30526d195f93 tags:

 ## One-hot encode the labels

 %% Cell type:code id:8531ff4c-4039-4bf5-ae48-752ea954a1b1 tags:

 ``` python
 # code here
 # use to_categorical()
 ```
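To make clear what `to_categorical()` produces, here is a NumPy sketch of the same one-hot encoding. It is a stand-in for illustration, not the Keras implementation; `one_hot` is hypothetical. In the notebook, `to_categorical(labels - 1, num_classes=26)` would give the equivalent matrix, assuming letters labelled 1 to 26.

```python
import numpy as np


def one_hot(labels, num_classes=None):
    """One-hot encode integer labels (a NumPy sketch of what to_categorical does)."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = labels.max() + 1  # same default rule as Keras' to_categorical
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0  # exactly one 1 per row
    return encoded


# Shifting EMNIST letter labels 1..26 down to 0..25 gives exactly 26 columns
letters = np.array([1, 2, 26])
encoded = one_hot(letters - 1, num_classes=26)
print(encoded.shape)  # (3, 26)
```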

 %% Cell type:markdown id:c96e71f0-ed34-47ad-a93e-0e1a18cfb567 tags:

 ## Model for the neural network

 %% Cell type:code id:41142012-e981-4444-a77b-583724b6d254 tags:

 ``` python
 # model_One = Sequential() # A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.

 # model_One.add(Lambda(preprocess_image)) # in this first Layer we normalize the image

 # ...
 ```
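A sketch of how the commented template could be completed. The layer sizes (256 and 128 units), the ReLU activations and the Adam optimizer are assumptions, not values from the exercise, and 26 outputs assumes the labels were shifted to 0 to 25. `preprocess_image` is repeated so the block runs on its own, and an explicit `Input` lets `summary()` work before training.

```python
import numpy as np
from tensorflow.keras.layers import Dense, Input, Lambda
from tensorflow.keras.models import Sequential


def preprocess_image(image):
    # same normalisation as in the notebook: scale pixels from 0..255 to 0..1
    return image / np.float32(255.0)


def build_model(num_pixels=784, num_classes=26, hidden_units=(256, 128)):
    """Stack of Dense layers for flattened 28x28 letters.

    hidden_units, the ReLU activations and the Adam optimizer are assumptions,
    not values taken from the exercise.
    """
    model = Sequential()
    model.add(Input(shape=(num_pixels,)))
    model.add(Lambda(preprocess_image))  # normalise inside the model
    for units in hidden_units:
        model.add(Dense(units, activation="relu"))
    # softmax turns the outputs into one probability per class
    model.add(Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


model_One = build_model()
model_One.summary()
```

Training would then look like `history = model_One.fit(x_train, y_train, epochs=10, validation_split=0.1)`, where `x_train` and `y_train` are hypothetical names for the pixel matrix and the one-hot labels, and the epoch count is arbitrary.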

 %% Cell type:markdown id:d5ac1df1-5259-458e-beaa-796a6db973da tags:

 ## Plot Data

 %% Cell type:code id:add7c2e8-e1ab-4acd-8b02-526090f8d7f7 tags:

 ``` python
 # code here
 ```
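Assuming "Plot Data" refers to the training curves, here is a sketch that plots accuracy from the `history.history` dictionary returned by `fit()`. It assumes the model was compiled with `metrics=["accuracy"]` (older Keras versions used the key `acc`); `plot_history` is hypothetical and the numbers in the demo are placeholders, not results.

```python
import matplotlib.pyplot as plt


def plot_history(history_dict):
    """Plot accuracy curves from a Keras History.history dict and return the figure.

    'val_accuracy' is only present when fit() was given validation data.
    """
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(history_dict["accuracy"], label="training")
    if "val_accuracy" in history_dict:
        ax.plot(history_dict["val_accuracy"], label="validation")
    ax.set_xlabel("epoch")
    ax.set_ylabel("accuracy")
    ax.legend()
    return fig


# Placeholder numbers; in the notebook pass history.history from model_One.fit()
fig = plot_history({"accuracy": [0.5, 0.7, 0.8], "val_accuracy": [0.45, 0.6, 0.7]})
```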

 %% Cell type:markdown id:340efb59-8119-4f1d-bdfc-5623abc3b7bf tags:

 ## Evaluate with test data

 %% Cell type:code id:bd626a2b-dfc4-402b-8e7c-ca8aa40143cd tags:

 ``` python
 # Use model_One.evaluate()

 # print result
 #print("Training accuracy: " + str(np.round(train_accuracy1 * 100, 2)) + "%") # print the train accuracy in percent
 #print("Test accuracy: " + str(np.round(test_accuracy1 * 100, 2)) + "%") # print the test accuracy in percent
 ```
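The commented prints expect one accuracy value per data set. When a model is compiled with a single metric, `evaluate()` returns `[loss, accuracy]`, which a small hypothetical helper can format as shown. `x_test` and `y_test` in the comment are hypothetical names.

```python
def report_accuracy(name, evaluate_result):
    """Format the accuracy from model.evaluate() as a percentage string.

    Assumes the model was compiled with a single metric, so evaluate()
    returns [loss, accuracy].
    """
    loss, accuracy = evaluate_result
    return f"{name} accuracy: {accuracy * 100:.2f}%"


# In the notebook: report_accuracy("Test", model_One.evaluate(x_test, y_test, verbose=0))
print(report_accuracy("Test", [0.42, 0.5]))  # Test accuracy: 50.00%
```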

 %% Cell type:code id:b0b53870-fa56-4c42-a9de-e406fb0bc07d tags:

 ``` python
 ```