Welcome to Talewind! This project takes text as input and outputs a highlighted version of it, much as a plagiarism checker does. Instead of copied passages, Talewind highlights any biases it detects in the text. Detection runs on a sentence-by-sentence basis, and the result is rendered through displaCy's beautiful span visualizer. We currently support 7 types of media bias, which appear as the label columns of the dataset-synthesis prompt below.
"Synthesize a dataset in plain text delimited by ';'. The dataset's header's are as follows: Sentence, Content Bias, Partisan Bias, False balance, Ventriloquism, Demographic bias, Undue Weight, Corporate
Fill the data by creating sentences from news media, articles and journalist writing pieces and sentences written inside articles. Generate long and short sentences and significantly vary the length of the sentences. Then fill the data with 1 or 0 in the following columns based on if the sentence has that bias present or not. Keep variance high in data. Also include sentences which have no bias at all. Do not use code block to generate the data. Generate as much data as possible."
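As a rough illustration of the format the prompt asks for, the sketch below parses ';'-delimited text into sentences and 0/1 label lists using Python's standard csv module. The column names are copied from the prompt; the two sample rows and the `load_rows` helper are hypothetical and are not taken from the project's dataset or code.

```python
import csv
import io

# Header copied from the prompt above. The two data rows are invented
# for illustration only; they are not from the project's dataset.
SAMPLE = """Sentence;Content Bias;Partisan Bias;False balance;Ventriloquism;Demographic bias;Undue Weight;Corporate
The council approved the budget on Tuesday.;0;0;0;0;0;0;0
Only a fool would back the senator's reckless plan.;1;1;0;0;0;0;0
"""

def load_rows(text):
    """Return (label_names, rows); each row is a (sentence, labels) pair of a str and a list of 0/1 ints."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = next(reader)
    label_names = header[1:]
    rows = []
    for record in reader:
        if len(record) != len(header):
            continue  # skip blank or malformed lines, e.g. a sentence with a stray ';'
        sentence, *flags = record
        rows.append((sentence, [int(flag) for flag in flags]))
    return label_names, rows

label_names, rows = load_rows(SAMPLE)
print(label_names)
for sentence, labels in rows:
    print(labels, sentence)
```

Generated text often contains stray delimiters, so the length check drops any line whose field count does not match the header rather than guessing where the sentence ends.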
The feed-forward neural network (FFNN) used in this model is relatively simple; its summary() output is shown below.
Layer (type) | Output Shape | Param # |
---|---|---|
dense_4 (Dense) | (None, 384) | 147840 |
dropout_2 (Dropout) | (None, 384) | 0 |
dense_5 (Dense) | (None, 128) | 49280 |
dropout_3 (Dropout) | (None, 128) | 0 |
dense_7 (Dense) | (None, 32) | 4128 |
dense_8 (Dense) | (None, 7) | 231 |
Category: | Value |
---|---|
Total params: | 201,479 |
Trainable params: | 201,479 |
Non-trainable params: | 0 |
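The parameter counts in the summary can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The short sketch below redoes that arithmetic. The first three layer sizes follow from the output shapes in the table; the final 32-to-7 layer is an assumption (one output per supported bias type) that happens to account for the remaining 231 parameters.

```python
def dense_params(inputs, units):
    """Parameter count of a fully connected layer: weights plus biases."""
    return inputs * units + units

# (inputs, units) for each dense layer. The last pair is an assumption:
# a 7-unit output layer, one unit per supported bias type.
layers = [(384, 384), (384, 128), (128, 32), (32, 7)]

counts = [dense_params(inputs, units) for inputs, units in layers]
print(counts)       # [147840, 49280, 4128, 231]
print(sum(counts))  # 201479
```

Dropout layers contribute no parameters, which is why only the dense layers appear in the sum.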
The Loss, Accuracy, Precision and Recall charts are shown below. The input data for these was the 384-dimensional embedding of each sentence, which is also the input used in production right now. To see how it works in the real world, we encourage you to try out our demo website. Click on "Home" to get started.
Charts: Epoch Accuracy, Epoch Precision, Epoch Recall, Epoch Loss.
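For readers less familiar with the precision and recall curves, the sketch below shows one common way to compute both for multi-label 0/1 targets. It assumes a 0.5 decision threshold and pools every label of every sentence together (micro-averaging); the project's actual metric configuration is not stated here, and the toy values use three labels instead of seven to stay short.

```python
def precision_recall(y_true, y_pred, threshold=0.5):
    """Micro-averaged precision and recall over all (sentence, label) entries."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for truth, score in zip(true_row, pred_row):
            predicted = score > threshold
            if predicted and truth == 1:
                tp += 1  # bias present and flagged
            elif predicted and truth == 0:
                fp += 1  # flagged, but no bias present
            elif not predicted and truth == 1:
                fn += 1  # bias present, but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy values, invented for illustration only.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[0.9, 0.2, 0.4], [0.6, 0.8, 0.1]]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

In this toy case two of the three flagged entries are correct and two of the three true entries are found, so both values come out to two thirds.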