About
- Briefly, these are lists I make for myself. I use the "whitelisted" sentences in my own projects. I don't use the "blacklisted" sentences.
- I have sorted the "blacklisted" sentences into multiple lists, so other developers can decide for themselves which sentences they would like to review for possible use in their own projects.
- In the past, as I proofread English sentences in the Tatoeba Corpus, I sorted them into lists.
- Recently, each week I've been proofreading the new English sentences that have been added. I read the ones by native speakers first, and if I have time, I read the ones by non-native speakers, too.
I have been adding the ones I want to use to the #01 WHITE LIST and ignoring the others, since sorting the sentences I don't want to use into lists takes more time.
- If someone can suggest a more efficient way for me to eliminate the bad sentences and harvest the good sentences, please send me a message.
- The lists I am currently most likely to be adding to are prefixed with an exclamation mark followed by a number.
For example, ! #03 and ! #11.
Thoughts on an Effective Rating System
Use one-digit numbers 0 through 9, with 9 being the highest rating.
- 9 = Very likely to be good and natural-sounding.
- Sentences with audio
- [This level could possibly be automatically assigned.]
- 8 = Likely to be good and natural-sounding.
- Sentences tagged OK by a native speaker
- Sentences rated OK by at least one native speaker and not rated "not OK" by anyone else.
- [This level could possibly be automatically assigned.]
- 7 = Likely to be good and natural-sounding.
- Sentences by native speakers that are not rated and/or not tagged OK.
- [This level could possibly be automatically assigned.]
- 6 = [for future use, if needed]
- [for future use, if needed]
- 5 = Understandable and grammatically correct, but perhaps not what a native speaker would say.
- If a non-native speaker used this, the intended message would very likely be understood.
- [This level would be chosen manually by members.]
- 4 = Unsure, but perhaps correct.
- [This level would be chosen manually by members.]
- 3 = Unknown. It would be safest to assume that these may not be good.
- Sentences by non-native speakers that are not rated and/or not tagged OK.
- [This level could possibly be automatically assigned.]
- 2 = Unsure, but perhaps wrong.
- [This level would be chosen manually by members.]
- 1 = Definitely bad.
- [This level would be chosen manually by members.]
- 0 = Ignored. I've read it, but I don't want to rate it now.
- I don't want to look at this sentence again until I've reviewed all the others.
- [This level would be chosen manually by members.]
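
The levels marked as possibly automatic (9, 8, 7 and 3) could be assigned by a small rule check, roughly like the Python sketch below. This is only an illustration of the idea, not how Tatoeba works: the Sentence record and all of its field names are hypothetical, and two things are my own assumptions because the list above doesn't settle them: the rules are checked from the highest level down, and a level chosen manually by a member (5, 4, 2, 1 or 0) takes precedence over the automatic one.

```python
from dataclasses import dataclass
from typing import Optional

# Levels that a member would choose by hand, per the list above.
MANUAL_LEVELS = {5, 4, 2, 1, 0}


@dataclass
class Sentence:
    """Hypothetical record; these field names are not Tatoeba's data model."""
    text: str
    by_native_speaker: bool = False
    has_audio: bool = False
    tagged_ok_by_native: bool = False
    native_ok_ratings: int = 0    # "OK" ratings given by native speakers
    not_ok_ratings: int = 0       # "not OK" ratings given by anyone
    manual_level: Optional[int] = None


def auto_level(s: Sentence) -> int:
    """Return the automatic level, checking the highest level first (assumed order)."""
    if s.has_audio:
        return 9
    rated_ok = s.native_ok_ratings >= 1 and s.not_ok_ratings == 0
    if s.tagged_ok_by_native or rated_ok:
        return 8
    if s.by_native_speaker:
        # The list doesn't say what to do with a native speaker's sentence
        # that someone rated "not OK"; this sketch leaves it at 7.
        return 7
    return 3


def rating(s: Sentence) -> int:
    """Manual level if a member chose one (assumed to win), else the automatic level."""
    if s.manual_level is not None:
        if s.manual_level not in MANUAL_LEVELS:
            raise ValueError(f"{s.manual_level} is not a manually chosen level")
        return s.manual_level
    return auto_level(s)
```

For example, rating(Sentence("Hello.", by_native_speaker=True)) falls through the audio and OK checks and lands on level 7, while the same sentence with manual_level=1 is reported as 1.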