Which Sentences I Use and Which I Do Not Use

Short, easy-to-remember URL to this page
http://bit.ly/tatoebafiltering

I Use These ("Whitelisted")

** A Select Group of Sentences from the #01 WHITE LIST

1. Sentences I have chosen to use.

Other developers who are using English sentences from the Tatoeba Project may wish to start with this list as their default list of sentences to use. You can get all the sentence numbers for list 907 from this file (sentences_in_lists.tar.bz2), which is exported every week a little after 9:00 GMT.
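
As a rough sketch of how the sentence numbers for list 907 could be pulled out of that export, the Python below assumes the archive holds tab-separated lines of the form list_id<TAB>sentence_id. That layout, the function names, and the idea of reading a locally downloaded copy are all assumptions made for illustration, so check them against the actual export before relying on this.

```python
import io
import tarfile


def sentence_ids_for_list(lines, list_id=907):
    """Collect the sentence numbers that belong to one list.

    Assumes each line looks like "list_id<TAB>sentence_id".
    """
    wanted = str(list_id)
    ids = set()
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) < 2 or parts[0] != wanted:
            continue  # skip other lists and malformed lines
        if parts[1].isdigit():
            ids.add(int(parts[1]))
    return ids


def whitelist_from_archive(path, list_id=907):
    """Read every regular file in a local .tar.bz2 export and gather the IDs."""
    ids = set()
    with tarfile.open(path, "r:bz2") as archive:
        for member in archive:
            if not member.isfile():
                continue
            raw = archive.extractfile(member)
            text = io.TextIOWrapper(raw, encoding="utf-8")
            ids |= sentence_ids_for_list(text, list_id)
    return ids
```

A developer could then keep only the sentences whose number is in the returned set and treat everything else as unused.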

I Do Not Use These ("Blacklisted")

In addition to sentences on these sorted lists, I do not use any sentence that is not on my WHITE LIST (list 907).

2. Sentences I have chosen not to use, but possibly will use in the future for another project.

Any developer who wants to use these should probably proofread them all again.

3. Sentences I have chosen not to use, at least for now, but may want to review again in the future.

I'm unlikely to actually review these again, since it's a more efficient use of my time to just contribute good English sentences. However, perhaps other developers might want to look at these sentences. I put sentences on various lists to help save you time, since you can prioritize the order in which you want to review these.

4. Sentences I have just ignored, since I'm unlikely to ever use them.

5. Sentences I am very unlikely to ever use.

6. Sentences I am very, very unlikely to ever use.

About

  • Briefly, these are lists I make for myself. I use the "whitelisted" sentences in my own projects. I don't use the "blacklisted" sentences.
  • I have sorted the "blacklisted" sentences into multiple lists, so other developers can make their own decisions on what sentences they would like to review for possible use on their own projects.
  • In the past, as I proofread English sentences in the Tatoeba Corpus, I sorted them into lists.
  • Recently, each week I've been proofreading the new English sentences that have been added. I read the ones by native speakers first, and if I have time, I read the ones by non-native speakers, too. I have been adding the ones I want to use to the #01 WHITE LIST, and have been ignoring the others, since it takes more time to sort the sentences I don't want to use into lists.
  • If someone can suggest a more efficient way for me to eliminate the bad sentences and harvest the good sentences, please send me a message.
  • The lists I am currently most likely to be adding to are prefixed with an exclamation mark followed by a number.
    For example, ! #03 and ! #11.

Thoughts on an Effective Rating System

Use one-digit numbers 0 through 9, with 9 being the highest rating.

9 = Very likely to be good and natural-sounding.
Sentences with audio
[This level could possibly be automatically assigned.]
8 = Likely to be good and natural-sounding.
Sentences tagged OK by a native speaker
Sentences rated OK by at least one native speaker and not rated "not OK" by anyone else.
[This level could possibly be automatically assigned.]
7 = Likely to be good and natural-sounding.
Sentences by native speakers, but not rated and/or not tagged OK.
[This level could possibly be automatically assigned.]
6 = [for future use, if needed]
[for future use, if needed]
5 = Understandable and grammatically correct, but perhaps not what a native speaker would say.
If a non-native speaker used this, the intended message would very likely be understood.
[This would be a manually-chosen level by members]
4 = Unsure, but perhaps correct.
[This would be a manually-chosen level by members]
3 = Unknown. It would be safest to assume that these may not be good.
Sentences by non-native speakers that are not rated and/or not tagged OK.
[This level could possibly be automatically assigned.]
2 = Unsure, but perhaps wrong.
[This would be a manually-chosen level by members]
1 = Definitely bad.
[This would be a manually-chosen level by members]
0 = Ignored. I've read it, but I don't want to rate it now.
I don't want to look at this sentence again until I've reviewed all the others.
[This would be a manually-chosen level by members]
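
The levels marked as possibly automatic (9, 8, 7 and 3) could be assigned along these lines. This is only a sketch: the field names are hypothetical, and the order of the checks is an assumption, since the list above does not say what should happen when a sentence qualifies for more than one level. Levels 5, 4, 2, 1 and 0 are left out because they would be chosen manually by members.

```python
from dataclasses import dataclass


@dataclass
class SentenceInfo:
    # All field names are hypothetical, chosen for this illustration.
    has_audio: bool = False
    ok_by_native: bool = False      # tagged or rated OK by a native speaker
    rated_not_ok: bool = False      # rated "not OK" by anyone
    author_is_native: bool = False


def automatic_level(info):
    """Return the automatically assignable level: 9, 8, 7 or 3."""
    if info.has_audio:
        return 9
    if info.ok_by_native and not info.rated_not_ok:
        return 8
    if info.author_is_native:
        return 7
    # Not by a native speaker, and not confirmed OK: treat as unknown.
    return 3
```

Checking for audio first means a sentence with audio gets a 9 even if nobody has rated it; a different ordering would be just as easy to write if that turns out not to be what is wanted.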