improve lexical warmup and standardize stopword pipeline
15
modules/story-summary/vector/utils/stopwords-data/SOURCES.md
Normal file
@@ -0,0 +1,15 @@
# stopwords sources for story-summary

- Dataset: `stopwords-iso` (npm package, version 1.1.0)
- Repository: https://github.com/stopwords-iso/stopwords-iso
- License: MIT
- Snapshot date: 2026-02-16
- Languages used: `zh`, `ja`, `en`
- Local snapshot files:
  - `stopwords-iso.zh.txt`
  - `stopwords-iso.ja.txt`
  - `stopwords-iso.en.txt`

Generation note:
- `modules/story-summary/vector/utils/stopwords-base.js` is generated from these snapshot files.
- Keep `stopwords-patch.js` for tiny domain overrides only.
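
The generation note can be illustrated with a minimal sketch. The commit does not show the generator itself, so everything here is an assumption: the function names `parseSnapshot`, `buildBaseStopwords`, and `applyPatch` are hypothetical, the snapshot files are assumed to hold one stopword per line, and the patch is assumed to be a small `{ add, remove }` object. Inline strings stand in for the real `.txt` files.

```javascript
// Hypothetical sketch: names, file format, and patch shape are assumptions,
// not the repository's actual generator or API.

// Parse one snapshot file's text, assuming one stopword per line.
function parseSnapshot(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Merge the per-language lists into one de-duplicated base set.
// Lowercasing is an assumed normalization step.
function buildBaseStopwords(snapshotsByLang) {
  const base = new Set();
  for (const text of Object.values(snapshotsByLang)) {
    for (const word of parseSnapshot(text)) {
      base.add(word.toLowerCase());
    }
  }
  return base;
}

// Apply a small domain patch on top of the base without mutating it.
function applyPatch(base, patch = {}) {
  const result = new Set(base);
  for (const word of patch.add ?? []) result.add(word.toLowerCase());
  for (const word of patch.remove ?? []) result.delete(word.toLowerCase());
  return result;
}

// Tiny inline stand-ins for the zh/ja/en snapshot files.
const snapshots = {
  en: "the\nand\nof\n",
  zh: "的\n了\n",
  ja: "の\nは\n",
};

const base = buildBaseStopwords(snapshots);
const stopwords = applyPatch(base, { add: ["chapter"], remove: ["of"] });
```

Keeping the patch as a separate, small layer means the base can be regenerated from a newer snapshot without losing the domain overrides.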