Why use a keyword extractor?
Skim repeated terms, draft tag ideas, or compare two pastes—without shipping your copy to a server.
Benefits
- Ranked list: see what repeats most as unigrams.
- Controls: top N, minimum length, stop-word toggle.
- Phrases: optional 2–5 word n-grams.
- Export: quick .txt of keyword tokens.
- Private: client-side only.
How it works
The tool uses naive bag-of-words counting and sliding windows. That is good for exploration, but not a substitute for SEO suites or linguistics tools.
What the code does
- Normalize: lowercase; non-\w to spaces; split on whitespace.
- Unigrams: count tokens passing min length; optional English stop list.
- Sort & cap: sort by descending count; keep the top N, or fewer when there are fewer unique tokens.
- N-grams: same stream, contiguous n-word windows; rank by count.
- Export: keywords only, newline separated.
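The normalize, count, and cap steps above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the function names, the default values, and the tiny stop list are all assumptions made for the example.

```javascript
// Hypothetical stop list for illustration; the tool's real list is not shown here.
const STOP_WORDS = new Set(["the", "a", "an", "and", "of", "to", "in", "on", "is", "it"]);

// Normalize: lowercase, turn every non-\w character into a space, split on whitespace.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/\W/g, " ")
    .split(/\s+/)
    .filter((token) => token.length > 0);
}

// Unigrams: count tokens that pass the minimum length and, optionally, the stop list.
// The defaults (minLength = 3, useStopWords = true) are assumptions.
function countUnigrams(tokens, { minLength = 3, useStopWords = true } = {}) {
  const counts = new Map();
  for (const token of tokens) {
    if (token.length < minLength) continue;
    if (useStopWords && STOP_WORDS.has(token)) continue;
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  return counts;
}

// Sort & cap: descending count, then keep at most topN entries.
// Ties keep first-seen order here; the tool may break ties differently.
function rank(counts, topN = 10) {
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN);
}

const sample = "The cat sat on the mat. The cat slept.";
const ranked = rank(countUnigrams(tokenize(sample)), 3);
console.log(ranked); // [["cat", 2], ["sat", 1], ["mat", 1]]
```

Asking for more keywords than there are unique tokens simply returns all of them, which is the "top N, or fewer" behavior described above.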
When to use
Blog outlines, student summaries, light content QA, and quick “what did I overuse?” checks.
Ideal use cases
- Editing: spot overused words.
- Drafting: phrase echoes via n-grams.
- Teaching: demonstrate tokenization limits.
- Privacy: pastes that should stay on your machine.
- Prep: before specialized NLP.
Facts
How to read the counts depends on the tokenization rules and the language of the text.
Key points
- Stop-word list is English and fixed in code.
- N-gram ranking ignores min-length and stop-word settings used for unigrams.
- High frequency is not the same as topical importance or search intent.
- Very large pastes may hit browser memory limits.
- \w matches only ASCII letters, digits, and underscore in ECMAScript, so apostrophes and accented or non-Latin characters act as separators.
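To see what the \w rule means in practice, here is a small sketch that applies the normalization described on this page (lowercase, non-\w to spaces, split on whitespace). The function name is an assumption; only the regular expression behavior is standard ECMAScript.

```javascript
// Illustrative tokenizer following the normalization described on this page.
function tokenize(text) {
  return text.toLowerCase().replace(/\W/g, " ").split(/\s+/).filter(Boolean);
}

// Underscores and digits stay inside a token.
const identifier = tokenize("user_id2 is set");
console.log(identifier); // ["user_id2", "is", "set"]

// An apostrophe is not matched by \w, so the contraction splits in two.
const contraction = tokenize("Don't stop");
console.log(contraction); // ["don", "t", "stop"]

// "\u00e9" (e with acute accent) is outside the ASCII range that \w covers.
const accented = tokenize("Caf\u00e9 menu");
console.log(accented); // ["caf", "menu"]
```

Fragments like "t" and "caf" are one reason raising the minimum length can reduce noise.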
Best practices
Cross-check with your editorial or SEO workflow.
Quality tips
- Clean markup to plain text first for fair counts.
- Try several min-length values to reduce noise.
- Pair with readability or corpus tools for serious analysis.
- Do not treat export lists as finalized keyword strategy.
- For code snippets, identifiers may dominate tokens.
When not to rely on it
- Multilingual stop-word lists or lemmatization requirements.
- Legal, medical, or compliance-grade keyword reporting.
- Exact parity with a specific publisher’s keyword specification.
Limitations and compatibility
English-oriented stop words; heuristic tokenization; requires JavaScript.
Keyword extraction runs fully in your browser with no server upload; keyword rankings and phrase lists update instantly as filters change.
Frequently asked questions
Is this free and private?
Yes. Everything runs in your browser; nothing is uploaded for extraction.
What are stop words here?
A small, fixed English list of common words that you can filter out so the unigram list skews toward content words. The list is not customizable in the UI.
Do n-grams use stop-word removal?
No. N-grams are built from all non-empty normalized tokens; only the unigram list uses the stop-word and min-length options.
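A rough sketch of that behavior, assuming a sliding window over the full token stream. The function names are hypothetical and this is not the tool's actual code.

```javascript
// Illustrative tokenizer: lowercase, non-\w to spaces, split on whitespace.
function tokenize(text) {
  return text.toLowerCase().replace(/\W/g, " ").split(/\s+/).filter(Boolean);
}

// N-grams: slide a window of n contiguous tokens over the whole stream.
// No stop-word or min-length filtering is applied at this stage.
function countNgrams(tokens, n) {
  const counts = new Map();
  for (let i = 0; i + n <= tokens.length; i++) {
    const phrase = tokens.slice(i, i + n).join(" ");
    counts.set(phrase, (counts.get(phrase) || 0) + 1);
  }
  // Rank by descending count.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const bigrams = countNgrams(tokenize("The cat sat on the mat. The cat slept."), 2);
console.log(bigrams[0]); // ["the cat", 2]
```

Phrases such as "on the" appear in the result even though both words are short, common ones. Because punctuation becomes whitespace during normalization, a window can also span a sentence boundary ("mat the" in this sample).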
What does export contain?
Only the visible keyword tokens (one per line). Counts and n-grams are not included in the file.
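As a sketch of what such an export might look like, assuming the ranked list is an array of [keyword, count] pairs. The function names and the keywords.txt filename are made up for illustration.

```javascript
// Build the export text: keywords only, one per line, counts dropped.
function toExportText(rankedUnigrams) {
  return rankedUnigrams.map(([keyword]) => keyword).join("\n");
}

// Browser-only helper (hypothetical): offer the text as a .txt download
// without sending it anywhere.
function downloadText(text, filename = "keywords.txt") {
  const blob = new Blob([text], { type: "text/plain" });
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  link.click();
  URL.revokeObjectURL(url);
}

const exportText = toExportText([["cat", 2], ["sat", 1], ["mat", 1]]);
console.log(exportText); // "cat\nsat\nmat"
```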
Will this match Google keyword volume?
No. This is a naive frequency view of your pasted text, not a search-volume or ranking tool.
Does it work for non-English text?
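Tokenization follows JavaScript \w rules, which match only ASCII letters, digits, and underscore, so accented characters split words and text in non-Latin scripts may yield few or no tokens. Stop-word filtering is English-oriented. Results are likely to be less meaningful for other languages.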