new HTMLToolbox(input, config)
Creates an instance of HTMLToolbox. There can be multiple instances and each can be configured separately.
Parameters:
Name | Type | Description |
---|---|---|
input | string | The input string containing the HTML document. |
config | object | (optional) The configuration. It can contain any of the list below: |
Methods
apply()
By default, changes are applied only when necessary. setTag and setAttribute are not applied until another loop is started or output is needed. Normally this causes no problems and everything is handled automatically.
But on rare occasions, when you set tags and attributes outside a loop and want to change a node you have already changed, you may need to call this function manually. Keep in mind that it is the performance bottleneck, so call it only when necessary.
getHTML(indent)
Returns the resulting HTML document.
Parameters:
Name | Type | Default | Description |
---|---|---|---|
indent | string | | (optional) The indentation unit. Enter an empty string for compressed HTML, or |
getString()
Returns the rendered text. It is deprecated and will be removed. Use getText() instead.
getText()
Returns the rendered text.
insert(what, insert_at)
Inserts HTML at a location relative to the region matched by the search query.
Parameters:
Name | Type | Description |
---|---|---|
what | string | A string that will be parsed as HTML and inserted at the requested location. |
insert_at | | The relative position of the insertion. If its value is set to |
printNodes(nodes)
(For debugging only)
Prints the current tree structure of the document.
Parameters:
Name | Type | Description |
---|---|---|
nodes | Array | (optional) A list of the nodes to be printed. |
printTraverse(nodes)
(For debugging only)
Prints the current traversed structure of the document. This differs from printNodes in that it reflects the order in which the traversal is done rather than the tree structure. It prints only the node type, str_index and value.
Parameters:
Name | Type | Description |
---|---|---|
nodes | Array | (optional) A list of the nodes to traverse. |
remove()
Removes the match.
let doc = "<!DOCTYPE html>this is tesss<b><b></b></b>sssst!</div>";
let htb = new HTMLToolbox(doc);
for(const val of htb.search(/te(s+)t/gm)) {
htb.remove();
}
console.log(htb.getHTML(null));
and the result would be:
<!DOCTYPE html>this is!
removeAll(query)
A helper function to simplify removing all occurrences without implementing a loop.
For more details on the search query see HTMLToolbox#search
Parameters:
Name | Type | Description |
---|---|---|
query | | A string or regexp to search for in the rendered text of the document. If not provided, search will traverse the whole document. |
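For comparison, the loop that this helper is described as replacing can be sketched as below. The function name removeAllWithLoop and the returned count are hypothetical additions for illustration; htb is assumed to be an HTMLToolbox instance created elsewhere, and only search and remove, as documented on this page, are called on it.

```javascript
// A sketch of the explicit loop that removeAll(query) is described as
// simplifying. `htb` is assumed to be an HTMLToolbox instance.
function removeAllWithLoop(htb, query) {
  let count = 0;
  for (const _result of htb.search(query)) {
    htb.remove(); // removes the current match
    count += 1;
  }
  // The count is an illustrative extra; the page does not say what
  // removeAll itself returns.
  return count;
}
```

Whether removeAll is implemented exactly this way is not stated here; the sketch only mirrors the remove example shown earlier.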
replace(replace_with, replace_at)
Replaces the match with an HTML string.
let doc = "<!DOCTYPE html>this is a tesss<b><b></b></b>sssst!</div>";
let htb = new HTMLToolbox(doc);
for(const val of htb.search(/te(s+)t/gm)) {
htb.replace("<div>experiment</div>");
}
console.log(htb.getHTML(null));
and the result would be:
<!DOCTYPE html>this is a <div>experiment</div>!
Note that you can access the match data using the given value:
let doc = "<div>1 and 2 and 3 and 4</div>";
let htb = new HTMLToolbox(doc);
for(const val of htb.search(/(\d) and/gm)) {
const m = val.match;
htb.replace(`${m[1]} or`);
}
console.log(htb.getHTML(null));
and the result would be:
<!DOCTYPE html><div>1 or 2 or 3 or 4</div>
Parameters:
Name | Type | Description |
---|---|---|
replace_with | string | The new HTML string to be used as the replacement. |
replace_at | | The relative position of the replacement. If its value is set to |
replaceAll(query, replace_with, replace_at)
A helper function to simplify replacing all occurrences without implementing a loop.
Parameters:
Name | Type | Description |
---|---|---|
query | | A string or regexp to search for in the rendered text of the document. If not provided, search will traverse the whole document. For more details on the search query see HTMLToolbox#search |
replace_with | string | The new HTML string to be used as the replacement. |
replace_at | | The relative position of the replacement. If its value is set to |
(generator) search(query)
Searches for the given query and yields the results from a generator. You may process them in a loop or by consecutive calls to the generator's .next() function.
let htb = new HTMLToolbox(doc);
for(const val of htb.search(/te(s+)t/gm)) {
htb.remove();
}
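The same results can also be consumed by hand with .next(). The sketch below uses a small stand-in generator over plain text (searchText is hypothetical and is not part of HTMLToolbox) purely to show the iteration protocol. With the library, the generator returned by htb.search(query) would take its place, assuming it follows the standard JavaScript iterator protocol as the description suggests.

```javascript
// Hypothetical stand-in for htb.search(): yields one result object per regex
// match in a plain string. It is NOT the library's implementation.
function* searchText(text, regex) {
  for (const match of text.matchAll(regex)) {
    yield { match };
  }
}

const results = searchText("this is test and tesssst", /te(s+)t/gm);
const found = [];

// Drive the generator manually instead of using for...of.
let step = results.next();
while (!step.done) {
  found.push(step.value.match[0]);
  step = results.next();
}

console.log(found);
```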
Inside the loop block you may call one of the modifier functions: remove, insert, replace, wrap, setTag and setAttribute. The first four are applied to the whole match, whereas setTag and setAttribute are applied to specific nodes. See each function's documentation for more details.
The object that is returned by the generator has the following members:
- match: {index, 0, 1, ...}, which is similar to JavaScript regex match results; index is the location of the match in the rendered text, and the rest is an array.
- start_node: the node of the tree that contains the beginning of the match.
- end_node: the node that contains the end of the match.
- start_offset: the index of the character in the start node value at which the match begins.
- end_offset: the index of the last character of the match in the end node.
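Since the match member is described as similar to JavaScript regex match results, the shape can be previewed with a native regex on a plain string. This sketch does not use HTMLToolbox at all; the variable names are illustrative, and in the library the index refers to the rendered text of the document rather than to a raw string.

```javascript
// Plain JavaScript: the shape of a native regex match, which the `match`
// member is described as resembling.
const text = "1 and 2 and 3 and 4";
const summary = [];

for (const m of text.matchAll(/(\d) and/gm)) {
  // m[0] is the whole match, m[1] the first capture group,
  // m.index the position of the match in the searched text.
  summary.push({ index: m.index, whole: m[0], group1: m[1] });
}

console.log(summary);
```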
Nodes
Each node has properties like attributes, name, rawName and type. The type may be Tag, Text or Sep. Text nodes are the leaves of the HTML tree. The Sep nodes are the rendered separators. They are not modifiable. If you need a different separator for a specific tag, change the separators dictionary of the HTMLToolbox instance you are working with.
Each node also has a body that contains the separators and its children. It also has next and prev, which point to the next and previous rendered nodes. You may also want to use its parent to go up in the hierarchy.
Note that on some occasions start_node and end_node might be the same node.
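As a rough illustration of going up the hierarchy through parent, the sketch below climbs from a node to the nearest enclosing Tag node. The helper closestTag is hypothetical, the nodes are hand-built objects that only mimic the properties named above, and it is an assumption that type is exposed as the string "Tag" and that the root has no parent.

```javascript
// Hypothetical helper: climb from a node to the nearest enclosing Tag node,
// relying only on the `type` and `parent` properties described above.
// Assumes `type` is a plain string and that the root's parent is null.
function closestTag(node) {
  let current = node;
  while (current && current.type !== "Tag") {
    current = current.parent;
  }
  return current || null;
}

// Hand-built objects that mimic the described node shape, for illustration only.
const div = { type: "Tag", name: "div", parent: null };
const bold = { type: "Tag", name: "b", parent: div };
const leaf = { type: "Text", value: "hello", parent: bold };

console.log(closestTag(leaf).name); // prints "b"
```

A helper like this could be useful before calling setTag or setAttribute, if the node that contains the start of a match turns out to be a Text leaf.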
Parameters:
Name | Type | Description |
---|---|---|
query | | A string or regexp to search for in the rendered text of the document. If not provided, search will traverse the whole document. |
setAttribute(node, attr, value)
Changes the attribute value of the given node to the new one.
For more details on the nodes structure see HTMLToolbox#search
Parameters:
Name | Type | Description |
---|---|---|
node | node | The target node; you can find it from the query match. |
attr | string | The target attribute. |
value | string | The new value. |
setTag(node, tag)
Changes the tag of the given node to the new one.
For more details on the nodes structure see HTMLToolbox#search
Parameters:
Name | Type | Description |
---|---|---|
node | node | The target node; you can find it from the query match. |
tag | string | The new tag. |
wrap(envelope)
Wraps the match inside an HTML string. Any HTML can be used as the envelope; the placement of the match is marked with a <!/> placeholder, which will be replaced by the match.
Note that at the moment it is the leaves of the tree that are wrapped, and neighboring nodes are wrapped separately.
Also, <!/> must be a valid HTML node; for example, it cannot be inside a tag definition.
let doc = "<div>1 and 2 and 3 and 4</div>";
let htb = new HTMLToolbox(doc);
for(const val of htb.search(/\d/gm)) {
htb.wrap(`<span>Number <!/></span>`);
}
console.log(htb.getHTML(null));
and the result would be:
<div><span>Number 1</span> and <span>Number 2</span> and <span>Number 3</span> and <span>Number 4</span></div>
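The role of the <!/> placeholder can be pictured as a plain string substitution. The sketch below is a toy model on strings only: fillEnvelope is a hypothetical name, and the library itself works on the parsed tree, so this is not its implementation.

```javascript
// Toy model of the placeholder substitution, on plain strings only.
// The library operates on parsed nodes; this is not its implementation.
function fillEnvelope(envelope, matchedText) {
  // A replacer function is used so that "$" sequences in the matched
  // text are inserted literally.
  return envelope.replace("<!/>", () => matchedText);
}

console.log(fillEnvelope("<span>Number <!/></span>", "1"));
// prints "<span>Number 1</span>"
```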
Parameters:
Name | Type | Description |
---|---|---|
envelope | string | The HTML string to wrap the query match. |
wrapAll(query, envelope)
A helper function to simplify wrapping all occurrences without implementing a loop.
Parameters:
Name | Type | Description |
---|---|---|
query | | A string or regexp to search for in the rendered text of the document. If not provided, search will traverse the whole document. For more details on the search query see HTMLToolbox#search |
envelope | string | The HTML string to wrap the query match. For more details on the envelope see HTMLToolbox#wrap |