Class: HTMLToolbox

HTMLToolbox(input, config)

new HTMLToolbox(input, config)

Creates an instance of HTMLToolbox. Multiple instances can exist at the same time, and each can be configured separately.

Parameters:
Name Type Description
input string

The input string containing the HTML document.

config object

(optional) The configuration object. It may contain any of the following keys:

separators: a dictionary mapping tags to their rendered separators, e.g. " for q and \n for div. See the source code for more examples.

lineBreakTags: a list of line-break tags. By default it contains br and hr.

noCloseTags: tags that cannot contain children, like img or br.

rawTags: tags whose content is not HTML-encoded, like script.

defaultSeparator: the default tag separator. It is a blank space by default.

deleteEmpty: if set, empty elements in the removed or replaced region are deleted completely. It is true by default.

convertNbspToSpace: set it to true to treat &nbsp; entities as normal spaces. It is false by default.

processInputValues: if set to false, input field values are not rendered. It is true by default, which means input values are rendered as text.
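For illustration, here is a configuration object built from the keys above and merged over the defaults this page states. The helper withDefaults and the shape assumed for separators (a tag name mapped to one separator string) are hypothetical; the library's actual defaults live in its source.

```javascript
// Hypothetical sketch: only the defaults stated on this page are listed.
// The real default tables (e.g. the full separators dictionary) are in the
// HTMLToolbox source.
const DOCUMENTED_DEFAULTS = {
  separators: { q: '"', div: "\n" }, // assumed shape: tag name -> separator string
  lineBreakTags: ["br", "hr"],
  defaultSeparator: " ",
  deleteEmpty: true,
  convertNbspToSpace: false,
  processInputValues: true,
};

// Shallow merge: any key supplied by the caller overrides the default.
function withDefaults(config = {}) {
  return { ...DOCUMENTED_DEFAULTS, ...config };
}

// Keep empty elements and treat &nbsp; as a normal space.
const config = withDefaults({ deleteEmpty: false, convertNbspToSpace: true });
console.log(config.deleteEmpty, config.convertNbspToSpace);
// With the real library this would be passed as: new HTMLToolbox(input, config)
```

Because the merge is shallow, a caller-supplied separators object would replace the example one entirely; how HTMLToolbox itself combines a partial separators dictionary with its defaults is not stated here.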

Methods

apply()

By default, changes are applied only when necessary: setTag and setAttribute take effect only when another loop is started or output is requested. Normally this is handled automatically and needs no attention.

On rare occasions, however, when you set tags or attributes outside a loop and then want to change a node you have already changed, you may need to call this function manually. Keep in mind that it is a performance bottleneck, so call it only when necessary.
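The deferred behaviour described above follows a common pattern: edits are queued and made visible in a single flush. The generic sketch below uses a hypothetical DeferredEditor class, not HTMLToolbox internals, to show why a change queued by setTag is not yet visible until apply() runs.

```javascript
// Generic illustration of deferred edits. DeferredEditor is hypothetical and
// only mimics the behaviour described for setTag/apply.
class DeferredEditor {
  constructor() {
    this.pending = []; // queued [node, newTag] pairs
  }

  setTag(node, tag) {
    this.pending.push([node, tag]); // recorded, not yet visible on the node
  }

  apply() {
    for (const [node, tag] of this.pending) node.tag = tag; // one pass over the queue
    this.pending = [];
  }
}

const node = { tag: "i" };
const editor = new DeferredEditor();
editor.setTag(node, "b");
console.log(node.tag); // i (the change is still queued)
editor.apply();
console.log(node.tag); // b (visible after the flush)
```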

getHTML(indent)

Returns the resulting HTML document.

Parameters:
Name Type Default Description
indent string

(optional) The indentation unit. Pass an empty string for compact HTML, or null to keep the initial indentation. The default value is \t.
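To show what "indentation unit" means, this toy serializer (hypothetical, not the library's getHTML) repeats the unit once per nesting level and emits compact output when the unit is an empty string. The null case (initial indentation) is not modeled.

```javascript
// Hypothetical toy serializer: a node is either a text string or
// { name, children }. The indent unit is repeated once per nesting level.
function serialize(node, indent = "\t", depth = 0) {
  const pad = indent.repeat(depth);
  const nl = indent === "" ? "" : "\n"; // empty unit => compact output
  if (typeof node === "string") return pad + node + nl;
  const inner = node.children.map((c) => serialize(c, indent, depth + 1)).join("");
  return `${pad}<${node.name}>${nl}${inner}${pad}</${node.name}>${nl}`;
}

const tree = { name: "div", children: [{ name: "b", children: ["bold"] }, "text"] };
console.log(serialize(tree, "")); // <div><b>bold</b>text</div>
console.log(serialize(tree, "  ")); // one line per node, two spaces per level
```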

getString()

Returns the rendered text. Deprecated and will be removed; use getText() instead.

getText()

Returns the rendered text.

insert(what, insert_at)

Inserts HTML at a location relative to the current search match.

Parameters:
Name Type Description
what string

a string which will be parsed as HTML and inserted at the requested location.

insert_at

the relative position of insertion. If set to HTMLToolbox.BEGIN, the HTML is inserted right before the start of the match (start_node at start_offset). If set to HTMLToolbox.END, it is inserted right after the end of the match (end_node at end_offset). Other values are not supported.
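The two insertion points can be modeled on a plain string: BEGIN puts the new HTML in front of the match and END puts it directly after. The function insertRelative and the local BEGIN/END constants are assumptions standing in for HTMLToolbox.BEGIN and HTMLToolbox.END, whose actual values this page does not give; the real method works on tree nodes rather than on a string.

```javascript
// Local stand-ins for HTMLToolbox.BEGIN / HTMLToolbox.END (actual values unknown).
const BEGIN = "begin";
const END = "end";

// Insert `html` before the match (BEGIN) or right after it (END).
function insertRelative(text, match, html, where) {
  const start = match.index;
  const end = match.index + match[0].length; // exclusive end of the match
  if (where === BEGIN) return text.slice(0, start) + html + text.slice(start);
  if (where === END) return text.slice(0, end) + html + text.slice(end);
  throw new RangeError("only BEGIN and END are supported");
}

const text = "this is a test!";
const m = /test/.exec(text);
console.log(insertRelative(text, m, "<b>", BEGIN)); // this is a <b>test!
console.log(insertRelative(text, m, "</b>", END)); // this is a test</b>!
```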

printNodes(nodes)

(For debugging only)

Prints the current tree structure of the document.

Parameters:
Name Type Description
nodes Array

(optional) a list of the nodes to print.

printTraverse(nodes)

(For debugging only)

Prints the current traversal sequence of the document. Unlike printNodes, it does not reflect the tree structure; instead it shows the order in which the nodes are traversed. It prints only the node type, str_index and value.

Parameters:
Name Type Description
nodes Array

(optional) a list of the nodes to traverse.

remove()

Removes the current match. For example:

let doc = "<!DOCTYPE html>this is tesss<b><b></b></b>sssst!</div>";
let htb = new HTMLToolbox(doc);

for(const val of htb.search(/te(s+)t/gm)) {
	htb.remove();
}

console.log(htb.getHTML(null));

and the result would be:

<!DOCTYPE html>this is !
removeAll(query)

A helper function that removes all occurrences without requiring you to write a loop.

For more details on the search query see HTMLToolbox#search

Parameters:
Name Type Description
query

a string or regexp to search for in the rendered text of the document. If not provided, the search traverses the whole document.
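Conceptually, removeAll(query) spares you the search loop shown in the remove() example. The sketch below makes that concrete with a hypothetical plain-text class, TextBox, that mimics the search()/remove() shape on a string instead of an HTML tree; replaceAll and wrapAll are described the same way, around replace and wrap.

```javascript
// Hypothetical plain-text stand-in with the same search()/remove() shape as
// HTMLToolbox; it edits a string, not an HTML tree. query must be a RegExp.
class TextBox {
  constructor(text) {
    this.text = text;
    this.current = null; // [start, end) of the match currently being visited
  }

  *search(query) {
    const re = new RegExp(query.source, query.flags.replace("g", "") + "g");
    let m;
    while ((m = re.exec(this.text)) !== null) {
      if (m[0] === "") {
        re.lastIndex++; // skip empty matches to avoid looping forever
        continue;
      }
      this.current = [m.index, m.index + m[0].length];
      yield { match: m };
      re.lastIndex = this.current[1]; // resume after the (possibly edited) match
    }
    this.current = null;
  }

  remove() {
    const [start, end] = this.current;
    this.text = this.text.slice(0, start) + this.text.slice(end);
    this.current = [start, start]; // the removed match now has zero length
  }
}

// What a removeAll helper saves you from writing: the search loop itself.
function removeAll(box, query) {
  for (const _ of box.search(query)) box.remove();
}

const box = new TextBox("1 and 2 and 3");
removeAll(box, / and/);
console.log(box.text); // 1 2 3
```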

replace(replace_with, replace_at)

Replaces the current match with an HTML string. For example:

let doc = "<!DOCTYPE html>this is a tesss<b><b></b></b>sssst!</div>";
let htb = new HTMLToolbox(doc);

for(const val of htb.search(/te(s+)t/gm)) {
	htb.replace("<div>experiment</div>");
}

console.log(htb.getHTML(null));

and the result would be:

<!DOCTYPE html>this is a <div>experiment</div>!

Note that you can access the match data through the yielded value:

let doc = "<div>1 and 2 and 3 and 4</div>";
let htb = new HTMLToolbox(doc);

for(const val of htb.search(/(\d) and/gm)) {
	const m = val.match;
	htb.replace(`${m[1]} or`);
}

console.log(htb.getHTML(null));

and the result would be:

<div>1 or 2 or 3 or 4</div>
Parameters:
Name Type Description
replace_with string

the new HTML string to be used as the replacement.

replace_at

the relative position of replacement. If set to HTMLToolbox.BEGIN, the HTML is inserted right before the start of the match (start_node at start_offset). If set to HTMLToolbox.END, it is inserted right after the end of the match (end_node at end_offset). Other values are not supported.

replaceAll(query, replace_with, replace_at)

A helper function that replaces all occurrences without requiring you to write a loop.

Parameters:
Name Type Description
query

a string or regexp to search for in the rendered text of the document. If not provided, the search traverses the whole document.

For more details on the search query see HTMLToolbox#search

replace_with string

the new HTML string to be used as the replacement.

replace_at

the relative position of replacement. If set to HTMLToolbox.BEGIN, the HTML is inserted right before the start of the match (start_node at start_offset). If set to HTMLToolbox.END, it is inserted right after the end of the match (end_node at end_offset). Other values are not supported.

search(query)

Searches for the given query and yields the results from a generator. You can process them in a loop, or by consecutive calls to the generator's .next() method.

let doc = "<!DOCTYPE html>this is tesss<b><b></b></b>sssst!</div>";
let htb = new HTMLToolbox(doc);

for(const val of htb.search(/te(s+)t/gm)) {
	htb.remove();
}
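Besides a for...of loop, a generator can be driven manually with .next(). The stand-in searchText below is hypothetical and matches a regex against a plain string rather than a document, but the consumption pattern is the standard JavaScript iterator protocol.

```javascript
// Hypothetical stand-in for search(): yields one { match } object per regex
// match in a plain string.
function* searchText(text, query) {
  const re = new RegExp(query.source, query.flags.replace("g", "") + "g");
  for (const m of text.matchAll(re)) yield { match: m };
}

// Pull results one at a time instead of using for...of.
const results = searchText("1 and 2 and 3", /\d/);
let step = results.next();
while (!step.done) {
  console.log(step.value.match.index, step.value.match[0]);
  step = results.next();
}
// 0 1
// 6 2
// 12 3
```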

Inside the loop body you may call one of the modifier functions: remove, insert, replace, wrap, setTag and setAttribute. The first four apply to the whole match, while setTag and setAttribute apply to specific nodes. See the documentation of each function for details.

The object that is returned by the generator has the following members:

match: {index, 0, 1, ...}, similar to a JavaScript regex match result. index is the position of the match in the rendered text, and the numbered entries hold the full match and its capture groups, as in an array.

start_node: the node of the tree that contains the beginning of the match.

end_node: the node that contains the end of the match.

start_offset: the index, within the start node's value, of the first character of the match.

end_offset: the index of the last character of the match within the end node.
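To make the offsets concrete, this sketch maps a match in the concatenated text back to (node, offset) pairs. Nodes are reduced to the strings they render, and end_offset is taken as the index of the last matched character, following the wording above; both are simplifying assumptions, and matchSpan is a hypothetical helper.

```javascript
// Returns the node index and character offset that contain position `pos`
// of the concatenated values (nodes reduced to their rendered strings).
function locate(values, pos) {
  let base = 0;
  for (let i = 0; i < values.length; i++) {
    if (pos < base + values[i].length) return { node: i, offset: pos - base };
    base += values[i].length;
  }
  throw new RangeError("position past the end of the rendered text");
}

// Hypothetical helper; assumes a non-empty match.
function matchSpan(values, regex) {
  const m = regex.exec(values.join(""));
  if (m === null) return null;
  const start = locate(values, m.index);
  const end = locate(values, m.index + m[0].length - 1); // last character, inclusive
  return {
    start_node: start.node, start_offset: start.offset,
    end_node: end.node, end_offset: end.offset,
  };
}

// "this is tesss" + "sssst!" renders as "this is tesssssssst!"
console.log(matchSpan(["this is tesss", "sssst!"], /te(s+)t/));
```

With the values ["abc"] and the regex /b/, start and end fall in the same node, with both offsets equal to 1.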

Nodes

Each node has properties such as attributes, name, rawName and type. The type may be Tag, Text or Sep. Text nodes are the leaves of the HTML tree. Sep nodes are the rendered separators; they cannot be modified. If you need a different separator for a specific tag, change the separators dictionary of the HTMLToolbox instance you are working with.

Each node also has a body that contains its separators and children. It also has next and prev, which point to the next and previous rendered nodes, and a parent for going up the hierarchy.
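The next links form a chain in rendering order, so following them from a node reproduces the text from that point on. The sketch below builds hypothetical node objects by hand, modeling only type, value, next and prev (value is assumed from the fields printTraverse prints), and walks the chain.

```javascript
// Hypothetical, minimal node objects. Real HTMLToolbox nodes carry more
// (name, attributes, body, parent).
function chain(specs) {
  const nodes = specs.map(([type, value]) => ({ type, value, next: null, prev: null }));
  for (let i = 1; i < nodes.length; i++) {
    nodes[i - 1].next = nodes[i]; // forward link
    nodes[i].prev = nodes[i - 1]; // backward link
  }
  return nodes;
}

// Follow next from a starting node and concatenate the rendered values.
function renderFrom(node) {
  let out = "";
  for (let n = node; n !== null; n = n.next) out += n.value;
  return out;
}

const nodes = chain([["Text", "Hello"], ["Sep", " "], ["Text", "world"]]);
console.log(renderFrom(nodes[0])); // Hello world
console.log(nodes[2].prev.type); // Sep
```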

Note that in some cases start_node and end_node may be the same node.

Parameters:
Name Type Description
query

a string or regexp to search for in the rendered text of the document. If not provided, the search traverses the whole document.

setAttribute(node, attr, value)

Sets the given attribute of the node to a new value.

For more details on the node structure, see HTMLToolbox#search

Parameters:
Name Type Description
node node

the target node, which can be obtained from the query match.

attr string

the target attribute.

value string

the new value.

setTag(node, tag)

Changes the tag of the given node to a new one.

For more details on the node structure, see HTMLToolbox#search

Parameters:
Name Type Description
node node

the target node, which can be obtained from the query match.

tag string

the new tag.

wrap(envelope)

Wraps the match inside an HTML string. Any HTML can be used as the envelope; mark the position of the match with <!/>, which will be replaced by the match.

Note that, at the moment, the leaves of the tree are wrapped, so neighboring nodes are wrapped separately.

Also, <!/> must be a valid HTML node; for example, it cannot be placed inside a tag definition.
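The placeholder rule can be modeled as a string substitution: the first <!/> in the envelope is replaced by the matched text. wrapText is a hypothetical helper; the real wrap() operates on tree nodes and wraps each leaf separately.

```javascript
// Hypothetical string-level model of the envelope rule.
function wrapText(envelope, matched) {
  if (!envelope.includes("<!/>")) throw new Error("envelope must contain <!/>");
  // A replacer function keeps "$" sequences in the match from being interpreted.
  return envelope.replace("<!/>", () => matched);
}

console.log(wrapText("<span>Number <!/></span>", "1")); // <span>Number 1</span>
```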

let doc = "<div>1 and 2 and 3 and 4</div>";
let htb = new HTMLToolbox(doc);

for(const val of htb.search(/\d/gm)) {
	htb.wrap(`<span>Number <!/></span>`);
}

console.log(htb.getHTML(null));

and the result would be:

<div><span>Number 1</span> and <span>Number 2</span> and <span>Number 3</span> and <span>Number 4</span></div>
Parameters:
Name Type Description
envelope string

the HTML string to wrap the query match.

wrapAll(query, envelope)

A helper function that wraps all occurrences without requiring you to write a loop.

Parameters:
Name Type Description
query

a string or regexp to search for in the rendered text of the document. If not provided, the search traverses the whole document.

For more details on the search query see HTMLToolbox#search

envelope string

the HTML string to wrap the query match.

For more details on the envelope see HTMLToolbox#wrap
