Code Documentation 3.4
Social Network Visualizer
Defines a class for network file loading and parsing.
#include <parser.h>


Classes | |
| struct | ParseConfig |
| ParseConfig boundary - the immutable config object. | |
Signals | |
| void | finished (QString) |
Public Member Functions | |
| Parser () | |
| ~Parser () | |
| void | setParseSink (SocNetV::IO::IGraphParseSink *sink) |
| void | setOwnedParseSink (std::unique_ptr< SocNetV::IO::IGraphParseSink > sink) |
| void | load (const QString &fileName, const QString &codecName, const int &defNodeSize, const QString &defNodeColor, const QString &defNodeShape, const QString &defNodeNumberColor, const int &defNodeNumberSize, const QString &defNodeLabelColor, const int &defNodeLabelSize, const QString &defEdgeColor, const int &canvasWidth, const int &canvasHeight, const int &format, const QString &delim=QString(), const int &sm_mode=1, const bool &sm_has_labels=false) |
| Loads the data of the given network file, and calls the relevant method to parse it. | |
| bool | parseAsPajek (const QByteArray &rawData) |
| Parse a Pajek-formatted network from raw bytes. | |
| bool | parseAsAdjacency (const QByteArray &rawData, const ParseConfig &cfg, const QString &delimiter) |
| Main function to parse adjacency-formatted data. | |
| bool | parseAsDot (const QByteArray &rawData) |
| Parses the data as GraphViz (DOT) formatted network. | |
| QString | preprocessDotContent (const QString &dotContent) |
| Preprocesses the content of a DOT file to normalize its formatting and improve parsing and readability. | |
| bool | parseAsGraphML (const QByteArray &rawData) |
| Parses the data as GraphML (not GML) formatted network. | |
| bool | parseAsGML (const QByteArray &rawData) |
| Parses the data as GML formatted network. | |
| bool | parseAsDL (const QByteArray &rawData) |
| Parses the given raw data as DL formatted (UCINET) data. | |
| bool | parseAsEdgeListSimple (const QByteArray &rawData, const QString &delimiter) |
| Parses the data as a simple edgelist-formatted network. | |
| bool | parseAsEdgeListWeighted (const QByteArray &rawData, const QString &delimiter) |
| Parses the data as a weighted edgelist-formatted network. | |
| bool | parseAsTwoModeSociomatrix (const QByteArray &rawData) |
| Parses a two-mode (bipartite) sociomatrix file (.2sm / .aff). | |
| bool | readDLKeywords (QStringList &strList, int &N, int &NM, int &NR, int &NC, bool &fullmatrixFormat, bool &edgelist1Format, bool &diagonalPresent) |
| Reads and parses DL keywords from a given QStringList. | |
| void | readDotProperties (QString str, qreal &, QString &label, QString &shape, QString &color, QString &fontName, QString &fontColor) |
| Reads the properties of a dot element with improved handling of quoted values. | |
| bool | readGraphML (QXmlStreamReader &) |
| Checks the xml token name and calls the appropriate function. | |
| void | readGraphMLElementGraph (QXmlStreamReader &) |
| Reads a graph definition. | |
| void | readGraphMLElementNode (QXmlStreamReader &) |
| Reads basic node attributes and sets the nodeNumber. | |
| void | endGraphMLElementNode (QXmlStreamReader &) |
| Signals to create a new node. | |
| void | readGraphMLElementEdge (QXmlStreamAttributes &) |
| Reads basic edge creation properties. | |
| void | endGraphMLElementEdge (QXmlStreamReader &) |
| Signals for a new edge to be created/added. | |
| void | readGraphMLElementData (QXmlStreamReader &) |
| Reads data for edges and nodes. | |
| void | readGraphMLElementUnknown (QXmlStreamReader &) |
| Trivial call for unknown elements. | |
| void | readGraphMLElementKey (QXmlStreamAttributes &) |
| Reads a key definition. | |
| void | readGraphMLElementDefaultValue (QXmlStreamReader &) |
| Reads default key values. | |
| void | readGraphMLElementNodeGraphics (QXmlStreamReader &) |
| Reads node graphics data and properties: label, color, shape, size, coordinates, etc. | |
| void | readGraphMLElementEdgeGraphics (QXmlStreamReader &) |
| Reads edge graphics data and properties: path, linestyle, width, arrows, etc. | |
| void | createMissingNodeEdges () |
| Creates any missing node edges. | |
| bool | isComment (QString str) |
| Helper. Checks if the string parameter is a comment (starts with a known character, e.g. #). | |
| void | createRandomNodes (const int &fixedNum=1, const QString &label=QString(), const int &newNodes=1) |
| Signals to create either a single new node (numbered fixedNum) or multiple new nodes (numbered from 1 to newNodes). | |
Static Public Member Functions | |
| static QString | normalizeQuotedIdentifier (const QString &s) |
| Normalizes a quoted identifier from external network formats. | |
Private Member Functions | |
| bool | validateAndInitialize (const QByteArray &rawData, const QString &delimiter, const bool &sm_has_labels, QStringList &nodeLabels) |
| void | resetCounters () |
| bool | doParseAdjacency (QTextStream &ts, const QString &delimiter, const QStringList &nodeLabels) |
| void | createNodeWithDefaults (int nodeIndex, const QString &label) |
| bool | createEdgesForRow (const QStringList &currentRow, int rowIndex) |
| bool | containsReservedKeywords (const QString &str) const |
Defines a class for network file loading and parsing.
Supports GraphML, GML, Pajek, adjacency matrix, GraphViz (DOT), UCINET (DL), edge list and two-mode sociomatrix formats.
| Parser::Parser | ( | ) |
| Parser::~Parser | ( | ) |
| bool Parser::containsReservedKeywords | ( | const QString & | str | ) const | private |
Checks if the given string contains any reserved keywords. Reserved keywords suggest the file is not adjacency-formatted but in another graph format. Parsing is aborted if a reserved keyword is found.
| str | The string to check for keywords. |
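The reserved-keyword check can be pictured as a case-insensitive substring search over each line. The sketch below is an illustration only: the keyword list is hypothetical (the actual list is not documented here), `containsReservedKeyword` is a hypothetical free function, and it uses `std::string` instead of `QString`.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical keyword list: markers of other formats (GraphML, Pajek, DOT, GML, DL).
// The real Parser may use a different set.
const std::vector<std::string> kReservedKeywords = {
    "graphml", "*network", "*vertices", "digraph", "graph [", "dl n="};

// Returns true if `line` contains any reserved keyword (case-insensitive).
bool containsReservedKeyword(const std::string& line) {
    std::string lower(line);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    for (const auto& kw : kReservedKeywords) {
        if (lower.find(kw) != std::string::npos) return true;
    }
    return false;
}
```

A line such as `*Vertices 5` is flagged, so adjacency parsing would stop, while a numeric row like `0 1 0 1` passes.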
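| bool Parser::createEdgesForRow | ( | const QStringList & | currentRow, |
| int | rowIndex | ||
| ) | private |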
Iterates through a row of the adjacency matrix to create edges. Emits a signal for each non-zero weight to create an edge between nodes. Parsing is aborted immediately if any invalid data is encountered.
| currentRow | The adjacency matrix row being processed. |
| rowIndex | The index of the row (source node for edges). |
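The per-row edge logic described above can be sketched as follows. This is a minimal sketch under stated assumptions: a `std::function` callback stands in for the Parser's edge-creation signal, cells are `std::string` rather than `QString`, and column indices are assumed to map to 1-based target node numbers.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical callback standing in for the Parser's "create edge" signal.
using EdgeSink = std::function<void(int source, int target, double weight)>;

// Walks one adjacency-matrix row. rowIndex is the 1-based source node;
// each non-zero cell j (0-based) yields an edge rowIndex -> j + 1.
// Returns false (abort) on the first cell that is not a valid number.
bool createEdgesForRow(const std::vector<std::string>& row, int rowIndex,
                       const EdgeSink& emitEdge) {
    for (std::size_t j = 0; j < row.size(); ++j) {
        std::istringstream cell(row[j]);
        double weight = 0.0;
        // Reject cells that fail to parse or carry trailing garbage (e.g. "1x").
        if (!(cell >> weight) || !(cell >> std::ws).eof()) {
            return false;  // invalid data: abort immediately
        }
        if (weight != 0.0) {
            emitEdge(rowIndex, static_cast<int>(j) + 1, weight);
        }
    }
    return true;
}
```

Returning on the first bad cell mirrors the documented "abort immediately" contract; zero cells emit nothing.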
| void Parser::createMissingNodeEdges | ( | ) |
Creates any missing node edges.
| void Parser::createNodeWithDefaults | ( | int | nodeIndex, |
| const QString & | label | ||
| ) | private |
Emits a signal to create a node with the specified index and label, and default node properties. Assigns a random position for the node within the graph dimensions.
| nodeIndex | Index of the node to create. |
| label | Label for the node (numerical or custom). |
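Assigning a random position within the graph dimensions amounts to drawing x and y uniformly inside the canvas. The sketch below uses `std::mt19937` and `std::uniform_real_distribution` as a swapped-in technique (the Parser itself may use a Qt random generator); `randomNodePosition`, `Point` and the 10-pixel border margin are hypothetical.

```cpp
#include <cassert>
#include <random>

struct Point { double x; double y; };

// Picks a uniformly random position inside a canvasWidth x canvasHeight area,
// keeping `margin` pixels away from each border (the margin is an assumption).
// Precondition: both canvas dimensions exceed 2 * margin.
Point randomNodePosition(double canvasWidth, double canvasHeight,
                         std::mt19937& rng, double margin = 10.0) {
    std::uniform_real_distribution<double> dx(margin, canvasWidth - margin);
    std::uniform_real_distribution<double> dy(margin, canvasHeight - margin);
    return {dx(rng), dy(rng)};
}
```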
| void Parser::createRandomNodes | ( | const int & | fixedNum = 1, |
| const QString & | label = QString(), |
| const int & | newNodes = 1 |
| ) |
Signals to create either a single new node (numbered fixedNum) or multiple new nodes (numbered from 1 to newNodes).
| fixedNum | Number of the node to create when a single node is requested. |
| label | Label of the new node (optional). |
| newNodes | Number of new nodes to create, numbered from 1 to newNodes. |
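| bool Parser::doParseAdjacency | ( | QTextStream & | ts, |
| const QString & | delimiter, | ||
| const QStringList & | nodeLabels | ||
| ) | private |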
Processes the adjacency matrix file to create nodes and edges. Reads each line of the matrix, creates nodes for the first row, and creates edges for subsequent rows. Uses nodeLabels to assign labels to nodes if provided. Parsing is aborted immediately if any issue is encountered.
| ts | QTextStream of the decoded adjacency matrix file. |
| delimiter | Delimiter used to split rows and columns. |
| nodeLabels | List of node labels (optional). If empty, numeric labels are used. |
| void Parser::endGraphMLElementEdge | ( | QXmlStreamReader & | xml | ) |
Signals for a new edge to be created/added.
Called at the end of an edge element.
| xml | The XML stream reader. |
| void Parser::endGraphMLElementNode | ( | QXmlStreamReader & | xml | ) |
Signals to create a new node.
Called at the end of a node element.
| xml | The XML stream reader. |
| void Parser::finished | ( | QString | | ) | signal |
| bool Parser::isComment | ( | QString | str | ) |
Helper. Checks if the string parameter is a comment (starts with a known character, e.g. #).
| str | The string to check. |
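A comment check of this kind reduces to inspecting the first significant character of the line. In this hypothetical free-function sketch, `#` comes from the documentation, while `%` as a second marker and the skipping of leading spaces and tabs are assumptions, not confirmed Parser behavior.

```cpp
#include <cassert>
#include <string>

// Returns true if the first non-whitespace character of `line` is a comment marker.
// '#' comes from the documentation; '%' is an assumed extra marker.
bool isComment(const std::string& line) {
    const auto pos = line.find_first_not_of(" \t");
    if (pos == std::string::npos) return false;  // blank line: not a comment
    const char c = line[pos];
    return c == '#' || c == '%';
}
```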
| void Parser::load | ( | const QString & | fileName, |
| const QString & | codecName, | ||
| const int & | defNodeSize, | ||
| const QString & | defNodeColor, | ||
| const QString & | defNodeShape, | ||
| const QString & | defNodeNumberColor, | ||
| const int & | defNodeNumberSize, | ||
| const QString & | defNodeLabelColor, | ||
| const int & | defNodeLabelSize, | ||
| const QString & | defEdgeColor, | ||
| const int & | canvasWidth, | ||
| const int & | canvasHeight, | ||
| const int & | format, | ||
| const QString & | delim = QString(), |
| const int & | sm_mode = 1, |
| const bool & | sm_has_labels = false |
| ) |
Loads the data of the given network file, and calls the relevant method to parse it.
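| fileName | Path of the network file to load. |
| codecName | Name of the text codec used to decode the file. |
| defNodeSize | Default node size. |
| defNodeColor | Default node color. |
| defNodeShape | Default node shape. |
| defNodeNumberColor | Default color of node numbers. |
| defNodeNumberSize | Default size of node numbers. |
| defNodeLabelColor | Default color of node labels. |
| defNodeLabelSize | Default size of node labels. |
| defEdgeColor | Default edge color. |
| canvasWidth | Width of the canvas. |
| canvasHeight | Height of the canvas. |
| format | Identifier of the network file format. |
| delim | Delimiter for delimited formats (e.g. adjacency matrix, edge lists). |
| sm_mode | Two-mode sociomatrix mode (1 = bipartite, 2 = person projection, 3 = event projection). |
| sm_has_labels | Whether the first comment line of an adjacency matrix holds node labels. |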
| QString Parser::normalizeQuotedIdentifier | ( | const QString & | s | ) | static |
Normalizes a quoted identifier from external network formats.
Some formats (e.g. Pajek) use quotes in headers as syntactic delimiters, such as:
Quotes are part of the file syntax and must not become part of the internal relation name. This function:
It does NOT collapse internal whitespace.
| s | Raw identifier substring extracted from the file. |
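One plausible reading of this normalization is: trim the substring, strip one pair of surrounding double quotes, and trim again, leaving internal whitespace alone. The exact steps are not listed above, so the sketch below is an assumption; it uses `std::string` in place of `QString`, and `trim` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Trims leading/trailing whitespace (spaces, tabs, CR, LF).
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Trims, strips ONE matching pair of surrounding double quotes, trims again.
// Internal whitespace is left untouched.
std::string normalizeQuotedIdentifier(const std::string& raw) {
    std::string s = trim(raw);
    if (s.size() >= 2 && s.front() == '"' && s.back() == '"') {
        s = trim(s.substr(1, s.size() - 2));
    }
    return s;
}
```

For example, `  "Friends  at work" ` becomes `Friends  at work`, with the double space preserved.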
| bool Parser::parseAsAdjacency | ( | const QByteArray & | rawData, |
| const ParseConfig & | cfg, | ||
| const QString & | delimiter | ||
| ) |
Main function to parse adjacency-formatted data.
Validates the format, resets internal counters, and processes the file to create nodes and edges from an adjacency matrix representation.
If cfg.sm_has_labels is true, the first comment line is treated as node labels.
NOTE: Parsing is aborted if any invalid data is encountered.
Example of a supported adjacency matrix file with node labels:
In this example:
| rawData | Raw input data as QByteArray. |
| cfg | Parser configuration (contains format flags and defaults, including sm_has_labels). |
| delimiter | Delimiter used to split rows and columns. |
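Reading node labels from the first comment line can be sketched as below. The label-line layout (`#` followed by delimiter-separated names) is an assumption, since the original example file is not reproduced here; `readAdjacencyLabels` and `splitFields` are hypothetical, and `std::string` replaces `QString`.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits `s` on a single-character delimiter, dropping empty fields.
static std::vector<std::string> splitFields(const std::string& s, char delim) {
    std::vector<std::string> out;
    std::istringstream in(s);
    std::string field;
    while (std::getline(in, field, delim)) {
        if (!field.empty()) out.push_back(field);
    }
    return out;
}

// If hasLabels is set and the first line is a '#' comment, returns the labels
// it lists; otherwise returns an empty vector (numeric labels are then used).
std::vector<std::string> readAdjacencyLabels(const std::string& text, char delim,
                                             bool hasLabels) {
    if (!hasLabels) return {};
    const std::string firstLine = text.substr(0, text.find('\n'));
    if (firstLine.empty() || firstLine[0] != '#') return {};
    return splitFields(firstLine.substr(1), delim);
}
```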
| bool Parser::parseAsDL | ( | const QByteArray & | rawData | ) |
Parses the given raw data as DL formatted (UCINET) data.
This function reads and interprets a DL formatted file, which is a format used by UCINET. It processes the file line by line, extracting relevant information such as node labels, edge weights, and network properties. The function supports both fullmatrix and edgelist1 formats, and can handle two-mode networks.
| rawData | The raw data to be parsed, provided as a QByteArray. |
The function performs the following steps:
| bool Parser::parseAsDot | ( | const QByteArray & | rawData | ) |
Parses the data as GraphViz (DOT) formatted network.
IMPORTANT CONTRACTS / INVARIANTS
Previous bug:
Fix:
| rawData | Raw file bytes. |
| bool Parser::parseAsEdgeListSimple | ( | const QByteArray & | rawData, |
| const QString & | delimiter | ||
| ) |
Parses the data as a simple edgelist-formatted network.
| rawData | Raw file bytes. |
| delimiter | Delimiter separating the fields of each line. |
| bool Parser::parseAsEdgeListWeighted | ( | const QByteArray & | rawData, |
| const QString & | delimiter | ||
| ) |
Parses the data as a weighted edgelist-formatted network.
This method can read and parse edgelist-formatted files where edge sources and targets are named either with numbers or with labels. That is, the following formats can be parsed:
1 2 1
1 3 2
1 6 2
1 8 2
...

actor1 actor2 1
actor2 actor4 2
actor1 actor3 1
actorX actorY 3
name othername 1
othername somename 2
...
| rawData | Raw file bytes. |
| delimiter | Delimiter separating the fields of each line. |
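The labeled variant of the weighted edge list can be sketched by assigning node numbers to names in order of first appearance. This is a sketch under assumptions: `parseWeightedEdgeList` and `WeightedEdge` are hypothetical, fields are whitespace-separated, and every endpoint is treated as a label (a file that uses node numbers would more naturally keep those numbers as-is).

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct WeightedEdge { int source; int target; double weight; };

// Reads "source target weight" lines. Each distinct endpoint name gets the next
// node number (1, 2, 3, ...) in order of first appearance.
// Returns false on a line that lacks a target or a numeric weight.
bool parseWeightedEdgeList(const std::string& text,
                           std::vector<WeightedEdge>& edges,
                           std::map<std::string, int>& nodeIds) {
    auto idFor = [&nodeIds](const std::string& name) {
        auto it = nodeIds.find(name);
        if (it != nodeIds.end()) return it->second;
        const int id = static_cast<int>(nodeIds.size()) + 1;
        nodeIds.emplace(name, id);
        return id;
    };
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string src, tgt;
        double w = 0.0;
        if (!(fields >> src)) continue;           // skip blank lines
        if (!(fields >> tgt >> w)) return false;  // malformed line
        edges.push_back({idFor(src), idFor(tgt), w});
    }
    return true;
}
```

With the actor example, `actor1`, `actor2`, `actor4` and `actor3` become nodes 1, 2, 3 and 4 respectively.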
| bool Parser::parseAsGML | ( | const QByteArray & | rawData | ) |
Parses the data as GML formatted network.
This parser is line/state based. Many GML files are "compact" and place multiple attributes on the same line, e.g.:
node [ id 1 label "1" ]
while others use the expanded form:
node [
  id 1
  label "1"
]
To support both forms, we preprocess the decoded input into a normalized stream where '[' and ']' stand on their own lines and each attribute ("key value" pair) is on its own line.
We then run the existing state machine unchanged (but now it receives one attribute per line).
Also accepts both "weight" and "value" as edge weight keys.
| rawData | Raw file bytes. |
Returns true if ch is whitespace (space, tab, etc.).
Tokenizes a line into tokens while respecting quoted strings.
Example: id 1 label "John Doe" becomes tokens: ["id","1","label","\"John Doe\""]
Quotes are preserved as part of the token.
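The tokenizing rule above (split on whitespace, keep quoted strings whole with their quotes) can be sketched as a single pass with an in-quotes flag. `tokenizeGmlLine` is a hypothetical name, the sketch uses `std::string`, and escaped quotes inside strings are not handled (an assumption).

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Splits a GML line on whitespace, keeping double-quoted strings as single
// tokens with their quotes preserved. Escaped quotes are not handled.
std::vector<std::string> tokenizeGmlLine(const std::string& line) {
    std::vector<std::string> tokens;
    std::string current;
    bool inQuotes = false;
    for (char ch : line) {
        if (ch == '"') {
            inQuotes = !inQuotes;
            current += ch;  // the quote stays part of the token
        } else if (!inQuotes && std::isspace(static_cast<unsigned char>(ch))) {
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current += ch;
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}
```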
Normalizes GML text for a line-based parser.
1) Makes '[' and ']' standalone lines (outside quoted strings).
2) Splits compact attribute runs into "key value" lines using a known-key set.
This is intentionally conservative: we only split on keys we know how to parse. Unknown tokens are left as-is.
| bool Parser::parseAsGraphML | ( | const QByteArray & | rawData | ) |
Parses the data as GraphML (not GML) formatted network.
| rawData | Raw file bytes. |
| bool Parser::parseAsPajek | ( | const QByteArray & | rawData | ) |
Parse a Pajek-formatted network from raw bytes.
Supported constructs include (depending on the file contents):
*Network header
*Vertices N (node definitions with optional attributes)
*Arcs, *Edges
*Matrix :k
*Matrix :k "Label"
*Matrix k: "Label"
*Matrix :k / *Matrix k:
Behavior:
… errorMessage and returns false.
| rawData | Entire file contents (as read from disk). |
Internally, the parser proceeds in these steps:
Split each line on whitespace into several elements.
Read nodes, then edges/arcs.
Node label.
Node shape: there are five possible .
Node colors.
Read node coordinates.
Edges.
Arcs.
Arcs list.
Matrix.
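Recognizing the Pajek section headers listed above comes down to a case-insensitive prefix test on lines that start with `*`. In this sketch, `PajekSection` and `classifyPajekHeader` are hypothetical, and `*Edgeslist` is included from general knowledge of the Pajek format. The longer `*arcslist` and `*edgeslist` prefixes are checked before `*arcs` and `*edges` so that they are not misclassified.

```cpp
#include <cassert>
#include <cctype>
#include <string>

enum class PajekSection { Network, Vertices, Arcs, Edges, ArcsList, EdgesList, Matrix, Unknown };

// Classifies a Pajek header line (case-insensitive prefix match).
PajekSection classifyPajekHeader(const std::string& line) {
    std::string lower;
    for (char c : line) lower += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    auto startsWith = [&lower](const char* prefix) { return lower.rfind(prefix, 0) == 0; };
    if (startsWith("*network")) return PajekSection::Network;
    if (startsWith("*vertices")) return PajekSection::Vertices;
    if (startsWith("*arcslist")) return PajekSection::ArcsList;    // before "*arcs"
    if (startsWith("*edgeslist")) return PajekSection::EdgesList;  // before "*edges"
    if (startsWith("*arcs")) return PajekSection::Arcs;
    if (startsWith("*edges")) return PajekSection::Edges;
    if (startsWith("*matrix")) return PajekSection::Matrix;
    return PajekSection::Unknown;
}
```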
| bool Parser::parseAsTwoModeSociomatrix | ( | const QByteArray & | rawData | ) |
Parses a two-mode (bipartite) sociomatrix file (.2sm / .aff).
The file contains a rectangular NR × NC binary (or weighted) matrix where rows = Mode 1 actors (persons, CEOs, …) and columns = Mode 2 actors (events, clubs, …).
Behaviour is controlled by two_sm_mode (set from ParseConfig::sm_mode, default = 1):
two_sm_mode == 1 → Bipartite graph (default).
  Creates NR + NC nodes and one undirected edge per non-zero cell.
  Mode 1 nodes: numbers 1..NR, labels "p1".."pNR", initNodeColor / initNodeShape.
  Mode 2 nodes: numbers NR+1..NR+NC, labels "e1".."eNC", "SkyBlue" / "diamond".
two_sm_mode == 2 → Person (Mode-1) projection [B × Bᵀ].
  Creates NR nodes. Connects person i and person k with an undirected edge whenever they share at least one event (co-membership).
two_sm_mode == 3 → Event (Mode-2) projection [Bᵀ × B].
  Creates NC nodes. Connects event j and event l with an undirected edge whenever they share at least one person.
| rawData | Raw bytes of the file. |
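The person projection (mode 2) links persons i and k when the (i, k) entry of B × Bᵀ is non-zero. The sketch below computes that entry directly for a binarized B by counting shared events, and lists each undirected edge once. `personProjection` and `Matrix` are hypothetical names, and rows are assumed to have equal length. The event projection (mode 3) is the same computation applied to the transposed matrix.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Person (Mode-1) projection of an NR x NC affiliation matrix B.
// `shared` counts events attended by both i and k, which equals
// (B * B^T)[i][k] when B is 0/1. Returns 1-based pairs (i, k) with i < k.
std::vector<std::pair<int, int>> personProjection(const Matrix& B) {
    std::vector<std::pair<int, int>> edges;
    const std::size_t nr = B.size();
    for (std::size_t i = 0; i < nr; ++i) {
        for (std::size_t k = i + 1; k < nr; ++k) {
            int shared = 0;
            for (std::size_t j = 0; j < B[i].size(); ++j) {
                shared += (B[i][j] != 0 && B[k][j] != 0) ? 1 : 0;
            }
            if (shared > 0) {
                edges.emplace_back(static_cast<int>(i) + 1, static_cast<int>(k) + 1);
            }
        }
    }
    return edges;
}
```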
| QString Parser::preprocessDotContent | ( | const QString & | dotContent | ) |
Preprocesses the content of a DOT file to normalize its formatting and improve parsing and readability.
This function performs several transformations on the input DOT content:
… { and before closing braces }.
… ] and semicolon ;.
… ; between consecutive node definitions.
… -> and --).
| dotContent | The original content of the DOT file as a QString. |
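One of the transformations concerns line breaks around opening braces `{` and closing braces `}`. The sketch below illustrates only that idea: it places a newline after each `{` and before each `}` found outside double-quoted strings. It is an assumption about the transformation, not the actual preprocessing code. `breakDotBraces` is hypothetical, and `std::string` replaces `QString`.

```cpp
#include <cassert>
#include <string>

// Puts a newline after every '{' and before every '}' that appear outside
// double-quoted strings. Other normalizations are not attempted here.
std::string breakDotBraces(const std::string& dot) {
    std::string out;
    bool inQuotes = false;
    for (char ch : dot) {
        if (ch == '"') inQuotes = !inQuotes;
        if (!inQuotes && ch == '}') out += '\n';
        out += ch;
        if (!inQuotes && ch == '{') out += '\n';
    }
    return out;
}
```

Braces inside quoted labels, such as `label="{x}"`, are left untouched.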
| bool Parser::readDLKeywords | ( | QStringList & | strList, |
| int & | N, | ||
| int & | NM, | ||
| int & | NR, | ||
| int & | NC, | ||
| bool & | fullmatrixFormat, | ||
| bool & | edgelist1Format, | ||
| bool & | diagonalPresent | ||
| ) |
Reads and parses DL keywords from a given QStringList.
This function processes a list of strings to extract and interpret DL keywords. It updates the provided references with the parsed values.
| strList | A reference to a QStringList containing the DL keywords. |
| N | A reference to an integer to store the parsed value of 'N' (number of nodes). |
| NM | A reference to an integer to store the parsed value of 'NM' (number of matrices). |
| NR | A reference to an integer to store the parsed value of 'NR' (number of rows). |
| NC | A reference to an integer to store the parsed value of 'NC' (number of columns). |
| fullmatrixFormat | A reference to a boolean to indicate if the format is 'FULLMATRIX'. |
| edgelist1Format | A reference to a boolean to indicate if the format is 'EDGELIST1'. |
| diagonalPresent | A reference to a boolean to indicate if the diagonal is present. |
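Extracting keywords from a DL header line such as `DL N=5 FORMAT=FULLMATRIX` can be sketched as splitting on whitespace and then on `=`. The sketch assumes `KEY=VALUE` tokens without spaces around `=`, although real UCINET files allow more variation. `readDLHeader` is a hypothetical name, and it returns a map instead of filling reference parameters.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Parses a DL header such as "DL N=5 FORMAT=FULLMATRIX" into upper-cased
// KEY -> VALUE pairs. Bare keywords (e.g. "DL") map to an empty value.
std::map<std::string, std::string> readDLHeader(const std::string& header) {
    std::map<std::string, std::string> keys;
    std::istringstream in(header);
    std::string token;
    while (in >> token) {
        for (auto& c : token) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        const auto eq = token.find('=');
        if (eq == std::string::npos) {
            keys[token] = "";
        } else {
            keys[token.substr(0, eq)] = token.substr(eq + 1);
        }
    }
    return keys;
}
```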
| void Parser::readDotProperties | ( | QString | str, |
| qreal & | nValue, | ||
| QString & | label, | ||
| QString & | shape, | ||
| QString & | color, | ||
| QString & | fontName, | ||
| QString & | fontColor | ||
| ) |
Reads the properties of a dot element with improved handling of quoted values.
| str | String containing properties (format: "prop1=value1, prop2=value2, ...") |
| nValue | Output variable for numeric value property |
| label | Output variable for label property |
| shape | Output variable for shape property |
| color | Output variable for color property |
| fontName | Output variable for font name property |
| fontColor | Output variable for font color property |
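The "prop1=value1, prop2=value2" format with quoted values can be handled by splitting on commas only when outside double quotes, then stripping the quotes from each value. This sketch returns a key/value map rather than filling the output references. `readDotProperties` here is a hypothetical free function using `std::string`, and only comma-separated properties are handled.

```cpp
#include <cassert>
#include <map>
#include <string>

// Trims spaces and tabs from both ends.
static std::string trimSpaces(const std::string& s) {
    const auto b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Splits "prop1=value1, prop2=\"a, b\"" into a key -> value map.
// Commas inside double quotes do not split; surrounding quotes are removed.
std::map<std::string, std::string> readDotProperties(const std::string& str) {
    std::map<std::string, std::string> props;
    std::string item;
    bool inQuotes = false;
    auto flush = [&]() {
        const auto eq = item.find('=');
        if (eq != std::string::npos) {
            const std::string key = trimSpaces(item.substr(0, eq));
            std::string value = trimSpaces(item.substr(eq + 1));
            if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
                value = value.substr(1, value.size() - 2);
            props[key] = value;
        }
        item.clear();
    };
    for (char ch : str) {
        if (ch == '"') inQuotes = !inQuotes;
        if (ch == ',' && !inQuotes) flush();
        else item += ch;
    }
    flush();  // last property
    return props;
}
```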
| bool Parser::readGraphML | ( | QXmlStreamReader & | xml | ) |
Checks the xml token name and calls the appropriate function.
| xml | The XML stream reader. |
| void Parser::readGraphMLElementData | ( | QXmlStreamReader & | xml | ) |
Reads data for edges and nodes.
Called at a data element (usually nested inside a node or an edge element).
| xml | The XML stream reader. |
| void Parser::readGraphMLElementDefaultValue | ( | QXmlStreamReader & | xml | ) |
Reads default key values.
Called at a default element (usually nested inside key element)
| xml | The XML stream reader. |
| void Parser::readGraphMLElementEdge | ( | QXmlStreamAttributes & | xmlStreamAttr | ) |
Reads basic edge creation properties.
Called at the start of an edge element.
| xmlStreamAttr | Attributes of the edge element. |
| void Parser::readGraphMLElementEdgeGraphics | ( | QXmlStreamReader & | xml | ) |
Reads edge graphics data and properties: path, linestyle,width, arrows, etc.
| xml | The XML stream reader. |
| void Parser::readGraphMLElementGraph | ( | QXmlStreamReader & | xml | ) |
Reads a graph definition.
| xml | The XML stream reader. |
| void Parser::readGraphMLElementKey | ( | QXmlStreamAttributes & | xmlStreamAttr | ) |
Reads a key definition.
Called at a key element.
| xmlStreamAttr | Attributes of the key element. |
| void Parser::readGraphMLElementNode | ( | QXmlStreamReader & | xml | ) |
Reads basic node attributes and sets the nodeNumber.
Called at the start of a node element.
| xml | The XML stream reader. |
| void Parser::readGraphMLElementNodeGraphics | ( | QXmlStreamReader & | xml | ) |
Reads node graphics data and properties: label, color, shape, size, coordinates, etc.
| xml | The XML stream reader. |
| void Parser::readGraphMLElementUnknown | ( | QXmlStreamReader & | xml | ) |
Trivial call for unknown elements.
| xml | The XML stream reader. |
| void Parser::resetCounters | ( | ) | private |
Resets counters and data structures used during parsing. Clears relations and resets node and edge counters to ensure a clean state.
| void Parser::setOwnedParseSink | ( | std::unique_ptr< SocNetV::IO::IGraphParseSink > | sink | ) |
| void Parser::setParseSink | ( | SocNetV::IO::IGraphParseSink * | sink | ) |
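| bool Parser::validateAndInitialize | ( | const QByteArray & | rawData, |
| const QString & | delimiter, | ||
| const bool & | sm_has_labels, | ||
| QStringList & | nodeLabels | ||
| ) | private |
Validates the adjacency matrix file format and, optionally, gets node labels from the first line (if it is a comment line). Checks for reserved keywords, row consistency, and appropriate delimiters in the first 11 rows. Parsing is aborted immediately if any issue is encountered.
| rawData | Raw input data as QByteArray. |
| delimiter | Delimiter used to split rows and columns. |
| sm_has_labels | Whether the first comment line holds node labels. |
| nodeLabels | Output list of node labels read from the first line. |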
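The row-consistency part of this validation can be sketched as checking that the first 11 data rows all split into the same number of fields. Skipping blank lines and `#` comment lines is an assumption, and the reserved-keyword and delimiter checks are left out. `firstRowsConsistent` and `countFields` are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Counts non-empty fields in `line` separated by `delim`.
static std::size_t countFields(const std::string& line, char delim) {
    std::istringstream in(line);
    std::string field;
    std::size_t n = 0;
    while (std::getline(in, field, delim)) {
        if (!field.empty()) ++n;
    }
    return n;
}

// Returns true if the first `maxRows` data rows (blank lines and '#' comment
// lines skipped) all contain the same number of fields. Empty input fails.
bool firstRowsConsistent(const std::string& text, char delim, std::size_t maxRows = 11) {
    std::istringstream lines(text);
    std::string line;
    std::size_t expected = 0, seen = 0;
    while (seen < maxRows && std::getline(lines, line)) {
        if (line.empty() || line[0] == '#') continue;
        const std::size_t n = countFields(line, delim);
        if (seen == 0) expected = n;
        else if (n != expected) return false;
        ++seen;
    }
    return seen > 0;
}
```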