Class: Legion::Extensions::MicrosoftTeams::LocalCache::RecordParser
- Inherits: Object
  - Object
  - Legion::Extensions::MicrosoftTeams::LocalCache::RecordParser
- Defined in: lib/legion/extensions/microsoft_teams/local_cache/record_parser.rb
Overview
Parses Chromium IndexedDB values from Teams LevelDB records. Each string value is introduced by the marker byte 0x22 (double-quote), followed by a varint length and then the string bytes. Teams stores conversation objects as sequential key-value string pairs.
Gotchas:
- Boolean fields (isSanitized, isModerator, etc.) have non-string values
that get skipped by the string extractor, causing the next field name
to appear immediately.
- HTML content strings get split on internal 0x22 bytes (from HTML attributes
like href="..."), producing multiple string fragments for one content field.
- Field names are well-known and can be used to detect key vs value.
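The layout described in the overview (marker byte, varint length, string bytes) can be illustrated with a small encoder. This is a minimal sketch, not part of the extension: `encode_short_string` is a hypothetical helper, and it assumes strings shorter than 128 bytes so that the length fits in a single varint byte.

```ruby
# Hypothetical helper (not in RecordParser): encode one string as
# 0x22 marker + one-byte length + raw bytes.
# Assumption: the string is shorter than 128 bytes, so the varint is one byte.
def encode_short_string(str)
  bytes = str.b
  raise ArgumentError, 'too long for a one-byte varint' if bytes.bytesize >= 0x80

  [0x22, bytes.bytesize].pack('C2') + bytes
end

# A key followed by its value, as in "sequential key-value string pairs".
record = encode_short_string('id') + encode_short_string('19:abc@thread.v2')

puts record.bytes.first(4).inspect # marker, length, then the bytes of "id"
```

A buffer built this way is the kind of input `extract_strings` scans: it looks for 0x22, reads the length, and slices out the string.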
Constant Summary
- KNOWN_FIELDS =
Known field names in Teams conversation records. Used to distinguish field names from field values in the string stream.
Set.new(%w[ id source version type content contentHash isSanitized messagetype messageType contenttype contentType activitytype activityType clientmessageid clientMessageId sequenceId prioritizeimdisplayname prioritizeImDisplayName imdisplayname fromDisplayNameInToken fromFamilyNameInToken fromGivenNameInToken fromAgentIdentityBlueprintId properties mentions cards importance subject title links files formatVariant languageStamp draftDetails innerThreadId state inlineImages callId composetime composeTime originalarrivaltime originalArrivalTime from fromUserId conversationLink skypeguid translation deletionInfo annotationsSummary threadtype threadType postType dlpData crossPostData callLogsOwnerId sendPipelineStatus streamingMetadata originalParentMessageId skypeeditedid importMetadata recipientId isPlainTextConvertedToHtml clientArrivalTime lastMessage members botMembers rosterVersion rosterSummary nonFilteredLastMessageTimeUtc __typename localClientId memberProperties memberExpirationTime role explicitlyAdded isModerator isFollowing isReader channelOnlyMember messages lastMessageTimeUtc detailsVersion consumptionHorizonForPinnedMessages consumptionhorizon consumptionHorizonBookmark rclch rclchBookmark lastTimeFavorited favorite ispinned lastimportantimreceivedtime lasturgentimreceivedtime isfollowed followAllRc notifyAllRc collapsed isGeneralChannelFavorite pinnedVersion pinnedOrder hasMessageDraft targetLink teamId threadProperties topic topicThreadTopic spaceThreadTopic spaceThreadVersion description favDefault channelDocsFolderRelativeUrl channelDocsDocumentLibraryId sharepointRootLibrary isdeleted tenantid creator retentionHorizon retentionHorizonV2 sharedInSpaces spaceId gapDetectionEnabled createdat groupId extensionDefinitionContainer lastjoinat lastleaveat chatModalityType threadingMode csav1 teamSmtpAddress spaceType spaceTypes classification dynamicMembership isMaxMemberLimitExceeded isTeamLocked isUnlockMembershipSyncRequired picture pictureETag 
sharepointSiteUrl notebookId sensitivityLabelDisplayName sensitivityLabelId sensitivityLabelName sensitivityLabelToolTip sensitivityLabelParentDisplayName sensitivityLabelParentName sensitivityLabelParentTooltip sensitivityLabelIsCopyBlocked teamStatus spaceAdminSettings visibility topics threadVersion lastContentMessageTime identityMaskEnabled lastL2MessageIdNotFromSelf parentId clientUpdateTime isMigrated chatSubType conversationId replyChainId latestDeliveryTime parentMessageVersion messageMap dedupeKey parentMessageId searchKey edittime skypeGuid isConversationLastMessage isConversationLastMessageSanitized originalNonLieMessage hasAnnotated messageSearchKey ]).freeze
- BOOLEAN_FIELDS =
Fields that have boolean or numeric values (not strings). When we see these, the next string is NOT their value — it’s the next field.
Set.new(%w[ isSanitized isModerator isFollowing isReader channelOnlyMember explicitlyAdded hasMessageDraft ispinned isfollowed collapsed isGeneralChannelFavorite favDefault isdeleted isMaxMemberLimitExceeded isTeamLocked isUnlockMembershipSyncRequired isPlainTextConvertedToHtml gapDetectionEnabled dynamicMembership identityMaskEnabled sensitivityLabelIsCopyBlocked isMigrated prioritizeimdisplayname prioritizeImDisplayName isConversationLastMessage isConversationLastMessageSanitized hasAnnotated ]).freeze
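The interplay of the two constants can be sketched with a reduced pairing loop. This is an illustration only: `SKETCH_KNOWN_FIELDS`, `SKETCH_BOOLEAN_FIELDS`, and `pair_fields` are hypothetical stand-ins with a handful of names, not the real constants or methods.

```ruby
require 'set'

# Tiny stand-ins for KNOWN_FIELDS and BOOLEAN_FIELDS (illustration only).
SKETCH_KNOWN_FIELDS = Set.new(%w[id isSanitized messagetype content]).freeze
SKETCH_BOOLEAN_FIELDS = Set.new(%w[isSanitized]).freeze

# Pair each known, non-boolean field name with the string that follows it,
# unless that following string is itself a known field name.
def pair_fields(strings)
  fields = {}
  i = 0
  while i < strings.length
    name = strings[i]
    value = strings[i + 1]
    if SKETCH_KNOWN_FIELDS.include?(name) && !SKETCH_BOOLEAN_FIELDS.include?(name) &&
       value && !SKETCH_KNOWN_FIELDS.include?(value)
      fields[name] = value
      i += 2
    else
      i += 1 # boolean field, unknown token, or missing value: skip one string
    end
  end
  fields
end

# isSanitized has no string value, so "messagetype" follows it immediately.
p pair_fields(%w[id 42 isSanitized messagetype Text])
```

Without the boolean set, `isSanitized` would be paired with `messagetype` and every later key would be shifted by one position.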
Class Method Summary
- .consume_field(strings, idx, str, target, past_last_message) ⇒ Object
  Consume one field token from the strings array and return how many positions to advance.
- .consume_known_field(strings, idx, str, target) ⇒ Object
- .extract_strings(data) ⇒ Object
  Extract ordered string array from a binary IDB value.
- .html_fragment?(str) ⇒ Boolean
  Check if a string looks like an HTML fragment (split from content field).
- .parse_conversation(strings) ⇒ Object
  Parse a conversation record into a structured hash.
- .read_length_prefixed_string(data, pos) ⇒ Object
Class Method Details
.consume_field(strings, idx, str, target, past_last_message) ⇒ Object
Consume one field token from the strings array and return how many positions to advance.
    # File 'lib/legion/extensions/microsoft_teams/local_cache/record_parser.rb', line 132
    def self.consume_field(strings, idx, str, target, past_last_message)
      if KNOWN_FIELDS.include?(str)
        consume_known_field(strings, idx, str, target)
      else
        target['content'] = "#{target['content']}#{str}" if target.key?('content') && html_fragment?(str) && !past_last_message
        1
      end
    end
.consume_known_field(strings, idx, str, target) ⇒ Object
    # File 'lib/legion/extensions/microsoft_teams/local_cache/record_parser.rb', line 141
    def self.consume_known_field(strings, idx, str, target)
      return 1 if BOOLEAN_FIELDS.include?(str)
      return 1 if idx + 1 >= strings.length

      value = strings[idx + 1]
      if KNOWN_FIELDS.include?(value)
        1
      else
        target[str] = value
        2
      end
    end
.extract_strings(data) ⇒ Object
Extract ordered string array from a binary IDB value.
    # File 'lib/legion/extensions/microsoft_teams/local_cache/record_parser.rb', line 76
    def self.extract_strings(data)
      strings = []
      pos = 0

      while pos < data.bytesize
        if data.getbyte(pos) == 0x22
          str, new_pos = read_length_prefixed_string(data, pos + 1)
          if str
            strings << str
            pos = new_pos
            next
          end
        end
        pos += 1
      end

      strings
    end
.html_fragment?(str) ⇒ Boolean
Check if a string looks like an HTML fragment (split from content field).
    # File 'lib/legion/extensions/microsoft_teams/local_cache/record_parser.rb', line 155
    def self.html_fragment?(str)
      str.include?('<') || str.start_with?('http') ||
        str.match?(/\A(width|height|alt|id|itemid|src|href|target|rel|style)/) ||
        str.match?(/\A[a-z]+=/)
    end
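To see what the heuristic accepts, it can be exercised on a few strings. The sketch below copies the four checks into a standalone method (`html_fragment_sketch?`, a hypothetical name) so that it runs without the extension loaded; the sample strings are invented for illustration.

```ruby
# Standalone copy of the checks listed above (hypothetical name),
# so the example runs without loading the extension.
def html_fragment_sketch?(str)
  str.include?('<') ||
    str.start_with?('http') ||
    str.match?(/\A(width|height|alt|id|itemid|src|href|target|rel|style)/) ||
    str.match?(/\A[a-z]+=/)
end

samples = ['<div>Hello', 'https://example.com/a.png', 'href=', 'Plain sentence.', 'idle chatter']
samples.each { |s| puts "#{s.inspect} => #{html_fragment_sketch?(s)}" }
```

The attribute-name check is a prefix match with no word boundary, so ordinary text that happens to begin with `id`, `alt`, `rel`, and so on (such as "idle chatter") is also classified as a fragment. Whether this matters in practice depends on what follows a content field in real records.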
.parse_conversation(strings) ⇒ Object
Parse a conversation record into a structured hash. Uses known field names to correctly pair keys with values, handling boolean fields (no string value) and fragmented HTML content.
    # File 'lib/legion/extensions/microsoft_teams/local_cache/record_parser.rb', line 98
    def self.parse_conversation(strings)
      fields = {}
      last_message = {}
      in_last_message = false # flag name assumed; the extracted listing dropped this identifier
      past_last_message = false
      i = 0

      while i < strings.length
        str = strings[i]

        # Detect section boundaries
        if str == 'lastMessage'
          in_last_message = true
          i += 1
          next
        end

        if in_last_message && %w[members botMembers rosterVersion rosterSummary nonFilteredLastMessageTimeUtc
                                 __typename localClientId parentId clientUpdateTime].include?(str)
          in_last_message = false
          past_last_message = true
        end

        target = in_last_message ? last_message : fields
        advance = consume_field(strings, i, str, target, past_last_message)
        i += advance
      end

      { fields: fields, last_message: last_message }
    end
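The section tracking in `parse_conversation` can be illustrated in isolation. The sketch below is a simplified stand-in, not the real method: `split_sections` and `SKETCH_SECTION_END` are hypothetical names, the boundary list is trimmed to two entries, and strings are collected into arrays rather than paired into key-value hashes.

```ruby
# Trimmed, hypothetical boundary list; the real method checks nine names.
SKETCH_SECTION_END = %w[members rosterVersion].freeze

# Route each string to :last_message while inside the lastMessage section,
# and to :fields otherwise. Key-value pairing is deliberately left out.
def split_sections(strings)
  fields = []
  last_message = []
  in_last = false

  strings.each do |str|
    if str == 'lastMessage'
      in_last = true # the marker itself is not stored
      next
    end
    in_last = false if in_last && SKETCH_SECTION_END.include?(str)
    (in_last ? last_message : fields) << str
  end

  { fields: fields, last_message: last_message }
end

p split_sections(%w[id 19:x lastMessage content hi members m1])
```

Strings after `lastMessage` go to the nested section until one of the boundary names appears, at which point routing returns to the top-level fields.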
.read_length_prefixed_string(data, pos) ⇒ Object
    # File 'lib/legion/extensions/microsoft_teams/local_cache/record_parser.rb', line 161
    def self.read_length_prefixed_string(data, pos)
      return nil if pos >= data.bytesize

      len = data.getbyte(pos)
      return nil unless len&.positive?

      if len < 0x80
        # Single-byte length (continuation bit clear)
        str_start = pos + 1
        actual_len = len
      else
        next_byte = data.getbyte(pos + 1)
        return nil unless next_byte

        # Two-byte length: low 7 bits from the first byte, the rest from the second
        actual_len = (len & 0x7F) | (next_byte << 7)
        str_start = pos + 2

        # Third length byte when the second also has its continuation bit set
        if next_byte >= 0x80 && pos + 2 < data.bytesize
          third = data.getbyte(pos + 2)
          return nil unless third

          actual_len = (len & 0x7F) | ((next_byte & 0x7F) << 7) | ((third & 0x7F) << 14)
          str_start = pos + 3
        end
      end

      # Reject implausible lengths and reads past the end of the buffer
      return nil if actual_len <= 0 || actual_len > 1_000_000
      return nil if str_start + actual_len > data.bytesize

      str = data.byteslice(str_start, actual_len)
      str.force_encoding('UTF-8')
      return nil unless str.valid_encoding?

      [str, str_start + actual_len]
    end
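The length prefix read here is a little-endian base-128 varint: each byte contributes its low 7 bits, and a set high bit means another byte follows. The sketch below shows the same decoding as a general loop. It is an alternative formulation under that assumption, not the method above, which unrolls the first three bytes by hand; `decode_varint` is a hypothetical name and takes an array of integer byte values.

```ruby
# Hypothetical helper: decode a base-128 varint from an array of byte values.
# Returns [value, next_position], or nil if the bytes end mid-varint.
def decode_varint(bytes, pos = 0)
  value = 0
  shift = 0
  loop do
    byte = bytes[pos]
    return nil if byte.nil? # ran out of input before the final byte

    value |= (byte & 0x7F) << shift
    pos += 1
    return [value, pos] if byte < 0x80 # high bit clear: last byte

    shift += 7
  end
end

p decode_varint([0x05])             # one byte
p decode_varint([0xAC, 0x02])       # two bytes: 44 + (2 << 7) = 300
p decode_varint([0x80, 0x80, 0x01]) # three bytes: 1 << 14 = 16384
```

A loop like this accepts lengths of any size, so a caller would still need an upper bound such as the 1,000,000-byte cap used above to reject false positives where a 0x22 byte is not really a string marker.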