Class: SitemapGenerator::LinkSet
- Inherits:
- Object
- Includes:
- LocationHelpers
- Defined in:
- lib/sitemap_generator/link_set.rb
Defined Under Namespace
Modules: LocationHelpers
Constant Summary
- @@requires_finalization_opts =
%i[filename sitemaps_path sitemaps_host namer]
- @@new_location_opts =
%i[filename sitemaps_path namer]
Instance Attribute Summary
-
#adapter ⇒ Object
Returns the value of attribute adapter.
-
#create_index ⇒ Object
readonly
Returns the value of attribute create_index.
-
#default_host ⇒ Object
readonly
Returns the value of attribute default_host.
-
#filename ⇒ Object
readonly
Returns the value of attribute filename.
-
#include_index ⇒ Object
Returns the value of attribute include_index.
-
#include_root ⇒ Object
Returns the value of attribute include_root.
-
#max_sitemap_links ⇒ Object
Returns the value of attribute max_sitemap_links.
-
#sitemaps_path ⇒ Object
readonly
Returns the value of attribute sitemaps_path.
-
#verbose ⇒ Object
Set verbose on the instance or by setting ENV to true or false.
-
#yield_sitemap ⇒ Object
Returns the value of attribute yield_sitemap.
Instance Method Summary
-
#add(link, options = {}) ⇒ Object
Add a link to a Sitemap.
-
#add_to_index(link, options = {}) ⇒ Object
Add a link to the Sitemap Index.
-
#create(opts = {}, &block) ⇒ Object
Create a new sitemap index and sitemap files.
-
#finalize! ⇒ Object
All done.
-
#group(opts = {}, &block) ⇒ Object
Create a new group of sitemap files.
-
#include_index? ⇒ Boolean
Return a boolean indicating whether to add a link to the sitemap index file to the current sitemap.
-
#include_root? ⇒ Boolean
Return a boolean indicating whether to automatically add the root url i.e. '/' to the current sitemap.
-
#initialize(options = {}) ⇒ LinkSet
constructor
Constructor.
-
#link_count ⇒ Object
Return a count of the total number of links in all sitemaps.
-
#ping_search_engines(*args) ⇒ Object
Ping search engines to notify them of updated sitemaps.
-
#sitemap ⇒ Object
Lazy-initialize a sitemap instance and return it.
-
#sitemap_index ⇒ Object
Lazy-initialize a sitemap index instance and return it.
-
#sitemap_index_url ⇒ Object
Return the full url to the sitemap index file.
-
#sitemaps_host ⇒ Object
Return the host to use in links to the sitemap files.
-
#yield_sitemap? ⇒ Boolean
Return a boolean indicating whether or not to yield the sitemap.
Methods included from LocationHelpers
#compress, #compress=, #create_index=, #default_host=, #filename=, #namer, #namer=, #public_path, #public_path=, #search_engines, #search_engines=, #sitemap_index_location, #sitemap_location, #sitemaps_host=, #sitemaps_path=
Constructor Details
#initialize(options = {}) ⇒ LinkSet
Constructor
Options:
-
:adapter - instance of a class with a write method which takes a SitemapGenerator::Location and raw XML data and persists it. The default adapter is a SitemapGenerator::FileAdapter which simply writes files to the filesystem. You can use a SitemapGenerator::WaveAdapter for uploading sitemaps to remote servers - useful for read-only hosts such as Heroku. Or you can provide an instance of your own class to provide custom behavior.
-
:default_host - host including protocol to use in all sitemap links e.g. http://en.google.ca
-
:public_path - Full or relative path to the directory to write sitemaps into. Defaults to the public/ directory in your application root directory or the current working directory.
-
:sitemaps_host - String. Host including protocol to use when generating a link to a sitemap file i.e. the hostname of the server where the sitemaps are hosted. The value will differ from the hostname in your sitemap links. For example: 'http://amazon.aws.com/'.
Note that include_index is automatically turned off when the sitemaps_host does not match default_host, because the link to the sitemap index file that would otherwise be added would point to a different host than the rest of the links in the sitemap, which the sitemap rules forbid.
-
:sitemaps_path - path fragment within public to write sitemaps to e.g. 'en/'. Sitemaps are written to public_path + sitemaps_path
-
:filename - symbol giving the base name for files (default :sitemap). The names are generated like "#{filename}.xml.gz", "#{filename}1.xml.gz", "#{filename}2.xml.gz" with the first file being the index if you have more than one sitemap file.
-
:include_index - Boolean. Whether to add a link pointing to the sitemap index to the current sitemap. This points search engines to your Sitemap Index to include it in the indexing of your site. Default is false. Turned off when sitemaps_host is set or within a group() block. Turned off because Google can complain about nested indexing and because if a robot is already reading your sitemap, it probably knows about the index.
-
:include_root - Boolean. Whether to add the root url i.e. '/' to the current sitemap. Default is true. Turned off within a group() block.
-
:search_engines - Hash. A hash of search engine names mapped to ping URLs. See ping_search_engines.
-
:verbose - If true, output a summary line for each sitemap and sitemap index that is created. Default is false.
-
:create_index - Supported values: true, false, :auto. Default: :auto. Whether to create a sitemap index file. If true an index file is always created, regardless of how many links are in your sitemap. If false an index file is never created. If :auto an index file is created only if your sitemap has more than one sitemap file.
-
:namer - A SitemapGenerator::SimpleNamer instance for generating the sitemap and index file names. See :filename if you don't need to do anything fancy, and can accept the default naming conventions.
-
:compress - Specifies which files to compress with gzip. Default is true. Accepted values:
* `true` - Boolean; compress all files.
* `false` - Boolean; write out only uncompressed files.
* `:all_but_first` - Symbol; leave the first file uncompressed but compress any remaining files.
The compression setting applies to groups too, so :all_but_first will have the same effect within a group (the first file in the group will not be compressed, the rest will). If you require different behaviour for your groups, pass in a :compress option e.g. group(:compress => false) { add('/link') }
-
:max_sitemap_links - The maximum number of links to put in each sitemap. Default is SitemapGenerator::MAX_SITEMAP_LINKS, or 50,000.
Note: When adding a new option be sure to include it in options_for_group() if
the option should be inherited by groups.
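As a rough illustration of how the :create_index and :compress values described above could be interpreted, here is a minimal sketch. The helper names `index_created?` and `compressed?` are hypothetical and are not part of SitemapGenerator; they only restate the documented rules in plain Ruby.

```ruby
# Hypothetical helpers; they restate the documented option rules and are
# not taken from the SitemapGenerator source.

# Would an index file be written for this :create_index value?
def index_created?(create_index, sitemap_file_count)
  case create_index
  when true, false then create_index           # always / never
  when :auto then sitemap_file_count > 1       # only with more than one sitemap file
  else raise ArgumentError, "unsupported :create_index value: #{create_index.inspect}"
  end
end

# Would the file at this 0-based position be gzipped for this :compress value?
def compressed?(compress, file_position)
  case compress
  when true, false then compress               # all files / no files
  when :all_but_first then file_position > 0   # first file stays uncompressed
  else raise ArgumentError, "unsupported :compress value: #{compress.inspect}"
  end
end

puts index_created?(:auto, 1)
puts compressed?(:all_but_first, 0)
```

Both helpers raise on unknown values so that a typo in an option does not silently fall back to a default; whether the library itself does so is not stated here.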
# File 'lib/sitemap_generator/link_set.rb', line 122
def initialize(options = {})
  @default_host, @sitemaps_host, @yield_sitemap, @sitemaps_path, @adapter, @verbose, @protect_index,
    @sitemap_index, @added_default_links, @created_group, @sitemap = nil

  options = SitemapGenerator::Utilities.reverse_merge(
    options,
    include_root: true,
    include_index: false,
    filename: :sitemap,
    search_engines: {},
    create_index: :auto,
    compress: true,
    max_sitemap_links: SitemapGenerator::MAX_SITEMAP_LINKS
  )
  options.each_pair { |k, v| instance_variable_set(:"@#{k}", v) }

  # If an index is passed in, protect it from modification.
  # Sitemaps can be added to the index but nothing else can be changed.
  if options[:sitemap_index]
    @protect_index = true
  end
end
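The constructor fills in defaults with a "reverse merge": values the caller passes win, and defaults apply only to missing keys. The sketch below shows that behaviour with a local stand-in; the `reverse_merge` method and `DEFAULTS` constant here are assumptions written for illustration, not the library's own implementation.

```ruby
# Local stand-in for a reverse merge: caller-supplied options take
# precedence and defaults only fill the gaps. Not the library's own code.
def reverse_merge(options, defaults)
  defaults.merge(options)
end

# A subset of the defaults listed in the constructor above.
DEFAULTS = {
  include_root: true,
  include_index: false,
  filename: :sitemap,
  create_index: :auto,
  compress: true
}.freeze

resolved = reverse_merge({ filename: :geo, compress: false }, DEFAULTS)
puts resolved.inspect
```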
Instance Attribute Details
#adapter ⇒ Object
Returns the value of attribute adapter.
# File 'lib/sitemap_generator/link_set.rb', line 13
def adapter
  @adapter
end
#create_index ⇒ Object (readonly)
Returns the value of attribute create_index.
# File 'lib/sitemap_generator/link_set.rb', line 12
def create_index
  @create_index
end
#default_host ⇒ Object (readonly)
Returns the value of attribute default_host.
# File 'lib/sitemap_generator/link_set.rb', line 12
def default_host
  @default_host
end
#filename ⇒ Object (readonly)
Returns the value of attribute filename.
# File 'lib/sitemap_generator/link_set.rb', line 12
def filename
  @filename
end
#include_index ⇒ Object
Returns the value of attribute include_index.
# File 'lib/sitemap_generator/link_set.rb', line 13
def include_index
  @include_index
end
#include_root ⇒ Object
Returns the value of attribute include_root.
# File 'lib/sitemap_generator/link_set.rb', line 13
def include_root
  @include_root
end
#max_sitemap_links ⇒ Object
Returns the value of attribute max_sitemap_links.
# File 'lib/sitemap_generator/link_set.rb', line 13
def max_sitemap_links
  @max_sitemap_links
end
#sitemaps_path ⇒ Object (readonly)
Returns the value of attribute sitemaps_path.
# File 'lib/sitemap_generator/link_set.rb', line 12
def sitemaps_path
  @sitemaps_path
end
#verbose ⇒ Object
Set verbose on the instance or by setting ENV to true or false. By default verbose is true. When running rake tasks, pass the -s option to rake to turn verbose off.
# File 'lib/sitemap_generator/link_set.rb', line 368
def verbose
  if @verbose.nil?
    @verbose = SitemapGenerator.verbose.nil? ? true : SitemapGenerator.verbose
  end
  @verbose
end
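The reader above resolves a three-state setting: an explicit instance value wins, then the module-level value, and only when both are nil does it fall back to true. A standalone sketch of that precedence follows; `resolve_verbose` is a hypothetical helper name used only for this illustration.

```ruby
# Hypothetical helper mirroring the precedence in the reader above:
# instance value, then global value, then true.
def resolve_verbose(instance_setting, global_setting)
  return instance_setting unless instance_setting.nil?

  global_setting.nil? ? true : global_setting
end

puts resolve_verbose(nil, nil)
puts resolve_verbose(nil, false)
puts resolve_verbose(false, true)
```

Checking for nil rather than truthiness matters here, because an explicit false must not be mistaken for "not set".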
#yield_sitemap ⇒ Object
Returns the value of attribute yield_sitemap.
# File 'lib/sitemap_generator/link_set.rb', line 13
def yield_sitemap
  @yield_sitemap
end
Instance Method Details
#add(link, options = {}) ⇒ Object
Add a link to a Sitemap. If a new Sitemap is required, one will be created for you.
- link - string link e.g. '/merchant', '/article/1' or whatever.
- options - see README.
- host - host for the link, defaults to your default_host.
# File 'lib/sitemap_generator/link_set.rb', line 149
def add(link, options = {})
  add_default_links unless @added_default_links
  sitemap.add(link, SitemapGenerator::Utilities.reverse_merge(options, host: @default_host))
rescue SitemapGenerator::SitemapFullError
  finalize_sitemap!
  retry
rescue SitemapGenerator::SitemapFinalizedError
  @sitemap = sitemap.new
  retry
end
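The method above leans on Ruby's rescue/retry: when the current sitemap reports that it is full, it is set aside, a fresh one takes its place, and the same add is attempted again. The toy classes below (`TinySitemap`, `TinyLinkSet`, `FullError`) are invented for this sketch and only imitate that control flow; they are not the library's classes.

```ruby
# Toy classes invented for this sketch; they only imitate the
# "rescue the full error, roll over, retry" flow of the method above.
class FullError < StandardError; end

class TinySitemap
  attr_reader :links

  def initialize(max_links)
    @max_links = max_links
    @links = []
  end

  def add(link)
    raise FullError if @links.size >= @max_links

    @links << link
  end
end

class TinyLinkSet
  attr_reader :finished, :current

  def initialize(max_links)
    @max_links = max_links
    @finished = []
    @current = TinySitemap.new(max_links)
  end

  def add(link)
    @current.add(link)
  rescue FullError
    @finished << @current                     # set the full sitemap aside
    @current = TinySitemap.new(@max_links)    # start a fresh one
    retry                                     # re-run the method body with the new sitemap
  end
end

set = TinyLinkSet.new(2)
%w[/a /b /c].each { |link| set.add(link) }
puts set.finished.map(&:links).inspect
puts set.current.links.inspect
```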
#add_to_index(link, options = {}) ⇒ Object
Add a link to the Sitemap Index.
- link - A string link e.g. '/sitemaps/sitemap1.xml.gz' or a SitemapFile instance.
- options - A hash of options including :lastmod, :priority, :changefreq and :host
The :host option defaults to the value of sitemaps_host which is the host where your
sitemaps reside. If no sitemaps_host is set, the default_host is used.
# File 'lib/sitemap_generator/link_set.rb', line 166
def add_to_index(link, options = {})
  sitemap_index.add(link, SitemapGenerator::Utilities.reverse_merge(options, host: sitemaps_host))
end
#create(opts = {}, &block) ⇒ Object
Create a new sitemap index and sitemap files. Pass a block with calls to the following methods:
- add - Add a link to the current sitemap
- group - Start a new group of sitemaps
Options
Any option supported by new can be passed. The options will be
set on the instance using the accessor methods. This is provided mostly
as a convenience.
In addition to the options to new, the following options are supported:
- :finalize - The sitemaps are written as they get full and at the end of the block. Pass false as the value to prevent the sitemap or sitemap index from being finalized. Default is true.
If you are calling create more than once in your sitemap configuration file,
make sure that you set a different sitemaps_path or filename for each call otherwise
the sitemaps may be overwritten.
35 36 37 38 39 40 41 42 43 44 45 46 47 |
# File 'lib/sitemap_generator/link_set.rb', line 35 def create(opts = {}, &block) # rubocop:disable Metrics/AbcSize,Metrics/MethodLength reset! (opts) if verbose start_time = Time.now puts "In '#{sitemap_index.location.public_path}':" end interpreter.eval(yield_sitemap: yield_sitemap?, &block) finalize! if block_given? end_time = Time.now if verbose output(sitemap_index.stats_summary(time_taken: end_time - start_time)) if verbose self end |
#finalize! ⇒ Object
All done. Write out remaining files.
# File 'lib/sitemap_generator/link_set.rb', line 342
def finalize!
  finalize_sitemap!
  finalize_sitemap_index!
end
#group(opts = {}, &block) ⇒ Object
Create a new group of sitemap files.
Returns a new LinkSet instance with the options passed in set on it. All groups share the sitemap index, which is not affected by any of the options passed here.
Options
Any of the options to LinkSet.new. Except for :public_path which is shared by all groups.
The current options are inherited by the new group of sitemaps. The only exceptions
being :include_index and :include_root which default to false.
Pass a block to add links to the new LinkSet. If you pass a block the sitemaps will be finalized when the block returns.
If you are not changing any of the location settings like filename, sitemaps_path, sitemaps_host or namer, links you add within the group will be added to the current sitemap. Otherwise the current sitemap file is finalized and a new sitemap file started, using the options you specified.
Most commonly, you'll want to give the group's files a distinct name using the filename option.
Options like :default_host can be used and it will only affect the links
within the group. Links added outside of the group will revert to the previous
default_host.
# File 'lib/sitemap_generator/link_set.rb', line 197
def group(opts = {}, &block) # rubocop:disable Metrics/AbcSize,Metrics/CyclomaticComplexity,Metrics/MethodLength,Metrics/PerceivedComplexity
  @created_group = true

  original_opts = opts.dup
  if (@@requires_finalization_opts & original_opts.keys).empty?
    # If no new filename or path is specified reuse the default sitemap file.
    # A new location object will be set on it for the duration of the group.
    original_opts[:sitemap] = sitemap
  elsif original_opts.key?(:sitemaps_host) && (@@new_location_opts & original_opts.keys).empty?
    # If no location options are provided we are creating the next sitemap in the
    # current series, so finalize and inherit the namer.
    finalize_sitemap!
    original_opts[:namer] = namer
  end

  opts = options_for_group(original_opts)
  @group = SitemapGenerator::LinkSet.new(opts)
  if opts.key?(:sitemap)
    # If the group is sharing the current sitemap, set the
    # new location options on the location object.
    @original_location = @sitemap.location.dup
    @sitemap.location.merge!(@group.sitemap_location)
    if block_given?
      @group.interpreter.eval(yield_sitemap: @yield_sitemap || SitemapGenerator.yield_sitemap?, &block)
      @group.finalize_sitemap!
      @sitemap.location.merge!(@original_location)
    end
  else
    # Handle the case where a user only has one group, and it's being written
    # to a new sitemap file. They would expect there to be an index. So force
    # index creation. If there is more than one group, we would have an index anyways,
    # so it's safe to force index creation in these other cases. In the case that
    # the groups reuse the current sitemap, don't force index creation because
    # we want the default behaviour i.e. only an index if more than one sitemap file.
    # Don't force index creation if the user specifically requested no index. This
    # unfortunately means that if they set it to :auto they may be getting an index
    # when they didn't expect one, but you shouldn't be using groups if you only have
    # one sitemap and don't want an index. Rather, just add the links directly in the create()
    # block.
    @group.send(:create_index=, true, true) if @group.create_index != false

    if block_given?
      @group.interpreter.eval(yield_sitemap: @yield_sitemap || SitemapGenerator.yield_sitemap?, &block)
      @group.finalize_sitemap!
    end
  end
  @group
end
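The first branch of the method decides, by intersecting the option keys with two fixed lists, whether a group keeps writing to the current sitemap file or starts a new one. A minimal sketch of that decision is below; the constant names and the `group_strategy` helper with its return symbols are assumptions made for illustration, while the two key lists are copied from the class variables shown at the top of this page.

```ruby
# Key lists copied from @@requires_finalization_opts and @@new_location_opts.
# The helper and its return symbols are hypothetical labels for the branches.
REQUIRES_FINALIZATION_OPTS = %i[filename sitemaps_path sitemaps_host namer].freeze
NEW_LOCATION_OPTS = %i[filename sitemaps_path namer].freeze

def group_strategy(opts)
  if (REQUIRES_FINALIZATION_OPTS & opts.keys).empty?
    :share_current_sitemap          # no location option changed
  elsif opts.key?(:sitemaps_host) && (NEW_LOCATION_OPTS & opts.keys).empty?
    :finalize_and_continue_series   # only the host changed, keep the namer
  else
    :start_new_sitemap              # filename, path or namer changed
  end
end

puts group_strategy(default_host: 'http://example.com')
puts group_strategy(sitemaps_host: 'http://cdn.example.com')
puts group_strategy(filename: :geo)
```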
#include_index? ⇒ Boolean
Return a boolean indicating whether to add a link to the sitemap index file
to the current sitemap. This points search engines to your Sitemap Index so
they include it in the indexing of your site, but is not strictly necessary.
Default is false. Turned off when sitemaps_host is set or within a group() block.
# File 'lib/sitemap_generator/link_set.rb', line 351
def include_index?
  if default_host && sitemaps_host && sitemaps_host != default_host
    false
  else
    @include_index
  end
end
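To make the host comparison concrete, the sketch below models just the three values involved. `HostSettings` and `effective_sitemaps_host` are hypothetical names for this example; the fallback of the sitemaps host to the default host follows the sitemaps_host reader documented on this page.

```ruby
# Hypothetical struct modelling only the values this check depends on.
HostSettings = Struct.new(:default_host, :sitemaps_host, :include_index) do
  # The sitemaps host falls back to the default host when unset.
  def effective_sitemaps_host
    sitemaps_host || default_host
  end

  # The index link is suppressed whenever the two hosts differ.
  def include_index?
    host = effective_sitemaps_host
    if default_host && host && host != default_host
      false
    else
      include_index
    end
  end
end

same_host = HostSettings.new('http://example.com', nil, true)
other_host = HostSettings.new('http://example.com', 'http://cdn.example.com', true)
puts same_host.include_index?
puts other_host.include_index?
```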
#include_root? ⇒ Boolean
Return a boolean indicating whether to automatically add the root url i.e. '/' to the
current sitemap. Default is true. Turned off within a group() block.
# File 'lib/sitemap_generator/link_set.rb', line 361
def include_root?
  !!@include_root
end
#link_count ⇒ Object
Return a count of the total number of links in all sitemaps.
# File 'lib/sitemap_generator/link_set.rb', line 311
def link_count
  sitemap_index.total_link_count
end
#ping_search_engines(*args) ⇒ Object
Ping search engines to notify them of updated sitemaps.
Search engines are already notified for you if you run rake sitemap:refresh.
If you want to ping search engines separately to your sitemap generation, run
rake sitemap:refresh:no_ping and then run a rake task or script
which calls this method as in the example below.
Arguments
- sitemap_index_url - The full URL to your sitemap index file. If not provided the location is based on the host you have set and any other options like your sitemaps_path. The URL will be CGI escaped for you when included as part of the search engine ping URL.
Options
A hash of one or more search engines to ping in addition to the
default search engines. The key is the name of the search engine
as a string or symbol and the value is the full URL to ping with
a string interpolation that will be replaced by the CGI escaped sitemap
index URL. If you have any literal percent characters in your URL you
need to escape them with %%. For example if your sitemap index URL
is http://example.com/sitemap.xml.gz and your
ping url is http://example.com/100%%/ping?url=%s
then the final URL that is pinged will be http://example.com/100%/ping?url=http%3A%2F%2Fexample.com%2Fsitemap.xml.gz
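The interpolation described above can be reproduced with the Ruby standard library alone. The sketch below uses a hypothetical helper, `build_ping_url`, that escapes the sitemap index URL and substitutes it into the ping template; it does not contact any server.

```ruby
require 'uri'

# Hypothetical helper: escape the sitemap index URL and substitute it for
# the %s in the ping template. A literal percent sign is written as %%.
def build_ping_url(template, sitemap_index_url)
  template % URI.encode_www_form_component(sitemap_index_url)
end

ping_url = build_ping_url(
  'http://example.com/100%%/ping?url=%s',
  'http://example.com/sitemap.xml.gz'
)
puts ping_url
```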
Examples
Both of these examples will ping the default search engines in addition to http://superengine.com/ping?url=http%3A%2F%2Fexample.com%2Fsitemap.xml.gz
SitemapGenerator::Sitemap.host('http://example.com/')
SitemapGenerator::Sitemap.ping_search_engines(:super_engine => 'http://superengine.com/ping?url=%s')
Is equivalent to:
SitemapGenerator::Sitemap.ping_search_engines('http://example.com/sitemap.xml.gz', :super_engine => 'http://superengine.com/ping?url=%s')
# File 'lib/sitemap_generator/link_set.rb', line 281
def ping_search_engines(*args) # rubocop:disable Metrics/AbcSize,Metrics/MethodLength
  require 'open-uri'
  require 'timeout'
  require 'uri'

  engines = args.last.is_a?(Hash) ? args.pop : {}
  unescaped_url = args.shift || sitemap_index_url
  index_url = URI.encode_www_form_component(unescaped_url)

  output("\n")
  output("Pinging with URL '#{unescaped_url}':")
  search_engines.merge(engines).each do |engine, link|
    link %= index_url
    name = Utilities.titleize(engine.to_s)
    begin
      Timeout.timeout(10) do
        if URI.respond_to?(:open) # Available since Ruby 2.5
          URI.open(link)
        else
          open(link) # using Kernel#open became deprecated since Ruby 2.7. See https://bugs.ruby-lang.org/issues/15893
        end
      end
      output(" Successful ping of #{name}")
    rescue Timeout::Error, StandardError => e
      output("Ping failed for #{name}: #{e.inspect} (URL #{link})")
    end
  end
end
#sitemap ⇒ Object
Lazy-initialize a sitemap instance and return it.
# File 'lib/sitemap_generator/link_set.rb', line 322
def sitemap
  @sitemap ||= SitemapGenerator::Builder::SitemapFile.new(sitemap_location)
end
#sitemap_index ⇒ Object
Lazy-initialize a sitemap index instance and return it.
# File 'lib/sitemap_generator/link_set.rb', line 327
def sitemap_index
  @sitemap_index ||= SitemapGenerator::Builder::SitemapIndexFile.new(sitemap_index_location)
end
#sitemap_index_url ⇒ Object
Return the full url to the sitemap index file. When create_index is false
the first sitemap is technically the index, so this will be its URL. It's important
to use this method to get the index url because sitemap_index.location.url will
not be correct in such situations.
KJV: This is somewhat confusing.
# File 'lib/sitemap_generator/link_set.rb', line 337
def sitemap_index_url
  sitemap_index.index_url
end
#sitemaps_host ⇒ Object
Return the host to use in links to the sitemap files. This defaults to your
default_host.
# File 'lib/sitemap_generator/link_set.rb', line 317
def sitemaps_host
  @sitemaps_host || @default_host
end
#yield_sitemap? ⇒ Boolean
Return a boolean indicating whether or not to yield the sitemap.
# File 'lib/sitemap_generator/link_set.rb', line 376
def yield_sitemap?
  @yield_sitemap.nil? ? SitemapGenerator.yield_sitemap? : !!@yield_sitemap
end