orbiter
61409788eb
fewer word hash computations (removing some overhead because of MD5
calcs) using the clear word in a normalized form.
11 years ago
reger
5c4a3d1c01
Merge origin/master into jetty
11 years ago
Michael Peter Christen
caa20d63d9
fixed seedlist (hash was missing)
11 years ago
Michael Peter Christen
ccf2f4e43b
refactoring of seed attributes (introduced more constants)
11 years ago
Michael Peter Christen
c927b428d3
fixed json
11 years ago
Michael Peter Christen
64048ff217
fix for XSS
11 years ago
orbiter
b7f1e5af51
added new servlet which generates the same file as the principal peers
upload to a bootstrap position
you can call it either with
http://localhost:8090/yacy/seedlist.html
or to generate json (or jsonp) with
http://localhost:8090/yacy/seedlist.json
http://localhost:8090/yacy/seedlist.json?callback=seedlist
11 years ago
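The seedlist entry above offers plain JSON and, via the callback parameter, JSONP. A minimal sketch of that wrapping step follows. The class name, the method name, the callback-name check and the exact output shape are illustrative assumptions; the log does not show the servlet code.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: not the actual YaCy seedlist servlet.
final class SeedlistJsonpSketch {
    // Assumption: only identifier-like callback names are accepted,
    // so the parameter cannot be used to inject script.
    private static final Pattern SAFE_CALLBACK =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$.]*");

    /** Returns the JSON unchanged if no callback is given, else callback(json); */
    static String render(String json, String callback) {
        if (callback == null || callback.isEmpty()) return json;
        if (!SAFE_CALLBACK.matcher(callback).matches()) {
            throw new IllegalArgumentException("unsafe callback name: " + callback);
        }
        return callback + "(" + json + ");";
    }

    public static void main(String[] args) {
        String json = "{\"peers\":[]}";
        System.out.println(render(json, null));        // plain JSON
        System.out.println(render(json, "seedlist"));  // JSONP wrapping
    }
}
```

Rejecting unusual callback names is a design choice in this sketch; it fits the XSS fixes listed nearby but is not taken from the commit itself.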
orbiter
3e552550d1
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
orbiter
c2d720cdaf
purge a lucene cache - possible memory leak fix
11 years ago
reger
f111f30ace
Merge origin/master into jetty
11 years ago
Michael Peter Christen
f4172cbb3d
fix for another XSS bug
11 years ago
orbiter
ff86cb683f
fixed some XSS bugs reported by Marius from http://ctf365.com/
11 years ago
orbiter
19a051bec8
more monitoring for postprocessing and enhanced layout in Crawler
monitor page
11 years ago
Michael Peter Christen
fceac8cffd
more monitoring for postprocessing
11 years ago
Michael Peter Christen
9d5895f643
enhanced and fixed postprocessing
11 years ago
Michael Peter Christen
087df05e24
added option to Config_Network_p.html to enable remote search while
DHT-Receive is switched off.
11 years ago
Michael Peter Christen
1a4a69c226
set more loggers to 'final static'
11 years ago
Michael Peter Christen
69b8d61c47
fix for search requests in GSA interface which contain 'funny'
characters (like ':' etc.)
11 years ago
orbiter
4234b0ed6c
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
orbiter
74c86a72a0
better default value for crawler user agent
11 years ago
reger
1437c45383
merge rc1/master
11 years ago
Michael Peter Christen
87a956e881
calculating and showing the number of files and the average size of a
file in the HTCACHE in ConfigHTCache_p.html
11 years ago
Michael Peter Christen
acc1f8a749
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen
81bb50118e
found and fixed a huge memory leak in solr caching (inside Solr). The
not-flushed Solr cache is now handled in this way:
- it is smaller by default
- a Solr-internal process is started to flush the cache periodically
(this does NOT clean the cache, just removes old objects)
- a Solr-external process (the standard YaCy cleanup-process) now has
direct access to the solr internal cache and flushes them completely.
The time frame for such a flush is defined by the cleanup-process
frequency, by default 10 minutes.
11 years ago
sixcooler
987f410011
URL-export:add query and fix for cast-class-exception
11 years ago
Michael Peter Christen
ffe8276063
replaced referrer link masking with 'pure' links to the referring page
(that was more useful during testing)
11 years ago
reger
b38de92a16
Merge origin/master into jetty
11 years ago
Michael Peter Christen
434e13b46d
in host browser also show the properties of failed documents including
referrer urls (this is a VERY USEFUL SEO and Web Admin feature!!)
11 years ago
orbiter
1ac504ae51
use html encoding for urls in metadata
11 years ago
reger
f017066197
Merge origin/master into jetty
11 years ago
Michael Peter Christen
25951cee14
- fixed opensearchdescription, this delivered an url with missing
'global' option
- added display=2 to compare_yacy to remove the superfluous border
11 years ago
Michael Peter Christen
f1bfe64361
integrated startpage to compare_yacy
11 years ago
Michael Peter Christen
2f57327f20
added boolean load property to CacheResource_p servlet which makes
the servlet load the page from the web.
11 years ago
Michael Peter Christen
9bb7eab389
hacks to prevent storage of data longer than necessary during search and
some speed enhancements. This should reduce the memory usage during
heavy-load search a bit.
11 years ago
Michael Peter Christen
5afa6e3aee
Automatically flush the log cache if a short memory status is reached.
For the default of 200 lines this can flush about 10MB.
11 years ago
Michael Peter Christen
030d0776ff
Enhanced crawl start for very, very large crawl lists (i.e. > 5000)
which had a problem because of badly used concurrency.
This fix also caused a redesign of the whole host deletion process.
This should fix bug http://bugs.yacy.net/view.php?id=250
11 years ago
Michael Peter Christen
4948c39e48
added concurrency for mass crawl check
11 years ago
Michael Peter Christen
1b4fa2947d
- fixed a problem which occurred when a document was not recognized with
the right content domain (i.e. identifying that it is an image, text
etc.) because it used the file extension and not an existing mime type
assignment.
- fixed the new setting that images shall be loaded for a better image
search.
- both fixes together now make it possible to crawl
commons.wikimedia.org which makes use of 'funny' document names (i.e.
ending with .jpg while the document is html)
11 years ago
Michael Peter Christen
16e3b357b3
replaced old tag cloud and adapted design a bit
11 years ago
Michael Peter Christen
dc38d35986
added matching in url field in Table_API_p search
11 years ago
Michael Peter Christen
691d7e70fa
added hint to development/commit rss feed
11 years ago
Michael Peter Christen
b81859c751
Show a RSS icon in the right top corner of search results. This replaces
the 'API' icon which was the link for the opensearch result which is an
extension of RSS. Since it is more appropriate to visualize a RSS link
with an RSS icon, this API icon was changed here.
11 years ago
Michael Peter Christen
1a09771be8
fixed sitemap crawl start
11 years ago
orbiter
b743e6d79f
- prevent crawl filters from having empty (never-match) content
- rewrite the description of the options "Restrict to start domain(s)"
and "Restrict to sub-path(s)" to an explanation, that the restriction
applies to all links in the link list of the option "From Link-List of
URL" if this option is selected
- allow "Restrict to sub-path(s)" if the "From Link-List of URL" is
selected. This is supported in the crawl start.
11 years ago
orbiter
f597fdb602
make it easier to filter properties (case insensitive)
11 years ago
reger
f46c723398
allow to choose used http server, YaCy-Anomic or Jetty
- defaults to Jetty (in this branch)
- add server version info & config option -> Admin Console -> Advanced Settings -> Http Networking
11 years ago
reger
1adb4b8741
merge rc1/master
11 years ago
reger
37d24f3318
make use of declared static string ACTION_LOCATION
11 years ago
reger
eea504c117
update Info.plist
small DefaultServlet refactoring
11 years ago
reger
a44eede8b8
merge rc1/master
11 years ago
reger
54a0272338
searchpage javascript (latestinfo) causes reset of search statistic after moving to next page
- disabled call via setTimeout in yacysearch.html
11 years ago
Michael Peter Christen
91fa99e9bb
added new icon/image for latest commit
11 years ago
Michael Peter Christen
9fac9249bc
- replaced 'edit' link with a clone symbol in Table_API_p since that is
what it does: it clones the crawl, it does not change the crawl.
- moved the appearance of this clone link to the type column since this
makes it visible also if the URL column is not visible.
11 years ago
Michael Peter Christen
0f6db6ad5b
Merge remote-tracking branch 'jensbees/crawlexpert-post'
11 years ago
Jens Bertram
3252c1ec39
Merge upstream/master into crawlexpert-post
11 years ago
Michael Peter Christen
90c8577840
enhanced ranking; patches to replace old ranking
11 years ago
bhoerdzn
a3824dfbaa
check URL on initial load, if set
11 years ago
bhoerdzn
52f49d475b
add a hidden field for "crawlingstart" since jQuery omits the submit button value
11 years ago
bhoerdzn
b0c0ec2dec
link recorded crawl starts back to "CrawlStartExpert_p" in "Process Scheduler"
11 years ago
bhoerdzn
d64d45361c
use integer types for boolean values
11 years ago
bhoerdzn
eda123d6fd
remove debugging code intercepting post requests
11 years ago
bhoerdzn
5057f27bbd
fix typo in parsing "cachePolicy" parameter
11 years ago
bhoerdzn
98f5c9018d
Fixed template vars for "deleteold". Fixed parsing "deleteold" parameter. Stop "setState" overwriting "deleteold" state on load.
11 years ago
bhoerdzn
a6a62986d4
correct state handling for country code restriction
11 years ago
bhoerdzn
4066b85155
correctly set initial state for load filters
11 years ago
bhoerdzn
8c91c3e7cd
set form boolean values to 0 & 1 instead of false & true
11 years ago
bhoerdzn
c27fabc88e
fixed wrong parameter check
11 years ago
bhoerdzn
2214bf5396
Remove some post parameters, if they are set to default values, as their values are already set by YaCy. Added some documentation.
11 years ago
reger
71d2655c02
downgrade to Jetty 8 to assure support of JRE 1.6
- introduce a YaCyHttp interface to modularize/separate http server
- adjust the Jetty version specific implementation part (in package net.yacy.http)
- putting the version specific code in classes starting with Jetty8xxxx
- moved existing Jetty9xxx implementation into a test class (to keep the code)
- adjust build to the changed jars
- make use of the introduced YaCyHttpServer interface in related htroot servlets
- adjust other test cases/classes
11 years ago
orbiter
705b3338ee
list more fields available for search and for ranking boosts
11 years ago
bhoerdzn
405878182f
Use list template for all other option lists. Fixed some template expressions.
11 years ago
bhoerdzn
8e74098cd4
Use list template for "reloadIfOlderNumber".
11 years ago
bhoerdzn
52bad7b908
Dynamic toggling of form fields, based on passed in and selected values. This will also cut down the post string by disabling not needed fields.
11 years ago
Michael Peter Christen
e56aa4fe93
fixed search navigation
11 years ago
Michael Peter Christen
4fbc4740df
removed warnings
11 years ago
bhoerdzn
45cf553bc3
try to guess default crawling mode, if none set
11 years ago
bhoerdzn
b4f0c822f2
assign strings before checking contents
11 years ago
bhoerdzn
499abe8f91
set default values for string parameters
11 years ago
bhoerdzn
42ea56eaad
made CrawlStartExpert_p aware of post variables; extended template where needed
11 years ago
reger
c7c706fd9f
merge with rc1/master
12 years ago
Michael Peter Christen
82bfd9e00a
- crawl profiles shall be deleted from active and passive stacks if they
are deleted to terminate the crawl because otherwise the crawl will go
on after the load-from-passive stack policy.
- better check if a crawl is terminated using the loader queue.
12 years ago
orbiter
8ac2e8c8c9
added location navigator which causes that the image to the map search
is visible whenever a location is available in the search result.
To activate this, the search.navigation property in yacy.conf must be
modified to the new default values.
12 years ago
orbiter
d86d2be5c3
automatically removed Places autotagging if no location library is
wanted
12 years ago
reger
5c4ba9b5db
merge rc1 master
12 years ago
reger
70c51775ae
Merge remote-tracking branch 'origin/master' into jetty
12 years ago
orbiter
d2effd21db
fix for npe during location search
12 years ago
Michael Peter Christen
e40671ddb7
better and consistent deletions for error urls
12 years ago
Michael Peter Christen
2602be8d1e
- removed ZURL data structure; removed also the ZURL data file
- replaced load failure logging by information which is stored in Solr
- fixed a bug with crawling of feeds: added must-match pattern
application to feed urls to filter out such urls which shall not be in a
wanted domain
- delegatedURLs, which also used ZURLs are now temporary objects in
memory
12 years ago
Michael Peter Christen
61c5e40687
- replaced the properties object in AnchorURL with distinct variables
for anchor attributes.
- as a result, large portions of the parser code had to be adapted
as well
- added a counter target_order_i for anchor links in webgraph
computation
12 years ago
Michael Peter Christen
5e31bad711
- the webgraph shall store all links which appear on a web page and not
all unique links! This made it necessary to adapt a large portion of the
parser and link processing classes to carry a different type of link
collection, which carries property attributes that are attached to web
anchors.
- introduction of a new URL class, AnchorURL
- the other url classes, DigestURI and MultiProtocolURI had been renamed
and refactored to fit into a new document package schema, document.id
- cleanup of net.yacy.cora.document package and refactoring
12 years ago
reger
13fc86c960
Merge remote-tracking branch 'origin/master' into jetty
12 years ago
reger
127adbf5cf
remove references to 10_http thread (legacy http server)
and add needed get/set function to jetty http server wrapper
12 years ago
Michael Peter Christen
3e22d05290
added option for daterange properties in GSA interface to use a left-
or right-open date range;
i.e. using daterange=..2013-09-09 or daterange=2013-09-02.. in addition
to daterange=2013-09-02..2013-09-09
12 years ago
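The daterange syntax in the entry above (a bound on either side of "..", with either side optional) can be sketched as a small parser. The class and method names are hypothetical. Treating both bounds as inclusive is an assumption for the from-date; a later entry in this log states that the to-date is inclusive.

```java
import java.time.LocalDate;

// Hypothetical sketch of the daterange syntax: "from..to", "..to" or "from..".
final class DateRangeSketch {
    final LocalDate from; // null means the range is open on the left
    final LocalDate to;   // null means the range is open on the right

    private DateRangeSketch(LocalDate from, LocalDate to) {
        this.from = from;
        this.to = to;
    }

    static DateRangeSketch parse(String value) {
        int sep = value.indexOf("..");
        if (sep < 0) throw new IllegalArgumentException("missing '..' in " + value);
        String left = value.substring(0, sep);
        String right = value.substring(sep + 2);
        return new DateRangeSketch(
                left.isEmpty() ? null : LocalDate.parse(left),    // ISO yyyy-MM-dd
                right.isEmpty() ? null : LocalDate.parse(right));
    }

    /** True if the date lies within the range; both bounds count as inside. */
    boolean contains(LocalDate d) {
        return (from == null || !d.isBefore(from))
            && (to == null || !d.isAfter(to));
    }

    public static void main(String[] args) {
        DateRangeSketch r = parse("2013-09-02..2013-09-09");
        System.out.println(r.contains(LocalDate.parse("2013-09-05")));
    }
}
```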
reger
36b7159282
- remove double initialization of jetty
- refactor some var assignments
12 years ago
reger
63ed04260a
Merge remote-tracking branch 'origin/master' into jetty
12 years ago
Michael Peter Christen
35ab2cef7b
added parsing of 'date', 'dc:date', 'dc.date' and 'last-modified' in
html meta fields to get a correct (or: better) date timestamp. The
http:last-modified mostly does not work because it is set to the current
date from most CMS.
12 years ago
reger
aafef72a8a
merged current rc1/master into jetty branch to allow further development with latest version
ServerSideIncludes and servlet return values need further work (for working jetty integration)
- TODO: added nasty quickfix to allow SSI - needs further work
- TODO: YaCy servlet return values/parameters are not handled
12 years ago
Michael Peter Christen
dbef8ccfcb
forced deletion of ZURL entries for a specific host for each host that
appears in the crawl url list
12 years ago
Michael Peter Christen
e137ff4171
refactoring (in preparation for new removeHost method)
12 years ago
Michael Peter Christen
9e12fdff23
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen
049c3b3f2e
added an option to exclude image search results from text search. This
is on by default.
12 years ago
Michael Peter Christen
5d71a4c8bc
fix for dc:description field
12 years ago
reger
392174de8c
remove all_words, all_strings lists from QueryGoal
- only used for text highlighting in parser text (ViewFile.html) which can be done with include_strings only
12 years ago
Michael Peter Christen
cb85b22725
redesign of the image search process (with much better results,
unfortunately the index schema has changed and p2p image search will not
be much better until many people update)
12 years ago
Michael Peter Christen
6184fd9d9a
fix for solr/gsa result logging
12 years ago
reger
29967102a2
optimized QueryGoal (reducing mem and computation by removing all_hashes)
- all_hashes used for text highlighting and word distance computation which can be done with include_hashes only
12 years ago
orbiter
f106345eef
link strings should not be tokenized
12 years ago
orbiter
5b14bdfffd
npe fix
12 years ago
orbiter
1ca4b9612c
added special handling of the BinaryResponseWriter in the solr interface
which makes it possible to use solrj with the javabin format which is
much better (compressed, no xml overhead, java object streams) and
faster. Furthermore, this enables the 'shards' option in the solr
interface which connects one solr (YaCy) to another solr (YaCy) ad-hoc.
12 years ago
Michael Peter Christen
a88a62f7aa
added a feature to set a collection for a crawl result based on a
regular expression on the url: the collection attribute for a crawl start
may now be either a token or a list of tokens, separated by ',', where a
token is either a string or a pair <string,pattern> where the string is
separated from the pattern with a ':' and the string is assigned to the
document as collection only if the pattern matches the url.
12 years ago
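The collection attribute format described in the entry above can be illustrated with a short sketch. The class and method names are hypothetical. Two choices here are assumptions, not taken from the commit: the pattern must match the whole url, and the token name ends at the first ':'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the "token or name:pattern" collection attribute.
final class CollectionRuleSketch {
    /** Returns the collection names that apply to the given url. */
    static List<String> collectionsFor(String attribute, String url) {
        List<String> result = new ArrayList<>();
        // Note: splitting on ',' means a pattern cannot itself contain a comma.
        for (String token : attribute.split(",")) {
            token = token.trim();
            if (token.isEmpty()) continue;
            int sep = token.indexOf(':');
            if (sep < 0) {              // plain token: always assigned
                result.add(token);
                continue;
            }
            String name = token.substring(0, sep);
            String pattern = token.substring(sep + 1);
            // Assumption: full match of the url against the pattern.
            if (Pattern.matches(pattern, url)) result.add(name);
        }
        return result;
    }

    public static void main(String[] args) {
        String attribute = "user,wiki:.*wikipedia\\.org.*";
        System.out.println(collectionsFor(attribute, "http://en.wikipedia.org/wiki/YaCy"));
        System.out.println(collectionsFor(attribute, "http://example.com/"));
    }
}
```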
Michael Peter Christen
765943a4b7
Redesign of crawler identification and robots steering. A non-p2p user
in intranets and the internet can now choose to appear as Googlebot.
This is an essential necessity to be able to compete in the field of
commercial search appliances, since most web pages are these days
optimized only for Google and no other search platform any more. All
commercial search engine providers have a built-in fake-Google User
Agent to be able to get the same search index as Google can do. Without
the resistance against obeying to robots.txt in this case, no
competition is possible any more. YaCy will always obey the robots.txt
when it is used for crawling the web in a peer-to-peer network, but to
establish a Search Appliance (like a Google Search Appliance, GSA) it is
necessary to be able to behave exactly like a Google crawler.
With this change, you will be able to switch the user agent when portal
or intranet mode is selected on per-crawl-start basis. Every crawl start
can have a different user agent.
12 years ago
Michael Peter Christen
47b1c81d08
- refactoring
- generalized writing of url attributes to solr documents
- added more url attributes to error documents
12 years ago
Michael Peter Christen
e6b423c4d9
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
reger
94bec24d14
add back menu to Surftips page (currently no menu is displayed)
12 years ago
Michael Peter Christen
1f299b0d42
removed link.gif as link button because this image is now shown
automatically for external links
12 years ago
Michael Peter Christen
48ddd50a6c
html fix
12 years ago
reger
96ae332427
revert del _blank (last commit) in template
12 years ago
reger
43348a98a9
add some href target=_blank to ext. links with external icon
12 years ago
reger
82d81a57bd
info msg if no embedded Solr http://bugs.yacy.net/view.php?id=279
12 years ago
reger
02fe8b43ba
Field Re-Indexing: display list of fields in reindex queue
change servlet to display statistic on 1st click (instead of after refresh)
12 years ago
sixcooler
7f501b7c38
clear some caches before reporting low Memory
do not break lines in Network-table-rows
12 years ago
reger
070bf85b33
css fix for IE10 showing border on all img within <a /> tag since introduction of external link icon (commit 112836dcc9)
12 years ago
sixcooler
8a96140f92
fix / workaround for
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4750
+ Seed.hash should be final
12 years ago
Michael Peter Christen
2674d28ef4
protection against self-ping (may be caused by fraud attempts)
12 years ago
orbiter
f3d001c7ab
more space in the about section
12 years ago
Michael Peter Christen
e879b97b0a
added line to enhance debugging
12 years ago
Michael Peter Christen
76afcccaaf
fix for default boolean post values: the default value MUST NOT be TRUE,
because it's normal that a boolean value is missing in the post argument
if a checkbox is not selected.
Added also some style enhancements to IndexFederated, removed the Solr
attachment manual and replaced it with a link to the wiki which explains
this in more detail.
12 years ago
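The reasoning in the entry above rests on a browser behavior: an unchecked checkbox is simply absent from the submitted form data. A minimal sketch of a lookup helper shows why a default of true would be wrong. The class name, the method name and the accepted "truthy" strings are assumptions for illustration, not YaCy's actual post-handling API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: boolean lookup in posted form values.
final class PostBooleanSketch {
    /** Returns dflt only when the key is absent from the post data. */
    static boolean getBoolean(Map<String, String> post, String key, boolean dflt) {
        String v = post.get(key);
        if (v == null) return dflt; // unchecked checkboxes end up here
        return v.equals("true") || v.equals("on") || v.equals("1");
    }

    public static void main(String[] args) {
        Map<String, String> post = new HashMap<>();
        // The user left the checkbox "indexText" unchecked, so it is not posted.
        System.out.println(getBoolean(post, "indexText", false)); // respects the user
        System.out.println(getBoolean(post, "indexText", true));  // silently re-enables it
    }
}
```

With a default of true, the second call reports the option as enabled even though the user switched it off, which is the failure mode the commit describes.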
orbiter
252c525709
fixed feed api servlet and enhanced RSSReader class
12 years ago
Marc Nause
112836dcc9
Improved external links.
*) image links will not be marked (if they have class "yacylogo" or
"forceNoExternalIcon")
*) external links in menu on left (and "fork me"-banner) will open in
new tab/window now
12 years ago
Marc Nause
d64a094f0e
External links in HTML interface are marked as external with small icon.
*) added new icon
*) added CSS rules to mark all external links except search results
(target="_self")
12 years ago
Michael Peter Christen
58fe986cca
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen
cf12835f20
replaced the single-text description solr field with a multi-value
description_txt text field
12 years ago
sixcooler
7d53ac86a3
fix for Blacklist (-Administration)
12 years ago
orbiter
f425b2c61c
re-try to fetch url after a soft commit
12 years ago
orbiter
bf0ad04e1b
apply load limitation also to dht-in
12 years ago
Roland Haeder
b58ca8622d
Some cleanups:
- added SKINS_PATH_DEFAULT, in the same way LISTS_PATH_DEFAULT was added
- Added 'final' keyword to a string
12 years ago
Roland Haeder
e2ee412160
Use SwitchboardConstants.LISTS_PATH_DEFAULT instead of 'DATA/LISTS'
Conflicts:
htroot/api/blacklists_p.java
12 years ago
Roland Haeder
ae19401af0
Removed another duplicate occurrence of Blacklist.BLACKLIST_FILENAME_FILTER
12 years ago
Roland Haeder
59225487ea
Fix for blacklist export, also applied the filename filter here
12 years ago
Roland Haeder
952fc0e7bd
Removed superfluous check for files ending '.black' as the previous commit already excluded all other files (e.g. .ser dumps), added logging in catch-all block
12 years ago
Roland Haeder
060fec1577
Reuse Blacklist.BLACKLIST_FILENAME_FILTER
12 years ago
Roland Haeder
29049c71f5
Possible fix for ticket http://bugs.yacy.net/view.php?id=270 , the filter for only including *.black must be applied
12 years ago
Michael Peter Christen
4c242f9af9
always use a default value for boolean options to have transparency for
the outcome if the attribute is missing in servlets
12 years ago
orbiter
9c681cc00d
added segment sizes, postprocessing status and cpu load to crawler
monitor
12 years ago
orbiter
86b514cf46
added load info to status_p.xml
12 years ago
orbiter
056b42f5aa
- added information about segment count to status_p.xml
- also moved this information from the old index structure, which is
still in use for the RWI/DHT index to that front-end
12 years ago
orbiter
6fb2811e68
fixes for problems with remote solr and non-activated webgraph index
12 years ago
orbiter
e24016e30a
added the property federated.service.solr.indexing.timeout to yacy.init
to provide a configurable time-out for solr; see also:
http://bugs.yacy.net/view.php?id=254
12 years ago
orbiter
232100301c
removed double-occurring value assignments
12 years ago
Roland Haeder
aaedc0405d
Fixes and avoidance of catching bad exceptions (some):
- Rewrote usage of HashMap/Map to concurrent versions (to avoid a
CME=ConcurrentModificationException)
- Rewrote ConnectionInfo (as an example) to use a synchronized iterator
instead of synchronizing an
already synced HashSet (see Collections call)
- This avoids catching CMEs again
- Commented out noisy ConcurrentLog.logException() call
Conflicts:
source/net/yacy/repository/LoaderDispatcher.java
12 years ago
Roland Haeder
841a28ae76
Added 'final' for all exception blocks as this helps the Java compiler
to optimize memory usage
Conflicts:
source/net/yacy/search/Switchboard.java
12 years ago
Felix Ableitner
376f9cd9d0
Merge branch 'master' of git://gitorious.org/yacy/rc1 into blacklist_structure
12 years ago
Michael Peter Christen
89c0aa0e74
added collection_sxt to error documents
12 years ago
Michael Peter Christen
0df5195cb0
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen
1fd006cc56
fixes using the embedded connector
12 years ago
orbiter
aba7cc5de7
added cpu load information to status page
12 years ago
Roland Haeder
59b4fdd5ad
Merge remote-tracking branch 'upstream/master'
12 years ago
orbiter
5493389576
stealth mode shall only be available for authorized users, because
unauthorized users can otherwise be monitored by authorized users
12 years ago
Roland Haeder
ebbb3bc5c1
Fixed CHMOD on many files + added missing loggers (e.g. jena) and made some noisy loggers quiet
12 years ago
Michael Peter Christen
bcc623a843
refactoring of load_delay: this is a matter of client identification
12 years ago
orbiter
2be456e7fb
added a postprocessing field into api/status_p.xml to show if the
postprocessing task is running at that time (status: busy) or not
(status:idle)
12 years ago
orbiter
575f913154
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
orbiter
c4efb612e2
added list of crawls to status_p.xml
12 years ago
Lotus
bb6caa346c
Do not allow automatic update in case YaCy is installed to the Program
Files folder on Windows. There are no permissions to write that folder
and update would fail.
12 years ago
orbiter
dac88561ae
minimum access time has a tight connection to ClientIdentification,
therefore it is defined there.
12 years ago
Felix Ableitner
a020697d64
Fixed problems with blacklist entry insertion.
12 years ago
sixcooler
bff8c753c6
re-insert this file - was deleted by mistake
+ correct another case-typo
12 years ago
Michael Peter Christen
5878c1d599
- refactoring of log to ConcurrentLog:
jdk-based loggers tend to block
at java.util.logging.Logger.log(Logger.java:476) in concurrent
environments. This makes logging a main performance issue. To overcome
this problem, this is an add-on to jdk logging to put log entries on a
concurrent message queue and log the messages one by one using a
separate process.
- FTPClient uses the concurrent logging instead of the log4j logger
12 years ago
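The ConcurrentLog idea in the entry above (callers enqueue messages, a separate worker writes them one by one) can be sketched with a blocking queue. This is not the ConcurrentLog implementation: the class name is hypothetical, a worker thread stands in for the "separate process", and an in-memory list stands in for the jdk logger.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of queue-based logging with a single writer thread.
final class QueuedLogSketch {
    // Sentinel compared by identity; tells the worker to stop.
    private static final String POISON = new String("POISON");
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Stand-in for the real log target (an assumption for this sketch).
    private final List<String> sink = Collections.synchronizedList(new ArrayList<>());
    private final Thread worker;

    QueuedLogSketch() {
        worker = new Thread(() -> {
            try {
                String msg;
                while ((msg = queue.take()) != POISON) sink.add(msg);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "log-writer");
        worker.setDaemon(true);
        worker.start();
    }

    /** Callers only enqueue, so they never wait on the slow log target. */
    void log(String message) {
        queue.add(message);
    }

    /** Drains the queue, stops the worker and returns what was written. */
    List<String> shutdown() {
        queue.add(POISON);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(sink);
    }

    public static void main(String[] args) {
        QueuedLogSketch log = new QueuedLogSketch();
        log.log("crawl started");
        log.log("crawl finished");
        System.out.println(log.shutdown());
    }
}
```

Because a single thread consumes the queue, messages reach the sink in the order they were enqueued, while producer threads only pay the cost of an unbounded queue insert.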
orbiter
c79f687110
enhanced the network scanner: find more hosts automatically by removal
of common subdomains before application of protocol-specific prefix
12 years ago
orbiter
b4677d1cad
fix for bug #252
the naming of the servlet was wrong, the bug may not be present on
systems where upper/lowercase matching is lazy (windows)
12 years ago
Michael Peter Christen
07261fe274
Merge remote-tracking branch 'nutomics/blacklist_structure'
12 years ago
Michael Peter Christen
dea71851d2
- better concurrency for network scanner
- network scanner can now start from the list of all hosts in the search
index
12 years ago
orbiter
9f0cc9b401
enhanced network scanner
- textarea input field can now be used to paste in a large list of hosts
- /31 subnet is possible (only one host)
- auto-detect subdomains for ftp and www subdomains
12 years ago
orbiter
f8c28efd66
fix for rssTerminal coloring
12 years ago
Felix Ableitner
44f8fcf62e
Changed class structure of Blacklist.
12 years ago
Michael Peter Christen
3054a6d4b9
added a patch from Sebastian M.B., submitted by email for coloring of
rss terminal
12 years ago
Michael Peter Christen
78af998f8f
Merge commit 'fd90fcc4e08f80acbfd1c9a7ec62ce04cd309594'
12 years ago
Michael Peter Christen
57ffdfad4c
added a crawl option to obey html-meta-robots-noindex. This is on by
default.
12 years ago
Felix Ableitner
fd90fcc4e0
Fixes #196 .
12 years ago
Michael Peter Christen
f1c5338210
preparation for greedy crawl profiles and refactoring
12 years ago
Michael Peter Christen
e6f361f474
adding the canonical tag to crawl queues
12 years ago
Michael Peter Christen
203921006a
redesign of citation index storage
12 years ago
Michael Peter Christen
e92b9275ce
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen
56cdcfa2fa
fixed greedy learning mode - global is not a search attribute in
searchitems
12 years ago
Michael Peter Christen
32aa1d4569
removed unused option for queries
12 years ago
Michael Peter Christen
0c5bed7e2c
added configuration option for greedy learning function to ConfigPortal
servlet
12 years ago
sixcooler
5d1f619f07
possibly helpful closing of solr-requests
12 years ago
Michael Peter Christen
9d291764d1
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
sixcooler
e5abccdfe4
added optimize-option
12 years ago
Michael Peter Christen
8ea6ddf636
removed attributes from ConfigPortal.html which are redundant to
ConfigSearchPage_p.html
12 years ago
Michael Peter Christen
64140f35cd
fix for solr requests if no query part is given (prevent npe)
12 years ago
Michael Peter Christen
23fb458963
- fix to gsa searchresult answer in case that no query part is given
- fix to gsa default number of results (is 'num')
12 years ago
Michael Peter Christen
660a196989
refactoring
12 years ago
Michael Peter Christen
54024958ac
added url_file_name_s in query for live-search of urls
12 years ago
Michael Peter Christen
16d1d744fa
added url_file_name_s in default collection schema for the file name
without the file extension. This part of the file path is removed from
the multi-field url_paths_sxt, which no longer has the file name as the last
part of the path list.
The same applies to the new fields source_file_name_s and
target_file_name_s in the webgraph schema.
12 years ago
Michael Peter Christen
f542cf7d9c
fix for daterange: the to-date is inclusive
12 years ago
Michael Peter Christen
c36720d45f
added daterange option to gsa api
12 years ago
Michael Peter Christen
4e3007f4a0
typo
12 years ago
Michael Peter Christen
2cb6b6bc21
added target="_blank" to shutdown links
12 years ago
orbiter
c8e94ad7c7
fix for citation search in case that the citation is very fresh
12 years ago